PTHREAD_CANCEL_DISABLE does not prevent pthread_cancel from cancelling thread
Originator: Per.Mildner.usenet
Number: rdar://9137682
Date Originated: 15-Mar-2011 10:28 PM
Status: Open
Resolved:
Product: Mac OS X
Product Version: 10.8 (12A269), 10.6.6 10J567
Classification: Crash/Hang/Data Loss
Reproducible: Always
### Added new information November 3, 2016

Still happens in macOS 10.12. As far as I understand, this means that macOS 10.12 does not conform to the Single UNIX Specification.

### Added new information 24-Oct-2013 00:44 AM

Still happens in OS X 10.9. (foo.c in the transcript is the program in this bug report.)

    bash-3.2$ cc -g -Wall -Werror -pthread foo.c && ./a.out 100000000 10
    Starting test, 100000000 iterations, sleep interval 10ms
    foo.c:46: ERROR cancelled while PTHREAD_CANCEL_DISABLE
    Illegal instruction: 4
    bash-3.2$ cc --version
    Apple LLVM version 5.0 (clang-500.2.78) (based on LLVM 3.3svn)
    Target: x86_64-apple-darwin13.0.0
    Thread model: posix
    bash-3.2$ file ./a.out
    ./a.out: Mach-O 64-bit executable x86_64
    bash-3.2$
    bash-3.2$ uname -a
    Darwin Ps-iMac.local 13.0.0 Darwin Kernel Version 13.0.0: Thu Sep 19 22:22:27 PDT 2013; root:xnu-2422.1.72~6/RELEASE_X86_64 x86_64

### Original report

Summary:
A thread that has disabled cancellation with PTHREAD_CANCEL_DISABLE can still get cancelled when another thread calls pthread_cancel().

Steps to Reproduce:
See the transcript below. It may take hours for the problem to show up with this code, but it happens far too often in our real code.

Expected Results:
A call to write() should not be cancelled while PTHREAD_CANCEL_DISABLE is in effect.

Actual Results:
The cleanup handler is called despite PTHREAD_CANCEL_DISABLE being in effect (and it intentionally calls abort() to expose the bug).

Regression:

Notes:
The following transcript shows the code, how it was run, and some information about the machine. This problem has been present in earlier versions of 10.6.x too, and on other hardware. (I updated Xcode to 3.2.6 today before collecting the System Profiler report, so the Xcode version used in the transcript was the one before 3.2.6.)

    bash$ gcc -g -Wall -Werror -pthread cancel_leak.c && ./a.out 100000000 10
    Starting test, 100000000 iterations, sleep interval 10ms
    cancel_leak.c:46: ERROR cancelled while PTHREAD_CANCEL_DISABLE
    Abort trap
    bash$ uname -a
    Darwin big-brothers-imac.local 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386
    bash$ sw_vers
    ProductName: Mac OS X
    ProductVersion: 10.6.6
    BuildVersion: 10J567
    bash$ gcc --version
    i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664)
    Copyright (C) 2007 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    bash$ cat cancel_leak.c
    /* gcc -g -Wall -Werror -pthread cancel_leak.c && ./a.out 100000000 10 */
    #include <unistd.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>

    #define PTHREAD_CHECK(X) do{ \
        int pthread_result = (X); \
        if (pthread_result != 0) \
        { \
            fprintf(stderr, "%s:%d: ERROR code %d from pthread operation\n", \
                    __FILE__, (int)__LINE__, \
                    (int)pthread_result); \
            fflush(stderr); \
            abort(); \
        } \
    } while(0)

    #define CLIB_CHECK(X) do{ \
        int clib_result = (X); \
        if (clib_result == -1) \
        { \
            int errno_value = errno; \
            fprintf(stderr, "%s:%d: ERROR errno %d from clib operation\n", \
                    __FILE__, (int)__LINE__, \
                    (int)errno_value); \
            fflush(stderr); \
            abort(); \
        } \
    } while(0)

    struct thread_cookie {
        int wait_fd;
        int write_fd;
    };

    /* Should never be called on a POSIX compliant system */
    static void cleanup_routine(void *arg)
    {
        (void)arg;
        fprintf(stderr, "%s:%d: ERROR cancelled while PTHREAD_CANCEL_DISABLE\n",
                __FILE__, (int)__LINE__);
        fflush(stderr);
        abort();
    }

    static size_t cookie_wait_fd_read_attempts = 0;
    static size_t cookie_wait_fd_read = 0;
    static size_t cookie_write_fd_written= 0;
    static size_t pipefd_a_written = 0;

    /* RUN IN SEPARATE THREAD */
    static void *pthread_start_routine(void *vcookie)
    {
        struct thread_cookie *pcookie = (struct thread_cookie*)vcookie;
        int oldstate;
        char buf[1];

        {
            ssize_t result;

            PTHREAD_CHECK(pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate));
            cookie_wait_fd_read_attempts++;
            do
            {
                CLIB_CHECK(result = read(pcookie->wait_fd, &buf[0], 1));
                if (result < 1) { fprintf(stderr, "%s:%d ERROR read only %ld bytes\n", __FILE__, (int)__LINE__, (long)result);fflush(stderr); }
            }
            while (result < 1);
            cookie_wait_fd_read++;
            PTHREAD_CHECK(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate));

            /* PTHREAD_CANCEL_DISABLE should prevent cancel here, but it does not. */
            pthread_cleanup_push(cleanup_routine, NULL);
            do
            {
                CLIB_CHECK(result = write(pcookie->write_fd, &buf[0], 1));
                if (result < 1) { fprintf(stderr, "%s:%d ERROR wrote only %ld bytes\n", __FILE__, (int)__LINE__, (long)result);fflush(stderr); }
            }
            while (result < 1);
            cookie_write_fd_written++;
            pthread_cleanup_pop(0); /* uninstall and do not call cleanup handler */
        }
        return NULL;
    }

    static void bad_usage(void)
    {
        fprintf(stderr, "Usage: a.out ITERATIONS [MS SLEEP]\n");fflush(stderr);
        exit(EXIT_FAILURE);
    }

    int main(int argc, char **argv)
    {
        pthread_t thread;
        void *thread_return = NULL;
        struct timespec nanosleep_interval;
        struct timespec nanosleep_rem;
        long ms_interval = 200;
        struct thread_cookie cookie;
        int pipefd_a[2];
        int pipefd_b[2];
        long iterations;

        (void)argc; (void)argv;

        if (argc != 2 && argc != 3)
        {
            bad_usage();
            exit(EXIT_FAILURE);
        }
        iterations = strtol(argv[1], NULL, 10);
        if (iterations <= 0)
        {
            bad_usage();
        }
        if (argc == 3)
        {
            ms_interval = strtol(argv[2], NULL, 10);
            if (ms_interval < 0)
            {
                bad_usage();
            }
        }
        fprintf(stderr, "Starting test, %ld iterations, sleep interval %ldms\n", (long)iterations, (long)ms_interval);fflush(stderr);

        nanosleep_interval.tv_sec = 0;
        nanosleep_interval.tv_nsec = 1000*1000*ms_interval;

        while (iterations-- > 0)
        {
            CLIB_CHECK(pipe(pipefd_a));
            CLIB_CHECK(pipe(pipefd_b));
            cookie.wait_fd = pipefd_a[0]; /* input */
            cookie.write_fd = pipefd_b[1]; /* output */
            PTHREAD_CHECK(pthread_create(&thread,NULL,&pthread_start_routine, &cookie));
            if (ms_interval > 0)
            {
                CLIB_CHECK(nanosleep(&nanosleep_interval, &nanosleep_rem));
            }
            /* Expects thread to wait for wait_fd now */
            {
                char data[1] = { 'X' };
                ssize_t result;
                do
                {
                    CLIB_CHECK(result = write(pipefd_a[1], &data[0], 1));
                    if (result < 1) { fprintf(stderr, "%s:%d ERROR wrote only %ld bytes\n", __FILE__, (int)__LINE__, (long)result);fflush(stderr); }
                }
                while (result < 1);
                pipefd_a_written++;
                /* Expects thread to wake up soon and write to cookie.write_fd */
            }
            /* This may (and is allowed to) cancel thread while it waits for
               cookie.wait_fd. The bug manifests itself if this cancels the
               thread while it writes to cookie.write_fd */
            PTHREAD_CHECK(pthread_cancel(thread));
            PTHREAD_CHECK(pthread_join(thread, &thread_return));
            CLIB_CHECK(close(pipefd_a[0]));
            CLIB_CHECK(close(pipefd_a[1]));
            CLIB_CHECK(close(pipefd_b[0]));
            CLIB_CHECK(close(pipefd_b[1]));
        }
        exit(EXIT_SUCCESS);
    }
    bash$

05-Aug-2012 12:01 PM Per Mildner:

Still happens in OS X 10.8 on the same iMac as in my original bug report, both 64-bit and 32-bit:

    bash$ gcc -g -Wall -Werror -pthread cancel_leak.c && ./a.out 100000000 10
    Starting test, 100000000 iterations, sleep interval 10ms
    cancel_leak.c:46: ERROR cancelled while PTHREAD_CANCEL_DISABLE
    Illegal instruction: 4
    bash$ gcc --version
    i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
    Copyright (C) 2007 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    bash$ uname -a
    Darwin imac 12.0.0 Darwin Kernel Version 12.0.0: Sun Jun 24 23:00:16 PDT 2012; root:xnu-2050.7.9~1/RELEASE_X86_64 x86_64

Also happens with cc (i.e. clang) on OS X 10.8:

    bash$ cc -g -Wall -Werror -pthread cancel_leak.c && ./a.out 100000000 10
    Starting test, 100000000 iterations, sleep interval 10ms
    cancel_leak.c:46: ERROR cancelled while PTHREAD_CANCEL_DISABLE
    Illegal instruction: 4
    bash$ cc --version
    Apple clang version 4.0 (tags/Apple/clang-421.0.57) (based on LLVM 3.1svn)
    Target: x86_64-apple-darwin12.0.0
    Thread model: posix

Also happens in 32-bit code on OS X 10.8:

    bash$ cc -m32 -g -Wall -Werror -pthread cancel_leak.c && ./a.out 100000000 10
    Starting test, 100000000 iterations, sleep interval 10ms
    cancel_leak.c:74: ERROR cancelled while PTHREAD_CANCEL_DISABLE
    Illegal instruction: 4
    bash$ file a.out
    a.out: Mach-O executable i386
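
For comparison, the behavior the report expects can be sketched as a small deterministic program. This is not part of the original test case, only an illustration under the assumption of a POSIX-conforming implementation: a cancel request sent while the target has cancellation disabled should stay pending, and should only take effect once the target re-enables cancellation and reaches a cancellation point. The names `run_cancel_demo`, `worker`, `ready_pipe`, `go_pipe` and `progress` are hypothetical and chosen for this sketch.

```c
/* Sketch only: a cancel request that arrives while cancellation is
   disabled is expected to remain pending until it is re-enabled. */
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static int ready_pipe[2]; /* worker -> main: cancellation is now disabled */
static int go_pipe[2];    /* main -> worker: pthread_cancel() was called  */
static int progress;      /* read by main only after pthread_join()       */

static void *worker(void *arg)
{
    int oldstate;
    char c = 'r';
    (void)arg;

    if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate) != 0)
        return NULL;
    /* Tell main that cancellation is disabled from here on. */
    if (write(ready_pipe[1], &c, 1) != 1)
        return NULL;
    /* read() is a cancellation point, but with cancellation disabled
       the pending request must not be acted upon here. */
    if (read(go_pipe[0], &c, 1) != 1)
        return NULL;
    progress = 1; /* the protected section ran to completion */

    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate);
    pthread_testcancel(); /* the pending request should take effect here */
    progress = 2;         /* should not be reached */
    return NULL;
}

/* Returns 0 if the cancel was deferred as expected, 1 if it was not,
   and -1 if the setup itself failed. */
int run_cancel_demo(void)
{
    pthread_t thread;
    void *ret = NULL;
    char c = 'g';

    progress = 0;
    if (pipe(ready_pipe) != 0 || pipe(go_pipe) != 0)
        return -1;
    if (pthread_create(&thread, NULL, worker, NULL) != 0)
        return -1;
    /* Wait until the worker has disabled cancellation. */
    if (read(ready_pipe[0], &c, 1) != 1)
        return -1;
    /* The request is made while the worker has cancellation disabled. */
    if (pthread_cancel(thread) != 0)
        return -1;
    /* Only now let the worker continue past its read(). */
    if (write(go_pipe[1], &c, 1) != 1)
        return -1;
    if (pthread_join(thread, &ret) != 0)
        return -1;

    close(ready_pipe[0]); close(ready_pipe[1]);
    close(go_pipe[0]);    close(go_pipe[1]);

    assert(progress >= 0 && progress <= 2);
    return (ret == PTHREAD_CANCELED && progress == 1) ? 0 : 1;
}
```

Calling `run_cancel_demo()` from a `main()` and checking for a return value of 0 is one way to use it. Unlike cancel_leak.c, the sketch synchronizes the two threads through pipes instead of a sleep, so it exercises only one interleaving and is not claimed to reproduce the bug described above.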
Comments
Still happens in macOS 10.15 (19A602) Catalina.
It takes anything from less than a second to several minutes for the bug to manifest itself with the above test program and the parameters in the transcript.
Still happens in macOS 10.13 (High Sierra). As far as I understand this means that macOS 10.13 does not conform to the Single Unix Specification.
Still broken in macOS Sierra 10.12.6
The NDA prevents me from discussing what happens in the High Sierra betas.