thread_get_state returns KERN_SUCCESS and SP of 0 causing crashes in JavaScriptCore

Originator:ben
Number:rdar://27607384 Date Originated:7/29/2016
Status:Open Resolved:
Product:iOS SDK Product Version:
Classification:Serious Bug Reproducible:Sometimes
 
We recently enabled the execution of JavaScript via JavaScriptCore on dispatch queues in our app. After releasing this version of the app, we started getting a number of crashes in the JSC GC thread and had to disable this feature.

The basic issue is that the JSC GC needs to know the bounds of the stack for every thread that has ever called into JSC before it can begin its mark phase. To do this, it calls thread_get_state to get the stack pointer of the target thread:

https://github.com/WebKit/webkit/blob/5b3d24f/Source/JavaScriptCore/heap/MachineStackMarker.cpp#L919

Under certain circumstances, thread_get_state returns KERN_SUCCESS but the stack pointer it reports is 0x0. The JSC GC then attempts to scan the stack starting from 0x0, which causes a segfault.
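
As a rough illustration of the kind of defensive check a conservative scanner could apply before walking a stack, here is a minimal sketch in C. It is not JSC's code: make_scan_range, scan_range_t, and the stack_limit/stack_base parameters are hypothetical names, and it assumes a downward-growing stack whose bounds are already known from elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: the half-open range [begin, end) to scan. */
typedef struct {
    uintptr_t begin;
    uintptr_t end;
} scan_range_t;

/*
 * Hypothetical helper, not part of JSC or Mach.
 * Assumes the stack grows down from stack_base toward stack_limit.
 * Returns false (and leaves *out untouched) when sp is 0 or lies outside
 * the thread's known stack bounds, so the caller can skip that thread.
 */
static bool make_scan_range(uintptr_t sp, uintptr_t stack_base,
                            uintptr_t stack_limit, scan_range_t *out)
{
    assert(out != NULL);

    if (sp == 0 || sp < stack_limit || sp > stack_base) {
        return false;
    }

    out->begin = sp;
    out->end = stack_base;
    return true;
}
```

With a check like this, a thread that reports an SP of 0 would be skipped for the stack scan instead of being scanned from address 0x0. Whether skipping is actually safe for the GC depends on what state such a thread is really in, which is the open question in this report.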

The threads returning 0x0 for the SP appear to be GCD threads that are either just created, just about to exit, or just about to be reused by the kernel (not really sure which since the pthread source is no longer published). In particular, the target thread's call stack has only a single stack frame that looks like this:

  Thread #32:
  0 libsystem_pthread.dylib 0x00000001 start_wqthread + 0

I wrote a little test app that stress tests the thread_get_state call and I'm able to reproduce this issue of thread_get_state returning a SP of 0 on both an iPhone 5S running iOS 8.4.1 and an iPhone 6S running iOS 9.3.2.

See the attached ThreadGetStateBadSP.png for an example of the state of the app when this occurs.

Steps to Reproduce:
You may have to run the test app for several minutes for this issue to repro. Also note the #defines at the top; I am using different sets of parameters to stress GCD thread creation and destruction on a 5S vs. a 6S.

1. Run ThreadGetStateTest project.
2. Tweak the #defines at the top depending on device--I've included sets of parameters that seem to repro the issue on both a 5S and 6S.
3. Set a breakpoint where it says "SET BREAKPOINT HERE", which is when thread_get_state returns KERN_SUCCESS and a SP of 0.
4. Run the app for some number of minutes until you hit the breakpoint.

Expected Results:
It seems like a valid thread should not report a SP of 0 when thread_get_state returns KERN_SUCCESS. If this is intended, the behavior should be documented so that JSC's GC and other GCs know to expect a NULL stack pointer and can handle it appropriately.

Actual Results:
thread_get_state returns KERN_SUCCESS and a SP of 0 for some work queue threads, which is unexpected.

Version:
iOS 8.4.1, iOS 9.3.2

Notes:

Configuration:
iPhone 5S 8.4.1, iPhone 6S 9.3.2

Attachments:
'ThreadGetStateBadSP.png' and 'ThreadGetStateTest.zip' were successfully uploaded.

ThreadGetStateBadSP.png => http://imgur.com/a/anfMa
ThreadGetStateTester.cpp => http://hastebin.com/xutetikura.cpp

Comments

Apple wrote back to say: "Can you guys use thread_get_register_pointer_values() as an alternative?"

My comment: This seems like it would fix the issue, because thread_get_register_pointer_values appears to be a wrapper around thread_get_state that a) filters out any pointers that point into the null page and b) properly takes into account the red zone beneath the stack pointer.
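
To make those two behaviors concrete, here is a minimal sketch in C of what such filtering could look like. This is my own illustration under stated assumptions, not the actual implementation: the function names are hypothetical, and the 4096-byte null page and 128-byte red zone are assumed values that vary by platform and ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch only; real sizes depend on platform and ABI. */
#define SKETCH_NULL_PAGE_SIZE 4096u
#define SKETCH_RED_ZONE_SIZE  128u

/*
 * Hypothetical helper: copies into out only the register values that do not
 * point into the null page, and returns how many were kept.
 * out must have room for count entries.
 */
static size_t filter_register_pointers(const uintptr_t *regs, size_t count,
                                       uintptr_t *out)
{
    assert(regs != NULL && out != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (regs[i] >= SKETCH_NULL_PAGE_SIZE) {
            out[kept++] = regs[i];
        }
    }
    return kept;
}

/*
 * Hypothetical helper: lowest address that may hold live data for a
 * downward-growing stack, i.e. sp lowered by the red zone.
 * Returns 0 when sp is too small for the subtraction to make sense.
 */
static uintptr_t lowest_live_stack_address(uintptr_t sp)
{
    if (sp < SKETCH_RED_ZONE_SIZE) {
        return 0;
    }
    return sp - SKETCH_RED_ZONE_SIZE;
}
```

A caller would still need to treat a result of 0 from the second helper as "no usable stack pointer" rather than as an address to scan from.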

Here is the JSC bug, which provides a bit more context from that side: https://bugs.webkit.org/show_bug.cgi?id=160337

