Race condition in StartWebThread causing crash

Originator:ben
Number:rdar://34021573 Date Originated:8/22/2017
Status:Open Resolved:
Product:iOS WebKit Product Version:10.3.x
Classification:Crash/Hang/Data Loss Reproducible:Sometimes
 
We have been seeing an increasing number of crashes in our app due to a race condition in StartWebThread. The crash involves a race between the main thread and the web thread.

The main thread crashes in this RELEASE_ASSERT in ThreadIdentifierData:

  https://opensource.apple.com/source/WTF/WTF-7603.1.30.1.33/wtf/ThreadIdentifierDataPthreads.cpp.auto.html (line 78)

The main thread's stack is:

  https://gist.github.com/bnham/d5a78782c34f1fc85ee6a06f8364979b

This RELEASE_ASSERT fires if the pthread TSD key hasn't yet been created. The TSD key is created by WTF::initializeThreading, which is called by JSC::initializeThreading. So basically, the crash occurs because the MT called WTF::ThreadIdentifierData::initialize before another thread finished calling WTF::initializeThreading.

The thread that's responsible for calling WTF::initializeThreading is the WebThread (which is forked off by StartWebThread). The WebThread has this call stack in the crash:

  https://gist.github.com/bnham/6646f15ad3d76606544bac03be0d23bf

If you look at the code, the WebThread is stuck two lines *before* the call to JSC::initializeThreading (which is what calls WTF::initializeThreading)--it's executing WTF::initializeMainThreadPlatform:

  https://opensource.apple.com/source/WebCore/WebCore-7603.1.30.1.33/platform/ios/wak/WebCoreThread.mm.auto.html (line 653)

In theory, there is some synchronization in the existing code that should prevent this from happening. In pseudo-code, the synchronization code looks like this:

  RunWebThread: (running on the web thread)
    JSC::initializeThreading()
    // other init...

    pthread_mutex_lock(&lock)
    pthread_cond_signal(&cond, &lock)
    pthread_mutex_unlock(&lock)

  StartWebThread: (running on the main thread)
    pthread_lock(&lock)
    pthread_cond_wait(&cond, &lock)
    pthread_unlock(&lock)

    WTF::initializeApplicationUIThreadIdentifier() // calls code which eventually asserts

However, this synchronization code is buggy because pthread_cond_wait is allowed to spuriously wake up, and pthread_cond_wait is only called once. It is very easy to demonstrate that spurious wakeups exist with pthread condition variables, e.g. here is a simple test program: https://gist.github.com/bnham/f2ca3430a7c4f6f714a918b62702125e

The fix is to add a boolean flag variable and to check whether that boolean variable is set while calling pthread_cond_wait in a loop, i.e. textbook usage of Mesa condition variables.

A possible mitigation for affected apps is to call some JSC API that eventually calls JSC::initializeThreading early on in app initialization.

WebKit bugzilla: https://bugs.webkit.org/show_bug.cgi?id=175852

Steps to Reproduce:
It's a race condition. Look at the stack traces I provided which clearly show the race condition in action.

Expected Results:
It crashes...sometimes. Like I said, it's a race condition.

Observed Results:
You should be able to use a WebView (or other things that eventually call WebKitInitialize, like -[NSAttributedString(NSAttributedStringUIFoundationAdditions) initWithData:options:documentAttributes:error:]) without crashing.

Version:
iOS 10.3.x

Comments


Please note: Reports posted here will not necessarily be seen by Apple. All problems should be submitted at bugreport.apple.com before they are posted here. Please only post information for Radars that you have filed yourself, and please do not include Apple confidential information in your posts. Thank you!