Deadlock in SecTrustSettingsEvaluateCert()

Originator:wiml
Number:rdar://25175159 Date Originated:15-Mar-2016 01:33 PM
Status:Open Resolved:
Product:OS X Product Version:10.11.1
Classification:Hang Reproducible:Yes
 
Summary:
Security.framework will deadlock on a mutex in SecTrustSettingsEvaluateCert() when an application calls SecTrustEvaluate().

I've attached a sampler excerpt showing the situation. However, it is also possible to see the bug by inspecting the released Apple source code (e.g. Security-57337.20.44).

Scenario: [excerpted from our investigation]

The application calls SecTrustEvaluate(), which somewhere along the way calls SecTrustSettingsEvaluateCert(), which (for some reason?) wants to evaluate the app's code signing. It does that by dispatching to another thread. (So the main thread is blocked in dispatch_wait() via the Security::Dispatch::Group helper class.) That other thread does some stuff and eventually needs to evaluate a SecTrust, so it calls SecTrustEvaluate(), which eventually calls down to SecTrustSettingsEvaluateCert().

Problem: The first thing SecTrustSettingsEvaluateCert() does is take out a lock on a mutex. The first thread still holds that mutex while it waits for the dispatched work to finish, so the app deadlocks when the second thread tries to take out that same mutex.

You can see this in Apple's published source code: see http://www.opensource.apple.com/source/libsecurity_keychain/libsecurity_keychain-55050.9/lib/SecTrustSettings.cpp. The first line of SecTrustSettingsEvaluateCert() is "StLock<Mutex> _(sutCacheLock())" (a RAII-style mutex acquisition).

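That mutex is a recursive mutex, and the comment above it on line 130 suggests that Apple made it recursive to deal with this *exact* situation. But a recursive mutex only allows re-entry from the thread that already owns it, so that protection broke when they inserted a dispatch_group and the nested call started arriving on a different thread.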
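
To illustrate the failure mode outside of Security.framework, here is a minimal sketch using std::recursive_mutex and std::thread as stand-ins for the framework's Mutex class and libdispatch. The names cacheLock, reenterOnSameThread() and reenterOnWorkerThread() are hypothetical and do not appear in Apple's source. The worker uses try_lock() rather than lock() so that the sketch reports the conflict instead of actually hanging.

```cpp
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the recursive mutex returned by sutCacheLock().
static std::recursive_mutex cacheLock;

// Nested acquisition on the thread that already owns the lock.
// A recursive mutex permits this, so the function returns normally.
bool reenterOnSameThread() {
    std::lock_guard<std::recursive_mutex> outer(cacheLock);
    std::lock_guard<std::recursive_mutex> inner(cacheLock);
    return true;
}

// Nested acquisition attempted from a worker thread while the caller
// holds the lock and blocks waiting for the worker to finish.
// Returns whether the worker was able to take the lock.
bool reenterOnWorkerThread() {
    std::lock_guard<std::recursive_mutex> outer(cacheLock);

    std::promise<bool> result;
    std::future<bool> done = result.get_future();

    std::thread worker([&result] {
        // With lock() instead of try_lock(), this thread would block forever.
        bool acquired = cacheLock.try_lock();
        if (acquired) {
            cacheLock.unlock();
        }
        result.set_value(acquired);
    });

    // Analogous to the first thread blocking on the dispatch group.
    bool acquired = done.get();
    worker.join();
    return acquired;
}
```

Under these assumptions, reenterOnSameThread() succeeds, while reenterOnWorkerThread() returns false: the worker cannot take a lock owned by the waiting thread, which is the point at which the real code, using a blocking acquisition, hangs.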

Steps to Reproduce:
1. Install OmniPlan.
2. Set up OS X Server as a DAV server.
3. Attempt to create an OmniPlan document sync on that DAV server.

Expected Results:
No deadlock

Actual Results:
Deadlock

Version:
10.11.1/15B42
Security-57337.20.44


Notes:


Configuration:
This does not occur in all configurations, but on configurations where it happens, it happens reliably.

The parallelization in SecStaticCode::validateResources() checks the disk type (SSD versus non-SSD), so reproduction may depend on the media type on which the host app is stored.
