task cancelations while using HTTP2 sometimes terminate the connection
| Originator: | Reflejo | | |
|---|---|---|---|
| Number: | rdar://24848965 | Date Originated: | 25-Feb-2016 04:07 PM |
| Status: | Closed | Resolved: | |
| Product: | iOS SDK | Product Version: | Xcode 7.2.1 (7C1002) |
| Classification: | Crash/Hang/Data Loss | Reproducible: | Always |
When we switched from HTTP/1.1 to HTTP/2, we saw a decrease in our success rate, and the culprit appears to be request cancellations. When using an HTTP/2 endpoint on iOS 9, there appears to be a race condition in which cancelling a request closes the whole socket instead of only the individual stream. This has a side effect: when requests are multiplexed under certain conditions, other requests may fail with the following error:
```
Error Domain=NSURLErrorDomain Code=-1017 "cannot parse response" UserInfo={NSUnderlyingError=0x1556098f0 {Error Domain=kCFErrorDomainCFNetwork Code=-1017 "(null)" UserInfo={NSErrorPeerAddressKey=<CFData 0x1556a3bd0 [0x1a1778b68]>{length = 16, capacity = 16, bytes = 0x100201bb3400a2ec0000000000000000}, _kCFStreamErrorCodeKey=-1, _kCFStreamErrorDomainKey=4}}, NSErrorFailingURLStringKey=https://api-http2.lyft.com/users/65073180/location, NSErrorFailingURLKey=https://api-http2.lyft.com/users/65073180/location, _kCFStreamErrorDomainKey=4, _kCFStreamErrorCodeKey=-1, NSLocalizedDescription=cannot parse response}
```
When the conditions for the race are met, the following code path is hit. We call `cancel` on an `NSURLSessionTask`, which:
- ends up calling `-[__NSCFLocalSessionTask _onqueue_cancel_with_error:]`,
- which checks whether `self->_connKey` is `nil`; if it is not, it
- calls `[self invalidateUnpurgeableConnectionsForConnectionCacheKey:]`,
- which closes not only the stream but the entire connection.
In this particular case, when the race occurs, `_connKey` is not `nil` (my guess is that this is because there are multiple streams), so the connection is terminated.
You can find:
- The full CFNetwork diagnostic log: https://gist.github.com/Reflejo/b4ea68e73be3059326ec#file-cfnetwork_http2_break-log
- A TCP dump of this condition: https://gist.github.com/Reflejo/2e3f352308d9861955ff
- The code that reproduces this condition (we run it on an iPhone throttled to 3G, but the problem also occurs when running it as a Swift script): https://gist.github.com/Reflejo/56d280acf78425d79408
Steps to Reproduce:
On a throttled device, run https://gist.github.com/Reflejo/56d280acf78425d79408 against an HTTP/2 server.
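A minimal Swift sketch of this kind of repro follows. It is not the code from the gist; `RequestOutcome`, `OutcomeLog`, `collateralFailures`, and `runRepro` are hypothetical names, and the cancellation pattern and 50 ms delay are assumptions. It sends several requests through a single `URLSession`, so they can be multiplexed on one HTTP/2 connection, cancels some of them shortly after resuming, and reports the requests that were not cancelled but still failed with `URLError.Code.cannotParseResponse` (-1017).

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

/// Outcome of one request in a repro run (hypothetical type).
struct RequestOutcome {
    let index: Int
    let cancelledByUs: Bool        // true if we called cancel() on this task
    let errorCode: URLError.Code?  // nil on success or for non-URLError failures
}

/// Lock-protected collector; completion handlers may run off the calling thread.
final class OutcomeLog: @unchecked Sendable {
    private let lock = NSLock()
    private var items: [RequestOutcome] = []

    func append(_ outcome: RequestOutcome) {
        lock.lock(); defer { lock.unlock() }
        items.append(outcome)
    }

    func sorted() -> [RequestOutcome] {
        lock.lock(); defer { lock.unlock() }
        return items.sorted { $0.index < $1.index }
    }
}

/// Indices of requests we did NOT cancel that still failed with "cannot parse response" (-1017).
func collateralFailures(_ outcomes: [RequestOutcome]) -> [Int] {
    return outcomes
        .filter { !$0.cancelledByUs && $0.errorCode == URLError.Code.cannotParseResponse }
        .map { $0.index }
}

/// Fires `count` requests at `url` through ONE session (so they can share an HTTP/2
/// connection) and cancels every `cancelEvery`-th task shortly after resuming it.
func runRepro(url: URL, count: Int, cancelEvery: Int,
              completion: @escaping ([RequestOutcome]) -> Void) {
    precondition(cancelEvery > 0, "cancelEvery must be positive")
    let session = URLSession(configuration: .default)
    let group = DispatchGroup()
    let collector = OutcomeLog()

    for i in 0..<count {
        let shouldCancel = i % cancelEvery == 0
        group.enter()
        let task = session.dataTask(with: url) { _, _, error in
            collector.append(RequestOutcome(index: i, cancelledByUs: shouldCancel,
                                            errorCode: (error as? URLError)?.code))
            group.leave()
        }
        task.resume()
        if shouldCancel {
            // Assumed 50 ms delay: cancel while sibling streams are still in flight.
            DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) { task.cancel() }
        }
    }

    group.notify(queue: .global()) {
        session.finishTasksAndInvalidate()
        completion(collector.sorted())
    }
}
```

To use it, call `runRepro(url:count:cancelEvery:completion:)` against an HTTP/2 endpoint you control and print `collateralFailures(outcomes)` in the completion handler; in a command-line script, keep the process alive (for example with `dispatchMain()`) until the completion runs. Under the bug described here, that list would be expected to be non-empty at least some of the time, while on a correct implementation it should stay empty.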
Expected Results:
Cancellations should send an RST_STREAM frame and close only the affected stream, not the connection. Other requests should not be affected by a single cancellation.
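To make the expected behavior concrete: in HTTP/2 (RFC 7540, since superseded by RFC 9113 with the same frame layout), cancelling one request is signalled with an RST_STREAM frame (type 0x3) that names only that stream's identifier and carries an error code such as CANCEL (0x8). The Swift sketch below encodes such a frame; `HTTP2`, `appendBigEndian`, and `rstStreamFrame` are hypothetical helpers for illustration, not CFNetwork or `NSURLSession` API, which do not expose HTTP/2 frames to callers.

```swift
/// HTTP/2 constants from RFC 7540: frame type RST_STREAM (section 6.4), error code CANCEL (section 7).
enum HTTP2 {
    static let rstStreamType: UInt8 = 0x3
    static let cancelErrorCode: UInt32 = 0x8
}

/// Appends `value` to `bytes` in network (big-endian) byte order.
func appendBigEndian(_ value: UInt32, to bytes: inout [UInt8]) {
    bytes += [UInt8(value >> 24), UInt8((value >> 16) & 0xFF),
              UInt8((value >> 8) & 0xFF), UInt8(value & 0xFF)]
}

/// Encodes one RST_STREAM frame: a 9-byte frame header plus a 4-byte error code payload.
/// The frame names a single stream, so the connection and the other streams stay open.
func rstStreamFrame(streamID: UInt32, errorCode: UInt32 = HTTP2.cancelErrorCode) -> [UInt8] {
    let sid = streamID & 0x7FFF_FFFF  // clear the reserved high bit
    precondition(sid != 0, "RST_STREAM must name a stream; stream 0 is the connection itself")
    let payloadLength = 4             // exactly one 32-bit error code
    var frame: [UInt8] = [
        0x00, 0x00, UInt8(payloadLength),  // 24-bit length (4 fits in the low byte)
        HTTP2.rstStreamType,               // frame type
        0x00,                              // flags: RST_STREAM defines none
    ]
    appendBigEndian(sid, to: &frame)       // 31-bit stream identifier
    appendBigEndian(errorCode, to: &frame) // error code, CANCEL by default
    return frame
}
```

For example, `rstStreamFrame(streamID: 5)` yields 13 bytes that reset stream 5 only. The actual result reported below, a terminated connection, instead affects every stream multiplexed on that socket, which is why sibling requests fail.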
Actual Results:
The connection is terminated, and multiplexed requests sometimes fail, including ones that were not cancelled.
Comments
Martín Conte Mac Donell
It would be really important for us if this fix could be backported to iOS 9, since we can't rely on HTTP/2 cancellations at the moment. Thanks for considering it.