NSNetService sometimes silently changes port after resolving

Originator:argentumko
Number:rdar://27691664 Date Originated:04-Aug-2016
Status:Open Resolved:
Product:OS X SDK Product Version:10.11.6 (15G31)
Classification:Other Bug Reproducible:Sometimes
 
Summary:
When the computer is connected to the same network simultaneously by Wi-Fi and Ethernet, NSNetService instances supplied by NSNetServiceBrowser and subsequently resolved may sometimes contain a "stale" port, which changes under the hood to the "correct" port (after about 0.5s) without notifying the delegate in any way. The issue mainly manifests itself when the actual Bonjour service on the network is stopped and then quickly restarted on the same device, within 0.5...1.0s, with the same type, domain and name but a different port.

Attachments include a pair of projects that can be used to reproduce the issue (see "Steps to reproduce" for more details), and a packet capture from Wireshark that contains network packets from both network interfaces at the time of observing the issue, reproduced by those projects (you might want to filter down to mdns traffic only).

Steps to Reproduce:
1. You'll need two apps: an iOS app that performs service publishing, and an OS X app that performs service browsing.
2. In the iOS app, use NSNetService to publish a Bonjour service. The service should be created with a static service name, a random port, and the NSNetServiceListenForConnections publishing option. This ensures that the service has the same name across multiple publishes but a different port every time, which corresponds to a realistic usage scenario. Additionally, provide a UI affordance to stop the service and start it again after an interval of 0.5s.
3. In the OS X app, use NSNetServiceBrowser to discover Bonjour services with the same type/domain. When a service is discovered, request to resolve it immediately. Provide a UI (or just logs) to print the port that an NSNetService has right after it's resolved. Additionally, check/print its port again 1 second after resolving.
4. Connect a Mac to the same network as an iOS device with both Wi-Fi and Ethernet. Run both apps simultaneously on these devices; keep the OS X app in foreground.
5. On the iOS app side, start the service; observe the service being discovered and resolved on the OS X side. On the iOS app side, stop and restart the service with a small delay. Observe the service being removed, then discovered and resolved on the OS X side.
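
The two apps described in the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the attached projects: the service type "_radar-demo._tcp.", the name "StaticName", and the class names Publisher and BrowserLogger are hypothetical, and the code uses the Swift spelling NetService for NSNetService. Both classes assume they are used from an app whose main run loop is running.

```swift
import Foundation

// Hypothetical service type and name; the attached projects may use other values.
let serviceType = "_radar-demo._tcp."
let serviceName = "StaticName"

/// Publisher side (the iOS app): static name, system-chosen port.
final class Publisher: NSObject, NetServiceDelegate {
    private var service: NetService?

    func start() {
        // Port 0 together with .listenForConnections asks the system to pick a port.
        let s = NetService(domain: "local.", type: serviceType, name: serviceName, port: 0)
        s.delegate = self
        s.publish(options: .listenForConnections)
        service = s
    }

    func stop() {
        service?.stop()
        service = nil
    }

    /// Stop the service, then publish it again 0.5 s later (step 2).
    func restart() {
        stop()
        DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) { [weak self] in
            self?.start()
        }
    }

    func netServiceDidPublish(_ sender: NetService) {
        print("published \(sender.name), reported port \(sender.port)")
    }
}

/// Browser side (the OS X app): resolve immediately, log the port twice (step 3).
final class BrowserLogger: NSObject, NetServiceBrowserDelegate, NetServiceDelegate {
    private let browser = NetServiceBrowser()
    private var services: [NetService] = []  // strong references while resolving

    func start() {
        browser.delegate = self
        browser.searchForServices(ofType: serviceType, inDomain: "local.")
    }

    func netServiceBrowser(_ browser: NetServiceBrowser,
                           didFind service: NetService, moreComing: Bool) {
        services.append(service)
        service.delegate = self
        service.resolve(withTimeout: 5.0)
    }

    func netServiceBrowser(_ browser: NetServiceBrowser,
                           didRemove service: NetService, moreComing: Bool) {
        services.removeAll { $0 == service }
        print("removed \(service.name)")
    }

    func netServiceDidResolveAddress(_ sender: NetService) {
        // Port as seen right after resolving.
        print("resolved \(sender.name): port \(sender.port)")
        // Port of the same instance one second later; no delegate call happens in between.
        DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {
            print("1 s later \(sender.name): port \(sender.port)")
        }
    }

    func netService(_ sender: NetService, didNotResolve errorDict: [String: NSNumber]) {
        print("failed to resolve \(sender.name): \(errorDict)")
    }
}
```

With this sketch, a mismatch between the "resolved" and "1 s later" log lines for the same service would correspond to the silent port change described in this report.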

Expected Results:
1. The port of the published service changes every time the service is restarted.
2. The service discovered on OS X side is expected to always be resolved with a new, correct port, regardless of the timing of that service being stopped and restarted, and regardless of the network configuration on the Mac.
3. The port of an NSNetService instance does not change after being resolved, at least not without the netServiceDidResolveAddress callback being called again.

Actual Results:
1. The port of the published service, indeed, correctly changes every time it's restarted.
2. The service discovered on OS X side *sometimes* will have a "stale" port when it's resolved – it's not possible to reconnect to that port because the service no longer listens on it.
3. If an NSNetService instance has a "stale" port upon resolving, that port silently changes under the hood about 0.5s after resolving, without notifying the delegate in any way, which is unexpected.

Version:
Xcode 7.3.1 (7D1014); OS X 10.11.6 (15G31)

Notes:
Two workarounds have been found:

1. When the service is discovered by NSNetServiceBrowser, do not resolve it immediately. Instead, postpone resolving by 0.5...1s. Experiments show that in this case the service is always resolved with the correct port number.
2. When the service is resolved, do not use it immediately. Instead, wait 0.5...1s before actually using the resolved information in the service (port and/or TXT record). Experiments show that in this case that information (specifically, the port) is always up to date after the delay.
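
The second workaround could look roughly like the sketch below. It is only an illustration: DelayedResolver, settleDelay and onReady are hypothetical names, the 1.0 s default is taken from the upper end of the range mentioned above, and the code uses the Swift spelling NetService for NSNetService. It assumes a running main run loop.

```swift
import Foundation

/// Hypothetical helper: hands out a resolved service only after a settle delay,
/// so that the port is read once it has had time to change silently.
final class DelayedResolver: NSObject, NetServiceDelegate {
    private let settleDelay: TimeInterval
    private let onReady: (NetService, Int) -> Void
    private var pending: [NetService] = []            // strong references while resolving
    private var scheduled = Set<ObjectIdentifier>()   // services already waiting out the delay

    init(settleDelay: TimeInterval = 1.0, onReady: @escaping (NetService, Int) -> Void) {
        self.settleDelay = settleDelay
        self.onReady = onReady
    }

    func resolve(_ service: NetService) {
        pending.append(service)
        service.delegate = self
        service.resolve(withTimeout: 5.0)
    }

    func netServiceDidResolveAddress(_ sender: NetService) {
        // The callback may arrive more than once per service; schedule only one delayed read.
        let id = ObjectIdentifier(sender)
        guard !scheduled.contains(id) else { return }
        scheduled.insert(id)

        let portAtResolve = sender.port
        DispatchQueue.main.asyncAfter(deadline: .now() + settleDelay) { [weak self] in
            guard let self = self else { return }
            if sender.port != portAtResolve {
                print("port changed silently: \(portAtResolve) -> \(sender.port)")
            }
            self.scheduled.remove(id)
            self.pending.removeAll { $0 === sender }
            // Report the port as it is after the delay, not as it was at resolve time.
            self.onReady(sender, sender.port)
        }
    }

    func netService(_ sender: NetService, didNotResolve errorDict: [String: NSNumber]) {
        pending.removeAll { $0 === sender }
    }
}
```

A browser delegate would pass each discovered service to resolve(_:) and connect only from inside onReady. The first workaround is the mirror image: delay the call to resolve(withTimeout:) instead of delaying the read of the port.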

Neither workaround feels like a proper solution to the problem, because both rely on the timing characteristics of reproducing the issue on a specific hardware and network configuration.

The issue itself feels like a race condition in service resolving affected by some caching and the fact that the service is being discovered and resolved on more than one network interface.

Configuration:
The issue has been observed on a MacBook Pro (Retina, 15-inch, Mid 2015) running OS X 10.11.6 (also observed on OS X 10.12 beta 4), connected by both Wi-Fi and Ethernet to the same network as an iPhone 6s running iOS 9.3.3. However, this issue has also been observed on other hardware configurations.

This issue does *not* occur when the Mac is connected by only one of Wi-Fi or Ethernet; it occurs only when both are connected at the same time.

Attachments:
'bonjour.pcapng' and 'BonjourProjects.zip' were successfully uploaded.
