-[NSWindow removeChildWindow:] can set firstResponder of wrong window

Originator:rick
Number:rdar://20166537 Date Originated:2015-03-15
Status:Open Resolved:
Product:OS X Product Version:10.10.2
Classification:Crash Reproducible:Always
 
Summary:
It looks like child window management tries to be smart with respect to the first responder of the parent window possibly being inside of the child window. Upon removal of the child window, NSWindow does a search to see if the parent window's first responder is within the child window that's about to be removed, and if so, it fixes that up. That's awesome. Except it doesn't seem to quite work right in all cases, and can lead to crashes.

I'm sorry, but I don't have a sample project that can demonstrate this. I tried, but unfortunately I couldn't get the conditions just right. But I have copious logs that should pretty clearly show the problem. As well as steps to cause it to happen with our shipping app so that you can see it in person.

I have a parent window, a child window, and a popover. Parent -> Child -> Popover. The child window displays many popovers, but only ever 1 at a time. As you move your mouse around it'll destroy and recreate popovers as needed.

When calling -[NSPopover close], the window it's attached to receives -[NSWindow removeChildWindow:] with the popover as the argument. Before this call, the NSWindow's firstResponder is a view that's within the popover. After the call, the NSWindow's firstResponder is STILL a view that's within the popover, but now the window has no more strong references to the popover. The app itself has no more strong references to the popover either. This means the popover and its views will get deallocated next time the autorelease pool drains. The next time something references the window's firstResponder and sends it a message, a crash will happen.

Steps to Reproduce:
- Have Parent -> Child -> Popover window relationship
- Have Child window's firstResponder be a view within Popover
- Close the popover

Expected Results:
- Child window's firstResponder to get reset to Child window's initialFirstResponder (or anything within Child Window's contentView)


Actual Results:
- Parent window's firstResponder gets reset to Parent window's initialFirstResponder
- Child window's firstResponder is not modified, and remains a pointer to a view within the popover that's likely to get deallocated soon.


Version:
10.10.2 14C109

Notes:
Since I can't yet reproduce this with a sample app, I've tried to get as much log info as possible to help shed light. Attached is a log file. To get these logs I have the following in our NSWindow subclass of which Parent and Child window inherit:

- (BOOL)makeFirstResponder:(NSResponder *)aResponder {
	BOOL result = [super makeFirstResponder:aResponder];
	NSLog(@"%s self=%@ responder=%@ result=%@", __PRETTY_FUNCTION__, self, aResponder, result?@"YES":@"NO");
	NSLog(@"call stack : %@", [NSThread callStackSymbols]);
	return result;
}

- (void)removeChildWindow:(NSWindow *)childWin {
	NSLog(@"%s self=%@ childWin=%@", __PRETTY_FUNCTION__, self, childWin);
	NSLog(@"first responder = %@", [self firstResponder]);
	NSLog(@"first responder window = %@", [[self firstResponder] respondsToSelector:@selector(window)] ? [(NSView *)[self firstResponder] window] : @"not a view");
	NSLog(@"child windows before: %@", [self childWindows]);
	[super removeChildWindow:childWin];
	NSLog(@"child windows after: %@", [self childWindows]);
	if ([[self firstResponder] respondsToSelector:@selector(window)] && [(NSView *)[self firstResponder] window] == childWin) {
		NSLog(@"Invalid First Responder %@. Expect a crash.", [self firstResponder]);
//		[self makeFirstResponder:self.initialFirstResponder];
	}
}

If you look at the log, which is quite long (goes to show how rare it can be, hence the difficulty with the sample project)... look near the end. Search for "Expect a crash". You'll see:

Invalid First Responder <NSButton: 0x60800034ac90>. Expect a crash.

Then the last line of the log:
*** -[NSButton isKindOfClass:]: message sent to deallocated instance 0x60800034ac90

It goes to show that I'm able to correctly identify when this crash is about to happen. The workaround is to fix up the firstResponder myself (the commented line above). So here's what's interesting... if you look up above the "Expect a crash" line, this is where the fun begins:
2015-03-14 16:27:13.133 2BUA8C4S2C.com.agilebits.onepassword-osx-helper[35776:45860647] -[OPHelperWindow removeChildWindow:] self=<OPHelperWindow: 0x100f8a4c0> childWin=<_NSPopoverWindow: 0x10ad23080>

We see the OPHelperWindow at 0x100f8a4c0 getting removeChildWindow with the popover. You can see that its first responder is currently the button that gets us in trouble later, and that its child windows list is just the popover. Everything looks fine at this point.

But now look just below, at 2015-03-14 16:27:13.133. Who's getting the -makeFirstResponder: message? It's not our OPHelperWindow at 0x100f8a4c0, instead it's an OPHelperWindow at 0x100dbe110. That's the Parent window in the Parent -> Child -> Popover relationship. You can clearly see in its callstack that Parent window is getting its firstResponder modified as a result of the -removeChildWindow call on the Child. Three -makeFirstResponder: calls are made during this process. None of which are done on the child window, our OPHelperWindow at 0x100f8a4c0. So yeah, clearly its firstResponder will stay the same and become invalid.

If you look at the other invocations of -removeChildWindow: in the logs, you should notice that the calls to -makeFirstResponder go to the window we expect. So it's often doing the right thing. But then sometimes, for whatever reason, it's not, then things go sideways.


Steps to reproduce in a live app:
- Download 1Password 5.1 : https://cache.agilebits.com/dist/1P/mac4/1Password-5.1.zip
- Launch, create a new empty vault.
- Download the attached agilekeychain, which will get sample data (easier to reproduce with lots of ). Its password is 'top1000'
- Go into Sync, select Primary vault, sync via Folder, and point it to the agilekeychain. You'll be prompted for the password. Enter top1000
- Now click on 1Password mini's icon in the system menu bar
- Mouse over to Logins, which will expand a list of logins which should take up the entire vertical height of your display
- Mouse over one of the logins. Notice that a popover displays. So now you see the Parent -> Child -> Popover window relationship
- Move the mouse cursor down, and you'll notice the popover gets destroyed and another popover gets created. So far things are working great.
- Now two-finger scroll slowly. As the login items are scrolling by, the popovers will get destroyed and created still. Again, things are working as expected, mostly.
- Keep doing this, but slowly move your cursor down towards the bottom of the screen
- Once the cursor ends up leaving the Child window's frame from the bottom, I almost always get a crash.


Log file : http://cl.ly/text/3i2r3s0D2P2p

Comments


Please note: Reports posted here will not necessarily be seen by Apple. All problems should be submitted at bugreport.apple.com before they are posted here. Please only post information for Radars that you have filed yourself, and please do not include Apple confidential information in your posts. Thank you!