(xnu) Impossible to intelligently handle EINTR returned by close

Originator: mark
Number: rdar://15577254
Date Originated: 2013-12-03
Status: Open
Resolved:
Product: OS X
Product Version: OS X 10.9 13A603, xnu 2422.1.72
Classification: Serious Bug
Reproducible: Always
 
It is impossible to intelligently handle the case where close fails with EINTR. This is a problem for user code and also for the kernel itself, which may restart such calls without knowing whether the file descriptor was closed. Because the meaning of close failing with EINTR is not well-defined, a retried close may fail with an error or may close an unrelated file descriptor (if the same FD number was reclaimed in a multithreaded program), while an unhandled EINTR may leave a file descriptor open and thus leaked.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html:

> If close() is interrupted by a signal that is to be caught, it shall return
> -1 with errno set to [EINTR] and the state of fildes is unspecified.

This unfortunately means that an application is unable to intelligently handle close failing with EINTR:
 - If the close is retried and the FD was actually closed before EINTR,
   then the retry will fail with an unrelated error (EBADF). Worse, if
   the FD number was reused in the meantime, the retry will succeed and
   close an unrelated FD.
 - If the close is not retried and the FD was not closed before EINTR,
   the FD will be leaked.
Neither approach is viable.
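
The two strategies above can be sketched as a pair of small wrappers. This is only an illustration of the dilemma, not a recommendation: the names close_retrying and close_once are hypothetical, and neither function can tell which of the two EINTR cases it is facing.

```c
#include <errno.h>
#include <unistd.h>

/* Strategy A: retry while close reports EINTR. If the first attempt already
 * released the descriptor, the retry fails with EBADF or, in a multithreaded
 * program, closes whatever now owns the same FD number. */
static int close_retrying(int fd) {
    int rv;
    do {
        rv = close(fd);
    } while (rv == -1 && errno == EINTR);
    return rv;
}

/* Strategy B: call close exactly once and report EINTR as success. If the
 * descriptor was still open when EINTR was returned, it is leaked. */
static int close_once(int fd) {
    int rv = close(fd);
    if (rv == -1 && errno == EINTR) {
        return 0;
    }
    return rv;
}
```

Calling close_retrying on a descriptor that close_once has already closed fails with EBADF in a single-threaded program, which is the benign form of the first hazard.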

In http://austingroupbugs.net/view.php?id=529, the Austin Group acknowledged the problem but chose not to address it prescriptively.

Other systems do guarantee what the state of close’s FD argument is when it fails with EINTR. On Linux, the FD is always closed from the perspective of user space, and the close must not be retried. See http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-09/3000.html and http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=ee731f4f7880b09ca147008ab46ad4e5f72cb8bf .

On Mac OS X, it is impossible to discriminate between the two possible meanings of close failing with EINTR. In typical environments (see <sys/cdefs.h> for __DARWIN_NON_CANCELABLE), user code calling close gets close$UNIX2003 (32-bit) or close (64-bit, which is always UNIX2003 and needs no $UNIX2003 symbol suffix). On the kernel side, these both enter 10.9.0 xnu-2422.1.72/bsd/kern/kern_descrip.c close. The first thing that does is call pthread_testcancel, which can cause the entire system call to stop and return EINTR, before any actual work has been done. It then calls close_nocancel, which will close the FD before EINTR can be returned, but it appears that EINTR still may be returnable after the FD is actually closed, since callee closef_locked says that it might return anything if fo_close is called. (The documentation above closef_locked names closef_finish instead of fo_close, but the code changed between 10.8 and 10.9 and closef_finish is now gone, and the comment should say fo_close.) It is thus not possible to know whether EINTR happened before or after the FD was closed.
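
As a rough illustration of the control flow just described, here is a toy user-space model. It is not xnu code: struct model_proc, model_close, and both parameters are hypothetical stand-ins for the descriptor table, the cancelable close entry point, the pthread_testcancel check, and the result of fo_close.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_FDS 16

/* Toy descriptor table: open[fd] is true while fd is open. */
struct model_proc {
    bool open[MODEL_MAX_FDS];
};

/* Models the cancelable close entry point.
 * cancel_pending: stands in for pthread_testcancel stopping the call early.
 * fo_close_error: the value the per-file close operation reports after the
 *                 descriptor has already been removed from the table.
 * Returns 0 or an errno value, as a kernel-side handler would. */
static int model_close(struct model_proc *p, int fd,
                       bool cancel_pending, int fo_close_error) {
    if (cancel_pending) {
        return EINTR;            /* nothing has been closed yet */
    }
    if (fd < 0 || fd >= MODEL_MAX_FDS || !p->open[fd]) {
        return EBADF;
    }
    p->open[fd] = false;         /* the FD number is released here */
    return fo_close_error;       /* may still be EINTR */
}
```

In this model, a call with cancel_pending set and a call with fo_close_error equal to EINTR both return EINTR, yet only the second one leaves the descriptor closed. The caller sees the same result in both cases.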

See Steps to Reproduce.

Steps to Reproduce:

A related problem exists even when EINTR is not returned to user space. If SA_RESTART behavior is enabled for a signal, either because no user-space signal handler is provided for a signal, or because one was provided but the handler’s struct sigaction’s sa_flags contained SA_RESTART, the kernel will restart (reissue) the system call instead of returning the failure with code EINTR to user space. When this happens, the kernel itself makes no attempt to discriminate between whether the initial close actually closed the user FD or not. It blindly reissues the close for an FD that may no longer be valid, or may have been reclaimed.
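
For reference, the second case above (a handler whose sa_flags contains SA_RESTART) might be set up as in this minimal sketch. The names on_signal and install_handler are hypothetical; only sigaction and its fields are standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static void on_signal(int signo) {
    (void)signo;   /* intentionally empty: the handler only needs to exist */
}

/* Installs on_signal for signo. With restart == true, the kernel restarts
 * interrupted system calls itself; with restart == false, they fail with
 * EINTR in user space. Returns the sigaction result (0 on success). */
static int install_handler(int signo, bool restart) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}
```

With install_handler(SIGUSR1, true) in effect, a close interrupted by SIGUSR1 would never surface EINTR to the caller, so user code has no opportunity to decide whether a retry is safe.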

Be mindful that although EINTR is unlikely when closing a FD to a file on an ordinary in-kernel on-disk filesystem, FDs can refer to other things where close failing with EINTR is far more likely. EINTR might be experienced when closing an FD to a file on a network filesystem, for example: the fo_close for NFS can fail with EINTR after the FD has been closed from user space’s perspective if there is unflushed data to be written at the time of the close. A similar problem might occur for FDs to things that aren’t on the filesystem at all, such as network sockets.

See Additional Notes.

Expected Results:

The semantics for close failing with EINTR should be well-defined. I would prefer that when close fails with EINTR, its FD argument is guaranteed to be closed from user space's perspective, so that the close need not (and must not) be retried. The kernel should never restart an interrupted close system call unless it is certain that the file descriptor was not closed from user space's perspective.

Actual Results:

It is impossible to intelligently handle close failing with EINTR. One can attempt to handle this case by retrying the close, but the retry may fail with EBADF or, worse, close an unrelated FD. If the close is not retried, FDs can be leaked. When SA_RESTART behavior is enabled for a signal, the kernel itself does not handle this situation intelligently either: it blindly retries the close, potentially resulting in the application's close call returning EBADF to user space even though the FD was closed, or in an unrelated FD being closed.

Additional Notes:

Workaround: I have partially worked around this problem in my application by using close$NOCANCEL instead of close (see https://codereview.chromium.org/23455051 ). close$NOCANCEL does not call pthread_testcancel before close_nocancel, which eliminates the possibility of failing with EINTR before the FD has actually been closed. With that in place, it is safe to ignore close failing with EINTR (see https://codereview.chromium.org/100253002 ). This still does not fully address the problem, because close may be interrupted by a signal with SA_RESTART behavior, and in such cases the kernel will restart the close.
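
A minimal sketch of this kind of workaround follows. It is not the code from the reviews linked above: close_nocancel_variant and close_ignoring_eintr are hypothetical names, the asm label assumes that a _close$NOCANCEL symbol is exported on the target system, and 32-bit UNIX2003 builds may need a different symbol name.

```c
#include <errno.h>
#include <unistd.h>

#if defined(__APPLE__)
/* Assumption: the system library exports close$NOCANCEL, as described in
 * this report. The asm label binds a plain C name to that symbol. */
extern int close_nocancel_variant(int fd) __asm("_close$NOCANCEL");
#else
/* Elsewhere, fall back to plain close for illustration only. */
static int close_nocancel_variant(int fd) {
    return close(fd);
}
#endif

/* Closes fd exactly once and reports EINTR as success, on the assumption
 * that the descriptor is already gone by the time EINTR can be returned. */
static int close_ignoring_eintr(int fd) {
    int rv = close_nocancel_variant(fd);
    if (rv == -1 && errno == EINTR) {
        return 0;
    }
    return rv;
}
```

Other errors, such as EBADF, are still passed through unchanged. The wrapper does nothing about the SA_RESTART case, where the kernel restarts the call before user code sees any result.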
