NSURL incorrectly handles file paths with precomposed chars from NFS and SMB volumes

Originator:tempelmann
Number:rdar://FB8957502 Date Originated:30 Dec 2020
Status:Open Resolved:
Product:macOS Product Version:11.1
Classification:Bug Reproducible:Always
 
macOS prefers to use decomposed Unicode characters when handling file names and paths. HFS+, for instance, always decomposes names before storing them on disk. That makes it easy to look up names later: even if the looked-up name is precomposed, the lookup function decomposes it first and then searches the directory for identical (case-insensitive) occurrences.

In short, file names on HFS+ preserve case but do not preserve normalization.

With APFS, this has changed, IIRC: It also preserves the Unicode composition. This makes lookups more complicated for the FS code, but, well, it now works. So far, so good.

Problem is that I have run into some issues with SMB and NFS mounts: These may also be shared with Linux systems, which prefer precomposed file names.

So, if a user on a Linux system creates a folder that contains an Umlaut such as "ü", it'll end up precomposed on the (ext4) file system.

Now, I can nicely access files in such a folder from macOS, as long as I only use POSIX functions (which includes the shells such as bash and zsh).

However, when I use NSURL operations, some work and others give me a -260 error. For instance, getting NSURLCanonicalPathKey fails, even in macOS 11.1. Other higher-level functions fail as well, such as trying to open the item with [NSWorkspace openURLs:...]. The NSURL's path property does still hold the original precomposed name, and if I get the path and pass it to a POSIX function, it works. And some of the more basic getResource accessors work as well. Just not the more complex ones.

I suspect that the code makes calls to fileSystemRepresentation and related functions that normalize (i.e. decompose) the path before processing it further. Passing this decomposed path to the SMB or NFS server then causes the "file not found" error, because these servers do normalization-sensitive lookups.
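
To illustrate why a normalization-sensitive lookup would fail here, this is a minimal sketch showing that the two spellings of "ü" are different byte sequences, so anything that compares names byte by byte sees two different names. The helper name hex_of is made up for this example, and the sketch only demonstrates the byte difference; it does not show what NSURL does internally, which I can only guess at.

```shell
# hex_of is a hypothetical helper: print the bytes of a string as hex.
hex_of() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Octal escapes keep this portable to plain sh:
# \303\274 is the precomposed ü (NFC),
# u\314\210 is "u" followed by the combining diaeresis (NFD).
ue_precomp=$(printf '\303\274')
ue_decomp=$(printf 'u\314\210')

echo "precomposed: $(hex_of "$ue_precomp")"
echo "decomposed:  $(hex_of "$ue_decomp")"

# A byte-wise comparison, which is what a normalization-sensitive lookup amounts to.
if [ "$ue_precomp" = "$ue_decomp" ]; then
    echo "names match"
else
    echo "names differ"
fi
```

The precomposed form is the two bytes c3 bc, the decomposed form is the three bytes 75 cc 88, so the comparison reports that the names differ.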

You can also see the effects in Finder: If you mount a Linux share that contains folders and files with precomposed "ü" in their names, either via SMB or NFS, the Finder will fail at several operations, such as Open, Quick Look and Rename, but will succeed in showing basic attributes such as size and dates. Curiously, even Get Info works, even when I invoke it through AppleEvents from my app (Find Any File).

So, to reproduce this, create a dir and a file inside the dir, both using precomposed chars, and see if you can access them via SMB and NFS.

Here's a simple script to create files with both precomposed and decomposed "ü" chars:

# decomposed ü (u+¨):
ue_decomp=$'u\xCC\x88'
# precomposed ü:
ue_precomp=$'\xC3\xBC'
# create two files with the different compositions
echo "${ue_decomp} (decomposed)" > dec_${ue_decomp}
echo "${ue_precomp} (precomposed)" > pre_${ue_precomp}
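
The script above only creates plain files. Since the failures I describe involve a folder with a file inside it, here is a sketch that sets up that case as well. The names test_dir and pre_dir_ü and the use of mktemp are just my choices for the example; on the Linux side you would cd into the exported share instead of a scratch directory.

```shell
# Work in a scratch directory; on the server, cd into the share instead.
test_dir=$(mktemp -d)
cd "$test_dir" || exit 1

# precomposed ü, written with octal escapes so it also works in plain sh
ue_precomp=$(printf '\303\274')

# a folder and a file inside it, both with precomposed names
mkdir "pre_dir_${ue_precomp}"
echo "${ue_precomp} (precomposed)" > "pre_dir_${ue_precomp}/pre_${ue_precomp}.txt"

# dump the raw bytes of the file name as the file system reports it
ls "pre_dir_${ue_precomp}" | od -c
```

Note that a file system that normalizes names itself (such as HFS+) would store these names decomposed, so the dump at the end is worth checking before testing over the network.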

I hope I gave you enough information to understand the issue. If not, tell me what you tried and I'll see how to give you a better method.


--- added an hour later ---

Actually, what I wrote above about what works and what doesn't was tested with an NFS mount of a share on a Synology NAS.

If I mount the same share via SMB, it's even worse: I cannot see files inside a folder with a _de_composed name (I can see the folder, though). This suggests that SMB (no idea whether it's macOS's SMB client or the SMB server) does convert any name into precomposed representation, assuming all names must be precomposed on the server side.

So, for now, please test this with NFS, where at least neither side (NFS client or server) seems to mess with the paths, and both keep them in their original normalization when passing paths in both directions.

I wonder, however, if you (FS devs at Apple) decided that what SMB does is the correct way, assuming that remote file systems should always use precomposed chars. If that's true, then you may want to make NFS behave the same way (which would then lead to the same issues that I find with SMB, where I cannot access folder contents whose path contains decomposed chars on the server side), but at least then it's consistent behavior. Currently, NFS and SMB do not behave consistently (SMB tries to be normalization-_in_sensitive while NFS is normalization-sensitive, just like on iOS).


--- added a bit later ---

Good news, everyone! After thinking about the differences of NFS vs. SMB I came up with a simple demonstration that can also happen in the real world. No need for bash scripts!

Simply do this:

1. Set up a server with NFS and SMB hosting. I am using a Synology NAS.
2. Mount the same share both with NFS and SMB protocols.
3. Open the SMB volume and create a folder named "smb_ü" (this creates a precomposed name on the server). Inside it, place a text file. Verify that you can open the text file.
4. Open the NFS volume and create a folder named "nfs_ü" (this creates a decomposed name on the server). Inside it, place a text file. Verify that you can open the text file.
5. Open the nfs_ü folder on the SMB share. This does not even show the text file inside. That's because NFS created a decomposed folder name and SMB can't browse those. This may be an issue with the SMB client or server, but I assume you want it to behave that way, assuming all files on an SMB server should be precomposed by default. Only that your NFS client does not play along. So you may want to make NFS behave the same way, i.e. precompose paths when creating files on the server side.
6. Open the smb_ü folder on the NFS share. You'll see the file inside, but double clicking doesn't work. That's the bug I mentioned above about the NSURL ops incorrectly normalizing the path in some cases, thereby failing to access the precomposed path from the server.
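
To check which form a folder name actually has after steps 3 and 4, the names can be dumped from a shell, either on the server itself or on the NFS mount, which as far as I can tell passes names through unchanged. This is only a sketch: name_form is a made-up helper, and it only looks for the combining diaeresis (UTF-8 bytes cc 88), not for decomposed characters in general.

```shell
# name_form (hypothetical helper): report whether a name contains a
# combining diaeresis (UTF-8 bytes cc 88), i.e. a decomposed umlaut.
name_form() {
    # Keep the bytes separated by single spaces so the match stays byte-aligned.
    hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '  ')
    case "$hex" in
        *" cc 88"*) echo "decomposed" ;;
        *)          echo "no combining diaeresis" ;;
    esac
}

# Classify every entry of a directory (default: the current directory).
scan_dir=${1:-.}
for entry in "$scan_dir"/*; do
    [ -e "$entry" ] || continue
    name=${entry##*/}
    printf '%s\t%s\n' "$(name_form "$name")" "$name"
done
```

Run in the share's top-level folder, I would expect nfs_ü to be reported as decomposed and smb_ü not, if my reading of steps 3 and 4 is right.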


--- Response by Apple ---

This is a known NFS issue with precomposed and decomposed names.
As mentioned, Linux systems prefer precomposed file names (NFC), while macOS/iOS userspace frameworks all default to decomposed (NFD).
So no matter what is provided to them (NFC or NFD), any pathname that comes in from an Apple framework will always be in NFD, and the FS has to deal with it.
You should mount your NFS share using “nfc” parameter to instruct the client to use precomposed instead of the default decomposed.
We were able to open both precomposed/decomposed files and folders while mounting with “nfc” enabled.
Please let us know if it helps to resolve the issue.

mount_nfs manual page :  
   nfc     Convert name strings to Unicode Normalization Form C (NFC) when sending them to the NFS server.  This option may be used to improve interoperability with NFS clients and servers that typically use names in the NFC form.
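
For reference, a mount using this option might look like the sketch below. The server name, export path and mount point are placeholders and not values from this report; only the nfc option itself comes from the manual page excerpt above. The script prints the command rather than running it, so it can be reviewed first.

```shell
# Hypothetical server, export path, and mount point; substitute your own.
nfs_server="nas.example.com"
nfs_export="/volume1/share"
mount_point="/tmp/nfs_share"

# Print the command instead of running it, so it can be reviewed first.
mount_cmd="mount_nfs -o nfc ${nfs_server}:${nfs_export} ${mount_point}"
echo "$mount_cmd"

# To actually mount (the mount point must exist; may require root):
#   mkdir -p "$mount_point" && sudo mount_nfs -o nfc "${nfs_server}:${nfs_export}" "$mount_point"
```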
