NSURL cannot handle Unicode strings

Originator: craig.hockenberry
Number: rdar://6923664
Date Originated:
Status: Open
Resolved:
Product: iPhone SDK
Product Version: All
Classification: Serious Bug
Reproducible: Always
 
Summary: 
URLs that contain Unicode characters are not handled correctly by NSURL.

Steps to Reproduce:
1) Open the attached project.
2) Click on the "Open Unicode Domain" button.

Expected Results:
The URL should open in the user's browser.

Actual Results:
The NSURL instance returned by [NSURL URLWithString:@"http://✪df.ws/d61"] is nil.

Regression:
This is a problem on both Mac and iPhone.

Notes:
If the domain name is encoded as Punycode according to RFC 3492, the NSURL instance returned is valid and can be used to open the URL.

The sample project is located here: http://files.iconfactory.net/craig/bugs/UnicodeURLTouch.zip
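
For reference, the RFC 3492 encoding mentioned in the notes can be sketched in plain C, which also compiles inside an Objective-C project. This is a minimal sketch, not the code from the sample project or from any Apple framework. The function name punycode_encode_label is hypothetical, the input is assumed to be a single host label already split into Unicode code points, and the overflow checks and IDNA preparation steps (case folding, normalization) that a production encoder needs are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Punycode parameters from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z' then '0'..'9'. */
static char encode_digit(uint32_t d) {
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first) {
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper: encode one label given as code points.
   Writes a NUL-terminated string WITHOUT the "xn--" prefix.
   Returns the output length, or -1 if the buffer is too small.
   Arithmetic overflow checks from the RFC are omitted for brevity. */
int punycode_encode_label(const uint32_t *in, size_t in_len,
                          char *out, size_t out_cap) {
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t o = 0, b = 0, h, i;

    if (out_cap == 0) return -1;

    /* Copy the basic (ASCII) code points verbatim. */
    for (i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            if (o + 1 >= out_cap) return -1;
            out[o++] = (char)in[i];
            b++;
        }
    }
    h = b;
    if (b > 0) {
        if (o + 1 >= out_cap) return -1;
        out[o++] = '-';               /* delimiter after the basic part */
    }

    while (h < in_len) {
        /* Find the smallest code point not yet handled. */
        uint32_t m = UINT32_MAX;
        for (i = 0; i < in_len; i++)
            if (in[i] >= n && in[i] < m) m = in[i];

        delta += (m - n) * (uint32_t)(h + 1);
        n = m;

        for (i = 0; i < in_len; i++) {
            if (in[i] < n) delta++;
            if (in[i] == n) {
                /* Emit delta as a variable-length integer. */
                uint32_t q = delta, k;
                for (k = BASE;; k += BASE) {
                    uint32_t t = k <= bias ? TMIN
                               : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o + 1 >= out_cap) return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 1 >= out_cap) return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return (int)o;
}
```

For the label ✪df (code points U+272A, 'd', 'f') this produces df-oiy, so with the "xn--" ACE prefix the host becomes xn--df-oiy.ws, a string that URLWithString: accepts because it is plain ASCII.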

Comments

Workaround

I created an NSURL category to work around this limitation. Available here: http://files.iconfactory.net/sean/NSURL+IFUnicodeURL/

Updated bug report

The URL used in the test cases, "http://✪df.ws/d61", now works on the iPhone because of a server configuration change that allowed Unicode URLs in the Host: HTTP header. RFC 2616 (sections 3.2.1 and 3.2.2) states that the Host: header field should conform to RFC 2396, so punycode should be used in this case, as it was for the domain.

Use "http://✩.ws/꼙" as a new test URL. It will fail because the server doesn't know how to handle a Host: with Unicode.

By craig.hockenberry at May 31, 2009, 9:08 p.m.

Updated bug report

Upon further RTFM, I tried adding percent escapes to make the URL string conform to RFC 2396. For example:


NSString *test = @"http://✪df.ws/d61"; // OK

NSString *URLString = [(NSString *) CFURLCreateStringByAddingPercentEscapes(NULL, (CFStringRef) test, CFSTR(""), NULL, kCFStringEncodingUTF8) autorelease];

NSURL *URL = [NSURL URLWithString:URLString];

On the Mac, this allows NSWorkspace's -openURL: to process the URL correctly. Unfortunately, this is not the case on the iPhone. An updated test case is attached.
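
To make the effect of the escaping concrete, here is a minimal C sketch of what happens to the non-ASCII bytes. It is not the implementation of CFURLCreateStringByAddingPercentEscapes, which also escapes other characters that are illegal in a URL. The function name percent_escape_non_ascii is hypothetical, and the input is assumed to be a NUL-terminated UTF-8 string.

```c
#include <stddef.h>

/* Hypothetical helper: replace every byte >= 0x80 of a UTF-8 string
   with %XX and copy ASCII bytes unchanged.
   Returns the output length, or -1 if the buffer is too small. */
int percent_escape_non_ascii(const char *in, char *out, size_t out_cap) {
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        size_t need = c < 0x80 ? 1 : 3;

        /* Always keep one byte free for the terminating NUL. */
        if (o + need >= out_cap) return -1;

        if (c < 0x80) {
            out[o++] = (char)c;
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    if (o >= out_cap) return -1;
    out[o] = '\0';
    return (int)o;
}
```

The character ✪ (U+272A) is the three UTF-8 bytes E2 9C AA, so "http://✪df.ws/d61" becomes "http://%E2%9C%AAdf.ws/d61". The result is ASCII-only, which is why URLWithString: no longer returns nil, but the host is percent-escaped UTF-8 rather than the Punycode form that RFC 3492 describes.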

Note that manually converting the Unicode host name to punycode (in -openUnicodePath:) causes UIApplication's -openURL: to work correctly. It should not be the caller's responsibility to do the conversion required by RFC 3492.

By craig.hockenberry at May 27, 2009, 7:35 p.m.

Gus, this workaround seems to work fine on the Mac, but leads to a 404 page using the example URL on the iPhone. A 301 redirect ends up at http://daringfireball.net/d61. I don't understand why: https://twitter.com/chockenberry/statuses/1931339560

By craig.hockenberry at May 27, 2009, 3:33 a.m.

silly formatting. You get the idea :)

CFURLCreateStringByAddingPercentEscapes?

What's wrong with using CFURLCreateStringByAddingPercentEscapes? That's what VoodooPad does before handing a string to NSURL's URLWithString:, and the links work fine...

For example, this works:

@interface NSURL (UnicodeStuff)
+ (id)URLWithUTF8String:(NSString *)URLString;
@end

@implementation NSURL (UnicodeStuff)

+ (id)URLWithUTF8String:(NSString *)URLString {
    // Percent-escape the string before handing it to URLWithString:.
    URLString = [(NSString *) CFURLCreateStringByAddingPercentEscapes(NULL, (CFStringRef) URLString, CFSTR(""), NULL, kCFStringEncodingUTF8) autorelease];
    return [self URLWithString:URLString];
}

@end

@implementation UnicodeURLController

- (IBAction)openUnicodeDomain:(id)sender {
    NSURL *URL = [NSURL URLWithUTF8String:@"http://✪df.ws/d61"];
    if (URL) {
        [[NSWorkspace sharedWorkspace] openURL:URL];
    } else {
        NSLog(@"Could not handle Unicode URL");
        NSBeep();
    }
}

@end

I know you'd expect URLWithString to "just work" - but I'd imagine there's a ton of legacy code that would break should they change it.

