(NS)XMLElement fails to correctly encode certain characters when using ISO-8859-* charsets, breaking the resulting XML's validity
| Originator: | kastansn | | |
| Number: | rdar://27105764 | Date Originated: | 2016/06/30 |
| Status: | Open | Resolved: | |
| Product: | OS X SDK | Product Version: | |
| Classification: | | Reproducible: | Always |
Summary:
When XML data contains elements with certain offending characters, such as the en dash (–), umlauts or emoji, converting the XML document to the ISO-8859-15 character set (and some others) fails. Worse, the conversion also breaks the XML document's validity, because a closing quotation mark is omitted in the resulting XML.

Steps to Reproduce:
Paste the following code into a macOS playground:

    import Cocoa

    // Select an offending character, for example a simple – a.k.a. &ndash; a.k.a. U+2013 (nothing too exotic, is it?).
    // The bug is also triggered by other characters such as Ä or 🤔.
    let char = "–"

    // Create an XML document that contains the offending character.
    let attribute = XMLElement.attribute(withName: "enDash", stringValue: char) as! XMLNode
    let element = XMLElement.element(withName: "Demo", children: nil, attributes: [attribute]) as! XMLElement
    let doc = XMLDocument.document(withRootElement: element) as! XMLDocument

    // This is the statement that causes the bug.
    // Some other encodings, e.g. "UTF-8" or "ISO-2022-KR", work correctly.
    doc.characterEncoding = "ISO-8859-15"

    // This object already contains corrupted XML. The next step is only necessary to visualize it.
    let data = doc.xmlData

    // Prints
    //     <Demo enDash=""></Demo>
    // which renders the whole XML invalid because of the missing closing quotation mark after the semicolon.
    //
    // It *should* print the following:
    //     <Demo enDash="–"></Demo>
    print(String(data: data, encoding: .utf8)!)

Expected Results:
Cocoa should correctly translate the whole XML document into the ISO-8859-15 character set.

Actual Results:
It fails to convert the offending character, which is bad, but it additionally breaks the XML's validity by omitting a closing quotation mark:

    <Demo enDash=""></Demo>

Version:
Xcode 8.0 beta (8S128d)
OS X 10.11.5 (15F34)

Configuration:
This bug is present in OS X SDKs down to at least version 10.9.
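For comparison, the expected behavior can be sketched without Cocoa. The Swift code below is a minimal, hypothetical illustration, not a fix for `XMLDocument`: the helper names `latin9Byte(for:)`, `escapeForLatin9(_:)` and `encodeLatin9(_:)` are made up for this report and are not part of Foundation or Cocoa, and the ISO-8859-15 mapping is hand-written from the charset's code table. It assumes that a serializer targeting ISO-8859-15 should emit a numeric character reference such as `&#x2013;` for every character the charset cannot represent, while still closing the attribute value with a quotation mark. Note that Ä (U+00C4) is part of ISO-8859-15 as byte 0xC4 and needs no escaping at all, whereas the en dash and 🤔 do.

```swift
// Hypothetical helpers (not part of Cocoa/Foundation) sketching what an
// ISO-8859-15 (Latin-9) serializer is expected to do with attribute values.

// Returns the ISO-8859-15 byte for a Unicode scalar, or nil if Latin-9 lacks it.
// Latin-9 equals Latin-1 except for eight code points.
func latin9Byte(for scalar: Unicode.Scalar) -> UInt8? {
    switch scalar.value {
    case 0x20AC: return 0xA4 // €
    case 0x0160: return 0xA6 // Š
    case 0x0161: return 0xA8 // š
    case 0x017D: return 0xB4 // Ž
    case 0x017E: return 0xB8 // ž
    case 0x0152: return 0xBC // Œ
    case 0x0153: return 0xBD // œ
    case 0x0178: return 0xBE // Ÿ
    // The eight Latin-1 characters displaced by the entries above.
    case 0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE: return nil
    // Everything else up to U+00FF maps to the byte with the same value.
    case 0x00...0xFF: return UInt8(scalar.value)
    default: return nil
    }
}

// Escapes an attribute value for a document declared as ISO-8859-15:
// markup characters become entity references, and any scalar Latin-9
// cannot encode becomes a hexadecimal character reference.
func escapeForLatin9(_ value: String) -> String {
    var escaped = ""
    for scalar in value.unicodeScalars {
        switch scalar {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case "\"": escaped += "&quot;"
        default:
            if latin9Byte(for: scalar) != nil {
                escaped.unicodeScalars.append(scalar)
            } else {
                escaped += "&#x" + String(scalar.value, radix: 16, uppercase: true) + ";"
            }
        }
    }
    return escaped
}

// Encodes a string as ISO-8859-15 bytes; returns nil if any scalar is unrepresentable.
func encodeLatin9(_ text: String) -> [UInt8]? {
    var bytes: [UInt8] = []
    for scalar in text.unicodeScalars {
        guard let byte = latin9Byte(for: scalar) else { return nil }
        bytes.append(byte)
    }
    return bytes
}

// The element from the repro, serialized by hand. The attribute value is
// closed with a quotation mark after the character reference's semicolon.
let xml = "<Demo enDash=\"" + escapeForLatin9("\u{2013}") + "\"></Demo>"
print(xml) // prints <Demo enDash="&#x2013;"></Demo>
if let bytes = encodeLatin9(xml) {
    print("encoded \(bytes.count) ISO-8859-15 bytes")
}
```

This sketch covers attribute values only; it does not deal with characters that XML 1.0 forbids outright (most C0 control characters), and a decomposed Ä (A followed by U+0308) would have its combining mark escaped rather than composed.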