UITextView can't handle emojis with more than one Unicode codepoint

Originator:villev
Number:rdar://35050622 Date Originated:October 18 2017
Status:Open Resolved:No
Product:UIKit or Swift compiler Product Version:
Classification: Reproducible:Yes
 
Summary: 
The character count of UITextView's text is calculated incorrectly for emojis that consist of more than one Unicode codepoint. When the emoji "👦🏿" is typed into a UITextView up to 64 times, textView.text.characters.count returns the correct number (64 for 64 emojis). From 65 emojis onward, textView.text.characters.count returns double the correct amount: 65 -> 130, 66 -> 132, etc.

With the "👦" emoji, which has only a single Unicode codepoint, the count is correct.


Steps to Reproduce:
Create a text view, insert 65 or more "👦🏿" emojis into it, and print textView.text.characters.count.
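
A minimal sketch of these steps follows. It makes several assumptions that are not in the report: the helper names makeReproText and reportedCount are hypothetical, the text is assigned programmatically rather than typed (which may or may not go through the same code path), and the count is written with the newer text.count spelling because characters.count is no longer available in current Swift. The UIKit part is guarded so the file still compiles on platforms without UIKit.

```swift
#if canImport(UIKit)
import UIKit
#endif

// Hypothetical helper: builds a string of `emojiCount` copies of the
// two-codepoint emoji from the report (U+1F466 followed by U+1F3FF).
func makeReproText(emojiCount: Int) -> String {
    return String(repeating: "👦🏿", count: emojiCount)
}

#if canImport(UIKit)
// Hypothetical helper: puts the text into a UITextView and reads the count
// back through the view's `text` property, as described in the report.
// Assumption: setting `text` directly behaves like typing the emojis.
func reportedCount(emojiCount: Int) -> Int {
    let textView = UITextView()
    textView.text = makeReproText(emojiCount: emojiCount)
    // The report used textView.text.characters.count (Swift 3 spelling).
    return textView.text.count
}

// Run on the main thread (for example in a playground or a view controller).
for n in [64, 65, 66] {
    print("\(n) emojis inserted, count read back: \(reportedCount(emojiCount: n))")
}
#endif
```

Comparing the printed value with the number of emojis inserted shows whether a given SDK and Swift version is affected.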

Expected Results:
65, 66, 67... (one per emoji inserted)

Actual Results:
130, 132, 134...

Version/Build:
Xcode 9, Swift 3

Comments

Our team did some research on the issue.

Here is a summary of the findings:

  • UITextView and UITextField seem to use NSBigMutableString as their internal buffer (i.e. when you invoke .text on them, that's what gets returned)

  • When an NSBigMutableString that contains 256 or fewer UTF-16 units gets bridged to a Swift String, it works correctly

  • When an NSBigMutableString that contains more than 256 UTF-16 units gets bridged to a Swift String, grapheme clustering is not performed correctly: each code point gets converted to its own Character, even ones that should be clustered together with previous code points (like variation selectors or, in this case, the skin tone modifier)
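
The arithmetic behind the 256-unit threshold can be checked in plain Swift, without UIKit. The sketch below is only an illustration under stated assumptions: the helper name counts(of:) is hypothetical, and the code uses a native Swift String, so it shows what the counts should be rather than reproducing the bridging problem itself.

```swift
// "👦🏿" is U+1F466 (boy) followed by U+1F3FF (skin tone modifier).
// Both code points lie outside the BMP, so each needs two UTF-16 units.
let emoji = "👦🏿"

// Hypothetical helper (not from the report): three ways of counting a string.
func counts(of s: String) -> (utf16: Int, scalars: Int, characters: Int) {
    return (s.utf16.count, s.unicodeScalars.count, s.count)
}

for n in [64, 65, 66] {
    let c = counts(of: String(repeating: emoji, count: n))
    print("\(n) emojis: \(c.utf16) UTF-16 units, \(c.scalars) code points, \(c.characters) characters")
}
```

Each emoji takes 4 UTF-16 units, so 64 emojis are exactly 256 units and 65 emojis are 260, the first length above the threshold. The code point count for 65 emojis is 130, which matches the incorrect value in Actual Results and is consistent with every code point being turned into a separate Character.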

