NSLinguisticTagger fails to separate numbers from nouns
| Originator: | phase.of.matter | | |
|---|---|---|---|
| Number: | rdar://14190417 | Date Originated: | 18-Jun-2013 01:34 PM |
| Status: | Open | Resolved: | |
| Product: | 10.9 | Product Version: | 13A476u |
| Classification: | Other Bug | Reproducible: | Always |
Summary: In the first Mavericks beta, NSLinguisticTagger fails in certain sentences to separate a number from the noun it follows. This did not happen in 10.8. In the following sentences, the chunk "Creatinine 95" (and in some of them also "GFR 32") comes back as a single token instead of two:

- "GFR, 32 years, Creatinine 95, male, caucasian."
- "GFR 32 years, Creatinine 95, male, caucasian."
- "32 years, Creatinine 95, male, caucasian."

In the following sentences, "Creatinine" and "95" come back as separate tokens, as expected:

- "GFR, 32 years Creatinine 95, male, caucasian."
- "GFR 32 years Creatinine 95, male, caucasian."
- "GFR, Creatinine 95, male, caucasian."
- "Creatinine 95, male, caucasian."

Expected Results: If the NSLinguisticTaggerJoinNames option is not given, the tagger should not attempt to join names or nouns.

Actual Results: Even when NSLinguisticTaggerJoinNames is not given, the tagger appears to apply heuristics that cause it to join tokens into what it takes to be names.

Regression: The problem with the joining is that the tagger does not (and I don't expect it to) correctly parse this "sentence". It is therefore important that it not try to be clever and simply return the tokens. "95" is obviously a number; it should never be joined to a noun when the join-names option is not given.
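A minimal reproduction sketch, assuming a Mac where NSLinguisticTagger is available. It tokenizes each sentence from the report with the token-type scheme, omitting whitespace and punctuation and deliberately not passing `.joinNames`, then flags any sentence where "Creatinine" and "95" do not come back as two separate tokens. It uses current Swift API names (Swift did not exist when this report was filed); the Objective-C equivalent is `-enumerateTagsInRange:scheme:options:usingBlock:` with `NSLinguisticTagSchemeTokenType`. The helper names `wordTokens` and `separatesCreatinine` are hypothetical, chosen for illustration only.

```swift
import Foundation

// Hypothetical helper: tokenizes `text` with NSLinguisticTagger's token-type
// scheme. Whitespace and punctuation are omitted; .joinNames is deliberately
// NOT passed, so no multi-word joining is expected.
func wordTokens(_ text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let nsText = text as NSString
    let range = NSRange(location: 0, length: nsText.length)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]
    var tokens: [String] = []
    tagger.enumerateTags(in: range, scheme: .tokenType, options: options) { _, tokenRange, _, _ in
        tokens.append(nsText.substring(with: tokenRange))
    }
    return tokens
}

// Hypothetical helper: true when "Creatinine" and "95" are two separate tokens.
// A joined "Creatinine 95" token makes both lookups fail.
func separatesCreatinine(_ text: String) -> Bool {
    let tokens = wordTokens(text)
    return tokens.contains("Creatinine") && tokens.contains("95")
}

let sentences = [
    // Reported as failing on 13A476u ("Creatinine 95" returned as one token):
    "GFR, 32 years, Creatinine 95, male, caucasian.",
    "GFR 32 years, Creatinine 95, male, caucasian.",
    "32 years, Creatinine 95, male, caucasian.",
    // Reported as tokenized correctly:
    "GFR, 32 years Creatinine 95, male, caucasian.",
    "GFR 32 years Creatinine 95, male, caucasian.",
    "GFR, Creatinine 95, male, caucasian.",
    "Creatinine 95, male, caucasian.",
]

// Print a status marker followed by the token list for each sentence.
for sentence in sentences {
    let status = separatesCreatinine(sentence) ? "ok    " : "JOINED"
    print(status, wordTokens(sentence))
}
```

Per the report, an affected build would mark the first three sentences as JOINED, while an unaffected build should mark all seven as ok. Comparing the printed token lists also shows whether "GFR 32" is being joined.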