10.9 SDK: NSLinguisticTagger -enumerateTagsInRange:scheme:options:usingBlock: and -possibleTagsAtIndex:scheme:tokenRange:sentenceRange:scores: do not return the same tags
| Originator: | chaos42 | ||
| Number: | rdar://18096401 | Date Originated: | 21-Aug-2014 04:55 PM |
| Status: | Open | Resolved: | |
| Product: | OS X SDK | Product Version: | 10.9.3/13D65, Xcode 5.1.1 5B1008 |
| Classification: | Other Bug | Reproducible: | Sometimes |
Summary:
-[NSLinguisticTagger enumerateTagsInRange:scheme:options:usingBlock:] and repeated calls to -[NSLinguisticTagger possibleTagsAtIndex:scheme:tokenRange:sentenceRange:scores:] do not return the same lexical class tags for the same text. In most cases possibleTags is the more reliable of the two: enumerateTags often reports a noun where a verb is expected, and often reports NSLinguisticTagOtherWord where a more specific class is expected. For instance, on the sentence fragment "describe Contents", enumerateTags tags both words as nouns, whereas possibleTags correctly reports a verb followed by a noun. The verb has a score of 1, so enumerateTags should report it as well. Occasionally the opposite holds (e.g. "start Write" is tagged better by enumerateTags than by possibleTags).
Steps to Reproduce:
1. Open the attached Xcode project.
2. Build and run it, then compare each "possibleTags:" log entry with the "enumerateTags:" entry that follows it.
Expected Results:
Each pair of output entries (possibleTags and enumerateTags) reports the same tag for every word.
Actual Results:
The two entries in a pair report different tags for some words.
Version:
10.9.3/13D65, Xcode 5.1.1 5B1008
Notes:
Configuration:
Attachments:
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[])
{
    @autoreleasepool
    {
        NSString *names = @"describe Contents\nfind View By Id\nstart Write";
        NSMutableArray *wordsByPartOfSpeech;
        for (NSString *spaceyString in [names componentsSeparatedByString:@"\n"])
        {
            NSLinguisticTagger *linguisticTagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:NSLinguisticTaggerOmitWhitespace];
            [linguisticTagger setString:spaceyString];

            // Pass 1: walk the string token by token and take the first candidate tag
            // returned by possibleTagsAtIndex:scheme:tokenRange:sentenceRange:scores:.
            wordsByPartOfSpeech = [NSMutableArray array];
            NSUInteger searchIndex = 0;
            while (searchIndex < [spaceyString length])
            {
                NSArray *scores = nil;
                NSRange tokenRange;
                NSRange sentenceRange;
                NSArray *possibleTags = [linguisticTagger possibleTagsAtIndex:searchIndex scheme:NSLinguisticTagSchemeLexicalClass tokenRange:&tokenRange sentenceRange:&sentenceRange scores:&scores];
                NSDictionary *dict;
                dict = @{ @"partOfSpeech" : possibleTags.count == 0 ? NSLinguisticTagOtherWord : possibleTags[0], @"word" : [spaceyString substringWithRange:tokenRange]};
                [wordsByPartOfSpeech addObject:dict];
                // Skip this token and the single space that follows it.
                searchIndex += tokenRange.length + 1;
            }
            NSLog(@"possibleTags: %@", wordsByPartOfSpeech);

            // Pass 2: tag the same string with enumerateTagsInRange:scheme:options:usingBlock:.
            wordsByPartOfSpeech = [NSMutableArray array];
            [linguisticTagger enumerateTagsInRange:NSMakeRange(0, spaceyString.length) scheme:NSLinguisticTagSchemeLexicalClass options:NSLinguisticTaggerOmitWhitespace usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
                NSDictionary *dict = @{ @"partOfSpeech" : tag, @"word" : [spaceyString substringWithRange:tokenRange]};
                [wordsByPartOfSpeech addObject:dict];
            }];
            NSLog(@"enumerateTags: %@", wordsByPartOfSpeech);
        }
    }
    return 0;
}