10.9 SDK NSLinguisticTagger -enumerateTagsInRange:scheme:options:usingBlock: and possibleTagsAtIndex:scheme:tokenRange:sentenceRange:scores: do not find the same results

Originator:chaos42
Number:rdar://18096401 Date Originated:21-Aug-2014 04:55 PM
Status:Open Resolved:
Product:OS X SDK Product Version:10.9.3/13D65, Xcode 5.1.1 5B1008
Classification:Other Bug Reproducible:Sometimes
 
Summary:
-[NSLinguisticTagger enumerateTagsInRange:scheme:options:usingBlock:] and repeated calls to -[NSLinguisticTagger possibleTagsAtIndex:scheme:tokenRange:sentenceRange:scores:] do not return the same lexical class tags for the same text. In particular, possibleTags appears to be much more reliable than enumerateTags: enumerateTags often reports a noun where a verb is expected, and often reports NSLinguisticTagOtherWord where a more specific class is expected. For instance, for the sentence fragment "describe Contents", enumerateTags tags both words as nouns, while possibleTags correctly returns a verb and a noun, the verb with score 1, so enumerateTags should be returning the verb as well. Sometimes the opposite holds (e.g. "start Write" is tagged better by enumerateTags than by possibleTags).

Steps to Reproduce:
1. Open the attached Xcode project
2. Run the attached Xcode project


Expected Results:
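For each input string, the "possibleTags:" and "enumerateTags:" output lines should list the same part of speech for every word.

Actual Results:
For some words, the "possibleTags:" and "enumerateTags:" output lines list different parts of speech.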

Version:
10.9.3/13D65, Xcode 5.1.1 5B1008

Notes:


Configuration:


Attachments:
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[])
{

    @autoreleasepool
    {
        NSString *names = @"describe Contents\nfind View By Id\nstart Write";
        NSMutableArray *wordsByPartOfSpeech;
        for (NSString *spaceyString in [names componentsSeparatedByString:@"\n"])
        {
            // Create a fresh tagger for each input line.
            NSLinguisticTagger *linguisticTagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:NSLinguisticTaggerOmitWhitespace];
            [linguisticTagger setString:spaceyString];
            
            // Pass 1: query the possible tags for each word, one token at a time.
            wordsByPartOfSpeech = [NSMutableArray array];
            NSUInteger searchIndex = 0;
            while (searchIndex < [spaceyString length])
            {
                NSArray *scores = nil;
                NSRange tokenRange;
                NSRange sentenceRange;
                
                NSArray *possibleTags = [linguisticTagger possibleTagsAtIndex:searchIndex scheme:NSLinguisticTagSchemeLexicalClass tokenRange:&tokenRange sentenceRange:&sentenceRange scores:&scores];
                // Use the first returned tag; fall back to OtherWord when none is returned.
                NSDictionary *dict;
                dict = @{ @"partOfSpeech" : possibleTags.count == 0 ? NSLinguisticTagOtherWord : possibleTags[0], @"word" : [spaceyString substringWithRange:tokenRange]};
                [wordsByPartOfSpeech addObject:dict];
                // Skip this token and the single space that follows it.
                searchIndex += tokenRange.length + 1;
            }
            NSLog(@"possibleTags: %@", wordsByPartOfSpeech);

            // Pass 2: enumerate the tags over the whole string with the same scheme and options.
            wordsByPartOfSpeech = [NSMutableArray array];
            [linguisticTagger enumerateTagsInRange:NSMakeRange(0, spaceyString.length) scheme:NSLinguisticTagSchemeLexicalClass options:NSLinguisticTaggerOmitWhitespace usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
                NSDictionary *dict = @{ @"partOfSpeech" : tag, @"word" : [spaceyString substringWithRange:tokenRange]};
                [wordsByPartOfSpeech addObject:dict];
            }];
            NSLog(@"enumerateTags: %@", wordsByPartOfSpeech);

        }
    }
    return 0;
}
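
The attached code keeps only the first tag returned by possibleTags and discards the scores. To inspect the full candidate list behind a claim such as "a verb with score 1", a sketch along the following lines may help. CandidateDescriptions and LogCandidates are hypothetical helper names that are not part of the attached project, and the "tag=score" output format is an arbitrary choice.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not in the attached project): pairs each candidate tag
// with its score so the whole list can be logged on one line.
static NSArray *CandidateDescriptions(NSArray *tags, NSArray *scores)
{
    NSMutableArray *result = [NSMutableArray array];
    for (NSUInteger i = 0; i < tags.count; i++)
    {
        if (i < scores.count)
        {
            [result addObject:[NSString stringWithFormat:@"%@=%.2f", tags[i], [scores[i] doubleValue]]];
        }
        else
        {
            // scores may be nil or shorter than tags; mark the score as unknown.
            [result addObject:[NSString stringWithFormat:@"%@=?", tags[i]]];
        }
    }
    return result;
}

// Hypothetical helper: logs every candidate tag for the token containing charIndex.
static void LogCandidates(NSString *text, NSUInteger charIndex)
{
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLexicalClass] options:NSLinguisticTaggerOmitWhitespace];
    [tagger setString:text];

    NSArray *scores = nil;
    NSRange tokenRange = NSMakeRange(NSNotFound, 0);
    NSRange sentenceRange = NSMakeRange(NSNotFound, 0);
    NSArray *tags = [tagger possibleTagsAtIndex:charIndex scheme:NSLinguisticTagSchemeLexicalClass tokenRange:&tokenRange sentenceRange:&sentenceRange scores:&scores];

    // Nothing to report if the tagger did not identify a token at this index.
    if (tokenRange.location == NSNotFound)
    {
        return;
    }
    NSLog(@"%@: %@", [text substringWithRange:tokenRange], [CandidateDescriptions(tags, scores) componentsJoinedByString:@", "]);
}
```

Calling LogCandidates(@"describe Contents", 0) from main would log the candidates for the first word; comparing that line with the tag reported by enumerateTags for the same word shows whether the two APIs disagree about the top candidate or only about ordering.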
