python - Getting AttributeError on nltk Textual entailment classifier
I am referring to the section http://www.nltk.org/book/ch06.html#recognizing-textual-entailment
    def rte_features(rtepair):
        extractor = nltk.RTEFeatureExtractor(rtepair)
        features = {}
        features['word_overlap'] = len(extractor.overlap('word'))
        features['word_hyp_extra'] = len(extractor.hyp_extra('word'))
        features['ne_overlap'] = len(extractor.overlap('ne'))
        features['ne_hyp_extra'] = len(extractor.hyp_extra('ne'))
        return features

    rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])
    extractor = nltk.RTEFeatureExtractor(rtepair)

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-a7f96e33ba9e> in <module>()
    ----> 1 extractor = nltk.RTEFeatureExtractor(rtepair)

    c:\users\ravina\anaconda2\lib\site-packages\nltk\classify\rte_classify.pyc in __init__(self, rtepair, stop, lemmatize)
         65
         66         # Get the set of word types for text and hypothesis
    ---> 67         self.text_tokens = tokenizer.tokenize(rtepair.text)
         68         self.hyp_tokens = tokenizer.tokenize(rtepair.hyp)
         69         self.text_words = set(self.text_tokens)

    AttributeError: 'list' object has no attribute 'text'
This is the exact code mentioned in the book. Can someone tell me what is going wrong here? Ravina
Take a look at the type signatures. Type this into a Python shell:
    import nltk
    x = nltk.corpus.rte.pairs(['rte3_dev.xml'])
    type(x)
This tells you that x is of type list.
Now, type:
    help(nltk.RTEFeatureExtractor)
which tells you:
    :param rtepair: a RTEPair from which the features should be extracted
Clearly, x does not have the correct type for calling nltk.RTEFeatureExtractor. Try this instead:
    type(x[33])
    <class 'nltk.corpus.reader.rte.RTEPair'>
A single item of the list has the correct type.
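So the extractor has to be given one pair at a time, not the whole list. A minimal sketch of the corrected call, reusing the rte_features function from the question and the same rte3_dev.xml file (this only fixes the type error; the tokenizer issue discussed in the update below is a separate problem), might look like this:

    import nltk

    def rte_features(rtepair):
        # Build the extractor from a single RTEPair, not from the whole list
        extractor = nltk.RTEFeatureExtractor(rtepair)
        features = {}
        features['word_overlap'] = len(extractor.overlap('word'))
        features['word_hyp_extra'] = len(extractor.hyp_extra('word'))
        features['ne_overlap'] = len(extractor.overlap('ne'))
        features['ne_hyp_extra'] = len(extractor.hyp_extra('ne'))
        return features

    pairs = nltk.corpus.rte.pairs(['rte3_dev.xml'])   # a list of RTEPair objects
    print(rte_features(pairs[33]))                    # pass one element of the list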
Update: As mentioned in the comment section, extractor.text_words shows only empty strings. This seems to be due to changes made in NLTK since the documentation was written. Long story short: you won't be able to fix this without downgrading to an older version of NLTK or fixing the problem in NLTK yourself. Inside the file nltk/classify/rte_classify.py, you will find the following piece of code:
    class RTEFeatureExtractor(object):
        ...
        from nltk.tokenize import RegexpTokenizer
        tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')

        self.text_tokens = tokenizer.tokenize(rtepair.text)
        self.text_words = set(self.text_tokens)
If you run the same RegexpTokenizer on the exact text the extractor receives, it produces only empty strings:
    import nltk
    rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
    from nltk.tokenize import RegexpTokenizer
    tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')
    tokenizer.tokenize(rtepair.text)
This returns ['', '', …, ''], i.e. a list of empty strings.
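The empty strings are a side effect of the capturing group in the pattern: RegexpTokenizer (with the default gaps=False) collects its tokens via re.findall, and when the pattern contains a capturing group, findall returns the contents of that group rather than the whole match, so every token matched by the \w+ or \$[\d\.]+ branches comes back as ''. As a rough illustration of a workaround (the sample sentence below is made up, not taken from the RTE corpus, and this patches the pattern outside NLTK rather than editing rte_classify.py), making the group non-capturing restores the expected behaviour:

    import re
    from nltk.tokenize import RegexpTokenizer

    text = "The U.S. economy grew by $1.5 billion."

    # Capturing group: findall returns group 1, which is empty whenever the
    # \w+ or \$[\d\.]+ alternatives matched instead of ([A-Z]\.)+.
    print(re.findall(r'([A-Z]\.)+|\w+|\$[\d\.]+', text))
    # -> ['', 'S.', '', '', '', '', '']

    # Non-capturing group: the whole match is returned, so real tokens survive.
    tokenizer = RegexpTokenizer(r'(?:[A-Z]\.)+|\w+|\$[\d\.]+')
    print(tokenizer.tokenize(text))
    # -> ['The', 'U.S.', 'economy', 'grew', 'by', '$1.5', 'billion']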