python - Getting AttributeError on NLTK textual entailment classifier


I'm referring to the section at this link: http://www.nltk.org/book/ch06.html#recognizing-textual-entailment

    def rte_features(rtepair):
        extractor = nltk.RTEFeatureExtractor(rtepair)
        features = {}
        features['word_overlap'] = len(extractor.overlap('word'))
        features['word_hyp_extra'] = len(extractor.hyp_extra('word'))
        features['ne_overlap'] = len(extractor.overlap('ne'))
        features['ne_hyp_extra'] = len(extractor.hyp_extra('ne'))
        return features

    rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])
    extractor = nltk.RTEFeatureExtractor(rtepair)

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-a7f96e33ba9e> in <module>()
    ----> 1 extractor = nltk.RTEFeatureExtractor(rtepair)

    c:\users\ravina\anaconda2\lib\site-packages\nltk\classify\rte_classify.pyc in __init__(self, rtepair, stop, lemmatize)
         65
         66         #Get the set of word types for text and hypothesis
    ---> 67         self.text_tokens = tokenizer.tokenize(rtepair.text)
         68         self.hyp_tokens = tokenizer.tokenize(rtepair.hyp)
         69         self.text_words = set(self.text_tokens)

    AttributeError: 'list' object has no attribute 'text'

This is the exact code mentioned in the book. Can anyone tell me what's going wrong here? Ravina

Take a look at the type signatures. Type this into the Python shell:

    import nltk
    x = nltk.corpus.rte.pairs(['rte3_dev.xml'])
    type(x)

This tells you that x is of type list.

Now, type:

    help(nltk.RTEFeatureExtractor)

which tells you:

    :param rtepair: a RTEPair from which features should be extracted

Clearly, x does not have the correct type for calling nltk.RTEFeatureExtractor. Instead:

    type(x[33])
    <class 'nltk.corpus.reader.rte.RTEPair'>

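A single item of the list does have the correct type, so the extractor has to be called on one pair at a time rather than on the whole list.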
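One way to apply this is to map the book's feature function over the list, one pair at a time. The sketch below is only an illustration: featurize_pairs and load_rte_features are hypothetical helper names, not part of NLTK, and load_rte_features assumes NLTK and its rte corpus are installed (it imports NLTK lazily so the generic helper can be used without it).

```python
def featurize_pairs(pairs, feature_func):
    """Apply feature_func to each pair individually, never to the whole list.

    A single pair passed by mistake is wrapped in a list first, so the
    result is always a list of feature dictionaries.
    """
    if not isinstance(pairs, (list, tuple)):
        pairs = [pairs]
    return [feature_func(pair) for pair in pairs]


def load_rte_features(fileids=('rte3_dev.xml',)):
    """Hypothetical wrapper: needs NLTK and the 'rte' corpus to be available."""
    import nltk  # imported lazily; assumption: NLTK is installed

    def rte_features(rtepair):
        # Same feature function as in the question, called on ONE RTEPair.
        extractor = nltk.RTEFeatureExtractor(rtepair)
        return {
            'word_overlap': len(extractor.overlap('word')),
            'word_hyp_extra': len(extractor.hyp_extra('word')),
            'ne_overlap': len(extractor.overlap('ne')),
            'ne_hyp_extra': len(extractor.hyp_extra('ne')),
        }

    # pairs() returns a list, so hand it to the per-item helper.
    pairs = nltk.corpus.rte.pairs(list(fileids))
    return featurize_pairs(pairs, rte_features)
```

For a quick check on a single example, nltk.RTEFeatureExtractor(pairs[33]) works the same way, since pairs[33] is an RTEPair.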


Update: As mentioned in the comment section, extractor.text_words shows only empty strings. This seems to be due to changes made in NLTK since the documentation was written. Long story short: you won't be able to fix this without downgrading to an older version of NLTK or fixing the problem in NLTK yourself. Inside the file nltk/classify/rte_classify.py, you will find the following piece of code:

    class RTEFeatureExtractor(object):
        …
        import nltk
        from nltk.tokenize import RegexpTokenizer
        tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')
        self.text_tokens = tokenizer.tokenize(rtepair.text)
        self.text_words = set(self.text_tokens)

If you run the same RegexpTokenizer on the exact text from the extractor, it produces only empty strings:

    import nltk
    rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
    from nltk.tokenize import RegexpTokenizer
    tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')
    tokenizer.tokenize(rtepair.text)

This returns ['', '', …, ''] (i.e., a list of empty strings).
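A likely explanation, assuming RegexpTokenizer hands the pattern to re.findall (as recent NLTK versions appear to do), is the capturing group ([A-Z]\.) in the pattern: when a pattern contains exactly one capturing group, re.findall returns the contents of that group instead of the whole match, and the group is empty whenever one of the other alternatives matched. The sketch below reproduces the effect with the standard re module only; the sample sentence is made up for illustration, and the non-capturing variant is a suggested fix, not what NLTK ships.

```python
import re

# Made-up sample text; any text without "X." style abbreviations behaves the same.
text = "He paid $12.50 today"

# Pattern as found in rte_classify.py: the first alternative is a capturing group.
capturing = r'([A-Z]\.)+|\w+|\$[\d\.]+'

# Same pattern with a non-capturing group, (?:...), in place of (...).
non_capturing = r'(?:[A-Z]\.)+|\w+|\$[\d\.]+'

# findall returns group 1 for every match; it is '' when \w+ or \$[\d\.]+ matched.
tokens_capturing = re.findall(capturing, text)

# With no capturing groups, findall returns the full matched text.
tokens_fixed = re.findall(non_capturing, text)

print(tokens_capturing)  # ['', '', '', '']
print(tokens_fixed)      # ['He', 'paid', '$12.50', 'today']
```

If that assumption holds, changing the group to (?:[A-Z]\.)+ in a local copy of rte_classify.py would be one way of "fixing the problem in NLTK yourself".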

