csv - Tokenize in Python using pandas
I am trying to tokenize one column of a DataFrame, using the following code:
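    import pandas as pd
    from nltk.tokenize import RegexpTokenizer
    from nltk.stem.porter import PorterStemmer
    from stop_words import get_stop_words

    def main(args):
        # Load the CSV given on the command line and pick out one column.
        df = pd.DataFrame(pd.read_csv(args[1]), index=None)
        doc_set = pd.DataFrame(df.country)
        tokenizer = RegexpTokenizer(r'\w+')
        en_stop = get_stop_words('en')
        p_stemmer = PorterStemmer()
        texts = []
        print(doc_set)
        # Lowercase, tokenize, drop stop words, and stem each entry.
        for i in doc_set:
            raw = i.lower()
            tokens = tokenizer.tokenize(raw)
            stopped_tokens = [i for i in tokens if not i in en_stop]
            stemmed_tokens = [p_stemmer.stem(i) for i in stopped_tokens]
            texts.append(stemmed_tokens)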
This code outputs only the header of the DataFrame that I created from the CSV file. Please help me find what is wrong with this approach.
When Python starts spitting out things that make no sense to me, I have gotten into the habit of downloading the latest source, compiling it to /usr/local, and reinstalling pip. Strangely, that fixes things.
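For the symptom described in the question, a likely cause is that iterating directly over a DataFrame yields its column labels rather than its cell values, so a loop over it only ever sees the header. The sketch below illustrates the difference. It is a rough illustration, not the asker's exact setup: the sample rows, the `tokenize` helper, and the tiny `STOP_WORDS` set are hypothetical stand-ins (a plain regex replaces `RegexpTokenizer`, and stemming is left out) so that the snippet runs with pandas alone.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({"country": ["United States of America", "The Netherlands"]})

# Iterating over a DataFrame yields its column labels, not its cell values.
labels = [item for item in df[["country"]]]

# Hypothetical stop-word set; the question uses get_stop_words('en') instead.
STOP_WORDS = {"of", "the"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


# Iterating over a single column (a Series) yields the cell values.
texts = [tokenize(value) for value in df["country"]]

print(labels)  # ['country']
print(texts)   # [['united', 'states', 'america'], ['netherlands']]
```

If that is what is happening here, looping over `df.country` (or `df["country"]`) instead of the wrapped DataFrame should feed the actual rows to the tokenizer.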