csv - Tokenize in Python using pandas

April 15, 2014

I am trying to tokenize one column of a DataFrame, using the following code:

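import pandas as pd
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer
from stop_words import get_stop_words

def main(args):
    # Load the CSV file named by the first command-line argument
    df = pd.DataFrame(pd.read_csv(args[1]), index=None)
    doc_set = pd.DataFrame(df.country)
    tokenizer = RegexpTokenizer(r'\w+')
    en_stop = get_stop_words('en')
    p_stemmer = PorterStemmer()
    texts = []
    print(doc_set)
    for i in doc_set:
        raw = i.lower()
        tokens = tokenizer.tokenize(raw)
        stopped_tokens = [i for i in tokens if not i in en_stop]
        stemmed_tokens = [p_stemmer.stem(i) for i in stopped_tokens]
        texts.append(stemmed_tokens)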

This code gives me just the header of the DataFrame that I created from the CSV file. Please help me find what is wrong with this approach.
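One likely cause of the symptom described above: iterating over a pandas DataFrame with a plain for loop yields its column labels rather than its cell values, so the loop body only ever sees the header string. The sketch below shows the difference and iterates over the column (a Series) instead. The sample data, the EN_STOP set, and the tokenize helper are hypothetical stand-ins made up for illustration; re.findall is used in place of RegexpTokenizer, and stemming is left out, so that the example runs with pandas alone.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the CSV file from the question.
df = pd.DataFrame({"country": ["United States of America", "The Netherlands"]})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [item for item in pd.DataFrame(df.country)]

# Stand-in stop-word set (assumption); the question uses get_stop_words('en').
EN_STOP = {"the", "of"}


def tokenize(text):
    # Lowercase, split into runs of word characters, then drop stop words.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in EN_STOP]


# Iterating over the column itself (a Series) yields the cell values.
texts = [tokenize(value) for value in df["country"]]

print(labels)
print(texts)
```

In the original function, the equivalent change would be to loop over df.country (or doc_set['country']) instead of doc_set; the stemming step could then be applied to each token list as before.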

When Python starts spitting out things that make no sense to me, I have gotten in the habit of downloading the latest source, compiling it to /usr/local, and reinstalling pip. Strangely, that fixes things.
