Getting the Top Terms for Each Topic in LDA in R

March 15, 2015

I am implementing LDA on some simple data sets. The topic modelling itself works, but when I try to organise the top 6 terms according to their topics, I get numerical values (maybe indexes) instead of words.

    # docs is the dataset, already formatted and cleaned
    dtm <- TermDocumentMatrix(docs, control = list(removePunctuation = TRUE, stopwords = TRUE))
    ldaout <- LDA(dtm, k, method = "Gibbs",
                  control = list(nstart = nstart, seed = seed, best = best,
                                 burnin = burnin, iter = iter, thin = thin))

    # 6 top terms in each topic
    ldaout.terms <- as.matrix(terms(ldaout, 6))
    write.csv(ldaout.terms, file = paste("ldagibbs", k, "topicstoterms.csv"))

The topicstoterms file that gets generated looks like this:

      topic 1 topic 2 topic 3
    1       1       5       3
    2       2       1       4
    3       3       2       1
    4       4       3       2
    5       5       4       5

What I want instead is the terms (the top words for each topic) in the table, like the following:

      topic 1 topic 2 topic 3
    1     hat     cat    food

You need just 1 line of code to fix the problem:

    > text = read.csv("~/desktop/your_data.csv")        # your initial dataset
    > docs = Corpus(VectorSource(text))                 # converting to a corpus
    > docs = tm_map(docs, content_transformer(tolower)) # cleaning
    > ...                                               # cleaning
    > dtm = DocumentTermMatrix(docs)                    # creating the document-term matrix
    > rownames(dtm) = text

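After adding the last line, you can proceed with the remaining code, and you'll get the terms and not the indexes. Note also that this version builds a document-term matrix, while the question builds a term-document matrix: LDA() expects documents in rows and terms in columns, so with the transposed matrix terms() returns document identifiers rather than words. Hope this helped.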
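For reference, here is a minimal end-to-end sketch of how the matrix orientation affects what terms() returns. It assumes the tm and topicmodels packages are installed. The four toy sentences, k = 2, and the Gibbs settings are made up for illustration and do not come from the question.

```r
library(tm)
library(topicmodels)

# Toy corpus (hypothetical data, only for illustration)
texts <- c("the cat wore a red hat",
           "my cat eats cat food",
           "dog food and cat food",
           "a hat for the dog")
docs <- Corpus(VectorSource(texts))
ctrl <- list(removePunctuation = TRUE, stopwords = TRUE)

# Documents in rows, terms in columns: the layout LDA() expects
dtm <- DocumentTermMatrix(docs, control = ctrl)

# Terms in rows, documents in columns: the layout used in the question
tdm <- TermDocumentMatrix(docs, control = ctrl)

# Assumed Gibbs settings, kept small so the sketch runs quickly
gibbs <- list(seed = 2015, burnin = 100, iter = 200, thin = 50)

lda_dtm <- LDA(dtm, k = 2, method = "Gibbs", control = gibbs)
lda_tdm <- LDA(tdm, k = 2, method = "Gibbs", control = gibbs)

top_terms_dtm <- terms(lda_dtm, 3)  # entries are taken from the columns of dtm (words)
top_terms_tdm <- terms(lda_tdm, 3)  # entries are taken from the columns of tdm (document ids)

print(top_terms_dtm)
print(top_terms_tdm)
```

With the document-term matrix, the entries of the result are drawn from Terms(dtm), so the table written by write.csv() contains words. With the term-document matrix, they are drawn from Docs(tdm), which matches the numeric output shown in the question.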
