Getting the Top Terms for Each Topic in LDA in R
I am implementing LDA on some simple data sets. I am able to do the topic modelling, but when I try to organise the top 6 terms according to their topics, I get numerical values (maybe indexes) instead of words.
# docs is the dataset, already formatted and cleaned
dtm <- TermDocumentMatrix(docs, control = list(removePunctuation = TRUE, stopwords = TRUE))
ldaOut <- LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, seed = seed, best = best, burnin = burnin, iter = iter, thin = thin))
# 6 top terms in each topic
ldaOut.terms <- as.matrix(terms(ldaOut, 6))
write.csv(ldaOut.terms, file = paste("LDAGibbs", k, "TopicsToTerms.csv"))
The TopicsToTerms file that is generated looks like this:
  Topic 1  Topic 2  Topic 3
1       1        5        3
2       2        1        4
3       3        2        1
4       4        3        2
5       5        4        5
What I want instead is the terms (the top words for each topic) in the table, like the following:
  Topic 1  Topic 2  Topic 3
1     hat      cat     food
You need only 1 line of code to fix the problem:
> text = read.csv("~/Desktop/your_data.csv") # your initial dataset
> docs = Corpus(VectorSource(text)) # converting to a corpus
> docs = tm_map(docs, content_transformer(tolower)) # cleaning
> ... # cleaning
> dtm = DocumentTermMatrix(docs) # creating the document-term matrix
> rownames(dtm) = text
After adding the last line, you can proceed with the remaining code, and you'll get the terms, not indexes. Hope this helped.
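One difference between the two snippets in this thread is worth noting: the question builds a TermDocumentMatrix, while the answer builds a DocumentTermMatrix. LDA() from topicmodels expects documents in rows and terms in columns, and terms() reports column names, so passing a term-document matrix is one likely explanation for the numeric "terms" (this is an assumption, not something confirmed in the thread). The following is a minimal sketch on a toy corpus; the texts, k = 2, and the Gibbs settings are invented for illustration only.

```r
library(tm)
library(topicmodels)

# Toy documents, invented for this sketch (not from the original post)
texts <- c("the cat wore a hat",
           "cat food and dog food",
           "a hat for the cat",
           "dog food is food",
           "the dog chased the cat")
docs <- Corpus(VectorSource(texts))

ctrl <- list(removePunctuation = TRUE, stopwords = TRUE)

# Term-document matrix: columns are documents, so column names are document ids
tdm <- TermDocumentMatrix(docs, control = ctrl)
print(colnames(tdm))

# Document-term matrix: columns are terms, which is the layout LDA() expects
dtm <- DocumentTermMatrix(docs, control = ctrl)
print(colnames(dtm))

# Fit a small Gibbs-sampled model (hypothetical settings) and list top terms
k <- 2
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(seed = 1, burnin = 100, iter = 200))
ldaOut.terms <- as.matrix(terms(ldaOut, 3))
print(ldaOut.terms)
```

With the document-term matrix, each column of ldaOut.terms should hold words from the vocabulary rather than numbers, and the same write.csv() call from the question can then be applied unchanged.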