How to do LDA in R

March 15, 2011

My task is to apply LDA to a dataset of Amazon reviews and get 50 topics.

I have extracted the review text into a vector, and now I am trying to apply LDA to it.

I have created the DTM:

    matrix <- create_matrix(dat, language="english", removeStopwords=TRUE, stemWords=FALSE, stripWhitespace=TRUE, toLower=TRUE)

    <<DocumentTermMatrix (documents: 100000, terms: 174632)>>
    Non-/sparse entries: 4096244/17459103756
    Sparsity           : 100%
    Maximal term length: 218
    Weighting          : term frequency (tf)

But when I try to run LDA on it, I get the following error:

    lda <- LDA(matrix, 30)

    Error in LDA(matrix, 30) :
      Each row of the input matrix needs to contain at least one non-zero entry

I searched for solutions and tried slam:

    matrix1 <- rollup(matrix, 2, na.rm=TRUE, FUN = sum)

But I am still getting the same error.

I am new to this. Can anyone help me, or suggest a reference where I can study this? It would be very helpful.

There are no empty rows in my original data, and it contains only one column, which holds the reviews.
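A possible explanation, offered as a sketch rather than a confirmed diagnosis: a review can be non-empty as raw text and still end up as an all-zero row once stop words are removed, and that is what the error message complains about. As far as I can tell, `rollup(matrix, 2, FUN = sum)` only collapses the columns into row totals and does not drop any rows. The toy matrix below stands in for the real document-term matrix, and the names `dtm`, `row_totals`, `empty_docs` and `dtm_nonempty` are made up for illustration.

```r
library(slam)

# Toy stand-in for the document-term matrix: 4 documents x 3 terms.
# Document 3 has no remaining terms (e.g. it consisted only of stop words).
dtm <- simple_triplet_matrix(
  i = c(1L, 1L, 2L, 4L),
  j = c(1L, 2L, 3L, 1L),
  v = c(2, 1, 1, 3),
  nrow = 4, ncol = 3,
  dimnames = list(as.character(1:4), c("kitten", "banana", "cute"))
)

# Total term count per document, computed without densifying the matrix
row_totals <- row_sums(dtm)

# Indices of the documents that would trigger the LDA error
empty_docs <- which(row_totals == 0)

# Keep only documents with at least one non-zero entry
dtm_nonempty <- dtm[row_totals > 0, ]
```

The same subsetting should work on a `DocumentTermMatrix`, since it is stored as a `simple_triplet_matrix`. Keeping `empty_docs` around makes it possible to map the fitted topics back to the original reviews.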

I have been assigned a similar task, and I am learning as I go. Here is the code snippet I have developed so far; I hope it helps.

    library("topicmodels")
    library("tm")

    x <- c("I like to eat broccoli and bananas.",
           "I ate a banana and spinach smoothie for breakfast.",
           "Chinchillas and kittens are cute.",
           "My sister adopted a kitten yesterday.",
           "Look at this cute hamster munching on a piece of broccoli.")

    # Earlier manual preprocessing attempts, left disabled:
    # text  <- tolower(x)
    # text2 <- setdiff(text, stopwords("english"))
    # text3 <- strsplit(text2, " ")

    # Generate structured text, i.e. a corpus
    docs <- Corpus(VectorSource(x))

    # Create a content transformer, i.e. a function used to modify objects in R
    toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

    # Replace special characters with spaces and drop numbers
    docs <- tm_map(docs, toSpace, "/")
    docs <- tm_map(docs, toSpace, "@")
    docs <- tm_map(docs, toSpace, "\\|")
    docs <- tm_map(docs, removeNumbers)

    # Remove common English stop words
    docs <- tm_map(docs, removeWords, stopwords("english"))

    # Remove punctuation
    docs <- tm_map(docs, removePunctuation)

    # Collapse extra white space
    docs <- tm_map(docs, stripWhitespace)

    # LDA expects documents in rows, so build a document-term matrix
    dtm <- DocumentTermMatrix(docs, control = list(removePunctuation = TRUE, stopwords = TRUE))
    # print(dtm)

    # Term frequencies, sorted in decreasing order
    freq <- colSums(as.matrix(dtm))
    print(names(freq))
    ord <- order(freq, decreasing = TRUE)
    write.csv(freq[ord], "word_freq.csv")

    # Set parameters for LDA (Gibbs sampling)
    burnin <- 4000
    iter <- 2000
    thin <- 500
    seed <- list(2003, 5, 63, 100001, 765)   # one seed per start
    nstart <- 5
    best <- TRUE

    # Number of topics
    k <- 3

    # Fit the model and write the document-to-topic assignments
    ldaOut <- LDA(dtm, k, method = "Gibbs",
                  control = list(nstart = nstart, seed = seed, best = best,
                                 burnin = burnin, iter = iter, thin = thin))

    ldaOut.topics <- as.matrix(topics(ldaOut))
    write.csv(ldaOut.topics, file = paste("ldagibbs", k, "docstotopics.csv"))
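Once a model is fitted, the usual next step is to look at what the topics contain. The following is a minimal sketch of that step and not part of the answer's code: the count matrix, the small `burnin`/`iter` values and the names `counts`, `fit`, `top_terms`, `doc_topics` and `doc_probs` are assumptions chosen so the example runs quickly on its own.

```r
library(topicmodels)

# Hypothetical toy counts: 5 documents x 5 terms (integer-valued, no empty rows)
counts <- matrix(
  c(2, 1, 0, 0, 0,
    1, 2, 0, 0, 0,
    0, 0, 2, 1, 0,
    0, 0, 1, 0, 1,
    1, 0, 0, 1, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("doc", 1:5),
                  c("broccoli", "banana", "kitten", "cute", "hamster"))
)

# Short Gibbs run with a single start; real data needs far more iterations
fit <- LDA(counts, k = 2, method = "Gibbs",
           control = list(seed = 2003, burnin = 100, iter = 200))

# Three most probable terms for each topic (one column per topic)
top_terms <- terms(fit, 3)

# Most likely topic for each document
doc_topics <- topics(fit)

# Full document-by-topic probability matrix
doc_probs <- posterior(fit)$topics

print(top_terms)
print(doc_topics)
```

With only five tiny documents the topics themselves are not meaningful; the point is just the shape of the objects returned by `terms()`, `topics()` and `posterior()`.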
