logistic regression - model.predictProbabilities() for LogisticRegression in Spark?


I'm running multi-class logistic regression (withLBFGS) on Spark 1.6.

Given x and possible labels {1.0, 2.0, 3.0}, the final model only outputs the best prediction, say 2.0.

If I'm interested in knowing the second-best prediction, say 3.0, how can I retrieve that information?

In NaiveBayes you can use the model.predictProbabilities() function, which for each sample outputs a vector of probabilities over all possible outcomes.

There are two ways to do logistic regression in Spark: spark.ml and spark.mllib.

With DataFrames you can use spark.ml:

import org.apache.spark
import sqlContext.implicits._

def p(label: Double, a: Double, b: Double) =
  new spark.mllib.regression.LabeledPoint(
    label, new spark.mllib.linalg.DenseVector(Array(a, b)))

val data = sc.parallelize(Seq(p(1.0, 0.0, 0.5), p(0.0, 0.5, 1.0)))
val df = data.toDF

val model = new spark.ml.classification.LogisticRegression().fit(df)
model.transform(df).show

You get the raw predictions and the probabilities:

+-----+---------+--------------------+--------------------+----------+
|label| features|       rawPrediction|         probability|prediction|
+-----+---------+--------------------+--------------------+----------+
|  1.0|[0.0,0.5]|[-19.037302860930...|[5.39764620520461...|       1.0|
|  0.0|[0.5,1.0]|[18.9861466274786...|[0.99999999431904...|       0.0|
+-----+---------+--------------------+--------------------+----------+
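Once you have the probability column, the second-best label is just the class whose probability entry is second-largest. A minimal sketch of that ranking step in plain Scala (assuming the probability vector has already been converted to an Array[Double], and the label equals the class index; the helper name rankClasses is illustrative, not a Spark API):

```scala
// Return class indices sorted from most to least probable.
// Element 0 is the model's prediction; element 1 is the runner-up.
def rankClasses(probs: Array[Double]): Array[Int] =
  probs.zipWithIndex.sortBy { case (p, _) => -p }.map(_._2)

val probs = Array(0.1, 0.7, 0.2) // probabilities for classes 0, 1, 2
val ranked = rankClasses(probs)
// ranked(0) == 1 is the best class, ranked(1) == 2 the second best
```

In a real pipeline you would apply this inside a UDF over the probability column rather than on a collected array.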

With RDDs you can use spark.mllib:

val model = new spark.mllib.classification.LogisticRegressionWithLBFGS().run(data)

This model does not expose the raw predictions and probabilities. You can take a look at predictPoint: it multiplies the vectors and picks the class with the highest prediction. The weights are publicly accessible, so you can copy that algorithm and save all the predictions instead of returning only the highest one.
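As a sketch of what copying that algorithm might look like: for multinomial logistic regression, predictPoint computes one margin per non-reference class as a dot product against a slice of the weights, with the reference class 0 getting an implicit margin of 0, and returns the argmax. Keeping all the margins lets you rank every class. The flat weights layout below ((numClasses - 1) * numFeatures entries, no intercepts) is an assumption for illustration, not the exact mllib internals:

```scala
// Compute one margin per class from a flat weights array and
// return all classes ordered by margin, not just the argmax.
// Class 0 is the reference class with margin 0, matching the
// pivoting used by multinomial logistic regression in mllib.
def rankByMargin(weights: Array[Double],
                 features: Array[Double],
                 numClasses: Int): Array[Int] = {
  val numFeatures = features.length
  val margins: Seq[Double] = 0.0 +: (1 until numClasses).map { k =>
    val offset = (k - 1) * numFeatures
    (0 until numFeatures).map(j => weights(offset + j) * features(j)).sum
  }
  margins.zipWithIndex.sortBy { case (m, _) => -m }.map(_._2).toArray
}
```

The first element of the result is what predict would return; the second element is the second-best class you are after.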

