logistic regression - model.predictProbabilities() for LogisticRegression in Spark?


I'm running multi-class logistic regression (withLBFGS) on Spark 1.6.

Given x and possible labels {1.0, 2.0, 3.0}, the final model only outputs the best prediction, say 2.0.

If I'm interested in knowing the second-best prediction, say 3.0, how can I retrieve that information?

In NaiveBayes you can use the model.predictProbabilities() function, which for each sample outputs a vector of probabilities over all possible outcomes.

There are two ways to do logistic regression in Spark: spark.ml and spark.mllib.

With DataFrames you can use spark.ml:

import org.apache.spark
import sqlContext.implicits._

def p(label: Double, a: Double, b: Double) =
  new spark.mllib.regression.LabeledPoint(
    label, new spark.mllib.linalg.DenseVector(Array(a, b)))

val data = sc.parallelize(Seq(p(1.0, 0.0, 0.5), p(0.0, 0.5, 1.0)))
val df = data.toDF

val model = new spark.ml.classification.LogisticRegression().fit(df)
model.transform(df).show

You get the raw predictions and the probabilities:

+-----+---------+--------------------+--------------------+----------+
|label| features|       rawPrediction|         probability|prediction|
+-----+---------+--------------------+--------------------+----------+
|  1.0|[0.0,0.5]|[-19.037302860930...|[5.39764620520461...|       1.0|
|  0.0|[0.5,1.0]|[18.9861466274786...|[0.99999999431904...|       0.0|
+-----+---------+--------------------+--------------------+----------+
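Once you have the probability column, the second-best label is just the class whose probability entry is second-largest. A minimal sketch of that ranking step in plain Scala (assuming the probability vector has already been converted to an Array[Double], and the label equals the class index; the helper name rankClasses is illustrative, not a Spark API):

```scala
// Return class indices sorted from most to least probable.
// Element 0 is the model's prediction; element 1 is the runner-up.
def rankClasses(probs: Array[Double]): Array[Int] =
  probs.zipWithIndex.sortBy { case (p, _) => -p }.map(_._2)

val probs = Array(0.1, 0.7, 0.2) // probabilities for classes 0, 1, 2
val ranked = rankClasses(probs)
// ranked(0) == 1 is the best class, ranked(1) == 2 the second best
```

In a real pipeline you would apply this inside a UDF over the probability column rather than on a collected array.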

With RDDs you can use spark.mllib:

val model = new spark.mllib.classification.LogisticRegressionWithLBFGS().run(data)

This model does not expose the raw predictions and probabilities. You can take a look at predictPoint: it multiplies the vectors and picks the class with the highest prediction. The weights are publicly accessible, so you can copy that algorithm and save all the predictions instead of returning only the highest one.
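As a sketch of what copying that algorithm might look like: for multinomial logistic regression, predictPoint computes one margin per non-reference class as a dot product against a slice of the weights, with the reference class 0 getting an implicit margin of 0, and returns the argmax. Keeping all the margins lets you rank every class. The flat weights layout below ((numClasses - 1) * numFeatures entries, no intercepts) is an assumption for illustration, not the exact mllib internals:

```scala
// Compute one margin per class from a flat weights array and
// return all classes ordered by margin, not just the argmax.
// Class 0 is the reference class with margin 0, matching the
// pivoting used by multinomial logistic regression in mllib.
def rankByMargin(weights: Array[Double],
                 features: Array[Double],
                 numClasses: Int): Array[Int] = {
  val numFeatures = features.length
  val margins: Seq[Double] = 0.0 +: (1 until numClasses).map { k =>
    val offset = (k - 1) * numFeatures
    (0 until numFeatures).map(j => weights(offset + j) * features(j)).sum
  }
  margins.zipWithIndex.sortBy { case (m, _) => -m }.map(_._2).toArray
}
```

The first element of the result is what predict would return; the second element is the second-best class you are after.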

