model.predictProbabilities() for LogisticRegression in Spark?
I'm running multi-class logistic regression (with LBFGS) on Spark 1.6.
Given x and the possible labels {1.0, 2.0, 3.0}, the final model only outputs the best prediction, say 2.0.
If I'm interested in knowing the second-best prediction, say 3.0, how can I retrieve that information?
In NaiveBayes I would use the model.predictProbabilities() function, which for each sample outputs a vector of probabilities, one for each possible outcome.
There are two ways to do logistic regression in Spark: spark.ml and spark.mllib.
With DataFrames you can use spark.ml:
import org.apache.spark
import sqlContext.implicits._

// Helper that builds a LabeledPoint from a label and two feature values
def p(label: Double, a: Double, b: Double) =
  new spark.mllib.regression.LabeledPoint(
    label, new spark.mllib.linalg.DenseVector(Array(a, b)))

val data = sc.parallelize(Seq(p(1.0, 0.0, 0.5), p(0.0, 0.5, 1.0)))
val df = data.toDF

// Fit the model, then append the prediction columns to the input DataFrame
val model = new spark.ml.classification.LogisticRegression().fit(df)
model.transform(df).show
You get raw predictions and probabilities:
+-----+---------+--------------------+--------------------+----------+
|label| features|       rawPrediction|         probability|prediction|
+-----+---------+--------------------+--------------------+----------+
|  1.0|[0.0,0.5]|[-19.037302860930...|[5.39764620520461...|       1.0|
|  0.0|[0.5,1.0]|[18.9861466274786...|[0.99999999431904...|       0.0|
+-----+---------+--------------------+--------------------+----------+
With RDDs you can use spark.mllib:
val model = new spark.mllib.classification.LogisticRegressionWithLBFGS().run(data)
This model does not expose raw predictions and probabilities. You can take a look at predictPoint. It multiplies vectors and picks the class with the highest prediction. The weights are publicly accessible, so you can copy the algorithm and save the predictions instead of just returning the highest one.
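To illustrate the "copy the algorithm" idea, here is a minimal sketch in plain Scala (no Spark dependency). It rests on assumptions you should check against the predictPoint source for your Spark version: that the multinomial weights are a flattened, row-major (numClasses - 1) x numFeatures matrix, that class 0 is the pivot class with a margin of 0, and that no intercept was fit. The names classProbabilities and rankedClasses are made up for this example and are not part of Spark.

```scala
object MultinomialProbabilities {

  /** Softmax probabilities for every class, with class 0 as the pivot.
    *
    * Assumed layout (not confirmed here): `weights` holds
    * (numClasses - 1) rows of `features.length` values, row-major,
    * and there is no intercept term.
    */
  def classProbabilities(
      weights: Array[Double],
      features: Array[Double],
      numClasses: Int): Array[Double] = {
    val n = features.length
    require(weights.length == (numClasses - 1) * n,
      "weights must have (numClasses - 1) * numFeatures entries")

    // Margin of the pivot class is 0; the others are dot products.
    val margins = 0.0 +: Array.tabulate(numClasses - 1) { i =>
      var m = 0.0
      var j = 0
      while (j < n) { m += weights(i * n + j) * features(j); j += 1 }
      m
    }

    // Subtract the largest margin before exponentiating to avoid overflow.
    val maxMargin = margins.max
    val exps = margins.map(m => math.exp(m - maxMargin))
    val total = exps.sum
    exps.map(_ / total)
  }

  /** (class index, probability) pairs, most likely class first. */
  def rankedClasses(probs: Array[Double]): Array[(Int, Double)] =
    probs.zipWithIndex.map { case (p, k) => (k, p) }.sortBy(-_._2)
}
```

With a trained mllib model, you would call something like MultinomialProbabilities.classProbabilities(model.weights.toArray, x.toArray, model.numClasses) and then take the second element of rankedClasses to get the second-best class. If the model was trained with an intercept, each weight row has one extra entry and the sketch would need adjusting.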