Spark 1.6: filtering DataFrames generated by describe()
The problem arises when calling the describe()
function on a DataFrame:
val statsdf = mydataframe.describe()
Calling describe() yields the following output:
statsdf: org.apache.spark.sql.DataFrame = [summary: string, count: string]
I can show statsdf by calling statsdf.show():
+-------+------------------+
|summary|             count|
+-------+------------------+
|  count|             53173|
|   mean|104.76128862392568|
| stddev|3577.8184333911513|
|    min|                 1|
|    max|            558407|
+-------+------------------+
I need the standard deviation and the mean from statsdf,
but when I try to collect the values like this:
val temp = statsdf.where($"summary" === "stddev").collect()
I get a Task not serializable
exception.
I face the same exception when I call:
statsdf.where($"summary" === "stddev").show()
It looks like I cannot filter DataFrames generated by the describe()
function?
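One workaround to sketch here (hedged: this assumes Spark 1.6's SQLContext API and the five-row summary shape shown above; the toy DataFrame and column name "count" are stand-ins, not the asker's real data): describe() returns at most a handful of rows, so you can collect() the whole summary to the driver and filter it locally, which avoids launching a task over the describe() output entirely:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DescribeFilter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("describe-filter"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for mydataframe; the column name "count" mirrors the post.
    val mydataframe = sc.parallelize(Seq(1, 2, 3, 4, 558407)).toDF("count")
    val statsdf = mydataframe.describe()

    // describe() output is tiny (count/mean/stddev/min/max), so collecting
    // it is cheap; the filter then runs on the driver, not as a Spark task.
    val stddev = statsdf.collect()
      .find(_.getString(0) == "stddev")
      .map(_.getString(1).toDouble)

    println(stddev)
    sc.stop()
  }
}
```

Note that describe() returns every statistic as a string, so the value still has to be parsed back to a Double on the driver.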
I have tested this on a toy dataset containing health disease data:
import org.apache.spark.sql.Row

val stddev_tobacco = rawdata.describe().rdd
  .map { case r: Row => (r.getAs[String]("summary"), r.get(1)) }
  .filter(_._1 == "stddev")
  .map(_._2)
  .collect()
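A related sketch (hedged: it assumes the mean and stddev aggregate functions from org.apache.spark.sql.functions, which are available in Spark 1.6, and again uses a toy DataFrame with an assumed column name "count"): skip describe() entirely and compute the two statistics with agg(), which returns typed doubles instead of describe()'s all-string summary table:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{mean, stddev}

object AggStats {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("agg-stats"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for rawdata; the column name "count" is assumed from the post.
    val rawdata = sc.parallelize(Seq(1, 2, 3, 4, 558407)).toDF("count")

    // agg() collapses the whole DataFrame into one row of doubles, so there
    // is no intermediate string-typed summary table to filter afterwards.
    val row = rawdata.agg(mean("count"), stddev("count")).first()
    val meanVal = row.getDouble(0)
    val stddevVal = row.getDouble(1)

    println(s"mean=$meanVal stddev=$stddevVal")
    sc.stop()
  }
}
```

This sidesteps the filtering problem altogether, at the cost of naming each statistic you want explicitly.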