Spark 1.6: filtering DataFrames generated by describe()


The problem arises when calling the describe function on a DataFrame:

val statsdf = mydataframe.describe()

Calling the describe function yields the following output:

statsdf: org.apache.spark.sql.DataFrame = [summary: string, count: string]

I can display statsdf by calling statsdf.show():

+-------+------------------+
|summary|             count|
+-------+------------------+
|  count|             53173|
|   mean|104.76128862392568|
| stddev|3577.8184333911513|
|    min|                 1|
|    max|            558407|
+-------+------------------+

I need the standard deviation and the mean from statsdf, but when I try to collect the values like this:

val temp = statsdf.where($"summary" === "stddev").collect()

I get a "Task not serializable" exception.

I face the same exception when I call:

statsdf.where($"summary" === "stddev").show()

It looks like I cannot filter DataFrames generated by the describe() function?

As a workaround, I tried this on a toy dataset I had containing health disease data, going through the underlying RDD instead of filtering the DataFrame:

import org.apache.spark.sql.Row

val stddev_tobacco = rawdata.describe().rdd
  .map { case r: Row => (r.getAs[String]("summary"), r.get(1)) } // keep (label, first stat column)
  .filter(_._1 == "stddev")                                      // keep only the stddev row
  .map(_._2)
  .collect
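A variation of the same workaround (a sketch, not tested against Spark 1.6; it assumes a single numeric column, as in the output shown above, and uses the hypothetical name `mydataframe` from the question) collects the whole describe() output to the driver and indexes the statistics by their summary label, so mean and stddev can be read together:

```scala
import org.apache.spark.sql.Row

// Collect the small describe() result to the driver and build a lookup
// map from the "summary" label to the statistic's string value.
val stats: Map[String, String] = mydataframe.describe().rdd
  .map { case r: Row => (r.getAs[String]("summary"), r.get(1).toString) }
  .collect()
  .toMap

val mean   = stats("mean").toDouble
val stddev = stats("stddev").toDouble
```

Since describe() only ever produces five rows, collecting it entirely is cheap, and all filtering then happens in plain Scala on the driver, sidestepping the serialization issue.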
