Spark 1.6: filtering DataFrames generated by describe()
The problem arises when calling the describe()
function on a DataFrame:
val statsdf = mydataframe.describe()
Calling describe() yields the following output:
statsdf: org.apache.spark.sql.DataFrame = [summary: string, count: string]
I can show statsdf by calling statsdf.show():
+-------+------------------+
|summary|             count|
+-------+------------------+
|  count|             53173|
|   mean|104.76128862392568|
| stddev|3577.8184333911513|
|    min|                 1|
|    max|            558407|
+-------+------------------+
I need the standard deviation and the mean from statsdf,
but when I try to collect the values like this:
val temp = statsdf.where($"summary" === "stddev").collect()
I get a Task not serializable
exception.
I face the same exception when I call:
statsdf.where($"summary" === "stddev").show()
It looks like I cannot filter DataFrames generated by the describe()
function?
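One workaround to sketch here (hedged: this assumes Spark 1.6's SQLContext API and the five-row summary shape shown above; the toy DataFrame and column name "count" are stand-ins, not the asker's real data): describe() returns at most a handful of rows, so you can collect() the whole summary to the driver and filter it locally, which avoids launching a task over the describe() output entirely:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DescribeFilter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("describe-filter"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for mydataframe; the column name "count" mirrors the post.
    val mydataframe = sc.parallelize(Seq(1, 2, 3, 4, 558407)).toDF("count")
    val statsdf = mydataframe.describe()

    // describe() output is tiny (count/mean/stddev/min/max), so collecting
    // it is cheap; the filter then runs on the driver, not as a Spark task.
    val stddev = statsdf.collect()
      .find(_.getString(0) == "stddev")
      .map(_.getString(1).toDouble)

    println(stddev)
    sc.stop()
  }
}
```

Note that describe() returns every statistic as a string, so the value still has to be parsed back to a Double on the driver.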
I have tested this on a toy dataset containing health disease data:
import org.apache.spark.sql.Row

val stddev_tobacco = rawdata.describe().rdd
  .map { case r: Row => (r.getAs[String]("summary"), r.get(1)) }
  .filter(_._1 == "stddev")
  .map(_._2)
  .collect()
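A related sketch (hedged: it assumes the mean and stddev aggregate functions from org.apache.spark.sql.functions, which are available in Spark 1.6, and again uses a toy DataFrame with an assumed column name "count"): skip describe() entirely and compute the two statistics with agg(), which returns typed doubles instead of describe()'s all-string summary table:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{mean, stddev}

object AggStats {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("agg-stats"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for rawdata; the column name "count" is assumed from the post.
    val rawdata = sc.parallelize(Seq(1, 2, 3, 4, 558407)).toDF("count")

    // agg() collapses the whole DataFrame into one row of doubles, so there
    // is no intermediate string-typed summary table to filter afterwards.
    val row = rawdata.agg(mean("count"), stddev("count")).first()
    val meanVal = row.getDouble(0)
    val stddevVal = row.getDouble(1)

    println(s"mean=$meanVal stddev=$stddevVal")
    sc.stop()
  }
}
```

This sidesteps the filtering problem altogether, at the cost of naming each statistic you want explicitly.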