java - How to compute summary statistics on a Cassandra table with Spark DataFrame?
I'm trying to get the min, max, and mean of Cassandra data with Spark, and I need to do it in Java.

EDIT: showing the working version below; make sure to put quotes around "sometable" and "somekeyspace":

import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;

DataFrame df = sqlContext.read()
        .format("org.apache.spark.sql.cassandra")
        .option("table", "sometable")
        .option("keyspace", "somekeyspace")
        .load();

df.groupBy(col("keycolumn"))
        .agg(min("valuecolumn"), max("valuecolumn"), avg("valuecolumn"))
        .show();
Just import the data as a DataFrame and apply the required aggregations:
import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;

DataFrame df = sqlContext.read()
        .format("org.apache.spark.sql.cassandra")
        .option("table", sometable)
        .option("keyspace", somekeyspace)
        .load();

df.groupBy(col("keycolumn"))
        .agg(min("valuecolumn"), max("valuecolumn"), avg("valuecolumn"))
        .show();
where sometable and somekeyspace are String variables holding the table name and keyspace respectively.
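
For completeness, here is a minimal end-to-end sketch of the same approach, assuming the Spark 1.x API used above and the spark-cassandra-connector on the classpath; the app name, host address, and column names are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import static org.apache.spark.sql.functions.*;

public class CassandraStats {
    public static void main(String[] args) {
        // Point the connector at the Cassandra cluster (host is a placeholder).
        SparkConf conf = new SparkConf()
                .setAppName("CassandraStats")
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        String sometable = "sometable";       // table name
        String somekeyspace = "somekeyspace"; // keyspace name

        // Load the Cassandra table as a DataFrame.
        DataFrame df = sqlContext.read()
                .format("org.apache.spark.sql.cassandra")
                .option("table", sometable)
                .option("keyspace", somekeyspace)
                .load();

        // Compute min, max, and mean per key.
        df.groupBy(col("keycolumn"))
                .agg(min("valuecolumn"), max("valuecolumn"), avg("valuecolumn"))
                .show();

        sc.stop();
    }
}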
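As a side note, if ungrouped summary statistics are enough, DataFrame also provides describe() (available since Spark 1.3.1), which returns count, mean, stddev, min, and max for the given columns in a single call:

// Summary statistics for one column, without grouping.
df.describe("valuecolumn").show();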