java - How to compute summary statistics on a Cassandra table with Spark DataFrame?


I'm trying to compute the min, max, and mean of Cassandra data with Spark. I need to do this in Java.

import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;

// Load the Cassandra table into a DataFrame
DataFrame df = sqlContext.read()
        .format("org.apache.spark.sql.cassandra")
        .option("table", "sometable")
        .option("keyspace", "somekeyspace")
        .load();

// Aggregate per key
df.groupBy(col("keycolumn"))
        .agg(min("valuecolumn"), max("valuecolumn"), avg("valuecolumn"))
        .show();

Edited to show the working version: make sure to put quotes around "sometable" and "somekeyspace".

Just import the data into a DataFrame and apply the required aggregations:

import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;

// Load the Cassandra table into a DataFrame
DataFrame df = sqlContext.read()
        .format("org.apache.spark.sql.cassandra")
        .option("table", sometable)
        .option("keyspace", somekeyspace)
        .load();

// Compute min, max, and mean of the value column, grouped by key
df.groupBy(col("keycolumn"))
        .agg(min("valuecolumn"), max("valuecolumn"), avg("valuecolumn"))
        .show();

where sometable and somekeyspace are String variables holding the table name and the keyspace, respectively.
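For completeness, here is a minimal end-to-end sketch of the surrounding setup, assuming Spark 1.x (where DataFrame and SQLContext exist) and a Cassandra node reachable on localhost via the Spark Cassandra Connector. The class name, host address, and the keycolumn/valuecolumn names are placeholders, not anything from the question:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import static org.apache.spark.sql.functions.*;

public class CassandraStats {
    public static void main(String[] args) {
        // Point the connector at a Cassandra node; 127.0.0.1 is an assumption
        SparkConf conf = new SparkConf()
                .setAppName("CassandraStats")
                .set("spark.cassandra.connection.host", "127.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Table name and keyspace held in variables, as in the answer above
        String sometable = "sometable";
        String somekeyspace = "somekeyspace";

        DataFrame df = sqlContext.read()
                .format("org.apache.spark.sql.cassandra")
                .option("table", sometable)
                .option("keyspace", somekeyspace)
                .load();

        // Per-key summary statistics
        df.groupBy(col("keycolumn"))
                .agg(min("valuecolumn"), max("valuecolumn"), avg("valuecolumn"))
                .show();

        // Alternatively, describe() returns count, mean, stddev, min, and max
        // for the whole column in one call (no grouping)
        df.describe("valuecolumn").show();
    }
}

If all you need is a quick ungrouped summary, the describe() call at the end covers the same statistics in a single line.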

