scala - What is the nature of the Key, Value, and InputFormat types passed to Spark's StreamingContext.fileStream[K, V, F]("directory")
From what I understand, streaming text files from a directory requires a key of type LongWritable, a value of type Text, and a format of TextInputFormat. These are passed automatically in the textFileStream() method.
Is the key in this case the line number, with the value being the text on that line?
What should the key and value types be for ParquetInputFormat? And, more generally, how can I figure this out myself for other file types?
Also, how do these types relate to the DStream returned by the method? If I pass in a Parquet file that has rows of, say, 100 columns, how is it parsed into RDDs and DStreams by Spark?
For ParquetInputFormat, I think the key type must be Void, and the value type is an object representing your data.
ssc.fileStream[Void, YourObject, ParquetInputFormat[YourObject]]("hdfs:...")
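To show how K, V, and F relate to the returned DStream, here is a minimal sketch of the plain-text case with the types written out explicitly. The app name, master, batch interval, and input directory are hypothetical, and it assumes Spark Streaming and the Hadoop client libraries are on the classpath. As far as I know, fileStream gives back a stream of (K, V) pairs, and for TextInputFormat the key is the byte offset of the line within the file rather than a line number.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object FileStreamTypes {
  def main(args: Array[String]): Unit = {
    // Hypothetical settings: app name, local master, and 10-second batches.
    val conf = new SparkConf().setAppName("FileStreamTypes").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // K and V are the key and value types that the InputFormat F produces,
    // so the resulting stream carries (K, V) pairs.
    // "/tmp/streaming-input" is a placeholder directory.
    val pairs: DStream[(LongWritable, Text)] =
      ssc.fileStream[LongWritable, Text, TextInputFormat]("/tmp/streaming-input")

    // Hadoop record readers reuse Writable instances, so convert them to
    // plain Scala values before doing anything else with them.
    val lines: DStream[(Long, String)] =
      pairs.map { case (offset, line) => (offset.get, line.toString) }

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The same reading should apply to other formats: whatever key and value types the InputFormat class declares are the K and V you pass in, which is why ParquetInputFormat[T], declared with a Void key and a value of type T, leads to the Void key above.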