Writing a Spark RDD as a gzipped file to Amazon S3
I have an output RDD in my Spark code, written in Python. I want to save it to Amazon S3 as a gzipped file. I have tried the following functions. The function below correctly saves the output RDD to S3, but not in gzipped format:
output_rdd.saveAsTextFile("s3://<name-of-bucket>/")
The function below returns an error: TypeError: saveAsHadoopFile() takes at least 3 arguments (3 given)
output_rdd.saveAsHadoopFile("s3://<name-of-bucket>/", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
Please guide me to the correct way to do this.
You need to specify the output format as well. Try this:
output_rdd.saveAsHadoopFile("s3://<name-of-bucket>/", "org.apache.hadoop.mapred.TextOutputFormat", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
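For reference, a minimal self-contained sketch of this approach is below. The bucket name is the same placeholder as in the question, the sample data is hypothetical, and it assumes output_rdd holds (key, value) pairs (which is what saveAsHadoopFile expects) and that the cluster's s3:// filesystem is already configured (e.g., on EMR):

from pyspark import SparkContext

sc = SparkContext(appName="gzip-output-example")

# Hypothetical data; saveAsHadoopFile writes an RDD of (key, value) pairs.
output_rdd = sc.parallelize([("key1", "value1"), ("key2", "value2")])

# Write text output to S3, compressed with the gzip codec.
output_rdd.saveAsHadoopFile(
    "s3://<name-of-bucket>/",
    "org.apache.hadoop.mapred.TextOutputFormat",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

sc.stop()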
You can use any of the Hadoop-supported compression codecs (see the example after this list):
- gzip: org.apache.hadoop.io.compress.GzipCodec
- bzip2: org.apache.hadoop.io.compress.BZip2Codec
- LZO: com.hadoop.compression.lzo.LzopCodec
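Switching codecs only means changing the codec string in the same call (a sketch, using the same placeholder bucket as above; note that the LZO codec typically also requires the hadoop-lzo library to be installed on the cluster):

# Same call as before, but producing bzip2-compressed output instead of gzip.
output_rdd.saveAsHadoopFile(
    "s3://<name-of-bucket>/",
    "org.apache.hadoop.mapred.TextOutputFormat",
    compressionCodecClass="org.apache.hadoop.io.compress.BZip2Codec")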