pyspark - trouble adding the spark-csv package in the Cloudera VM
I am using the Cloudera QuickStart VM to test out some PySpark work. For one task, I need to add the spark-csv package. Here is what I did:
PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0
PySpark started fine, but gave some warnings:
**16/02/09 17:41:22 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/02/09 17:41:22 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/09 17:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable**
Then I ran the following code in PySpark:
yelp_df = sqlCtx.load(source="com.databricks.spark.csv", header='true', inferSchema='true', path='file:///directory/file.csv')
But I am getting this error message:
Py4JJavaError: An error occurred while calling o19.load. : java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv at scala.sys.package$.error(package.scala:27)
What have I done wrong? Thanks in advance for any help.
Try this:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0
Without the space: there is a typo in your command. Because of the space, `packages` was treated as the application script instead of being passed as the value of `--packages`, so the spark-csv jar was never put on the classpath and the data source class could not be found.
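Once the shell is launched with the corrected `--packages` flag, loading the CSV should work. Here is a minimal sketch, assuming Spark 1.4+ on the QuickStart VM (where the `sqlCtx.read` DataFrameReader API is available) and using the placeholder path from the question:

# Sketch only: assumes pyspark was started with
# --packages com.databricks:spark-csv_2.10:1.3.0 (no space after --packages),
# so the com.databricks.spark.csv data source is on the classpath.
yelp_df = sqlCtx.read.format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("file:///directory/file.csv")

# Quick check that the file was parsed and the schema was inferred.
yelp_df.printSchema()
yelp_df.show(5)

The original `sqlCtx.load(source=..., ...)` call from the question should also work once the package is actually loaded; the `read.format(...)` form is just the newer equivalent.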