pyspark - trouble adding the spark-csv package in Cloudera VM


I am using the Cloudera QuickStart VM to test out some pyspark work. For one task, I need to add the spark-csv package. Here is what I did:

PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0

pyspark started fine, but I did get warnings like:

16/02/09 17:41:22 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/02/09 17:41:22 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/09 17:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Then I ran this code in pyspark:

yelp_df = sqlCtx.load(
    source="com.databricks.spark.csv",
    header='true',
    inferSchema='true',
    path='file:///directory/file.csv')

But I am getting this error message:

Py4JJavaError: An error occurred while calling o19.load.
: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
    at scala.sys.package$.error(package.scala:27)

What could have gone wrong? Thanks in advance for any help.

Try this:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0

without the space; there is a typo in your command. With "-- packages" instead of "--packages", the package argument is never parsed, so the spark-csv jar is never downloaded and the data source class cannot be found at load time.
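For reference, here is the corrected end-to-end session as a minimal sketch. It assumes the VM has internet access so the package can be fetched from Maven, and it reuses the question's placeholder path file:///directory/file.csv:

# launch the shell with the package attached (no space after --packages)
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0

# then, inside the pyspark shell, the load call from the question should work
yelp_df = sqlCtx.load(
    source="com.databricks.spark.csv",
    header='true',
    inferSchema='true',
    path='file:///directory/file.csv')
yelp_df.printSchema()   # quick sanity check that the CSV schema was inferred

If your Spark version is 1.4 or later, the equivalent DataFrameReader form is sqlCtx.read.format('com.databricks.spark.csv').options(header='true', inferSchema='true').load('file:///directory/file.csv').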

