Supported Platforms¶
Sparkling Water can run on top of Spark in several ways; however, starting Sparkling Water requires different configuration in each environment:
Local¶
In this case Sparkling Water runs as a local cluster (the Spark master variable points to one of the values local, local[*], or one of the additional local modes listed at Spark Master URLs).
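As a rough illustration, the check below sketches how a launcher script might decide whether a given master value selects a local mode. The function name is_local_master and the use of a MASTER environment variable are assumptions made for this example, not part of Sparkling Water; the accepted forms follow the local entries on the Spark Master URLs page.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sparkling Water): succeeds when the
# given Spark master value selects a local mode such as local, local[4],
# or local[*].
is_local_master() {
  case "$1" in
    local | "local["*"]") return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: the master value is supplied through a MASTER variable,
# defaulting to local[*] when it is unset.
MASTER="${MASTER:-local[*]}"

if is_local_master "$MASTER"; then
  echo "local mode: $MASTER"
else
  echo "cluster mode: $MASTER"
fi
```

The pattern is deliberately loose: it accepts anything between the brackets and leaves validation of the thread count to Spark itself.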
Standalone Spark Cluster¶
Spark documentation: Spark Standalone Mode
YARN¶
Spark documentation: Running Spark on YARN
When submitting a Sparkling Water application to a CDH or Apache Hadoop cluster, the command to submit may look like:
./spark-submit --master=yarn --deploy-mode=client --class water.SparklingWaterDriver \
--driver-memory=8G --num-executors=3 --executor-memory=3G \
sparkling-water-assembly-2.4.12-SNAPSHOT-98-all.jar
When submitting a Sparkling Water application to an HDP Cluster, the command to submit may look like:
./spark-submit --master=yarn --deploy-mode=client --class water.SparklingWaterDriver --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=current" \
--driver-memory=8G --num-executors=3 --executor-memory=3G --conf "spark.executor.extraClassPath=-Dhdp.version=current" \
sparkling-water-assembly-2.4.12-SNAPSHOT-98-all.jar
The only difference between the HDP cluster and the CDH and Apache Hadoop clusters is that, in the HDP case, -Dhdp.version=current must be added to both the spark.executor.extraClassPath and the spark.yarn.am.extraJavaOptions configuration properties (in cluster deploy mode, use spark.driver.extraJavaOptions in place of spark.yarn.am.extraJavaOptions).
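The rule above can be sketched as a small shell helper that prints the extra --conf flags for a given cluster type. This is only an illustration: the function name extra_conf_flags and the labels hdp, cdh, and apache are hypothetical, and the sketch treats spark.driver.extraJavaOptions as the cluster-deploy-mode counterpart of spark.yarn.am.extraJavaOptions, which is how the Spark YARN documentation describes the two properties.

```shell
#!/bin/sh
# Hypothetical helper: prints the extra spark-submit --conf flags for a
# cluster type, following the HDP rule described in the text.
#   $1: cluster type; the labels hdp, cdh, and apache are assumed names
#   $2: deploy mode, client (default) or cluster
extra_conf_flags() {
  cluster_type="$1"
  deploy_mode="${2:-client}"

  # CDH and Apache Hadoop clusters need no additional flags.
  [ "$cluster_type" = "hdp" ] || return 0

  # In cluster deploy mode the driver property takes the place of the
  # YARN application master property.
  if [ "$deploy_mode" = "cluster" ]; then
    jvm_property="spark.driver.extraJavaOptions"
  else
    jvm_property="spark.yarn.am.extraJavaOptions"
  fi

  printf '%s\n' \
    "--conf ${jvm_property}=-Dhdp.version=current" \
    "--conf spark.executor.extraClassPath=-Dhdp.version=current"
}

# Dry run: show the flags for an HDP cluster in client mode.
extra_conf_flags hdp client
```

The helper only prints flags rather than invoking spark-submit, so the output can be inspected before it is added to a real submit command.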
Mesos¶
Spark documentation: Running Spark on Mesos