Supported Platforms

Sparkling Water can run on top of Spark in various ways; however, starting Sparkling Water requires a different configuration in each environment:

Local

In this case, Sparkling Water runs as a local cluster: the Spark master is set to local, local[*], or one of the other local modes listed under Spark Master URLs in the Spark documentation.
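As a minimal sketch of a local launch, the snippet below assembles a spark-submit command with a local master and prints it instead of running it. It assumes the same driver class and assembly jar name that appear in the YARN commands on this page; the variable names are illustrative only, and any memory settings you need are left out.

```shell
# Illustrative variables (not part of Sparkling Water); adjust the jar name to your build.
SW_JAR="sparkling-water-assembly-2.4.11-SNAPSHOT-91-all.jar"

# local[*] uses as many worker threads as there are cores on the machine.
LOCAL_MASTER="local[*]"

# The master value is single-quoted in the printed command so that the shell
# never treats [*] as a glob pattern when the command is pasted and run.
LOCAL_CMD="./spark-submit --master='${LOCAL_MASTER}' --class water.SparklingWaterDriver ${SW_JAR}"

# Dry run: only print the command.
echo "$LOCAL_CMD"
```

Replacing local[*] with local runs Spark with a single worker thread.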

Standalone Spark Cluster

Spark documentation: Spark Standalone Mode
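For a standalone cluster, the master URL takes the form spark://HOST:PORT. The sketch below prints a possible submit command under that assumption. The host name master-host is a placeholder, 7077 is the default port of a standalone Spark master, and the driver class and jar name are taken from the YARN commands on this page. Whether your cluster needs further options is not covered here.

```shell
# Placeholder host; override by exporting MASTER_HOST before running.
MASTER_HOST="${MASTER_HOST:-master-host}"

# 7077 is the default port a standalone Spark master listens on.
MASTER_URL="spark://${MASTER_HOST}:7077"

SW_ASSEMBLY="sparkling-water-assembly-2.4.11-SNAPSHOT-91-all.jar"
STANDALONE_CMD="./spark-submit --master=${MASTER_URL} --class water.SparklingWaterDriver ${SW_ASSEMBLY}"

# Dry run: only print the command.
echo "$STANDALONE_CMD"
```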

YARN

Spark documentation: Running Spark on YARN

When submitting a Sparkling Water application to a CDH or Apache Hadoop cluster, the submit command may look like:

./spark-submit --master=yarn --deploy-mode=client --class water.SparklingWaterDriver \
  --driver-memory=8G --num-executors=3 --executor-memory=3G --conf "spark.executor.extraClassPath=-Dhdp.version=current" \
  sparkling-water-assembly-2.4.11-SNAPSHOT-91-all.jar

When submitting a Sparkling Water application to an HDP cluster, the submit command may look like:

./spark-submit --master=yarn --deploy-mode=client --class water.SparklingWaterDriver --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=current" \
  --driver-memory=8G --num-executors=3 --executor-memory=3G --conf "spark.executor.extraClassPath=-Dhdp.version=current" \
  sparkling-water-assembly-2.4.11-SNAPSHOT-91-all.jar

The only difference between the HDP cluster and the CDH and Apache Hadoop clusters is that, in the HDP case, -Dhdp.version=current must be added to both the spark.executor.extraClassPath and the spark.yarn.am.extraJavaOptions configuration properties. In cluster deploy mode, spark.driver.extraJavaOptions takes the place of spark.yarn.am.extraJavaOptions.
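To make the comparison concrete, here is a small sketch that reproduces the two YARN commands shown above from a single template. The function build_submit_cmd is a hypothetical helper, not part of Sparkling Water or Spark; it only prints the command and assumes client deploy mode, as in the examples.

```shell
# Hypothetical helper: prints the YARN submit command for a given distribution.
# Pass "hdp" for an HDP cluster; any other value yields the CDH / Apache Hadoop form.
build_submit_cmd() {
  distro="$1"
  am_opts=""
  if [ "$distro" = "hdp" ]; then
    # Extra application-master option that appears only in the HDP command above.
    am_opts=' --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=current"'
  fi
  printf '%s\n' "./spark-submit --master=yarn --deploy-mode=client --class water.SparklingWaterDriver${am_opts} --driver-memory=8G --num-executors=3 --executor-memory=3G --conf \"spark.executor.extraClassPath=-Dhdp.version=current\" sparkling-water-assembly-2.4.11-SNAPSHOT-91-all.jar"
}

HDP_CMD=$(build_submit_cmd hdp)
CDH_CMD=$(build_submit_cmd cdh)

# Dry run: print both variants for side-by-side comparison.
echo "$HDP_CMD"
echo "$CDH_CMD"
```

Printing rather than executing keeps the sketch safe to run on a machine without Spark installed; to actually submit, paste the printed line into a shell in the Spark bin directory.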

Mesos

Spark documentation: Running Spark on Mesos