Supported Platforms
Sparkling Water can run on top of Spark in several ways, but the configuration needed to start it differs by environment:
Local
In this case, Sparkling Water runs as a local cluster: the Spark master
variable is set to local, local[*], or one of the other local modes listed under
Spark Master URLs in the Spark documentation.
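As a minimal sketch, a local launch could be assembled as shown below. It assumes the same driver class and assembly jar name used in the YARN examples on this page, and that the command is run from Spark's bin directory. SW_MASTER is a hypothetical variable introduced only for this example. The script prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Dry-run sketch: print the spark-submit command for a local Sparkling Water run.
# Assumptions: the jar name and driver class are copied from the YARN examples
# on this page; adjust both to match your installation.
SW_MASTER="${SW_MASTER:-local[*]}"   # hypothetical variable: local, local[*], local[4], ...
SW_JAR="sparkling-water-assembly-3.46.0.6-1-3.5-all.jar"

local_submit_cmd() {
  # The master URL is single-quoted in the output so that a shell
  # does not glob-expand the brackets in values such as local[*].
  printf '%s\n' "./spark-submit --master='$SW_MASTER' --class ai.h2o.sparkling.SparklingWaterDriver $SW_JAR"
}

local_submit_cmd
```

Quoting the master value matters because an unquoted local[*] is a glob pattern and may be expanded by the shell if a matching file name exists in the current directory.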
Standalone Spark Cluster
Spark documentation: Spark Standalone Mode
YARN
Spark documentation: Running Spark on YARN
When submitting a Sparkling Water application to a CDH or Apache Hadoop cluster, the submit command may look like:
./spark-submit --master=yarn --deploy-mode=client --class ai.h2o.sparkling.SparklingWaterDriver \
--driver-memory=8G --num-executors=3 --executor-memory=3G \
sparkling-water-assembly-3.46.0.6-1-3.5-all.jar
When submitting a Sparkling Water application to an HDP cluster, the submit command may look like:
./spark-submit --master=yarn --deploy-mode=client --class ai.h2o.sparkling.SparklingWaterDriver --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=current" \
--driver-memory=8G --num-executors=3 --executor-memory=3G --conf "spark.executor.extraClassPath=-Dhdp.version=current" \
sparkling-water-assembly-3.46.0.6-1-3.5-all.jar
The only difference between the HDP case and the CDH and Apache Hadoop cases is that on HDP, -Dhdp.version=current must be added to both the spark.executor.extraClassPath and spark.yarn.am.extraJavaOptions configuration properties. The spark.yarn.am.extraJavaOptions property applies in client deploy mode; in cluster deploy mode, use spark.driver.extraJavaOptions instead.
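The rule above can be sketched as a small helper that emits the extra options for a given distribution. This is only an illustration for client deploy mode: the function name extra_conf, the variable HADOOP_DISTRO, and the distribution labels hdp, cdh, and apache are hypothetical names introduced for this example and are not part of Sparkling Water or Spark.

```shell
#!/bin/sh
# Sketch: choose the extra --conf options by Hadoop distribution,
# following the HDP rule described above (client deploy mode).
# extra_conf and HADOOP_DISTRO are hypothetical names for this example.
extra_conf() {
  case "$1" in
    hdp)
      # HDP: pass -Dhdp.version=current through both properties.
      printf '%s\n' \
        '--conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current' \
        '--conf spark.executor.extraClassPath=-Dhdp.version=current'
      ;;
    cdh|apache)
      # CDH and Apache Hadoop: no extra options, so print nothing.
      ;;
    *)
      echo "unknown distribution: $1" >&2
      return 1
      ;;
  esac
}

# Print the options for the selected distribution (defaults to hdp here).
extra_conf "${HADOOP_DISTRO:-hdp}"
```

Because the emitted values contain no spaces, the output can be spliced unquoted into a submit command, for example ./spark-submit --master=yarn --deploy-mode=client $(extra_conf hdp) followed by the remaining options from the commands above.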
Mesos
Spark documentation: Running Spark on Mesos