Get started with Sparkling Water in a few easy steps
1. Download Spark from the Spark Downloads Page if it is not already installed.
2. Download Sparkling Water and point it to the existing installation of Spark by setting the SPARK_HOME environment variable:
export SPARK_HOME='/path/to/spark/installation'
3. From your terminal, run:
cd ~/Downloads
unzip sparkling-water-1.2.6.zip
cd sparkling-water-1.2.6
bin/sparkling-shell
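Before running sparkling-shell, it can help to confirm that SPARK_HOME actually points at a Spark installation. The following is a minimal sketch and is not part of Sparkling Water: check_spark_home is a hypothetical helper, and it assumes that a Spark distribution contains an executable bin/spark-submit.

```shell
# Hypothetical helper (not part of Sparkling Water): verify that SPARK_HOME
# is set and looks like a Spark installation before launching sparkling-shell.
# Assumption: a Spark distribution ships an executable bin/spark-submit.
check_spark_home() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "No executable bin/spark-submit under $SPARK_HOME" >&2
    return 1
  fi
  return 0
}
```

From inside the sparkling-water-1.2.6 directory, running check_spark_home && bin/sparkling-shell would start the shell only when the check passes.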
4. Create an H2O cloud inside the Spark cluster:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
// Or if you know the number of Spark workers:
// val h2oContext = new H2OContext(sc).start( <number of Spark workers> )
import h2oContext._
5. Follow this demo, which imports airline and weather data and predicts flight delays.
Launch Sparkling Water on Hadoop Using YARN
1. Download Spark from the Spark Downloads Page if it is not already installed.
2. Download Sparkling Water and point it to the existing installation of Spark by setting the SPARK_HOME environment variable:
wget /sparkling-water-1.2.6.zip
export SPARK_HOME='/path/to/spark/installation'
3. Set the HADOOP_CONF_DIR and Spark MASTER environment variables:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn-client"
4. Unzip Sparkling Water and launch Sparkling Shell on YARN, passing spark-submit options such as the number of executors and the executor memory:
unzip sparkling-water-1.2.6.zip
cd sparkling-water-1.2.6/
bin/sparkling-shell --num-executors 3 --executor-memory 2g --master yarn-client
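The YARN launch depends on the environment variables set in step 3. A hypothetical pre-flight check (not part of Sparkling Water) is sketched below; it assumes that the Hadoop client configuration directory contains core-site.xml and yarn-site.xml, which is typical for Hadoop installations but is an assumption here.

```shell
# Hypothetical pre-flight check (not part of Sparkling Water) for the YARN
# launch above. Assumption: the Hadoop client configuration directory
# contains core-site.xml and yarn-site.xml, as typical Hadoop installs do.
check_yarn_env() {
  local f
  if [ -z "${HADOOP_CONF_DIR:-}" ] || [ ! -d "$HADOOP_CONF_DIR" ]; then
    echo "HADOOP_CONF_DIR is unset or not a directory" >&2
    return 1
  fi
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$HADOOP_CONF_DIR/$f" ]; then
      echo "Missing $HADOOP_CONF_DIR/$f" >&2
      return 1
    fi
  done
  # The steps above use yarn-client mode.
  if [ "${MASTER:-}" != "yarn-client" ]; then
    echo "MASTER is '${MASTER:-}', expected yarn-client" >&2
    return 1
  fi
  return 0
}
```

Running check_yarn_env before bin/sparkling-shell surfaces a missing or mistyped configuration directory locally, before any request reaches the cluster.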
5. Create an H2O cloud inside the Spark cluster:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
Launch H2O on a Standalone Spark Cluster
1. Download Spark from the Spark Downloads Page if it is not already installed.
2. Download Sparkling Water and point it to the existing installation of Spark by setting the SPARK_HOME environment variable:
export SPARK_HOME='/path/to/spark/installation'
3. From your terminal, run:
cd ~/Downloads
unzip sparkling-water-1.2.6.zip
cd sparkling-water-1.2.6
bin/launch-spark-cloud.sh
export MASTER="spark://localhost:7077"
bin/sparkling-shell
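For the standalone cluster, sparkling-shell connects to the master URL stored in MASTER. The sketch below is hypothetical and not part of Sparkling Water: parse_spark_master splits a spark://host:port URL into host and port, and master_is_up probes that port using bash's /dev/tcp redirection. It assumes bash (not plain sh) and that the master listens on the port given in the URL.

```shell
# Hypothetical helpers (not part of Sparkling Water); require bash.
# Split a standalone master URL such as spark://localhost:7077 into
# "host port". Falls back to $MASTER when no argument is given.
parse_spark_master() {
  local url=${1:-${MASTER:-}}
  local re='^spark://([^:/]+):([0-9]+)$'
  if [[ $url =~ $re ]]; then
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo "Not a spark://host:port URL: $url" >&2
    return 1
  fi
}

# Succeed if something accepts TCP connections on the master's port.
# Opening /dev/tcp/HOST/PORT is a bash feature; it fails when nothing
# listens. For an unreachable remote host it may block until TCP times out.
master_is_up() {
  local host port
  read -r host port < <(parse_spark_master "${1:-}") || return 1
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

For example, after bin/launch-spark-cloud.sh, running master_is_up && bin/sparkling-shell would start the shell only if something is listening on localhost:7077.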
4. Create an H2O cloud inside the Spark cluster:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
// Or if you know the number of Spark workers:
// val h2oContext = new H2OContext(sc).start( <number of Spark workers> )
import h2oContext._