Get started with Sparkling Water in a few easy steps
1. Download Spark from the Spark Downloads Page if it is not already installed:
Choose a Spark release: 1.2.0
Choose a package type: Pre-built for Hadoop 2.4 and later
2. Download Sparkling Water and point it to the existing installation of Spark by setting the SPARK_HOME environment variable:
export SPARK_HOME='/path/to/spark/installation'
3. From your terminal, change to the Sparkling Water directory and start Sparkling Shell:
cd ~/Downloads
cd sparkling-water-1.2.6
bin/sparkling-shell
4. In Sparkling Shell, create an H2O cloud inside the Spark cluster:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
// Or if you know the number of Spark workers:
// val h2oContext = new H2OContext(sc).start( <number of Spark workers> )
import h2oContext._
5. Follow this demo, which imports airlines and weather data and runs predictions on delays.
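Before running the steps above, it can help to confirm that SPARK_HOME really points at a Spark installation. The following is a minimal sketch, not part of Sparkling Water: the function name check_spark_home is hypothetical, and it only assumes that a Spark distribution ships an executable bin/spark-shell.

```shell
# Hypothetical helper (not part of Spark or Sparkling Water):
# verify that a directory looks like a Spark installation.
# Returns 0 when bin/spark-shell exists and is executable, 1 otherwise.
check_spark_home() {
    spark_home="$1"
    if [ -z "$spark_home" ]; then
        echo "SPARK_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$spark_home/bin/spark-shell" ]; then
        echo "No executable bin/spark-shell under $spark_home" >&2
        return 1
    fi
    echo "Using Spark at $spark_home"
}
```

Calling check_spark_home "$SPARK_HOME" right after the export in step 2 reports a wrong path before Sparkling Shell is started.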
Launch Sparkling Water on Hadoop using YARN
1. Download Spark from the Spark Downloads Page if it is not already installed:
Choose a Spark release: 1.2.0
Choose a package type: Pre-built for Hadoop 2.4 and later
2. Download Sparkling Water and point it to the existing installation of Spark by setting the SPARK_HOME environment variable:
export SPARK_HOME='/path/to/spark/installation'
3. Set the HADOOP_CONF_DIR and Spark MASTER environment variables:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn-client"
4. Launch Sparkling Shell on YARN:
cd sparkling-water-1.2.6/
bin/sparkling-shell --num-executors 3 --executor-memory 2g --master yarn-client
5. In Sparkling Shell, create an H2O cloud inside the Spark cluster:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
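The two variables set in step 3 of the YARN instructions can be sanity-checked before the shell is launched. This is a rough sketch under stated assumptions: check_yarn_env is a hypothetical name, and the check assumes that the Hadoop configuration directory contains a yarn-site.xml file, which is the usual layout but may differ on your distribution.

```shell
# Hypothetical helper (not part of Hadoop, Spark, or Sparkling Water).
# $1 = Hadoop configuration directory, $2 = Spark master value.
# Assumes yarn-site.xml lives directly in the configuration directory.
check_yarn_env() {
    conf_dir="$1"
    master="$2"
    if [ ! -f "$conf_dir/yarn-site.xml" ]; then
        echo "yarn-site.xml not found in '$conf_dir'" >&2
        return 1
    fi
    # The guide launches the interactive shell in yarn-client mode.
    if [ "$master" != "yarn-client" ]; then
        echo "MASTER is '$master', expected 'yarn-client'" >&2
        return 1
    fi
    echo "YARN settings look consistent"
}
```

For example, check_yarn_env "$HADOOP_CONF_DIR" "$MASTER" can be run between steps 3 and 4.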
Launch H2O on a Standalone Spark Cluster
1. Download Spark from the Spark Downloads Page if it is not already installed:
Choose a Spark release: 1.2.0
Choose a package type: Pre-built for Hadoop 2.4 and later
2. Download Sparkling Water and point it to the existing installation of Spark by setting the SPARK_HOME environment variable:
export SPARK_HOME='/path/to/spark/installation'
3. From your terminal, point MASTER at the standalone cluster and start Sparkling Shell:
cd ~/Downloads
cd sparkling-water-1.2.6
export MASTER="spark://localhost:7077"
bin/sparkling-shell
4. In Sparkling Shell, create an H2O cloud inside the Spark cluster:
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
// Or if you know the number of Spark workers:
// val h2oContext = new H2OContext(sc).start( <number of Spark workers> )
import h2oContext._
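The MASTER value used for the standalone cluster has the form spark://host:port, and a typo there is easy to miss. The sketch below splits such a URL into host and port so it can be compared with what the cluster reports. The function name parse_spark_master is hypothetical and the check is purely syntactic; it does not contact the master.

```shell
# Hypothetical helper (not part of Spark or Sparkling Water):
# split a standalone master URL of the form spark://host:port.
# Prints "host port" on success, returns 1 when the URL does not match.
parse_spark_master() {
    url="$1"
    case "$url" in
        spark://*:*) ;;
        *) echo "Not a standalone master URL: $url" >&2; return 1 ;;
    esac
    hostport="${url#spark://}"   # drop the scheme
    host="${hostport%:*}"        # everything before the last colon
    port="${hostport##*:}"       # everything after the last colon
    case "$port" in
        ''|*[!0-9]*) echo "Invalid port in $url" >&2; return 1 ;;
    esac
    if [ -z "$host" ]; then
        echo "Missing host in $url" >&2
        return 1
    fi
    echo "$host $port"
}
```

With the value from step 3, parse_spark_master "$MASTER" prints localhost 7077.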
Gradle-style specification for Maven artifacts
See the h2o-droplets GitHub repository for a working example.
repositories {
  mavenCentral()
}
dependencies {
  compile "ai.h2o:sparkling-water-core_2.10:1.2.6"
}
See Maven Central for artifact details.