Sparkling Water

Version 1.3.1

Integrating the worlds of H2O and Spark


Get started with Sparkling Water in a few easy steps

1. Download Spark, if not already installed, from the Spark Downloads Page:

Choose Spark release: 1.3.1
Choose a package type: Pre-built for Hadoop 2.4 and later

2. Download Sparkling Water and point it to your existing Spark installation by setting the SPARK_HOME environment variable:

export SPARK_HOME='/path/to/spark/installation'

3. From your terminal, run:

cd ~/Downloads
unzip sparkling-water-1.3.1.zip
cd sparkling-water-1.3.1
bin/sparkling-shell

4. Create an H2O cloud inside the Spark cluster:

import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
// Or if you know the number of Spark workers:
// val h2oContext = new H2OContext(sc).start( <number of Spark workers> )
import h2oContext._

5. Follow this demo, which imports airlines and weather data and runs predictions on flight delays.
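
Before running the full demo, a minimal sketch like the one below is a quick way to see data land in the H2O cloud from the Sparkling Shell. The file path, the Delay case class, and the column layout are illustrative assumptions, not the demo's actual schema; the RDD-to-H2OFrame conversion is assumed to come from the implicits brought in by import h2oContext._ (type names as of the 1.3.x API):

// A minimal sketch, not the actual demo: parse a CSV with Spark and hand
// the typed RDD to H2O. The path and the Delay fields are hypothetical.
case class Delay(origin: String, distance: Int, depDelay: Int)

val rows = sc.textFile("allyears2k.csv")
             .map(_.split(","))
             .map(a => Delay(a(0), a(1).toInt, a(2).toInt))

// With the h2oContext implicits in scope, assigning a product RDD to an
// H2OFrame copies the data into the H2O cloud.
val delaysHF: H2OFrame = rows
println(delaysHF.numRows())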

Launch Sparkling Water on Hadoop Using YARN

1. Download Spark, if not already installed, from the Spark Downloads Page:

Choose Spark release: 1.3.1
Choose a package type: Pre-built for Hadoop 2.4 and later

2. Download Sparkling Water and point it to your existing Spark installation by setting the SPARK_HOME environment variable:

wget /sparkling-water-1.3.1.zip
export SPARK_HOME='/path/to/spark/installation'

3. Set the HADOOP_CONF_DIR and Spark MASTER environment variables:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn-client"

4. Unpack Sparkling Water and launch the Sparkling Shell on YARN:

unzip sparkling-water-1.3.1.zip
cd sparkling-water-1.3.1/
bin/sparkling-shell --num-executors 3 --executor-memory 2g --master yarn-client

5. Create an H2O cloud inside the Spark cluster:

import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
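
The same implicits cover Spark SQL. Here is a hedged sketch, assuming the 1.3.x implicit conversion from a Spark DataFrame to an H2OFrame; the toy data and column names are made up:

// Sketch only: move a Spark SQL DataFrame into the H2O cloud via the
// implicits from import h2oContext._ (type names as of the 1.3.x API).
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(1 to 100).map(i => (i, i * 2)).toDF("x", "y")
val hf: H2OFrame = df   // the frame now lives on the H2O nodes inside YARN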

Launch H2O on a Standalone Spark Cluster

1. Download Spark, if not already installed, from the Spark Downloads Page:

Choose Spark release: 1.3.1
Choose a package type: Pre-built for Hadoop 2.4 and later

2. Download Sparkling Water and point it to your existing Spark installation by setting the SPARK_HOME environment variable:

export SPARK_HOME='/path/to/spark/installation'

3. From your terminal, run:

cd ~/Downloads
unzip sparkling-water-1.3.1.zip
cd sparkling-water-1.3.1
bin/launch-spark-cloud.sh                # starts a local standalone Spark cluster
export MASTER="spark://localhost:7077"   # point the shell at the standalone master
bin/sparkling-shell
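
Once the shell is up, a quick check that it attached to the standalone master rather than falling back to local mode:

println(sc.master)   // should print spark://localhost:7077, not local[*]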

4. Create an H2O cloud inside the Spark cluster:

import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
// Or if you know the number of Spark workers:
// val h2oContext = new H2OContext(sc).start( <number of Spark workers> )
import h2oContext._
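
The comment in the snippet above suggests passing the number of Spark workers when it is known; here is a concrete instance for a three-worker standalone cluster (the count 3 is just an example):

// Start H2O with an explicit cloud size of 3, matching three Spark workers:
val h2oContext = new H2OContext(sc).start(3)
import h2oContext._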