Sparkling Water

Version 2.1.1

Integrating worlds of H2O and Spark


Get started with Sparkling Water in a few easy steps

1. Download Spark (if it is not already installed) from the Spark Downloads page:

Choose a Spark release: 2.1.0
Choose a package type: Pre-built for Hadoop 2.4 and later
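As a hedged sketch for step 1, the shell function below builds the download URL for a chosen Spark release and Hadoop package. It assumes the Apache archive layout https://archive.apache.org/dist/spark/spark-<version>/spark-<version>-bin-hadoop<hadoop>.tgz; both that pattern and the helper name spark_archive_url are assumptions, not part of Spark or Sparkling Water, so check the result against the Spark Downloads page.

```shell
# spark_archive_url is a hypothetical helper, not part of Spark or Sparkling Water.
# Assumed layout: https://archive.apache.org/dist/spark/spark-<version>/spark-<version>-bin-hadoop<hadoop>.tgz
spark_archive_url() {
  spark_version="$1"   # e.g. 2.1.0
  hadoop_version="$2"  # e.g. 2.4 for "Pre-built for Hadoop 2.4 and later"
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s-bin-hadoop%s.tgz\n' \
    "$spark_version" "$spark_version" "$hadoop_version"
}
```

For example, running curl -LO "$(spark_archive_url 2.1.0 2.4)" and then tar -xzf spark-2.1.0-bin-hadoop2.4.tgz would download and unpack the selected package; the extracted directory is then the value to use for SPARK_HOME in step 2.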

2. Point SPARK_HOME to your existing Spark installation and export the MASTER variable:

export SPARK_HOME="/path/to/spark/installation"
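# Run Spark locally with as many worker threads as there are logical cores.
export MASTER="local[*]"
# Alternatively, launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1 GB (1024 MB) of memory:
# export MASTER="local-cluster[3,2,1024]"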
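Before launching the shell, a quick check can catch a mistyped SPARK_HOME. The function below is a minimal sketch that assumes a Spark installation contains an executable bin/spark-submit; the name check_spark_env is hypothetical and is not a Sparkling Water script.

```shell
# check_spark_env is a hypothetical pre-flight check, not a Sparkling Water script.
# Prints the settings and returns 0 if SPARK_HOME looks like a Spark install;
# otherwise prints an error to stderr and returns 1.
check_spark_env() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Assumption: the Spark distribution ships an executable bin/spark-submit
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "No executable bin/spark-submit under $SPARK_HOME" >&2
    return 1
  fi
  # MASTER is only reported here, so the chosen mode is visible
  echo "SPARK_HOME=$SPARK_HOME MASTER=${MASTER:-<unset>}"
}
```

Running check_spark_env after the exports above should print both values; a non-zero exit status means SPARK_HOME needs fixing before continuing.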

3. From your terminal, run:

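# Assumes sparkling-water-2.1.1.zip was downloaded to ~/Downloads
cd ~/Downloads
unzip sparkling-water-2.1.1.zip
cd sparkling-water-2.1.1
# Start the Spark shell with Sparkling Water on the classpath and 1 GB of memory per executor
bin/sparkling-shell --conf "spark.executor.memory=1g"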

4. Create an H2O cloud inside the Spark cluster:

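// Import the Sparkling Water API
import org.apache.spark.h2o._
// Start H2O inside the Spark cluster, or reuse an existing H2OContext; sc is the shell's SparkContext
val h2oContext = H2OContext.getOrCreate(sc)
// Make the H2OContext members directly accessible in the shell
import h2oContext._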

5. Follow this demo, which imports airline and weather data and predicts flight delays.

Integration info