Sparkling Water

Version 3.26.2-2.4

Integrating the worlds of H2O and Spark

Download Sparkling Water

Integration info

Get started with Sparkling Water in a few easy steps

1. Download Spark (if it is not already installed) from the Spark Downloads page.

Choose a Spark release: any 2.4.* release except 2.4.2
Choose a package type: Pre-built for Hadoop 2.7 and later
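The release rule above can be encoded as a small shell check. This is a minimal sketch: the function name `is_supported_spark_version` is hypothetical (it is not part of Spark or Sparkling Water), and it only pattern-matches a version string that you pass in, without inspecting an actual Spark installation.

```shell
# Hypothetical helper (not part of Spark or Sparkling Water):
# succeeds only for Spark 2.4.x releases other than 2.4.2.
is_supported_spark_version() {
  case "$1" in
    2.4.2) return 1 ;;   # explicitly excluded release
    2.4.*) return 0 ;;   # any other 2.4.x release is accepted
    *)     return 1 ;;   # other Spark lines do not match this build
  esac
}
```

For example, `is_supported_spark_version 2.4.4 && echo "supported"` prints `supported`, while `2.4.2` and `2.3.3` are rejected. A `case` statement is used so the check runs in any POSIX shell without external tools.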

2. Point SPARK_HOME to your existing Spark installation and export the MASTER variable.

export SPARK_HOME="/path/to/spark/installation"
# Run Spark in local mode, using as many worker threads as there are logical cores.
export MASTER="local[*]"
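Before launching the shell, it can help to confirm that both variables are set and that SPARK_HOME really points at a Spark distribution. The sketch below assumes only that a Spark distribution ships an executable `bin/spark-submit`; the function name `check_spark_env` is hypothetical.

```shell
# Hypothetical helper: verifies the environment prepared in step 2.
check_spark_env() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # A Spark distribution keeps its launcher scripts under bin/
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "No executable bin/spark-submit under $SPARK_HOME" >&2
    return 1
  fi
  if [ -z "${MASTER:-}" ]; then
    echo "MASTER is not set (for example: export MASTER=\"local[*]\")" >&2
    return 1
  fi
  return 0
}
```

Running `check_spark_env && echo "environment looks ready"` after the two `export` lines reports either success or the first missing piece.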

3. From your terminal, run:

cd ~/Downloads
unzip sparkling-water-3.26.2-2.4.zip
cd sparkling-water-3.26.2-2.4
bin/sparkling-shell --conf "spark.executor.memory=1g"
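The release name `3.26.2-2.4` ends in `2.4`, the Spark line this build targets. Assuming that the part after the last hyphen always names the Spark major.minor line, a sketch like the following can compare a release name with an installed Spark version. The function name `sw_matches_spark` is hypothetical, and it does not encode the 2.4.2 exclusion from step 1.

```shell
# Hypothetical helper: succeeds if a Sparkling Water release name such as
# "3.26.2-2.4" targets the same major.minor line as the given Spark version.
# Assumes the suffix after the last hyphen is the Spark major.minor line.
sw_matches_spark() {
  # ${1##*-} strips everything up to the last hyphen: "3.26.2-2.4" -> "2.4"
  case "$2" in
    "${1##*-}".*) return 0 ;;   # e.g. "2.4.4" starts with "2.4."
    *)            return 1 ;;
  esac
}
```

For example, `sw_matches_spark 3.26.2-2.4 2.4.4` succeeds, while `sw_matches_spark 3.26.2-2.4 2.3.3` fails.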

4. Create an H2O cloud inside the Spark cluster:

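// Sparkling Water's Scala API
import org.apache.spark.h2o._
// Start H2O inside the running Spark application, or reuse an existing context
// (`spark` is the SparkSession predefined by sparkling-shell)
val h2oContext = H2OContext.getOrCreate(spark)
// Make H2OContext members such as asH2OFrame usable without a prefix
import h2oContext._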

5. Follow this demo, which imports airlines and weather data and runs predictions on delays.