Download Sparkling Water 3.32.1.2-1-3.0

Download Sparkling Water

Integration info

H2O version: 3.32.1.2 zipf (documentation)
Spark version: 3.0.* (documentation)
Sparkling Water: Documentation Changelog
Release date: Tue May 04 15:47:37 UTC 2021
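
The release name used throughout this page, 3.32.1.2-1-3.0, appears to combine the two versions listed above. The sketch below splits it with plain shell parameter expansion. Reading the middle field as a Sparkling Water patch number is an assumption, and the variable names are illustrative only.

```shell
# Release name as it appears in the archive file name on this page.
SW_VERSION="3.32.1.2-1-3.0"

# Everything before the first hyphen: the H2O version.
H2O_VERSION="${SW_VERSION%%-*}"
# Everything after the last hyphen: the Spark major.minor line.
SPARK_LINE="${SW_VERSION##*-}"
# The middle field (assumed here to be a Sparkling Water patch number).
SW_PATCH="${SW_VERSION#*-}"
SW_PATCH="${SW_PATCH%-*}"

echo "H2O ${H2O_VERSION}, patch ${SW_PATCH}, Spark ${SPARK_LINE}.*"
```

This makes it easy to check that the Spark release chosen in the steps below belongs to the 3.0.* line before unpacking the archive.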

Get started with Sparkling Water in a few easy steps

1. Download Spark (if not already installed) from the Spark Downloads Page

Choose Spark release: 3.0.*
Choose a package type: Pre-built for Hadoop 2.7 and later

2. Point SPARK_HOME to an existing Spark installation and export the MASTER variable:

export SPARK_HOME="/path/to/spark/installation"
# To launch a local Spark cluster.
export MASTER="local[*]"

3. From your terminal, run:

cd ~/Downloads
unzip sparkling-water-3.32.1.2-1-3.0.zip
cd sparkling-water-3.32.1.2-1-3.0
bin/sparkling-shell --conf "spark.executor.memory=1g"

4. Create an H₂O cloud inside the Spark cluster:

import ai.h2o.sparkling._
val h2oContext = H2OContext.getOrCreate()
import h2oContext._

5. Follow this demo, which imports airlines and weather data and runs predictions on delays.
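
Before running the commands in step 3, it can help to confirm that SPARK_HOME from step 2 really points at a Spark installation. The following is a minimal sketch of such a check. check_spark_home is a hypothetical helper, not part of Sparkling Water, and it only assumes that a Spark distribution ships an executable bin/spark-submit.

```shell
# Hypothetical helper: succeeds only if the directory looks like a Spark install.
check_spark_home() {
  dir="$1"
  if [ -z "$dir" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$dir/bin/spark-submit" ]; then
    echo "no executable bin/spark-submit under $dir" >&2
    return 1
  fi
  return 0
}

# Check the current environment and print a hint instead of aborting the shell.
check_spark_home "${SPARK_HOME:-}" || echo "fix SPARK_HOME before running bin/sparkling-shell" >&2
```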

Launch Sparkling Water on Hadoop using YARN

1. Download Spark (if not already installed) from the Spark Downloads Page.

Choose Spark release: 3.0.*
Choose a package type: Pre-built for Hadoop 2.7 and later

2. Point SPARK_HOME to an existing installation of Spark:

export SPARK_HOME='/path/to/spark/installation'

3. Set the HADOOP_CONF_DIR and Spark MASTER environment variables:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn"

4. Download Sparkling Water and use sparkling-shell to launch Sparkling Shell on YARN:

wget /sparkling-water-3.32.1.2-1-3.0.zip
unzip sparkling-water-3.32.1.2-1-3.0.zip
cd sparkling-water-3.32.1.2-1-3.0/
bin/sparkling-shell --num-executors 3 --executor-memory 2g --master yarn --deploy-mode client

5. Create an H₂O cloud inside the Spark cluster:

import ai.h2o.sparkling._
val h2oContext = H2OContext.getOrCreate()
import h2oContext._
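
As a rough sizing aid for the sparkling-shell flags in step 4, the sketch below estimates how much memory the executors would request from YARN. It assumes Spark's documented default for spark.executor.memoryOverhead (10% of executor memory, with a floor of 384 MB), and it ignores the driver, the application master, and any rounding YARN applies to container sizes. overhead_mb is a hypothetical helper name.

```shell
# Values taken from the sparkling-shell command in step 4.
NUM_EXECUTORS=3
EXECUTOR_MEMORY_MB=2048   # --executor-memory 2g

# Assumed default overhead: 10% of executor memory, but at least 384 MB.
overhead_mb() {
  tenth=$(( $1 / 10 ))
  if [ "$tenth" -lt 384 ]; then
    echo 384
  else
    echo "$tenth"
  fi
}

OVERHEAD_MB="$(overhead_mb "$EXECUTOR_MEMORY_MB")"
PER_EXECUTOR_MB=$(( EXECUTOR_MEMORY_MB + OVERHEAD_MB ))
TOTAL_MB=$(( NUM_EXECUTORS * PER_EXECUTOR_MB ))

echo "per executor: ${PER_EXECUTOR_MB} MB, total: ${TOTAL_MB} MB"
```

With the values above, each executor needs 2432 MB and the three executors together need 7296 MB, so the YARN queue should have at least that much free before the shell is launched.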

Launch H2O on a Standalone Spark Cluster

1. Download Spark (if not already installed) from the Spark Downloads Page.

Choose Spark release: 3.0.*
Choose a package type: Pre-built for Hadoop 2.7 and later

2. Point SPARK_HOME to an existing installation of Spark:

export SPARK_HOME='/path/to/spark/installation'

3. From your terminal, run:

cd ~/Downloads
unzip sparkling-water-3.32.1.2-1-3.0.zip
cd sparkling-water-3.32.1.2-1-3.0
bin/launch-spark-cloud.sh
export MASTER="spark://localhost:7077"
bin/sparkling-shell

4. Create an H₂O cloud inside the Spark cluster:

import ai.h2o.sparkling._
val h2oContext = H2OContext.getOrCreate()
import h2oContext._

Kluster

Kluster mode of Sparkling Water supports connecting to external H2O clusters (standalone or Hadoop). The H2O cluster must be started with the H2O driver that corresponds to this Sparkling Water release, which can be downloaded as described below.

1. Download and unpack the Sparkling Water distribution.

2. Download the corresponding H2O driver for your Hadoop distribution (e.g., hdp2.2, cdh5.4), or the standalone one:

export H2O_DRIVER_JAR=$(/path/to/sparkling-water-3.32.1.2-1-3.0/bin/get-h2o-driver.sh hdp2.2)

3. Set the path to sparkling-water-assembly-extensions-3.32.1.2-1-3.0-all.jar, which is bundled in the Sparkling Water archive:

SW_EXTENSIONS_ASSEMBLY=/path/to/sparkling-water-3.32.1.2-1-3.0/jars/sparkling-water-assembly-extensions-3.32.1.2-1-3.0-all.jar

4. Start the H2O cluster on Hadoop:

hadoop jar $H2O_DRIVER_JAR -libjars $SW_EXTENSIONS_ASSEMBLY -sw_ext_backend -jobname test -nodes 3 -mapperXmx 6g

5. In your Sparkling Water application, create an H2OContext:

Scala

import ai.h2o.sparkling._
val conf = new H2OConf().setExternalClusterMode().useManualClusterStart().setCloudName("test")
val hc = H2OContext.getOrCreate(conf)

Python

from pysparkling import *
conf = H2OConf().setExternalClusterMode().useManualClusterStart().setCloudName("test")
hc = H2OContext.getOrCreate(conf)

List of supported Hadoop distributions: cdh5.4 cdh5.5 cdh5.6 cdh5.7 cdh5.8 cdh5.9 cdh5.10 cdh5.13 cdh5.14 cdh5.15 cdh5.16 cdh6.0 cdh6.1 cdh6.2 cdh6.3 cdp7.0 cdp7.1 cdp7.2 hdp2.2 hdp2.3 hdp2.4 hdp2.5 hdp2.6 hdp3.0 hdp3.1 mapr4.0 mapr5.0 mapr5.1 mapr5.2 mapr6.0 mapr6.1 iop4.2
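
Since get-h2o-driver.sh in step 2 takes the distribution name as its argument, a small guard can catch a mistyped name before any download is attempted. The following is a minimal sketch. is_supported_dist is a hypothetical helper, the names are copied from the list above, and the standalone option mentioned in step 2 is deliberately left out because its exact argument is not given here.

```shell
# Distribution names copied from the list of supported Hadoop distributions.
SUPPORTED_DISTS="cdh5.4 cdh5.5 cdh5.6 cdh5.7 cdh5.8 cdh5.9 cdh5.10 cdh5.13 cdh5.14 cdh5.15 cdh5.16 cdh6.0 cdh6.1 cdh6.2 cdh6.3 cdp7.0 cdp7.1 cdp7.2 hdp2.2 hdp2.3 hdp2.4 hdp2.5 hdp2.6 hdp3.0 hdp3.1 mapr4.0 mapr5.0 mapr5.1 mapr5.2 mapr6.0 mapr6.1 iop4.2"

# Hypothetical helper: returns 0 only if $1 exactly matches a listed name.
is_supported_dist() {
  for d in $SUPPORTED_DISTS; do
    if [ "$d" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Example: validate the name used in step 2 before fetching the driver.
DIST="hdp2.2"
if is_supported_dist "$DIST"; then
  echo "$DIST is listed"
else
  echo "$DIST is not listed" >&2
fi
```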

For more information, see the Kluster documentation.

RSparkling

RSparkling is an R client for Sparkling Water applications. To use it:

1. Download and unpack the Sparkling Water distribution:

cd ~/Downloads
unzip sparkling-water-3.32.1.2-1-3.0.zip
cd sparkling-water-3.32.1.2-1-3.0

Now, continue inside R or RStudio and prepare the environment.

2. Install sparklyr, which RSparkling depends on:

install.packages("sparklyr")

3. Install Spark:

library(sparklyr)
spark_install(version = "3.0.2")

4. Install the matching version of H2O:

install.packages("h2o", type = "source", repos = "https://h2o-release.s3.amazonaws.com/h2o/rel-zipf/2/R")

5. Install RSparkling for Sparkling Water 3.32.1.2-1-3.0:

From S3 repository:

install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-3.0/3.32.1.2-1-3.0/R")

From downloaded distribution:

# rsparkling_3.32.1.2-1-3.0.tar.gz is included in the downloaded distribution.
install.packages(repos=NULL, type="source", "rsparkling_3.32.1.2-1-3.0.tar.gz")

6. Initialize RSparkling:

library(rsparkling)

7. Connect to Spark:

sc <- spark_connect(master = "local", version = "3.0.2")

8. Create an H2OContext. Once it is available, any H2O feature available in R can be used:

hc <- H2OContext.getOrCreate()

For more detailed information, follow the installation and usage instructions on the RSparkling page.

H2O R Client

Once you have an H2OContext available in RSparkling, any command available in the H2O R client can be used. For more information, visit the H2O-R page.
