Sparkling Water users

Sparkling Water is a Gradle project with the following submodules:

  • Core: Implementation of H2OContext, H2ORDD, and all technical integration code.

  • Examples: Application, demos, and examples.

  • ML: Implementation of MLlib pipelines for H2O-3 algorithms.

  • Assembly: Creates the “fatJar” (composed of all other modules).

  • py: Implementation of the H2O-3 Python binding to Sparkling Water.

The best way to get started is to modify the core module or create a new module (which extends the project).

Note

Sparkling Water is only supported with the latest version of H2O-3.

Sparkling Water is versioned according to Spark versioning, so make sure to use the Sparkling Water version that corresponds to your installed version of Spark.
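The version matching described in this note can be sketched in Python. This is a minimal illustration, not part of Sparkling Water: the helper name spark_major_minor is hypothetical, and it assumes the correspondence is made on the major.minor part of the Spark version (for example, 3.5 for Spark 3.5.1). Confirm the exact Sparkling Water release for your Spark version on the Downloads page.

```python
import re


def spark_major_minor(spark_version: str) -> str:
    """Return the 'major.minor' prefix of a Spark version string.

    Hypothetical helper for illustration only. It assumes Sparkling Water
    releases are matched to Spark on the major.minor version.
    """
    match = re.match(r"^(\d+)\.(\d+)", spark_version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    # With PySpark installed, the version string could come from
    # pyspark.__version__ instead of a literal.
    print(spark_major_minor("3.5.1"))  # prints 3.5
```

The returned value identifies which Spark line to select when choosing a Sparkling Water download.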

Getting started with Sparkling Water

This section contains links that will help you get started using Sparkling Water.

Download Sparkling Water

  1. Navigate to the Downloads page.

  2. Click Sparkling Water or scroll down to the Sparkling Water section.

  3. Select your installed version of Spark to download the corresponding version of Sparkling Water.

Sparkling Water documentation

The documentation for Sparkling Water is separate from the H2O-3 user guide. Read this documentation to get started with Sparkling Water.

Sparkling Water tutorials

This section contains demos and examples showcasing Sparkling Water.

PySparkling

PySparkling can be installed by downloading and running the PySparkling shell, or with pip from the PyPI repository. For installation instructions, see the Downloads page for Sparkling Water.
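As a rough sketch of the pip route, the snippet below builds the pip command for a given Spark version. It assumes the PyPI package follows the naming pattern h2o_pysparkling_<Spark major.minor>; that pattern and the helper name pysparkling_requirement are assumptions made for illustration, so verify the exact package name on the Downloads page before installing.

```python
def pysparkling_requirement(spark_version: str) -> str:
    """Build the assumed PyPI package name for a given Spark version.

    Assumption: packages are named 'h2o_pysparkling_<major>.<minor>'.
    Check the Sparkling Water Downloads page for the authoritative name.
    """
    parts = spark_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    return f"h2o_pysparkling_{parts[0]}.{parts[1]}"


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is installed
    # until the package name has been checked.
    print("pip install " + pysparkling_requirement("3.5.1"))
```

Printing the command keeps the sketch free of side effects; run the printed line in a shell once the package name is confirmed.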

PySparkling documentation

Documentation for PySparkling is available for the following versions:

RSparkling

The RSparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling Water package from H2O-3. It provides an R interface to H2O-3’s high-performance, distributed machine learning algorithms on Spark.

This package implements basic functionality: creating an H2OContext, showing the H2O Flow interface, and converting between Spark DataFrames and H2O Frames. The main purpose of this package is to provide a connector between sparklyr and H2O-3’s machine learning algorithms.

The RSparkling package uses sparklyr for Spark job deployment and initialization of Sparkling Water. After that, you can use the regular H2O R package for modeling.

RSparkling documentation

Documentation for RSparkling is available for the following versions: