Welcome to H2O 3
==================
New Users
---------
If you're just getting started with H2O, here are some links to help you
learn more:
- `Recommended Systems `_: This one-page PDF provides a basic overview of
the operating systems, languages and APIs, Hadoop resource manager
versions, cloud computing environments, browsers, and other resources
recommended to run H2O. At a minimum, we recommend the following for
compatibility with H2O:
  - **Operating Systems**: Windows 7 or later; OS X 10.9 or later; Ubuntu
    12.04; or RHEL/CentOS 6 or later
  - **Languages**: Java 7 or later; Scala 2.10 or later; R version 3 or
    later; Python 2.7.x or 3.5.x (Scala, R, and Python are required only
    if you want to use H2O in those environments; Java is always
    required)
  - **Browsers**: Latest version of Chrome, Firefox, Safari, or Internet
    Explorer (a web browser is required to use H2O's web UI, Flow)
  - **Hadoop**: Cloudera CDH 5.2 or later (5.3 is recommended); MapR
    3.1.1 or later; Hortonworks HDP 2.1 or later (Hadoop is required only
    if you want to deploy H2O on a Hadoop cluster)
  - **Spark**: 1.3 or later (Spark is required only if you want to run
    `Sparkling Water `__)
- `Downloads page `_: To begin, download a copy of H2O here by
selecting a build under "Download H2O" (the "Bleeding Edge" build
contains the latest changes, while the latest alpha release is a more
stable build), then use the installation instruction tabs to install
H2O on your `client of choice `_
(standalone, R, Python, Hadoop, or Maven).
For first-time users, we recommend downloading the latest alpha
release and the default standalone option (the first tab) as the
installation method. Make sure to install Java if it is not already
installed.
- **Tutorials**: To see a step-by-step example of our algorithms in
action, select a model type from the following list:
- `Deep Learning `_
- `Gradient Boosting Machine (GBM) `_
- `Generalized Linear Model (GLM) `_
- `Kmeans `_
- `Distributed Random Forest (DRF) `_
- `Getting Started with Flow `_: This document describes our new, intuitive
web interface, Flow. This interface is similar to IPython notebooks
and lets you create a visual workflow to share with others.
- `Launch from the command line `_: This document describes some of the additional options that you can configure when launching H2O (for example, to specify a different directory for saved Flow data, allocate more memory, or use a flatfile for quick configuration of a cluster).
- `Algorithms `_: This document describes the science behind our algorithms and provides a detailed, per-algo view of each model type.
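
The launch options mentioned in the list above (a different directory for saved Flow data, more memory, a flatfile) can be sketched as a small Python helper that assembles the ``java`` command line. This is a minimal sketch under stated assumptions: ``h2o.jar`` is assumed to be in the current directory, the helper names ``build_launch_command`` and ``flatfile_text`` and the example addresses are hypothetical, and the ``-flow_dir`` and ``-flatfile`` options should be checked against the launch document for your build.

```python
def build_launch_command(jar="h2o.jar", heap="4g", flow_dir=None, flatfile=None):
    """Assemble an argument list for launching H2O from the command line."""
    # JVM options (such as -Xmx) go before -jar; H2O options follow the jar path.
    command = ["java", "-Xmx" + heap, "-jar", jar]
    if flow_dir is not None:
        # Assumed H2O option: directory where Flow saves its data.
        command += ["-flow_dir", flow_dir]
    if flatfile is not None:
        # Assumed H2O option: file listing the nodes that form the cluster.
        command += ["-flatfile", flatfile]
    return command


def flatfile_text(nodes):
    """Render one "ip:port" entry per line for a flatfile."""
    return "\n".join("%s:%d" % (ip, port) for ip, port in nodes) + "\n"


# Hypothetical two-node cluster with a 6 GB heap per node.
print(" ".join(build_launch_command(heap="6g", flatfile="flatfile.txt")))
print(flatfile_text([("192.168.1.10", 54321), ("192.168.1.11", 54321)]), end="")
```

The argument list can be passed to ``subprocess.Popen`` on a machine where Java and ``h2o.jar`` are available.
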
Experienced Users
-----------------
If you've used previous versions of H2O, the following links will help
guide you through the process of upgrading to H2O 3.0.
- `Recommended Systems `_: This one-page PDF provides a basic overview of
the operating systems, languages and APIs, Hadoop resource manager
versions, cloud computing environments, browsers, and other resources
recommended to run H2O.
- `Migration Guide `_: This document provides a comprehensive guide to
assist users in upgrading to H2O 3.0. It gives an overview of the
changes to the algorithms and the web UI introduced in this version
and describes the benefits of upgrading for users of R, APIs, and
Java.
- `Porting R Scripts `_: This document is designed to assist users who have
created R scripts using previous versions of H2O. Due to the many
improvements in R, scripts created using previous versions of H2O
need some revision to work with H2O 3.0. This document provides a
side-by-side comparison of the changes in R for each algorithm, as
well as overall structural enhancements R users should be aware of,
and provides a link to a tool that assists users in upgrading their
scripts.
- `Recent Changes `_: This document describes the most recent changes in
the latest build of H2O. It lists new features, enhancements
(including changed parameter default values), and bug fixes for each
release, organized by sub-categories such as Python, R, and Web UI.
- `H2O Classic vs H2O 3.0 `_: This document presents a side-by-side
comparison of H2O 3.0 and the previous version of H2O. It compares
and contrasts the features, capabilities, and supported algorithms
between the versions. If you'd like to learn more about the benefits
of upgrading, this is a great source of information.
- `Contributing code `_: If you're interested in contributing code to H2O,
we appreciate your assistance! This document describes how to access
our list of Jiras that are suggested tasks for contributors and how
to contact us.
Sparkling Water Users
---------------------
Sparkling Water is a Gradle project with the following submodules:
- Core: Implementation of H2OContext, H2ORDD, and all technical
integration code
- Examples: Applications, demos, and examples
- ML: Implementation of MLlib pipelines for H2O algorithms
- Assembly: Creates a "fatJar" composed of all other modules
- py: Implementation of the (h2o) Python binding for Sparkling Water
The best way to get started is to modify the core module or to create a
new module that extends the project.
Users of our Spark-compatible solution, Sparkling Water, should be aware
that Sparkling Water is only supported with the latest version of H2O.
For more information about Sparkling Water, refer to the following
links.
Sparkling Water is versioned according to the Spark versioning, so make
sure to use the Sparkling Water version that corresponds to the
installed version of Spark:
- Use `Sparkling Water
1.2 `__
for Spark 1.2
- Use `Sparkling Water
1.3 `__
for Spark 1.3+
- Use `Sparkling Water
1.4 `__
for Spark 1.4
- Use `Sparkling Water
1.5 `__
for Spark 1.5
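
The version-matching rule above can be sketched in Python. This is a minimal sketch, not part of Sparkling Water: the function name ``sparkling_water_line`` is hypothetical, and it covers only the four release lines listed above.

```python
# Release lines named in the list above; anything else is rejected.
SUPPORTED_LINES = ("1.2", "1.3", "1.4", "1.5")


def sparkling_water_line(spark_version):
    """Return the Sparkling Water release line matching a Spark version string."""
    parts = spark_version.split(".")
    if len(parts) < 2:
        raise ValueError("expected a version such as '1.5.1', got %r" % spark_version)
    # Sparkling Water follows Spark's major.minor numbering.
    line = ".".join(parts[:2])
    if line not in SUPPORTED_LINES:
        raise ValueError("no Sparkling Water line listed for Spark %s" % spark_version)
    return line


print(sparkling_water_line("1.5.1"))
```
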
Getting Started with Sparkling Water
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Download Sparkling Water `_: Go here to download Sparkling Water.
- `Sparkling Water Development Documentation `_: Read this document first
to get started with Sparkling Water.
- `Launch on Hadoop and Import from HDFS `_: Go here to learn how to start
Sparkling Water on Hadoop.
- `Sparkling Water Tutorials `_: Go here for demos and examples.
- `Sparkling Water K-means Tutorial `_: Go here to view a demo that uses
Scala to create a K-means model.
- `Sparkling Water GBM Tutorial `_: Go here to view a demo that uses
Scala to create a GBM model.
- `Sparkling Water on YARN `_: Follow these instructions to run Sparkling Water on a YARN cluster.
- `Building Applications on top of H2O `_: This short tutorial describes project building and demonstrates the capabilities of Sparkling Water using Spark Shell to build a Deep Learning model.
- `Sparkling Water FAQ `_: This FAQ provides answers to many common
questions about Sparkling Water.
- `Connecting RStudio to Sparkling Water `_: This illustrated tutorial describes how to use RStudio to connect to Sparkling Water.
Sparkling Water Blog Posts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- `How Sparkling Water Brings H2O to Spark `_
- `H2O - The Killer App on Spark `_
- `In-memory Big Data: Spark + H2O `_
Sparkling Water Meetup Slide Decks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- `Sparkling Water Meetups `_
- `Interactive Session on Sparkling Water `_
- `Sparkling Water Hands-On `_
- `Additional Sparkling Water Meetup meeting notes `_
PySparkling
~~~~~~~~~~~~
**Note**: PySparkling requires `Sparkling Water 1.5 `__ or later.
H2O's PySparkling package is not available through ``pip``. (There is
`another `__ similarly-named
package.) H2O's PySparkling package requires
`EasyInstall `__.
To install H2O's PySparkling package, use the egg file included in the
distribution.
1. Download `Spark 1.5.1 `__.
2. Set the ``SPARK_HOME`` and ``MASTER`` variables as described on the
`Downloads
page `__.
3. Download `Sparkling Water
1.5 `__.
4. In the unpacked Sparkling Water directory, run the following command:
``easy_install --upgrade sparkling-water-1.5.6/py/dist/pySparkling-1.5.6-py2.7.egg``
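
Before running the install command, it can help to confirm that the variables from step 2 are set. The check below is a minimal sketch using only the Python standard library; the helper name ``missing_variables`` and the example values are hypothetical.

```python
import os

# Variables that step 2 asks you to set before installing PySparkling.
REQUIRED_VARIABLES = ("SPARK_HOME", "MASTER")


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


missing = missing_variables()
if missing:
    print("Set these variables first: " + ", ".join(missing))
else:
    print("SPARK_HOME and MASTER are set.")
```
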
Python Users
--------------
Pythonistas will be glad to know that H2O now provides support for this
popular programming language. Python users can also use H2O with IPython
notebooks. For more information, refer to the following links.
- Click
`here `__
to view instructions on how to use H2O with Python.
- `Python readme `_: This document describes how to set up and install the
prerequisites for using Python with H2O.
- `Python docs <../h2o-py/docs/index.html>`_: This document represents the definitive guide to using
Python with H2O.
- `Python Parity `_: This document is a list of Python capabilities that
were previously available only through the H2O R interface but are
now available in H2O using the Python interface.
- `Grid Search in Python `_: This notebook demonstrates the use of grid search in Python.
R Users
-------
Don't worry, R users - we still provide R support in the latest version
of H2O, just as before. The R components of H2O have been cleaned up,
simplified, and standardized, so the command format is easier and more
intuitive. Due to these improvements, be aware that any scripts created
with previous versions of H2O will need some revision to be compatible
with the latest version.
We have provided the following helpful resources to assist R users in
upgrading to the latest version, including a document that outlines the
differences between versions and a tool that reviews scripts for
deprecated or renamed parameters.
Currently, the only version of R that is known to be incompatible with
H2O is R version 3.1.0 (codename "Spring Dance"). If you are using that
version, we recommend upgrading the R version before using H2O.
To check which version of H2O is installed in R, use
``versions::installed.versions("h2o")``.
- Click
`here `__
to view instructions for using H2O with R.
- `R User Documentation <../h2o-r/h2o_package.pdf>`_: This document contains all commands in the H2O
package for R, including examples and arguments. It represents the
definitive guide to using H2O in R.
- `Porting R Scripts `_: This document is designed to assist users who have
created R scripts using previous versions of H2O. Due to the many
improvements in R, scripts created using previous versions of H2O
need some revision to work with the latest version. This document
provides a side-by-side comparison of the changes in R for each
algorithm, as well as overall structural enhancements R users should
be aware of, and provides a link to a tool that assists users in
upgrading their scripts.
- `Connecting RStudio to Sparkling Water `_: This illustrated tutorial
describes how to use RStudio to connect to Sparkling Water.
Ensembles
---------
Ensemble machine learning methods use multiple learning algorithms to
obtain better predictive performance.
- `H2O Ensemble GitHub repository `_: Location for the H2O Ensemble R
package.
- `Ensemble Documentation `_: This documentation provides more details on
the concepts behind ensembles and how to use them.
API Users
--------------
API users will be happy to know that the APIs are more thoroughly
documented in the latest release of H2O and that additional capabilities
(such as exporting weights and biases for Deep Learning models) have
been added.

REST APIs are generated directly from the code, allowing users to
implement machine learning in many ways. For example, REST APIs could be
used to call a model built from sensor data and to set up automatic
alerts if the sensor data falls below a specified threshold.
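
The sensor scenario above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not H2O's own client: the helper names (``rest_url``, ``fetch_json``, ``below_threshold``) are hypothetical, the host and port assume a local H2O instance on port 54321, and ``/3/Cloud`` is used only as an example path; check the REST API Reference for the routes your build exposes.

```python
import json
import urllib.request


def rest_url(path, host="localhost", port=54321):
    """Build a URL for an H2O REST endpoint (host and port are assumptions)."""
    return "http://%s:%d%s" % (host, port, path)


def fetch_json(url):
    """GET a REST endpoint and decode its JSON body (needs a running server)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def below_threshold(readings, threshold):
    """Return (index, value) pairs for readings under the threshold."""
    return [(i, value) for i, value in enumerate(readings) if value < threshold]


# Hypothetical sensor readings; a real application would take them from model output.
for index, value in below_threshold([5.0, 1.5, 3.2, 0.9], threshold=2.0):
    print("ALERT: reading %d is %.1f" % (index, value))
```

With an H2O instance running, ``fetch_json(rest_url("/3/Cloud"))`` would be one way to confirm that the server is reachable before requesting predictions.
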
- `H2O 3 REST API Overview `_: This document describes how the REST API
commands are used in H2O, versioning, experimental APIs, verbs,
status codes, formats, schemas, payloads, metadata, and examples.
- `REST API Reference `_: This document represents the definitive guide to the H2O REST API.
- `REST API Schema Reference `_: This document represents the definitive guide to the H2O REST API schemas.
Java Users
--------------
For Java developers, the following resources will help you create your
own custom app that uses H2O.
- `H2O Core Java Developer Documentation <../h2o-core/javadoc/index.html>`_: The definitive Java API guide
for the core components of H2O.
- `H2O Algos Java Developer Documentation <../h2o-algos/javadoc/index.html>`_: The definitive Java API guide
for the algorithms used by H2O.
- `h2o-genmodel (POJO) Javadoc <../h2o-genmodel/javadoc/index.html>`_: Provides a step-by-step guide to creating and implementing POJOs in a Java application.
SDK Information
---------------
The Java API is generated and accessible from the `download
page `_.
- `Central
repository `_
- `View code on
GitHub `_
- `Apache
License `_
Developers
--------------
If you're looking to use H2O to help you develop your own apps, the
following links will provide helpful references.
For the latest version of IntelliJ IDEA, run ``./gradlew idea``, then
click **File > Open** within IDEA. Select the ``.ipr`` file in the
repository and click the **Choose** button.

For older versions of IntelliJ IDEA, run ``./gradlew idea``, then
click **Import Project** within IDEA and point it to the `h2o-3 directory `_.
**Note**: This process takes longer, so we recommend using the
first method if possible.
For JUnit tests to pass, you may need multiple H2O nodes. Create a
"Run/Debug" configuration with the following parameters:
::

    Type: Application
    Main class: H2OApp
    Use class path of module: h2o-app

After you start multiple "worker" node processes in addition to the JUnit
test process, the nodes form a cloud and run the multi-node JUnit tests.
- `Recommended Systems `_: This one-page PDF provides a basic overview of
the operating systems, languages and APIs, Hadoop resource manager
versions, cloud computing environments, browsers, and other resources
recommended to run H2O.
- `Developer Documentation `_: Detailed instructions on how to build and
launch H2O, including how to clone the repository, how to pull from
the repository, and how to install required dependencies.
- Click
`here `__
to view instructions on how to use H2O with Maven.
- `Maven install `_: This page provides information on how to build a
version of H2O that generates the correct IDE files.
- `apps.h2o.ai `_: Apps.h2o.ai is designed to support application
developers via events, networking opportunities, and a new, dedicated
website comprising developer kits and technical specs, news, and
product spotlights.
- `H2O Droplet Project Templates `_: This page provides template info for projects
created in Java, Scala, or Sparkling Water.
- `H2O Scala API Developer Documentation <../h2o-scala/scaladoc/index.html>`_: The definitive Scala API guide
for H2O.
- `Hacking Algos `_: This blog post by Cliff walks you through building a
new algorithm, using K-Means, Quantiles, and Grep as examples.
- `KV Store Guide `_: Learn more about performance characteristics when
implementing new algorithms.
- `Contributing code `_: If you're interested in contributing code to H2O,
we appreciate your assistance! This document describes how to access
our list of Jiras that contributors can work on and how to contact
us. **Note**: To access this link, you must have an `Atlassian
account `__.