General

H2O System Requirements

64-bit Java 1.6 or higher (for example, Java 1.7).

While a minimum of 2 GB of RAM is needed on the machine where H2O will run, the amount of memory H2O needs to run efficiently depends on the size and nature of the data and on the algorithm employed. A good heuristic is that the available memory should be at least four times the size of the data being analyzed.
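The memory heuristic above can be sketched as a small calculation. This is only an illustration of the rule of thumb, not part of H2O; the function and constant names are hypothetical.

```python
# Illustrative only: these names are not part of H2O.
MINIMUM_RAM_GB = 2.0          # minimum stated in the requirements above
DATA_SIZE_MULTIPLIER = 4.0    # "at least four times the size of the data"


def recommended_memory_gb(data_size_gb):
    """Return a rough lower bound on memory (in GB) for a data set of the given size."""
    if data_size_gb < 0:
        raise ValueError("data size cannot be negative")
    # Never go below the stated minimum, even for very small data sets.
    return max(MINIMUM_RAM_GB, DATA_SIZE_MULTIPLIER * data_size_gb)


if __name__ == "__main__":
    for size in (0.25, 3.0, 10.0):
        print(size, "GB of data ->", recommended_memory_gb(size), "GB of memory")
```

For a 3 GB data set, for example, the heuristic suggests at least 12 GB of memory. The nature of the data and the algorithm used can push the real requirement higher.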

A reasonably modern web browser (for example, the latest version of Firefox, Safari, or Internet Explorer).

Users who are running H2O on a server must ensure that the data are available to that server (either via their network settings, or because the data are on the same server). Users who are running H2O on a laptop must ensure that the data are available to that laptop. The specification of network settings is beyond the scope of this documentation. Advanced users may find additional documentation on running in specialized environments helpful: Getting Started with Development in H2O.

For multinode clusters utilizing several servers, it is strongly recommended that all servers and nodes be symmetric and identically configured. For example, allocating different amounts of memory to nodes in the same cluster can adversely impact performance.
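One way to catch an asymmetric cluster before launch is to compare the planned settings for each node. The sketch below assumes each node is described by a plain dictionary; the setting names (such as memory_gb) are hypothetical and do not correspond to H2O options.

```python
def find_asymmetric_settings(nodes):
    """Return the names of settings whose values differ between nodes.

    `nodes` is a list of dictionaries, one per node. The keys are
    illustrative and are not H2O configuration options.
    """
    if not nodes:
        return []
    # Collect every setting name that appears on any node.
    setting_names = sorted({name for node in nodes for name in node})
    reference = nodes[0]
    return [
        name
        for name in setting_names
        if any(node.get(name) != reference.get(name) for node in nodes[1:])
    ]


if __name__ == "__main__":
    planned = [
        {"memory_gb": 8, "java_version": "1.7"},
        {"memory_gb": 4, "java_version": "1.7"},
    ]
    print(find_asymmetric_settings(planned))
```

An empty result means the described nodes are configured identically. Any name returned is a setting worth aligning before the cluster is started.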

User Interaction

Users have several options for interacting with H2O.

A web browser can be used to communicate directly with the embedded web server inside any of the H2O nodes. All H2O nodes contain an embedded web server, and they are all equivalent peers.

Users can also choose to interface with the H2O embedded web server via the REST API. The REST API accepts HTTP requests and returns JSON-formatted responses.
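The request and response cycle can be sketched with the Python standard library alone. The host, port, and endpoint name below are placeholders rather than documented H2O endpoints, and the sample response body is invented for illustration.

```python
import json
from urllib.parse import urlencode


def build_request_url(host, port, endpoint, params=None):
    """Build the URL for an HTTP GET request to a node's embedded web server."""
    query = "?" + urlencode(params) if params else ""
    return "http://{}:{}/{}{}".format(host, port, endpoint.lstrip("/"), query)


def decode_response(body):
    """Decode a JSON-formatted response body into Python objects."""
    return json.loads(body)


if __name__ == "__main__":
    # "Example.json" is a placeholder endpoint, not a real H2O endpoint.
    url = build_request_url("localhost", 54321, "Example.json")
    print(url)
    # With a node running, the request could be sent with
    # urllib.request.urlopen(url).read(); a made-up body stands in here.
    print(decode_response('{"status": "ok"}'))
```

Because all nodes are equivalent peers, the same request can be addressed to any node in the cluster.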

H2O can also be used via the H2O for R package, available from 0xdata. This package uses H2O’s REST API under the hood. Users can install the R package from the CRAN repository maintained by H2O. The H2O zip file and the R + H2O installation details are available at: http://0xdata.com/downloadtable/.

Data sets are not transmitted directly through the REST API. Instead, the user sends a command (containing an HDFS path to the data set, for example), either through the browser-based GUI or via the REST API, to ingest data from disk.

The data set is assigned a KEY in H2O that the user may refer to in future commands to the web server.
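The ingestion flow described above, in which the command carries a path rather than the data itself and H2O hands back a key, might look roughly like the following. The endpoint name, the "path" parameter, and the "key" response field are all hypothetical stand-ins; consult the REST API documentation for the real names.

```python
import json
from urllib.parse import urlencode

# Hypothetical names used only for illustration.
INGEST_ENDPOINT = "IngestExample.json"
PATH_PARAMETER = "path"
KEY_FIELD = "key"


def build_ingest_url(node_address, data_path):
    """Build a command URL that tells a node where the data set lives.

    Only the location is sent; the data set itself is never placed in the request.
    """
    query = urlencode({PATH_PARAMETER: data_path})
    return "http://{}/{}?{}".format(node_address, INGEST_ENDPOINT, query)


def extract_key(response_body):
    """Pull the assigned key out of a JSON response so later commands can use it."""
    return json.loads(response_body)[KEY_FIELD]


if __name__ == "__main__":
    print(build_ingest_url("localhost:54321", "hdfs://namenode/data/example.csv"))
    # A made-up response body stands in for what a node might return.
    print(extract_key('{"key": "example.hex"}'))
```

Later commands would pass the returned key instead of the original path, so the data are read from disk only once.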

How Data is Ingested into H2O

For step-by-step instructions on how to ingest and parse data, please see the Data section of this User Guide: Data.

Supported input data file formats include CSV, Gzip-compressed CSV, MS Excel (XLS), ARFF, SVM-Light, Hive file format, and others.

Using H2O

Step-by-step instructions on how to use each of the algorithms and tools can be found in the tutorials. Users have a variety of options for accessing and running H2O. For instructions on how to get started using H2O (for example, through R, using Java, or via GitHub), please see the Quick Start Guides and Walk Through Tutorials. New users may also find the Glossary useful for familiarizing themselves with H2O’s computing and statistics terms.