Get started with H2O in 3 easy steps
1. Download H2O. This is a zip file that contains everything you need to get started.
2. From your terminal, run:
cd ~/Downloads
unzip h2o-3.38.0.3.zip
cd h2o-3.38.0.3
java -jar h2o.jar
3. Point your browser to http://localhost:54321
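Once the server is running, a script can confirm that it is reachable before anything else depends on it. The following is a minimal sketch using only the Python standard library. The function name `wait_for_h2o` and its retry parameters are hypothetical, and the check only confirms that the address from step 3 answers an HTTP request.

```python
import time
import urllib.error
import urllib.request


def wait_for_h2o(url="http://localhost:54321", attempts=10, delay=1.0):
    """Return True once `url` answers an HTTP request, False if it never does.

    Hypothetical helper for illustration; it is not part of H2O.
    """
    # Ignore any configured HTTP proxy, since the default target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for attempt in range(attempts):
        try:
            with opener.open(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet: wait and retry unless this was the last attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Calling `wait_for_h2o()` right after `java -jar h2o.jar` gives the JVM a few seconds to start before the browser or a client connects.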
Use H2O directly from Python
1. Prerequisite: Python 2.7.x, or 3.5.x to 3.7.x
2. Install dependencies (prepend `sudo` if needed):
pip install requests
pip install tabulate
pip install future
Optional: install `matplotlib` (required for plotting in Python):
pip install matplotlib
3. At the command line, copy and paste these commands one line at a time:
# The following command removes the H2O module for Python.
pip uninstall h2o
# Next, use pip to install this version of the H2O Python module.
pip install /Python/h2o-3.38.0.3-py2.py3-none-any.whl
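Before or after running the pip commands, it can help to confirm that the interpreter matches the listed prerequisite and that the module is importable. This is a small sketch under those assumptions; `python_supported` and `h2o_installed` are hypothetical helper names, and the version range is taken directly from the prerequisite above.

```python
import sys


def python_supported(version_info=sys.version_info):
    """Check an interpreter version against the prerequisite listed above:
    Python 2.7.x, or 3.5.x to 3.7.x."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return major == 3 and 5 <= minor <= 7


def h2o_installed():
    """Return True if the `h2o` module can be imported in this environment."""
    try:
        import h2o  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True
```

If both checks pass, `import h2o` followed by `h2o.init()` connects to a running H2O server or starts a local one, mirroring the R steps further down.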
Conda Installation
Available at https://anaconda.org/h2oai/h2o/
To install this package with conda run:
conda install -c h2oai h2o
Use H2O directly from R
Copy and paste these commands into R one line at a time:
# The following two commands remove any previously installed H2O packages for R.
if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
# Next, we download packages that H2O depends on.
pkgs <- c("RCurl","jsonlite")
for (pkg in pkgs) {
    if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
}
# Now we download, install and initialize the H2O package for R.
install.packages("h2o", type="source", repos="/R")
# Finally, let's load H2O and start up an H2O cluster
library(h2o)
h2o.init()
Run H2O on Hadoop in just 3 steps
1. Download H2O for your version of Hadoop. This is a zip file that contains everything you need to get started.
2. Unpack the zip file and launch a single H2O node with 6 GB of mapper memory (-mapperXmx 6g):
unzip h2o-3.38.0.3-*.zip
cd h2o-3.38.0.3-*
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g
3. Point your browser to H2O (see "Open H2O Flow in your web browser" in the output below):
Determining driver host interface for mapper->driver callback...
[Possible callback IP address: 172.16.2.181]
[Possible callback IP address: 127.0.0.1]
...
Waiting for H2O cluster to come up...
H2O node 172.16.2.188:54321 requested flatfile
Sending flatfiles to nodes...
[Sending flatfile to node 172.16.2.188:54321]
H2O node 172.16.2.188:54321 reports H2O cluster size 1
H2O cluster (1 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
Open H2O Flow in your web browser: http://172.16.2.188:54321
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
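When the driver is launched from a script, the Flow address can be pulled out of its output rather than read by eye. The sketch below assumes the output contains the "Open H2O Flow in your web browser" line exactly as shown above; `find_flow_url` is a hypothetical helper name.

```python
import re

# Matches the line the driver prints once the cluster is up.
FLOW_LINE = re.compile(r"Open H2O Flow in your web browser:\s*(\S+)")


def find_flow_url(driver_output):
    """Return the Flow URL printed by the driver, or None if it is absent."""
    match = FLOW_LINE.search(driver_output)
    return match.group(1) if match else None


sample = """\
H2O cluster (1 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
Open H2O Flow in your web browser: http://172.16.2.188:54321
(Press Ctrl-C to kill the cluster)
"""
print(find_flow_url(sample))  # prints http://172.16.2.188:54321
```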
Gradle-style specification for Maven artifacts
See the h2o-droplets GitHub repository for a working example.
def h2oProjectVersion = "3.38.0.3"
repositories {
    maven {
        url "/maven/repo/"
    }
}
dependencies {
    compile "ai.h2o:h2o-algos:${h2oProjectVersion}"
    compile "ai.h2o:h2o-app:${h2oProjectVersion}"
    compile "ai.h2o:h2o-core:${h2oProjectVersion}"
    compile "ai.h2o:h2o-genmodel:${h2oProjectVersion}"
    compile "ai.h2o:h2o-persist-hdfs:${h2oProjectVersion}"
    compile "ai.h2o:h2o-web:${h2oProjectVersion}"
}
Set up H2O on Kubernetes using Helm
Helm can be used to deploy H2O into a Kubernetes cluster. Helm requires either that the KUBECONFIG environment variable be set correctly or that the kubeconfig location be stated explicitly. Refer to Helm's documentation for further information.
helm repo add h2o https://charts.h2o.ai
helm install basic-h2o h2o/h2o
helm test basic-h2o
Various settings can be modified. To inspect the available configuration options, run "helm inspect values h2o/h2o --version 3.38.0.3".
Set up H2O on Kubernetes with kubectl
1. Set up a Kubernetes cluster and kubectl.
2. (Optional) Adjust the 'default' namespace in the following YAML, if required.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: h2o-cluster-stateful-set
  namespace: default
spec:
  serviceName: h2o-service
  podManagementPolicy: "Parallel"
  replicas: 1
  selector:
    matchLabels:
      app: h2o-cluster
  template:
    metadata:
      labels:
        app: h2o-cluster
    spec:
      containers:
        - name: h2o-cluster
          image: 'h2oai/h2o-open-source-k8s:docker-image-version'
          command: ["/bin/bash", "-c", "java -XX:+UseContainerSupport -XX:MaxRAMPercentage=50 -jar /opt/h2oai/h2o-3/h2o.jar"]
          ports:
            - containerPort: 54321
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /kubernetes/isLeaderNode
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 1
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: '1'
              memory: 1Gi
          env:
            - name: H2O_KUBERNETES_SERVICE_DNS
              value: h2o-cluster-service.default.svc.cluster.local
            - name: H2O_NODE_LOOKUP_TIMEOUT
              value: '180'
            - name: H2O_NODE_EXPECTED_COUNT
              value: '1'
            - name: H2O_KUBERNETES_API_PORT
              value: '8081'
---
apiVersion: v1
kind: Service
metadata:
  name: h2o-cluster-service
  namespace: default
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: h2o-cluster
  ports:
    - protocol: TCP
      port: 80
      targetPort: 54321
Environment variables:
H2O_KUBERNETES_SERVICE_DNS - [MANDATORY] Crucial for the clustering to work. The format usually follows the {service-name}.{project-name}.svc.cluster.local pattern. This setting enables H2O node discovery via DNS. It must be modified to match the name of the headless service created. Also make sure the rest of the address matches the specifics of your Kubernetes implementation.
H2O_NODE_LOOKUP_TIMEOUT - [OPTIONAL] Node lookup constraint. Time before the node lookup ends.
H2O_NODE_EXPECTED_COUNT - [OPTIONAL] Node lookup constraint. Expected number of H2O pods to be discovered.
H2O_KUBERNETES_API_PORT - [OPTIONAL] Port for Kubernetes API checks and probes to listen on. Defaults to 8080.
3. Issue "kubectl apply -f filename.yaml" to deploy H2O into Kubernetes.
4. (Optional) Adjust the YAML file to spawn more nodes or allocate more resources for the H2O cluster.
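When scaling the cluster as in step 4, the environment variables need to stay consistent with the rest of the YAML. The sketch below derives them from a service name, namespace, and replica count. `h2o_k8s_env` is a hypothetical helper, the `svc.cluster.local` suffix follows the pattern described above, and setting H2O_NODE_EXPECTED_COUNT equal to the replica count is an assumption based on the sample YAML, where both are 1.

```python
def h2o_k8s_env(service_name, namespace, replicas,
                lookup_timeout=180, api_port=8081,
                cluster_domain="svc.cluster.local"):
    """Build the H2O container environment variables as a dict of strings.

    Hypothetical helper; defaults mirror the values in the sample YAML.
    """
    return {
        # {service-name}.{project-name}.svc.cluster.local
        "H2O_KUBERNETES_SERVICE_DNS": "{}.{}.{}".format(
            service_name, namespace, cluster_domain),
        "H2O_NODE_LOOKUP_TIMEOUT": str(lookup_timeout),
        # Assumption: expect as many H2O pods as the StatefulSet has replicas.
        "H2O_NODE_EXPECTED_COUNT": str(replicas),
        "H2O_KUBERNETES_API_PORT": str(api_port),
    }


env = h2o_k8s_env("h2o-cluster-service", "default", replicas=3)
for name in sorted(env):
    print("{}={}".format(name, env[name]))
```

The values are strings because Kubernetes requires environment variable values in a manifest to be strings, which is why the sample YAML quotes '180', '1', and '8081'.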