H2O Module

h2o.h2o Module

This module implements the REST communication layer for the Python <-> H2O connection.

h2o.h2o.ddply(frame, cols, fun)[source]
h2o.h2o.deeplearning(x, y, validation_x=None, validation_y=None, **kwargs)[source]

Build a supervised Deep Learning model (kwargs are the same arguments that you can find in FLOW)
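
A minimal sketch of a Deep Learning call, assuming h2o.init() has already been run and that train and test are H2OFrames with a response column named "label"; the kwargs hidden and epochs are ordinary Flow arguments used here purely for illustration:

dl_model = h2o.deeplearning(x=train.drop("label"),            # predictor columns
                            y=train["label"],                 # response column
                            validation_x=test.drop("label"),  # validation predictors
                            validation_y=test["label"],       # validation response
                            hidden=[200, 200],                # two hidden layers of 200 units
                            epochs=10)                        # passes over the training data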

h2o.h2o.frame(key)[source]

Retrieve metadata for a key that points to a Frame.

Parameters: key – A pointer to a Frame in H2O.
Returns: Meta information on the frame.
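
For example, assuming a Frame named "my_frame" already lives in the cluster (the key name is hypothetical):

meta = h2o.frame("my_frame")   # fetch the frame's metadata from the cluster
print(meta)                    # e.g. column names, types, and row counts
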
h2o.h2o.gbm(x, y, validation_x=None, validation_y=None, **kwargs)[source]

Build a Gradient Boosting Machine (GBM) model (kwargs are the same arguments that you can find in FLOW)

h2o.h2o.glm(x, y, validation_x=None, validation_y=None, **kwargs)[source]

Build a Generalized Linear Model (kwargs are the same arguments that you can find in FLOW)
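
A hedged sketch, assuming train and test are H2OFrames with a binary response column named "label"; family is a standard Flow GLM argument, shown here for illustration:

glm_model = h2o.glm(x=train.drop("label"),
                    y=train["label"],
                    validation_x=test.drop("label"),
                    validation_y=test["label"],
                    family="binomial")   # logistic regression for a 0/1 response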

h2o.h2o.import_file(path)[source]

Import a single file or collection of files.

Parameters: path – A path to a data file (remote or local).
Returns: A new H2OFrame.
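
For example (the path below is illustrative, assuming h2o.init() has already been called):

fr = h2o.import_file(path="hdfs://192.168.1.10/user/data/data_train.csv")
# fr is a new H2OFrame backed by the parsed data in the cluster
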
h2o.h2o.import_frame(path=None, vecs=None)[source]

Import a frame from a file on the local machine or a remote location. If you run H2O on Hadoop, you can also access files on HDFS.

Parameters: path – A path specifying the location of the data to import.
Returns: A new H2OFrame.
h2o.h2o.init(ip='localhost', port=54321)[source]

Initiate an H2O connection to the specified ip and port.

Parameters:
  • ip – An IP address, default is “localhost”.
  • port – A port, default is 54321.
Returns:

None

h2o.h2o.kmeans(x, validation_x=None, **kwargs)[source]

Build a K-Means model (kwargs are the same arguments that you can find in FLOW)
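
A small sketch, assuming fr is an H2OFrame of numeric columns; k is the usual Flow K-Means argument, shown here for illustration:

km_model = h2o.kmeans(x=fr,   # clustering uses all columns of fr
                      k=3)    # number of clusters to find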

h2o.h2o.locate(path)[source]

Search for a relative path and turn it into an absolute path. This is handy when hunting for data files to be passed into H2O via import_file. Note: this function is intended for unit-testing purposes only.

Parameters: path – Path to search for.
Returns: The absolute path if it is found; None otherwise.
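
For example, inside a unit test (the relative path is illustrative):

path = h2o.locate("smalldata/iris/iris.csv")   # resolve the relative path to an absolute one
fr = h2o.import_file(path=path)                # then hand it to import_file
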
h2o.h2o.network_test()[source]
h2o.h2o.parse(setup, h2o_name, first_line_is_header=(-1, 0, 1))[source]

Trigger a parse; this call blocks until the parse completes. The raw import frame is removed, keeping only the Vec keys. A combined usage sketch appears after parse_setup below.

Parameters:
  • setup – The result of calling parse_setup.
  • h2o_name – The name of the H2O Frame on the back end.
  • first_line_is_header – -1 means data, 0 means guess, 1 means header.
Returns:

A new parsed object

h2o.h2o.parse_setup(rawkey)[source]
Parameters: rawkey – A collection of imported file keys.
Returns: A ParseSetup “object”
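
A combined sketch of parse_setup followed by parse, assuming raw_keys already holds a collection of imported file keys (how those keys were obtained is outside this snippet) and "my_frame" is a hypothetical destination name:

setup = h2o.parse_setup(raw_keys)             # let H2O guess the parse configuration
parsed = h2o.parse(setup, "my_frame",         # name of the resulting Frame on the back end
                   first_line_is_header=0)    # 0 = let H2O guess whether a header is present
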
h2o.h2o.rapids(expr)[source]

Fire off a Rapids expression.

Parameters: expr – The Rapids expression (an ASCII string).
Returns: The JSON response of the Rapids execution.
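
A hedged example of the call shape; the expression string below is only illustrative and not a statement of the Rapids grammar:

resp = h2o.rapids("(+ #1 #2)")   # fire an ASCII Rapids expression at the cluster
print(resp)                      # JSON response describing the result
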
h2o.h2o.remove(key)[source]

Remove key from H2O.

Parameters: key – The key pointing to the object to be removed.
Returns: Void
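
For example (the key name is hypothetical):

h2o.remove("my_frame")   # delete the object behind the key "my_frame" from the cluster
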
h2o.h2o.run_test(sys_args, test_to_run)[source]
h2o.h2o.upload_file(path, destination_key='')[source]

Upload a dataset at the path given from the local machine to the H2O cluster.

Parameters:
  • path – A path specifying the location of the data to upload.
  • destination_key – The name of the H2O Frame in the H2O Cluster.
Returns:

A new H2OFrame
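
For example (the local path and destination key are illustrative):

fr = h2o.upload_file(path="/home/user/data_train.csv",   # file on the local machine
                     destination_key="data_train.hex")   # name of the Frame in the cluster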

Example

Here is a small example (H2O on Hadoop) :

import h2o
h2o.init(ip="192.168.1.10", port=54321)
--------------------------  ------------------------------------
H2O cluster uptime:         2 minutes 1 seconds 966 milliseconds
H2O cluster version:        0.1.27.1064
H2O cluster name:           H2O_96762
H2O cluster total nodes:    4
H2O cluster total memory:   38.34 GB
H2O cluster total cores:    16
H2O cluster allowed cores:  80
H2O cluster healthy:        True
--------------------------  ------------------------------------
pathDataTrain = ["hdfs://192.168.1.10/user/data/data_train.csv"]
pathDataTest = ["hdfs://192.168.1.10/user/data/data_test.csv"]
trainFrame = h2o.import_frame(path=pathDataTrain)
testFrame = h2o.import_frame(path=pathDataTest)

#Parse Progress: [##################################################] 100%
#Imported ['hdfs://192.168.1.10/user/data/data_train.csv'] into cluster with 60000 rows and 500 cols

#Parse Progress: [##################################################] 100%
#Imported ['hdfs://192.168.1.10/user/data/data_test.csv'] into cluster with 10000 rows and 500 cols

trainFrame[499]._name = "label"
testFrame[499]._name = "label"

model = h2o.gbm(x=trainFrame.drop("label"),
      y=trainFrame["label"],
      validation_x=testFrame.drop("label"),
      validation_y=testFrame["label"],
      ntrees=100,
      max_depth=10
      )

#gbm Model Build Progress: [##################################################] 100%

predictFrame = model.predict(testFrame)
model.model_performance(testFrame)
