H2O Module

h2o.h2o Module

This module implements the REST communication layer for the Python <-> H2O connection.

h2o.h2o.ddply(frame, cols, fun)[source]
h2o.h2o.deeplearning(x, y, validation_x=None, validation_y=None, **kwargs)[source]

Build a supervised Deep Learning model (kwargs are the same arguments that you can find in FLOW)
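
A minimal sketch of a Deep Learning call, assuming h2o.init() has already been run and that train and test are H2OFrames with a response column named "label"; the kwargs hidden and epochs are ordinary Flow arguments used here purely for illustration:

dl_model = h2o.deeplearning(x=train.drop("label"),            # predictor columns
                            y=train["label"],                 # response column
                            validation_x=test.drop("label"),  # validation predictors
                            validation_y=test["label"],       # validation response
                            hidden=[200, 200],                # two hidden layers of 200 units
                            epochs=10)                        # passes over the training data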

h2o.h2o.frame(key)[source]

Retrieve metadata for a key that points to a Frame.

Parameters: key – A pointer to a Frame in H2O.
Returns: Meta information on the frame.
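
For example, assuming a Frame named "my_frame" already lives in the cluster (the key name is hypothetical):

meta = h2o.frame("my_frame")   # fetch the frame's metadata from the cluster
print(meta)                    # e.g. column names, types, and row counts
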
h2o.h2o.gbm(x, y, validation_x=None, validation_y=None, **kwargs)[source]

Build a Gradient Boosting Machine (GBM) model (kwargs are the same arguments that you can find in FLOW)

h2o.h2o.glm(x, y, validation_x=None, validation_y=None, **kwargs)[source]

Build a Generalized Linear Model (kwargs are the same arguments that you can find in FLOW)
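
A hedged sketch, assuming train and test are H2OFrames with a binary response column named "label"; family is a standard Flow GLM argument, shown here for illustration:

glm_model = h2o.glm(x=train.drop("label"),
                    y=train["label"],
                    validation_x=test.drop("label"),
                    validation_y=test["label"],
                    family="binomial")   # logistic regression for a 0/1 response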

h2o.h2o.import_file(path)[source]

Import a single file or collection of files.

Parameters: path – A path to a data file (remote or local).
Returns: A new H2OFrame.
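
For example (the path below is illustrative, assuming h2o.init() has already been called):

fr = h2o.import_file(path="hdfs://192.168.1.10/user/data/data_train.csv")
# fr is a new H2OFrame backed by the parsed data in the cluster
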
h2o.h2o.import_frame(path=None, vecs=None)[source]

Import a frame from a file on the local machine or a remote location. If you run H2O on Hadoop, you can also access files on HDFS.

Parameters: path – A path specifying the location of the data to import.
Returns: A new H2OFrame.
h2o.h2o.init(ip='localhost', port=54321)[source]

Initiate an H2O connection to the specified ip and port.

Parameters:
  • ip – An IP address, default is “localhost”.
  • port – A port, default is 54321.
Returns:

None

h2o.h2o.kmeans(x, validation_x=None, **kwargs)[source]

Build a K-Means model (kwargs are the same arguments that you can find in FLOW)
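
A small sketch, assuming fr is an H2OFrame of numeric columns; k is the usual Flow K-Means argument, shown here for illustration:

km_model = h2o.kmeans(x=fr,   # clustering uses all columns of fr
                      k=3)    # number of clusters to find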

h2o.h2o.locate(path)[source]

Search for a relative path and turn it into an absolute path. This is handy when hunting for data files to be passed into H2O via import_file. Note: this function is intended for unit-testing purposes only.

Parameters: path – Path to search for.
Returns: The absolute path if it is found; None otherwise.
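
For example, inside a unit test (the relative path is illustrative):

path = h2o.locate("smalldata/iris/iris.csv")   # resolve the relative path to an absolute one
fr = h2o.import_file(path=path)                # then hand it to import_file
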
h2o.h2o.network_test()[source]
h2o.h2o.parse(setup, h2o_name, first_line_is_header=(-1, 0, 1))[source]

Trigger a parse; this call blocks until the parse completes. The raw import frame is removed, keeping only the Vec keys. A combined usage sketch appears after parse_setup below.

Parameters:
  • setup – The result of calling parse_setup.
  • h2o_name – The name of the H2O Frame on the back end.
  • first_line_is_header – -1 means data, 0 means guess, 1 means header.
Returns:

A new parsed object

h2o.h2o.parse_setup(rawkey)[source]
Parameters: rawkey – A collection of imported file keys.
Returns: A ParseSetup “object”
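
A combined sketch of parse_setup followed by parse, assuming raw_keys already holds a collection of imported file keys (how those keys were obtained is outside this snippet) and "my_frame" is a hypothetical destination name:

setup = h2o.parse_setup(raw_keys)             # let H2O guess the parse configuration
parsed = h2o.parse(setup, "my_frame",         # name of the resulting Frame on the back end
                   first_line_is_header=0)    # 0 = let H2O guess whether a header is present
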
h2o.h2o.rapids(expr)[source]

Fire off a Rapids expression.

Parameters: expr – The Rapids expression (an ASCII string).
Returns: The JSON response of the Rapids execution.
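
A hedged example of the call shape; the expression string below is only illustrative and not a statement of the Rapids grammar:

resp = h2o.rapids("(+ #1 #2)")   # fire an ASCII Rapids expression at the cluster
print(resp)                      # JSON response describing the result
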
h2o.h2o.remove(key)[source]

Remove key from H2O.

Parameters: key – The key pointing to the object to be removed.
Returns: Void
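
For example (the key name is hypothetical):

h2o.remove("my_frame")   # delete the object behind the key "my_frame" from the cluster
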
h2o.h2o.run_test(sys_args, test_to_run)[source]
h2o.h2o.upload_file(path, destination_key='')[source]

Upload a dataset at the path given from the local machine to the H2O cluster.

Parameters:
  • path – A path specifying the location of the data to upload.
  • destination_key – The name of the H2O Frame in the H2O Cluster.
Returns:

A new H2OFrame
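
For example (the local path and destination key are illustrative):

fr = h2o.upload_file(path="/home/user/data_train.csv",   # file on the local machine
                     destination_key="data_train.hex")   # name of the Frame in the cluster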

Example

Here is a small example (H2O on Hadoop) :

import h2o
h2o.init(ip="192.168.1.10", port=54321)
--------------------------  ------------------------------------
H2O cluster uptime:         2 minutes 1 seconds 966 milliseconds
H2O cluster version:        0.1.27.1064
H2O cluster name:           H2O_96762
H2O cluster total nodes:    4
H2O cluster total memory:   38.34 GB
H2O cluster total cores:    16
H2O cluster allowed cores:  80
H2O cluster healthy:        True
--------------------------  ------------------------------------
pathDataTrain = ["hdfs://192.168.1.10/user/data/data_train.csv"]
pathDataTest = ["hdfs://192.168.1.10/user/data/data_test.csv"]
trainFrame = h2o.import_frame(path=pathDataTrain)
testFrame = h2o.import_frame(path=pathDataTest)

#Parse Progress: [##################################################] 100%
#Imported ['hdfs://192.168.1.10/user/data/data_train.csv'] into cluster with 60000 rows and 500 cols

#Parse Progress: [##################################################] 100%
#Imported ['hdfs://192.168.1.10/user/data/data_test.csv'] into cluster with 10000 rows and 500 cols

trainFrame[499]._name = "label"
testFrame[499]._name = "label"

model = h2o.gbm(x=trainFrame.drop("label"),
      y=trainFrame["label"],
      validation_x=testFrame.drop("label"),
      validation_y=testFrame["label"],
      ntrees=100,
      max_depth=10
      )

#gbm Model Build Progress: [##################################################] 100%

predictFrame = model.predict(testFrame)
model.model_performance(testFrame)
