This module implements the REST communication layer for the Python <-> H2O connection.
Build a supervised Deep Learning model (kwargs are the same arguments that you can find in FLOW)
Retrieve metadata for a key that points to a Frame.
Parameters: | key – A pointer to a Frame in H2O. |
---|---|
Returns: | Meta information on the frame |
Build a Gradient Boosting Machine (GBM) model (kwargs are the same arguments that you can find in FLOW)
Build a Generalized Linear Model (kwargs are the same arguments that you can find in FLOW)
Import a single file or collection of files.
Parameters: | path – A path to a data file (remote or local). |
---|---|
Returns: | A new H2OFrame |
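Because the path argument may name a single file or a collection of files, a caller-side helper can normalize both forms before use. The following is a minimal sketch only: `as_path_list` is a hypothetical name that is not part of the h2o module, and it does not reflect how H2O handles the argument internally.

```python
def as_path_list(path):
    """Return the path argument as a list of path strings.

    Hypothetical helper for illustration; not part of the h2o module.
    """
    if isinstance(path, str):
        # A single path becomes a one-element list.
        return [path]
    # Any other iterable of paths (list, tuple, ...) is copied into a list.
    return list(path)
```

With this helper, a single string and a list of strings can be treated the same way by the calling code.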
Import a frame from a file (remote or local machine). If you run H2O on Hadoop, you can access HDFS.
Parameters: | path – A path specifying the location of the data to import. |
---|---|
Returns: | A new H2OFrame |
Initiate an H2O connection to the specified ip and port.
Parameters: |
|
---|---|
Returns: | None |
Build a KMeans model (kwargs are the same arguments that you can find in FLOW)
Search for a relative path and turn it into an absolute path. This is handy when hunting for data files to be passed into h2o and used by import file. Note: This function is for unit testing purposes only.
Parameters: | path – Path to search for |
---|---|
Returns: | Absolute path if it is found. None otherwise. |
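One way such a search could work is to try the relative path against a starting directory and then against each of its ancestors. The sketch below assumes that strategy; `locate_sketch` and its `start_dir` parameter are hypothetical names, and the code is not taken from the h2o module.

```python
import os


def locate_sketch(path, start_dir=None):
    """Search start_dir and each of its ancestors for a relative path.

    Hypothetical illustration; not the h2o implementation.
    Returns the absolute path of the first match, or None if nothing is found.
    """
    current = os.path.abspath(start_dir or os.getcwd())
    while True:
        candidate = os.path.join(current, path)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding the path.
            return None
        current = parent
```

Returning None on failure, rather than raising, mirrors the documented return value and lets a test decide how to handle a missing data file.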
Trigger a parse. The call blocks until the parse completes; removeFrame just keeps the Vec keys.
Parameters: |
|
---|---|
Returns: | A new parsed object |
Parameters: | rawkey – A collection of imported file keys |
---|---|
Returns: | A ParseSetup “object” |
Fire off a Rapids expression.
Parameters: | expr – The Rapids expression (ASCII string). |
---|---|
Returns: | The JSON response of the Rapids execution |
Here is a small example (H2O on Hadoop):
import h2o
h2o.init(ip="192.168.1.10", port=54321)
-------------------------- ------------------------------------
H2O cluster uptime: 2 minutes 1 seconds 966 milliseconds
H2O cluster version: 0.1.27.1064
H2O cluster name: H2O_96762
H2O cluster total nodes: 4
H2O cluster total memory: 38.34 GB
H2O cluster total cores: 16
H2O cluster allowed cores: 80
H2O cluster healthy: True
-------------------------- ------------------------------------
pathDataTrain = ["hdfs://192.168.1.10/user/data/data_train.csv"]
pathDataTest = ["hdfs://192.168.1.10/user/data/data_test.csv"]
trainFrame = h2o.import_frame(path=pathDataTrain)
testFrame = h2o.import_frame(path=pathDataTest)
#Parse Progress: [##################################################] 100%
#Imported ['hdfs://192.168.1.10/user/data/data_train.csv'] into cluster with 60000 rows and 500 cols
#Parse Progress: [##################################################] 100%
#Imported ['hdfs://192.168.1.10/user/data/data_test.csv'] into cluster with 10000 rows and 500 cols
trainFrame[499]._name = "label"
testFrame[499]._name = "label"
model = h2o.gbm(x=trainFrame.drop("label"),
                y=trainFrame["label"],
                validation_x=testFrame.drop("label"),
                validation_y=testFrame["label"],
                ntrees=100,
                max_depth=10)
#gbm Model Build Progress: [##################################################] 100%
predictFrame = model.predict(testFrame)
model.model_performance(testFrame)