Modeling In H2O

H2OEstimator

class h2o.estimators.estimator_base.H2OEstimator

Bases: h2o.model.model_base.ModelBase

H2O Estimators

H2O Estimators implement the following methods for model construction:
  • start - Top-level user-facing API for an asynchronous model build.
  • join - Top-level user-facing API for blocking on an asynchronous model build.
  • train - Top-level user-facing API for model building.
  • fit - Used by scikit-learn.

Because H2OEstimator instances are instances of ModelBase, these objects can use the H2O model API.

fit(X, y=None, **params)

Fit an H2O model as part of a scikit-learn pipeline or grid search.

A warning will be issued if a caller other than sklearn attempts to use this method.

Parameters:

X : H2OFrame

An H2OFrame consisting of the predictor variables.

y : H2OFrame, optional

An H2OFrame consisting of the response variable.

**params : optional

Extra arguments.

Returns:

The current instance of H2OEstimator for method chaining.
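Conceptually, an sklearn-style fit(X, y) call can be mapped onto train(x, y, training_frame): the columns of X become the predictors, the single column of y becomes the response, and the two are combined column-wise into one training frame. The sketch below is a simplified, hypothetical stand-in that uses plain dicts of lists instead of H2OFrame; ToyEstimator is not part of H2O and this is not H2O's actual implementation.

```python
class ToyEstimator:
    """Hypothetical stand-in for an H2O estimator (not the real H2O class)."""

    def train(self, x, y=None, training_frame=None, **params):
        # Record what a real estimator would receive from train().
        self.trained_with = {"x": x, "y": y, "frame": training_frame, "params": params}

    def fit(self, X, y=None, **params):
        # Combine predictor and response columns into one training frame,
        # analogous to column-binding two H2OFrames.
        frame = dict(X)
        response = None
        if y is not None:
            response = next(iter(y))  # y is expected to hold exactly one column
            frame.update(y)
        self.train(x=list(X), y=response, training_frame=frame, **params)
        return self  # returning self enables method chaining


X = {"a": [1, 2, 3], "b": [4, 5, 6]}
y = {"label": [0, 1, 0]}
est = ToyEstimator().fit(X, y)
print(est.trained_with["x"], est.trained_with["y"])  # ['a', 'b'] label
```

Returning self is what lets sklearn write chained calls such as est.fit(X, y).predict(...).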

get_params(deep=True)

Obtain the parameters for this estimator. Used primarily by sklearn Pipelines and sklearn grid search.

Parameters:

deep : bool, optional

If True, return parameters of all sub-objects that are estimators.

Returns:

A dict mapping parameter names to their values.

set_params(**parms)

Used by sklearn for updating parameters during grid search.

Parameters:

parms : dict

A dictionary of parameters that will be set on this model.

Returns:

self, the current estimator object with the requested parameters set.
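The reason sklearn needs get_params and set_params is that a grid search builds a fresh estimator for each candidate by reading the current parameters and overriding some of them. The sketch below imitates that contract with a hypothetical ToyEstimator and a simplified clone helper; sklearn's real sklearn.base.clone does more validation, and neither class here is H2O code.

```python
class ToyEstimator:
    """Hypothetical estimator following the sklearn get_params/set_params contract."""

    def __init__(self, ntrees=50, max_depth=5):
        self._params = {"ntrees": ntrees, "max_depth": max_depth}

    def get_params(self, deep=True):
        # `deep` is accepted for compatibility; this toy has no sub-estimators.
        # Return a copy so callers cannot mutate internal state by accident.
        return dict(self._params)

    def set_params(self, **parms):
        self._params.update(parms)
        return self  # returning self allows chaining, as sklearn expects


def clone(estimator):
    # Roughly how a grid search builds fresh copies: construct a new
    # instance of the same class from the current parameters.
    return type(estimator)(**estimator.get_params())


base = ToyEstimator()
candidate = clone(base).set_params(ntrees=200)
print(base.get_params()["ntrees"], candidate.get_params()["ntrees"])  # 50 200
```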

start(x, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, **params)

Build a model asynchronously by specifying the predictor columns, response column, and any additional frame-specific values.

To block until the model is built, call join.
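The start/join split follows the familiar submit-then-wait pattern. The sketch below imitates it locally with Python's concurrent.futures; AsyncJob and build_model are hypothetical names, and a real H2O build runs on the H2O cluster rather than in a local thread. With an actual estimator the corresponding calls are model.start(x=..., y=..., training_frame=...) followed later by model.join().

```python
from concurrent.futures import ThreadPoolExecutor
import time


def build_model(n):
    """Hypothetical stand-in for a long-running model build."""
    time.sleep(0.1)       # simulate work
    return sum(range(n))  # pretend this is the trained model


class AsyncJob:
    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._future = None

    def start(self, n):
        # Submit the work and return immediately (non-blocking).
        self._future = self._executor.submit(build_model, n)

    def join(self):
        # Block until the submitted work finishes, then return its result.
        try:
            return self._future.result()
        finally:
            self._executor.shutdown()


job = AsyncJob()
job.start(10)
# ... other work can happen here while the build runs ...
print(job.join())  # 45
```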

Parameters:

x : list

A list of column names or indices indicating the predictor columns.

y : str or int

A column name or an index indicating the response column.

training_frame : H2OFrame

The H2OFrame containing the columns indicated by x and y (as well as any additional columns specified by the fold, offset, and weights columns).

offset_column : str, optional

The name or index of the column in training_frame that holds the offsets.

fold_column : str, optional

The name or index of the column in training_frame that holds the per-row fold assignments.

weights_column : str, optional

The name or index of the column in training_frame that holds the per-row weights.

validation_frame : H2OFrame, optional

H2OFrame with validation data to be scored on while training.
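fold_column expects one fold identifier per row of training_frame. A simple way to produce such identifiers is round-robin (modulo) assignment, sketched below in plain Python with a hypothetical helper; the resulting list could then be added as a column of the training frame and that column's name passed as fold_column.

```python
def modulo_folds(n_rows, n_folds):
    """Assign row i to fold i % n_folds (hypothetical helper)."""
    if n_folds < 2:
        raise ValueError("need at least two folds")
    return [i % n_folds for i in range(n_rows)]


folds = modulo_folds(7, 3)
print(folds)  # [0, 1, 2, 0, 1, 2, 0]
```

Round-robin assignment is deterministic, so a row keeps its fold across runs; a shuffled assignment is the usual alternative when the row order carries structure.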

train(x, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, max_runtime_secs=None, **params)

Train the H2O model by specifying the predictor columns, response column, and any additional frame-specific values.

Parameters:

x : list

A list of column names or indices indicating the predictor columns.

y : str or int

A column name or an index indicating the response column.

training_frame : H2OFrame

The H2OFrame containing the columns indicated by x and y (as well as any additional columns specified by the fold, offset, and weights columns).

offset_column : str, optional

The name or index of the column in training_frame that holds the offsets.

fold_column : str, optional

The name or index of the column in training_frame that holds the per-row fold assignments.

weights_column : str, optional

The name or index of the column in training_frame that holds the per-row weights.

validation_frame : H2OFrame, optional

H2OFrame with validation data to be scored on while training.

max_runtime_secs : float, optional

Maximum allowed runtime in seconds for model training. Use 0 to disable.
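In GLM-style models an offset is conventionally added to the linear predictor before the inverse link is applied, and per-row weights scale each row's contribution to the loss, so that an integer weight behaves like repeating the row. The plain-Python sketch below illustrates these two general conventions; it is not H2O's implementation, and poisson_mean and weighted_mse are hypothetical helpers.

```python
import math


def poisson_mean(x, beta, intercept, offset):
    # Linear predictor plus offset, then the inverse of the log link.
    eta = intercept + sum(b * xi for b, xi in zip(beta, x)) + offset
    return math.exp(eta)


def weighted_mse(y_true, y_pred, weights):
    # Each squared error counts `weight` times.
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# An offset of log(exposure) turns a per-unit rate into an expected count.
expected = poisson_mean([0.0], [0.3], intercept=math.log(2.0), offset=math.log(5.0))
print(round(expected, 6))  # 10.0

# Weight 2 on the first row matches duplicating that row.
print(weighted_mse([1, 3], [0, 3], [2, 1]) == weighted_mse([1, 1, 3], [0, 0, 3], [1, 1, 1]))  # True
```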

H2ODeepLearningEstimator

class h2o.estimators.deeplearning.H2ODeepLearningEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator

Examples

>>> import h2o as ml
>>> from h2o.estimators.deeplearning import H2ODeepLearningEstimator
>>> ml.init()
>>> rows = [[1,2,3,4,0], [2,1,2,4,1], [2,1,4,2,1], [0,1,2,34,1], [2,3,4,1,0]] * 50
>>> fr = ml.H2OFrame(rows)
>>> fr[4] = fr[4].asfactor()
>>> model = H2ODeepLearningEstimator()
>>> model.train(x=list(range(4)), y=4, training_frame=fr)  # x: list of column indices

H2OAutoEncoderEstimator

class h2o.estimators.deeplearning.H2OAutoEncoderEstimator(**kwargs)

Bases: h2o.estimators.deeplearning.H2ODeepLearningEstimator

Examples

>>> import h2o as ml
>>> from h2o.estimators.deeplearning import H2OAutoEncoderEstimator
>>> ml.init()
>>> rows = [[1,2,3,4,0], [2,1,2,4,1], [2,1,4,2,1], [0,1,2,34,1], [2,3,4,1,0]] * 50  # 250 rows x 5 columns
>>> fr = ml.H2OFrame(rows)
>>> fr[4] = fr[4].asfactor()
>>> model = H2OAutoEncoderEstimator()
>>> model.train(x=list(range(4)), training_frame=fr)  # unsupervised: no y
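An autoencoder is trained to reproduce its inputs, so a common anomaly score is each row's reconstruction error. The sketch below computes per-row mean squared reconstruction error in plain Python from made-up input rows and hypothetical reconstructions. H2O's autoencoder models expose an anomaly method for this purpose, but confirm its name and options against your H2O version.

```python
def reconstruction_mse(original, reconstructed):
    """Per-row mean squared error between inputs and their reconstructions."""
    scores = []
    for row, recon in zip(original, reconstructed):
        scores.append(sum((a - b) ** 2 for a, b in zip(row, recon)) / len(row))
    return scores


original = [[1, 2, 3, 4], [0, 1, 2, 34]]
reconstructed = [[1, 2, 3, 4], [0, 1, 2, 4]]  # hypothetical model output
scores = reconstruction_mse(original, reconstructed)
print(scores)  # [0.0, 225.0]
```

Rows the model reconstructs poorly, such as the one containing the outlying value 34, receive high scores.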

H2ORandomForestEstimator

class h2o.estimators.random_forest.H2ORandomForestEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator

H2OGradientBoostingEstimator

class h2o.estimators.gbm.H2OGradientBoostingEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator

H2OGeneralizedLinearEstimator

class h2o.estimators.glm.H2OGeneralizedLinearEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator

Returns:

A subclass of ModelBase. The specific subclass depends on the machine learning task at hand: binomial classification returns an H2OBinomialModel, and regression returns an H2ORegressionModel. The default print-out of the model is shown, but further GLM-specific information can be queried from the object. Upon completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, AIC, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices.

Lambda

[DEPRECATED] Use self.lambda_ instead.

static getGLMRegularizationPath(model)

Extract the full regularization path explored during lambda search from a GLM model.

Parameters:

model

Source model trained with lambda search.
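The regularization path pairs each lambda value tried during lambda search with the fit obtained at that lambda. The sketch below picks the lambda with the highest validation explained deviance from a path-like dict. The key names (lambdas, explained_deviance_valid, coefficients) are an assumption about the returned structure and should be checked against what your H2O version actually returns; the numbers are made up.

```python
def best_lambda(path):
    """Return (lambda, coefficients) with the highest validation explained deviance."""
    scores = path["explained_deviance_valid"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return path["lambdas"][best], path["coefficients"][best]


# Made-up path for illustration; in practice this would come from
# H2OGeneralizedLinearEstimator.getGLMRegularizationPath(model).
path = {
    "lambdas": [1.0, 0.1, 0.01],
    "explained_deviance_valid": [0.20, 0.45, 0.43],
    "coefficients": [
        {"Intercept": 0.5, "x1": 0.0},
        {"Intercept": 0.2, "x1": 0.8},
        {"Intercept": 0.1, "x1": 1.1},
    ],
}
lam, coefs = best_lambda(path)
print(lam, coefs["x1"])  # 0.1 0.8
```

The selected coefficient dict is the kind of input that makeGLMModel accepts as coefs.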

lambda_

Regularization strength (the GLM lambda parameter). The trailing underscore is needed because lambda is a reserved keyword in Python.
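Because lambda is a reserved word, it cannot be used as a keyword argument or attribute name, which is why the parameter is spelled lambda_ (for example H2OGeneralizedLinearEstimator(lambda_=0.01)). The standard-library check below shows the difference between the two names.

```python
import keyword

# `lambda` is reserved, so a call like f(lambda=0.1) is a SyntaxError.
print(keyword.iskeyword("lambda"))   # True
print(keyword.iskeyword("lambda_"))  # False

# A trailing underscore gives a legal name that still reads as "lambda".
params = dict(lambda_=0.01, alpha=0.5)
print(params)  # {'lambda_': 0.01, 'alpha': 0.5}
```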

static makeGLMModel(model, coefs, threshold=0.5)

Create a custom GLM model using the given coefficients. Requires a source model trained on the same dataset, from which the dataset information is extracted.

Parameters:

model

Source model, used for extracting dataset information.

coefs : dict

Dictionary containing the model coefficients.

threshold : float, optional

Decision threshold used for classification (binomial models only). Defaults to 0.5.
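For a binomial GLM, coefficients plus a threshold determine the predicted class: the linear predictor is passed through the logistic function and the resulting probability is compared with the threshold. The plain-Python sketch below illustrates that scoring rule for a coefficient dict keyed by column name with an "Intercept" entry. The helper and coefficient values are hypothetical, and the sketch treats p >= threshold as the positive class; H2O's handling of exact ties is not covered here.

```python
import math


def predict_binomial(row, coefs, threshold=0.5):
    """Score one row with logistic-regression coefficients (hypothetical helper)."""
    eta = coefs.get("Intercept", 0.0)
    eta += sum(coefs[name] * value for name, value in row.items())
    p = 1.0 / (1.0 + math.exp(-eta))
    # In this sketch, probabilities at or above the threshold map to class 1.
    return p, int(p >= threshold)


coefs = {"Intercept": -1.0, "x1": 2.0}  # made-up coefficients
p, label = predict_binomial({"x1": 0.0}, coefs)
print(round(p, 3), label)  # 0.269 0
p, label = predict_binomial({"x1": 0.0}, coefs, threshold=0.2)
print(round(p, 3), label)  # 0.269 1
```

Lowering the threshold changes the predicted class without changing the probability, which is why threshold is a separate argument from coefs.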

H2OGeneralizedLowRankEstimator

class h2o.estimators.glrm.H2OGeneralizedLowRankEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator

H2OKMeansEstimator

class h2o.estimators.kmeans.H2OKMeansEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator

H2ONaiveBayesEstimator

class h2o.estimators.naive_bayes.H2ONaiveBayesEstimator(**kwargs)

Bases: h2o.estimators.estimator_base.H2OEstimator