Model Categories

class h2o.model.H2OAutoEncoderModel

Bases: h2o.model.model_base.ModelBase

Class for AutoEncoder models.

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
anomaly(test_data[, per_feature]) Obtain the reconstruction error for the input test_data.
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients.
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() Return True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(es).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated; use scoring_history instead.
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys() Return the model keys for the cross-validated model.

anomaly(test_data, per_feature=False)

Obtain the reconstruction error for the input test_data.

Parameters:

test_data : H2OFrame

The dataset upon which the reconstruction error is computed.

per_feature : bool

If True, return the squared reconstruction error per feature. Otherwise, return the mean squared error.

Returns:

The reconstruction error.
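The per_feature switch can be illustrated with a minimal sketch in plain Python. This is not H2O code: it assumes the reconstruction error is the squared difference between each input value and its reconstruction, and the helper name reconstruction_error and the sample numbers are hypothetical.

```python
def reconstruction_error(original, reconstructed, per_feature=False):
    """Squared reconstruction error for a single row (illustrative only)."""
    squared = [(x - r) ** 2 for x, r in zip(original, reconstructed)]
    if per_feature:
        # One squared error per input feature.
        return squared
    # Otherwise collapse to the mean squared error over all features.
    return sum(squared) / len(squared)


row = [1.0, 2.0, 3.0]        # hypothetical input row
rebuilt = [1.0, 2.5, 2.0]    # hypothetical reconstruction of that row

per_feature_error = reconstruction_error(row, rebuilt, per_feature=True)
mean_error = reconstruction_error(row, rebuilt)
```

With per_feature=True the result keeps one value per feature, so a single poorly reconstructed column stays visible instead of being averaged away.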

class h2o.model.H2OBinomialModel

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

F0point5([thresholds, train, valid, xval]) Get the F0.5 for a set of thresholds.
F1([thresholds, train, valid, xval]) Get the F1 value for a set of thresholds
F2([thresholds, train, valid, xval]) Get the F2 for a set of thresholds.
accuracy([thresholds, train, valid, xval]) Get the accuracy for a set of thresholds.
aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients.
confusion_matrix([metrics, thresholds, ...]) Get the confusion matrix for the specified metrics/thresholds
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
error([thresholds, train, valid, xval]) Get the error for a set of thresholds.
fallout([thresholds, train, valid, xval]) Get the Fallout (AKA False Positive Rate) for a set of thresholds.
find_idx_by_threshold(threshold[, train, ...]) Retrieve the index in this metric’s threshold list at which the given threshold is located.
find_threshold_by_max_metric(metric[, ...]) Retrieve the threshold at which the given metric is maximized.
fnr([thresholds, train, valid, xval]) Get the False Negative Rates for a set of thresholds.
fpr([thresholds, train, valid, xval]) Get the False Positive Rates for a set of thresholds.
gains_lift([train, valid, xval]) Get the Gains/Lift table for the specified metrics
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() Return True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(es).
max_per_class_error([thresholds, train, ...]) Get the max per class error for a set of thresholds.
mcc([thresholds, train, valid, xval]) Get the mcc for a set of thresholds.
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
metric(metric[, thresholds, train, valid, xval]) Get the metric value for a set of thresholds.
missrate([thresholds, train, valid, xval]) Get the miss rate (AKA False Negative Rate) for a set of thresholds.
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
plot([timestep, metric]) Plots training set (and validation set if available) scoring history for an H2OBinomialModel. The timestep and metric arguments are restricted to what is available in its scoring history.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
precision([thresholds, train, valid, xval]) Get the precision for a set of thresholds.
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
recall([thresholds, train, valid, xval]) Get the Recall (AKA True Positive Rate) for a set of thresholds.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
roc([train, valid, xval]) Return the coordinates of the ROC curve for a given set of data.
score_history() Deprecated; use scoring_history instead.
scoring_history() Retrieve Model Score History
sensitivity([thresholds, train, valid, xval]) Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds.
show() Print innards of model, without regard to type
specificity([thresholds, train, valid, xval]) Get the specificity (AKA True Negative Rate) for a set of thresholds.
summary() Print a detailed summary of the model.
tnr([thresholds, train, valid, xval]) Get the True Negative Rate for a set of thresholds.
tpr([thresholds, train, valid, xval]) Get the True Positive Rate for a set of thresholds.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys() Return the model keys for the cross-validated model.

F0point5(thresholds=None, train=False, valid=False, xval=False)

Get the F0.5 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F0point5 value for the training data.
  • valid – If valid is True, then return the F0point5 value for the validation data.
  • xval – If xval is True, then return the F0point5 value for the cross validation data.
Returns:

The F0point5 for this binomial model.
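For orientation, the F0.5, F1, and F2 scores are all instances of the general F-beta score. The sketch below uses the standard textbook formula in plain Python; it is not taken from the H2O source, and the function name f_beta and the sample precision and recall values are hypothetical.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        # Convention chosen for this sketch to avoid dividing by zero.
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


precision, recall = 0.5, 1.0   # hypothetical values at one threshold

f0point5 = f_beta(precision, recall, 0.5)
f1 = f_beta(precision, recall, 1.0)
f2 = f_beta(precision, recall, 2.0)
```

Because recall is higher than precision in this example, the score rises as beta grows: F0.5 is the lowest of the three and F2 the highest.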

F1(thresholds=None, train=False, valid=False, xval=False)

Get the F1 value for a set of thresholds

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

thresholds : list, optional

If None, then the thresholds in this set of metrics will be used.

train : bool, optional

If True, return the F1 value for the training data.

valid : bool, optional

If True, return the F1 value for the validation data.

xval : bool, optional

If True, return the F1 value for each of the cross-validated splits.

Returns:

The F1 values for the specified key(s).

Examples

>>> import h2o as ml
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> ml.init()
>>> rows=[[1,2,3,4,0],[2,1,2,4,1],[2,1,4,2,1],[0,1,2,34,1],[2,3,4,1,0]]*50
>>> fr = ml.H2OFrame(rows)
>>> fr[4] = fr[4].asfactor()
>>> model = H2OGradientBoostingEstimator(ntrees=10, max_depth=10, nfolds=4)
>>> model.train(x=range(4), y=4, training_frame=fr)
>>> model.F1(train=True)

F2(thresholds=None, train=False, valid=False, xval=False)

Get the F2 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F2 value for the training data.
  • valid – If valid is True, then return the F2 value for the validation data.
  • xval – If xval is True, then return the F2 value for the cross validation data.
Returns:

The F2 for this binomial model.

accuracy(thresholds=None, train=False, valid=False, xval=False)

Get the accuracy for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the accuracy value for the training data.
  • valid – If valid is True, then return the accuracy value for the validation data.
  • xval – If xval is True, then return the accuracy value for the cross validation data.
Returns:

The accuracy for this binomial model.
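The train/valid/xval convention shared by the metric methods can be sketched as follows. This is an illustration of the rule stated in the descriptions, not H2O's implementation: select_metrics is a hypothetical helper, the metric values are made up, and returning a bare value when exactly one flag is True is an assumption, since the descriptions only cover the "none" and "more than one" cases.

```python
def select_metrics(values, train=False, valid=False, xval=False):
    """Pick metric values according to the train/valid/xval flags.

    values maps "train", "valid" and "xval" to a metric value.
    """
    flags = {"train": train, "valid": valid, "xval": xval}
    requested = [name for name, enabled in flags.items() if enabled]
    if not requested:
        # All flags False (default): return the training metric value.
        return values["train"]
    if len(requested) == 1:
        # Assumption: a single flag yields a bare value, not a dictionary.
        return values[requested[0]]
    # More than one flag: dictionary keyed by "train", "valid", "xval".
    return {name: values[name] for name in requested}


accuracy_values = {"train": 0.91, "valid": 0.88, "xval": 0.85}  # hypothetical

default_result = select_metrics(accuracy_values)
valid_only = select_metrics(accuracy_values, valid=True)
train_and_xval = select_metrics(accuracy_values, train=True, xval=True)
```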

confusion_matrix(metrics=None, thresholds=None, train=False, valid=False, xval=False)

Get the confusion matrix for the specified metrics/thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metrics – A string (or list of strings) in {“min_per_class_accuracy”, “absolute_MCC”, “tnr”, “fnr”, “fpr”, “tpr”, “precision”, “accuracy”, “f0point5”, “f2”, “f1”}
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the confusion matrix value for the training data.
  • valid – If valid is True, then return the confusion matrix value for the validation data.
  • xval – If xval is True, then return the confusion matrix value for the cross validation data.
Returns:

The confusion matrix for this binomial model.
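As a rough illustration of how a confusion matrix depends on the threshold, the sketch below counts the four cells in plain Python. It is not H2O code: the helper confusion_counts and the data are hypothetical, and treating a score equal to the threshold as a positive prediction is an assumption.

```python
def confusion_counts(actual, scores, threshold):
    """Count TP, FP, TN and FN for one threshold (illustrative only)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(actual, scores):
        # Assumption: scores at or above the threshold are predicted positive.
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0]                 # hypothetical true labels
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]     # hypothetical predicted probabilities

# One confusion matrix per requested threshold.
matrices = {t: confusion_counts(actual, scores, t) for t in [0.25, 0.5]}
```

Lowering the threshold from 0.5 to 0.25 turns the one false negative into a true positive, at the cost of an extra false positive.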

error(thresholds=None, train=False, valid=False, xval=False)

Get the error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the error value for the training data.
  • valid – If valid is True, then return the error value for the validation data.
  • xval – If xval is True, then return the error value for the cross validation data.
Returns:

The error for this binomial model.

fallout(thresholds=None, train=False, valid=False, xval=False)

Get the Fallout (AKA False Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fallout value for the training data.
  • valid – If valid is True, then return the fallout value for the validation data.
  • xval – If xval is True, then return the fallout value for the cross validation data.
Returns:

The fallout for this binomial model.

find_idx_by_threshold(threshold, train=False, valid=False, xval=False)

Retrieve the index in this metric’s threshold list at which the given threshold is located. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • threshold – The threshold to locate in this metric’s threshold list.
  • train – If train is True, then return the idx_by_threshold for the training data.
  • valid – If valid is True, then return the idx_by_threshold for the validation data.
  • xval – If xval is True, then return the idx_by_threshold for the cross validation data.
Returns:

The idx_by_threshold for this binomial model.

find_threshold_by_max_metric(metric, train=False, valid=False, xval=False)

Retrieve the threshold at which the given metric is maximized. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metric – The name of the metric to maximize.
  • train – If train is True, then return the threshold_by_max_metric value for the training data.
  • valid – If valid is True, then return the threshold_by_max_metric value for the validation data.
  • xval – If xval is True, then return the threshold_by_max_metric value for the cross validation data.
Returns:

The threshold_by_max_metric for this binomial model.
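The idea behind picking a threshold by maximum metric, and then locating it in a threshold list, can be sketched in plain Python. This is not the H2O implementation: the helpers accuracy_at and threshold_by_max_metric, the data, and the candidate thresholds are hypothetical, and accuracy stands in for whichever metric is requested.

```python
def accuracy_at(actual, scores, threshold):
    """Fraction of correct predictions when scores >= threshold are positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def threshold_by_max_metric(actual, scores, thresholds, metric_fn):
    """Return the candidate threshold with the highest metric value."""
    return max(thresholds, key=lambda t: metric_fn(actual, scores, t))


actual = [1, 1, 1, 0, 0, 0]                  # hypothetical true labels
scores = [0.9, 0.8, 0.75, 0.6, 0.3, 0.1]     # hypothetical predicted probabilities
thresholds = [0.2, 0.5, 0.7]                 # hypothetical threshold list

best = threshold_by_max_metric(actual, scores, thresholds, accuracy_at)
# Position of that threshold in the list, in the spirit of find_idx_by_threshold.
best_idx = thresholds.index(best)
```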

fnr(thresholds=None, train=False, valid=False, xval=False)

Get the False Negative Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fnr value for the training data.
  • valid – If valid is True, then return the fnr value for the validation data.
  • xval – If xval is True, then return the fnr value for the cross validation data.
Returns:

The fnr for this binomial model.

fpr(thresholds=None, train=False, valid=False, xval=False)

Get the False Positive Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fpr value for the training data.
  • valid – If valid is True, then return the fpr value for the validation data.
  • xval – If xval is True, then return the fpr value for the cross validation data.
Returns:

The fpr for this binomial model.

gains_lift(train=False, valid=False, xval=False)

Get the Gains/Lift table for the specified metrics. If all are False (default), then return the training metric Gains/Lift table. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Gains/Lift table for the training data.
  • valid – If valid is True, then return the Gains/Lift table for the validation data.
  • xval – If xval is True, then return the Gains/Lift table for the cross validation data.
Returns:

The Gains/Lift table for this binomial model.

max_per_class_error(thresholds=None, train=False, valid=False, xval=False)

Get the max per class error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the max_per_class_error value for the training data.
  • valid – If valid is True, then return the max_per_class_error value for the validation data.
  • xval – If xval is True, then return the max_per_class_error value for the cross validation data.
Returns:

The max_per_class_error for this binomial model.

mcc(thresholds=None, train=False, valid=False, xval=False)

Get the MCC (Matthews correlation coefficient) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the mcc value for the training data.
  • valid – If valid is True, then return the mcc value for the validation data.
  • xval – If xval is True, then return the mcc value for the cross validation data.
Returns:

The mcc for this binomial model.
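For reference, the Matthews correlation coefficient can be computed from the four confusion-matrix counts. The sketch below applies the standard formula in plain Python; it is not H2O code, the function name matthews_corrcoef_from_counts and the counts are hypothetical, and returning 0.0 for an empty row or column is a convention chosen for this example.

```python
import math


def matthews_corrcoef_from_counts(tp, fp, tn, fn):
    """Standard MCC formula from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention for this sketch when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts at one threshold.
mcc_value = matthews_corrcoef_from_counts(tp=2, fp=1, tn=2, fn=1)
# A classifier with no errors reaches the maximum value of 1.
perfect = matthews_corrcoef_from_counts(tp=3, fp=0, tn=3, fn=0)
```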

metric(metric, thresholds=None, train=False, valid=False, xval=False)

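Get the metric value for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metric – The name of the metric to retrieve.
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the metrics for the training data.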
  • valid – If valid is True, then return the metrics for the validation data.
  • xval – If xval is True, then return the metrics for the cross validation data.
Returns:

The metrics for this binomial model.

missrate(thresholds=None, train=False, valid=False, xval=False)

Get the miss rate (AKA False Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the missrate value for the training data.
  • valid – If valid is True, then return the missrate value for the validation data.
  • xval – If xval is True, then return the missrate value for the cross validation data.
Returns:

The missrate for this binomial model.

plot(timestep='AUTO', metric='AUTO', **kwargs)

Plots training set (and validation set if available) scoring history for an H2OBinomialModel. The timestep and metric arguments are restricted to what is available in its scoring history.

Parameters:
  • timestep – A unit of measurement for the x-axis.
  • metric – A unit of measurement for the y-axis.
Returns:

A scoring history plot.

precision(thresholds=None, train=False, valid=False, xval=False)

Get the precision for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the precision value for the training data.
  • valid – If valid is True, then return the precision value for the validation data.
  • xval – If xval is True, then return the precision value for the cross validation data.
Returns:

The precision for this binomial model.

recall(thresholds=None, train=False, valid=False, xval=False)

Get the Recall (AKA True Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the recall value for the training data.
  • valid – If valid is True, then return the recall value for the validation data.
  • xval – If xval is True, then return the recall value for the cross validation data.
Returns:

The recall for this binomial model.

roc(train=False, valid=False, xval=False)

Return the coordinates of the ROC curve for a given set of data, as a two-tuple containing the false positive rates as a list and the true positive rates as a list. If all are False (default), then return the coordinates for the training data. If more than one ROC curve is requested, the data is returned as a dictionary of two-tuples.

Parameters:
  • train – If train is True, then return the ROC coordinates for the training data.
  • valid – If valid is True, then return the ROC coordinates for the validation data.
  • xval – If xval is True, then return the ROC coordinates for the cross validation data.
Returns:

The coordinates of the ROC curve.
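The shape of the returned value, a two-tuple of a false positive rate list and a true positive rate list, can be illustrated with a small plain-Python sketch. It is not H2O code: roc_coordinates, the data, and the threshold list are hypothetical, and scores at or above the threshold are assumed to be positive predictions.

```python
def roc_coordinates(actual, scores, thresholds):
    """Return (false positive rates, true positive rates), one point per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    fprs, tprs = [], []
    for t in thresholds:
        tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= t)
        fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= t)
        fprs.append(fp / negatives)
        tprs.append(tp / positives)
    return fprs, tprs


actual = [1, 1, 1, 0, 0, 0]                  # hypothetical true labels
scores = [0.9, 0.8, 0.75, 0.6, 0.3, 0.1]     # hypothetical predicted probabilities
thresholds = [0.85, 0.7, 0.5, 0.2]           # hypothetical, from strict to lenient

fprs, tprs = roc_coordinates(actual, scores, thresholds)
```

Plotting tprs against fprs traces the ROC curve; both lists have one entry per threshold.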

sensitivity(thresholds=None, train=False, valid=False, xval=False)

Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the sensitivity value for the training data.
  • valid – If valid is True, then return the sensitivity value for the validation data.
  • xval – If xval is True, then return the sensitivity value for the cross validation data.
Returns:

The sensitivity for this binomial model.

specificity(thresholds=None, train=False, valid=False, xval=False)

Get the specificity (AKA True Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the specificity value for the training data.
  • valid – If valid is True, then return the specificity value for the validation data.
  • xval – If xval is True, then return the specificity value for the cross validation data.
Returns:

The specificity for this binomial model.

tnr(thresholds=None, train=False, valid=False, xval=False)

Get the True Negative Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the tnr value for the training data.
  • valid – If valid is True, then return the tnr value for the validation data.
  • xval – If xval is True, then return the tnr value for the cross validation data.
Returns:

The tnr for this binomial model.

tpr(thresholds=None, train=False, valid=False, xval=False)

Get the True Positive Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the tpr value for the training data.
  • valid – If valid is True, then return the tpr value for the validation data.
  • xval – If xval is True, then return the tpr value for the cross validation data.
Returns:

The tpr for this binomial model.
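Several of the methods above are aliases for the same four rates. The sketch below computes them from confusion-matrix counts using the standard definitions; it is plain Python rather than H2O code, and the helper name rates_from_counts and the counts are hypothetical.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Standard true/false positive/negative rates from confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),   # also called recall or sensitivity
        "fnr": fn / (tp + fn),   # also called miss rate
        "tnr": tn / (tn + fp),   # also called specificity
        "fpr": fp / (tn + fp),   # also called fallout
    }


rates = rates_from_counts(tp=2, fp=1, tn=2, fn=1)  # hypothetical counts
```

Within each actual class the two rates are complementary: tpr + fnr = 1 and tnr + fpr = 1.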

class h2o.model.H2OClusteringModel

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
betweenss([train, valid, xval]) Get the between cluster sum of squares.
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
centers() Return the centers for the KMeans model.
centers_std() Return the standardized centers for the KMeans model.
centroid_stats([train, valid, xval]) Get the centroid statistics for each cluster.
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients.
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() Return True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(es).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
num_iterations() Get the number of iterations that it took to converge or reach max iterations.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated; use scoring_history instead.
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
size([train, valid, xval]) Get the sizes of each cluster.
summary() Print a detailed summary of the model.
tot_withinss([train, valid, xval]) Get the total within cluster sum of squares.
totss([train, valid, xval]) Get the total sum of squares.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
withinss([train, valid, xval]) Get the within cluster sum of squares for each cluster.
xval_keys() Return the model keys for the cross-validated model.

betweenss(train=False, valid=False, xval=False)

Get the between cluster sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the between cluster sum of squares value for the training data.

valid : bool, optional

If True, then return the between cluster sum of squares value for the validation data.

xval : bool, optional

If True, then return the between cluster sum of squares value for each of the cross-validated splits.

Returns:

Returns the between sum of squares values for the specified key(s).

centers()

Returns:

The centers for the KMeans model.

centers_std()

Returns:

The standardized centers for the KMeans model.

centroid_stats(train=False, valid=False, xval=False)

Get the centroid statistics for each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the centroid statistics for the training data.

valid : bool, optional

If True, then return the centroid statistics for the validation data.

xval : bool, optional

If True, then return the centroid statistics for each of the cross-validated splits.

Returns:

Returns the centroid statistics for the specified key(s).

num_iterations()

Get the number of iterations that it took to converge or reach max iterations.

Returns:

The number of iterations (integer).

size(train=False, valid=False, xval=False)

Get the sizes of each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the cluster sizes for the training data.

valid : bool, optional

If True, then return the cluster sizes for the validation data.

xval : bool, optional

If True, then return the cluster sizes for each of the cross-validated splits.

Returns:

Returns the cluster sizes for the specified key(s).

tot_withinss(train=False, valid=False, xval=False)

Get the total within cluster sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the total within cluster sum of squares value for the training data.

valid : bool, optional

If True, then return the total within cluster sum of squares value for the validation data.

xval : bool, optional

If True, then return the total within cluster sum of squares value for each of the cross-validated splits.

Returns:

Returns the total within cluster sum of squares values for the specified key(s).

totss(train=False, valid=False, xval=False)

Get the total sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the total sum of squares value for the training data.

valid : bool, optional

If True, then return the total sum of squares value for the validation data.

xval : bool, optional

If True, then return the total sum of squares value for each of the cross-validated splits.

Returns:

Returns the total sum of squares values for the specified key(s).
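
The sum-of-squares accessors on the clustering model are related by the usual k-means decomposition: the total sum of squares equals the total within cluster sum of squares plus the between cluster sum of squares. The sketch below computes these quantities from raw points using the textbook definitions. It is an illustration under that assumption only; `cluster_sums_of_squares` is a hypothetical helper, and the values H2O reports may differ (for example if the data were standardized).

```python
def cluster_sums_of_squares(points, labels):
    """Compute withinss, tot_withinss, betweenss and totss for labelled points."""
    n_dims = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(n_dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = mean(points)

    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    withinss = {}
    betweenss = 0.0
    for label, members in clusters.items():
        center = mean(members)
        # Squared distances of the members to their own cluster center.
        withinss[label] = sum(sq_dist(p, center) for p in members)
        # Size-weighted squared distance of the center to the grand mean.
        betweenss += len(members) * sq_dist(center, grand_mean)

    totss = sum(sq_dist(p, grand_mean) for p in points)
    return {"withinss": withinss,
            "tot_withinss": sum(withinss.values()),
            "betweenss": betweenss,
            "totss": totss}


points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
stats = cluster_sums_of_squares(points, labels)
print(stats)
```

With cluster centers equal to the cluster means, `stats["totss"]` matches `stats["tot_withinss"] + stats["betweenss"]`.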

withinss(train=False, valid=False, xval=False)

Get the within cluster sum of squares for each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the within cluster sum of squares value for the training data.

valid : bool, optional

If True, then return the within cluster sum of squares value for the validation data.

xval : bool, optional

If True, then return the within cluster sum of squares value for each of the cross-validated splits.

Returns:

Returns the within cluster sum of squares values for the specified key(s).

class h2o.model.ConfusionMatrix(cm, domains=None, table_header=None)

Bases: future.types.newobject.newobject

Methods

next()
read_cms([cms, domains])
show()
to_list()
ROUND = 4
static read_cms(cms=None, domains=None)
show()
to_list()
class h2o.model.H2ODimReductionModel

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id
return:Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
archetypes()
return:the archetypes (Y) of the GLRM model.
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef()
return:Return the coefficients for this model.
coef_norm()
return:Return the normalized coefficients
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
final_step() Get the final step size from the GLRM model.
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated()
return:True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
num_iterations() Get the number of iterations that it took to converge or reach max iterations.
objective() Get the final value of the objective function from the GLRM model.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
proj_archetypes(test_data[, reverse_transform]) Convert archetypes of a GLRM model into original feature space.
r2([train, valid, xval]) Return the R^2 for this regression model.
reconstruct(test_data[, reverse_transform]) Reconstruct the training data from the GLRM model and impute all missing values.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated for scoring_history
scoring_history() Retrieve Model Score History
screeplot([type]) Produce the scree plot.
show() Print innards of model, without regard to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys()
return:The model keys for the cross-validated model.
archetypes()
Returns:the archetypes (Y) of the GLRM model.
final_step()

Get the final step size from the GLRM model.

Returns:final step size (double)
num_iterations()

Get the number of iterations that it took to converge or reach max iterations.

Returns:number of iterations (integer)
objective()

Get the final value of the objective function from the GLRM model.

Returns:final objective value (double)
proj_archetypes(test_data, reverse_transform=False)

Convert archetypes of a GLRM model into original feature space.

Parameters:

test_data : H2OFrame

The dataset upon which the H2O GLRM model was trained.

reverse_transform : logical

Whether the transformation of the training data during model-building should be reversed on the projected archetypes.

Returns:

Return the GLRM archetypes projected back into the original training data’s feature space.

reconstruct(test_data, reverse_transform=False)

Reconstruct the training data from the GLRM model and impute all missing values.

Parameters:

test_data : H2OFrame

The dataset upon which the H2O GLRM model was trained.

reverse_transform : logical

Whether the transformation of the training data during model-building should be reversed on the reconstructed frame.

Returns:

Return the approximate reconstruction of the training data.
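
As an illustration of what a reconstruction means, the sketch below assumes the usual GLRM factorization, in which the training matrix A is approximated by the product of a row-representation matrix X and the archetypes Y. The matrix X is not named in this reference, and `reconstruct_from_factors` is a hypothetical helper working on plain lists; it does not call H2O and ignores reverse_transform.

```python
def reconstruct_from_factors(x, y):
    """Return the matrix product X * Y as a list of rows.

    x: n-by-k matrix (one row of archetype weights per observation).
    y: k-by-p matrix (one archetype per row).
    """
    k = len(y)
    n_cols = len(y[0])
    return [[sum(x_row[i] * y[i][j] for i in range(k)) for j in range(n_cols)]
            for x_row in x]


x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
y = [[2.0, 4.0], [6.0, 8.0]]

# Every cell of the product is defined, so cells that were missing in the
# original data receive an imputed value from the low-rank approximation.
approx = reconstruct_from_factors(x, y)
print(approx)
```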

screeplot(type='barplot', **kwargs)

Produce the scree plot.

Parameters:
  • type – Type of plot. “barplot” and “lines” are currently supported.
  • show – If False, the plot is not shown. The matplotlib show method is blocking.
Returns:

None.

class h2o.model.MetricsBase(metric_json, on=None, algo='')

Bases: future.types.newobject.newobject

A parent class to house common metrics available for the various Metrics types.

The methods here are available across different model categories, and so appear here.

Methods

aic()
return:Retrieve the AIC for this set of metrics.
auc()
return:Retrieve the AUC for this set of metrics.
giniCoef()
return:Retrieve the Gini coefficient for this set of metrics.
logloss()
return:Retrieve the log loss for this set of metrics.
mean_residual_deviance()
return:Retrieve the mean residual deviance for this set of metrics.
mse()
return:Retrieve the MSE for this set of metrics
next()
null_degrees_of_freedom()
return:the null dof if the model has residual deviance, or None if no null dof.
null_deviance()
return:the null deviance if the model has residual deviance, or None if no null deviance.
r2()
return:Retrieve the R^2 coefficient for this set of metrics
residual_degrees_of_freedom()
return:the residual dof if the model has residual deviance, or None if no residual dof.
residual_deviance()
return:the residual deviance if the model has residual deviance, or None if no residual deviance.
show() Display a short summary of the metrics.
aic()
Returns:Retrieve the AIC for this set of metrics.
auc()
Returns:Retrieve the AUC for this set of metrics.
giniCoef()
Returns:Retrieve the Gini coefficient for this set of metrics.
logloss()
Returns:Retrieve the log loss for this set of metrics.
mean_residual_deviance()
Returns:Retrieve the mean residual deviance for this set of metrics.
mse()
Returns:Retrieve the MSE for this set of metrics
null_degrees_of_freedom()
Returns:the null dof if the model has residual deviance, or None if no null dof.
null_deviance()
Returns:the null deviance if the model has residual deviance, or None if no null deviance.
r2()
Returns:Retrieve the R^2 coefficient for this set of metrics
residual_degrees_of_freedom()
Returns:the residual dof if the model has residual deviance, or None if no residual dof.
residual_deviance()
Returns:the residual deviance if the model has residual deviance, or None if no residual deviance.
show()

Display a short summary of the metrics.

Returns:None

class h2o.model.ModelBase

Bases: future.types.newobject.newobject

Attributes

full_parameters Get the full specification of all parameters.
model_id
return:Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef()
return:Return the coefficients for this model.
coef_norm()
return:Return the normalized coefficients
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated()
return:True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated for scoring_history
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys()
return:The model keys for the cross-validated model.
aic(train=False, valid=False, xval=False)

Get the AIC(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the AIC value for the training data.
  • valid – If valid is True, then return the AIC value for the validation data.
  • xval – If xval is True, then return the AIC value for the cross validation data.
Returns:

The AIC.
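
For orientation, the sketch below computes the Akaike information criterion from its textbook definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. This is an assumption about the quantity being reported, not a description of how H2O derives it; `akaike_information_criterion` is a hypothetical helper.

```python
def akaike_information_criterion(log_likelihood, n_params):
    """Textbook AIC: 2 * k - 2 * ln(L). Lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# A model with 3 estimated parameters and a log-likelihood of -120.5.
aic_value = akaike_information_criterion(-120.5, 3)
print(aic_value)
```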

auc(train=False, valid=False, xval=False)

Get the AUC(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the AUC value for the training data.
  • valid – If valid is True, then return the AUC value for the validation data.
  • xval – If xval is True, then return the AUC value for the cross validation data.
Returns:

The AUC.

biases(vector_id=0)

Return the frame for the respective bias vector.

Parameters:
  • vector_id – An integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
Returns:

An H2OFrame which represents the bias vector identified by vector_id.

catoffsets()

Categorical offsets for one-hot encoding
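
To make the idea of categorical offsets concrete, the sketch below assumes that each categorical column occupies a contiguous block of one-hot positions and that the offset of a column is the running total of the cardinalities before it. The function names are hypothetical, and the actual layout H2O uses (for example, whether an extra level is reserved for missing values) is not specified here.

```python
from itertools import accumulate


def categorical_offsets(cardinalities):
    """Start position of each categorical column in the one-hot vector.

    The final entry is the total number of one-hot positions.
    """
    return [0] + list(accumulate(cardinalities))


def one_hot_index(offsets, column, level):
    """Position of `level` (zero-based) of categorical `column`."""
    return offsets[column] + level


# Three categorical columns with 3, 2 and 4 levels respectively.
offsets = categorical_offsets([3, 2, 4])
print(offsets)
print(one_hot_index(offsets, column=2, level=1))
```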

coef()
Returns:Return the coefficients for this model.
coef_norm()
Returns:Return the normalized coefficients
deepfeatures(test_data, layer)

Return hidden layer details

Parameters:
  • test_data – Data to create a feature space on
  • layer – Zero-based index of the hidden layer
download_pojo(path='')

Download the POJO for this model to the directory specified by path (no trailing slash!). If path is “”, then dump to screen.

Parameters:
  • path – An absolute path to the directory where the POJO should be saved.
Returns:

None.

full_parameters None

Get the full specification of all parameters.

Returns:a dictionary of parameters used to build this model.
get_xval_models(key=None)

Return a Model object.

Parameters:key – If None, return all cross-validated models; otherwise return the model that key points to.
Returns:A model or list of models.
giniCoef(train=False, valid=False, xval=False)

Get the Gini Coefficient(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Gini Coefficient value for the training data.
  • valid – If valid is True, then return the Gini Coefficient value for the validation data.
  • xval – If xval is True, then return the Gini Coefficient value for the cross validation data.
Returns:

The Gini Coefficient for this binomial model.
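
For a binomial classifier the Gini coefficient is commonly defined from the AUC as Gini = 2 * AUC - 1. The sketch below uses that standard relationship together with a pairwise AUC computation; both helpers are hypothetical names, and the assumption is that the metric reported here follows the same definition.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def gini_from_auc(auc):
    """Gini coefficient of a binomial classifier, derived from its AUC."""
    return 2.0 * auc - 1.0


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = auc_from_scores(labels, scores)
gini_value = gini_from_auc(auc_value)
print(auc_value, gini_value)
```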

is_cross_validated()
Returns:True if the model was cross-validated.
logloss(train=False, valid=False, xval=False)

Get the Log Loss(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Log Loss value for the training data.
  • valid – If valid is True, then return the Log Loss value for the validation data.
  • xval – If xval is True, then return the Log Loss value for the cross validation data.
Returns:

The Log Loss for this binomial model.
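
As a reminder of what the metric measures, the sketch below computes the standard binomial log loss, the mean of -(y * ln(p) + (1 - y) * ln(1 - p)) over all rows. `binomial_logloss` is a hypothetical helper, and the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binomial_logloss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clip the probability so that log() is always defined.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


uninformative = binomial_logloss([1, 0], [0.5, 0.5])   # equals ln(2)
confident = binomial_logloss([1, 0], [0.9, 0.1])       # lower is better
print(uninformative, confident)
```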

mean_residual_deviance(train=False, valid=False, xval=False)

Get the Mean Residual Deviance(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Mean Residual Deviance value for the training data.
  • valid – If valid is True, then return the Mean Residual Deviance value for the validation data.
  • xval – If xval is True, then return the Mean Residual Deviance value for the cross validation data.
Returns:

The Mean Residual Deviance for this regression model.

model_id None
Returns:Retrieve this model’s identifier.
model_performance(test_data=None, train=False, valid=False)

Generate model metrics for this model on test_data.

Parameters:

test_data: H2OFrame, optional

Data set against which model metrics are computed. Both train and valid arguments are ignored if test_data is not None.

train: boolean, optional

Report the training metrics for the model. If the test_data is the training data, the training metrics are returned.

valid: boolean, optional

Report the validation metrics for the model. If train and valid are True, then it defaults to True.

Returns:

An object of class H2OModelMetrics.

mse(train=False, valid=False, xval=False)

Get the MSE(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, default=False

If train is True, then return the MSE value for the training data.

valid : bool, default=False

If valid is True, then return the MSE value for the validation data.

xval : bool, default=False

If xval is True, then return the MSE value for the cross validation data.

Returns:

The MSE for this regression model.

normmul()

Normalization/Standardization multipliers for numeric predictors

normsub()

Normalization/Standardization offsets for numeric predictors
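
The sketch below shows how an offset and a multiplier are typically combined to standardize a numeric column: the offset is subtracted first and the result is then multiplied. It assumes the offset is the column mean and the multiplier is the reciprocal of the standard deviation; whether H2O uses exactly these statistics (for example, sample or population standard deviation) is not stated in this reference, and the helper names are hypothetical.

```python
import statistics


def standardization_params(column):
    """Return (offset, multiplier) for a numeric column.

    Assumption: offset = mean, multiplier = 1 / population standard deviation.
    """
    offset = statistics.mean(column)
    multiplier = 1.0 / statistics.pstdev(column)
    return offset, multiplier


def standardize(column, offset, multiplier):
    """Apply (value - offset) * multiplier to every entry."""
    return [(value - offset) * multiplier for value in column]


ages = [20.0, 30.0, 40.0]
offset, multiplier = standardization_params(ages)
scaled = standardize(ages, offset, multiplier)
print(offset, multiplier, scaled)
```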

null_degrees_of_freedom(train=False, valid=False, xval=False)

Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the null dof for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the null dof for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the null dof, or None if it is not present.

null_deviance(train=False, valid=False, xval=False)

Retrieve the null deviance if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the null deviance for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the null deviance for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the null deviance, or None if it is not present.

params None

Get the parameters and the actual/default values only.

Returns:A dictionary of parameters used to build this model.
pprint_coef()

Pretty print the coefficients table (includes normalized coefficients)

predict(test_data)

Predict on a dataset.

Parameters:

test_data: H2OFrame

Data on which to make predictions.

Returns:

A new H2OFrame of predictions.

r2(train=False, valid=False, xval=False)

Return the R^2 for this regression model.

The R^2 value is defined to be 1 - MSE/var, where var is computed as sigma*sigma.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the R^2 value for the training data.
  • valid – If valid is True, then return the R^2 value for the validation data.
  • xval – If xval is True, then return the R^2 value for the cross validation data.
Returns:

The R^2 for this regression model.
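
The definition above, R^2 = 1 - MSE/var with var = sigma*sigma, can be worked through on a small example. The sketch below assumes that sigma is the population standard deviation of the actual response values, which this reference does not state explicitly; `r2_from_mse` is a hypothetical helper.

```python
import statistics


def r2_from_mse(actual, predicted):
    """Compute R^2 as 1 - MSE / var, with var the variance of the response."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    # Assumption: population variance (sigma * sigma) of the actual values.
    var = statistics.pvariance(actual)
    return 1.0 - mse / var


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
# Here MSE = 0.25 and var = 1.25, so R^2 = 1 - 0.25 / 1.25.
r2_value = r2_from_mse(actual, predicted)
print(r2_value)
```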

residual_degrees_of_freedom(train=False, valid=False, xval=False)

Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the residual dof for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the residual dof for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the residual dof, or None if it is not present.

residual_deviance(train=False, valid=False, xval=False)

Retrieve the residual deviance if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the residual deviance for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the residual deviance for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the residual deviance, or None if it is not present.

respmul()

Normalization/Standardization multipliers for numeric response

respsub()

Normalization/Standardization offsets for numeric response

score_history()

Deprecated for scoring_history

scoring_history()

Retrieve Model Score History

Returns:The score history as an H2OTwoDimTable or a Pandas DataFrame.
show()

Print innards of model, without regard to type

summary()

Print a detailed summary of the model.

type None

Get the type of model built as a string.

Returns:“classifier” or “regressor” or “unsupervised”
varimp(use_pandas=False)

Pretty print the variable importances, or return them in a list

Parameters:

use_pandas: boolean, optional

If True, then the variable importances will be returned as a pandas data frame.

Returns:

A list or Pandas DataFrame.

weights(matrix_id=0)

Return the frame for the respective weight matrix.

Parameters:
  • matrix_id – An integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
Returns:

An H2OFrame which represents the weight matrix identified by matrix_id.

xval_keys()
Returns:The model keys for the cross-validated model.
xvals None

Return a list of the cross-validated models.

Returns:A list of models
class h2o.model.H2OModelFuture(job, x)

Bases: future.types.newobject.newobject

A class representing a future H2O model (a model that may or may not be in the process of being built).

Methods

next()
poll()
poll()

ModelBase

This module implements the base model class. All model classes inherit from this class.

class h2o.model.model_base.ModelBase

Bases: future.types.newobject.newobject

Attributes

full_parameters Get the full specification of all parameters.
model_id
return:Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef()
return:Return the coefficients for this model.
coef_norm()
return:Return the normalized coefficients
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated()
return:True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated for scoring_history
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys()
return:The model keys for the cross-validated model.
aic(train=False, valid=False, xval=False)

Get the AIC(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the AIC value for the training data.
  • valid – If valid is True, then return the AIC value for the validation data.
  • xval – If xval is True, then return the AIC value for the cross validation data.
Returns:

The AIC.

auc(train=False, valid=False, xval=False)

Get the AUC(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the AUC value for the training data.
  • valid – If valid is True, then return the AUC value for the validation data.
  • xval – If xval is True, then return the AUC value for the cross validation data.
Returns:

The AUC.

biases(vector_id=0)

Return the frame for the respective bias vector.

Parameters:
  • vector_id – An integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
Returns:

An H2OFrame which represents the bias vector identified by vector_id.

catoffsets()

Categorical offsets for one-hot encoding

coef()
Returns:Return the coefficients for this model.
coef_norm()
Returns:Return the normalized coefficients
deepfeatures(test_data, layer)

Return hidden layer details

Parameters:
  • test_data – Data to create a feature space on
  • layer – Zero-based index of the hidden layer
download_pojo(path='')

Download the POJO for this model to the directory specified by path (no trailing slash!). If path is “”, then dump to screen.

Parameters:
  • path – An absolute path to the directory where the POJO should be saved.
Returns:

None.

full_parameters None

Get the full specification of all parameters.

Returns:a dictionary of parameters used to build this model.
get_xval_models(key=None)

Return a Model object.

Parameters:key – If None, return all cross-validated models; otherwise return the model that key points to.
Returns:A model or list of models.
giniCoef(train=False, valid=False, xval=False)

Get the Gini Coefficient(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Gini Coefficient value for the training data.
  • valid – If valid is True, then return the Gini Coefficient value for the validation data.
  • xval – If xval is True, then return the Gini Coefficient value for the cross validation data.
Returns:

The Gini Coefficient for this binomial model.

is_cross_validated()
Returns:True if the model was cross-validated.
logloss(train=False, valid=False, xval=False)

Get the Log Loss(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Log Loss value for the training data.
  • valid – If valid is True, then return the Log Loss value for the validation data.
  • xval – If xval is True, then return the Log Loss value for the cross validation data.
Returns:

The Log Loss for this binomial model.

mean_residual_deviance(train=False, valid=False, xval=False)

Get the Mean Residual Deviance(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Mean Residual Deviance value for the training data.
  • valid – If valid is True, then return the Mean Residual Deviance value for the validation data.
  • xval – If xval is True, then return the Mean Residual Deviance value for the cross validation data.
Returns:

The Mean Residual Deviance for this regression model.

model_id None
Returns:Retrieve this model’s identifier.
model_performance(test_data=None, train=False, valid=False)

Generate model metrics for this model on test_data.

Parameters:

test_data: H2OFrame, optional

Data set against which model metrics are computed. Both train and valid arguments are ignored if test_data is not None.

train: boolean, optional

Report the training metrics for the model. If the test_data is the training data, the training metrics are returned.

valid: boolean, optional

Report the validation metrics for the model. If train and valid are True, then it defaults to True.

Returns:

An object of class H2OModelMetrics.

mse(train=False, valid=False, xval=False)

Get the MSE(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, default=False

If train is True, then return the MSE value for the training data.

valid : bool, default=False

If valid is True, then return the MSE value for the validation data.

xval : bool, default=False

If xval is True, then return the MSE value for the cross validation data.

Returns:

The MSE for this regression model.

normmul()

Normalization/Standardization multipliers for numeric predictors

normsub()

Normalization/Standardization offsets for numeric predictors

null_degrees_of_freedom(train=False, valid=False, xval=False)[source]

Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the null dof for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the null dof for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the null dof, or None if it is not present.

null_deviance(train=False, valid=False, xval=False)[source]

Retrieve the null deviance if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the null deviance for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the null deviance for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the null deviance, or None if it is not present.
params[source]

Get the parameters and the actual/default values only.

Returns:A dictionary of parameters used to build this model.
pprint_coef()[source]

Pretty print the coefficients table (includes normalized coefficients).

predict(test_data)[source]

Predict on a dataset.

Parameters:

test_data: H2OFrame

Data on which to make predictions.

Returns:

A new H2OFrame of predictions.

r2(train=False, valid=False, xval=False)[source]

Return the R^2 for this regression model.

The R^2 value is defined to be 1 - MSE/var, where var is the variance of the response, computed as sigma*sigma (sigma being the standard deviation of the response).

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the R^2 value for the training data.
  • valid – If valid is True, then return the R^2 value for the validation data.
  • xval – If xval is True, then return the R^2 value for the cross validation data.
Returns:

The R^2 for this regression model.
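
The definition 1 - MSE/var can be checked by hand on a small data set. The sketch below is plain Python rather than an H2O call; `r_squared` is a hypothetical helper, and using the population variance (dividing by n) for var is an assumption.

```python
def r_squared(actual, predicted):
    """Compute R^2 as 1 - MSE/var for two equal-length sequences."""
    n = len(actual)
    mean = sum(actual) / n
    # Mean squared error between the response and the predictions.
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    # Assumption: var is the population variance of the response (sigma*sigma).
    var = sum((a - mean) ** 2 for a in actual) / n
    return 1.0 - mse / var
```

A perfect prediction gives 1, and always predicting the mean of the response gives 0.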

residual_degrees_of_freedom(train=False, valid=False, xval=False)[source]

Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the residual dof for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the residual dof for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the residual dof, or None if it is not present.

residual_deviance(train=False, valid=False, xval=False)[source]

Retrieve the residual deviance if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the residual deviance for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the residual deviance for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the residual deviance, or None if it is not present.

respmul()[source]

Normalization/Standardization multipliers for numeric response

respsub()[source]

Normalization/Standardization offsets for numeric response

score_history()[source]

Deprecated; use scoring_history instead.

scoring_history()[source]

Retrieve Model Score History

Returns:The score history as an H2OTwoDimTable or a Pandas DataFrame.
show()[source]

Print the innards of the model, regardless of its type.

summary()[source]

Print a detailed summary of the model.

type[source]

Get the type of model built as a string.

Returns:“classifier” or “regressor” or “unsupervised”
varimp(use_pandas=False)[source]

Pretty print the variable importances, or return them in a list

Parameters:

use_pandas: boolean, optional

If True, then the variable importances will be returned as a pandas data frame.

Returns:

A list or Pandas DataFrame.

weights(matrix_id=0)[source]

Return the frame for the respective weight matrix.

Parameters:
  • matrix_id – An integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
Returns:

An H2OFrame which represents the weight matrix identified by matrix_id.

xval_keys()[source]
Returns:The model keys for the cross-validated model.
xvals[source]

Return a list of the cross-validated models.

Returns:A list of models

Binomial Classification

class h2o.model.binomial.H2OBinomialModel[source]

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

F0point5([thresholds, train, valid, xval]) Get the F0.5 for a set of thresholds.
F1([thresholds, train, valid, xval]) Get the F1 value for a set of thresholds
F2([thresholds, train, valid, xval]) Get the F2 for a set of thresholds.
accuracy([thresholds, train, valid, xval]) Get the accuracy for a set of thresholds.
aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector.
catoffsets() Categorical offsets for one-hot encoding
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients
confusion_matrix([metrics, thresholds, ...]) Get the confusion matrix for the specified metrics/thresholds
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
error([thresholds, train, valid, xval]) Get the error for a set of thresholds.
fallout([thresholds, train, valid, xval]) Get the Fallout (AKA False Positive Rate) for a set of thresholds.
find_idx_by_threshold(threshold[, train, ...]) Retrieve the index in this metric’s threshold list at which the given threshold is located.
find_threshold_by_max_metric(metric[, ...]) If all are False (default), then return the training metric value.
fnr([thresholds, train, valid, xval]) Get the False Negative Rates for a set of thresholds.
fpr([thresholds, train, valid, xval]) Get the False Positive Rates for a set of thresholds.
gains_lift([train, valid, xval]) Get the Gains/Lift table for the specified metrics
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
max_per_class_error([thresholds, train, ...]) Get the max per class error for a set of thresholds.
mcc([thresholds, train, valid, xval]) Get the mcc for a set of thresholds.
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
metric(metric[, thresholds, train, valid, xval]) Get the metric value for a set of thresholds.
missrate([thresholds, train, valid, xval]) Get the miss rate (AKA False Negative Rate) for a set of thresholds.
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
plot([timestep, metric]) Plots training set (and validation set if available) scoring history for an H2OBinomialModel.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
precision([thresholds, train, valid, xval]) Get the precision for a set of thresholds.
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
recall([thresholds, train, valid, xval]) Get the Recall (AKA True Positive Rate) for a set of thresholds.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
roc([train, valid, xval]) Return the coordinates of the ROC curve for a given set of data.
score_history() Deprecated for scoring_history
scoring_history() Retrieve Model Score History
sensitivity([thresholds, train, valid, xval]) Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds.
show() Print innards of model, without regards to type
specificity([thresholds, train, valid, xval]) Get the specificity (AKA True Negative Rate) for a set of thresholds.
summary() Print a detailed summary of the model.
tnr([thresholds, train, valid, xval]) Get the True Negative Rate for a set of thresholds.
tpr([thresholds, train, valid, xval]) Get the True Positive Rate for a set of thresholds.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix.
xval_keys() The model keys for the cross-validated model.
F0point5(thresholds=None, train=False, valid=False, xval=False)[source]

Get the F0.5 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F0point5 value for the training data.
  • valid – If valid is True, then return the F0point5 value for the validation data.
  • xval – If xval is True, then return the F0point5 value for the cross validation data.
Returns:

The F0point5 for this binomial model.

F1(thresholds=None, train=False, valid=False, xval=False)[source]

Get the F1 value for a set of thresholds.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

thresholds : list, optional

If None, then the thresholds in this set of metrics will be used.

train : bool, optional

If True, return the F1 value for the training data.

valid : bool, optional

If True, return the F1 value for the validation data.

xval : bool, optional

If True, return the F1 value for each of the cross-validated splits.

Returns:

The F1 values for the specified key(s).

Examples

>>> import h2o as ml
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> ml.init()
>>> rows=[[1,2,3,4,0],[2,1,2,4,1],[2,1,4,2,1],[0,1,2,34,1],[2,3,4,1,0]]*50
>>> fr = ml.H2OFrame(rows)
>>> fr[4] = fr[4].asfactor()
>>> model = H2OGradientBoostingEstimator(ntrees=10, max_depth=10, nfolds=4)
>>> model.train(x=list(range(4)), y=4, training_frame=fr)
>>> model.F1(train=True)
F2(thresholds=None, train=False, valid=False, xval=False)[source]

Get the F2 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F2 value for the training data.
  • valid – If valid is True, then return the F2 value for the validation data.
  • xval – If xval is True, then return the F2 value for the cross validation data.
Returns:

The F2 for this binomial model.
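
F0.5, F1, and F2 are all instances of the general F-beta score, which weights precision against recall. The sketch below computes it from confusion counts in plain Python; `f_beta` is a hypothetical helper and not part of the H2O API, and returning 0.0 when there are no true positives is an assumed convention.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta score from true positives, false positives, false negatives."""
    if tp == 0:
        # Assumed convention: no true positives means a score of zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # beta < 1 favours precision, beta > 1 favours recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f0point5 = f_beta(tp=8, fp=2, fn=8, beta=0.5)
f1 = f_beta(tp=8, fp=2, fn=8, beta=1.0)
f2 = f_beta(tp=8, fp=2, fn=8, beta=2.0)
```

With precision 0.8 and recall 0.5 as in this usage, F0.5 is the largest of the three and F2 the smallest, because precision exceeds recall.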

accuracy(thresholds=None, train=False, valid=False, xval=False)[source]

Get the accuracy for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the accuracy value for the training data.
  • valid – If valid is True, then return the accuracy value for the validation data.
  • xval – If xval is True, then return the accuracy value for the cross validation data.
Returns:

The accuracy for this binomial model.

confusion_matrix(metrics=None, thresholds=None, train=False, valid=False, xval=False)[source]

Get the confusion matrix for the specified metrics/thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metrics – A string (or list of strings) in {“min_per_class_accuracy”, “absolute_MCC”, “tnr”, “fnr”, “fpr”, “tpr”, “precision”, “accuracy”, “f0point5”, “f2”, “f1”}
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the confusion matrix value for the training data.
  • valid – If valid is True, then return the confusion matrix value for the validation data.
  • xval – If xval is True, then return the confusion matrix value for the cross validation data.
Returns:

The confusion matrix for this binomial model.
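
To make the role of a threshold concrete, the sketch below derives the four confusion-matrix cells from labels and predicted probabilities in plain Python. `confusion_counts` is a hypothetical helper, and treating a score greater than or equal to the threshold as a positive prediction is an assumption.

```python
def confusion_counts(labels, scores, threshold):
    """Count tp, fp, tn, fn for 0/1 labels at a single threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        # Assumption: score >= threshold is predicted as the positive class.
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["fn" if label == 1 else "tn"] += 1
    return counts


cells = confusion_counts([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.1, 0.7], 0.5)
```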

error(thresholds=None, train=False, valid=False, xval=False)[source]

Get the error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the error value for the training data.
  • valid – If valid is True, then return the error value for the validation data.
  • xval – If xval is True, then return the error value for the cross validation data.
Returns:

The error for this binomial model.
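
Accuracy and error at a given threshold are complementary. The sketch below shows the relationship from confusion counts; `accuracy_and_error` is a hypothetical helper, and defining error as 1 - accuracy is stated here as an assumption about how the two metrics relate.

```python
def accuracy_and_error(tp, fp, tn, fn):
    """Return (accuracy, error) from the four confusion-matrix cells."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Assumption: error is the complement of accuracy.
    return accuracy, 1.0 - accuracy
```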

fallout(thresholds=None, train=False, valid=False, xval=False)[source]

Get the Fallout (AKA False Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fallout value for the training data.
  • valid – If valid is True, then return the fallout value for the validation data.
  • xval – If xval is True, then return the fallout value for the cross validation data.
Returns:

The fallout for this binomial model.

find_idx_by_threshold(threshold, train=False, valid=False, xval=False)[source]

Retrieve the index in this metric’s threshold list at which the given threshold is located. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • threshold – The threshold value to look up in the threshold list.
  • train – If train is True, then return the idx_by_threshold for the training data.
  • valid – If valid is True, then return the idx_by_threshold for the validation data.
  • xval – If xval is True, then return the idx_by_threshold for the cross validation data.
Returns:

The idx_by_threshold for this binomial model.

find_threshold_by_max_metric(metric, train=False, valid=False, xval=False)[source]

Retrieve the threshold at which the given metric is maximized. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metric – The name of the metric to maximize.
  • train – If train is True, then return the threshold_by_max_metric value for the training data.
  • valid – If valid is True, then return the threshold_by_max_metric value for the validation data.
  • xval – If xval is True, then return the threshold_by_max_metric value for the cross validation data.
Returns:

The threshold_by_max_metric for this binomial model.
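
Conceptually, finding a threshold by maximum metric is a sweep over candidate thresholds. The sketch below does this for F1 in plain Python; `f1_at` and `threshold_by_max_metric` are hypothetical helpers, the candidate list is supplied by the caller, and the `>=` convention for positive predictions is an assumption.

```python
def f1_at(labels, scores, threshold):
    """F1 for 0/1 labels when scores >= threshold are predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def threshold_by_max_metric(labels, scores, thresholds, metric=f1_at):
    """Return the candidate threshold with the highest metric value."""
    # On ties, max() keeps the first candidate in the list.
    return max(thresholds, key=lambda t: metric(labels, scores, t))


best = threshold_by_max_metric(
    [1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.1, 0.7], [0.2, 0.5, 0.8]
)
```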

fnr(thresholds=None, train=False, valid=False, xval=False)[source]

Get the False Negative Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fnr value for the training data.
  • valid – If valid is True, then return the fnr value for the validation data.
  • xval – If xval is True, then return the fnr value for the cross validation data.
Returns:

The fnr for this binomial model.

fpr(thresholds=None, train=False, valid=False, xval=False)[source]

Get the False Positive Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fpr value for the training data.
  • valid – If valid is True, then return the fpr value for the validation data.
  • xval – If xval is True, then return the fpr value for the cross validation data.
Returns:

The fpr for this binomial model.

gains_lift(train=False, valid=False, xval=False)[source]

Get the Gains/Lift table for the specified metrics. If all are False (default), then return the training metric Gains/Lift table. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Gains/Lift table for the training data.
  • valid – If valid is True, then return the Gains/Lift table for the validation data.
  • xval – If xval is True, then return the Gains/Lift table for the cross validation data.
Returns:

The Gains/Lift table for this binomial model.

max_per_class_error(thresholds=None, train=False, valid=False, xval=False)[source]

Get the max per class error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the max_per_class_error value for the training data.
  • valid – If valid is True, then return the max_per_class_error value for the validation data.
  • xval – If xval is True, then return the max_per_class_error value for the cross validation data.
Returns:

The max_per_class_error for this binomial model.

mcc(thresholds=None, train=False, valid=False, xval=False)[source]

Get the MCC (Matthews correlation coefficient) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the mcc value for the training data.
  • valid – If valid is True, then return the mcc value for the validation data.
  • xval – If xval is True, then return the mcc value for the cross validation data.
Returns:

The mcc for this binomial model.
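
For reference, the Matthews correlation coefficient at a single threshold can be computed from the four confusion-matrix cells. The sketch below is plain Python; `matthews_corrcoef` is a hypothetical helper, and returning 0.0 when the denominator vanishes is an assumed convention.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix cells."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column yields zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).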

metric(metric, thresholds=None, train=False, valid=False, xval=False)[source]

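Get the metric value for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metric – The name of the metric to retrieve.
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the metrics for the training data.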
  • valid – If valid is True, then return the metrics for the validation data.
  • xval – If xval is True, then return the metrics for the cross validation data.
Returns:

The metrics for this binomial model.

missrate(thresholds=None, train=False, valid=False, xval=False)[source]

Get the miss rate (AKA False Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the missrate value for the training data.
  • valid – If valid is True, then return the missrate value for the validation data.
  • xval – If xval is True, then return the missrate value for the cross validation data.
Returns:

The missrate for this binomial model.

plot(timestep='AUTO', metric='AUTO', **kwargs)[source]

Plot the scoring history of the training set (and of the validation set, if available) for an H2OBinomialModel. The timestep and metric arguments are restricted to what is available in its scoring history.

Parameters:
  • timestep – A unit of measurement for the x-axis.
  • metric – A unit of measurement for the y-axis.
Returns:

A scoring history plot.

precision(thresholds=None, train=False, valid=False, xval=False)[source]

Get the precision for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the precision value for the training data.
  • valid – If valid is True, then return the precision value for the validation data.
  • xval – If xval is True, then return the precision value for the cross validation data.
Returns:

The precision for this binomial model.

recall(thresholds=None, train=False, valid=False, xval=False)[source]

Get the Recall (AKA True Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the recall value for the training data.
  • valid – If valid is True, then return the recall value for the validation data.
  • xval – If xval is True, then return the recall value for the cross validation data.
Returns:

The recall for this binomial model.

roc(train=False, valid=False, xval=False)[source]

Return the coordinates of the ROC curve for a given set of data, as a two-tuple containing the false positive rates as a list and true positive rates as a list. If all are False (default), then the coordinates for the training data are returned. If more than one ROC curve is requested, the data is returned as a dictionary of two-tuples.

Parameters:
  • train – If train is True, then return the ROC coordinates for the training data.
  • valid – If valid is True, then return the ROC coordinates for the validation data.
  • xval – If xval is True, then return the ROC coordinates for the cross validation data.
Returns:

The coordinates of the ROC curve.
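
The two-tuple layout (false positive rates, true positive rates) can be illustrated without a running cluster. The sketch below builds ROC coordinates from labels and scores in plain Python; `roc_coordinates` is a hypothetical helper, and using each distinct score as a threshold with a `>=` comparison is an assumption about how the points are generated.

```python
def roc_coordinates(labels, scores):
    """Return (fprs, tprs) lists for 0/1 labels, one point per distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    fprs, tprs = [], []
    # Sweep thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        pairs = list(zip(labels, scores))
        tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
        fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
        tprs.append(tp / positives)
        fprs.append(fp / negatives)
    return fprs, tprs


fprs, tprs = roc_coordinates([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.1, 0.7])
```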

sensitivity(thresholds=None, train=False, valid=False, xval=False)[source]

Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the sensitivity value for the training data.
  • valid – If valid is True, then return the sensitivity value for the validation data.
  • xval – If xval is True, then return the sensitivity value for the cross validation data.
Returns:

The sensitivity for this binomial model.

specificity(thresholds=None, train=False, valid=False, xval=False)[source]

Get the specificity (AKA True Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the specificity value for the training data.
  • valid – If valid is True, then return the specificity value for the validation data.
  • xval – If xval is True, then return the specificity value for the cross validation data.
Returns:

The specificity for this binomial model.

tnr(thresholds=None, train=False, valid=False, xval=False)[source]

Get the True Negative Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the tnr value for the training data.
  • valid – If valid is True, then return the tnr value for the validation data.
  • xval – If xval is True, then return the tnr value for the cross validation data.
Returns:

The tnr for this binomial model.

tpr(thresholds=None, train=False, valid=False, xval=False)[source]

Get the True Positive Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the tpr value for the training data.
  • valid – If valid is True, then return the tpr value for the validation data.
  • xval – If xval is True, then return the tpr value for the cross validation data.
Returns:

The tpr for this binomial model.
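
Several of the accessors above are aliases for the same four rates: recall and sensitivity are the true positive rate, missrate is the false negative rate, fallout is the false positive rate, and specificity is the true negative rate. The sketch below computes the four rates from confusion counts; `binomial_rates` is a hypothetical helper and not part of the H2O API.

```python
def binomial_rates(tp, fp, tn, fn):
    """Return tpr, fnr, tnr, and fpr from the four confusion-matrix cells."""
    tpr = tp / (tp + fn)  # recall, sensitivity
    tnr = tn / (tn + fp)  # specificity
    return {
        "tpr": tpr,
        "fnr": 1.0 - tpr,  # miss rate
        "tnr": tnr,
        "fpr": 1.0 - tnr,  # fallout
    }


rates = binomial_rates(tp=2, fp=1, tn=1, fn=1)
```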

Multinomial Classification

class h2o.model.multinomial.H2OMultinomialModel[source]

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector.
catoffsets() Categorical offsets for one-hot encoding
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients
confusion_matrix(data) Returns a confusion matrix based on H2O’s default prediction threshold for a dataset
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
hit_ratio_table([train, valid, xval]) Retrieve the Hit Ratios
is_cross_validated() True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
plot([timestep, metric]) Plots training set (and validation set if available) scoring history for an H2OMultinomialModel.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated for scoring_history
scoring_history() Retrieve Model Score History
show() Print innards of model, without regards to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix.
xval_keys() The model keys for the cross-validated model.
confusion_matrix(data)[source]

Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.

hit_ratio_table(train=False, valid=False, xval=False)[source]

Retrieve the Hit Ratios

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the hit ratio table for the training data.
  • valid – If valid is True, then return the hit ratio table for the validation data.
  • xval – If xval is True, then return the hit ratio table for the cross validation data.
Returns:

The hit ratio table for this multinomial model.
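
A hit ratio at k is commonly understood as the fraction of rows whose true class is among the k classes with the highest predicted probability. The sketch below computes such a table in plain Python under that assumption; `hit_ratios` is a hypothetical helper, classes are assumed to be encoded as integer indices into each probability row, and the layout of the table H2O actually returns is not reproduced here.

```python
def hit_ratios(true_classes, class_probabilities, max_k):
    """Return [hit ratio at k=1, ..., hit ratio at k=max_k]."""
    hits = [0] * max_k
    for truth, probs in zip(true_classes, class_probabilities):
        # Rank class indices from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        rank = ranked.index(truth)
        # A hit at rank r counts for every k greater than r.
        for k in range(rank, max_k):
            hits[k] += 1
    n = len(true_classes)
    return [h / n for h in hits]


table = hit_ratios(
    [0, 1, 2],
    [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]],
    max_k=3,
)
```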

plot(timestep='AUTO', metric='AUTO', **kwargs)[source]

Plot the scoring history of the training set (and of the validation set, if available) for an H2OMultinomialModel. The timestep and metric arguments are restricted to what is available in its scoring history.

Parameters:
  • timestep – A unit of measurement for the x-axis.
  • metric – A unit of measurement for the y-axis.
Returns:

A scoring history plot.

Regression

class h2o.model.regression.H2ORegressionModel[source]

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients.
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() Return True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
plot([timestep, metric]) Plots training set (and validation set if available) scoring history for an H2ORegressionModel. The timestep and metric arguments are restricted to what is available in its scoring history.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated; use scoring_history() instead.
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys() Return the model keys for the cross-validated model.
plot(timestep='AUTO', metric='AUTO', **kwargs)[source]

Plots training set (and validation set if available) scoring history for an H2ORegressionModel. The timestep and metric arguments are restricted to what is available in its scoring history.

Parameters:
  • timestep – A unit of measurement for the x-axis.
  • metric – A unit of measurement for the y-axis.
Returns:

A scoring history plot.

h2o.model.regression.h2o_explained_variance_score(y_actual, y_predicted, weights=None)[source]

Explained variance regression score function

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

the explained variance score (float)
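
The score can be sketched in plain Python as 1 - Var(actual - predicted) / Var(actual), using weighted means when sample weights are given. This is an illustrative sketch on Python lists rather than H2OFrames; the function names below are hypothetical and it is not the library's implementation.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of `values`."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def explained_variance(y_actual, y_predicted, weights=None):
    """1 - (variance of the residuals) / (variance of the actual response)."""
    if weights is None:
        weights = [1.0] * len(y_actual)
    residuals = [a - p for a, p in zip(y_actual, y_predicted)]
    res_mean = weighted_mean(residuals, weights)
    res_var = weighted_mean([(r - res_mean) ** 2 for r in residuals], weights)
    act_mean = weighted_mean(y_actual, weights)
    act_var = weighted_mean([(a - act_mean) ** 2 for a in y_actual], weights)
    return 1.0 - res_var / act_var

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]
score = explained_variance(y_actual, y_predicted)
print(round(score, 4))

# Predictions that are off by a constant still score 1.0, because the
# residuals have zero variance.
shifted = explained_variance(y_actual, [a + 1.0 for a in y_actual])
print(shifted)
```

Because the residual mean is subtracted out, this score ignores systematic bias in the predictions, which is the main way it differs from R^2.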

h2o.model.regression.h2o_mean_absolute_error(y_actual, y_predicted, weights=None)[source]

Mean absolute error regression loss.

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

loss (float) (best is 0.0)
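
A minimal sketch of the loss on plain Python lists, assuming the usual definition (weighted average of absolute differences). The function name is hypothetical and the real function operates on H2OFrames.

```python
def mean_absolute_error(y_actual, y_predicted, weights=None):
    """Weighted average of |actual - predicted|; unweighted if weights is None."""
    if weights is None:
        weights = [1.0] * len(y_actual)
    total = sum(w * abs(a - p)
                for a, p, w in zip(y_actual, y_predicted, weights))
    return total / sum(weights)

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_actual, y_predicted)
# Giving the last row three times the weight pulls the loss toward its error.
weighted_mae = mean_absolute_error(y_actual, y_predicted, weights=[1, 1, 1, 3])
print(mae, weighted_mae)
```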

h2o.model.regression.h2o_mean_squared_error(y_actual, y_predicted, weights=None)[source]

Mean squared error regression loss

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

loss (float) (best is 0.0)
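
A minimal sketch of the loss on plain Python lists, assuming the usual definition (weighted average of squared differences). The function name is hypothetical and the real function operates on H2OFrames.

```python
def mean_squared_error(y_actual, y_predicted, weights=None):
    """Weighted average of (actual - predicted)^2; unweighted if weights is None."""
    if weights is None:
        weights = [1.0] * len(y_actual)
    total = sum(w * (a - p) ** 2
                for a, p, w in zip(y_actual, y_predicted, weights))
    return total / sum(weights)

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_actual, y_predicted)
print(mse)
```

Squaring penalizes large errors more heavily than the mean absolute error does: the single error of 1.0 contributes two thirds of this loss.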

h2o.model.regression.h2o_median_absolute_error(y_actual, y_predicted)[source]

Median absolute error regression loss

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
Returns:

loss (float) (best is 0.0)
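
A minimal sketch on plain Python lists, assuming the usual definition (median of the absolute differences). The function name is hypothetical. The second call shows why this loss is considered robust: a single extreme error leaves the median unchanged.

```python
from statistics import median

def median_absolute_error(y_actual, y_predicted):
    """Median of |actual - predicted| over all rows."""
    return median(abs(a - p) for a, p in zip(y_actual, y_predicted))

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]
loss = median_absolute_error(y_actual, y_predicted)

# Append one row with a very large error; the median does not move.
loss_with_outlier = median_absolute_error(y_actual + [100.0],
                                          y_predicted + [0.0])
print(loss, loss_with_outlier)
```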

h2o.model.regression.h2o_r2_score(y_actual, y_predicted, weights=1.0)[source]

R^2 (coefficient of determination) regression score function

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

R^2 (float) (best is 1.0, lower is worse)
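
A minimal sketch of the score on plain Python lists, assuming the usual definition R^2 = 1 - SS_res / SS_tot with an optional per-row weight. The function name is hypothetical, and the sketch uses weights=None for the unweighted case rather than the scalar default shown in the signature above.

```python
def r2_score(y_actual, y_predicted, weights=None):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    if weights is None:
        weights = [1.0] * len(y_actual)
    mean = sum(w * a for a, w in zip(y_actual, weights)) / sum(weights)
    ss_res = sum(w * (a - p) ** 2
                 for a, p, w in zip(y_actual, y_predicted, weights))
    ss_tot = sum(w * (a - mean) ** 2 for a, w in zip(y_actual, weights))
    return 1.0 - ss_res / ss_tot

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]
r2 = r2_score(y_actual, y_predicted)
print(round(r2, 4))

# Always predicting the mean of the response scores exactly 0.
baseline = r2_score(y_actual, [2.875] * 4)
print(baseline)
```

A model that does worse than predicting the mean yields a negative score, which is why the range is described as "lower is worse" rather than bounded at 0.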

Clustering Methods

class h2o.model.clustering.H2OClusteringModel[source]

Bases: h2o.model.model_base.ModelBase

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
auc([train, valid, xval]) Get the AUC(s).
betweenss([train, valid, xval]) Get the between cluster sum of squares.
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
centers() Return the centers for the KMeans model.
centers_std() Return the standardized centers for the KMeans model.
centroid_stats([train, valid, xval]) Get the centroid statistics for each cluster.
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients.
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() Return True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
num_iterations() Get the number of iterations that it took to converge or reach max iterations.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated; use scoring_history() instead.
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
size([train, valid, xval]) Get the sizes of each cluster.
summary() Print a detailed summary of the model.
tot_withinss([train, valid, xval]) Get the total within cluster sum of squares.
totss([train, valid, xval]) Get the total sum of squares.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
withinss([train, valid, xval]) Get the within cluster sum of squares for each cluster.
xval_keys() Return the model keys for the cross-validated model.
betweenss(train=False, valid=False, xval=False)[source]

Get the between cluster sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the between cluster sum of squares value for the training data.

valid : bool, optional

If True, then return the between cluster sum of squares value for the validation data.

xval : bool, optional

If True, then return the between cluster sum of squares value for each of the cross-validated splits.

Returns:

Returns the between cluster sum of squares values for the specified key(s).
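
For reference, the between cluster sum of squares is conventionally the sum, over clusters, of the cluster size times the squared distance from the cluster centroid to the grand mean of the data. The sketch below computes it on raw 2-D points in plain Python. The function name and sample data are hypothetical, and it does not account for any standardization the model may apply before clustering.

```python
def between_cluster_ss(points, assignments):
    """Sum over clusters of size * ||centroid - grand mean||^2."""
    dims = len(points[0])
    grand_mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]

    # Group the points by their assigned cluster id.
    clusters = {}
    for point, cluster_id in zip(points, assignments):
        clusters.setdefault(cluster_id, []).append(point)

    total = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dims)]
        total += len(members) * sum((c - g) ** 2
                                    for c, g in zip(centroid, grand_mean))
    return total

points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
assignments = [0, 0, 1, 1]
betweenss = between_cluster_ss(points, assignments)
print(betweenss)
```

A larger value means the centroids are spread further apart relative to the overall mean.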

centers()[source]

Returns: The centers for the KMeans model.

centers_std()[source]

Returns: The standardized centers for the KMeans model.

centroid_stats(train=False, valid=False, xval=False)[source]

Get the centroid statistics for each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the centroid statistics for the training data.

valid : bool, optional

If True, then return the centroid statistics for the validation data.

xval : bool, optional

If True, then return the centroid statistics for each of the cross-validated splits.

Returns:

Returns the centroid statistics for the specified key(s).
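
The train/valid/xval flag convention shared by the metric accessors in this class can be sketched as a small selection function. This is a hypothetical illustration, not the library's code: select_metrics and the sample values are made up, and the behavior when exactly one flag is True (returning the bare value) is an assumption, since the text above only describes the zero-flag and multi-flag cases.

```python
def select_metrics(metrics, train=False, valid=False, xval=False):
    """Pick values from `metrics`, a dict keyed by "train", "valid", "xval"."""
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [key for key, flag in flags if flag]
    if not requested:
        # No flag set: fall back to the training metric.
        return metrics["train"]
    if len(requested) == 1:
        # Assumption: a single flag returns the bare value, not a dict.
        return metrics[requested[0]]
    # Several flags set: return a dictionary keyed by data set.
    return {key: metrics[key] for key in requested}

metrics = {"train": 4.0, "valid": 5.5, "xval": 6.1}  # made-up values
default_value = select_metrics(metrics)
both = select_metrics(metrics, train=True, valid=True)
print(default_value, both)
```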

num_iterations()[source]

Get the number of iterations that it took to converge or reach max iterations.

Returns: The number of iterations (integer).

size(train=False, valid=False, xval=False)[source]

Get the sizes of each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the cluster sizes for the training data.

valid : bool, optional

If True, then return the cluster sizes for the validation data.

xval : bool, optional

If True, then return the cluster sizes for each of the cross-validated splits.

Returns:

Returns the cluster sizes for the specified key(s).
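
A cluster size is simply the number of rows assigned to that cluster. A minimal plain-Python sketch, with a hypothetical function name and made-up assignments:

```python
from collections import Counter

def cluster_sizes(assignments, k):
    """Return the number of rows assigned to each of the clusters 0..k-1."""
    counts = Counter(assignments)
    return [counts[cluster_id] for cluster_id in range(k)]

assignments = [0, 1, 1, 2, 1, 0]  # cluster id predicted for each row
sizes = cluster_sizes(assignments, k=3)
print(sizes)
```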

tot_withinss(train=False, valid=False, xval=False)[source]

Get the total within cluster sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the total within cluster sum of squares value for the training data.

valid : bool, optional

If True, then return the total within cluster sum of squares value for the validation data.

xval : bool, optional

If True, then return the total within cluster sum of squares value for each of the cross-validated splits.

Returns:

Returns the total within cluster sum of squares values for the specified key(s).

totss(train=False, valid=False, xval=False)[source]

Get the total sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the total sum of squares value for the training data.

valid : bool, optional

If True, then return the total sum of squares value for the validation data.

xval : bool, optional

If True, then return the total sum of squares value for each of the cross-validated splits.

Returns:

Returns the total sum of squares values for the specified key(s).
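
When each center is the mean of its cluster, the total sum of squares decomposes as totss = tot_withinss + betweenss. The sketch below checks that identity on a small made-up data set in plain Python. The helper names are hypothetical, and the computation is on raw coordinates, ignoring any standardization the model may apply.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
assignments = [0, 0, 1, 1]

# Total sum of squares: distance of every point to the grand mean.
grand_mean = mean_point(points)
totss = sum(squared_distance(p, grand_mean) for p in points)

# Group points by cluster, then accumulate the two components.
clusters = {}
for point, cluster_id in zip(points, assignments):
    clusters.setdefault(cluster_id, []).append(point)

tot_withinss = 0.0
betweenss = 0.0
for members in clusters.values():
    centroid = mean_point(members)
    tot_withinss += sum(squared_distance(p, centroid) for p in members)
    betweenss += len(members) * squared_distance(centroid, grand_mean)

print(totss, tot_withinss, betweenss)
```

The ratio betweenss / totss is a common rough indicator of how much of the variation the clustering accounts for.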

withinss(train=False, valid=False, xval=False)[source]

Get the within cluster sum of squares for each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:

train : bool, optional

If True, then return the within cluster sum of squares value for the training data.

valid : bool, optional

If True, then return the within cluster sum of squares value for the validation data.

xval : bool, optional

If True, then return the within cluster sum of squares value for each of the cross-validated splits.

Returns:

Returns the within cluster sum of squares values for the specified key(s).
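
The within cluster sum of squares has one value per cluster: the sum of squared distances from each member to its cluster centroid. A minimal plain-Python sketch follows, with a hypothetical function name and made-up data, computed on raw coordinates.

```python
def within_cluster_ss(points, assignments):
    """Return one sum of squared distances to the centroid per cluster id,
    ordered by ascending cluster id."""
    clusters = {}
    for point, cluster_id in zip(points, assignments):
        clusters.setdefault(cluster_id, []).append(point)

    result = []
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        result.append(sum(sum((x - c) ** 2 for x, c in zip(p, centroid))
                          for p in members))
    return result

points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
assignments = [0, 0, 1, 1]
withinss = within_cluster_ss(points, assignments)
print(withinss)
```

Summing the per-cluster values gives the total within cluster sum of squares.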

AutoEncoders

class h2o.model.autoencoder.H2OAutoEncoderModel[source]

Bases: h2o.model.model_base.ModelBase

Class for AutoEncoder models.

Attributes

full_parameters Get the full specification of all parameters.
model_id Retrieve this model’s identifier.
params Get the parameters and the actual/default values only.
type Get the type of model built as a string.
xvals Return a list of the cross-validated models.

Methods

aic([train, valid, xval]) Get the AIC(s).
anomaly(test_data[, per_feature]) Obtain the reconstruction error for the input test_data.
auc([train, valid, xval]) Get the AUC(s).
biases([vector_id]) Return the frame for the respective bias vector. vector_id is an integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
catoffsets() Categorical offsets for one-hot encoding
coef() Return the coefficients for this model.
coef_norm() Return the normalized coefficients.
deepfeatures(test_data, layer) Return hidden layer details
download_pojo([path]) Download the POJO for this model to the directory specified by path (no trailing slash!).
get_xval_models([key]) Return a Model object.
giniCoef([train, valid, xval]) Get the Gini Coefficient(s).
is_cross_validated() Return True if the model was cross-validated.
logloss([train, valid, xval]) Get the Log Loss(s).
mean_residual_deviance([train, valid, xval]) Get the Mean Residual Deviance(s).
model_performance([test_data, train, valid]) Generate model metrics for this model on test_data.
mse([train, valid, xval]) Get the MSE(s).
next()
normmul() Normalization/Standardization multipliers for numeric predictors
normsub() Normalization/Standardization offsets for numeric predictors
null_degrees_of_freedom([train, valid, xval]) Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
null_deviance([train, valid, xval]) Retrieve the null deviance if this model has the attribute, or None otherwise.
pprint_coef() Pretty print the coefficients table (includes normalized coefficients)
predict(test_data) Predict on a dataset.
r2([train, valid, xval]) Return the R^2 for this regression model.
residual_degrees_of_freedom([train, valid, xval]) Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
residual_deviance([train, valid, xval]) Retrieve the residual deviance if this model has the attribute, or None otherwise.
respmul() Normalization/Standardization multipliers for numeric response
respsub() Normalization/Standardization offsets for numeric response
score_history() Deprecated; use scoring_history() instead.
scoring_history() Retrieve Model Score History
show() Print innards of model, without regard to type
summary() Print a detailed summary of the model.
varimp([use_pandas]) Pretty print the variable importances, or return them in a list
weights([matrix_id]) Return the frame for the respective weight matrix. matrix_id is an integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
xval_keys() Return the model keys for the cross-validated model.
anomaly(test_data, per_feature=False)[source]

Obtain the reconstruction error for the input test_data.

Parameters:

test_data : H2OFrame

The dataset upon which the reconstruction error is computed.

per_feature : bool

Whether to return the squared reconstruction error per feature. Otherwise, return the mean squared error.

Returns:

Return the reconstruction error.
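
The difference between the two modes of anomaly can be sketched in plain Python: with per_feature set, each row yields one squared error per feature; otherwise each row yields the mean of those squared errors. The function name and the sample rows below are hypothetical. The real method takes an H2OFrame and obtains the reconstruction from the model, and any preprocessing it applies to the features is not modeled here.

```python
def reconstruction_error(original, reconstructed, per_feature=False):
    """Compare each input row with its reconstruction.

    Returns a list of per-feature squared errors for each row when
    per_feature is True, otherwise one mean squared error per row.
    """
    errors = []
    for x_row, r_row in zip(original, reconstructed):
        squared = [(x - r) ** 2 for x, r in zip(x_row, r_row)]
        errors.append(squared if per_feature else sum(squared) / len(squared))
    return errors

original = [[1.0, 2.0], [3.0, 4.0]]
reconstructed = [[1.0, 1.0], [2.0, 6.0]]  # made-up autoencoder output

per_row = reconstruction_error(original, reconstructed)
per_feat = reconstruction_error(original, reconstructed, per_feature=True)
print(per_row)
print(per_feat)
```

Rows with an unusually large error are the ones the autoencoder reconstructs poorly, which is why this quantity is commonly used as an anomaly score.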