Modeling in H2O

ModelBase

This module implements the base model class. All model classes inherit from this class.

class h2o.model.model_base.ModelBase(dest_key, model_json, metrics_class)

Bases: object

aic(train=False, valid=False, xval=False)

Get the AIC(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the AIC value for the training data.
  • valid – If valid is True, then return the AIC value for the validation data.
  • xval – If xval is True, then return the AIC value for the cross validation data.
Returns:

The AIC.

auc(train=False, valid=False, xval=False)

Get the AUC(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the AUC value for the training data.
  • valid – If valid is True, then return the AUC value for the validation data.
  • xval – If xval is True, then return the AUC value for the cross validation data.
Returns:

The AUC.
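
The AUC equals the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row. The sketch below computes the AUC exactly from that pairwise definition, which is the Mann-Whitney statistic. H2O may compute AUC differently internally, so results can differ slightly on large data.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Over all (positive, negative) pairs, count how often the positive row is
    scored higher; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```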

biases(vector_id=0)

Return the frame for the respective bias vector.

Parameters:vector_id – An integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
Returns:An H2OFrame that represents the bias vector identified by vector_id.

coef()
Returns:The coefficients for this model.
coef_norm()
Returns:The normalized coefficients for this model.
deepfeatures(test_data, layer)

Return the hidden layer details (deep features) for test_data.

Parameters:
  • test_data – Data to create a feature space on.
  • layer – Zero-based index of the hidden layer.
download_pojo(path='')

Download the POJO for this model to the directory specified by path (no trailing slash!). If path is “”, then print the POJO to the screen.

Parameters:path – An absolute path to the directory where the POJO should be saved.
Returns:None

full_parameters (property)

Get the full specification of all parameters.

Returns:A dictionary of parameters used to build this model.
get_xval_models(key=None)

Return the cross-validated model(s).

Parameters:key – If None, return all cross-validated models; otherwise return the model that key points to.
Returns:A model or list of models.
giniCoef(train=False, valid=False, xval=False)

Get the Gini Coefficient(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Gini Coefficient value for the training data.
  • valid – If valid is True, then return the Gini Coefficient value for the validation data.
  • xval – If xval is True, then return the Gini Coefficient value for the cross validation data.
Returns:

The Gini Coefficient for this binomial model.
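
For a binomial model, the Gini coefficient is conventionally derived from the AUC as Gini = 2 * AUC - 1. Under this convention a random classifier scores 0 and a perfect classifier scores 1. The following is a minimal sketch of that conversion. The helper name is hypothetical.

```python
def gini_from_auc(auc_value: float) -> float:
    """Convert an AUC into the Gini coefficient using Gini = 2 * AUC - 1."""
    if not 0.0 <= auc_value <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    return 2.0 * auc_value - 1.0
```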

id (property)
Returns:This model’s identifier.
is_cross_validated()
Returns:True if the model was cross-validated.
logloss(train=False, valid=False, xval=False)

Get the Log Loss(es). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Log Loss value for the training data.
  • valid – If valid is True, then return the Log Loss value for the validation data.
  • xval – If xval is True, then return the Log Loss value for the cross validation data.
Returns:

The Log Loss for this binomial model.
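
Log loss is the mean negative log-likelihood of the observed labels under the predicted probabilities. The sketch below covers the binary case only. The clipping constant eps is an assumption of this sketch. It prevents a predicted probability of exactly 0 or 1 from producing an infinite loss.

```python
import math
from typing import Sequence


def binary_logloss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of 0/1 labels given predicted P(y = 1)."""
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip so that log(0) can never be evaluated (sketch-specific choice).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```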

mean_residual_deviance(train=False, valid=False, xval=False)

Get the Mean Residual Deviance(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the Mean Residual Deviance value for the training data.
  • valid – If valid is True, then return the Mean Residual Deviance value for the validation data.
  • xval – If xval is True, then return the Mean Residual Deviance value for the cross validation data.
Returns:

The Mean Residual Deviance for this regression model.
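
For a Gaussian regression model, the mean residual deviance is the mean of the squared errors, which is the same as the MSE. Other distributions use their own unit deviance. The sketch below shows the Gaussian and Poisson cases using the standard textbook unit deviances. It does not model which distributions a given H2O model supports or how H2O weights rows.

```python
import math
from typing import Sequence


def mean_residual_deviance(y: Sequence[float], mu: Sequence[float],
                           family: str = "gaussian") -> float:
    """Average unit deviance between observed y and predicted means mu."""
    if not y or len(y) != len(mu):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for yi, mi in zip(y, mu):
        if family == "gaussian":
            # Gaussian unit deviance is the squared error, so the mean is the MSE.
            total += (yi - mi) ** 2
        elif family == "poisson":
            # Poisson unit deviance: 2 * (y * log(y / mu) - (y - mu)),
            # where y * log(y / mu) is taken as 0 when y == 0.
            if mi <= 0:
                raise ValueError("Poisson means must be positive")
            term = yi * math.log(yi / mi) if yi > 0 else 0.0
            total += 2.0 * (term - (yi - mi))
        else:
            raise ValueError(f"unsupported family: {family}")
    return total / len(y)
```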

model_performance(test_data=None, train=False, valid=False)

Generate model metrics for this model on test_data.

Parameters:
  • test_data – Dataset against which the model metrics are computed. Both train and valid arguments are ignored if test_data is not None.
  • train – Report the training metrics for the model. If the test_data is the training data, the training metrics are returned.
  • valid – Report the validation metrics for the model. If both train and valid are True, then train is selected by default.
Returns:

An object of class H2OModelMetrics.

mse(train=False, valid=False, xval=False)

Get the MSE(s). If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the MSE value for the training data.
  • valid – If valid is True, then return the MSE value for the validation data.
  • xval – If xval is True, then return the MSE value for the cross validation data.
Returns:

The MSE for this regression model.

null_degrees_of_freedom(train=False, valid=False, xval=False)

Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the null dof for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the null dof for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the null dof, or None if it is not present.

null_deviance(train=False, valid=False, xval=False)

Retrieve the null deviance if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the null deviance for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the null deviance for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the null deviance, or None if it is not present.
params (property)

Get the parameters and the actual/default values only.

Returns:A dictionary of parameters used to build this model.
pprint_coef()

Pretty print the coefficients table (includes normalized coefficients).

Returns:None

predict(test_data)

Predict on a dataset.

Parameters:test_data – Data to be predicted on.
Returns:A new H2OFrame filled with predictions.
r2(train=False, valid=False, xval=False)

Return the R^2 for this regression model.

The R^2 value is defined as 1 - MSE/var, where var = sigma*sigma is the variance of the response.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the R^2 value for the training data.
  • valid – If valid is True, then return the R^2 value for the validation data.
  • xval – If xval is True, then return the R^2 value for the cross validation data.
Returns:

The R^2 for this regression model.
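
The R^2 definition above can be checked directly in plain Python. This sketch assumes sigma is the population standard deviation of the response, so the variance divides by n. The documentation does not say whether H2O divides by n or by n - 1. The difference is negligible for large data.

```python
from typing import Sequence


def r2(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - MSE / var, with var = sigma^2 the population variance of y."""
    n = len(y)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(y, y_pred)) / n
    mean = sum(y) / n
    var = sum((a - mean) ** 2 for a in y) / n  # sigma * sigma (assumed population form)
    if var == 0:
        raise ValueError("R^2 is undefined when the response is constant")
    return 1.0 - mse / var
```

Predicting the mean of the response everywhere gives R^2 = 0. Perfect predictions give R^2 = 1.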

residual_degrees_of_freedom(train=False, valid=False, xval=False)

Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the residual dof for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the residual dof for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the residual dof, or None if it is not present.

residual_deviance(train=False, valid=False, xval=False)

Retrieve the residual deviance if this model has the attribute, or None otherwise.

Parameters:
  • train – Get the residual deviance for the training set. If both train and valid are False, then train is selected by default.
  • valid – Get the residual deviance for the validation set. If both train and valid are True, then train is selected by default.
Returns:

Return the residual deviance, or None if it is not present.

score_history()

Retrieve the model’s score history.

Returns:The score history (H2OTwoDimTable).

show()

Print the innards of the model, regardless of model type.

Returns:None
summary()

Print a detailed summary of the model.

Returns:
varimp(return_list=False)

Pretty print the variable importances, or return them in a list.

Parameters:return_list – If True, then return the variable importances in a list (ordered from most important to least important). Each entry in the list is a 4-tuple of (variable, relative_importance, scaled_importance, percentage).
Returns:None, or an ordered list.
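
The following small sketch illustrates the 4-tuples that varimp(return_list=True) returns. It assumes the usual H2O convention for variable importance tables: scaled_importance is each relative importance divided by the largest one, and percentage is each relative importance's share of the total. The helper varimp_table is hypothetical.

```python
from typing import List, Tuple


def varimp_table(relative: List[Tuple[str, float]]) -> List[Tuple[str, float, float, float]]:
    """Turn (variable, relative_importance) pairs into 4-tuples, most important first.

    scaled_importance = relative / max(relative)   (assumed convention)
    percentage        = relative / sum(relative)   (assumed convention)
    """
    if not relative:
        return []
    ordered = sorted(relative, key=lambda item: item[1], reverse=True)
    top = ordered[0][1]
    if top <= 0:
        raise ValueError("importances must contain a positive value")
    total = sum(value for _, value in ordered)
    return [(name, value, value / top, value / total) for name, value in ordered]
```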

weights(matrix_id=0)

Return the frame for the respective weight matrix.

Parameters:matrix_id – An integer, ranging from 0 to the number of layers, that specifies the weight matrix to return.
Returns:An H2OFrame that represents the weight matrix identified by matrix_id.

xval_keys()
Returns:The model keys for the cross-validated model.
xvals (property)

Return a list of the cross-validated models.

Returns:A list of models

Binomial Classification

Binomial Models

class h2o.model.binomial.H2OBinomialModel(dest_key, model_json)

Bases: h2o.model.model_base.ModelBase

Class for Binomial models.

F0point5(thresholds=None, train=False, valid=False, xval=False)

Get the F0.5 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F0point5 value for the training data.
  • valid – If valid is True, then return the F0point5 value for the validation data.
  • xval – If xval is True, then return the F0point5 value for the cross validation data.
Returns:

The F0point5 for this binomial model.

F1(thresholds=None, train=False, valid=False, xval=False)

Get the F1 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F1 value for the training data.
  • valid – If valid is True, then return the F1 value for the validation data.
  • xval – If xval is True, then return the F1 value for the cross validation data.
Returns:

The F1 for this binomial model.

F2(thresholds=None, train=False, valid=False, xval=False)

Get the F2 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the F2 value for the training data.
  • valid – If valid is True, then return the F2 value for the validation data.
  • xval – If xval is True, then return the F2 value for the cross validation data.
Returns:

The F2 for this binomial model.
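
F0.5, F1, and F2 are all instances of the F-beta score, which is a weighted harmonic mean of precision and recall. A beta below 1 weights precision more heavily, and a beta above 1 weights recall more heavily. The following is a minimal sketch of the formula. The function name is hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta = 0.5 gives F0.5, beta = 1 gives F1, and beta = 2 gives F2.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid 0 / 0 when both inputs are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```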

accuracy(thresholds=None, train=False, valid=False, xval=False)

Get the accuracy for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the accuracy value for the training data.
  • valid – If valid is True, then return the accuracy value for the validation data.
  • xval – If xval is True, then return the accuracy value for the cross validation data.
Returns:

The accuracy for this binomial model.

confusion_matrix(metrics=None, thresholds=None, train=False, valid=False, xval=False)

Get the confusion matrix for the specified metrics/thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • metrics – A string (or list of strings) in {“min_per_class_accuracy”, “absolute_MCC”, “tnr”, “fnr”, “fpr”, “tpr”, “precision”, “accuracy”, “f0point5”, “f2”, “f1”}
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the confusion matrix value for the training data.
  • valid – If valid is True, then return the confusion matrix value for the validation data.
  • xval – If xval is True, then return the confusion matrix value for the cross validation data.
Returns:

The confusion matrix for this binomial model.
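
Most threshold-based metrics in this class are simple ratios of the four confusion-matrix counts at a given threshold. The sketch below derives those counts and ratios in plain Python. It assumes 0/1 labels and predicts a row as positive when its probability is at least the threshold. H2O's exact tie handling may differ. The function name is hypothetical.

```python
from typing import Dict, Sequence


def threshold_metrics(labels: Sequence[int], probs: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Confusion-matrix counts and derived rates, predicting 1 iff prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Return 0.0 for an empty class instead of dividing by zero.
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "tpr": ratio(tp, tp + fn),   # recall, sensitivity
        "fnr": ratio(fn, tp + fn),   # miss rate
        "fpr": ratio(fp, fp + tn),   # fallout
        "tnr": ratio(tn, fp + tn),   # specificity
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, total),
        "error": ratio(fp + fn, total),
    }
```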

error(thresholds=None, train=False, valid=False, xval=False)

Get the error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the error value for the training data.
  • valid – If valid is True, then return the error value for the validation data.
  • xval – If xval is True, then return the error value for the cross validation data.
Returns:

The error for this binomial model.

fallout(thresholds=None, train=False, valid=False, xval=False)

Get the Fallout (AKA False Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fallout value for the training data.
  • valid – If valid is True, then return the fallout value for the validation data.
  • xval – If xval is True, then return the fallout value for the cross validation data.
Returns:

The fallout for this binomial model.

find_idx_by_threshold(threshold, train=False, valid=False, xval=False)

Retrieve the index in this metric’s threshold list at which the given threshold is located. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the idx_by_threshold for the training data.
  • valid – If valid is True, then return the idx_by_threshold for the validation data.
  • xval – If xval is True, then return the idx_by_threshold for the cross validation data.
Returns:

The idx_by_threshold for this binomial model.
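
As a rough illustration of find_idx_by_threshold, the hypothetical helper below returns the position of the closest entry in a list of thresholds. An exact match is simply the entry at distance 0. Falling back to the nearest threshold when no exact match exists is an assumption of this sketch, not documented behavior.

```python
from typing import Sequence


def find_idx_by_threshold(thresholds: Sequence[float], threshold: float) -> int:
    """Index of the entry in `thresholds` closest to `threshold` (first one on ties)."""
    if not thresholds:
        raise ValueError("threshold list is empty")
    return min(range(len(thresholds)), key=lambda i: abs(thresholds[i] - threshold))
```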

find_threshold_by_max_metric(metric, train=False, valid=False, xval=False)

Find the threshold at which the given metric is maximized. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the threshold_by_max_metric value for the training data.
  • valid – If valid is True, then return the threshold_by_max_metric value for the validation data.
  • xval – If xval is True, then return the threshold_by_max_metric value for the cross validation data.
Returns:

The threshold_by_max_metric for this binomial model.
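
Finding the threshold that maximizes a metric amounts to an argmax over the threshold table. The following is a minimal sketch. It assumes the metric has already been evaluated at every threshold. The parallel input lists and the function name are hypothetical.

```python
from typing import Sequence


def threshold_by_max_metric(thresholds: Sequence[float],
                            metric_values: Sequence[float]) -> float:
    """Return the threshold whose metric value is largest (the first one on ties)."""
    if not thresholds or len(thresholds) != len(metric_values):
        raise ValueError("inputs must be non-empty and the same length")
    best = max(range(len(thresholds)), key=lambda i: metric_values[i])
    return thresholds[best]
```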

fnr(thresholds=None, train=False, valid=False, xval=False)

Get the False Negative Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fnr value for the training data.
  • valid – If valid is True, then return the fnr value for the validation data.
  • xval – If xval is True, then return the fnr value for the cross validation data.
Returns:

The fnr for this binomial model.

fpr(thresholds=None, train=False, valid=False, xval=False)

Get the False Positive Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the fpr value for the training data.
  • valid – If valid is True, then return the fpr value for the validation data.
  • xval – If xval is True, then return the fpr value for the cross validation data.
Returns:

The fpr for this binomial model.

max_per_class_error(thresholds=None, train=False, valid=False, xval=False)

Get the max per class error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the max_per_class_error value for the training data.
  • valid – If valid is True, then return the max_per_class_error value for the validation data.
  • xval – If xval is True, then return the max_per_class_error value for the cross validation data.
Returns:

The max_per_class_error for this binomial model.
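
For a binomial model, the two per-class errors are the miss rate on actual positives (FNR) and the fallout on actual negatives (FPR). The max per class error is therefore the larger of the two. The sketch below computes it from raw confusion-matrix counts. The function name is hypothetical.

```python
def max_per_class_error(tp: int, fp: int, tn: int, fn: int) -> float:
    """Largest per-class error rate for a binary confusion matrix."""
    err_pos = fn / (tp + fn) if tp + fn else 0.0  # error on actual positives (FNR)
    err_neg = fp / (fp + tn) if fp + tn else 0.0  # error on actual negatives (FPR)
    return max(err_pos, err_neg)
```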

mcc(thresholds=None, train=False, valid=False, xval=False)

Get the Matthews Correlation Coefficient (MCC) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the mcc value for the training data.
  • valid – If valid is True, then return the mcc value for the validation data.
  • xval – If xval is True, then return the mcc value for the cross validation data.
Returns:

The mcc for this binomial model.
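
The Matthews Correlation Coefficient summarizes all four confusion-matrix counts in a single value between -1 and 1. The sketch below uses the standard formula. Returning 0.0 when a marginal total is zero is a convention chosen for this sketch.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Sketch convention: an empty row or column of the matrix yields 0.0.
    return numerator / denominator if denominator else 0.0
```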

metric(metric, thresholds=None, train=False, valid=False, xval=False)

Get the metric value for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the metrics for the training data.
  • valid – If valid is True, then return the metrics for the validation data.
  • xval – If xval is True, then return the metrics for the cross validation data.
Returns:

The metrics for this binomial model.

missrate(thresholds=None, train=False, valid=False, xval=False)

Get the miss rate (AKA False Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the missrate value for the training data.
  • valid – If valid is True, then return the missrate value for the validation data.
  • xval – If xval is True, then return the missrate value for the cross validation data.
Returns:

The missrate for this binomial model.

plot(type='roc', train=False, valid=False, xval=False, **kwargs)

Produce the desired metric plot. If all are False (default), then plot for the training data.

Parameters:
  • type – The type of metric plot (currently, only ROC is supported).
  • train – If train is True, then plot for training data.
  • valid – If valid is True, then plot for validation data.
  • xval – If xval is True, then plot for cross validation data.
  • show – (passed through **kwargs) If False, the plot is not shown. Note that matplotlib’s show method is blocking.
Returns:

None

precision(thresholds=None, train=False, valid=False, xval=False)

Get the precision for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the precision value for the training data.
  • valid – If valid is True, then return the precision value for the validation data.
  • xval – If xval is True, then return the precision value for the cross validation data.
Returns:

The precision for this binomial model.

recall(thresholds=None, train=False, valid=False, xval=False)

Get the Recall (AKA True Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the recall value for the training data.
  • valid – If valid is True, then return the recall value for the validation data.
  • xval – If xval is True, then return the recall value for the cross validation data.
Returns:

The recall for this binomial model.

roc(train=False, valid=False, xval=False)

Return the coordinates of the ROC curve for a given set of data, as a two-tuple containing the false positive rates as a list and the true positive rates as a list. If all are False (default), then return the coordinates for the training data. If more than one ROC curve is requested, then the data is returned as a dictionary of two-tuples.

Parameters:
  • train – If train is True, then return the ROC coordinates for the training data.
  • valid – If valid is True, then return the ROC coordinates for the validation data.
  • xval – If xval is True, then return the ROC coordinates for the cross validation data.
Returns:

The coordinates of the ROC curve.
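
The two-tuple that roc() returns can be reproduced for a toy dataset. This sketch assumes 0/1 labels and an explicit list of thresholds. It predicts a row as positive when its probability is at least the threshold. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_coordinates(labels: Sequence[int], probs: Sequence[float],
                    thresholds: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (false_positive_rates, true_positive_rates), one entry per threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need both classes to trace a ROC curve")
    fprs: List[float] = []
    tprs: List[float] = []
    for t in thresholds:
        # Count rows predicted positive at this threshold, split by true class.
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= t)
        fprs.append(fp / negatives)
        tprs.append(tp / positives)
    return fprs, tprs
```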

sensitivity(thresholds=None, train=False, valid=False, xval=False)

Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the sensitivity value for the training data.
  • valid – If valid is True, then return the sensitivity value for the validation data.
  • xval – If xval is True, then return the sensitivity value for the cross validation data.
Returns:

The sensitivity for this binomial model.

specificity(thresholds=None, train=False, valid=False, xval=False)

Get the specificity (AKA True Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the specificity value for the training data.
  • valid – If valid is True, then return the specificity value for the validation data.
  • xval – If xval is True, then return the specificity value for the cross validation data.
Returns:

The specificity for this binomial model.

tnr(thresholds=None, train=False, valid=False, xval=False)

Get the True Negative Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the tnr value for the training data.
  • valid – If valid is True, then return the tnr value for the validation data.
  • xval – If xval is True, then return the tnr value for the cross validation data.
Returns:

The tnr for this binomial model.

tpr(thresholds=None, train=False, valid=False, xval=False)

Get the True Positive Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • thresholds – The thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the thresholds in this set of metrics will be used.
  • train – If train is True, then return the tpr value for the training data.
  • valid – If valid is True, then return the tpr value for the validation data.
  • xval – If xval is True, then return the tpr value for the cross validation data.
Returns:

The tpr for this binomial model.

Multinomial Classification

Multinomial Models

class h2o.model.multinomial.H2OMultinomialModel(dest_key, model_json)

Bases: h2o.model.model_base.ModelBase

confusion_matrix(data)

Return a confusion matrix for a dataset, based on H2O’s default prediction threshold.

hit_ratio_table(train=False, valid=False, xval=False)

Retrieve the Hit Ratios.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the hit ratio table for the training data.
  • valid – If valid is True, then return the hit ratio table for the validation data.
  • xval – If xval is True, then return the hit ratio table for the cross validation data.
Returns:

The hit ratio table for this multinomial model.
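
A hit ratio at k is the fraction of rows whose actual class appears among the k most probable predicted classes. The values therefore never decrease as k grows. The sketch below computes hit ratios from per-class probabilities. The input layout and the function name are assumptions for illustration.

```python
from typing import List, Sequence


def hit_ratios(actual: Sequence[int], class_probs: Sequence[Sequence[float]],
               k_max: int) -> List[float]:
    """Hit ratio for k = 1..k_max: share of rows whose true class is in the top-k predictions."""
    if not actual or len(actual) != len(class_probs):
        raise ValueError("inputs must be non-empty and the same length")
    hits = [0] * k_max
    for y, probs in zip(actual, class_probs):
        # Class indices ordered from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        rank = ranked.index(y)  # 0-based position of the true class
        # A true class at position `rank` is a hit for every k > rank.
        for k in range(rank, k_max):
            hits[k] += 1
    return [h / len(actual) for h in hits]
```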

Regression

Regression Models

class h2o.model.regression.H2ORegressionModel(dest_key, model_json)

Bases: h2o.model.model_base.ModelBase

Class for Regression models.

h2o.model.regression.h2o_explained_variance_score(y_actual, y_predicted, weights=None)

Explained variance regression score function

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

the explained variance score (float)
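
Explained variance compares the variance of the residuals with the variance of the response. The sketch below follows the usual definition, 1 - Var(y_actual - y_predicted) / Var(y_actual), with optional sample weights. It mirrors the function's arguments but operates on plain Python lists rather than H2OFrames. Unlike R^2, a constant offset in the predictions does not lower this score.

```python
from typing import Optional, Sequence


def _weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def explained_variance(y_actual: Sequence[float], y_predicted: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """1 - Var(y_actual - y_predicted) / Var(y_actual), with optional sample weights."""
    w = list(weights) if weights is not None else [1.0] * len(y_actual)
    residuals = [a - p for a, p in zip(y_actual, y_predicted)]
    resid_mean = _weighted_mean(residuals, w)
    y_mean = _weighted_mean(y_actual, w)
    # Variances are taken around the (weighted) means, so a constant bias cancels out.
    resid_var = _weighted_mean([(r - resid_mean) ** 2 for r in residuals], w)
    y_var = _weighted_mean([(a - y_mean) ** 2 for a in y_actual], w)
    if y_var == 0:
        raise ValueError("explained variance is undefined for a constant response")
    return 1.0 - resid_var / y_var
```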

h2o.model.regression.h2o_mean_absolute_error(y_actual, y_predicted, weights=None)

Mean absolute error regression loss.

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

loss (float) (best is 0.0)

h2o.model.regression.h2o_mean_squared_error(y_actual, y_predicted, weights=None)

Mean squared error regression loss

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

loss (float) (best is 0.0)
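
The weighted mean absolute error and mean squared error differ only in how each residual is penalized. The following plain-Python sketch implements both. It mirrors the optional weights argument but operates on lists rather than H2OFrames. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def _weights(n: int, weights: Optional[Sequence[float]]) -> List[float]:
    w = list(weights) if weights is not None else [1.0] * n
    if len(w) != n or sum(w) <= 0:
        raise ValueError("weights must match the data and have a positive sum")
    return w


def mean_absolute_error(y_actual: Sequence[float], y_predicted: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of |y_actual - y_predicted|; 0.0 is a perfect fit."""
    w = _weights(len(y_actual), weights)
    return sum(wi * abs(a - p) for wi, a, p in zip(w, y_actual, y_predicted)) / sum(w)


def mean_squared_error(y_actual: Sequence[float], y_predicted: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of (y_actual - y_predicted)^2; 0.0 is a perfect fit."""
    w = _weights(len(y_actual), weights)
    return sum(wi * (a - p) ** 2 for wi, a, p in zip(w, y_actual, y_predicted)) / sum(w)
```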

h2o.model.regression.h2o_median_absolute_error(y_actual, y_predicted)

Median absolute error regression loss

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
Returns:

loss (float) (best is 0.0)
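
The median absolute error takes the median of the absolute residuals rather than their mean, which makes it robust to a few large errors. The following minimal sketch uses Python's statistics module on plain lists. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def median_absolute_error(y_actual: Sequence[float], y_predicted: Sequence[float]) -> float:
    """Median of |y_actual - y_predicted|; 0.0 is a perfect fit."""
    if not y_actual or len(y_actual) != len(y_predicted):
        raise ValueError("inputs must be non-empty and the same length")
    return statistics.median([abs(a - p) for a, p in zip(y_actual, y_predicted)])
```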

h2o.model.regression.h2o_r2_score(y_actual, y_predicted, weights=1.0)

R^2 (coefficient of determination) regression score function

Parameters:
  • y_actual – H2OFrame of actual response.
  • y_predicted – H2OFrame of predicted response.
  • weights – (Optional) sample weights
Returns:

R^2 (float) (best is 1.0, lower is worse)

Clustering Methods

Clustering Models

class h2o.model.clustering.H2OClusteringModel(dest_key, model_json)

Bases: h2o.model.model_base.ModelBase

betweenss(train=False, valid=False, xval=False)

Get the between cluster sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the between cluster sum of squares value for the training data.
  • valid – If valid is True, then return the between cluster sum of squares value for the validation data.
  • xval – If xval is True, then return the between cluster sum of squares value for the cross validation data.
Returns:

The between cluster sum of squares for this clustering model.

centers()
Returns:The centers for the k-means model.
centers_std()
Returns:The standardized centers for the k-means model.
centroid_stats(train=False, valid=False, xval=False)

Get the centroid statistics for each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the centroid statistics for the training data.
  • valid – If valid is True, then return the centroid statistics for the validation data.
  • xval – If xval is True, then return the centroid statistics for the cross validation data.
Returns:

The centroid statistics for this clustering model.

num_iterations()

Get the number of iterations that it took to converge or reach max iterations.

Returns:number of iterations (integer)
size(train=False, valid=False, xval=False)

Get the sizes of each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the cluster sizes for the training data.
  • valid – If valid is True, then return the cluster sizes for the validation data.
  • xval – If xval is True, then return the cluster sizes for the cross validation data.
Returns:

The cluster sizes for this clustering model.

tot_withinss(train=False, valid=False, xval=False)

Get the total within cluster sum of squares.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the total within cluster sum of squares value for the training data.
  • valid – If valid is True, then return the total within cluster sum of squares value for the validation data.
  • xval – If xval is True, then return the total within cluster sum of squares value for the cross validation data.
Returns:

The total within cluster sum of squares for this clustering model.

totss(train=False, valid=False, xval=False)

Get the total sum of squares to grand mean.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the total sum of squares to grand mean value for the training data.
  • valid – If valid is True, then return the total sum of squares to grand mean value for the validation data.
  • xval – If xval is True, then return the total sum of squares to grand mean value for the cross validation data.
Returns:

The total sum of squares to grand mean for this clustering model.

withinss(train=False, valid=False, xval=False)

Get the within cluster sum of squares for each cluster.

If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.

Parameters:
  • train – If train is True, then return the within cluster sum of squares value for the training data.
  • valid – If valid is True, then return the within cluster sum of squares value for the validation data.
  • xval – If xval is True, then return the within cluster sum of squares value for the cross validation data.
Returns:

The within cluster sum of squares for this clustering model.
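
The clustering metrics above are related to each other. When each cluster center is the mean of its members, as in k-means, totss = tot_withinss + betweenss. The sketch below computes cluster sizes and sums of squares for a given assignment of points. It illustrates the definitions rather than H2O's implementation, and it does not apply any standardization H2O may perform. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Sequence[float]]) -> List[float]:
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def cluster_sums_of_squares(points: List[Sequence[float]], assignments: Sequence[int],
                            k: int) -> Dict[str, object]:
    """Sizes and sums of squares, with each center taken as the mean of its members."""
    members: List[List[Sequence[float]]] = [[] for _ in range(k)]
    for p, c in zip(points, assignments):
        members[c].append(p)

    size = [len(m) for m in members]
    withinss: List[float] = []
    for m in members:
        if not m:
            withinss.append(0.0)  # an empty cluster contributes nothing
            continue
        center = _centroid(m)
        withinss.append(sum(_sq_dist(p, center) for p in m))

    grand_mean = _centroid(points)
    totss = sum(_sq_dist(p, grand_mean) for p in points)
    tot_withinss = sum(withinss)
    return {
        "size": size,
        "withinss": withinss,
        "tot_withinss": tot_withinss,
        "totss": totss,
        # With mean centers, totss splits into within + between parts (exact arithmetic).
        "betweenss": totss - tot_withinss,
    }
```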

AutoEncoders

AutoEncoder Models

class h2o.model.autoencoder.H2OAutoEncoderModel(dest_key, model_json)

Bases: h2o.model.model_base.ModelBase

Class for AutoEncoder models.

anomaly(test_data, per_feature=False)

Obtain the reconstruction error for the input test_data.

Parameters:
  • test_data – The dataset upon which the reconstruction error is computed.
  • per_feature – Whether to return the squared reconstruction error per feature. Otherwise, return the mean squared error.
Returns:

Return the reconstruction error.
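
The reconstruction error of a row is the squared difference between the input and the autoencoder's output. The error is either kept per feature or averaged into a single mean squared error. The sketch below shows that computation on plain lists. H2O may compute the error on internally transformed inputs (for example, standardized data), which this sketch does not model. The function name is hypothetical.

```python
from typing import List, Sequence, Union


def reconstruction_error(original: Sequence[Sequence[float]],
                         reconstructed: Sequence[Sequence[float]],
                         per_feature: bool = False) -> List[Union[float, List[float]]]:
    """Per-row reconstruction error between inputs and autoencoder outputs."""
    errors: List[Union[float, List[float]]] = []
    for x, r in zip(original, reconstructed):
        squared = [(a - b) ** 2 for a, b in zip(x, r)]
        # Keep every feature's squared error, or average them into one MSE per row.
        errors.append(squared if per_feature else sum(squared) / len(squared))
    return errors
```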