Model Categories
-
class
h2o.model.
H2OAutoEncoderModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
anomaly
(test_data, per_feature=False)[source]¶ Obtain the reconstruction error for the input test_data.
Parameters: test_data : H2OFrame
The dataset upon which the reconstruction error is computed.
- per_feature : bool
Whether to return the squared reconstruction error per feature. Otherwise, return the mean squared error.
Returns: Return the reconstruction error.
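The difference between the two return modes of anomaly can be illustrated with a minimal plain-Python sketch. It does not call H2O; the function name reconstruction_error and the list-of-rows inputs are assumptions made for illustration only.

```python
def reconstruction_error(original, reconstructed, per_feature=False):
    """Squared reconstruction error for each row (illustrative helper).

    With per_feature=True each row yields a list of squared errors,
    one per feature. Otherwise each row yields one mean squared error.
    """
    results = []
    for row, recon in zip(original, reconstructed):
        squared = [(a - b) ** 2 for a, b in zip(row, recon)]
        results.append(squared if per_feature else sum(squared) / len(squared))
    return results


original = [[1.0, 2.0], [3.0, 5.0]]
reconstructed = [[1.0, 4.0], [2.0, 5.0]]

per_row = reconstruction_error(original, reconstructed)
per_feat = reconstruction_error(original, reconstructed, per_feature=True)
```

Rows with a large error are the ones the autoencoder reproduces poorly, which is why this quantity is used as an anomaly score.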
-
-
class
h2o.model.
H2OBinomialModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
Binomial model.
-
F0point5
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F0.5 for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the F0point5 value for the training data.
- valid : bool, optional
If True, return the F0point5 value for the validation data.
- xval : bool, optional
If True, return the F0point5 value for each of the cross-validated splits.
Returns: The F0point5 values for the specified key(s).
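The train/valid/xval convention described above recurs in most metric accessors on this page. A minimal sketch of that selection logic follows. The helper select_metrics and the stored dictionary are hypothetical, and the behavior for exactly one flag (return that single value) is an assumption, since the text only specifies the zero-flag and multi-flag cases.

```python
def select_metrics(metrics, train=False, valid=False, xval=False):
    """Pick metric values according to the train/valid/xval flags.

    `metrics` is a hypothetical dict with keys "train", "valid", "xval".
    """
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [key for key, flag in flags if flag]
    if not requested:
        # All flags False: fall back to the training metric.
        return metrics["train"]
    if len(requested) == 1:
        # Assumption: a single flag returns the bare value.
        return metrics[requested[0]]
    # More than one flag: return a dictionary keyed by data set.
    return {key: metrics[key] for key in requested}


stored = {"train": 0.91, "valid": 0.87, "xval": 0.85}
default_value = select_metrics(stored)
both = select_metrics(stored, train=True, xval=True)
```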
-
F1
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F1 value for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the F1 value for the training data.
- valid : bool, optional
If True, return the F1 value for the validation data.
- xval : bool, optional
If True, return the F1 value for each of the cross-validated splits.
Returns: The F1 values for the specified key(s).
Examples
>>> import h2o as ml
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> ml.init()
>>> rows = [[1,2,3,4,0],[2,1,2,4,1],[2,1,4,2,1],[0,1,2,34,1],[2,3,4,1,0]]*50
>>> fr = ml.H2OFrame(rows)
>>> fr[4] = fr[4].asfactor()
>>> model = H2OGradientBoostingEstimator(ntrees=10, max_depth=10, nfolds=4)
>>> model.train(x=range(4), y=4, training_frame=fr)
>>> model.F1(train=True)
-
F2
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F2 for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the F2 value for the training data.
- valid : bool, optional
If True, return the F2 value for the validation data.
- xval : bool, optional
If True, return the F2 value for each of the cross-validated splits.
Returns: The F2 values for the specified key(s).
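F0point5, F1 and F2 are all instances of the standard F-beta score. The sketch below uses the textbook definition in plain Python; it is not taken from the H2O source, and the function name f_beta is an assumption for illustration.

```python
def f_beta(precision, recall, beta):
    """Textbook F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


precision, recall = 0.5, 1.0
f0point5 = f_beta(precision, recall, 0.5)
f1 = f_beta(precision, recall, 1.0)
f2 = f_beta(precision, recall, 2.0)
```

Because recall exceeds precision in this example, the score rises as beta grows: F0.5 is the lowest and F2 the highest.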
-
accuracy
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the accuracy for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the accuracy value for the training data.
- valid : bool, optional
If True, return the accuracy value for the validation data.
- xval : bool, optional
If True, return the accuracy value for each of the cross-validated splits.
Returns: The accuracy values for the specified key(s).
-
confusion_matrix
(metrics=None, thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the confusion matrix for the specified metrics/thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: metrics : str, list
One or more of “min_per_class_accuracy”, “absolute_mcc”, “tnr”, “fnr”, “fpr”, “tpr”, “precision”, “accuracy”, “f0point5”, “f2”, “f1”.
- thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the confusion_matrix for the training data.
- valid : bool, optional
If True, return the confusion_matrix for the validation data.
- xval : bool, optional
If True, return the confusion_matrix for each of the cross-validated splits.
Returns: The metric values for the specified key(s).
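What a confusion matrix at a given threshold contains can be sketched in plain Python. The helper confusion_counts is hypothetical, and the rule that a score at or above the threshold counts as a positive prediction is an assumption of this sketch.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN and FN for binary labels at one threshold.

    Assumption: score >= threshold is predicted as the positive class (1).
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["fn" if label == 1 else "tn"] += 1
    return counts


labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
cm = confusion_counts(labels, scores, threshold=0.5)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)
```

Changing the threshold moves rows between the predicted classes, which is why the threshold-based metrics on this page each accept a thresholds argument.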
-
error
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the error value for the training data.
- valid : bool, optional
If True, return the error value for the validation data.
- xval : bool, optional
If True, return the error value for each of the cross-validated splits.
Returns: The error values for the specified key(s).
-
fallout
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the fallout for a set of thresholds (aka False Positive Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the fallout value for the training data.
- valid : bool, optional
If True, return the fallout value for the validation data.
- xval : bool, optional
If True, return the fallout value for each of the cross-validated splits.
Returns: The fallout values for the specified key(s).
-
find_idx_by_threshold
(threshold, train=False, valid=False, xval=False)[source]¶ Retrieve the index in this metric’s threshold list at which the given threshold is located.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the find_idx_by_threshold value for the training data.
- valid : bool, optional
If True, return the find_idx_by_threshold value for the validation data.
- xval : bool, optional
If True, return the find_idx_by_threshold value for each of the cross-validated splits.
Returns: The find_idx_by_threshold values for the specified key(s).
-
find_threshold_by_max_metric
(metric, train=False, valid=False, xval=False)[source]¶ If all are False (default), then return the training metric value.
If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the find_threshold_by_max_metric value for the training data.
- valid : bool, optional
If True, return the find_threshold_by_max_metric value for the validation data.
- xval : bool, optional
If True, return the find_threshold_by_max_metric value for each of the cross-validated splits.
Returns: The find_threshold_by_max_metric values for the specified key(s).
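The two threshold lookups above can be pictured as operations on a pair of parallel lists, one of thresholds and one of metric values. The following is a sketch under that assumption; the helper names, the tolerance parameter and the sample numbers are all hypothetical and do not reflect H2O internals.

```python
def index_of_threshold(thresholds, threshold, tol=1e-9):
    """Return the position of `threshold` in `thresholds` (within `tol`)."""
    for idx, candidate in enumerate(thresholds):
        if abs(candidate - threshold) <= tol:
            return idx
    raise ValueError("threshold %r not found" % threshold)


def threshold_at_max(thresholds, values):
    """Return the threshold whose metric value is largest."""
    best_idx = max(range(len(values)), key=values.__getitem__)
    return thresholds[best_idx]


thresholds = [0.9, 0.6, 0.3]
f1_values = [0.50, 0.80, 0.75]

idx = index_of_threshold(thresholds, 0.3)
best = threshold_at_max(thresholds, f1_values)
```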
-
fnr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the False Negative Rates for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the fnr value for the training data.
- valid : bool, optional
If True, return the fnr value for the validation data.
- xval : bool, optional
If True, return the fnr value for each of the cross-validated splits.
Returns: The fnr values for the specified key(s).
-
fpr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the False Positive Rates for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the fpr value for the training data.
- valid : bool, optional
If True, return the fpr value for the validation data.
- xval : bool, optional
If True, return the fpr value for each of the cross-validated splits.
Returns: The fpr values for the specified key(s).
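The four rates exposed by this class (fnr, fpr, tnr, tpr, plus their aliases fallout, missrate, recall, sensitivity and specificity) all derive from the same four confusion-matrix counts. A minimal sketch with made-up counts, using the standard definitions rather than H2O code:

```python
def rates(tp, fp, tn, fn):
    """Standard rates computed from confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),  # recall, sensitivity
        "fnr": fn / (tp + fn),  # miss rate
        "fpr": fp / (fp + tn),  # fallout
        "tnr": tn / (fp + tn),  # specificity
    }


r = rates(tp=2, fp=1, tn=2, fn=1)
```

The rates pair up: tpr and fnr sum to 1, as do fpr and tnr, so each pair carries the same information.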
-
gains_lift
(train=False, valid=False, xval=False)[source]¶ Get the Gains/Lift table for the specified metrics.
If all are False (default), then return the training metric Gains/Lift table. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the gains_lift value for the training data.
- valid : bool, optional
If True, return the gains_lift value for the validation data.
- xval : bool, optional
If True, return the gains_lift value for each of the cross-validated splits.
Returns: The gains_lift values for the specified key(s).
-
max_per_class_error
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the max per class error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the max_per_class_error value for the training data.
- valid : bool, optional
If True, return the max_per_class_error value for the validation data.
- xval : bool, optional
If True, return the max_per_class_error value for each of the cross-validated splits.
Returns: The max_per_class_error values for the specified key(s).
-
mcc
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the mcc for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the mcc value for the training data.
- valid : bool, optional
If True, return the mcc value for the validation data.
- xval : bool, optional
If True, return the mcc value for each of the cross-validated splits.
Returns: The mcc values for the specified key(s).
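Here mcc refers to the Matthews correlation coefficient. The sketch below uses the standard formula on made-up counts; returning 0.0 when the denominator is zero is a common convention and an assumption of this sketch, not documented H2O behavior.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention for degenerate matrices.
        return 0.0
    return (tp * tn - fp * fn) / denominator


mcc_value = matthews_corrcoef(tp=2, fp=1, tn=2, fn=1)
```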
-
mean_per_class_error
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the mean per class error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the mean_per_class_error value for the training data.
- valid : bool, optional
If True, return the mean_per_class_error value for the validation data.
- xval : bool, optional
If True, return the mean_per_class_error value for each of the cross-validated splits.
Returns: The mean_per_class_error values for the specified key(s).
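For a binary problem, the per-class errors are the error rate on the positive class and the error rate on the negative class. The sketch below shows how the mean and max per class error relate to them, assuming the usual definitions; the counts are made up and the helper is hypothetical.

```python
def per_class_errors(tp, fp, tn, fn):
    """Error rate for the positive class and for the negative class."""
    positive_error = fn / (tp + fn)  # positives predicted as negative
    negative_error = fp / (fp + tn)  # negatives predicted as positive
    return [positive_error, negative_error]


errors = per_class_errors(tp=8, fp=1, tn=9, fn=2)
mean_error = sum(errors) / len(errors)
max_error = max(errors)
```

Unlike plain error, the mean per class error weights both classes equally, which matters when the classes are imbalanced.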
-
metric
(metric, thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the metric value for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the metric value for the training data.
- valid : bool, optional
If True, return the metric value for the validation data.
- xval : bool, optional
If True, return the metric value for each of the cross-validated splits.
Returns: The metric values for the specified key(s).
-
missrate
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the miss rate for a set of thresholds (aka False Negative Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the missrate value for the training data.
- valid : bool, optional
If True, return the missrate value for the validation data.
- xval : bool, optional
If True, return the missrate value for each of the cross-validated splits.
Returns: The missrate values for the specified key(s).
-
plot
(timestep=u'AUTO', metric=u'AUTO', server=False, **kwargs)[source]¶ Plot training set (and validation set if available) scoring history for an H2OBinomialModel.
The timestep and metric arguments are restricted to what is available in its scoring history.
Parameters: timestep : str
A unit of measurement for the x-axis.
- metric : str
A unit of measurement for the y-axis.
-
precision
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the precision for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the precision value for the training data.
- valid : bool, optional
If True, return the precision value for the validation data.
- xval : bool, optional
If True, return the precision value for each of the cross-validated splits.
Returns: The precision values for the specified key(s).
-
recall
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the recall for a set of thresholds (aka True Positive Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the recall value for the training data.
- valid : bool, optional
If True, return the recall value for the validation data.
- xval : bool, optional
If True, return the recall value for each of the cross-validated splits.
Returns: The recall values for the specified key(s).
-
roc
(train=False, valid=False, xval=False)[source]¶ Return the coordinates of the ROC curve for a given set of data.
The coordinates are two-tuples containing the false positive rates as a list and true positive rates as a list. If all are False (default), then return is the training data. If more than one ROC curve is requested, the data is returned as a dictionary of two-tuples.
Parameters: train : bool, optional
If True, return the roc value for the training data.
- valid : bool, optional
If True, return the roc value for the validation data.
- xval : bool, optional
If True, return the roc value for each of the cross-validated splits.
Returns: The roc values for the specified key(s).
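The two-tuple layout described for roc (a list of false positive rates paired with a list of true positive rates) can be sketched in plain Python. The helper roc_points is hypothetical, and sweeping the distinct scores from high to low as thresholds is an assumption; H2O may choose its thresholds differently.

```python
def roc_points(labels, scores):
    """Return (fprs, tprs), one point per distinct score used as threshold.

    Assumption: score >= threshold is predicted as the positive class (1).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    fprs, tprs = [], []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        fprs.append(fp / negatives)
        tprs.append(tp / positives)
    return fprs, tprs


labels = [1, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.2]
fprs, tprs = roc_points(labels, scores)
```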
-
sensitivity
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the sensitivity for a set of thresholds (aka True Positive Rate or Recall).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the sensitivity value for the training data.
- valid : bool, optional
If True, return the sensitivity value for the validation data.
- xval : bool, optional
If True, return the sensitivity value for each of the cross-validated splits.
Returns: The sensitivity values for the specified key(s).
-
specificity
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the specificity for a set of thresholds (aka True Negative Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the specificity value for the training data.
- valid : bool, optional
If True, return the specificity value for the validation data.
- xval : bool, optional
If True, return the specificity value for each of the cross-validated splits.
Returns: The specificity values for the specified key(s).
-
tnr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the True Negative Rate for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the tnr value for the training data.
- valid : bool, optional
If True, return the tnr value for the validation data.
- xval : bool, optional
If True, return the tnr value for each of the cross-validated splits.
Returns: The tnr values for the specified key(s).
-
tpr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the True Positive Rate for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the tpr value for the training data.
- valid : bool, optional
If True, return the tpr value for the validation data.
- xval : bool, optional
If True, return the tpr value for each of the cross-validated splits.
Returns: The tpr values for the specified key(s).
-
-
class
h2o.model.
H2OClusteringModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
betweenss
(train=False, valid=False, xval=False)[source]¶ Get the between cluster sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the between cluster sum of squares value for the training data.
- valid : bool, optional
If True, then return the between cluster sum of squares value for the validation data.
- xval : bool, optional
If True, then return the between cluster sum of squares value for each of the cross-validated splits.
Returns: Returns the between sum of squares values for the specified key(s).
-
centroid_stats
(train=False, valid=False, xval=False)[source]¶ Get the centroid statistics for each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the centroid statistics for the training data.
- valid : bool, optional
If True, then return the centroid statistics for the validation data.
- xval : bool, optional
If True, then return the centroid statistics for each of the cross-validated splits.
Returns: Returns the centroid statistics for the specified key(s).
-
num_iterations
()[source]¶ Get the number of iterations that it took to converge or reach max iterations.
Returns: The number of iterations (integer).
-
size
(train=False, valid=False, xval=False)[source]¶ Get the sizes of each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return cluster sizes for the training data.
- valid : bool, optional
If True, then return the cluster sizes for the validation data.
- xval : bool, optional
If True, then return the cluster sizes for each of the cross-validated splits.
Returns: Returns the cluster sizes for the specified key(s).
-
tot_withinss
(train=False, valid=False, xval=False)[source]¶ Get the total within cluster sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the total within cluster sum of squares value for the training data.
- valid : bool, optional
If True, then return the total within cluster sum of squares value for the validation data.
- xval : bool, optional
If True, then return the total within cluster sum of squares value for each of the cross-validated splits.
Returns: Returns the total within cluster sum of squares values for the specified key(s).
-
totss
(train=False, valid=False, xval=False)[source]¶ Get the total sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the total sum of squares value for the training data.
- valid : bool, optional
If True, then return the total sum of squares value for the validation data.
- xval : bool, optional
If True, then return the total sum of squares value for each of the cross-validated splits.
Returns: Returns the total sum of squares values for the specified key(s).
-
withinss
(train=False, valid=False, xval=False)[source]¶ Get the within cluster sum of squares for each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the within cluster sum of squares value for the training data.
- valid : bool, optional
If True, then return the within cluster sum of squares value for the validation data.
- xval : bool, optional
If True, then return the within cluster sum of squares value for each of the cross-validated splits.
Returns: Returns the within cluster sum of squares values for the specified key(s).
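The clustering statistics above are related: when each centroid is the mean of its cluster, the total sum of squares equals the total within cluster sum of squares plus the between cluster sum of squares. The sketch below checks this on made-up one-dimensional data; the variable names mirror the method names for readability but nothing here calls H2O.

```python
def sum_of_squares(points, center):
    """Sum of squared distances from 1-D points to a center."""
    return sum((p - center) ** 2 for p in points)


clusters = [[1.0, 2.0, 3.0], [10.0, 12.0]]
all_points = [p for cluster in clusters for p in cluster]
grand_mean = sum(all_points) / len(all_points)
centroids = [sum(c) / len(c) for c in clusters]

size = [len(c) for c in clusters]
withinss = [sum_of_squares(c, m) for c, m in zip(clusters, centroids)]
tot_withinss = sum(withinss)
betweenss = sum(n * (m - grand_mean) ** 2 for n, m in zip(size, centroids))
totss = sum_of_squares(all_points, grand_mean)
```

A clustering that separates the data well has a betweenss that accounts for most of totss.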
-
-
class
h2o.model.
ConfusionMatrix
(cm, domains=None, table_header=None)[source]¶ Bases:
object
-
ROUND
= 4¶
-
-
class
h2o.model.
H2ODimReductionModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
num_iterations
()[source]¶ Get the number of iterations that it took to converge or reach max iterations.
Returns: Number of iterations (integer)
-
objective
()[source]¶ Get the final value of the objective function from the GLRM model.
Returns: Final objective value
-
proj_archetypes
(test_data, reverse_transform=False)[source]¶ Convert archetypes of a GLRM model into original feature space.
Parameters: test_data : H2OFrame
The dataset upon which the H2O GLRM model was trained.
- reverse_transform : logical
Whether the transformation of the training data during model-building should be reversed on the projected archetypes.
Returns: Return the GLRM archetypes projected back into the original training data’s feature space.
-
reconstruct
(test_data, reverse_transform=False)[source]¶ Reconstruct the training data from the GLRM model and impute all missing values.
Parameters: test_data : H2OFrame
The dataset upon which the H2O GLRM model was trained.
- reverse_transform : logical
Whether the transformation of the training data during model-building should be reversed on the reconstructed frame.
Returns: Return the approximate reconstruction of the training data.
-
-
class
h2o.model.
MetricsBase
(metric_json, on=None, algo=u'')[source]¶ Bases:
h2o.utils.backward_compatibility.BackwardsCompatibleBase
A parent class to house common metrics available for the various Metrics types.
The methods here are available across different model categories, and so appear here.
-
classmethod
make
(kvs)[source]¶ Factory method to instantiate a MetricsBase object from the list of key-value pairs.
-
mean_residual_deviance
()[source]¶ Returns: the mean residual deviance for this set of metrics.
-
null_degrees_of_freedom
()[source]¶ Returns: the null dof if the model has residual deviance, or None if no null dof.
-
null_deviance
()[source]¶ Returns: the null deviance if the model has residual deviance, or None if no null deviance.
-
residual_degrees_of_freedom
()[source]¶ Returns: the residual dof if the model has residual deviance, or None if no residual dof.
-
-
class
h2o.model.
ModelBase
[source]¶ Bases:
h2o.utils.backward_compatibility.BackwardsCompatibleBase
Base class for all models.
-
actual_params
¶ Get actual parameters of a model
Returns: A dictionary of actual parameters for the model
-
aic
(train=False, valid=False, xval=False)[source]¶ Get the AIC(s).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the AIC value for the training data.
- valid – If valid is True, then return the AIC value for the validation data.
- xval – If xval is True, then return the AIC value for the cross validation data.
Returns: The AIC.
-
auc
(train=False, valid=False, xval=False)[source]¶ Get the AUC.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the AUC value for the training data.
- valid – If valid is True, then return the AUC value for the validation data.
- xval – If xval is True, then return the AUC value for the cross validation data.
Returns: The AUC.
-
biases
(vector_id=0)[source]¶ Return the frame for the respective bias vector.
Parameters: vector_id – An integer, ranging from 0 to the number of layers, that specifies the bias vector to return.
Returns: An H2OFrame that represents the bias vector identified by vector_id.
-
cross_validation_fold_assignment
()[source]¶ Obtain the cross-validation fold assignment for all rows in the training data.
Returns: H2OFrame
-
cross_validation_holdout_predictions
()[source]¶ Obtain the (out-of-sample) holdout predictions of all cross-validation models on the training data.
This is equivalent to summing up all H2OFrames returned by cross_validation_predictions.
Returns: H2OFrame
-
cross_validation_metrics_summary
()[source]¶ Retrieve Cross-Validation Metrics Summary.
Returns: The cross-validation metrics summary as an H2OTwoDimTable
-
cross_validation_models
()[source]¶ Obtain a list of cross-validation models.
Returns: list of H2OModel objects
-
cross_validation_predictions
()[source]¶ Obtain the (out-of-sample) holdout predictions of all cross-validation models on their holdout data.
Note that the predictions are expanded to the full number of rows of the training data, with 0 fill-in.
Returns: list of H2OFrame objects
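The relationship between cross_validation_predictions and cross_validation_holdout_predictions can be sketched with plain lists standing in for H2OFrames. The fold assignment and prediction values below are made up. Each fold's vector covers every training row but is 0 outside that fold's holdout rows, so summing the vectors row by row recovers one holdout prediction per row.

```python
# Fold index of each training row (stand-in for the fold assignment frame).
fold_assignment = [0, 1, 0, 1, 2]

# One full-length prediction vector per fold model, 0-filled outside its holdout rows.
fold_predictions = [
    [0.2, 0.0, 0.4, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
]

# Row-wise sum across folds gives the combined holdout predictions.
holdout = [sum(column) for column in zip(*fold_predictions)]
```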
-
deepfeatures
(test_data, layer)[source]¶ Return hidden layer details.
Parameters: - test_data – Data to create a feature space on
- layer – Zero-based index of the hidden layer
-
default_params
¶ Get default parameters of a model
Returns: A dictionary of default parameters for the model
-
download_mojo
(path=u'.', get_genmodel_jar=False)[source]¶ Download the model in MOJO format.
Parameters: - path – the path where MOJO file should be saved.
- get_genmodel_jar – if True, then also download h2o-genmodel.jar and store it in the folder path.
Returns: name of the MOJO file written.
-
download_pojo
(path=u'', get_genmodel_jar=False)[source]¶ Download the POJO for this model to the directory specified by path.
If path is “”, then dump to screen.
Parameters: path – An absolute path to the directory where POJO should be saved.
Returns: name of the POJO file written.
-
full_parameters
¶ Get the full specification of all parameters.
Returns: a dictionary of parameters used to build this model.
-
get_xval_models
(key=None)[source]¶ Return a Model object.
Parameters: key – If None, return all cross-validated models; otherwise return the model that key points to.
Returns: A model or list of models.
-
gini
(train=False, valid=False, xval=False)[source]¶ Get the Gini coefficient.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the Gini Coefficient value for the training data.
- valid – If valid is True, then return the Gini Coefficient value for the validation data.
- xval – If xval is True, then return the Gini Coefficient value for the cross validation data.
Returns: The Gini Coefficient for this binomial model.
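For a binary classifier, the Gini coefficient is commonly defined as 2 * AUC - 1. The sketch below assumes that definition and computes the AUC with the trapezoid rule over ROC coordinates; the helper name and the sample points are illustrative and not taken from H2O.

```python
def trapezoid_auc(fprs, tprs):
    """Area under an ROC curve by the trapezoid rule.

    The curve is assumed to start at (0, 0), which is prepended here.
    """
    xs = [0.0] + list(fprs)
    ys = [0.0] + list(tprs)
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


auc_value = trapezoid_auc([0.0, 0.5, 0.5, 1.0], [0.5, 0.5, 1.0, 1.0])
gini_value = 2.0 * auc_value - 1.0
```

Under this definition a random classifier (AUC of 0.5) has a Gini of 0 and a perfect one has a Gini of 1.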
-
logloss
(train=False, valid=False, xval=False)[source]¶ Get the Log Loss.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the Log Loss value for the training data.
- valid – If valid is True, then return the Log Loss value for the validation data.
- xval – If xval is True, then return the Log Loss value for the cross validation data.
Returns: The Log Loss for this binomial model.
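The binary log loss can be written out in a few lines of plain Python. This sketch uses the standard definition; the clipping constant eps is an assumption added to avoid log(0) and is not a documented H2O parameter.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


loss = log_loss([1, 0], [0.5, 0.5])
confident_loss = log_loss([1, 0], [0.9, 0.1])
```

Predicting 0.5 for every row gives a loss of ln 2; confident correct predictions drive the loss toward 0.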
-
mae
(train=False, valid=False, xval=False)[source]¶ Get the MAE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the MAE value for the training data.
valid : bool, default=False
If valid is True, then return the MAE value for the validation data.
xval : bool, default=False
If xval is True, then return the MAE value for the cross validation data.
Returns: The MAE for this regression model.
-
mean_residual_deviance
(train=False, valid=False, xval=False)[source]¶ Get the Mean Residual Deviances.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the Mean Residual Deviance value for the training data.
- valid – If valid is True, then return the Mean Residual Deviance value for the validation data.
- xval – If xval is True, then return the Mean Residual Deviance value for the cross validation data.
Returns: The Mean Residual Deviance for this regression model.
-
model_id
¶ Model identifier.
-
model_performance
(test_data=None, train=False, valid=False, xval=False)[source]¶ Generate model metrics for this model on test_data.
Parameters: test_data: H2OFrame, optional
Data set against which model metrics shall be computed. All three of the train, valid and xval arguments are ignored if test_data is not None.
train: boolean, optional
Report the training metrics for the model.
valid: boolean, optional
Report the validation metrics for the model.
xval: boolean, optional
Report the cross-validation metrics for the model. If train and valid are True, then it defaults to True.
Returns: An object of class H2OModelMetrics.
-
mse
(train=False, valid=False, xval=False)[source]¶ Get the MSE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the MSE value for the training data.
valid : bool, default=False
If valid is True, then return the MSE value for the validation data.
xval : bool, default=False
If xval is True, then return the MSE value for the cross validation data.
Returns: The MSE for this regression model.
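As a reminder of what MAE and MSE measure, here is a minimal plain-Python sketch of the textbook definitions. It is an illustration only and does not reproduce how H2O computes these metrics; the function names are chosen for this example.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

MSE penalizes large residuals more heavily than MAE because each residual is squared before averaging.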
-
null_degrees_of_freedom
(train=False, valid=False, xval=False)[source]¶ Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
Parameters: - train – Get the null dof for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the null dof for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the null dof, or None if it is not present.
-
null_deviance
(train=False, valid=False, xval=False)[source]¶ Retrieve the null deviance if this model has the attribute, or None otherwise.
Parameters: - train – Get the null deviance for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the null deviance for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the null deviance, or None if it is not present.
-
params
¶ Get the parameters and the actual/default values only.
Returns: A dictionary of parameters used to build this model.
-
predict
(test_data)[source]¶ Predict on a dataset.
Parameters: test_data (H2OFrame) – Data on which to make predictions.
Returns: A new H2OFrame of predictions.
-
predict_leaf_node_assignment
(test_data)[source]¶ Predict on a dataset and return the leaf node assignment (only for tree-based models).
Parameters: test_data (H2OFrame) – Data on which to make predictions.
Returns: A new H2OFrame of predictions.
-
r2
(train=False, valid=False, xval=False)[source]¶ Return the R^2 for this regression model.
Will return R^2 for GLM Models and will return NaN otherwise.
The R^2 value is defined to be 1 - MSE/var, where var is computed as sigma*sigma.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the R^2 value for the training data.
- valid – If valid is True, then return the R^2 value for the validation data.
- xval – If xval is True, then return the R^2 value for the cross validation data.
Returns: The R^2 for this regression model.
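The definition 1 - MSE/var can be sketched in plain Python as follows. This is an illustration under the assumption that var is the population variance of the response (dividing by n); the source only states that var is computed as sigma*sigma, and the function name r2 is chosen for this example.

```python
def r2(actual, predicted):
    """R^2 = 1 - MSE / var, with var taken as the population variance of actual."""
    n = len(actual)
    mean = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    var = sum((a - mean) ** 2 for a in actual) / n  # sigma * sigma (assumed population form)
    return 1.0 - mse / var
```

A perfect fit gives 1.0, and a model that always predicts the mean of the response gives 0.0.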
-
residual_degrees_of_freedom
(train=False, valid=False, xval=False)[source]¶ Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
Parameters: - train – Get the residual dof for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the residual dof for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the residual dof, or None if it is not present.
-
residual_deviance
(train=False, valid=False, xval=None)[source]¶ Retrieve the residual deviance if this model has the attribute, or None otherwise.
Parameters: - train – Get the residual deviance for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the residual deviance for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the residual deviance, or None if it is not present.
-
rmse
(train=False, valid=False, xval=False)[source]¶ Get the RMSE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the RMSE value for the training data.
valid : bool, default=False
If valid is True, then return the RMSE value for the validation data.
xval : bool, default=False
If xval is True, then return the RMSE value for the cross validation data.
Returns: The RMSE for this regression model.
-
rmsle
(train=False, valid=False, xval=False)[source]¶ Get the RMSLE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the RMSLE value for the training data.
valid : bool, default=False
If valid is True, then return the RMSLE value for the validation data.
xval : bool, default=False
If xval is True, then return the RMSLE value for the cross validation data.
Returns: The RMSLE for this regression model.
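RMSE and RMSLE differ only in whether the residual is taken on the raw or the log scale. The sketch below uses the common textbook definitions (RMSLE computed on log(1 + value)); it is not taken from H2O's source, and the function names are chosen for this example.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + x) on both inputs."""
    n = len(actual)
    squared = ((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared) / n)
```

Because of the logarithm, RMSLE is only defined for values greater than -1 and is typically used for non-negative targets.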
-
scoring_history
()[source]¶ Retrieve Model Score History.
Returns: The score history as an H2OTwoDimTable or a Pandas DataFrame.
-
std_coef_plot
(num_of_features=None, server=False)[source]¶ Plot a GLM model’s standardized coefficient magnitudes.
Parameters: - num_of_features – the number of features shown in the plot.
- server –
?
Returns: None.
-
type
¶ Get the type of model built as a string.
Returns: “classifier” or “regressor” or “unsupervised”
-
varimp
(use_pandas=False)[source]¶ Pretty print the variable importances, or return them in a list.
Parameters: use_pandas – If True, then the variable importances will be returned as a pandas data frame.
Returns: A list or Pandas DataFrame.
-
varimp_plot
(num_of_features=None, server=False)[source]¶ Plot the variable importance for a trained model.
Parameters: - num_of_features – the number of features shown in the plot.
- server –
?
Returns: None.
-
weights
(matrix_id=0)[source]¶ Return the frame for the respective weight matrix.
Parameters: matrix_id – an integer, ranging from 0 to number of layers, that specifies the weight matrix to return.
Returns: an H2OFrame which represents the weight matrix identified by matrix_id.
-
xvals
¶ Return a list of the cross-validated models.
Returns: A list of models
-
-
class
h2o.model.
H2OModelFuture
(job, x)[source]¶ Bases:
object
A class representing a future H2O model (a model that may, or may not, be in the process of being built).
ModelBase
¶
This module implements the base model class. All models inherit from this class.
copyright:
license: Apache License Version 2.0 (see LICENSE for details)
-
class
h2o.model.model_base.
ModelBase
[source]¶ Bases:
h2o.utils.backward_compatibility.BackwardsCompatibleBase
Base class for all models.
-
actual_params
¶ Get actual parameters of a model
Returns: A dictionary of actual parameters for the model
-
aic
(train=False, valid=False, xval=False)[source]¶ Get the AIC(s).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the AIC value for the training data.
- valid – If valid is True, then return the AIC value for the validation data.
- xval – If xval is True, then return the AIC value for the cross validation data.
Returns: The AIC.
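The train/valid/xval convention shared by the metric accessors can be sketched as a small dispatch function. This is a hypothetical helper written for illustration (select_metrics is not part of the H2O API), and returning a bare value when exactly one flag is True is an assumption, since the text only describes the all-False and more-than-one cases.

```python
def select_metrics(metrics, train=False, valid=False, xval=False):
    """Pick metric values from a dict keyed by 'train', 'valid' and 'xval'."""
    flags = {"train": train, "valid": valid, "xval": xval}
    chosen = [key for key, enabled in flags.items() if enabled]
    if not chosen:
        # All flags False (default): fall back to the training metric.
        return metrics["train"]
    if len(chosen) == 1:
        # Assumption: a single selected flag yields the bare value.
        return metrics[chosen[0]]
    # More than one flag: return a dictionary keyed by the selected names.
    return {key: metrics[key] for key in chosen}
```

For example, select_metrics(m, train=True, xval=True) returns a dictionary with the keys “train” and “xval” only.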
-
auc
(train=False, valid=False, xval=False)[source]¶ Get the AUC.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the AUC value for the training data.
- valid – If valid is True, then return the AUC value for the validation data.
- xval – If xval is True, then return the AUC value for the cross validation data.
Returns: The AUC.
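To make the meaning of the AUC concrete, the sketch below computes it exactly as the fraction of positive/negative pairs that the scores rank correctly, counting ties as half. This is the pairwise definition, written for illustration; it is not how H2O computes the value, and the function name auc is chosen for this example.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

The double loop is quadratic in the number of rows, so this form is suitable only for small examples.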
-
biases
(vector_id=0)[source]¶ Return the frame for the respective bias vector.
Parameters: vector_id – an integer, ranging from 0 to number of layers, that specifies the bias vector to return.
Returns: an H2OFrame which represents the bias vector identified by vector_id.
-
cross_validation_fold_assignment
()[source]¶ Obtain the cross-validation fold assignment for all rows in the training data.
Returns: H2OFrame
-
cross_validation_holdout_predictions
()[source]¶ Obtain the (out-of-sample) holdout predictions of all cross-validation models on the training data.
This is equivalent to summing up all H2OFrames returned by cross_validation_predictions.
Returns: H2OFrame
-
cross_validation_metrics_summary
()[source]¶ Retrieve Cross-Validation Metrics Summary.
Returns: The cross-validation metrics summary as an H2OTwoDimTable
-
cross_validation_models
()[source]¶ Obtain a list of cross-validation models.
Returns: list of H2OModel objects
-
cross_validation_predictions
()[source]¶ Obtain the (out-of-sample) holdout predictions of all cross-validation models on their holdout data.
Note that the predictions are expanded to the full number of rows of the training data, with 0 fill-in.
Returns: list of H2OFrame objects
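The relationship between the per-fold predictions (expanded to all rows with 0 fill-in) and the combined holdout predictions (their sum) can be illustrated with plain lists. The helper names below are hypothetical and the data layout is a simplification; real results are H2OFrame objects.

```python
def expand_fold_predictions(fold_assignment, fold, holdout_preds):
    """Place one fold's holdout predictions in their original rows, 0.0 elsewhere."""
    remaining = iter(holdout_preds)
    return [next(remaining) if f == fold else 0.0 for f in fold_assignment]

def combine_holdout(expanded_per_fold):
    """Sum the expanded per-fold predictions row by row."""
    return [sum(row) for row in zip(*expanded_per_fold)]

folds = [0, 1, 0, 1]                                       # fold id of each training row
fold0 = expand_fold_predictions(folds, 0, [0.2, 0.4])      # [0.2, 0.0, 0.4, 0.0]
fold1 = expand_fold_predictions(folds, 1, [0.9, 0.7])      # [0.0, 0.9, 0.0, 0.7]
combined = combine_holdout([fold0, fold1])                 # [0.2, 0.9, 0.4, 0.7]
```

Since each row is held out in exactly one fold, every row of the sum contains exactly one non-filled prediction.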
-
deepfeatures
(test_data, layer)[source]¶ Return hidden layer details.
Parameters: - test_data – Data to create a feature space on
- layer – Zero-based index of the hidden layer
-
default_params
¶ Get default parameters of a model
Returns: A dictionary of default parameters for the model
-
download_mojo
(path=u'.', get_genmodel_jar=False)[source]¶ Download the model in MOJO format.
Parameters: - path – the path where MOJO file should be saved.
- get_genmodel_jar – if True, then also download h2o-genmodel.jar and store it in folder path.
Returns: name of the MOJO file written.
-
download_pojo
(path=u'', get_genmodel_jar=False)[source]¶ Download the POJO for this model to the directory specified by path.
If path is “”, then dump to screen.
Parameters: path – An absolute path to the directory where POJO should be saved.
Returns: name of the POJO file written.
-
full_parameters
¶ Get the full specification of all parameters.
Returns: a dictionary of parameters used to build this model.
-
get_xval_models
(key=None)[source]¶ Return a Model object.
Parameters: key – If None, return all cross-validated models; otherwise return the model that key points to.
Returns: A model or list of models.
-
gini
(train=False, valid=False, xval=False)[source]¶ Get the Gini coefficient.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the Gini Coefficient value for the training data.
- valid – If valid is True, then return the Gini Coefficient value for the validation data.
- xval – If xval is True, then return the Gini Coefficient value for the cross validation data.
Returns: The Gini Coefficient for this binomial model.
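For a binary classifier, the Gini coefficient is commonly derived from the AUC as 2 * AUC - 1. The sketch below assumes that relation applies here; the source does not state the formula, and the function name gini_from_auc is chosen for this example.

```python
def gini_from_auc(auc_value):
    """Gini coefficient under the common convention Gini = 2 * AUC - 1 (assumed)."""
    return 2.0 * auc_value - 1.0
```

Under this convention a random ranking (AUC 0.5) has a Gini of 0 and a perfect ranking (AUC 1.0) has a Gini of 1.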
-
logloss
(train=False, valid=False, xval=False)[source]¶ Get the Log Loss.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the Log Loss value for the training data.
- valid – If valid is True, then return the Log Loss value for the validation data.
- xval – If xval is True, then return the Log Loss value for the cross validation data.
Returns: The Log Loss for this binomial model.
-
mae
(train=False, valid=False, xval=False)[source]¶ Get the MAE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the MAE value for the training data.
valid : bool, default=False
If valid is True, then return the MAE value for the validation data.
xval : bool, default=False
If xval is True, then return the MAE value for the cross validation data.
Returns: The MAE for this regression model.
-
mean_residual_deviance
(train=False, valid=False, xval=False)[source]¶ Get the Mean Residual Deviances.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the Mean Residual Deviance value for the training data.
- valid – If valid is True, then return the Mean Residual Deviance value for the validation data.
- xval – If xval is True, then return the Mean Residual Deviance value for the cross validation data.
Returns: The Mean Residual Deviance for this regression model.
-
model_id
¶ Model identifier.
-
model_performance
(test_data=None, train=False, valid=False, xval=False)[source]¶ Generate model metrics for this model on test_data.
Parameters: test_data: H2OFrame, optional
Data set against which model metrics shall be computed. All three of the train, valid and xval arguments are ignored if test_data is not None.
train: boolean, optional
Report the training metrics for the model.
valid: boolean, optional
Report the validation metrics for the model.
xval: boolean, optional
Report the cross-validation metrics for the model. If train and valid are True, then it defaults to True.
Returns: An object of class H2OModelMetrics.
-
mse
(train=False, valid=False, xval=False)[source]¶ Get the MSE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the MSE value for the training data.
valid : bool, default=False
If valid is True, then return the MSE value for the validation data.
xval : bool, default=False
If xval is True, then return the MSE value for the cross validation data.
Returns: The MSE for this regression model.
-
null_degrees_of_freedom
(train=False, valid=False, xval=False)[source]¶ Retrieve the null degrees of freedom if this model has the attribute, or None otherwise.
Parameters: - train – Get the null dof for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the null dof for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the null dof, or None if it is not present.
-
null_deviance
(train=False, valid=False, xval=False)[source]¶ Retrieve the null deviance if this model has the attribute, or None otherwise.
Parameters: - train – Get the null deviance for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the null deviance for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the null deviance, or None if it is not present.
-
params
¶ Get the parameters and the actual/default values only.
Returns: A dictionary of parameters used to build this model.
-
predict
(test_data)[source]¶ Predict on a dataset.
Parameters: test_data (H2OFrame) – Data on which to make predictions.
Returns: A new H2OFrame of predictions.
-
predict_leaf_node_assignment
(test_data)[source]¶ Predict on a dataset and return the leaf node assignment (only for tree-based models).
Parameters: test_data (H2OFrame) – Data on which to make predictions.
Returns: A new H2OFrame of predictions.
-
r2
(train=False, valid=False, xval=False)[source]¶ Return the R^2 for this regression model.
Will return R^2 for GLM Models and will return NaN otherwise.
The R^2 value is defined to be 1 - MSE/var, where var is computed as sigma*sigma.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the R^2 value for the training data.
- valid – If valid is True, then return the R^2 value for the validation data.
- xval – If xval is True, then return the R^2 value for the cross validation data.
Returns: The R^2 for this regression model.
-
residual_degrees_of_freedom
(train=False, valid=False, xval=False)[source]¶ Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise.
Parameters: - train – Get the residual dof for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the residual dof for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the residual dof, or None if it is not present.
-
residual_deviance
(train=False, valid=False, xval=None)[source]¶ Retrieve the residual deviance if this model has the attribute, or None otherwise.
Parameters: - train – Get the residual deviance for the training set. If both train and valid are False, then train is selected by default.
- valid – Get the residual deviance for the validation set. If both train and valid are True, then train is selected by default.
- xval – not implemented
Returns: Return the residual deviance, or None if it is not present.
-
rmse
(train=False, valid=False, xval=False)[source]¶ Get the RMSE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the RMSE value for the training data.
valid : bool, default=False
If valid is True, then return the RMSE value for the validation data.
xval : bool, default=False
If xval is True, then return the RMSE value for the cross validation data.
Returns: The RMSE for this regression model.
-
rmsle
(train=False, valid=False, xval=False)[source]¶ Get the RMSLE.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, default=False
If train is True, then return the RMSLE value for the training data.
valid : bool, default=False
If valid is True, then return the RMSLE value for the validation data.
xval : bool, default=False
If xval is True, then return the RMSLE value for the cross validation data.
Returns: The RMSLE for this regression model.
-
scoring_history
()[source]¶ Retrieve Model Score History.
Returns: The score history as an H2OTwoDimTable or a Pandas DataFrame.
-
std_coef_plot
(num_of_features=None, server=False)[source]¶ Plot a GLM model’s standardized coefficient magnitudes.
Parameters: - num_of_features – the number of features shown in the plot.
- server –
?
Returns: None.
-
type
¶ Get the type of model built as a string.
Returns: “classifier” or “regressor” or “unsupervised”
-
varimp
(use_pandas=False)[source]¶ Pretty print the variable importances, or return them in a list.
Parameters: use_pandas – If True, then the variable importances will be returned as a pandas data frame.
Returns: A list or Pandas DataFrame.
-
varimp_plot
(num_of_features=None, server=False)[source]¶ Plot the variable importance for a trained model.
Parameters: - num_of_features – the number of features shown in the plot.
- server –
?
Returns: None.
-
weights
(matrix_id=0)[source]¶ Return the frame for the respective weight matrix.
Parameters: matrix_id – an integer, ranging from 0 to number of layers, that specifies the weight matrix to return.
Returns: an H2OFrame which represents the weight matrix identified by matrix_id.
-
xvals
¶ Return a list of the cross-validated models.
Returns: A list of models
-
Binomial Classification
¶
Binomial model.
copyright:
license: Apache License Version 2.0 (see LICENSE for details)
-
class
h2o.model.binomial.
H2OBinomialModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
Binomial model.
-
F0point5
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F0.5 for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the F0point5 value for the training data.
- valid : bool, optional
If True, return the F0point5 value for the validation data.
- xval : bool, optional
If True, return the F0point5 value for each of the cross-validated splits.
Returns: The F0point5 values for the specified key(s).
-
F1
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F1 value for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the F1 value for the training data.
- valid : bool, optional
If True, return the F1 value for the validation data.
- xval : bool, optional
If True, return the F1 value for each of the cross-validated splits.
Returns: The F1 values for the specified key(s).
Examples
>>> import h2o as ml
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> ml.init()
>>> rows = [[1,2,3,4,0],[2,1,2,4,1],[2,1,4,2,1],[0,1,2,34,1],[2,3,4,1,0]]*50
>>> fr = ml.H2OFrame(rows)
>>> fr[4] = fr[4].asfactor()
>>> model = H2OGradientBoostingEstimator(ntrees=10, max_depth=10, nfolds=4)
>>> model.train(x=range(4), y=4, training_frame=fr)
>>> model.F1(train=True)
-
F2
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F2 for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the F2 value for the training data.
- valid : bool, optional
If True, return the F2 value for the validation data.
- xval : bool, optional
If True, return the F2 value for each of the cross-validated splits.
Returns: The F2 values for the specified key(s).
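F0.5, F1 and F2 are instances of the general F-beta score, which weights recall beta times as much as precision. The sketch below uses the standard formula on raw confusion-matrix counts; it is an illustration rather than H2O's implementation, and the function name f_beta is chosen for this example.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta=0.5 the score leans toward precision, with beta=2 it leans toward recall, and beta=1 gives the harmonic mean of the two.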
-
accuracy
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the accuracy for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the accuracy value for the training data.
- valid : bool, optional
If True, return the accuracy value for the validation data.
- xval : bool, optional
If True, return the accuracy value for each of the cross-validated splits.
Returns: The accuracy values for the specified key(s).
-
confusion_matrix
(metrics=None, thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the confusion matrix for the specified metrics/thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: metrics : str, list
One or more of “min_per_class_accuracy”, “absolute_mcc”, “tnr”, “fnr”, “fpr”, “tpr”, “precision”, “accuracy”, “f0point5”, “f2”, “f1”.
- thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the confusion_matrix for the training data.
- valid : bool, optional
If True, return the confusion_matrix for the validation data.
- xval : bool, optional
If True, return the confusion_matrix for each of the cross-validated splits.
Returns: The metric values for the specified key(s).
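A confusion matrix at a given threshold is obtained by turning each predicted probability into a class label and counting the four outcomes. The sketch below assumes that a score greater than or equal to the threshold is predicted positive; that tie rule and the function name confusion_counts are assumptions for this example.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN and FN for 0/1 labels at a single probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0  # assumed tie rule: >= is positive
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts
```

Raising the threshold moves predictions from the positive to the negative column, trading false positives for false negatives.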
-
error
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the error value for the training data.
- valid : bool, optional
If True, return the error value for the validation data.
- xval : bool, optional
If True, return the error value for each of the cross-validated splits.
Returns: The error values for the specified key(s).
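Accuracy and error at a threshold are complements of each other. A minimal sketch from confusion-matrix counts, using the standard definitions and function names chosen for this example:

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of rows classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def error(tp, fp, tn, fn):
    """Fraction of rows classified incorrectly, i.e. 1 - accuracy."""
    return 1.0 - accuracy(tp, fp, tn, fn)
```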
-
fallout
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the fallout for a set of thresholds (aka False Positive Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the fallout value for the training data.
- valid : bool, optional
If True, return the fallout value for the validation data.
- xval : bool, optional
If True, return the fallout value for each of the cross-validated splits.
Returns: The fallout values for the specified key(s).
-
find_idx_by_threshold
(threshold, train=False, valid=False, xval=False)[source]¶ Retrieve the index in this metric’s threshold list at which the given threshold is located.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the find_idx_by_threshold value for the training data.
- valid : bool, optional
If True, return the find_idx_by_threshold value for the validation data.
- xval : bool, optional
If True, return the find_idx_by_threshold value for each of the cross-validated splits.
Returns: The find_idx_by_threshold values for the specified key(s).
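Looking up a threshold in a metric's threshold list can be sketched as a nearest-value search. Picking the closest entry when there is no exact match is one plausible policy and is an assumption here; the source does not say how non-matching thresholds are handled, and nearest_threshold_index is a hypothetical name.

```python
def nearest_threshold_index(thresholds, threshold):
    """Index of the entry in thresholds closest to the requested threshold."""
    return min(range(len(thresholds)), key=lambda i: abs(thresholds[i] - threshold))
```

An exact match always wins because its distance to the requested value is zero.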
-
find_threshold_by_max_metric
(metric, train=False, valid=False, xval=False)[source]¶ If all are False (default), then return the training metric value.
If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the find_threshold_by_max_metric value for the training data.
- valid : bool, optional
If True, return the find_threshold_by_max_metric value for the validation data.
- xval : bool, optional
If True, return the find_threshold_by_max_metric value for each of the cross-validated splits.
Returns: The find_threshold_by_max_metric values for the specified key(s).
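Conceptually, finding a threshold by maximum metric is an argmax over the metric values recorded at each threshold. The sketch below works on two parallel lists; the function name and the list-based layout are assumptions for illustration and do not mirror H2O's internal tables.

```python
def threshold_by_max_metric(thresholds, metric_values):
    """Return the threshold whose metric value is largest (first one on ties)."""
    best = max(range(len(thresholds)), key=lambda i: metric_values[i])
    return thresholds[best]
```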
-
fnr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the False Negative Rates for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the fnr value for the training data.
- valid : bool, optional
If True, return the fnr value for the validation data.
- xval : bool, optional
If True, return the fnr value for each of the cross-validated splits.
Returns: The fnr values for the specified key(s).
-
fpr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the False Positive Rates for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the fpr value for the training data.
- valid : bool, optional
If True, return the fpr value for the validation data.
- xval : bool, optional
If True, return the fpr value for each of the cross-validated splits.
Returns: The fpr values for the specified key(s).
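The false positive rate (also called fallout) and the false negative rate (also called miss rate) follow directly from confusion-matrix counts. A minimal sketch of the standard definitions, with function names chosen for this example:

```python
def false_positive_rate(fp, tn):
    """Fallout: share of actual negatives that were predicted positive."""
    return fp / (fp + tn)

def false_negative_rate(fn, tp):
    """Miss rate: share of actual positives that were predicted negative."""
    return fn / (fn + tp)
```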
-
gains_lift
(train=False, valid=False, xval=False)[source]¶ Get the Gains/Lift table for the specified metrics.
If all are False (default), then return the training metric Gains/Lift table. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the gains_lift value for the training data.
- valid : bool, optional
If True, return the gains_lift value for the validation data.
- xval : bool, optional
If True, return the gains_lift value for each of the cross-validated splits.
Returns: The gains_lift values for the specified key(s).
-
max_per_class_error
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the max per class error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the max_per_class_error value for the training data.
- valid : bool, optional
If True, return the max_per_class_error value for the validation data.
- xval : bool, optional
If True, return the max_per_class_error value for each of the cross-validated splits.
Returns: The max_per_class_error values for the specified key(s).
-
mcc
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the mcc for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the mcc value for the training data.
- valid : bool, optional
If True, return the mcc value for the validation data.
- xval : bool, optional
If True, return the mcc value for each of the cross-validated splits.
Returns: The mcc values for the specified key(s).
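MCC stands for the Matthews correlation coefficient. The sketch below uses the standard formula on confusion-matrix counts and returns 0.0 when the denominator vanishes, which is a common convention assumed here rather than something the source specifies.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for a degenerate confusion matrix
    return (tp * tn - fp * fn) / denominator
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).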
-
mean_per_class_error
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the mean per class error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the mean_per_class_error value for the training data.
- valid : bool, optional
If True, return the mean_per_class_error value for the validation data.
- xval : bool, optional
If True, return the mean_per_class_error value for each of the cross-validated splits.
Returns: The mean_per_class_error values for the specified key(s).
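The per-class error metrics can be read off a confusion matrix. The sketch below, in plain Python, derives the error rate of each class and then its mean and maximum. The helper name `per_class_errors` and the row/column layout of the matrix are assumptions for illustration, not H2O internals.

```python
def per_class_errors(confusion):
    """confusion[i][j] = number of rows with actual class i predicted as class j."""
    errors = []
    for i, row in enumerate(confusion):
        total = sum(row)
        # Error rate of class i: share of its rows predicted as another class.
        errors.append((total - row[i]) / total if total else 0.0)
    return errors

confusion = [[90, 10],   # actual class 0
             [30, 70]]   # actual class 1
errors = per_class_errors(confusion)
mean_per_class_error = sum(errors) / len(errors)
max_per_class_error = max(errors)
print(errors, mean_per_class_error, max_per_class_error)
```

The mean weighs every class equally regardless of its size, which is why it is often preferred over overall error on imbalanced data.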
-
metric
(metric, thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the metric value for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: metric : str
The name of the metric to retrieve.
- thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the metric value for the training data.
- valid : bool, optional
If True, return the metric value for the validation data.
- xval : bool, optional
If True, return the metric value for each of the cross-validated splits.
Returns: The metric values for the specified key(s).
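The train/valid/xval convention shared by these accessors can be sketched as a small selection routine. This is a hypothetical helper, not H2O code: the name `select_metric` and the `values` dictionary are invented for illustration, and returning a bare value when exactly one flag is set is an assumption, since the text above only describes the all-False and more-than-one cases.

```python
def select_metric(values, train=False, valid=False, xval=False):
    """values: dict holding one metric under the keys 'train', 'valid', 'xval'."""
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [key for key, flag in flags if flag]
    if not requested:
        # All flags False (default): fall back to the training metric.
        return values["train"]
    if len(requested) == 1:
        # Assumption: a single requested key yields a bare value.
        return values[requested[0]]
    # More than one flag: dictionary keyed by the requested data sets.
    return {key: values[key] for key in requested}

stored = {"train": 0.91, "valid": 0.88, "xval": 0.87}
print(select_metric(stored))                          # 0.91
print(select_metric(stored, train=True, valid=True))  # {'train': 0.91, 'valid': 0.88}
```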
-
missrate
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the miss rate for a set of thresholds (aka False Negative Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the missrate value for the training data.
- valid : bool, optional
If True, return the missrate value for the validation data.
- xval : bool, optional
If True, return the missrate value for each of the cross-validated splits.
Returns: The missrate values for the specified key(s).
-
plot
(timestep=u'AUTO', metric=u'AUTO', server=False, **kwargs)[source]¶ Plot training set (and validation set if available) scoring history for an H2OBinomialModel.
The timestep and metric arguments are restricted to what is available in its scoring history.
Parameters: timestep : str
A unit of measurement for the x-axis.
- metric : str
A unit of measurement for the y-axis.
-
precision
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the precision for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the precision value for the training data.
- valid : bool, optional
If True, return the precision value for the validation data.
- xval : bool, optional
If True, return the precision value for each of the cross-validated splits.
Returns: The precision values for the specified key(s).
-
recall
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the recall for a set of thresholds (aka True Positive Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the recall value for the training data.
- valid : bool, optional
If True, return the recall value for the validation data.
- xval : bool, optional
If True, return the recall value for each of the cross-validated splits.
Returns: The recall values for the specified key(s).
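To make the relationship between precision, recall, and miss rate concrete, here is a minimal plain-Python sketch that thresholds predicted probabilities and counts outcomes. The function names and the `score >= threshold` rule for predicting the positive class are assumptions for this example rather than H2O's documented behaviour.

```python
def confusion_counts(actual, scores, threshold):
    """Return (tp, fp, tn, fn) for binary labels (0/1) and predicted scores."""
    tp = fp = tn = fn = 0
    for label, score in zip(actual, scores):
        predicted = 1 if score >= threshold else 0  # assumed decision rule
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def precision_recall(actual, scores, threshold):
    tp, fp, tn, fn = confusion_counts(actual, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
precision, recall = precision_recall(actual, scores, threshold=0.5)
miss_rate = 1.0 - recall  # false negative rate
print(precision, recall, miss_rate)
```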
-
roc
(train=False, valid=False, xval=False)[source]¶ Return the coordinates of the ROC curve for a given set of data.
The coordinates are two-tuples containing the false positive rates as a list and the true positive rates as a list. If all are False (default), then the ROC curve for the training data is returned. If more than one ROC curve is requested, the data is returned as a dictionary of two-tuples.
Parameters: train : bool, optional
If True, return the roc value for the training data.
- valid : bool, optional
If True, return the roc value for the validation data.
- xval : bool, optional
If True, return the roc value for each of the cross-validated splits.
Returns: The roc values for the specified key(s).
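The two-tuple layout described above (a list of false positive rates paired with a list of true positive rates) can be reproduced from raw scores with a short sketch. This is plain Python under stated assumptions: `roc_points` is a hypothetical name, each distinct score is used as a threshold, and the input must contain at least one positive and one negative label.

```python
def roc_points(actual, scores):
    """Return (fprs, tprs), one point per distinct score threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    fprs, tprs = [0.0], [0.0]  # start at the origin (nothing predicted positive)
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(actual, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(actual, scores) if s >= threshold and y == 0)
        tprs.append(tp / positives)
        fprs.append(fp / negatives)
    return fprs, tprs

fprs, tprs = roc_points([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.3, 0.1])
for fpr, tpr in zip(fprs, tprs):
    print(round(fpr, 3), round(tpr, 3))
```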
-
sensitivity
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the sensitivity for a set of thresholds (aka True Positive Rate or Recall).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the sensitivity value for the training data.
- valid : bool, optional
If True, return the sensitivity value for the validation data.
- xval : bool, optional
If True, return the sensitivity value for each of the cross-validated splits.
Returns: The sensitivity values for the specified key(s).
-
specificity
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the specificity for a set of thresholds (aka True Negative Rate).
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the specificity value for the training data.
- valid : bool, optional
If True, return the specificity value for the validation data.
- xval : bool, optional
If True, return the specificity value for each of the cross-validated splits.
Returns: The specificity values for the specified key(s).
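Because these accessors take a list of thresholds, the result is naturally one value per threshold. The sketch below evaluates specificity (true negative rate) at several thresholds in plain Python. The `[threshold, value]` pair layout and the `score >= threshold` decision rule are assumptions for this example, not a statement of what H2O returns.

```python
def specificity(actual, scores, thresholds):
    """Return [threshold, true-negative-rate] pairs for binary labels (0/1)."""
    result = []
    for threshold in thresholds:
        # Negatives scored below the threshold are correctly rejected.
        tn = sum(1 for y, s in zip(actual, scores) if y == 0 and s < threshold)
        fp = sum(1 for y, s in zip(actual, scores) if y == 0 and s >= threshold)
        result.append([threshold, tn / (tn + fp) if tn + fp else 0.0])
    return result

actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
for threshold, value in specificity(actual, scores, thresholds=[0.2, 0.5, 0.7]):
    print(threshold, round(value, 3))
```

Raising the threshold can only keep or increase specificity, at the cost of sensitivity.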
-
tnr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the True Negative Rate for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the tnr value for the training data.
- valid : bool, optional
If True, return the tnr value for the validation data.
- xval : bool, optional
If True, return the tnr value for each of the cross-validated splits.
Returns: The tnr values for the specified key(s).
-
tpr
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the True Positive Rate for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: thresholds : list, optional
If None, then the thresholds in this set of metrics will be used.
- train : bool, optional
If True, return the tpr value for the training data.
- valid : bool, optional
If True, return the tpr value for the validation data.
- xval : bool, optional
If True, return the tpr value for each of the cross-validated splits.
Returns: The tpr values for the specified key(s).
-
Multinomial Classification
¶
Multinomial model.
copyright:
license: Apache License Version 2.0 (see LICENSE for details)
-
class
h2o.model.multinomial.
H2OMultinomialModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
confusion_matrix
(data)[source]¶ Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
-
hit_ratio_table
(train=False, valid=False, xval=False)[source]¶ Retrieve the hit ratios.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: - train – If train is True, then return the hit ratio table for the training data.
- valid – If valid is True, then return the hit ratio table for the validation data.
- xval – If xval is True, then return the hit ratio table for the cross validation data.
Returns: The hit ratio table(s) for the specified key(s).
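A hit ratio for a multinomial model is commonly understood as top-k accuracy: the share of rows whose actual class is among the k most probable predicted classes. The following plain-Python sketch builds such a table under that assumption. The function name `hit_ratios` and the per-row probability dictionaries are hypothetical and do not reflect H2O's internal data layout.

```python
def hit_ratios(actual, class_probs, max_k):
    """Return [(k, hit_ratio)] for k = 1..max_k.

    class_probs holds one dict per row, mapping class label -> probability.
    """
    hits = [0] * max_k
    for label, probs in zip(actual, class_probs):
        ranked = sorted(probs, key=probs.get, reverse=True)  # most probable first
        for k in range(1, max_k + 1):
            if label in ranked[:k]:
                hits[k - 1] += 1
    n = len(actual)
    return [(k, hits[k - 1] / n) for k in range(1, max_k + 1)]

actual = ["a", "b", "c", "a"]
class_probs = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.5, "b": 0.4, "c": 0.1},
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"a": 0.2, "b": 0.5, "c": 0.3},
]
print(hit_ratios(actual, class_probs, max_k=3))  # [(1, 0.25), (2, 0.5), (3, 1.0)]
```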
-
mean_per_class_error
(train=False, valid=False, xval=False)[source]¶ Retrieve the mean per class error across all classes.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, return the mean_per_class_error value for the training data.
- valid : bool, optional
If True, return the mean_per_class_error value for the validation data.
- xval : bool, optional
If True, return the mean_per_class_error value for each of the cross-validated splits.
Returns: The mean_per_class_error values for the specified key(s).
-
plot
(timestep=u'AUTO', metric=u'AUTO', **kwargs)[source]¶ Plots training set (and validation set if available) scoring history for an H2OMultinomialModel. The timestep and metric arguments are restricted to what is available in its scoring history.
Parameters: - timestep – A unit of measurement for the x-axis.
- metric – A unit of measurement for the y-axis.
Returns: A scoring history plot.
-
Regression
¶
Regression model.
copyright:
license: Apache License Version 2.0 (see LICENSE for details)
-
class
h2o.model.regression.
H2ORegressionModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
plot
(timestep=u'AUTO', metric=u'AUTO', **kwargs)[source]¶ Plots training set (and validation set if available) scoring history for an H2ORegressionModel. The timestep and metric arguments are restricted to what is available in its scoring history.
Parameters: - timestep – A unit of measurement for the x-axis.
- metric – A unit of measurement for the y-axis.
Returns: A scoring history plot.
-
-
h2o.model.regression.
h2o_explained_variance_score
(y_actual, y_predicted, weights=None)[source]¶ Explained variance regression score function
Parameters: - y_actual – H2OFrame of actual response.
- y_predicted – H2OFrame of predicted response.
- weights – (Optional) sample weights
Returns: the explained variance score (float)
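For reference, the explained variance score is conventionally defined as 1 - Var(y_actual - y_predicted) / Var(y_actual). The sketch below implements that definition on plain Python lists rather than H2OFrames; the helper names and the handling of `weights=None` as uniform weights are assumptions for illustration.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def explained_variance(y_actual, y_predicted, weights=None):
    """1 - Var(residuals) / Var(y_actual), with optional sample weights."""
    if weights is None:
        weights = [1.0] * len(y_actual)  # assumption: unweighted by default
    residuals = [a - p for a, p in zip(y_actual, y_predicted)]
    res_mean = weighted_mean(residuals, weights)
    res_var = weighted_mean([(r - res_mean) ** 2 for r in residuals], weights)
    y_mean = weighted_mean(y_actual, weights)
    y_var = weighted_mean([(a - y_mean) ** 2 for a in y_actual], weights)
    return 1.0 - res_var / y_var  # undefined when y_actual is constant

print(explained_variance([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
# A constant offset in the predictions does not lower this score:
print(explained_variance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 1.0
```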
-
h2o.model.regression.
h2o_mean_absolute_error
(y_actual, y_predicted, weights=None)[source]¶ Mean absolute error regression loss.
Parameters: - y_actual – H2OFrame of actual response.
- y_predicted – H2OFrame of predicted response.
- weights – (Optional) sample weights
Returns: loss (float) (best is 0.0)
-
h2o.model.regression.
h2o_mean_squared_error
(y_actual, y_predicted, weights=None)[source]¶ Mean squared error regression loss
Parameters: - y_actual – H2OFrame of actual response.
- y_predicted – H2OFrame of predicted response.
- weights – (Optional) sample weights
Returns: loss (float) (best is 0.0)
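The mean absolute error and mean squared error losses differ only in how each residual is penalised. A minimal sketch on plain Python lists (not H2OFrames) follows; the function names and the treatment of `weights=None` as uniform weights are assumptions for this example.

```python
def _weighted_average(values, weights):
    if weights is None:
        weights = [1.0] * len(values)  # assumption: unweighted by default
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def mean_absolute_error(y_actual, y_predicted, weights=None):
    return _weighted_average([abs(a - p) for a, p in zip(y_actual, y_predicted)], weights)

def mean_squared_error(y_actual, y_predicted, weights=None):
    return _weighted_average([(a - p) ** 2 for a, p in zip(y_actual, y_predicted)], weights)

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_actual, y_predicted))  # 0.5
print(mean_squared_error(y_actual, y_predicted))   # 0.375
```

Squaring makes the MSE more sensitive to large individual errors than the MAE.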
-
h2o.model.regression.
h2o_median_absolute_error
(y_actual, y_predicted)[source]¶ Median absolute error regression loss
Parameters: - y_actual – H2OFrame of actual response.
- y_predicted – H2OFrame of predicted response.
Returns: loss (float) (best is 0.0)
-
h2o.model.regression.
h2o_r2_score
(y_actual, y_predicted, weights=1.0)[source]¶ R^2 (coefficient of determination) regression score function
Parameters: - y_actual – H2OFrame of actual response.
- y_predicted – H2OFrame of predicted response.
- weights – (Optional) sample weights
Returns: R^2 (float) (best is 1.0, lower is worse)
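The coefficient of determination is conventionally 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below applies that definition to plain Python lists; it is unweighted for brevity and the function name `r2_score` is chosen for this example only.

```python
def r2_score(y_actual, y_predicted):
    """Unweighted R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y_actual) / len(y_actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
    ss_tot = sum((a - y_mean) ** 2 for a in y_actual)
    return 1.0 - ss_res / ss_tot  # undefined when y_actual is constant

print(r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
# Predicting the mean everywhere gives 0.0; worse predictions go negative.
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```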
Clustering Methods
¶
Clustering model.
copyright:
license: Apache License Version 2.0 (see LICENSE for details)
-
class
h2o.model.clustering.
H2OClusteringModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
betweenss
(train=False, valid=False, xval=False)[source]¶ Get the between cluster sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the between cluster sum of squares value for the training data.
- valid : bool, optional
If True, then return the between cluster sum of squares value for the validation data.
- xval : bool, optional
If True, then return the between cluster sum of squares value for each of the cross-validated splits.
Returns: The between cluster sum of squares values for the specified key(s).
-
centroid_stats
(train=False, valid=False, xval=False)[source]¶ Get the centroid statistics for each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the centroid statistics for the training data.
- valid : bool, optional
If True, then return the centroid statistics for the validation data.
- xval : bool, optional
If True, then return the centroid statistics for each of the cross-validated splits.
Returns: The centroid statistics for the specified key(s).
-
num_iterations
()[source]¶ Get the number of iterations that it took to converge or reach max iterations.
Returns: The number of iterations (integer).
-
size
(train=False, valid=False, xval=False)[source]¶ Get the sizes of each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the cluster sizes for the training data.
- valid : bool, optional
If True, then return the cluster sizes for the validation data.
- xval : bool, optional
If True, then return the cluster sizes for each of the cross-validated splits.
Returns: The cluster sizes for the specified key(s).
-
tot_withinss
(train=False, valid=False, xval=False)[source]¶ Get the total within cluster sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the total within cluster sum of squares value for the training data.
- valid : bool, optional
If True, then return the total within cluster sum of squares value for the validation data.
- xval : bool, optional
If True, then return the total within cluster sum of squares value for each of the cross-validated splits.
Returns: The total within cluster sum of squares values for the specified key(s).
-
totss
(train=False, valid=False, xval=False)[source]¶ Get the total sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the total sum of squares value for the training data.
- valid : bool, optional
If True, then return the total sum of squares value for the validation data.
- xval : bool, optional
If True, then return the total sum of squares value for each of the cross-validated splits.
Returns: The total sum of squares values for the specified key(s).
-
withinss
(train=False, valid=False, xval=False)[source]¶ Get the within cluster sum of squares for each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
Parameters: train : bool, optional
If True, then return the within cluster sum of squares value for the training data.
- valid : bool, optional
If True, then return the within cluster sum of squares value for the validation data.
- xval : bool, optional
If True, then return the within cluster sum of squares value for each of the cross-validated splits.
Returns: The within cluster sum of squares values for the specified key(s).
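The sum-of-squares accessors of the clustering model are related by the identity totss = tot_withinss + betweenss. The plain-Python sketch below computes all four quantities for a toy assignment to show how they fit together. It works on raw coordinates with hypothetical helper names; whether H2O computes these on standardized data is not covered here.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def cluster_sums_of_squares(points, labels):
    """Return (withinss per cluster, tot_withinss, totss, betweenss)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    withinss = {}
    for label, members in clusters.items():
        center = centroid(members)
        withinss[label] = sum(squared_distance(p, center) for p in members)
    tot_withinss = sum(withinss.values())
    grand_center = centroid(points)
    totss = sum(squared_distance(p, grand_center) for p in points)
    betweenss = totss - tot_withinss
    return withinss, tot_withinss, totss, betweenss

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
print(cluster_sums_of_squares(points, labels))
# ({0: 2.0, 1: 2.0}, 4.0, 104.0, 100.0)
```

A high ratio of betweenss to totss indicates compact, well-separated clusters.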
-
AutoEncoders
¶
Autoencoder model.
copyright:
license: Apache License Version 2.0 (see LICENSE for details)
-
class
h2o.model.autoencoder.
H2OAutoEncoderModel
[source]¶ Bases:
h2o.model.model_base.ModelBase
-
anomaly
(test_data, per_feature=False)[source]¶ Obtain the reconstruction error for the input test_data.
Parameters: test_data : H2OFrame
The dataset upon which the reconstruction error is computed.
- per_feature : bool
Whether to return the squared reconstruction error per feature. Otherwise, return the mean squared error.
Returns: The reconstruction error.
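The difference between the two modes of `per_feature` can be illustrated for a single row. This is a plain-Python sketch, not the H2O implementation: `reconstruction_error` is a hypothetical function, and the inputs are an original row and its autoencoder reconstruction given as lists of numbers.

```python
def reconstruction_error(original, reconstructed, per_feature=False):
    """Squared error per feature, or its mean over all features of the row."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    if per_feature:
        return squared
    return sum(squared) / len(squared)

row = [1.0, 2.0, 3.0]
rebuilt = [1.0, 2.5, 2.0]
print(reconstruction_error(row, rebuilt, per_feature=True))  # [0.0, 0.25, 1.0]
print(reconstruction_error(row, rebuilt))
```

Rows with a large mean error are poorly reconstructed and are therefore candidates for anomalies; the per-feature view shows which columns contribute most.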
-