Metrics in H2O
H2O Model Metrics
MetricsBase
- copyright
2016 H2O.ai
- license
Apache License Version 2.0 (see LICENSE for details)
-
class h2o.model.metrics_base.MetricsBase(metric_json, on=None, algo='')
Bases: h2o.model.metrics_base.MetricsBase
A parent class to house common metrics available for the various Metrics types.
The methods here are available across different model categories.
Note
This class and its subclasses are used at runtime as mixins: their methods can (and should) be accessed directly from a metrics object, for example as a result of model_performance().
-
aic()
The AIC for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> h2o.init()
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.aic()
-
auc()
The AUC for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.auc()
-
aucpr()
The area under the precision-recall curve.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.aucpr()
-
custom_metric_name()
Name of custom metric or None.
-
custom_metric_value()
Value of custom metric or None.
-
gini()
Gini coefficient.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.gini()
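For a binary classifier the Gini coefficient is commonly defined from the AUC as Gini = 2 * AUC - 1. The sketch below only illustrates that relationship; `gini_from_auc` is a hypothetical helper, not part of the H2O API, and it is assumed (not confirmed by this page) that the value returned by gini() follows the same definition.

```python
def gini_from_auc(auc: float) -> float:
    """Convert an AUC in [0, 1] into the corresponding Gini coefficient."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    # A random ranking (AUC = 0.5) maps to 0, a perfect ranking (AUC = 1) to 1.
    return 2.0 * auc - 1.0


print(gini_from_auc(0.5))
print(gini_from_auc(1.0))
```

With a trained model, the same check could be made by comparing 2 * auc() - 1 against gini().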
-
hglm_metric(metric_string)
-
logloss()
Log loss.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.logloss()
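As a reminder of what log loss measures, here is a minimal pure-Python sketch of the standard binary log loss. The function name `binary_logloss` and the clipping constant `eps` are illustrative assumptions and do not describe how H2O computes the value internally.

```python
import math


def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so log() never receives exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# An uninformative classifier that always predicts 0.5 scores ln(2).
print(binary_logloss([1, 0], [0.5, 0.5]))
```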
-
mae()
The MAE for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "cylinders"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(distribution = "poisson",
...                                         seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mae()
-
classmethod make(kvs)
Factory method to instantiate a MetricsBase object from the list of key-value pairs.
-
mean_per_class_error()
The mean per class error.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> h2o.init()
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.mean_per_class_error()
-
mean_residual_deviance()
The mean residual deviance for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/AirlinesTest.csv.zip")
>>> air_gbm = H2OGradientBoostingEstimator()
>>> air_gbm.train(x=list(range(9)),
...               y=9,
...               training_frame=airlines,
...               validation_frame=airlines)
>>> air_gbm.mean_residual_deviance(train=True,valid=False,xval=False)
-
mse()
The MSE for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mse()
-
nobs()
The number of observations.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> perf = cars_gbm.model_performance()
>>> perf.nobs()
-
null_degrees_of_freedom()
The null DoF if the model has residual deviance, otherwise None.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> h2o.init()
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.null_degrees_of_freedom()
-
null_deviance()
The null deviance if the model has residual deviance, otherwise None.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> h2o.init()
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.null_deviance()
-
pr_auc()
MetricsBase.pr_auc is deprecated, please use MetricsBase.aucpr instead.
-
r2()
The R squared coefficient.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.r2()
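For reference, the textbook definition of the R squared coefficient is 1 - SS_res / SS_tot. The following sketch computes it on plain Python lists; `r_squared` is a hypothetical helper used for illustration only and is not claimed to match H2O's internal computation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # no better than the mean
```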
-
residual_degrees_of_freedom()
The residual DoF if the model has residual deviance, otherwise None.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> h2o.init()
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.residual_degrees_of_freedom()
-
residual_deviance()
The residual deviance if the model has it, otherwise None.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> h2o.init()
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.residual_deviance()
-
rmse()
The RMSE for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.rmse()
-
rmsle()
The RMSLE for this set of metrics.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "cylinders"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(distribution = "poisson",
...                                         seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.rmsle()
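The regression error metrics exposed by MetricsBase (MAE, MSE, RMSE, RMSLE) follow well-known definitions. The sketch below spells them out on plain Python lists so the relationship between them is easy to see; `regression_errors` is a hypothetical helper, and the code illustrates the standard formulas rather than H2O's implementation.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MSE, RMSE and RMSLE for paired actual/predicted values."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    mae = sum(abs(y - p) for y, p in pairs) / n
    mse = sum((y - p) ** 2 for y, p in pairs) / n
    # RMSLE compares log(1 + value), so it requires values greater than -1.
    msle = sum((math.log1p(p) - math.log1p(y)) ** 2 for y, p in pairs) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "rmsle": math.sqrt(msle)}


errors = regression_errors([1.0, 3.0], [2.0, 5.0])
print(errors)
```

Because RMSLE works on logarithms, it penalizes relative rather than absolute differences, which is why the examples above pair it with a count-like response.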
-
show(verbosity=None, fmt=None)
Describes and renders the current object in the given format and verbosity level if supported, by default guessing the best format for the current environment.
- Parameters
verbosity – one of (None, ‘short’, ‘medium’, ‘full’). Defaults to None (object’s default verbosity).
fmt – one of (None, ‘plain’, ‘pretty’, ‘html’). Defaults to None (picks appropriate format depending on platform/context).
-
Binomial Classification
-
class h2o.model.metrics.binomial.H2OBinomialModelMetrics(metric_json, on=None, algo='')
Bases: h2o.model.metrics_base.MetricsBase
This class is essentially an API for the AUC object. It contains methods for inspecting the AUC for different criteria. To input the different criteria, use the static variable criteria.
-
F0point5(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The F0.5 for this set of metrics and thresholds.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.F0point5()
-
F1(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The F1 for the given set of thresholds.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.F1()
-
F2(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The F2 for this set of metrics and thresholds.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.F2()
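F0.5, F1 and F2 are all instances of the general F-beta score, which weights recall beta times as heavily as precision. The sketch below shows the standard formula; `f_beta` is a hypothetical helper for illustration and is not part of the H2O API.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when the score is undefined
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# With high precision and low recall, F0.5 rewards the model most, F2 least.
for beta in (0.5, 1.0, 2.0):
    print(beta, f_beta(precision=1.0, recall=0.5, beta=beta))
```

Choosing beta below 1 favors precision, while beta above 1 favors recall, which is the practical difference between F0point5() and F2().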
-
accuracy(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The accuracy for this set of metrics and thresholds.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.accuracy()
-
confusion_matrix(metrics=None, thresholds=None)
Get the confusion matrix for the specified metric.
- Parameters
metrics – A string (or list of strings) among metrics listed in maximizing_metrics. Defaults to 'f1'.
thresholds – A value (or list of values) between 0 and 1. If None, then the thresholds maximizing each provided metric will be used.
- Returns
a list of ConfusionMatrix objects (if there is more than one to return), a single ConfusionMatrix (if there is only one), or None if thresholds or metrics scores are missing.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution=distribution)
>>> gbm.train(x=predictors,
...           y = response,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.confusion_matrix(train)
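To make the role of the threshold concrete, the sketch below counts the four confusion-matrix cells for a binary problem at a given threshold. It is a pure-Python illustration: `confusion_counts` is a hypothetical helper, and the convention that a score greater than or equal to the threshold is predicted positive is an assumption, not something this page states about H2O.

```python
def confusion_counts(y_true, scores, threshold):
    """Count tp, fp, tn, fn for 0/1 labels and predicted positive-class scores."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0  # assumed tie-breaking convention
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


print(confusion_counts([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], threshold=0.5))
```

Lowering the threshold moves cases from the negative to the positive prediction, trading false negatives for false positives.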
-
error(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- Returns
The error for this set of metrics and thresholds.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.error()
-
fallout(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The fallout (same as False Positive Rate) for this set of metrics and thresholds.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.fallout()
-
find_idx_by_threshold(threshold)
Retrieve the index in this metric's threshold list at which the given threshold is located.
- Parameters
threshold – Find the index of this input threshold.
- Returns
the index.
- Raises
ValueError – if no such index can be found.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> local_data = [[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b']]
>>> h2o_data = h2o.H2OFrame(local_data)
>>> h2o_data.set_names(['response', 'predictor'])
>>> h2o_data["response"] = h2o_data["response"].asfactor()
>>> gbm = H2OGradientBoostingEstimator(ntrees=1,
...                                    distribution="bernoulli")
>>> gbm.train(x=list(range(1,h2o_data.ncol)),
...           y="response",
...           training_frame=h2o_data)
>>> perf = gbm.model_performance()
>>> perf.find_idx_by_threshold(0.45)
-
find_threshold_by_max_metric(metric)
- Parameters
metric – A string among the metrics listed in maximizing_metrics.
- Returns
the threshold at which the given metric is maximal.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> local_data = [[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b']]
>>> h2o_data = h2o.H2OFrame(local_data)
>>> h2o_data.set_names(['response', 'predictor'])
>>> h2o_data["response"] = h2o_data["response"].asfactor()
>>> gbm = H2OGradientBoostingEstimator(ntrees=1,
...                                    distribution="bernoulli")
>>> gbm.train(x=list(range(1,h2o_data.ncol)),
...           y="response",
...           training_frame=h2o_data)
>>> perf = gbm.model_performance()
>>> perf.find_threshold_by_max_metric("f1")
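Conceptually, finding the threshold that maximizes a metric is an argmax over a table of (threshold, metric value) rows. The sketch below shows that idea on plain lists; `threshold_maximizing` and the sample values are hypothetical and only illustrate the lookup, not H2O's stored threshold table.

```python
def threshold_maximizing(thresholds, metric_values):
    """Return the threshold whose metric value is largest (first one on ties)."""
    if len(thresholds) != len(metric_values) or not thresholds:
        raise ValueError("need two non-empty lists of equal length")
    best_idx = max(range(len(thresholds)), key=lambda i: metric_values[i])
    return thresholds[best_idx]


# Hypothetical F1 values recorded at three candidate thresholds.
print(threshold_maximizing([0.9, 0.5, 0.1], [0.2, 0.8, 0.6]))
```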
-
fnr(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The False Negative Rate.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.fnr()
-
fpr(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The False Positive Rate.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.fpr()
-
property fprs
Return all false positive rates for all threshold values.
- Returns
a list of false positive rates.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> r = cars[0].runif()
>>> train = cars[r > .2]
>>> valid = cars[r <= .2]
>>> response_col = "economy_20mpg"
>>> distribution = "bernoulli"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3, distribution=distribution, fold_assignment="Random")
>>> gbm.train(y=response_col, x=predictors, validation_frame=valid, training_frame=train)
>>> (fprs, tprs) = gbm.roc(train=True, valid=False, xval=False)
>>> fprs
-
gains_lift()
Retrieve the Gains/Lift table.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution=distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.gains_lift()
-
gains_lift_plot(type='both', server=False, save_plot_path=None, plot=True)
Plot Gains/Lift curves.
- Parameters
type – one of: "both" (default), "gains", "lift".
server – if True, generate plot inline using matplotlib's Anti-Grain Geometry (AGG) backend.
save_plot_path – filename to save the plot to.
plot – True to plot curve; False to get a gains lift table.
- Returns
Gains lift table + the resulting plot (can be accessed using result.figure()).
-
max_per_class_error(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- Returns
1 - min(per class accuracy).
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.max_per_class_error()
-
maximizing_metrics = ('absolute_mcc', 'accuracy', 'precision', 'f0point5', 'f1', 'f2', 'mean_per_class_accuracy', 'min_per_class_accuracy', 'tns', 'fns', 'fps', 'tps', 'tnr', 'fnr', 'fpr', 'tpr', 'fallout', 'missrate', 'recall', 'sensitivity', 'specificity')
Metric names allowed for confusion matrix.
-
mcc(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The absolute MCC (a value between 0 and 1, 0 being totally dissimilar, 1 being identical).
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mcc()
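The Matthews correlation coefficient (MCC) is computed from the four confusion-matrix counts, and the absolute MCC simply drops its sign. A minimal sketch of the standard formula follows; `absolute_mcc` is a hypothetical helper, and returning 0 when the denominator vanishes is a common convention assumed here rather than documented behavior.

```python
import math


def absolute_mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """|MCC| = |tp*tn - fp*fn| / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        return 0.0  # assumed convention for degenerate confusion matrices
    return abs(tp * tn - fp * fn) / denominator


print(absolute_mcc(tp=5, tn=5, fp=0, fn=0))  # perfect agreement
print(absolute_mcc(tp=1, tn=1, fp=1, fn=1))  # no correlation
```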
-
mean_per_class_error(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- Returns
mean per class error.
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mean_per_class_error()
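For a binary problem, the per-class errors are the error rates within the positive and the negative class; the mean and max per-class errors aggregate those two numbers. The sketch below illustrates this with plain counts; `per_class_errors` is a hypothetical helper and assumes both classes are present.

```python
def per_class_errors(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Mean and max of the two class-wise error rates of a binary classifier."""
    error_positive = fn / (tp + fn)  # share of actual positives that were missed
    error_negative = fp / (tn + fp)  # share of actual negatives that were flagged
    return {
        "mean": (error_positive + error_negative) / 2.0,
        # Equivalent to 1 - min(per class accuracy).
        "max": max(error_positive, error_negative),
    }


print(per_class_errors(tp=8, tn=6, fp=4, fn=2))
```

Unlike plain error, both aggregates weight the classes equally, which makes them more informative on imbalanced data.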
-
metric(metric, thresholds=None)
- Parameters
metric (str) – A metric among maximizing_metrics.
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used. If 'all', then all stored thresholds are used and returned with the matching metric.
- Returns
The set of metrics for the list of thresholds. The returned list has a 'value' property holding only the metric value (if no threshold is provided or if it is provided as a number), or all the metric values (if thresholds are provided as a list).
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> local_data = [[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b']]
>>> h2o_data = h2o.H2OFrame(local_data)
>>> h2o_data.set_names(['response', 'predictor'])
>>> h2o_data["response"] = h2o_data["response"].asfactor()
>>> gbm = H2OGradientBoostingEstimator(ntrees=1,
...                                    distribution="bernoulli")
>>> gbm.train(x=list(range(1,h2o_data.ncol)),
...           y="response",
...           training_frame=h2o_data)
>>> perf = gbm.model_performance()
>>> perf.metric("tps", [perf.find_threshold_by_max_metric("f1")])[0][1]
-
metrics_aliases = {'fallout': 'fpr', 'missrate': 'fnr', 'recall': 'tpr', 'sensitivity': 'tpr', 'specificity': 'tnr'}
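The alias table above maps several familiar metric names onto the rate they are equivalent to. The following sketch shows how such a table can be used to resolve a name to its canonical form; the dictionary is copied from the attribute above, while `canonical_metric` is a hypothetical helper and not an H2O function.

```python
# Copied from H2OBinomialModelMetrics.metrics_aliases as documented above.
METRICS_ALIASES = {
    "fallout": "fpr",
    "missrate": "fnr",
    "recall": "tpr",
    "sensitivity": "tpr",
    "specificity": "tnr",
}


def canonical_metric(name: str) -> str:
    """Resolve an alias to its canonical metric name; other names pass through."""
    return METRICS_ALIASES.get(name, name)


print(canonical_metric("recall"))
print(canonical_metric("f1"))
```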
-
missrate(thresholds=None)
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The miss rate (same as False Negative Rate).
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.missrate()
-
plot(type='roc', server=False, save_plot_path=None, plot=True)
Produce the desired metric plot.
- Parameters
type – the type of metric plot. One of (currently supported): ROC curve ('roc'), Precision Recall curve ('pr'), Gains Lift curve ('gainslift').
server – if True, generate plot inline using matplotlib's Anti-Grain Geometry (AGG) backend.
save_plot_path – filename to save the plot to.
plot – True to plot curve; False to get a tuple of values at axis x and y of the plot (tprs and fprs for the ROC curve, recall and precision for PR).
- Returns
None or values of x and y axis of the plot + the resulting plot (can be accessed using result.figure()).
- Examples
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.init()
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.plot(type="roc")
>>> cars_gbm.plot(type="pr")
-
precision
(thresholds=None)[source]¶ - Parameters
thresholds – thresholds parameter must be a list (e.g.
[0.01, 0.5, 0.99]
). If None, then the threshold maximizing the metric will be used.- Returns
The precision for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.precision()
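The thresholds argument described above can be illustrated without an H2O cluster. The following is a minimal pure-Python sketch, not H2O's implementation: the helper names (precision_at, precision_table), the toy scores, and the [threshold, value] pair layout are all assumptions made for illustration.

```python
def precision_at(scores, labels, threshold):
    """Precision = TP / (TP + FP), predicting positive when score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


def precision_table(scores, labels, thresholds=None):
    """Return [[threshold, precision], ...] (hypothetical helper).

    With thresholds=None, only the candidate threshold that maximizes
    precision is reported; candidates are the distinct observed scores.
    """
    if thresholds is None:
        candidates = sorted(set(scores))
        best = max(candidates, key=lambda t: precision_at(scores, labels, t))
        thresholds = [best]
    return [[t, precision_at(scores, labels, t)] for t in thresholds]


# Toy data (assumed): predicted probabilities and true 0/1 labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

print(precision_table(scores, labels, [0.5]))  # explicit threshold list
print(precision_table(scores, labels))         # threshold maximizing precision
```

Ties between equally good thresholds are broken here by taking the lowest one; a real implementation may choose differently.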
-
recall
(thresholds=None)[source]¶
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
Recall for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.recall()
-
roc
()[source]¶ Return the coordinates of the ROC curve as a tuple containing the false positive rates as a list and the true positive rates as a list.
- Returns
The ROC values.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> r = cars[0].runif() >>> train = cars[r > .2] >>> valid = cars[r <= .2] >>> response_col = "economy_20mpg" >>> distribution = "bernoulli" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution=distribution, ... fold_assignment="Random") >>> gbm.train(x=predictors, ... y=response_col, ... validation_frame=valid, ... training_frame=train) >>> gbm.roc(train=True, valid=False, xval=False)
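To make the shape of the returned tuple concrete, here is a small pure-Python sketch of how ROC coordinates can be derived from scores and labels. It is an illustration under stated assumptions, not H2O's code: roc_points and trapezoid_auc are hypothetical helpers, every distinct score is used as a threshold, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Return (fprs, tprs): one point per distinct threshold, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    fprs, tprs = [0.0], [0.0]
    # Lower the threshold step by step so the curve moves from (0, 0) to (1, 1).
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fprs.append(fp / neg)
        tprs.append(tp / pos)
    return fprs, tprs


def trapezoid_auc(fprs, tprs):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(fprs)):
        area += (fprs[i] - fprs[i - 1]) * (tprs[i] + tprs[i - 1]) / 2
    return area


fprs, tprs = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(fprs, tprs, trapezoid_auc(fprs, tprs))
```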
-
sensitivity
(thresholds=None)[source]¶
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
Sensitivity or True Positive Rate for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.sensitivity()
-
specificity
(thresholds=None)[source]¶
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The specificity (same as True Negative Rate).
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.specificity()
-
tnr
(thresholds=None)[source]¶
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The True Negative Rate.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.tnr()
-
tpr
(thresholds=None)[source]¶
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
The True Positive Rate.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.tpr()
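Sensitivity, recall, and the true positive rate name the same quantity, and specificity is the true negative rate. The sketch below shows how all of them follow from the four confusion counts at one threshold. It is a pure-Python illustration with hypothetical helper names (confusion_counts, rates) and assumed toy data; it assumes both classes occur in the labels.

```python
def confusion_counts(scores, labels, threshold):
    """Count (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(scores, labels, threshold):
    """Derive the four rates from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "tpr": tp / (tp + fn),  # sensitivity, recall
        "tnr": tn / (tn + fp),  # specificity
        "fpr": fp / (fp + tn),  # 1 - tnr
        "fnr": fn / (fn + tp),  # 1 - tpr
    }


scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]
print(rates(scores, labels, threshold=0.5))
```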
-
property
tprs
¶ Return all true positive rates for all threshold values.
- Returns
a list of true positive rates.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> r = cars[0].runif() >>> train = cars[r > .2] >>> valid = cars[r <= .2] >>> response_col = "economy_20mpg" >>> distribution = "bernoulli" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, distribution=distribution, fold_assignment="Random") >>> gbm.train(y=response_col, x=predictors, validation_frame=valid, training_frame=train) >>> (fprs, tprs) = gbm.roc(train=True, valid=False, xval=False) >>> tprs
-
Multinomial Classification
¶
-
class
h2o.model.metrics.multinomial.
H2OMultinomialModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
-
confusion_matrix
()[source]¶ Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution = distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.confusion_matrix(train)
-
hit_ratio_table
()[source]¶ Retrieve the Hit Ratios.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution = distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.hit_ratio_table()
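A hit ratio at k is commonly understood as the fraction of rows whose true class is among the k classes with the highest predicted probability. The following pure-Python sketch computes such a table for toy data. The function name, the data, and the (k, ratio) row layout are assumptions for illustration; this is not H2O's implementation.

```python
def hit_ratio_rows(class_probs, actual):
    """Return [(k, hit_ratio), ...] for k = 1 .. number of classes."""
    n_classes = len(class_probs[0])
    rows = []
    for k in range(1, n_classes + 1):
        hits = 0
        for probs, y in zip(class_probs, actual):
            # Class indices ordered from most to least probable, truncated to k.
            top_k = sorted(range(n_classes), key=lambda c: probs[c], reverse=True)[:k]
            if y in top_k:
                hits += 1
        rows.append((k, hits / len(actual)))
    return rows


# Toy predictions (assumed): one probability per class for each row.
class_probs = [
    [0.70, 0.20, 0.10],
    [0.30, 0.50, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
actual = [0, 0, 2, 2]
print(hit_ratio_rows(class_probs, actual))
```

The ratio for k = 1 equals plain accuracy, and the ratio for k equal to the number of classes is always 1.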
-
multinomial_auc_table
()[source]¶ Retrieve the multinomial AUC values.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution = distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.multinomial_auc_table()
-
multinomial_aucpr_table
()[source]¶ Retrieve the multinomial PR AUC values.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution = distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.multinomial_aucpr_table()
-
Regression
¶
-
class
h2o.model.metrics.regression.
H2ORegressionModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
This class provides an API for inspecting the metrics returned by a regression model.
It is possible to retrieve the \(R^2\) (1 - MSE/variance) and MSE.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_glm = H2OGeneralizedLinearEstimator() >>> cars_glm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_glm.mse()
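The relation R² = 1 - MSE/variance mentioned above can be checked by hand on a few numbers. This is a minimal sketch with assumed toy values and hypothetical helper names; it uses the population variance of the actual responses (dividing by n).

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """R^2 = 1 - MSE / variance, with the population variance of `actual`."""
    mean = sum(actual) / len(actual)
    variance = sum((a - mean) ** 2 for a in actual) / len(actual)
    return 1 - mse(actual, predicted) / variance


actual = [4.0, 6.0, 8.0, 6.0]
predicted = [5.0, 6.0, 7.0, 6.0]
print(mse(actual, predicted))  # 0.5
print(r2(actual, predicted))   # 0.75
```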
Anomaly Detection
¶
-
class
h2o.model.metrics.anomaly_detection.
H2OAnomalyDetectionModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
-
mean_normalized_score
()[source]¶ Mean Normalized Anomaly Score. For Isolation Forest - normalized average path length.
- Examples
>>> from h2o.estimators.isolation_forest import H2OIsolationForestEstimator >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv") >>> test = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv") >>> isofor_model = H2OIsolationForestEstimator(sample_size=5, ntrees=7) >>> isofor_model.train(training_frame = train) >>> perf = isofor_model.model_performance() >>> perf.mean_normalized_score()
-
mean_score
()[source]¶ Mean Anomaly Score. For Isolation Forest represents the average of all tree-path lengths.
- Examples
>>> from h2o.estimators.isolation_forest import H2OIsolationForestEstimator >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv") >>> test = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv") >>> isofor_model = H2OIsolationForestEstimator(sample_size=5, ntrees=7) >>> isofor_model.train(training_frame = train) >>> perf = isofor_model.model_performance() >>> perf.mean_score()
-
Clustering
¶
-
class
h2o.model.metrics.clustering.
H2OClusteringModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
-
betweenss
()[source]¶ The Between Cluster Sum-of-Square Error, or None if not present.
- Examples
>>> from h2o.estimators.kmeans import H2OKMeansEstimator >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> km = H2OKMeansEstimator(k=3, nfolds=3) >>> km.train(x=list(range(4)), training_frame=iris) >>> km.betweenss()
-
tot_withinss
()[source]¶ The Total Within Cluster Sum-of-Square Error, or None if not present.
- Examples
>>> from h2o.estimators.kmeans import H2OKMeansEstimator >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> km = H2OKMeansEstimator(k=3, nfolds=3) >>> km.train(x=list(range(4)), training_frame=iris) >>> km.tot_withinss()
-
totss
()[source]¶ The Total Sum-of-Square Error to Grand Mean, or None if not present.
- Examples
>>> from h2o.estimators.kmeans import H2OKMeansEstimator >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> km = H2OKMeansEstimator(k=3, nfolds=3) >>> km.train(x=list(range(4)), training_frame=iris) >>> km.totss()
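The three clustering quantities above are linked: when each centroid is the mean of its cluster, the total sum of squares equals the within-cluster total plus the between-cluster sum. The sketch below verifies this on four 2-D points. It is a pure-Python illustration with hypothetical helper names and assumed data, not H2O's implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sums_of_squares(points, assignments):
    """Return (totss, tot_withinss, betweenss) for a given cluster assignment."""
    grand_mean = mean_point(points)
    totss = sum(sq_dist(p, grand_mean) for p in points)

    clusters = {}
    for p, c in zip(points, assignments):
        clusters.setdefault(c, []).append(p)

    tot_withinss = 0.0
    betweenss = 0.0
    for members in clusters.values():
        centroid = mean_point(members)
        tot_withinss += sum(sq_dist(p, centroid) for p in members)
        betweenss += len(members) * sq_dist(centroid, grand_mean)
    return totss, tot_withinss, betweenss


points = [(1, 1), (1, 3), (5, 1), (5, 3)]
assignments = [0, 0, 1, 1]
print(sums_of_squares(points, assignments))  # (20.0, 4.0, 16.0)
```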
-
CoxPH
¶
-
class
h2o.model.metrics.coxph.
H2ORegressionCoxPHModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
- Examples
>>> from h2o.estimators.coxph import H2OCoxProportionalHazardsEstimator >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv") >>> coxph = H2OCoxProportionalHazardsEstimator(start_column="start", ... stop_column="stop", ... ties="breslow") >>> coxph.train(x="age", y="event", training_frame=heart) >>> coxph
Dimensionality Reduction
¶
Ordinal
¶
-
class
h2o.model.metrics.ordinal.
H2OOrdinalModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
-
confusion_matrix
()[source]¶ Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution = distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.confusion_matrix(train)
-
hit_ratio_table
()[source]¶ Retrieve the Hit Ratios.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution = distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.hit_ratio_table()
-
Uplift
¶
-
class
h2o.model.metrics.uplift.
H2OBinomialUpliftModelMetrics
(metric_json, on=None, algo='')[source]¶ Bases:
h2o.model.metrics_base.MetricsBase
This class is available only for Uplift DRF models. It is essentially an API for the AUUC object.
-
aecu
(metric='AUTO')[source]¶ Retrieve AECU value (average excess cumulative uplift - area between Uplift curve and random curve).
- Parameters
metric –
AECU metric type. One of:
”None”
”qini”
”lift”
”gain”
”AUTO” (default; defaults to “qini”)
- Returns
AECU value.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.aecu()
-
aecu_table
()[source]¶ Retrieve all types of AECU values in a table.
- Returns
a table of AECU values.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.aecu_table()
-
auuc
(metric=None)[source]¶ Retrieve area under cumulative uplift curve (AUUC) value.
- Parameters
metric –
AUUC metric type. One of:
”None” (default; takes default metric from model parameters)
”AUTO” (defaults to “qini”)
”qini”
”lift”
”gain”
- Returns
AUUC value.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.auuc()
-
auuc_normalized
(metric=None)[source]¶ Retrieve normalized area under cumulative uplift curve (AUUC) value.
- Parameters
metric –
AUUC metric type. One of:
”None” (default; takes default metric from model parameters)
”AUTO” (defaults to “qini”)
”qini”
”lift”
”gain”
- Returns
normalized AUUC value.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.auuc_normalized()
-
auuc_table
()[source]¶ Retrieve all types of AUUC in a table.
- Returns
a table of AUUCs.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.auuc_table()
-
n
()[source]¶ Retrieve cumulative sum of numbers of observations in each bin.
- Returns
a list of the cumulative numbers of observations.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.n()
-
plot_uplift
(server=False, save_to_file=None, plot=True, metric='AUTO', normalize=False)[source]¶ Plot Uplift Curve.
- Parameters
server – if True, generate plot inline using matplotlib’s Anti-Grain Geometry (AGG) backend.
save_to_file – filename to save the plot to.
plot – True to plot the curve; False to get a tuple of the x- and y-axis values of the plot (number of observations and uplift values).
metric –
AUUC metric type. One of:
”qini”
”lift”
”gain”
”AUTO” (default; defaults to “qini”)
normalize – If True, normalized values are plotted.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.plot_uplift(plot=True) >>> n, uplift = perf.plot_uplift(plot=False)
-
qini
()[source]¶ Retrieve Qini value (area between Qini cumulative uplift curve and random curve).
- Returns
Qini value.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.qini()
-
thresholds
()[source]¶ Retrieve prediction thresholds for each bin.
- Returns
a list of thresholds.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.thresholds()
-
thresholds_and_metric_scores
()[source]¶ Retrieve thresholds and metric scores table.
- Returns
a thresholds and metric scores table for the specified key(s).
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.thresholds_and_metric_scores()
-
uplift
(metric='AUTO')[source]¶ Retrieve uplift values for each bin.
- Parameters
metric –
AUUC metric type. One of:
”qini”
”lift”
”gain”
”AUTO” (default; defaults to “qini”)
- Returns
a list of uplift values.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.uplift()
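The three metric types can be sketched on per-bin counts. The formulas below are an assumption based on the commonly cited definitions of cumulative uplift (Qini: TY - CY * T / C; lift: TY / T - CY / C; gain: lift * (T + C)), where T and C are the cumulative numbers of treated and control observations and TY and CY their cumulative responders. The function name and data are hypothetical, and this is not H2O's implementation.

```python
def cumulative_uplift(bins, metric="AUTO"):
    """Cumulative uplift per bin.

    bins: list of (treated_n, treated_responders, control_n, control_responders),
    ordered from highest to lowest predicted uplift. Every cumulative bin is
    assumed to contain at least one treated and one control observation.
    """
    if metric == "AUTO":
        metric = "qini"  # per the parameter description, AUTO defaults to qini
    T = TY = C = CY = 0
    values = []
    for t_n, t_y, c_n, c_y in bins:
        T, TY, C, CY = T + t_n, TY + t_y, C + c_n, CY + c_y
        if metric == "qini":
            values.append(TY - CY * T / C)
        elif metric == "lift":
            values.append(TY / T - CY / C)
        elif metric == "gain":
            values.append((TY / T - CY / C) * (T + C))
        else:
            raise ValueError("metric must be one of: qini, lift, gain, AUTO")
    return values


# Two toy bins (assumed data).
bins = [(10, 6, 10, 2), (10, 4, 10, 3)]
for name in ("qini", "lift", "gain"):
    print(name, cumulative_uplift(bins, name))
```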
-
uplift_normalized
(metric='AUTO')[source]¶ Retrieve normalized uplift values for each bin.
- Parameters
metric –
AUUC metric type. One of:
”qini”
”lift”
”gain”
”AUTO” (default; defaults to “qini”)
- Returns
a list of normalized uplift values.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.uplift_normalized()
-
uplift_random
(metric='AUTO')[source]¶ Retrieve random uplift values for each bin.
- Parameters
metric –
AUUC metric type. One of:
”qini”
”lift”
”gain”
”AUTO” (default; defaults to “qini”)
- Returns
a list of random uplift values.
- Examples
>>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.uplift_random()
-
H2O Grid Metrics¶
Note
Classes in this module are used at runtime as mixins: their methods can (and should) be accessed directly from a trained grid.
-
class
h2o.grid.metrics.
H2OAutoEncoderGridSearch
[source]¶ Bases:
object
-
anomaly
(test_data, per_feature=False)[source]¶ Obtain the reconstruction error for the input test_data.
- Parameters
test_data (H2OFrame) – The dataset upon which the reconstruction error is computed.
per_feature (bool) – Whether to return the square reconstruction error per feature. Otherwise, return the mean square error.
- Returns
the reconstruction error.
- Example
>>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators import H2OAutoEncoderEstimator >>> rows = [[1,2,3,4,0]*50, ... [2,1,2,4,1]*50, ... [2,1,4,2,1]*50, ... [0,1,2,34,1]*50, ... [2,3,4,1,0]*50] >>> fr = h2o.H2OFrame(rows) >>> hyper_parameters = {'activation': "Tanh", 'hidden': [50,50,50]} >>> gs = H2OGridSearch(H2OAutoEncoderEstimator(), hyper_parameters) >>> gs.train(x=range(4), training_frame=fr) >>> gs.anomaly(fr, per_feature=True)
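The difference between the two settings of per_feature can be shown on a single row. The sketch below is a pure-Python illustration, assuming the reconstruction error is the squared difference between an input row and its reconstruction; the helper name and the numbers are hypothetical.

```python
def reconstruction_error(row, reconstructed, per_feature=False):
    """Squared reconstruction error of one row.

    per_feature=True returns one squared error per feature;
    otherwise the mean squared error across features is returned.
    """
    squared = [(x - r) ** 2 for x, r in zip(row, reconstructed)]
    if per_feature:
        return squared
    return sum(squared) / len(squared)


row = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.0, 2.5, 2.0, 4.0]  # assumed autoencoder output
print(reconstruction_error(row, reconstructed, per_feature=True))
print(reconstruction_error(row, reconstructed))
```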
-
-
class
h2o.grid.metrics.
H2OBinomialGridSearch
[source]¶ Bases:
object
-
F0point5
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F0.5 for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the F0point5 value for the training data.
valid (bool) – If valid is True, then return the F0point5 value for the validation data.
xval (bool) – If xval is True, then return the F0point5 value for the cross validation data.
- Returns
The F0point5 for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.F0point5(train=True)
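The train/valid/xval convention described above can be summarized in a few lines. This is a hypothetical sketch of the selection rule for a single model, not H2O's code: select_metric and the metric values are invented for illustration, and restricting the dictionary to the requested keys is an assumption.

```python
def select_metric(values, train=False, valid=False, xval=False):
    """Pick metric values according to the train/valid/xval flags.

    values: dict with the keys "train", "valid", and "xval".
    """
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [name for name, flag in flags if flag]
    if not requested:
        return values["train"]       # all False: training metric
    if len(requested) == 1:
        return values[requested[0]]  # exactly one True: that value
    return {name: values[name] for name in requested}  # several: dictionary


values = {"train": 0.91, "valid": 0.88, "xval": 0.87}  # assumed numbers
print(select_metric(values))
print(select_metric(values, valid=True))
print(select_metric(values, train=True, xval=True))
```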
-
F1
(thresholds=None, train=False, valid=False, xval=False)[source]¶ Get the F1 values for a set of thresholds for the models explored.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If True, return the F1 value for the training data.
valid (bool) – If True, return the F1 value for the validation data.
xval (bool) – If True, return the F1 value for each of the cross-validated splits.
- Returns
Dictionary of model keys to F1 values
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.F1(train=True)
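The train/valid/xval convention described above can be sketched for a single model as follows. This is plain Python with a hypothetical helper `select_metrics` and made-up scores; it is not H2O code. Returning the bare value when exactly one flag is set is an assumption, since the text only covers the cases of no flags and more than one flag.

```python
def select_metrics(values, train=False, valid=False, xval=False):
    """Mimic the documented flag convention for one model (hypothetical helper, not an H2O API)."""
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [key for key, flag in flags if flag]
    if not requested:
        requested = ["train"]  # all flags False: fall back to the training metric
    if len(requested) == 1:
        return values[requested[0]]  # assumption: a single request yields the bare value
    return {key: values[key] for key in requested}


scores = {"train": 0.91, "valid": 0.87, "xval": 0.85}  # made-up F1 values
print(select_metrics(scores))
print(select_metrics(scores, valid=True))
print(select_metrics(scores, train=True, xval=True))
```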
-
F2(thresholds=None, train=False, valid=False, xval=False)
Get the F2 for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the F2 value for the training data.
valid (bool) – If valid is True, then return the F2 value for the validation data.
xval (bool) – If xval is True, then return the F2 value for the cross validation data.
- Returns
Dictionary of model keys to F2 values.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.F2(train=True)
-
accuracy(thresholds=None, train=False, valid=False, xval=False)
Get the accuracy for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the accuracy value for the training data.
valid (bool) – If valid is True, then return the accuracy value for the validation data.
xval (bool) – If xval is True, then return the accuracy value for the cross validation data.
- Returns
The accuracy for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.accuracy(train=True)
-
confusion_matrix(metrics=None, thresholds=None, train=False, valid=False, xval=False)
Get the confusion matrix for the specified metrics/thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
metrics – A string (or list of strings) among metrics listed in H2OBinomialModelMetrics.maximizing_metrics. Defaults to 'f1'.
thresholds – A value (or list of values) between 0 and 1. If None, then the thresholds maximizing each provided metric will be used.
train (bool) – If train is True, then return the confusion matrix value for the training data.
valid (bool) – If valid is True, then return the confusion matrix value for the validation data.
xval (bool) – If xval is True, then return the confusion matrix value for the cross validation data.
- Returns
The confusion matrix for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.confusion_matrix(train=True)
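To make the role of a threshold concrete, the following plain-Python sketch counts the four confusion-matrix cells for one threshold and derives accuracy and error from them. The labels, scores, and the helper `confusion_counts` are hypothetical, and treating a score equal to the threshold as a positive prediction is an assumption rather than documented H2O behavior.

```python
def confusion_counts(labels, scores, threshold):
    """Count (tp, fp, tn, fn), predicting positive when score >= threshold (illustrative only)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


labels = [1, 0, 1, 1, 0, 0, 1, 0]                   # made-up true classes
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]   # made-up predicted probabilities
tp, fp, tn, fn = confusion_counts(labels, scores, 0.5)
accuracy = (tp + tn) / len(labels)
error = 1 - accuracy
print(tp, fp, tn, fn, accuracy, error)
```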
-
error(thresholds=None, train=False, valid=False, xval=False)
Get the error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
train (bool) – If train is True, then return the error value for the training data.
valid (bool) – If valid is True, then return the error value for the validation data.
xval (bool) – If xval is True, then return the error value for the cross validation data.
- Returns
The error for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.error(train=True)
-
fallout(thresholds=None, train=False, valid=False, xval=False)
Get the Fallout (AKA False Positive Rate) for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the fallout value for the training data.
valid (bool) – If valid is True, then return the fallout value for the validation data.
xval (bool) – If xval is True, then return the fallout value for the cross validation data.
- Returns
The fallout for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.fallout(train=True)
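As a reminder of the definition, fallout is the share of actual negatives that are predicted positive, so it is the complement of specificity. The sketch below uses hypothetical counts and helper names and does not call H2O.

```python
def fallout(fp, tn):
    """False positive rate: FP / (FP + TN) (illustrative helper, not an H2O API)."""
    return fp / (fp + tn)


def specificity(fp, tn):
    """True negative rate: TN / (FP + TN) (illustrative helper, not an H2O API)."""
    return tn / (fp + tn)


fp, tn = 5, 45  # hypothetical counts among 50 actual negatives
print(fallout(fp, tn), specificity(fp, tn))
```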
-
find_idx_by_threshold(threshold, train=False, valid=False, xval=False)
Retrieve the index in this metric’s threshold list at which the given threshold is located.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
threshold (float) – The threshold value to search for.
train (bool) – If train is True, then return the idx_by_threshold for the training data.
valid (bool) – If valid is True, then return the idx_by_threshold for the validation data.
xval (bool) – If xval is True, then return the idx_by_threshold for the cross validation data.
- Returns
The idx_by_threshold for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.find_idx_by_threshold(0.45, train=True)
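A threshold lookup of this kind can be pictured as a search over a stored list of thresholds. The sketch below is plain Python with a hypothetical helper and made-up thresholds; it assumes a nearest-match lookup, and this section does not state whether H2O requires an exact match or falls back to the closest stored threshold.

```python
def nearest_threshold_index(thresholds, target):
    """Return the index of the stored threshold closest to target (hypothetical helper)."""
    return min(range(len(thresholds)), key=lambda i: abs(thresholds[i] - target))


thresholds = [0.95, 0.80, 0.60, 0.45, 0.30, 0.10]  # made-up threshold list
exact_idx = nearest_threshold_index(thresholds, 0.45)
close_idx = nearest_threshold_index(thresholds, 0.75)
print(exact_idx, close_idx)
```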
-
find_threshold_by_max_metric(metric, train=False, valid=False, xval=False)
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
metric (str) – A metric among the metrics listed in H2OBinomialModelMetrics.maximizing_metrics.
train (bool) – If train is True, then return the threshold_by_max_metric value for the training data.
valid (bool) – If valid is True, then return the threshold_by_max_metric value for the validation data.
xval (bool) – If xval is True, then return the threshold_by_max_metric value for the cross validation data.
- Returns
The threshold_by_max_metric for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.find_threshold_by_max_metric("tps", train=True)
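Conceptually, finding a threshold by maximum metric is an argmax over per-threshold metric values. The following sketch shows that idea in plain Python; the helper name, the thresholds, and the F1 values are all made up for illustration and nothing here calls H2O.

```python
def threshold_maximizing(thresholds, metric_values):
    """Return the threshold whose metric value is largest (hypothetical helper)."""
    best = max(range(len(thresholds)), key=lambda i: metric_values[i])
    return thresholds[best]


thresholds = [0.9, 0.7, 0.5, 0.3, 0.1]       # made-up thresholds
f1_values = [0.40, 0.62, 0.71, 0.66, 0.55]   # made-up F1 score at each threshold
best_threshold = threshold_maximizing(thresholds, f1_values)
print(best_threshold)
```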
-
fnr(thresholds=None, train=False, valid=False, xval=False)
Get the False Negative Rates for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the FNR value for the training data.
valid (bool) – If valid is True, then return the FNR value for the validation data.
xval (bool) – If xval is True, then return the FNR value for the cross validation data.
- Returns
The FNR for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.fnr(train=True)
-
fpr(thresholds=None, train=False, valid=False, xval=False)
Get the False Positive Rates for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the FPR value for the training data.
valid (bool) – If valid is True, then return the FPR value for the validation data.
xval (bool) – If xval is True, then return the FPR value for the cross validation data.
- Returns
The FPR for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.fpr(train=True)
-
max_per_class_error(thresholds=None, train=False, valid=False, xval=False)
Get the max per class error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
train (bool) – If train is True, then return the max_per_class_error value for the training data.
valid (bool) – If valid is True, then return the max_per_class_error value for the validation data.
xval (bool) – If xval is True, then return the max_per_class_error value for the cross validation data.
- Returns
The max per class error for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.max_per_class_error(train=True)
-
mcc(thresholds=None, train=False, valid=False, xval=False)
Get the MCC for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the mcc value for the training data.
valid (bool) – If valid is True, then return the mcc value for the validation data.
xval (bool) – If xval is True, then return the mcc value for the cross validation data.
- Returns
The MCC for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.mcc(train=True)
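MCC here is read as the Matthews correlation coefficient, which ranges from -1 (total disagreement) to 1 (perfect prediction). The sketch below implements the textbook formula in plain Python with hypothetical counts. Returning 0.0 when the denominator is zero is a common convention adopted here as an assumption, not something taken from H2O.

```python
import math


def mcc(tp, fp, tn, fn):
    """Textbook Matthews correlation coefficient (illustrative helper, not an H2O API)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for a degenerate confusion matrix
    return (tp * tn - fp * fn) / denom


print(mcc(3, 1, 3, 1))  # mostly correct predictions
print(mcc(5, 0, 5, 0))  # perfect predictions
print(mcc(0, 5, 0, 5))  # every prediction wrong
```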
-
mean_per_class_error(thresholds=None, train=False, valid=False, xval=False)
Get the mean per class error for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
train (bool) – If train is True, then return the mean_per_class_error value for the training data.
valid (bool) – If valid is True, then return the mean_per_class_error value for the validation data.
xval (bool) – If xval is True, then return the mean_per_class_error value for the cross validation data.
- Returns
The mean per class error for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.mean_per_class_error(train=True)
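For a two-class problem, per-class error can be read as the misclassification rate computed within each actual class; the mean and the max of those two rates then give the mean and max per class error. The sketch below illustrates this with hypothetical counts in plain Python. It reflects the usual definition and is not taken from H2O's source.

```python
def per_class_errors(tp, fp, tn, fn):
    """Return (negative-class error, positive-class error) from confusion counts (illustrative)."""
    negative_error = fp / (fp + tn)  # actual negatives predicted positive
    positive_error = fn / (fn + tp)  # actual positives predicted negative
    return negative_error, positive_error


neg_err, pos_err = per_class_errors(tp=60, fp=5, tn=45, fn=20)
mean_error = (neg_err + pos_err) / 2
max_error = max(neg_err, pos_err)
print(neg_err, pos_err, mean_error, max_error)
```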
-
metric(metric, thresholds=None, train=False, valid=False, xval=False)
Get the metric value for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
metric – name of the metric to compute.
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the metrics for the training data.
valid (bool) – If valid is True, then return the metrics for the validation data.
xval (bool) – If xval is True, then return the metrics for the cross validation data.
- Returns
The metrics for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.metric("tps", train=True)
-
missrate(thresholds=None, train=False, valid=False, xval=False)
Get the miss rate (AKA False Negative Rate) for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the missrate value for the training data.
valid (bool) – If valid is True, then return the missrate value for the validation data.
xval (bool) – If xval is True, then return the missrate value for the cross validation data.
- Returns
The missrate for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.missrate(train=True)
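The miss rate is the share of actual positives that the model fails to flag, which makes it the complement of recall. A minimal plain-Python sketch with hypothetical counts and helper names (nothing here calls H2O):

```python
def miss_rate(fn, tp):
    """False negative rate: FN / (FN + TP) (illustrative helper, not an H2O API)."""
    return fn / (fn + tp)


def recall(fn, tp):
    """True positive rate: TP / (FN + TP) (illustrative helper, not an H2O API)."""
    return tp / (fn + tp)


fn, tp = 20, 60  # hypothetical counts among 80 actual positives
print(miss_rate(fn, tp), recall(fn, tp))
```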
-
precision(thresholds=None, train=False, valid=False, xval=False)
Get the precision for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the precision value for the training data.
valid (bool) – If valid is True, then return the precision value for the validation data.
xval (bool) – If xval is True, then return the precision value for the cross validation data.
- Returns
The precision for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.precision(train=True)
-
recall(thresholds=None, train=False, valid=False, xval=False)
Get the Recall (AKA True Positive Rate) for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the recall value for the training data.
valid (bool) – If valid is True, then return the recall value for the validation data.
xval (bool) – If xval is True, then return the recall value for the cross validation data.
- Returns
The recall for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.recall(train=True)
-
roc(train=False, valid=False, xval=False)
Return the coordinates of the ROC curve for a given set of data, as a two-tuple containing the false positive rates as a list and true positive rates as a list.
If all are False (default), then return the training data. If more than one ROC curve is requested, the data is returned as a dictionary of two-tuples.
- Parameters
train (bool) – If train is True, then return the ROC coordinates for the training data.
valid (bool) – If valid is True, then return the ROC coordinates for the validation data.
xval (bool) – If xval is True, then return the ROC coordinates for the cross validation data.
- Returns
The coordinates of the ROC curve.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.roc(train=True)
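The two-tuple shape described above (a list of false positive rates and a list of true positive rates) can be illustrated by sweeping a threshold over a small set of scores. The sketch below is plain Python with made-up labels and scores and a hypothetical helper `roc_points`. How H2O bins thresholds internally is not covered here, so this only shows the general construction.

```python
def roc_points(labels, scores):
    """Return (fprs, tprs), sweeping the threshold over distinct scores from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    fprs, tprs = [0.0], [0.0]  # a threshold above every score predicts no positives
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fprs.append(fp / negatives)
        tprs.append(tp / positives)
    return fprs, tprs


labels = [1, 0, 1, 1, 0, 0, 1, 0]                   # made-up true classes
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]   # made-up predicted probabilities
fprs, tprs = roc_points(labels, scores)
print(list(zip(fprs, tprs)))
```

Both lists start at 0.0 and end at 1.0, and neither ever decreases, which is what makes the points plottable as a curve.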
-
sensitivity(thresholds=None, train=False, valid=False, xval=False)
Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the sensitivity value for the training data.
valid (bool) – If valid is True, then return the sensitivity value for the validation data.
xval (bool) – If xval is True, then return the sensitivity value for the cross validation data.
- Returns
The sensitivity for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.sensitivity(train=True)
-
specificity(thresholds=None, train=False, valid=False, xval=False)
Get the specificity (AKA True Negative Rate) for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the specificity value for the training data.
valid (bool) – If valid is True, then return the specificity value for the validation data.
xval (bool) – If xval is True, then return the specificity value for the cross validation data.
- Returns
The specificity for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.specificity(train=True)
-
tnr(thresholds=None, train=False, valid=False, xval=False)
Get the True Negative Rate for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the TNR value for the training data.
valid (bool) – If valid is True, then return the TNR value for the validation data.
xval (bool) – If xval is True, then return the TNR value for the cross validation data.
- Returns
The TNR for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.tnr(train=True)
-
tpr(thresholds=None, train=False, valid=False, xval=False)
Get the True Positive Rate for a set of thresholds.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
train (bool) – If train is True, then return the TPR value for the training data.
valid (bool) – If valid is True, then return the TPR value for the validation data.
xval (bool) – If xval is True, then return the TPR value for the cross validation data.
- Returns
The TPR for this binomial model.
- Examples
>>> from h2o.grid.grid_search import H2OGridSearch
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'),
...                    hyper_parameters)
>>> gs.train(x=[3, 4-11],
...          y=3,
...          training_frame=training_data)
>>> gs.tpr(train=True)
-
-
class h2o.grid.metrics.H2OClusteringGridSearch
Bases: object
-
betweenss(train=False, valid=False, xval=False)
Get the between cluster sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If True, then return the between cluster sum of squares value for the training data.
valid (bool) – If True, then return the between cluster sum of squares value for the validation data.
xval (bool) – If True, then return the between cluster sum of squares value for each of the cross-validated splits.
- Returns
The between cluster sum of squares values for the specified key(s).
- Examples
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.betweenss(train=True)
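The between cluster sum of squares is one part of the usual k-means decomposition, in which the total sum of squares equals the total within cluster sum of squares plus the between cluster sum of squares. The sketch below checks that identity on four hand-picked 2-D points in plain Python. The points, cluster assignments, and helper names are hypothetical, and the code uses raw (unstandardized) Euclidean distances, which may differ from the scaling H2O applies.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


clusters = [
    [[1.0, 1.0], [1.0, 3.0]],  # members of cluster 0
    [[5.0, 5.0], [7.0, 5.0]],  # members of cluster 1
]
all_points = [p for members in clusters for p in members]
grand_mean = mean_point(all_points)
centers = [mean_point(members) for members in clusters]

totss = sum(sq_dist(p, grand_mean) for p in all_points)
tot_withinss = sum(sq_dist(p, c) for members, c in zip(clusters, centers) for p in members)
betweenss = sum(len(members) * sq_dist(c, grand_mean) for members, c in zip(clusters, centers))
print(totss, tot_withinss, betweenss)
```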
-
centers()
Returns the centers for the KMeans model.
- Examples
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.centers()
-
centers_std()
Returns the standardized centers for the KMeans model.
- Examples
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.centers_std()
-
centroid_stats(train=False, valid=False, xval=False)
Get the centroid statistics for each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If True, then return the centroid statistics for the training data.
valid (bool) – If True, then return the centroid statistics for the validation data.
xval (bool) – If True, then return the centroid statistics for each of the cross-validated splits.
- Returns
The centroid statistics for the specified key(s).
- Examples
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.centroid_stats(train=True)
-
num_iterations()
Get the number of iterations that it took to converge or reach max iterations.
- Examples
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.num_iterations()
-
size
(train=False, valid=False, xval=False)[source]¶ Get the sizes of each cluster.
If all are
False
(default), then return the training metric value. If more than one option is set toTrue
, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.- Parameters
train (bool) – If
True
, then return the cluster sizes for the training data.valid (bool) – If
True
, then return the cluster sizes for the validation data.xval (bool) – If
True
, then return the cluster sizes for each of the cross-validated splits.
- Returns
the cluster sizes for the specified key(s).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.size(train=True)
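For intuition, a cluster size is simply the number of rows assigned to that cluster. A minimal sketch in plain Python follows; `cluster_sizes` and the toy assignments are hypothetical and do not use the H2O API.

```python
from collections import Counter


def cluster_sizes(assignments, k):
    """Count how many rows fall into each of the k clusters (0-based labels)."""
    counts = Counter(assignments)
    # Clusters that received no rows are reported with size 0.
    return [counts.get(cluster, 0) for cluster in range(k)]


# Toy cluster assignment for six rows and k=3.
assignments = [0, 2, 1, 0, 0, 2]
sizes = cluster_sizes(assignments, k=3)
```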
-
tot_withinss
(train=False, valid=False, xval=False) Get the total within cluster sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If True, then return the total within cluster sum of squares for the training data.
valid (bool) – If True, then return the total within cluster sum of squares for the validation data.
xval (bool) – If True, then return the total within cluster sum of squares for each of the cross-validated splits.
- Returns
the total within cluster sum of squares values for the specified key(s).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.tot_withinss(train=True)
-
totss
(train=False, valid=False, xval=False) Get the total sum of squares.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If True, then return the total sum of squares for the training data.
valid (bool) – If True, then return the total sum of squares for the validation data.
xval (bool) – If True, then return the total sum of squares for each of the cross-validated splits.
- Returns
the total sum of squares values for the specified key(s).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.totss(train=True)
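As a rough illustration, the total sum of squares is the summed squared distance of every row from the overall mean. The sketch below assumes raw, untransformed numeric data; the value H2O reports may be computed on standardized data, so it need not match. `total_sum_of_squares` and `points` are hypothetical names.

```python
def total_sum_of_squares(rows):
    """Sum of squared distances from each row to the grand mean of all rows."""
    n = len(rows)
    dims = len(rows[0])
    # Column-wise mean over the whole data set.
    mean = [sum(row[d] for row in rows) / n for d in range(dims)]
    return sum((row[d] - mean[d]) ** 2 for row in rows for d in range(dims))


# Four corners of a 2x2 square; the grand mean is (1, 1).
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
totss = total_sum_of_squares(points)
```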
-
withinss
(train=False, valid=False, xval=False) Get the within cluster sum of squares for each cluster.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If True, then return the within cluster sum of squares for the training data.
valid (bool) – If True, then return the within cluster sum of squares for the validation data.
xval (bool) – If True, then return the within cluster sum of squares for each of the cross-validated splits.
- Returns
the within cluster sum of squares values for the specified key(s).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OKMeansEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
>>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
>>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
>>> gs.train(x=list(range(4)), training_frame=iris)
>>> gs.withinss(train=True)
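To make the relationship between the per-cluster and total values concrete, here is a small sketch in plain Python. It assumes raw numeric rows and uses each cluster's mean as its centroid; `within_cluster_ss`, `points`, and `labels` are hypothetical, and H2O may report values computed on standardized data.

```python
def within_cluster_ss(rows, assignments, k):
    """Per-cluster sum of squared distances from each member row to the cluster mean."""
    dims = len(rows[0])
    result = []
    for cluster in range(k):
        members = [row for row, label in zip(rows, assignments) if label == cluster]
        if not members:
            # An empty cluster contributes nothing.
            result.append(0.0)
            continue
        centroid = [sum(row[d] for row in members) / len(members) for d in range(dims)]
        result.append(
            sum((row[d] - centroid[d]) ** 2 for row in members for d in range(dims))
        )
    return result


points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
labels = [0, 0, 1, 1]  # bottom two points in cluster 0, top two in cluster 1
withinss = within_cluster_ss(points, labels, k=2)
# The total within cluster sum of squares is the sum over clusters.
tot_withinss = sum(withinss)
```

For these four points the total sum of squares about the grand mean (1, 1) is 8, so the clustering above accounts for half of it.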
-
-
class
h2o.grid.metrics.
H2ODimReductionGridSearch
Bases:
object
-
archetypes
()
- Returns
the archetypes (Y) of the GLRM model.
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLowRankEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
>>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
>>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
...                    hyper_parameters)
>>> gs.train(x=iris.names, training_frame=iris)
>>> gs.archetypes()
-
final_step
() Get the final step size from the GLRM model.
- Returns
final step size (double).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLowRankEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
>>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
>>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
...                    hyper_parameters)
>>> gs.train(x=iris.names, training_frame=iris)
>>> gs.final_step()
-
num_iterations
() Get the number of iterations that it took to converge or reach max iterations.
- Returns
number of iterations (integer).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLowRankEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
>>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
>>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
...                    hyper_parameters)
>>> gs.train(x=iris.names, training_frame=iris)
>>> gs.num_iterations()
-
objective
() Get the final value of the objective function from the GLRM model.
- Returns
final objective value (double).
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLowRankEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
>>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
>>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
...                    hyper_parameters)
>>> gs.train(x=iris.names, training_frame=iris)
>>> gs.objective()
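As a rough guide to what an objective value of this kind measures, the sketch below evaluates a generalized low rank objective for a data matrix A approximated by the product X·Y, where Y holds the archetypes. The quadratic loss and the squared-L2 penalties weighted by gamma_x and gamma_y are assumptions chosen for illustration; the loss and regularizers actually used depend on how the estimator is configured. `glrm_objective` is a hypothetical helper.

```python
def glrm_objective(A, X, Y, gamma_x=0.0, gamma_y=0.0):
    """Quadratic reconstruction loss of A ~ X*Y plus squared-L2 penalties (assumed form)."""
    n_rows, n_cols, rank = len(A), len(A[0]), len(Y)
    loss = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Entry (i, j) of the low rank reconstruction X*Y.
            approx = sum(X[i][r] * Y[r][j] for r in range(rank))
            loss += (A[i][j] - approx) ** 2
    penalty_x = gamma_x * sum(v * v for row in X for v in row)
    penalty_y = gamma_y * sum(v * v for row in Y for v in row)
    return loss + penalty_x + penalty_y


# A rank-1 matrix that X*Y reproduces exactly.
A = [[1.0, 2.0], [2.0, 4.0]]
X = [[1.0], [2.0]]
Y = [[1.0, 2.0]]
exact = glrm_objective(A, X, Y)
regularized = glrm_objective(A, X, Y, gamma_x=0.5, gamma_y=0.5)
```

With an exact reconstruction the loss term is zero, so any remaining objective value comes only from the penalties.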
-
-
class
h2o.grid.metrics.
H2OMultinomialGridSearch
Bases:
object
-
auc
(train=False, valid=False, xval=False) Retrieve the AUC value.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If train is True, then return the AUC values for the training data.
valid (bool) – If valid is True, then return the AUC values for the validation data.
xval (bool) – If xval is True, then return the AUC values for the cross validation data.
- Returns
The AUC values for this multinomial model.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
...                    hyper_parameters)
>>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
>>> gs.auc(train=True)
-
aucpr
(train=False, valid=False, xval=False) Retrieve the PR AUC value.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If train is True, then return the PR AUC values for the training data.
valid (bool) – If valid is True, then return the PR AUC values for the validation data.
xval (bool) – If xval is True, then return the PR AUC values for the cross validation data.
- Returns
The PR AUC values for this multinomial model.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
...                    hyper_parameters)
>>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
>>> gs.aucpr(train=True)
-
confusion_matrix
(data) Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
- Parameters
data – metric for which the confusion matrix will be calculated.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
...                    hyper_parameters)
>>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
>>> gs.confusion_matrix(iris)
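For readers unfamiliar with the structure of a multiclass confusion matrix, the following plain-Python sketch tabulates one from actual and predicted labels. It assumes rows are actual classes and columns are predicted classes; `build_confusion_matrix` and the toy labels are hypothetical and independent of the H2O API.

```python
def build_confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) label pairs; rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual_label, predicted_label in zip(actual, predicted):
        matrix[index[actual_label]][index[predicted_label]] += 1
    return matrix


classes = ["setosa", "versicolor", "virginica"]
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]
cm = build_confusion_matrix(actual, predicted, classes)
```

The diagonal holds the correctly classified rows; every off-diagonal cell is a misclassification.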
-
hit_ratio_table
(train=False, valid=False, xval=False) Retrieve the Hit Ratios.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If train is True, then return the hit ratio value for the training data.
valid (bool) – If valid is True, then return the hit ratio value for the validation data.
xval (bool) – If xval is True, then return the hit ratio value for the cross validation data.
- Returns
The hit ratio for this multinomial model.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
...                    hyper_parameters)
>>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
>>> gs.hit_ratio_table(train=True)
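A hit ratio at k is commonly understood as the fraction of rows whose actual class is among the k classes with the highest predicted probability. The sketch below computes such a table in plain Python under that assumption; `hit_ratios` and the toy probabilities are hypothetical and do not call H2O.

```python
def hit_ratios(actual, class_probs, max_k):
    """Top-k accuracy for k = 1..max_k, given per-row class probability dicts."""
    ratios = []
    for k in range(1, max_k + 1):
        hits = 0
        for label, probs in zip(actual, class_probs):
            # The k classes with the highest predicted probability for this row.
            top_k = sorted(probs, key=probs.get, reverse=True)[:k]
            hits += label in top_k
        ratios.append(hits / len(actual))
    return ratios


actual = ["a", "b", "c", "a"]
class_probs = [
    {"a": 0.7, "b": 0.2, "c": 0.1},  # actual class ranked first
    {"a": 0.5, "b": 0.4, "c": 0.1},  # actual class ranked second
    {"a": 0.6, "b": 0.3, "c": 0.1},  # actual class ranked third
    {"a": 0.1, "b": 0.3, "c": 0.6},  # actual class ranked third
]
ratios = hit_ratios(actual, class_probs, max_k=3)
```

The first entry is ordinary accuracy, and the last entry reaches 1.0 once k equals the number of classes.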
-
mean_per_class_error
(train=False, valid=False, xval=False) Get the mean per class error.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If train is True, then return the mean per class error value for the training data.
valid (bool) – If valid is True, then return the mean per class error value for the validation data.
xval (bool) – If xval is True, then return the mean per class error value for the cross validation data.
- Returns
The mean per class error for this multinomial model.
- Examples
>>> import h2o
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
...                    hyper_parameters)
>>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
>>> gs.mean_per_class_error(train=True)
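The mean per class error averages the error rate of each class, so small classes count as much as large ones. A minimal plain-Python sketch follows, assuming a confusion matrix whose rows are actual classes and whose columns are predicted classes; the function and the toy matrix are hypothetical and not taken from H2O.

```python
def mean_per_class_error(matrix):
    """Average of the per-class error rates from a square confusion matrix."""
    errors = []
    for i, row in enumerate(matrix):
        total = sum(row)
        if total == 0:
            # Skip classes that have no actual observations.
            continue
        # Error rate for class i: share of its rows predicted as another class.
        errors.append(1.0 - row[i] / total)
    return sum(errors) / len(errors)


# Rows: actual class, columns: predicted class.
cm = [[2, 0, 0], [0, 1, 1], [0, 1, 1]]
mpce = mean_per_class_error(cm)
```

Here the per-class error rates are 0, 0.5, and 0.5, so the mean is one third.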
-
-
class
h2o.grid.metrics.
H2OOrdinalGridSearch
Bases:
object
-
confusion_matrix
(data) Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
- Parameters
data – metric for which the confusion matrix will be calculated.
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family="ordinal"), hyper_parameters)
>>> h2o_df['C11'] = h2o_df['C11'].asfactor()
>>> gs.train(x=list(range(0,10)), y="C11", training_frame=h2o_df)
>>> gs.confusion_matrix(h2o_df)
-
hit_ratio_table
(train=False, valid=False, xval=False) Retrieve the Hit Ratios.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If train is True, then return the hit ratio value for the training data.
valid (bool) – If valid is True, then return the hit ratio value for the validation data.
xval (bool) – If xval is True, then return the hit ratio value for the cross validation data.
- Returns
The hit ratio for this ordinal model.
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family="ordinal"), hyper_parameters)
>>> h2o_df['C11'] = h2o_df['C11'].asfactor()
>>> gs.train(x=list(range(0,10)), y="C11", training_frame=h2o_df)
>>> gs.hit_ratio_table(train=True)
-
mean_per_class_error
(train=False, valid=False, xval=False) Get the mean per class error.
If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
train (bool) – If train is True, then return the mean per class error value for the training data.
valid (bool) – If valid is True, then return the mean per class error value for the validation data.
xval (bool) – If xval is True, then return the mean per class error value for the cross validation data.
- Returns
The mean per class error for this ordinal model.
- Examples
>>> import h2o
>>> from h2o.estimators import H2OGeneralizedLinearEstimator
>>> from h2o.grid.grid_search import H2OGridSearch
>>> h2o.init()
>>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv")
>>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
>>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family="ordinal"), hyper_parameters)
>>> h2o_df['C11'] = h2o_df['C11'].asfactor()
>>> gs.train(x=list(range(0,10)), y="C11", training_frame=h2o_df)
>>> gs.mean_per_class_error(train=True)
-