Metrics in H2O
H2O Model Metrics
MetricsBase
- copyright: 2016 H2O.ai
- license: Apache License Version 2.0 (see LICENSE for details)
class h2o.model.metrics_base.MetricsBase(metric_json, on=None, algo='')
- Bases: h2o.model.metrics_base.MetricsBase
- A parent class to house common metrics available for the various Metrics types.
- The methods here are available across different model categories.
- Note: This class and its subclasses are used at runtime as mixins: their methods can (and should) be accessed directly from a metrics object, for example as a result of model_performance().
aic()
- The AIC for this set of metrics.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.aic()
 - 
auc()
- The AUC for this set of metrics.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.auc()
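For intuition about what an AUC value means, the sketch below computes the area under the ROC curve from scratch with the rank (Mann-Whitney) formulation. It is a plain-Python illustration, not H2O's implementation: H2O computes the metric on the cluster and its value may differ slightly. The function name `roc_auc` and the toy data are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: sequence of 0/1 ground-truth values.
    scores: sequence of predicted probabilities for class 1.
    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_value = roc_auc(labels, scores)
print(auc_value)
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the sketch yields 8/9.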
 - 
aucpr()
- The area under the precision recall curve.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.aucpr()
 - 
custom_metric_name()
- Name of custom metric or None. 
 - 
custom_metric_value()
- Value of custom metric or None. 
 - 
gini()
- Gini coefficient.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.gini()
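In binary classification, the Gini coefficient is commonly related to the AUC by `Gini = 2 * AUC - 1`. The sketch below assumes that convention; it is an illustration only, and the helper name `gini_from_auc` is hypothetical rather than part of the H2O API.

```python
def gini_from_auc(auc):
    """Convert an AUC in [0, 1] to a Gini coefficient in [-1, 1].

    Assumes the common convention Gini = 2 * AUC - 1.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    return 2.0 * auc - 1.0


# A random classifier (AUC 0.5) maps to 0, a perfect one (AUC 1.0) to 1.
print(gini_from_auc(0.5), gini_from_auc(1.0), gini_from_auc(0.75))
```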
 - 
hglm_metric(metric_string)
 - 
loglikelihood()
- The log likelihood for this set of metrics.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.loglikelihood()
 - 
logloss()
- Log loss.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.logloss()
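To make the quantity concrete, here is a minimal sketch of the textbook binary log loss (mean negative log-likelihood). The function name `binary_logloss` and the clipping constant `eps` are assumptions for this illustration; H2O computes its own value on the cluster.

```python
import math


def binary_logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted P(class 1)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equally long")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Confident, correct predictions give a small loss; 0.5 everywhere gives ln(2).
print(binary_logloss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
print(binary_logloss([1, 0], [0.5, 0.5]))
```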
 - 
mae()
- The MAE for this set of metrics.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "cylinders"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(distribution = "poisson",
...                                         seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mae()
 - 
classmethod make(kvs)
- Factory method to instantiate a MetricsBase object from the list of key-value pairs. 
 - 
mean_per_class_error()
- The mean per class error.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.mean_per_class_error()
 - 
mean_residual_deviance()
- The mean residual deviance for this set of metrics.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/AirlinesTest.csv.zip")
>>> air_gbm = H2OGradientBoostingEstimator()
>>> air_gbm.train(x=list(range(9)),
...               y=9,
...               training_frame=airlines,
...               validation_frame=airlines)
>>> air_gbm.mean_residual_deviance(train=True,valid=False,xval=False)
 - 
mse()
- The MSE for this set of metrics.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mse()
 - 
nobs()
- The number of observations.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> perf = cars_gbm.model_performance()
>>> perf.nobs()
 - 
null_degrees_of_freedom()
- The null DoF if the model has residual deviance, otherwise None.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.null_degrees_of_freedom()
 - 
null_deviance()
- The null deviance if the model has residual deviance, otherwise None.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.null_deviance()
 - 
pr_auc()
- MetricsBase.pr_auc is deprecated, please use MetricsBase.aucpr instead.
 - 
r2()
- The R squared coefficient.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.r2()
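As a reminder of what the coefficient measures, the sketch below uses the textbook definition `R^2 = 1 - SS_res / SS_tot`. It is an assumption that this matches the value H2O reports for a given model; the function name `r_squared` and the toy numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when actual values are constant")
    return 1.0 - ss_res / ss_tot


# Perfect predictions score 1.0; always predicting the mean scores 0.0.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```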
 - 
residual_degrees_of_freedom()
- The residual DoF if the model has residual deviance, otherwise None.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.residual_degrees_of_freedom()
 - 
residual_deviance()
- The residual deviance if the model has it, otherwise None.
- Examples
>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
>>> prostate[2] = prostate[2].asfactor()
>>> prostate[4] = prostate[4].asfactor()
>>> prostate[5] = prostate[5].asfactor()
>>> prostate[8] = prostate[8].asfactor()
>>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"]
>>> response = "CAPSULE"
>>> train, valid = prostate.split_frame(ratios=[.8],seed=1234)
>>> pros_glm = H2OGeneralizedLinearEstimator(family="binomial")
>>> pros_glm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> pros_glm.residual_deviance()
 - 
rmse()
- The RMSE for this set of metrics.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.rmse()
 - 
rmsle()
- The RMSLE for this set of metrics.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "cylinders"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(distribution = "poisson",
...                                         seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.rmsle()
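The regression errors documented in this class (MAE, MSE, RMSE, RMSLE) are closely related. The sketch below computes all four from their textbook definitions on a toy example so the relationships are visible. It is illustrative only: the function name `regression_errors` and the data are assumptions, and it does not call H2O.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, MSE, RMSE and RMSLE for two equally long sequences."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and equally long")
    residuals = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    # RMSLE compares log(1 + value), so every value must be greater than -1.
    log_residuals = [math.log1p(p) - math.log1p(a)
                     for a, p in zip(actual, predicted)]
    rmsle = math.sqrt(sum(r * r for r in log_residuals) / n)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "rmsle": rmsle}


# Hypothetical cylinder counts and predictions; residuals are 1, 0 and -2.
errors = regression_errors([4, 6, 8], [5, 6, 6])
print(errors)
```

RMSE is the square root of MSE, while RMSLE penalizes relative rather than absolute differences.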
 - 
show(verbosity=None, fmt=None)
- Describes and renders the current object in the given format and verbosity level, if supported. By default, the best format for the current environment is guessed.
- Parameters
- verbosity – one of (None, ‘short’, ‘medium’, ‘full’). Defaults to None (object’s default verbosity). 
- fmt – one of (None, ‘plain’, ‘pretty’, ‘html’). Defaults to None (picks appropriate format depending on platform/context). 
 
 
 
- 
Binomial Classification
- 
class h2o.model.metrics.binomial.H2OBinomialModelMetrics(metric_json, on=None, algo='')
- Bases: h2o.model.metrics_base.MetricsBase
- This class is essentially an API for the AUC object. It contains methods for inspecting the AUC for different criteria. To input the different criteria, use the static variable criteria.
F0point5(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The F0.5 for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.F0point5()
 - 
F1(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The F1 for the given set of thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.F1()
 - 
F2(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The F2 for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.F2()
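F0.5, F1 and F2 are all instances of the general F-beta score, which weights recall beta times as much as precision. The sketch below shows the standard formula in plain Python; the helper name `f_beta` and the sample precision/recall values are assumptions for illustration, not output from an H2O model.

```python
def f_beta(precision, recall, beta):
    """General F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical classifier with low precision but perfect recall.
precision, recall = 0.5, 1.0
for name, beta in (("F0.5", 0.5), ("F1", 1.0), ("F2", 2.0)):
    print(name, f_beta(precision, recall, beta))
```

Because recall is the stronger of the two values here, the score rises as beta grows: F0.5 favors precision and F2 favors recall.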
 - 
accuracy(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The accuracy for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.accuracy()
 - 
confusion_matrix(metrics=None, thresholds=None)
- Get the confusion matrix for the specified metric.
- Parameters
- metrics – A string (or list of strings) among metrics listed in maximizing_metrics. Defaults to 'f1'.
- thresholds – A value (or list of values) between 0 and 1. If None, then the thresholds maximizing each provided metric will be used.
- Returns
- a list of ConfusionMatrix objects (if there is more than one to return), a single ConfusionMatrix (if there is only one), or None if thresholds or metric scores are missing.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution=distribution)
>>> gbm.train(x=predictors,
...           y = response,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.confusion_matrix(train)
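To show how a threshold turns predicted probabilities into a binary confusion matrix, here is a small plain-Python sketch. It assumes a row is predicted positive when its score is greater than or equal to the threshold; that rule, the function name `confusion_counts` and the toy data are assumptions for this illustration, not a description of H2O internals.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tn, fp, fn, tp), predicting class 1 when score >= threshold."""
    tn = fp = fn = tp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical labels and predicted probabilities for class 1.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(confusion_counts(labels, scores, 0.5))
```

Raising the threshold trades false positives for false negatives, which is why each metric can have a different maximizing threshold.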
 - 
error(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- Returns
- The error for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.error()
 - 
fallout(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The fallout (same as False Positive Rate) for this set of metrics and thresholds.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.fallout()
 - 
find_idx_by_threshold(threshold)
- Retrieve the index in this metric's threshold list at which the given threshold is located.
- Parameters
- threshold – Find the index of this input threshold.
- Returns
- the index.
- Raises
- ValueError – if no such index can be found.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> local_data = [[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b']]
>>> h2o_data = h2o.H2OFrame(local_data)
>>> h2o_data.set_names(['response', 'predictor'])
>>> h2o_data["response"] = h2o_data["response"].asfactor()
>>> gbm = H2OGradientBoostingEstimator(ntrees=1,
...                                    distribution="bernoulli")
>>> gbm.train(x=list(range(1,h2o_data.ncol)),
...           y="response",
...           training_frame=h2o_data)
>>> perf = gbm.model_performance()
>>> perf.find_idx_by_threshold(0.45)
 - 
find_threshold_by_max_metric(metric)
- Parameters
- metric – A string among the metrics listed in maximizing_metrics.
- Returns
- the threshold at which the given metric is maximal.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> local_data = [[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b']]
>>> h2o_data = h2o.H2OFrame(local_data)
>>> h2o_data.set_names(['response', 'predictor'])
>>> h2o_data["response"] = h2o_data["response"].asfactor()
>>> gbm = H2OGradientBoostingEstimator(ntrees=1,
...                                    distribution="bernoulli")
>>> gbm.train(x=list(range(1,h2o_data.ncol)),
...           y="response",
...           training_frame=h2o_data)
>>> perf = gbm.model_performance()
>>> perf.find_threshold_by_max_metric("f1")
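Conceptually, finding the threshold that maximizes a metric is a scan over candidate thresholds. The sketch below does this for F1 in plain Python, using each distinct score as a candidate and the rule "predict positive when score >= threshold". Both choices, along with the names `f1_at` and `threshold_maximizing`, are assumptions for illustration; H2O works from its own stored threshold list.

```python
def f1_at(labels, scores, threshold):
    """F1 = 2*TP / (2*TP + FP + FN), predicting 1 when score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def threshold_maximizing(metric_fn, labels, scores):
    """Return the candidate threshold with the highest metric value.

    On ties, the lowest such threshold wins (max keeps the first maximum).
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metric_fn(labels, scores, t))


# Hypothetical labels and predicted probabilities for class 1.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
best = threshold_maximizing(f1_at, labels, scores)
print(best, f1_at(labels, scores, best))
```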
 - 
fnr(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The False Negative Rate.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.fnr()
 - 
fpr(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The False Positive Rate.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.fpr()
 - 
property fprs
- Return all false positive rates for all threshold values.
- Returns
- a list of false positive rates.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> r = cars[0].runif()
>>> train = cars[r > .2]
>>> valid = cars[r <= .2]
>>> response_col = "economy_20mpg"
>>> distribution = "bernoulli"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3, distribution=distribution, fold_assignment="Random")
>>> gbm.train(y=response_col, x=predictors, validation_frame=valid, training_frame=train)
>>> (fprs, tprs) = gbm.roc(train=True, valid=False, xval=False)
>>> fprs
 - 
gains_lift()
- Retrieve the Gains/Lift table.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["cylinders"] = cars["cylinders"].asfactor()
>>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
>>> response_col = "cylinders"
>>> distribution = "multinomial"
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> gbm = H2OGradientBoostingEstimator(nfolds=3,
...                                    distribution=distribution)
>>> gbm.train(x=predictors,
...           y = response_col,
...           training_frame = train,
...           validation_frame = valid)
>>> gbm.gains_lift()
 - 
gains_lift_plot(type='both', server=False, save_plot_path=None, plot=True)
- Plot Gains/Lift curves.
- Parameters
- type – one of:
  - "both" (default)
  - "gains"
  - "lift"
- server – if True, generate plot inline using matplotlib's Anti-Grain Geometry (AGG) backend.
- save_plot_path – filename to save the plot to.
- plot – True to plot curve; False to get a gains lift table.
- Returns
- Gains lift table + the resulting plot (can be accessed using result.figure()).
 
 - 
max_per_class_error(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- Returns
- Return 1 - min(per class accuracy).
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.max_per_class_error()
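The formula `1 - min(per class accuracy)` can be made concrete with binary confusion-matrix counts. The sketch below derives it, along with the mean per class error (taken here as one minus the average per class accuracy), from hypothetical counts. The function name `per_class_errors` and the numbers are assumptions for illustration.

```python
def per_class_errors(tn, fp, fn, tp):
    """Max and mean per class error from binary confusion-matrix counts."""
    if tn + fp == 0 or tp + fn == 0:
        raise ValueError("each class needs at least one actual observation")
    accuracy_negative = tn / (tn + fp)  # share of actual negatives predicted correctly
    accuracy_positive = tp / (tp + fn)  # share of actual positives predicted correctly
    return {
        "max_per_class_error": 1.0 - min(accuracy_negative, accuracy_positive),
        "mean_per_class_error": 1.0 - (accuracy_negative + accuracy_positive) / 2.0,
    }


# Hypothetical counts: 2 true negatives, 1 false positive, 0 false negatives, 3 true positives.
result = per_class_errors(tn=2, fp=1, fn=0, tp=3)
print(result)
```

Unlike overall error, both quantities weight the classes equally, which makes them more informative on imbalanced data.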
 - 
maximizing_metrics = ('absolute_mcc', 'accuracy', 'precision', 'f0point5', 'f1', 'f2', 'mean_per_class_accuracy', 'min_per_class_accuracy', 'tns', 'fns', 'fps', 'tps', 'tnr', 'fnr', 'fpr', 'tpr', 'fallout', 'missrate', 'recall', 'sensitivity', 'specificity')
- Metric names allowed for the confusion matrix.
 - 
mcc(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The absolute MCC (a value between 0 and 1, 0 being totally dissimilar, 1 being identical).
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mcc()
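The Matthews correlation coefficient normally ranges from -1 to 1; taking its absolute value gives the 0-to-1 range described above. The sketch below computes both from confusion-matrix counts using the standard formula. The function name `mcc_from_counts` and the counts are assumptions for this illustration.

```python
import math


def mcc_from_counts(tn, fp, fn, tp):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when any row or column sum is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a mostly correct classifier.
value = mcc_from_counts(tn=2, fp=1, fn=0, tp=3)
print(value, abs(value))

# A classifier that is always wrong has MCC -1, so its absolute MCC is 1.
inverted = mcc_from_counts(tn=0, fp=3, fn=3, tp=0)
print(inverted, abs(inverted))
```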
 - 
mean_per_class_error(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- Returns
- mean per class error.
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.mean_per_class_error()
 - 
metric(metric, thresholds=None)
- Parameters
- metric (str) – A metric among maximizing_metrics.
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used. If 'all', then all stored thresholds are used and returned with the matching metric.
- Returns
- The set of metrics for the list of thresholds. The returned list has a 'value' property holding only the metric value (if no threshold is provided or if it is provided as a number), or all the metric values (if thresholds are provided as a list).
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> local_data = [[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],[1, 'a'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],
...               [0, 'b'],[0, 'b'],[0, 'b'],[0, 'b'],[0, 'b']]
>>> h2o_data = h2o.H2OFrame(local_data)
>>> h2o_data.set_names(['response', 'predictor'])
>>> h2o_data["response"] = h2o_data["response"].asfactor()
>>> gbm = H2OGradientBoostingEstimator(ntrees=1,
...                                    distribution="bernoulli")
>>> gbm.train(x=list(range(1,h2o_data.ncol)),
...           y="response",
...           training_frame=h2o_data)
>>> perf = gbm.model_performance()
>>> perf.metric("tps", [perf.find_threshold_by_max_metric("f1")])[0][1]
 - 
metrics_aliases = {'fallout': 'fpr', 'missrate': 'fnr', 'recall': 'tpr', 'sensitivity': 'tpr', 'specificity': 'tnr'}
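The alias table maps familiar names onto four underlying rates. The sketch below copies the mapping from above and shows, in plain Python, how an alias could be resolved to a rate computed from confusion-matrix counts. The helper names `rates` and `lookup` are hypothetical and are not part of the H2O API.

```python
# Copied from the metrics_aliases attribute documented above.
METRICS_ALIASES = {"fallout": "fpr", "missrate": "fnr", "recall": "tpr",
                   "sensitivity": "tpr", "specificity": "tnr"}


def rates(tn, fp, fn, tp):
    """The four basic rates derived from binary confusion-matrix counts."""
    positives = tp + fn  # actual positives
    negatives = tn + fp  # actual negatives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {"tpr": tp / positives, "fnr": fn / positives,
            "fpr": fp / negatives, "tnr": tn / negatives}


def lookup(name, tn, fp, fn, tp):
    """Resolve an alias to its canonical rate name, then return that rate."""
    canonical = METRICS_ALIASES.get(name, name)
    return rates(tn, fp, fn, tp)[canonical]


# Hypothetical counts: recall and sensitivity both resolve to tpr.
print(lookup("recall", 2, 1, 0, 3), lookup("sensitivity", 2, 1, 0, 3))
print(lookup("fallout", 2, 1, 0, 3), lookup("specificity", 2, 1, 0, 3))
```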
 - 
missrate(thresholds=None)
- Parameters
- thresholds – thresholds parameter must be a list (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- Returns
- The miss rate (same as False Negative Rate).
- Examples
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
>>> cars_gbm.train(x = predictors,
...                y = response,
...                training_frame = train,
...                validation_frame = valid)
>>> cars_gbm.missrate()
 - 
plot(type='roc', server=False, save_plot_path=None, plot=True)[source]¶
- Produce the desired metric plot. - Parameters
- type – - the type of metric plot. One of (currently supported): - ROC curve (‘roc’) 
- Precision Recall curve (‘pr’) 
- Gains Lift curve (‘gainslift’) 
 
- server – if True, generate plot inline using matplotlib’s Anti-Grain Geometry (AGG) backend. 
- save_plot_path – filename to save the plot to. 
- plot – True to plot the curve; False to get a tuple of the x-axis and y-axis values of the plot (tprs and fprs for ROC, recall and precision for PR).
 
- Returns
- None, or the x-axis and y-axis values of the plot together with the resulting plot (accessible via result.figure()).
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.plot(type="roc") >>> cars_gbm.plot(type="pr") 
 - 
precision(thresholds=None)[source]¶
- Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- Returns
- Precision for this set of metrics and thresholds. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.precision() 
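To make the `thresholds` parameter concrete, here is a small pure-Python sketch of precision evaluated at several thresholds, with a fallback that picks the maximizing threshold when none is given. It is an illustration under stated assumptions rather than H2O's code: the function names are hypothetical, the candidate thresholds for the `None` case are assumed to be the distinct scores, and the `[threshold, value]` pair shape is inferred from the `[0][1]` indexing used in the `metric` example above.

```python
def _precision(labels, scores, threshold):
    """Precision = TP / (TP + FP); assumes score >= threshold means 'positive'."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / (tp + fp) if (tp + fp) else float("nan")


def precision_at(labels, scores, thresholds=None):
    """Return [[threshold, precision], ...] for each requested threshold."""
    if thresholds is None:
        # Assumption: search the distinct scores for the maximizing threshold.
        candidates = sorted(set(scores))
        best = max(candidates, key=lambda t: _precision(labels, scores, t))
        thresholds = [best]
    return [[t, _precision(labels, scores, t)] for t in thresholds]


# Hypothetical toy data.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]
pairs = precision_at(labels, scores, [0.01, 0.5, 0.99])
print(pairs)
print(precision_at(labels, scores))
```

A threshold above every score yields no positive predictions, so the sketch reports NaN there instead of dividing by zero.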
 - 
recall(thresholds=None)[source]¶
- Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- Returns
- Recall for this set of metrics and thresholds. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.recall() 
 - 
roc()[source]¶
- Return the coordinates of the ROC curve as a tuple containing the false positive rates as a list and the true positive rates as a list. - Returns
- The ROC values. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> r = cars[0].runif() >>> train = cars[r > .2] >>> valid = cars[r <= .2] >>> response_col = "economy_20mpg" >>> distribution = "bernoulli" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution=distribution, ... fold_assignment="Random") >>> gbm.train(x=predictors, ... y=response_col, ... validation_frame=valid, ... training_frame=train) >>> gbm.roc(train=True, valid=False, xval=False) 
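The shape of the returned tuple, a list of false positive rates and a list of true positive rates, can be reproduced on toy data. The sketch below is a plain-Python illustration, not H2O's implementation: `roc_points` and `trapezoid_auc` are hypothetical names, each distinct score is assumed to serve as a threshold, and both classes are assumed to be present.

```python
def roc_points(labels, scores):
    """Return (fprs, tprs), one point per distinct score used as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    fprs, tprs = [0.0], [0.0]  # start at the origin (nothing predicted positive)
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        tprs.append(tp / positives)
        fprs.append(fp / negatives)
    return fprs, tprs


def trapezoid_auc(fprs, tprs):
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fprs)):
        area += (fprs[i] - fprs[i - 1]) * (tprs[i] + tprs[i - 1]) / 2.0
    return area


# Hypothetical toy data.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]
fprs, tprs = roc_points(labels, scores)
print(list(zip(fprs, tprs)))
print(trapezoid_auc(fprs, tprs))
```

Unpacking the result as `(fprs, tprs)` matches the order used in the `tprs` property example in this section.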
 - 
sensitivity(thresholds=None)[source]¶
- Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- Returns
- Sensitivity or True Positive Rate for this set of metrics and thresholds. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.sensitivity() 
 - 
specificity(thresholds=None)[source]¶
- Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- Returns
- The specificity (same as True Negative Rate). 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.specificity() 
 - 
tnr(thresholds=None)[source]¶
- Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- Returns
- The True Negative Rate. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.tnr() 
 - 
tpr(thresholds=None)[source]¶
- Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- Returns
- The True Positive Rate. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234) >>> cars_gbm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_gbm.tpr() 
 - 
property tprs¶
- Return all true positive rates for all threshold values. - Returns
- a list of true positive rates. 
- Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> r = cars[0].runif() >>> train = cars[r > .2] >>> valid = cars[r <= .2] >>> response_col = "economy_20mpg" >>> distribution = "bernoulli" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, distribution=distribution, fold_assignment="Random") >>> gbm.train(y=response_col, x=predictors, validation_frame=valid, training_frame=train) >>> (fprs, tprs) = gbm.roc(train=True, valid=False, xval=False) >>> tprs 
 
- 
Multinomial Classification¶
- 
class h2o.model.metrics.multinomial.H2OMultinomialModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- 
confusion_matrix()[source]¶
- Returns a confusion matrix based on H2O’s default prediction threshold for a dataset. - Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["cylinders"] = cars["cylinders"].asfactor() >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> response_col = "cylinders" >>> distribution = "multinomial" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution = distribution) >>> gbm.train(x=predictors, ... y = response_col, ... training_frame = train, ... validation_frame = valid) >>> gbm.confusion_matrix(train) 
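For readers unfamiliar with the layout of a multiclass confusion matrix, the following pure-Python sketch builds one from actual and predicted labels. It is an illustration only: the function name and the toy labels are hypothetical, rows are assumed to index the actual class and columns the predicted class, and H2O's table additionally carries error columns that are not reproduced here.

```python
def confusion_matrix(actual, predicted):
    """Return (classes, matrix) where matrix[i][j] counts rows whose actual
    class is classes[i] and whose predicted class is classes[j]."""
    classes = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return classes, matrix


def per_class_error(matrix):
    """Fraction of each actual class that was predicted as some other class."""
    errors = []
    for i, row in enumerate(matrix):
        total = sum(row)
        errors.append((total - row[i]) / total if total else float("nan"))
    return errors


# Hypothetical cylinder counts: actual vs. predicted.
actual = [4, 4, 6, 6, 8, 8, 8]
predicted = [4, 6, 6, 6, 8, 8, 4]
classes, matrix = confusion_matrix(actual, predicted)
for label, row in zip(classes, matrix):
    print(label, row)
print(per_class_error(matrix))
```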
 - 
hit_ratio_table()[source]¶
- Retrieve the Hit Ratios. - Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["cylinders"] = cars["cylinders"].asfactor() >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> response_col = "cylinders" >>> distribution = "multinomial" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution = distribution) >>> gbm.train(x=predictors, ... y = response_col, ... training_frame = train, ... validation_frame = valid) >>> gbm.hit_ratio_table() 
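A hit ratio at k is the fraction of rows whose true class appears among the k classes with the highest predicted probability. The sketch below shows that calculation in plain Python; the function name, the input layout (one dict of class probabilities per row), and the toy values are all assumptions made for illustration.

```python
def hit_ratios(actual, class_probabilities, max_k):
    """Return [hit ratio at k=1, ..., hit ratio at k=max_k].

    class_probabilities is a list of dicts mapping class label -> probability.
    """
    ratios = []
    for k in range(1, max_k + 1):
        hits = 0
        for label, probs in zip(actual, class_probabilities):
            # The k labels with the largest predicted probability.
            top_k = sorted(probs, key=probs.get, reverse=True)[:k]
            if label in top_k:
                hits += 1
        ratios.append(hits / len(actual))
    return ratios


# Hypothetical predictions for four rows and three classes.
actual = ["4", "6", "8", "8"]
class_probabilities = [
    {"4": 0.7, "6": 0.2, "8": 0.1},
    {"4": 0.5, "6": 0.4, "8": 0.1},
    {"4": 0.1, "6": 0.3, "8": 0.6},
    {"4": 0.5, "6": 0.3, "8": 0.2},
]
print(hit_ratios(actual, class_probabilities, max_k=3))
```

The ratio at k=1 equals plain accuracy, and the ratio at k equal to the number of classes is always 1.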
 - 
multinomial_auc_table()[source]¶
- Retrieve the multinomial AUC values. - Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["cylinders"] = cars["cylinders"].asfactor() >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> response_col = "cylinders" >>> distribution = "multinomial" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution = distribution) >>> gbm.train(x=predictors, ... y = response_col, ... training_frame = train, ... validation_frame = valid) >>> gbm.multinomial_auc_table() 
 - 
multinomial_aucpr_table()[source]¶
- Retrieve the multinomial PR AUC values. - Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["cylinders"] = cars["cylinders"].asfactor() >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> response_col = "cylinders" >>> distribution = "multinomial" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution = distribution) >>> gbm.train(x=predictors, ... y = response_col, ... training_frame = train, ... validation_frame = valid) >>> gbm.multinomial_aucpr_table() 
 
- 
Regression¶
- 
class h2o.model.metrics.regression.H2ORegressionModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- This class provides an API for inspecting the metrics returned by a regression model. - It is possible to retrieve the \(R^2\) (1 - MSE/variance) and MSE. - Examples
 - >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_glm = H2OGeneralizedLinearEstimator() >>> cars_glm.train(x = predictors, ... y = response, ... training_frame = train, ... validation_frame = valid) >>> cars_glm.mse() 
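The relationship stated in the class description, R² = 1 - MSE/variance, can be checked by hand. This is a minimal pure-Python sketch with hypothetical function names and toy values; it assumes the population variance of the actual responses (dividing by n), which may differ from what a particular H2O version uses.

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """R^2 computed as 1 - MSE / variance of the actual values."""
    mean = sum(actual) / len(actual)
    variance = sum((a - mean) ** 2 for a in actual) / len(actual)
    return 1.0 - mse(actual, predicted) / variance


# Hypothetical responses and model predictions.
actual = [4.0, 6.0, 8.0, 8.0]
predicted = [5.0, 6.0, 7.0, 8.0]
print(mse(actual, predicted))
print(r2(actual, predicted))
```

A model that always predicts the mean of the responses has an MSE equal to the variance, so its R² under this definition is 0.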
Anomaly Detection¶
- 
class h2o.model.metrics.anomaly_detection.H2OAnomalyDetectionModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- 
mean_normalized_score()[source]¶
- Mean Normalized Anomaly Score. For Isolation Forest, this is the normalized average path length. - Examples
 - >>> from h2o.estimators.isolation_forest import H2OIsolationForestEstimator >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv") >>> test = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv") >>> isofor_model = H2OIsolationForestEstimator(sample_size=5, ntrees=7) >>> isofor_model.train(training_frame = train) >>> perf = isofor_model.model_performance() >>> perf.mean_normalized_score() 
 - 
mean_score()[source]¶
- Mean Anomaly Score. For Isolation Forest, this represents the average of all tree-path lengths. - Examples
 - >>> from h2o.estimators.isolation_forest import H2OIsolationForestEstimator >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv") >>> test = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv") >>> isofor_model = H2OIsolationForestEstimator(sample_size=5, ntrees=7) >>> isofor_model.train(training_frame = train) >>> perf = isofor_model.model_performance() >>> perf.mean_score() 
 
- 
Clustering¶
- 
class h2o.model.metrics.clustering.H2OClusteringModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- 
betweenss()[source]¶
- The Between Cluster Sum-of-Square Error, or None if not present. - Examples
 - >>> from h2o.estimators.kmeans import H2OKMeansEstimator >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> km = H2OKMeansEstimator(k=3, nfolds=3) >>> km.train(x=list(range(4)), training_frame=iris) >>> km.betweenss() 
 - 
tot_withinss()[source]¶
- The Total Within Cluster Sum-of-Square Error, or None if not present. - Examples
 - >>> from h2o.estimators.kmeans import H2OKMeansEstimator >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> km = H2OKMeansEstimator(k=3, nfolds=3) >>> km.train(x=list(range(4)), training_frame=iris) >>> km.tot_withinss() 
 - 
totss()[source]¶
- The Total Sum-of-Square Error to Grand Mean, or None if not present. - Examples
 - >>> from h2o.estimators.kmeans import H2OKMeansEstimator >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> km = H2OKMeansEstimator(k=3, nfolds=3) >>> km.train(x=list(range(4)), training_frame=iris) >>> km.totss() 
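The three clustering quantities are linked: when each centroid is the mean of its cluster, the total sum of squares equals the total within-cluster sum of squares plus the between-cluster sum of squares. The sketch below verifies this on toy data in plain Python. The function names and data are hypothetical, and the computation uses raw, unstandardized coordinates, so it will not necessarily match numbers reported by a model that standardizes its inputs.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sums_of_squares(points, assignments):
    """Return (totss, tot_withinss, betweenss) for a given cluster assignment."""
    grand_mean = mean_point(points)
    totss = sum(sq_dist(p, grand_mean) for p in points)
    clusters = {}
    for point, cluster in zip(points, assignments):
        clusters.setdefault(cluster, []).append(point)
    tot_withinss = 0.0
    betweenss = 0.0
    for members in clusters.values():
        centroid = mean_point(members)
        tot_withinss += sum(sq_dist(p, centroid) for p in members)
        betweenss += len(members) * sq_dist(centroid, grand_mean)
    return totss, tot_withinss, betweenss


# Hypothetical 2-D points split into two clusters.
points = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
assignments = [0, 0, 1, 1]
totss, tot_withinss, betweenss = sums_of_squares(points, assignments)
print(totss, tot_withinss, betweenss)
```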
 
- 
CoxPH¶
- 
class h2o.model.metrics.coxph.H2ORegressionCoxPHModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- Examples
 - >>> from h2o.estimators.coxph import H2OCoxProportionalHazardsEstimator >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv") >>> coxph = H2OCoxProportionalHazardsEstimator(start_column="start", ... stop_column="stop", ... ties="breslow") >>> coxph.train(x="age", y="event", training_frame=heart) >>> coxph 
Dimensionality Reduction¶
Ordinal¶
- 
class h2o.model.metrics.ordinal.H2OOrdinalModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- 
confusion_matrix()[source]¶
- Returns a confusion matrix based on H2O’s default prediction threshold for a dataset. - Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["cylinders"] = cars["cylinders"].asfactor() >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> response_col = "cylinders" >>> distribution = "multinomial" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution = distribution) >>> gbm.train(x=predictors, ... y = response_col, ... training_frame = train, ... validation_frame = valid) >>> gbm.confusion_matrix(train) 
 - 
hit_ratio_table()[source]¶
- Retrieve the Hit Ratios. - Examples
 - >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["cylinders"] = cars["cylinders"].asfactor() >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> response_col = "cylinders" >>> distribution = "multinomial" >>> predictors = ["displacement","power","weight","acceleration","year"] >>> gbm = H2OGradientBoostingEstimator(nfolds=3, ... distribution = distribution) >>> gbm.train(x=predictors, ... y = response_col, ... training_frame = train, ... validation_frame = valid) >>> gbm.hit_ratio_table() 
 
- 
Uplift¶
- 
class h2o.model.metrics.uplift.H2OBinomialUpliftModelMetrics(metric_json, on=None, algo='')[source]¶
- Bases: - h2o.model.metrics_base.MetricsBase- This class is available only for Uplift DRF models. This class is essentially an API for the AUUC object. - 
aecu(metric='AUTO')[source]¶
- Retrieve AECU value (average excess cumulative uplift - area between Uplift curve and random curve). - Parameters
- metric – - AECU metric type. One of: - ”None” 
- ”qini” 
- ”lift” 
- ”gain” 
- ”AUTO” (default; defaults to “qini”) 
 
- Returns
- AECU value. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.aecu() 
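The idea of an "excess" over a random curve can be sketched on a list of cumulative uplift values. The code below is a rough illustration and not H2O's algorithm: it assumes the random curve is a straight line from zero to the final cumulative uplift, and it summarizes the gap as the mean per-bin difference. Both choices, and the function names, are assumptions.

```python
def random_baseline(uplift_values):
    """Straight line from 0 up to the final cumulative uplift, one value per bin.

    Assumption: this linear shape stands in for the 'random curve'.
    """
    n = len(uplift_values)
    final = uplift_values[-1]
    return [final * (i + 1) / n for i in range(n)]


def average_excess(uplift_values):
    """Mean per-bin gap between the uplift curve and the linear baseline."""
    baseline = random_baseline(uplift_values)
    excess = [u - b for u, b in zip(uplift_values, baseline)]
    return sum(excess) / len(excess)


# Hypothetical cumulative uplift values for six bins.
uplift = [1.0, 1.0, 2.0, 1.0, 0.5, 1.0]
print(random_baseline(uplift))
print(average_excess(uplift))
```

A positive result means the curve sits above the baseline on average, that is, ranking rows by predicted uplift does better than targeting at random under these assumptions.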
 - 
aecu_table()[source]¶
- Retrieve all types of AECU values in a table. - Returns
- a table of AECU values. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.aecu_table() 
 - 
auuc(metric=None)[source]¶
- Retrieve area under cumulative uplift curve (AUUC) value. - Parameters
- metric – - AUUC metric type. One of: - ”None” (default; takes default metric from model parameters) 
- ”AUTO” (defaults to “qini”) 
- ”qini” 
- ”lift” 
- ”gain” 
 
- Returns
- AUUC value. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.auuc() 
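The three metric types can be compared on a toy dataset. The sketch below uses commonly cited definitions of Qini, lift, and gain in terms of cumulative treated and control counts; whether H2O bins rows and integrates the curve in the same way is an assumption, as are the function names and the use of the mean curve height as a simple area estimate.

```python
def cumulative_uplift(rows, metric="qini"):
    """rows: iterable of (score, treated, responded) with 0/1 flags.

    Returns one cumulative uplift value per prefix of the score-sorted rows.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    t = c = ty1 = cy1 = 0  # treated/control counts and their responders
    values = []
    for _, treated, responded in ordered:
        if treated:
            t += 1
            ty1 += responded
        else:
            c += 1
            cy1 += responded
        treated_rate = ty1 / t if t else 0.0
        control_rate = cy1 / c if c else 0.0
        if metric == "qini":
            # Treated responders minus control responders rescaled to t.
            values.append(ty1 - control_rate * t)
        elif metric == "lift":
            values.append(treated_rate - control_rate)
        elif metric == "gain":
            values.append((treated_rate - control_rate) * (t + c))
        else:
            raise ValueError("metric must be 'qini', 'lift', or 'gain'")
    return values


def simple_auuc(values):
    """Assumption: approximate the area by the mean height of the curve."""
    return sum(values) / len(values)


# Hypothetical rows: (predicted uplift score, treated flag, responded flag).
rows = [(0.9, 1, 1), (0.8, 0, 0), (0.7, 1, 1),
        (0.6, 0, 1), (0.4, 1, 0), (0.2, 0, 0)]
qini_curve = cumulative_uplift(rows, "qini")
print(qini_curve)
print(simple_auuc(qini_curve))
```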
 - 
auuc_normalized(metric=None)[source]¶
- Retrieve normalized area under cumulative uplift curve (AUUC) value. - Parameters
- metric – - AUUC metric type. One of: - ”None” (default; takes default metric from model parameters) 
- ”AUTO” (defaults to “qini”) 
- ”qini” 
- ”lift” 
- ”gain” 
 
- Returns
- normalized AUUC value. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.auuc_normalized() 
 - 
auuc_table()[source]¶
- Retrieve all types of AUUC in a table. - Returns
- a table of AUUCs. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.auuc_table() 
 - 
n()[source]¶
- Retrieve cumulative sum of numbers of observations in each bin. - Returns
- a list of the cumulative number of observations in each bin. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.n() 
 - 
plot_uplift(server=False, save_to_file=None, plot=True, metric='AUTO', normalize=False)[source]¶
- Plot Uplift Curve. - Parameters
- server – if True, generate plot inline using matplotlib’s Anti-Grain Geometry (AGG) backend.
- save_to_file – filename to save the plot to. 
- plot – True to plot the curve; False to get a tuple of the x-axis and y-axis values of the plot (number of observations and uplift values).
- metric – - AUUC metric type. One of: - ”qini” 
- ”lift” 
- ”gain” 
- ”AUTO” (default; defaults to “qini”) 
 
- normalize – If True, normalized values are plotted.
 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.plot_uplift(plot=True) >>> n, uplift = perf.plot_uplift(plot=False) 
 - 
qini()[source]¶
- Retrieve Qini value (area between Qini cumulative uplift curve and random curve). - Returns
- Qini value. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.qini() 
 - 
thresholds()[source]¶
- Retrieve prediction thresholds for each bin. - Returns
- a list of thresholds. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.thresholds() 
 - 
thresholds_and_metric_scores()[source]¶
- Retrieve thresholds and metric scores table. - Returns
- a thresholds and metric scores table for the specified key(s). 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.thresholds_and_metric_scores() 
 - 
uplift(metric='AUTO')[source]¶
- Retrieve uplift values for each bin. - Parameters
- metric – - AUUC metric type. One of: - ”qini” 
- ”lift” 
- ”gain” 
- ”AUTO” (default; defaults to “qini”) 
 
- Returns
- a list of uplift values. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.uplift() 
 - 
uplift_normalized(metric='AUTO')[source]¶
- Retrieve normalized uplift values for each bin. - Parameters
- metric – - AUUC metric type. One of: - ”qini” 
- ”lift” 
- ”gain” 
- ”AUTO” (default; defaults to “qini”) 
 
- Returns
- a list of normalized uplift values. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.uplift_normalized() 
 - 
uplift_random(metric='AUTO')[source]¶
- Retrieve random uplift values for each bin. - Parameters
- metric – - AUUC metric type. One of: - ”qini” 
- ”lift” 
- ”gain” 
- ”AUTO” (default; defaults to “qini”) 
 
- Returns
- a list of random uplift values. 
- Examples
 - >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> treatment_column = "treatment" >>> response_column = "conversion" >>> train[treatment_column] = train[treatment_column].asfactor() >>> train[response_column] = train[response_column].asfactor() >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6"] >>> >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="kl", ... distribution="bernoulli", ... min_rows=10, ... auuc_type="gain") >>> uplift_model.train(y=response_column, x=predictors, training_frame=train) >>> perf = uplift_model.model_performance() >>> perf.uplift_random() 
 
- 
H2O Grid Metrics¶
Note
Classes in this module are used at runtime as mixins: their methods can (and should) be accessed directly from a trained grid.
- 
class h2o.grid.metrics.H2OAutoEncoderGridSearch[source]¶
- Bases: - object- 
anomaly(test_data, per_feature=False)[source]¶
- Obtain the reconstruction error for the input test_data. - Parameters
- test_data (H2OFrame) – The dataset upon which the reconstruction error is computed. 
- per_feature (bool) – Whether to return the square reconstruction error per feature. Otherwise, return the mean square error. 
 
- Returns
- the reconstruction error. 
- Example
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators import H2OAutoEncoderEstimator >>> rows = [[1,2,3,4,0]*50, ... [2,1,2,4,1]*50, ... [2,1,4,2,1]*50, ... [0,1,2,34,1]*50, ... [2,3,4,1,0]*50] >>> fr = h2o.H2OFrame(rows) >>> hyper_parameters = {'activation': "Tanh", 'hidden': [50,50,50]} >>> gs = H2OGridSearch(H2OAutoEncoderEstimator(), hyper_parameters) >>> gs.train(x=range(4), training_frame=fr) >>> gs.anomaly(fr, per_feature=True) 
 
- 
- 
class h2o.grid.metrics.H2OBinomialGridSearch[source]¶
- Bases: - object- 
F0point5(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the F0.5 for a set of thresholds. - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- thresholds – a list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, the threshold that maximizes the metric is used.
- train (bool) – If True, return the F0point5 value for the training data.
- valid (bool) – If True, return the F0point5 value for the validation data.
- xval (bool) – If True, return the F0point5 value for the cross-validation data.
 
- Returns
- The F0point5 for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.F0point5(train=True) 
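The train/valid/xval convention described above recurs throughout this class. The following is a minimal sketch of that selection logic for a single model, using a hypothetical helper select_metrics and made-up metric values (neither is part of H2O). It assumes that setting exactly one flag returns the bare value, which the text above does not state explicitly.

```python
def select_metrics(values, train=False, valid=False, xval=False):
    """Pick metric values following the train/valid/xval convention (hypothetical helper)."""
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [name for name, flag in flags if flag]
    if not requested:
        # All flags False: fall back to the training metric.
        return values["train"]
    if len(requested) == 1:
        # Assumption: a single flag yields the bare value.
        return values[requested[0]]
    # More than one flag: dictionary keyed by data split.
    return {name: values[name] for name in requested}


scores = {"train": 0.81, "valid": 0.78, "xval": 0.76}  # made-up values
default_value = select_metrics(scores)
both = select_metrics(scores, train=True, valid=True)
```

On a grid, each model would contribute its own value, so the real return type is keyed by model as well.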
 - 
F1(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the F1 values for a set of thresholds for the models explored. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If True, return the F1 value for the training data.
- valid (bool) – If True, return the F1 value for the validation data.
- xval (bool) – If True, return the F1 value for each of the cross-validated splits.
 
- Returns
- Dictionary of model keys to F1 values.
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.F1(train=True) 
 - 
F2(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the F2 for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the F2 value for the training data.
- valid (bool) – If valid is True, then return the F2 value for the validation data.
- xval (bool) – If xval is True, then return the F2 value for the cross-validation data.
 
- Returns
- Dictionary of model keys to F2 values. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.F2(train=True) 
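F0.5, F1, and F2 are all instances of the general F-beta score, which weights recall beta times as much as precision. The sketch below uses the standard textbook formula in plain Python. The function name and the sample precision and recall are illustrative assumptions, not H2O internals.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


precision, recall = 0.5, 1.0          # made-up values
f0point5 = f_beta(precision, recall, 0.5)  # favours precision
f1 = f_beta(precision, recall, 1.0)        # balanced
f2 = f_beta(precision, recall, 2.0)        # favours recall
```

Because recall is higher than precision in this example, the score rises from F0.5 to F1 to F2.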
 - 
accuracy(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the accuracy for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the accuracy value for the training data.
- valid (bool) – If valid is True, then return the accuracy value for the validation data.
- xval (bool) – If xval is True, then return the accuracy value for the cross-validation data.
 
- Returns
- The accuracy for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.accuracy(train=True) 
 - 
confusion_matrix(metrics=None, thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the confusion matrix for the specified metrics/thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- metrics – A string (or list of strings) among the metrics listed in H2OBinomialModelMetrics.maximizing_metrics. Defaults to 'f1'.
- thresholds – A value (or list of values) between 0 and 1. If None, then the thresholds maximizing each provided metric will be used.
- train (bool) – If train is True, then return the confusion matrix value for the training data.
- valid (bool) – If valid is True, then return the confusion matrix value for the validation data.
- xval (bool) – If xval is True, then return the confusion matrix value for the cross-validation data.
 
- Returns
- The confusion matrix for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.confusion_matrix(train=True) 
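To make the role of the threshold concrete, the sketch below tallies a binary confusion matrix from labels and predicted probabilities in plain Python. It assumes that a score greater than or equal to the threshold is predicted as the positive class. The function name and data are hypothetical and do not reflect how H2O computes the matrix internally.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN at one threshold (illustrative, not H2O code)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0  # assumed decision rule
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


labels = [0, 0, 1, 1, 1, 0]               # made-up actual classes
scores = [0.1, 0.6, 0.4, 0.8, 0.9, 0.3]   # made-up predicted probabilities
matrix = confusion_counts(labels, scores, threshold=0.5)
```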
 - 
error(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- train (bool) – If train is True, then return the error value for the training data.
- valid (bool) – If valid is True, then return the error value for the validation data.
- xval (bool) – If xval is True, then return the error value for the cross-validation data.
 
- Returns
- The error for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.error(train=True) 
 - 
fallout(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the fallout (AKA False Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the fallout value for the training data.
- valid (bool) – If valid is True, then return the fallout value for the validation data.
- xval (bool) – If xval is True, then return the fallout value for the cross-validation data.
 
- Returns
- The fallout for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.fallout(train=True) 
 - 
find_idx_by_threshold(threshold, train=False, valid=False, xval=False)[source]¶
- Retrieve the index in this metric’s threshold list at which the given threshold is located. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- threshold (float) – The threshold value to search for.
- train (bool) – If train is True, then return the idx_by_threshold for the training data.
- valid (bool) – If valid is True, then return the idx_by_threshold for the validation data.
- xval (bool) – If xval is True, then return the idx_by_threshold for the cross-validation data.
 
- Returns
- The idx_by_threshold for this binomial model.
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.find_idx_by_threshold(0.45, train=True) 
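A requested threshold such as 0.45 will not always appear exactly in a model's threshold list. The sketch below shows one plausible lookup strategy, returning the index of the nearest stored threshold. This nearest-match behaviour is an assumption for illustration, and the function name and threshold list are hypothetical rather than taken from H2O.

```python
def nearest_threshold_index(thresholds, threshold):
    """Index of the stored threshold closest to the requested one (illustrative)."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    # Smallest absolute distance wins; ties resolve to the first index.
    return min(range(len(thresholds)), key=lambda i: abs(thresholds[i] - threshold))


stored = [0.9, 0.7, 0.5, 0.3, 0.1]   # made-up threshold list
idx_exact = nearest_threshold_index(stored, 0.7)
idx_near = nearest_threshold_index(stored, 0.45)
```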
 - 
find_threshold_by_max_metric(metric, train=False, valid=False, xval=False)[source]¶
- Retrieve the threshold at which the given metric is maximal. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- metric (str) – A metric among the metrics listed in H2OBinomialModelMetrics.maximizing_metrics.
- train (bool) – If train is True, then return the threshold_by_max_metric value for the training data.
- valid (bool) – If valid is True, then return the threshold_by_max_metric value for the validation data.
- xval (bool) – If xval is True, then return the threshold_by_max_metric value for the cross-validation data.
 
- Returns
- The threshold_by_max_metric for this binomial model.
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.find_threshold_by_max_metric("tps", train=True) 
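Conceptually, this lookup amounts to an argmax over a per-threshold metric column. The sketch below shows that idea with a hypothetical function and made-up F1 values. It is not H2O's implementation.

```python
def threshold_maximizing(thresholds, metric_values):
    """Return the threshold whose metric value is largest (illustrative)."""
    if len(thresholds) != len(metric_values) or not thresholds:
        raise ValueError("need two non-empty lists of equal length")
    # Index of the largest metric value; ties resolve to the first index.
    best = max(range(len(metric_values)), key=lambda i: metric_values[i])
    return thresholds[best]


thresholds = [0.9, 0.7, 0.5, 0.3, 0.1]        # made-up thresholds
f1_values = [0.40, 0.62, 0.71, 0.66, 0.55]    # made-up F1 per threshold
best_threshold = threshold_maximizing(thresholds, f1_values)
```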
 - 
fnr(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the False Negative Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the FNR value for the training data.
- valid (bool) – If valid is True, then return the FNR value for the validation data.
- xval (bool) – If xval is True, then return the FNR value for the cross-validation data.
 
- Returns
- The FNR for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.fnr(train=True) 
 - 
fpr(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the False Positive Rates for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the FPR value for the training data.
- valid (bool) – If valid is True, then return the FPR value for the validation data.
- xval (bool) – If xval is True, then return the FPR value for the cross-validation data.
 
- Returns
- The FPR for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.fpr(train=True) 
 - 
max_per_class_error(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the max per class error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- train (bool) – If train is True, then return the max_per_class_error value for the training data.
- valid (bool) – If valid is True, then return the max_per_class_error value for the validation data.
- xval (bool) – If xval is True, then return the max_per_class_error value for the cross-validation data.
 
- Returns
- The max per class error for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.max_per_class_error(train=True) 
 - 
mcc(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the MCC (Matthews correlation coefficient) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the MCC value for the training data.
- valid (bool) – If valid is True, then return the MCC value for the validation data.
- xval (bool) – If xval is True, then return the MCC value for the cross-validation data.
 
- Returns
- The MCC for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.mcc(train=True) 
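For reference, the Matthews correlation coefficient can be computed from the four confusion-matrix counts. The sketch below uses the standard formula in plain Python. The function name and counts are illustrative assumptions, and returning 0.0 when a marginal total is zero is a convention chosen here, not documented H2O behaviour.

```python
import math


def matthews_corrcoef_from_counts(tp, fp, tn, fn):
    """Standard MCC formula from confusion-matrix counts (illustrative)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


mcc_value = matthews_corrcoef_from_counts(tp=40, fp=5, tn=45, fn=10)  # made-up counts
perfect = matthews_corrcoef_from_counts(tp=5, fp=0, tn=5, fn=0)
```

Values range from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).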
 - 
mean_per_class_error(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the mean per class error for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold minimizing the error will be used.
- train (bool) – If train is True, then return the mean_per_class_error value for the training data.
- valid (bool) – If valid is True, then return the mean_per_class_error value for the validation data.
- xval (bool) – If xval is True, then return the mean_per_class_error value for the cross-validation data.
 
- Returns
- The mean per class error for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.mean_per_class_error(train=True) 
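The max and mean per class errors both start from the same per-class error rates and differ only in how they aggregate them. The sketch below assumes a binary problem in which each class's error is the fraction of its actual members that were misclassified. The function name and counts are hypothetical.

```python
def per_class_errors(tp, fp, tn, fn):
    """Error rate for the negative and positive class (illustrative)."""
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("each class needs at least one actual member")
    error_negative = fp / (fp + tn)   # actual negatives predicted positive
    error_positive = fn / (fn + tp)   # actual positives predicted negative
    return [error_negative, error_positive]


errors = per_class_errors(tp=40, fp=5, tn=45, fn=10)   # made-up counts
mean_error = sum(errors) / len(errors)   # what a mean per class error averages
max_error = max(errors)                  # what a max per class error reports
```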
 - 
metric(metric, thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the metric value for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- metric – Name of the metric to compute.
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the metrics for the training data.
- valid (bool) – If valid is True, then return the metrics for the validation data.
- xval (bool) – If xval is True, then return the metrics for the cross-validation data.
 
- Returns
- The metrics for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.metric("tps", train=True) 
 - 
missrate(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the miss rate (AKA False Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the missrate value for the training data.
- valid (bool) – If valid is True, then return the missrate value for the validation data.
- xval (bool) – If xval is True, then return the missrate value for the cross-validation data.
 
- Returns
- The missrate for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.missrate(train=True) 
 - 
precision(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the precision for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the precision value for the training data.
- valid (bool) – If valid is True, then return the precision value for the validation data.
- xval (bool) – If xval is True, then return the precision value for the cross-validation data.
 
- Returns
- The precision for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.precision(train=True) 
 - 
recall(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the recall (AKA True Positive Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the recall value for the training data.
- valid (bool) – If valid is True, then return the recall value for the validation data.
- xval (bool) – If xval is True, then return the recall value for the cross-validation data.
 
- Returns
- The recall for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.recall(train=True) 
 - 
roc(train=False, valid=False, xval=False)[source]¶
- Return the coordinates of the ROC curve for a given set of data, as a two-tuple containing the false positive rates as a list and true positive rates as a list. If all are False (default), then return the training data. If more than one ROC curve is requested, the data is returned as a dictionary of two-tuples.
- Parameters
- train (bool) – If train is True, then return the ROC coordinates for the training data.
- valid (bool) – If valid is True, then return the ROC coordinates for the validation data.
- xval (bool) – If xval is True, then return the ROC coordinates for the cross-validation data.
 
- Returns
- The coordinates of the ROC curve.
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.roc(train=True) 
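The two-tuple shape described above (a list of false positive rates paired with a list of true positive rates) can be reproduced from raw scores. The sketch below sweeps every distinct score as a threshold in plain Python. It assumes a score at or above the threshold counts as a positive prediction, and the function name and data are hypothetical rather than H2O's own procedure.

```python
def roc_points(labels, scores):
    """Return (fprs, tprs), one point per distinct score used as threshold."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    fprs, tprs = [], []
    # Lower the threshold step by step so both rates grow monotonically.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        fprs.append(fp / negatives)
        tprs.append(tp / positives)
    return fprs, tprs


labels = [0, 0, 1, 1]             # made-up actual classes
scores = [0.1, 0.4, 0.35, 0.8]    # made-up predicted probabilities
fprs, tprs = roc_points(labels, scores)
```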
 - 
sensitivity(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the sensitivity (AKA True Positive Rate or Recall) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the sensitivity value for the training data.
- valid (bool) – If valid is True, then return the sensitivity value for the validation data.
- xval (bool) – If xval is True, then return the sensitivity value for the cross-validation data.
 
- Returns
- The sensitivity for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.sensitivity(train=True) 
 - 
specificity(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the specificity (AKA True Negative Rate) for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the specificity value for the training data.
- valid (bool) – If valid is True, then return the specificity value for the validation data.
- xval (bool) – If xval is True, then return the specificity value for the cross-validation data.
 
- Returns
- The specificity for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.specificity(train=True) 
 - 
tnr(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the True Negative Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the TNR value for the training data.
- valid (bool) – If valid is True, then return the TNR value for the validation data.
- xval (bool) – If xval is True, then return the TNR value for the cross-validation data.
 
- Returns
- The TNR for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.tnr(train=True) 
 - 
tpr(thresholds=None, train=False, valid=False, xval=False)[source]¶
- Get the True Positive Rate for a set of thresholds. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- thresholds – A list of thresholds (e.g. [0.01, 0.5, 0.99]). If None, then the threshold maximizing the metric will be used.
- train (bool) – If train is True, then return the TPR value for the training data.
- valid (bool) – If valid is True, then return the TPR value for the validation data.
- xval (bool) – If xval is True, then return the TPR value for the cross-validation data.
 
- Returns
- The TPR for this binomial model. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=[3, 4-11], ... y=3, ... training_frame=training_data) >>> gs.tpr(train=True) 
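Several methods in this class are aliases for the same four rates: recall and sensitivity are the TPR, specificity is the TNR, fallout is the FPR, and missrate is the FNR. The sketch below computes the four rates from confusion-matrix counts using their standard definitions. The function name and counts are illustrative and not part of H2O.

```python
def binomial_rates(tp, fp, tn, fn):
    """TPR, TNR, FPR and FNR from confusion-matrix counts (illustrative)."""
    positives = tp + fn   # actual positives
    negatives = tn + fp   # actual negatives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        "tpr": tp / positives,   # recall, sensitivity
        "fnr": fn / positives,   # miss rate
        "tnr": tn / negatives,   # specificity
        "fpr": fp / negatives,   # fallout
    }


rates = binomial_rates(tp=40, fp=5, tn=45, fn=10)   # made-up counts
```

Each pair is complementary: TPR + FNR = 1 and TNR + FPR = 1.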
 
- 
- 
class h2o.grid.metrics.H2OClusteringGridSearch[source]¶
- Bases: - object- 
betweenss(train=False, valid=False, xval=False)[source]¶
- Get the between cluster sum of squares. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If True, then return the between cluster sum of squares value for the training data.
- valid (bool) – If True, then return the between cluster sum of squares value for the validation data.
- xval (bool) – If True, then return the between cluster sum of squares value for each of the cross-validated splits.
 
- Returns
- the between cluster sum of squares values for the specified key(s). 
- Examples
 - >>> from h2o.estimators import H2OKMeansEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> hyper_parameters = {'k': [2,3,4], 'init': "random"} >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters) >>> gs.train(x=list(range(4)), training_frame=iris) >>> gs.betweenss(train=True) 
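As background, the between cluster sum of squares measures how far the cluster centers lie from the overall mean, weighted by cluster size. The sketch below uses the textbook definition in plain Python. The function name, points, and assignments are hypothetical, and H2O may apply preprocessing such as standardization that this sketch ignores.

```python
def between_cluster_ss(points, assignments):
    """Sum over clusters of size * squared distance from center to grand mean."""
    dims = len(points[0])
    grand_mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    # Group the points by their assigned cluster label.
    clusters = {}
    for point, cluster in zip(points, assignments):
        clusters.setdefault(cluster, []).append(point)
    total = 0.0
    for members in clusters.values():
        center = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        squared_distance = sum((center[d] - grand_mean[d]) ** 2 for d in range(dims))
        total += len(members) * squared_distance
    return total


points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]   # made-up data
assignments = [0, 0, 1, 1]                                  # made-up cluster labels
bss = between_cluster_ss(points, assignments)
```

In this example the total sum of squares is 20 and the within cluster sum of squares is 4, so the between cluster part accounts for the remaining 16.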
 - 
centers()[source]¶
- Returns the centers for the KMeans model. - Examples
 - >>> from h2o.estimators import H2OKMeansEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> hyper_parameters = {'k': [2,3,4], 'init': "random"} >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters) >>> gs.train(x=list(range(4)), training_frame=iris) >>> gs.centers() 
 - 
centers_std()[source]¶
- Returns the standardized centers for the KMeans model. - Examples
 - >>> from h2o.estimators import H2OKMeansEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> hyper_parameters = {'k': [2,3,4], 'init': "random"} >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters) >>> gs.train(x=list(range(4)), training_frame=iris) >>> gs.centers_std() 
 - 
centroid_stats(train=False, valid=False, xval=False)
- Get the centroid statistics for each cluster. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If True, then return the centroid statistics for the training data.
- valid (bool) – If True, then return the centroid statistics for the validation data.
- xval (bool) – If True, then return the centroid statistics for each of the cross-validated splits.
 
- Returns
- the centroid statistics for the specified key(s). 
- Examples
 >>> from h2o.estimators import H2OKMeansEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
 >>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
 >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
 >>> gs.train(x=list(range(4)), training_frame=iris)
 >>> gs.centroid_stats(train=True)
 - 
num_iterations()
- Get the number of iterations that it took to converge or reach max iterations.
- Examples
 >>> from h2o.estimators import H2OKMeansEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
 >>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
 >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
 >>> gs.train(x=list(range(4)), training_frame=iris)
 >>> gs.num_iterations()
 - 
size(train=False, valid=False, xval=False)
- Get the sizes of each cluster. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If True, then return the cluster sizes for the training data.
- valid (bool) – If True, then return the cluster sizes for the validation data.
- xval (bool) – If True, then return the cluster sizes for each of the cross-validated splits.
 
- Returns
- the cluster sizes for the specified key(s). 
- Examples
 >>> from h2o.estimators import H2OKMeansEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
 >>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
 >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
 >>> gs.train(x=list(range(4)), training_frame=iris)
 >>> gs.size(train=True)
 - 
tot_withinss(train=False, valid=False, xval=False)
- Get the total within cluster sum of squares. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If True, then return the total within cluster sum of squares for the training data.
- valid (bool) – If True, then return the total within cluster sum of squares for the validation data.
- xval (bool) – If True, then return the total within cluster sum of squares for each of the cross-validated splits.
 
- Returns
- the total within cluster sum of squares values for the specified key(s). 
- Examples
 >>> from h2o.estimators import H2OKMeansEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
 >>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
 >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
 >>> gs.train(x=list(range(4)), training_frame=iris)
 >>> gs.tot_withinss(train=True)
 - 
totss(train=False, valid=False, xval=False)
- Get the total sum of squares. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If True, then return the total sum of squares for the training data.
- valid (bool) – If True, then return the total sum of squares for the validation data.
- xval (bool) – If True, then return the total sum of squares for each of the cross-validated splits.
 
- Returns
- the total sum of squares values for the specified key(s). 
- Examples
 >>> from h2o.estimators import H2OKMeansEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
 >>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
 >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
 >>> gs.train(x=list(range(4)), training_frame=iris)
 >>> gs.totss(train=True)
 - 
withinss(train=False, valid=False, xval=False)
- Get the within cluster sum of squares for each cluster. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If True, then return the within cluster sum of squares for the training data.
- valid (bool) – If True, then return the within cluster sum of squares for the validation data.
- xval (bool) – If True, then return the within cluster sum of squares for each of the cross-validated splits.
 
- Returns
- the within cluster sum of squares values for the specified key(s). 
- Examples
 >>> from h2o.estimators import H2OKMeansEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv")
 >>> hyper_parameters = {'k': [2,3,4], 'init': "random"}
 >>> gs = H2OGridSearch(H2OKMeansEstimator(), hyper_parameters)
 >>> gs.train(x=list(range(4)), training_frame=iris)
 >>> gs.withinss(train=True)
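The sum-of-squares getters above are linked by the standard K-means identity: the total sum of squares equals the total within cluster sum of squares plus the between cluster sum of squares. The sketch below checks that identity on a tiny hand-made dataset in plain Python. It does not call H2O, and the cluster assignments are made up for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sum_sq(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    return sum((x - c) ** 2 for p in points for x, c in zip(p, center))


# Two hand-assigned clusters of 2-D points (made-up data).
clusters = {
    0: [[1.0, 1.0], [1.0, 3.0]],
    1: [[5.0, 5.0], [7.0, 5.0]],
}
all_points = [p for pts in clusters.values() for p in pts]
grand_mean = centroid(all_points)

# Per-cluster within sum of squares, and its total.
withinss = {k: sum_sq(pts, centroid(pts)) for k, pts in clusters.items()}
tot_withinss = sum(withinss.values())

# Total sum of squares around the grand mean.
totss = sum_sq(all_points, grand_mean)

# Between cluster sum of squares: size-weighted squared distance
# of each cluster centroid from the grand mean.
betweenss = sum(
    len(pts) * sum_sq([centroid(pts)], grand_mean) for pts in clusters.values()
)

print(withinss, tot_withinss, betweenss, totss)
```

A larger `betweenss` relative to `totss` indicates that the centroids explain more of the overall spread in the data.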
 
- 
- 
class h2o.grid.metrics.H2ODimReductionGridSearch
- Bases: object
archetypes()
- Returns
- the archetypes (Y) of the GLRM model. 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLowRankEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
 >>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
 >>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
 ...                    hyper_parameters)
 >>> gs.train(x=iris.names, training_frame=iris)
 >>> gs.archetypes()
 - 
final_step()
- Get the final step size from the GLRM model.
- Returns
- final step size (double). 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLowRankEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
 >>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
 >>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
 ...                    hyper_parameters)
 >>> gs.train(x=iris.names, training_frame=iris)
 >>> gs.final_step()
 - 
num_iterations()
- Get the number of iterations that it took to converge or reach max iterations.
- Returns
- number of iterations (integer). 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLowRankEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
 >>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
 >>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
 ...                    hyper_parameters)
 >>> gs.train(x=iris.names, training_frame=iris)
 >>> gs.num_iterations()
 - 
objective()
- Get the final value of the objective function from the GLRM model.
- Returns
- final objective value (double). 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLowRankEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
 >>> hyper_parameters = {'gamma_x': [0.05, 0.5], 'gamma_y': [0.05,0.5]}
 >>> gs = H2OGridSearch(H2OGeneralizedLowRankEstimator(),
 ...                    hyper_parameters)
 >>> gs.train(x=iris.names, training_frame=iris)
 >>> gs.objective()
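For intuition about what the objective value measures and how gamma_x and gamma_y enter it, the sketch below evaluates a simplified low-rank objective: squared reconstruction error of A by the product X·Y, plus weighted penalties on X and Y. Quadratic loss and squared-L2 penalties are assumptions chosen for illustration; the loss and regularizers actually used depend on the model configuration. `glrm_objective` is a hypothetical helper, not an H2O function.

```python
def glrm_objective(A, X, Y, gamma_x, gamma_y):
    """Quadratic loss plus squared-L2 penalties (an assumed, simplified form)."""
    n, m, k = len(A), len(A[0]), len(Y)
    loss = 0.0
    for i in range(n):
        for j in range(m):
            # Entry (i, j) of the low-rank reconstruction X * Y.
            approx = sum(X[i][r] * Y[r][j] for r in range(k))
            loss += (A[i][j] - approx) ** 2
    reg_x = sum(v ** 2 for row in X for v in row)
    reg_y = sum(v ** 2 for row in Y for v in row)
    return loss + gamma_x * reg_x + gamma_y * reg_y


# A rank-1 matrix that X * Y reproduces exactly, so only the penalties remain.
A = [[1.0, 2.0], [2.0, 4.0]]
X = [[1.0], [2.0]]   # row representations
Y = [[1.0, 2.0]]     # archetypes
value = glrm_objective(A, X, Y, gamma_x=0.5, gamma_y=0.05)
print(value)
```

Under these assumptions, larger gamma values shift the objective toward smaller X and Y at the cost of reconstruction accuracy, which is why the grid above varies them.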
 
- 
- 
class h2o.grid.metrics.H2OMultinomialGridSearch
- Bases: object
auc(train=False, valid=False, xval=False)
- Retrieve the AUC value. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If train is True, then return the AUC values for the training data.
- valid (bool) – If valid is True, then return the AUC values for the validation data.
- xval (bool) – If xval is True, then return the AUC values for the cross validation data.
 
- Returns
- The AUC values for this multinomial model. 
- Examples
 >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
 ...                    hyper_parameters)
 >>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
 >>> gs.auc(train=True)
 - 
aucpr(train=False, valid=False, xval=False)
- Retrieve the PR AUC value. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If train is True, then return the PR AUC values for the training data.
- valid (bool) – If valid is True, then return the PR AUC values for the validation data.
- xval (bool) – If xval is True, then return the PR AUC values for the cross validation data.
 
- Returns
- The PR AUC values for this multinomial model. 
- Examples
 >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
 ...                    hyper_parameters)
 >>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
 >>> gs.aucpr(train=True)
 - 
confusion_matrix(data)
- Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
- Parameters
- data – the dataset (an H2OFrame, as in the example below) for which the confusion matrix will be calculated. 
- Examples
 >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
 ...                    hyper_parameters)
 >>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
 >>> gs.confusion_matrix(iris)
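As a reminder of what a multiclass confusion matrix contains, the sketch below builds one from actual and predicted labels in plain Python. Placing actual classes on rows and predicted classes on columns is an assumption made for this illustration. The function name and the labels are hypothetical, and no H2O call is involved.

```python
def build_confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = ["setosa", "versicolor", "virginica"]
# Made-up labels for six rows.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

matrix = build_confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(label, row)
```

Diagonal entries count correct predictions; every off-diagonal entry is a misclassification of the row class as the column class.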
 - 
hit_ratio_table(train=False, valid=False, xval=False)
- Retrieve the Hit Ratios. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If train is True, then return the hit ratio value for the training data.
- valid (bool) – If valid is True, then return the hit ratio value for the validation data.
- xval (bool) – If xval is True, then return the hit ratio value for the cross validation data.
 
- Returns
- The hit ratio for this multinomial model. 
- Examples
 >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
 ...                    hyper_parameters)
 >>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
 >>> gs.hit_ratio_table(train=True)
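A hit ratio at rank k is the fraction of rows whose true class is among the k classes with the highest predicted probability. The plain-Python sketch below computes those ratios for a handful of made-up predictions. `hit_ratios` is a hypothetical helper that shows the idea, not the way H2O computes its table.

```python
def hit_ratios(actual, class_probs, max_k):
    """Return [hit ratio at k=1, ..., hit ratio at k=max_k]."""
    ratios = []
    for k in range(1, max_k + 1):
        hits = 0
        for truth, probs in zip(actual, class_probs):
            # Classes ordered from most to least probable for this row.
            ranked = sorted(probs, key=probs.get, reverse=True)
            if truth in ranked[:k]:
                hits += 1
        ratios.append(hits / len(actual))
    return ratios


# Made-up true labels and predicted class probabilities.
actual = ["a", "b", "c", "a"]
class_probs = [
    {"a": 0.70, "b": 0.20, "c": 0.10},
    {"a": 0.50, "b": 0.30, "c": 0.20},
    {"a": 0.60, "b": 0.30, "c": 0.10},
    {"a": 0.40, "b": 0.35, "c": 0.25},
]

ratios = hit_ratios(actual, class_probs, max_k=3)
print(ratios)
```

The ratio at k=1 is plain accuracy, and the ratio at k equal to the number of classes is always 1.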
 - 
mean_per_class_error(train=False, valid=False, xval=False)
- Get the mean per class error. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If train is True, then return the mean per class error value for the training data.
- valid (bool) – If valid is True, then return the mean per class error value for the validation data.
- xval (bool) – If xval is True, then return the mean per class error value for the cross validation data.
 
- Returns
- The mean per class error for this multinomial model. 
- Examples
 >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv")
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family = "multinomial"),
 ...                    hyper_parameters)
 >>> gs.train(x=[0,1,2,3], y=4, training_frame=iris)
 >>> gs.mean_per_class_error(train=True)
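Mean per class error averages the error rate of each class, so every class counts equally regardless of how many rows it has. The sketch below derives it from a made-up confusion matrix (rows are actual classes, columns are predicted classes) and contrasts it with the overall error rate. Both helper functions are hypothetical and only illustrate the definition.

```python
def per_class_errors(matrix):
    """Error rate of each actual class: misclassified rows / rows in that class."""
    errors = []
    for i, row in enumerate(matrix):
        total = sum(row)
        if total == 0:
            continue  # a class with no rows contributes no error rate
        errors.append((total - row[i]) / total)
    return errors


def mean_per_class_error_from_matrix(matrix):
    """Unweighted mean of the per-class error rates."""
    errors = per_class_errors(matrix)
    return sum(errors) / len(errors)


# Made-up, imbalanced confusion matrix: 90, 10 and 10 rows per class.
matrix = [
    [90, 0, 0],
    [0, 8, 2],
    [0, 5, 5],
]

mpce = mean_per_class_error_from_matrix(matrix)
n_rows = sum(sum(row) for row in matrix)
n_wrong = n_rows - sum(matrix[i][i] for i in range(len(matrix)))
overall_error = n_wrong / n_rows
print(mpce, overall_error)
```

On imbalanced data like this, the mean per class error is noticeably higher than the overall error because the two small classes are misclassified more often.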
 
- 
- 
class h2o.grid.metrics.H2OOrdinalGridSearch
- Bases: object
confusion_matrix(data)
- Returns a confusion matrix based on H2O’s default prediction threshold for a dataset.
- Parameters
- data – the dataset (an H2OFrame, as in the example below) for which the confusion matrix will be calculated. 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv")
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family="ordinal"), hyper_parameters)
 >>> h2o_df['C11'] = h2o_df['C11'].asfactor()
 >>> gs.train(x=list(range(0,10)), y="C11", training_frame=h2o_df)
 >>> gs.confusion_matrix(h2o_df)
 - 
hit_ratio_table(train=False, valid=False, xval=False)
- Retrieve the Hit Ratios. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If train is True, then return the hit ratio value for the training data.
- valid (bool) – If valid is True, then return the hit ratio value for the validation data.
- xval (bool) – If xval is True, then return the hit ratio value for the cross validation data.
 
- Returns
- The hit ratio for this ordinal model. 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv")
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family="ordinal"), hyper_parameters)
 >>> h2o_df['C11'] = h2o_df['C11'].asfactor()
 >>> gs.train(x=list(range(0,10)), y="C11", training_frame=h2o_df)
 >>> gs.hit_ratio_table(train=True)
 - 
mean_per_class_error(train=False, valid=False, xval=False)
- Get the mean per class error. If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”.
- Parameters
- train (bool) – If train is True, then return the mean per class error value for the training data.
- valid (bool) – If valid is True, then return the mean per class error value for the validation data.
- xval (bool) – If xval is True, then return the mean per class error value for the cross validation data.
 
- Returns
- The mean per class error for this ordinal model. 
- Examples
 >>> from h2o.estimators import H2OGeneralizedLinearEstimator
 >>> from h2o.grid.grid_search import H2OGridSearch
 >>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv")
 >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]}
 >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family="ordinal"), hyper_parameters)
 >>> h2o_df['C11'] = h2o_df['C11'].asfactor()
 >>> gs.train(x=list(range(0,10)), y="C11", training_frame=h2o_df)
 >>> gs.mean_per_class_error(train=True)
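A grid search trains several models, so a typical next step is to compare the metric across them. The sketch below assumes the grid-level call yields a dictionary that maps each model id to a single metric value. That shape is an assumption, and the model ids and error values are invented. `best_model` is a hypothetical helper, not part of H2O.

```python
def best_model(per_model_metric, lower_is_better=True):
    """Return the model id with the best metric value."""
    chooser = min if lower_is_better else max
    return chooser(per_model_metric, key=per_model_metric.get)


# Hypothetical result of a grid-level mean per class error query.
errors = {
    "Grid_GLM_model_1": 0.31,
    "Grid_GLM_model_2": 0.27,
    "Grid_GLM_model_3": 0.29,
    "Grid_GLM_model_4": 0.35,
}

winner = best_model(errors)
print(winner, errors[winner])
```

For metrics where larger is better, such as a hit ratio, pass `lower_is_better=False`.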
 
-