stopping_metric
- Available in: GBM, DRF, Deep Learning, AutoML, XGBoost, Isolation Forest
- Hyperparameter: yes
Description
This option specifies the metric to consider when early stopping is enabled (i.e., when stopping_rounds > 0). For example, given the following options:
stopping_rounds=3
stopping_metric=misclassification
stopping_tolerance=1e-3
then the model will stop training after three consecutive scoring events in which the model's misclassification value does not improve by 1e-3. These stopping options are used to increase performance by restricting the number of models that get built.
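The stopping rule described above can be sketched in plain Python. This is a simplified illustration, not H2O's implementation: the function name stopping_event and its more_is_better flag are hypothetical, the tolerance is assumed to be relative to the best score seen so far, and H2O itself may smooth the scores (for example with a moving average) before comparing them.

```python
from typing import Optional, Sequence


def stopping_event(
    scores: Sequence[float],
    stopping_rounds: int = 3,
    stopping_tolerance: float = 1e-3,
    more_is_better: bool = False,
) -> Optional[int]:
    """Return the index of the scoring event that triggers a stop, or None.

    Hypothetical helper: stops after `stopping_rounds` consecutive scoring
    events that fail to beat the best score by the (assumed relative) tolerance.
    """
    if stopping_rounds <= 0 or not scores:
        return None  # early stopping is disabled when stopping_rounds is 0

    best = scores[0]
    stalled = 0
    for i, score in enumerate(scores[1:], start=1):
        # Positive gain always means "better", whatever the metric direction.
        gain = (score - best) if more_is_better else (best - score)
        if gain > stopping_tolerance * abs(best):
            best = score
            stalled = 0
        else:
            stalled += 1
            if stalled >= stopping_rounds:
                return i
    return None


# Misclassification error per scoring event (lower is better).
errors = [0.30, 0.25, 0.2499, 0.2500, 0.2501, 0.20]
print(stopping_event(errors, stopping_rounds=3, stopping_tolerance=1e-3))  # -> 4
```

With stopping_rounds=3, training in this sketch stops at the fifth scoring event (index 4), so the later score of 0.20 is never reached. That is the trade-off early stopping makes in exchange for building fewer models.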
Available options for stopping_metric include the following:
- AUTO: This defaults to logloss for classification, deviance (mean residual deviance) for regression, and anomaly_score for Isolation Forest.
- anomaly_score (for Isolation Forest only)
- deviance
- logloss
- MSE
- RMSE
- MAE
- RMSLE
- AUC (area under the ROC curve)
- AUCPR (area under the Precision-Recall curve)
- lift_top_group
- misclassification
- mean_per_class_error
- custom (for custom metric functions where “less is better”. It is expected that the lower bound is 0.) Note that this is currently only supported in the Python client for GBM and DRF. More information available in Python example below and here.
- custom_increasing (for custom metric functions where “more is better”.) Note that this is currently only supported in the Python client for GBM and DRF. More information available in Python example below and here.
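As a rough illustration of how the AUTO defaults and the metric direction fit together, the lookup below resolves AUTO to the defaults listed above and flags the metrics where a larger value is better. The function names and the problem-type labels ("classification", "regression", "isolation_forest") are hypothetical and exist only for this sketch; they are not part of the H2O API. anomaly_score is deliberately left out of the direction set because its direction is not stated here.

```python
# Defaults that AUTO resolves to, taken from the option list above.
AUTO_DEFAULTS = {
    "classification": "logloss",
    "regression": "deviance",
    "isolation_forest": "anomaly_score",
}

# Metrics from the list above where a larger value means a better model.
MORE_IS_BETTER = {"AUC", "AUCPR", "lift_top_group", "custom_increasing"}


def resolve_stopping_metric(stopping_metric: str, problem_type: str) -> str:
    """Return the concrete metric name, expanding AUTO (hypothetical helper)."""
    if stopping_metric != "AUTO":
        return stopping_metric
    if problem_type not in AUTO_DEFAULTS:
        raise ValueError(f"unknown problem type: {problem_type!r}")
    return AUTO_DEFAULTS[problem_type]


def more_is_better(metric: str) -> bool:
    """True when improvement means the metric increases (hypothetical helper)."""
    return metric in MORE_IS_BETTER


metric = resolve_stopping_metric("AUTO", "classification")
print(metric, more_is_better(metric))  # -> logloss False
```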
Note: stopping_rounds must be enabled for stopping_metric or stopping_tolerance to work.
Example
- r
- python
library(h2o)
h2o.init()
# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed ("YES") or not ("NO")
# original data can be found at http://www.transtats.bts.gov/
airlines <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
# convert columns to factors
airlines["Year"] <- as.factor(airlines["Year"])
airlines["Month"] <- as.factor(airlines["Month"])
airlines["DayOfWeek"] <- as.factor(airlines["DayOfWeek"])
airlines["Cancelled"] <- as.factor(airlines["Cancelled"])
airlines["FlightNum"] <- as.factor(airlines["FlightNum"])
# set the predictor names and the response column name
predictors <- c("Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum")
response <- "IsDepDelayed"
# split into train and validation
airlines.splits <- h2o.splitFrame(data = airlines, ratios = .8, seed = 1234)
train <- airlines.splits[[1]]
valid <- airlines.splits[[2]]
# try using the `stopping_metric` parameter:
# since this is a classification problem we will look at the AUC
# you could also choose logloss, or misclassification, among other options
# train your model, where you specify the stopping_metric, stopping_rounds,
# and stopping_tolerance
airlines.gbm <- h2o.gbm(x = predictors, y = response, training_frame = train, validation_frame = valid,
                        stopping_metric = "AUC", stopping_rounds = 3,
                        stopping_tolerance = 1e-2, seed = 1234)
# print the auc for the validation data
print(h2o.auc(airlines.gbm, valid = TRUE))