max_hit_ratio_k
¶
Available in: GBM, DRF, Deep Learning, Naïve-Bayes
Hyperparameter: no
Description¶
Hit ratios can be used to evaluate the performance of a model. The hit ratio is the percentage of instances where the model correctly predicts the actual class of an observation. The max_hit_ratio_k
option specifies the maximum number of predictions to consider for the hit ratio computation.
Note that this option is available for multiclass problems only and is set to 0 (disabled) by default.
Example¶
library(h2o)
h2o.init()
# import the covtype dataset:
# this dataset is used to classify the correct forest cover type
# original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Covertype
covtype <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data")
# convert response column to a factor
covtype[, 55] <- as.factor(covtype[, 55])
# set the predictor names and the response column name
predictors <- colnames(covtype[1:54])
response <- 'C55'
# split into train and validation sets
covtype_splits <- h2o.splitFrame(data = covtype, ratios = 0.8, seed = 1234)
train <- covtype_splits[[1]]
valid <- covtype_splits[[2]]
# try using the max_hit_ratio_k parameter:
# max_hit_ratio_k does not affect the actual model fit, and is for information
# and inner-H2O calculations
cov_gbm <- h2o.gbm(x = predictors, y = response, training_frame = train,
validation_frame = valid, max_hit_ratio_k = 3, seed = 1234)
# print out model results to see the max_hite_ratio_k table
cov_gbm
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init(strict_version_check=False)
h2o.cluster().show_status()
# import the covtype dataset:
# this dataset is used to classify the correct forest cover type
# original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Covertype
covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data")
# convert response column to a factor
covtype[54] = covtype[54].asfactor()
# set the predictor names and the response column name
predictors = covtype.columns[0:54]
response = 'C55'
# split into train and validation sets
train, valid = covtype.split_frame(ratios = [.8], seed = 1234)
# try using the max_hit_ratio_k parameter:
# max_hit_ratio_k does not affect the actual model fit, and is for information
# and inner-H2O calculations
cov_gbm = H2OGradientBoostingEstimator(max_hit_ratio_k = 3, seed = 1234)
cov_gbm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# print out model results to see the max_hite_ratio_k table
cov_gbm.show()