metalearner_params
- Available in: Stacked Ensembles
- Hyperparameter: no
Description
Stacked Ensemble lets you specify the metalearning algorithm used to train the ensemble. When metalearner_algorithm is set to a non-default value (e.g. "GBM"), Stacked Ensemble runs that algorithm with its default hyperparameter values. The metalearner_params option lets you pass in a dictionary (Python) or list (R) of hyperparameters to use for that algorithm instead of the defaults.
The default parameters for the metalearning algorithms may not be the best choice, so it is worth experimenting with different values through metalearner_params. In the next release of H2O, there will be an option to run a grid search over metalearner parameters using the standard H2O Grid interface, which will make tuning the metalearner easier.
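Without a grid interface, one way to experiment is to enumerate candidate settings by hand. The sketch below is only an illustration: the search_space values and the candidate_params helper are assumptions made up for this example and are not part of H2O.

```python
from itertools import product

# Hypothetical search space for a GBM metalearner; the values are illustrative only.
search_space = {
    "ntrees": [50, 100],
    "max_depth": [2, 3],
}


def candidate_params(space):
    """Return one dict per combination of the values in `space`."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


candidates = candidate_params(search_space)
for params in candidates:
    print(params)
```

Each resulting dictionary could then be passed as metalearner_params when training a stacked ensemble, and the resulting ensembles compared on a held-out frame.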
Note: The seed argument in Stacked Ensemble is passed through to the metalearner automatically. If you define seed in metalearner_params, the metalearner uses that value instead of the value defined by the top-level seed argument. If seed is the only parameter that you want to customize for the metalearner, it is simpler to use the top-level argument instead.
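The precedence described in the note can be modelled in a few lines of plain Python. This is a sketch of the documented behavior only, not H2O's implementation; effective_metalearner_params is a hypothetical helper introduced for this illustration.

```python
def effective_metalearner_params(seed=None, metalearner_params=None):
    """Hypothetical helper: model which seed the metalearner ends up with.

    A seed given inside metalearner_params wins; otherwise the top-level
    seed (if any) is passed through.
    """
    params = dict(metalearner_params or {})
    if seed is not None:
        # setdefault only fills in the seed when the params do not define one
        params.setdefault("seed", seed)
    return params


# Top-level seed is passed through when metalearner_params has no seed
print(effective_metalearner_params(seed=1, metalearner_params={"ntrees": 100}))
# → {'ntrees': 100, 'seed': 1}

# A seed inside metalearner_params overrides the top-level seed
print(effective_metalearner_params(seed=1, metalearner_params={"ntrees": 100, "seed": 42}))
# → {'ntrees': 100, 'seed': 42}
```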
Example
R
library(h2o)
h2o.init()
# import the higgs_train_5k train and test datasets
train <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# Convert the response column in train and test datasets to a factor
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])
# Set number of folds for base learners
nfolds <- 3
# Train & Cross-validate a GBM model
my_gbm <- h2o.gbm(x = x,
                  y = y,
                  training_frame = train,
                  distribution = "bernoulli",
                  ntrees = 10,
                  nfolds = nfolds,
                  keep_cross_validation_predictions = TRUE,
                  seed = 1)
# Train & Cross-validate an RF model
my_rf <- h2o.randomForest(x = x,
                          y = y,
                          training_frame = train,
                          ntrees = 10,
                          nfolds = nfolds,
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)
# Next we can train a few different ensembles using different metalearners
# Add metalearner parameters for GBM and RF
gbm_params <- list(ntrees = 100, max_depth = 2)
rf_params <- list(ntrees = 500, max_depth = 8)
# Train a stacked ensemble using GBM as the metalearner algorithm along with
# a list of specified GBM parameters
stack_gbm <- h2o.stackedEnsemble(x = x,
                                 y = y,
                                 training_frame = train,
                                 base_models = list(my_gbm, my_rf),
                                 metalearner_algorithm = "gbm",
                                 metalearner_params = gbm_params)
h2o.auc(h2o.performance(stack_gbm, test))
# 0.7563162
# Train a stacked ensemble using DRF as the metalearner algorithm along with
# a list of specified DRF parameters
stack_rf <- h2o.stackedEnsemble(x = x,
                                y = y,
                                training_frame = train,
                                base_models = list(my_gbm, my_rf),
                                metalearner_algorithm = "drf",
                                metalearner_params = rf_params)
h2o.auc(h2o.performance(stack_rf, test))
# 0.7498578
Python
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
h2o.init()
# import the higgs_train_5k train and test datasets
train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")
# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)
# Convert the response column in train and test datasets to a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()
# Set number of folds for base learners
nfolds = 3
# Train and cross-validate a GBM model
my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli",
                                      ntrees=10,
                                      nfolds=nfolds,
                                      fold_assignment="Modulo",
                                      keep_cross_validation_predictions=True,
                                      seed=1)
my_gbm.train(x=x, y=y, training_frame=train)
# Train and cross-validate an RF model
my_rf = H2ORandomForestEstimator(ntrees=50,
                                 nfolds=nfolds,
                                 fold_assignment="Modulo",
                                 keep_cross_validation_predictions=True,
                                 seed=1)
my_rf.train(x=x, y=y, training_frame=train)
# Next we can train a few different ensembles using different metalearners
# Add custom metalearner params for GBM and RF
gbm_params = {"ntrees": 100, "max_depth": 3}
rf_params = {"ntrees": 500, "max_depth": 8}
# Train a stacked ensemble using GBM as the metalearner algorithm along with
# a dictionary of specified GBM parameters
stack_gbm = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                        metalearner_algorithm="gbm",
                                        metalearner_params=gbm_params)
stack_gbm.train(x=x, y=y, training_frame=train)
stack_gbm.model_performance(test).auc()
# 0.7576578946309993
# Train a stacked ensemble using DRF as the metalearner algorithm along with
# a dictionary of specified DRF parameters
stack_rf = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                       metalearner_algorithm="drf",
                                       metalearner_params=rf_params)
stack_rf.train(x=x, y=y, training_frame=train)
stack_rf.model_performance(test).auc()
# 0.7525306981028109