metalearner_params

  • Available in: Stacked Ensembles

  • Hyperparameter: no

Description

Stacked Ensemble allows you to specify the metalearning algorithm to use when training the ensemble. When metalearner_algorithm is set to a non-default value (e.g. “GBM”), Stacked Ensemble runs with the specified algorithm’s default hyperparameter values. The metalearner_params option allows you to pass in a list (in R) or dictionary (in Python) of hyperparameter values to use for that algorithm instead of the defaults.
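
In Python, the difference amounts to one extra argument. Here is a minimal sketch, assuming two base models, my_gbm and my_rf, trained with keep_cross_validation_predictions enabled as in the full example below:

from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

# GBM metalearner with the algorithm's default hyperparameter values
stack_default = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                            metalearner_algorithm="gbm")

# GBM metalearner with custom hyperparameter values via metalearner_params
stack_custom = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                           metalearner_algorithm="gbm",
                                           metalearner_params={"ntrees": 100,
                                                               "max_depth": 3})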

The default parameters for the metalearning algorithms may not be the best choice, so it’s a good idea to experiment with different values via metalearner_params. A future release of H2O will add an option to do a grid search over metalearner parameters using the standard H2O Grid interface, which will make tuning the metalearner easier.

Note: The seed argument in Stacked Ensemble is passed through to the metalearner automatically. If you define seed in metalearner_params, that value will be used instead of the value defined by the seed argument. If the only parameter that you want to customize for the metalearner is seed, it’s simpler to just use the top-level seed argument instead.
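
In Python, that means the two calls below seed the metalearner identically, with the first being the simpler form (a sketch, again assuming the my_gbm and my_rf base models from the example below):

from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

# Top-level seed: passed through to the metalearner automatically (preferred)
stack_a = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                      metalearner_algorithm="gbm",
                                      seed=1)

# seed inside metalearner_params: takes precedence over the top-level seed
stack_b = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                      metalearner_algorithm="gbm",
                                      metalearner_params={"seed": 1})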

Example

library(h2o)
h2o.init()

# Import the higgs_train_5k and higgs_test_5k datasets
train <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")

# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)

# Convert the response column in train and test datasets to a factor
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])

# Set number of folds for base learners
nfolds <- 3


# Train & Cross-validate a GBM model
my_gbm <- h2o.gbm(x = x,
                  y = y,
                  training_frame = train,
                  distribution = "bernoulli",
                  ntrees = 10,
                  nfolds = nfolds,
                  keep_cross_validation_predictions = TRUE,
                  seed = 1)

# Train & Cross-validate an RF model
my_rf <- h2o.randomForest(x = x,
                          y = y,
                          training_frame = train,
                          ntrees = 10,
                          nfolds = nfolds,
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)


# Next we can train a few different ensembles using different metalearners

# Add metalearner parameters for GBM and RF
gbm_params <- list(ntrees = 100, max_depth = 2)
rf_params <- list(ntrees = 500, max_depth = 8)

# Train a stacked ensemble using GBM as the metalearner algorithm along with
# a list of specified GBM parameters
stack_gbm <- h2o.stackedEnsemble(x = x,
                                 y = y,
                                 training_frame = train,
                                 base_models = list(my_gbm, my_rf),
                                 metalearner_algorithm = "gbm",
                                 metalearner_params = gbm_params)
h2o.auc(h2o.performance(stack_gbm, test))
# 0.7563162


# Train a stacked ensemble using DRF as the metalearner algorithm along with
# a list of specified DRF parameters
stack_rf <- h2o.stackedEnsemble(x = x,
                                y = y,
                                training_frame = train,
                                base_models = list(my_gbm, my_rf),
                                metalearner_algorithm = "drf",
                                metalearner_params = rf_params)
h2o.auc(h2o.performance(stack_rf, test))
# 0.7498578

The same example in Python:

import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
h2o.init()

# Import the higgs_train_5k and higgs_test_5k datasets
train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# Convert the response column in train and test datasets to a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Set number of folds for base learners
nfolds = 3


# Train and cross-validate a GBM model
my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli",
                                      ntrees=10,
                                      nfolds=nfolds,
                                      fold_assignment="Modulo",
                                      keep_cross_validation_predictions=True,
                                      seed=1)
my_gbm.train(x=x, y=y, training_frame=train)

# Train and cross-validate an RF model
my_rf = H2ORandomForestEstimator(ntrees=50,
                                 nfolds=nfolds,
                                 fold_assignment="Modulo",
                                 keep_cross_validation_predictions=True,
                                 seed=1)
my_rf.train(x=x, y=y, training_frame=train)


# Next we can train a few different ensembles using different metalearners

# Add custom metalearner params for GBM and RF
gbm_params = {"ntrees": 100, "max_depth": 3}
rf_params = {"ntrees": 500, "max_depth": 8}

# Train a stacked ensemble using GBM as the metalearner algorithm along with
# a dictionary of specified GBM parameters
stack_gbm = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                        metalearner_algorithm="gbm",
                                        metalearner_params=gbm_params)
stack_gbm.train(x=x, y=y, training_frame=train)
stack_gbm.model_performance(test).auc()
# 0.7576578946309993


# Train a stacked ensemble using DRF as the metalearner algorithm along with
# a dictionary of specified DRF parameters
stack_rf = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                       metalearner_algorithm="drf",
                                       metalearner_params=rf_params)
stack_rf.train(x=x, y=y, training_frame=train)
stack_rf.model_performance(test).auc()
# 0.7525306981028109
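
To confirm that the custom values were applied, you can inspect the metalearner model itself. A quick sketch; note that depending on your H2O version, metalearner() returns the metalearner model directly or a dictionary containing its name:

# Fetch the metalearner model; its parameters should reflect the custom
# values passed above (e.g. ntrees=100, max_depth=3 for stack_gbm)
meta = stack_gbm.metalearner()
print(meta)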