R/stackedensemble.R
Build a stacked ensemble (a.k.a. Super Learner) using the H2O base learning algorithms specified by the user.
h2o.stackedEnsemble(x, y, training_frame, model_id = NULL, validation_frame = NULL, base_models = list(), metalearner_algorithm = c("AUTO", "glm", "gbm", "drf", "deeplearning"), metalearner_nfolds = 0, metalearner_fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"), metalearner_fold_column = NULL, keep_levelone_frame = FALSE, seed = -1, metalearner_params = NULL)
x | (Optional). A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. Training frame is used only to compute ensemble training metrics. |
---|---|
y | The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model. |
training_frame | Id of the training data frame. |
model_id | Destination id for this model; auto-generated if not specified. |
validation_frame | Id of the validation data frame. |
base_models | List of models (or model ids) to ensemble/stack together. Models must have been cross-validated using nfolds > 1, and folds must be identical across models. Defaults to an empty list (`list()`). |
metalearner_algorithm | Type of algorithm to use as the metalearner. Options include 'AUTO' (GLM with non-negative weights; if validation_frame is present, a lambda search is performed), 'glm' (GLM with default parameters), 'gbm' (GBM with default parameters), 'drf' (Random Forest with default parameters), or 'deeplearning' (Deep Learning with default parameters). Must be one of: "AUTO", "glm", "gbm", "drf", "deeplearning". Defaults to AUTO. |
metalearner_nfolds | Number of folds for K-fold cross-validation of the metalearner algorithm (0 to disable or >= 2). Defaults to 0. |
metalearner_fold_assignment | Cross-validation fold assignment scheme for metalearner cross-validation. Defaults to AUTO (which is currently set to Random). The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". |
metalearner_fold_column | Column with cross-validation fold index assignment per observation for cross-validation of the metalearner. |
keep_levelone_frame | Logical. Keep the level-one frame used for metalearner training. Defaults to FALSE. |
seed | Seed for random numbers; passed through to the metalearner algorithm. Defaults to -1 (time-based random number). |
metalearner_params | Parameters for the metalearner algorithm. Defaults to NULL. |
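The `base_models` argument requires that folds be identical across models. One way to meet this is a deterministic fold scheme such as "Modulo". The sketch below is plain R, not H2O code: `modulo_folds` is a hypothetical helper, and it assumes that a modulo scheme assigns each row to the fold given by its zero-based row index modulo the number of folds. It only illustrates why such an assignment does not depend on a seed.

```r
# Hypothetical helper for illustration; not part of the h2o package.
# Assumption: "Modulo" means fold = (zero-based row index) mod nfolds.
modulo_folds <- function(n_rows, nfolds) {
  stopifnot(n_rows >= 1, nfolds >= 2)
  as.integer((seq_len(n_rows) - 1L) %% nfolds)
}

# Two independent calls produce the same assignment, so two base models
# trained with this scheme on the same rows would share their folds.
folds_a <- modulo_folds(10, 3)
folds_b <- modulo_folds(10, 3)
print(identical(folds_a, folds_b))

# Fold sizes differ by at most one row.
print(table(folds_a))
```

A random assignment, by contrast, yields matching folds only if every base model uses the same seed; an explicit fold column is another way to pin the folds.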
# NOT RUN {
# See example R code here:
# http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html
# }
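For quick orientation, here is a minimal sketch of a typical call, not the official example from the linked page. It assumes a local H2O cluster can be started and uses the `prostate.csv` demo file shipped with the h2o package; the dataset, the choice of GBM and Random Forest as base learners, and all parameter values are illustrative assumptions. The base models are trained with the same `nfolds` and a deterministic fold assignment, and with `keep_cross_validation_predictions = TRUE`, which the ensemble needs in order to build its level-one data.

```r
library(h2o)
h2o.init()  # starts or connects to a local H2O cluster

# Assumption: prostate.csv, a small demo dataset shipped with the h2o package.
prostate <- h2o.importFile(system.file("extdata", "prostate.csv", package = "h2o"))
prostate$CAPSULE <- as.factor(prostate$CAPSULE)  # factor response => classification

splits <- h2o.splitFrame(prostate, ratios = 0.8, seed = 1)
train <- splits[[1]]
test  <- splits[[2]]

y <- "CAPSULE"
x <- setdiff(names(train), c(y, "ID"))
nfolds <- 5

# Base models: same nfolds, same deterministic fold assignment,
# and cross-validation predictions kept for the metalearner.
gbm <- h2o.gbm(x = x, y = y, training_frame = train,
               nfolds = nfolds, fold_assignment = "Modulo",
               keep_cross_validation_predictions = TRUE, seed = 1)
rf <- h2o.randomForest(x = x, y = y, training_frame = train,
                       nfolds = nfolds, fold_assignment = "Modulo",
                       keep_cross_validation_predictions = TRUE, seed = 1)

# Stack the base models; "AUTO" selects the default GLM metalearner.
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train,
                                base_models = list(gbm, rf),
                                metalearner_algorithm = "AUTO", seed = 1)

# Evaluate on held-out data rather than on the training frame.
perf <- h2o.performance(ensemble, newdata = test)
print(h2o.auc(perf))
```

Comparing `h2o.auc(perf)` with the same metric for each base model on `test` shows whether stacking helped on this split.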