max_active_predictors
- Available in: GLM
- Hyperparameter: no
Description
This option limits the number of active predictors. (Note that the actual number of non-zero predictors in the model will be slightly lower.) It is useful when you want a sparse solution, because it avoids the costly computation of models with too many predictors.
When using the λ1 penalty with lambda search, this option will stop the search before it completes. Models built at the beginning of the lambda search have higher lambda values, consider fewer predictors, and take less time to compute. Models built at the end of the lambda search have lower lambda values, incorporate more predictors, and take longer to compute. Set the nlambdas parameter for a lambda search to specify the number of models attempted across the search.
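The early-stopping behavior described above can be illustrated with a small, self-contained sketch. This is not H2O's implementation: it assumes a plain Gaussian lasso with no intercept, fitted by coordinate descent in pure Python, and the helper names (soft_threshold, lasso_cd, lambda_search) are hypothetical. The argument names nlambdas and max_active_predictors are borrowed from this page for readability, lambda_min_ratio is an illustrative knob for the smallest lambda, and discarding the model that exceeds the cap is a choice made for this sketch only.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; this is what produces exact zeros in an L1 fit."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, beta, sweeps=200, tol=1e-8):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1, warm-started at beta."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        biggest = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta[j] removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return beta

def lambda_search(X, y, nlambdas=20, max_active_predictors=3, lambda_min_ratio=0.01):
    """Fit models from the largest lambda downward; stop once the cap is exceeded."""
    n, p = len(X), len(X[0])
    # Smallest lambda at which every coefficient is zero.
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    beta, path = [0.0] * p, []
    for k in range(nlambdas):
        lam = lam_max * lambda_min_ratio ** (k / (nlambdas - 1))
        beta = lasso_cd(X, y, lam, beta)  # warm start from the previous model
        n_active = sum(1 for b in beta if b != 0.0)
        if n_active > max_active_predictors:
            break  # the remaining, denser models are never computed
        path.append((lam, n_active))
    return path

# Synthetic data (an assumption for illustration): 6 of 10 columns carry signal.
rng = random.Random(0)
n_rows, n_cols = 50, 10
X = [[rng.gauss(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
true_beta = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5] + [0.0] * 4
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.1) for row in X]

capped = lambda_search(X, y, max_active_predictors=3)
full = lambda_search(X, y, max_active_predictors=n_cols)
print(len(capped), "models with the cap,", len(full), "without")
```

In this sketch the capped search returns a prefix of the full path: the early, high-lambda models are identical, and the cap only cuts off the later, denser (and slower) ones.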
Default Value
- If solver is IRLSM, COORDINATE_DESCENT, or COORDINATE_DESCENT_NAIVE, then max_active_predictors defaults to 5000.
- If lambda search is disabled, alpha < 0, solver is AUTO, and you have fewer than 5000 active predictors, then the solver will be IRLSM, and max_active_predictors defaults to 5000.
- If you run lambda search with alpha > 0, and solver is AUTO, then solver will be COORDINATE_DESCENT, and max_active_predictors will default to 5000.
- For all other scenarios, max_active_predictors will default to 100000000.
Example
library(h2o)
h2o.init()
# import the higgs dataset:
# This dataset is used to classify whether or not a signal process produces Higgs bosons.
# original data can be found at https://archive.ics.uci.edu/ml/datasets/HIGGS
higgs <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/testng/higgs_train_5k.csv")
# set the predictor names and the response column name
predictors <- colnames(higgs)[-1]
response <- "response"
# split into train and validation
higgs.splits <- h2o.splitFrame(data = higgs, ratios = .8)
train <- higgs.splits[[1]]
valid <- higgs.splits[[2]]
# try using the `max_active_predictors` parameter:
higgs_glm <- h2o.glm(family = 'binomial', x = predictors, y = response,
                     training_frame = train,
                     validation_frame = valid,
                     max_active_predictors = 200)
# print the AUC for the validation data
print(h2o.auc(higgs_glm, valid = TRUE))