Provides a set of functions to launch a grid search and get its results.
h2o.grid( algorithm, grid_id, x, y, training_frame, ..., hyper_params = list(), is_supervised = NULL, do_hyper_params_check = FALSE, search_criteria = NULL, export_checkpoints_dir = NULL, recovery_dir = NULL, parallelism = 1 )
Argument | Description |
---|---|
algorithm | Name of the algorithm to use in the grid search (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, pca). |
grid_id | (Optional) ID for resulting grid search. If it is not specified then it is autogenerated. |
x | (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. |
y | The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, a regression model is trained; otherwise, a classification model is trained. |
training_frame | Id of the training data frame. |
... | Arguments describing parameters to use with the algorithm (e.g., x, y, training_frame). See the specific algorithm function (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters. |
hyper_params | List of lists of hyperparameters (e.g., list(ntrees = c(1, 2, 3))). |
is_supervised | (Optional) If specified, overrides the default heuristic that decides whether the given algorithm name and parameters specify a supervised or unsupervised algorithm. |
do_hyper_params_check | Perform a client-side check of the specified hyperparameters. This can be time-consuming for a large hyperparameter space. |
search_criteria | (Optional) List of control parameters for smarter hyperparameter search. The list can include values for: strategy, max_models, max_runtime_secs, stopping_metric, stopping_tolerance, stopping_rounds and seed. The default strategy, "Cartesian", covers the entire space of hyperparameter combinations, so for a Cartesian grid search you can leave search_criteria unspecified. Specify the "RandomDiscrete" strategy to sample randomly from all combinations of your hyperparameters. A random search can be stopped in three ways: a maximum number of models, a maximum runtime, and metric-based early stopping (e.g., stop if MSE has not improved by 0.0001 over the 5 best models). |
export_checkpoints_dir | Directory to automatically export grid and its models to. |
recovery_dir | When specified, the grid and all necessary data (frames, models) are saved to this directory (use HDFS or another distributed file system). Should the cluster crash during training, the grid can be reloaded from this directory. |
parallelism | Level of parallelism during grid model building. 1 = sequential building (default). Use 0 for adaptive parallelism, where H2O decides the level. Any number > 1 sets the exact number of models built in parallel. |
Launches a grid search with the given algorithm and parameters.
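
The difference between the default Cartesian strategy and a "RandomDiscrete" search can be sketched as follows. This is a minimal illustration, not part of the original page: the hyperparameter values, the grid ID `gbm_random_grid`, and the helper `run_random_gbm_search` are assumptions chosen for the example. The helper only wraps `h2o.grid` and needs the h2o package and a running cluster when it is called.

```r
# Hyperparameter space (illustrative values): a Cartesian search would
# train one model for every combination of these settings.
hyper_params <- list(
  ntrees     = c(10, 50, 100),
  max_depth  = c(3, 5, 7),
  learn_rate = c(0.01, 0.1)
)

# Size of the full Cartesian grid: 3 * 3 * 2 combinations.
n_combinations <- nrow(expand.grid(hyper_params))

# Random search with all three kinds of stopping rule: a cap on the
# number of models, a cap on runtime, and metric-based early stopping.
search_criteria <- list(
  strategy           = "RandomDiscrete",
  max_models         = 5,
  max_runtime_secs   = 600,
  stopping_metric    = "MSE",
  stopping_tolerance = 1e-4,
  stopping_rounds    = 5,
  seed               = 1234
)

# Hypothetical helper (not part of h2o): launches the random search.
# Requires the h2o package and a cluster started with h2o.init().
run_random_gbm_search <- function(training_frame, x, y) {
  h2o::h2o.grid(
    "gbm",
    grid_id         = "gbm_random_grid",
    x               = x,
    y               = y,
    training_frame  = training_frame,
    hyper_params    = hyper_params,
    search_criteria = search_criteria
  )
}
```

With a cluster running, a call such as `run_random_gbm_search(as.h2o(iris), x = 1:4, y = 5)` would build at most 5 of the 18 possible models. Dropping `search_criteria` from the call would fall back to the Cartesian strategy and build all of them.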
# NOT RUN {
library(h2o)
library(jsonlite)
h2o.init()
iris_hf <- as.h2o(iris)
grid <- h2o.grid("gbm", x = c(1:4), y = 5, training_frame = iris_hf,
                 hyper_params = list(ntrees = c(1, 2, 3)))
# Get grid summary
summary(grid)
# Fetch grid models
model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) { h2o.getModel(id)})
# }
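
Once a grid has finished, its models can also be ranked by a metric rather than fetched in build order. The sketch below assumes `h2o.getGrid` with its `sort_by` and `decreasing` arguments; the helper name `best_grid_model` and the default metric `"logloss"` are illustrative choices, not part of the original page.

```r
# Hypothetical helper (not part of h2o): return the top-ranked model of
# a finished grid. Requires the h2o package and a running cluster.
best_grid_model <- function(grid_id, sort_by = "logloss", decreasing = FALSE) {
  # Re-fetch the grid with its models ordered by the chosen metric.
  sorted_grid <- h2o::h2o.getGrid(
    grid_id    = grid_id,
    sort_by    = sort_by,
    decreasing = decreasing
  )
  # The first model ID belongs to the best model under that ordering.
  h2o::h2o.getModel(sorted_grid@model_ids[[1]])
}
```

For the iris grid from the example, `best_grid_model(grid@grid_id)` would return the GBM with the lowest logloss. For metrics where larger is better, pass `decreasing = TRUE`.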