Provides a set of functions to launch a grid search and get its results.

    h2o.grid(
      algorithm,
      grid_id,
      x,
      y,
      training_frame,
      ...,
      hyper_params = list(),
      is_supervised = NULL,
      do_hyper_params_check = FALSE,
      search_criteria = NULL,
      export_checkpoints_dir = NULL,
      parallelism = 1
    )

| Argument | Description |
|---|---|
| algorithm | Name of algorithm to use in grid search (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, pca). |
| grid_id | (Optional) ID for resulting grid search. If it is not specified then it is autogenerated. | 
| x | (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. | 
| y | The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, a regression model is trained; otherwise, a classification model is trained. |
| training_frame | Id of the training data frame. | 
| ... | Arguments describing parameters to use with algorithm (e.g., x, y, training_frame). See the specific algorithm (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters. |
| hyper_params | List of lists of hyper parameters (i.e.,  | 
| is_supervised | (Optional) If specified then override the default heuristic which decides if the given algorithm name and parameters specify a supervised or unsupervised algorithm. | 
| do_hyper_params_check | Perform a client-side check of the specified hyperparameters. This can be time-consuming for a large hyperparameter space. |
| search_criteria | (Optional) List of control parameters for smarter hyperparameter search. The list can include values for: strategy, max_models, max_runtime_secs, stopping_metric, stopping_tolerance, stopping_rounds, and seed. The default strategy, "Cartesian", covers the entire space of hyperparameter combinations; to use Cartesian grid search, leave the search_criteria argument unspecified. Specify the "RandomDiscrete" strategy to search randomly over the combinations of your hyperparameters, with three ways to specify when the search stops: maximum number of models, maximum runtime, and metric-based early stopping (e.g., stop if MSE has not improved by 0.0001 over the 5 best models). |
| export_checkpoints_dir | Directory to which the grid is automatically exported in binary form. |
| parallelism | Level of parallelism during grid model building. 1 = sequential building (default). Use 0 for adaptive parallelism, where H2O decides the level. Any number > 1 sets the exact number of models built in parallel. |
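
As a rough illustration of how hyper_params and search_criteria fit together, the sketch below builds both lists and wraps the h2o.grid call in a helper. The helper name run_random_grid, the grid_id value, and all numeric values are hypothetical and chosen only for illustration; ntrees, max_depth, and learn_rate are assumed to be the h2o.gbm parameters being tuned, and the helper assumes a running H2O cluster and a frame whose response is in column 5.

```r
# Illustrative hyperparameter space (values are not recommendations).
hyper_params <- list(
  ntrees     = c(10, 50, 100),
  max_depth  = c(3, 5, 7),
  learn_rate = c(0.01, 0.1)
)

# A Cartesian search builds one model per combination of values.
n_combinations <- prod(lengths(hyper_params))

# Random search with all three stopping mechanisms described above:
# a model budget, a time budget, and metric-based early stopping.
search_criteria <- list(
  strategy           = "RandomDiscrete",
  max_models         = 10,
  max_runtime_secs   = 600,
  stopping_metric    = "MSE",
  stopping_tolerance = 0.0001,
  stopping_rounds    = 5,
  seed               = 1234
)

# Hypothetical helper: needs the h2o package and a running cluster when called.
run_random_grid <- function(training_frame) {
  h2o::h2o.grid(
    "gbm",
    grid_id         = "gbm_random_grid",  # hypothetical ID
    x               = 1:4,
    y               = 5,
    training_frame  = training_frame,
    hyper_params    = hyper_params,
    search_criteria = search_criteria
  )
}
```

With these values a Cartesian search would build 18 models, while the random search stops after at most 10 models, 600 seconds, or when early stopping triggers, whichever comes first.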
Launch grid search with given algorithm and parameters.

    # NOT RUN {
    library(h2o)
    library(jsonlite)
    h2o.init()
    iris_hf <- as.h2o(iris)
    grid <- h2o.grid("gbm", x = c(1:4), y = 5, training_frame = iris_hf,
                     hyper_params = list(ntrees = c(1, 2, 3)))
    # Get grid summary
    summary(grid)
    # Fetch grid models
    model_ids <- grid@model_ids
    models <- lapply(model_ids, function(id) { h2o.getModel(id)})
    # }
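
Once the grid has been built, a common next step is to pick the best model rather than fetch all of them. The sketch below is one possible way to do this, assuming the h2o package's h2o.getGrid function with its sort_by and decreasing arguments, which is not described on this page. The helper name best_model_from_grid and the default metric are hypothetical, and calling the helper requires a running H2O cluster.

```r
# Hypothetical helper: return the top model of a finished grid.
# `metric` must be a metric that H2O reports for the grid's model type;
# "logloss" is an assumed default suited to a classification grid.
best_model_from_grid <- function(grid, metric = "logloss") {
  # Re-fetch the grid with its models ordered by the chosen metric
  # (ascending, since lower logloss is better).
  sorted_grid <- h2o::h2o.getGrid(grid@grid_id, sort_by = metric,
                                  decreasing = FALSE)
  # The first model ID now belongs to the best-scoring model.
  h2o::h2o.getModel(sorted_grid@model_ids[[1]])
}
```

For a metric where larger is better, such as AUC, decreasing = TRUE would be the appropriate setting.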