Provides a set of functions to train a group of models on different segments (subpopulations) of the training set.
h2o.train_segments( algorithm, segment_columns, segment_models_id, parallelism = 1, ... )
algorithm | Name of algorithm to use in training segment models (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, psvm, xgboost, pca, svd, targetencoder, aggregator, word2vec, coxph, isolationforest, stackedensemble, glrm, gam, anovaglm, maxrglm). |
---|---|
segment_columns | A list of columns to segment-by. H2O will group the training (and validation) dataset by the segment-by columns and train a separate model for each segment (group of rows). |
segment_models_id | Identifier for the returned collection of Segment Models. If not specified it will be automatically generated. |
parallelism | Level of parallelism of bulk model building: the maximum number of models each H2O node builds in parallel. Defaults to 1. |
... | Use to pass the training_frame parameter, x, y, and all non-default parameter values to the algorithm. See the specific algorithm (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters. |
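The grouping described for segment_columns can be pictured with a small base R sketch. This is only a conceptual illustration, not H2O's implementation: it assumes a plain `lm` as a stand-in for the chosen algorithm, and `fit_segment` and `summary_df` are hypothetical names introduced here.

```r
# Conceptual sketch in base R; this is NOT how H2O implements segment training.
# Split the rows by the segment column, then fit one model per group.
segments <- split(iris, iris$Species)

# Hypothetical per-segment learner: a linear model standing in for the
# algorithm that would be named in the `algorithm` argument.
fit_segment <- function(df) {
  lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = df)
}

segment_models <- lapply(segments, fit_segment)

# One row per segment, similar in spirit to as.data.frame(models).
summary_df <- data.frame(
  Species = names(segment_models),
  n_rows = vapply(segments, nrow, integer(1), USE.NAMES = FALSE),
  r_squared = vapply(segment_models, function(m) summary(m)$r.squared,
                     numeric(1), USE.NAMES = FALSE)
)
print(summary_df)
```

Each element of `segment_models` is fitted only on the rows of its own segment, which is the behavior the segment_columns argument describes for H2O models.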
Starts segmented-data bulk model training for a given algorithm and parameters.
# NOT RUN {
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)
models <- h2o.train_segments(algorithm = "gbm",
                             segment_columns = "Species",
                             x = c(1:3), y = 4,
                             training_frame = iris_hf,
                             ntrees = 5, max_depth = 4)
as.data.frame(models)
# }
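A variation on the example above that also sets the two optional arguments from the table, segment_models_id and parallelism. This is a sketch that assumes a local H2O cluster can be started; the identifier "iris_species_gbm" is a made-up value chosen for illustration.

```r
library(h2o)
h2o.init()

iris_hf <- as.h2o(iris)

# Train one GBM per Species value.
# "iris_species_gbm" is a hypothetical identifier for the returned collection;
# parallelism = 2 lets each H2O node build up to two segment models at once.
models <- h2o.train_segments(algorithm = "gbm",
                             segment_columns = "Species",
                             segment_models_id = "iris_species_gbm",
                             parallelism = 2,
                             x = c(1:3), y = 4,
                             training_frame = iris_hf,
                             ntrees = 5, max_depth = 4)

# One row per segment.
result <- as.data.frame(models)
print(result)
```

Giving the collection an explicit identifier makes it easier to find later; when the argument is omitted, H2O generates one automatically, as noted in the arguments table.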