``validation_frame`` -------------------- - Available in: GBM, DRF, Deep Learning, GLM, PCA, GLRM, Naïve-Bayes, K-Means - Hyperparameter: no Description ~~~~~~~~~~~ Datasets are commonly split into training, testing, and validation sets. When splitting a dataset, the bulk of the data goes into the training dataset, with small portions held out for the testing and validation dataframes. While the ``training_frame`` is used to build the model, the ``validation_frame`` is used to compare against the adjusted model and evaluate the model's accuracy. Typically, the model will include sampled data which will then be compared against the validation frame's unsampled data. The recommended process is to train on the training set and stop early based on the validation set (and/or cross-validation). When you find a good model, you score it once on the test set to estimate the generalization error. Related Parameters ~~~~~~~~~~~~~~~~~~ - `model_id `_ - `training_frame `__ Example ~~~~~~~ .. example-code:: .. code-block:: r library(h2o) h2o.init() # import the cars dataset: # this dataset is used to classify whether or not a car is economical based on # the car's displacement, power, weight, and acceleration, and the year it was made cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") # convert response column to a factor cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"]) # set the predictor names and the response column name predictors <- c("displacement","power","weight","acceleration","year") response <- "economy_20mpg" # split into train and validation sets cars.split <- h2o.splitFrame(data = cars,ratios = 0.8, seed = 1234) train <- cars.split[[1]] valid <- cars.split[[2]] # try using the `validation_frame` parameter: # train your model, where you specify your 'x' predictors, your 'y' the response column # training_frame and validation_frame cars_gbm <- h2o.gbm(x = predictors, y = response, training_frame = train, validation_frame = valid, seed = 1234) # print the auc for your model print(h2o.auc(cars_gbm, valid = TRUE)) .. code-block:: python import h2o from h2o.estimators.gbm import H2OGradientBoostingEstimator h2o.init() h2o.cluster().show_status() # import the cars dataset: # this dataset is used to classify whether or not a car is economical based on # the car's displacement, power, weight, and acceleration, and the year it was made cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") # convert response column to a factor cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() # set the predictor names and the response column name predictors = ["displacement","power","weight","acceleration","year"] response = "economy_20mpg" # split into train and validation sets train, valid = cars.split_frame(ratios = [.8], seed = 1234) # try using the `validation_frame` parameter: # first initialize your estimator cars_gbm = H2OGradientBoostingEstimator(seed = 1234) # then train your model, where you specify your 'x' predictors, your 'y' the response column # training_frame and validation_frame cars_gbm.train(x = predictors, y = response, training_frame = train, validation_frame = valid) # print the auc for the validation data cars_gbm.auc(valid=True)