Applies a target encoding map to an H2OFrame object. Computing target encodings for high-cardinality categorical columns can improve the performance of supervised learning models. A Target Encoding tutorial is available here: https://github.com/h2oai/h2o-tutorials/blob/master/best-practices/categorical-predictors/target_encoding.md.
h2o.target_encode_apply( data, x, y, target_encode_map, holdout_type, fold_column = NULL, blended_avg = TRUE, noise_level = NULL, seed = -1 )
Argument | Description |
---|---|
data | An H2OFrame object to which the target encoding map is applied. |
x | A list containing the names or indices of the variables to encode. A target encoding column will be created for each element in the list. Items in the list can be multiple columns. For example, if `x = list(c("A"), c("B", "C"))`, then the resulting frame will have a target encoding column for A and a target encoding column for B & C (in this case, we group by two columns). |
y | The name or column index of the response variable in the data. The response variable can be either numeric or binary. |
target_encode_map | A list of H2OFrame objects, as returned by `h2o.target_encode_create`. |
holdout_type | The holdout type used. Must be one of: "LeaveOneOut", "KFold", "None". |
fold_column | (Optional) The name or column index of the fold column in the data. Defaults to NULL (no `fold_column`). Only required if `holdout_type` = "KFold". |
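blended_avg | (Optional) Logical. Whether to use a blended average when computing the target encoding. Defaults to TRUE. |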
noise_level | (Optional) The amount of random noise added to the target encoding. This helps prevent overfitting. Defaults to 0.01 * range of y. |
seed | (Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1. |
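To make the `holdout_type` and `noise_level` arguments more concrete, here is a minimal base R sketch of the general idea behind "None" and "LeaveOneOut" encodings. It illustrates the technique only and is not H2O's internal implementation: the toy data frame, the variable names, and the choice to draw noise uniformly from `[-noise_level, noise_level]` are all assumptions made for this example.

```r
# Toy data (hypothetical): one categorical column and a numeric response.
df <- data.frame(
  job = c("admin", "admin", "admin", "tech", "tech"),
  age = c(30, 40, 50, 20, 30)
)

# Per-level sum and count of the response; this plays the role of an encoding map.
level_sum   <- tapply(df$age, df$job, sum)
level_count <- tapply(df$age, df$job, length)
key <- as.character(df$job)

# holdout_type = "None": every row receives its level's plain mean.
enc_none <- as.numeric(level_sum[key] / level_count[key])

# holdout_type = "LeaveOneOut": drop the row's own response before averaging.
enc_loo <- as.numeric((level_sum[key] - df$age) / (level_count[key] - 1))

# Optional noise (assumed here to be uniform on [-noise_level, noise_level]),
# using the documented default scale of 0.01 * range of y.
set.seed(1234)
noise_level   <- 0.01 * diff(range(df$age))
enc_loo_noisy <- enc_loo + runif(nrow(df), -noise_level, noise_level)
```

Leaving out the row's own response is what keeps the training encoding from leaking the target, which is why the example for test data uses "None" instead.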
Returns an H2OFrame object containing the target encoding per record.
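As a rough picture of what "one target encoding column per element of `x`" means, the base R sketch below builds one column grouped by a single variable and one grouped by a pair of variables. The toy data and the output column names (`te_job`, `te_job_marital`) are hypothetical and are not the names H2O assigns; no holdout, blending, or noise is applied here.

```r
# Toy data (hypothetical) with two categorical columns and a numeric response.
df <- data.frame(
  job     = c("admin", "admin", "admin", "tech", "tech"),
  marital = c("single", "married", "married", "single", "single"),
  age     = c(30, 40, 50, 20, 30)
)

# Analogue of x = list(c("job")): mean response grouped by job alone.
te_job <- ave(df$age, df$job, FUN = mean)

# Analogue of x = list(c("job", "marital")): mean response grouped by
# each (job, marital) combination.
te_job_marital <- ave(df$age, df$job, df$marital, FUN = mean)

# One encoded value per record, alongside the original columns.
encoded <- cbind(df, te_job = te_job, te_job_marital = te_job_marital)
```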
See also: `h2o.target_encode_create`, for creating the target encoding map.
# NOT RUN {
library(h2o)
h2o.init()

# Get Target Encoding Frame on bank-additional-full data with numeric `y`
data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv")
splits <- h2o.splitFrame(data, seed = 1234)
train <- splits[[1]]
test <- splits[[2]]
mapping <- h2o.target_encode_create(data = train,
                                    x = list(c("job"), c("job", "marital")),
                                    y = "age")

# Apply mapping to the training dataset
train_encode <- h2o.target_encode_apply(data = train,
                                        x = list(c("job"), c("job", "marital")),
                                        y = "age", mapping,
                                        holdout_type = "LeaveOneOut")

# Apply mapping to a test dataset
test_encode <- h2o.target_encode_apply(data = test,
                                       x = list(c("job"), c("job", "marital")),
                                       y = "age", target_encode_map = mapping,
                                       holdout_type = "None")
# }
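The `holdout_type = "KFold"` option, which relies on `fold_column`, can be pictured with a small base R sketch: each row is encoded using only rows of the same level that fall in other folds. This is an illustration of the general out-of-fold idea under assumed toy data, not H2O's implementation; the data frame, the `fold` assignments, and the variable names are hypothetical, and levels with no out-of-fold rows are not handled.

```r
# Toy data (hypothetical) with an explicit fold assignment per row.
df <- data.frame(
  job  = c("admin", "admin", "admin", "admin", "tech", "tech"),
  age  = c(30, 40, 50, 60, 20, 30),
  fold = c(1, 1, 2, 2, 1, 2)
)

# For each row, average the response over rows that share its level
# but belong to a different fold.
enc_kfold <- vapply(seq_len(nrow(df)), function(i) {
  out_of_fold <- df$job == df$job[i] & df$fold != df$fold[i]
  mean(df$age[out_of_fold])
}, numeric(1))
```

Compared with leave-one-out, this holds out a whole fold at a time, so the encoding for a row never sees any response from the fold that row belongs to.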