Creates a target encoding map based on group-by columns (`x`) and a numeric or binary target column (`y`). Computing target encoding for high-cardinality categorical columns can improve the performance of supervised learning models.
h2o.target_encode_create(data, x, y, fold_column = NULL)
Argument | Description |
---|---|
data | An H2OFrame object with which to create the target encoding map. |
x | A list containing the names or indices of the variables to encode. A target encoding map is created for each element in the list. An element can name several columns, in which case the map is grouped by all of them. For example, if `x = list(c("A"), c("B", "C"))`, there will be one mapping frame for A and one mapping frame for B & C (grouped by both columns). |
y | The name or column index of the response variable in the data. The response variable can be either numeric or binary. |
fold_column | (Optional) The name or column index of the fold column in the data. Defaults to NULL (no `fold_column`). |
Returns a list of H2OFrame objects containing the target encoding mapping for each column in `x`.
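To make the idea of a per-group mapping concrete, here is a minimal base R sketch that does not use h2o. It assumes a toy data frame and computes, for each level of a grouping column, the sum and count of a binary target. The data and the column names `numerator`, `denominator`, and `mean_target` are illustrative assumptions, not a statement of what the returned H2OFrame objects contain.

```r
# Toy data (hypothetical): one categorical column and a binary target.
df <- data.frame(
  job = c("admin", "admin", "technician", "technician", "technician"),
  y   = c(1, 0, 1, 1, 0)
)

# Sum and count of the target within each level of `job`.
# Both calls group on the same column, so the rows come back in the same order.
target_sum   <- aggregate(y ~ job, data = df, FUN = sum)
target_count <- aggregate(y ~ job, data = df, FUN = length)

# Combine into one mapping table (column names are illustrative).
mapping <- data.frame(
  job         = target_sum$job,
  numerator   = target_sum$y,
  denominator = target_count$y
)

# The encoded value for a level is its mean target.
mapping$mean_target <- mapping$numerator / mapping$denominator

print(mapping)
```

Grouping by two columns, as in `c("job", "marital")`, would follow the same pattern with a formula such as `y ~ job + marital`.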
See also `h2o.target_encode_apply` for applying the target encoding mapping to a frame.
# NOT RUN {
library(h2o)
h2o.init()

# Get Target Encoding Map on bank-additional-full data with numeric response
data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data")
mapping_age <- h2o.target_encode_create(data = data,
                                        x = list(c("job"), c("job", "marital")),
                                        y = "age")
head(mapping_age)

# Get Target Encoding Map on bank-additional-full data with binary response
mapping_y <- h2o.target_encode_create(data = data,
                                      x = list(c("job"), c("job", "marital")),
                                      y = "y")
head(mapping_y)
# }