This is an API for a new target encoding implemented in JAVA. Applies a target encoding map to an H2OFrame object. Computing target encoding for high cardinality categorical columns can improve performance of supervised learning models.
h2o.target_encode_transform(frame, x, y, target_encode_map, holdout_type, fold_column = NULL, blended_avg = TRUE, inflection_point = NULL, smoothing = NULL, noise = -1, seed = -1)
frame | An H2OFrame object with which to apply the target encoding map. |
---|---|
x | List of categorical column names or indices that we want apply target encoding to. Case when item in the list is a list of multiple columns itself is not supported for now. |
y | The name or column index of the response variable in the frame. |
target_encode_map | An object that is a result of the calling |
holdout_type | Supported options: 1) "kfold" - encodings for a fold are generated based on out-of-fold data. 2) "loo" - leave one out. Current row's response value is subtracted from the pre-calculated per-level frequencies. 3) "none" - we do not holdout anything. Using whole frame for training |
fold_column | (Optional) The name or column index of the fold column in the frame. |
blended_avg |
|
inflection_point | Parameter for blending. Used to calculate `lambda`. Determines half of the minimal sample size for which we completely trust the estimate based on the sample in the particular level of categorical variable. |
smoothing | Parameter for blending. Used to calculate `lambda`. Controls the rate of transition between the particular level's posterior probability and the prior probability. For smoothing values approaching infinity it becomes a hard threshold between the posterior and the prior probability. |
noise | (Optional) The amount of random noise added to the target encoding. This helps prevent overfitting. Defaults to 0.01 * range of y. |
seed | (Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1. |
Returns an H2OFrame object containing the target encoding per record.
h2o.target_encode_fit
for creating the target encoding map