Applies a target encoding map to an H2OFrame object. Computing target encoding for high-cardinality categorical columns can improve the performance of supervised learning models.

h2o.target_encode_transform(frame, x, y, target_encode_map, holdout_type,
  fold_column = NULL, blended_avg = TRUE, inflection_point = NULL,
  smoothing = NULL, noise = -1, seed = -1)

Arguments

frame

An H2OFrame object to which the target encoding map is applied.

x

List of categorical column names or indices to apply target encoding to.

y

The name or column index of the response variable in the frame.

target_encode_map

The object returned by a call to h2o.target_encode_fit.

holdout_type

Supported options:

1) "kfold" - encodings for a fold are generated based on out-of-fold data.
2) "loo" - leave one out. The current row's response value is subtracted from the pre-calculated per-level frequencies.
3) "none" - nothing is held out. The whole frame is used for training.
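The three holdout strategies can be illustrated in base R without an H2O cluster. This is a minimal sketch of the general idea, not H2O's implementation: the toy data frame, the column names, and the handling of levels with no usable rows (left as NaN or NA here) are all assumptions made for illustration.

```r
# Toy data: one categorical column, a binary response, and a hypothetical fold column.
df <- data.frame(
  level = c("a", "a", "a", "b", "b", "c"),
  y     = c(1, 0, 1, 0, 0, 1),
  fold  = c(1, 2, 1, 2, 1, 2)
)

# "none": every row receives the mean of y over all rows sharing its level.
enc_none <- ave(df$y, df$level, FUN = mean)

# "loo": remove the current row's response from its level's sum and count.
level_sum <- ave(df$y, df$level, FUN = sum)
level_n   <- ave(df$y, df$level, FUN = length)
enc_loo   <- (level_sum - df$y) / (level_n - 1)  # NaN when a level has a single row

# "kfold": encode each row from rows of the same level that sit in other folds.
enc_kfold <- vapply(seq_len(nrow(df)), function(i) {
  out_of_fold <- df$fold != df$fold[i] & df$level == df$level[i]
  if (any(out_of_fold)) mean(df$y[out_of_fold]) else NA_real_
}, numeric(1))

print(data.frame(df, enc_none, enc_loo, enc_kfold))
```

With "none", a row's own response leaks into its encoding, which is why "kfold" or "loo" is usually preferred when transforming the training frame.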

fold_column

(Optional) The name or column index of the fold column in the frame.

blended_avg

Logical. (Optional) Whether to use a blended average. Defaults to TRUE.

inflection_point

Parameter for blending. Used to calculate `lambda`. Determines half of the minimal sample

smoothing

Parameter for blending. Used to calculate `lambda`. Controls the rate of transition between
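As a rough illustration of how the two blending parameters might interact, the sketch below assumes the commonly used sigmoid form `lambda = 1 / (1 + exp((inflection_point - n) / smoothing))`, where `n` is the number of rows in a level, and blends the level mean with the overall (prior) mean. The exact formula H2O uses is not stated on this page, and `blend` is a hypothetical helper, not part of the h2o package.

```r
# Hypothetical helper (not an h2o function): blend a level's mean with the prior mean.
# Assumes a sigmoid weight; lambda is 0.5 when level_n equals inflection_point.
blend <- function(level_mean, level_n, prior_mean, inflection_point, smoothing) {
  lambda <- 1 / (1 + exp((inflection_point - level_n) / smoothing))
  lambda * level_mean + (1 - lambda) * prior_mean
}

# The same level mean is trusted more as the level's row count grows.
blended <- blend(level_mean = 0.9, level_n = c(1, 10, 100),
                 prior_mean = 0.3, inflection_point = 10, smoothing = 20)
print(blended)
```

Under this assumed formula, small levels are pulled toward the prior mean and large levels keep a value close to their own mean; a smaller `smoothing` makes that transition sharper.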

noise

(Optional) The amount of random noise added to the target encoding. This helps prevent overfitting. Defaults to 0.01 * range of y.

seed

(Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1.
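The effect of `noise` and `seed` can be sketched in base R. This assumes the noise is drawn uniformly from `[-noise, noise]` and added to each encoded value; the page only says the draws come from a uniform distribution, so the interval is an assumption, and `add_noise` is a hypothetical helper rather than an h2o function.

```r
# Hypothetical helper (not an h2o function): add seeded uniform noise to encodings.
add_noise <- function(encoding, noise, seed) {
  set.seed(seed)  # fixing the seed makes the draws reproducible
  encoding + runif(length(encoding), min = -noise, max = noise)
}

enc     <- c(0.25, 0.25, 0.80, 0.80)
noisy_a <- add_noise(enc, noise = 0.01, seed = 42)
noisy_b <- add_noise(enc, noise = 0.01, seed = 42)

print(noisy_a)
print(identical(noisy_a, noisy_b))
```

Rows that share a level no longer share an identical encoded value, which makes it harder for a downstream model to memorize the encoding.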

Value

Returns an H2OFrame object containing the target encoding per record.

See also

h2o.target_encode_fit for creating the target encoding map