Applies a target encoding map to an H2OFrame object. This function is part of the API for the new target encoding implemented in Java. Computing target encoding for high-cardinality categorical columns can improve the predictive performance of supervised learning models.

h2o.target_encode_transform(frame, x, y, target_encode_map, holdout_type,
  fold_column = NULL, blended_avg = TRUE, inflection_point = 10,
  smoothing = 20, noise = -1, seed = -1)

Arguments

frame

An H2OFrame object to which the target encoding map is applied.

x

List of categorical column names or indices to which target encoding is applied. Each item must be a single column; an item that is itself a list of multiple columns is not currently supported.

y

The name or column index of the response variable in the frame.

target_encode_map

The target encoding map returned by a call to h2o.target_encode_fit.

holdout_type

Supported options:

1) "kfold" - encodings for a fold are generated based on out-of-fold data.

2) "loo" - leave-one-out. The current row's response value is subtracted from the pre-calculated per-level frequencies.

3) "none" - nothing is held out. The whole frame is used to compute the encodings.
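The three holdout strategies can be illustrated on a toy data frame in base R. This is only a sketch of the idea, not H2O's implementation: the column names (cat, y, fold) and the variable names are made up for the example, and blending, noise and edge cases (such as a level with a single row, where leave-one-out has nothing left to average) are ignored.

```r
# Toy data: one categorical column, a 0/1 response, and a fold assignment.
# All names here are hypothetical and chosen only for illustration.
df <- data.frame(
  cat  = c("a", "a", "a", "b", "b", "b"),
  y    = c(1, 0, 1, 0, 0, 1),
  fold = c(1, 2, 1, 2, 1, 2)
)

# "none": every row receives the mean response of its whole level.
enc_none <- ave(df$y, df$cat, FUN = mean)

# "loo": remove the current row's response from the level's sum and count
# before averaging (yields NaN for a level with only one row).
level_sum <- ave(df$y, df$cat, FUN = sum)
level_n   <- ave(df$y, df$cat, FUN = length)
enc_loo   <- (level_sum - df$y) / (level_n - 1)

# "kfold": encode each row from rows of the same level in the other folds.
enc_kfold <- vapply(seq_len(nrow(df)), function(i) {
  out_of_fold <- df$cat == df$cat[i] & df$fold != df$fold[i]
  mean(df$y[out_of_fold])
}, numeric(1))

print(data.frame(df, enc_none, enc_loo, enc_kfold))
```

With "none", a row's own response leaks into its encoding; "loo" and "kfold" avoid this by excluding the row itself or its entire fold.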

fold_column

(Optional) The name or column index of the fold column in the frame.

blended_avg

(Optional) Logical. Whether to use a blended average, which combines each level's target mean with the prior (overall) mean. Defaults to TRUE.

inflection_point

(Optional) Parameter for blending. Used to calculate `lambda`. Determines half of the minimal sample size for which the estimate based on the sample in a particular level of the categorical variable is completely trusted. Default value is 10.

smoothing

(Optional) Parameter for blending. Used to calculate `lambda`. Controls the rate of transition between the particular level's posterior probability and the prior probability. For smoothing values approaching zero, it becomes a hard threshold between the posterior and the prior probability. Default value is 20.
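How inflection_point and smoothing interact is easiest to see numerically. The sketch below assumes the sigmoid form of `lambda` commonly used for blended target encoding, lambda = 1 / (1 + exp((inflection_point - n) / smoothing)), where n is the number of rows in the level. This page does not state the formula, so treat the form as an assumption; the helper names blend_lambda and blended_encoding are hypothetical.

```r
# Assumed blending weight: a sigmoid in the level's row count n.
blend_lambda <- function(n, inflection_point = 10, smoothing = 20) {
  1 / (1 + exp((inflection_point - n) / smoothing))
}

# Blended encoding: weight the level's own mean by lambda and the
# prior (overall) mean by 1 - lambda.
blended_encoding <- function(level_mean, n, prior_mean,
                             inflection_point = 10, smoothing = 20) {
  lambda <- blend_lambda(n, inflection_point, smoothing)
  lambda * level_mean + (1 - lambda) * prior_mean
}

# Weights for levels of increasing size, using the default parameters.
sizes <- c(1, 10, 50, 200)
print(data.frame(n = sizes, lambda = blend_lambda(sizes)))

# A level with n equal to inflection_point is weighted half and half.
print(blended_encoding(level_mean = 0.8, n = 10, prior_mean = 0.2))
```

Under this assumed form, `lambda` is exactly 0.5 when n equals inflection_point, rare levels are pulled toward the prior mean, and smaller smoothing values make the transition around the inflection point steeper.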

noise

(Optional) The amount of random noise added to the target encoding, which helps prevent overfitting. Defaults to 0.01 * range of y; the value -1 shown in the usage signature appears to stand for this automatic default.

seed

(Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1.
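The roles of noise and seed can be sketched in base R as follows. This is an illustration under stated assumptions rather than H2O's code: the helper add_noise is hypothetical, a negative noise value is assumed to select the documented default of 0.01 * range of y, the noise is assumed to be drawn uniformly from [-noise, noise], and a negative seed is assumed to mean that the random number generator is left unseeded.

```r
# Hypothetical helper: perturb encodings with uniform noise.
add_noise <- function(encoding, y, noise = -1, seed = -1) {
  # Assumption: a negative value selects the default, 0.01 * range of y.
  if (noise < 0) noise <- 0.01 * diff(range(y))
  # Assumption: a non-negative seed makes the draws reproducible.
  if (seed >= 0) set.seed(seed)
  encoding + runif(length(encoding), min = -noise, max = noise)
}

y        <- c(1, 0, 1, 0, 0, 1)
encoding <- c(0.5, 1, 0.5, 0.5, 0.5, 0)

noisy_a <- add_noise(encoding, y, seed = 42)
noisy_b <- add_noise(encoding, y, seed = 42)

print(noisy_a)
print(identical(noisy_a, noisy_b))  # same seed, same draws
```

Fixing the seed is mainly useful when the encoded frame has to be reproduced exactly, for example when comparing models trained on it.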

Value

Returns an H2OFrame object containing the target encoding per record.

See also

h2o.target_encode_fit for creating the target encoding map