Transform Frame by Target Encoding Map

This is an API for a new target encoding implemented in Java. It applies a target encoding map to an H2OFrame object. Computing target encodings for high-cardinality categorical columns can improve the performance of supervised learning models.

h2o.target_encode_transform(frame, x, y, target_encode_map, holdout_type,
  fold_column = NULL, blended_avg = TRUE, inflection_point = NULL,
  smoothing = NULL, noise = -1, seed = -1)

Arguments

frame	An H2OFrame object to which the target encoding map is applied.
x	List of categorical column names or indices to which target encoding is applied. An item that is itself a list of multiple columns is not currently supported.
y	The name or column index of the response variable in the frame.
target_encode_map	The object returned by a call to `h2o.target_encode_fit`.
holdout_type	Supported options: 1) "kfold" - encodings for a fold are generated from out-of-fold data. 2) "loo" - leave one out. The current row's response value is subtracted from the pre-calculated per-level frequencies. 3) "none" - nothing is held out. The whole frame is used for training.
fold_column	(Optional) The name or column index of the fold column in the frame.
blended_avg	`Logical`. (Optional) Whether to perform blended averaging. Defaults to TRUE.
inflection_point	Parameter for blending. Used to calculate `lambda`. Determines half of the minimal sample size at which the estimate based on the sample for a particular level of the categorical variable is completely trusted.
smoothing	Parameter for blending. Used to calculate `lambda`. Controls the rate of transition between the particular level's posterior probability and the prior probability. As smoothing approaches zero, the transition becomes a hard threshold between the posterior and the prior probability.
noise	(Optional) The amount of random noise added to the target encoding. This helps prevent overfitting. Defaults to 0.01 * range of y.
seed	(Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1.
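
The blending controlled by `inflection_point` and `smoothing` can be illustrated with a short Python sketch. This is not the H2O implementation: it assumes the commonly used logistic form of `lambda`, and the function names `blend_weight` and `blended_encoding` are hypothetical. The encoded value is taken to be a weighted average of the level's own mean (posterior) and the overall mean (prior).

```python
import math


def blend_weight(n, inflection_point, smoothing):
    """Weight (lambda) given to a level's own mean, for a level seen n times.

    Assumed logistic form: lambda = 1 / (1 + exp((inflection_point - n) / smoothing)).
    """
    z = (inflection_point - n) / smoothing
    if z > 700:  # math.exp would overflow; the weight is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def blended_encoding(level_sum, level_count, prior_mean, inflection_point, smoothing):
    """Blend the per-level mean (posterior) with the global mean (prior)."""
    lam = blend_weight(level_count, inflection_point, smoothing)
    posterior = level_sum / level_count
    return lam * posterior + (1.0 - lam) * prior_mean


# A level seen 10 times with 5 positive responses, global mean 0.2.
# With inflection_point = 10 the weight is exactly 0.5.
print(blended_encoding(5, 10, 0.2, inflection_point=10, smoothing=20))
```

Under this assumed form, a level observed exactly `inflection_point` times receives equal weight on its own mean and on the prior, and levels observed more often lean further toward their own mean.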

Value

Returns an H2OFrame object containing the target encoding per record.
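
To make the per-record encodings concrete, the following Python sketch computes unblended, noise-free encodings for a single categorical column under each `holdout_type`. It is an illustration only, not the H2O code: the function name `encode_per_record` is hypothetical, and falling back to the overall mean when no data remains for a level after holdout is an assumption.

```python
from collections import defaultdict


def encode_per_record(levels, targets, holdout_type="none", folds=None):
    """Return one target-encoded value per record for a single column."""
    prior = sum(targets) / len(targets)          # overall mean of the response
    totals = defaultdict(lambda: [0.0, 0])       # level -> [sum, count]
    by_fold = defaultdict(lambda: [0.0, 0])      # (level, fold) -> [sum, count]
    for i, (lvl, y) in enumerate(zip(levels, targets)):
        totals[lvl][0] += y
        totals[lvl][1] += 1
        if folds is not None:
            by_fold[(lvl, folds[i])][0] += y
            by_fold[(lvl, folds[i])][1] += 1

    encoded = []
    for i, (lvl, y) in enumerate(zip(levels, targets)):
        num, den = totals[lvl]
        if holdout_type == "loo":
            # Leave one out: remove the current row's own response.
            num, den = num - y, den - 1
        elif holdout_type == "kfold":
            if folds is None:
                raise ValueError("kfold requires a fold assignment per record")
            # Use only data from the other folds.
            f_num, f_den = by_fold[(lvl, folds[i])]
            num, den = num - f_num, den - f_den
        elif holdout_type != "none":
            raise ValueError("unknown holdout_type: %r" % holdout_type)
        # Assumed fallback: use the prior when nothing is left for this level.
        encoded.append(num / den if den > 0 else prior)
    return encoded


levels = ["a", "a", "a", "b"]
targets = [1, 0, 1, 1]
print(encode_per_record(levels, targets, "none"))
print(encode_per_record(levels, targets, "loo"))                  # [0.5, 1.0, 0.5, 0.75]
print(encode_per_record(levels, targets, "kfold", [0, 1, 0, 1]))  # [0.0, 1.0, 0.0, 0.75]
```

With "none", every record of a level receives the same value. With "loo" and "kfold", records of the same level can receive different values because each record's own response, or its fold's responses, are excluded.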
