Parameters of H2OTargetEncoder

Affected Class

  • ai.h2o.sparkling.ml.features.H2OTargetEncoder

Parameters

  • Each parameter has also a corresponding getter and setter method. (E.g.: label -> getLabel() , setLabel(...) )

blendedAvgEnabled

If set, the target average becomes a weighted average of the posterior average for a given categorical level and the prior average of the target. The weight is determined by the size of the given group that the row belongs to.

Scala default value: false ; Python default value: False

Also available on the trained model.

blendedAvgInflectionPoint

A parameter of the blended average. The bigger number is set, the groups relatively bigger to the overall data set size will consider the global target value as a component in the weighted average.

Default value: 10.0

Also available on the trained model.

blendedAvgSmoothing

A parameter of blended average. Controls the rate of transition between a group target value and a global target value.

Default value: 20.0

Also available on the trained model.

foldCol

A name of a column determining folds when KFold holdoutStrategy is applied.

Scala default value: null ; Python default value: None

Also available on the trained model.

holdoutStrategy

A strategy deciding what records will be excluded when calculating the target average on the training dataset. Options:

  • None - All rows are considered for the calculation

  • LeaveOneOut - All rows except the row the calculation is made for

  • KFold - Only out-of-fold data is considered (The option requires foldCol to be set.)

Default value: "None"

Also available on the trained model.

inputCols

Names of columns that will be transformed.

Scala default value: Array() ; Python default value: []

Also available on the trained model.

labelCol

Label column name.

Default value: "label"

Also available on the trained model.

noise

Amount of random noise added to output values of training dataset. Noise addition can be disabled by setting the parameter to 0.0

Default value: 0.01

Also available on the trained model.

noiseSeed

A seed of the generator producing the random noise.

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

outputCols

Names of columns representing the result of target encoding. If the parameter is not specified by user, the output column names will be automatically derived from inputCols by appending the suffix _te.

Scala default value: Array() ; Python default value: []

Also available on the trained model.

problemType

A type of ML problem type for which the target encoder will be used for:

  • Auto - If this option is chosen (default), the problem type will be automatically resolved based on the data type of labelCol. If the data type of labelCol is boolean or string, classification is chosen. Otherwise, the target encoder chooses regression.

  • Classification - A classification problem

  • Regression - A regression problem

Default value: "Auto"

Also available on the trained model.