Parameters of H2OTargetEncoder
Affected Class
ai.h2o.sparkling.ml.features.H2OTargetEncoder
Parameters
Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
- blendedAvgEnabled
If set, the target average becomes a weighted average of the posterior average for a given categorical level and the prior average of the target. The weight is determined by the size of the given group that the row belongs to.
Scala default value: false; Python default value: False
Also available on the trained model.
- blendedAvgInflectionPoint
A parameter of the blended average. The larger the value, the larger the groups (relative to the overall dataset size) that will still include the global target value as a component of the weighted average.
Default value: 10.0
Also available on the trained model.
- blendedAvgSmoothing
A parameter of the blended average. Controls the rate of transition between a group target value and the global target value.
Default value: 20.0
Also available on the trained model.
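The interplay of blendedAvgInflectionPoint and blendedAvgSmoothing can be illustrated with a minimal sketch. The sigmoid weight used below, lam = 1 / (1 + exp((k - n) / f)) with n the group size, k the inflection point and f the smoothing, is the form commonly used for blended target encoding. It is an assumption made for illustration and is not stated on this page. The function name blended_average is hypothetical.

```python
import math


def blended_average(group_sum, group_count, prior_mean,
                    inflection_point=10.0, smoothing=20.0):
    """Blend a group's posterior mean with the global prior mean.

    Assumed weighting (not confirmed by this page):
        lam = 1 / (1 + exp((inflection_point - group_count) / smoothing))
    """
    posterior_mean = group_sum / group_count
    lam = 1.0 / (1.0 + math.exp((inflection_point - group_count) / smoothing))
    return lam * posterior_mean + (1.0 - lam) * prior_mean


# A group whose size equals the inflection point is weighted half and half.
at_inflection = blended_average(group_sum=8.0, group_count=10, prior_mean=0.5)

# A much larger group is dominated by its own (posterior) mean.
large_group = blended_average(group_sum=800.0, group_count=1000, prior_mean=0.5)
```

Under this assumed formula, raising the inflection point shifts the half-and-half point to larger groups, and raising the smoothing makes the transition from the prior to the posterior more gradual.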
- foldCol
The name of the column determining folds when the KFold holdoutStrategy is applied.
Scala default value: null; Python default value: None
Also available on the trained model.
- holdoutStrategy
A strategy deciding which records will be excluded when calculating the target average on the training dataset. Options:
None - All rows are considered for the calculation
LeaveOneOut - All rows except the row the calculation is made for
KFold - Only out-of-fold data is considered (The option requires foldCol to be set.)
Default value: "None"
Also available on the trained model.
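The three holdout strategies differ only in which rows contribute to a given row's target average. The following is a minimal plain-Python sketch of that idea, not the library's implementation. The function name target_means, the tuple layout of rows, and the fallback to the global mean when no rows remain are assumptions made for illustration.

```python
from collections import defaultdict


def target_means(rows, strategy="None"):
    """Return one encoded value per row for a single categorical column.

    rows: list of (category, fold, target) tuples; fold matters only for KFold.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    fold_sums, fold_counts = defaultdict(float), defaultdict(int)
    for cat, fold, y in rows:
        sums[cat] += y
        counts[cat] += 1
        fold_sums[(cat, fold)] += y
        fold_counts[(cat, fold)] += 1

    prior = sum(y for _, _, y in rows) / len(rows)
    encoded = []
    for cat, fold, y in rows:
        if strategy == "None":
            # Every row of the category contributes.
            s, n = sums[cat], counts[cat]
        elif strategy == "LeaveOneOut":
            # Exclude only the row being encoded.
            s, n = sums[cat] - y, counts[cat] - 1
        elif strategy == "KFold":
            # Exclude every row that shares the current row's fold.
            s = sums[cat] - fold_sums[(cat, fold)]
            n = counts[cat] - fold_counts[(cat, fold)]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        # Assumption: fall back to the global mean when no rows remain.
        encoded.append(s / n if n > 0 else prior)
    return encoded


rows = [("a", 0, 1.0), ("a", 0, 0.0), ("a", 1, 1.0), ("b", 1, 0.0)]
loo = target_means(rows, "LeaveOneOut")
kfold = target_means(rows, "KFold")
```

In this sketch the single "b" row has no other rows to learn from under LeaveOneOut or KFold, so it receives the global mean instead of its own target value.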
- inputCols
Names of columns that will be transformed.
Scala default value: Array(); Python default value: []
Also available on the trained model.
- labelCol
Label column name.
Default value: "label"
Also available on the trained model.
- noise
The amount of random noise added to the output values of the training dataset. Noise addition can be disabled by setting the parameter to 0.0.
Default value: 0.01
Also available on the trained model.
- noiseSeed
A seed of the generator producing the random noise.
Scala default value: -1L; Python default value: -1
Also available on the trained model.
- outputCols
Names of columns representing the result of target encoding. If the parameter is not specified by the user, the output column names will be automatically derived from inputCols by appending the suffix _te.
Scala default value: Array(); Python default value: []
Also available on the trained model.
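The default naming rule for outputCols can be sketched in a few lines of plain Python. The helper name resolve_output_cols and the example column names are hypothetical and not part of the library.

```python
def resolve_output_cols(input_cols, output_cols=None):
    """Return explicit output names, or derive them by appending "_te"."""
    if output_cols:
        # The user supplied names, so they are used as given.
        return list(output_cols)
    # Nothing supplied (None or an empty list): derive names from inputCols.
    return [name + "_te" for name in input_cols]


derived = resolve_output_cols(["RACE", "DPROS"])
explicit = resolve_output_cols(["RACE"], ["race_encoded"])
```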
- problemType
The type of ML problem for which the target encoder will be used:
Auto - If this option is chosen (default), the problem type will be automatically resolved based on the data type of labelCol. If the data type of labelCol is boolean or string, classification is chosen. Otherwise, the target encoder chooses regression.
Classification - A classification problem
Regression - A regression problem
Default value: "Auto"
Also available on the trained model.
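The Auto resolution rule described for problemType can be written as a small decision function. This is a sketch of the documented rule only. The helper name resolve_problem_type and the use of lowercase type names such as "double" to describe the label column are assumptions made for illustration.

```python
def resolve_problem_type(problem_type, label_dtype):
    """Resolve "Auto" from the label column's data type.

    label_dtype: a simple type name such as "string", "boolean" or "double".
    """
    if problem_type not in ("Auto", "Classification", "Regression"):
        raise ValueError(f"unknown problem type: {problem_type}")
    if problem_type != "Auto":
        # An explicit choice is never overridden.
        return problem_type
    if label_dtype in ("boolean", "string"):
        return "Classification"
    return "Regression"


auto_string = resolve_problem_type("Auto", "string")
auto_double = resolve_problem_type("Auto", "double")
forced = resolve_problem_type("Classification", "double")
```

Setting problemType explicitly is the way to treat a numeric label, for example a 0/1 integer column, as a classification target, since Auto would resolve such a column to regression.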