Impute columns of a data frame in place.
This operation can impute an entire Frame or a single Vec within the Frame. By default,
numeric columns are imputed with the mean and categorical columns with the mode.
String, date, and UUID columns are never imputed.
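As a rough illustration of the default behaviour, the sketch below imputes a numeric column with its mean and a categorical column with its mode, and leaves string, date, and UUID columns untouched. It is a minimal sketch on plain Java arrays, not the Frame/Vec implementation. The class name, the local `ColType` enum, and the NA encodings (NaN for numbers, `-1` for categorical level codes) are all hypothetical.

```java
final class DefaultImputeSketch {
    // Local stand-in for column types; these are NOT H2O's Vec.T_* constants.
    enum ColType { NUM, CAT, STR, TIME, UUID }

    // Assumed NA encoding for categorical level codes (numeric NAs are NaN).
    static final int CAT_NA = -1;

    /** Only numeric and categorical columns are imputed by default. */
    static boolean isImputable(ColType type) {
        return type == ColType.NUM || type == ColType.CAT;
    }

    /** Replaces every NaN with the mean of the non-NaN values, in place. */
    static void imputeMean(double[] col) {
        double sum = 0;
        int n = 0;
        for (double v : col) {
            if (!Double.isNaN(v)) { sum += v; n++; }
        }
        if (n == 0) return; // all NA: there is no mean to impute with
        double mean = sum / n;
        for (int i = 0; i < col.length; i++) {
            if (Double.isNaN(col[i])) col[i] = mean;
        }
    }

    /** Replaces every CAT_NA with the most frequent level (lowest code wins ties), in place. */
    static void imputeMode(int[] codes, int nLevels) {
        int[] counts = new int[nLevels];
        for (int c : codes) {
            if (c != CAT_NA) counts[c]++;
        }
        int mode = -1, best = 0;
        for (int level = 0; level < nLevels; level++) {
            if (counts[level] > best) { best = counts[level]; mode = level; }
        }
        if (mode < 0) return; // all NA: there is no mode to impute with
        for (int i = 0; i < codes.length; i++) {
            if (codes[i] == CAT_NA) codes[i] = mode;
        }
    }
}
```

For example, `imputeMean` turns `{1.0, NaN, 3.0}` into `{1.0, 2.0, 3.0}`. A whole-Frame pass would loop over the columns and call the matching method only where `isImputable` is true.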
When a single Vec is imputed, the imputed value can instead be computed per group, by
grouping on other columns in the Frame. If groupByCols is specified but no column to
impute is supplied, an IllegalArgumentException is raised. An exception is also raised
if the column to impute appears in groupByCols.
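The two argument rules above can be sketched as a small validation step. This is a hypothetical helper, not the real entry point. It assumes `col == -1` means "no specific column" (the whole-Frame case noted at the end) and, because the text does not name the exception type for the second rule, it also throws `IllegalArgumentException` there.

```java
final class ImputeArgsSketch {
    /**
     * Hypothetical check of the group-by rules.
     * col == -1 is taken to mean "impute the whole Frame".
     */
    static void validate(int col, int[] groupByCols) {
        boolean grouped = groupByCols != null && groupByCols.length > 0;
        if (!grouped) return; // no group-by: nothing to check
        if (col == -1) {
            // Grouping only makes sense for a single target column.
            throw new IllegalArgumentException(
                "groupByCols requires a specific column to impute");
        }
        for (int g : groupByCols) {
            if (g == col) {
                // Assumption: the exception type for this case is not specified.
                throw new IllegalArgumentException(
                    "column " + col + " cannot be both imputed and grouped on");
            }
        }
    }
}
```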
The supported imputation methods, and the Vec types each one applies to, are:
- mean: Vec.T_NUM
- median: Vec.T_NUM
- mode: Vec.T_CAT
- bfill: Any valid Vec type
- ffill: Any valid Vec type
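The method/type table can be expressed as a compatibility check. In the sketch below, `Method`, `ColType`, and `parse` are hypothetical local names that mirror the list above. They are not the library's own enums, and the set of "valid Vec types" is assumed to be exactly the five listed in `ColType`.

```java
import java.util.Locale;

final class ImputeMethodSketch {
    // Local stand-ins for Vec types; not H2O's Vec.T_* constants.
    enum ColType { NUM, CAT, STR, TIME, UUID }

    // Hypothetical enum mirroring the method list above.
    enum Method {
        MEAN, MEDIAN, MODE, BFILL, FFILL;

        /** True if this method may be applied to a column of the given type. */
        boolean accepts(ColType type) {
            switch (this) {
                case MEAN:
                case MEDIAN:
                    return type == ColType.NUM;   // numeric only
                case MODE:
                    return type == ColType.CAT;   // categorical only
                default:
                    return true;                  // bfill/ffill: any valid type
            }
        }
    }

    /** Maps a user-facing name such as "median" to the enum constant. */
    static Method parse(String name) {
        return Method.valueOf(name.toUpperCase(Locale.ROOT));
    }
}
```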
All imputation methods operate in place. The first three methods (mean, median, mode)
replace each NA with the corresponding column statistic. The bfill and ffill methods
attempt to fill each NA with the value of an adjacent cell, either the one before it or
the one after it:
Vec = [ bfill_value, NA, ffill_value]
             |       ^^       |
             ->      ||      <-
                   impute
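The two fill directions in the diagram can be sketched as single passes over a numeric column. The method names below are hypothetical and direction-neutral on purpose: which one corresponds to bfill and which to ffill follows the diagram's labels. The sketch also assumes that a run of consecutive NAs is filled by propagating the nearest non-NA value in that direction. An NA with no non-NA value on that side stays NA.

```java
final class AdjacentFillSketch {
    /** Fills each NaN with the nearest non-NaN value before it (left-to-right scan). */
    static void fillFromPrevious(double[] col) {
        double last = Double.NaN;
        for (int i = 0; i < col.length; i++) {
            if (Double.isNaN(col[i])) col[i] = last; // stays NaN if nothing seen yet
            else last = col[i];
        }
    }

    /** Fills each NaN with the nearest non-NaN value after it (right-to-left scan). */
    static void fillFromNext(double[] col) {
        double next = Double.NaN;
        for (int i = col.length - 1; i >= 0; i--) {
            if (Double.isNaN(col[i])) col[i] = next; // stays NaN if nothing seen yet
            else next = col[i];
        }
    }
}
```

Both passes are O(n) and in place. If only the immediately adjacent cell should be used, without propagating across runs, the loops would compare against the original neighbour instead of the running value.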
If the impute method is median, combineMethod may be any variant of the
QuantileModel.CombineMethod enum: INTERPOLATE, AVERAGE, LOW, or HIGH. It specifies how
the two middle values are combined when the sample size is even. This parameter is
ignored for all other methods.
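To show what combineMethod changes, the sketch below computes a median over the non-NA values under each variant and then imputes with it. The `Combine` enum is a local mirror of the variant names, not `QuantileModel.CombineMethod` itself, and the interpretations are assumptions. LOW and HIGH take the lower and upper middle value, AVERAGE takes their mean, and INTERPOLATE interpolates linearly at fractional rank 0.5·(n−1).

```java
import java.util.Arrays;

final class MedianSketch {
    // Local mirror of the variant names; not H2O's QuantileModel.CombineMethod.
    enum Combine { INTERPOLATE, AVERAGE, LOW, HIGH }

    /** Median of the non-NaN values; NaN if there are none. */
    static double median(double[] col, Combine how) {
        double[] v = Arrays.stream(col).filter(x -> !Double.isNaN(x)).sorted().toArray();
        int n = v.length;
        if (n == 0) return Double.NaN;
        double pos = 0.5 * (n - 1);                 // fractional rank of the median
        double lo = v[(int) Math.floor(pos)];       // lower middle value
        double hi = v[(int) Math.ceil(pos)];        // upper middle value (== lo if n is odd)
        switch (how) {
            case LOW:     return lo;
            case HIGH:    return hi;
            case AVERAGE: return (lo + hi) / 2;
            default:      return lo + (pos - Math.floor(pos)) * (hi - lo); // INTERPOLATE
        }
    }

    /** Replaces every NaN with the median under the chosen combine rule, in place. */
    static void imputeMedian(double[] col, Combine how) {
        double m = median(col, how);
        for (int i = 0; i < col.length; i++) {
            if (Double.isNaN(col[i])) col[i] = m;
        }
    }
}
```

For `{4, 1, 3, 2}` the middle values are 2 and 3, so LOW gives 2, HIGH gives 3, and AVERAGE and INTERPOLATE both give 2.5. At the median the two coincide, and for odd sample sizes all four variants agree.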
Finally, groupByFrame can be used to impute a column from a pre-computed group-by
result instead of computing the per-group statistics on the fly.
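Group-wise imputation, whether computed on the fly or taken from a pre-computed table, can be sketched as a lookup from group label to fill value. Everything below is hypothetical: a single integer group column stands in for groupByCols (several group-by columns would need a composite key), and a `Map` stands in for groupByFrame. Groups whose target values are all NA get no entry, so their NAs are left as-is.

```java
import java.util.HashMap;
import java.util.Map;

final class GroupedImputeSketch {
    /** Per-group mean of the non-NaN target values, keyed by group label. */
    static Map<Integer, Double> groupMeans(double[] target, int[] groups) {
        Map<Integer, double[]> acc = new HashMap<>(); // label -> {sum, count}
        for (int i = 0; i < target.length; i++) {
            if (Double.isNaN(target[i])) continue;
            double[] a = acc.computeIfAbsent(groups[i], k -> new double[2]);
            a[0] += target[i];
            a[1] += 1;
        }
        Map<Integer, Double> means = new HashMap<>();
        acc.forEach((label, a) -> means.put(label, a[0] / a[1]));
        return means;
    }

    /** Fills NaNs in target from a group -> value table (computed or pre-computed), in place. */
    static void imputeByGroup(double[] target, int[] groups, Map<Integer, Double> table) {
        for (int i = 0; i < target.length; i++) {
            Double v = table.get(groups[i]);
            if (Double.isNaN(target[i]) && v != null) target[i] = v;
        }
    }
}
```

Separating the table from the fill step mirrors the choice described above. `groupMeans` plays the role of the on-the-fly group-by, and passing an existing table to `imputeByGroup` plays the role of groupByFrame.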
Other notes:
If col is -1, the entire Frame is imputed, using the mean or the mode as appropriate for
each column.