Parameters of H2ODRF¶
Affected Classes¶
ai.h2o.sparkling.ml.algos.H2ODRF
ai.h2o.sparkling.ml.algos.classification.H2ODRFClassifier
ai.h2o.sparkling.ml.algos.regression.H2ODRFRegressor
Parameters¶
Each parameter has also a corresponding getter and setter method. (E.g.:
label
->getLabel()
,setLabel(...)
)
- calibrationDataFrame
Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.
Scala default value:
null
; Python default value:None
- ignoredCols
Names of columns to ignore for training.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- aucType
Set default multinomial AUC type. Possible values are
"AUTO"
,"NONE"
,"MACRO_OVR"
,"WEIGHTED_OVR"
,"MACRO_OVO"
,"WEIGHTED_OVO"
.Default value:
"AUTO"
Also available on the trained model.
- balanceClasses
Balance training data class counts via over/under-sampling (for imbalanced data).
Scala default value:
false
; Python default value:False
Also available on the trained model.
- binomialDoubleTrees
For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- buildTreeOneNode
Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- calibrateModel
Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- calibrationMethod
Calibration method to use. Possible values are
"AUTO"
,"PlattScaling"
,"IsotonicRegression"
.Default value:
"AUTO"
Also available on the trained model.
- categoricalEncoding
Encoding scheme for categorical features. Possible values are
"AUTO"
,"OneHotInternal"
,"OneHotExplicit"
,"Enum"
,"Binary"
,"Eigen"
,"LabelEncoder"
,"SortByResponse"
,"EnumLimited"
.Default value:
"AUTO"
Also available on the trained model.
- checkConstantResponse
Check if response column is constant. If enabled, then an exception is thrown if the response column is a constant value.If disabled, then model will train regardless of the response column being a constant value or not.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- classSamplingFactors
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- colSampleRateChangePerLevel
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0).
Default value:
1.0
Also available on the trained model.
- colSampleRatePerTree
Column sample rate per tree (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- columnsToCategorical
List of columns to convert to categorical before modelling
Scala default value:
Array()
; Python default value:[]
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- customMetricFunc
Reference to custom evaluation function, format: language:keyName=funcName.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- dataFrameSerializer
A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.
Default value:
"ai.h2o.sparkling.utils.JSONDataFrameSerializer"
Also available on the trained model.
- detailedPredictionCol
Column containing additional prediction details, its content depends on the model type.
Default value:
"detailed_prediction"
Also available on the trained model.
- distribution
Distribution function. Possible values are
"AUTO"
,"bernoulli"
,"quasibinomial"
,"modified_huber"
,"multinomial"
,"ordinal"
,"gaussian"
,"poisson"
,"gamma"
,"tweedie"
,"huber"
,"laplace"
,"quantile"
,"fractionalbinomial"
,"negativebinomial"
,"custom"
.Default value:
"AUTO"
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- featuresCols
Name of feature columns
Scala default value:
Array()
; Python default value:[]
Also available on the trained model.
- foldAssignment
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. Possible values are
"AUTO"
,"Random"
,"Modulo"
,"Stratified"
.Default value:
"AUTO"
Also available on the trained model.
- foldCol
Column with cross-validation fold index assignment per observation.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- gainsliftBins
Gains/Lift table number of bins. 0 means disabled.. Default value -1 means automatic binning.
Default value:
-1
Also available on the trained model.
- histogramType
What type of histogram to use for finding optimal split points. Possible values are
"AUTO"
,"UniformAdaptive"
,"Random"
,"QuantilesGlobal"
,"RoundRobin"
,"UniformRobust"
.Default value:
"AUTO"
Also available on the trained model.
- ignoreConstCols
Ignore constant columns.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- keepBinaryModels
If set to true, all binary models created during execution of the
fit
method will be kept in DKV of H2O-3 cluster.Scala default value:
false
; Python default value:False
- keepCrossValidationFoldAssignment
Whether to keep the cross-validation fold assignment.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- keepCrossValidationModels
Whether to keep the cross-validation models.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- keepCrossValidationPredictions
Whether to keep the predictions of the cross-validation models.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- labelCol
Response variable column.
Default value:
"label"
Also available on the trained model.
- maxAfterBalanceSize
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
Scala default value:
5.0f
; Python default value:5.0
Also available on the trained model.
- maxConfusionMatrixSize
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.
Default value:
20
Also available on the trained model.
- maxDepth
Maximum tree depth (0 for unlimited).
Default value:
20
Also available on the trained model.
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value:
0.0
Also available on the trained model.
- minRows
Fewest allowed (weighted) observations in a leaf.
Default value:
1.0
Also available on the trained model.
- minSplitImprovement
Minimum relative improvement in squared error reduction for a split to happen.
Scala default value:
1.0e-5
; Python default value:1.0E-5
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value:
null
; Python default value:None
- mtries
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt{p} for classification and p/3 for regression (where p is the # of predictors.
Default value:
-1
Also available on the trained model.
- nbins
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point.
Default value:
20
Also available on the trained model.
- nbinsCats
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
Default value:
1024
Also available on the trained model.
- nbinsTopLevel
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level.
Default value:
1024
Also available on the trained model.
- nfolds
Number of folds for K-fold cross-validation (0 to disable or >= 2).
Default value:
0
Also available on the trained model.
- ntrees
Number of trees.
Default value:
50
Also available on the trained model.
- offsetCol
Offset column. This will be added to the combination of columns before applying the link function.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- predictionCol
Prediction column name
Default value:
"prediction"
Also available on the trained model.
- sampleRate
Row sample rate per tree (from 0.0 to 1.0).
Default value:
0.632
Also available on the trained model.
- sampleRatePerClass
A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- scoreEachIteration
Whether to score during each iteration of model training.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- scoreTreeInterval
Score the model after every so many trees. Disabled if set to 0.
Default value:
0
Also available on the trained model.
- seed
Seed for pseudo random number generator (if applicable).
Scala default value:
-1L
; Python default value:-1
Also available on the trained model.
- splitRatio
Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.
Default value:
1.0
- stoppingMetric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are
"AUTO"
,"deviance"
,"logloss"
,"MSE"
,"RMSE"
,"MAE"
,"RMSLE"
,"AUC"
,"AUCPR"
,"lift_top_group"
,"misclassification"
,"mean_per_class_error"
,"anomaly_score"
,"AUUC"
,"ATE"
,"ATT"
,"ATC"
,"qini"
,"custom"
,"custom_increasing"
.Default value:
"AUTO"
Also available on the trained model.
- stoppingRounds
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).
Default value:
0
Also available on the trained model.
- stoppingTolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).
Default value:
0.001
Also available on the trained model.
- validationDataFrame
A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter. The parameter is not serializable!
Scala default value:
null
; Python default value:None
- weightCol
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- withContributions
Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- withLeafNodeAssignments
Enables or disables computation of leaf node assignments.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- withStageResults
Enables or disables computation of stage results.
Scala default value:
false
; Python default value:False
Also available on the trained model.