Parameters of H2ODRF¶

Affected Classes¶

ai.h2o.sparkling.ml.algos.H2ODRF
ai.h2o.sparkling.ml.algos.classification.H2ODRFClassifier
ai.h2o.sparkling.ml.algos.regression.H2ODRFRegressor

Parameters¶

Each parameter has also a corresponding getter and setter method. (E.g.: label -> getLabel() , setLabel(...) )

calibrationDataFrame

Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.

Scala default value: null ; Python default value: None

ignoredCols

Names of columns to ignore for training.

Scala default value: null ; Python default value: None

Also available on the trained model.

balanceClasses

Balance training data class counts via over/under-sampling (for imbalanced data).

Scala default value: false ; Python default value: False

Also available on the trained model.

binomialDoubleTrees

For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy.

Scala default value: false ; Python default value: False

Also available on the trained model.

buildTreeOneNode

Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.

Scala default value: false ; Python default value: False

Also available on the trained model.

calibrateModel

Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.

Scala default value: false ; Python default value: False

Also available on the trained model.

categoricalEncoding

Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".

Default value: "AUTO"

Also available on the trained model.

checkConstantResponse

Check if response column is constant. If enabled, then an exception is thrown if the response column is a constant value.If disabled, then model will train regardless of the response column being a constant value or not.

Scala default value: true ; Python default value: True

Also available on the trained model.

classSamplingFactors

Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

Scala default value: null ; Python default value: None

Also available on the trained model.

colSampleRateChangePerLevel

Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0).

Default value: 1.0

Also available on the trained model.

colSampleRatePerTree

Column sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

columnsToCategorical

List of columns to convert to categorical before modelling

Scala default value: Array() ; Python default value: []

convertInvalidNumbersToNa

If set to ‘true’, the model converts invalid numbers to NA during making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

convertUnknownCategoricalLevelsToNa

If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

customMetricFunc

Reference to custom evaluation function, format: language:keyName=funcName.

Scala default value: null ; Python default value: None

Also available on the trained model.

detailedPredictionCol

Column containing additional prediction details, its content depends on the model type.

Default value: "detailed_prediction"

Also available on the trained model.

distribution

Distribution function. Possible values are "AUTO", "bernoulli", "quasibinomial", "modified_huber", "multinomial", "ordinal", "gaussian", "poisson", "gamma", "tweedie", "huber", "laplace", "quantile", "fractionalbinomial", "negativebinomial", "custom".

Default value: "AUTO"

Also available on the trained model.

exportCheckpointsDir

Automatically export generated models to this directory.

Scala default value: null ; Python default value: None

Also available on the trained model.

featuresCols

Name of feature columns

Scala default value: Array() ; Python default value: []

Also available on the trained model.

foldAssignment

Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. Possible values are "AUTO", "Random", "Modulo", "Stratified".

Default value: "AUTO"

Also available on the trained model.

foldCol

Column with cross-validation fold index assignment per observation.

Scala default value: null ; Python default value: None

Also available on the trained model.

gainsliftBins

Gains/Lift table number of bins. 0 means disabled.. Default value -1 means automatic binning.

Default value: -1

Also available on the trained model.

histogramType

What type of histogram to use for finding optimal split points. Possible values are "AUTO", "UniformAdaptive", "Random", "QuantilesGlobal", "RoundRobin".

Default value: "AUTO"

Also available on the trained model.

ignoreConstCols

Ignore constant columns.

Scala default value: true ; Python default value: True

Also available on the trained model.

keepCrossValidationFoldAssignment

Whether to keep the cross-validation fold assignment.

Scala default value: false ; Python default value: False

Also available on the trained model.

keepCrossValidationModels

Whether to keep the cross-validation models.

Scala default value: true ; Python default value: True

Also available on the trained model.

keepCrossValidationPredictions

Whether to keep the predictions of the cross-validation models.

Scala default value: false ; Python default value: False

Also available on the trained model.

labelCol

Response variable column.

Default value: "label"

Also available on the trained model.

maxAfterBalanceSize

Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.

Scala default value: 5.0f ; Python default value: 5.0

Also available on the trained model.

maxConfusionMatrixSize

[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.

Default value: 20

Also available on the trained model.

maxDepth

Maximum tree depth (0 for unlimited).

Default value: 20

Also available on the trained model.

maxRuntimeSecs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Default value: 0.0

Also available on the trained model.

minRows

Fewest allowed (weighted) observations in a leaf.

Default value: 1.0

Also available on the trained model.

minSplitImprovement

Minimum relative improvement in squared error reduction for a split to happen.

Scala default value: 1.0e-5 ; Python default value: 1.0E-5

Also available on the trained model.

modelId

Destination id for this model; auto-generated if not specified.

Scala default value: null ; Python default value: None

mtries

Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt{p} for classification and p/3 for regression (where p is the # of predictors.

Default value: -1

Also available on the trained model.

namedMojoOutputColumns

Mojo Output is not stored in the array but in the properly named columns

Scala default value: true ; Python default value: True

Also available on the trained model.

nbins

For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point.

Default value: 20

Also available on the trained model.

nbinsCats

For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.

Default value: 1024

Also available on the trained model.

nbinsTopLevel

For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level.

Default value: 1024

Also available on the trained model.

nfolds

Number of folds for K-fold cross-validation (0 to disable or >= 2).

Default value: 0

Also available on the trained model.

ntrees

Number of trees.

Default value: 50

Also available on the trained model.

offsetCol

Offset column. This will be added to the combination of columns before applying the link function.

Scala default value: null ; Python default value: None

Also available on the trained model.

predictionCol

Prediction column name

Default value: "prediction"

Also available on the trained model.

sampleRate

Row sample rate per tree (from 0.0 to 1.0).

Default value: 0.632

Also available on the trained model.

sampleRatePerClass

A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree.

Scala default value: null ; Python default value: None

Also available on the trained model.

scoreEachIteration

Whether to score during each iteration of model training.

Scala default value: false ; Python default value: False

Also available on the trained model.

scoreTreeInterval

Score the model after every so many trees. Disabled if set to 0.

Default value: 0

Also available on the trained model.

seed

Seed for pseudo random number generator (if applicable).

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

splitRatio

Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

Default value: 1.0

stoppingMetric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anonomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing".

Default value: "AUTO"

Also available on the trained model.

stoppingRounds

Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

Default value: 0

Also available on the trained model.

stoppingTolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

Default value: 0.001

Also available on the trained model.

validationDataFrame

A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter.

Scala default value: null ; Python default value: None

weightCol

Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.

Scala default value: null ; Python default value: None

Also available on the trained model.

withContributions

Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values.

Scala default value: false ; Python default value: False

Also available on the trained model.

withLeafNodeAssignments

Enables or disables computation of leaf node assignments.

Scala default value: false ; Python default value: False

Also available on the trained model.

withStageResults

Enables or disables computation of stage results.

Scala default value: false ; Python default value: False

Also available on the trained model.