Parameters of H2OXGBoost¶

Affected Classes¶

ai.h2o.sparkling.ml.algos.H2OXGBoost
ai.h2o.sparkling.ml.algos.classification.H2OXGBoostClassifier
ai.h2o.sparkling.ml.algos.regression.H2OXGBoostRegressor

Parameters¶

Each parameter has also a corresponding getter and setter method. (E.g.: label -> getLabel() , setLabel(...) )

calibrationDataFrame

Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.

Scala default value: null ; Python default value: None

ignoredCols

Names of columns to ignore for training.

Scala default value: null ; Python default value: None

Also available on the trained model.

monotoneConstraints

A key must correspond to a feature name and value could be 1 or -1

Scala default value: Map() ; Python default value: {}

Also available on the trained model.

backend

Backend. By default (auto), a GPU is used if available. Possible values are "auto", "gpu", "cpu".

Default value: "auto"

Also available on the trained model.

booster

Booster type. Possible values are "gbtree", "gblinear", "dart".

Default value: "gbtree"

Also available on the trained model.

buildTreeOneNode

Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.

Scala default value: false ; Python default value: False

Also available on the trained model.

calibrateModel

Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.

Scala default value: false ; Python default value: False

Also available on the trained model.

categoricalEncoding

Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".

Default value: "AUTO"

Also available on the trained model.

colSampleByLevel

(same as col_sample_rate) Column sample rate (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleByNode

Column sample rate per tree node (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleByTree

(same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleRate

(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleRatePerTree

(same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

columnsToCategorical

List of columns to convert to categorical before modelling

Scala default value: Array() ; Python default value: []

convertInvalidNumbersToNa

If set to ‘true’, the model converts invalid numbers to NA during making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

convertUnknownCategoricalLevelsToNa

If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

detailedPredictionCol

Column containing additional prediction details, its content depends on the model type.

Default value: "detailed_prediction"

Also available on the trained model.

distribution

Distribution function. Possible values are "AUTO", "bernoulli", "quasibinomial", "modified_huber", "multinomial", "ordinal", "gaussian", "poisson", "gamma", "tweedie", "huber", "laplace", "quantile", "fractionalbinomial", "negativebinomial", "custom".

Default value: "AUTO"

Also available on the trained model.

dmatrixType

Type of DMatrix. For sparse, NAs and 0 are treated equally. Possible values are "auto", "dense", "sparse".

Default value: "auto"

Also available on the trained model.

eta

(same as learn_rate) Learning rate (from 0.0 to 1.0).

Default value: 0.3

Also available on the trained model.

exportCheckpointsDir

Automatically export generated models to this directory.

Scala default value: null ; Python default value: None

Also available on the trained model.

featuresCols

Name of feature columns

Scala default value: Array() ; Python default value: []

Also available on the trained model.

foldAssignment

Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. Possible values are "AUTO", "Random", "Modulo", "Stratified".

Default value: "AUTO"

Also available on the trained model.

foldCol

Column with cross-validation fold index assignment per observation.

Scala default value: null ; Python default value: None

Also available on the trained model.

gainsliftBins

Gains/Lift table number of bins. 0 means disabled.. Default value -1 means automatic binning.

Default value: -1

Also available on the trained model.

gamma

(same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

gpuId

Which GPU to use. .

Default value: 0

Also available on the trained model.

growPolicy

Grow policy - depthwise is standard GBM, lossguide is LightGBM. Possible values are "depthwise", "lossguide".

Default value: "depthwise"

Also available on the trained model.

ignoreConstCols

Ignore constant columns.

Scala default value: true ; Python default value: True

Also available on the trained model.

interactionConstraints

A set of allowed column interactions.

Scala default value: null ; Python default value: None

Also available on the trained model.

keepCrossValidationFoldAssignment

Whether to keep the cross-validation fold assignment.

Scala default value: false ; Python default value: False

Also available on the trained model.

keepCrossValidationModels

Whether to keep the cross-validation models.

Scala default value: true ; Python default value: True

Also available on the trained model.

keepCrossValidationPredictions

Whether to keep the predictions of the cross-validation models.

Scala default value: false ; Python default value: False

Also available on the trained model.

labelCol

Response variable column.

Default value: "label"

Also available on the trained model.

learnRate

(same as eta) Learning rate (from 0.0 to 1.0).

Default value: 0.3

Also available on the trained model.

maxAbsLeafnodePred

(same as max_delta_step) Maximum absolute value of a leaf node prediction.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

maxBins

For tree_method=hist only: maximum number of bins.

Default value: 256

Also available on the trained model.

maxDeltaStep

(same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

maxDepth

Maximum tree depth (0 for unlimited).

Default value: 6

Also available on the trained model.

maxLeaves

For tree_method=hist only: maximum number of leaves.

Default value: 0

Also available on the trained model.

maxRuntimeSecs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Default value: 0.0

Also available on the trained model.

minChildWeight

(same as min_rows) Fewest allowed (weighted) observations in a leaf.

Default value: 1.0

Also available on the trained model.

minRows

(same as min_child_weight) Fewest allowed (weighted) observations in a leaf.

Default value: 1.0

Also available on the trained model.

minSplitImprovement

(same as gamma) Minimum relative improvement in squared error reduction for a split to happen.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

modelId

Destination id for this model; auto-generated if not specified.

Scala default value: null ; Python default value: None

namedMojoOutputColumns

Mojo Output is not stored in the array but in the properly named columns

Scala default value: true ; Python default value: True

Also available on the trained model.

nfolds

Number of folds for K-fold cross-validation (0 to disable or >= 2).

Default value: 0

Also available on the trained model.

normalizeType

For booster=dart only: normalize_type. Possible values are "tree", "forest".

Default value: "tree"

Also available on the trained model.

nthread

Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to maximum available.

Default value: -1

Also available on the trained model.

ntrees

(same as n_estimators) Number of trees.

Default value: 50

Also available on the trained model.

offsetCol

Offset column. This will be added to the combination of columns before applying the link function.

Scala default value: null ; Python default value: None

Also available on the trained model.

oneDrop

For booster=dart only: one_drop.

Scala default value: false ; Python default value: False

Also available on the trained model.

predictionCol

Prediction column name

Default value: "prediction"

Also available on the trained model.

quietMode

Enable quiet mode.

Scala default value: true ; Python default value: True

Also available on the trained model.

rateDrop

For booster=dart only: rate_drop (0..1).

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

regAlpha

L1 regularization.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

regLambda

L2 regularization.

Scala default value: 1.0f ; Python default value: 1.0

Also available on the trained model.

sampleRate

(same as subsample) Row sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

sampleType

For booster=dart only: sample_type. Possible values are "uniform", "weighted".

Default value: "uniform"

Also available on the trained model.

saveMatrixDirectory

Directory where to save matrices passed to XGBoost library. Useful for debugging.

Scala default value: null ; Python default value: None

Also available on the trained model.

scoreEachIteration

Whether to score during each iteration of model training.

Scala default value: false ; Python default value: False

Also available on the trained model.

scoreTreeInterval

Score the model after every so many trees. Disabled if set to 0.

Default value: 0

Also available on the trained model.

seed

Seed for pseudo random number generator (if applicable).

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

skipDrop

For booster=dart only: skip_drop (0..1).

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

splitRatio

Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

Default value: 1.0

stoppingMetric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anonomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing".

Default value: "AUTO"

Also available on the trained model.

stoppingRounds

Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

Default value: 0

Also available on the trained model.

stoppingTolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

Default value: 0.001

Also available on the trained model.

subsample

(same as sample_rate) Row sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

treeMethod

Tree method. Possible values are "auto", "exact", "approx", "hist".

Default value: "auto"

Also available on the trained model.

tweediePower

Tweedie power for Tweedie regression, must be between 1 and 2.

Default value: 1.5

Also available on the trained model.

validationDataFrame

A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter.

Scala default value: null ; Python default value: None

weightCol

Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.

Scala default value: null ; Python default value: None

Also available on the trained model.

withContributions

Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values.

Scala default value: false ; Python default value: False

Also available on the trained model.

withLeafNodeAssignments

Enables or disables computation of leaf node assignments.

Scala default value: false ; Python default value: False

Also available on the trained model.

withStageResults

Enables or disables computation of stage results.

Scala default value: false ; Python default value: False

Also available on the trained model.