Parameters of H2OXGBoost

Affected Classes

  • ai.h2o.sparkling.ml.algos.H2OXGBoost

  • ai.h2o.sparkling.ml.algos.classification.H2OXGBoostClassifier

  • ai.h2o.sparkling.ml.algos.regression.H2OXGBoostRegressor

Parameters

  • Each parameter has also a corresponding getter and setter method. (E.g.: label -> getLabel() , setLabel(...) )

calibrationDataFrame

Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.

Scala default value: null ; Python default value: None

ignoredCols

Names of columns to ignore for training.

Scala default value: null ; Python default value: None

Also available on the trained model.

monotoneConstraints

A key must correspond to a feature name and value could be 1 or -1

Scala default value: Map() ; Python default value: {}

Also available on the trained model.

aucType

Set default multinomial AUC type. Possible values are "AUTO", "NONE", "MACRO_OVR", "WEIGHTED_OVR", "MACRO_OVO", "WEIGHTED_OVO".

Default value: "AUTO"

Also available on the trained model.

backend

Backend. By default (auto), a GPU is used if available. Possible values are "auto", "gpu", "cpu".

Default value: "auto"

Also available on the trained model.

booster

Booster type. Possible values are "gbtree", "gblinear", "dart".

Default value: "gbtree"

Also available on the trained model.

buildTreeOneNode

Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.

Scala default value: false ; Python default value: False

Also available on the trained model.

calibrateModel

Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.

Scala default value: false ; Python default value: False

Also available on the trained model.

calibrationMethod

Calibration method to use. Possible values are "AUTO", "PlattScaling", "IsotonicRegression".

Default value: "AUTO"

Also available on the trained model.

categoricalEncoding

Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".

Default value: "AUTO"

Also available on the trained model.

colSampleByLevel

(same as col_sample_rate) Column sample rate (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleByNode

Column sample rate per tree node (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleByTree

(same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleRate

(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

colSampleRatePerTree

(same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

columnsToCategorical

List of columns to convert to categorical before modelling

Scala default value: Array() ; Python default value: []

convertInvalidNumbersToNa

If set to ‘true’, the model converts invalid numbers to NA during making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

convertUnknownCategoricalLevelsToNa

If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

dataFrameSerializer

A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer"

Also available on the trained model.

detailedPredictionCol

Column containing additional prediction details, its content depends on the model type.

Default value: "detailed_prediction"

Also available on the trained model.

distribution

Distribution function. Possible values are "AUTO", "bernoulli", "quasibinomial", "modified_huber", "multinomial", "ordinal", "gaussian", "poisson", "gamma", "tweedie", "huber", "laplace", "quantile", "fractionalbinomial", "negativebinomial", "custom".

Default value: "AUTO"

Also available on the trained model.

dmatrixType

Type of DMatrix. For sparse, NAs and 0 are treated equally. Possible values are "auto", "dense", "sparse".

Default value: "auto"

Also available on the trained model.

eta

(same as learn_rate) Learning rate (from 0.0 to 1.0).

Default value: 0.3

Also available on the trained model.

exportCheckpointsDir

Automatically export generated models to this directory.

Scala default value: null ; Python default value: None

Also available on the trained model.

featuresCols

Name of feature columns

Scala default value: Array() ; Python default value: []

Also available on the trained model.

foldAssignment

Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. Possible values are "AUTO", "Random", "Modulo", "Stratified".

Default value: "AUTO"

Also available on the trained model.

foldCol

Column with cross-validation fold index assignment per observation.

Scala default value: null ; Python default value: None

Also available on the trained model.

gainsliftBins

Gains/Lift table number of bins. 0 means disabled.. Default value -1 means automatic binning.

Default value: -1

Also available on the trained model.

gamma

(same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

gpuId

Which GPU(s) to use. .

Scala default value: null ; Python default value: None

Also available on the trained model.

growPolicy

Grow policy - depthwise is standard GBM, lossguide is LightGBM. Possible values are "depthwise", "lossguide".

Default value: "depthwise"

Also available on the trained model.

ignoreConstCols

Ignore constant columns.

Scala default value: true ; Python default value: True

Also available on the trained model.

interactionConstraints

A set of allowed column interactions.

Scala default value: null ; Python default value: None

Also available on the trained model.

keepBinaryModels

If set to true, all binary models created during execution of the fit method will be kept in DKV of H2O-3 cluster.

Scala default value: false ; Python default value: False

keepCrossValidationFoldAssignment

Whether to keep the cross-validation fold assignment.

Scala default value: false ; Python default value: False

Also available on the trained model.

keepCrossValidationModels

Whether to keep the cross-validation models.

Scala default value: true ; Python default value: True

Also available on the trained model.

keepCrossValidationPredictions

Whether to keep the predictions of the cross-validation models.

Scala default value: false ; Python default value: False

Also available on the trained model.

labelCol

Response variable column.

Default value: "label"

Also available on the trained model.

learnRate

(same as eta) Learning rate (from 0.0 to 1.0).

Default value: 0.3

Also available on the trained model.

maxAbsLeafnodePred

(same as max_delta_step) Maximum absolute value of a leaf node prediction.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

maxBins

For tree_method=hist only: maximum number of bins.

Default value: 256

Also available on the trained model.

maxDeltaStep

(same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

maxDepth

Maximum tree depth (0 for unlimited).

Default value: 6

Also available on the trained model.

maxLeaves

For tree_method=hist only: maximum number of leaves.

Default value: 0

Also available on the trained model.

maxRuntimeSecs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Default value: 0.0

Also available on the trained model.

minChildWeight

(same as min_rows) Fewest allowed (weighted) observations in a leaf.

Default value: 1.0

Also available on the trained model.

minRows

(same as min_child_weight) Fewest allowed (weighted) observations in a leaf.

Default value: 1.0

Also available on the trained model.

minSplitImprovement

(same as gamma) Minimum relative improvement in squared error reduction for a split to happen.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

modelId

Destination id for this model; auto-generated if not specified.

Scala default value: null ; Python default value: None

namedMojoOutputColumns

Mojo Output is not stored in the array but in the properly named columns

Scala default value: true ; Python default value: True

Also available on the trained model.

nfolds

Number of folds for K-fold cross-validation (0 to disable or >= 2).

Default value: 0

Also available on the trained model.

normalizeType

For booster=dart only: normalize_type. Possible values are "tree", "forest".

Default value: "tree"

Also available on the trained model.

nthread

Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to maximum available.

Default value: -1

Also available on the trained model.

ntrees

(same as n_estimators) Number of trees.

Default value: 50

Also available on the trained model.

offsetCol

Offset column. This will be added to the combination of columns before applying the link function.

Scala default value: null ; Python default value: None

Also available on the trained model.

oneDrop

For booster=dart only: one_drop.

Scala default value: false ; Python default value: False

Also available on the trained model.

predictionCol

Prediction column name

Default value: "prediction"

Also available on the trained model.

quietMode

Enable quiet mode.

Scala default value: true ; Python default value: True

Also available on the trained model.

rateDrop

For booster=dart only: rate_drop (0..1).

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

regAlpha

L1 regularization.

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

regLambda

L2 regularization.

Scala default value: 1.0f ; Python default value: 1.0

Also available on the trained model.

sampleRate

(same as subsample) Row sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

sampleType

For booster=dart only: sample_type. Possible values are "uniform", "weighted".

Default value: "uniform"

Also available on the trained model.

saveMatrixDirectory

Directory where to save matrices passed to XGBoost library. Useful for debugging.

Scala default value: null ; Python default value: None

Also available on the trained model.

scalePosWeight

Controls the effect of observations with positive labels in relation to the observations with negative labels on gradient calculation. Useful for imbalanced problems.

Scala default value: 1.0f ; Python default value: 1.0

Also available on the trained model.

scoreEachIteration

Whether to score during each iteration of model training.

Scala default value: false ; Python default value: False

Also available on the trained model.

scoreTreeInterval

Score the model after every so many trees. Disabled if set to 0.

Default value: 0

Also available on the trained model.

seed

Seed for pseudo random number generator (if applicable).

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

skipDrop

For booster=dart only: skip_drop (0..1).

Scala default value: 0.0f ; Python default value: 0.0

Also available on the trained model.

splitRatio

Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

Default value: 1.0

stoppingMetric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anonomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing".

Default value: "AUTO"

Also available on the trained model.

stoppingRounds

Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

Default value: 0

Also available on the trained model.

stoppingTolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

Default value: 0.001

Also available on the trained model.

subsample

(same as sample_rate) Row sample rate per tree (from 0.0 to 1.0).

Default value: 1.0

Also available on the trained model.

treeMethod

Tree method. Possible values are "auto", "exact", "approx", "hist".

Default value: "auto"

Also available on the trained model.

tweediePower

Tweedie power for Tweedie regression, must be between 1 and 2.

Default value: 1.5

Also available on the trained model.

validationDataFrame

A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter. The parameter is not serializable!

Scala default value: null ; Python default value: None

weightCol

Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

Scala default value: null ; Python default value: None

Also available on the trained model.

withContributions

Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

Scala default value: false ; Python default value: False

Also available on the trained model.

withLeafNodeAssignments

Enables or disables computation of leaf node assignments.

Scala default value: false ; Python default value: False

Also available on the trained model.

withStageResults

Enables or disables computation of stage results.

Scala default value: false ; Python default value: False

Also available on the trained model.