.. _parameters_H2OXGBoost:

Parameters of H2OXGBoost
------------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OXGBoost``
- ``ai.h2o.sparkling.ml.algos.classification.H2OXGBoostClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OXGBoostRegressor``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
  *(E.g.:* ``labelCol`` *->* ``getLabelCol()`` *,* ``setLabelCol(...)`` *)*
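
The naming convention above can be sketched in plain Python. The helper ``accessor_names`` is hypothetical and only illustrates how the getter and setter names are derived from a parameter name; it is not part of the Sparkling Water API.

.. code-block:: python

   from typing import Tuple


   def accessor_names(param_name: str) -> Tuple[str, str]:
       """Return the (getter, setter) method names for a camelCase parameter."""
       if not param_name:
           raise ValueError("parameter name must not be empty")
       # Upper-case only the first character; the rest of the name is kept as-is.
       capitalized = param_name[0].upper() + param_name[1:]
       return f"get{capitalized}", f"set{capitalized}"


   print(accessor_names("labelCol"))  # ('getLabelCol', 'setLabelCol')
   print(accessor_names("ntrees"))    # ('getNtrees', 'setNtrees')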

calibrationDataFrame
  Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

monotoneConstraints
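  Monotonicity constraints given as a map. Each key must correspond to a feature name, and each value must be either 1 (increasing constraint) or -1 (decreasing constraint).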

  *Scala default value:* ``Map()`` *; Python default value:* ``{}``
  
  *Also available on the trained model.*
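
  A minimal sketch of checking a constraints map before passing it to the setter, assuming the rule stated above (keys are feature names, values are 1 or -1). The helper ``validate_monotone_constraints`` and the feature names are hypothetical and not part of the library.

  .. code-block:: python

     from typing import Dict, Iterable


     def validate_monotone_constraints(
         constraints: Dict[str, int], feature_names: Iterable[str]
     ) -> None:
         """Raise ValueError if a key is not a feature or a value is not 1 or -1."""
         known = set(feature_names)
         for name, direction in constraints.items():
             if name not in known:
                 raise ValueError(f"unknown feature: {name!r}")
             if direction not in (1, -1):
                 raise ValueError(
                     f"constraint for {name!r} must be 1 or -1, got {direction!r}"
                 )


     # Hypothetical feature names, used only for illustration.
     features = ["age", "income", "balance"]
     validate_monotone_constraints({"age": 1, "balance": -1}, features)  # passes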

aucType
  Set default multinomial AUC type. Possible values are ``"AUTO"``, ``"NONE"``, ``"MACRO_OVR"``, ``"WEIGHTED_OVR"``, ``"MACRO_OVO"``, ``"WEIGHTED_OVO"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

backend
  Backend. By default (auto), a GPU is used if available. Possible values are ``"auto"``, ``"gpu"``, ``"cpu"``.

  *Default value:* ``"auto"``
  
  *Also available on the trained model.*

booster
  Booster type. Possible values are ``"gbtree"``, ``"gblinear"``, ``"dart"``.

  *Default value:* ``"gbtree"``
  
  *Also available on the trained model.*

buildTreeOneNode
  Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

calibrateModel
  Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*
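
  As background, Platt scaling in its textbook form maps a raw model score ``f`` to a probability ``1 / (1 + exp(A * f + B))``, where ``A`` and ``B`` are fitted on the calibration data. The sketch below only evaluates that formula; the function name and the coefficient values are illustrative assumptions, and the way H2O-3 fits the coefficients internally is not shown here.

  .. code-block:: python

     import math


     def platt_probability(score: float, a: float, b: float) -> float:
         """Textbook Platt scaling: map a raw score to a calibrated probability."""
         return 1.0 / (1.0 + math.exp(a * score + b))


     # Illustrative coefficients; in practice A and B are learned from calibration data.
     print(platt_probability(0.0, a=-1.0, b=0.0))  # 0.5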

categoricalEncoding
  Encoding scheme for categorical features. Possible values are ``"AUTO"``, ``"OneHotInternal"``, ``"OneHotExplicit"``, ``"Enum"``, ``"Binary"``, ``"Eigen"``, ``"LabelEncoder"``, ``"SortByResponse"``, ``"EnumLimited"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

colSampleByLevel
  (same as col_sample_rate) Column sample rate (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

colSampleByNode
  Column sample rate per tree node (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

colSampleByTree
  (same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

colSampleRate
  (same as colsample_bylevel) Column sample rate (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

colSampleRatePerTree
  (same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

dataFrameSerializer
  The full name of the serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  
  *Also available on the trained model.*

detailedPredictionCol
  Column containing additional prediction details; its content depends on the model type.

  *Default value:* ``"detailed_prediction"``
  
  *Also available on the trained model.*

distribution
  Distribution function. Possible values are ``"AUTO"``, ``"bernoulli"``, ``"quasibinomial"``, ``"modified_huber"``, ``"multinomial"``, ``"ordinal"``, ``"gaussian"``, ``"poisson"``, ``"gamma"``, ``"tweedie"``, ``"huber"``, ``"laplace"``, ``"quantile"``, ``"fractionalbinomial"``, ``"negativebinomial"``, ``"custom"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

dmatrixType
  Type of DMatrix. For sparse, NAs and 0 are treated equally. Possible values are ``"auto"``, ``"dense"``, ``"sparse"``.

  *Default value:* ``"auto"``
  
  *Also available on the trained model.*

eta
  (same as learn_rate) Learning rate (from 0.0 to 1.0).

  *Default value:* ``0.3``
  
  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

featuresCols
  Names of feature columns.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  
  *Also available on the trained model.*

foldAssignment
  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*
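
  To illustrate the ``"Modulo"`` scheme, the sketch below assigns each row to a fold by taking its row index modulo the number of folds. This is an assumption about what the name implies, written as a hypothetical helper; it is not H2O-3's implementation.

  .. code-block:: python

     from typing import List


     def modulo_folds(n_rows: int, nfolds: int) -> List[int]:
         """Assign row i to fold i % nfolds (a deterministic, seed-independent split)."""
         if nfolds < 2:
             raise ValueError("nfolds must be at least 2 for cross-validation")
         return [row % nfolds for row in range(n_rows)]


     print(modulo_folds(7, 3))  # [0, 1, 2, 0, 1, 2, 0]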

foldCol
  Column with cross-validation fold index assignment per observation.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

gainsliftBins
  Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

gamma
  (same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen.

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

gpuId
  Which GPU(s) to use.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

growPolicy
  Grow policy - depthwise is standard GBM, lossguide is LightGBM. Possible values are ``"depthwise"``, ``"lossguide"``.

  *Default value:* ``"depthwise"``
  
  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

interactionConstraints
  A set of allowed column interactions.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in the DKV of the H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationFoldAssignment
  Whether to keep the cross-validation fold assignment.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

keepCrossValidationModels
  Whether to keep the cross-validation models.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

keepCrossValidationPredictions
  Whether to keep the predictions of the cross-validation models.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

labelCol
  Response variable column.

  *Default value:* ``"label"``
  
  *Also available on the trained model.*

learnRate
  (same as eta) Learning rate (from 0.0 to 1.0).

  *Default value:* ``0.3``
  
  *Also available on the trained model.*

maxAbsLeafnodePred
  (same as max_delta_step) Maximum absolute value of a leaf node prediction.

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

maxBins
  For tree_method=hist only: maximum number of bins.

  *Default value:* ``256``
  
  *Also available on the trained model.*

maxDeltaStep
  (same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction.

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

maxDepth
  Maximum tree depth (0 for unlimited).

  *Default value:* ``6``
  
  *Also available on the trained model.*

maxLeaves
  For tree_method=hist only: maximum number of leaves.

  *Default value:* ``0``
  
  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

minChildWeight
  (same as min_rows) Fewest allowed (weighted) observations in a leaf.

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

minRows
  (same as min_child_weight) Fewest allowed (weighted) observations in a leaf.

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

minSplitImprovement
  (same as gamma) Minimum relative improvement in squared error reduction for a split to happen.

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

namedMojoOutputColumns
  MOJO output is stored in properly named columns rather than in an array.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

nfolds
  Number of folds for K-fold cross-validation (0 to disable or >= 2).

  *Default value:* ``0``
  
  *Also available on the trained model.*

normalizeType
  For booster=dart only: normalize_type. Possible values are ``"tree"``, ``"forest"``.

  *Default value:* ``"tree"``
  
  *Also available on the trained model.*

nthread
  Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to maximum available.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

ntrees
  (same as n_estimators) Number of trees.

  *Default value:* ``50``
  
  *Also available on the trained model.*

offsetCol
  Offset column. This will be added to the combination of columns before applying the link function.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

oneDrop
  For booster=dart only: one_drop.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

predictionCol
  Prediction column name.

  *Default value:* ``"prediction"``
  
  *Also available on the trained model.*

quietMode
  Enable quiet mode.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

rateDrop
  For booster=dart only: rate_drop (0..1).

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

regAlpha
  L1 regularization.

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

regLambda
  L2 regularization.

  *Scala default value:* ``1.0f`` *; Python default value:* ``1.0``
  
  *Also available on the trained model.*

sampleRate
  (same as subsample) Row sample rate per tree (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

sampleType
  For booster=dart only: sample_type. Possible values are ``"uniform"``, ``"weighted"``.

  *Default value:* ``"uniform"``
  
  *Also available on the trained model.*

saveMatrixDirectory
  Directory in which to save matrices passed to the XGBoost library. Useful for debugging.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

scalePosWeight
  Controls the effect of observations with positive labels in relation to the observations with negative labels on gradient calculation. Useful for imbalanced problems.

  *Scala default value:* ``1.0f`` *; Python default value:* ``1.0``
  
  *Also available on the trained model.*
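
  A common starting point for imbalanced binary problems is the ratio of negative to positive observations. The sketch below computes that ratio; the helper name and the 0/1 label encoding are assumptions for illustration, and the resulting value is a heuristic to tune rather than a prescribed setting.

  .. code-block:: python

     from typing import Sequence


     def suggested_scale_pos_weight(labels: Sequence[int]) -> float:
         """Return count(negative) / count(positive) for labels encoded as 0 and 1."""
         positives = sum(1 for y in labels if y == 1)
         negatives = sum(1 for y in labels if y == 0)
         if positives == 0:
             raise ValueError("no positive labels; the ratio is undefined")
         return negatives / positives


     labels = [0] * 90 + [1] * 10  # 90 negatives, 10 positives
     print(suggested_scale_pos_weight(labels))  # 9.0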

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

scoreTreeInterval
  Score the model after every so many trees. Disabled if set to 0.

  *Default value:* ``0``
  
  *Also available on the trained model.*

seed
  Seed for pseudo random number generator (if applicable).

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``
  
  *Also available on the trained model.*

skipDrop
  For booster=dart only: skip_drop (0..1).

  *Scala default value:* ``0.0f`` *; Python default value:* ``0.0``
  
  *Also available on the trained model.*

splitRatio
  Accepts values in the range [0, 1.0] that determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``
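
  The arithmetic behind the ratio can be sketched as follows. The helper ``split_sizes`` is hypothetical and computes exact row counts; the actual split performed by the library may be randomized and only approximately match these sizes.

  .. code-block:: python

     from typing import Tuple


     def split_sizes(n_rows: int, split_ratio: float) -> Tuple[int, int]:
         """Return (training rows, validation rows) for a given split ratio."""
         if not 0.0 <= split_ratio <= 1.0:
             raise ValueError("split_ratio must be in the range [0, 1.0]")
         n_train = round(n_rows * split_ratio)
         return n_train, n_rows - n_train


     print(split_sizes(1000, 0.8))  # (800, 200)
     print(split_sizes(1000, 1.0))  # (1000, 0), i.e. no validation rows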
  

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

  *Default value:* ``0``
  
  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.001``
  
  *Also available on the trained model.*
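
  The interplay of ``stoppingRounds`` and ``stoppingTolerance`` can be sketched as below. This is one possible reading of the descriptions, not H2O-3's implementation: it assumes a positive metric where lower is better (such as logloss), compares the moving average of length ``k`` from before the last ``k`` scoring events with the best moving average since then, and stops when the relative improvement is smaller than the tolerance. The function name is hypothetical.

  .. code-block:: python

     from typing import Sequence


     def should_stop(
         history: Sequence[float], stopping_rounds: int, stopping_tolerance: float
     ) -> bool:
         """Decide whether to stop, given one metric value per scoring event."""
         k = stopping_rounds
         # Disabled, or not enough scoring events to compare two windows.
         if k <= 0 or len(history) < 2 * k:
             return False
         start = len(history) - 2 * k
         # k + 1 simple moving averages of window size k over the last 2k events.
         averages = [sum(history[start + i : start + i + k]) / k for i in range(k + 1)]
         reference = averages[0]          # moving average before the last k events
         best_recent = min(averages[1:])  # best moving average in the last k events
         # Stop when the best recent value did not improve by at least the tolerance.
         return best_recent >= reference * (1.0 - stopping_tolerance)


     flat = [0.70, 0.60, 0.55, 0.52, 0.519, 0.519, 0.519, 0.519]
     print(should_stop(flat, stopping_rounds=2, stopping_tolerance=0.001))  # True
     improving = [0.70, 0.60, 0.50, 0.40]
     print(should_stop(improving, stopping_rounds=2, stopping_tolerance=0.001))  # False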

subsample
  (same as sample_rate) Row sample rate per tree (from 0.0 to 1.0).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*
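
  Several parameters in this list are documented as aliases of one another ("same as ..."). The sketch below collects the pairs named in this document and flags cases where both names of a pair are given different values. The helper is hypothetical, and how the library itself resolves such a conflict is not stated here, so the check only reports it.

  .. code-block:: python

     from typing import Any, Dict, List, Tuple

     # Alias pairs as documented in this parameter list.
     ALIASES: Dict[str, str] = {
         "colSampleByLevel": "colSampleRate",
         "colSampleByTree": "colSampleRatePerTree",
         "eta": "learnRate",
         "gamma": "minSplitImprovement",
         "maxDeltaStep": "maxAbsLeafnodePred",
         "minChildWeight": "minRows",
         "subsample": "sampleRate",
     }


     def conflicting_aliases(params: Dict[str, Any]) -> List[Tuple[str, str]]:
         """Return alias pairs that are both set but to different values."""
         conflicts = []
         for first, second in ALIASES.items():
             if first in params and second in params and params[first] != params[second]:
                 conflicts.append((first, second))
         return conflicts


     print(conflicting_aliases({"eta": 0.1, "learnRate": 0.3}))  # [('eta', 'learnRate')]
     print(conflicting_aliases({"eta": 0.1, "learnRate": 0.1}))  # []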

treeMethod
  Tree method. Possible values are ``"auto"``, ``"exact"``, ``"approx"``, ``"hist"``.

  *Default value:* ``"auto"``
  
  *Also available on the trained model.*

tweediePower
  Tweedie power for Tweedie regression, must be between 1 and 2.

  *Default value:* ``1.5``
  
  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

weightCol
  Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. A weight is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame contains zero at that row, which is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*
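
  The equivalence between a weight of 2 and a repeated row can be checked on a weighted squared error, used here as a stand-in loss. The function and the numbers are illustrative assumptions, not the loss XGBoost uses for every distribution.

  .. code-block:: python

     from typing import Sequence


     def weighted_squared_error(
         targets: Sequence[float],
         predictions: Sequence[float],
         weights: Sequence[float],
     ) -> float:
         """Sum of per-row squared errors, each scaled by its observation weight."""
         return sum(w * (y - p) ** 2 for y, p, w in zip(targets, predictions, weights))


     # The second row carries weight 2 ...
     weighted = weighted_squared_error([1.0, 3.0], [0.5, 2.0], [1.0, 2.0])
     # ... which matches repeating that row twice with unit weights.
     repeated = weighted_squared_error([1.0, 3.0, 3.0], [0.5, 2.0, 2.0], [1.0, 1.0, 1.0])
     print(weighted, repeated)  # 2.25 2.25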

withContributions
  Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*