Parameters of H2OXGBoost¶
Affected Classes¶
ai.h2o.sparkling.ml.algos.H2OXGBoost
ai.h2o.sparkling.ml.algos.classification.H2OXGBoostClassifier
ai.h2o.sparkling.ml.algos.regression.H2OXGBoostRegressor
Parameters¶
Each parameter has also a corresponding getter and setter method. (E.g.:
label
->getLabel()
,setLabel(...)
)
- calibrationDataFrame
Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.
Scala default value:
null
; Python default value:None
- ignoredCols
Names of columns to ignore for training.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- monotoneConstraints
A key must correspond to a feature name and value could be 1 or -1
Scala default value:
Map()
; Python default value:{}
Also available on the trained model.
- aucType
Set default multinomial AUC type. Possible values are
"AUTO"
,"NONE"
,"MACRO_OVR"
,"WEIGHTED_OVR"
,"MACRO_OVO"
,"WEIGHTED_OVO"
.Default value:
"AUTO"
Also available on the trained model.
- backend
Backend. By default (auto), a GPU is used if available. Possible values are
"auto"
,"gpu"
,"cpu"
.Default value:
"auto"
Also available on the trained model.
- booster
Booster type. Possible values are
"gbtree"
,"gblinear"
,"dart"
.Default value:
"gbtree"
Also available on the trained model.
- buildTreeOneNode
Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- calibrateModel
Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- calibrationMethod
Calibration method to use. Possible values are
"AUTO"
,"PlattScaling"
,"IsotonicRegression"
.Default value:
"AUTO"
Also available on the trained model.
- categoricalEncoding
Encoding scheme for categorical features. Possible values are
"AUTO"
,"OneHotInternal"
,"OneHotExplicit"
,"Enum"
,"Binary"
,"Eigen"
,"LabelEncoder"
,"SortByResponse"
,"EnumLimited"
.Default value:
"AUTO"
Also available on the trained model.
- colSampleByLevel
(same as col_sample_rate) Column sample rate (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- colSampleByNode
Column sample rate per tree node (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- colSampleByTree
(same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- colSampleRate
(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- colSampleRatePerTree
(same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- columnsToCategorical
List of columns to convert to categorical before modelling
Scala default value:
Array()
; Python default value:[]
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- dataFrameSerializer
A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.
Default value:
"ai.h2o.sparkling.utils.JSONDataFrameSerializer"
Also available on the trained model.
- detailedPredictionCol
Column containing additional prediction details, its content depends on the model type.
Default value:
"detailed_prediction"
Also available on the trained model.
- distribution
Distribution function. Possible values are
"AUTO"
,"bernoulli"
,"quasibinomial"
,"modified_huber"
,"multinomial"
,"ordinal"
,"gaussian"
,"poisson"
,"gamma"
,"tweedie"
,"huber"
,"laplace"
,"quantile"
,"fractionalbinomial"
,"negativebinomial"
,"custom"
.Default value:
"AUTO"
Also available on the trained model.
- dmatrixType
Type of DMatrix. For sparse, NAs and 0 are treated equally. Possible values are
"auto"
,"dense"
,"sparse"
.Default value:
"auto"
Also available on the trained model.
- eta
(same as learn_rate) Learning rate (from 0.0 to 1.0).
Default value:
0.3
Also available on the trained model.
- evalMetric
Specification of evaluation metric that will be passed to the native XGBoost backend.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- featuresCols
Name of feature columns
Scala default value:
Array()
; Python default value:[]
Also available on the trained model.
- foldAssignment
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. Possible values are
"AUTO"
,"Random"
,"Modulo"
,"Stratified"
.Default value:
"AUTO"
Also available on the trained model.
- foldCol
Column with cross-validation fold index assignment per observation.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- gainsliftBins
Gains/Lift table number of bins. 0 means disabled.. Default value -1 means automatic binning.
Default value:
-1
Also available on the trained model.
- gamma
(same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen.
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- gpuId
Which GPU(s) to use. .
Scala default value:
null
; Python default value:None
Also available on the trained model.
- growPolicy
Grow policy - depthwise is standard GBM, lossguide is LightGBM. Possible values are
"depthwise"
,"lossguide"
.Default value:
"depthwise"
Also available on the trained model.
- ignoreConstCols
Ignore constant columns.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- interactionConstraints
A set of allowed column interactions.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- keepBinaryModels
If set to true, all binary models created during execution of the
fit
method will be kept in DKV of H2O-3 cluster.Scala default value:
false
; Python default value:False
- keepCrossValidationFoldAssignment
Whether to keep the cross-validation fold assignment.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- keepCrossValidationModels
Whether to keep the cross-validation models.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- keepCrossValidationPredictions
Whether to keep the predictions of the cross-validation models.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- labelCol
Response variable column.
Default value:
"label"
Also available on the trained model.
- learnRate
(same as eta) Learning rate (from 0.0 to 1.0).
Default value:
0.3
Also available on the trained model.
- maxAbsLeafnodePred
(same as max_delta_step) Maximum absolute value of a leaf node prediction.
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- maxBins
For tree_method=hist only: maximum number of bins.
Default value:
256
Also available on the trained model.
- maxDeltaStep
(same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction.
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- maxDepth
Maximum tree depth (0 for unlimited).
Default value:
6
Also available on the trained model.
- maxLeaves
For tree_method=hist only: maximum number of leaves.
Default value:
0
Also available on the trained model.
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value:
0.0
Also available on the trained model.
- minChildWeight
(same as min_rows) Fewest allowed (weighted) observations in a leaf.
Default value:
1.0
Also available on the trained model.
- minRows
(same as min_child_weight) Fewest allowed (weighted) observations in a leaf.
Default value:
1.0
Also available on the trained model.
- minSplitImprovement
(same as gamma) Minimum relative improvement in squared error reduction for a split to happen.
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value:
null
; Python default value:None
- nfolds
Number of folds for K-fold cross-validation (0 to disable or >= 2).
Default value:
0
Also available on the trained model.
- normalizeType
For booster=dart only: normalize_type. Possible values are
"tree"
,"forest"
.Default value:
"tree"
Also available on the trained model.
- nthread
Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to maximum available.
Default value:
-1
Also available on the trained model.
- ntrees
(same as n_estimators) Number of trees.
Default value:
50
Also available on the trained model.
- offsetCol
Offset column. This will be added to the combination of columns before applying the link function.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- oneDrop
For booster=dart only: one_drop.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- parallelizeCrossValidation
Allow parallel training of cross-validation models.
Scala default value:
true
; Python default value:True
- predictionCol
Prediction column name
Default value:
"prediction"
Also available on the trained model.
- quietMode
Enable quiet mode.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- rateDrop
For booster=dart only: rate_drop (0..1).
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- regAlpha
L1 regularization.
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- regLambda
L2 regularization.
Scala default value:
1.0f
; Python default value:1.0
Also available on the trained model.
- sampleRate
(same as subsample) Row sample rate per tree (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- sampleType
For booster=dart only: sample_type. Possible values are
"uniform"
,"weighted"
.Default value:
"uniform"
Also available on the trained model.
- saveMatrixDirectory
Directory where to save matrices passed to XGBoost library. Useful for debugging.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- scalePosWeight
Controls the effect of observations with positive labels in relation to the observations with negative labels on gradient calculation. Useful for imbalanced problems.
Scala default value:
1.0f
; Python default value:1.0
Also available on the trained model.
- scoreEachIteration
Whether to score during each iteration of model training.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- scoreEvalMetricOnly
If enabled, score only the evaluation metric. This can make model training faster if scoring is frequent (eg. each iteration).
Scala default value:
false
; Python default value:False
Also available on the trained model.
- scoreTreeInterval
Score the model after every so many trees. Disabled if set to 0.
Default value:
0
Also available on the trained model.
- seed
Seed for pseudo random number generator (if applicable).
Scala default value:
-1L
; Python default value:-1
Also available on the trained model.
- skipDrop
For booster=dart only: skip_drop (0..1).
Scala default value:
0.0f
; Python default value:0.0
Also available on the trained model.
- splitRatio
Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.
Default value:
1.0
- stoppingMetric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are
"AUTO"
,"deviance"
,"logloss"
,"MSE"
,"RMSE"
,"MAE"
,"RMSLE"
,"AUC"
,"AUCPR"
,"lift_top_group"
,"misclassification"
,"mean_per_class_error"
,"anomaly_score"
,"custom"
,"custom_increasing"
.Default value:
"AUTO"
Also available on the trained model.
- stoppingRounds
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).
Default value:
0
Also available on the trained model.
- stoppingTolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).
Default value:
0.001
Also available on the trained model.
- subsample
(same as sample_rate) Row sample rate per tree (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- treeMethod
Tree method. Possible values are
"auto"
,"exact"
,"approx"
,"hist"
.Default value:
"auto"
Also available on the trained model.
- tweediePower
Tweedie power for Tweedie regression, must be between 1 and 2.
Default value:
1.5
Also available on the trained model.
- validationDataFrame
A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter. The parameter is not serializable!
Scala default value:
null
; Python default value:None
- weightCol
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- withContributions
Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- withLeafNodeAssignments
Enables or disables computation of leaf node assignments.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- withStageResults
Enables or disables computation of stage results.
Scala default value:
false
; Python default value:False
Also available on the trained model.