Parameters of H2OGBM
Affected Classes
ai.h2o.sparkling.ml.algos.H2OGBM
ai.h2o.sparkling.ml.algos.classification.H2OGBMClassifier
ai.h2o.sparkling.ml.algos.regression.H2OGBMRegressor
Parameters
Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
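The naming convention above can be sketched in plain Python. The helper `accessor_names` is hypothetical and is not part of Sparkling Water; it only illustrates how a parameter name maps to its accessor names, assuming the first letter is capitalized and the rest is kept as-is.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Return the (getter, setter) method names for a parameter name."""
    # Capitalize only the first character; the rest of the name is unchanged.
    capitalized = param[0].upper() + param[1:]
    return f"get{capitalized}", f"set{capitalized}"


print(accessor_names("labelCol"))
print(accessor_names("ntrees"))
```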
- calibrationDataFrame
Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.
Scala default value: null; Python default value: None
- ignoredCols
Names of columns to ignore for training.
Scala default value: null; Python default value: None
Also available on the trained model.
- monotoneConstraints
Monotonicity constraints given as a map. Each key must be a feature name, and each value must be 1 (increasing) or -1 (decreasing).
Scala default value: Map(); Python default value: {}
Also available on the trained model.
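As a minimal sketch of the rule stated for monotoneConstraints (keys are feature names, values are 1 or -1), the check below validates a constraint map before it is passed to the estimator. The helper `check_monotone_constraints` and the column names are hypothetical and are not part of Sparkling Water.

```python
from typing import Dict, Iterable


def check_monotone_constraints(constraints: Dict[str, int],
                               features: Iterable[str]) -> None:
    """Raise ValueError if a key is not a feature or a value is not 1 or -1."""
    known = set(features)
    for name, direction in constraints.items():
        if name not in known:
            raise ValueError(f"'{name}' is not a feature column")
        if direction not in (1, -1):
            raise ValueError(
                f"constraint for '{name}' must be 1 or -1, got {direction}")


# Passes silently: both keys are features and both values are valid.
check_monotone_constraints({"age": 1, "price": -1}, ["age", "price", "region"])
```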
- aucType
Set default multinomial AUC type. Possible values are "AUTO", "NONE", "MACRO_OVR", "WEIGHTED_OVR", "MACRO_OVO", "WEIGHTED_OVO".
Default value: "AUTO"
Also available on the trained model.
- balanceClasses
Balance training data class counts via over/under-sampling (for imbalanced data).
Scala default value: false; Python default value: False
Also available on the trained model.
- buildTreeOneNode
Run on one node only; this avoids network overhead but uses fewer CPUs. Suitable for small datasets.
Scala default value: false; Python default value: False
Also available on the trained model.
- calibrateModel
Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
Scala default value: false; Python default value: False
Also available on the trained model.
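For readers unfamiliar with Platt Scaling, the sketch below shows the general idea: a sigmoid is fitted to raw classifier scores on held-out data (the role played by calibrationDataFrame) so that the scores map to probabilities. This is a simplified plain-Python illustration using gradient descent and omitting Platt's target smoothing; the function names are hypothetical, and H2O's own calibration procedure may differ.

```python
import math


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Made-up held-out scores and binary labels, for illustration only.
a, b = fit_platt([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 1, 0, 1, 1])
print(calibrate(-2.0, a, b), calibrate(2.0, a, b))
```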
- categoricalEncoding
Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".
Default value: "AUTO"
Also available on the trained model.
- checkConstantResponse
Check if the response column is constant. If enabled, an exception is thrown when the response column is a constant value. If disabled, the model will train regardless of whether the response column is constant.
Scala default value: true; Python default value: True
Also available on the trained model.
- classSamplingFactors
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balanceClasses to be enabled.
Scala default value: null; Python default value: None
Also available on the trained model.
- colSampleRate
Column sample rate (from 0.0 to 1.0).
Default value: 1.0
Also available on the trained model.
- colSampleRateChangePerLevel
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0).
Default value: 1.0
Also available on the trained model.
- colSampleRatePerTree
Column sample rate per tree (from 0.0 to 1.0).
Default value: 1.0
Also available on the trained model.
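The three column-sampling parameters (colSampleRate, colSampleRateChangePerLevel, colSampleRatePerTree) interact. The sketch below rests on assumptions that this page does not state: that the per-tree and per-split rates multiply, that the per-level factor is applied once for each level below the root, and that the result is capped at 1.0. The function `effective_col_rate` is hypothetical; treat it as an approximation for reasoning about settings, not as the library's exact computation.

```python
def effective_col_rate(col_sample_rate: float,
                       col_sample_rate_per_tree: float,
                       change_per_level: float,
                       depth: int) -> float:
    """Approximate fraction of columns considered for a split at a given depth.

    Assumption: rates multiply, and the per-level factor compounds with depth
    (depth 0 is the root). The result is capped at 1.0.
    """
    rate = col_sample_rate_per_tree * col_sample_rate * change_per_level ** depth
    return min(1.0, rate)


# With both rates at 0.9 and no per-level change, about 81% of columns remain.
print(effective_col_rate(0.9, 0.9, 1.0, depth=3))
# A per-level factor below 1.0 shrinks the rate as the tree gets deeper.
print(effective_col_rate(1.0, 1.0, 0.5, depth=2))
```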
- columnsToCategorical
List of columns to convert to categorical before modelling.
Scala default value: Array(); Python default value: []
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA when making predictions.
Scala default value: false; Python default value: False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.
Scala default value: false; Python default value: False
Also available on the trained model.
- customDistributionFunc
Reference to custom distribution, format: language:keyName=funcName.
Scala default value: null; Python default value: None
Also available on the trained model.
- customMetricFunc
Reference to custom evaluation function, format: language:keyName=funcName.
Scala default value: null; Python default value: None
Also available on the trained model.
- detailedPredictionCol
Column containing additional prediction details; its content depends on the model type.
Default value: "detailed_prediction"
Also available on the trained model.
- distribution
Distribution function. Possible values are "AUTO", "bernoulli", "quasibinomial", "modified_huber", "multinomial", "ordinal", "gaussian", "poisson", "gamma", "tweedie", "huber", "laplace", "quantile", "fractionalbinomial", "negativebinomial", "custom".
Default value: "AUTO"
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value: null; Python default value: None
Also available on the trained model.
- featuresCols
Names of the feature columns.
Scala default value: Array(); Python default value: []
Also available on the trained model.
- foldAssignment
Cross-validation fold assignment scheme, used if foldCol is not specified. The ‘Stratified’ option stratifies the folds based on the response variable, for classification problems. Possible values are "AUTO", "Random", "Modulo", "Stratified".
Default value: "AUTO"
Also available on the trained model.
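To make the foldAssignment schemes concrete, the sketch below assigns fold indices in plain Python. It assumes that "Modulo" means the row index modulo the number of folds and that "Random" draws a fold uniformly per row; the stratified variant is a simplified deterministic round-robin within each class. The function names are hypothetical, and the library's actual assignment may differ in its details.

```python
import random


def assign_folds(n_rows, nfolds, scheme="Modulo", seed=42):
    """Return a fold index per row for the 'Modulo' or 'Random' scheme."""
    if scheme == "Modulo":
        return [i % nfolds for i in range(n_rows)]
    if scheme == "Random":
        rng = random.Random(seed)
        return [rng.randrange(nfolds) for _ in range(n_rows)]
    raise ValueError(f"unsupported scheme: {scheme}")


def stratified_folds(labels, nfolds):
    """Spread each class evenly over the folds (round-robin per class)."""
    next_fold = {}
    folds = []
    for y in labels:
        k = next_fold.get(y, 0)
        folds.append(k)
        next_fold[y] = (k + 1) % nfolds
    return folds


print(assign_folds(5, 2))
print(stratified_folds(["a", "a", "a", "a", "b", "b"], 2))
```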
- foldCol
Column with cross-validation fold index assignment per observation.
Scala default value: null; Python default value: None
Also available on the trained model.
- gainsliftBins
Number of bins for the Gains/Lift table. 0 means disabled. The default value of -1 means automatic binning.
Default value: -1
Also available on the trained model.
- histogramType
What type of histogram to use for finding optimal split points. Possible values are "AUTO", "UniformAdaptive", "Random", "QuantilesGlobal", "RoundRobin".
Default value: "AUTO"
Also available on the trained model.
- huberAlpha
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
Default value: 0.9
Also available on the trained model.
- ignoreConstCols
Ignore constant columns.
Scala default value: true; Python default value: True
Also available on the trained model.
- keepCrossValidationFoldAssignment
Whether to keep the cross-validation fold assignment.
Scala default value: false; Python default value: False
Also available on the trained model.
- keepCrossValidationModels
Whether to keep the cross-validation models.
Scala default value: true; Python default value: True
Also available on the trained model.
- keepCrossValidationPredictions
Whether to keep the predictions of the cross-validation models.
Scala default value: false; Python default value: False
Also available on the trained model.
- labelCol
Response variable column.
Default value: "label"
Also available on the trained model.
- learnRate
Learning rate (from 0.0 to 1.0).
Default value: 0.1
Also available on the trained model.
- learnRateAnnealing
Scale the learning rate by this factor after each tree (e.g., 0.99 or 0.999).
Default value: 1.0
Also available on the trained model.
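The effect of learnRateAnnealing can be worked out directly from its description: if the learning rate is multiplied by the factor after each tree, the tree with zero-based index i uses learnRate * factor^i. The helper below is a hypothetical illustration of that arithmetic, not library code.

```python
def annealed_learn_rate(learn_rate: float, annealing: float, tree_index: int) -> float:
    """Learning rate used for the tree at the given zero-based index."""
    return learn_rate * annealing ** tree_index


# With learnRate = 0.1 and learnRateAnnealing = 0.99, the rate decays slowly.
for i in (0, 1, 10, 49):
    print(i, round(annealed_learn_rate(0.1, 0.99, i), 5))
```

With the default factor of 1.0 the learning rate stays constant across all trees.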
- maxAbsLeafnodePred
Maximum absolute value of a leaf node prediction.
Scala default value: 1.7976931348623157e308; Python default value: 1.7976931348623157E308
Also available on the trained model.
- maxAfterBalanceSize
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balanceClasses to be enabled.
Scala default value: 5.0f; Python default value: 5.0
Also available on the trained model.
- maxConfusionMatrixSize
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.
Default value: 20
Also available on the trained model.
- maxDepth
Maximum tree depth (0 for unlimited).
Default value: 5
Also available on the trained model.
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value: 0.0
Also available on the trained model.
- minRows
Fewest allowed (weighted) observations in a leaf.
Default value: 10.0
Also available on the trained model.
- minSplitImprovement
Minimum relative improvement in squared error reduction for a split to happen.
Scala default value: 1.0e-5; Python default value: 1.0E-5
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value: null; Python default value: None
- namedMojoOutputColumns
If enabled, the MOJO output is stored in properly named columns rather than in an array.
Scala default value: true; Python default value: True
Also available on the trained model.
- nbins
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point.
Default value: 20
Also available on the trained model.
- nbinsCats
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
Default value: 1024
Also available on the trained model.
- nbinsTopLevel
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level.
Default value: 1024
Also available on the trained model.
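Read together, the nbins and nbinsTopLevel descriptions suggest that the histogram size for numerical columns starts at nbinsTopLevel at the root, halves with each level, and does not drop below nbins. The sketch below encodes that reading. The function `bins_at_depth` is hypothetical, and the use of nbins as a floor is an assumption drawn from the "(at least)" wording rather than from the library's source.

```python
def bins_at_depth(nbins: int, nbins_top_level: int, depth: int) -> int:
    """Histogram bins for numerical columns at a given depth (root is depth 0)."""
    # Halve the top-level bin count once per level, but never go below nbins.
    return max(nbins, nbins_top_level >> depth)


# With the defaults nbins = 20 and nbinsTopLevel = 1024.
for depth in range(8):
    print(depth, bins_at_depth(20, 1024, depth))
```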
- nfolds
Number of folds for K-fold cross-validation (0 to disable or >= 2).
Default value: 0
Also available on the trained model.
- ntrees
Number of trees.
Default value: 50
Also available on the trained model.
- offsetCol
Offset column. This will be added to the combination of columns before applying the link function.
Scala default value: null; Python default value: None
Also available on the trained model.
- predNoiseBandwidth
Bandwidth (sigma) of Gaussian multiplicative noise ~N(1,sigma) for tree node predictions.
Default value: 0.0
Also available on the trained model.
- predictionCol
Prediction column name.
Default value: "prediction"
Also available on the trained model.
- quantileAlpha
Desired quantile for Quantile regression, must be between 0 and 1.
Default value: 0.5
Also available on the trained model.
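To show what the quantile chosen by quantileAlpha means in practice, the sketch below implements the standard pinball (quantile) loss in plain Python. It illustrates the general technique only; the function name is hypothetical, and nothing here is taken from H2O's internal implementation.

```python
def pinball_loss(y_true: float, y_pred: float, alpha: float = 0.5) -> float:
    """Standard quantile (pinball) loss for a single observation."""
    diff = y_true - y_pred
    # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
    return alpha * diff if diff >= 0 else (alpha - 1.0) * diff


# With alpha = 0.9, predicting too low costs more than predicting too high,
# which pushes the fitted value toward the 90th percentile.
print(pinball_loss(1.0, 0.0, alpha=0.9))
print(pinball_loss(0.0, 1.0, alpha=0.9))
```

With the default of 0.5 the loss is symmetric and equals half the absolute error, so the model targets the median.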
- sampleRate
Row sample rate per tree (from 0.0 to 1.0).
Default value: 1.0
Also available on the trained model.
- sampleRatePerClass
A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree.
Scala default value: null; Python default value: None
Also available on the trained model.
- scoreEachIteration
Whether to score during each iteration of model training.
Scala default value: false; Python default value: False
Also available on the trained model.
- scoreTreeInterval
Score the model after every so many trees. Disabled if set to 0.
Default value: 0
Also available on the trained model.
- seed
Seed for pseudo random number generator (if applicable).
Scala default value: -1L; Python default value: -1
Also available on the trained model.
- splitRatio
Accepts a value in the range [0, 1.0] that determines what fraction of the dataset is used for training; the remainder is used for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set.
Default value: 1.0
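The splitRatio arithmetic can be sketched as follows. The helper `split_counts` is hypothetical and returns the expected row counts only; the page does not say how rows are assigned, so if the split is random the real counts would be approximate.

```python
from typing import Tuple


def split_counts(n_rows: int, split_ratio: float) -> Tuple[int, int]:
    """Expected (training, validation) row counts for a given split ratio."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be in the range [0, 1.0]")
    n_train = round(n_rows * split_ratio)
    return n_train, n_rows - n_train


print(split_counts(1000, 0.8))
# The default of 1.0 keeps every row for training and leaves none for validation.
print(split_counts(1000, 1.0))
```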
- stoppingMetric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing".
Default value: "AUTO"
Also available on the trained model.
- stoppingRounds
Early stopping based on convergence of stoppingMetric. Training stops if the simple moving average of length k of the stopping metric does not improve for k scoring events, where k = stoppingRounds (0 to disable).
Default value: 0
Also available on the trained model.
- stoppingTolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).
Default value: 0.001
Also available on the trained model.
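The interplay of stoppingRounds and stoppingTolerance can be sketched in plain Python. The function below is one simplified reading of the moving-average rule for a positive metric where lower is better (such as logloss); the function name and the exact comparison are assumptions, so it should not be taken as the library's exact stopping logic.

```python
def should_stop(history, stopping_rounds, tolerance=0.001):
    """Decide whether to stop, given metric values from past scoring events.

    Assumes a positive, lower-is-better metric. Compares the best of the k most
    recent moving averages (each of length k) with the moving average before them.
    """
    k = stopping_rounds
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    recent = history[-2 * k:]
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference = averages[0]
    best_recent = min(averages[1:])
    # Stop when the relative improvement is smaller than the tolerance.
    return best_recent >= reference * (1.0 - tolerance)


print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], stopping_rounds=2))
print(should_stop([1.0, 0.5, 0.5, 0.5, 0.5, 0.5], stopping_rounds=2))
```

The first call keeps training because the metric is still improving; the second stops because the last four scoring events are flat.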
- tweediePower
Tweedie power for Tweedie regression, must be between 1 and 2.
Default value: 1.5
Also available on the trained model.
- validationDataFrame
A data frame dedicated to validating the trained model. If the parameter is not set, a validation frame is created via the ‘splitRatio’ parameter.
Scala default value: null; Python default value: None
- weightCol
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Scala default value: null; Python default value: None
Also available on the trained model.
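The equivalence described for weightCol (a weight of 2 acts like repeating the row, a weight of 0 like dropping it) can be checked with a small worked example. The snippet below uses a weighted squared error purely as an illustration; the function name and the data are made up for this sketch.

```python
def weighted_squared_error(values, weights, prediction):
    """Sum of squared errors, each scaled by its observation weight."""
    return sum(w * (v - prediction) ** 2 for v, w in zip(values, weights))


values = [1.0, 2.0, 4.0]
weights = [1.0, 2.0, 0.0]    # second row counted twice, third row excluded
expanded = [1.0, 2.0, 2.0]   # the same data with the rows physically repeated or dropped

weighted = weighted_squared_error(values, weights, prediction=1.5)
repeated = weighted_squared_error(expanded, [1.0] * len(expanded), prediction=1.5)
print(weighted, repeated)
```

Both computations give the same loss, while the weighted form keeps the data frame at its original size.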
- withContributions
Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values.
Scala default value: false; Python default value: False
Also available on the trained model.
- withLeafNodeAssignments
Enables or disables computation of leaf node assignments.
Scala default value: false; Python default value: False
Also available on the trained model.
- withStageResults
Enables or disables computation of stage results.
Scala default value: false; Python default value: False
Also available on the trained model.