.. _parameters_H2OGLM:

Parameters of H2OGLM
--------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OGLM``
- ``ai.h2o.sparkling.ml.algos.classification.H2OGLMClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OGLMRegressor``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
*(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*
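
The naming convention above can be sketched in plain Python. This is only an illustration: the helper ``accessor_names`` is hypothetical and not part of Sparkling Water, and it assumes that every parameter follows the ``get``/``set`` pattern shown in the example.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Derive the (getter, setter) method names for a parameter name."""
    if not param:
        raise ValueError("parameter name must not be empty")
    # Upper-case only the first character; the rest of the name is kept as-is.
    suffix = param[0].upper() + param[1:]
    return "get" + suffix, "set" + suffix


print(accessor_names("label"))
```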

HGLM
If set to true, an HGLM model is returned. Otherwise, a normal GLM model is returned.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

betaConstraints
Data frame of beta constraints that allows setting special conditions on the model coefficients.

*Scala default value:* ``null`` *; Python default value:* ``None``

ignoredCols
Names of columns to ignore for training.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

interactionPairs
A list of pairwise (first order) column interactions.

*Scala default value:* ``null`` *; Python default value:* ``None``

plugValues
A map containing values that will be used to impute missing values of the training/validation frame (use in conjunction with missingValuesHandling = "PlugValues").

*Scala default value:* ``null`` *; Python default value:* ``None``
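
The imputation described for ``plugValues`` can be illustrated with a minimal plain-Python sketch. The function ``plug_missing``, the row representation (a list of dictionaries with ``None`` for missing values), and the column names are all hypothetical; this does not show how H2O performs the imputation internally.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def plug_missing(rows: List[Row], plug_values: Dict[str, float]) -> List[Row]:
    """Replace missing (None) entries with the plug value of their column."""
    plugged = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            # Only columns that have a plug value are imputed (assumption).
            if value is None and column in plug_values:
                new_row[column] = plug_values[column]
            else:
                new_row[column] = value
        plugged.append(new_row)
    return plugged


rows = [{"age": 31.0, "income": None}, {"age": None, "income": 52000.0}]
print(plug_missing(rows, {"age": 40.0, "income": 45000.0}))
```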

randomCols
Names of random columns for HGLM.

*Scala default value:* ``null`` *; Python default value:* ``None``

alphaValue
Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*
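
The mixing controlled by ``alphaValue`` can be sketched in plain Python. The function ``elastic_net_penalty`` is hypothetical, and the penalty form used here (L1 term weighted by alpha, squared L2 term weighted by ``(1 - alpha) / 2``) is the commonly used elastic net definition, stated as an assumption rather than taken from this page.

```python
from typing import Sequence


def elastic_net_penalty(beta: Sequence[float], lambda_: float, alpha: float) -> float:
    """Elastic net penalty: lambda * (alpha * L1 + (1 - alpha) / 2 * L2^2)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(b) for b in beta)    # Lasso part
    l2_sq = sum(b * b for b in beta)  # Ridge part (squared L2 norm)
    return lambda_ * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


coefficients = [1.0, -2.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, elastic_net_penalty(coefficients, lambda_=1.0, alpha=alpha))
```

With ``alpha = 1`` only the Lasso term contributes, and with ``alpha = 0`` only the Ridge term does.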

aucType
Set default multinomial AUC type. Possible values are ``"AUTO"``, ``"NONE"``, ``"MACRO_OVR"``, ``"WEIGHTED_OVR"``, ``"MACRO_OVO"``, ``"WEIGHTED_OVO"``.

*Default value:* ``"AUTO"``

*Also available on the trained model.*

balanceClasses
Balance training data class counts via over/under-sampling (for imbalanced data).

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

betaEpsilon
Converge if beta changes less (using L-infinity norm) than beta epsilon. ONLY applies to the IRLSM solver.

*Scala default value:* ``1.0e-4`` *; Python default value:* ``1.0E-4``

*Also available on the trained model.*

buildNullModel
If set, will build a model with only the intercept. Defaults to false.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

calcLike
If true, will return the likelihood function value.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

checkpoint
Model checkpoint to resume training with.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

classSamplingFactors
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

coldStart
Only applicable to multiple alpha/lambda values. If false, the model for the next set of alpha/lambda values is built starting from the values provided by the current model. If true, the GLM model is built from scratch.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

columnsToCategorical
List of columns to convert to categorical before modelling.

*Scala default value:* ``Array()`` *; Python default value:* ``[]``

computePValues
Request p-values computation. P-values work only with the IRLSM solver and no regularization.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

convertInvalidNumbersToNa
If set to 'true', the model converts invalid numbers to NA when making predictions.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
If set to 'true', the model converts unknown categorical levels to NA when making predictions.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

customMetricFunc
Reference to custom evaluation function, format: ``language:keyName=funcName``.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

dataFrameSerializer
A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

*Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``

*Also available on the trained model.*

detailedPredictionCol
Column containing additional prediction details, its content depends on the model type.

*Default value:* ``"detailed_prediction"``

*Also available on the trained model.*

dispersionEpsilon
If the change in the dispersion parameter estimate or in the loglikelihood value is smaller than dispersion_epsilon, the dispersion parameter estimation loop using maximum likelihood is terminated.

*Scala default value:* ``1.0e-4`` *; Python default value:* ``1.0E-4``

*Also available on the trained model.*

dispersionLearningRate
Dispersion learning rate is only valid for tweedie family dispersion parameter estimation using ml. It must be > 0. This controls how much the dispersion parameter estimate is to be changed when the calculated loglikelihood actually decreases with the new dispersion. In this case, instead of setting new dispersion = dispersion + change, we set new dispersion = dispersion + dispersion_learning_rate * change. Defaults to 0.5.

*Default value:* ``0.5``

*Also available on the trained model.*
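
The update rule described for ``dispersionLearningRate`` can be written as a small plain-Python sketch. The function ``next_dispersion`` and its arguments are hypothetical names chosen for illustration; the sketch covers only the single update step described above, not the full estimation loop.

```python
def next_dispersion(dispersion: float, change: float,
                    loglik_decreased: bool, learning_rate: float = 0.5) -> float:
    """Return the next dispersion estimate for one update step."""
    if learning_rate <= 0:
        raise ValueError("dispersion learning rate must be > 0")
    # Damp the step only when the new dispersion lowered the loglikelihood.
    step = learning_rate * change if loglik_decreased else change
    return dispersion + step


print(next_dispersion(2.0, 1.0, loglik_decreased=False))
print(next_dispersion(2.0, 1.0, loglik_decreased=True))
```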

dispersionParameterMethod
Method used to estimate the dispersion parameter for Tweedie, Gamma and Negative Binomial only. Possible values are ``"pearson"``, ``"ml"``, ``"deviance"``.

*Default value:* ``"pearson"``

*Also available on the trained model.*

earlyStopping
Stop early when there is no more relative improvement on train or validation (if provided).

*Scala default value:* ``true`` *; Python default value:* ``True``

*Also available on the trained model.*

exportCheckpointsDir
Automatically export generated models to this directory.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

family
Family. Use binomial for classification with logistic regression, others are for regression problems. Possible values are ``"AUTO"``, ``"gaussian"``, ``"binomial"``, ``"fractionalbinomial"``, ``"quasibinomial"``, ``"poisson"``, ``"gamma"``, ``"multinomial"``, ``"tweedie"``, ``"ordinal"``, ``"negativebinomial"``.

*Default value:* ``"AUTO"``

*Also available on the trained model.*

featuresCols
Names of feature columns.

*Scala default value:* ``Array()`` *; Python default value:* ``[]``

*Also available on the trained model.*

fixDispersionParameter
Only used for Tweedie, Gamma and Negative Binomial GLM. If set, will use the dispersion parameter in init_dispersion_parameter as the standard error and use it to calculate the p-values. Defaults to false.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

fixTweedieVariancePower
If true, will fix tweedie variance power value to the value set in tweedie_variance_power.

*Scala default value:* ``true`` *; Python default value:* ``True``

*Also available on the trained model.*

foldAssignment
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

*Default value:* ``"AUTO"``

*Also available on the trained model.*

foldCol
Column with cross-validation fold index assignment per observation.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

generateScoringHistory
If set to true, will generate scoring history for GLM. This may significantly slow down the algo.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

generateVariableInflationFactors
If true, will generate variable inflation factors for numerical predictors. Defaults to false.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

gradientEpsilon
Converge if objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default (of -1.0) indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively.

*Default value:* ``-1.0``

*Also available on the trained model.*
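
One reading of the default resolution described for ``gradientEpsilon`` is sketched below in plain Python. The function ``resolve_gradient_epsilon`` is hypothetical, and the interpretation of "respectively" (1E-8 when lambda is zero, 1E-6 otherwise, if lambda search is on) is an assumption.

```python
def resolve_gradient_epsilon(value: float, lambda_search: bool, lambda_: float) -> float:
    """Resolve the effective gradient epsilon from the -1.0 default."""
    if value != -1.0:
        # An explicitly set value is used unchanged.
        return value
    if lambda_search:
        return 1e-8 if lambda_ == 0 else 1e-6
    return 1e-6 if lambda_ == 0 else 1e-4


print(resolve_gradient_epsilon(-1.0, lambda_search=False, lambda_=0.0))
print(resolve_gradient_epsilon(-1.0, lambda_search=True, lambda_=0.1))
```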

ignoreConstCols
Ignore constant columns.

*Scala default value:* ``true`` *; Python default value:* ``True``

*Also available on the trained model.*

influence
If set to dfbetas, will calculate the difference in beta when a data row is included in and excluded from the dataset. Possible values are ``"dfbetas"``.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

initDispersionParameter
Only used for Tweedie, Gamma and Negative Binomial GLM. Stores the initial value of the dispersion parameter. If fix_dispersion_parameter is set, this value will be used in the calculation of p-values. Defaults to 1.0.

*Default value:* ``1.0``

*Also available on the trained model.*

interactions
A list of predictor column indices to interact. All pairwise combinations will be computed for the list.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*
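
"All pairwise combinations" for ``interactions`` can be illustrated with a short plain-Python sketch. The function ``pairwise_interactions`` and the column names are hypothetical; the sketch only enumerates the pairs and does not compute the interaction columns themselves.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pairwise_interactions(columns: Sequence[str]) -> List[Tuple[str, str]]:
    """Enumerate every unordered pair of the given columns."""
    return list(combinations(columns, 2))


print(pairwise_interactions(["age", "income", "region"]))
```

A list of ``n`` columns yields ``n * (n - 1) / 2`` pairs, so the number of interaction terms grows quadratically with the list length.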

intercept
Include constant term in the model.

*Scala default value:* ``true`` *; Python default value:* ``True``

*Also available on the trained model.*

keepBinaryModels
If set to true, all binary models created during execution of the ``fit`` method will be kept in the DKV of the H2O-3 cluster.

*Scala default value:* ``false`` *; Python default value:* ``False``

keepCrossValidationFoldAssignment
Whether to keep the cross-validation fold assignment.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

keepCrossValidationModels
Whether to keep the cross-validation models.

*Scala default value:* ``true`` *; Python default value:* ``True``

*Also available on the trained model.*

keepCrossValidationPredictions
Whether to keep the predictions of the cross-validation models.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

labelCol
Response variable column.

*Default value:* ``"label"``

*Also available on the trained model.*

lambdaMinRatio
Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01.

*Default value:* ``-1.0``

*Also available on the trained model.*
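
The default resolution described for ``lambdaMinRatio`` can be sketched in plain Python. The function ``resolve_lambda_min_ratio`` is hypothetical. The description does not say what happens when the number of observations equals the number of variables; the sketch treats that case like the "less than" case, which is an assumption.

```python
def resolve_lambda_min_ratio(value: float, n_obs: int, n_vars: int) -> float:
    """Resolve the effective lambda_min_ratio from the -1.0 default."""
    if value != -1.0:
        # An explicitly set ratio is used unchanged.
        return value
    if n_obs > n_vars:
        return 0.0001
    # n_obs < n_vars; the n_obs == n_vars case is an assumption of this sketch.
    return 0.01


print(resolve_lambda_min_ratio(-1.0, n_obs=10000, n_vars=50))
print(resolve_lambda_min_ratio(-1.0, n_obs=100, n_vars=2000))
```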

lambdaSearch
Use lambda search starting at lambda max, given lambda is then interpreted as lambda min.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

lambdaValue
Regularization strength.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

link
Link function. Possible values are ``"family_default"``, ``"identity"``, ``"logit"``, ``"log"``, ``"inverse"``, ``"tweedie"``, ``"multinomial"``, ``"ologit"``, ``"oprobit"``, ``"ologlog"``.

*Default value:* ``"family_default"``

*Also available on the trained model.*

maxActivePredictors
Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000.

*Default value:* ``-1``

*Also available on the trained model.*

maxAfterBalanceSize
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.

*Scala default value:* ``5.0f`` *; Python default value:* ``5.0``

*Also available on the trained model.*

maxConfusionMatrixSize
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.

*Default value:* ``20``

*Also available on the trained model.*

maxIterations
Maximum number of iterations.

*Default value:* ``-1``

*Also available on the trained model.*

maxIterationsDispersion
Control the maximum number of iterations in the dispersion parameter estimation loop using maximum likelihood.

*Default value:* ``3000``

*Also available on the trained model.*

maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.

*Default value:* ``0.0``

*Also available on the trained model.*

missingValuesHandling
Handling of missing values. Either MeanImputation, Skip or PlugValues. Possible values are ``"MeanImputation"``, ``"PlugValues"``, ``"Skip"``.

*Default value:* ``"MeanImputation"``

*Also available on the trained model.*

modelId
Destination id for this model; auto-generated if not specified.

*Scala default value:* ``null`` *; Python default value:* ``None``

nfolds
Number of folds for K-fold cross-validation (0 to disable or >= 2).

*Default value:* ``0``

*Also available on the trained model.*

nlambdas
Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression) otherwise it is set to 100.

*Default value:* ``-1``

*Also available on the trained model.*

nonNegative
Restrict coefficients (not intercept) to be non-negative.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

objReg
Likelihood divider in objective value computation, default (of -1.0) will set it to 1/nobs.

*Default value:* ``-1.0``

*Also available on the trained model.*

objectiveEpsilon
Converge if objective value changes less than this. Default (of -1.0) indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001.

*Default value:* ``-1.0``

*Also available on the trained model.*

offsetCol
Offset column. This will be added to the combination of columns before applying the link function.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

predictionCol
Prediction column name.

*Default value:* ``"prediction"``

*Also available on the trained model.*

prior
Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality.

*Default value:* ``-1.0``

*Also available on the trained model.*

randomFamily
Random Component Family array. One for each random component. Only gaussian is supported for now. Possible values are ``"AUTO"``, ``"gaussian"``, ``"binomial"``, ``"fractionalbinomial"``, ``"quasibinomial"``, ``"poisson"``, ``"gamma"``, ``"multinomial"``, ``"tweedie"``, ``"ordinal"``, ``"negativebinomial"``.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

randomLink
Link function array for random component in HGLM. Possible values are ``"family_default"``, ``"identity"``, ``"logit"``, ``"log"``, ``"inverse"``, ``"tweedie"``, ``"multinomial"``, ``"ologit"``, ``"oprobit"``, ``"ologlog"``.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

removeCollinearCols
In case of linearly dependent columns, remove some of the dependent columns.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

scoreEachIteration
Whether to score during each iteration of model training.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

scoreIterationInterval
Perform scoring for every score_iteration_interval iterations.

*Default value:* ``-1``

*Also available on the trained model.*

seed
Seed for pseudo random number generator (if applicable).

*Scala default value:* ``-1L`` *; Python default value:* ``-1``

*Also available on the trained model.*

solver
AUTO will set the solver based on given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda-search with L1 penalty, L_BFGS scales better for datasets with many columns. Possible values are ``"AUTO"``, ``"IRLSM"``, ``"L_BFGS"``, ``"COORDINATE_DESCENT_NAIVE"``, ``"COORDINATE_DESCENT"``, ``"GRADIENT_DESCENT_LH"``, ``"GRADIENT_DESCENT_SQERR"``.

*Default value:* ``"AUTO"``

*Also available on the trained model.*

splitRatio
Accepts values in range [0, 1.0] which determine how large a part of the dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

*Default value:* ``1.0``
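
The effect of ``splitRatio`` on the data set sizes can be sketched in plain Python. The function ``split_counts`` is hypothetical and returns only the expected row counts; how rows are actually assigned to each part is not described here, so the real sizes may differ.

```python
from typing import Tuple


def split_counts(n_rows: int, split_ratio: float) -> Tuple[int, int]:
    """Return the expected (training, validation) row counts."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split ratio must be in [0, 1.0]")
    n_train = int(round(n_rows * split_ratio))
    return n_train, n_rows - n_train


print(split_counts(1000, 0.8))
```

With the default of ``1.0``, all rows are used for training and no validation part is created.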

standardize
Standardize numeric columns to have zero mean and unit variance.

*Scala default value:* ``true`` *; Python default value:* ``True``

*Also available on the trained model.*

startval
Double array to initialize fixed and random coefficients for HGLM, or coefficients for GLM.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*

stoppingMetric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"custom"``, ``"custom_increasing"``.

*Default value:* ``"AUTO"``

*Also available on the trained model.*

stoppingRounds
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

*Default value:* ``0``

*Also available on the trained model.*
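
The moving-average rule described for ``stoppingRounds`` can be approximated with the plain-Python sketch below. It is a simplified illustration, not the H2O implementation: the function ``should_stop`` is hypothetical, the metric is assumed to be positive with lower values being better (such as logloss or deviance), and the relative tolerance is applied as described for ``stoppingTolerance``.

```python
from typing import Sequence


def should_stop(history: Sequence[float], k: int, tolerance: float = 0.001) -> bool:
    """Decide whether to stop, given one metric value per scoring event."""
    if k == 0:
        return False  # 0 disables metric-based early stopping
    if len(history) < 2 * k:
        return False  # not enough scoring events for k + 1 moving averages
    # Simple moving averages of length k over the scoring history.
    averages = [sum(history[i:i + k]) / k for i in range(len(history) - k + 1)]
    reference = averages[-(k + 1)]  # average just before the last k events
    best_recent = min(averages[-k:])
    # Stop when the best recent average did not improve by the tolerance.
    return best_recent > reference * (1.0 - tolerance)


print(should_stop([1.0, 0.9, 0.8, 0.8, 0.8, 0.8], k=2))
print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], k=2))
```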

stoppingTolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

*Default value:* ``0.001``

*Also available on the trained model.*

theta
Theta value, used with the negative binomial family.

*Scala default value:* ``1.0e-10`` *; Python default value:* ``1.0E-10``

*Also available on the trained model.*

tweedieEpsilon
When estimating the tweedie dispersion parameter using maximum likelihood, this is used to choose the lower and upper indices in the approximation of the infinite series summation.

*Scala default value:* ``8.0e-17`` *; Python default value:* ``8.0E-17``

*Also available on the trained model.*

tweedieLinkPower
Tweedie link power.

*Default value:* ``1.0``

*Also available on the trained model.*

tweedieVariancePower
Tweedie variance power.

*Default value:* ``0.0``

*Also available on the trained model.*

validationDataFrame
A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the 'splitRatio' parameter. The parameter is not serializable!

*Scala default value:* ``null`` *; Python default value:* ``None``

weightCol
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

*Scala default value:* ``null`` *; Python default value:* ``None``

*Also available on the trained model.*
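
The equivalence between a weight and row repetition described for ``weightCol`` can be illustrated with a weighted mean in plain Python. The function ``weighted_mean`` is hypothetical and uses a simple statistic instead of a GLM fit, purely to show the idea.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of the values where each row counts as many times as its weight."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# A weight of 2 on the first row matches repeating that row twice.
print(weighted_mean([1.0, 4.0], [2, 1]), sum([1.0, 1.0, 4.0]) / 3)
# A weight of 0 excludes the second row.
print(weighted_mean([1.0, 100.0], [1, 0]))
```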

withContributions
Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

withLeafNodeAssignments
Enables or disables computation of leaf node assignments.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*

withStageResults
Enables or disables computation of stage results.

*Scala default value:* ``false`` *; Python default value:* ``False``

*Also available on the trained model.*