.. _parameters_H2OGLM:

Parameters of H2OGLM
--------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OGLM``
- ``ai.h2o.sparkling.ml.algos.classification.H2OGLMClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OGLMRegressor``

Parameters
##########

- *Each parameter has also a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*

HGLM
  If set to true, will return HGLM model.  Otherwise, normal GLM model will be returned.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

betaConstraints
  Data frame of beta constraints enabling to set special conditions over the model coefficients.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

interactionPairs
  A list of pairwise (first order) column interactions.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

plugValues
  A map containing values that will be used to impute missing values of the training/validation frame, use with conjunction missingValuesHandling = "PlugValues")

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

randomCols
  Names of random columns for HGLM.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

alphaValue
  Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

aucType
  Set default multinomial AUC type. Possible values are ``"AUTO"``, ``"NONE"``, ``"MACRO_OVR"``, ``"WEIGHTED_OVR"``, ``"MACRO_OVO"``, ``"WEIGHTED_OVO"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

balanceClasses
  Balance training data class counts via over/under-sampling (for imbalanced data).

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

betaEpsilon
  Converge if  beta changes less (using L-infinity norm) than beta esilon, ONLY applies to IRLSM solver .

  *Scala default value:* ``1.0e-4`` *; Python default value:* ``1.0E-4``
  
  *Also available on the trained model.*

buildNullModel
  If set, will build a model with only the intercept.  Default to false.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

calcLike
  if true, will return likelihood function value.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

checkpoint
  Model checkpoint to resume training with.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

classSamplingFactors
  Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

coldStart
  Only applicable to multiple alpha/lambda values.  If false, build the next model for next set of alpha/lambda values starting from the values provided by current model.  If true will start GLM model from scratch.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

computePValues
  Request p-values computation, p-values work only with IRLSM solver and no regularization.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

customMetricFunc
  Reference to custom evaluation function, format: `language:keyName=funcName`.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

dataFrameSerializer
  A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  
  *Also available on the trained model.*

detailedPredictionCol
  Column containing additional prediction details, its content depends on the model type.

  *Default value:* ``"detailed_prediction"``
  
  *Also available on the trained model.*

dispersionEpsilon
  If changes in dispersion parameter estimation or loglikelihood value is smaller than dispersion_epsilon, will break out of the dispersion parameter estimation loop using maximum likelihood.

  *Scala default value:* ``1.0e-4`` *; Python default value:* ``1.0E-4``
  
  *Also available on the trained model.*

dispersionLearningRate
  Dispersion learning rate is only valid for tweedie family dispersion parameter estimation using ml. It must be > 0.  This controls how much the dispersion parameter estimate is to be changed when the calculated loglikelihood actually decreases with the new dispersion.  In this case, instead of setting new dispersion = dispersion + change, we set new dispersion = dispersion + dispersion_learning_rate * change. Defaults to 0.5.

  *Default value:* ``0.5``
  
  *Also available on the trained model.*

dispersionParameterMethod
  Method used to estimate the dispersion parameter for Tweedie, Gamma and Negative Binomial only. Possible values are ``"pearson"``, ``"ml"``, ``"deviance"``.

  *Default value:* ``"pearson"``
  
  *Also available on the trained model.*

earlyStopping
  Stop early when there is no more relative improvement on train or validation (if provided).

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

family
  Family. Use binomial for classification with logistic regression, others are for regression problems. Possible values are ``"AUTO"``, ``"gaussian"``, ``"binomial"``, ``"fractionalbinomial"``, ``"quasibinomial"``, ``"poisson"``, ``"gamma"``, ``"multinomial"``, ``"tweedie"``, ``"ordinal"``, ``"negativebinomial"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

featuresCols
  Name of feature columns

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  
  *Also available on the trained model.*

fixDispersionParameter
  Only used for Tweedie, Gamma and Negative Binomial GLM.  If set, will use the dispsersion parameter in init_dispersion_parameter as the standard error and use it to calculate the p-values. Default to false.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

fixTweedieVariancePower
  If true, will fix tweedie variance power value to the value set in tweedie_variance_power.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

foldAssignment
  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

foldCol
  Column with cross-validation fold index assignment per observation.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

generateScoringHistory
  If set to true, will generate scoring history for GLM.  This may significantly slow down the algo.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

generateVariableInflationFactors
  if true, will generate variable inflation factors for numerical predictors.  Default to false.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

gradientEpsilon
  Converge if  objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default (of -1.0) indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

influence
  If set to dfbetas will calculate the difference in beta when a datarow is included and excluded in the dataset. Possible values are ``"dfbetas"``.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

initDispersionParameter
  Only used for Tweedie, Gamma and Negative Binomial GLM.  Store the initial value of dispersion parameter.  If fix_dispersion_parameter is set, this value will be used in the calculation of p-values.Default to 1.0.

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

interactions
  A list of predictor column indices to interact. All pairwise combinations will be computed for the list.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

intercept
  Include constant term in the model.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in DKV of H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationFoldAssignment
  Whether to keep the cross-validation fold assignment.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

keepCrossValidationModels
  Whether to keep the cross-validation models.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

keepCrossValidationPredictions
  Whether to keep the predictions of the cross-validation models.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

labelCol
  Response variable column.

  *Default value:* ``"label"``
  
  *Also available on the trained model.*

lambdaMinRatio
  Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

lambdaSearch
  Use lambda search starting at lambda max, given lambda is then interpreted as lambda min.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

lambdaValue
  Regularization strength.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

link
  Link function. Possible values are ``"family_default"``, ``"identity"``, ``"logit"``, ``"log"``, ``"inverse"``, ``"tweedie"``, ``"multinomial"``, ``"ologit"``, ``"oprobit"``, ``"ologlog"``.

  *Default value:* ``"family_default"``
  
  *Also available on the trained model.*

maxActivePredictors
  Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

maxAfterBalanceSize
  Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.

  *Scala default value:* ``5.0f`` *; Python default value:* ``5.0``
  
  *Also available on the trained model.*

maxConfusionMatrixSize
  [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.

  *Default value:* ``20``
  
  *Also available on the trained model.*

maxIterations
  Maximum number of iterations.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

maxIterationsDispersion
  Control the maximum number of iterations in the dispersion parameter estimation loop using maximum likelihood.

  *Default value:* ``3000``
  
  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

missingValuesHandling
  Handling of missing values. Either MeanImputation, Skip or PlugValues. Possible values are ``"MeanImputation"``, ``"PlugValues"``, ``"Skip"``.

  *Default value:* ``"MeanImputation"``
  
  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

nfolds
  Number of folds for K-fold cross-validation (0 to disable or >= 2).

  *Default value:* ``0``
  
  *Also available on the trained model.*

nlambdas
  Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlamdas is set to 30 (fewer lambdas are needed for ridge regression) otherwise it is set to 100.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

nonNegative
  Restrict coefficients (not intercept) to be non-negative.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

objReg
  Likelihood divider in objective value computation, default (of -1.0) will set it to 1/nobs.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

objectiveEpsilon
  Converge if  objective value changes less than this. Default (of -1.0) indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

offsetCol
  Offset column. This will be added to the combination of columns before applying the link function.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

predictionCol
  Prediction column name

  *Default value:* ``"prediction"``
  
  *Also available on the trained model.*

prior
  Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

randomFamily
  Random Component Family array.  One for each random component. Only support gaussian for now. Possible values are ``"AUTO"``, ``"gaussian"``, ``"binomial"``, ``"fractionalbinomial"``, ``"quasibinomial"``, ``"poisson"``, ``"gamma"``, ``"multinomial"``, ``"tweedie"``, ``"ordinal"``, ``"negativebinomial"``.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

randomLink
  Link function array for random component in HGLM. Possible values are ``"family_default"``, ``"identity"``, ``"logit"``, ``"log"``, ``"inverse"``, ``"tweedie"``, ``"multinomial"``, ``"ologit"``, ``"oprobit"``, ``"ologlog"``.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

removeCollinearCols
  In case of linearly dependent columns, remove some of the dependent columns.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

scoreIterationInterval
  Perform scoring for every score_iteration_interval iterations.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

seed
  Seed for pseudo random number generator (if applicable).

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``
  
  *Also available on the trained model.*

solver
  AUTO will set the solver based on given data and the other parameters. IRLSM is fast on on problems with small number of predictors and for lambda-search with L1 penalty, L_BFGS scales better for datasets with many columns. Possible values are ``"AUTO"``, ``"IRLSM"``, ``"L_BFGS"``, ``"COORDINATE_DESCENT_NAIVE"``, ``"COORDINATE_DESCENT"``, ``"GRADIENT_DESCENT_LH"``, ``"GRADIENT_DESCENT_SQERR"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

splitRatio
  Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``
  

standardize
  Standardize numeric columns to have zero mean and unit variance.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

startval
  double array to initialize fixed and random coefficients for HGLM, coefficients for GLM.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

  *Default value:* ``0``
  
  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.001``
  
  *Also available on the trained model.*

theta
  Theta.

  *Scala default value:* ``1.0e-10`` *; Python default value:* ``1.0E-10``
  
  *Also available on the trained model.*

tweedieEpsilon
  In estimating tweedie dispersion parameter using maximum likelihood, this is used to choose the lower and upper indices in the approximating of the infinite series summation.

  *Scala default value:* ``8.0e-17`` *; Python default value:* ``8.0E-17``
  
  *Also available on the trained model.*

tweedieLinkPower
  Tweedie link power.

  *Default value:* ``1.0``
  
  *Also available on the trained model.*

tweedieVariancePower
  Tweedie variance power.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

weightCol
  Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

withContributions
  Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*