.. _parameters_H2OGAM:

Parameters of H2OGAM
--------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OGAM``
- ``ai.h2o.sparkling.ml.algos.classification.H2OGAMClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OGAMRegressor``

Parameters
##########

- *Each parameter has also a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*

betaConstraints
  Data frame of beta constraints enabling to set special conditions over the model coefficients.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

alphaValue
  Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

aucType
  Set default multinomial AUC type. Possible values are ``"AUTO"``, ``"NONE"``, ``"MACRO_OVR"``, ``"WEIGHTED_OVR"``, ``"MACRO_OVO"``, ``"WEIGHTED_OVO"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

balanceClasses
  Balance training data class counts via over/under-sampling (for imbalanced data).

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

betaEpsilon
  Converge if  beta changes less (using L-infinity norm) than beta esilon, ONLY applies to IRLSM solver .

  *Scala default value:* ``1.0e-4`` *; Python default value:* ``1.0E-4``
  
  *Also available on the trained model.*

bs
  Basis function type for each gam predictors, 0 for cr, 1 for thin plate regression with knots, 2 for monotone I-splines, 3 for NBSplineTypeI M-splines (refer to doc here: https://github.com/h2oai/h2o-3/issues/6926).  If specified, must be the same size as gam_columns.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

classSamplingFactors
  Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

coldStart
  Only applicable to multiple alpha/lambda values when calling GLM from GAM.  If false, build the next model for next set of alpha/lambda values starting from the values provided by current model.  If true will start GLM model from scratch.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

computePValues
  Request p-values computation, p-values work only with IRLSM solver and no regularization.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

dataFrameSerializer
  A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  
  *Also available on the trained model.*

detailedPredictionCol
  Column containing additional prediction details, its content depends on the model type.

  *Default value:* ``"detailed_prediction"``
  
  *Also available on the trained model.*

earlyStopping
  Stop early when there is no more relative improvement on train or validation (if provided).

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

family
  Family. Use binomial for classification with logistic regression, others are for regression problems. Possible values are ``"AUTO"``, ``"gaussian"``, ``"binomial"``, ``"fractionalbinomial"``, ``"quasibinomial"``, ``"poisson"``, ``"gamma"``, ``"multinomial"``, ``"tweedie"``, ``"ordinal"``, ``"negativebinomial"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

featuresCols
  Name of feature columns

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  
  *Also available on the trained model.*

foldAssignment
  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

foldCol
  Column with cross-validation fold index assignment per observation.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

gainsliftBins
  Gains/Lift table number of bins. 0 means disabled.. Default value -1 means automatic binning.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

gamCols
  Arrays of predictor column names for gam for smoothers using single or multiple predictors like {{'c1'},{'c2','c3'},{'c4'},...}

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

gradientEpsilon
  Converge if  objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

interactions
  A list of predictor column indices to interact. All pairwise combinations will be computed for the list.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

intercept
  Include constant term in the model.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in DKV of H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationFoldAssignment
  Whether to keep the cross-validation fold assignment.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

keepCrossValidationModels
  Whether to keep the cross-validation models.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

keepCrossValidationPredictions
  Whether to keep the predictions of the cross-validation models.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

keepGamCols
  Save keys of model matrix.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

knotIds
  Array storing frame keys of knots.  One for each gam column set specified in gam_columns.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

labelCol
  Response variable column.

  *Default value:* ``"label"``
  
  *Also available on the trained model.*

lambdaSearch
  Use lambda search starting at lambda max, given lambda is then interpreted as lambda min.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

lambdaValue
  Regularization strength.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

link
  Link function. Possible values are ``"family_default"``, ``"identity"``, ``"logit"``, ``"log"``, ``"inverse"``, ``"tweedie"``, ``"multinomial"``, ``"ologit"``, ``"oprobit"``, ``"ologlog"``.

  *Default value:* ``"family_default"``
  
  *Also available on the trained model.*

maxActivePredictors
  Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

maxAfterBalanceSize
  Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.

  *Scala default value:* ``5.0f`` *; Python default value:* ``5.0``
  
  *Also available on the trained model.*

maxConfusionMatrixSize
  [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.

  *Default value:* ``20``
  
  *Also available on the trained model.*

maxIterations
  Maximum number of iterations.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

missingValuesHandling
  Handling of missing values. Either MeanImputation, Skip or PlugValues. Possible values are ``"MeanImputation"``, ``"PlugValues"``, ``"Skip"``.

  *Default value:* ``"MeanImputation"``
  
  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

nfolds
  Number of folds for K-fold cross-validation (0 to disable or >= 2).

  *Default value:* ``0``
  
  *Also available on the trained model.*

nlambdas
  Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlamdas is set to 30 (fewer lambdas are needed for ridge regression) otherwise it is set to 100.

  *Default value:* ``-1``
  
  *Also available on the trained model.*

nonNegative
  Restrict coefficients (not intercept) to be non-negative.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

numKnots
  Number of knots for gam predictors.  If specified, must specify one for each gam predictor.  For monotone I-splines, mininum = 2, for cs spline, minimum = 3.  For thin plate, minimum is size of polynomial basis + 2.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

objReg
  Likelihood divider in objective value computation, default is 1/nobs.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

objectiveEpsilon
  Converge if  objective value changes less than this. Default indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

offsetCol
  Offset column. This will be added to the combination of columns before applying the link function.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

predictionCol
  Prediction column name

  *Default value:* ``"prediction"``
  
  *Also available on the trained model.*

prior
  Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality.

  *Default value:* ``-1.0``
  
  *Also available on the trained model.*

removeCollinearCols
  In case of linearly dependent columns, remove some of the dependent columns.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

scale
  Smoothing parameter for gam predictors.  If specified, must be of the same length as gam_columns.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

scaleTpPenaltyMat
  Scale penalty matrix for tp (thin plate) smoothers as in R.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

seed
  Seed for pseudo random number generator (if applicable).

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``
  

solver
  AUTO will set the solver based on given data and the other parameters. IRLSM is fast on on problems with small number of predictors and for lambda-search with L1 penalty, L_BFGS scales better for datasets with many columns. Possible values are ``"AUTO"``, ``"IRLSM"``, ``"L_BFGS"``, ``"COORDINATE_DESCENT_NAIVE"``, ``"COORDINATE_DESCENT"``, ``"GRADIENT_DESCENT_LH"``, ``"GRADIENT_DESCENT_SQERR"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

splineOrders
  Order of I-splines or NBSplineTypeI M-splines used for gam predictors. If specified, must be the same size as gam_columns.  For I-splines, the spline_orders will be the same as the polynomials used to generate the splines.  For M-splines, the polynomials used to generate the splines will be spline_order-1.  Values for bs=0 or 1 will be ignored.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

splinesNonNegative
  Valid for I-spline (bs=2) only.  True if the I-splines are monotonically increasing (and monotonically non-decreasing) and False if the I-splines are monotonically decreasing (and monotonically non-increasing).  If specified, must be the same size as gam_columns.  Values for other spline types will be ignored.  Default to true.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

splitRatio
  Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``
  

standardize
  Standardize numeric columns to have zero mean and unit variance.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

standardizeTpGamCols
  standardize tp (thin plate) predictor columns.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

startval
  double array to initialize coefficients for GAM.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"AUUC"``, ``"ATE"``, ``"ATT"``, ``"ATC"``, ``"qini"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

  *Default value:* ``0``
  
  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.001``
  
  *Also available on the trained model.*

storeKnotLocations
  If set to true, will return knot locations as double[][] array for gam column names found knots_for_gam.  Default to false.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

theta
  Theta.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

tweedieLinkPower
  Tweedie link power.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

tweedieVariancePower
  Tweedie variance power.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

weightCol
  Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

withContributions
  Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*