Parameters of H2OAutoML
Affected Classes
- ai.h2o.sparkling.ml.algos.H2OAutoML
- ai.h2o.sparkling.ml.algos.classification.H2OAutoMLClassifier
- ai.h2o.sparkling.ml.algos.regression.H2OAutoMLRegressor
Parameters
- Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
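As an illustration of the naming convention in the note above, the following Python sketch derives accessor names from a parameter name. The helper `accessor_names` is hypothetical and not part of Sparkling Water; it only assumes that accessors are formed by capitalizing the first letter of the parameter name, as in the label example.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Return the (getter, setter) names assumed for a parameter name.

    Hypothetical helper: it mirrors the 'label -> getLabel(), setLabel(...)'
    pattern and does not query the library itself.
    """
    if not param:
        raise ValueError("parameter name must be non-empty")
    # Capitalize only the first character so camelCase is preserved.
    capitalized = param[0].upper() + param[1:]
    return f"get{capitalized}", f"set{capitalized}"


if __name__ == "__main__":
    for name in ("label", "maxModels", "splitRatio"):
        getter, setter = accessor_names(name)
        print(f"{name}: {getter}(), {setter}(...)")
```

Under that assumption, a parameter such as maxModels would be read with getMaxModels() and changed with setMaxModels(...).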
- blendingDataFrame
- Data frame used to compute the predictions that serve as the training frame for the meta-learner. If provided, it triggers blending mode in the stacked ensemble training stage. Blending mode is faster than cross-validating the base learners, though these ensembles may not perform as well as the Super Learner ensemble. Scala default value: null; Python default value: None
- ignoredCols
- Names of columns to ignore for training. Scala default value: null; Python default value: None
- leaderboardDataFrame
- Data frame used to score and rank models on the leaderboard. This data frame is not used for anything besides leaderboard scoring. Scala default value: null; Python default value: None
- monotoneConstraints
- Map of monotonicity constraints. Each key must be a feature name and each value must be 1 (increasing) or -1 (decreasing). Scala default value: Map(); Python default value: {}
- balanceClasses
- Balance training data class counts via over/under-sampling (for imbalanced data). Scala default value: false; Python default value: False
- classSamplingFactors
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors are computed automatically to obtain class balance during training. Requires balanceClasses. Scala default value: null; Python default value: None
- columnsToCategorical
- List of columns to convert to categorical before modelling. Scala default value: Array(); Python default value: []
- convertInvalidNumbersToNa
- If set to true, the model converts invalid numbers to NA when making predictions. Scala default value: false; Python default value: False
- convertUnknownCategoricalLevelsToNa
- If set to true, the model converts unknown categorical levels to NA when making predictions. Scala default value: false; Python default value: False
- detailedPredictionCol
- Column containing additional prediction details; its content depends on the model type. Default value: "detailed_prediction"
- excludeAlgos
- A list of algorithms to skip during the model-building phase. Possible values are "GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble", "XGBoost". Scala default value: null; Python default value: None
- exploitationRatio
- The budget ratio (between 0 and 1) dedicated to the exploitation (vs. exploration) phase. Default value: -1.0
- exportCheckpointsDir
- Path to a directory where every generated model will be stored. Scala default value: null; Python default value: None
- featuresCols
- Names of the feature columns. Scala default value: Array(); Python default value: []
- foldCol
- Fold column (contains fold IDs) in the training frame. These assignments are used to create the folds for cross-validation of the models. Scala default value: null; Python default value: None
- includeAlgos
- A list of algorithms to restrict to during the model-building phase. Possible values are "GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble", "XGBoost". Scala default value: Array("GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble", "XGBoost"); Python default value: ["GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble", "XGBoost"]
- keepBinaryModels
- If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster. Scala default value: false; Python default value: False
- keepCrossValidationFoldAssignment
- Whether to keep cross-validation fold assignments. Scala default value: false; Python default value: False
- keepCrossValidationModels
- Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster. Scala default value: false; Python default value: False
- keepCrossValidationPredictions
- Whether to keep the cross-validation predictions. This needs to be set to true if the same AutoML object is used for repeated runs, because CV predictions are required to build additional Stacked Ensemble models in AutoML. Scala default value: false; Python default value: False
- labelCol
- Response column. Default value: "label"
- maxAfterBalanceSize
- Maximum relative size of the training data after balancing class counts (defaults to 5.0 and can be less than 1.0). Requires balanceClasses. Scala default value: 5.0f; Python default value: 5.0
- maxModels
- Maximum number of models to build (optional). Default value: 0
- maxRuntimeSecs
- Maximum time, in seconds, that the AutoML process will run for. If neither maxRuntimeSecs nor maxModels is specified by the user, maxRuntimeSecs defaults to 3600 seconds (1 hour). Default value: 0.0
- maxRuntimeSecsPerModel
- Maximum time to spend on each individual model (optional). Default value: 0.0
- namedMojoOutputColumns
- If set to true, the MOJO output is stored in properly named columns rather than in an array. Scala default value: true; Python default value: True
- nfolds
- Number of folds for k-fold cross-validation (defaults to 5; must be >= 2, or use 0 to disable). Disabling cross-validation prevents Stacked Ensembles from being built. Default value: 5
- predictionCol
- Prediction column name. Default value: "prediction"
- projectName
- Optional project name used to group models from multiple AutoML runs into a single leaderboard; derived from the training data name if not specified. Scala default value: null; Python default value: None
- seed
- Seed for the random number generator; set to a value other than -1 for reproducibility. Scala default value: -1L; Python default value: -1
- sortMetric
- Metric used to sort the leaderboard. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "mean_per_class_error". Default value: "AUTO"
- splitRatio
- Accepts values in the range [0, 1.0] that determine what fraction of the dataset is used for training, with the remainder used for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set. Default value: 1.0
- stoppingMetric
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing". Default value: "AUTO"
- stoppingRounds
- Early stopping based on convergence of stoppingMetric. Training stops if the simple moving average of length k of the stopping metric does not improve for k scoring events, where k is stoppingRounds (0 to disable). Default value: 3
- stoppingTolerance
- Relative tolerance for the metric-based stopping criterion (stop if the relative improvement is not at least this much). Default value: -1.0
- validationDataFrame
- A data frame dedicated to validation of the trained model. If this parameter is not set, a validation frame is created according to the splitRatio parameter. Scala default value: null; Python default value: None
- weightCol
- Weights column in the training frame, which specifies the row weights used in model training. Scala default value: null; Python default value: None
- withContributions
- Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values. Scala default value: false; Python default value: False
- withLeafNodeAssignments
- Enables or disables computation of leaf node assignments. Scala default value: false; Python default value: False
- withStageResults
- Enables or disables computation of stage results. Scala default value: false; Python default value: False
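The interaction between stoppingRounds and stoppingTolerance described above can be hard to picture from prose alone. The following Python sketch shows one plausible reading of the rule: compare moving averages of length k over the most recent scoring events and stop when the relative improvement falls below the tolerance. It is a simplified illustration, not H2O's actual implementation; the function `should_stop`, its arguments, and the choice to look at the last 2k scoring events are assumptions. The sketch also does not model the -1.0 default of stoppingTolerance and requires an explicit non-negative value.

```python
from typing import Sequence


def should_stop(
    history: Sequence[float],
    stopping_rounds: int,
    stopping_tolerance: float,
    lower_is_better: bool = True,
) -> bool:
    """Decide whether to stop early, given metric values per scoring event.

    Hypothetical sketch: uses the last 2*k scoring events to build k+1
    moving averages of length k, then compares the oldest average with
    the best of the later ones.
    """
    k = stopping_rounds
    if k <= 0:
        return False  # 0 disables early stopping
    if stopping_tolerance < 0:
        raise ValueError("this sketch needs an explicit non-negative tolerance")
    if len(history) < 2 * k:
        return False  # not enough scoring events yet

    recent = list(history[-2 * k:])
    # Moving averages of length k: windows [0:k], [1:k+1], ..., [k:2k].
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference, later = averages[0], averages[1:]

    if lower_is_better:
        improvement = reference - min(later)
    else:
        improvement = max(later) - reference
    # Express the improvement relative to the reference where possible.
    if reference != 0:
        improvement /= abs(reference)
    return improvement < stopping_tolerance


if __name__ == "__main__":
    improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]  # e.g. logloss still falling
    plateau = [0.5] * 6                          # no change at all
    print(should_stop(improving, stopping_rounds=3, stopping_tolerance=0.01))
    print(should_stop(plateau, stopping_rounds=3, stopping_tolerance=0.01))
```

Metrics such as AUC, where larger values are better, would be passed with `lower_is_better=False` in this sketch.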