Parameters of H2OKMeans
Affected Class
- ai.h2o.sparkling.ml.algos.H2OKMeans
Parameters
- Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
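The naming convention is mechanical enough to express in a few lines. The Python sketch below is illustrative only: accessor_names is a hypothetical helper, not part of Sparkling Water, and it simply applies the label -> getLabel()/setLabel(...) pattern to other parameter names listed on this page.

```python
from typing import Tuple


def accessor_names(param_name: str) -> Tuple[str, str]:
    """Return the (getter, setter) names implied by the naming convention.

    Hypothetical helper: it only derives names, it does not call Sparkling Water.
    """
    if not param_name:
        raise ValueError("parameter name must be non-empty")
    # Upper-case the first character only; the rest keeps its camelCase form.
    suffix = param_name[0].upper() + param_name[1:]
    return "get" + suffix, "set" + suffix


for name in ["label", "k", "maxIterations", "userPoints"]:
    getter, setter = accessor_names(name)
    print(f"{name}: {getter}(), {setter}(...)")
```

For example, maxIterations maps to getMaxIterations() and setMaxIterations(...).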
- ignoredCols
- Names of columns to ignore for training. Scala default value: null; Python default value: None. Also available on the trained model.
- userPoints
- Specifies an array of points, where each point represents the coordinates of an initial cluster center. The user-specified points must have the same number of columns as the training observations, and the number of rows must equal the number of clusters. Scala default value: null; Python default value: None.
- categoricalEncoding
- Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Default value: "AUTO". Also available on the trained model.
- clusterSizeConstraints
- An array specifying the minimum number of points that should be in each cluster. The length of the constraints array has to be the same as the number of clusters. Scala default value: null; Python default value: None. Also available on the trained model.
- columnsToCategorical
- List of columns to convert to categorical before modelling. Scala default value: Array(); Python default value: [].
- convertInvalidNumbersToNa
- If set to true, the model converts invalid numbers to NA when making predictions. Scala default value: false; Python default value: False. Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
- If set to true, the model converts unknown categorical levels to NA when making predictions. Scala default value: false; Python default value: False. Also available on the trained model.
- dataFrameSerializer
- The fully qualified name of the serializer used to serialize and deserialize Spark DataFrames to and from a JSON value within NullableDataFrameParam. Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer". Also available on the trained model.
- detailedPredictionCol
- Column containing additional prediction details; its content depends on the model type. Default value: "detailed_prediction". Also available on the trained model.
- estimateK
- Whether to estimate the number of clusters (<= k) iteratively and deterministically. Scala default value: false; Python default value: False. Also available on the trained model.
- exportCheckpointsDir
- Automatically export generated models to this directory. Scala default value: null; Python default value: None. Also available on the trained model.
- featuresCols
- Names of feature columns. Scala default value: Array(); Python default value: []. Also available on the trained model.
- foldAssignment
- Cross-validation fold assignment scheme, used if foldCol is not specified. The "Stratified" option stratifies the folds based on the response variable, for classification problems. Possible values are "AUTO", "Random", "Modulo", "Stratified". Default value: "AUTO". Also available on the trained model.
- foldCol
- Column with cross-validation fold index assignment per observation. Scala default value: null; Python default value: None. Also available on the trained model.
- ignoreConstCols
- Ignore constant columns. Scala default value: true; Python default value: True. Also available on the trained model.
- init
- Initialization mode. Possible values are "Random", "PlusPlus", "Furthest", "User". Default value: "Furthest". Also available on the trained model.
- k
- The maximum number of clusters. If estimateK is disabled, the model finds k centroids; otherwise, it finds up to k centroids. Default value: 1. Also available on the trained model.
- keepBinaryModels
- If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster. Scala default value: false; Python default value: False.
- keepCrossValidationFoldAssignment
- Whether to keep the cross-validation fold assignment. Scala default value: false; Python default value: False. Also available on the trained model.
- keepCrossValidationModels
- Whether to keep the cross-validation models. Scala default value: true; Python default value: True. Also available on the trained model.
- keepCrossValidationPredictions
- Whether to keep the predictions of the cross-validation models. Scala default value: false; Python default value: False. Also available on the trained model.
- maxIterations
- Maximum number of training iterations (if estimateK is enabled, this limit applies to each inner Lloyd's iteration). Default value: 10. Also available on the trained model.
- maxRuntimeSecs
- Maximum allowed runtime in seconds for model training. Use 0 to disable. Default value: 0.0. Also available on the trained model.
- modelId
- Destination ID for this model; auto-generated if not specified. Scala default value: null; Python default value: None.
- namedMojoOutputColumns
- If enabled, the MOJO output is stored in properly named columns rather than in an array. Scala default value: true; Python default value: True. Also available on the trained model.
- nfolds
- Number of folds for K-fold cross-validation (0 to disable, or >= 2). Default value: 0. Also available on the trained model.
- predictionCol
- Prediction column name. Default value: "prediction". Also available on the trained model.
- scoreEachIteration
- Whether to score during each iteration of model training. Scala default value: false; Python default value: False. Also available on the trained model.
- seed
- RNG seed. Scala default value: -1L; Python default value: -1. Also available on the trained model.
- splitRatio
- Accepts values in the range [0, 1.0] that determine what fraction of the dataset is used for training; the rest is used for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set. Default value: 1.0.
- standardize
- Standardize columns before computing distances. Scala default value: true; Python default value: True. Also available on the trained model.
- validationDataFrame
- A data frame dedicated to validating the trained model. If this parameter is not set, a validation frame is created via the splitRatio parameter. This parameter is not serializable. Scala default value: null; Python default value: None.
- withContributions
- Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values. Scala default value: false; Python default value: False. Also available on the trained model.
- withLeafNodeAssignments
- Enables or disables computation of leaf node assignments. Scala default value: false; Python default value: False. Also available on the trained model.
- withStageResults
- Enables or disables computation of stage results. Scala default value: false; Python default value: False. Also available on the trained model.
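The shape rules for userPoints and clusterSizeConstraints can be checked before training. The following Python sketch is a hypothetical pre-flight helper (check_kmeans_arrays is not a Sparkling Water function); it assumes the column count of the training observations is known up front as n_feature_cols and simply enforces the documented rules: userPoints has k rows with n_feature_cols values each, and clusterSizeConstraints has k entries.

```python
from typing import Optional, Sequence


def check_kmeans_arrays(
    k: int,
    n_feature_cols: int,
    user_points: Optional[Sequence[Sequence[float]]] = None,
    cluster_size_constraints: Optional[Sequence[int]] = None,
) -> None:
    """Hypothetical check of the userPoints / clusterSizeConstraints shape rules.

    Raises ValueError when either array does not match k (or the column count).
    """
    if user_points is not None:
        # One row per cluster ...
        if len(user_points) != k:
            raise ValueError(f"userPoints has {len(user_points)} rows, expected k={k}")
        # ... and one coordinate per training column (assumed to be n_feature_cols).
        for i, point in enumerate(user_points):
            if len(point) != n_feature_cols:
                raise ValueError(
                    f"userPoints row {i} has {len(point)} columns, expected {n_feature_cols}"
                )
    if cluster_size_constraints is not None:
        # One minimum cluster size per cluster.
        if len(cluster_size_constraints) != k:
            raise ValueError(
                f"clusterSizeConstraints has {len(cluster_size_constraints)} entries, "
                f"expected k={k}"
            )


# Two clusters over three training columns: both checks pass.
check_kmeans_arrays(
    k=2,
    n_feature_cols=3,
    user_points=[[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    cluster_size_constraints=[10, 10],
)
```

Failing fast on the client side like this is only a convenience; H2O-3 performs its own validation when training starts.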
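To show how two of the categoricalEncoding options differ conceptually, the Python sketch below implements a generic label encoding (roughly the idea behind "LabelEncoder") and a generic one-hot encoding (roughly the idea behind "OneHotExplicit"). Both helpers are hypothetical; H2O-3's actual encoders (level ordering, NA handling, extra columns) may differ.

```python
from typing import Dict, List, Sequence


def label_encode(values: Sequence[str]) -> List[int]:
    """Map each level to its index among the sorted distinct levels."""
    levels = sorted(set(values))
    index: Dict[str, int] = {lvl: i for i, lvl in enumerate(levels)}
    return [index[v] for v in values]


def one_hot_encode(values: Sequence[str]) -> List[List[int]]:
    """Expand each value into one 0/1 indicator per distinct (sorted) level."""
    levels = sorted(set(values))
    return [[int(v == lvl) for lvl in levels] for v in values]


colors = ["red", "blue", "red", "green"]
print(label_encode(colors))    # [2, 0, 2, 1]
print(one_hot_encode(colors))  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

For k-means the choice matters: label encoding places "blue" (0) closer to "green" (1) than to "red" (2), an ordering that usually carries no meaning, while one-hot encoding keeps every pair of distinct levels at the same distance.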
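The effect of convertUnknownCategoricalLevelsToNa can be sketched as a lookup against the levels seen during training. In the Python sketch below, encode_for_scoring is a hypothetical helper: unseen levels become None (standing in for NA) when the flag is enabled, and raise an error when it is disabled. Treating the disabled case as an error is an assumption of this sketch; the exact scoring-time behavior is defined by the MOJO runtime.

```python
from typing import Dict, List, Optional, Sequence


def encode_for_scoring(
    values: Sequence[str], domain: Sequence[str], convert_unknown_to_na: bool
) -> List[Optional[int]]:
    """Map scoring-time levels to training-time level indices (hypothetical helper)."""
    index: Dict[str, int] = {lvl: i for i, lvl in enumerate(domain)}
    encoded: List[Optional[int]] = []
    for v in values:
        if v in index:
            encoded.append(index[v])
        elif convert_unknown_to_na:
            # Level never seen during training: treat it as missing (NA).
            encoded.append(None)
        else:
            raise ValueError(f"unknown categorical level: {v!r}")
    return encoded


print(encode_for_scoring(["a", "z", "b"], domain=["a", "b"], convert_unknown_to_na=True))
```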
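To illustrate what the nfolds and foldAssignment parameters control, here is a Python sketch of two of the listed schemes. It assumes "Modulo" assigns a row to fold (row index mod nfolds) and "Random" draws each row's fold independently from a seeded generator; assign_folds is a hypothetical helper, and H2O-3's own implementation (including how "AUTO" and "Stratified" resolve) is not reproduced.

```python
import random
from typing import List


def assign_folds(n_rows: int, nfolds: int, scheme: str = "Modulo", seed: int = 42) -> List[int]:
    """Assign each row a fold index in [0, nfolds) (illustrative sketch only)."""
    if nfolds < 2:
        # nfolds = 0 disables cross-validation; 1 is not a valid fold count.
        raise ValueError("nfolds must be >= 2 to assign folds")
    if scheme == "Modulo":
        # Deterministic: fold = row index modulo nfolds.
        return [i % nfolds for i in range(n_rows)]
    if scheme == "Random":
        # Seeded, so the same seed reproduces the same assignment.
        rng = random.Random(seed)
        return [rng.randrange(nfolds) for _ in range(n_rows)]
    raise ValueError(f"scheme not covered by this sketch: {scheme}")


print(assign_folds(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```

Modulo gives perfectly balanced folds but depends on row order, whereas Random ignores row order at the cost of slightly uneven fold sizes.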
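One plausible reading of the "Furthest" init mode is a farthest-first traversal: choose one center at random, then repeatedly add the point that lies farthest from its nearest existing center. The Python sketch below implements that generic technique under the assumption that it captures the idea behind init = "Furthest"; farthest_first_centers is hypothetical and is not H2O-3's distributed implementation.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def farthest_first_centers(points: List[Point], k: int, seed: int = 0) -> List[Point]:
    """Pick k initial centers with a generic farthest-first traversal."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # First center: a random point.
    centers = [points[rng.randrange(len(points))]]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all current centers.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers
```

Spreading the initial centers apart like this tends to avoid starting two centers inside the same dense region, which plain random initialization can do.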
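maxIterations and maxRuntimeSecs both cap the Lloyd's iterations that refine the cluster centers. The following Python sketch of plain, single-machine Lloyd's k-means shows where each limit takes effect; lloyd is a hypothetical helper, it does not model estimateK, and H2O-3's distributed implementation differs in detail.

```python
import math
import time
from typing import List, Tuple

Point = Tuple[float, ...]


def lloyd(
    points: List[Point],
    centers: List[Point],
    max_iterations: int = 10,
    max_runtime_secs: float = 0.0,
) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means: alternate assignment and center-update steps.

    Stops after max_iterations rounds, when assignments stop changing, or
    (if max_runtime_secs > 0) once the time budget is spent; 0 disables the budget.
    """
    start = time.monotonic()
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if max_runtime_secs > 0 and time.monotonic() - start >= max_runtime_secs:
            break  # time budget exhausted
    return centers, labels


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(lloyd(pts, [(0.0, 0.0), (10.0, 0.0)]))
```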
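The splitRatio semantics can be sketched as a seeded random split in which each row goes to training with probability splitRatio. The Python helper below (split_by_ratio, hypothetical) illustrates the idea only; the exact splitting mechanism Sparkling Water uses, and how it applies the seed, are assumptions not stated on this page.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(rows: Sequence[T], split_ratio: float, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly send about split_ratio of the rows to training, the rest to validation."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("splitRatio must be in [0, 1.0]")
    rng = random.Random(seed)
    train: List[T] = []
    valid: List[T] = []
    for row in rows:
        # rng.random() is in [0, 1), so 1.0 keeps every row and 0.0 keeps none.
        (train if rng.random() < split_ratio else valid).append(row)
    return train, valid


train, valid = split_by_ratio(list(range(1000)), 0.8, seed=7)
print(len(train), len(valid))  # roughly 800 and 200
```

Because each row is decided independently, the split sizes are only approximately 80/20; with the default of 1.0 every row goes to training and no validation frame is produced.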
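Standardizing matters because k-means compares points by Euclidean distance, so a column measured on a large scale would otherwise dominate the distance. As a sketch of what standardize = true implies, the Python helper below rescales each column to z-scores; it is hypothetical, uses the population standard deviation, and only centers constant columns, all of which are assumptions rather than H2O-3's documented behavior.

```python
import statistics
from typing import List, Sequence


def standardize_columns(rows: List[Sequence[float]]) -> List[List[float]]:
    """Rescale each column to mean 0 and standard deviation 1 (z-scores).

    Constant columns (std == 0) are only centered, to avoid dividing by zero.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Without scaling, the second column (hundreds) would swamp the first (units).
print(standardize_columns([(1.0, 100.0), (3.0, 300.0)]))
```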