Parameters of H2OPCA
Affected Class
ai.h2o.sparkling.ml.features.H2OPCA
Parameters
Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
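The naming convention above can be sketched in a few lines of plain Python. This is only an illustration of the pattern (capitalize the first letter and prefix it with `get` or `set`); the helper `accessor_names` is hypothetical and is not part of Sparkling Water.

```python
def accessor_names(parameter_name):
    """Derive getter and setter names by capitalizing the first letter.

    Hypothetical helper for illustration only; it mirrors the
    label -> getLabel(), setLabel(...) pattern described above.
    """
    if not parameter_name:
        raise ValueError("parameter_name must not be empty")
    capitalized = parameter_name[0].upper() + parameter_name[1:]
    return "get" + capitalized, "set" + capitalized


# Print the accessor pair for a few of the parameters documented on this page.
for name in ("k", "pcaMethod", "inputCols"):
    getter, setter = accessor_names(name)
    print(f"{name}: {getter}(), {setter}(...)")
```

Applied to the parameters listed below, this yields names such as getK() and setK(...) for k.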
- ignoredCols
Names of columns to ignore for training.
Scala default value: null; Python default value: None
Also available on the trained model.
- outputCol
Output column name.
Default value: "H2OPCA_29072b5c6db2__output"
Also available on the trained model.
- columnsToCategorical
List of columns to convert to categorical before modelling.
Scala default value: Array(); Python default value: []
- computeMetrics
Whether to compute metrics on the training data.
Scala default value: true; Python default value: True
Also available on the trained model.
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA when making predictions.
Scala default value: false; Python default value: False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.
Scala default value: false; Python default value: False
Also available on the trained model.
- dataFrameSerializer
The fully qualified name of the serializer used to serialize and deserialize Spark DataFrames to and from a JSON value within NullableDataFrameParam.
Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer"
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value: null; Python default value: None
Also available on the trained model.
- ignoreConstCols
Ignore constant columns.
Scala default value: true; Python default value: True
Also available on the trained model.
- imputeMissing
Whether to impute missing entries with the column mean.
Scala default value: false; Python default value: False
Also available on the trained model.
- inputCols
The array of input columns.
Scala default value: Array(); Python default value: []
Also available on the trained model.
- k
Rank of matrix approximation.
Default value: 1
Also available on the trained model.
- keepBinaryModels
If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster.
Scala default value: false; Python default value: False
- maxIterations
Maximum training iterations.
Default value: 1000
Also available on the trained model.
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value: 0.0
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value: null; Python default value: None
- pcaImpl
Specify the implementation to use for computing PCA (via SVD or EVD): MTJ_EVD_DENSEMATRIX - eigenvalue decomposition of a dense matrix using MTJ; MTJ_EVD_SYMMMATRIX - eigenvalue decomposition of a symmetric matrix using MTJ; MTJ_SVD_DENSEMATRIX - singular value decomposition of a dense matrix using MTJ; JAMA - eigenvalue decomposition of a dense matrix using JAMA. References: JAMA - http://math.nist.gov/javanumerics/jama/; MTJ - https://github.com/fommil/matrix-toolkits-java/.
Possible values are "MTJ_EVD_DENSEMATRIX", "MTJ_EVD_SYMMMATRIX", "MTJ_SVD_DENSEMATRIX", "JAMA".
Default value: "MTJ_EVD_SYMMMATRIX"
Also available on the trained model.
- pcaMethod
Specify the algorithm to use for computing the principal components: GramSVD - uses a distributed computation of the Gram matrix, followed by a local SVD; Power - computes the SVD using the power iteration method (experimental); Randomized - uses the randomized subspace iteration method; GLRM - fits a generalized low-rank model with an L2 loss function and no regularization, and solves for the SVD using local matrix algebra (experimental).
Possible values are "GramSVD", "Power", "Randomized", "GLRM".
Default value: "GramSVD"
Also available on the trained model.
- scoreEachIteration
Whether to score during each iteration of model training.
Scala default value: false; Python default value: False
Also available on the trained model.
- seed
RNG seed for initialization.
Scala default value: -1L; Python default value: -1
Also available on the trained model.
- splitRatio
Accepts values in the range [0, 1.0], which determine what fraction of the dataset is used for training and what fraction for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set.
Default value: 1.0
- transform
Transformation of training data. Possible values are "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE".
Default value: "NONE"
Also available on the trained model.
- useAllFactorLevels
Whether first factor level is included in each categorical expansion.
Scala default value: false; Python default value: False
Also available on the trained model.
- validationDataFrame
A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the ‘splitRatio’ parameter. The parameter is not serializable!
Scala default value: null; Python default value: None
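To make the pcaMethod descriptions above more concrete, the following is a minimal sketch of the general idea behind a Gram matrix combined with power iteration, written in plain Python. It is not H2O's implementation: the functions `gram_matrix` and `power_iteration` are hypothetical, the `max_iterations` argument only echoes the maxIterations parameter by name, and the input is assumed to be already demeaned.

```python
import math


def gram_matrix(rows):
    """Return the Gram matrix X^T X of a row-major list of numeric rows."""
    n_cols = len(rows[0])
    return [
        [sum(row[i] * row[j] for row in rows) for j in range(n_cols)]
        for i in range(n_cols)
    ]


def power_iteration(matrix, max_iterations=1000, tolerance=1e-12):
    """Approximate the dominant eigenpair of a symmetric PSD matrix."""
    size = len(matrix)
    # Deterministic start vector for reproducibility (an assumption of this
    # sketch; implementations commonly use a seeded random vector instead).
    vector = [1.0 / math.sqrt(size)] * size
    for _ in range(max_iterations):
        product = [
            sum(matrix[i][j] * vector[j] for j in range(size))
            for i in range(size)
        ]
        norm = math.sqrt(sum(value * value for value in product))
        if norm == 0.0:
            break  # the current vector is mapped to zero; nothing to refine
        next_vector = [value / norm for value in product]
        change = max(abs(a - b) for a, b in zip(next_vector, vector))
        vector = next_vector
        if change < tolerance:
            break
    # The Rayleigh quotient v^T A v estimates the eigenvalue for a unit vector.
    eigenvalue = sum(
        vector[i] * matrix[i][j] * vector[j]
        for i in range(size)
        for j in range(size)
    )
    return eigenvalue, vector


# Toy data whose columns already have zero mean.
rows = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
eigenvalue, component = power_iteration(gram_matrix(rows))
print(eigenvalue, component)
```

For this toy input the Gram matrix is diagonal with entries 8 and 2, so the sketch converges to an eigenvalue close to 8 and a first component close to [1, 0]. Further components would require deflation or a different method, which the sketch leaves out.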