Parameters of H2OAutoEncoder
Affected Class
ai.h2o.sparkling.ml.features.H2OAutoEncoder
Parameters
Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
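The naming convention can be illustrated with a small, self-contained Python sketch. The helper `accessor_names` is hypothetical and not part of Sparkling Water; it only shows how a parameter name maps to the names of its getter and setter.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Return the (getter, setter) method names for a parameter name.

    Hypothetical helper for illustration only; not a Sparkling Water API.
    """
    if not param:
        raise ValueError("parameter name must not be empty")
    # Upper-case only the first character; the rest keeps its camelCase form.
    suffix = param[0].upper() + param[1:]
    return "get" + suffix, "set" + suffix


print(accessor_names("label"))  # ('getLabel', 'setLabel')
print(accessor_names("hidden"))  # ('getHidden', 'setHidden')
```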
- activation
Activation function. Possible values are "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout", "ExpRectifier", "ExpRectifierWithDropout".
Default value:
"Rectifier"
Also available on the trained model.
- adaptiveRate
Adaptive learning rate.
Scala default value: true; Python default value: True
Also available on the trained model.
- mseCol
MSE column name. This column contains mean square error calculated from original and output values.
Default value:
"H2OAutoEncoder_e79a454231a0__mse"
Also available on the trained model.
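As a rough illustration of what this column holds, the sketch below computes a per-row mean squared error between an input vector and its reconstruction. It assumes the error is averaged over all features of the row and ignores any internal preprocessing such as standardization or categorical encoding. The function name `row_mse` and the sample values are made up for illustration.

```python
from typing import Sequence


def row_mse(original: Sequence[float], output: Sequence[float]) -> float:
    """Mean squared error between an input row and its reconstruction."""
    if len(original) != len(output):
        raise ValueError("original and output must have the same length")
    if len(original) == 0:
        raise ValueError("rows must contain at least one value")
    squared_errors = [(o - r) ** 2 for o, r in zip(original, output)]
    return sum(squared_errors) / len(squared_errors)


original = [1.0, 2.0, 3.0]       # hypothetical values fed to the network
reconstructed = [1.0, 2.5, 2.0]  # hypothetical values produced by the network
print(row_mse(original, reconstructed))  # (0 + 0.25 + 1.0) / 3
```

Rows that the network reconstructs poorly get a larger value, which is why this column is commonly used as an anomaly indicator.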
- originalCol
Original column name. This column contains input values to the neural network of auto encoder.
Default value:
"H2OAutoEncoder_e79a454231a0__original"
Also available on the trained model.
- withMSECol
A flag identifying whether a column with mean square error will be produced or not.
Scala default value: false; Python default value: False
Also available on the trained model.
- withOriginalCol
A flag identifying whether a column with input values to the neural network will be produced or not.
Scala default value: false; Python default value: False
Also available on the trained model.
- ignoredCols
Names of columns to ignore for training.
Scala default value: null; Python default value: None
Also available on the trained model.
- initialBiases
An array of weight vectors to be used for bias initialization of every network layer. If this parameter is set, the parameter ‘initialWeights’ has to be set as well.
Scala default value: null; Python default value: None
- initialWeights
An array of weight matrices to be used for initialization of the neural network. If this parameter is set, the parameter ‘initialBiases’ has to be set as well.
Scala default value: null; Python default value: None
- outputCol
Output column name
Default value:
"H2OAutoEncoder_e79a454231a0__output"
Also available on the trained model.
- averageActivation
Average activation for sparse auto-encoder. #Experimental.
Default value:
0.0
Also available on the trained model.
- calculateFeatureImportances
Compute variable importances for input features (Gedeon method) - can be slow for large networks.
Scala default value: true; Python default value: True
Also available on the trained model.
- categoricalEncoding
Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".
Default value:
"AUTO"
Also available on the trained model.
- columnsToCategorical
List of columns to convert to categorical before modelling
Scala default value: Array(); Python default value: []
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA when making predictions.
Scala default value: false; Python default value: False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.
Scala default value: false; Python default value: False
Also available on the trained model.
- dataFrameSerializer
A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.
Default value:
"ai.h2o.sparkling.utils.JSONDataFrameSerializer"
Also available on the trained model.
- diagnostics
Enable diagnostics for hidden layers.
Scala default value: true; Python default value: True
Also available on the trained model.
- elasticAveraging
Elastic averaging between compute nodes can improve distributed model convergence. #Experimental.
Scala default value: false; Python default value: False
Also available on the trained model.
- elasticAveragingMovingRate
Elastic averaging moving rate (only if elastic averaging is enabled).
Default value:
0.9
Also available on the trained model.
- elasticAveragingRegularization
Elastic averaging regularization strength (only if elastic averaging is enabled).
Default value:
0.001
Also available on the trained model.
- epochs
How many times the dataset should be iterated (streamed), can be fractional.
Default value:
10.0
Also available on the trained model.
- epsilon
Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).
Scala default value: 1.0e-8; Python default value: 1.0E-8
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value: null; Python default value: None
Also available on the trained model.
- exportWeightsAndBiases
Whether to export Neural Network weights and biases to H2O Frames.
Scala default value: false; Python default value: False
Also available on the trained model.
- fastMode
Enable fast mode (minor approximation in back-propagation).
Scala default value: true; Python default value: True
Also available on the trained model.
- forceLoadBalance
Force extra load balancing to increase training speed for small datasets (to keep all cores busy).
Scala default value: true; Python default value: True
Also available on the trained model.
- hidden
Hidden layer sizes (e.g. [100, 100]).
Scala default value: Array(200, 200); Python default value: [200, 200]
Also available on the trained model.
- hiddenDropoutRatios
Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5.
Scala default value: null; Python default value: None
Also available on the trained model.
- ignoreConstCols
Ignore constant columns.
Scala default value: true; Python default value: True
Also available on the trained model.
- initialWeightDistribution
Initial weight distribution. Possible values are "UniformAdaptive", "Uniform", "Normal".
Default value:
"UniformAdaptive"
Also available on the trained model.
- initialWeightScale
Uniform: -value…value, Normal: stddev.
Default value:
1.0
Also available on the trained model.
- inputCols
The array of input columns
Scala default value: Array(); Python default value: []
Also available on the trained model.
- inputDropoutRatio
Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).
Default value:
0.0
Also available on the trained model.
- keepBinaryModels
If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster.
Scala default value: false; Python default value: False
- l1
L1 regularization (can add stability and improve generalization, causes many weights to become 0).
Default value:
0.0
Also available on the trained model.
- l2
L2 regularization (can add stability and improve generalization, causes many weights to be small).
Default value:
0.0
Also available on the trained model.
- loss
Loss function. Possible values are "Automatic", "Quadratic", "CrossEntropy", "ModifiedHuber", "Huber", "Absolute", "Quantile".
Default value:
"Automatic"
Also available on the trained model.
- maxCategoricalFeatures
Max. number of categorical features, enforced via hashing. #Experimental.
Default value:
2147483647
Also available on the trained model.
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value:
0.0
Also available on the trained model.
- maxW2
Constraint for squared sum of incoming weights per unit (e.g. for Rectifier).
Scala default value: 3.402823e38f; Python default value: 3.402823E38
Also available on the trained model.
- miniBatchSize
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).
Default value:
1
Also available on the trained model.
- missingValuesHandling
Handling of missing values. Either MeanImputation or Skip. Possible values are "MeanImputation", "Skip".
Default value:
"MeanImputation"
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value: null; Python default value: None
- momentumRamp
Number of training samples for which momentum increases.
Default value:
1000000.0
Also available on the trained model.
- momentumStable
Final momentum after the ramp is over (try 0.99).
Default value:
0.0
Also available on the trained model.
- momentumStart
Initial momentum at the beginning of training (try 0.5).
Default value:
0.0
Also available on the trained model.
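The three momentum parameters (momentumStart, momentumRamp, momentumStable) describe a schedule. The following is a minimal sketch of such a schedule, assuming the momentum rises linearly from momentumStart to momentumStable over the first momentumRamp training samples and stays constant afterwards. The function `momentum_at` is hypothetical, and the values 0.5 and 0.99 are the "try" suggestions from the descriptions above rather than the defaults.

```python
def momentum_at(samples: float, momentum_start: float,
                momentum_stable: float, momentum_ramp: float) -> float:
    """Momentum after `samples` training samples (assumed linear ramp)."""
    if momentum_ramp <= 0:
        raise ValueError("momentum_ramp must be positive in this sketch")
    if samples >= momentum_ramp:
        # The ramp is over: use the final momentum.
        return momentum_stable
    fraction = samples / momentum_ramp
    return momentum_start + (momentum_stable - momentum_start) * fraction


ramp = 1000000.0  # default momentumRamp listed above
print(momentum_at(0, 0.5, 0.99, ramp))        # 0.5 at the start of training
print(momentum_at(500000, 0.5, 0.99, ramp))   # about 0.745, halfway up the ramp
print(momentum_at(2000000, 0.5, 0.99, ramp))  # 0.99 once the ramp is over
```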
- nesterovAcceleratedGradient
Use Nesterov accelerated gradient (recommended).
Scala default value: true; Python default value: True
Also available on the trained model.
- overwriteWithBestModel
If enabled, override the final model with the best model found during training.
Scala default value: true; Python default value: True
Also available on the trained model.
- quietMode
Enable quiet mode for less output to standard output.
Scala default value: false; Python default value: False
Also available on the trained model.
- rate
Learning rate (higher => less stable, lower => slower convergence).
Default value:
0.005
Also available on the trained model.
- rateAnnealing
Learning rate annealing: rate / (1 + rate_annealing * samples).
Scala default value: 1.0e-6; Python default value: 1.0E-6
Also available on the trained model.
- rateDecay
Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)).
Default value:
1.0
Also available on the trained model.
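The two formulas quoted for rateAnnealing and rateDecay can be evaluated directly. The sketch below is plain Python that follows those formulas literally; the function names `annealed_rate` and `layer_rate` are hypothetical, and the decay factor 0.5 is an arbitrary example value, not a default.

```python
def annealed_rate(rate: float, rate_annealing: float, samples: float) -> float:
    """Learning rate after `samples` samples: rate / (1 + rate_annealing * samples)."""
    return rate / (1.0 + rate_annealing * samples)


def layer_rate(rate: float, rate_decay: float, layer: int) -> float:
    """Learning rate of the N-th layer (1-based): rate * rate_decay ^ (N - 1)."""
    if layer < 1:
        raise ValueError("layer index is 1-based")
    return rate * rate_decay ** (layer - 1)


# With the defaults rate=0.005 and rateAnnealing=1.0e-6, the rate is roughly
# halved after one million training samples.
print(annealed_rate(0.005, 1.0e-6, 1000000))

# With a hypothetical rateDecay of 0.5, the third layer uses a quarter of the rate.
print(layer_rate(0.005, 0.5, 3))
```

With the default rateDecay of 1.0, every layer uses the same rate.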
- replicateTrainingData
Replicate the entire training dataset onto every node for faster training on small datasets.
Scala default value: true; Python default value: True
Also available on the trained model.
- reproducible
Force reproducibility on small data (will be slow - only uses 1 thread).
Scala default value: false; Python default value: False
Also available on the trained model.
- rho
Adaptive learning rate time decay factor (similarity to prior updates).
Default value:
0.99
Also available on the trained model.
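The roles of rho and epsilon can be illustrated with an ADADELTA-style update, which is the adaptive learning rate method named in the H2O-3 Deep Learning documentation. The following is a minimal single-weight sketch in plain Python, not H2O's implementation; the class name `AdaDeltaState` and the gradient value are invented for illustration.

```python
import math


class AdaDeltaState:
    """Running averages kept for one weight in an ADADELTA-style update."""

    def __init__(self, rho: float = 0.99, epsilon: float = 1.0e-8) -> None:
        self.rho = rho          # time decay factor of the running averages
        self.epsilon = epsilon  # smoothing term, avoids division by zero
        self.avg_sq_grad = 0.0
        self.avg_sq_update = 0.0

    def step(self, gradient: float) -> float:
        """Return the weight update for one gradient value."""
        # Decaying average of squared gradients.
        self.avg_sq_grad = (self.rho * self.avg_sq_grad
                            + (1.0 - self.rho) * gradient ** 2)
        # Ratio of RMS(previous updates) to RMS(gradients) scales the step.
        update = -(math.sqrt(self.avg_sq_update + self.epsilon)
                   / math.sqrt(self.avg_sq_grad + self.epsilon)) * gradient
        # Decaying average of squared updates.
        self.avg_sq_update = (self.rho * self.avg_sq_update
                              + (1.0 - self.rho) * update ** 2)
        return update


state = AdaDeltaState()  # defaults match rho=0.99 and epsilon=1.0e-8 above
print(state.step(1.0))   # small negative step, close to -0.001
```

Without epsilon, the very first update would be exactly zero because no previous updates exist yet; the smoothing term is what lets training make progress.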
- scoreDutyCycle
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).
Default value:
0.1
Also available on the trained model.
- scoreEachIteration
Whether to score during each iteration of model training.
Scala default value: false; Python default value: False
Also available on the trained model.
- scoreInterval
Shortest time interval (in seconds) between model scoring.
Default value:
5.0
Also available on the trained model.
- scoreTrainingSamples
Number of training set samples for scoring (0 for all).
Scala default value: 10000L; Python default value: 10000
Also available on the trained model.
- scoreValidationSamples
Number of validation set samples for scoring (0 for all).
Scala default value: 0L; Python default value: 0
Also available on the trained model.
- scoreValidationSampling
Method used to sample validation dataset for scoring. Possible values are "Uniform", "Stratified".
Default value:
"Uniform"
Also available on the trained model.
- seed
Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.
Scala default value: -1L; Python default value: -1
Also available on the trained model.
- shuffleTrainingData
Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).
Scala default value: false; Python default value: False
Also available on the trained model.
- singleNodeMode
Run on a single node for fine-tuning of model parameters.
Scala default value: false; Python default value: False
Also available on the trained model.
- sparse
Sparse data handling (more efficient for data with lots of 0 values).
Scala default value: false; Python default value: False
Also available on the trained model.
- sparsityBeta
Sparsity regularization. #Experimental.
Default value:
0.0
Also available on the trained model.
- splitRatio
Accepts values in the range [0, 1.0], which determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.
Default value:
1.0
- standardize
If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.
Scala default value: true; Python default value: True
Also available on the trained model.
- stoppingMetric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing".
Default value:
"AUTO"
Also available on the trained model.
- stoppingRounds
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).
Default value:
5
Also available on the trained model.
- stoppingTolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).
Default value:
0.0
Also available on the trained model.
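The interaction of stoppingRounds and stoppingTolerance can be sketched in plain Python. This is a simplified interpretation of the descriptions above, not H2O's implementation. It assumes a metric with positive values where lower is better, compares simple moving averages of length k over the last 2k scoring events, and treats "no change" as "no improvement". The function name `should_stop` and the metric histories are hypothetical.

```python
from typing import Sequence


def should_stop(history: Sequence[float], stopping_rounds: int,
                stopping_tolerance: float) -> bool:
    """Decide whether to stop, given metric values ordered oldest first."""
    k = stopping_rounds
    if k <= 0:
        return False  # 0 disables early stopping
    if len(history) < 2 * k:
        return False  # not enough scoring events yet
    recent = list(history[-2 * k:])
    # k + 1 simple moving averages of length k over the last 2k events.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference = averages[0]          # average before the last k events
    best_recent = min(averages[1:])  # best average within the last k events
    # Stop unless the best recent average beats the reference by more than
    # the relative tolerance.
    return best_recent >= reference * (1.0 - stopping_tolerance)


improving = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
flat = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
print(should_stop(improving, 3, 0.0))  # False: the metric keeps improving
print(should_stop(flat, 3, 0.0))       # True: no improvement over 3 events
print(should_stop(improving, 3, 0.5))  # True: improvement is below 50%
```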
- targetRatioCommToComp
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).
Default value:
0.05
Also available on the trained model.
- trainSamplesPerIteration
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.
Scala default value: -2L; Python default value: -2
Also available on the trained model.
- useAllFactorLevels
Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.
Scala default value: true; Python default value: True
Also available on the trained model.
- validationDataFrame
A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the ‘splitRatio’ parameter. The parameter is not serializable!
Scala default value: null; Python default value: None
- weightCol
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.
Scala default value: null; Python default value: None
Also available on the trained model.
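The equivalence between weights and repeated rows can be checked with a small plain-Python sketch. It assumes the training loss is a weighted average of per-row errors; the function name `weighted_mean_error` and the error values are invented for illustration.

```python
from typing import Sequence


def weighted_mean_error(errors: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weighted average of per-row errors; negative weights are rejected."""
    if len(errors) != len(weights):
        raise ValueError("errors and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * e for w, e in zip(weights, errors)) / total_weight


# Three rows: weight 1, weight 2, and weight 0 (effectively excluded).
weighted = weighted_mean_error([0.5, 2.0, 1.0], [1.0, 2.0, 0.0])

# The same data with the second row repeated and the third row removed.
repeated = weighted_mean_error([0.5, 2.0, 2.0], [1.0, 1.0, 1.0])

print(weighted, repeated)  # both evaluate to 1.5
```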