.. _parameters_H2OAutoEncoder:

Parameters of H2OAutoEncoder
----------------------------

Affected Class
##############

- ``ai.h2o.sparkling.ml.features.H2OAutoEncoder``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*
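
  The naming convention can be sketched as follows. ``accessor_names`` is a hypothetical helper written for this illustration, not part of Sparkling Water. It assumes the simple first-letter capitalization shown in the example; individual parameters may deviate, so check the API documentation.

  .. code-block:: python

     from typing import Tuple


     def accessor_names(param: str) -> Tuple[str, str]:
         """Return the assumed (getter, setter) method names for a parameter."""
         # Assumption: only the first character is upper-cased, the rest is kept.
         suffix = param[0].upper() + param[1:]
         return "get" + suffix, "set" + suffix


     getter, setter = accessor_names("hidden")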

activation
  Activation function. Possible values are ``"Tanh"``, ``"TanhWithDropout"``, ``"Rectifier"``, ``"RectifierWithDropout"``, ``"Maxout"``, ``"MaxoutWithDropout"``, ``"ExpRectifier"``, ``"ExpRectifierWithDropout"``.

  *Default value:* ``"Rectifier"``
  
  *Also available on the trained model.*

adaptiveRate
  Adaptive learning rate.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

mseCol
  MSE column name. This column contains mean square error calculated from original and output values.

  *Default value:* ``"H2OAutoEncoder_55602fa91ddf__mse"``
  
  *Also available on the trained model.*
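
  A minimal sketch of the textbook mean square error between an input row and its reconstruction. ``mean_square_error`` is a hypothetical helper; the values the model actually compares (for example after standardization or categorical encoding) may differ from the raw column values.

  .. code-block:: python

     from typing import Sequence


     def mean_square_error(original: Sequence[float], output: Sequence[float]) -> float:
         """Average of squared differences between original and output values."""
         if len(original) != len(output) or len(original) == 0:
             raise ValueError("inputs must be non-empty and of equal length")
         return sum((o - r) ** 2 for o, r in zip(original, output)) / len(original)


     error = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])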

originalCol
  Original column name. This column contains the input values to the neural network of the autoencoder.

  *Default value:* ``"H2OAutoEncoder_55602fa91ddf__original"``
  
  *Also available on the trained model.*

withMSECol
  A flag identifying whether a column with mean square error will be produced or not.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withOriginalCol
  A flag identifying whether a column with input values to the neural network will be produced or not.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

initialBiases
  An array of weight vectors to be used for bias initialization of every network layer. If this parameter is set, the parameter 'initialWeights' has to be set as well.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

initialWeights
  An array of weight matrices to be used for initialization of the neural network. If this parameter is set, the parameter 'initialBiases' has to be set as well.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

outputCol
  Output column name.

  *Default value:* ``"H2OAutoEncoder_55602fa91ddf__output"``
  
  *Also available on the trained model.*

averageActivation
  Average activation for sparse auto-encoder. #Experimental.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

calculateFeatureImportances
  Compute variable importances for input features (Gedeon method) - can be slow for large networks.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

categoricalEncoding
  Encoding scheme for categorical features. Possible values are ``"AUTO"``, ``"OneHotInternal"``, ``"OneHotExplicit"``, ``"Enum"``, ``"Binary"``, ``"Eigen"``, ``"LabelEncoder"``, ``"SortByResponse"``, ``"EnumLimited"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

dataFrameSerializer
  A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  
  *Also available on the trained model.*

diagnostics
  Enable diagnostics for hidden layers.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

elasticAveraging
  Elastic averaging between compute nodes can improve distributed model convergence. #Experimental.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

elasticAveragingMovingRate
  Elastic averaging moving rate (only if elastic averaging is enabled).

  *Default value:* ``0.9``
  
  *Also available on the trained model.*

elasticAveragingRegularization
  Elastic averaging regularization strength (only if elastic averaging is enabled).

  *Default value:* ``0.001``
  
  *Also available on the trained model.*

epochs
  How many times the dataset should be iterated (streamed); can be fractional.

  *Default value:* ``10.0``
  
  *Also available on the trained model.*

epsilon
  Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).

  *Scala default value:* ``1.0e-8`` *; Python default value:* ``1.0E-8``
  
  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

exportWeightsAndBiases
  Whether to export Neural Network weights and biases to H2O Frames.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

fastMode
  Enable fast mode (minor approximation in back-propagation).

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

forceLoadBalance
  Force extra load balancing to increase training speed for small datasets (to keep all cores busy).

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

hidden
  Hidden layer sizes (e.g. [100, 100]).

  *Scala default value:* ``Array(200, 200)`` *; Python default value:* ``[200, 200]``
  
  *Also available on the trained model.*

hiddenDropoutRatios
  Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

initialWeightDistribution
  Initial weight distribution. Possible values are ``"UniformAdaptive"``, ``"Uniform"``, ``"Normal"``.

  *Default value:* ``"UniformAdaptive"``
  
  *Also available on the trained model.*

initialWeightScale
  Uniform: -value...value, Normal: stddev.

  *Default value:* ``1.0``
  
  *Also available on the trained model.*
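
  A minimal sketch of how the scale could be read for the two distributions it applies to. ``sample_initial_weight`` is a hypothetical helper, and the zero mean for ``"Normal"`` is an assumption; this is not the library's initialization code.

  .. code-block:: python

     import random


     def sample_initial_weight(distribution: str, scale: float, rng: random.Random) -> float:
         """Draw one weight according to the distribution name and scale."""
         if distribution == "Uniform":
             # Uniform: value lies between -scale and scale.
             return rng.uniform(-scale, scale)
         if distribution == "Normal":
             # Normal: scale is the standard deviation (mean 0 is assumed here).
             return rng.gauss(0.0, scale)
         raise ValueError("scale is only illustrated for 'Uniform' and 'Normal'")


     rng = random.Random(42)
     weights = [sample_initial_weight("Uniform", 0.5, rng) for _ in range(5)]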

inputCols
  The array of input columns.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  
  *Also available on the trained model.*

inputDropoutRatio
  Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in the DKV of the H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

l1
  L1 regularization (can add stability and improve generalization, causes many weights to become 0).

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

l2
  L2 regularization (can add stability and improve generalization, causes many weights to be small).

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

loss
  Loss function. Possible values are ``"Automatic"``, ``"Quadratic"``, ``"CrossEntropy"``, ``"ModifiedHuber"``, ``"Huber"``, ``"Absolute"``, ``"Quantile"``.

  *Default value:* ``"Automatic"``
  
  *Also available on the trained model.*

maxCategoricalFeatures
  Max. number of categorical features, enforced via hashing. #Experimental.

  *Default value:* ``2147483647``
  
  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

maxW2
  Constraint for squared sum of incoming weights per unit (e.g. for Rectifier).

  *Scala default value:* ``3.402823e38f`` *; Python default value:* ``3.402823E38``
  
  *Also available on the trained model.*
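
  One common way to enforce such a constraint is to rescale a unit's incoming weights whenever their squared sum exceeds the limit. The sketch below assumes that approach; ``apply_max_w2`` is a hypothetical helper and this page does not confirm how the library enforces the constraint.

  .. code-block:: python

     import math
     from typing import List, Sequence


     def apply_max_w2(incoming: Sequence[float], max_w2: float) -> List[float]:
         """Rescale incoming weights so their squared sum does not exceed max_w2."""
         squared_sum = sum(w * w for w in incoming)
         if squared_sum <= max_w2:
             return list(incoming)
         # Assumed enforcement: shrink all weights of the unit by a common factor.
         scale = math.sqrt(max_w2 / squared_sum)
         return [w * scale for w in incoming]


     constrained = apply_max_w2([3.0, 4.0], max_w2=1.0)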

miniBatchSize
  Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).

  *Default value:* ``1``
  
  *Also available on the trained model.*

missingValuesHandling
  Handling of missing values. Possible values are ``"MeanImputation"``, ``"Skip"``.

  *Default value:* ``"MeanImputation"``
  
  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

momentumRamp
  Number of training samples for which momentum increases.

  *Default value:* ``1000000.0``
  
  *Also available on the trained model.*

momentumStable
  Final momentum after the ramp is over (try 0.99).

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

momentumStart
  Initial momentum at the beginning of training (try 0.5).

  *Default value:* ``0.0``
  
  *Also available on the trained model.*
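
  A minimal sketch of how ``momentumStart``, ``momentumStable`` and ``momentumRamp`` could interact, assuming the momentum grows linearly with the number of processed training samples. ``momentum_at`` is a hypothetical helper, and the linear shape of the ramp is an assumption not stated on this page.

  .. code-block:: python

     def momentum_at(samples: float, momentum_start: float,
                     momentum_stable: float, momentum_ramp: float) -> float:
         """Momentum after a given number of training samples."""
         if momentum_ramp <= 0:
             raise ValueError("momentum_ramp must be positive")
         if samples >= momentum_ramp:
             # The ramp is over: the final momentum applies.
             return momentum_stable
         # Assumption: linear interpolation between start and stable momentum.
         fraction = samples / momentum_ramp
         return momentum_start + (momentum_stable - momentum_start) * fraction


     halfway = momentum_at(500_000, 0.5, 0.99, 1_000_000.0)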

nesterovAcceleratedGradient
  Use Nesterov accelerated gradient (recommended).

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

overwriteWithBestModel
  If enabled, override the final model with the best model found during training.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

quietMode
  Enable quiet mode for less output to standard output.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

rate
  Learning rate (higher => less stable, lower => slower convergence).

  *Default value:* ``0.005``
  
  *Also available on the trained model.*

rateAnnealing
  Learning rate annealing: rate / (1 + rate_annealing * samples).

  *Scala default value:* ``1.0e-6`` *; Python default value:* ``1.0E-6``
  
  *Also available on the trained model.*
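
  The annealing formula above, written out as a small function. ``annealed_rate`` is a hypothetical helper for illustration; ``samples`` is taken to be the number of training samples processed so far.

  .. code-block:: python

     def annealed_rate(rate: float, rate_annealing: float, samples: float) -> float:
         """Effective learning rate: rate / (1 + rate_annealing * samples)."""
         return rate / (1.0 + rate_annealing * samples)


     # With the defaults listed on this page, after one million samples.
     effective = annealed_rate(0.005, 1.0e-6, 1_000_000)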

rateDecay
  Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)).

  *Default value:* ``1.0``
  
  *Also available on the trained model.*
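
  The per-layer formula above, evaluated for every layer. ``layer_rates`` is a hypothetical helper for illustration, with layers numbered from 1.

  .. code-block:: python

     from typing import List


     def layer_rates(rate: float, rate_decay: float, n_layers: int) -> List[float]:
         """Learning rate of each layer: rate * rate_decay ** (n - 1)."""
         return [rate * rate_decay ** (n - 1) for n in range(1, n_layers + 1)]


     rates = layer_rates(0.01, 0.5, 3)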

replicateTrainingData
  Replicate the entire training dataset onto every node for faster training on small datasets.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

reproducible
  Force reproducibility on small data (will be slow - only uses 1 thread).

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

rho
  Adaptive learning rate time decay factor (similarity to prior updates).

  *Default value:* ``0.99``
  
  *Also available on the trained model.*
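
  A sketch of how ``rho`` and ``epsilon`` enter a standard ADADELTA update for a single weight. It assumes the adaptive learning rate follows ADADELTA, which this page does not state; ``adadelta_step`` is a hypothetical helper, not the library's code.

  .. code-block:: python

     import math
     from typing import Tuple


     def adadelta_step(grad: float, avg_sq_grad: float, avg_sq_delta: float,
                       rho: float = 0.99, epsilon: float = 1.0e-8) -> Tuple[float, float, float]:
         """One ADADELTA step; returns (delta, new avg_sq_grad, new avg_sq_delta)."""
         # rho controls how strongly earlier squared gradients are remembered.
         avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
         # epsilon keeps the ratio finite and lets the first update be non-zero.
         delta = -math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
         avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta ** 2
         return delta, avg_sq_grad, avg_sq_delta


     delta, avg_sq_grad, avg_sq_delta = adadelta_step(1.0, 0.0, 0.0)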

scoreDutyCycle
  Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).

  *Default value:* ``0.1``
  
  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

scoreInterval
  Shortest time interval (in seconds) between model scoring.

  *Default value:* ``5.0``
  
  *Also available on the trained model.*

scoreTrainingSamples
  Number of training set samples for scoring (0 for all).

  *Scala default value:* ``10000L`` *; Python default value:* ``10000``
  
  *Also available on the trained model.*

scoreValidationSamples
  Number of validation set samples for scoring (0 for all).

  *Scala default value:* ``0L`` *; Python default value:* ``0``
  
  *Also available on the trained model.*

scoreValidationSampling
  Method used to sample validation dataset for scoring. Possible values are ``"Uniform"``, ``"Stratified"``.

  *Default value:* ``"Uniform"``
  
  *Also available on the trained model.*

seed
  Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``
  
  *Also available on the trained model.*

shuffleTrainingData
  Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

singleNodeMode
  Run on a single node for fine-tuning of model parameters.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

sparse
  Sparse data handling (more efficient for data with lots of 0 values).

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

sparsityBeta
  Sparsity regularization. #Experimental.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

splitRatio
  Accepts values in the range [0, 1.0], which determine how large a part of the dataset is used for training and how large for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``
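
  A small sketch of the expected row counts for a given ratio. ``expected_split`` is a hypothetical helper; the actual split may be randomized, in which case real counts only approximate these numbers.

  .. code-block:: python

     from typing import Tuple


     def expected_split(n_rows: int, split_ratio: float) -> Tuple[int, int]:
         """Expected (training, validation) row counts for a split ratio."""
         if not 0.0 <= split_ratio <= 1.0:
             raise ValueError("split_ratio must be within [0, 1.0]")
         training = round(n_rows * split_ratio)
         return training, n_rows - training


     training_rows, validation_rows = expected_split(1000, 0.8)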
  

standardize
  If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

  *Default value:* ``5``
  
  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.0``
  
  *Also available on the trained model.*
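
  A simplified sketch of how ``stoppingRounds`` and ``stoppingTolerance`` could combine for a positive metric where lower is better. ``should_stop`` is a hypothetical helper; the windowing and comparison details are assumptions made for illustration and not the library's exact rule.

  .. code-block:: python

     from typing import Sequence


     def should_stop(history: Sequence[float], stopping_rounds: int,
                     stopping_tolerance: float) -> bool:
         """Decide whether training should stop, given metric values per scoring event."""
         k = stopping_rounds
         if k <= 0 or len(history) < 2 * k:
             # Disabled, or not enough scoring events to compare yet.
             return False
         n = len(history)
         # Moving averages of length k for the windows ending at the last k + 1 events.
         averages = [sum(history[end - k:end]) / k for end in range(n - k, n + 1)]
         reference, recent = averages[0], averages[1:]
         # Assumption: "improved" means beating the reference by the relative tolerance.
         improved = min(recent) < reference * (1.0 - stopping_tolerance)
         return not improved


     stop = should_stop([0.5, 0.5, 0.5, 0.5], stopping_rounds=2, stopping_tolerance=0.0)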

targetRatioCommToComp
  Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).

  *Default value:* ``0.05``
  
  *Also available on the trained model.*

trainSamplesPerIteration
  Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.

  *Scala default value:* ``-2L`` *; Python default value:* ``-2``
  
  *Also available on the trained model.*

useAllFactorLevels
  Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

weightCol
  Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*
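
  The equivalence between weights and repeated rows can be illustrated with a weighted mean of squared errors. ``weighted_mse`` is a hypothetical helper written for this illustration, not the loss used by the library.

  .. code-block:: python

     from typing import Sequence


     def weighted_mse(errors: Sequence[float], weights: Sequence[float]) -> float:
         """Weighted mean of squared per-row errors."""
         if any(w < 0 for w in weights):
             raise ValueError("negative weights are not allowed")
         total = sum(weights)
         if total == 0:
             raise ValueError("at least one weight must be positive")
         return sum(w * e ** 2 for e, w in zip(errors, weights)) / total


     # A weight of 2 on the first row matches repeating that row twice.
     with_weights = weighted_mse([1.0, 2.0], [2.0, 1.0])
     with_repeats = weighted_mse([1.0, 1.0, 2.0], [1.0, 1.0, 1.0])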