.. _parameters_H2OAutoEncoder:

Parameters of H2OAutoEncoder
----------------------------

Affected Class
##############

- ``ai.h2o.sparkling.ml.features.H2OAutoEncoder``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*

activation
  Activation function. Possible values are ``"Tanh"``, ``"TanhWithDropout"``, ``"Rectifier"``, ``"RectifierWithDropout"``, ``"Maxout"``, ``"MaxoutWithDropout"``, ``"ExpRectifier"``, ``"ExpRectifierWithDropout"``.

  *Default value:* ``"Rectifier"``

  *Also available on the trained model.*

adaptiveRate
  Adaptive learning rate.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

mseCol
  MSE column name. This column contains the mean squared error calculated from the original and output values.

  *Default value:* ``"H2OAutoEncoder_55602fa91ddf__mse"``

  *Also available on the trained model.*

originalCol
  Original column name. This column contains the input values to the neural network of the autoencoder.

  *Default value:* ``"H2OAutoEncoder_55602fa91ddf__original"``

  *Also available on the trained model.*

withMSECol
  A flag identifying whether a column with the mean squared error will be produced or not.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

withOriginalCol
  A flag identifying whether a column with the input values to the neural network will be produced or not.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

initialBiases
  An array of weight vectors to be used for bias initialization of every network layer. If this parameter is set, the parameter 'initialWeights' has to be set as well.
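
  The pairing rule above can be expressed as a small check. The sketch below is hypothetical (``check_initial_params`` is not part of Sparkling Water) and additionally assumes one weight matrix and one bias vector per connection between consecutive layers, that is ``len(hidden) + 1`` of each.

  .. code-block:: python

     def check_initial_params(weights, biases, hidden):
         """Hypothetical check mirroring the initialWeights/initialBiases rule."""
         # Both parameters must be set together, or neither of them.
         if (weights is None) != (biases is None):
             raise ValueError("initialWeights and initialBiases must be set together")
         if weights is None:
             return  # nothing to validate, random initialization is used
         # Assumption: one matrix and one vector per layer-to-layer connection.
         expected = len(hidden) + 1
         if len(weights) != expected or len(biases) != expected:
             raise ValueError(f"expected {expected} weight matrices and bias vectors")

     check_initial_params(None, None, hidden=[200, 200])  # passes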

  *Scala default value:* ``null`` *; Python default value:* ``None``

initialWeights
  An array of weight matrices to be used for initialization of the neural network. If this parameter is set, the parameter 'initialBiases' has to be set as well.

  *Scala default value:* ``null`` *; Python default value:* ``None``

outputCol
  Output column name.

  *Default value:* ``"H2OAutoEncoder_55602fa91ddf__output"``

  *Also available on the trained model.*

averageActivation
  Average activation for sparse auto-encoder. #Experimental.

  *Default value:* ``0.0``

  *Also available on the trained model.*

calculateFeatureImportances
  Compute variable importances for input features (Gedeon method) - can be slow for large networks.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

categoricalEncoding
  Encoding scheme for categorical features. Possible values are ``"AUTO"``, ``"OneHotInternal"``, ``"OneHotExplicit"``, ``"Enum"``, ``"Binary"``, ``"Eigen"``, ``"LabelEncoder"``, ``"SortByResponse"``, ``"EnumLimited"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

dataFrameSerializer
  The full class name of the serializer used to serialize and deserialize Spark DataFrames to and from a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``

  *Also available on the trained model.*

diagnostics
  Enable diagnostics for hidden layers.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

elasticAveraging
  Elastic averaging between compute nodes can improve distributed model convergence. #Experimental.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

elasticAveragingMovingRate
  Elastic averaging moving rate (only if elastic averaging is enabled).

  *Default value:* ``0.9``

  *Also available on the trained model.*

elasticAveragingRegularization
  Elastic averaging regularization strength (only if elastic averaging is enabled).

  *Default value:* ``0.001``

  *Also available on the trained model.*

epochs
  How many times the dataset should be iterated (streamed); can be fractional.

  *Default value:* ``10.0``

  *Also available on the trained model.*

epsilon
  Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).

  *Scala default value:* ``1.0e-8`` *; Python default value:* ``1.0E-8``

  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

exportWeightsAndBiases
  Whether to export neural network weights and biases to H2O Frames.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

fastMode
  Enable fast mode (minor approximation in back-propagation).

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

forceLoadBalance
  Force extra load balancing to increase training speed for small datasets (to keep all cores busy).

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

hidden
  Hidden layer sizes (e.g. [100, 100]).

  *Scala default value:* ``Array(200, 200)`` *; Python default value:* ``[200, 200]``

  *Also available on the trained model.*

hiddenDropoutRatios
  Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer; defaults to 0.5.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

initialWeightDistribution
  Initial weight distribution. Possible values are ``"UniformAdaptive"``, ``"Uniform"``, ``"Normal"``.

  *Default value:* ``"UniformAdaptive"``

  *Also available on the trained model.*

initialWeightScale
  Scale of the initial weight distribution. For ``"Uniform"``, weights are drawn from the range -value...value; for ``"Normal"``, the value is the standard deviation.

  *Default value:* ``1.0``

  *Also available on the trained model.*

inputCols
  The array of input columns.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``

  *Also available on the trained model.*

inputDropoutRatio
  Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).

  *Default value:* ``0.0``

  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in the DKV of the H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``

l1
  L1 regularization (can add stability and improve generalization, causes many weights to become 0).

  *Default value:* ``0.0``

  *Also available on the trained model.*

l2
  L2 regularization (can add stability and improve generalization, causes many weights to be small).

  *Default value:* ``0.0``

  *Also available on the trained model.*

loss
  Loss function. Possible values are ``"Automatic"``, ``"Quadratic"``, ``"CrossEntropy"``, ``"ModifiedHuber"``, ``"Huber"``, ``"Absolute"``, ``"Quantile"``.

  *Default value:* ``"Automatic"``

  *Also available on the trained model.*

maxCategoricalFeatures
  Maximum number of categorical features, enforced via hashing.
  #Experimental.

  *Default value:* ``2147483647``

  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``

  *Also available on the trained model.*

maxW2
  Constraint for squared sum of incoming weights per unit (e.g. for Rectifier).

  *Scala default value:* ``3.402823e38f`` *; Python default value:* ``3.402823E38``

  *Also available on the trained model.*

miniBatchSize
  Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).

  *Default value:* ``1``

  *Also available on the trained model.*

missingValuesHandling
  Handling of missing values. Either MeanImputation or Skip. Possible values are ``"MeanImputation"``, ``"Skip"``.

  *Default value:* ``"MeanImputation"``

  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``

momentumRamp
  Number of training samples for which momentum increases.

  *Default value:* ``1000000.0``

  *Also available on the trained model.*

momentumStable
  Final momentum after the ramp is over (try 0.99).

  *Default value:* ``0.0``

  *Also available on the trained model.*

momentumStart
  Initial momentum at the beginning of training (try 0.5).

  *Default value:* ``0.0``

  *Also available on the trained model.*

nesterovAcceleratedGradient
  Use Nesterov accelerated gradient (recommended).

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

overwriteWithBestModel
  If enabled, override the final model with the best model found during training.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

quietMode
  Enable quiet mode for less output to standard output.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

rate
  Learning rate (higher => less stable, lower => slower convergence).

  *Default value:* ``0.005``

  *Also available on the trained model.*

rateAnnealing
  Learning rate annealing: rate / (1 + rate_annealing * samples).

  *Scala default value:* ``1.0e-6`` *; Python default value:* ``1.0E-6``

  *Also available on the trained model.*

rateDecay
  Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (n - 1)).

  *Default value:* ``1.0``

  *Also available on the trained model.*

replicateTrainingData
  Replicate the entire training dataset onto every node for faster training on small datasets.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

reproducible
  Force reproducibility on small data (will be slow - only uses 1 thread).

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

rho
  Adaptive learning rate time decay factor (similarity to prior updates).

  *Default value:* ``0.99``

  *Also available on the trained model.*

scoreDutyCycle
  Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).

  *Default value:* ``0.1``

  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

scoreInterval
  Shortest time interval (in seconds) between model scoring.

  *Default value:* ``5.0``

  *Also available on the trained model.*

scoreTrainingSamples
  Number of training set samples for scoring (0 for all).

  *Scala default value:* ``10000L`` *; Python default value:* ``10000``

  *Also available on the trained model.*

scoreValidationSamples
  Number of validation set samples for scoring (0 for all).

  *Scala default value:* ``0L`` *; Python default value:* ``0``

  *Also available on the trained model.*

scoreValidationSampling
  Method used to sample the validation dataset for scoring. Possible values are ``"Uniform"``, ``"Stratified"``.

  *Default value:* ``"Uniform"``

  *Also available on the trained model.*

seed
  Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``

  *Also available on the trained model.*

shuffleTrainingData
  Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

singleNodeMode
  Run on a single node for fine-tuning of model parameters.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

sparse
  Sparse data handling (more efficient for data with lots of 0 values).

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

sparsityBeta
  Sparsity regularization. #Experimental.

  *Default value:* ``0.0``

  *Also available on the trained model.*

splitRatio
  Accepts values in the range [0, 1.0] that determine what fraction of the dataset is used for training; the remainder is used for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``

standardize
  If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest).
  Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

  *Default value:* ``5``

  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.0``

  *Also available on the trained model.*

targetRatioCommToComp
  Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).

  *Default value:* ``0.05``

  *Also available on the trained model.*

trainSamplesPerIteration
  Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.

  *Scala default value:* ``-2L`` *; Python default value:* ``-2``

  *Also available on the trained model.*

useAllFactorLevels
  Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated to the validation of the trained model. If the parameter is not set, a validation frame is created via the 'splitRatio' parameter. The parameter is not serializable!
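
  When no validation frame is supplied, the ``splitRatio`` fallback can be pictured with the plain-Python sketch below. ``split_rows`` is a hypothetical helper, not a Sparkling Water function. The exact cut at ``round(n * split_ratio)`` is a simplification: if the real split is random, as assumed here, partition sizes are only approximately ``n * splitRatio``.

  .. code-block:: python

     import random

     def split_rows(rows, split_ratio, validation_rows=None, seed=42):
         """Hypothetical sketch of the splitRatio / validationDataFrame logic."""
         if validation_rows is not None:
             # An explicit validation frame wins; splitRatio is ignored.
             return list(rows), list(validation_rows)
         if not 0.0 <= split_ratio <= 1.0:
             raise ValueError("splitRatio must be in the range [0, 1.0]")
         shuffled = list(rows)
         random.Random(seed).shuffle(shuffled)
         cut = round(len(shuffled) * split_ratio)
         # First part is used for training, the remainder for validation.
         return shuffled[:cut], shuffled[cut:]

     train, valid = split_rows(range(100), 0.8)
     print(len(train), len(valid))  # 80 20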

  *Scala default value:* ``null`` *; Python default value:* ``None``

weightCol
  Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*
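
The equivalence stated for ``weightCol`` (a weight of 2 acts like repeating the row, a weight of 0 like removing it) holds whenever the training loss is a weighted average of per-row losses. The minimal sketch below checks it for a squared-error loss in plain Python; ``weighted_mse`` is a hypothetical helper and does not call Sparkling Water.

.. code-block:: python

   def weighted_mse(errors, weights):
       """Weighted mean of squared per-row errors (hypothetical helper)."""
       if any(w < 0 for w in weights):
           raise ValueError("negative weights are not allowed")
       total = sum(weights)
       if total == 0:
           raise ValueError("at least one weight must be positive")
       return sum(w * e * e for e, w in zip(errors, weights)) / total

   # Weight 2 on the second row and weight 0 on the third row ...
   weighted = weighted_mse([1.0, 2.0, 3.0], [1.0, 2.0, 0.0])
   # ... equals the unweighted loss with row 2 repeated and row 3 dropped.
   repeated = weighted_mse([1.0, 2.0, 2.0], [1.0, 1.0, 1.0])
   print(weighted, repeated)  # 3.0 3.0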
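
The ``mseCol`` column holds one number per row: the mean squared error between the values fed into the network (the ``originalCol`` values) and the reconstruction produced by the autoencoder (the ``outputCol`` values). The sketch below shows that per-row arithmetic, assuming both are equal-length numeric vectors; ``row_mse`` is a hypothetical name, and any preprocessing H2O-3 applies before the comparison is not modelled.

.. code-block:: python

   def row_mse(original, output):
       """Mean squared reconstruction error of a single row (hypothetical helper)."""
       if len(original) != len(output) or not original:
           raise ValueError("expected two non-empty vectors of equal length")
       return sum((a - b) ** 2 for a, b in zip(original, output)) / len(original)

   # Rows that the autoencoder reconstructs poorly get a large value,
   # which makes this number usable as a simple anomaly score.
   print(row_mse([1.0, 2.0], [0.0, 2.0]))  # 0.5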
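
With ``adaptiveRate`` enabled, H2O-3 uses the ADADELTA method (Zeiler, 2012) instead of a fixed learning rate: ``rho`` is the decay factor of the running averages and ``epsilon`` is the smoothing term that keeps the denominators away from zero. The sketch below applies the textbook ADADELTA update to a single scalar weight, using the defaults of ``rho`` and ``epsilon`` listed on this page. It illustrates the method and is not H2O-3's internal code; ``adadelta_step`` is a hypothetical name.

.. code-block:: python

   import math

   def adadelta_step(weight, grad, state, rho=0.99, epsilon=1e-8):
       """One textbook ADADELTA update; `state` keeps E[g^2] and E[dx^2]."""
       # Running average of squared gradients.
       state["eg2"] = rho * state["eg2"] + (1 - rho) * grad * grad
       # Step = -(RMS of past updates / RMS of gradients) * gradient.
       delta = -math.sqrt(state["edx2"] + epsilon) / math.sqrt(state["eg2"] + epsilon) * grad
       # Running average of squared updates.
       state["edx2"] = rho * state["edx2"] + (1 - rho) * delta * delta
       return weight + delta

   state = {"eg2": 0.0, "edx2": 0.0}
   w = 1.0
   for _ in range(100):
       # Gradient of f(w) = w ** 2 is 2 * w; the minimum is at w = 0.
       w = adadelta_step(w, 2 * w, state)
   print(w)  # has moved from 1.0 toward 0.0

No global learning rate appears in the update, which is why ``rate``, ``rateAnnealing`` and ``rateDecay`` only matter when ``adaptiveRate`` is disabled.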
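
``maxW2`` caps the squared sum of the weights entering each unit. One common way to enforce such a max-norm constraint is to rescale a unit's incoming weights whenever their squared sum exceeds the limit; the sketch below does that for one unit. Treat the rescaling as an assumption about the mechanism rather than a description of H2O-3's code; ``apply_max_w2`` is a hypothetical name. With the default of roughly 3.4e38 the constraint is effectively inactive.

.. code-block:: python

   import math

   def apply_max_w2(incoming, max_w2):
       """Rescale one unit's incoming weights so that sum(w ** 2) <= max_w2."""
       squared_sum = sum(w * w for w in incoming)
       if squared_sum <= max_w2:
           return list(incoming)  # constraint already satisfied
       scale = math.sqrt(max_w2 / squared_sum)
       # Same direction, squared sum shrunk to max_w2.
       return [w * scale for w in incoming]

   print(apply_max_w2([3.0, 4.0], 1.0))  # approximately [0.6, 0.8]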
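
The two ``missingValuesHandling`` options differ in what happens to a row with a gap: ``"MeanImputation"`` fills the gap with the column mean, while ``"Skip"`` drops the row. The plain-Python sketch below covers numeric columns only, with ``None`` marking a missing value; how H2O-3 treats categorical columns under ``"MeanImputation"`` is not shown, and both helper names are hypothetical.

.. code-block:: python

   def mean_impute(column):
       """MeanImputation sketch: replace None with the mean of the observed values."""
       observed = [v for v in column if v is not None]
       if not observed:
           raise ValueError("column has no observed values")
       mean = sum(observed) / len(observed)
       return [mean if v is None else v for v in column]

   def skip_missing(rows):
       """Skip sketch: keep only rows without any missing value."""
       return [row for row in rows if all(v is not None for v in row)]

   print(mean_impute([1.0, None, 3.0]))            # [1.0, 2.0, 3.0]
   print(skip_missing([(1.0, 2.0), (None, 3.0)]))  # [(1.0, 2.0)]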
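
``momentumStart``, ``momentumRamp`` and ``momentumStable`` define a schedule: momentum starts at ``momentumStart`` and reaches ``momentumStable`` after ``momentumRamp`` training samples. The sketch below assumes the increase is linear in the number of processed samples (an assumption, not stated on this page) and uses the suggested values 0.5 and 0.99 from the descriptions above rather than the defaults. In H2O-3, the momentum settings, like ``rate``, are only used when ``adaptiveRate`` is disabled.

.. code-block:: python

   def momentum(samples, start, stable, ramp):
       """Momentum after `samples` training samples, assuming a linear ramp."""
       if ramp <= 0 or samples >= ramp:
           return stable  # ramp finished (or disabled)
       return start + (stable - start) * samples / ramp

   for n in (0, 500_000, 1_000_000, 2_000_000):
       print(n, momentum(n, start=0.5, stable=0.99, ramp=1_000_000.0))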
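
When ``adaptiveRate`` is disabled, the formulas given for ``rateAnnealing`` and ``rateDecay`` both scale the base ``rate``. The sketch below evaluates them exactly as written on this page and assumes that the two factors multiply to give the rate of the N-th layer after a given number of training samples; ``effective_rate`` is a hypothetical name.

.. code-block:: python

   def effective_rate(rate, rate_annealing, samples, rate_decay, layer):
       """Learning rate of the 1-based `layer` after `samples` training samples."""
       annealed = rate / (1 + rate_annealing * samples)  # rateAnnealing formula
       return annealed * rate_decay ** (layer - 1)       # rateDecay formula

   # Defaults from this page (rate=0.005, rateAnnealing=1e-6, rateDecay=1.0):
   # after one million samples the rate has halved.
   print(effective_rate(0.005, 1e-6, 1_000_000, 1.0, layer=1))  # 0.0025
   # A rateDecay of 0.5 halves the rate again for every deeper layer.
   print(effective_rate(0.005, 1e-6, 0, 0.5, layer=3))  # 0.00125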
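
``stoppingRounds`` and ``stoppingTolerance`` work together. The sketch below is a simplified reading of the rule stated above for a metric where lower is better (such as MSE): it compares the best of the latest ``k`` moving averages with the moving average from ``k`` scoring events earlier, and stops when the relative improvement is below the tolerance. It is not H2O-3's exact implementation, and ``should_stop`` is a hypothetical name.

.. code-block:: python

   def should_stop(history, k, tolerance=0.0):
       """Simplified moving-average early-stopping check (lower metric is better)."""
       if k == 0 or len(history) < 2 * k:
           return False  # disabled, or not enough scoring events yet
       # Simple moving averages of length k over the scoring history.
       sma = [sum(history[i:i + k]) / k for i in range(len(history) - k + 1)]
       reference = sma[-(k + 1)]    # average ending k scoring events ago
       best_recent = min(sma[-k:])  # best average among the last k events
       if reference == 0:
           return True  # a perfect score cannot improve any further
       improvement = (reference - best_recent) / abs(reference)
       return improvement < tolerance

   history = [0.50, 0.50, 0.50, 0.50, 0.60, 0.60]
   print(should_stop(history, k=2))  # True: the metric got worse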