Parameters of H2OAutoEncoder

Affected Class

ai.h2o.sparkling.ml.features.H2OAutoEncoder

Parameters

Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
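The naming convention can be sketched as follows, assuming the accessor names are formed by capitalizing the first letter of the parameter name, as in the label example. The helper accessor_names is hypothetical and is not part of the Sparkling Water API.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Derive (getter, setter) names for a parameter (hypothetical helper)."""
    # Upper-case only the first character so inner capitals are preserved.
    suffix = param[:1].upper() + param[1:]
    return f"get{suffix}", f"set{suffix}"


print(accessor_names("label"))
print(accessor_names("withMSECol"))
```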

activation

Activation function. Possible values are "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout", "ExpRectifier", "ExpRectifierWithDropout".

Default value: "Rectifier"

Also available on the trained model.

adaptiveRate

Adaptive learning rate.

Scala default value: true ; Python default value: True

Also available on the trained model.

mseCol

MSE column name. This column contains the mean squared error calculated from the original and output values.

Default value: "H2OAutoEncoder_28043ca1df97__mse"

Also available on the trained model.
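A minimal sketch of how a per-row value for the MSE column could be derived from the original and output values. It assumes a plain average of squared differences over the features; the library may compute the error on standardized values, so this is an illustration only. The function row_mse is hypothetical.

```python
from typing import Sequence


def row_mse(original: Sequence[float], output: Sequence[float]) -> float:
    """Mean squared error between one input row and its reconstruction."""
    if len(original) != len(output) or len(original) == 0:
        raise ValueError("rows must be non-empty and of equal length")
    # Average the squared difference over all features of the row.
    return sum((o - r) ** 2 for o, r in zip(original, output)) / len(original)


print(row_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```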

originalCol

Original column name. This column contains the input values to the neural network of the autoencoder.

Default value: "H2OAutoEncoder_28043ca1df97__original"

Also available on the trained model.

withMSECol

A flag identifying whether a column with mean square error will be produced or not.

Scala default value: false ; Python default value: False

Also available on the trained model.

withOriginalCol

A flag identifying whether a column with input values to the neural network will be produced or not.

Scala default value: false ; Python default value: False

Also available on the trained model.

ignoredCols

Names of columns to ignore for training.

Scala default value: null ; Python default value: None

Also available on the trained model.

initialBiases

An array of weight vectors to be used for bias initialization of every network layer. If this parameter is set, the parameter ‘initialWeights’ has to be set as well.

Scala default value: null ; Python default value: None

initialWeights

An array of weight matrices to be used for initialization of the neural network. If this parameter is set, the parameter ‘initialBiases’ has to be set as well.

Scala default value: null ; Python default value: None

outputCol

Output column name.

Default value: "H2OAutoEncoder_28043ca1df97__output"

Also available on the trained model.

averageActivation

Average activation for sparse auto-encoder. #Experimental.

Default value: 0.0

Also available on the trained model.

calculateFeatureImportances

Compute variable importances for input features (Gedeon method) - can be slow for large networks.

Scala default value: true ; Python default value: True

Also available on the trained model.

categoricalEncoding

Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".

Default value: "AUTO"

Also available on the trained model.

columnsToCategorical

List of columns to convert to categorical before modelling.

Scala default value: Array() ; Python default value: []

convertInvalidNumbersToNa

If set to ‘true’, the model converts invalid numbers to NA when making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

convertUnknownCategoricalLevelsToNa

If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

dataFrameSerializer

The full name of the serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer"

Also available on the trained model.

diagnostics

Enable diagnostics for hidden layers.

Scala default value: true ; Python default value: True

Also available on the trained model.

elasticAveraging

Elastic averaging between compute nodes can improve distributed model convergence. #Experimental.

Scala default value: false ; Python default value: False

Also available on the trained model.

elasticAveragingMovingRate

Elastic averaging moving rate (only if elastic averaging is enabled).

Default value: 0.9

Also available on the trained model.

elasticAveragingRegularization

Elastic averaging regularization strength (only if elastic averaging is enabled).

Default value: 0.001

Also available on the trained model.

epochs

How many times the dataset should be iterated (streamed), can be fractional.

Default value: 10.0

Also available on the trained model.

epsilon

Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).

Scala default value: 1.0e-8 ; Python default value: 1.0E-8

Also available on the trained model.

exportCheckpointsDir

Automatically export generated models to this directory.

Scala default value: null ; Python default value: None

Also available on the trained model.

exportWeightsAndBiases

Whether to export Neural Network weights and biases to H2O Frames.

Scala default value: false ; Python default value: False

Also available on the trained model.

fastMode

Enable fast mode (minor approximation in back-propagation).

Scala default value: true ; Python default value: True

Also available on the trained model.

forceLoadBalance

Force extra load balancing to increase training speed for small datasets (to keep all cores busy).

Scala default value: true ; Python default value: True

Also available on the trained model.

hidden

Hidden layer sizes (e.g. [100, 100]).

Scala default value: Array(200, 200) ; Python default value: [200, 200]

Also available on the trained model.

hiddenDropoutRatios

Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5.

Scala default value: null ; Python default value: None

Also available on the trained model.

ignoreConstCols

Ignore constant columns.

Scala default value: true ; Python default value: True

Also available on the trained model.

initialWeightDistribution

Initial weight distribution. Possible values are "UniformAdaptive", "Uniform", "Normal".

Default value: "UniformAdaptive"

Also available on the trained model.

initialWeightScale

Uniform: -value…value, Normal: stddev.

Default value: 1.0

Also available on the trained model.
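The meaning of the scale for the "Uniform" and "Normal" distributions can be sketched as below. The sketch assumes a zero mean for the normal case and does not cover "UniformAdaptive", whose range depends on the layer sizes. The function sample_weights and its seed argument are hypothetical and only serve the illustration.

```python
import random
from typing import List


def sample_weights(n: int, distribution: str, scale: float, seed: int = 42) -> List[float]:
    """Draw n initial weights for the given distribution and scale."""
    rng = random.Random(seed)
    if distribution == "Uniform":
        # Uniform: values are drawn from the interval -scale ... scale.
        return [rng.uniform(-scale, scale) for _ in range(n)]
    if distribution == "Normal":
        # Normal: scale is the standard deviation (zero mean is an assumption).
        return [rng.gauss(0.0, scale) for _ in range(n)]
    raise ValueError(f"distribution not covered by this sketch: {distribution}")


weights = sample_weights(5, "Uniform", 0.5)
print(weights)
```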

inputCols

The array of input columns.

Scala default value: Array() ; Python default value: []

Also available on the trained model.

inputDropoutRatio

Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).

Default value: 0.0

Also available on the trained model.

keepBinaryModels

If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster.

Scala default value: false ; Python default value: False

l1

L1 regularization (can add stability and improve generalization, causes many weights to become 0).

Default value: 0.0

Also available on the trained model.

l2

L2 regularization (can add stability and improve generalization, causes many weights to be small).

Default value: 0.0

Also available on the trained model.

loss

Loss function. Possible values are "Automatic", "Quadratic", "CrossEntropy", "ModifiedHuber", "Huber", "Absolute", "Quantile".

Default value: "Automatic"

Also available on the trained model.

maxCategoricalFeatures

Max. number of categorical features, enforced via hashing. #Experimental.

Default value: 2147483647

Also available on the trained model.

maxRuntimeSecs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Default value: 0.0

Also available on the trained model.

maxW2

Constraint for squared sum of incoming weights per unit (e.g. for Rectifier).

Scala default value: 3.402823e38f ; Python default value: 3.402823E38

Also available on the trained model.

miniBatchSize

Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).

Default value: 1

Also available on the trained model.

missingValuesHandling

Handling of missing values. Either MeanImputation or Skip. Possible values are "MeanImputation", "Skip".

Default value: "MeanImputation"

Also available on the trained model.

modelId

Destination id for this model; auto-generated if not specified.

Scala default value: null ; Python default value: None

momentumRamp

Number of training samples for which momentum increases.

Default value: 1000000.0

Also available on the trained model.

momentumStable

Final momentum after the ramp is over (try 0.99).

Default value: 0.0

Also available on the trained model.

momentumStart

Initial momentum at the beginning of training (try 0.5).

Default value: 0.0

Also available on the trained model.
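The interplay of momentumStart, momentumStable and momentumRamp can be sketched as a schedule over the number of processed training samples. The sketch assumes the momentum grows linearly during the ramp, which the descriptions above do not state explicitly; momentum_at is a hypothetical helper.

```python
def momentum_at(samples: float, start: float = 0.0, stable: float = 0.0,
                ramp: float = 1_000_000.0) -> float:
    """Momentum after a given number of training samples (linear ramp assumed)."""
    if ramp <= 0 or samples >= ramp:
        # The ramp is over: use the final momentum.
        return stable
    # Interpolate between the initial and the final momentum.
    return start + (stable - start) * (samples / ramp)


for seen in (0, 500_000, 2_000_000):
    print(seen, momentum_at(seen, start=0.5, stable=0.99))
```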

nesterovAcceleratedGradient

Use Nesterov accelerated gradient (recommended).

Scala default value: true ; Python default value: True

Also available on the trained model.

overwriteWithBestModel

If enabled, override the final model with the best model found during training.

Scala default value: true ; Python default value: True

Also available on the trained model.

quietMode

Enable quiet mode for less output to standard output.

Scala default value: false ; Python default value: False

Also available on the trained model.

rate

Learning rate (higher => less stable, lower => slower convergence).

Default value: 0.005

Also available on the trained model.

rateAnnealing

Learning rate annealing: rate / (1 + rate_annealing * samples).

Scala default value: 1.0e-6 ; Python default value: 1.0E-6

Also available on the trained model.
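The annealing formula above, rate / (1 + rate_annealing * samples), written out as a small function. The name annealed_rate is hypothetical; the defaults mirror the documented defaults of rate and rateAnnealing.

```python
def annealed_rate(rate: float = 0.005, rate_annealing: float = 1.0e-6,
                  samples: float = 0.0) -> float:
    """Effective learning rate after the given number of training samples."""
    return rate / (1.0 + rate_annealing * samples)


# With the defaults, the rate is halved after one million samples.
print(annealed_rate(samples=1_000_000))
```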

rateDecay

Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)).

Default value: 1.0

Also available on the trained model.
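The per-layer formula rate * rate_decay ^ (N - 1) can be illustrated with a short helper that lists the rate of each layer. The function layer_rates is hypothetical, and layers are numbered from 1 as in the description.

```python
from typing import List


def layer_rates(rate: float, rate_decay: float, n_layers: int) -> List[float]:
    """Learning rate of each layer, where layer N uses rate * rate_decay ** (N - 1)."""
    return [rate * rate_decay ** (n - 1) for n in range(1, n_layers + 1)]


# The default rateDecay of 1.0 keeps the rate identical on every layer.
print(layer_rates(0.005, 1.0, 3))
print(layer_rates(0.005, 0.5, 3))
```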

replicateTrainingData

Replicate the entire training dataset onto every node for faster training on small datasets.

Scala default value: true ; Python default value: True

Also available on the trained model.

reproducible

Force reproducibility on small data (will be slow - only uses 1 thread).

Scala default value: false ; Python default value: False

Also available on the trained model.

rho

Adaptive learning rate time decay factor (similarity to prior updates).

Default value: 0.99

Also available on the trained model.
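To show where rho and epsilon enter, the following sketch performs one update for a single weight, assuming the adaptive learning rate follows the ADADELTA scheme. That assumption, the function adadelta_step and its state variables are illustrative and are not taken from this page.

```python
import math
from typing import Tuple


def adadelta_step(grad: float, avg_sq_grad: float, avg_sq_update: float,
                  rho: float = 0.99, epsilon: float = 1.0e-8) -> Tuple[float, float, float]:
    """One ADADELTA-style update; returns (update, new avg_sq_grad, new avg_sq_update)."""
    # rho controls how strongly the history of squared gradients is remembered.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # epsilon keeps the ratio finite and allows progress from a zero state.
    update = -math.sqrt(avg_sq_update + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
    avg_sq_update = rho * avg_sq_update + (1.0 - rho) * update * update
    return update, avg_sq_grad, avg_sq_update


print(adadelta_step(grad=1.0, avg_sq_grad=0.0, avg_sq_update=0.0))
```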

scoreDutyCycle

Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).

Default value: 0.1

Also available on the trained model.

scoreEachIteration

Whether to score during each iteration of model training.

Scala default value: false ; Python default value: False

Also available on the trained model.

scoreInterval

Shortest time interval (in seconds) between model scoring.

Default value: 5.0

Also available on the trained model.

scoreTrainingSamples

Number of training set samples for scoring (0 for all).

Scala default value: 10000L ; Python default value: 10000

Also available on the trained model.

scoreValidationSamples

Number of validation set samples for scoring (0 for all).

Scala default value: 0L ; Python default value: 0

Also available on the trained model.

scoreValidationSampling

Method used to sample validation dataset for scoring. Possible values are "Uniform", "Stratified".

Default value: "Uniform"

Also available on the trained model.

seed

Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

shuffleTrainingData

Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).

Scala default value: false ; Python default value: False

Also available on the trained model.

singleNodeMode

Run on a single node for fine-tuning of model parameters.

Scala default value: false ; Python default value: False

Also available on the trained model.

sparse

Sparse data handling (more efficient for data with lots of 0 values).

Scala default value: false ; Python default value: False

Also available on the trained model.

sparsityBeta

Sparsity regularization. #Experimental.

Default value: 0.0

Also available on the trained model.

splitRatio

Accepts values in the range [0, 1.0], which determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.

Default value: 1.0
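A small sketch of what a given splitRatio means in terms of row counts. It assumes an exact split; an actual split may assign rows randomly, in which case the counts are only approximate. The function split_counts is hypothetical.

```python
from typing import Tuple


def split_counts(n_rows: int, split_ratio: float = 1.0) -> Tuple[int, int]:
    """Approximate (training, validation) row counts for a split ratio."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must lie in [0, 1.0]")
    n_train = round(n_rows * split_ratio)
    return n_train, n_rows - n_train


# The default of 1.0 uses all rows for training and none for validation.
print(split_counts(1000))
print(split_counts(1000, 0.8))
```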

standardize

If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.

Scala default value: true ; Python default value: True

Also available on the trained model.

stoppingMetric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing".

Default value: "AUTO"

Also available on the trained model.

stoppingRounds

Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

Default value: 5

Also available on the trained model.
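The stopping rule can be approximated as follows. This is a simplified sketch rather than the library's exact procedure: it assumes a positive metric where lower is better, compares the moving averages of length k over the most recent scoring events against the moving average that precedes them, and applies the relative tolerance described for stoppingTolerance. The function should_stop is hypothetical.

```python
from typing import Sequence


def should_stop(history: Sequence[float], k: int, tolerance: float = 0.0) -> bool:
    """Return True when the moving average has not improved over k scoring events."""
    if k <= 0 or len(history) < 2 * k:
        # Early stopping is disabled or there are too few scoring events.
        return False
    recent = list(history[-2 * k:])
    # k + 1 moving averages of length k over the last 2 * k scoring events.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference, best_later = averages[0], min(averages[1:])
    # Stop unless the best later average beats the reference by the tolerance.
    return best_later >= reference * (1.0 - tolerance)


print(should_stop([0.5] * 10, k=5))
print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1], k=5))
```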

stoppingTolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

Default value: 0.0

Also available on the trained model.

targetRatioCommToComp

Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).

Default value: 0.05

Also available on the trained model.

trainSamplesPerIteration

Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.

Scala default value: -2L ; Python default value: -2

Also available on the trained model.
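The special values can be read as in the sketch below. It assumes that -1 resolves to rows times nodes when the training data is replicated and to the plain row count otherwise, and it leaves -2 unresolved because the automatic value is tuned at runtime. The function samples_per_iteration and its arguments are hypothetical.

```python
from typing import Optional


def samples_per_iteration(value: int, n_rows: int, n_nodes: int = 1,
                          replicated: bool = True) -> Optional[int]:
    """Resolve the special values of trainSamplesPerIteration (None means automatic)."""
    if value == 0:
        return n_rows  # one epoch
    if value == -1:
        # All available data: every node holds a full copy when replicated.
        return n_rows * n_nodes if replicated else n_rows
    if value == -2:
        return None  # automatic, decided during training
    if value > 0:
        return value
    raise ValueError(f"unsupported value: {value}")


print(samples_per_iteration(-1, n_rows=1000, n_nodes=4))
```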

useAllFactorLevels

Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.

Scala default value: true ; Python default value: True

Also available on the trained model.

validationDataFrame

A data frame dedicated to validation of the trained model. If this parameter is not set, a validation frame is created via the ‘splitRatio’ parameter. This parameter is not serializable!

Scala default value: null ; Python default value: None

weightCol

Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero, which is incorrect. To get an accurate prediction, remove all rows with weight == 0.

Scala default value: null ; Python default value: None

Also available on the trained model.
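The equivalence between a weight and a repeated row can be checked on a weighted average of per-row losses. This is only an illustration of the pre-factor idea, assuming the loss is averaged with the weights as multipliers; weighted_mean_loss is a hypothetical helper.

```python
from typing import Sequence


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average of per-row losses where each row counts as often as its weight."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * l for w, l in zip(weights, losses)) / total


# A weight of 2 gives the same result as listing the row twice.
print(weighted_mean_loss([1.0, 3.0], [2, 1]))
print(weighted_mean_loss([1.0, 1.0, 3.0], [1, 1, 1]))
```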