Parameters of H2ODeepLearning

Affected Classes

ai.h2o.sparkling.ml.algos.H2ODeepLearning
ai.h2o.sparkling.ml.algos.classification.H2ODeepLearningClassifier
ai.h2o.sparkling.ml.algos.regression.H2ODeepLearningRegressor

Parameters

Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
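
The naming rule can be sketched as a small helper. This is only an illustration of the convention stated above; the function accessor_names is hypothetical and not part of Sparkling Water.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Return the (getter, setter) method names for a parameter name."""
    if not param:
        raise ValueError("parameter name must not be empty")
    # Only the first character is upper-cased; the rest keeps its camelCase.
    suffix = param[0].upper() + param[1:]
    return "get" + suffix, "set" + suffix


print(accessor_names("label"))         # ('getLabel', 'setLabel')
print(accessor_names("adaptiveRate"))  # ('getAdaptiveRate', 'setAdaptiveRate')
```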

activation

Activation function. Possible values are "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout", "ExpRectifier", "ExpRectifierWithDropout".

Default value: "Rectifier"

Also available on the trained model.

adaptiveRate

Adaptive learning rate.

Scala default value: true ; Python default value: True

Also available on the trained model.

ignoredCols

Names of columns to ignore for training.

Scala default value: null ; Python default value: None

Also available on the trained model.

initialBiases

An array of weight vectors to be used for bias initialization of every network layer. If this parameter is set, the parameter ‘initialWeights’ has to be set as well.

Scala default value: null ; Python default value: None

initialWeights

An array of weight matrices to be used for initialization of the neural network. If this parameter is set, the parameter ‘initialBiases’ has to be set as well.

Scala default value: null ; Python default value: None

aucType

Set default multinomial AUC type. Possible values are "AUTO", "NONE", "MACRO_OVR", "WEIGHTED_OVR", "MACRO_OVO", "WEIGHTED_OVO".

Default value: "AUTO"

Also available on the trained model.

averageActivation

Average activation for sparse auto-encoder. #Experimental.

Default value: 0.0

Also available on the trained model.

balanceClasses

Balance training data class counts via over/under-sampling (for imbalanced data).

Scala default value: false ; Python default value: False

Also available on the trained model.

calculateFeatureImportances

Compute variable importances for input features (Gedeon method) - can be slow for large networks.

Scala default value: true ; Python default value: True

Also available on the trained model.

categoricalEncoding

Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited".

Default value: "AUTO"

Also available on the trained model.

classSamplingFactors

Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

Scala default value: null ; Python default value: None

Also available on the trained model.

classificationStop

Stopping criterion for classification error fraction on training data (-1 to disable).

Default value: 0.0

Also available on the trained model.

columnsToCategorical

List of columns to convert to categorical before modelling.

Scala default value: Array() ; Python default value: []

convertInvalidNumbersToNa

If set to ‘true’, the model converts invalid numbers to NA when making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

convertUnknownCategoricalLevelsToNa

If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

customMetricFunc

Reference to custom evaluation function, format: language:keyName=funcName.

Scala default value: null ; Python default value: None

Also available on the trained model.
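
The reference format language:keyName=funcName can be split into its three parts as sketched below. The helper parse_custom_metric_ref and the example value are hypothetical and only illustrate the format; they are not part of the H2O API.

```python
from typing import NamedTuple


class CustomMetricRef(NamedTuple):
    language: str
    key_name: str
    func_name: str


def parse_custom_metric_ref(ref: str) -> CustomMetricRef:
    """Split a 'language:keyName=funcName' reference into its parts."""
    language, colon, rest = ref.partition(":")
    key_name, equals, func_name = rest.partition("=")
    if not (colon and equals and language and key_name and func_name):
        raise ValueError("expected format language:keyName=funcName, got %r" % ref)
    return CustomMetricRef(language, key_name, func_name)


# Hypothetical reference value, used only to show the three parts.
print(parse_custom_metric_ref("python:myKey=MyMetric"))
```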

dataFrameSerializer

The fully qualified name of the serializer used to serialize Spark DataFrames to, and deserialize them from, a JSON value within NullableDataFrameParam.

Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer"

Also available on the trained model.

detailedPredictionCol

Column containing additional prediction details; its content depends on the model type.

Default value: "detailed_prediction"

Also available on the trained model.

diagnostics

Enable diagnostics for hidden layers.

Scala default value: true ; Python default value: True

Also available on the trained model.

distribution

Distribution function. Possible values are "AUTO", "bernoulli", "quasibinomial", "modified_huber", "multinomial", "ordinal", "gaussian", "poisson", "gamma", "tweedie", "huber", "laplace", "quantile", "fractionalbinomial", "negativebinomial", "custom".

Default value: "AUTO"

Also available on the trained model.

elasticAveraging

Elastic averaging between compute nodes can improve distributed model convergence. #Experimental.

Scala default value: false ; Python default value: False

Also available on the trained model.

elasticAveragingMovingRate

Elastic averaging moving rate (only if elastic averaging is enabled).

Default value: 0.9

Also available on the trained model.

elasticAveragingRegularization

Elastic averaging regularization strength (only if elastic averaging is enabled).

Default value: 0.001

Also available on the trained model.

epochs

How many times the dataset should be iterated (streamed), can be fractional.

Default value: 10.0

Also available on the trained model.

epsilon

Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).

Scala default value: 1.0e-8 ; Python default value: 1.0E-8

Also available on the trained model.

exportCheckpointsDir

Automatically export generated models to this directory.

Scala default value: null ; Python default value: None

Also available on the trained model.

exportWeightsAndBiases

Whether to export Neural Network weights and biases to H2O Frames.

Scala default value: false ; Python default value: False

Also available on the trained model.

fastMode

Enable fast mode (minor approximation in back-propagation).

Scala default value: true ; Python default value: True

Also available on the trained model.

featuresCols

Names of the feature columns.

Scala default value: Array() ; Python default value: []

Also available on the trained model.

foldAssignment

Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. Possible values are "AUTO", "Random", "Modulo", "Stratified".

Default value: "AUTO"

Also available on the trained model.
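
As an illustration of a deterministic scheme, the sketch below assigns folds by row index modulo the number of folds. It assumes that this is what the "Modulo" option means; the function name is hypothetical and the actual H2O implementation may differ in detail.

```python
from typing import List


def modulo_fold_assignment(n_rows: int, nfolds: int) -> List[int]:
    """Assign row i to fold i % nfolds (assumed meaning of 'Modulo')."""
    if nfolds < 2:
        raise ValueError("cross-validation needs at least 2 folds")
    return [i % nfolds for i in range(n_rows)]


print(modulo_fold_assignment(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```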

foldCol

Column with cross-validation fold index assignment per observation.

Scala default value: null ; Python default value: None

Also available on the trained model.

forceLoadBalance

Force extra load balancing to increase training speed for small datasets (to keep all cores busy).

Scala default value: true ; Python default value: True

Also available on the trained model.

gainsliftBins

Number of bins in the Gains/Lift table. 0 disables the table; the default value -1 means automatic binning.

Default value: -1

Also available on the trained model.

hidden

Hidden layer sizes (e.g. [100, 100]).

Scala default value: Array(200, 200) ; Python default value: [200, 200]

Also available on the trained model.

hiddenDropoutRatios

Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer; defaults to 0.5.

Scala default value: null ; Python default value: None

Also available on the trained model.

huberAlpha

Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).

Default value: 0.9

Also available on the trained model.

ignoreConstCols

Ignore constant columns.

Scala default value: true ; Python default value: True

Also available on the trained model.

initialWeightDistribution

Initial weight distribution. Possible values are "UniformAdaptive", "Uniform", "Normal".

Default value: "UniformAdaptive"

Also available on the trained model.

initialWeightScale

Uniform: -value…value, Normal: stddev.

Default value: 1.0

Also available on the trained model.
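
The meaning of the scale for the two explicit distributions can be sketched as follows. This is a minimal illustration, assuming a zero-mean normal distribution; "UniformAdaptive" is deliberately left out because its formula is not described here, and the function name is hypothetical.

```python
import random


def sample_initial_weight(distribution: str, scale: float, rng: random.Random) -> float:
    """Draw one initial weight for the given initialWeightDistribution."""
    if distribution == "Uniform":
        # Uniform: value drawn from the interval [-scale, scale].
        return rng.uniform(-scale, scale)
    if distribution == "Normal":
        # Normal: scale is the standard deviation (mean 0 is an assumption).
        return rng.gauss(0.0, scale)
    raise ValueError("unsupported distribution in this sketch: %r" % distribution)


rng = random.Random(42)
print(sample_initial_weight("Uniform", 1.0, rng))
print(sample_initial_weight("Normal", 1.0, rng))
```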

inputDropoutRatio

Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).

Default value: 0.0

Also available on the trained model.

keepBinaryModels

If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster.

Scala default value: false ; Python default value: False

keepCrossValidationFoldAssignment

Whether to keep the cross-validation fold assignment.

Scala default value: false ; Python default value: False

Also available on the trained model.

keepCrossValidationModels

Whether to keep the cross-validation models.

Scala default value: true ; Python default value: True

Also available on the trained model.

keepCrossValidationPredictions

Whether to keep the predictions of the cross-validation models.

Scala default value: false ; Python default value: False

Also available on the trained model.

l1

L1 regularization (can add stability and improve generalization, causes many weights to become 0).

Default value: 0.0

Also available on the trained model.

l2

L2 regularization (can add stability and improve generalization, causes many weights to be small).

Default value: 0.0

Also available on the trained model.

labelCol

Response variable column.

Default value: "label"

Also available on the trained model.

loss

Loss function. Possible values are "Automatic", "Quadratic", "CrossEntropy", "ModifiedHuber", "Huber", "Absolute", "Quantile".

Default value: "Automatic"

Also available on the trained model.

maxAfterBalanceSize

Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.

Scala default value: 5.0f ; Python default value: 5.0

Also available on the trained model.

maxCategoricalFeatures

Max. number of categorical features, enforced via hashing. #Experimental.

Default value: 2147483647

Also available on the trained model.

maxRuntimeSecs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

Default value: 0.0

Also available on the trained model.

maxW2

Constraint for squared sum of incoming weights per unit (e.g. for Rectifier).

Scala default value: 3.402823e38f ; Python default value: 3.402823E38

Also available on the trained model.
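
A constraint of this kind is commonly enforced by rescaling a unit's incoming weights whenever their squared sum exceeds the limit. The sketch below shows that idea under this assumption; the function name is hypothetical and the code is not taken from H2O. With the default value (the largest finite 32-bit float) the constraint is effectively inactive.

```python
import math
from typing import List, Sequence


def apply_max_w2(incoming: Sequence[float], max_w2: float) -> List[float]:
    """Rescale incoming weights so that their squared sum is at most max_w2."""
    squared_sum = sum(w * w for w in incoming)
    if squared_sum <= max_w2:
        return list(incoming)
    # Uniform scaling keeps the direction of the weight vector unchanged.
    scale = math.sqrt(max_w2 / squared_sum)
    return [w * scale for w in incoming]


print(apply_max_w2([3.0, 4.0], 1.0))  # roughly [0.6, 0.8]
```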

miniBatchSize

Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).

Default value: 1

Also available on the trained model.

missingValuesHandling

Handling of missing values. Possible values are "MeanImputation", "Skip".

Default value: "MeanImputation"

Also available on the trained model.

modelId

Destination id for this model; auto-generated if not specified.

Scala default value: null ; Python default value: None

momentumRamp

Number of training samples for which momentum increases.

Default value: 1000000.0

Also available on the trained model.

momentumStable

Final momentum after the ramp is over (try 0.99).

Default value: 0.0

Also available on the trained model.

momentumStart

Initial momentum at the beginning of training (try 0.5).

Default value: 0.0

Also available on the trained model.
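
The interplay of momentumStart, momentumRamp and momentumStable can be sketched as a schedule over the number of processed training samples. The sketch assumes a linear ramp between the start and stable values, which is not stated above; the function name is hypothetical.

```python
def momentum_at(samples: float, start: float, stable: float, ramp: float) -> float:
    """Momentum after `samples` training samples, assuming a linear ramp."""
    if ramp <= 0:
        raise ValueError("ramp must be positive in this sketch")
    if samples >= ramp:
        # The ramp is over: use the final (stable) momentum.
        return stable
    return start + (stable - start) * samples / ramp


# Suggested values from the descriptions: start 0.5, stable 0.99, default ramp 1e6.
for n in (0, 250000, 500000, 1000000):
    print(n, momentum_at(n, 0.5, 0.99, 1000000.0))
```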

nesterovAcceleratedGradient

Use Nesterov accelerated gradient (recommended).

Scala default value: true ; Python default value: True

Also available on the trained model.

nfolds

Number of folds for K-fold cross-validation (0 to disable or >= 2).

Default value: 0

Also available on the trained model.

offsetCol

Offset column. This will be added to the combination of columns before applying the link function.

Scala default value: null ; Python default value: None

Also available on the trained model.

overwriteWithBestModel

If enabled, override the final model with the best model found during training.

Scala default value: true ; Python default value: True

Also available on the trained model.

predictionCol

Prediction column name

Default value: "prediction"

Also available on the trained model.

quantileAlpha

Desired quantile for Quantile regression, must be between 0 and 1.

Default value: 0.5

Also available on the trained model.

quietMode

Enable quiet mode for less output to standard output.

Scala default value: false ; Python default value: False

Also available on the trained model.

rate

Learning rate (higher => less stable, lower => slower convergence).

Default value: 0.005

Also available on the trained model.

rateAnnealing

Learning rate annealing: rate / (1 + rate_annealing * samples).

Scala default value: 1.0e-6 ; Python default value: 1.0E-6

Also available on the trained model.
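
The annealing formula above can be evaluated directly. The sketch below uses the default values of rate (0.005) and rateAnnealing (1.0e-6); the function name is illustrative.

```python
def annealed_rate(rate: float, rate_annealing: float, samples: float) -> float:
    """Effective learning rate: rate / (1 + rate_annealing * samples)."""
    return rate / (1.0 + rate_annealing * samples)


# With the defaults, the rate is roughly halved after one million samples.
for n in (0, 100000, 1000000, 10000000):
    print(n, annealed_rate(0.005, 1.0e-6, n))
```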

rateDecay

Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)).

Default value: 1.0

Also available on the trained model.
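
The per-layer rate from the formula above can be computed as follows. Layers are counted from 1, as in the formula; the function name is illustrative.

```python
def layer_rate(rate: float, rate_decay: float, layer: int) -> float:
    """Learning rate of the N-th layer: rate * rate_decay ** (N - 1)."""
    if layer < 1:
        raise ValueError("layers are numbered starting at 1")
    return rate * rate_decay ** (layer - 1)


# With the default rate_decay of 1.0 every layer uses the same rate.
print([layer_rate(0.005, 1.0, n) for n in (1, 2, 3)])
# A decay factor below 1.0 lowers the rate for deeper layers.
print([layer_rate(0.005, 0.5, n) for n in (1, 2, 3)])
```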

regressionStop

Stopping criterion for regression error (MSE) on training data (-1 to disable).

Scala default value: 1.0e-6 ; Python default value: 1.0E-6

Also available on the trained model.

replicateTrainingData

Replicate the entire training dataset onto every node for faster training on small datasets.

Scala default value: true ; Python default value: True

Also available on the trained model.

reproducible

Force reproducibility on small data (will be slow - only uses 1 thread).

Scala default value: false ; Python default value: False

Also available on the trained model.

rho

Adaptive learning rate time decay factor (similarity to prior updates).

Default value: 0.99

Also available on the trained model.
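
The roles of rho and epsilon correspond to the two constants of the ADADELTA update rule. The sketch below shows one ADADELTA step for a single weight, assuming that this is the scheme behind adaptiveRate; the scalar formulation and the function name are illustrative and not taken from H2O's implementation.

```python
import math
from typing import Tuple


def adadelta_step(grad: float, avg_sq_grad: float, avg_sq_delta: float,
                  rho: float = 0.99, epsilon: float = 1.0e-8) -> Tuple[float, float, float]:
    """Return (weight update, new avg squared gradient, new avg squared update)."""
    # rho controls how strongly earlier gradients are remembered.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # epsilon keeps the ratio finite and allows progress from a zero state.
    delta = -math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return delta, avg_sq_grad, avg_sq_delta


state = (0.0, 0.0)
for step in range(3):
    delta, *state = adadelta_step(1.0, *state)
    print(step, delta)
```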

scoreDutyCycle

Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).

Default value: 0.1

Also available on the trained model.

scoreEachIteration

Whether to score during each iteration of model training.

Scala default value: false ; Python default value: False

Also available on the trained model.

scoreInterval

Shortest time interval (in seconds) between model scoring.

Default value: 5.0

Also available on the trained model.

scoreTrainingSamples

Number of training set samples for scoring (0 for all).

Scala default value: 10000L ; Python default value: 10000

Also available on the trained model.

scoreValidationSamples

Number of validation set samples for scoring (0 for all).

Scala default value: 0L ; Python default value: 0

Also available on the trained model.

scoreValidationSampling

Method used to sample validation dataset for scoring. Possible values are "Uniform", "Stratified".

Default value: "Uniform"

Also available on the trained model.

seed

Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

shuffleTrainingData

Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).

Scala default value: false ; Python default value: False

Also available on the trained model.

singleNodeMode

Run on a single node for fine-tuning of model parameters.

Scala default value: false ; Python default value: False

Also available on the trained model.

sparse

Sparse data handling (more efficient for data with lots of 0 values).

Scala default value: false ; Python default value: False

Also available on the trained model.

sparsityBeta

Sparsity regularization. #Experimental.

Default value: 0.0

Also available on the trained model.

splitRatio

Accepts values in the range [0, 1.0] that determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.

Default value: 1.0
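
The expected sizes of the two parts follow directly from the ratio. The sketch below only computes these expected row counts; the function name is hypothetical, and the actual split may be randomized, so real counts can deviate slightly.

```python
from typing import Tuple


def expected_split_counts(n_rows: int, split_ratio: float) -> Tuple[int, int]:
    """Return the expected (training, validation) row counts for a split ratio."""
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be within [0, 1.0]")
    n_train = round(n_rows * split_ratio)
    return n_train, n_rows - n_train


print(expected_split_counts(1000, 0.8))  # (800, 200)
print(expected_split_counts(1000, 1.0))  # (1000, 0): default, no validation part
```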

standardize

If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.

Scala default value: true ; Python default value: True

Also available on the trained model.

stoppingMetric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "AUUC", "ATE", "ATT", "ATC", "qini", "custom", "custom_increasing".

Default value: "AUTO"

Also available on the trained model.

stoppingRounds

Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

Default value: 5

Also available on the trained model.

stoppingTolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

Default value: 0.0

Also available on the trained model.
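
The combination of stoppingRounds and stoppingTolerance can be sketched as below for a metric where lower is better (such as logloss). This is a simplified reading of the two descriptions, not H2O's actual implementation: it compares the latest moving average of length k with the best of the k moving averages before it. The function name is hypothetical.

```python
from typing import Sequence


def should_stop(history: Sequence[float], stopping_rounds: int,
                stopping_tolerance: float) -> bool:
    """Decide on early stopping from the metric values of past scoring events."""
    k = stopping_rounds
    if k <= 0 or len(history) < 2 * k:
        # Disabled, or not enough scoring events yet.
        return False
    recent = history[-2 * k:]
    # k + 1 moving averages of length k over the last 2k scoring events.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    latest, best_earlier = averages[-1], min(averages[:-1])
    improvement = best_earlier - latest
    # Stop without improvement, or when it is below the relative tolerance.
    return improvement <= 0 or improvement < stopping_tolerance * abs(best_earlier)


print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], 3, 0.0))  # False
print(should_stop([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], 3, 0.0))  # True
```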

targetRatioCommToComp

Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).

Default value: 0.05

Also available on the trained model.

trainSamplesPerIteration

Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.

Scala default value: -2L ; Python default value: -2

Also available on the trained model.
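
The special values can be read as in the sketch below. It assumes that "all available data" means #nodes x #rows when the training data is replicated and #rows otherwise; the automatic mode (-2) is not modelled and is returned as None. The function name is hypothetical.

```python
from typing import Optional


def resolve_samples_per_iteration(value: int, n_rows: int, n_nodes: int,
                                  replicated: bool) -> Optional[int]:
    """Translate trainSamplesPerIteration into a sample count (None = automatic)."""
    if value > 0:
        return value
    if value == 0:
        # One epoch: every row is processed once per iteration.
        return n_rows
    if value == -1:
        # All available data; with replication each node holds all rows (assumption).
        return n_rows * n_nodes if replicated else n_rows
    if value == -2:
        return None
    raise ValueError("unsupported value: %d" % value)


print(resolve_samples_per_iteration(0, 10000, 4, True))   # 10000
print(resolve_samples_per_iteration(-1, 10000, 4, True))  # 40000
```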

tweediePower

Tweedie power for Tweedie regression, must be between 1 and 2.

Default value: 1.5

Also available on the trained model.

useAllFactorLevels

Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.

Scala default value: true ; Python default value: True

Also available on the trained model.

validationDataFrame

A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the ‘splitRatio’ parameter. The parameter is not serializable!

Scala default value: null ; Python default value: None

weightCol

Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

Scala default value: null ; Python default value: None

Also available on the trained model.
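
The equivalence between weights and repeated or excluded rows can be checked on a weighted mean squared error. This is a generic illustration of the statement above, not H2O code; the function name is hypothetical.

```python
from typing import Sequence


def weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Mean squared error in which each row counts according to its weight."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)) / total


# Weight 2 on the first row and weight 0 on the third row ...
weighted = weighted_mse([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [2.0, 1.0, 0.0])
# ... matches repeating the first row and dropping the third one.
repeated = weighted_mse([1.0, 1.0, 2.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
print(weighted, repeated)  # 2.0 2.0
```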

withContributions

Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

Scala default value: false ; Python default value: False

Also available on the trained model.

withLeafNodeAssignments

Enables or disables computation of leaf node assignments.

Scala default value: false ; Python default value: False

Also available on the trained model.

withStageResults

Enables or disables computation of stage results.

Scala default value: false ; Python default value: False

Also available on the trained model.