Builds a feed-forward multilayer artificial neural network on an H2OFrame.

h2o.deeplearning(
  x,
  y,
  training_frame,
  model_id = NULL,
  validation_frame = NULL,
  nfolds = 0,
  keep_cross_validation_models = TRUE,
  keep_cross_validation_predictions = FALSE,
  keep_cross_validation_fold_assignment = FALSE,
  fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  fold_column = NULL,
  ignore_const_cols = TRUE,
  score_each_iteration = FALSE,
  weights_column = NULL,
  offset_column = NULL,
  balance_classes = FALSE,
  class_sampling_factors = NULL,
  max_after_balance_size = 5,
  max_hit_ratio_k = 0,
  checkpoint = NULL,
  pretrained_autoencoder = NULL,
  overwrite_with_best_model = TRUE,
  use_all_factor_levels = TRUE,
  standardize = TRUE,
  activation = c("Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout",
    "Maxout", "MaxoutWithDropout"),
  hidden = c(200, 200),
  epochs = 10,
  train_samples_per_iteration = -2,
  target_ratio_comm_to_comp = 0.05,
  seed = -1,
  adaptive_rate = TRUE,
  rho = 0.99,
  epsilon = 1e-08,
  rate = 0.005,
  rate_annealing = 1e-06,
  rate_decay = 1,
  momentum_start = 0,
  momentum_ramp = 1e+06,
  momentum_stable = 0,
  nesterov_accelerated_gradient = TRUE,
  input_dropout_ratio = 0,
  hidden_dropout_ratios = NULL,
  l1 = 0,
  l2 = 0,
  max_w2 = 3.4028235e+38,
  initial_weight_distribution = c("UniformAdaptive", "Uniform", "Normal"),
  initial_weight_scale = 1,
  initial_weights = NULL,
  initial_biases = NULL,
  loss = c("Automatic", "CrossEntropy", "Quadratic", "Huber", "Absolute", "Quantile"),
  distribution = c("AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma",
    "tweedie", "laplace", "quantile", "huber"),
  quantile_alpha = 0.5,
  tweedie_power = 1.5,
  huber_alpha = 0.9,
  score_interval = 5,
  score_training_samples = 10000,
  score_validation_samples = 0,
  score_duty_cycle = 0.1,
  classification_stop = 0,
  regression_stop = 1e-06,
  stopping_rounds = 5,
  stopping_metric = c("AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE",
    "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error",
    "custom", "custom_increasing"),
  stopping_tolerance = 0,
  max_runtime_secs = 0,
  score_validation_sampling = c("Uniform", "Stratified"),
  diagnostics = TRUE,
  fast_mode = TRUE,
  force_load_balance = TRUE,
  variable_importances = TRUE,
  replicate_training_data = TRUE,
  single_node_mode = FALSE,
  shuffle_training_data = FALSE,
  missing_values_handling = c("MeanImputation", "Skip"),
  quiet_mode = FALSE,
  autoencoder = FALSE,
  sparse = FALSE,
  col_major = FALSE,
  average_activation = 0,
  sparsity_beta = 0,
  max_categorical_features = 2147483647,
  reproducible = FALSE,
  export_weights_and_biases = FALSE,
  mini_batch_size = 1,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
    "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  elastic_averaging = FALSE,
  elastic_averaging_moving_rate = 0.9,
  elastic_averaging_regularization = 0.001,
  export_checkpoints_dir = NULL,
  verbose = FALSE
)

Arguments

x

(Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.

y

The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model.

training_frame

Id of the training data frame.

model_id

Destination id for this model; auto-generated if not specified.

validation_frame

Id of the validation data frame.

nfolds

Number of folds for K-fold cross-validation (0 to disable or >= 2). Defaults to 0.

keep_cross_validation_models

Logical. Whether to keep the cross-validation models. Defaults to TRUE.

keep_cross_validation_predictions

Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.

keep_cross_validation_fold_assignment

Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.

fold_assignment

Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". Defaults to AUTO.

fold_column

Column with cross-validation fold index assignment per observation.

ignore_const_cols

Logical. Ignore constant columns. Defaults to TRUE.

score_each_iteration

Logical. Whether to score during each iteration of model training. Defaults to FALSE.

weights_column

Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
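The equivalence between a weight and a repeated row can be checked with a small base-R sketch. It does not call H2O; `weighted_sse` is a hypothetical helper that assumes a squared-error loss with the weight as a per-row pre-factor.

```r
# Hypothetical helper: squared-error loss with a per-row weight pre-factor.
weighted_sse <- function(y, pred, w) sum(w * (y - pred)^2)

y    <- c(1.0, 2.0, 4.0)
pred <- c(1.5, 1.5, 3.0)

# Row 2 gets weight 0 (excluded), row 3 gets weight 2 (counts twice).
loss_weighted <- weighted_sse(y, pred, w = c(1, 0, 2))

# Same data with row 2 dropped and row 3 physically repeated.
keep <- c(1, 3, 3)
loss_repeated <- weighted_sse(y[keep], pred[keep], w = rep(1, 3))

isTRUE(all.equal(loss_weighted, loss_repeated))
```

Both losses agree, while the weighted version keeps the frame at its original three rows.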

offset_column

Offset column. This will be added to the combination of columns before applying the link function.

balance_classes

Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.

class_sampling_factors

Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

max_after_balance_size

Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.

max_hit_ratio_k

Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable). Defaults to 0.

checkpoint

Model checkpoint to resume training with.

pretrained_autoencoder

Pretrained autoencoder model to initialize this model with.

overwrite_with_best_model

Logical. If enabled, override the final model with the best model found during training. Defaults to TRUE.

use_all_factor_levels

Logical. Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder. Defaults to TRUE.

standardize

Logical. If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. Defaults to TRUE.

activation

Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier.

hidden

Hidden layer sizes (e.g. c(100, 100)). Defaults to c(200, 200).

epochs

How many times the dataset should be iterated (streamed), can be fractional. Defaults to 10.

train_samples_per_iteration

Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic. Defaults to -2.

target_ratio_comm_to_comp

Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). Defaults to 0.05.

seed

Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Note: only reproducible when running single threaded. Defaults to -1 (time-based random number).

adaptive_rate

Logical. Adaptive learning rate. Defaults to TRUE.

rho

Adaptive learning rate time decay factor (similarity to prior updates). Defaults to 0.99.

epsilon

Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress). Defaults to 1e-08.
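To give a feel for how rho and epsilon interact, here is a base-R sketch of a single ADADELTA-style update. This page does not name the adaptive algorithm, so treating it as ADADELTA is an assumption, and `adadelta_step` and its state fields are hypothetical names, not part of the h2o package.

```r
# Hypothetical sketch of one ADADELTA-style update for a single weight.
# state$eg2  : running average of squared gradients
# state$edx2 : running average of squared updates
adadelta_step <- function(state, grad, rho = 0.99, epsilon = 1e-08) {
  # rho controls how strongly prior gradients are remembered.
  state$eg2 <- rho * state$eg2 + (1 - rho) * grad^2
  # epsilon keeps both square roots away from zero so the first step is non-zero.
  step <- -sqrt(state$edx2 + epsilon) / sqrt(state$eg2 + epsilon) * grad
  state$edx2 <- rho * state$edx2 + (1 - rho) * step^2
  state$step <- step
  state
}

state <- list(eg2 = 0, edx2 = 0, step = 0)
state <- adadelta_step(state, grad = 1)
state$step
```

With both running averages starting at zero, the size of the first step is set almost entirely by epsilon, which is why a larger epsilon allows faster initial progress.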

rate

Learning rate (higher => less stable, lower => slower convergence). Defaults to 0.005.

rate_annealing

Learning rate annealing: rate / (1 + rate_annealing * samples). Defaults to 1e-06.
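The annealing formula can be evaluated directly in base R. `annealed_rate` is a hypothetical helper written for illustration; its defaults mirror the defaults listed on this page.

```r
# Hypothetical helper: effective learning rate after `samples` training samples,
# using the formula rate / (1 + rate_annealing * samples).
annealed_rate <- function(samples, rate = 0.005, rate_annealing = 1e-06) {
  rate / (1 + rate_annealing * samples)
}

# Rate at the start, after one million samples, and after ten million samples.
rates <- annealed_rate(c(0, 1e6, 1e7))
rates
```

With the default rate_annealing of 1e-06, the rate is halved after one million training samples.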

rate_decay

Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)). Defaults to 1.
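As a quick illustration of the per-layer formula, the following base-R sketch computes the rate for each layer. `layer_rates` is a hypothetical helper, not an h2o function.

```r
# Hypothetical helper: learning rate of layers 1..n_layers,
# where the N-th layer uses rate * rate_decay^(N - 1).
layer_rates <- function(rate, rate_decay, n_layers) {
  rate * rate_decay^(seq_len(n_layers) - 1)
}

# Three layers, with the rate halving from one layer to the next.
rates_by_layer <- layer_rates(rate = 0.005, rate_decay = 0.5, n_layers = 3)
rates_by_layer
```

With the default rate_decay of 1, every layer uses the same rate.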

momentum_start

Initial momentum at the beginning of training (try 0.5). Defaults to 0.

momentum_ramp

Number of training samples for which momentum increases. Defaults to 1000000.

momentum_stable

Final momentum after the ramp is over (try 0.99). Defaults to 0.
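The three momentum arguments together describe a schedule. The sketch below assumes the ramp is linear in the number of training samples, which this page does not state explicitly; `momentum_at` is a hypothetical helper, and its defaults use the "try" values suggested above rather than the actual defaults of 0.

```r
# Hypothetical helper: momentum after `samples` training samples,
# assuming a linear ramp from momentum_start to momentum_stable.
momentum_at <- function(samples, momentum_start = 0.5,
                        momentum_ramp = 1e6, momentum_stable = 0.99) {
  frac <- pmin(samples / momentum_ramp, 1)  # fraction of the ramp completed
  momentum_start + frac * (momentum_stable - momentum_start)
}

# Start of training, halfway through the ramp, and well past the ramp.
schedule <- momentum_at(c(0, 5e5, 2e6))
schedule
```

Once the ramp is over, the momentum stays at momentum_stable for the rest of training.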

nesterov_accelerated_gradient

Logical. Use Nesterov accelerated gradient (recommended). Defaults to TRUE.

input_dropout_ratio

Input layer dropout ratio (can improve generalization, try 0.1 or 0.2). Defaults to 0.

hidden_dropout_ratios

Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0.5.

l1

L1 regularization (can add stability and improve generalization, causes many weights to become 0). Defaults to 0.

l2

L2 regularization (can add stability and improve generalization, causes many weights to be small). Defaults to 0.

max_w2

Constraint for squared sum of incoming weights per unit (e.g. for Rectifier). Defaults to 3.4028235e+38.
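One way to picture this constraint is as a rescaling of each unit's incoming weights whenever their squared sum exceeds max_w2. The base-R sketch below assumes that enforcement strategy, which this page does not describe; `constrain_incoming` is a hypothetical helper.

```r
# Hypothetical helper: W has one row per unit and one column per incoming weight.
# Rows whose squared sum exceeds max_w2 are scaled down to exactly max_w2.
constrain_incoming <- function(W, max_w2) {
  sq <- rowSums(W^2)
  scale <- ifelse(sq > max_w2, sqrt(max_w2 / sq), 1)
  W * scale  # recycling applies scale[i] to every element of row i
}

W <- matrix(c(3.0, 4.0,
              0.1, 0.2), nrow = 2, byrow = TRUE)
W_constrained <- constrain_incoming(W, max_w2 = 1)
rowSums(W_constrained^2)
```

The default of 3.4028235e+38 is the largest single-precision float, so the constraint is effectively disabled unless a smaller value is supplied.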

initial_weight_distribution

Initial weight distribution. Must be one of: "UniformAdaptive", "Uniform", "Normal". Defaults to UniformAdaptive.

initial_weight_scale

Uniform: -value...value, Normal: stddev. Defaults to 1.
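The meaning of the scale for the two non-adaptive distributions can be sketched in base R. `init_weights` is a hypothetical helper; a mean of 0 for the Normal case is an assumption, and UniformAdaptive is left out because this page does not describe how it scales.

```r
# Hypothetical helper: draw n initial weights.
# Uniform: values in [-initial_weight_scale, initial_weight_scale]
# Normal:  standard deviation initial_weight_scale (mean 0 is assumed)
init_weights <- function(n, distribution = c("Uniform", "Normal"),
                         initial_weight_scale = 1) {
  distribution <- match.arg(distribution)
  switch(distribution,
    Uniform = runif(n, min = -initial_weight_scale, max = initial_weight_scale),
    Normal  = rnorm(n, mean = 0, sd = initial_weight_scale)
  )
}

set.seed(42)
w_unif <- init_weights(1000, "Uniform", initial_weight_scale = 0.1)
w_norm <- init_weights(1000, "Normal", initial_weight_scale = 0.1)
range(w_unif)
sd(w_norm)
```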

initial_weights

A list of H2OFrame ids to initialize the weight matrices of this model with.

initial_biases

A list of H2OFrame ids to initialize the bias vectors of this model with.

loss

Loss function. Must be one of: "Automatic", "CrossEntropy", "Quadratic", "Huber", "Absolute", "Quantile". Defaults to Automatic.

distribution

Distribution function. Must be one of: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber". Defaults to AUTO.

quantile_alpha

Desired quantile for Quantile regression, must be between 0 and 1. Defaults to 0.5.
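The effect of quantile_alpha is easiest to see in the standard quantile (pinball) loss, sketched here in base R. `quantile_loss` is a hypothetical helper and is not claimed to match H2O's internal implementation.

```r
# Hypothetical helper: mean pinball loss for quantile level alpha.
# Under-predictions (y > pred) cost alpha per unit,
# over-predictions cost (1 - alpha) per unit.
quantile_loss <- function(y, pred, alpha = 0.5) {
  r <- y - pred
  mean(ifelse(r >= 0, alpha * r, (alpha - 1) * r))
}

y    <- c(1, 2, 3)
pred <- c(0, 0, 0)  # every prediction is too low

loss_high <- quantile_loss(y, pred, alpha = 0.9)  # heavy penalty for under-prediction
loss_low  <- quantile_loss(y, pred, alpha = 0.1)  # light penalty for under-prediction
c(loss_high, loss_low)
```

At the default of 0.5 the two penalties are equal and the loss is half the mean absolute error, so the model targets the median.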

tweedie_power

Tweedie power for Tweedie regression, must be between 1 and 2. Defaults to 1.5.

huber_alpha

Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1). Defaults to 0.9.
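Read literally, huber_alpha picks the quadratic/linear threshold as a quantile of the absolute residuals. The base-R sketch below follows that reading as an assumption; `huber_loss` is a hypothetical helper, and H2O's internal computation may differ in detail.

```r
# Hypothetical helper: Huber loss whose threshold delta is the
# huber_alpha quantile of the absolute residuals.
huber_loss <- function(y, pred, huber_alpha = 0.9) {
  r <- abs(y - pred)
  delta <- unname(quantile(r, probs = huber_alpha))
  # Quadratic below the threshold, linear above it.
  mean(ifelse(r <= delta, 0.5 * r^2, delta * (r - 0.5 * delta)))
}

y    <- c(0, 0, 0, 0)
pred <- c(1, 1, 1, 10)  # the last residual is an outlier

loss_quadratic <- huber_loss(y, pred, huber_alpha = 1)    # threshold at the maximum residual
loss_robust    <- huber_loss(y, pred, huber_alpha = 0.5)  # threshold at the median residual
c(loss_quadratic, loss_robust)
```

Lowering huber_alpha moves more residuals into the linear regime, which reduces the influence of outliers.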

score_interval

Shortest time interval (in seconds) between model scoring. Defaults to 5.

score_training_samples

Number of training set samples for scoring (0 for all). Defaults to 10000.

score_validation_samples

Number of validation set samples for scoring (0 for all). Defaults to 0.

score_duty_cycle

Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring). Defaults to 0.1.

classification_stop

Stopping criterion for classification error fraction on training data (-1 to disable). Defaults to 0.

regression_stop

Stopping criterion for regression error (MSE) on training data (-1 to disable). Defaults to 1e-06.

stopping_rounds

Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 5.

stopping_metric

Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Must be one of: "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing". Defaults to AUTO.

stopping_tolerance

Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.
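The interplay of stopping_rounds and stopping_tolerance can be sketched in base R as a check over the scoring history. This is a simplified reading of the rule for a positive, lower-is-better metric; `should_stop` is a hypothetical helper, and H2O's exact comparison may differ.

```r
# Hypothetical helper: decide whether to stop given the metric history
# (one value per scoring event, lower is better, values assumed positive).
should_stop <- function(metric, stopping_rounds = 5, stopping_tolerance = 0) {
  k <- stopping_rounds
  n <- length(metric)
  if (k == 0 || n < 2 * k) return(FALSE)  # disabled, or not enough history yet
  # Simple moving averages of length k over the scoring history.
  ma <- vapply(k:n, function(i) mean(metric[(i - k + 1):i]), numeric(1))
  m <- length(ma)
  reference   <- ma[m - k]                # average ending k scoring events ago
  best_recent <- min(ma[(m - k + 1):m])   # best of the last k averages
  # Stop when the relative improvement falls short of the tolerance.
  best_recent >= reference * (1 - stopping_tolerance)
}

still_improving <- should_stop(10:1, stopping_rounds = 2)
plateaued       <- should_stop(c(5, 4, 3, 3, 3, 3, 3, 3), stopping_rounds = 2)
c(still_improving, plateaued)
```

Averaging over k scoring events smooths out noise in the metric, so a single bad scoring event does not end training.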

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

score_validation_sampling

Method used to sample validation dataset for scoring. Must be one of: "Uniform", "Stratified". Defaults to Uniform.

diagnostics

Logical. Enable diagnostics for hidden layers. Defaults to TRUE.

fast_mode

Logical. Enable fast mode (minor approximation in back-propagation). Defaults to TRUE.

force_load_balance

Logical. Force extra load balancing to increase training speed for small datasets (to keep all cores busy). Defaults to TRUE.

variable_importances

Logical. Compute variable importances for input features (Gedeon method) - can be slow for large networks. Defaults to TRUE.

replicate_training_data

Logical. Replicate the entire training dataset onto every node for faster training on small datasets. Defaults to TRUE.

single_node_mode

Logical. Run on a single node for fine-tuning of model parameters. Defaults to FALSE.

shuffle_training_data

Logical. Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes). Defaults to FALSE.

missing_values_handling

Handling of missing values. Must be one of: "MeanImputation", "Skip". Defaults to MeanImputation.

quiet_mode

Logical. Enable quiet mode for less output to standard output. Defaults to FALSE.

autoencoder

Logical. Train an auto-encoder (unsupervised) instead of a supervised model. Defaults to FALSE.

sparse

Logical. Sparse data handling (more efficient for data with lots of 0 values). Defaults to FALSE.

col_major

Logical. Deprecated. Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation. Defaults to FALSE.

average_activation

Average activation for sparse auto-encoder (experimental). Defaults to 0.

sparsity_beta

Sparsity regularization (experimental). Defaults to 0.

max_categorical_features

Max. number of categorical features, enforced via hashing (experimental). Defaults to 2147483647.

reproducible

Logical. Force reproducibility on small data (will be slow - only uses 1 thread). Defaults to FALSE.

export_weights_and_biases

Logical. Whether to export Neural Network weights and biases to H2O Frames. Defaults to FALSE.

mini_batch_size

Mini-batch size (smaller leads to better fit, larger can speed up and generalize better). Defaults to 1.

categorical_encoding

Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.

elastic_averaging

Logical. Elastic averaging between compute nodes can improve distributed model convergence (experimental). Defaults to FALSE.

elastic_averaging_moving_rate

Elastic averaging moving rate (only if elastic averaging is enabled). Defaults to 0.9.

elastic_averaging_regularization

Elastic averaging regularization strength (only if elastic averaging is enabled). Defaults to 0.001.

export_checkpoints_dir

Automatically export generated models to this directory.

verbose

Logical. Print scoring history to the console (Metrics per epoch). Defaults to FALSE.

See also

predict.H2OModel for prediction

Examples

# NOT RUN {
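library(h2o)
h2o.init()

# Convert the built-in iris data set to an H2OFrame
iris_hf <- as.h2o(iris)

# Train on columns 1-4 to predict column 5 (Species)
iris_dl <- h2o.deeplearning(x = 1:4, y = 5, training_frame = iris_hf, seed = 123456)

# now make a prediction
predictions <- h2o.predict(iris_dl, iris_hf)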
# }