Create a synthetic H2O Frame with random data. You can specify the number of rows/columns, as well as column types: integer, real, boolean, time, string, categorical. The frame may also have a dedicated “response” column, and some of the entries in the dataset may be created as missing.
POST /3/ModelMetrics/models/{model}/frames/{frame}
Return the scoring metrics for the specified Frame with the specified Model. If the Frame has already been scored with the Model then cached results will be returned; otherwise predictions for all rows in the Frame will be generated and the metrics will be returned.
POST /3/ModelMetrics/predictions_frame/{predictions_frame}/actuals_frame/{actuals_frame}
Create a ModelMetrics object from the predicted and actual values, and a domain for classification problems or a distribution family for regression problems.
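As a rough illustration of how the two ModelMetrics routes above are parameterized, the sketch below only fills in the path templates. The function names and the example keys are hypothetical, no request is sent, and the host, port, and request body are left out because this section does not specify them.

```python
from urllib.parse import quote


def scoring_metrics_path(model: str, frame: str) -> str:
    """Path for scoring `frame` with `model` (first route above)."""
    return "/3/ModelMetrics/models/{}/frames/{}".format(
        quote(model, safe=""), quote(frame, safe="")
    )


def metrics_from_predictions_path(predictions_frame: str, actuals_frame: str) -> str:
    """Path for building metrics from predicted and actual values (second route above)."""
    return "/3/ModelMetrics/predictions_frame/{}/actuals_frame/{}".format(
        quote(predictions_frame, safe=""), quote(actuals_frame, safe="")
    )


if __name__ == "__main__":
    # Hypothetical model and frame keys, for illustration only.
    print(scoring_metrics_path("gbm_model_1", "test.hex"))
```

Percent-encoding each key separately keeps a key that contains a space or slash from changing the shape of the path.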
Return the model in the MOJO format. This format can then be interpreted by gen_model.jar in order to perform prediction / scoring. Currently works for GBM and DRF algos only.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
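The early-stopping rule described above (moving average of length k, relative tolerance) can be sketched as follows. This is a simplified reading of the descriptions, not necessarily the exact comparison H2O performs: it assumes a metric where lower is better, and the function name `should_stop` is hypothetical.

```python
def should_stop(history, stopping_rounds, stopping_tolerance):
    """Decide whether to stop, given the metric value of each scoring event.

    Assumes lower metric values are better (e.g. logloss or deviance).
    """
    k = stopping_rounds
    if k == 0:
        return False  # 0 disables early stopping
    if len(history) < 2 * k:
        return False  # not enough scoring events to compare yet

    n = len(history)
    # The k+1 most recent simple moving averages, each over k scoring events.
    averages = [sum(history[end - k:end]) / k for end in range(n - k, n + 1)]

    reference = averages[0]          # moving average from k scoring events ago
    best_recent = min(averages[1:])  # best of the last k moving averages

    # Relative improvement; fall back to an absolute difference when the
    # reference is exactly zero (an assumption made for this sketch).
    scale = abs(reference) if reference != 0 else 1.0
    improvement = (reference - best_recent) / scale
    return improvement < stopping_tolerance
```

With `stopping_rounds=2` and `stopping_tolerance=0.01`, a flat history such as `[0.5] * 6` triggers a stop, while a history that is still dropping by several percent per scoring event does not.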
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
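The observation-weight semantics described in the shared parameters above (a weight of zero excludes a row, a weight of 2 repeats it) can be checked on a simple statistic. The sketch below uses a weighted mean as a stand-in for a model's training objective; `weighted_mean` is a hypothetical helper and not part of the API.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each entry counts `weights[i]` times."""
    if any(w < 0 for w in weights):
        raise ValueError("Negative weights are not allowed")
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    # Weight 2 on the second row and weight 0 on the third row ...
    weighted = weighted_mean([1.0, 2.0, 4.0], [1, 2, 0])
    # ... matches an unweighted mean with row two repeated and row three dropped.
    expanded = weighted_mean([1.0, 2.0, 2.0], [1, 1, 1])
    print(weighted, expanded)
```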
ColSpecifierV3
column_name string
Name of the column
In/Out
is_member_of_frames string[]
List of fields which specify columns that must contain this column
In/Out
ColV3
label string
label
Out
missing_count long
missing
Out
zero_count long
zeros
Out
positive_infinity_count long
positive infinities
Out
negative_infinity_count long
negative infinities
Out
mins double[]
mins
Out
maxs double[]
maxs
Out
mean double
mean
Out
sigma double
sigma
Out
type string
datatype: {enum, string, int, real, time, uuid}
Out
domain string[]
domain; not-null for categorical columns only
Out
domain_cardinality int
cardinality of this column’s domain; not-null for categorical columns only
Out
data double[]
data
Out
string_data string[]
string data
Out
precision byte
decimal precision, -1 for all digits
Out
histogram_bins long[]
Histogram bins; null if not computed
Out
histogram_base double
Start of histogram bin zero
Out
histogram_stride double
Stride per bin
Out
percentiles double[]
Percentile values, matching the default percentiles
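One plausible way to read the three histogram fields together is that bin i covers the half-open interval starting at `histogram_base + i * histogram_stride`. The sketch below applies that reading; the interval convention and the helper name `bin_ranges` are assumptions, since the field descriptions above do not state them.

```python
def bin_ranges(histogram_bins, histogram_base, histogram_stride):
    """Return (lower, upper, count) for each bin, assuming equal-width bins."""
    ranges = []
    for i, count in enumerate(histogram_bins):
        lower = histogram_base + i * histogram_stride
        upper = lower + histogram_stride
        ranges.append((lower, upper, count))
    return ranges


if __name__ == "__main__":
    # Made-up values: two bins of width 2.5 starting at 10.0.
    for lower, upper, count in bin_ranges([3, 5], 10.0, 2.5):
        print(f"[{lower}, {upper}): {count}")
```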
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors).
In
binomial_double_trees boolean
For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy.
In
ntrees int
Number of trees.
In
max_depth int
Maximum tree depth.
In
min_rows double
Fewest allowed (weighted) observations in a leaf (in R called ‘nodesize’).
In
nbins int
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
In
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set - please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equaled or exceeded this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.
In
sample_rate double
Row sample rate per tree (from 0.0 to 1.0)
In
sample_rate_per_class double[]
Row sample rate per tree per class (from 0.0 to 1.0)
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (from 0.0 to 2.0)
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
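Two of the tree parameters above carry small pieces of arithmetic: the default number of split candidates when the value is -1, and the histogram size that shrinks by a factor of two per level. The sketch below spells both out. The rounding (floor, with a minimum of one) and the use of nbins as the lower bound are assumptions drawn from the wording of the descriptions, and both function names are hypothetical.

```python
import math


def default_split_candidates(num_predictors, classification):
    """Default candidate count per split: sqrt(p) or p/3, floored, at least 1."""
    if classification:
        return max(1, int(math.sqrt(num_predictors)))
    return max(1, num_predictors // 3)


def bins_at_depth(depth, nbins, nbins_top_level):
    """Histogram bins for numerical columns at a given tree depth.

    Starts at nbins_top_level at the root, halves per level, and never
    drops below nbins.
    """
    return max(nbins, nbins_top_level // (2 ** depth))


if __name__ == "__main__":
    print(default_split_candidates(100, classification=True))
    print([bins_at_depth(d, nbins=20, nbins_top_level=1024) for d in range(8)])
```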
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs.
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable).
In/Out
activation enum
Activation function.
In/Out
hidden int[]
Hidden layer sizes (e.g. [100, 100]).
In/Out
epochs double
How many times the dataset should be iterated (streamed), can be fractional.
In/Out
train_samples_per_iteration long
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.
In/Out
target_ratio_comm_to_comp double
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).
In/Out
seed long
Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.
In/Out
adaptive_rate boolean
Adaptive learning rate.
In/Out
rho double
Adaptive learning rate time decay factor (similarity to prior updates).
In/Out
epsilon double
Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).
In/Out
rate double
Learning rate (higher => less stable, lower => slower convergence).
A list of H2OFrame ids to initialize the bias vectors of this model with.
In/Out
loss enum
Loss function.
In/Out
score_interval double
Shortest time interval (in seconds) between model scoring.
In/Out
score_training_samples long
Number of training set samples for scoring (0 for all).
In/Out
score_validation_samples long
Number of validation set samples for scoring (0 for all).
In/Out
score_duty_cycle double
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).
In/Out
classification_stop double
Stopping criterion for classification error fraction on training data (-1 to disable).
In/Out
regression_stop double
Stopping criterion for regression error (MSE) on training data (-1 to disable).
In/Out
quiet_mode boolean
Enable quiet mode for less output to standard output.
In/Out
score_validation_sampling enum
Method used to sample validation dataset for scoring.
In/Out
overwrite_with_best_model boolean
If enabled, override the final model with the best model found during training.
In/Out
autoencoder boolean
Auto-Encoder.
In/Out
use_all_factor_levels boolean
Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.
In/Out
standardize boolean
If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.
In/Out
diagnostics boolean
Enable diagnostics for hidden layers.
In/Out
variable_importances boolean
Compute variable importances for input features (Gedeon method) - can be slow for large networks.
In/Out
fast_mode boolean
Enable fast mode (minor approximation in back-propagation).
In/Out
force_load_balance boolean
Force extra load balancing to increase training speed for small datasets (to keep all cores busy).
In/Out
replicate_training_data boolean
Replicate the entire training dataset onto every node for faster training on small datasets.
In/Out
single_node_mode boolean
Run on a single node for fine-tuning of model parameters.
In/Out
shuffle_training_data boolean
Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).
In/Out
missing_values_handling enum
Handling of missing values. Either Skip or MeanImputation.
In/Out
sparse boolean
Sparse data handling (more efficient for data with lots of 0 values).
In/Out
col_major boolean
DEPRECATED. Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation.
In/Out
average_activation double
Average activation for sparse auto-encoder. #Experimental
In/Out
sparsity_beta double
Sparsity regularization. #Experimental
In/Out
max_categorical_features int
Max. number of categorical features, enforced via hashing. #Experimental
In/Out
reproducible boolean
Force reproducibility on small data (will be slow - only uses 1 thread).
In/Out
export_weights_and_biases boolean
Whether to export Neural Network weights and biases to H2O Frames.
In/Out
mini_batch_size int
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).
In/Out
elastic_averaging boolean
Elastic averaging between compute nodes can improve distributed model convergence. #Experimental
In/Out
elastic_averaging_moving_rate double
Elastic averaging moving rate (only if elastic averaging is enabled).
In/Out
elastic_averaging_regularization double
Elastic averaging regularization strength (only if elastic averaging is enabled).
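To make the relationship between epochs and train_samples_per_iteration above concrete, the following sketch counts how many MapReduce iterations a run would take. It covers only a positive value and the special value 0 (one epoch per iteration); the values -1 and -2 depend on cluster layout and auto-tuning and are deliberately not modelled. The function names are hypothetical.

```python
import math


def total_training_samples(epochs, num_rows):
    """Samples processed over the whole run; epochs may be fractional."""
    return int(round(epochs * num_rows))


def mapreduce_iterations(epochs, num_rows, train_samples_per_iteration):
    """Number of MapReduce iterations needed to process all samples."""
    if train_samples_per_iteration == 0:
        per_iteration = num_rows  # special value 0: one epoch per iteration
    elif train_samples_per_iteration > 0:
        per_iteration = train_samples_per_iteration
    else:
        raise ValueError("-1 and -2 are not modelled in this sketch")
    return math.ceil(total_training_samples(epochs, num_rows) / per_iteration)


if __name__ == "__main__":
    # 2.5 epochs over 1000 rows, 500 samples per iteration.
    print(mapreduce_iterations(2.5, 1000, 500))
```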
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
problem_type enum
Problem type, auto-detected by default. If set to image, the H2OFrame must contain a string column containing the path (URI or URL) to the images in the first column. If set to text, the H2OFrame must contain a string column containing the text in the first column. If set to dataset, Deep Water behaves just like any other H2O Model and builds a model on the provided H2OFrame (non-String columns).
In/Out
activation enum
Activation function. Only used if no user-defined network architecture file is provided, and only for problem_type=dataset.
In/Out
hidden int[]
Hidden layer sizes (e.g. [200, 200]). Only used if no user-defined network architecture file is provided, and only for problem_type=dataset.
In/Out
input_dropout_ratio double
Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).
In/Out
hidden_dropout_ratios double[]
Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs.
In/Out
sparse boolean
Sparse data handling (more efficient for data with lots of 0 values).
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable).
In/Out
epochs double
How many times the dataset should be iterated (streamed), can be fractional.
In/Out
train_samples_per_iteration long
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.
In/Out
target_ratio_comm_to_comp double
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).
In/Out
seed long
Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.
In/Out
learning_rate double
Learning rate (higher => less stable, lower => slower convergence).
Initial momentum at the beginning of training (try 0.5).
In/Out
momentum_ramp double
Number of training samples for which momentum increases.
In/Out
momentum_stable double
Final momentum after the ramp is over (try 0.99).
In/Out
score_interval double
Shortest time interval (in seconds) between model scoring.
In/Out
score_training_samples long
Number of training set samples for scoring (0 for all).
In/Out
score_validation_samples long
Number of validation set samples for scoring (0 for all).
In/Out
score_duty_cycle double
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).
In/Out
quiet_mode boolean
Enable quiet mode for less output to standard output.
In/Out
overwrite_with_best_model boolean
If enabled, override the final model with the best model found during training.
In/Out
autoencoder boolean
Auto-Encoder.
In/Out
diagnostics boolean
Enable diagnostics for hidden layers.
In/Out
variable_importances boolean
Compute variable importances for input features (Gedeon method) - can be slow for large networks.
In/Out
replicate_training_data boolean
Replicate the entire training dataset onto every node for faster training on small datasets.
In/Out
single_node_mode boolean
Run on a single node for fine-tuning of model parameters.
In/Out
shuffle_training_data boolean
Enable global shuffling of training data.
In/Out
mini_batch_size int
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).
In/Out
clip_gradient double
Clip gradients once their absolute value is larger than this value.
In/Out
network enum
Network architecture.
In/Out
backend enum
Deep Learning Backend.
In/Out
image_shape int[]
Width and height of image.
In/Out
channels int
Number of (color) channels.
In/Out
gpu boolean
Whether to use a GPU (if available).
In/Out
device_id int[]
Device IDs (which GPUs to use).
In/Out
network_definition_file string
Path of file containing network definition (graph, architecture).
In/Out
network_parameters_file string
Path of file containing network (initial) parameters (weights, biases).
In/Out
mean_image_file string
Path of file containing the mean image data for data normalization.
In/Out
export_native_parameters_prefix string
Path (prefix) where to export the native model parameters after every iteration.
In/Out
standardize boolean
If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.
In/Out
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
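The momentum parameters above describe a ramp: an initial momentum, a number of training samples over which it increases, and a final value once the ramp is over. The sketch below assumes the increase is linear in the number of samples seen, which the descriptions do not state. `momentum_at` and the `momentum_start` argument name are hypothetical.

```python
def momentum_at(samples_seen, momentum_start, momentum_ramp, momentum_stable):
    """Momentum after `samples_seen` training samples, assuming a linear ramp."""
    if momentum_ramp <= 0 or samples_seen >= momentum_ramp:
        return momentum_stable  # ramp is over (or there is no ramp)
    fraction = samples_seen / momentum_ramp
    return momentum_start + fraction * (momentum_stable - momentum_start)


if __name__ == "__main__":
    # Suggested values from the descriptions: start at 0.5, settle at 0.99.
    for seen in (0, 250_000, 500_000, 1_000_000, 2_000_000):
        print(seen, momentum_at(seen, 0.5, 1_000_000, 0.99))
```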
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Schema name for this field, if it is_schema, or the name of the enum, if it’s an enum.
In
name string
Field name in the Schema
Out
type string
Type for this field
Out
is_schema boolean
Type for this field is itself a Schema.
Out
value Polymorphic
Value for this field
Out
help string
A short help description to appear alongside the field in a UI
Out
label string
The label that should be displayed for the field if the name is insufficient
Out
required boolean
Is this field required, or is the default value generally sufficient?
Out
level enum
How important is this field? The web UI uses the level to do a slow reveal of the parameters
Out
direction enum
Is this field an input, output or inout?
Out
is_inherited boolean
Is the field inherited from the parent schema?
Out
inherited_from string
If this field is inherited from a class higher in the hierarchy, which one?
Out
is_gridable boolean
Is the field gridable (i.e., it can be used in grid call)
Out
values string[]
For enum-type fields the allowed values are specified using the values annotation; this is used in UIs to tell the user the allowed values, and for validation
Out
json boolean
Should this field be rendered in the JSON representation?
Out
is_member_of_frames string[]
For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it’s set) the validation_frame
Out
is_mutually_exclusive_with string[]
For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column
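The field metadata above says that `required` marks mandatory fields and that `values` lists the allowed values of enum-type fields for validation. A client could use those two properties roughly as sketched below. The sample metadata is made up for illustration, and `validate_params` is a hypothetical helper rather than part of any H2O client.

```python
def validate_params(fields, params):
    """Check `params` against field metadata; return a list of error strings."""
    errors = []
    for field in fields:
        name = field["name"]
        if field.get("required") and name not in params:
            errors.append(f"missing required field: {name}")
        allowed = field.get("values") or []
        if name in params and allowed and params[name] not in allowed:
            errors.append(f"{name}: {params[name]!r} is not one of {allowed}")
    return errors


if __name__ == "__main__":
    # Made-up metadata in the shape described above.
    fields = [
        {"name": "training_frame", "required": True, "values": []},
        {"name": "fold_assignment", "required": False,
         "values": ["AUTO", "Random", "Modulo", "Stratified"]},
    ]
    print(validate_params(fields, {"fold_assignment": "Bogus"}))
```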
Scale the learning rate by this factor after each tree (e.g., 0.99 or 0.999)
In
col_sample_rate double
Column sample rate (from 0.0 to 1.0)
In
max_abs_leafnode_pred double
Maximum absolute value of a leaf node prediction
In
pred_noise_bandwidth double
Bandwidth (sigma) of Gaussian multiplicative noise ~N(1,sigma) for tree node predictions
In
ntrees int
Number of trees.
In
max_depth int
Maximum tree depth.
In
min_rows double
Fewest allowed (weighted) observations in a leaf (in R called ‘nodesize’).
In
nbins int
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
In
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set - please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equaled or exceeded this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.
In
sample_rate double
Row sample rate per tree (from 0.0 to 1.0)
In
sample_rate_per_class double[]
Row sample rate per tree per class (from 0.0 to 1.0)
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (from 0.0 to 2.0)
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
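The annealing description at the top of this parameter list (scale the learning rate by a factor after each tree) implies a geometric decay. The sketch below computes the rate in effect for a given tree under that reading; the function and argument names are hypothetical, and the first tree is assumed to use the unscaled rate.

```python
def annealed_learn_rate(learn_rate, annealing_factor, tree_index):
    """Learning rate used for tree number `tree_index` (0-based)."""
    return learn_rate * annealing_factor ** tree_index


if __name__ == "__main__":
    # With a factor of 0.99 the rate shrinks slowly over the first 100 trees.
    for tree in (0, 10, 50, 100):
        print(tree, annealed_learn_rate(0.1, 0.99, tree))
```

A factor close to 1, such as the 0.99 or 0.999 suggested in the description, keeps later trees contributing. A much smaller factor would make them negligible after a few dozen trees.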
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Seed for pseudo random number generator (if applicable)
In
family enum
Family. Use binomial for classification with logistic regression, others are for regression problems.
In
tweedie_variance_power double
Tweedie variance power
In
tweedie_link_power double
Tweedie link power
In
solver enum
AUTO will set the solver based on the given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda search with an L1 penalty; L_BFGS scales better for datasets with many columns. Coordinate descent is experimental (beta).
In
alpha double[]
distribution of regularization between L1 and L2.
In
lambda double[]
regularization strength
In
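To make the roles of alpha and lambda concrete, here is a minimal sketch that assumes the common elastic net form of the penalty, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared). The descriptions above only say that alpha distributes the regularization between L1 and L2 and that lambda sets its strength; the exact formula, including the 1/2 factor, is an assumption, and `elastic_net_penalty` is a hypothetical helper.

```python
def elastic_net_penalty(beta, alpha, lam):
    """Elastic net penalty for coefficients `beta` (intercept excluded).

    alpha=1 gives a pure L1 (lasso) penalty, alpha=0 a pure L2 (ridge) penalty.
    The 1/2 factor on the L2 term is an assumed convention.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)

coefficients = [1.0, -2.0]
lasso_only = elastic_net_penalty(coefficients, alpha=1.0, lam=0.5)
ridge_only = elastic_net_penalty(coefficients, alpha=0.0, lam=0.5)
print(lasso_only, ridge_only)
```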
lambda_search boolean
use lambda search starting at lambda max, given lambda is then interpreted as lambda min
In
early_stopping boolean
stop early when there is no more relative improvement on train or validation (if provided)
In
nlambdas int
Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression); otherwise it is set to 100.
In
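The default rule for nlambdas stated above can be written out as a short sketch. The function name `default_nlambdas` is hypothetical, and it takes a single alpha value although the alpha parameter itself is an array.

```python
def default_nlambdas(alpha: float, lambda_search: bool) -> int:
    """Default from the description: 30 for ridge (alpha == 0) with lambda search, else 100."""
    if lambda_search and alpha == 0.0:
        return 30
    return 100

print(default_nlambdas(0.0, True))   # ridge regression with lambda search
print(default_nlambdas(0.5, True))   # any other case
```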
standardize boolean
Standardize numeric columns to have zero mean and unit variance
In
non_negative boolean
Restrict coefficients (not intercept) to be non-negative
In
max_iterations int
Maximum number of iterations
In
beta_epsilon double
converge if beta changes less (using L-infinity norm) than beta epsilon, ONLY applies to IRLSM solver
In
objective_epsilon double
Converge if objective value changes less than this. Default indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001.
In
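The conditional default for objective_epsilon described above reads more easily as code. This is a sketch of the stated rule only; `default_objective_epsilon` is a hypothetical name and a single lambda value is assumed.

```python
def default_objective_epsilon(lambda_search: bool, lambda_value: float) -> float:
    """Default objective_epsilon as described in the field documentation."""
    if lambda_search:
        return 1e-4
    # Without lambda search, only the unregularized case (lambda == 0) is tighter.
    return 1e-6 if lambda_value == 0 else 1e-4
```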
gradient_epsilon double
Converge if objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively.
In
obj_reg double
likelihood divider in objective value computation, default is 1/nobs
In
link enum
(No description available)
In
intercept boolean
include constant term in the model
In
prior double
prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality.
In
lambda_min_ratio double
Min lambda used in lambda search, specified as a ratio of lambda_max. Default indicates: if the number of observations is greater than the number of variables then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables then lambda_min_ratio is set to 0.01.
Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 7000 otherwise it is set to 100000000.
In
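The defaults for lambda_min_ratio and max_active_predictors described above can be sketched as follows. Both function names are hypothetical. The description of lambda_min_ratio does not say what happens when the number of observations equals the number of variables, so the sketch returns None in that case rather than guessing.

```python
from typing import Optional

def default_lambda_min_ratio(nobs: int, nvars: int) -> Optional[float]:
    """Default lambda_min_ratio from the description; None where the text is silent."""
    if nobs > nvars:
        return 1e-4
    if nobs < nvars:
        return 1e-2
    return None  # nobs == nvars is not covered by the description

def default_max_active_predictors(solver: str) -> int:
    """Default max_active_predictors: 7000 for IRLSM, 100000000 otherwise."""
    return 7000 if solver == "IRLSM" else 100_000_000
```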
interactions string[]
A list of predictor column indices to interact. All pairwise combinations will be computed for the list.
In
compute_p_values boolean
request p-values computation, p-values work only with IRLSM solver and no regularization
In
remove_collinear_columns boolean
in case of linearly dependent columns remove some of the dependent columns
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
missing_values_handling enum
Handling of missing values. Either Skip or MeanImputation.
In/Out
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Milliseconds since the epoch for the time that this H2OError instance was created. Generally this is a short time since the underlying error occurred.
Out
error_url string
Error url
Out
msg string
Message intended for the end user (a data scientist).
Out
dev_msg string
Potentially more detailed message intended for a developer (e.g. a front end engineer or someone designing a language binding).
Out
http_status int
HTTP status code for this error.
Out
values Map
Any values that are relevant to reporting or handling this error. Examples are a key name if the error is on a key, or a field name and object name if it’s on a specific field.
Milliseconds since the epoch for the time that this H2OError instance was created. Generally this is a short time since the underlying error occurred.
Out
error_url string
Error url
Out
msg string
Message intended for the end user (a data scientist).
Out
dev_msg string
Potentially more detailed message intended for a developer (e.g. a front end engineer or someone designing a language binding).
Out
http_status int
HTTP status code for this error.
Out
values Map
Any values that are relevant to reporting or handling this error. Examples are a key name if the error is on a key, or a field name and object name if it’s on a specific field.
Out
exception_type string
Exception type, if any.
Out
exception_msg string
Raw exception message, if any.
Out
stacktrace string[]
Stacktrace, if any.
Out
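A client might turn an error payload with the fields listed above into a typed object along these lines. This is a sketch, not part of any H2O client library: the class `H2OErrorInfo`, the function `parse_error`, and the sample payload are invented for illustration, and the JSON key of the creation-time field is assumed to be `timestamp` because its name is not shown above.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

@dataclass
class H2OErrorInfo:
    msg: str
    dev_msg: str
    http_status: int
    error_url: str = ""
    values: Dict[str, Any] = field(default_factory=dict)
    exception_type: Optional[str] = None
    exception_msg: Optional[str] = None
    stacktrace: List[str] = field(default_factory=list)
    created: Optional[datetime] = None

def parse_error(body: str, timestamp_key: str = "timestamp") -> H2OErrorInfo:
    """Parse an error response body; `timestamp_key` is an assumed field name."""
    raw = json.loads(body)
    millis = raw.get(timestamp_key)
    created = None
    if millis is not None:
        # The field holds milliseconds since the epoch.
        created = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return H2OErrorInfo(
        msg=raw.get("msg", ""),
        dev_msg=raw.get("dev_msg", ""),
        http_status=raw.get("http_status", 0),
        error_url=raw.get("error_url", ""),
        values=raw.get("values") or {},
        exception_type=raw.get("exception_type"),
        exception_msg=raw.get("exception_msg"),
        stacktrace=raw.get("stacktrace") or [],
        created=created,
    )

# Made-up payload for illustration only.
sample = json.dumps({
    "timestamp": 1000,
    "msg": "Object not found",
    "dev_msg": "Object 'my_frame' not found for argument: key",
    "http_status": 404,
    "values": {"argument": "key", "name": "my_frame"},
})
error = parse_error(sample)
print(error.http_status, error.msg, error.values.get("name"))
```

Keeping msg and dev_msg separate lets a user interface show the short message while logging the more detailed one.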
HeartBeatEvent
sends int
number of sent heartbeats
In
recvs int
number of received heartbeats
In
date string
Time when the event was recorded. Format is hh:mm:ss:ms
In
nanos long
Time in nanos
In
type enum
type of recorded event
In
HyperSpaceSearchCriteriaV99
strategy enum
Hyperparameter space search strategy.
In/Out
IOEvent
io_flavor string
flavor of the recorded io (ice/hdfs/…)
In
node string
node where this io event happened
In
data string
data info
In
date string
Time when the event was recorded. Format is hh:mm:ss:ms
In
nanos long
Time in nanos
In
type enum
type of recorded event
In
ImportFilesV3
path string
path
In
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: “/3/Frames?_exclude_fields=frames/frame_id/URL,__meta”
In
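The `_exclude_fields` query parameter shown above can be assembled with the standard library. This sketch only builds the request path; the helper name `with_exclude_fields` is hypothetical, and the slash and comma are kept unescaped to match the example in the description.

```python
from urllib.parse import urlencode

def with_exclude_fields(path, field_paths):
    """Append an _exclude_fields query built from a list of JSON field paths."""
    query = urlencode({"_exclude_fields": ",".join(field_paths)}, safe="/,")
    return f"{path}?{query}"

url = with_exclude_fields("/3/Frames", ["frames/frame_id/URL", "__meta"])
print(url)
```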
files string[]
files
Out
destination_frames string[]
names
Out
fails string[]
fails
Out
dels string[]
dels
Out
ImportSQLTableV99
connection_url string
connection_url
In
table string
table
In
select_query string
select_query
In
username string
username
In
password string
password
In
columns string
columns
In
optimize boolean
optimize
In
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: “/3/Frames?_exclude_fields=frames/frame_id/URL,__meta”
In
InitIDV3
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: “/3/Frames?_exclude_fields=frames/frame_id/URL,__meta”
In
session_key string
Session ID
In/Out
InputSchemaV4
_fields string
Filter on the set of output fields: if you set _fields="foo,bar,baz", then only those fields will be included in the output; or you can specify _fields="-goo,gee" to include all fields except goo and gee. If the result contains nested data structures, then you can refer to the fields within those structures as well. For example, if you specify _fields="foo(oof),bar(-rab)", then only fields foo and bar will be included, and within foo there will be only field oof, whereas within bar all fields except rab will be reported.
In
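The `_fields` filter semantics described above can be illustrated with a small client-side sketch that applies the same include/exclude rules to a nested dictionary. It is not the server's implementation: `parse_fields` and `apply_fields` are hypothetical helpers, the spec is assumed to be well-formed, and a nested spec under an excluded field is ignored.

```python
def parse_fields(spec):
    """Parse a _fields spec into (exclude, {name: nested_spec_or_None})."""
    pos = 0

    def parse_list():
        nonlocal pos
        exclude = False
        if pos < len(spec) and spec[pos] == "-":
            exclude = True  # a leading minus applies to the whole list
            pos += 1
        names = {}
        while pos < len(spec) and spec[pos] != ")":
            start = pos
            while pos < len(spec) and spec[pos] not in ",()":
                pos += 1
            name = spec[start:pos]
            nested = None
            if pos < len(spec) and spec[pos] == "(":
                pos += 1  # skip "("
                nested = parse_list()
                pos += 1  # skip ")"
            names[name] = nested
            if pos < len(spec) and spec[pos] == ",":
                pos += 1
        return exclude, names

    return parse_list()

def apply_fields(obj, parsed):
    """Filter a nested dict according to a parsed _fields spec."""
    if not isinstance(obj, dict):
        return obj
    exclude, names = parsed
    out = {}
    for key, value in obj.items():
        if exclude:
            if key not in names:
                out[key] = value
        elif key in names:
            nested = names[key]
            out[key] = apply_fields(value, nested) if nested else value
    return out

data = {"foo": {"oof": 1, "x": 2}, "bar": {"rab": 3, "y": 4}, "baz": 5}
print(apply_fields(data, parse_fields("foo(oof),bar(-rab)")))
print(apply_fields(data, parse_fields("-foo,baz")))
```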
InteractionV3
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: “/3/Frames?_exclude_fields=frames/frame_id/URL,__meta”
Whether to create pairwise quadratic interactions between factors (otherwise create one higher-order interaction). Only applicable if there are 3 or more factors.
In/Out
max_factors int
Max. number of factor levels in pair-wise interaction terms (if enforced, one extra catch-all factor will be made)
In/Out
min_occurrence int
Min. occurrence threshold for factor levels in pair-wise interaction terms
In/Out
IoStatsEntry
backend string
Back end type
Out
store_count long
Number of store events
Out
store_bytes long
Cumulative stored bytes
Out
delete_count long
Number of delete events
Out
load_count long
Number of load events
Out
load_bytes long
Cumulative loaded bytes
Out
JStackV3
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: “/3/Frames?_exclude_fields=frames/frame_id/URL,__meta”
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column
In
is_mutually_exclusive_with string[]
For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it’s set) the validation_frame
In
name string
name in the JSON, e.g. “lambda”
Out
label string
[DEPRECATED] same as name.
Out
help string
help for the UI, e.g. “regularization multiplier, typically used for foo bar baz etc.”
Out
required boolean
the field is required
Out
type string
Java type, e.g. “double”
Out
default_value Polymorphic
default value, e.g. 1
Out
actual_value Polymorphic
actual value as set by the user and / or modified by the ModelBuilder, e.g., 10
Out
level string
the importance of the parameter, used by the UI, e.g. “critical”, “extended” or “expert”
Out
values string[]
list of valid values for use by the front-end
Out
gridable boolean
Parameter can be used in grid call
Out
ModelParametersSchemaV3
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Min. standard deviation to use for observations with not enough data
In
eps_sdev double
Cutoff below which standard deviation is replaced with min_sdev
In
min_prob double
Min. probability to use for observations with not enough data
In
eps_prob double
Cutoff below which probability is replaced with min_prob
In
compute_metrics boolean
Compute metrics on training data
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
In/Out
seed long
Seed for pseudo random number generator (only used for cross-validation and fold_assignment=”Random” or “AUTO”)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Seed for random number generator; set to a value other than -1 for reproducibility.
In/Out
max_models int
Maximum number of models to build (optional).
In/Out
max_runtime_secs double
Maximum time to spend building models (optional).
In/Out
stopping_rounds int
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
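The interaction of stopping_rounds, stopping_metric and stopping_tolerance described above can be sketched as follows. This is one plausible reading of the rule, not the actual H2O implementation: it assumes that at least 2k scoring events are needed, that the best of the k most recent moving averages is compared against the moving average from k events earlier, and that metric values are positive. The function name `should_stop` is hypothetical.

```python
def should_stop(history, stopping_rounds, stopping_tolerance, lower_is_better=True):
    """Return True if the moving average of the metric has stopped improving.

    history: metric value at each scoring event, oldest first.
    """
    k = stopping_rounds
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    n = len(history)
    # k + 1 moving averages of length k, ending at each of the last k + 1 events.
    averages = [sum(history[end - k:end]) / k for end in range(n - k, n + 1)]
    reference, recent = averages[0], averages[1:]
    if lower_is_better:
        # Stop unless the best recent average improved by at least the tolerance.
        return min(recent) > reference * (1 - stopping_tolerance)
    return max(recent) < reference * (1 + stopping_tolerance)

improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
flat = [1.0, 0.5, 0.5, 0.5, 0.5, 0.5]
print(should_stop(improving, 2, 0.01), should_stop(flat, 2, 0.01))
```

Averaging over k events makes the criterion less sensitive to noise in a single scoring event than comparing raw metric values would be.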
strategy enum
Hyperparameter space search strategy.
In/Out
RapidsExpressionV3
name string
(Class) name of the language construct
In
is_abstract boolean
If true, then this is not a standalone construct but purely a grouping level.
In
pattern string
Code fragment pattern.
In
description string
Description of the functionality provided by this language construct.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Fewest allowed (weighted) observations in a leaf (in R called ‘nodesize’).
In
nbins int
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
In
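The relationship between nbins_top_level and nbins described above can be sketched as a per-depth bin count. The sketch assumes that the top-level count is halved at each tree level and that nbins acts as the floor; the function name `bins_at_depth` and the values 1024 and 20 are illustrative only.

```python
def bins_at_depth(nbins_top_level: int, nbins: int, depth: int) -> int:
    """Histogram bins for numerical columns at a given tree depth (root is depth 0)."""
    # Halve the top-level count once per level, but never go below nbins.
    return max(nbins, nbins_top_level >> depth)

for depth in range(0, 8):
    print(depth, bins_at_depth(1024, 20, depth))
```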
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equaled or exceeded this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.
In
sample_rate double
Row sample rate per tree (from 0.0 to 1.0)
In
sample_rate_per_class double[]
Row sample rate per tree per class (from 0.0 to 1.0)
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (from 0.0 to 2.0)
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5)
In
normModel enum
Use Hierarchical Softmax or Negative Sampling
In
negSampleCnt int
Number of negative examples, common values are 3 - 10 (0 = not used)
In
epochs int
Number of training iterations to run
In
minWordFreq int
This will discard words that appear fewer than this many times
In
initLearningRate float
Set the starting learning rate
In
wordModel enum
Use the continuous bag of words model or the Skip-Gram model
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)