Create a synthetic H2O Frame with random data. You can specify the number of rows/columns, as well as column types: integer, real, boolean, time, string, categorical. The frame may also have a dedicated "response" column, and some of the entries in the dataset may be created as missing.
POST /3/ModelMetrics/models/{model}/frames/{frame}
Return the scoring metrics for the specified Frame with the specified Model. If the Frame has already been scored with the Model then cached results will be returned; otherwise predictions for all rows in the Frame will be generated and the metrics will be returned.
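As a minimal sketch, the request below targets this endpoint using only the Python standard library. It assumes an H2O cluster reachable at http://localhost:54321 and uses hypothetical model and frame IDs (gbm_1, test.hex). It only builds the request and does not send it.

```python
from urllib.parse import quote
from urllib.request import Request


def model_metrics_request(base_url, model_id, frame_id):
    """Build (but do not send) a POST request for model metrics on a frame."""
    # Percent-encode the IDs so characters such as '/' cannot break the path.
    path = "/3/ModelMetrics/models/{model}/frames/{frame}".format(
        model=quote(model_id, safe=""),
        frame=quote(frame_id, safe=""),
    )
    return Request(base_url.rstrip("/") + path, method="POST")


# Hypothetical cluster address and IDs, for illustration only.
req = model_metrics_request("http://localhost:54321", "gbm_1", "test.hex")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen against a running cluster would send it. Because results are cached per model/frame pair, repeating the call should avoid regenerating predictions.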
POST /3/ModelMetrics/predictions_frame/{predictions_frame}/actuals_frame/{actuals_frame}
Create a ModelMetrics object from the predicted and actual values, and a domain for classification problems or a distribution family for regression problems.
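To illustrate what such an object summarizes for a regression problem, the sketch below computes a few common error metrics directly from paired predicted and actual values. It is a local illustration only: the function name is hypothetical, and H2O's ModelMetrics carries additional fields that depend on the chosen distribution family.

```python
import math


def regression_metrics(predicted, actual):
    """Compute MSE, RMSE and MAE from paired predictions and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equally long")
    errors = [p - a for p, a in zip(predicted, actual)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / len(errors),
    }


print(regression_metrics([1.0, 2.0, 5.0], [1.0, 2.0, 3.0]))
```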
Return the model in the MOJO format. This format can then be interpreted by gen_model.jar to perform prediction/scoring. Currently works for GBM and DRF algorithms only.
Create a frame with random (uniformly distributed) data. You can specify how many columns of each type to make and what the desired range is for each column type.
Method for computing PCA (Caution: GLRM is currently experimental and unstable)
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
k int
Rank of matrix approximation
In/Out
max_iterations int
Maximum number of iterations for PCA
In/Out
target_num_exemplars int
Targeted number of exemplars
In/Out
rel_tol_num_exemplars double
Relative tolerance for the number of exemplars (e.g., 0.5 is +/- 50 percent)
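The tolerance translates into an acceptable band around target_num_exemplars. A minimal sketch, assuming the band is symmetric and inclusive (both function names are hypothetical):

```python
def exemplar_bounds(target_num_exemplars, rel_tol_num_exemplars):
    """Return the (lower, upper) exemplar counts allowed by the relative tolerance."""
    delta = target_num_exemplars * rel_tol_num_exemplars
    return target_num_exemplars - delta, target_num_exemplars + delta


def within_tolerance(num_exemplars, target_num_exemplars, rel_tol_num_exemplars):
    """True if the collected exemplar count falls inside the assumed band."""
    lower, upper = exemplar_bounds(target_num_exemplars, rel_tol_num_exemplars)
    return lower <= num_exemplars <= upper


print(exemplar_bounds(5000, 0.5))  # a 0.5 tolerance means +/- 50 percent
```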
In/Out
seed long
RNG seed for initialization
In/Out
use_all_factor_levels boolean
Whether first factor level is included in each categorical expansion
In/Out
save_mapping_frame boolean
Whether to export the mapping of the aggregated frame
In/Out
num_iteration_without_new_exemplar int
The number of iterations to run before the aggregator exits if the number of collected exemplars does not change
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
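The equivalences described above can be checked with a weighted loss. The sketch below uses weighted mean squared error as an example loss (an assumption; each algorithm uses its own loss function) and shows that a weight of 2 behaves like a duplicated row while a weight of 0 removes the row.

```python
def weighted_mse(actual, predicted, weights):
    """Weighted mean squared error; each weight scales its row's loss term."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total_weight = sum(weights)
    return sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights)) / total_weight


# Row 0 has weight 2 (counted twice), row 1 has weight 0 (ignored).
weighted = weighted_mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [2.0, 0.0, 1.0])
# Same data with row 0 physically repeated and row 1 removed.
expanded = weighted_mse([1.0, 1.0, 3.0], [1.5, 1.5, 2.0], [1.0, 1.0, 1.0])
print(weighted, expanded)
```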
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
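A simplified sketch of this rule follows. It assumes a metric where lower is better by default, measures relative improvement of the best of the last k moving averages against the moving average just before them, and treats stopping_tolerance as the minimum relative improvement. H2O's own scoring-history logic may differ in details such as tie handling and zero-valued metrics; the function name is hypothetical.

```python
def should_stop_early(history, stopping_rounds, stopping_tolerance, larger_is_better=False):
    """Simplified moving-average early stopping.

    history: metric value from each scoring event, oldest first.
    """
    k = stopping_rounds
    if k == 0:
        return False  # 0 disables early stopping
    # Need k recent moving averages plus one reference average before them.
    if len(history) < 2 * k:
        return False
    averages = [sum(history[i:i + k]) / k for i in range(len(history) - k + 1)]
    reference = averages[-k - 1]
    recent = averages[-k:]
    best = max(recent) if larger_is_better else min(recent)
    gain = best - reference if larger_is_better else reference - best
    denom = abs(reference) if reference != 0 else 1.0  # avoid division by zero
    # Stop when the best recent average fails to improve by the relative tolerance.
    return gain / denom < stopping_tolerance


print(should_stop_early([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], 2, 0.01))  # still improving
print(should_stop_early([0.5] * 6, 2, 0.01))                        # plateau
```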
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Model performance based stopping criteria for the AutoML run.
In
nfolds int
Number of folds for k-fold cross-validation (defaults to 5, must be >=2 or use 0 to disable). Disabling prevents Stacked Ensembles from being built.
In
keep_cross_validation_predictions boolean
Whether to keep the cross-validation predictions. This must be set to TRUE when running the same AutoML object repeatedly, because CV predictions are required to build additional Stacked Ensemble models in AutoML.
In
keep_cross_validation_models boolean
Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster.
In
keep_cross_validation_fold_assignment boolean
Whether to keep cross-validation assignments.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (defaults to 5.0 and can be less than 1.0). Requires balance_classes.
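The interaction of balance_classes, class_sampling_factors and max_after_balance_size can be sketched as follows. Two behaviors here are assumptions rather than documented facts: when class_sampling_factors is omitted, every class is scaled up to the size of the largest class, and when the result exceeds max_after_balance_size times the original row count, all classes are scaled down proportionally. The returned values are expected counts, not actual sampled rows, and the function name is hypothetical.

```python
def balanced_class_counts(class_counts, class_sampling_factors=None, max_after_balance_size=5.0):
    """Expected per-class row counts after over/under-sampling.

    class_counts: mapping from class label to its number of rows.
    """
    labels = sorted(class_counts)  # factors are given in lexicographic class order
    if class_sampling_factors is None:
        # Assumption: scale every class up to the size of the largest class.
        largest = max(class_counts.values())
        class_sampling_factors = [largest / class_counts[c] for c in labels]
    if len(class_sampling_factors) != len(labels):
        raise ValueError("need one sampling factor per class")
    counts = {c: class_counts[c] * f for c, f in zip(labels, class_sampling_factors)}
    # Cap the total size relative to the original number of rows.
    cap = max_after_balance_size * sum(class_counts.values())
    total = sum(counts.values())
    if total > cap:
        counts = {c: n * cap / total for c, n in counts.items()}
    return counts


print(balanced_class_counts({"no": 90, "yes": 10}))
print(balanced_class_counts({"no": 90, "yes": 10}, max_after_balance_size=1.0))
```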
In/Out
export_checkpoints_dir string
Path to a directory where every generated model will be stored.
In/Out
AutoMLBuildModelsV99
exclude_algos enum[]
A list of algorithms to skip during the model-building phase.
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
k int
The maximum number of clusters. If estimate_k is disabled, the model will find k centroids; otherwise it will find up to k centroids.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
In/Out
ColSpecifierV3
column_name string
Name of the column
In/Out
is_member_of_frames string[]
List of fields which specify columns that must contain this column
In/Out
ColV3
label string
label
Out
missing_count long
missing
Out
zero_count long
zeros
Out
positive_infinity_count long
positive infinities
Out
negative_infinity_count long
negative infinities
Out
mins double[]
mins
Out
maxs double[]
maxs
Out
mean double
mean
Out
sigma double
sigma
Out
type string
datatype: {enum, string, int, real, time, uuid}
Out
domain string[]
domain; not-null for categorical columns only
Out
domain_cardinality int
cardinality of this column's domain; not-null for categorical columns only
Out
data double[]
data
Out
string_data string[]
string data
Out
precision byte
decimal precision, -1 for all digits
Out
histogram_bins long[]
Histogram bins; null if not computed
Out
histogram_base double
Start of histogram bin zero
Out
histogram_stride double
Stride per bin
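Taken together, histogram_base, histogram_stride and histogram_bins describe equal-width bins. A small sketch, assuming histogram_bins holds per-bin counts and that bin i covers the half-open interval [base + i*stride, base + (i+1)*stride); the function name is hypothetical:

```python
def histogram_bin_edges(histogram_base, histogram_stride, histogram_bins):
    """Pair each bin count with its (lower, upper) edges."""
    return [
        (histogram_base + i * histogram_stride,
         histogram_base + (i + 1) * histogram_stride,
         count)
        for i, count in enumerate(histogram_bins)
    ]


print(histogram_bin_edges(0.0, 2.5, [3, 1, 4]))
```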
Out
percentiles double[]
Percentile values, matching the default percentiles
A list of pairwise (first order) column interactions.
In
use_all_factor_levels boolean
(Internal. For development only!) Indicates whether to use all factor levels.
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Number of data columns (in addition to the first response column)
In
seed long
Random number seed that determines the random values
In
randomize boolean
Whether frame should be randomized
In
value long
Constant value (for randomize=false)
In
real_range double
Range for real variables (-range ... range)
In
categorical_fraction double
Fraction of categorical columns (for randomize=true)
In
factors int
Factor levels for categorical variables
In
integer_fraction double
Fraction of integer columns (for randomize=true)
In
integer_range int
Range for integer variables (-range ... range)
In
binary_fraction double
Fraction of binary columns (for randomize=true)
In
binary_ones_fraction double
Fraction of 1's in binary columns
In
time_fraction double
Fraction of date/time columns (for randomize=true)
In
string_fraction double
Fraction of string columns (for randomize=true)
In
missing_fraction double
Fraction of missing values
In
has_response boolean
Whether an additional response column should be generated
In
response_factors int
Number of factor levels of the first column (1=real, 2=binomial, N=multinomial)
In
positive_response boolean
For real-valued response variable: Whether the response should be positive only.
In
_fields string
Filter on the set of output fields: if you set _fields="foo,bar,baz", then only those fields will be included in the output; or you can specify _fields="-goo,gee" to include all fields except goo and gee. If the result contains nested data structures, then you can refer to the fields within those structures as well. For example if you specify _fields="foo(oof),bar(-rab)", then only fields foo and bar will be included, and within foo there will be only field oof, whereas within bar all fields except rab will be reported.
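The filter grammar can be sketched in a few lines. This is an illustrative re-implementation of the rules described above, not H2O's server code; it assumes that a leading '-' switches the whole list to exclusion mode (as in "-goo,gee") and that nested specifications are only honored in inclusion mode. Function names are hypothetical.

```python
def split_top_level(spec):
    """Split a field spec on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def apply_fields(obj, spec):
    """Filter a dict according to a _fields-style specification."""
    exclude = spec.startswith("-")  # "-a,b" excludes both a and b
    if exclude:
        spec = spec[1:]
    wanted = {}
    for part in split_top_level(spec):
        if part.endswith(")") and "(" in part:
            name, nested = part.split("(", 1)
            wanted[name.strip()] = nested[:-1]  # drop the closing parenthesis
        else:
            wanted[part] = None
    result = {}
    for key, value in obj.items():
        if exclude:
            if key not in wanted:
                result[key] = value
        elif key in wanted:
            nested = wanted[key]
            if nested and isinstance(value, dict):
                value = apply_fields(value, nested)
            result[key] = value
    return result


doc = {"foo": {"oof": 1, "x": 2}, "bar": {"rab": 3, "y": 4}, "baz": 5}
print(apply_fields(doc, "foo(oof),bar(-rab)"))
print(apply_fields(doc, "-foo,bar"))
```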
Random number seed that determines the random values.
In
nrows int
Number of rows.
In
ncols_real int
Number of real-valued columns. Values in these columns will be uniformly distributed between real_lb and real_ub.
In
ncols_int int
Number of integer columns.
In
ncols_enum int
Number of enum (categorical) columns.
In
ncols_bool int
Number of boolean (binary) columns.
In
ncols_str int
Number of string columns.
In
ncols_time int
Number of time columns.
In
real_lb double
Lower bound for the range of the real-valued columns.
In
real_ub double
Upper bound for the range of the real-valued columns.
In
int_lb int
Lower bound for the range of integer columns.
In
int_ub int
Upper bound for the range of integer columns.
In
enum_nlevels int
Number of levels (categories) for the enum columns.
In
bool_p double
Fraction of ones in each boolean (binary) column.
In
time_lb long
Lower bound for the range of time columns (in ms since the epoch).
In
time_ub long
Upper bound for the range of time columns (in ms since the epoch).
In
str_length int
Length of generated strings in string columns.
In
missing_fraction double
Fraction of missing values.
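The column parameters above can be illustrated with a local simulation. This sketch generates plain Python lists rather than an H2O Frame; it assumes integers are drawn uniformly from the inclusive range [int_lb, int_ub] and that each cell is independently replaced by a missing value (None) with probability missing_fraction. The function name and the column names it produces (real_0, int_0, bool_0, ...) are hypothetical.

```python
import random


def simulate_columns(nrows, ncols_real, ncols_int, ncols_bool,
                     real_lb, real_ub, int_lb, int_ub, bool_p,
                     missing_fraction, seed):
    """Generate random columns following the parameter semantics above."""
    rng = random.Random(seed)  # fixed seed gives reproducible output

    def maybe_missing(value):
        # Replace the cell with None with probability missing_fraction.
        return None if rng.random() < missing_fraction else value

    columns = {}
    for j in range(ncols_real):
        columns[f"real_{j}"] = [maybe_missing(rng.uniform(real_lb, real_ub)) for _ in range(nrows)]
    for j in range(ncols_int):
        columns[f"int_{j}"] = [maybe_missing(rng.randint(int_lb, int_ub)) for _ in range(nrows)]
    for j in range(ncols_bool):
        # bool_p is the fraction of ones in each boolean column.
        columns[f"bool_{j}"] = [maybe_missing(1 if rng.random() < bool_p else 0) for _ in range(nrows)]
    return columns


cols = simulate_columns(nrows=100, ncols_real=2, ncols_int=1, ncols_bool=1,
                        real_lb=-1.0, real_ub=1.0, int_lb=0, int_ub=5,
                        bool_p=0.3, missing_fraction=0.0, seed=1)
print({name: values[:3] for name, values in cols.items()})
```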
In
response_type enum
Type of the response column to add.
In
response_lb double
Lower bound for the response variable (real/int/time types).
In
response_ub double
Upper bound for the response variable (real/int/time types).
In
response_p double
Frequency of 1s for the bool (binary) response column.
In
response_nlevels int
Number of categorical levels for the enum response column.
In
_fields string
Filter on the set of output fields: if you set _fields="foo,bar,baz", then only those fields will be included in the output; or you can specify _fields="-goo,gee" to include all fields except goo and gee. If the result contains nested data structures, then you can refer to the fields within those structures as well. For example if you specify _fields="foo(oof),bar(-rab)", then only fields foo and bar will be included, and within foo there will be only field oof, whereas within bar all fields except rab will be reported.
In
CreateFrameV3
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors).
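The default can be computed as below. This sketch assumes the results are rounded down and never fall below 1; the function name is hypothetical.

```python
import math


def resolve_mtries(mtries, num_predictors, classification):
    """Number of candidate columns sampled at each split."""
    if mtries != -1:
        return mtries  # an explicit user setting is used as-is
    if classification:
        return max(1, math.isqrt(num_predictors))  # floor(sqrt(p))
    return max(1, num_predictors // 3)             # floor(p / 3)


print(resolve_mtries(-1, 100, True), resolve_mtries(-1, 100, False))
```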
In
binomial_double_trees boolean
For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy.
In
sample_rate double
Row sample rate per tree (from 0.0 to 1.0)
In
ntrees int
Number of trees.
In
max_depth int
Maximum tree depth.
In
min_rows double
Fewest allowed (weighted) observations in a leaf.
In
nbins int
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
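Read together with nbins, this suggests a per-level bin count that starts at nbins_top_level and halves with each level of depth, but never drops below nbins. A minimal sketch under that reading (the function name is hypothetical):

```python
def bins_at_depth(depth, nbins, nbins_top_level):
    """Histogram bins for numeric columns at a given tree depth (root = 0)."""
    # Halve the root-level count once per level, but keep at least nbins bins.
    return max(nbins, nbins_top_level >> depth)


print([bins_at_depth(d, 20, 1024) for d in range(8)])
```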
In
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equals or exceeds this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.
In
sample_rate_per_class double[]
A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0)
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
calibrate_model boolean
Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
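Platt scaling fits a logistic function to a model's raw scores so that its output can be read as a probability. The sketch below is the textbook technique reduced to plain gradient descent on log loss; it is not H2O's implementation (which is not described here), omits Platt's target smoothing, and the learning rate and iteration count are arbitrary illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, learning_rate=0.5, iterations=5000):
    """Fit p = sigmoid(a * score + b) to 0/1 labels by gradient descent."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]  # uncalibrated model scores
labels = [0, 0, 1, 0, 1, 1]              # observed classes
a, b = fit_platt(scores, labels)
print([round(sigmoid(a * s + b), 3) for s in scores])
```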
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Check if the response column is constant. If enabled, an exception is thrown if the response column is a constant value. If disabled, the model will train regardless of whether the response column is constant.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable).
In/Out
activation enum
Activation function.
In/Out
hidden int[]
Hidden layer sizes (e.g. [100, 100]).
In/Out
epochs double
How many times the dataset should be iterated (streamed), can be fractional.
In/Out
train_samples_per_iteration long
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.
In/Out
target_ratio_comm_to_comp double
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).
In/Out
seed long
Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.
In/Out
adaptive_rate boolean
Adaptive learning rate.
In/Out
rho double
Adaptive learning rate time decay factor (similarity to prior updates).
In/Out
epsilon double
Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress).
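rho and epsilon match the two hyperparameters of the ADADELTA update rule. Assuming the adaptive learning rate follows the standard ADADELTA formulation, a single-parameter sketch looks like this. The class name is hypothetical, and the rho and epsilon values used below are illustrative, not H2O defaults.

```python
import math


class AdaDelta:
    """Per-parameter ADADELTA state: running averages of g^2 and of update^2."""

    def __init__(self, rho, epsilon):
        self.rho = rho          # time decay factor for both running averages
        self.epsilon = epsilon  # smoothing term; avoids division by zero
        self.avg_sq_grad = 0.0
        self.avg_sq_update = 0.0

    def step(self, grad):
        """Return the update to add to the parameter for this gradient."""
        self.avg_sq_grad = self.rho * self.avg_sq_grad + (1 - self.rho) * grad * grad
        update = -(math.sqrt(self.avg_sq_update + self.epsilon)
                   / math.sqrt(self.avg_sq_grad + self.epsilon)) * grad
        self.avg_sq_update = self.rho * self.avg_sq_update + (1 - self.rho) * update * update
        return update


# Minimize f(w) = (w - 3)^2 starting from w = 0; its gradient is 2 * (w - 3).
opt = AdaDelta(rho=0.95, epsilon=1e-6)
w = 0.0
for _ in range(100):
    w += opt.step(2.0 * (w - 3.0))
print(w)
```

Note that the step size adapts from the two running averages alone, which is why no separate learning rate appears in the update.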
In/Out
rate double
Learning rate (higher => less stable, lower => slower convergence).
A list of H2OFrame ids to initialize the bias vectors of this model with.
In/Out
loss enum
Loss function.
In/Out
score_interval double
Shortest time interval (in seconds) between model scoring.
In/Out
score_training_samples long
Number of training set samples for scoring (0 for all).
In/Out
score_validation_samples long
Number of validation set samples for scoring (0 for all).
In/Out
score_duty_cycle double
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).
In/Out
classification_stop double
Stopping criterion for classification error fraction on training data (-1 to disable).
In/Out
regression_stop double
Stopping criterion for regression error (MSE) on training data (-1 to disable).
In/Out
quiet_mode boolean
Enable quiet mode for less output to standard output.
In/Out
score_validation_sampling enum
Method used to sample validation dataset for scoring.
In/Out
overwrite_with_best_model boolean
If enabled, override the final model with the best model found during training.
In/Out
autoencoder boolean
Auto-Encoder.
In/Out
use_all_factor_levels boolean
Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.
In/Out
standardize boolean
If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.
In/Out
diagnostics boolean
Enable diagnostics for hidden layers.
In/Out
variable_importances boolean
Compute variable importances for input features (Gedeon method) - can be slow for large networks.
In/Out
fast_mode boolean
Enable fast mode (minor approximation in back-propagation).
In/Out
force_load_balance boolean
Force extra load balancing to increase training speed for small datasets (to keep all cores busy).
In/Out
replicate_training_data boolean
Replicate the entire training dataset onto every node for faster training on small datasets.
In/Out
single_node_mode boolean
Run on a single node for fine-tuning of model parameters.
In/Out
shuffle_training_data boolean
Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes).
In/Out
missing_values_handling enum
Handling of missing values. Either MeanImputation or Skip.
In/Out
sparse boolean
Sparse data handling (more efficient for data with lots of 0 values).
In/Out
col_major boolean
#DEPRECATED Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation.
In/Out
average_activation double
Average activation for sparse auto-encoder. #Experimental
In/Out
sparsity_beta double
Sparsity regularization. #Experimental
In/Out
max_categorical_features int
Max. number of categorical features, enforced via hashing. #Experimental
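Enforcing a cap "via hashing" usually refers to the hashing trick: each (column, level) pair is hashed into one of a fixed number of buckets, so the encoded width never exceeds the cap regardless of how many levels appear. The sketch below shows that generic technique with zlib.crc32 as a stand-in hash; H2O's actual hash function and bucket layout are not described here and may differ. Function names are hypothetical.

```python
import zlib


def hash_bucket(column, level, max_categorical_features):
    """Map a (column, level) pair to a bucket in [0, max_categorical_features)."""
    key = f"{column}={level}".encode("utf-8")
    return zlib.crc32(key) % max_categorical_features


def hashed_encoding(row, max_categorical_features):
    """Encode a dict of categorical values as a fixed-width count vector."""
    vector = [0] * max_categorical_features
    for column, level in row.items():
        vector[hash_bucket(column, level, max_categorical_features)] += 1
    return vector


print(hashed_encoding({"color": "red", "city": "Paris", "size": "XL"}, 8))
```

Distinct levels can collide in the same bucket; that loss of information is the price of a bounded width.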
In/Out
reproducible boolean
Force reproducibility on small data (will be slow - only uses 1 thread).
In/Out
export_weights_and_biases boolean
Whether to export Neural Network weights and biases to H2O Frames.
In/Out
mini_batch_size int
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).
In/Out
elastic_averaging boolean
Elastic averaging between compute nodes can improve distributed model convergence. #Experimental
In/Out
elastic_averaging_moving_rate double
Elastic averaging moving rate (only if elastic averaging is enabled).
In/Out
elastic_averaging_regularization double
Elastic averaging regularization strength (only if elastic averaging is enabled).
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
problem_type enum
Problem type, auto-detected by default. If set to image, the H2OFrame must contain a string column containing the path (URI or URL) to the images in the first column. If set to text, the H2OFrame must contain a string column containing the text in the first column. If set to dataset, Deep Water behaves just like any other H2O Model and builds a model on the provided H2OFrame (non-String columns).
In/Out
activation enum
Activation function. Only used if no user-defined network architecture file is provided, and only for problem_type=dataset.
In/Out
hidden int[]
Hidden layer sizes (e.g. [200, 200]). Only used if no user-defined network architecture file is provided, and only for problem_type=dataset.
In/Out
input_dropout_ratio double
Input layer dropout ratio (can improve generalization, try 0.1 or 0.2).
In/Out
hidden_dropout_ratios double[]
Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.
In/Out
sparse boolean
Sparse data handling (more efficient for data with lots of 0 values).
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable).
In/Out
epochs double
How many times the dataset should be iterated (streamed), can be fractional.
In/Out
train_samples_per_iteration long
Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic.
In/Out
target_ratio_comm_to_comp double
Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning).
In/Out
seed long
Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded.
In/Out
learning_rate double
Learning rate (higher => less stable, lower => slower convergence).
In/Out
momentum_start double
Initial momentum at the beginning of training (try 0.5).
In/Out
momentum_ramp double
Number of training samples for which momentum increases.
In/Out
momentum_stable double
Final momentum after the ramp is over (try 0.99).
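Together, the initial momentum, `momentum_ramp`, and `momentum_stable` describe a schedule. The sketch below assumes a linear ramp from the initial value to the stable value over `momentum_ramp` training samples. This section does not state the exact interpolation H2O uses.

```python
def momentum_at(samples_seen, start, ramp, stable):
    """Momentum after `samples_seen` training samples (linear ramp assumed)."""
    if ramp <= 0 or samples_seen >= ramp:
        return stable  # ramp finished: hold the final momentum
    return start + (stable - start) * samples_seen / ramp


# e.g. start=0.5, ramp=1e6 samples, stable=0.99
```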
In/Out
score_interval double
Shortest time interval (in seconds) between model scoring.
In/Out
score_training_samples long
Number of training set samples for scoring (0 for all).
In/Out
score_validation_samples long
Number of validation set samples for scoring (0 for all).
In/Out
score_duty_cycle double
Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).
In/Out
classification_stop double
Stopping criterion for classification error fraction on training data (-1 to disable).
In/Out
regression_stop double
Stopping criterion for regression error (MSE) on training data (-1 to disable).
In/Out
quiet_mode boolean
Enable quiet mode for less output to standard output.
In/Out
overwrite_with_best_model boolean
If enabled, override the final model with the best model found during training.
In/Out
autoencoder boolean
Auto-Encoder.
In/Out
diagnostics boolean
Enable diagnostics for hidden layers.
In/Out
variable_importances boolean
Compute variable importances for input features (Gedeon method) - can be slow for large networks.
In/Out
replicate_training_data boolean
Replicate the entire training dataset onto every node for faster training on small datasets.
In/Out
single_node_mode boolean
Run on a single node for fine-tuning of model parameters.
In/Out
shuffle_training_data boolean
Enable global shuffling of training data.
In/Out
mini_batch_size int
Mini-batch size (smaller leads to better fit, larger can speed up and generalize better).
In/Out
clip_gradient double
Clip gradients once their absolute value is larger than this value.
In/Out
network enum
Network architecture.
In/Out
backend enum
Deep Learning Backend.
In/Out
image_shape int[]
Width and height of image.
In/Out
channels int
Number of (color) channels.
In/Out
gpu boolean
Whether to use a GPU (if available).
In/Out
device_id int[]
Device IDs (which GPUs to use).
In/Out
cache_data boolean
Whether to cache the data in memory (automatically disabled if data size is too large).
In/Out
network_definition_file string
Path of file containing network definition (graph, architecture).
In/Out
network_parameters_file string
Path of file containing network (initial) parameters (weights, biases).
In/Out
mean_image_file string
Path of file containing the mean image data for data normalization.
In/Out
export_native_parameters_prefix string
Path (prefix) where to export the native model parameters after every iteration.
In/Out
standardize boolean
If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.
In/Out
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
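The sketch below makes `balance_classes`, `class_sampling_factors`, and `max_after_balance_size` concrete by deriving per-class sampling factors when none are given. It assumes minority classes are oversampled up to the majority-class count. It also assumes all factors are then scaled down uniformly if the balanced data would exceed `max_after_balance_size` times the original row count. H2O's actual rule may differ. Factors are returned in lexicographic class order, matching the parameter description.

```python
def auto_sampling_factors(class_counts, max_after_balance_size):
    """Return (sorted class labels, sampling factors) for class balancing.

    class_counts: dict mapping class label -> number of rows.
    max_after_balance_size: cap on balanced size relative to the original.
    """
    n = sum(class_counts.values())
    target = max(class_counts.values())  # assumed: oversample up to the majority
    labels = sorted(class_counts)  # lexicographic order
    factors = [target / class_counts[c] for c in labels]
    balanced = target * len(labels)
    cap = max_after_balance_size * n
    if balanced > cap:  # assumed: shrink every factor uniformly
        factors = [f * cap / balanced for f in factors]
    return labels, factors
```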
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
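The equivalence stated for observation weights can be checked on a weighted mean squared error: weight 0 drops a row, and weight 2 counts it twice. This is an illustrative loss in plain Python, not H2O's internal code, and the function name is hypothetical.

```python
def weighted_mse(y, pred, weights):
    """Weighted mean squared error; each weight scales its row's loss term."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one row needs a positive weight")
    return sum(w * (a - p) ** 2 for a, p, w in zip(y, pred, weights)) / total


# Row 0 weighted 2 and row 1 weighted 0 ...
weighted = weighted_mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [2, 0, 1])
# ... gives the same value as repeating row 0 twice and dropping row 1.
expanded = weighted_mse([1.0, 1.0, 3.0], [1.5, 1.5, 2.0], [1, 1, 1])
```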
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
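For classification, `fold_assignment = Stratified` aims to give every fold roughly the same class proportions. One simple way to do this is to shuffle each class's rows with a seeded RNG and deal them round-robin across folds. The sketch below uses that generic technique, which is not necessarily the one H2O implements.

```python
import random
from collections import defaultdict


def stratified_folds(labels, nfolds, seed=42):
    """Assign each row a fold index so every fold gets a similar class mix."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)
    folds = [0] * len(labels)
    for label in sorted(rows_by_class):  # fixed order keeps results reproducible
        rows = rows_by_class[label]
        rng.shuffle(rows)  # random placement within the class
        for position, row in enumerate(rows):
            folds[row] = position % nfolds  # deal rows round-robin across folds
    return folds
```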
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
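The `stopping_rounds`, `stopping_metric`, and `stopping_tolerance` descriptions combine into one rule. The sketch below is our reading of that rule, not H2O's source. It computes k+1 simple moving averages of length k over the last 2k scoring events and treats the oldest as the reference. It stops unless a later average beats the reference by the relative tolerance. `more_is_better` stands in for metrics such as AUC.

```python
def should_stop(history, k, tolerance, more_is_better=False):
    """Decide whether to stop early, given the metric at each scoring event.

    history: stopping_metric value per scoring event, oldest first.
    k: stopping_rounds (0 disables early stopping).
    tolerance: stopping_tolerance, the required relative improvement.
    """
    if k <= 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    window = history[-2 * k:]
    # k + 1 moving averages of length k; the first one is the reference.
    averages = [sum(window[i:i + k]) / k for i in range(k + 1)]
    reference, later = averages[0], averages[1:]
    if more_is_better:
        improved = any(a > reference * (1 + tolerance) for a in later)
    else:
        improved = any(a < reference * (1 - tolerance) for a in later)
    return not improved
```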
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
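A reference in the `language:keyName=funcName` format splits into three parts. The parser below is a client-side sketch, and the key and function names in the example are hypothetical.

```python
def parse_custom_metric_ref(ref):
    """Split 'language:keyName=funcName' into its three parts."""
    language, colon, rest = ref.partition(":")
    key_name, equals, func_name = rest.partition("=")
    if not (colon and equals and language and key_name and func_name):
        raise ValueError(f"expected language:keyName=funcName, got {ref!r}")
    return language, key_name, func_name


# Hypothetical names:
# parse_custom_metric_ref("python:mymetric=MyMetricWrapper")
# -> ('python', 'mymetric', 'MyMetricWrapper')
```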
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
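The `quantile_alpha` and `huber_alpha` descriptions correspond to the standard pinball (quantile) loss and the Huber loss. In the sketch below, the Huber threshold (delta) is the `huber_alpha` quantile of the absolute residuals, which is how we read "threshold between quadratic and linear loss". The linear-interpolation quantile is our own choice.

```python
def pinball_loss(y, pred, alpha):
    """Quantile (pinball) loss for one row; alpha is quantile_alpha in (0, 1)."""
    r = y - pred
    return alpha * r if r >= 0 else (alpha - 1) * r


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def huber_loss(r, delta):
    """Quadratic for |r| <= delta, linear beyond it (continuous at delta)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def huber_delta(residuals, huber_alpha):
    """Threshold taken as the huber_alpha quantile of |residual| (assumed)."""
    return quantile([abs(r) for r in residuals], huber_alpha)
```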
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Schema name for this field, if it is_schema, or the name of the enum, if it's an enum.
In
name string
Field name in the Schema
Out
type string
Type for this field
Out
is_schema boolean
Type for this field is itself a Schema.
Out
value Polymorphic
Value for this field
Out
help string
A short help description to appear alongside the field in a UI
Out
label string
The label that should be displayed for the field if the name is insufficient
Out
required boolean
Is this field required, or is the default value generally sufficient?
Out
level enum
How important is this field? The web UI uses the level to do a slow reveal of the parameters
Out
direction enum
Is this field an input, output or inout?
Out
is_inherited boolean
Is the field inherited from the parent schema?
Out
inherited_from string
If this field is inherited from a class higher in the hierarchy, which one?
Out
is_gridable boolean
Is the field gridable (i.e., can it be used in a grid call)?
Out
values string[]
For enum-type fields the allowed values are specified using the values annotation; this is used in UIs to tell the user the allowed values, and for validation
Out
json boolean
Should this field be rendered in the JSON representation?
Out
is_member_of_frames string[]
For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it's set) the validation_frame
Out
is_mutually_exclusive_with string[]
For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column
A mapping representing monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint.
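A monotonic constraint of +1 means predictions must not decrease as the constrained feature increases, and -1 means they must not increase. The hypothetical helper below checks a sequence of predictions, ordered by increasing feature value, against such a constraint. It is a verification aid, not a description of how H2O enforces the constraint during tree building.

```python
def respects_constraint(predictions, direction):
    """Check predictions (ordered by increasing feature value) against +1/-1."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    pairs = zip(predictions, predictions[1:])
    if direction == 1:
        return all(b >= a for a, b in pairs)  # non-decreasing
    return all(b <= a for a, b in pairs)  # non-increasing
```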
In
max_abs_leafnode_pred double
Maximum absolute value of a leaf node prediction
In
pred_noise_bandwidth double
Bandwidth (sigma) of Gaussian multiplicative noise ~N(1,sigma) for tree node predictions
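The sketch below illustrates `pred_noise_bandwidth` and `max_abs_leafnode_pred`. It multiplies a leaf prediction by a draw from N(1, sigma) and then clamps the result to ±max_abs. Applying the clamp after the noise is our assumption, because this section does not state the order. The function name is hypothetical.

```python
import random


def noisy_leaf_prediction(pred, sigma, max_abs, rng):
    """Apply multiplicative Gaussian noise ~N(1, sigma), then clamp to +/-max_abs."""
    noisy = pred * rng.gauss(1.0, sigma) if sigma > 0 else pred
    return max(-max_abs, min(max_abs, noisy))
```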
In
ntrees int
Number of trees.
In
max_depth int
Maximum tree depth.
In
min_rows double
Fewest allowed (weighted) observations in a leaf.
In
nbins int
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
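Read together, `nbins_top_level` and `nbins` give a per-depth bin count for numeric columns. The count starts at `nbins_top_level` at the root, halves at each level, and never goes below `nbins`. The sketch encodes that reading. Integer halving is our assumption.

```python
def bins_at_depth(depth, nbins, nbins_top_level):
    """Histogram bins for a numeric column at a given tree depth (root = 0)."""
    return max(nbins, nbins_top_level >> depth)  # halve per level, floor at nbins


# e.g. nbins=20, nbins_top_level=1024: 1024, 512, 256, ... down to 20
```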
In
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equals or exceeds this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.
In
sample_rate_per_class double[]
A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0)
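`col_sample_rate_change_per_level` scales the column sampling rate multiplicatively with depth. In this sketch the rate at depth d is base × factor^d, capped at 1.0. The cap and the generic `base_rate` argument are assumptions for illustration. The range check mirrors the documented (0.0, 2.0] bound.

```python
def col_sample_rate_at(depth, base_rate, change_per_level):
    """Column sampling rate at a given depth (root = 0)."""
    if not 0.0 < change_per_level <= 2.0:
        raise ValueError("change_per_level must be > 0.0 and <= 2.0")
    return min(1.0, base_rate * change_per_level ** depth)  # cap is assumed
```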
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
calibrate_model boolean
Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
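Platt scaling fits a one-feature logistic regression that maps a model's raw score to a calibrated probability. The sketch below fits that mapping with plain gradient descent on held-out scores and labels. The optimizer, step size, and iteration count are our choices, and H2O may fit the mapping differently.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.5, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a = b = 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = _sigmoid(a * s + b) - y  # d(log loss)/dz for one row
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Calibrated probability for a raw model score."""
    return _sigmoid(a * score + b)
```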
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Check if the response column is constant. If enabled, an exception is thrown if the response column is a constant value. If disabled, the model will train regardless of whether the response column is constant.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Seed for pseudo random number generator (if applicable)
In
family enum
Family. Use binomial for classification with logistic regression, others are for regression problems.
In
tweedie_variance_power double
Tweedie variance power
In
tweedie_link_power double
Tweedie link power
In
solver enum
AUTO will set the solver based on the given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda search with L1 penalty; L_BFGS scales better for datasets with many columns.
In
alpha double[]
Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.
In
lambda double[]
Regularization strength
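The `alpha` and `lambda` parameters combine into the usual elastic-net penalty, λ(α‖β‖₁ + (1 − α)/2 ‖β‖₂²), which is added to the objective. α = 1 gives Lasso and α = 0 gives Ridge. The sketch computes that penalty for a coefficient vector that excludes the intercept. We assume the glmnet-style ½ factor on the L2 term.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2), intercept excluded."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2_squared)
```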
In
lambda_search boolean
Use lambda search starting at lambda max; the given lambda is then interpreted as lambda min.
In
early_stopping boolean
Stop early when there is no more relative improvement on train or validation (if provided)
In
nlambdas int
Number of lambdas to be used in a search. Default indicates: if alpha is zero and lambda search is set to True, nlambdas is set to 30 (fewer lambdas are needed for ridge regression); otherwise it is set to 100.
In
standardize boolean
Standardize numeric columns to have zero mean and unit variance
In
non_negative boolean
Restrict coefficients (not intercept) to be non-negative
In
max_iterations int
Maximum number of iterations
In
beta_epsilon double
Converge if beta changes less (using L-infinity norm) than beta_epsilon. ONLY applies to the IRLSM solver.
In
objective_epsilon double
Converge if objective value changes less than this. Default indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001.
In
gradient_epsilon double
Converge if objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively.
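The default rules for `objective_epsilon` and `gradient_epsilon`, and the L-infinity test behind `beta_epsilon`, are easier to check as code. The function names are hypothetical, and the values are copied from the descriptions above.

```python
def default_objective_epsilon(lambda_search, lam):
    if lambda_search:
        return 1e-4
    return 1e-6 if lam == 0 else 1e-4


def default_gradient_epsilon(lambda_search, lam):
    if lambda_search:
        return 1e-8 if lam == 0 else 1e-6
    return 1e-6 if lam == 0 else 1e-4


def beta_converged(beta_old, beta_new, beta_epsilon):
    """IRLSM check: largest absolute coefficient change (L-infinity norm)."""
    change = max(abs(a - b) for a, b in zip(beta_old, beta_new))
    return change < beta_epsilon
```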
In
obj_reg double
Likelihood divider in objective value computation, default is 1/nobs
In
link enum
(No description available)
In
intercept boolean
Include constant term in the model
In
prior double
Prior probability for y==1. To be used only for logistic regression if the data has been sampled and the mean of response does not reflect reality.
In
lambda_min_ratio double
Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01.
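With `lambda_search` enabled, the `nlambdas` and `lambda_min_ratio` defaults determine the lambda grid. The sketch applies the stated defaults. As an assumption borrowed from glmnet, it spaces the lambdas geometrically from lambda_max down to lambda_max × lambda_min_ratio. `None` here stands for "use the default", which is our own sentinel. The description does not cover n_obs == n_vars, so the sketch treats that case like "fewer observations".

```python
def lambda_path(lambda_max, alpha, n_obs, n_vars, nlambdas=None, lambda_min_ratio=None):
    """Decreasing lambda grid for lambda search (geometric spacing assumed)."""
    if nlambdas is None:
        nlambdas = 30 if alpha == 0 else 100
    if lambda_min_ratio is None:
        lambda_min_ratio = 1e-4 if n_obs > n_vars else 1e-2
    if nlambdas == 1:
        return [lambda_max]
    step = lambda_min_ratio ** (1.0 / (nlambdas - 1))  # constant ratio between lambdas
    return [lambda_max * step ** i for i in range(nlambdas)]
```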
Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000.
In
interactions string[]
A list of predictor column indices to interact. All pairwise combinations will be computed for the list.
A list of pairwise (first order) column interactions.
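According to the `interactions` description, every pairwise combination of the listed columns is interacted. A short illustration with `itertools.combinations` (the column indices are made up):

```python
from itertools import combinations

interactions = [0, 2, 5]  # example predictor column indices
pairs = list(combinations(interactions, 2))
print(pairs)  # [(0, 2), (0, 5), (2, 5)]
```

A list of n columns therefore yields n(n-1)/2 pairs, which grows quickly for long lists.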
In
compute_p_values boolean
Request p-value computation. P-values work only with the IRLSM solver and no regularization.
In
remove_collinear_columns boolean
In case of linearly dependent columns, remove some of the dependent columns
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
missing_values_handling enum
Handling of missing values. Either MeanImputation or Skip.
In/Out
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Maximum number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Expand categorical columns in user-specified initial Y
In
impute_original boolean
Reconstruct original training data by reversing transform
In
recover_svd boolean
Recover singular values and eigenvectors of XY
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Milliseconds since the epoch for the time that this H2OError instance was created. Generally this is a short time after the underlying error occurred.
Out
error_url string
Error URL
Out
msg string
Message intended for the end user (a data scientist).
Out
dev_msg string
Potentially more detailed message intended for a developer (e.g. a front end engineer or someone designing a language binding).
Out
http_status int
HTTP status code for this error.
Out
values Map
Any values that are relevant to reporting or handling this error. Examples are a key name if the error is on a key, or a field name and object name if it's on a specific field.
Milliseconds since the epoch for the time that this H2OError instance was created. Generally this is a short time after the underlying error occurred.
Out
error_url string
Error URL
Out
msg string
Message intended for the end user (a data scientist).
Out
dev_msg string
Potentially more detailed message intended for a developer (e.g. a front end engineer or someone designing a language binding).
Out
http_status int
HTTP status code for this error.
Out
values Map
Any values that are relevant to reporting or handling this error. Examples are a key name if the error is on a key, or a field name and object name if it's on a specific field.
Out
exception_type string
Exception type, if any.
Out
exception_msg string
Raw exception message, if any.
Out
stacktrace string[]
Stacktrace, if any.
Out
HeartBeatEvent
sends int
number of sent heartbeats
In
recvs int
number of received heartbeats
In
date string
Time when the event was recorded. Format is hh:mm:ss:ms
In
nanos long
Time in nanos
In
type enum
type of recorded event
In
HyperSpaceSearchCriteriaV99
strategy enum
Hyperparameter space search strategy.
In/Out
IOEvent
io_flavor string
flavor of the recorded io (ice/hdfs/...)
In
node string
node where this io event happened
In
data string
data info
In
date string
Time when the event was recorded. Format is hh:mm:ss:ms
In
nanos long
Time in nanos
In
type enum
type of recorded event
In
ImportFilesMultiV3
paths string[]
paths
In
pattern string
pattern
In
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
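The `_exclude_fields` value is a comma-separated list of slash-separated paths. The sketch below applies such a list to a decoded JSON response. It assumes, as the `frames/frame_id/URL` example suggests, that a path step which reaches a list applies to every element. It is a client-side illustration, not the server's implementation.

```python
import copy


def exclude_fields(doc, spec):
    """Return a copy of `doc` with each comma-separated path in `spec` removed."""
    result = copy.deepcopy(doc)
    for path in filter(None, spec.split(",")):
        _drop(result, path.strip().split("/"))
    return result


def _drop(node, steps):
    if isinstance(node, list):  # apply the same path to every element
        for item in node:
            _drop(item, steps)
        return
    if not isinstance(node, dict) or steps[0] not in node:
        return
    if len(steps) == 1:
        del node[steps[0]]
    else:
        _drop(node[steps[0]], steps[1:])
```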
In
files string[]
files
Out
destination_frames string[]
names
Out
fails string[]
fails
Out
dels string[]
dels
Out
ImportFilesV3
path string
path
In
pattern string
pattern
In
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
In
files string[]
files
Out
destination_frames string[]
names
Out
fails string[]
fails
Out
dels string[]
dels
Out
ImportSQLTableV99
connection_url string
connection_url
In
table string
table
In
select_query string
select_query
In
username string
username
In
password string
password
In
columns string
columns
In
fetch_mode string
Mode for data loading. Not all modes are supported by all databases.
In
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
In
InitIDV3
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
In
session_key string
Session ID
In/Out
InputSchemaV4
_fields string
Filter on the set of output fields: if you set _fields="foo,bar,baz", then only those fields will be included in the output; or you can specify _fields="-goo,gee" to include all fields except goo and gee. If the result contains nested data structures, then you can refer to the fields within those structures as well. For example if you specify _fields="foo(oof),bar(-rab)", then only fields foo and bar will be included, and within foo there will be only field oof, whereas within bar all fields except rab will be reported.
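The `_fields` rules can be sketched as a small parser and filter over a decoded JSON object. The rules are: a plain list includes only the named fields, a leading "-" excludes the named fields, and parentheses apply a nested spec. This is our reading of the description, not H2O's server code. In exclusion mode the sketch drops the named fields entirely and ignores any nested spec on them.

```python
def _split_top_level(spec):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in spec:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    if current:
        parts.append(current)
    return parts


def parse_fields(spec):
    """Parse a _fields spec into (exclude_mode, {name: nested_spec_or_None})."""
    exclude = spec.startswith("-")
    if exclude:
        spec = spec[1:]
    fields = {}
    for part in _split_top_level(spec):
        if "(" in part:
            name, inner = part.split("(", 1)
            fields[name] = parse_fields(inner[:-1])  # drop the closing ")"
        else:
            fields[part] = None
    return exclude, fields


def apply_fields(obj, parsed):
    """Filter a decoded JSON object according to a parsed _fields spec."""
    exclude, fields = parsed
    if isinstance(obj, list):
        return [apply_fields(item, parsed) for item in obj]
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        if exclude:
            if key not in fields:
                out[key] = value
        elif key in fields:
            nested = fields[key]
            out[key] = value if nested is None else apply_fields(value, nested)
    return out
```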
In
InteractionV3
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
Whether to create pairwise quadratic interactions between factors (otherwise create one higher-order interaction). Only applicable if there are 3 or more factors.
In/Out
max_factors int
Max. number of factor levels in pair-wise interaction terms (if enforced, one extra catch-all factor will be made)
In/Out
min_occurrence int
Min. occurrence threshold for factor levels in pair-wise interaction terms
Number of randomly sampled observations used to train each Isolation Forest tree. Only one of parameters sample_size and sample_rate should be defined. If sample_rate is defined, sample_size will be ignored.
In
sample_rate double
Rate of randomly sampled observations used to train each Isolation Forest tree. Needs to be in range from 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size will be used instead.
In
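The precedence between `sample_size` and `sample_rate` described above can be written as a small function. It is a hedged sketch. The name `rows_per_tree`, the rounding, and the cap of `sample_size` at the frame's row count are assumptions, not documented behavior.

```python
def rows_per_tree(n_rows, sample_size, sample_rate):
    """Rows drawn to train one Isolation Forest tree (illustrative helper)."""
    if sample_rate == -1:
        # sample_rate disabled: fall back to the fixed sample_size (capped at n_rows, an assumption)
        return min(sample_size, n_rows)
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be -1 or in the range 0.0 to 1.0")
    # sample_rate is defined, so sample_size is ignored
    return round(sample_rate * n_rows)


print(rows_per_tree(1000, 256, -1), rows_per_tree(1000, 256, 0.1))
```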
mtries int
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3.
In
ntrees int
Number of trees.
In
max_depth int
Maximum tree depth.
In
min_rows double
Fewest allowed (weighted) observations in a leaf.
In
nbins int
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
In
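Read together, `nbins` and `nbins_top_level` describe a per-depth bin budget. The budget starts at `nbins_top_level` at the root, halves at each level, and never falls below `nbins`. The sketch below encodes that reading under an assumed helper name. The values 20 and 1024 are only example inputs.

```python
def histogram_bins(depth, nbins, nbins_top_level):
    """Bins used for a numeric column at a given tree depth (root = depth 0).

    histogram_bins is a hypothetical helper illustrating the parameter descriptions.
    """
    # Halve the top-level budget once per level, but keep at least nbins bins.
    return max(nbins, nbins_top_level // (2 ** depth))


print([histogram_bins(d, 20, 1024) for d in range(8)])
```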
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equaled or exceeded this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.
In
sample_rate_per_class double[]
A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0)
In
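One way to read `col_sample_rate_change_per_level`: the column sampling rate is multiplied by the factor once per tree level and capped at 1.0, meaning all columns. In the sketch below, `base_rate` stands for the rate at the root. That name, the multiply-per-level interpretation, and the cap are all assumptions.

```python
def column_rate_at_depth(base_rate, change_per_level, depth):
    """Effective column sampling rate at a given depth (illustrative)."""
    if not 0.0 < change_per_level <= 2.0:
        raise ValueError("col_sample_rate_change_per_level must be > 0.0 and <= 2.0")
    # Apply the relative change once per level below the root; never exceed 1.0.
    return min(1.0, base_rate * change_per_level ** depth)
```

A factor above 1.0 makes deeper levels consider more columns. A factor below 1.0 makes them consider fewer.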
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
calibrate_model boolean
Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
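The "lexicographic order" rule for `class_sampling_factors` means the first factor applies to the alphabetically first class label, and so on. The sketch below shows that alignment and the resulting expected class counts. The automatic rule used when no factors are given, which scales every class up to the majority count, is a hypothetical stand-in for H2O's own computation.

```python
from collections import Counter


def resampled_class_counts(labels, class_sampling_factors=None):
    """Expected per-class row counts after over/under-sampling (illustrative)."""
    counts = Counter(labels)
    classes = sorted(counts)  # factors are matched to classes in lexicographic order
    if class_sampling_factors is None:
        # Hypothetical balancing rule: bring every class up to the majority count.
        majority = max(counts.values())
        class_sampling_factors = [majority / counts[c] for c in classes]
    if len(class_sampling_factors) != len(classes):
        raise ValueError("need exactly one sampling factor per class")
    return {c: round(counts[c] * f) for c, f in zip(classes, class_sampling_factors)}


labels = ["yes"] * 2 + ["no"] * 6 + ["maybe"] * 4
print(resampled_class_counts(labels, [1.0, 0.5, 3.0]))  # order: maybe, no, yes
```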
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Check if response column is constant. If enabled, an exception is thrown if the response column is a constant value. If disabled, the model will train regardless of whether the response column is constant.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
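The weights description says a weight of 2 is equivalent to repeating a row and a weight of 0 to removing it. The sketch below checks that equivalence for a weighted mean squared error. It uses a plain textbook loss, not H2O's internal code, and the function name is hypothetical.

```python
def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error; weights act as (possibly fractional) repeat counts."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    total = sum(weights)
    return sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)) / total


y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 0.0]
weights = [1, 2, 0]  # second row counted twice, third row effectively excluded

# The same data with the second row physically duplicated and the third removed.
expanded_true = [1.0, 2.0, 2.0]
expanded_pred = [1.5, 2.0, 2.0]
print(weighted_mse(y_true, y_pred, weights), weighted_mse(expanded_true, expanded_pred, [1, 1, 1]))
```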
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
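The early-stopping rule can be sketched for a lower-is-better metric such as logloss or deviance. Training stops when the best of the last k moving averages fails to improve on the moving average from k scoring events earlier by at least `stopping_tolerance`, measured relatively. This is a simplified interpretation of the description, not H2O's exact scoring-history logic. It also assumes a positive metric.

```python
def moving_averages(history, k):
    """Simple moving averages of length k over a metric history."""
    return [sum(history[i - k + 1:i + 1]) / k for i in range(k - 1, len(history))]


def should_stop(history, stopping_rounds, stopping_tolerance):
    """Return True if a lower-is-better metric has stopped improving (illustrative)."""
    k = stopping_rounds
    if k == 0:
        return False  # 0 disables early stopping
    averages = moving_averages(history, k)
    if len(averages) <= k:
        return False  # not enough scoring events to compare yet
    reference = averages[-k - 1]      # moving average from k scoring events ago
    best_recent = min(averages[-k:])  # best of the last k moving averages
    # Improvement must be at least stopping_tolerance relative to the reference.
    return best_recent > reference * (1.0 - stopping_tolerance)
```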
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Filter on the set of output fields: if you set _fields="foo,bar,baz", then only those fields will be included in the output; or you can specify _fields="-goo,gee" to include all fields except goo and gee. If the result contains nested data structures, then you can refer to the fields within those structures as well. For example if you specify _fields="foo(oof),bar(-rab)", then only fields foo and bar will be included, and within foo there will be only field oof, whereas within bar all fields except rab will be reported.
In
JobKeyV3
name string
Name (string representation) for this Key.
In/Out
type string
Name (string representation) for the type of Keyed this Key points to.
In/Out
URL string
URL for the resource that this Key points to, if one exists.
This option allows you to specify a dataframe, where each row represents an initial cluster center. The user-specified points must have the same number of columns as the training observations. The number of rows must equal the number of clusters.
In
max_iterations int
Maximum training iterations (if estimate_k is enabled, this limit applies to each inner Lloyd's iteration)
In
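User-specified initial centers and `max_iterations` interact in a single-machine version of textbook Lloyd's iteration, sketched below. It is not H2O's distributed implementation, and it skips standardization. The function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, user_points, max_iterations=10):
    """Lloyd's k-means started from user-specified centers (one row per cluster)."""
    dims = len(points[0])
    if any(len(center) != dims for center in user_points):
        raise ValueError("user_points must have the same number of columns as the data")
    centers = [list(center) for center in user_points]  # k = number of rows
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points (keep it if empty).
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged before reaching max_iterations
        centers = new_centers
    return centers


print(lloyd_kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]]))
```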
standardize boolean
Standardize columns before computing distances
In
seed long
RNG Seed
In
init enum
Initialization mode
In
estimate_k boolean
Whether to estimate the number of clusters (<=k) iteratively and deterministically.
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
k int
The max. number of clusters. If estimate_k is disabled, the model will find k centroids, otherwise it will find up to k centroids.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column
In
is_mutually_exclusive_with string[]
For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it's set) the validation_frame
In
name string
name in the JSON, e.g. "lambda"
Out
label string
[DEPRECATED] same as name.
Out
help string
help for the UI, e.g. "regularization multiplier, typically used for foo bar baz etc."
Out
required boolean
the field is required
Out
type string
Java type, e.g. "double"
Out
default_value Polymorphic
default value, e.g. 1
Out
actual_value Polymorphic
actual value as set by the user and / or modified by the ModelBuilder, e.g., 10
Out
level string
the importance of the parameter, used by the UI, e.g. "critical", "extended" or "expert"
Out
values string[]
list of valid values for use by the front-end
Out
gridable boolean
Parameter can be used in grid call
Out
ModelParametersSchemaV3
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
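The textbook losses behind `quantile_alpha` and `huber_alpha` are sketched below. For Huber, the threshold between the quadratic and linear regions is taken as the `huber_alpha` quantile of the absolute residuals. The simple lower empirical quantile used here is an assumption, and H2O's internal definition may differ.

```python
def pinball_loss(y, prediction, quantile_alpha):
    """Quantile (pinball) loss for one observation."""
    if not 0.0 <= quantile_alpha <= 1.0:
        raise ValueError("quantile_alpha must be between 0 and 1")
    residual = y - prediction
    # Under-prediction costs alpha per unit; over-prediction costs (1 - alpha).
    return quantile_alpha * residual if residual >= 0 else (quantile_alpha - 1.0) * residual


def huber_losses(ys, predictions, huber_alpha):
    """Huber loss per row, with the threshold chosen from a quantile of |residual|."""
    if not 0.0 <= huber_alpha <= 1.0:
        raise ValueError("huber_alpha must be between 0 and 1")
    residuals = [abs(y - p) for y, p in zip(ys, predictions)]
    ordered = sorted(residuals)
    # Assumption: lower empirical quantile picks the threshold delta.
    delta = ordered[min(int(huber_alpha * len(ordered)), len(ordered) - 1)]
    # Quadratic inside the threshold, linear outside it.
    return [0.5 * r * r if r <= delta else delta * (r - 0.5 * delta) for r in residuals]
```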
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Min. standard deviation to use for observations with insufficient data
In
eps_sdev double
Cutoff below which standard deviation is replaced with min_sdev
In
min_prob double
Min. probability to use for observations with insufficient data
In
eps_prob double
Cutoff below which probability is replaced with min_prob
In
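The cutoff pairs `eps_sdev`/`min_sdev` and `eps_prob`/`min_prob` follow the same pattern. A value at or below the cutoff is replaced by the minimum before it is used in a likelihood. The sketch applies this to a Gaussian density for a numeric predictor. The inclusive comparison, the helper names, and the example values are assumptions. The same `floored` call would apply to a categorical probability with `eps_prob` and `min_prob`.

```python
import math


def floored(value, cutoff, replacement):
    """Replace a too-small standard deviation or probability (illustrative)."""
    # Treat 'below the cutoff' inclusively, so a zero value with cutoff 0 is replaced.
    return replacement if value <= cutoff else value


def gaussian_likelihood(x, mean, sdev, min_sdev=0.001, eps_sdev=0.0):
    """Normal density with the standard deviation floored first (example defaults)."""
    sdev = floored(sdev, eps_sdev, min_sdev)
    z = (x - mean) / sdev
    return math.exp(-0.5 * z * z) / (sdev * math.sqrt(2.0 * math.pi))


# Without the floor, a zero standard deviation would cause a division by zero.
print(gaussian_likelihood(5.0, 5.0, 0.0))
```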
compute_metrics boolean
Compute metrics on training data
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
In/Out
seed long
Seed for pseudo random number generator (only used for cross-validation and fold_assignment="Random" or "AUTO")
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Specify the algorithm to use for computing the principal components: GramSVD - uses a distributed computation of the Gram matrix, followed by a local SVD; Power - computes the SVD using the power iteration method (experimental); Randomized - uses randomized subspace iteration method; GLRM - fits a generalized low-rank model with L2 loss function and no regularization and solves for the SVD using local matrix algebra (experimental)
In
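The Gram-matrix route and the power-iteration route can be combined in a small single-machine sketch. It forms X^T X, then applies power iteration to find the leading right singular vector and singular value. It is a textbook illustration, not H2O's GramSVD or Power code. It does not center columns (the example data already has zero column means), and the fixed starting vector fails if it is orthogonal to the answer.

```python
import math


def gram_matrix(rows):
    """X^T X for a list of equal-length numeric rows."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def top_right_singular_vector(rows, iterations=100):
    """Leading right singular vector and singular value via power iteration on X^T X."""
    g = gram_matrix(rows)
    d = len(g)
    v = [1.0 / math.sqrt(d)] * d  # fixed unit start vector
    for _ in range(iterations):
        w = [sum(g[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T G v approximates the top eigenvalue of G (sigma_1 squared).
    eigenvalue = sum(v[i] * sum(g[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, math.sqrt(eigenvalue)


vector, sigma = top_right_singular_vector([[2, 0], [0, 1], [-2, 0], [0, -1]])
```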
pca_impl enum
Specify the implementation to use for computing PCA (via SVD or EVD): MTJ_EVD_DENSEMATRIX - eigenvalue decompositions for dense matrices using MTJ; MTJ_EVD_SYMMMATRIX - eigenvalue decompositions for symmetric matrices using MTJ; MTJ_SVD_DENSEMATRIX - singular-value decompositions for dense matrices using MTJ; JAMA - eigenvalue decompositions for dense matrices using JAMA. References: JAMA - http://math.nist.gov/javanumerics/jama/; MTJ - https://github.com/fommil/matrix-toolkits-java/
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
k int
Rank of matrix approximation
In/Out
max_iterations int
Maximum training iterations
In/Out
seed long
RNG seed for initialization
In/Out
use_all_factor_levels boolean
Whether first factor level is included in each categorical expansion
In/Out
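The effect of `use_all_factor_levels` on one-hot expansion is easy to see on a single column. With the flag off, the first level becomes an implicit reference and gets no indicator column. The lexicographic level order and the helper name are assumptions for this sketch.

```python
def expand_categorical(values, use_all_factor_levels):
    """One-hot encode a categorical column (illustrative)."""
    levels = sorted(set(values))  # assumption: levels in lexicographic order
    # Without all factor levels, drop the first level as the reference category.
    kept = levels if use_all_factor_levels else levels[1:]
    return kept, [[1 if v == level else 0 for level in kept] for v in values]


print(expand_categorical(["a", "b", "c", "a"], use_all_factor_levels=False))
```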
compute_metrics boolean
Whether to compute metrics on the training data
In/Out
impute_missing boolean
Whether to impute missing entries with the column mean
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Seed for random number generator; set to a value other than -1 for reproducibility.
In/Out
max_models int
Maximum number of models to build (optional).
In/Out
max_runtime_secs double
Maximum time to spend building models (optional).
In/Out
stopping_rounds int
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression)
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
strategy enum
Hyperparameter space search strategy.
In/Out
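The two common search strategies can be contrasted on a small hyperparameter space. One enumerates the full Cartesian product. The other shuffles it with a seed and keeps at most `max_models` candidates. The strategy names "Cartesian" and "RandomDiscrete" are the ones H2O grid search usually accepts, but check the enum in your version. Everything else here, including the helper name, is an illustrative assumption rather than H2O's search code.

```python
import itertools
import random


def grid_candidates(hyper_params, strategy="Cartesian", max_models=None, seed=-1):
    """List hyperparameter combinations to try (illustrative)."""
    names = sorted(hyper_params)
    combos = [dict(zip(names, values))
              for values in itertools.product(*(hyper_params[n] for n in names))]
    if strategy == "Cartesian":
        return combos  # every combination, in a fixed order
    if strategy == "RandomDiscrete":
        rng = random.Random(None if seed == -1 else seed)  # -1: not reproducible
        rng.shuffle(combos)
        return combos if max_models is None else combos[:max_models]
    raise ValueError(f"unknown strategy: {strategy}")


space = {"max_depth": [3, 5, 7], "ntrees": [50, 100]}
print(len(grid_candidates(space)), grid_candidates(space, "RandomDiscrete", max_models=2, seed=42))
```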
RapidsExpressionV3
name string
(Class) name of the language construct
In
pattern string
Code fragment pattern.
In
description string
Description of the functionality provided by this language construct.
In
RapidsFrameV3
ast string
A Rapids AstRoot expression
In
session_id string
Session key
In
id string
[DEPRECATED] Key name to assign Frame results
In
_exclude_fields string
Comma-separated list of JSON field paths to exclude from the result, used like: "/3/Frames?_exclude_fields=frames/frame_id/URL,__meta"
Method for computing SVD (Caution: Randomized is currently experimental and unstable)
In
nv int
Number of right singular vectors
In
max_iterations int
Maximum iterations
In
seed long
RNG seed for k-means++ initialization
In
keep_u boolean
Save left singular vectors?
In
u_name string
Frame key to save left singular vectors
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
use_all_factor_levels boolean
Whether first factor level is included in each categorical expansion
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point
In
nbins_top_level int
For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level
In
nbins_cats int
For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
In
r2_stopping double
r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equaled or exceeded this value.
In
seed long
Seed for pseudo random number generator (if applicable)
In
build_tree_one_node boolean
Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.
In
sample_rate_per_class double[]
A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree
In
col_sample_rate_per_tree double
Column sample rate per tree (from 0.0 to 1.0)
In
col_sample_rate_change_per_level double
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0)
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
min_split_improvement double
Minimum relative improvement in squared error reduction for a split to happen
In
histogram_type enum
What type of histogram to use for finding optimal split points
In
calibrate_model boolean
Use Platt Scaling to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.
In
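Platt scaling fits a one-feature logistic regression that maps a model's raw score s to a calibrated probability sigmoid(a*s + b). The sketch below fits a and b by plain gradient descent on log loss. It omits Platt's original target smoothing and is not H2O's calibration code. The function names, learning rate, and iteration count are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, learning_rate=0.1, iterations=5000):
    """Fit p = sigmoid(a * score + b) to 0/1 labels by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
```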
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
balance_classes boolean
Balance training data class counts via over/under-sampling (for imbalanced data).
In/Out
class_sampling_factors float[]
Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.
In/Out
max_after_balance_size float
Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.
In/Out
max_confusion_matrix_size int
[Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs
In/Out
max_hit_ratio_k int
Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)
Check if response column is constant. If enabled, an exception is thrown if the response column is a constant value. If disabled, the model will train regardless of whether the response column is constant.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
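Two fold-assignment schemes are sketched below: a modulo scheme (row i goes to fold i mod nfolds) and a stratified scheme that shuffles each class separately and deals its rows round-robin into folds, so every fold keeps roughly the class proportions. These are illustrations of the ideas, not H2O's code, and the seed handling is an assumption.

```python
import random
from collections import Counter, defaultdict


def modulo_folds(n_rows, nfolds):
    """Modulo: row i goes to fold i % nfolds (deterministic)."""
    return [i % nfolds for i in range(n_rows)]


def stratified_folds(labels, nfolds, seed=42):
    """Stratified: shuffle each class separately, then deal its rows round-robin into folds."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)
    folds = [0] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)
        for j, row in enumerate(rows):
            folds[row] = j % nfolds
    return folds


labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, 3)
print(sorted(Counter(zip(folds, labels)).items()))
# every fold holds two "a" rows and one "b" row
```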
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
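The interplay of stopping_rounds and stopping_tolerance can be sketched like this. The newest simple moving average of length k = stopping_rounds is compared with the best moving average whose window ended before the latest window began. Training stops if the relative improvement is below stopping_tolerance. This is a simplified reading of the description above, not H2O's exact ScoreKeeper logic, and it handles only lower-is-better metrics such as logloss or deviance.

```python
def should_stop(scores, stopping_rounds, stopping_tolerance):
    """Simplified sketch for a lower-is-better metric such as logloss or deviance."""
    k = stopping_rounds
    if k == 0 or len(scores) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    # Simple moving averages of length k over the scoring history
    averages = [sum(scores[i - k:i]) / k for i in range(k, len(scores) + 1)]
    latest = averages[-1]
    # Best average whose window ends before the latest window starts
    reference = min(averages[:-k])
    if reference == 0:
        return latest >= reference
    improvement = (reference - latest) / abs(reference)
    return improvement < stopping_tolerance


print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], 2, 0.01))  # False: still improving
print(should_stop([1.0, 0.5, 0.5, 0.5, 0.5, 0.5], 2, 0.01))  # True: plateaued
```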
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
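The reference string has the shape language:keyName=funcName. This minimal sketch only splits and validates that shape on the client side. The example key and function names are hypothetical, and the sketch does not show how H2O resolves the key.

```python
from typing import NamedTuple


class MetricFuncRef(NamedTuple):
    language: str
    key_name: str
    func_name: str


def parse_metric_func_ref(ref):
    """Split a 'language:keyName=funcName' reference into its three parts."""
    language, colon, rest = ref.partition(":")
    key_name, equals, func_name = rest.partition("=")
    if not (colon and equals and language and key_name and func_name):
        raise ValueError(f"expected language:keyName=funcName, got {ref!r}")
    return MetricFuncRef(language, key_name, func_name)


# Hypothetical key and function names, for illustration only
print(parse_metric_func_ref("python:my_metric_key=MyMetric"))
```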
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
List of models (or model ids) to ensemble/stack together. Models must have been cross-validated using nfolds > 1, and folds must be identical across models.
In
keep_levelone_frame boolean
Keep level one frame used for metalearner training.
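The level one frame holds one column per base model (its cross-validated holdout predictions) plus the response, and the metalearner trains on it. This sketch also enforces the two requirements stated for the base models: nfolds > 1 and identical folds across models. The dictionary layout and model ids are hypothetical.

```python
def level_one_frame(base_models, response):
    """Build level-one rows from base models' cross-validated holdout predictions.

    base_models maps a model id to (fold_assignment, holdout_predictions), where each
    holdout prediction came from the fold model that did not train on that row.
    """
    reference_folds = None
    for model_id, (folds, _) in base_models.items():
        if len(set(folds)) < 2:
            raise ValueError(f"{model_id} was not cross-validated with nfolds > 1")
        if reference_folds is None:
            reference_folds = folds
        elif folds != reference_folds:
            raise ValueError(f"{model_id} uses different fold assignments")
    rows = []
    for i, y in enumerate(response):
        row = {model_id: preds[i] for model_id, (_, preds) in base_models.items()}
        row["response"] = y  # the metalearner's target
        rows.append(row)
    return rows


models = {  # hypothetical base models with identical 2-fold assignments
    "gbm_1": ([0, 1, 0, 1], [0.2, 0.7, 0.4, 0.9]),
    "glm_1": ([0, 1, 0, 1], [0.3, 0.6, 0.5, 0.8]),
}
for row in level_one_frame(models, [0, 1, 0, 1]):
    print(row)
```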
In
seed long
Seed for random numbers; passed through to the metalearner algorithm. Defaults to -1 (time-based random number)
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
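The Tweedie power p sets the variance function Var(y) ∝ mu^p. Values between 1 and 2 correspond to compound Poisson-gamma data, which allows exact zeros. This sketch evaluates the standard Tweedie unit deviance for 1 < p < 2. It is the textbook formula, not a claim about H2O's internal code.

```python
def tweedie_deviance(y, mu, power):
    """Unit deviance of the Tweedie family for 1 < power < 2 (y >= 0, mu > 0)."""
    if not 1 < power < 2:
        raise ValueError("power must be strictly between 1 and 2 in this sketch")
    if mu <= 0 or y < 0:
        raise ValueError("need mu > 0 and y >= 0")
    p = power
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


print(tweedie_deviance(0.0, 4.0, 1.5))  # 8.0: an exact zero is allowed
print(tweedie_deviance(3.0, 3.0, 1.5))  # ~0: perfect prediction
```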
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
In
metalearner_algorithm enum
Type of algorithm to use as the metalearner. Options include 'AUTO' (GLM with non-negative weights; if validation_frame is present, a lambda search is performed), 'glm' (GLM with default parameters), 'gbm' (GBM with default parameters), 'drf' (Random Forest with default parameters), or 'deeplearning' (Deep Learning with default parameters).
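The AUTO metalearner is described as a GLM with non-negative weights. The sketch below isolates that constraint: it fits a linear blend of level-one columns by projected gradient descent on squared error and clips each weight at zero after every step. It is a simplified stand-in, not H2O's GLM solver (no intercept, link function, or regularization).

```python
def nonneg_blend(level_one, y, steps=5000, lr=0.01):
    """Fit y ~ sum_j w_j * x_j with every w_j >= 0 by projected gradient descent on MSE."""
    n_models = len(level_one[0])
    w = [1.0 / n_models] * n_models
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * n_models
        for row, target in zip(level_one, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) - target
            for j, xj in enumerate(row):
                grad[j] += 2 * err * xj / n
        # Gradient step, then project back onto the non-negative orthant
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w


# Unconstrained least squares would give (1, -1); the constraint forces (0.8, 0).
rows = [(1, 0), (2, 1), (3, 0), (4, 1)]
print([round(v, 4) for v in nonneg_blend(rows, [1, 1, 3, 3])])  # [0.8, 0.0]
```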
In/Out
metalearner_nfolds int
Number of folds for K-fold cross-validation of the metalearner algorithm (0 to disable or >= 2).
In/Out
metalearner_fold_assignment enum
Cross-validation fold assignment scheme for metalearner cross-validation. Defaults to AUTO (which is currently set to Random). The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5).
In
norm_model enum
Use Hierarchical Softmax
In
epochs int
Number of training iterations to run
In
min_word_freq int
This will discard words that appear fewer than min_word_freq times.
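The two vocabulary controls above can be sketched in sequence. The first step drops words occurring fewer than min_word_freq times. The second randomly down-samples frequent words using the rule P(discard) = 1 - sqrt(t / f(w)) from Mikolov et al. (2013), where t is the occurrence threshold and f(w) the word's relative frequency. Whether H2O uses exactly this rule is an assumption, and the parameter name threshold is hypothetical. The example threshold of 1e-2 is chosen only to make the effect visible on a tiny corpus.

```python
import math
import random
from collections import Counter


def build_vocab(tokens, min_word_freq):
    """Drop words that occur fewer than min_word_freq times."""
    counts = Counter(tokens)
    return {word: c for word, c in counts.items() if c >= min_word_freq}


def subsample(tokens, vocab, threshold, seed=0):
    """Randomly discard frequent words; assumed rule: P(discard) = 1 - sqrt(t / f(w))."""
    total = sum(vocab.values())
    rng = random.Random(seed)
    kept = []
    for tok in tokens:
        if tok not in vocab:
            continue  # already removed by min_word_freq
        freq = vocab[tok] / total
        p_discard = max(0.0, 1.0 - math.sqrt(threshold / freq))
        if rng.random() >= p_discard:
            kept.append(tok)
    return kept


tokens = ["the"] * 1000 + ["cat"] * 5 + ["rare"]
vocab = build_vocab(tokens, 2)  # "rare" is dropped
kept = subsample(tokens, vocab, 1e-2)
print(kept.count("the"), kept.count("cat"))  # roughly 100 of 1000, and all 5
```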
Id of a data frame that contains a pre-trained (external) word2vec model
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.
A mapping representing monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint.
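A monotone constraint mapping might look like {"age": 1, "debt": -1}, where the column names are hypothetical. The sketch below checks whether a prediction function actually respects such a constraint along a grid of values for one feature. Here predict is any callable that stands in for a trained model.

```python
def check_monotone(predict, feature_grid, constraint):
    """Return True if predict(x) moves in the constrained direction along feature_grid."""
    if constraint not in (-1, 0, 1):
        raise ValueError("constraint must be -1, 0 or +1")
    preds = [predict(x) for x in sorted(feature_grid)]
    pairs = list(zip(preds, preds[1:]))
    if constraint == 1:
        return all(a <= b for a, b in pairs)  # non-decreasing
    if constraint == -1:
        return all(a >= b for a, b in pairs)  # non-increasing
    return True  # 0 means unconstrained


print(check_monotone(lambda x: 2 * x + 1, [3, 1, 2], 1))  # True
print(check_monotone(lambda x: x * x, [-1, 0, 1], 1))     # False
```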
In
max_abs_leafnode_pred float
(same as max_delta_step) Maximum absolute value of a leaf node prediction
In
max_delta_step float
(same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction
In
score_tree_interval int
Score the model after every so many trees. Disabled if set to 0.
In
seed long
Seed for pseudo random number generator (if applicable)
In
min_split_improvement float
(same as gamma) Minimum relative improvement in squared error reduction for a split to happen
In
gamma float
(same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen
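This sketch reads "minimum relative improvement in squared error reduction" as the reduction in squared error divided by the parent node's squared error. That interpretation is an assumption. XGBoost's own gamma is documented as an absolute minimum loss reduction, so the two may not scale identically.

```python
def squared_error(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def accept_split(parent, left, right, min_split_improvement):
    """Assumed rule: split only if (SE_parent - SE_children) / SE_parent >= threshold."""
    se_parent = squared_error(parent)
    if se_parent == 0:
        return False  # nothing left to improve
    reduction = se_parent - squared_error(left) - squared_error(right)
    return reduction / se_parent >= min_split_improvement


print(accept_split([1, 2, 3, 4], [1, 3], [2, 4], 0.1))  # True: 20% relative reduction
print(accept_split([1, 2, 3, 4], [1, 3], [2, 4], 0.3))  # False
```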
In
nthread int
Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to the maximum available.
In
max_bins int
For tree_method=hist only: maximum number of bins
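With tree_method=hist, continuous features are bucketed into at most max_bins bins before split finding. Here is a minimal sketch using hypothetical equal-frequency edges. XGBoost's real implementation uses a weighted quantile sketch, so its bin edges will generally differ.

```python
import bisect
from collections import Counter


def bin_edges(values, max_bins):
    """Hypothetical equal-frequency binning: at most max_bins bins, edges at quantiles."""
    s = sorted(values)
    n = len(s)
    return sorted({s[(i * n) // max_bins] for i in range(1, max_bins)})


def bin_index(x, edges):
    """Bin number for x; values equal to an edge go to the bin on its right."""
    return bisect.bisect_right(edges, x)


edges = bin_edges(range(100), 4)
print(edges)  # [25, 50, 75]
print(Counter(bin_index(v, edges) for v in range(100)))  # 25 values per bin
```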
In
max_leaves int
For tree_method=hist only: maximum number of leaves
In
min_sum_hessian_in_leaf float
For tree_method=hist only: the minimum sum of hessian in a leaf to keep splitting
In
min_data_in_leaf float
For tree_method=hist only: the minimum data in a leaf to keep splitting
In
tree_method enum
Tree method
In
grow_policy enum
Grow policy: depthwise grows trees level by level as in standard GBM; lossguide grows them leaf-wise as in LightGBM
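The two grow policies differ only in which node is expanded next. Depthwise expands nodes level by level (breadth-first). Lossguide always expands the leaf with the largest loss reduction (best-first), usually under a max_leaves budget. In this sketch the per-node gains are precomputed hypothetical numbers, and the same leaf budget is applied to both policies purely for comparison.

```python
import heapq
from collections import deque


def depthwise_order(gains, children, max_leaves):
    """Breadth-first: split nodes level by level until the leaf budget is used."""
    leaves, order, queue = 1, [], deque([0])
    while queue and leaves < max_leaves:
        node = queue.popleft()
        if gains.get(node, 0) <= 0:
            continue  # not worth splitting
        order.append(node)
        leaves += 1  # one leaf becomes two
        queue.extend(children.get(node, ()))
    return order


def lossguide_order(gains, children, max_leaves):
    """Best-first: always split the open leaf with the largest gain."""
    leaves, order, heap = 1, [], [(-gains[0], 0)]
    while heap and leaves < max_leaves:
        neg_gain, node = heapq.heappop(heap)
        if -neg_gain <= 0:
            continue
        order.append(node)
        leaves += 1
        for child in children.get(node, ()):
            if child in gains:
                heapq.heappush(heap, (-gains[child], child))
    return order


children = {0: (1, 2), 1: (3, 4), 2: (5, 6)}
gains = {0: 10.0, 1: 1.0, 2: 8.0}  # hypothetical loss reductions
print(depthwise_order(gains, children, 3))  # [0, 1]
print(lossguide_order(gains, children, 3))  # [0, 2]
```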
In
booster enum
Booster type
In
reg_lambda float
L2 regularization
In
reg_alpha float
L1 regularization
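For XGBoost-style boosting, the optimal leaf value given the gradient sum G and hessian sum H is -T(G, alpha) / (H + lambda). Here T soft-thresholds G by the L1 penalty reg_alpha, and lambda is the L2 penalty reg_lambda. The sketch follows that published formula and also applies the max_delta_step (max_abs_leafnode_pred) cap described earlier, with 0 taken to mean "no cap" as in XGBoost. It illustrates the math and is not H2O's code.

```python
import math


def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0, max_delta_step=0.0):
    """XGBoost-style optimal leaf value with L1/L2 penalties and an optional cap."""
    # L1 (reg_alpha): soft-threshold the gradient sum toward zero
    if grad_sum > reg_alpha:
        g = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        g = grad_sum + reg_alpha
    else:
        g = 0.0
    # L2 (reg_lambda): shrink by enlarging the denominator
    w = -g / (hess_sum + reg_lambda)
    # max_delta_step / max_abs_leafnode_pred: clip |w|; 0 means no cap in this sketch
    if max_delta_step > 0 and abs(w) > max_delta_step:
        w = math.copysign(max_delta_step, w)
    return w


print(leaf_weight(10.0, 4.0, reg_lambda=1.0, reg_alpha=2.0))                      # -1.6
print(leaf_weight(10.0, 4.0, reg_lambda=1.0, reg_alpha=2.0, max_delta_step=1.0))  # -1.0
```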
In
quiet_mode boolean
Enable quiet mode
In
sample_type enum
For booster=dart only: sample_type
In
normalize_type enum
For booster=dart only: normalize_type
In
rate_drop float
For booster=dart only: rate_drop (0..1)
In
one_drop boolean
For booster=dart only: one_drop
In
skip_drop float
For booster=dart only: skip_drop (0..1)
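In DART, each boosting round may mute some existing trees before the new tree is fit. skip_drop is the probability of skipping dropout for the round. Otherwise each tree is dropped with probability rate_drop, and one_drop forces at least one drop. This sketch covers only uniform sampling (sample_type) and leaves out the normalize_type rescaling.

```python
import random


def select_dropped_trees(n_trees, rate_drop, skip_drop, one_drop, rng):
    """Uniform DART dropout: pick which existing trees to mute for the next boosting round."""
    if rng.random() < skip_drop:
        return []  # skip dropout entirely for this round
    dropped = [t for t in range(n_trees) if rng.random() < rate_drop]
    if one_drop and not dropped and n_trees > 0:
        dropped = [rng.randrange(n_trees)]  # force at least one tree to be dropped
    return dropped


rng = random.Random(7)
print(select_dropped_trees(10, rate_drop=0.1, skip_drop=0.5, one_drop=True, rng=rng))
```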
In
dmatrix_type enum
Type of DMatrix. For sparse, NAs and 0 are treated equally.
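The sparse DMatrix caveat follows from how compressed sparse row (CSR) storage works: only explicitly stored entries exist, so both zeros and NAs are left implicit and become indistinguishable. This generic CSR conversion sketch shows the effect. It is not H2O's conversion code.

```python
import math


def to_csr(rows):
    """Convert dense rows to CSR arrays, leaving both zeros and NAs implicit."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            is_na = v is None or (isinstance(v, float) and math.isnan(v))
            if is_na or v == 0:
                continue  # zeros and NAs are both dropped
            data.append(v)
            indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


# Column 1 of row 0 (a zero) and column 2 of row 0 (an NA) end up stored the same way.
print(to_csr([[1.0, 0.0, float("nan")], [0.0, 2.0, 3.0]]))
# ([1.0, 2.0, 3.0], [0, 1, 2], [0, 1, 3])
```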
In
backend enum
Backend. By default (auto), a GPU is used if available.
In
gpu_id int
Which GPU to use.
In
distribution enum
Distribution function
In
tweedie_power double
Tweedie power for Tweedie regression, must be between 1 and 2.
In
quantile_alpha double
Desired quantile for Quantile regression, must be between 0 and 1.
In
huber_alpha double
Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).
In
max_categorical_levels int
For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.
Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor.
Column with cross-validation fold index assignment per observation.
In/Out
fold_assignment enum
Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.
In/Out
categorical_encoding enum
Encoding scheme for categorical features
In/Out
ignored_columns string[]
Names of columns to ignore for training.
In/Out
ignore_const_cols boolean
Ignore constant columns.
In/Out
score_each_iteration boolean
Whether to score during each iteration of model training.
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable)
In/Out
max_runtime_secs double
Maximum allowed runtime in seconds for model training. Use 0 to disable.
In/Out
stopping_metric enum
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.
In/Out
stopping_tolerance double
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much)
In/Out
custom_metric_func string
Reference to custom evaluation function, format: language:keyName=funcName
In/Out
export_checkpoints_dir string
Automatically export generated models to this directory.