Introduction

This guide will walk you through how to use H2O-dev’s web UI, H2O Flow. To view a demo video of H2O Flow, click here.

About H2O Flow

H2O Flow is an open-source user interface for H2O. It is a web-based interactive computational environment that allows you to combine code execution, text, mathematics, plots, and rich media in a single document, similar to iPython Notebooks.

With H2O Flow, you can capture, rerun, annotate, present, and share your workflow. H2O Flow allows you to use H2O interactively to import files, build models, and iteratively improve them. Based on your models, you can make predictions and add rich text to create vignettes of your work - all within Flow’s browser-based environment.

Flow’s hybrid user interface seamlessly blends command-line computing with a modern graphical user interface. However, rather than displaying output as plain text, Flow provides a point-and-click user interface for every H2O operation. It allows you to access any H2O object in the form of well-organized tabular data.

H2O Flow sends commands to H2O as a sequence of executable cells. The cells can be modified, rearranged, or saved to a library. Each cell contains an input field that allows you to enter commands, define functions, call other functions, and access other cells or objects on the page. When you execute the cell, the output is a graphical object, which can be inspected to view additional details.

While H2O Flow supports REST API, R scripts, and Coffeescript, no programming experience is required to run H2O Flow. You can click your way through any H2O operation without ever writing a single line of code. You can even disable the input cells to run H2O Flow using only the GUI. H2O Flow is designed to guide you every step of the way, by providing input prompts, interactive help, and example flows.

Getting Help

First, let’s go over the basics. Type h to view a list of helpful shortcuts.

The following help window displays:

help menu

To close this window, click the X in the upper-right corner, or click the Close button in the lower-right corner. You can also click behind the window to close it. You can also access this list of shortcuts by clicking the Help menu and selecting Keyboard Shortcuts.

For additional help, select the Help sidebar to the right and click the Assist Me! button.

Assist Me

You can also type assist in a blank cell and press Ctrl+Enter. A list of common tasks displays to help you find the correct command.

Assist Me links

There are multiple resources to help you get started with Flow in the Help sidebar. To access this document, select the Getting Started with H2O Flow link below the Help Topics heading.

You can also explore the pre-configured flows available in H2O Flow for a demonstration of how to create a flow. To view the example flows, click the Browse installed packs… link in the Packs subsection of the Help sidebar. Click the examples folder and select the example flow from the list.

Flow Packs

If you have a flow currently open, a confirmation window appears asking if the current notebook should be replaced. To load the example flow, click the Load Notebook button.

To view the REST API documentation, click the Help tab in the sidebar and then select the type of REST API documentation (Routes or Schemas).

REST API documentation

Before getting started with H2O Flow, make sure you understand the different cell modes.

Understanding Cell Modes

Using Edit Mode
Using Command Mode
Running Flows
Using Keyboard Shortcuts
Using Flow Buttons

There are two modes for cells: edit and command.

Using Edit Mode

In edit mode, the cell is yellow with a blinking bar to indicate where text can be entered and there is an orange flag to the left of the cell.

Edit Mode

Using Command Mode

In command mode, the flag is yellow. The flag also indicates the cell’s format:

MD: Markdown

Note: Markdown formatting is not applied until you run the cell by clicking the Run button or clicking the Run menu and selecting Run.
CS: Code
RAW: Raw format (for code comments)
H[1-5]: Heading level (where 1 is a first-level heading)

NOTE: If there is an error in the cell, the flag is red.

Cell error

If the cell is executing commands, the flag is teal. The flag returns to yellow when the task is complete.

Cell executing

Running Flows

When you run the flow, a progress bar that indicates the current status of the flow. You can cancel the currently running flow by clicking the Stop button in the progress bar.

Flow Progress Bar

When the flow is complete, a message displays in the upper right. Note: If there is an error in the flow, H2O Flow stops the flow at the cell that contains the error.

Flow - Completed Successfully Flow - Did Not Complete

Using Keyboard Shortcuts

Here are some important keyboard shortcuts to remember:

Click a cell and press Enter to enter edit mode, which allows you to change the contents of a cell.
To exit edit mode, press Esc.
To execute the contents of a cell, press the Ctrl and Enter buttons at the same time.

The following commands must be entered in command mode.

To add a new cell above the current cell, press a.
To add a new cell below the current cell, press b.
To delete the current cell, press the d key twice. (dd).

You can view these shortcuts by clicking Help > Keyboard Shortcuts or by clicking the Help tab in the sidebar.

Using Flow Buttons

There are also a series of buttons at the top of the page below the flow name that allow you to save the current flow, add a new cell, move cells up or down, run the current cell, and cut, copy, or paste the current cell. If you hover over the button, a description of the button’s function displays.

Flow buttons

You can also use the menus at the top of the screen to edit the cells, view specific format types (such as input or output), change the cell’s format, or run the cell. You can also access troubleshooting information or obtain help with Flow.

Flow menus

Note: To disable the code input and use H2O Flow strictly as a GUI, click the Cell menu, then Toggle Cell Input.

Now that you are familiar with the cell modes, let’s import some data.

Importing Data

If you don’t have any of your own data to work with, you can find some example datasets here:

There are multiple ways to import data in H2O flow:

Click the Assist Me! button in the Help sidebar, then click the importFiles link. Enter the file path in the auto-completing Search entry field and press Enter. Select the file from the search results and select it by clicking the Add All link.
You can also drag and drop the file onto the Search field in the cell.
In a blank cell, select the CS format, then enter importFiles ["path/filename.format"] (where path/filename.format represents the complete file path to the file, including the full file name. The file path can be a local file path or a website address.

After selecting the file to import, the file path displays in the “Search Results” section. To import a single file, click the plus sign next to the file. To import all files in the search results, click the Add all link. The files selected for import display in the “Selected Files” section.

Import Files

To import the selected file(s), click the Import button.
To remove all files from the “Selected Files” list, click the Clear All link.
To remove a specific file, click the X next to the file path.

After you click the Import button, the raw code for the current job displays. A summary displays the results of the file import, including the number of imported files and their Network File System (nfs) locations.

Import Files - Results

Uploading Data

To upload a local file, click the Data menu and select Upload File…. Click the Choose File button, select the file, click the Choose button, then click the Upload button.

File Upload Pop-Up

When the file has uploaded successfully, a message displays in the upper right and the Setup Parse cell displays.

File Upload Successful

Ok, now that your data is available in H2O Flow, let’s move on to the next step: parsing. Click the Parse these files button to continue.

Parsing Data

After you have imported your data, parse the data.

Select the parser type (if necessary) from the drop-down Parser list. For most data parsing, H2O automatically recognizes the data type, so the default settings typically do not need to be changed. The following options are available:

Auto
XLS
CSV
SVMLight

If a separator or delimiter is used, select it from the Separator list.

Select a column header option, if applicable:

Auto: Automatically detect header types.
First row contains column names: Specify heading as column names.
First row contains data: Specify heading as data. This option is selected by default.

Select any necessary additional options:

Enable single quotes as a field quotation character: Treat single quote marks (also known as apostrophes) in the data as a character, rather than an enum. This option is not selected by default.
Delete on done: Check this checkbox to delete the imported data after parsing. This option is selected by default.

A preview of the data displays in the “Data Preview” section. Flow - Parse options

Note: To change the column type, select the drop-down list at the top of the column and select the data type. The options are:

Unknown
Numeric
Enum
Time
UUID
String
Invalid

After making your selections, click the Parse button.

After you click the Parse button, the code for the current job displays.

Flow - Parse code

Since we’ve submitted a couple of jobs (data import & parse) to H2O now, let’s take a moment to learn more about jobs in H2O.

Viewing Jobs

Viewing All Jobs
Viewing Specific Jobs

Any command (such as importFiles) you enter in H2O is submitted as a job, which is associated with a key. The key identifies the job within H2O and is used as a reference.

Viewing All Jobs

To view all jobs, click the Admin menu, then click Jobs, or enter getJobs in a cell in CS mode.

View Jobs

The following information displays:

Type (for example, Frame or Model)
Link to the object
Description of the job type (for example, Parse or GBM)
Start time
End time
Run time

To refresh this information, click the Refresh button. To view the details of the job, click the View button.

Viewing Specific Jobs

To view a specific job, click the link in the “Destination” column.

View Job - Model

The following information displays:

Type (for example, Frame)
Link to object (key)
Description (for example, Parse)
Status
Run time
Progress

NOTE: For a better understanding of how jobs work, make sure to review the Viewing Frames section as well.

Ok, now that you understand how to find jobs in H2O, let’s submit a new one by building a model.

Building Models

To build a model:

Click the Assist Me! button and select buildModel

or
Click the Assist Me! button, select getFrames, then click the Build Model… button below the parsed .hex data set

or
Click the View button after parsing data, then click the Build Model button

or
Click the drop-down Model menu and select the model type from the list

The Build Model… button can be accessed from any page containing the .hex key for the parsed data (for example, getJobs > getFrame).

In the Build a Model cell, select an algorithm from the drop-down menu:

K-means: Create a K-Means model.

Generalized Linear Model: Create a Generalized Linear model.

Distributed RF: Create a distributed Random Forest model.

Naive Bayes: Create a Naive Bayes model.

Principal Component Analysis: Create a Principal Components Analysis model for modeling without regularization or performing dimensionality reduction.

Gradient Boosting Machine: Create a Gradient Boosted model

Deep Learning: Create a Deep Learning model.

The available options vary depending on the selected model. If an option is only available for a specific model type, the model type is listed. If no model type is specified, the option is applicable to all model types.

Model_ID: (Optional) Enter a custom name for the model to use as a reference. By default, H2O automatically generates an ID containing the model type (for example, gbm-6f6bdc8b-ccbc-474a-b590-4579eea44596).
Training_frame: (Required) Select the dataset used to build the model.

NOTE: If you click the Build a model button from the Parse cell, the training frame is entered automatically.
Validation_frame: (Optional) Select the dataset used to evaluate the accuracy of the model.
Ignored_columns: (Optional) Click the plus sign next to a column name to add it to the list of columns excluded from the model. To add all columns, click the Add all button. To remove a column from the list of ignored columns, click the X next to the column name. To remove all columns from the list of ignored columns, click the Clear all button.
Drop_na20_cols: (Optional) Check this checkbox to drop columns that are missing (i.e., use 0 or NA) over 20% of their values
User_points: (K-Means, PCA) For K-Means, specify the number of initial cluster centers. For PCA, specify the initial Y matrix. Note: The PCA User_points parameter should only be used by advanced users for testing purposes.
Transform: (PCA) Select the transformation method for the training data: None, Standardize, Normalize, Demean, or Descale. The default is None.
Response_column: (Required for GLM, GBM, DL, DRF, NaiveBayes) Select the column to use as the independent variable.
Solver: (GLM) Select the solver to use (IRLSM, L_BFGS, or auto). IRLSM is fast on on problems with small number of predictors and for lambda-search with L1 penalty, while L_BFGS scales better for datasets with many columns. The default is IRLSM.
Ntrees: (GBM, DRF) Specify the number of trees. The default value is 50.
Max_depth: (GBM, DRF) Specify the maximum tree depth. For GBM, the default value is 5. For DRF, the default value is 20.
Min_rows: (GBM), (DRF) Specify the minimum number of observations for a leaf (“nodesize” in R). For Grid Search, use comma-separated values. The default value is 10.
Nbins: (GBM, DRF) Specify the number of bins for the histogram. The default value is 20.
Mtries: (DRF) Specify the columns to randomly select at each level. To use the square root of the columns, enter -1. The default value is -1.
Sample_rate: (DRF) Specify the sample rate. The range is 0 to 1.0 and the default value is 0.6666667.
Build_tree_one_node: (DRF) To run on a single node, check this checkbox. This is suitable for small datasets as there is no network overhead but fewer CPUs are used. The default setting is disabled.
Learn_rate: (GBM) Specify the learning rate. The range is 0.0 to 1.0 and the default is 0.1.
Distribution: (GBM) Select the distribution type from the drop-down list. The options are auto, bernoulli, multinomial, or gaussian and the default is auto.
Loss: (DL) Select the loss function. For DL, the options are Automatic, MeanSquare, CrossEntropy, Huber, or Absolute and the default value is Automatic. Absolute, MeanSquare, and Huber are applicable for regression or classification, while CrossEntropy is only applicable for classification. Huber can improve for regression problems with outliers.
Score_each_iteration: (K-Means, DRF, NaiveBayes, PCA, GBM) To score during each iteration of the model training, check this checkbox.
K: (K-Means), (PCA) For K-Means, specify the number of clusters. For PCA, specify the rank of matrix approximation. The default for K-Means and PCA is 1.
Gamma: (PCA) Specify the regularization weight for PCA. The default is 0.
Max_iterations: (K-Means, PCA,GLM) Specify the number of training iterations. For K-Means and PCA, the default is 1000. For GLM, the default is -1.
Beta_epsilon: (GLM) Specify the beta epsilon value. If the L1 normalization of the current beta change is below this threshold, consider using convergence.
Init: (K-Means, PCA) Select the initialization mode. For K-Means, the options are Furthest, PlusPlus, Random, or User. For PCA, the options are PlusPlus, User, or None.

Note: If PlusPlus is selected, the initial Y matrix is chosen by the final cluster centers from the K-Means PlusPlus algorithm.
Family: (GLM) Select the model type (Gaussian, Binomial, Poisson, or Gamma).
Activation: (DL) Select the activation function (Tanh, TanhWithDropout, Rectifier, RectifierWithDropout, Maxout, MaxoutWithDropout). The default option is Rectifier.
Hidden: (DL) Specify the hidden layer sizes (e.g., 100,100). For Grid Search, use comma-separated values: (10,10),(20,20,20). The default value is [200,200]. The specified value(s) must be positive.
Epochs: (DL) Specify the number of times to iterate (stream) the dataset. The value can be a fraction. The default value for DL is 10.0.
Variable_importances: (DL) Check this checkbox to compute variable importance. This option is not selected by default.
Laplace: (NaiveBayes) Specify the Laplace smoothing parameter. The default value is 0.
Min_sdev: (NaiveBayes) Specify the minimum standard deviation to use for observations without enough data. The default value is 0.001.
Eps_sdev: (NaiveBayes) Specify the threshold for standard deviation. If this threshold is not met, the min_sdev value is used. The default value is 0.
Min_prob: (NaiveBayes) Specify the minimum probability to use for observations without enough data. The default value is 0.001.
Eps_prob: (NaiveBayes) Specify the threshold for standard deviation. If this threshold is not met, the min_sdev value is used. The default value is 0.
Standardize: (K-Means, GLM) To standardize the numeric columns to have mean of zero and unit variance, check this checkbox. Standardization is highly recommended; if you do not use standardization, the results can include components that are dominated by variables that appear to have larger variances relative to other attributes as a matter of scale, rather than true contribution. This option is selected by default.
Beta_constraints: (GLM)To use beta constraints, select a dataset from the drop-down menu. The selected frame is used to constraint the coefficient vector to provide upper and lower bounds.

Advanced Options

Checkpoint: (DL) Enter a model key associated with a previously-trained Deep Learning model. Use this option to build a new model as a continuation of a previously-generated model (e.g., by a grid search).
Use_all_factor_levels: (GLM, DL) Check this checkbox to use all factor levels in the possible set of predictors; if you enable this option, sufficient regularization is required. By default, the first factor level is skipped. For Deep Learning models, this option is useful for determining variable importances and is automatically enabled if the autoencoder is selected.
Train_samples_per_iteration: (DL) Specify the number of global training samples per MapReduce iteration. To specify one epoch, enter 0. To specify all available data (e.g., replicated training data), enter -1. To use the automatic values, enter -2. The default is -2.
Adaptive_rate: (DL) Check this checkbox to enable the adaptive learning rate (ADADELTA). This option is selected by default. If this option is enabled, the following parameters are ignored: rate, rate_decay, rate_annealing, momentum_start, momentum_ramp, momentum_stable, and nesterov_accelerated_gradient.
Input_dropout_ratio: (DL) Specify the input layer dropout ratio to improve generalization. Suggested values are 0.1 or 0.2. The range is >= 0 to <1 and the default value is 0.
L1: (DL) Specify the L1 regularization to add stability and improve generalization; sets the value of many weights to 0. The default value is 0.
L2: (DL) Specify the L2 regularization to add stability and improve generalization; sets the value of many weights to smaller values. The default value is 0.
Score_interval: (DL) Specify the shortest time interval (in seconds) to wait between model scoring. The default value is 5.
Score_training_samples: (DL) Specify the number of training set samples for scoring. To use all training samples, enter 0. The default value is 10000.
Score_validation_samples: (DL) (Requires selection from the Validation_Frame drop-down list) Specify the number of validation set samples for scoring. To use all validation set samples, enter 0. The default value is 0. This option is applicable to classification only.
Score_duty_cycle: (DL) Specify the maximum duty cycle fraction for scoring. A lower value results in more training and a higher value results in more scoring. The default value is 0.1.
Autoencoder: (DL) Check this checkbox to enable the Deep Learning autoencoder. This option is not selected by default. Note: This option requires a loss function other than CrossEntropy. If this option is enabled, use_all_factor_levels must be enabled.
Balance_classes: (GLM, GBM, DRF, DL, NaiveBayes) Oversample the minority classes to balance the class distribution. This option is not selected by default. This option is only applicable for classification. Majority classes can be undersampled to satisfy the Max_after_balance_size parameter.
Max_confusion_matrix_size: (DRF, NaiveBayes, GBM) Specify the maximum size (in number of classes) for confusion matrices to be printed in the Logs.
Max_hit_ratio_k: (DRF, NaiveBayes) Specify the maximum number (top K) of predictions to use for hit ratio computation. Applicable to multi-class only. To disable, enter 0.
Link: (GLM) Select a link function (Identity, Family_Default, Logit, Log, or Inverse).
Alpha: (GLM) Specify the regularization distribution between L2 and L2. The default value is 0.5.
Lambda: (GLM) Specify the regularization strength. There is no default value.
Lambda_search: (GLM) Check this checkbox to enable lambda search, starting with lambda max. The given lambda is then interpreted as lambda min.
Rate: (DL) Specify the learning rate. Higher rates result in less stable models and lower rates result in slower convergence. The default value is 0.005. Not applicable if adaptive_rate is enabled.
Rate_annealing: (DL) Specify the learning rate annealing. The formula is rate/(1+rate_annealing value * samples). The default value is 10.000001. Not applicable if adaptive_rate is enabled.
Momentum_start: (DL) Specify the initial momentum at the beginning of training. A suggested value is 0.5. The default value is 0. Not applicable if adaptive_rate is enabled.
Momentum_ramp: (DL) Specify the number of training samples for increasing the momentum. The default value is 1000000. Not applicable if adaptive_rate is enabled.
Momentum_stable: DL Specify the final momentum value reached after the momentum_ramp training samples. Not applicable if adaptive_rate is enabled.
Nesterov_accelerated_gradient: (DL) Check this checkbox to use the Nesterov accelerated gradient. This option is recommended and selected by default. Not applicable is adaptive_rate is enabled.
Hidden_dropout_ratios: (DL) Specify the hidden layer dropout ratios to improve generalization. Specify one value per hidden layer, each value between 0 and 1 (exclusive). There is no default value. This option is applicable only if TanhwithDropout, RectifierwithDropout, or MaxoutWithDropout is selected from the Activation drop-down list.

Expert Options

Keep_cross_validation_splits: (DL) Check this checkbox to keep the cross-validation frames. This option is not selected by default.
Override_with_best_model: (DL) Check this checkbox to override the final model with the best model found during training. This option is selected by default.
Target_ratio_comm_to_comp: (DL) Specify the target ratio of communication overhead to computation. This option is only enabled for multi-node operation and if train_samples_per_iteration equals -2 (auto-tuning). The default value is 0.02.
Rho: (DL) Specify the adaptive learning rate time decay factor. The default value is 0.99. This option is only applicable if adaptive_rate is enabled.
Epsilon: (DL) Specify the adaptive learning rate time smoothing factor to avoid dividing by zero. The default value is 1.0E-8. This option is only applicable if adaptive_rate is enabled.
Max_W2: (DL) Specify the constraint for the squared sum of the incoming weights per unit (e.g., for Rectifier). The default value is infinity.
Initial_weight_distribution: (DL) Select the initial weight distribution (Uniform Adaptive, Uniform, or Normal). The default is Uniform Adaptive. If Uniform Adaptive is used, the initial_weight_scale parameter is not applicable.
Initial_weight_scale: (DL) Specify the initial weight scale of the distribution function for Uniform or Normal distributions. For Uniform, the values are drawn uniformly from initial weight scale. For Normal, the values are drawn from a Normal distribution with the standard deviation of the initial weight scale. The default value is 1.0. If Uniform Adaptive is selected as the initial_weight_distribution, the initial_weight_scale parameter is not applicable.
Classification_stop: (DL) (Applicable to discrete/categorical datasets only) Specify the stopping criterion for classification error fractions on training data. To disable this option, enter -1. The default value is 0.0.
Max_hit_ratio_k: (DL,)GLM (Classification only) Specify the maximum number (top K) of predictions to use for hit ratio computation (for multi-class only). To disable this option, enter 0. The default value is 10.
Regression_stop: (DL) (Applicable to real value/continuous datasets only) Specify the stopping criterion for regression error (MSE) on the training data. To disable this option, enter -1. The default value is 0.000001.
Diagnostics: (DL) Check this checkbox to compute the variable importances for input features (using the Gedeon method). For large networks, selecting this option can reduce speed. This option is selected by default.

Fast_mode: (DL) Check this checkbox to enable fast mode, a minor approximation in back-propagation. This option is selected by default.
Ignore_const_cols: (DL) Check this checkbox to ignore constant training columns, since no information can be gained from them. This option is selected by default.
Force_load_balance: (DL) Check this checkbox to force extra load balancing to increase training speed for small datasets and use all cores. This option is selected by default.
Single_node_mode: (DL) Check this checkbox to force H2O to run on a single node for fine-tuning of model parameters. This option is not selected by default.
Replicate_training_data: (DL) Check this checkbox to replicate the entire training dataset on every node for faster training on small datasets. This option is not selected by default. This option is only applicable for clouds with more than one node.
Shuffle_training_data: (DL) Check this checkbox to shuffle the training data. This option is recommended if the training data is replicated and the value of train_samples_per_iteration is close to the number of nodes times the number of rows. This option is not selected by default.
Missing_values_handling: (DL) Select how to handle missing values (Skip or MeanImputation). The default value is MeanImputation.
Quiet_mode: (DL) Check this checkbox to display less output in the standard output. This option is not selected by default.
Sparse: (DL) Check this checkbox to use sparse iterators for the input layer. This option is not selected by default as it rarely improves performance.
Col_major: (DL) Check this checkbox to use a column major weight matrix for the input layer. This option can speed up forward propagation but may reduce the speed of backpropagation. This option is not selected by default.
Average_activation: (DL) Specify the average activation for the sparse autoencoder. The default value is 0. If Rectifier is selected as the Activation type, this value must be positive. For Tanh, the value must be in (-1,1).
Sparsity_beta: (DL) Specify the sparsity regularization. The default value is 0.
Max_categorical_features: (DL) Specify the maximum number of categorical features enforced via hashing. The default is unlimited.
Reproducible: (DL) To force reproducibility on small data, check this checkbox. If this option is enabled, the model takes more time to generate, since it uses only one thread.
Export_weights_and_biases: (DL) To export the neural network weights and biases as H2O frames, check this checkbox.
Class_sampling_factors: (GLM, DRF, NaiveBayes), GBM, DL) Specify the per-class (in lexicographical order) over/under-sampling ratios. By default, these ratios are automatically computed during training to obtain the class balance. There is no default value. This option is only applicable for classification problems and when Balance_Classes is enabled.
Seed: (K-Means, GBM, DL, DRF) Specify the random number generator (RNG) seed for algorithm components dependent on randomization. The seed is consistent for each H2O instance so that you can create models with the same starting conditions in alternative configurations.
Prior: (GLM) Specify prior probability for y ==1. Use this parameter for logistic regression if the data has been sampled and the mean of response does not reflect reality. The default value is -1.
Max_active_predictors: (GLM) Specify the maximum number of active predictors during computation. This value is used as a stopping criterium to prevent expensive model building with many predictors.

Viewing Models

Click the Assist Me! button, then click the getModels link, or enter getModels in the cell in CS mode and press Ctrl+Enter. A list of available models displays.

Flow Models

To view all current models, you can also click the Model menu and click List All Models.

To inspect a model, check its checkbox then click the Inspect button, or click the Inspect button to the right of the model name.

Flow Model

A summary of the model’s parameters displays. To display more details, click the Show All Parameters button.

NOTE: The Clone this model… button will be supported in a future version.

To compare models, check the checkboxes for the models to use in the comparison and click the Compare selected models button. To select all models, check the checkbox at the top of the checkbox column (next to the KEY heading).

To delete a model, click the Delete button.

To generate a POJO to be able to use the model outside of H2O, click the Preview POJO button.

To learn how to make predictions, continue to the next section.

Making Predictions

After creating your model, click the key link for the model, then click the Predict button. Select the model to use in the prediction from the drop-down Model: menu and the data frame to use in the prediction from the drop-down Frame menu, then click the Predict button.

Making Predictions

Viewing Predictions

Click the Assist Me! button, then click the getPredictions link, or enter getPredictions in the cell in CS mode and press Ctrl+Enter. A list of the stored predictions displays. To view a prediction, click the View button to the right of the model name.

Viewing Predictions

You can also view predictions by clicking the drop-down Score menu and selecting List All Predictions.

Viewing Frames

Splitting Frames
Creating Frames
Plotting Frames

To view a specific frame, click the “Key” link for the specified frame, or enter getFrame "FrameName" in a cell in CS mode (where FrameName is the name of a frame, such as allyears2k.hex.

Viewing specified frame

From the getFrame cell, you can:

view a truncated list of the rows in the data frame by clicking the View Data button
create a model by clicking the Build Model button
make a prediction based on the data by clicking the Predict button
download the data as a .csv file by clicking the Download button
view the columns, data, and factors in more detail or plot a graph by clicking the Inspect button
view the characteristics or domain of a specific column by clicking the Summary link

When you view a frame, you can “drill-down” to the necessary level of detail (such as a specific column or row) using the View Data and Inspect buttons. The following screenshot displays the results of clicking the Inspect button.

Inspecting Frames

This screenshot displays the results of clicking the Summary link for the first column.

Inspecting Columns

To view all frames, click the Assist Me! button, then click the getFrames link, or enter getFrames in the cell in CS mode and press Ctrl+Enter. You can also view all current frames by clicking the drop-down Data menu and selecting List All Frames.

A list of the current frames in H2O displays that includes the following information for each frame:

Column headings
Number of rows and columns
Size

For parsed data, the following information displays:

Link to the .hex file
The Build Model, Predict, and Inspect buttons

To make a prediction, check the checkboxes for the frames you want to use to make the prediction, then click the Predict on Selected Frames button.

Splitting Frames

In H2O Flow, you can split datasets within Flow for use in training and testing.

splitFrame cell

To split a frame, click the Assist Me button, then click splitFrame. Note: You can also click the drop-down Data menu and select Split Frame….
From the drop-down Frame: list, select the frame to split.
In the second Ratio entry field, specify the fractional value to determine the split. The first Ratio field is automatically calculated based on the values entered in the second Ratio field.

Note: Only fractional values between 0 and 1 are supported (for example, enter .5 to split the frame in half). The total sum of the ratio values must equal one. H2O automatically adjusts the ratio values to equal one; if unsupported values are entered, an error displays.
In the Key entry field, specify a name for the new frame.
(Optional) To add another split, click the Add a split link. To remove a split, click the X to the right of the Key entry field.
Click the Create button.

Creating Frames

To create a frame with a large amount of random data (for example, to use for testing), click the drop-down Admin menu, then select Create Synthetic Frame. Customize the frame as needed, then click the Create button to create the frame.

Plotting Frames

To create a plot from a frame, click the Inspect button, then click the Plot button.

Select the type of plot (point, line, area, or interval) from the drop-down Type menu, then select the x-axis and y-axis from the following options:

label
missing
zeros
pinfs
ninfs
min
max
mean
sigma
type
cardinality

Select one of the above options from the drop-down Color menu to display the specified data in color, then click the Plot button to plot the data.

Using Clips

Clips enable you to save cells containing your workflow for later reuse. To save a cell as a clip, click the paperclip icon to the right of the cell (highlighted in the red box in the following screenshot).

To use a clip in a workflow, click the “Clips” tab in the sidebar on the right.

Clips tab

All saved clips, including the default system clips (such as assist, importFiles, and predict), are listed. Clips you have created are listed under the “My Clips” heading. To select a clip to insert, click the circular button to the left of the clip name. To delete a clip, click the trashcan icon to right of the clip name.

NOTE: The default clips listed under “System” cannot be deleted.

Deleted clips are stored in the trash. To permanently delete all clips in the trash, click the Empty Trash button.

NOTE: Saved data, including flows and clips, are persistent as long as the same IP address is used for the cluster. If a new IP is used, previously saved flows and clips are not available.

Viewing Outlines

The “Outline” tab in the sidebar displays a brief summary of the cells currently used in your flow; essentially, a command history.

To jump to a specific cell, click the cell description.
To delete a cell, select it and press the X key on your keyboard.

Saving Flows

Finding Saved Flows on your Disk
Saving Flows on a Hadoop cluster
Duplicating Flows
Downloading Flows
Loading Flows

You can save your flow for later reuse. To save your flow as a notebook, click the “Save” button (the first button in the row of buttons below the flow name), or click the drop-down “Flow” menu and select “Save.” To enter a custom name for the flow, click the default flow name (“Untitled Flow”) and type the desired flow name. A pencil icon indicates where to enter the desired name.

Renaming Flows

To confirm the name, click the checkmark to the right of the name field.

Confirm Name

To reuse a saved flow, click the “Flows” tab in the sidebar, then click the flow name. To delete a saved flow, click the trashcan icon to the right of the flow name.

Flows

Finding Saved Flows on your Disk

By default, flows are saved to the h2oflows directory underneath your home directory. The directory where flows are saved is printed to stdout:

03-20 14:54:20.945 172.16.2.39:54323     95667  main      INFO: Flow dir: '/Users/<UserName>/h2oflows'

To back up saved flows, copy this directory to your preferred backup location.

To specify a different location for saved flows, use the command-line argument -flow_dir when launching H2O:

java -jar h2o.jar -flow_dir /<New>/<Location>/<For>/<Saved>/<Flows>

where /<New>/<Location>/<For>/<Saved>/<Flows> represents the specified location. If the directory does not exist, it will be created the first time you save a flow.

Saving Flows on a Hadoop cluster

Note: If you are running H2O Flow on a Hadoop cluster, H2O will try to find the HDFS home directory to use as the default directory for flows. If the HDFS home directory is not found, flows cannot be saved unless a directory is specified while launching using -flow_dir:

hadoop jar h2odriver.jar -nodes 1 -mapperXmx 1g -output hdfsOutputDirName -flow_dir hdfs:///<Saved>/<Flows>/<Location>

The location specified in flow_dir may be either an hdfs or regular filesystem directory. If the directory does not exist, it will be created the first time you save a flow.

Duplicating Flows

To create a copy of the current flow, select the Flow menu, then click Make a Copy. The name of the current flow changes to “Copy of “ (where is the name of the flow). You can save the duplicated flow using this name by clicking Flow > Save.

Downloading Flows

After saving a flow as a notebook, click the Flow menu, then select Download…. A new window opens and the saved flow is downloaded to the default downloads folder on your computer. The file is exported as <filename> .flow, where <filename> is the name specified when the flow was saved.

Caution: You must have an active internet connection to export flows.

Loading Flows

To load a saved flow, click the Flows tab in the sidebar at the right. In the pop-up confirmation window that appears, select Load Notebook, or click Cancel to return to the current flow.

Confirm Replace Flow

After clicking Load Notebook, the saved flow is loaded.

To load an exported flow, click the Flow menu and select Open…. In the pop-up window that appears, click the Choose File button and select the exported flow, then click the Open button.

Open Flow

Notes:

Only exported flows using the default .flow filetype are supported. Other filetypes will not open.
If the current notebook has the same name as the selected file, a pop-up confirmation appears to confirm that the current notebook should be overwritten.

Troubleshooting

Viewing Cluster Status
Viewing CPU Status (Water Meter)
Viewing Logs
Downloading Logs
Viewing Stack Trace Information
Viewing Network Test Results
Accessing the Profiler
Viewing the Timeline
Shutting Down H2O

To troubleshoot issues in Flow, use the Admin menu. The Admin menu allows you to check the status of the cluster, view a timeline of events, and view or download logs for issue analysis.

NOTE: To view the current version, click the Help menu, then click About.

Viewing Cluster Status

Click the Admin menu, then select Cluster Status. A summary of the status of the cluster (also known as a cloud) displays, which includes the same information:

Cluster health
Whether all nodes can communicate (consensus)
Whether new nodes can join (locked/unlocked) Note: After you submit a job to H2O, the cluster does not accept new nodes.
H2O version
Number of used and available nodes
When the cluster was created

The following information displays for each node:

IP address (name)
Time of last ping
Number of cores
Load
Amount of data (used/total)
Percentage of cached data
GC (free/total/max)
Amount of disk space in GB (free/max)
Percentage of free disk space

To view more information, click the Show Advanced button.

Viewing CPU Status (Water Meter)

To view the current CPU usage, click the Admin menu, then click Water Meter (CPU Meter). A new window opens, displaying the current CPU use statistics.

Viewing Logs

To view the logs for troubleshooting, click the Admin menu, then click Inspect Log.

Inspect Log

To view the logs for a specific node, select it from the drop-down Select Node menu.

Downloading Logs

To download the logs for further analysis, click the Admin menu, then click Download Log. A new window opens and the logs download to your default download folder. You can close the new window after downloading the logs. Send the logs to support@h2o.ai for issue resolution.

Viewing Stack Trace Information

To view the stack trace information, click the Admin menu, then click Stack Trace.

Stack Trace

To view the stack trace information for a specific node, select it from the drop-down Select Node menu.

Viewing Network Test Results

To view network test results, click the Admin menu, then click Network Test.

Network Test Results

Accessing the Profiler

The Profiler looks across the cluster to see where the same stack trace occurs, and can be helpful for identifying what the currently used CPU is doing. To view the profiler, click the Admin menu, then click Profiler.

Profiler

To view the profiler information for a specific node, select it from the drop-down Select Node menu.

Viewing the Timeline

To view a timeline of events in Flow, click the Admin menu, then click Timeline. The following information displays for each event:

Time of occurrence (HH:MM:SS:MS)
Number of nanoseconds for duration
Originator of event (“who”)
I/O type
Event type
Number of bytes sent & received

To obtain the most recent information, click the Refresh button.

Shutting Down H2O

To shut down H2O, click the Admin menu, then click Shut Down. A Shut down complete message displays in the upper right when the cluster has been shut down.

How to Launch H2O-Dev from the Command Line

JVM Options
H2O Options
Cloud Formation Behavior
Flatfile Configuration

You can use Terminal (OS X) or the Command Prompt (Windows) to launch H2O-Dev. When you launch from the command line, you can include additional instructions to H2O-Dev, such as how many nodes to launch, how much memory to allocate for each node, assign names to the nodes in the cloud, and more.

There are two different argument types:

JVM arguments
H2O arguments

The arguments use the following format: java <JVM Options> -jar h2o.jar <H2O Options>.

JVM Options

-version: Display Java version info.
-Xmx<Heap Size>: To set the total heap size for an H2O node, configure the memory allocation option -Xmx. By default, this option is set to 1 Gb (-Xmx1g). When launching nodes, we recommend allocating a total of four times the memory of your data.

Note: Do not try to launch H2O with more memory than you have available.

H2O Options

h or -help: Display this information in the command line output.
-name <H2O-DevCloudName>: Assign a name to the H2O instance in the cloud (where <H2O-DevCloudName> is the name of the cloud. Nodes with the same cloud name will form an H2O cloud (also known as an H2O cluster).
-flatfile <FileName>: Specify a flatfile of IP address for faster cloud formation (where <FileName> is the name of the flatfile.
-ip <IPnodeAddress>: Specify an IP address other than the default localhost for the node to use (where <IPnodeAddress> is the IP address).
-port <#>: Specify a port number other than the default 54321 for the node to use (where <#> is the port number).
-network <IPv4NetworkSpecification1>[,<IPv4NetworkSpecification2> ...]: Specify a range (where applicable) of IP addresses (where <IPv4NetworkSpecification1> represents the first interface, <IPv4NetworkSpecification2> represents the second, and so on). The IP address discovery code binds to the first interface that matches one of the networks in the comma-separated list. For example, 10.1.2.0/24 supports 256 possibilities.
-ice_root <fileSystemPath>: Specify a directory for H2O to spill temporary data to disk (where <fileSystemPath> is the file path).
-flow_dir <server-side or HDFS directory>: Specify a directory for saved flows. The default is /Users/h2o-<H2OUserName>/h2oflows (where <H2OUserName> is your user name).
nthreads <#ofThreads>: Specify the maximum number of threads in the low-priority batch work queue (where <#ofThreads> is the number of threads). The default is 99.
-client: Launch H2O node in client mode. This is used mostly for running Sparkling Water.

Cloud Formation Behavior

New H2O nodes join to form a cloud during launch. After a job has started on the cloud, it prevents new members from joining.

To start an H2O node with 4GB of memory and a default cloud name: java -Xmx4g -jar h2o.jar
To start an H2O node with 6GB of memory and a specific cloud name: java -Xmx6g -jar h2o.jar -name MyCloud
To start an H2O cloud with three 2GB nodes using the default cloud names: java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar &

Wait for the INFO: Registered: # schemas in: #mS output before entering the above command again to add another node (the number for # will vary).

Flatfile Configuration

If you are configuring many nodes, it is faster and easier to use the -flatfile option, rather than -ip and -port.

To configure H2O-Dev on a multi-node cluster:

Locate a set of hosts.
Download the appropriate version of H2O-Dev for your environment.
Verify that the same h2o.jar file is available on all hosts.
Create a flatfile (a plain text file with the IP and port numbers of the hosts). Use one entry per line. For example:
```
192.168.1.163:54321
192.168.1.164:54321
```
Copy the flatfile.txt to each node in the cluster.
Use the -Xmx option to specify the amount of memory for each node. The cluster’s memory capacity is the sum of all H2O nodes in the cluster.

For example, if you create a cluster with four 20g nodes (by specifying -Xmx20g four times), H2O will have a total of 80 gigs of memory available.

For best performance, we recommend sizing your cluster to be about four times the size of your data. To avoid swapping, the -Xmx allocation must not exceed the physical memory on any node. Allocating the same amount of memory for all nodes is strongly recommended, as H2O-Dev works best with symmetric nodes.

Note the optional -ip and -port options specify the IP address and ports to use. The -ip option is especially helpful for hosts with multiple network interfaces.

java -Xmx20g -jar h2o.jar -flatfile flatfile.txt -port 54321

The output will resemble the following:
```
 04-20 16:14:00.253 192.168.1.70:54321    2754   main      INFO:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 H2O-DevUser@###.###.#.##'
 04-20 16:14:00.253 192.168.1.70:54321    2754   main      INFO:   2. Point your browser to http://localhost:55555
 04-20 16:14:00.437 192.168.1.70:54321    2754   main      INFO: Log dir: '/tmp/h2o-H2O-DevUser/h2ologs'
 04-20 16:14:00.437 192.168.1.70:54321    2754   main      INFO: Cur dir: '/Users/H2O-DevUser/h2o-dev'
 04-20 16:14:00.459 192.168.1.70:54321    2754   main      INFO: HDFS subsystem successfully initialized
 04-20 16:14:00.460 192.168.1.70:54321    2754   main      INFO: S3 subsystem successfully initialized
 04-20 16:14:00.460 192.168.1.70:54321    2754   main      INFO: Flow dir: '/Users/H2O-DevUser/h2oflows'
 04-20 16:14:00.475 192.168.1.70:54321    2754   main      INFO: Cloud of size 1 formed [/192.168.1.70:54321]
```
As you add more nodes to your cluster, the output is updated: INFO WATER: Cloud of size 2 formed [/...]...
Access the H2O-Dev web UI (Flow) with your browser. Point your browser to the HTTP address specified in the output Listening for HTTP and REST traffic on ....

To check if the cloud is available, point to the url http://<ip>:<port>/Cloud.json (an example of the JSON response is provided below). Wait for cloud_size to be the expected value and the consensus field to be true:
```
{
...

"cloud_size": 2,
"consensus": true,

...
}
```

H2O-Dev on EC2

Launch H2O-Dev
Launch H2O-Dev from the Command Line
Important Notes
Step-by-Step Walk-Through

Tested on Redhat AMI, Amazon Linux AMI, and Ubuntu AMI

Launch H2O-Dev

Selecting the Operating System and Virtualization Type
Configuring the Instance
Downloading Java and H2O

+Note: Before launching H2O on an EC2 cluster, verify that ports 54321 and 54322 are both accessible by TCP and UDP.

Selecting the Operating System and Virtualization Type

Select your operating system and the virtualization type of the prebuilt AMI on Amazon. If you are using Windows, you will need to use a hardware-assisted virtual machine (HVM). If you are using Linux, you can choose between para-virtualization (PV) and HVM. These selections determine the type of instances you can launch.

EC2 Systems

For more information about virtualization types, refer to Amazon.

Configuring the Instance

Select the IAM role and policy to use to launch the instance. H2O detects the temporary access keys associated with the instance, so you don’t need to copy your AWS credentials to the instances.
When launching the instance, select an accessible key pair.

(Windows Users) Tunneling into the Instance

For Windows users that do not have the ability to use ssh from the terminal, either download Cygwin or a Git Bash that has the capability to run ssh:

ssh -i amy_account.pem ec2-user@54.165.25.98

Otherwise, download PuTTY and follow these instructions:

Launch the PuTTY Key Generator.
Load your downloaded AWS pem key file. Note: To see the file, change the browser file type to “All”.
Save the private key as a .ppk file.
Launch the PuTTY client.
In the Session section, enter the host name or IP address. For Ubuntu users, the default host name is ubuntu@<ip-address>. For Linux users, the default host name is ec2-user@<ip-address>.
Select SSH, then Auth in the sidebar, and click the Browse button to select the private key file for authentication.

Start a new session and click the Yes button to confirm caching of the server’s rsa2 key fingerprint and continue connecting.

Downloading Java and H2O

Download Java (JDK 1.7 or later) if it is not already available on the instance.
To download H2O, run the wget command with the link to the zip file available on our website by copying the link associated with the Download button for the selected H2O-Dev build.
```
 wget http://h2o-release.s3.amazonaws.com/h2o-dev/rel-serre/1/index.html
 unzip h2o-dev-0.2.1.1.zip
 cd h2o-dev-0.2.1.1
 java -Xmx4g -jar h2o.jar
```
From your browser, navigate to <Private_IP_Address>:54321 or <Public_DNS>:54321 to use H2O’s web interface.

Launch H2O-Dev from the Command Line

Important Notes

Java is a pre-requisite for H2O; if you do not already have Java installed, make sure to install it before installing H2O. Java is available free on the web, and can be installed quickly. Although Java is required to run H2O, no programming is necessary. For users that only want to run H2O without compiling their own code, Java Runtime Environment (version 1.6 or later) is sufficient, but for users planning on compiling their own builds, we strongly recommend using Java Development Kit 1.7 or later.

After installation, launch H2O using the argument -Xmx. Xmx is the amount of memory given to H2O. If your data set is large, allocate more memory to H2O by using -Xmx4g instead of the default -Xmx1g, which will allocate 4g instead of the default 1g to your instance. For best performance, the amount of memory allocated to H2O should be four times the size of your data, but never more than the total amount of memory on your computer.

Step-by-Step Walk-Through

Download the .zip file containing the latest release of H2O-Dev from the H2O downloads page.
From your terminal, change your working directory to the same directory as the location of the .zip file.
From your terminal, unzip the .zip file. For example, unzip h2o-dev-0.2.1.1.zip.

At the prompt, enter the following commands:

 cd h2o-dev-0.2.1.1  #change working directory to the downloaded file
 java -Xmx4g -jar h2o.jar #run the basic java command to start h2o

After a few moments, output similar to the following appears in your terminal window:

  03-23 14:57:52.930 172.16.2.39:54321     1932   main      INFO: ----- H2O started  -----
 03-23 14:57:52.997 172.16.2.39:54321     1932   main      INFO: Build git branch: rel-serre
 03-23 14:57:52.998 172.16.2.39:54321     1932   main      INFO: Build git hash: 9eaa5f0c4ca39144b1fd180aedb535b5ba08b2ce
 03-23 14:57:52.998 172.16.2.39:54321     1932   main      INFO: Build git describe: jenkins-rel-serre-1
 03-23 14:57:52.998 172.16.2.39:54321     1932   main      INFO: Build project version: 0.2.1.1
 03-23 14:57:52.998 172.16.2.39:54321     1932   main      INFO: Built by: 'jenkins'
 03-23 14:57:52.998 172.16.2.39:54321     1932   main      INFO: Built on: '2015-03-18 12:55:28'
 03-23 14:57:52.998 172.16.2.39:54321     1932   main      INFO: Java availableProcessors: 8
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: Java heap totalMemory: 245.5 MB
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: Java heap maxMemory: 3.56 GB
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: Java version: Java 1.7.0_67 (from Oracle Corporation)
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: OS   version: Mac OS X 10.10.2 (x86_64)
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: Machine physical memory: 16.00 GB
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: Possible IP Address: en5 (en5), fe80:0:0:0:daeb:97ff:feb3:6d4b%10
 03-23 14:57:52.999 172.16.2.39:54321     1932   main      INFO: Possible IP Address: en5 (en5), 172.16.2.39
 03-23 14:57:53.000 172.16.2.39:54321     1932   main      INFO: Possible IP Address: lo0 (lo0), fe80:0:0:0:0:0:0:1%1
 03-23 14:57:53.000 172.16.2.39:54321     1932   main      INFO: Possible IP Address: lo0 (lo0), 0:0:0:0:0:0:0:1
 03-23 14:57:53.000 172.16.2.39:54321     1932   main      INFO: Possible IP Address: lo0 (lo0), 127.0.0.1
 03-23 14:57:53.000 172.16.2.39:54321     1932   main      INFO: Internal communication uses port: 54322
 03-23 14:57:53.000 172.16.2.39:54321     1932   main      INFO: Listening for HTTP and REST traffic on  http://172.16.2.39:54321/
 03-23 14:57:53.001 172.16.2.39:54321     1932   main      INFO: H2O cloud name: 'H2O-Dev-User' on /172.16.2.39:54321, discovery address /238.222.48.136:61150
 03-23 14:57:53.001 172.16.2.39:54321     1932   main      INFO: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
 03-23 14:57:53.001 172.16.2.39:54321     1932   main      INFO:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 H2O-Dev-User@172.16.2.39'
 03-23 14:57:53.001 172.16.2.39:54321     1932   main      INFO:   2. Point your browser to http://localhost:55555
 03-23 14:57:53.211 172.16.2.39:54321     1932   main      INFO: Log dir: '/tmp/h2o-H2O-Dev-User/h2ologs'
 03-23 14:57:53.211 172.16.2.39:54321     1932   main      INFO: Cur dir: '/Users/H2O-Dev-User/Downloads/h2o-dev-0.2.1.1'
 03-23 14:57:53.234 172.16.2.39:54321     1932   main      INFO: HDFS subsystem successfully initialized
 03-23 14:57:53.234 172.16.2.39:54321     1932   main      INFO: S3 subsystem successfully initialized
 03-23 14:57:53.235 172.16.2.39:54321     1932   main      INFO: Flow dir: '/Users/H2O-Dev-User/h2oflows'
 03-23 14:57:53.248 172.16.2.39:54321     1932   main      INFO: Cloud of size 1 formed [/172.16.2.39:54321]
 03-23 14:57:53.776 172.16.2.39:54321     1932   main      WARN: Found schema field which violates the naming convention; name has mixed lowercase and uppercase characters: ModelParametersSchema.dropNA20Cols
 03-23 14:57:53.935 172.16.2.39:54321     1932   main      INFO: Registered: 142 schemas in: 605mS

Point your web browser to http://localhost:54321/

The user interface appears in your browser, and now H2O-Dev is ready to go.

WARNING: On Windows systems, Internet Explorer is frequently blocked due to security settings. If you cannot reach http://localhost:54321, try using a different web browser, such as Firefox or Chrome.

Running H2O-Dev on Hadoop

Prerequisite: Open Communication Paths
Tutorial
Hadoop Launch Parameters

Currently supported versions:

CDH 5.2
CDH 5.3
HDP 2.1
HDP 2.2
MapR 3.1.1
MapR 4.0.1

Important Points to Remember:

Each H2O node runs as a mapper
Run only one mapper per host
There are no combiners or reducers
Each H2O cluster must have a unique job name
-mapperXmx, -nodes, and -output are required
Root permissions are not required - just unzip the H2O .zip file on any single node

Prerequisite: Open Communication Paths

H2O communicates using two communication paths. Verify these are open and available for use by H2O.

Path 1: mapper to driver

Optionally specify this port using the -driverport option in the hadoop jar command (see “Hadoop Launch Parameters” below). This port is opened on the driver host (the host where you entered the hadoop jar command). By default, this port is chosen randomly by the operating system.

Path 2: mapper to mapper

Optionally specify this port using the -baseport option in the hadoop jar command (see “Hadoop Launch Parameters” below). This port and the next subsequent port are opened on the mapper hosts (the Hadoop worker nodes) where the H2O mapper nodes are placed by the Resource Manager. By default, ports 54321 (TCP) and 54322 (TCP & UDP) are used.

The mapper port is adaptive: if 54321 and 54322 are not available, H2O will try 54323 and 54324 and so on. The mapper port is designed to be adaptive because sometimes if the YARN cluster is low on resources, YARN will place two H2O mappers for the same H2O cluster request on the same physical host. For this reason, we recommend opening a range of more than two ports (20 ports should be sufficient).

Tutorial

The following tutorial will walk the user through the download or build of H2O and the parameters involved in launching H2O from the command line.

Download the latest H2O-dev release for your version of Hadoop:

 wget http://h2o-release.s3.amazonaws.com/h2o-dev/master/1110/h2o-dev-0.3.0.1110-cdh5.2.zip
 wget http://h2o-release.s3.amazonaws.com/h2o-dev/master/1110/h2o-dev-0.3.0.1110-cdh5.3.zip
 wget http://h2o-release.s3.amazonaws.com/h2o-dev/master/1110/h2o-dev-0.3.0.1110-hdp2.1.zip
 wget http://h2o-release.s3.amazonaws.com/h2o-dev/master/1110/h2o-dev-0.3.0.1110-hdp2.2.zip
 wget http://h2o-release.s3.amazonaws.com/h2o-dev/master/1110/h2o-dev-0.3.0.1110-mapr3.1.1.zip
 wget http://h2o-release.s3.amazonaws.com/h2o-dev/master/1110/h2o-dev-0.3.0.1110-mapr4.0.1.zip

Note: Enter only one of the above commands.

Prepare the job input on the Hadoop Node by unzipping the build file and changing to the directory with the Hadoop and H2O’s driver jar files.
```
 unzip h2o-0.3.0.1110-*.zip
 cd h2o-0.3.0.1110-*
```
To launch H2O nodes and form a cluster on the Hadoop cluster, run:
```
 hadoop jar h2odriver.jar -nodes 1 -mapperXmx 1g -output hdfsOutputDirName
```
- The above command launches a 1g node of H2O. We recommend you launch the cluster with at least four times the memory of your data file size.
- mapperXmx is the mapper size or the amount of memory allocated to each node.
- nodes is the number of nodes requested to form the cluster.
- output is the name of the directory created each time a H2O cloud is created so it is necessary for the name to be unique each time it is launched.

To monitor your job, direct your web browser to your standard job tracker Web UI. To access H2O’s Web UI, direct your web browser to one of the launched instances. If you are unsure where your JVM is launched, review the output from your command after the nodes has clouded up and formed a cluster. Any of the nodes’ IP addresses will work as there is no master node.

 Determining driver host interface for mapper->driver callback...
 [Possible callback IP address: 172.16.2.181]
 [Possible callback IP address: 127.0.0.1]
 ...
 Waiting for H2O cluster to come up...
 H2O node 172.16.2.184:54321 requested flatfile
 Sending flatfiles to nodes...
  [Sending flatfile to node 172.16.2.184:54321]
 H2O node 172.16.2.184:54321 reports H2O cluster size 1 
 H2O cluster (1 nodes) is up
 Blocking until the H2O cluster shuts down...

Hadoop Launch Parameters

-libjars <.../h2o.jar>: Add external jar files; must end with h2o.jar.
-h | -help: Display help
-job name <JobName>: Specify a job name; the default is H2O_nnnnn (where n is chosen randomly)
-driverif <IP address of mapper -> driver callback interface>: Specify the IP address for callback messages from the mapper to the driver.
-driverport <port of mapper -> callback interface>: Specify the port number for callback messages from the mapper to the driver.
-network <IPv4Network1>[,<IPv4Network2>]: Specify the IPv4 network(s) to bind to the H2O nodes; multiple networks can be specified to force H2O to use the specified host in the Hadoop cluster. 10.1.2.0/24 allows 256 possibilities.
-timeout <seconds>: Specify the timeout duration (in seconds) to wait for the cluster to form before failing.
-disown: Exit the driver after the cluster forms.
notify <notification file name>: Specify a file to write when the cluster is up. The file contains the IP and port of the embedded web server for one of the nodes in the cluster. All mappers must start before the H2O cloud is considered “up”.
mapperXmx <per mapper Java Xmx heap size>: Specify the amount of memory to allocate to H2O.
extramempercent <0-20>: Specify the extra memory for internal JVM use outside of the Java heap. This is a percentage of mapperXmx.
-n | -nodes <number of H2O nodes>: Specify the number of nodes.
-nthreads <maximum number of CPUs>: Specify the number of CPUs to use. Enter -1 to use all CPUs on the host, or enter a positive integer.
-baseport <initialization port for H2O nodes>: Specify the initialization port for the H2O nodes. The default is 54321.
-ea: Enable assertions to verify boolean expressions for error detection.
-verbose:gc: Include heap and garbage collection information in the logs.
-XX:+PrintGCDetails: Include a short message after each garbage collection.
-license <license file name>: Specify the directory of local filesytem location and the license file name.
-o | -output <HDFS output directory>: Specify the HDFS directory for the output.
-flow_dir <Saved Flows directory>: Specify the directory for saved flows. By default, H2O will try to find the HDFS home directory to use as the directory for flows. If the HDFS home directory is not found, flows cannot be saved unless a directory is specified using -flow_dir.

REST API Reference

/3/About
/3/Cloud
/3/Cloud
/3/CreateFrame
/3/DKV
/3/DKV/(?.*)
/3/DownloadDataset
/3/Find
/3/Frames
/3/Frames
/3/Frames/(?.*)
/3/Frames/(?.*)
/3/Frames/(?.*)/columns
/3/Frames/(?.*)/columns/(?.*)
/3/Frames/(?.*)/columns/(?.*)/domain
/3/Frames/(?.*)/columns/(?.*)/summary
/3/Frames/(?.*)/export/(?.*)/overwrite/(?.*)
/3/Frames/(?.*)/summary
/3/ImportFiles
/3/InitID
/3/JStack
/3/Jobs
/3/Jobs/(?.*)
/3/Jobs/(?.*)/cancel
/3/KillMinus3
/3/LogAndEcho
/3/Logs/nodes/(?.*)/files/(?.*)
/3/MakeGLMModel
/3/Metadata/endpoints
/3/Metadata/endpoints/(?[0-9]+)
/3/Metadata/endpoints/(?.*)
/3/Metadata/schemaclasses/(?.*)
/3/Metadata/schemas
/3/Metadata/schemas/(?.*)
/3/MissingInserter
/3/ModelBuilders
/3/ModelBuilders/(?.*)
/3/ModelBuilders/deeplearning
/3/ModelBuilders/deeplearning/parameters
/3/ModelBuilders/drf
/3/ModelBuilders/drf/parameters
/3/ModelBuilders/gbm
/3/ModelBuilders/gbm/parameters
/3/ModelBuilders/glm
/3/ModelBuilders/glm/parameters
/3/ModelBuilders/kmeans
/3/ModelBuilders/kmeans/parameters
/3/ModelBuilders/naivebayes
/3/ModelBuilders/naivebayes/parameters
/3/ModelBuilders/pca
/3/ModelBuilders/pca/parameters
/3/ModelMetrics
/3/ModelMetrics/frames/(?.*)
/3/ModelMetrics/frames/(?.*)/models/(?.*)
/3/ModelMetrics/frames/(?.*)/models/(?.*)
/3/ModelMetrics/models/(?.*)
/3/ModelMetrics/models/(?.*)/frames/(?.*)
/3/ModelMetrics/models/(?.*)/frames/(?.*)
/3/ModelMetrics/models/(?.*)/frames/(?.*)
/3/Models
/3/Models
/3/Models/(?.*)
/3/Models/(?.*)
/3/Models/(?.*)/preview
/3/NetworkTest
/3/NodePersistentStorage/(?.*)
/3/NodePersistentStorage/(?.*)
/3/NodePersistentStorage/(?.*)/(?.*)
/3/NodePersistentStorage/(?.*)/(?.*)
/3/NodePersistentStorage/(?.*)/(?.*)
/3/NodePersistentStorage/categories/(?.*)/exists
/3/NodePersistentStorage/categories/(?.*)/names/(?.*)/exists
/3/NodePersistentStorage/configured
/3/Parse
/3/ParseSetup
/3/Predictions/models/(?.*)/frames/(?.*)
/3/Profiler
/3/Rapids
/3/Rapids/isEval
/3/Shutdown
/3/SplitFrame
/3/Timeline
/3/Tutorials
/3/Typeahead/files
/3/UnlockKeys
/3/WaterMeterCpuTicks/(?.*)
/3/WaterMeterIo
/3/WaterMeterIo/(?.*)
/99/Sample

GET /3/About

Return information about this H2O.

Input	`AboutV3`
Output	`AboutV3`

GET /3/Cloud

Determine the status of the nodes in the H2O cloud.

Input	`CloudV3`
Output	`CloudV3`

HEAD /3/Cloud

Determine the status of the nodes in the H2O cloud.

Input	`CloudV3`
Output	`CloudV3`

POST /3/CreateFrame

Create a synthetic H2O Frame.

Input	`CreateFrameV3`
Output	`CreateFrameV3`

DELETE /3/DKV

Remove all keys from the H2O distributed K/V store.

Input	`RemoveAllV3`
Output	`RemoveAllV3`

DELETE /3/DKV/(?.*)

Remove an arbitrary key from the H2O distributed K/V store.

Input	`RemoveV3`
Output	`RemoveV3`

GET /3/DownloadDataset

Download something something.

Input	`DownloadDataV3`
Output	`DownloadDataV3`

GET /3/Find

Find a value within a Frame.

Input	`FindV3`
Output	`FindV3`

GET /3/Frames

Return all Frames in the H2O distributed K/V store.

Input	`FramesV3`
Output	`FramesV3`

DELETE /3/Frames

Delete all Frames from the H2O distributed K/V store.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.*)

Return the specified Frame.

Input	`FramesV3`
Output	`FramesV3`

DELETE /3/Frames/(?.*)

Delete the specified Frame from the H2O distributed K/V store.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.*)/columns

Return all the columns from a Frame.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.)/columns/(?.)

Return the specified column from a Frame.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.)/columns/(?.)/domain

Return the domains for the specified column. “null” if the column is not an Enum.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.)/columns/(?.)/summary

Return the summary metrics for a column, e.g. mins, maxes, mean, sigma, percentiles, etc.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.)/export/(?.)/overwrite/(?.*)

Export a Frame to the given path with optional overwrite.

Input	`FramesV3`
Output	`FramesV3`

GET /3/Frames/(?.*)/summary

Return a Frame, including the histograms, after forcing computation of rollups.

Input	`FramesV3`
Output	`FramesV3`

GET /3/ImportFiles

Import raw data files into a single-column H2O Frame.

Input	`ImportFilesV3`
Output	`ImportFilesV3`

GET /3/InitID

Issue a new session ID.

Input	`InitIDV3`
Output	`InitIDV3`

GET /3/JStack

Something something something.

Input	`JStackV3`
Output	`JStackV3`

GET /3/Jobs

Get a list of all the H2O Jobs (long-running actions).

Input	`JobsV3`
Output	`Schema`

GET /3/Jobs/(?.*)

Get the status of the given H2O Job (long-running action).

Input	`JobsV3`
Output	`Schema`

POST /3/Jobs/(?.*)/cancel

Cancel a running job.

Input	`JobsV3`
Output	`Schema`

GET /3/KillMinus3

Kill minus 3 on this node

Input	`KillMinus3V3`
Output	`KillMinus3V3`

POST /3/LogAndEcho

Save a message to the H2O logfile.

Input	`LogAndEchoV3`
Output	`LogAndEchoV3`

GET /3/Logs/nodes/(?.)/files/(?.)

Get named log file for a node.

Input	`LogsV3`
Output	`LogsV3`

POST /3/MakeGLMModel

make a new GLM model based on existing one

Input	`MakeGLMModelV3`
Output	`GLMModelV3`

GET /3/Metadata/endpoints

Return a list of all the REST API endpoints.

Input	`DocsV3`
Output	`DocsV3`

GET /3/Metadata/endpoints/(?[0-9]+)

Return the REST API endpoint metadata, including documentation, for the endpoint specified by number.

Input	`DocsV3`
Output	`DocsV3`

GET /3/Metadata/endpoints/(?.*)

Return the REST API endpoint metadata, including documentation, for the endpoint specified by path.

Input	`DocsV3`
Output	`DocsV3`

GET /3/Metadata/schemaclasses/(?.*)

Return the REST API schema metadata for specified schema class.

Input	`DocsV3`
Output	`DocsV3`

GET /3/Metadata/schemas

Return list of all REST API schemas.

Input	`DocsV3`
Output	`DocsV3`

GET /3/Metadata/schemas/(?.*)

Return the REST API schema metadata for specified schema.

Input	`DocsV3`
Output	`DocsV3`

POST /3/MissingInserter

Insert missing values.

Input	`MissingInserterV3`
Output	`MissingInserterV3`

GET /3/ModelBuilders

Return the Model Builder metadata for all available algorithms.

Input	`ModelBuildersV3`
Output	`ModelBuildersV3`

GET /3/ModelBuilders/(?.*)

Return the Model Builder metadata for the specified algorithm.

Input	`ModelBuildersV3`
Output	`ModelBuildersV3`

POST /3/ModelBuilders/deeplearning

Train a Deep Learning model on the specified Frame.

Input	`DeepLearningV3`
Output	`Schema`

POST /3/ModelBuilders/deeplearning/parameters

Validate a set of Deep Learning model builder parameters.

Input	`DeepLearningV3`
Output	`DeepLearningV3`

POST /3/ModelBuilders/drf

Train a DRF model on the specified Frame.

Input	`DRFV3`
Output	`Schema`

POST /3/ModelBuilders/drf/parameters

Validate a set of DRF model builder parameters.

Input	`DRFV3`
Output	`DRFV3`

POST /3/ModelBuilders/gbm

Train a GBM model on the specified Frame.

Input	`GBMV3`
Output	`Schema`

POST /3/ModelBuilders/gbm/parameters

Validate a set of GBM model builder parameters.

Input	`GBMV3`
Output	`GBMV3`

POST /3/ModelBuilders/glm

Train a GLM model on the specified Frame.

Input	`GLMV3`
Output	`Schema`

POST /3/ModelBuilders/glm/parameters

Validate a set of GLM model builder parameters.

Input	`GLMV3`
Output	`GLMV3`

POST /3/ModelBuilders/kmeans

Train a KMeans model on the specified Frame.

Input	`KMeansV3`
Output	`Schema`

POST /3/ModelBuilders/kmeans/parameters

Validate a set of KMeans model builder parameters.

Input	`KMeansV3`
Output	`KMeansV3`

POST /3/ModelBuilders/naivebayes

Train a Naive Bayes model on the specified Frame.

Input	`NaiveBayesV3`
Output	`Schema`

POST /3/ModelBuilders/naivebayes/parameters

Validate a set of Naive Bayes model builder parameters.

Input	`NaiveBayesV3`
Output	`NaiveBayesV3`

POST /3/ModelBuilders/pca

Train a PCA model on the specified Frame.

Input	`PCAV3`
Output	`Schema`

POST /3/ModelBuilders/pca/parameters

Validate a set of PCA model builder parameters.

Input	`PCAV3`
Output	`PCAV3`

GET /3/ModelMetrics

Return all the saved scoring metrics.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

GET /3/ModelMetrics/frames/(?.*)

Return the saved scoring metrics for the specified Frame.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

GET /3/ModelMetrics/frames/(?.)/models/(?.)

Return the saved scoring metrics for the specified Model and Frame.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

DELETE /3/ModelMetrics/frames/(?.)/models/(?.)

Return the saved scoring metrics for the specified Model and Frame.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

GET /3/ModelMetrics/models/(?.*)

Return the saved scoring metrics for the specified Model.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

GET /3/ModelMetrics/models/(?.)/frames/(?.)

Return the saved scoring metrics for the specified Model and Frame.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

DELETE /3/ModelMetrics/models/(?.)/frames/(?.)

Return the saved scoring metrics for the specified Model and Frame.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

POST /3/ModelMetrics/models/(?.)/frames/(?.)

Return the scoring metrics for the specified Frame with the specified Model. If the Frame has already been scored with the Model then cached results will be returned; otherwise predictions for all rows in the Frame will be generated and the metrics will be returned.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

GET /3/Models

Return all Models from the H2O distributed K/V store.

Input	`ModelsV3`
Output	`ModelsV3`

DELETE /3/Models

Delete all Models from the H2O distributed K/V store.

Input	`ModelsV3`
Output	`ModelsV3`

GET /3/Models/(?.*)

Return the specified Model from the H2O distributed K/V store, optionally with the list of compatible Frames.

Input	`ModelsV3`
Output	`ModelsV3`

DELETE /3/Models/(?.*)

Delete the specified Model from the H2O distributed K/V store.

Input	`ModelsV3`
Output	`ModelsV3`

GET /3/Models/(?.*)/preview

Return potentially abridged model suitable for viewing in a browser (currently only used for java model code).

Input	`ModelsV3`
Output	`ModelsV3`

GET /3/NetworkTest

Something something something.

Input	`NetworkTestV3`
Output	`NetworkTestV3`

POST /3/NodePersistentStorage/(?.*)

Store a value.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

GET /3/NodePersistentStorage/(?.*)

Return all keys stored for a given category.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

POST /3/NodePersistentStorage/(?.)/(?.)

Store a named value.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

GET /3/NodePersistentStorage/(?.)/(?.)

Return value for a given name.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

DELETE /3/NodePersistentStorage/(?.)/(?.)

Delete a key.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

GET /3/NodePersistentStorage/categories/(?.*)/exists

Return true or false.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

GET /3/NodePersistentStorage/categories/(?.)/names/(?.)/exists

Return true or false.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

GET /3/NodePersistentStorage/configured

Return true or false.

Input	`NodePersistentStorageV3`
Output	`NodePersistentStorageV3`

POST /3/Parse

Parse a raw byte-oriented Frame into a useful columnar data Frame.

Input	`ParseV3`
Output	`ParseV3`

POST /3/ParseSetup

Guess the parameters for parsing raw byte-oriented data into an H2O Frame.

Input	`ParseSetupV3`
Output	`ParseSetupV3`

POST /3/Predictions/models/(?.)/frames/(?.)

Score (generate predictions) for the specified Frame with the specified Model. Both the Frame of predictions and the metrics will be returned.

Input	`ModelMetricsListSchemaV3`
Output	`ModelMetricsListSchemaV3`

GET /3/Profiler

Something something something.

Input	`ProfilerV3`
Output	`ProfilerV3`

POST /3/Rapids

Something something R exec something.

Input	`RapidsV3`
Output	`RapidsV3`

GET /3/Rapids/isEval

something something r exec something.

Input	`RapidsV3`
Output	`RapidsV3`

POST /3/Shutdown

Shut down the cluster

Input	`ShutdownV3`
Output	`ShutdownV3`

POST /3/SplitFrame

Split a H2O Frame.

Input	`SplitFrameV3`
Output	`SplitFrameV3`

GET /3/Timeline

Something something something.

Input	`TimelineV3`
Output	`TimelineV3`

GET /3/Tutorials

H2O tutorials.

Input	`TutorialsV3`
Output	`TutorialsV3`

GET /3/Typeahead/files

Typehead hander for filename completion.

Input	`TypeaheadV3`
Output	`Schema`

POST /3/UnlockKeys

Unlock all keys in the H2O distributed K/V store, to attempt to recover from a crash.

Input	`UnlockKeysV3`
Output	`UnlockKeysV3`

GET /3/WaterMeterCpuTicks/(?.*)

Return a CPU usage snapshot of all cores of all nodes in the H2O cluster.

Input	`WaterMeterCpuTicksV3`
Output	`WaterMeterCpuTicksV3`

GET /3/WaterMeterIo

Return IO usage snapshot of all nodes in the H2O cluster.

Input	`WaterMeterIoV3`
Output	`WaterMeterIoV3`

GET /3/WaterMeterIo/(?.*)

Return IO usage snapshot of all nodes in the H2O cluster.

Input	`WaterMeterIoV3`
Output	`WaterMeterIoV3`

GET /99/Sample

Example of an experimental endpoint. Call via /EXPERIMENTAL/Sample. Experimental endpoints can change at any moment.

Input	`CloudV3`
Output	`CloudV3`

REST API Schema Reference

AboutEntryV3
AboutV3
CloudV3
ClusteringModelBuilderSchema
ClusteringModelParametersSchema
ColSpecifierV2
ColV2
ColumnSpecsBase
ConfusionMatrixBase
ConfusionMatrixV3
CoxPHModelOutputV3
CoxPHModelV3
CoxPHParametersV3
CoxPHV3
CreateFrameV3
DRFModelOutputV3
DRFModelV3
DRFParametersV3
DRFV3
DStackTraceV2
DeepLearningModelOutputV3
DeepLearningModelV3
DeepLearningParametersV3
DeepLearningV3
DocsBase
DocsV3
DownloadDataV3
EventV2
ExampleModelOutputV3
ExampleModelV3
ExampleParametersV3
ExampleV3
FieldMetadataBase
FieldMetadataV3
FindV3
FrameKeyV3
FrameV3
FramesBase
FramesV3
GBMModelOutputV3
GBMModelV3
GBMParametersV3
GBMV3
GLMModelOutputV3
GLMModelV3
GLMParametersV3
GLMV3
GrepModelOutputV3
GrepModelV3
GrepParametersV3
GrepV3
H2OErrorV3
H2OModelBuilderErrorV3
HeartBeatEvent
IOEvent
ImportFilesV3
InitIDV3
IoStatsEntry
JStackV3
JobKeyV3
JobV3
JobsV3
KMeansModelOutputV3
KMeansModelV3
KMeansParametersV3
KMeansV3
KeyV3
KillMinus3V3
LogAndEchoV3
LogsV3
MakeGLMModelV3
MissingInserterV3
ModelBuilderSchema
ModelBuildersBase
ModelBuildersV3
ModelKeyV3
ModelMetricsAutoEncoderV3
ModelMetricsBase
ModelMetricsBinomialGLMV3
ModelMetricsBinomialV3
ModelMetricsClusteringV3
ModelMetricsListSchemaV3
ModelMetricsMultinomialV3
ModelMetricsPCAV3
ModelMetricsRegressionGLMV3
ModelMetricsRegressionV3
ModelOutputSchema
ModelParameterSchemaV3
ModelParametersSchema
ModelSchema
ModelsBase
ModelsV3
NaiveBayesModelOutputV3
NaiveBayesModelV3
NaiveBayesParametersV3
NaiveBayesV3
NetworkEvent
NetworkTestV3
NodePersistentStorageEntryV3
NodePersistentStorageV3
NodeV1
PCAModelOutputV3
PCAModelV3
PCAParametersV3
PCAV3
ParseSetupV3
ParseV3
ProfilerNodeEntryV3
ProfilerNodeV3
ProfilerV3
QuantileParametersV2
QuantileV3
RapidsV3
RemoveAllV3
RemoveV3
RouteBase
RouteV3
Schema
SchemaMetadataBase
SchemaMetadataV3
SharedTreeModelOutputV3
SharedTreeModelV3
SharedTreeParametersV3
SharedTreeV3
ShutdownV3
SplitFrameV3
SupervisedModelBuilderSchema
SupervisedModelParametersSchema
SynonymV3
TimelineV3
TreeStatsV3
TutorialsV3
TwoDimTableBase
TwoDimTableV3
TypeaheadV3
UnlockKeysV3
ValidationMessageBase
ValidationMessageV2
VarImpBase
VarImpV3
VecKeyV3
WaterMeterCpuTicksV3
WaterMeterIoV3
Word2VecModelOutputV3
Word2VecModelV3
Word2VecParametersV3
Word2VecV3

AboutEntryV3

`name` `string`	Property name	Out
`value` `string`	Property value	Out

AboutV3

entries
Iced[] List of properties about this running H2O instance Out

CloudV3

`skip_ticks` `boolean`	skip_ticks	In
`version` `string`	version	Out
`node_idx` `int`	Node index number cloud status is collected from (zero-based)	Out
`cloud_name` `string`	cloud_name	Out
`cloud_size` `int`	cloud_size	Out
`cloud_uptime_millis` `long`	cloud_uptime_millis	Out
`cloud_healthy` `boolean`	cloud_healthy	Out
`bad_nodes` `int`	Nodes reporting unhealthy	Out
`consensus` `boolean`	Cloud voting is stable	Out
`locked` `boolean`	Cloud is accepting new members or not	Out
`nodes` `Iced[]`	nodes	Out

ClusteringModelBuilderSchema

`parameters` `Parameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

ClusteringModelParametersSchema

`k` `int`	Number of clusters	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

ColSpecifierV2

`column_name` `string`	Name of the column	In/Out
`is_member_of_frames` `string[]`	List of fields which specify columns that must contain this column	In/Out

ColV2

`label` `string`	label	Out
`missing_count` `long`	missing	Out
`zero_count` `long`	zeros	Out
`positive_infinity_count` `long`	positive infinities	Out
`negative_infinity_count` `long`	negative infinities	Out
`mins` `double[]`	mins	Out
`maxs` `double[]`	maxs	Out
`mean` `double`	mean	Out
`sigma` `double`	sigma	Out
`type` `string`	datatype: {enum, string, int, real, time, uuid}	Out
`domain` `string[]`	domain; not-null for enum columns only	Out
`data` `double[]`	data	Out
`string_data` `string[]`	string data	Out
`precision` `byte`	decimal precision, -1 for all digits	Out
`histogram_bins` `long[]`	Histogram bins; null if not computed	Out
`histogram_base` `double`	Start of histogram bin zero	Out
`histogram_stride` `double`	Stride per bin	Out
`percentiles` `double[]`	Percentile values, matching the default percentiles	Out

ColumnSpecsBase

`name` `string`	Column Name	Out
`type` `string`	Column Type	Out
`format` `string`	Column Format (printf)	Out
`description` `string`	Column Description	Out

ConfusionMatrixBase

table
TwoDimTable Annotated confusion matrix Out

ConfusionMatrixV3

table
TwoDimTable Annotated confusion matrix Out

CoxPHModelOutputV3

`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

CoxPHModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `CoxPHParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `CoxPHOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

CoxPHParametersV3

`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

CoxPHV3

`parameters` `CoxPHParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

CreateFrameV3

`rows` `long`	Number of rows	In
`cols` `int`	Number of data columns (in addition to the first response column)	In
`seed` `long`	Random number seed	In
`randomize` `boolean`	Whether frame should be randomized	In
`value` `long`	Constant value (for randomize=false)	In
`real_range` `long`	Range for real variables (-range … range)	In
`categorical_fraction` `double`	Fraction of categorical columns (for randomize=true)	In
`factors` `int`	Factor levels for categorical variables	In
`integer_fraction` `double`	Fraction of integer columns (for randomize=true)	In
`integer_range` `long`	Range for integer variables (-range … range)	In
`binary_fraction` `double`	Fraction of binary columns (for randomize=true)	In
`binary_ones_fraction` `double`	Fraction of 1’s in binary columns	In
`missing_fraction` `double`	Fraction of missing values	In
`response_factors` `int`	Number of factor levels of the first column (1=real, 2=binomial, N=multinomial)	In
`has_response` `boolean`	Whether an additional response column should be generated	In
`key` `Key`	Job Key	In
`description` `string`	Job description	In
`dest` `Key`	destination key	In/Out
`status` `string`	job status	Out
`progress` `float`	progress, from 0 to 1	Out
`progress_msg` `string`	current progress status description	Out
`start_time` `long`	Start time	Out
`msec` `long`	runtime	Out
`exception` `string`	exception	Out

DRFModelOutputV3

`variable_importances` `TwoDimTable`	Variable Importances	Out
`init_f` `double`	The Intercept term, the initial model function value to which trees make adjustments	Out
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

DRFModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `DRFParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `DRFOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

DRFParametersV3

`mtries` `int`	Number of columns to randomly select at each level, or -1 for sqrt(#cols)	In
`sample_rate` `float`	Sample rate, from 0. to 1.0	In
`build_tree_one_node` `boolean`	Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.	In
`ntrees` `int`	Number of trees.	In
`max_depth` `int`	Maximum tree depth.	In
`min_rows` `int`	Fewest allowed observations in a leaf (in R called ‘nodesize’).	In
`nbins` `int`	Build a histogram of this many bins, then split at the best point	In
`seed` `long`	Seed for pseudo random number generator (if applicable)	In
`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

DRFV3

`parameters` `DRFParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

DStackTraceV2

`node` `string`	Node name	Out
`time` `long`	Unix epoch time	Out
`thread_traces` `string[]`	One trace per thread	Out

DeepLearningModelOutputV3

`weights` `Key[]`	Frame keys for weight matrices	In
`biases` `Key[]`	Frame keys for bias vectors	In
`variable_importances` `TwoDimTable`	Variable Importances	Out
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

DeepLearningModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `DeepLearningParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `DeepLearningModelOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

DeepLearningParametersV3

`checkpoint` `Key`	Model checkpoint to resume training with	In/Out
`override_with_best_model` `boolean`	If enabled, override the final model with the best model found during training	In/Out
`autoencoder` `boolean`	Auto-Encoder	In/Out
`use_all_factor_levels` `boolean`	Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder.	In/Out
`activation` `enum`	Activation function	In/Out
`hidden` `int[]`	Hidden layer sizes (e.g. 100,100).	In/Out
`epochs` `double`	How many times the dataset should be iterated (streamed), can be fractional	In/Out
`train_samples_per_iteration` `long`	Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic	In/Out
`target_ratio_comm_to_comp` `double`	Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration=-2 (auto-tuning)	In/Out
`seed` `long`	Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded	In/Out
`adaptive_rate` `boolean`	Adaptive learning rate	In/Out
`rho` `double`	Adaptive learning rate time decay factor (similarity to prior updates)	In/Out
`epsilon` `double`	Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress)	In/Out
`rate` `double`	Learning rate (higher => less stable, lower => slower convergence)	In/Out
`rate_annealing` `double`	Learning rate annealing: rate / (1 + rate_annealing * samples)	In/Out
`rate_decay` `double`	Learning rate decay factor between layers (N-th layer: rate*alpha^(N-1))	In/Out
`momentum_start` `double`	Initial momentum at the beginning of training (try 0.5)	In/Out
`momentum_ramp` `double`	Number of training samples for which momentum increases	In/Out
`momentum_stable` `double`	Final momentum after the ramp is over (try 0.99)	In/Out
`nesterov_accelerated_gradient` `boolean`	Use Nesterov accelerated gradient (recommended)	In/Out
`input_dropout_ratio` `double`	Input layer dropout ratio (can improve generalization, try 0.1 or 0.2)	In/Out
`hidden_dropout_ratios` `double[]`	Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5	In/Out
`l1` `double`	L1 regularization (can add stability and improve generalization, causes many weights to become 0)	In/Out
`l2` `double`	L2 regularization (can add stability and improve generalization, causes many weights to be small	In/Out
`max_w2` `float`	Constraint for squared sum of incoming weights per unit (e.g. for Rectifier)	In/Out
`initial_weight_distribution` `enum`	Initial Weight Distribution	In/Out
`initial_weight_scale` `double`	Uniform: -value…value, Normal: stddev)	In/Out
`loss` `enum`	Loss function	In/Out
`score_interval` `double`	Shortest time interval (in secs) between model scoring	In/Out
`score_training_samples` `long`	Number of training set samples for scoring (0 for all)	In/Out
`score_validation_samples` `long`	Number of validation set samples for scoring (0 for all)	In/Out
`score_duty_cycle` `double`	Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring).	In/Out
`classification_stop` `double`	Stopping criterion for classification error fraction on training data (-1 to disable)	In/Out
`regression_stop` `double`	Stopping criterion for regression error (MSE) on training data (-1 to disable)	In/Out
`quiet_mode` `boolean`	Enable quiet mode for less output to standard output	In/Out
`score_validation_sampling` `enum`	Method used to sample validation dataset for scoring	In/Out
`diagnostics` `boolean`	Enable diagnostics for hidden layers	In/Out
`variable_importances` `boolean`	Compute variable importances for input features (Gedeon method) - can be slow for large networks	In/Out
`fast_mode` `boolean`	Enable fast mode (minor approximation in back-propagation)	In/Out
`ignore_const_cols` `boolean`	Ignore constant training columns (no information can be gained anyway)	In/Out
`force_load_balance` `boolean`	Force extra load balancing to increase training speed for small datasets (to keep all cores busy)	In/Out
`replicate_training_data` `boolean`	Replicate the entire training dataset onto every node for faster training on small datasets	In/Out
`single_node_mode` `boolean`	Run on a single node for fine-tuning of model parameters	In/Out
`shuffle_training_data` `boolean`	Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows)	In/Out
`missing_values_handling` `enum`	Handling of missing values. Either Skip or MeanImputation.	In/Out
`sparse` `boolean`	Sparse data handling (Experimental).	In/Out
`col_major` `boolean`	Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation (Experimental).	In/Out
`average_activation` `double`	Average activation for sparse auto-encoder (Experimental)	In/Out
`sparsity_beta` `double`	Sparsity regularization (Experimental)	In/Out
`max_categorical_features` `int`	Max. number of categorical features, enforced via hashing (Experimental)	In/Out
`reproducible` `boolean`	Force reproducibility on small data (will be slow - only uses 1 thread)	In/Out
`export_weights_and_biases` `boolean`	Whether to export Neural Network weights and biases to H2O Frames	In/Out
`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

DeepLearningV3

`parameters` `DeepLearningParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

DocsBase

`num` `int`	Number for specifying an endpoint	In
`http_method` `string`	HTTP method (GET, POST, DELETE) if fetching by path	In
`path` `string`	Path for specifying an endpoint	In
`classname` `string`	Class name, for fetching docs for a schema (DEPRECATED)	In
`schemaname` `string`	Schema name (e.g., DocsV1), for fetching docs for a schema	In
`routes` `Route[]`	List of endpoint routes	Out
`schemas` `SchemaMetadata[]`	List of schemas	Out
`markdown` `string`	Table of Contents Markdown	Out

DocsV3

`num` `int`	Number for specifying an endpoint	In
`http_method` `string`	HTTP method (GET, POST, DELETE) if fetching by path	In
`path` `string`	Path for specifying an endpoint	In
`classname` `string`	Class name, for fetching docs for a schema (DEPRECATED)	In
`schemaname` `string`	Schema name (e.g., DocsV1), for fetching docs for a schema	In
`routes` `Route[]`	List of endpoint routes	Out
`schemas` `SchemaMetadata[]`	List of schemas	Out
`markdown` `string`	Table of Contents Markdown	Out

DownloadDataV3

`frame_id` `Key`	Frame to download	In
`hex_string` `boolean`	Emit double values in a machine readable lossless format with Double.toHexString().	In
`csv` `string`	CSV Stream	Out
`filename` `string`	Suggested Filename	Out

EventV2

`date` `string`	Time when the event was recorded. Format is hh:mm:ss:ms	In
`nanos` `long`	Time in nanos	In
`type` `enum`	type of recorded event	In

ExampleModelOutputV3

`iterations` `int`	Iterations executed	In
`maxs` `double[]`	(No description available)	In
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

ExampleModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `ExampleParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `ExampleOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

ExampleParametersV3

`max_iterations` `int`	Maximum training iterations.	In
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

ExampleV3

`parameters` `ExampleParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

FieldMetadataBase

`schema_name` `string`	Schema name for this field, if it is_schema, or the name of the enum, if it’s an enum.	In
`name` `string`	Field name in the Schema	Out
`type` `string`	Type for this field	Out
`is_schema` `boolean`	Type for this field is itself a Schema.	Out
`value` `Polymorphic`	Value for this field	Out
`help` `string`	A short help description to appear alongside the field in a UI	Out
`label` `string`	The label that should be displayed for the field if the name is insufficient	Out
`required` `boolean`	Is this field required, or is the default value generally sufficient?	Out
`level` `enum`	How important is this field? The web UI uses the level to do a slow reveal of the parameters	Out
`direction` `enum`	Is this field an input, output or inout?	Out
`values` `string[]`	For enum-type fields the allowed values are specified using the values annotation; this is used in UIs to tell the user the allowed values, and for validation	Out
`json` `boolean`	Should this field be rendered in the JSON representation?	Out
`is_member_of_frames` `string[]`	For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column	Out
`is_mutually_exclusive_with` `string[]`	For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it’s set) the validation_frame	Out

FieldMetadataV3

`schema_name` `string`	Schema name for this field, if it is_schema, or the name of the enum, if it’s an enum.	In
`name` `string`	Field name in the Schema	Out
`type` `string`	Type for this field	Out
`is_schema` `boolean`	Type for this field is itself a Schema.	Out
`value` `Polymorphic`	Value for this field	Out
`help` `string`	A short help description to appear alongside the field in a UI	Out
`label` `string`	The label that should be displayed for the field if the name is insufficient	Out
`required` `boolean`	Is this field required, or is the default value generally sufficient?	Out
`level` `enum`	How important is this field? The web UI uses the level to do a slow reveal of the parameters	Out
`direction` `enum`	Is this field an input, output or inout?	Out
`values` `string[]`	For enum-type fields the allowed values are specified using the values annotation; this is used in UIs to tell the user the allowed values, and for validation	Out
`json` `boolean`	Should this field be rendered in the JSON representation?	Out
`is_member_of_frames` `string[]`	For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column	Out
`is_mutually_exclusive_with` `string[]`	For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it’s set) the validation_frame	Out

FindV3

`key` `Frame`	Frame to search	In
`column` `string`	Column, or null for all	In
`row` `long`	Starting row for search	In
`match` `string`	Value to search for; leave blank for a search for missing values	In
`prev` `long`	previous row with matching value, or -1	Out
`next` `long`	next row with matching value, or -1	Out

FrameKeyV3

`name` `string`	Name (string representation) for this Key.	In/Out
`type` `string`	Name (string representation) for the type of Keyed this Key points to.	In/Out
`URL` `string`	URL for the resource that this Key points to, if one exists.	In/Out

FrameV3

`frame_id` `Key`	Key to inspect	In
`row_offset` `long`	Row offset to display	In
`row_count` `int`	Number of rows to display	In/Out
`checksum` `long`	checksum	Out
`rows` `long`	Number of rows	Out
`byte_size` `long`	Total data size in bytes	Out
`is_text` `boolean`	Raw unparsed text	Out
`default_percentiles` `double[]`	Default percentiles, from 0 to 1	Out
`columns` `Vec[]`	Columns	Out
`compatible_models` `string[]`	Compatible models, if requested	Out
`vec_ids` `Key[]`	The set of IDs of vectors in the Frame	Out
`chunk_summary` `TwoDimTable`	Chunk summary	Out
`distribution_summary` `TwoDimTable`	Distribution summary	Out

FramesBase

`frame_id` `Key`	Name of Frame of interest	In
`column` `string`	Name of column of interest	In
`find_compatible_models` `boolean`	Find and return compatible models?	In
`path` `string`	File output path	In
`force` `boolean`	Overwrite existing fil	In
`row_offset` `long`	Row offset to display	In/Out
`row_count` `int`	Number of rows to display	In/Out
`frames` `Frame[]`	Frames	Out
`compatible_models` `Model[]`	Compatible models	Out
`domain` `string[][]`	Domains	Out

FramesV3

`frame_id` `Key`	Name of Frame of interest	In
`column` `string`	Name of column of interest	In
`find_compatible_models` `boolean`	Find and return compatible models?	In
`path` `string`	File output path	In
`force` `boolean`	Overwrite existing fil	In
`row_offset` `long`	Row offset to display	In/Out
`row_count` `int`	Number of rows to display	In/Out
`frames` `Frame[]`	Frames	Out
`compatible_models` `Model[]`	Compatible models	Out
`domain` `string[][]`	Domains	Out

GBMModelOutputV3

`variable_importances` `TwoDimTable`	Variable Importances	Out
`init_f` `double`	The Intercept term, the initial model function value to which trees make adjustments	Out
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

GBMModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `GBMParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `GBMOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

GBMParametersV3

`learn_rate` `float`	Learning rate from 0.0 to 1.0	In
`distribution` `enum`	Distribution function	In
`ntrees` `int`	Number of trees.	In
`max_depth` `int`	Maximum tree depth.	In
`min_rows` `int`	Fewest allowed observations in a leaf (in R called ‘nodesize’).	In
`nbins` `int`	Build a histogram of this many bins, then split at the best point	In
`seed` `long`	Seed for pseudo random number generator (if applicable)	In
`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

GBMV3

`parameters` `GBMParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

GLMModelOutputV3

`coefficients_table` `TwoDimTable`	Table of coefficients	In
`coefficients_magnitude` `TwoDimTable`	Coefficient magnitudes	In
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

GLMModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `GLMParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `GLMOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

GLMParametersV3

`family` `enum`	Family. Use binomial for classification with logistic regression, others are for regression problems.	In
`solver` `enum`	Auto will pick solver better suited for the given dataset, in case of lambda search solvers may be changed during computation. IRLSM is fast on on problems with small number of predictors and for lambda-search with L1 penalty, L_BFGS scales better for datasets with many columns.	In
`alpha` `double[]`	distribution of regularization between L1 and L2.	In
`lambda` `double[]`	regularization strength	In
`lambda_search` `boolean`	use lambda search starting at lambda max, given lambda is then interpreted as lambda min	In
`nlambdas` `int`	number of lambdas to be used in a search	In
`standardize` `boolean`	Standardize numeric columns to have zero mean and unit variance	In
`max_iterations` `int`	Maximum number of iterations	In
`beta_epsilon` `double`	beta esilon -> consider being converged if L1 norm of the current beta change is below this threshold	In
`link` `enum`	(No description available)	In
`prior` `double`	prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality.	In
`lambda_min_ratio` `double`	min lambda used in lambda search, specified as a ratio of lambda_max	In
`use_all_factor_levels` `boolean`	By default, first factor level is skipped from the possible set of predictors. Set this flag if you want use all of the levels. Needs sufficient regularization to solve!	In
`beta_constraints` `Key`	beta constraints	In
`max_active_predictors` `int`	Maximum number of active predictors during computation. Use as a stopping criterium to prevent expensive model building with many predictors.	In
`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

GLMV3

`parameters` `GLMParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

GrepModelOutputV3

`matches` `string[]`	Matching strings	In
`offsets` `long[]`	Byte offsets of matches	In
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

GrepModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `GrepParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `GrepOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

GrepParametersV3

`regex` `string`	regex	In
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

GrepV3

`parameters` `GrepParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

H2OErrorV3

`timestamp` `long`	Milliseconds since the epoch for the time that this H2OError instance was created. Generally this is a short time since the underlying error ocurred.	Out
`error_url` `string`	Error url	Out
`msg` `string`	Message intended for the end user (a data scientist).	Out
`dev_msg` `string`	Potentially more detailed message intended for a developer (e.g. a front end engineer or someone designing a language binding).	Out
`http_status` `int`	HTTP status code for this error.	Out
`values` `Map`	Any values that are relevant to reporting or handling this error. Examples are a key name if the error is on a key, or a field name and object name if it’s on a specific field.	Out
`exception_type` `string`	Exception type, if any.	Out
`exception_msg` `string`	Raw exception message, if any.	Out
`stacktrace` `string[]`	Stacktrace, if any.	Out

H2OModelBuilderErrorV3

`parameters` `Parameters`	Model builder parameters.	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out
`timestamp` `long`	Milliseconds since the epoch for the time that this H2OError instance was created. Generally this is a short time since the underlying error ocurred.	Out
`error_url` `string`	Error url	Out
`msg` `string`	Message intended for the end user (a data scientist).	Out
`dev_msg` `string`	Potentially more detailed message intended for a developer (e.g. a front end engineer or someone designing a language binding).	Out
`http_status` `int`	HTTP status code for this error.	Out
`values` `Map`	Any values that are relevant to reporting or handling this error. Examples are a key name if the error is on a key, or a field name and object name if it’s on a specific field.	Out
`exception_type` `string`	Exception type, if any.	Out
`exception_msg` `string`	Raw exception message, if any.	Out
`stacktrace` `string[]`	Stacktrace, if any.	Out

HeartBeatEvent

`sends` `int`	number of sent heartbeats	In
`recvs` `int`	number of received heartbeats	In
`date` `string`	Time when the event was recorded. Format is hh:mm:ss:ms	In
`nanos` `long`	Time in nanos	In
`type` `enum`	type of recorded event	In

IOEvent

`io_flavor` `string`	flavor of the recorded io (ice/hdfs/…)	In
`node` `string`	node where this io event happened	In
`data` `string`	data info	In
`date` `string`	Time when the event was recorded. Format is hh:mm:ss:ms	In
`nanos` `long`	Time in nanos	In
`type` `enum`	type of recorded event	In

ImportFilesV3

`path` `string`	path	In
`files` `string[]`	files	Out
`destination_frames` `string[]`	names	Out
`fails` `string[]`	fails	Out
`dels` `string[]`	dels	Out

InitIDV3

session_key
string Session ID Out

IoStatsEntry

`backend` `string`	Back end type	Out
`store_count` `long`	Number of store events	Out
`store_bytes` `long`	Cumulative stored bytes	Out
`delete_count` `long`	Number of delete events	Out
`load_count` `long`	Number of load events	Out
`load_bytes` `long`	Cumulative loaded bytes	Out

JStackV3

traces
DStackTrace[] Stacktraces Out

JobKeyV3

`name` `string`	Name (string representation) for this Key.	In/Out
`type` `string`	Name (string representation) for the type of Keyed this Key points to.	In/Out
`URL` `string`	URL for the resource that this Key points to, if one exists.	In/Out

JobV3

`key` `Key`	Job Key	In
`description` `string`	Job description	In
`dest` `Key`	destination key	In/Out
`status` `string`	job status	Out
`progress` `float`	progress, from 0 to 1	Out
`progress_msg` `string`	current progress status description	Out
`start_time` `long`	Start time	Out
`msec` `long`	runtime	Out
`exception` `string`	exception	Out

JobsV3

`job_id` `Key`	Optional Job identifier	In
`jobs` `Job[]`	jobs	Out

KMeansModelOutputV3

`centers` `TwoDimTable`	Cluster Centers[k][features]	In
`centers_std` `TwoDimTable`	Cluster Centers[k][features] on Standardized Data	In
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

KMeansModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `KMeansParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `KMeansOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

KMeansParametersV3

`user_points` `Key`	User-specified points	In
`max_iterations` `int`	Maximum training iterations	In
`standardize` `boolean`	Standardize columns	In
`seed` `long`	RNG Seed	In
`init` `enum`	Initialization mode	In
`k` `int`	Number of clusters	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

KMeansV3

`parameters` `KMeansParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

KeyV3

`name` `string`	Name (string representation) for this Key.	In/Out
`type` `string`	Name (string representation) for the type of Keyed this Key points to.	In/Out
`URL` `string`	URL for the resource that this Key points to, if one exists.	In/Out

KillMinus3V3

(No fields)

LogAndEchoV3

message
string Message to be Logged and Echoed In

LogsV3

`nodeidx` `int`	Index of node to query ticks for (0-based). -1 means current node.	In
`name` `string`	Which specific log file to read from the log file directory. If left unspecified, the system chooses a default for you.	In
`log` `string`	Content of log file	Out

MakeGLMModelV3

`model` `Key`	source model	In
`dest` `Key`	destination key	In
`names` `string[]`	coefficient names	In
`beta` `double[]`	new glm coefficients	In
`threshold` `float`	decision threshold for label-generation	In

MissingInserterV3

`dataset` `Key`	dataset	In
`fraction` `double`	Fraction of data to replace with a missing value	In
`seed` `long`	Seed	In
`key` `Key`	Job Key	In
`description` `string`	Job description	In
`dest` `Key`	destination key	In/Out
`status` `string`	job status	Out
`progress` `float`	progress, from 0 to 1	Out
`progress_msg` `string`	current progress status description	Out
`start_time` `long`	Start time	Out
`msec` `long`	runtime	Out
`exception` `string`	exception	Out

ModelBuilderSchema

`parameters` `Parameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

ModelBuildersBase

`algo` `string`	Algo of ModelBuilder of interest	In
`model_builders` `Map`	ModelBuilders	Out

ModelBuildersV3

`algo` `string`	Algo of ModelBuilder of interest	In
`model_builders` `Map`	ModelBuilders	Out

ModelKeyV3

`name` `string`	Name (string representation) for this Key.	In/Out
`type` `string`	Name (string representation) for the type of Keyed this Key points to.	In/Out
`URL` `string`	URL for the resource that this Key points to, if one exists.	In/Out

ModelMetricsAutoEncoderV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsBase

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsBinomialGLMV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`residual_deviance` `double`	residual deviance	Out
`null_deviance` `double`	null deviance	Out
`AIC` `double`	AIC	Out
`null_degrees_of_freedom` `long`	null DOF	Out
`residual_degrees_of_freedom` `long`	residual DOF	Out
`r2` `double`	The R^2 for this scoring run.	Out
`logloss` `double`	The logarithmic loss for this scoring run.	Out
`AUC` `double`	The AUC for this scoring run.	Out
`Gini` `double`	The Gini score for this scoring run.	Out
`thresholds_and_metric_scores` `TwoDimTable`	The Metrics for various thresholds.	Out
`max_criteria_and_metric_scores` `TwoDimTable`	The Metrics for various criteria.	Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsBinomialV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`r2` `double`	The R^2 for this scoring run.	Out
`logloss` `double`	The logarithmic loss for this scoring run.	Out
`AUC` `double`	The AUC for this scoring run.	Out
`Gini` `double`	The Gini score for this scoring run.	Out
`thresholds_and_metric_scores` `TwoDimTable`	The Metrics for various thresholds.	Out
`max_criteria_and_metric_scores` `TwoDimTable`	The Metrics for various criteria.	Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsClusteringV3

`avg_within_ss` `double`	Average within cluster Mean Square Error	In
`avg_ss` `double`	Average Mean Square Error to grand mean	In
`avg_between_ss` `double`	Average between cluster Mean Square Error	In
`centroid_stats` `TwoDimTable`	Centroid Statistics	In
`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsListSchemaV3

`model` `Key`	Key of Model of interest (optional)	In
`frame` `Key`	Key of Frame of interest (optional)	In
`reconstruction_error` `boolean`	Compute reconstruction error (optional, only for Deep Learning AutoEncoder models)	In
`deep_features_hidden_layer` `int`	Extract Deep Features for given hidden layer (optional, only for Deep Learning models)	In
`predictions_frame` `Key`	Key of predictions frame, if predictions are requested (optional)	In/Out
`model_metrics` `ModelMetrics[]`	ModelMetrics	Out

ModelMetricsMultinomialV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`r2` `double`	The R^2 for this scoring run.	Out
`hit_ratio_table` `TwoDimTable`	The hit ratio table for this scoring run.	Out
`cm` `ConfusionMatrix`	The ConfusionMatrix object for this scoring run.	Out
`logloss` `double`	The logarithmic loss for this scoring run.	Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsPCAV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsRegressionGLMV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`residual_deviance` `double`	residual deviance	Out
`null_deviance` `double`	null deviance	Out
`AIC` `double`	AIC	Out
`null_degrees_of_freedom` `long`	null DOF	Out
`residual_degrees_of_freedom` `long`	residual DOF	Out
`r2` `double`	The R^2 for this scoring run.	Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelMetricsRegressionV3

`model` `Key`	The model used for this scoring run.	In/Out
`model_checksum` `long`	The checksum for the model used for this scoring run.	In/Out
`frame` `Key`	The frame used for this scoring run.	In/Out
`frame_checksum` `long`	The checksum for the frame used for this scoring run.	In/Out
`r2` `double`	The R^2 for this scoring run.	Out
`description` `string`	Optional description for this scoring run (to note out-of-bag, sampled data, etc.)	Out
`model_category` `enum`	The category (e.g., Clustering) for the model used for this scoring run.	Out
`duration_in_ms` `long`	The duration in mS for this scoring run.	Out
`scoring_time` `long`	The time in mS since the epoch for the start of this scoring run.	Out
`predictions` `Frame`	Predictions Frame.	Out
`MSE` `double`	The Mean Squared Error of the prediction for this scoring run.	Out

ModelOutputSchema

`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

ModelParameterSchemaV3

`is_member_of_frames` `string[]`	For Vec-type fields this is the set of other Vec-type fields which must contain mutually exclusive values; for example, for a SupervisedModel the response_column must be mutually exclusive with the weights_column	In
`is_mutually_exclusive_with` `string[]`	For Vec-type fields this is the set of Frame-type fields which must contain the named column; for example, for a SupervisedModel the response_column must be in both the training_frame and (if it’s set) the validation_frame	In
`name` `string`	name in the JSON, e.g. “lambda”	Out
`label` `string`	label in the UI, e.g. “lambda”	Out
`help` `string`	help for the UI, e.g. “regularization multiplier, typically used for foo bar baz etc.”	Out
`required` `boolean`	the field is required	Out
`type` `string`	Java type, e.g. “double”	Out
`default_value` `Polymorphic`	default value, e.g. 1	Out
`actual_value` `Polymorphic`	actual value as set by the user and / or modified by the ModelBuilder, e.g., 10	Out
`level` `string`	the importance of the parameter, used by the UI, e.g. “critical”, “extended” or “expert”	Out
`values` `string[]`	list of valid values for use by the front-end	Out

ModelParametersSchema

`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

ModelSchema

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `Parameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `Output`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

ModelsBase

`model_id` `Key`	Name of Model of interest	In
`preview` `boolean`	Return potentially abridged model suitable for viewing in a browser	In
`find_compatible_frames` `boolean`	Find and return compatible frames?	In
`models` `Model[]`	Models	Out
`compatible_frames` `Frame[]`	Compatible frames	Out

ModelsV3

`model_id` `Key`	Name of Model of interest	In
`preview` `boolean`	Return potentially abridged model suitable for viewing in a browser	In
`find_compatible_frames` `boolean`	Find and return compatible frames?	In
`models` `Model[]`	Models	Out
`compatible_frames` `Frame[]`	Compatible frames	Out

NaiveBayesModelOutputV3

`levels` `string[]`	Categorical levels of the response	In
`apriori` `TwoDimTable`	A-priori probabilities of the response	In
`pcond` `TwoDimTable[]`	Conditional probabilities of the predictors	In
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

NaiveBayesModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `NaiveBayesParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `NaiveBayesOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

NaiveBayesParametersV3

`laplace` `double`	Laplace smoothing parameter	In
`min_sdev` `double`	Min. standard deviation to use for observations with not enough data	In
`eps_sdev` `double`	Cutoff below which standard deviation is replaced with min_sdev	In
`min_prob` `double`	Min. probability to use for observations with not enough data	In
`eps_prob` `double`	Cutoff below which probability is replaced with min_prob	In
`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

NaiveBayesV3

`parameters` `NaiveBayesParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

NetworkEvent

`is_send` `boolean`	Boolean flag distinguishing between sends (true) and receives(false)	In
`protocol` `string`	network protocol (UDP/TCP)	In
`msg_type` `string`	UDP type (exec,ack, ackack,…	In
`from` `string`	Sending node	In
`to` `string`	Receiving node	In
`data` `string`	Pretty print of the first few bytes of the msg payload. Contains class name for tasks.	In
`date` `string`	Time when the event was recorded. Format is hh:mm:ss:ms	In
`nanos` `long`	Time in nanos	In
`type` `enum`	type of recorded event	In

NetworkTestV3

`microseconds_collective` `double[]`	Collective broadcast/reduce times in microseconds (for each message size)	Out
`bandwidths_collective` `double[]`	Collective bandwidths in Bytes/sec (for each message size, for each node)	Out
`microseconds` `double[][]`	Round-trip times in microseconds (for each message size, for each node)	Out
`bandwidths` `double[][]`	Bi-directional bandwidths in Bytes/sec (for each message size, for each node)	Out
`nodes` `string[]`	Nodes	Out
`table` `TwoDimTable`	NetworkTestResults	Out

NodePersistentStorageEntryV3

`category` `string`	Category name	Out
`name` `string`	Key name	Out
`size` `long`	Size in bytes of value	Out
`timestamp_millis` `long`	Epoch time in milliseconds of when the value was written	Out

NodePersistentStorageV3

`category` `string`	Category name	In/Out
`name` `string`	Key name	In/Out
`value` `string`	Value	In/Out
`configured` `boolean`	Configured	Out
`exists` `boolean`	Exists	Out
`entries` `Iced[]`	List of entries	Out

NodeV1

`h2o` `string`	IP	Out
`ip_port` `string`	IP address and port in the form a.b.c.d:e	Out
`healthy` `boolean`	(now-last_ping)<HeartbeatThread.TIMEOUT	Out
`last_ping` `long`	Time (in msec) of last ping	Out
`sys_load` `float`	System load; average #runnables/#cores	Out
`gflops` `double`	Linpack GFlops	Out
`mem_bw` `double`	Memory Bandwidth	Out
`total_value_size` `long`	Data on Node (memory or disk)	Out
`mem_value_size` `long`	Data on Node (memory only)	Out
`num_keys` `int`	id="local-keys">local keys<	Out
`free_mem` `long`	Free heap	Out
`tot_mem` `long`	Total heap	Out
`max_mem` `long`	Max heap	Out
`free_disk` `long`	Free disk	Out
`max_disk` `long`	Max disk	Out
`rpcs_active` `int`	Active Remote Procedure Calls	Out
`fjthrds` `short[]`	F/J Thread count, by priority	Out
`fjqueue` `short[]`	F/J Task count, by priority	Out
`tcps_active` `int`	Open TCP connections	Out
`open_fds` `int`	Open File Descripters	Out
`num_cpus` `int`	num_cpus	Out
`cpus_allowed` `int`	cpus_allowed	Out
`nthreads` `int`	nthreads	Out
`my_cpu_pct` `int`	System CPU percentage used by this H2O process in last interval	Out
`sys_cpu_pct` `int`	System CPU percentage used by everything in last interval	Out
`pid` `string`	PID	Out

PCAModelOutputV3

`iterations` `int`	Iterations executed	In
`archetypes` `double[][]`	Mapping from training data to lower dimensional k-space	In
`std_deviation` `double[]`	Standard deviation of each principal component	In
`eigenvectors` `TwoDimTable`	Principal components matrix	In
`pc_importance` `TwoDimTable`	Importance of each principal component	In
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

PCAModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `PCAParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `PCAOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

PCAParametersV3

`transform` `enum`	Transformation of training data	In
`k` `int`	Rank of matrix approximation	In
`gamma` `double`	Regularization weight	In
`max_iterations` `int`	Maximum training iterations	In
`seed` `long`	RNG seed for k-means++ initialization	In
`init` `enum`	Initialization mode	In
`user_points` `Key`	User-specified initial Y	In
`loading_key` `Key`	Frame key to save resulting X	In
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

PCAV3

`parameters` `PCAParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

ParseSetupV3

`source_frames` `Key[]`	Source frames	In/Out
`parse_type` `enum`	Parser type	In/Out
`separator` `byte`	Field separator	In/Out
`single_quotes` `boolean`	Single quotes	In/Out
`check_header` `int`	Check header: 0 means guess, +1 means 1st line is header not data, -1 means 1st line is data not header	In/Out
`column_names` `string[]`	Column names	In/Out
`column_types` `string[]`	Value types for columns	In/Out
`na_strings` `string[]`	NA strings for columns	In/Out
`destination_frame` `string`	Suggested name	Out
`is_valid` `boolean`	The initial parse is sane	Out
`invalid_lines` `long`	Number of broken/invalid lines found	Out
`header_lines` `long`	Number of header lines found	Out
`number_columns` `int`	Number of columns	Out
`domains` `string[][]`	Domains for categorical columns	Out
`data` `string[][]`	Sample data	Out
`chunk_size` `int`	Size of individual parse tasks	Out

ParseV3

`destination_frame` `Key`	Final frame name	In
`source_frames` `Key[]`	Source frames	In
`parse_type` `enum`	Parser type	In
`separator` `byte`	Field separator	In
`single_quotes` `boolean`	Single Quotes	In
`check_header` `int`	Check header: 0 means guess, +1 means 1st line is header not data, -1 means 1st line is data not header	In
`number_columns` `int`	Number of columns	In
`column_names` `string[]`	Column names	In
`column_types` `string[]`	Value types for columns	In
`domains` `string[][]`	Domains for categorical columns	In
`na_strings` `string[]`	NA strings for columns	In
`chunk_size` `int`	Size of individual parse tasks	In
`delete_on_done` `boolean`	Delete input key after parse	In
`blocking` `boolean`	Block until the parse completes (as opposed to returning early and requiring polling	In
`remove_frame` `boolean`	Remove frame after blocking parse, and return array of Vecs	In
`job` `Job`	Parse job	Out
`rows` `long`	Rows	Out
`vec_ids` `Key[]`	Vec IDs	Out

ProfilerNodeEntryV3

`stacktrace` `string`	Stack trace	Out
`count` `int`	Profile Count	Out

ProfilerNodeV3

`node_name` `string`	Node names	Out
`timestamp` `long`	Timestamp (millis since epoch)	Out
`entries` `Iced[]`	Profile entry list	Out

ProfilerV3

`depth` `int`	Stack trace depth	In
`nodes` `Iced[]`	(No description available)	Out

QuantileParametersV2

`probs` `double[]`	Probabilities for quantiles	In
`combine_method` `enum`	How to combine quantiles for even sample sizes	In
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

QuantileV3

`parameters` `QuantileParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

RapidsV3

`ast` `string`	An Abstract Syntax Tree.	In
`fun` `string`	An array of function definitions.	In
`ast_key` `Key`	A pointer to a Frame	In
`error` `string`	Parsing error, if any	Out
`key` `Key`	Result key	Out
`num_rows` `long`	Rows in Frame result	Out
`num_cols` `int`	Columns in Frame result	Out
`scalar` `double`	Scalar result	Out
`funstr` `string`	Function result	Out
`col_names` `string[]`	Column Names	Out
`string` `string`	String result	Out
`result` `string`	result	Out
`evaluated` `boolean`	Was evaluated	Out
`head` `string[][]`	Head of a Frame result	Out
`result_type` `int`	Result Type.	Out
`vec_ids` `Key[]`	Vec keys for key result	Out

RemoveAllV3

(No fields)

RemoveV3

key
Key Object to be removed. In

RouteBase

`http_method` `string`	(No description available)	Out
`url_pattern` `string`	(No description available)	Out
`summary` `string`	(No description available)	Out
`handler_class` `string`	(No description available)	Out
`handler_method` `string`	(No description available)	Out
`input_schema` `string`	(No description available)	Out
`output_schema` `string`	(No description available)	Out
`doc_method` `string`	(No description available)	Out
`path_params` `string[]`	(No description available)	Out
`markdown` `string`	(No description available)	Out

RouteV3

`http_method` `string`	(No description available)	Out
`url_pattern` `string`	(No description available)	Out
`summary` `string`	(No description available)	Out
`handler_class` `string`	(No description available)	Out
`handler_method` `string`	(No description available)	Out
`input_schema` `string`	(No description available)	Out
`output_schema` `string`	(No description available)	Out
`doc_method` `string`	(No description available)	Out
`path_params` `string[]`	(No description available)	Out
`markdown` `string`	(No description available)	Out

Schema

(No fields)

SchemaMetadataBase

`version` `int`	Version number of the Schema.	In
`name` `string`	Simple name of the Schema. NOTE: the schema_names form a single namespace.	In
`superclass` `string`	Simple name of the superclass of the Schema. NOTE: the schema_names form a single namespace.	In
`type` `string`	Simple name of H2O type that this Schema represents. Must not be changed after creation (treat as final).	In
`fields` `FieldMetadata[]`	All the public fields of the schema	Out
`markdown` `string`	Documentation for the schema in Markdown format with GitHub extensions	Out

SchemaMetadataV3

`version` `int`	Version number of the Schema.	In
`name` `string`	Simple name of the Schema. NOTE: the schema_names form a single namespace.	In
`superclass` `string`	Simple name of the superclass of the Schema. NOTE: the schema_names form a single namespace.	In
`type` `string`	Simple name of H2O type that this Schema represents. Must not be changed after creation (treat as final).	In
`fields` `FieldMetadata[]`	All the public fields of the schema	Out
`markdown` `string`	Documentation for the schema in Markdown format with GitHub extensions	Out

SharedTreeModelOutputV3

`variable_importances` `TwoDimTable`	Variable Importances	Out
`init_f` `double`	The Intercept term, the initial model function value to which trees make adjustments	Out
`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

SharedTreeModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `Parameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `Output`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

SharedTreeParametersV3

`ntrees` `int`	Number of trees.	In
`max_depth` `int`	Maximum tree depth.	In
`min_rows` `int`	Fewest allowed observations in a leaf (in R called ‘nodesize’).	In
`nbins` `int`	Build a histogram of this many bins, then split at the best point	In
`seed` `long`	Seed for pseudo random number generator (if applicable)	In
`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

SharedTreeV3

`parameters` `Parameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

ShutdownV3

(No fields)

SplitFrameV3

`dataset` `Key`	Dataset	In
`ratios` `double[]`	Split ratios - resulting number of split is ratios.length+1	In
`key` `Key`	Job Key	In
`description` `string`	Job description	In
`destination_frames` `Key[]`	Destination keys for each output frame split.	In/Out
`dest` `Key`	destination key	In/Out
`status` `string`	job status	Out
`progress` `float`	progress, from 0 to 1	Out
`progress_msg` `string`	current progress status description	Out
`start_time` `long`	Start time	Out
`msec` `long`	runtime	Out
`exception` `string`	exception	Out

SupervisedModelBuilderSchema

`parameters` `Parameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

SupervisedModelParametersSchema

`response_column` `VecSpecifier`	Response column	In/Out
`balance_classes` `boolean`	Balance training data class counts via over/under-sampling (for imbalanced data).	In/Out
`class_sampling_factors` `float[]`	Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.	In/Out
`max_after_balance_size` `float`	Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.	In/Out
`max_confusion_matrix_size` `int`	Maximum size (# classes) for confusion matrices to be printed in the Logs	In/Out
`max_hit_ratio_k` `int`	Max. number (top K) of predictions to use for hit ratio computation (for multi-class only, 0 to disable)	In/Out
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

SynonymV3

`key` `Key`	A word2vec model key.	In
`target` `string`	The target string to find synonyms.	In
`cnt` `int`	Find the top `cnt` synonyms of the target word.	In
`synonyms` `string[]`	The synonyms.	Out
`cos_sim` `float[]`	The cosine similarities.	Out

TimelineV3

`now` `long`	Current time in millis.	Out
`self` `string`	This node	Out
`events` `Iced[]`	recorded timeline events	Out

TreeStatsV3

`min_depth` `int`	minDepth	In
`max_depth` `int`	maxDepth	In
`mean_depth` `float`	meanDepth	In
`min_leaves` `int`	minLeaves	In
`max_leaves` `int`	maxLeaves	In
`mean_leaves` `float`	meanLeaves	In

TutorialsV3

(No fields)

TwoDimTableBase

`name` `string`	Table Name	Out
`description` `string`	Table Description	Out
`columns` `Iced[]`	Column Specification	Out
`rowcount` `int`	Number of Rows	Out
`data` `Polymorphic[][]`	Table Data (col-major)	Out

TwoDimTableV3

`name` `string`	Table Name	Out
`description` `string`	Table Description	Out
`columns` `Iced[]`	Column Specification	Out
`rowcount` `int`	Number of Rows	Out
`data` `Polymorphic[][]`	Table Data (col-major)	Out

TypeaheadV3

`src` `string`	training_frame	In
`limit` `int`	limit	In
`matches` `string[]`	matches	Out

UnlockKeysV3

(No fields)

ValidationMessageBase

`message_type` `string`	Type of validation message (ERROR, WARN, INFO, HIDE)	Out
`field_name` `string`	Field to which the message applies	Out
`message` `string`	Message text	Out

ValidationMessageV2

`message_type` `string`	Type of validation message (ERROR, WARN, INFO, HIDE)	Out
`field_name` `string`	Field to which the message applies	Out
`message` `string`	Message text	Out

VarImpBase

`varimp` `float[]`	Variable importance of individual variables	Out
`names` `string[]`	Names of variables	Out

VarImpV3

`varimp` `float[]`	Variable importance of individual variables	Out
`names` `string[]`	Names of variables	Out

VecKeyV3

`name` `string`	Name (string representation) for this Key.	In/Out
`type` `string`	Name (string representation) for the type of Keyed this Key points to.	In/Out
`URL` `string`	URL for the resource that this Key points to, if one exists.	In/Out

WaterMeterCpuTicksV3

`nodeidx` `int`	Index of node to query ticks for (0-based)	In
`cpu_ticks` `long[][]`	array of tick counts per core	Out

WaterMeterIoV3

`nodeidx` `int`	Index of node to query ticks for (0-based)	In
`persist_stats` `Iced[]`	array of IO info	Out

Word2VecModelOutputV3

`names` `string[]`	Column names.	Out
`domains` `string[][]`	Domains for categorical (enum) columns.	Out
`model_category` `enum`	Category of the model (e.g., Binomial).	Out
`model_summary` `TwoDimTable`	Model summary	Out
`scoring_history` `TwoDimTable`	Scoring history	Out
`training_metrics` `ModelMetrics`	Training data model metrics	Out
`validation_metrics` `ModelMetrics`	Validation data model metrics	Out
`help` `Map`	Help information for output fields	Out

Word2VecModelV3

`model_id` `Key`	Model key	In/Out
`algo` `string`	The algo name for this Model.	Out
`algo_full_name` `string`	The pretty algo name for this Model (e.g., Generalized Linear Model, rather than GLM).	Out
`parameters` `Word2VecParameters`	The build parameters for the model (e.g. K for KMeans).	Out
`output` `Word2VecOutput`	The build output for the model (e.g. the cluster centers for KMeans).	Out
`compatible_frames` `string[]`	Compatible frames, if requested	Out
`checksum` `long`	Checksum for all the things that go into building the Model.	Out

Word2VecParametersV3

`vecSize` `int`	Set size of word vectors	In
`windowSize` `int`	Set max skip length between words	In
`sentSampleRate` `float`	Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5)	In
`normModel` `enum`	Use Hierarchical Softmax or Negative Sampling	In
`negSampleCnt` `int`	Number of negative examples, common values are 3 - 10 (0 = not used)	In
`epochs` `int`	Number of training iterations to run	In
`minWordFreq` `int`	This will discard words that appear less than times	In
`initLearningRate` `float`	Set the starting learning rate	In
`wordModel` `enum`	Use the continuous bag of words model or the Skip-Gram model	In
`model_id` `Key`	Destination id for this model; auto-generated if not specified	In/Out
`training_frame` `Key`	Training frame	In/Out
`validation_frame` `Key`	Validation frame	In/Out
`ignored_columns` `string[]`	Ignored columns	In/Out
`drop_na20_cols` `boolean`	Drop columns with more than 20% missing values	In/Out
`score_each_iteration` `boolean`	Whether to score during each iteration of model training	In/Out

Word2VecV3

`parameters` `Word2VecParameters`	Model builder parameters.	In
`__http_status` `int`	HTTP status to return for this build.	In
`algo` `string`	The algo name for this ModelBuilder.	Out
`algo_full_name` `string`	The pretty algo name for this ModelBuilder (e.g., Generalized Linear Model, rather than GLM).	Out
`can_build` `enum[]`	Model categories this ModelBuilder can build.	Out
`job` `Job`	Job Key	Out
`validation_messages` `ValidationMessage[]`	Parameter validation messages	Out
`validation_error_count` `int`	Count of parameter validation errors	Out

H2O v0.0.0

Contents

Reference

Introduction

About H2O Flow

Getting Help

Understanding Cell Modes

Using Edit Mode

Using Command Mode

Running Flows

Using Keyboard Shortcuts

Using Flow Buttons

Importing Data

Uploading Data

Parsing Data

Viewing Jobs

Viewing All Jobs

Viewing Specific Jobs

Building Models

Viewing Models

Making Predictions

Viewing Predictions

Viewing Frames

Splitting Frames

Creating Frames

Plotting Frames

Using Clips

Viewing Outlines

Saving Flows

Finding Saved Flows on your Disk

Saving Flows on a Hadoop cluster

Duplicating Flows

Downloading Flows

Loading Flows

Troubleshooting

Viewing Cluster Status

Viewing CPU Status (Water Meter)

Viewing Logs

Downloading Logs

Viewing Stack Trace Information

Viewing Network Test Results

Accessing the Profiler

Viewing the Timeline

Shutting Down H2O

How to Launch H2O-Dev from the Command Line

JVM Options

H2O Options

Cloud Formation Behavior

Flatfile Configuration

H2O-Dev on EC2

Launch H2O-Dev

Selecting the Operating System and Virtualization Type

Configuring the Instance

(Windows Users) Tunneling into the Instance

Downloading Java and H2O

Launch H2O-Dev from the Command Line

Important Notes

Step-by-Step Walk-Through

Running H2O-Dev on Hadoop

Prerequisite: Open Communication Paths

Tutorial

Hadoop Launch Parameters

REST API Reference

GET /3/About

GET /3/Cloud

HEAD /3/Cloud

POST /3/CreateFrame

DELETE /3/DKV

DELETE /3/DKV/(?.*)

GET /3/DownloadDataset

GET /3/Find

GET /3/Frames

DELETE /3/Frames

GET /3/Frames/(?.*)

DELETE /3/Frames/(?.*)

GET /3/Frames/(?.*)/columns

GET /3/Frames/(?.*)/columns/(?.*)

GET /3/Frames/(?.*)/columns/(?.*)/domain

GET /3/Frames/(?.*)/columns/(?.*)/summary

GET /3/Frames/(?.*)/export/(?.*)/overwrite/(?.*)

H₂O ^v0.0.0

GET /3/Frames/(?.)/columns/(?.)

GET /3/Frames/(?.)/columns/(?.)/domain

GET /3/Frames/(?.)/columns/(?.)/summary

GET /3/Frames/(?.)/export/(?.)/overwrite/(?.*)

GET /3/Logs/nodes/(?.)/files/(?.)

GET /3/ModelMetrics/frames/(?.)/models/(?.)

DELETE /3/ModelMetrics/frames/(?.)/models/(?.)

GET /3/ModelMetrics/models/(?.)/frames/(?.)

DELETE /3/ModelMetrics/models/(?.)/frames/(?.)

POST /3/ModelMetrics/models/(?.)/frames/(?.)

POST /3/NodePersistentStorage/(?.)/(?.)

GET /3/NodePersistentStorage/(?.)/(?.)

DELETE /3/NodePersistentStorage/(?.)/(?.)

GET /3/NodePersistentStorage/categories/(?.)/names/(?.)/exists

POST /3/Predictions/models/(?.)/frames/(?.)