The purpose of this tutorial is to walk a new user through a Random Forest analysis from beginning to end. By the end of this tutorial, the user should know how to specify, run, and interpret a Random Forest model.
Those who have never used H2O before should see the quick start guide for instructions on running H2O on their computer.
This tutorial uses the publicly available Internet ads data set, which can be found at
http://archive.ics.uci.edu/ml/machine-learning-databases/internet_ads/
The data are composed of 3279 observations, 1557 attributes, and an a priori grouping assignment. The objective is to build a prediction tool that predicts whether or not an object is an internet ad.
The RF output of main interest is a confusion matrix detailing the classification error rate for each level of the target variable. In addition to the confusion matrix, the output includes the overall classification error, the number of trees, and descriptive statistics on data use.
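To make the relationship between a confusion matrix and its error rates concrete, here is a minimal sketch in Python. It is not H2O code: the function name per_class_error is hypothetical, the counts are made up for illustration, and it assumes that rows hold the actual classes and columns hold the predicted classes, which may differ from the layout H2O displays.

```python
def per_class_error(matrix, labels):
    """Return ({label: error rate}, overall error) for a square confusion matrix.

    Assumption: matrix[i][j] counts observations whose actual class is
    labels[i] and whose predicted class is labels[j].
    """
    errors = {}
    total = 0
    total_wrong = 0
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        # Everything off the diagonal in this row was misclassified.
        wrong = row_total - matrix[i][i]
        errors[label] = wrong / row_total if row_total else 0.0
        total += row_total
        total_wrong += wrong
    overall = total_wrong / total if total else 0.0
    return errors, overall


# Made-up counts for illustration only; these are not results from the ads data.
labels = ["ad", "nonad"]
matrix = [
    [80, 20],   # actual ad: 80 predicted ad, 20 predicted nonad
    [10, 390],  # actual nonad: 10 predicted ad, 390 predicted nonad
]
errors, overall = per_class_error(matrix, labels)
print(errors)
print(overall)
```

With these illustrative counts, 20 of 100 actual ads and 10 of 400 actual non-ads are misclassified, so the per-class error is much higher for the smaller class than the overall error suggests. This is why the per-level error rates are worth reading alongside the overall classification error.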
The full RF inspect output also lists the user-chosen tuning parameters at the top of the RFView page. The top of the page also offers an option to go directly to generating predictions for another data set.
To generate a prediction, click the Predict! link at the top of the RFView page. This function can also be reached by opening the Score drop-down menu and choosing predict.
Using the predict function requires the .hex key associated with a model. To find this key, open the Admin drop-down menu and select Jobs.
All jobs created in the current instance of H2O are listed here. Find the appropriate job (here labeled “Random Forest 150 Trees”). Copy the associated key to the clipboard and paste it into the model key field on the Request Generate Predictions page. For the data, enter the .hex key of a parsed data set other than the one used to build the model.