Builds a Decision Tree model on an H2OFrame.
h2o.decision_tree(
  x,
  y,
  training_frame,
  model_id = NULL,
  ignore_const_cols = TRUE,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
    "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  seed = -1,
  max_depth = 20,
  min_rows = 10
)
Argument | Description |
---|---|
x | (Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. |
y | The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model. |
training_frame | Id of the training data frame. |
model_id | Destination id for this model; auto-generated if not specified. |
ignore_const_cols | Logical. Ignore constant columns. Defaults to TRUE. |
categorical_encoding | Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO. |
seed | Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number). |
max_depth | Max depth of tree. Defaults to 20. |
min_rows | Fewest allowed (weighted) observations in a leaf. Defaults to 10. |
Creates an H2OModel object of the right type.
predict.H2OModel for prediction
# NOT RUN {
library(h2o)
h2o.init()

# Import the prostate dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv"
data <- h2o.importFile(f)

# Set predictors and response; set response as a factor
data["CAPSULE"] <- as.factor(data["CAPSULE"])
predictors <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")
response <- "CAPSULE"

# Train the DT model
h2o_dt <- h2o.decision_tree(x = predictors, y = response,
                            training_frame = data, seed = 1234)
# }
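A possible follow-up to the example above is sketched here: it holds out part of the prostate data, trains a shallower tree by overriding `max_depth` and `min_rows`, and then scores the held-out rows. The 80/20 split and the values `max_depth = 5` and `min_rows = 20` are illustrative assumptions, not recommendations from this page. The sketch assumes a running H2O cluster and network access to the dataset URL.

```r
library(h2o)
h2o.init()

# Import the prostate dataset (same file as in the example above)
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv"
data <- h2o.importFile(f)

# Treat the response as a factor so a classification model is trained
data["CAPSULE"] <- as.factor(data["CAPSULE"])
predictors <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")
response <- "CAPSULE"

# Assumption: an 80/20 train/test split, chosen only for illustration
splits <- h2o.splitFrame(data, ratios = 0.8, seed = 1234)
train <- splits[[1]]
test <- splits[[2]]

# Train a shallower tree with larger leaves than the defaults (20 and 10)
h2o_dt <- h2o.decision_tree(x = predictors, y = response,
                            training_frame = train,
                            max_depth = 5, min_rows = 20, seed = 1234)

# Score the held-out rows and inspect the first few predictions
pred <- h2o.predict(h2o_dt, newdata = test)
head(pred)

# Summarize performance on the held-out rows
perf <- h2o.performance(h2o_dt, newdata = test)
h2o.auc(perf)
```

Lowering `max_depth` or raising `min_rows` limits how finely the tree can partition the data, which is the usual way to trade training fit for simpler trees.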