Partial Dependence Plots

Partial dependence plot gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured in change in the mean response. Note: Unlike randomForest's partialPlot when plotting partial dependence the mean response (probabilities) is returned rather than the mean of the log class probability.

h2o.partialPlot(
  object,
  data,
  cols,
  destination_key,
  nbins = 20,
  plot = TRUE,
  plot_stddev = TRUE,
  weight_column = -1,
  include_na = FALSE,
  user_splits = NULL,
  col_pairs_2dpdp = NULL,
  save_to = NULL,
  row_index = -1
)

Arguments

object	An H2OModel object.
data	An H2OFrame object used for scoring and constructing the plot.
cols	Feature(s) for which partial dependence will be calculated.
destination_key	An key reference to the created partial dependence tables in H2O.
nbins	Number of bins used. For categorical columns make sure the number of bins exceeds the level count. If you enable add_missing_NA, the returned length will be nbin+1.
plot	A logical specifying whether to plot partial dependence table.
plot_stddev	A logical specifying whether to add std err to partial dependence plot.
weight_column	A string denoting which column of data should be used as the weight column.
include_na	A logical specifying whether missing value should be included in the Feature values.
user_splits	A two-level nested list containing user defined split points for pdp plots for each column. If there are two columns using user defined split points, there should be two lists in the nested list. Inside each list, the first element is the column name followed by values defined by the user.
col_pairs_2dpdp	A two-level nested list like this: col_pairs_2dpdp = list(c("col1_name", "col2_name"), c("col1_name","col3_name"), ...,) where a 2D partial plots will be generated for col1_name, col2_name pair, for col1_name, col3_name pair and whatever other pairs that are specified in the nested list.
save_to	Fully qualified prefix of the image files the resulting plots should be saved to, e.g. '/home/user/pdp'. Plots for each feature are saved separately in PNG format, each file receives a suffix equal to the corresponding feature name, e.g. `/home/user/pdp_AGE.png`. If the files already exists, they will be overridden. Files are only saves if plot = TRUE (default).
row_index	Row for which partial dependence will be calculated instead of the whole input frame.

Value

Plot and list of calculated mean response tables for each feature requested.

Examples

# NOT RUN {
library(h2o)
h2o.init()
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
prostate[, "CAPSULE"] <- as.factor(prostate[, "CAPSULE"] )
prostate[, "RACE"] <- as.factor(prostate[,"RACE"] )
prostate_gbm <- h2o.gbm(x = c("AGE","RACE"),
                        y = "CAPSULE",
                        training_frame = prostate,
                        ntrees = 10,
                        max_depth = 5,
                        learn_rate = 0.1)
h2o.partialPlot(object = prostate_gbm, data = prostate, cols = c("AGE", "RACE"))
# }

Arguments

Value

Examples

Contents