R/extendedisolationforest.R
h2o.extendedIsolationForest.Rd
Trains an Extended Isolation Forest model
h2o.extendedIsolationForest(
  training_frame,
  x,
  model_id = NULL,
  ignore_const_cols = TRUE,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
    "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  ntrees = 100,
  sample_size = 256,
  extension_level = 0,
  seed = -1
)
Argument | Description |
---|---|
training_frame | Id of the training data frame. |
x | A vector containing the character names of the predictors in the model. |
model_id | Destination id for this model; auto-generated if not specified. |
ignore_const_cols | Logical. Ignore constant columns. Defaults to TRUE. |
categorical_encoding | Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO. |
ntrees | Number of Extended Isolation Forest trees. Defaults to 100. |
sample_size | Number of randomly sampled observations used to train each Extended Isolation Forest tree. Defaults to 256. |
extension_level | Extension level of the splits. Minimum is 0; maximum is N - 1, where N = numCols (the number of columns). With extension_level = 0, Extended Isolation Forest behaves like Isolation Forest. Defaults to 0. |
seed | Seed for the random number generator. It affects only the stochastic parts of the algorithm, which may or may not be enabled by default. Defaults to -1 (time-based random number). |
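To make the role of extension_level more concrete, the sketch below follows the idea described in the Extended Isolation Forest paper: each split uses a random normal vector, and N - extension_level - 1 of its coordinates are set to zero. It is an illustration in base R only, not H2O's internal implementation; the function name `random_split_normal` is hypothetical.

```r
# Sketch (assumption: mirrors the paper's description, not H2O internals).
# Draw a random split normal for n_features columns at a given extension level.
random_split_normal <- function(n_features, extension_level) {
  stopifnot(extension_level >= 0, extension_level <= n_features - 1)
  normal <- rnorm(n_features)                      # random slope per feature
  n_zero <- n_features - extension_level - 1       # coordinates forced to zero
  zero_idx <- sample.int(n_features, n_zero)       # which features to ignore
  normal[zero_idx] <- 0
  normal
}

set.seed(42)
random_split_normal(7, 0)  # one non-zero entry: an axis-parallel split
random_split_normal(7, 6)  # all seven entries non-zero: fully extended split
```

With extension_level = 0 only one feature contributes to each split, which is why the model then behaves like a standard Isolation Forest.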
# NOT RUN {
library(h2o)
h2o.init()

# Import the prostate dataset
p <- h2o.importFile(path="https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")

# Set the predictors
predictors <- c("AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON")

# Build an Extended Isolation Forest model
model <- h2o.extendedIsolationForest(x = predictors,
                                     training_frame = p,
                                     model_id = "eif.hex",
                                     ntrees = 100,
                                     sample_size = 256,
                                     extension_level = length(predictors) - 1)

# Calculate score
score <- h2o.predict(model, p)

# Number in [0, 1] explicitly defined in Equation (1) from Extended Isolation Forest paper
# or in paragraph '2 Isolation and Isolation Trees' of Isolation Forest paper
anomaly_score <- score$anomaly_score

# Average path length of the point in Isolation Trees from root to the leaf
mean_length <- score$mean_length
# }
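The relationship between the two returned columns can be sketched in base R using the score definition from the Isolation Forest paper, s = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary search tree lookup over n points. This is a minimal sketch, assuming n equals sample_size; the helper names `avg_path_length` and `anomaly_score_from_length` are hypothetical and are not part of the h2o package.

```r
# Normalizing constant c(n) from the Isolation Forest paper.
avg_path_length <- function(n) {
  if (n > 2) {
    harmonic <- log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    2 * harmonic - 2 * (n - 1) / n
  } else if (n == 2) {
    1
  } else {
    0
  }
}

# Anomaly score s = 2^(-mean_length / c(n)); assumes n is the sample_size
# used for training (256 by default).
anomaly_score_from_length <- function(mean_length, sample_size = 256) {
  2^(-mean_length / avg_path_length(sample_size))
}

# Shorter average paths give scores closer to 1 (more anomalous).
anomaly_score_from_length(c(4, 8, 12))
```

A point whose mean path length equals c(n) scores exactly 0.5; points that are isolated in fewer splits score higher.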