sample_size
- Available in: Isolation Forest
- Hyperparameter: yes
Description
This option specifies the number of randomly sampled observations (rows) used to train each Isolation Forest tree. If set to -1, the sample_rate option is used instead. The default value is 256.
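The interaction between sample_size and sample_rate can be pictured with a small, library-free Python sketch. This is not H2O's implementation: the helper names rows_per_tree and draw_tree_samples are hypothetical, and capping the count at the number of available rows, truncating the fractional count, and sampling without replacement are all assumptions made for illustration only.

```python
import random


def rows_per_tree(n_rows, sample_size=256, sample_rate=-1.0):
    """Return how many rows a single tree would be trained on (illustrative only)."""
    if sample_size != -1:
        # A fixed row count was requested. Capping it at n_rows is an assumption.
        return min(sample_size, n_rows)
    # sample_size == -1: fall back to a fraction of the available rows.
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1] when sample_size is -1")
    # Truncating to an integer (with a floor of 1 row) is an assumption.
    return max(1, int(sample_rate * n_rows))


def draw_tree_samples(n_rows, ntrees, sample_size=256, sample_rate=-1.0, seed=0):
    """Draw one independent random subsample of row indices per tree."""
    rng = random.Random(seed)
    k = rows_per_tree(n_rows, sample_size, sample_rate)
    # Sampling without replacement is an assumption of this sketch.
    return [rng.sample(range(n_rows), k) for _ in range(ntrees)]


# Seven trees, each trained on five randomly chosen rows out of 1000.
samples = draw_tree_samples(n_rows=1000, ntrees=7, sample_size=5)
for tree_index, row_indices in enumerate(samples):
    print(tree_index, sorted(row_indices))
```

Under these assumptions, a small sample_size gives each tree only a handful of rows, so the trees stay shallow. Setting sample_size to -1 makes the per-tree row count scale with the size of the training frame instead.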
Example
library(h2o)
h2o.init()
# import the ecg discord datasets:
train <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv")
test <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv")
# train using the `sample_size` parameter:
isofor_model <- h2o.isolationForest(training_frame=train, sample_size=5, ntrees=7)
# test the prediction
anomaly_score <- h2o.predict(isofor_model, test)
anomaly_score
predict mean_length
1 -0.16666667 2.857143
2 -0.16666667 2.857143
3 -0.08333333 2.714286
4 0.16666667 2.285714
5 0.00000000 2.571429
6 0.33333333 2.000000
[23 rows x 2 columns]