ignored_columns
¶
- Available in: GBM, DRF, Deep Learning, GLM, PCA, GLRM, Naïve-Bayes, K-Means
- Hyperparameter: no
Description¶
There may be instances when your dataset includes information that you want to be ignored when building a model. Use the ignored_columns
parameter to specify an array of column names that should be ignored. This is a strict parameter that takes into account the exact string of the column name. So, for example, if your dataset includes one column named Type and another column named Types, and you specify ignored_columns=["type"]
, then the algorithm will only ignore the Type column and will not ignore the Types column.
Example¶
- r
- python
library(h2o)
h2o.init()
# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed 'YES' or not "NO"
# original data can be found at http://www.transtats.bts.gov/
airlines <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
# convert columns to factors
airlines["Year"] <- as.factor(airlines["Year"])
airlines["Month"] <- as.factor(airlines["Month"])
airlines["DayOfWeek"] <- as.factor(airlines["DayOfWeek"])
airlines["Cancelled"] <- as.factor(airlines["Cancelled"])
airlines['FlightNum'] <- as.factor(airlines['FlightNum'])
# set the predictor names and the response column name
predictors <- colnames(airlines[1:9])
response <- "IsDepDelayed"
# split into train and validation
airlines.splits <- h2o.splitFrame(data = airlines, ratios = .8, seed = 1234)
train <- airlines.splits[[1]]
valid <- airlines.splits[[2]]
# try using the `ignored_columns` parameter:
col_list <- c('DepTime','CRSDepTime','ArrTime','CRSArrTime')
# train your model
airlines.gbm <- h2o.gbm(x = predictors, y = response, training_frame = train, validation_frame = valid,
seed = 1234)
# print the auc for the validation data
print(h2o.auc(airlines.gbm, valid = TRUE))