Changing the Column Type¶
H2O algorithms will treat a problem as a classification problem if the column type is factor
and a regression problem if the column type is numeric
. You can force H2O to use either classification or regression by changing the column type.
library(h2o)
h2o.init()
# Import the cars dataset:
cars_df <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# Check the column type for the "cylinders" column:
print(h2o.isnumeric(cars_df["cylinders"]))
#TRUE
# Change the column type to a factor:
cars_df["cylinders"] <- as.factor(cars_df["cylinders"])
# Verify that the column is now a factor:
print(h2o.isfactor(cars_df["cylinders"]))
#TRUE
# Change the column type back to numeric:
cars_df["cylinders"] <- as.numeric(cars_df["cylinders"])
# Verify that the column is now numeric and not a factor:
print(h2o.isfactor(cars_df["cylinders"]))
#FALSE
print(h2o.isnumeric(cars_df["cylinders"]))
#TRUE
# Change multiple columns to factors:
cars_df[c("cylinders","economy_20mpg")] <- as.factor(cars_df[c("cylinders","economy_20mpg")])
# Verify that the columns are now factors:
print(h2o.isfactor(cars_df[c("cylinders","economy_20mpg")]))
# TRUE TRUE
import h2o
h2o.init()
# Import the cars dataset:
cars_df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# Check the column type for the 'cylinders' column:
print(cars_df['cylinders'].isnumeric())
#[True]
# Change the column type to a factor:
cars_df['cylinders'] = cars_df['cylinders'].asfactor()
# Verify that the column is now a factor:
print(cars_df['cylinders'].isfactor())
#[True]
# Change the column type back to numeric:
cars_df["cylinders"] = cars_df["cylinders"].asnumeric()
# Verify that the column is now numeric and not a factor:
print(cars_df['cylinders'].isfactor())
#[False]
print(cars_df['cylinders'].isnumeric())
#[True]
# Reload data:
cars_df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# Change multiple columns to factors:
cars_df[['cylinders','economy_20mpg']] = cars_df[['cylinders','economy_20mpg']].asfactor()
# Verify that the columns are now factors:
print(cars_df[['cylinders','economy_20mpg']].isfactor())
# [True, True]
If the column type is enum
and you want to convert it to numeric
, you should first convert it to character
then convert it to numeric
. Otherwise, the values may be converted to underlying factor values, not the expected mapped values.
# Using the data from the above example, convert the 'name' column to numeric:
cars_df["name"] <- as.character(cars_df["name"])
cars_df["name"] <- as.numeric(cars_df["name"])
# Using the data from the above example, convert the 'name' column to numeric:
cars_df['name'] = cars_df['name'].ascharacter().asnumeric()