interaction_pairs

  • Available in: GLM, CoxPH
  • Hyperparameter: no

Description

By default, interactions between predictor columns are expanded and computed on the fly as GLM iterates over dataset. The interaction_pairs parameter allows you to define a list of specific interactions to include instead of all interactions.

Note that adding a list of interactions to a model changes the interpretation of all of the coefficients. For example, a typical predictor has the form ‘response ~ terms’ where ‘response’ is the (numeric) response vector, and ‘terms’ is a series of terms that specify a linear predictor for ‘response’. For ‘binomial’ and ‘quasibinomial’ families in GLM, the response can also be specified as a ‘factor’ (when the first level denotes failure and all other levels denote success) or as a two-column matrix with the columns giving the numbers of successes and failures.

When using this parameter, specify a list of pairwise columns that should interact. When specified, GLM will compute interactions between

Note that this option is mutually exclusive with interactions.

Example

  • r
  • python
library(h2o)
h2o.init()
# import the airlines dataset
df <-  h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# specify the columns to include
XY <- names(df)[c(1,2,3,4,6,8,9,13,17,18,19,31)

# specify the predictor column indices to interact
interactions <- XY[c(5,7,9)]

# train the model and build the coefficients table
m1 <- h2o.glm(x=XY[-length(XY)],
                 y=XY[length(XY)],
                 training_frame=df,
                 interactions=interactions,
                 lambda_search=TRUE,
                 family="binomial")
m1_coefs <- m1@model$coefficients_table

# train the model with the interaction pairs
m2 <- h2o.glm(x=XY[-length(XY)],
                  y=XY[length(XY)],
                  training_frame=df,
                  interaction_pairs=list(
                   c("CRSDepTime", "UniqueCarrier"),
                   c("CRSDepTime", "Origin"),
                   c("UniqueCarrier", "Origin")
                   ),
                  lambda_search=TRUE,
                  family="binomial")
m2_coefs <- m2@model$coefficients_table