stratify_by
- Available in: CoxPH 
- Hyperparameter: no 
Description
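In a CoxPH model, stratification is useful as a diagnostic for checking the proportional hazards assumption, because it allows for as many different baseline hazard functions as there are strata. For example, when you are interested in how a predictor X relates to the time-to-event endpoint, you can stratify by a secondary categorical variable Z so that Z is adjusted for when making inferences about X. Because each stratum gets its own baseline hazard, no coefficient is estimated for Z itself. In the examples below, age appears only as strata(age), and only year receives a coefficient.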
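A stratified Cox model can be written as h_s(t | x) = h_0s(t) · exp(β·x), where s indexes the stratum. The coefficient β is shared across strata, while each stratum has its own baseline hazard h_0s(t). The minimal sketch below, written in plain Python rather than H2O, evaluates the Breslow partial log-likelihood for a single covariate in the counting-process (start, stop] form used by the examples. Each event is compared only with subjects at risk in its own stratum. The function name, the row layout, and the toy data are hypothetical and chosen for illustration; this is not H2O's implementation.

```python
import math
from typing import List, Tuple

# Hypothetical row layout in counting-process form:
# (start, stop, event, x, stratum), with a single numeric covariate x.
Row = Tuple[float, float, int, float, str]


def stratified_breslow_loglik(beta: float, rows: List[Row]) -> float:
    """Breslow partial log-likelihood of a stratified Cox model (one covariate).

    An event at time t in stratum s is compared only with rows from the same
    stratum whose interval (start, stop] contains t. The shared coefficient
    beta is the only parameter; the stratum label itself gets no coefficient.
    """
    loglik = 0.0
    for _, t, event, x_i, stratum in rows:
        if not event:
            continue  # censored intervals contribute only through risk sets
        risk_set = [x_j for start_j, stop_j, _, x_j, s_j in rows
                    if s_j == stratum and start_j < t <= stop_j]
        loglik += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk_set))
    return loglik


# Toy data (made up): two strata, "a" and "b", each with one event.
rows: List[Row] = [
    (0.0, 5.0, 1, 1.0, "a"),
    (0.0, 8.0, 0, 0.0, "a"),
    (0.0, 3.0, 1, 0.0, "b"),
    (0.0, 6.0, 0, 1.0, "b"),
]
# Same data with every row in a single stratum, i.e. an unstratified model.
pooled: List[Row] = [(start, stop, event, x, "all") for start, stop, event, x, _ in rows]

print("stratified:", stratified_breslow_loglik(0.5, rows))
print("pooled:    ", stratified_breslow_loglik(0.5, pooled))
```

Relabelling every row with one stratum (the `pooled` list) gives the ordinary unstratified likelihood, so the two printed values differ only because the risk sets differ.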
Use the `stratify_by` parameter to specify a list of columns to use for stratification when building a CoxPH model. Every stratification column must also be present in the x list passed to the <model_name>.train() call (for example, if x=["PhoneService", "MultipleLines", "InternetService", "Contract"], then each column in stratify_by must be one of those four columns). In R, the same columns are passed to the x and stratify_by arguments of h2o.coxph().
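The rule above can be expressed as a small pre-flight check that runs before training. The sketch below is plain Python; the helper name `missing_strata_columns` is hypothetical and not part of the H2O API, and `"gender"` is an illustrative column name that does not come from this page.

```python
from typing import List, Sequence


def missing_strata_columns(x: Sequence[str], stratify_by: Sequence[str]) -> List[str]:
    """Return the stratify_by columns that are not in the predictor list x.

    Hypothetical helper: an empty list means the documented rule (every
    stratification column must also appear in x) is satisfied.
    """
    return [col for col in stratify_by if col not in x]


x = ["PhoneService", "MultipleLines", "InternetService", "Contract"]

print(missing_strata_columns(x, ["Contract"]))            # every column is in x
print(missing_strata_columns(x, ["Contract", "gender"]))  # "gender" is not in x

# Fail early, before calling train(), if the rule is violated.
missing = missing_strata_columns(x, ["Contract"])
if missing:
    raise ValueError(f"stratify_by columns must also be in x: {missing}")
```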
Example
R
library(h2o)
h2o.init()
# import the heart dataset:
heart <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
# set the stratification column (also used as a predictor) and the event column:
x <- "age"
y <- "event"
# set the start and stop columns:
start <- "start"
stop <- "stop"
# convert the age column to a factor:
heart["age"] <- as.factor(heart["age"])
# train your model:
heart_coxph <- h2o.coxph(x = c("year", x),
                         event_column = y,
                         start_column = start,
                         stop_column = stop,
                         stratify_by = x,
                         training_frame = heart)
# view the model details:
heart_coxph
Model Details:
==============
H2OCoxPHModel: coxph
Model ID:  CoxPH_model_R_1570209287520_5
Call:
Surv(start, stop, event) ~ year + strata(age)
      coef   exp(coef)  se(coef)  z      p
year  4.734  113.717    8973.421  0.001  1
Likelihood ratio test = 1.39  on 1 df, p = 0.239
n = 172, number of events = 75
Python
import h2o
from h2o.estimators import H2OCoxProportionalHazardsEstimator
h2o.init()
# import the heart dataset:
heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
# set the predictor columns and the event (response) column:
x = ["age", "year"]
y = "event"
# convert the age column to a factor:
heart["age"] = heart["age"].ascharacter()
heart["age"] = heart["age"].asfactor()
# build and train your model:
heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
                                                 stop_column="stop",
                                                 ties="breslow",
                                                 stratify_by=["age"])
heart_coxph.train(x=x, y=y, training_frame=heart)
# view the model details:
heart_coxph
Model Details
=============
H2OCoxProportionalHazardsEstimator :  Cox Proportional Hazards
Model Key:  CoxPH_model_python_1604581637715_647
Call:
Surv(start, stop, event) ~ year + strata(age)
Coefficients: CoxPH Coefficients
names    coefficients    exp_coef    exp_neg_coef    se_coef    z_coef
-------  --------------  ----------  --------------  ---------  -----------
year     4.73372         113.717     0.00879373      8973.42    0.000527526
Likelihood ratio test=1.386294
n=172, number of events=75
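The columns of the coefficient table are related by simple arithmetic. The sketch below recomputes them from the printed (rounded) coef and se_coef values, assuming the usual definitions: exp_coef is the hazard ratio exp(coef), exp_neg_coef is exp(-coef), z_coef is the Wald statistic coef / se_coef, and the p-values come from the standard normal and chi-square(1) distributions. Under those assumptions the results should agree, up to rounding, with the R output above (z = 0.001, p = 1, and a likelihood ratio p = 0.239).

```python
import math

# Values copied from the Python model output above (rounded as printed).
coef = 4.73372
se_coef = 8973.42
lr_stat = 1.386294  # likelihood ratio statistic on 1 degree of freedom

exp_coef = math.exp(coef)       # hazard ratio per unit increase in year
exp_neg_coef = math.exp(-coef)  # reciprocal of the hazard ratio
z_coef = coef / se_coef         # Wald z statistic

# Two-sided Wald p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
wald_p = math.erfc(abs(z_coef) / math.sqrt(2))
# Chi-square(1) upper tail: P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
lr_p = math.erfc(math.sqrt(lr_stat / 2))

print(f"exp_coef={exp_coef:.3f}  exp_neg_coef={exp_neg_coef:.8f}  z_coef={z_coef:.6g}")
print(f"Wald p={wald_p:.3f}  likelihood ratio p={lr_p:.3f}")
```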