Cox Proportional Hazards Model¶

Cox proportional hazards models are the most widely used approach for modeling time to event data. As the name suggests, the hazard function, which computes the instantaneous rate of an event occurrence and is expressed mathematically as

\(h(t) = \lim_{\Delta t \downarrow 0} \frac{Pr[t \le T < t + \Delta t \mid T \ge t]}{\Delta t},\)

is assumed to be the product of a baseline hazard function and a risk score. Consequently, the hazard function for observation \(i\) in a Cox proportional hazards model is defined as

\(h_i(t) = \lambda(t)\exp(\mathbf{x}_i^T\beta)\)

where \(\lambda(t)\) is the baseline hazard function shared by all observations and \(\exp(\mathbf{x}_i^T\beta)\) is the risk score for observation \(i\), which is computed as the exponentiated linear combination of the covariate vector \(\mathbf{x}_i^T\) using a coefficient vector \(\beta\) common to all observations.

This combination of a non-parametric baseline hazard function and a parametric risk score results in Cox proportional hazards models being described as semi-parametric. In addition, a simple rearrangement of terms shows that unlike generalized linear models, an intercept (constant) term in the risk score adds no value to the model fit, due to the inclusion of a baseline hazard function.

Defining a Cox Proportional Hazards Model¶

destination key

The .hex key for the resulting Cox proportional hazards model.

source

The .hex key for the input data set to use in the Cox proportional hazards model.

start column

(Optional) The name of an integer column in the source data set representing the start time. If supplied, the value of the start column must be strictly less than the stop column in each row.

stop column

The name of an integer column in the source data set representing the stop time.

event column

The name of binary data column in the source data set representing the occurrence of an event.

x columns

The name of the predictor columns, both numeric and categorical, from the source data set in the Cox proportional hazards model.

weights column

(Optional) The name of a numeric column in the source data set representing case weights to use in the model. If supplied, all values in the weights column must be positive.

offset columns

(Optional) The name of a numeric column in the source data set representing the offset terms in the model.

ties

The approximation method for handling ties in the partial likelihood (either efron or breslow). See the Cox Proportional Hazards Model Details section below for more information.

init

(Optional) Initial values for the coefficients in the model.

lre min

A positive number to use as the minimum log-relative error (LRE) of subsequent log partial likelihood calculations to determine algorithmic convergence. The role this parameter plays in the stopping criteria of the model fitting algorithm is explained in the Cox Proportional Hazards Model Algorithm section below.

iter max

A positive integer defining the maximum number of iterations during model training. The role this parameter plays in the stopping criteria of the model-fitting algorithm is explained in the Cox Proportional Hazards Model Algorithm section below.

Cox Proportional Hazards Model Results¶

Data¶

Number of Complete Cases: The number of observations without missing values in any of the input columns.
Number of Non Complete Cases: The number of observations with at least one missing value in any of the input columns.
Number of Events in Complete Cases: The number of observed events in the complete cases.

Coefficients¶

\(\tt{name}\): The name given to the coefficient. If the predictor column is numeric, the corresponding coefficient has the same name. If the predictor column is categorical, the corresponding coefficients are a concatenation of the name of the column with the name of the categorical level the coefficient represents.
\(\tt{coef}\): The estimated coefficient value.
\(\tt{exp(coef)}\): The exponentiated coefficient value estimate.
\(\tt{se(coef)}\): The standard error of the coefficient estimate.
\(\tt{z}\): The z statistic, which is the ratio of the coefficient estimate to its standard error.

Model Statistics¶

Cox and Snell Generalized \(R^2\): \(\tt{R^2} := 1 - \exp\bigg(\frac{2\big(pl(\beta^{(0)}) - pl(\hat{\beta})\big)}{n}\bigg)\)
Maximum Possible Value for Cox and Snell Generalized \(R^2\): \(\tt{Max. R^2} := 1 - \exp\big(\frac{2 pl(\beta^{(0)})}{n}\big)\)
Likelihood Ratio Test: \(2\big(pl(\hat{\beta}) - pl(\beta^{(0)})\big)\), which under the null hypothesis of \(\hat{beta} = \beta^{(0)}\) follows a chi-square distribution with \(p\) degrees of freedom.
Wald Test: \(\big(\hat{\beta} - \beta^{(0)}\big)^T I\big(\hat{\beta}\big) \big(\hat{\beta} - \beta^{(0)}\big)\), which under the null hypothesis of \(\hat{beta} = \beta^{(0)}\) follows a chi-square distribution with \(p\) degrees of freedom. When there is a single coefficient in the model, the Wald test statistic value is that coefficient’s z statistic.
Score (Log-Rank) Test: \(U\big(\beta^{(0)}\big)^T \hat{I}\big(\beta^{0}\big)^{-1} U\big(\beta^{(0)}\big)\), which under the null hypothesis of \(\hat{beta} = \beta^{(0)}\) follows a chi-square distribution with \(p\) degrees of freedom.

where

\(n\) is the number of complete cases

\(p\) is the number of estimated coefficients

\(pl(\beta)\) is the log partial likelihood

\(U(\beta)\) is the derivative of the log partial likelihood

\(H(\beta)\) is the second derivative of the log partial likelihood

\(I(\beta) = - H(\beta)\) is the observed information matrix

Cox Proportional Hazards Model Details¶

A Cox proportional hazards model measures time on a scale defined by the ranking of the \(M\) distinct observed event occurrence times, \(t_1 < t_2 < \dots < t_M\). When no two events occur at the same time, the partial likelihood for the observations is given by

\(PL(\beta) = \prod_{m=1}^M\frac{\exp(w_m\mathbf{x}_m^T\beta)}{\sum_{j \in R_m} w_j \exp(\mathbf{x}_j^T\beta)}\)

where \(R_m\) is the set of all observations at risk of an event at time \(t_m\). In practical terms, \(R_m\) contains all the rows where (if supplied) the start time is less than \(t_m\) and the stop time is greater than or equal to \(t_m\). When two or more events are observed at the same time, the exact partial likelihood is given by

\(PL(\beta) = \prod_{m=1}^M\frac{\exp(\sum_{j \in D_m} w_j\mathbf{x}_j^T\beta)}{(\sum_{R^* : \mid R^* \mid = d_m} [\sum_{j \in R^*} w_j \exp(\mathbf{x}_j^T\beta)])^{\sum_{j \in D_m} w_j}}\)

where \(R_m\) is the risk set and \(D_m\) is the set of observations of size \(d_m\) with an observed event at time \(t_m\) respectively. Due to the combinatorial nature of the denominator, this exact partial likelihood becomes prohibitively expensive to calculate, leading to the common use of Efron’s and Breslow’s approximations.

Efron’s Approximation

Of the two approximations, Efron’s produces results closer to the exact combinatoric solution than Breslow’s. Under this approximation, the partial likelihood and log partial likelihood are defined as

\(PL(\beta) = \prod_{m=1}^M \frac{\exp(\sum_{j \in D_m} w_j\mathbf{x}_j^T\beta)}{\big[\prod_{k=1}^{d_m}(\sum_{j \in R_m} w_j \exp(\mathbf{x}_j^T\beta) - \frac{k-1}{d_m} \sum_{j \in D_m} w_j \exp(\mathbf{x}_j^T\beta))\big]^{(\sum_{j \in D_m} w_j)/d_m}}\)

\(pl(\beta) = \sum_{m=1}^M \big[\sum_{j \in D_m} w_j\mathbf{x}_j^T\beta - \frac{\sum_{j \in D_m} w_j}{d_m} \sum_{k=1}^{d_m} \log(\sum_{j \in R_m} w_j \exp(\mathbf{x}_j^T\beta) - \frac{k-1}{d_m} \sum_{j \in D_m} w_j \exp(\mathbf{x}_j^T\beta))\big]\)

Breslow’s Approximation

Under Breslow’s approximation, the partial likelihood and log partial likelihood are defined as

\(PL(\beta) = \prod_{m=1}^M \frac{\exp(\sum_{j \in D_m} w_j\mathbf{x}_j^T\beta)}{(\sum_{j \in R_m} w_j \exp(\mathbf{x}_j^T\beta))^{\sum_{j \in D_m} w_j}}\)

\(pl(\beta) = \sum_{m=1}^M \big[\sum_{j \in D_m} w_j\mathbf{x}_j^T\beta - (\sum_{j \in D_m} w_j)\log(\sum_{j \in R_m} w_j \exp(\mathbf{x}_j^T\beta))\big]\)

Cox Proportional Hazards Model Algorithm¶

H2O uses the Newton-Raphson algorithm to maximize the partial log-likelihood, an iterative procedure defined by the steps:

To add numeric stability to the model fitting calculations, the numeric: predictors and offsets are demeaned during the model fitting process.

Set an initial value, \(\beta^{(0)}\), for the coefficient vector and assume an initial log partial likelihood of \(- \infty\).
Increment iteration counter, \(n\), by 1.
Calculate the log partial likelihood, \(pl\big(\beta^{(n)}\big)\), at the current coefficient vector estimate.
Compare \(pl\big(\beta^{(n)}\big)\) to \(pl\big(\beta^{(n-1)}\big)\).

If \(pl\big(\beta^{(n)}\big) > pl\big(\beta^{(n-1)}\big)\), then accept the new coefficient vector, \(\beta^{(n)}\), as the current best estimate, \(\tilde{\beta}\), and set a new candidate coefficient vector to be \(\beta^{(n+1)} = \beta^{(n)} - \tt{step}\), where \(\tt{step} := H^{-1}(\beta^{(n)}) U(\beta^{(n)})\), which is the product of the inverse of the second derivative of \(pl\) times the first derivative of \(pl\) based upon the observed data.

If \(pl\big(\beta^{(n)}\big) \le pl\big(\beta^{(n-1)}\big)\), then set \(\tt{step} := \tt{step} / 2\) and \(\beta^{(n+1)} = \tilde{\beta} - \tt{step}\).

Repeat steps 2 - 4 until either

\(n = \tt{iter\ max}\) or

the log-relative error \(LRE\Big(pl\big(\beta^{(n)}\big), pl\big(\beta^{(n+1)}\big)\Big) >= \tt{lre\ min}\), where \(LRE(x, y) = - \log_{10}\big(\frac{\mid x - y \mid}{y}\big)\), if \(y \ne 0\)

\(LRE(x, y) = - \log_{10}(\mid x \mid)\), if \(y = 0\)

References¶

Andersen, P. and Gill, R. (1982). Cox’s regression model for counting processes, a large sample study. Annals of Statistics 10, 1100-1120.

Harrell, Jr. F.E., Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer-Verlag, 2001.

Therneau, T., Grambsch, P., Modeling Survival Data: Extending the Cox Model. Springer-Verlag, 2000.