# Objectives

Students who complete this unit will be able to

• identify the fundamental components of survival analysis
• understand the differences between semi-parametric and parametric survival models
• employ diagnostics for survival models
• identify recent extensions of survival models such as the incorporation of spatial dependence

# Introduction

Survival analysis is employed in a variety of disciplines, including biostatistics (the source of much of its terminology), economics, political science, psychology, and sociology. What unites these disciplines is a shared interest in explaining the timing of events. Sociologists, for example, seek to explain why marriages terminate; political scientists, why governments terminate; and economists, why trading relationships terminate. Survival analysis affords significant advantages over alternative modeling approaches in explaining event occurrence. The sections below detail the components of and concerns in survival analysis, which is also known as event history analysis and duration analysis (reflecting the many different disciplinary applications of this modeling approach).

# Components of Survival Models

Although many extensions have been developed for the estimation of survival models, at their heart these models are quite simple and straightforward. In fact, all survival models are built upon four interrelated components that define these models: the distribution and density functions, the survivor function, and the hazard rate. The distribution function defines the probability that a unit’s survival is less than or equal to a time t. The distribution function takes the form:

$F(t) = \text{Pr}(T \leq t).$

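The survivor function is then the complement of this distribution function, as it defines the probability that a unit will survive to or beyond time t (for a continuous duration T, surviving beyond t and surviving to or beyond t have the same probability). In other words, the survivor function is equal to: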

$S(t) = 1 - F(t) = \text{Pr}(T \geq t).$

If the distribution function is differentiable, we can then derive the probability density function as its derivative, $f(t) = dF(t)/dt$. As Box-Steffensmeier and Jones (1997, 1418) state, the probability density function may be interpreted as the instantaneous probability of the occurrence of an event T at time t.

With these definitions in hand, we can then examine the fourth component of survival models, the hazard rate, which is merely the density function divided by the survivor function:

$h(t) = \frac{f(t)}{S(t)}.$

As Box-Steffensmeier and Jones (2004, 14) note, "The hazard rate gives the rate at which units fail (or durations end) by t given that the unit had survived until t."

In other words, the hazard rate is a conditional rate: it describes how quickly units experience the event of interest at time t, considering only those units that have survived until time t.
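The relationships among the four components can be checked numerically. The following is a minimal sketch that assumes, purely for illustration, an exponentially distributed duration with a hypothetical rate `lam`; the function names are my own and are not drawn from any package.

```python
import math

def distribution(t, lam):
    """F(t) = Pr(T <= t) for an exponential duration with rate lam."""
    return 1.0 - math.exp(-lam * t)

def survivor(t, lam):
    """S(t) = 1 - F(t)."""
    return 1.0 - distribution(t, lam)

def density(t, lam):
    """f(t) = dF(t)/dt."""
    return lam * math.exp(-lam * t)

def hazard(t, lam):
    """h(t) = f(t) / S(t)."""
    return density(t, lam) / survivor(t, lam)

lam = 0.5  # hypothetical rate, chosen only for illustration
for t in (0.5, 1.0, 4.0):
    # The density falls over time, but so does the survivor function,
    # so their ratio (the hazard) stays at lam for this distribution.
    print(t, round(density(t, lam), 4), round(survivor(t, lam), 4),
          round(hazard(t, lam), 4))
```

The point of the sketch is that the hazard is defined relative to the units still at risk: fewer events occur at later times simply because fewer units remain, and dividing by the survivor function removes that effect.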

# Censoring

One of the principal advantages of survival models over standard regression-based models is that the former can handle the case of censored observations: observations for units that experience the event of interest after the end of the study’s time frame. This is a critical advantage of survival models in that it is often the case that some units will only experience the event the analyst is interested in after the analyst’s study ends. Although the observed survival time for such censored units is equivalent to the observed survival time for units that experience the event exactly at the end of the study’s time frame, one would not want to treat these two sets of units as equivalent in the modeling of event timing. If one did, the result would be erroneous inferences regarding the effects of substantive covariates on event timing.

Survival models handle the frequent case of censored observations by including a censoring indicator. This censoring indicator measures whether the unit has experienced the event by the end of the study, or whether it is censored. A standard regression-based approach to modeling survival times suffers in comparison by not incorporating such an indicator, and as a consequence, treating units that only experience the event after the study ends as equivalent to those that experience the event at the end of the study.
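A small sketch may help show what the censoring indicator does in the likelihood. It assumes an exponential model with no covariates and entirely hypothetical data; under those assumptions, units that experience the event contribute the log density, while censored units contribute only the log survivor function.

```python
import math

def exponential_loglik(lam, durations, events):
    """Censored-data log-likelihood: events add log f(t), censored units add log S(t)."""
    total = 0.0
    for t, d in zip(durations, events):
        log_density = math.log(lam) - lam * t   # log f(t)
        log_survivor = -lam * t                 # log S(t)
        total += log_density if d == 1 else log_survivor
    return total

# Hypothetical data: the study ends at t = 10 and the last three units are censored there.
durations = [2.0, 3.5, 6.0, 8.0, 10.0, 10.0, 10.0]
events = [1, 1, 1, 1, 0, 0, 0]   # 1 = event observed, 0 = censored

# Closed-form maximizers for the exponential case.
lam_censored = sum(events) / sum(durations)    # uses the censoring indicator
lam_naive = len(durations) / sum(durations)    # treats censored units as failures at t = 10

print(round(lam_censored, 4), round(lam_naive, 4))
```

Ignoring the indicator counts three failures that never occurred within the study, so the naive estimate overstates the hazard in this example.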

# The Choice of Semi-Parametric vs. Parametric Models

Survival models can be dichotomized based upon how they treat the baseline hazard of event occurrence. Semi-parametric models (i.e., the Cox proportional hazards model) leave this baseline hazard unspecified. Alternatively, parametric models employ a specific distributional form for the baseline hazard, with this distribution depending upon the form that the model takes. Thus, for example, Weibull parametric models assume that the baseline hazard follows a Weibull distribution.

The choice between semi-parametric and parametric models is one of the most critical choices facing the survival analyst and merits particular attention in the modeling process. By not specifying a parametric distribution for the baseline hazard, the Cox model is quite flexible to alternative functional forms. This flexibility is a real advantage for the Cox model in many modeling settings, as the choice of an incorrect functional form can lead to erroneous inferences not only about the baseline hazard but also about the effects of substantive covariates. As a consequence, if the researcher is uninterested substantively in the form of the baseline hazard, she will often favor the Cox model over a (potentially misspecified) parametric alternative. An important caveat, here, however, is the potential importance of out-of-sample prediction. If the researcher is interested in predicting beyond the time-frame of her analysis, she may well favor a parametric model over a Cox model.

At times, however, researchers have substantive interests in the form of the baseline hazard. This is particularly the case when scholars have substantive research interests in the question of duration dependence. Duration dependence exists when the hazard of event occurrence is not constant over the time frame analyzed, but instead depends on how long the unit has been at risk. Negative duration dependence exists when the risk of event occurrence declines as time progresses. The risk of electoral defeat in congressional careers often exhibits negative duration dependence. Members of Congress are often at greatest risk of losing their seats in their first election after taking office, with the risk of seat loss declining as members accrue name recognition and influence in Washington. Positive duration dependence, in contrast, exists when the risk of event occurrence increases as time progresses. Most biostatistics applications exhibit positive duration dependence as the risk of disease increases with age.

If the researcher is substantively interested in duration dependence, she will often favor a parametric model over the Cox model.

# Parametric Models

## Exponential Model

The exponential model is perhaps the simplest parametric model, as it assumes that the baseline hazard is constant with regard to time. As Box-Steffensmeier and Jones (2004, 22) show, the exponential hazard takes the form:

$h(t) = \lambda, \quad t > 0, \; \lambda > 0,$

where $\lambda$ is a constant term and, in the absence of covariates, implies a constant hazard rate.

The assumption of time-invariant hazards, of course, will often not hold in practice. As a consequence, scholars interested in employing parametric models will often wish instead to employ an alternative parametric form. Several alternative parametric models, including the log-logistic, log-normal, and Gompertz models, have been employed in practice. Analysts interested in alternative parametric models are encouraged to consult Blossfeld, Hamerle, and Mayer (1989), Box-Steffensmeier and Jones (1997), and Box-Steffensmeier and Jones (2004). Here, I focus instead on the most widely applied parametric model, the Weibull model.

## Weibull Model

The Weibull model is frequently employed due to its flexibility. The principal assumption that the Weibull model makes is that the baseline hazard is monotonic; however, this monotonic hazard can be either increasing with time, decreasing with time, or flat with respect to time. As a consequence, the Weibull can be applied to a variety of modeling applications, as long as the baseline hazard is monotonic.

As Box-Steffensmeier and Jones (2004, 25) show, the hazard rate for the Weibull model takes the form:

$h(t) = \lambda p (\lambda t)^{p - 1}, \quad t > 0, \; \lambda > 0, \; p > 0,$

"where lambda is a positive scale parameter and p is known as the shape parameter. . .When $p > 1$, the hazard rate is monotonically increasing with time; when $p < 1$, the hazard rate is monotonically decreasing with time; when $p = 1$, the hazard is flat, taking a constant value $\lambda$"(emphases in original). It is easy to see that the exponential model is nested within the Weibull model, as the Weibull form reduces to the exponential when the hazard is flat.

For ease of exposition, the survival models presented thus far have not included covariates. However, scholars typically believe that covariates increase or decrease the risk of event occurrence. The forms presented thus far can easily be modified to include the presence of covariates.
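The role of the shape parameter in the Weibull hazard can be seen by evaluating the hazard at a few time points. This is a minimal sketch without covariates; the parameter values are hypothetical and chosen only to show the three cases.

```python
def weibull_hazard(t, lam, p):
    """h(t) = lam * p * (lam * t)**(p - 1), for t > 0, lam > 0, p > 0."""
    return lam * p * (lam * t) ** (p - 1)

times = [0.5, 1.0, 2.0, 4.0]
lam = 0.5  # hypothetical scale parameter

for p in (0.5, 1.0, 2.0):
    # p < 1: hazard declines with time; p = 1: hazard is flat at lam
    # (the exponential case); p > 1: hazard rises with time.
    print(p, [round(weibull_hazard(t, lam, p), 4) for t in times])
```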

# Semi-Parametric Cox Proportional Hazards Model

Alternatively, scholars may prefer to employ the more flexible semi-parametric Cox model. Here, as Box-Steffensmeier and Jones (2004, 48) show, the hazard rate in the Cox model for the ith unit is:

$h_i(t) = h_0(t)\exp(\beta'x),$

"where $h_0(t)$ is the baseline hazard function and $\beta'x$ are the covariates and regression parameters."

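The reason the Cox model can leave the baseline hazard unspecified is that estimation proceeds through the partial likelihood, in which $h_0(t)$ cancels from each term. The sketch below is an illustration rather than a production estimator: it assumes no tied event times, uses hypothetical data, and the function name is my own.

```python
import math

def cox_partial_loglik(beta, durations, events, covariates):
    """Log partial likelihood for a Cox model, assuming no tied event times.

    beta and each row of covariates are lists of equal length.
    """
    def linear_predictor(x):
        return sum(b * xj for b, xj in zip(beta, x))

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(durations, events)):
        if d_i != 1:
            continue  # censored units enter only through the risk sets
        # Risk set: all units still under observation at time t_i.
        risk_sum = sum(math.exp(linear_predictor(x_j))
                       for t_j, x_j in zip(durations, covariates) if t_j >= t_i)
        # The baseline hazard appears in numerator and denominator and cancels.
        loglik += linear_predictor(covariates[i]) - math.log(risk_sum)
    return loglik

# Hypothetical data with a single binary covariate.
durations = [2.0, 4.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1, 0]
covariates = [[1.0], [0.0], [1.0], [0.0], [1.0]]

print(cox_partial_loglik([0.0], durations, events, covariates))
print(cox_partial_loglik([0.5], durations, events, covariates))
```

With all coefficients at zero, each event contributes minus the log of the number of units in its risk set, which provides a simple check on the function.
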
# Proportional Hazards Assumption

Many survival models carry the assumption that covariates have the same proportional effect on the hazard rate regardless of when these values are observed. Stated alternatively, this proportional hazards assumption, as Box-Steffensmeier and Zorn (2001, 973) note, assumes that “the effects of covariates are constant over time.” Violations of this proportional hazards assumption carry critical implications for analysts of survival processes. If hazards are assumed to be proportional when, in fact, they are not, the result is biased estimates of the effects of covariates and decreased power of significance tests. Specifically, when the non-proportional hazards diverge over time, the effects of covariates will be overestimated, and when the hazards converge over time, the effects of covariates will be underestimated (Box-Steffensmeier and Zorn 2001, 972, 974). As a consequence, scholars using many commonly employed survival modeling approaches that assume proportional hazards are advised to employ the tests for nonproportionality discussed below.

It is important to note that there is a common misconception regarding the proportional hazards assumption in survival models. Analysts often assume that the proportional hazards assumption is unique to the Cox model, no doubt because this model is often referred to as the Cox proportional hazards model. This nomenclature, however, should not obscure the fact that many frequently employed parametric models, including the exponential, Gompertz, and Weibull models, also employ the proportional hazards assumption (Golub 2008, 536).

Importantly, however, while both the semiparametric Cox model and several of its popular parametric alternatives share the proportional hazards assumption, the tests for violations of the proportional hazards assumption differ between the Cox model and parametric models. In fact, tests for violations of proportionality are much more precise for the Cox model than for parametric models such as the exponential, Gompertz, and Weibull. The development of statistical tests for nonproportionality in the Cox model setting, and the absence of analogous tests for parametric models, has been cited as a reason for favoring the Cox model over its parametric alternatives that also assume proportional hazards (see Golub 2008, 536-539).
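To make the assumption itself concrete, the sketch below evaluates a Weibull hazard with a single covariate entering multiplicatively, $h(t \mid x) = h_0(t)\exp(\beta x)$. The parameterization and the values of `beta`, `lam`, and `p` are illustrative assumptions of mine. The ratio of the hazards for $x = 1$ and $x = 0$ is the same at every time point, which is exactly what the proportional hazards assumption requires.

```python
import math

def weibull_ph_hazard(t, x, beta, lam=0.5, p=1.5):
    """h(t | x) = h_0(t) * exp(beta * x), with a Weibull baseline h_0(t)."""
    baseline = lam * p * (lam * t) ** (p - 1)
    return baseline * math.exp(beta * x)

beta = 0.7  # hypothetical coefficient
ratios = [weibull_ph_hazard(t, 1, beta) / weibull_ph_hazard(t, 0, beta)
          for t in (0.5, 1.0, 5.0, 20.0)]

# Every ratio equals exp(beta): the baseline hazard changes with t,
# but the proportional effect of the covariate does not.
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))
```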

## Testing for Non-Proportional Hazards

Tests for nonproportional hazards in parametric models remain quite limited. The standard approach is to employ piecewise regressions to examine parameter instability across different discrete time periods in the analyst's data. Although substantive theory is one guide to the choice of discrete time periods, or regimes, in the data, such theoretical guidance is often lacking, leading to potentially arbitrary choices of regimes in the piecewise regressions approach.

In contrast to the parametric case, more precise statistical tests have been developed to test for nonproportionality in Cox models. Such tests proceed either at the global level (via Grambsch and Therneau’s global test for nonproportionality) or at the level of specific covariates (via Harrell’s rho test). Both sets of tests are based upon Schoenfeld residuals. As Box-Steffensmeier and Jones (2004, 121) state, these residuals from the Cox model “can essentially be thought of as the observed minus the expected values of the covariates at each failure time.” If these residuals are invariant with regard to time, there is then no evidence of non-proportionality for the model as a whole (in the Grambsch and Therneau test) or for a specific covariate in the model (in Harrell’s rho test). As Golub (2008, 537) notes, the global test has low power and may indicate proportional hazards even when specific covariates suffer from non-proportionality (see also Box-Steffensmeier, Reiter, and Zorn 2003, 45, cited in Golub 2008, 537).

Of course, a global test is less consequential, precise, and diagnostic than tests on individual covariates. Individual covariates are the level at which non-proportionality occurs and at which it must eventually be addressed anyway, by interacting the covariate(s) with a function of time. As a consequence, scholars interested in diagnosing and modeling non-proportional hazards will typically prefer a covariate-specific test. Harrell’s rho provides a covariate-specific test for non-proportionality based upon the correlation between the Schoenfeld residuals for the covariate examined and the rank of the survival time (Box-Steffensmeier and Jones 2004, 135; Harrell 1986). (Although a less precise identification of non-proportional hazards has been suggested via visual inspection of the Schoenfeld residuals plotted against time, there is little reason to favor such a visual approach over a statistical test, given human imprecision in identifying patterns visually.)
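The logic of the covariate-specific test can be sketched in a few lines. The code below is a simplified illustration, not the implementation found in statistical packages: it assumes a single covariate, no tied event times, a coefficient `beta` supplied by the analyst (in practice, the estimated Cox coefficient), and ranks computed among the observed failure times. The data are hypothetical.

```python
import math

def schoenfeld_residuals(beta, durations, events, x):
    """Observed minus risk-set-expected covariate value at each failure time."""
    residuals = []
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    for i in order:
        if events[i] != 1:
            continue
        risk = [j for j in range(len(durations)) if durations[j] >= durations[i]]
        weights = [math.exp(beta * x[j]) for j in risk]
        expected = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals.append((durations[i], x[i] - expected))
    return residuals

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def harrell_rho(beta, durations, events, x):
    """Correlation between Schoenfeld residuals and the rank of the failure time."""
    res = schoenfeld_residuals(beta, durations, events, x)
    ranks = list(range(1, len(res) + 1))  # residuals are already ordered by time
    return pearson([r for _, r in res], ranks)

durations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 0, 1, 0]

print(round(harrell_rho(0.0, durations, events, x), 3))
```

A correlation near zero is consistent with proportional hazards for the covariate, whereas a clear trend in the residuals over the ranked failure times suggests that the covariate's effect changes with time.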

If non-proportionality is diagnosed at the individual level via Harrell’s rho, this non-proportionality should be modeled. Typically this is done by interacting the covariates exhibiting non-proportionality with a function of time, most often the log of time. It is important to note that if several covariates exhibit non-proportionality, interacting these covariates with the same function of time will induce multicollinearity. It is also important to note that the interaction of covariates with time should not be viewed merely as a specification fix. Instead, the interaction of covariates with a function of time carries critical implications for our understanding of how the effects of these covariates vary over time. As a consequence, great care should be taken in interpreting the substantive effects of covariates exhibiting non-proportionality (see, for example, Licht 2011).
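As a sketch of what such an interaction implies, suppose (as an illustrative assumption, with notation of my own) that a single covariate $x$ exhibits non-proportionality and is interacted with the log of time. The Cox hazard then becomes

$h_i(t) = h_0(t)\exp(\beta_1 x_i + \beta_2 x_i \ln t),$

and the hazard ratio associated with a one-unit increase in $x$ is

$\exp(\beta_1 + \beta_2 \ln t) = e^{\beta_1} t^{\beta_2}.$

The effect of the covariate is therefore no longer a single number. It grows over time when $\beta_2 > 0$ and shrinks over time when $\beta_2 < 0$, and $\beta_1$ alone gives the log hazard ratio only at $t = 1$, where $\ln t = 0$. This is why the coefficient on the covariate cannot be interpreted in isolation from the interaction term.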

# Repeated Events Models

Many of the event processes of interest to social scientists are repeatable – that is, units may experience the event of interest multiple times. For example, sociologists interested in marriage examine subjects who may marry multiple times. Criminologists examine causes of recidivism. Political scientists examine states that may experience multiple regime changes.

A variety of approaches have been proposed for the modeling of repeated events processes. Recently, Box-Steffensmeier, Linn, and Smidt (2011) have demonstrated the superiority of the conditional frailty model for repeated events processes. The conditional frailty model is an extension of the Cox model for repeated events employing a gap time framework (in which the time since previous event is employed for analysis rather than the total time since beginning of observation), a restricted risk set (in which only subjects experiencing k – 1 events are at risk of experiencing the kth event), and an event-specific baseline hazard. The conditional frailty model is particularly distinguished from alternative repeated events formulations in its use of random effects to account for within-subject correlation.

Box-Steffensmeier, Linn, and Smidt (2011) find that the conditional frailty model outperforms other estimators for repeated events, such as the Andersen-Gill model, the conditional gap model, and the elapsed time model, in the presence of heterogeneity and event dependence. Moreover, the conditional frailty model is also flexible across differing numbers of cases and events, rates of censoring, and data generating processes. As a consequence, Box-Steffensmeier, Linn, and Smidt argue for the use of conditional frailty models for repeated events processes.
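The gap time framework and restricted risk set described above are largely a matter of how the data are arranged before estimation. The following sketch shows one possible layout, with hypothetical field names: each unit's history is split into one record per spell, the clock restarts after every event, and the event number serves as the stratum for an event-specific baseline hazard.

```python
def to_gap_time_records(unit_id, event_times, end_of_observation):
    """Split one unit's history into gap-time spells, one per event number k."""
    records = []
    previous = 0.0
    for k, t in enumerate(event_times, start=1):
        # A unit is at risk of its kth event only after experiencing k - 1 events.
        records.append({"unit": unit_id, "stratum": k,
                        "gap_time": t - previous, "event": 1})
        previous = t
    # Final spell: at risk of the next event but censored when observation ends.
    if end_of_observation > previous:
        records.append({"unit": unit_id, "stratum": len(event_times) + 1,
                        "gap_time": end_of_observation - previous, "event": 0})
    return records

# Hypothetical unit with events at t = 3.0 and t = 7.5, observed until t = 12.0.
records = to_gap_time_records("A", [3.0, 7.5], end_of_observation=12.0)
for record in records:
    print(record)
```

The random effect that distinguishes the conditional frailty model would then be shared by all records carrying the same `unit` value; estimating it requires dedicated software and is not attempted here.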

# Competing Risks Models

Just as units may experience the same type of event multiple times in a repeated events process, so also may units experience an event due to any of several alternative sources. For example, candidates may exit a presidential nomination race due to poor performance in polls, primaries, or fundraising, or due to scandal. Scholars would not wish to model the event of interest – exit from the campaign – without considering the multiple competing risks that may produce that event.

A variety of different modeling approaches have been employed for the case of competing risks. Blossfeld, Hamerle, and Mayer (1989) provide a thorough discussion of competing risks models. As they note, key to these models is the definition of a specific hazard rate for each of the different types of events that the units may experience. Either parametric or semi-parametric approaches may be employed to estimate competing risks models.

Although the competing risks are often treated as though they were independent, this assumption of independence is often not tenable in practice. For example, an autocratic regime may terminate either due to the calling of democratic elections, due to a popular uprising, or due to a coup. Regimes that are inclined to call democratic elections may be less likely to fall to popular uprisings, as an openness to fair elections reflects a thawing of the regime’s autocratic tendencies that may make a popular uprising less necessary. For such applications, scholars will wish to relax the assumption of independence in competing risks. Gordon (2002), for example, presents a competing risks framework in which the assumption of independence of risks is relaxed and the risks instead are allowed to exhibit stochastic dependence.

In some applications, units are at risk of experiencing any of a number of different types of events. For example, in biostatistics, patients may experience death due to a variety of different disease types. In political settings, elected officials may leave office due to retirement, defeat, or death. In sociology, marriages may end due either to divorce or death. When one is interested in modeling survival times, it is important to model these competing risks that the units face, for covariates may have fundamentally different effects for different event types. For example, in an application to congressional careers, the factors predicting retirements, such as age, may be quite distinct from the factors predicting defeat, such as the candidate representing a marginal district.
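One common way to obtain a hazard rate for each event type is to code exits due to other causes as censored. The sketch below illustrates that coding under two strong simplifying assumptions that are mine: the risks are independent, and each cause-specific hazard is constant over time (the exponential case). The data are hypothetical.

```python
def cause_specific_indicator(event_types, cause):
    """1 if the unit exited due to `cause`; other exits and censored units are coded 0."""
    return [1 if e == cause else 0 for e in event_types]

def constant_cause_specific_hazard(durations, event_types, cause):
    """Exponential estimate: exits due to `cause` per unit of time at risk."""
    indicator = cause_specific_indicator(event_types, cause)
    return sum(indicator) / sum(durations)

# Hypothetical congressional careers (in terms served); None marks a censored career.
durations = [1, 2, 3, 6, 9, 12, 4, 10]
event_types = ["defeat", "defeat", "defeat",
               "retirement", "retirement", "retirement", None, None]

hazard_defeat = constant_cause_specific_hazard(durations, event_types, "defeat")
hazard_retirement = constant_cause_specific_hazard(durations, event_types, "retirement")
print(round(hazard_defeat, 4), round(hazard_retirement, 4))
```

In a full analysis, each cause-specific indicator would be paired with its own set of covariate effects, allowing a factor such as age to matter for retirement but not for defeat.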

# Frailty Models

Often scholars will be unable to model all of the factors shaping risk propensity. When important risk factors go unmodeled, the result is unmodeled heterogeneity in risk propensity. One way to account for this unmodeled heterogeneity in risk propensity is through random effects, or frailty terms. Borrowing from survival models’ foundations in biostatistics, the frailty terms capture the fact that some units are at greater risk of event occurrence – are more frail in biostatistical terminology – than are other units.

Frailty models may take the form of either individual or shared frailties. An individual frailty model allows for the possibility that each unit has its own distinct risk propensity – its own distinct frailty. Alternatively, units nested within higher level strata – e.g., counties nested within states – may exhibit a common shared frailty as a consequence of their membership in the higher-level stratum. Whether an individual or shared frailty model, the researcher’s interest is in whether the units share a common variance. A frailty variance parameter distinguishable from zero indicates that the units or strata do not share a common variance and thus exhibit heterogeneity in risk propensity. Typically the frailty terms are modeled as independent of each other. In practice, this assumption of independence will often not be valid. As a consequence, scholars may wish to model dependence across frailty terms, for example, via the spatial models discussed below.
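One common way of writing a frailty model, offered here as an illustrative formulation with notation of my own, multiplies the hazard by a unit- or stratum-specific random effect:

$h_{ij}(t) = h_0(t)\,\nu_j \exp(\beta'x_{ij}),$

where $\nu_j > 0$ is the frailty shared by all units $i$ in stratum $j$ (in an individual frailty model, each stratum contains a single unit). The frailties are typically assumed to follow a distribution with mean one and variance $\theta$, such as the gamma distribution. When $\theta = 0$, every $\nu_j$ equals one and the model reduces to the standard form without unmodeled heterogeneity; an estimate of $\theta$ distinguishable from zero indicates that some units or strata are systematically more failure-prone than others.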

# Spatial Dependence

Many of the event processes for which scholars employ survival models carry an inherent spatial dimension. These event processes occur at particular geographic locations and the spatial proximity between units shapes their propensity to experience the event of interest. For example, for political event processes such as policy adoptions or waves of democratization, an event occurrence in one location affects the probability that a similar event will occur at neighboring locations. Such processes, in short, carry with them an expectation of positive spatial dependence.

One way to model this spatial dependence is through a spatial frailty model (see Banerjee, Wall, and Carlin 2003, Banerjee, Carlin, and Gelfand 2004, Darmofal 2009). In contrast to standard frailty survival models such as those discussed above, the frailty terms are now no longer assumed to be independent. Instead, neighboring units are assumed to share similar unmeasured risk factors, producing spatial dependence in their frailties. This spatial dependence is often modeled in a Bayesian framework, with the spatial dependence in neighboring frailties captured via a conditionally autoregressive (CAR) prior.

A standard, non-spatial frailty model assumes an exchangeable prior, in that the locations of units are inconsequential for their risk propensities, and thus these locations are interchangeable (exchangeable) for the model at hand. Thus, for example, where neighbors would be defined in a spatial modeling context via a spatial weights matrix (with neighboring units taking a non-zero value and non-neighbors taking zero values), the exchangeable prior is consistent with a weights matrix in which all non-diagonal elements are given values of one (assuming that units cannot be their own neighbor) and thus the spatial location of units is inconsequential for the research question at hand.

The spatial CAR prior takes into account the spatial locations of units by modeling neighbors via a non-exchangeable spatial weights matrix. As a consequence, where the exchangeable prior displaces the random effects estimates toward a global mean by not taking into account the spatial locations of units, the spatial CAR prior displaces these random effects toward a local mean (Bernardinelli and Montomoli 1992, 989). Either individual or shared spatial frailties can be employed. Model choice (of spatial vs. non-spatial alternatives) can be evaluated via information criteria such as the Deviance Information Criterion (DIC).
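The contrast between a local and a global mean can be illustrated with the conditional distribution implied by an intrinsic CAR prior, under which a unit's frailty, given all others, is centered on the weighted average of its neighbors' frailties with a variance that shrinks as the number of neighbors grows. The sketch below assumes that form, uses hypothetical frailty values and a hypothetical precision parameter `tau`, and represents the exchangeable case by the all-ones weights matrix described above.

```python
def car_conditional_moments(frailties, weights, i, tau=1.0):
    """Conditional mean and variance of frailty i given the others (intrinsic CAR form)."""
    neighbor_weight = sum(weights[i])
    mean = sum(w * f for w, f in zip(weights[i], frailties)) / neighbor_weight
    variance = 1.0 / (tau * neighbor_weight)
    return mean, variance

# Four hypothetical units arranged on a line: 0 - 1 - 2 - 3.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
# Exchangeable case: every other unit counts equally as a "neighbor".
exchangeable = [[0, 1, 1, 1],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [1, 1, 1, 0]]
frailties = [0.9, 0.6, -0.4, -1.1]

local_mean, local_var = car_conditional_moments(frailties, adjacency, 0)
global_mean, global_var = car_conditional_moments(frailties, exchangeable, 0)
print(round(local_mean, 4), round(global_mean, 4))
```

For the first unit, the spatial weights pull its frailty toward that of its single high-risk neighbor, while the exchangeable weights pull it toward the average of all other units.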

# Dependence across Events

Another important type of dependence may occur across different types of events. Although most event processes are modeled as independent, this assumption of independence across processes is often difficult to justify. For example, as Hays and Kachi (2011) note, the duration of cabinet formations and the duration of subsequent cabinet survivals are likely to be correlated with each other. In important recent work, they develop a full information maximum likelihood (FIML) estimator based on the Weibull distribution to model dependence across event processes. Their Monte Carlo simulations demonstrate the importance of modeling dependence across processes and the bias induced by assuming that correlated processes are independent. In a further extension, Kachi (2011) adapts this approach to handle the case of right-censored observations.

# Split-Population Models

In their typical forms, the semi-parametric and parametric models discussed above all carry the assumption that if the event process were observed for a sufficiently long period of time, all units would eventually experience the event. In many applications, however, this assumption is unlikely to hold. Consider, for example, congressional careers, where the event of interest is defeat in a congressional election. Because of the gerrymandering of congressional districts, most members of Congress represent safe districts and thus are unlikely to ever experience electoral defeat. A smaller set of members represent swing districts and are at much higher risk of losing their seats. Rather than treating these two sets of members as representing a single population, the analyst can employ a split-population or cure model that recognizes that the two sets represent distinct populations, with one at risk of electoral defeat and the other at essentially no risk of electoral defeat.

The split-population model, as developed by Schmidt and Witte (1989), incorporates this heterogeneity through the inclusion of a split parameter, delta, which indicates the probability that a unit will experience the event of interest. Specifically, the split-population model provides two separate sets of parameter estimates: one for the effects of covariates on the probability of occurrence of the event of interest and the other for the effects of covariates on the timing of the event of interest. Thus, for example, a split-population model would provide estimates of the effects of covariates on the occurrence of the end of a Congressional career and a second set of estimates of the effects of covariates on the timing of the end of a Congressional career.

An important point to note regarding split-population models is that they currently require estimation of parametric survival models. Cox models with split-populations are not identified. Thus, if a researcher believes that some of her units will never experience the event of interest, it is particularly important to choose the correct parametric distribution for the baseline hazard.
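The way the split parameter enters estimation can be sketched through the likelihood. The code below is a simplified illustration under assumptions of mine: an exponential timing model, no covariates, and a single `delta` for all units (in applications, `delta` would typically depend on covariates). A unit that experiences the event must belong to the at-risk population, whereas a censored unit either will never experience the event or is at risk but has not yet failed.

```python
import math

def split_population_loglik(delta, lam, durations, events):
    """Split-population log-likelihood with an exponential timing model.

    delta is the probability that a unit will eventually experience the event.
    """
    total = 0.0
    for t, d in zip(durations, events):
        density = lam * math.exp(-lam * t)
        survivor = math.exp(-lam * t)
        if d == 1:
            # At risk, and failed at time t.
            total += math.log(delta * density)
        else:
            # Never at risk, or at risk but surviving past t.
            total += math.log((1.0 - delta) + delta * survivor)
    return total

# Hypothetical data: 1 = event observed, 0 = censored.
durations = [1.0, 2.0, 2.5, 20.0, 20.0, 20.0]
events = [1, 1, 1, 0, 0, 0]

print(round(split_population_loglik(1.0, 0.1, durations, events), 4))
print(round(split_population_loglik(0.5, 0.4, durations, events), 4))
```

Setting `delta` to one recovers the standard censored-data likelihood, which makes clear that the conventional models are the special case in which every unit is assumed to fail eventually.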

# Conclusion

In summary, survival models provide a flexible and effective approach for modeling the event processes of interest to scholars in many disciplines. A variety of alternative models have been developed, as well as models for more complex competing risks or repeated events processes. These models have also been implemented in most standard statistical packages, placing them at the fingertips of applied researchers in a variety of disciplines.

# References

Banerjee, Sudipto, Bradley P. Carlin, and Alan E. Gelfand. 2004. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman & Hall.

Banerjee, Sudipto, Melanie M. Wall, and Bradley P. Carlin. 2003. "Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota." Biostatistics 4(1): 123-42.

Bennett, D. Scott. 1999. "Parametric Models, Duration Dependence, and Time-Varying Data Revisited." American Journal of Political Science 43(1): 256-270.

Box-Steffensmeier, Janet M., and Bradford S. Jones. 1997. "Time is of the Essence: Event History Models in Political Science." American Journal of Political Science 41: 1414-1461.

Box-Steffensmeier, Janet M., and Bradford S. Jones. 2004. Event History Modeling: A Guide for Social Scientists. Cambridge: Cambridge University Press.

Box-Steffensmeier, Janet M., Suzanna Linn, and Corwin Smidt. 2011. "Analyzing the Robustness of Semi-Parametric Duration Models for the Study of Repeated Events." Working paper.

Box-Steffensmeier, Janet M., and Christopher J.W. Zorn. 2001. “Duration Models and Proportional Hazards in Political Science.” American Journal of Political Science 45(4): 972-88.

Darmofal, David. 2009. “Bayesian Spatial Survival Models for Political Event Processes.” American Journal of Political Science 53(1): 241-257.

Golub, Jonathan. 2008. “Survival Analysis.” In The Oxford Handbook of Political Methodology. Eds. Janet M. Box-Steffensmeier, Henry E. Brady, and David Collier. Oxford: Oxford University Press.

Gordon, Sanford C. 2002. "Stochastic Dependence in Competing Risks." American Journal of Political Science 46(1): 200-217.

Kachi, Aya. 2011. "Right Censoring in Interdependent Duration Models: The Possibility of Approximating a Joint Survivor Function Using Copulas." Working paper.

Licht, Amanda A. 2011. “Change Comes with Time: Substantive Interpretation of Nonproportional Hazards in Event History Analysis.” Political Analysis 19: 227-243.