Book - Econometrics by Example - Damodar Gujarati - Chapter 18

Survival Analysis (SA)

1. SA refers to statistical techniques that go by various names, such as duration analysis (e.g. the length of time a person is unemployed or the length of an industrial strike), reliability or failure time analysis (e.g. how long a light bulb lasts before it burns out), transition analysis (from one qualitative state to another, such as marriage or divorce), hazard rate analysis (e.g. the conditional probability of event occurrence) or survival analysis (e.g. time until death from breast cancer).

2. The primary goals of survival analysis are: 1) to estimate and interpret survivor or hazard functions from survival data, and 2) to assess the impact of explanatory variables on survival time.

Terminology of survival analysis

Event: An event consists of some qualitative change that occurs at a specific point in time. The change must consist of a relatively sharp distinction between what precedes and what follows. An obvious example is death. Less obvious, but nonetheless important, are transitions such as marriage or divorce.

Duration spell: The length of time before an event occurs, such as the length of time a person is unemployed.

The cumulative distribution function (CDF) of time: Suppose a person is hospitalized and let T denote the time until he or she is discharged from the hospital. If we treat T as a continuous variable, the distribution of T is given by the CDF

$$F(t) = \Pr(T \le t)$$

If $F(t)$ is differentiable, its density function can be expressed as

$$f(t) = \frac{dF(t)}{dt} = F'(t)$$

The survivor function S(t): The probability of surviving past time t, defined as

$$S(t) = 1 - F(t) = \Pr(T > t)$$

The hazard function: Consider the following:

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

This equation is known as the hazard function. It gives the instantaneous rate of leaving the initial state per unit of time.

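By the definition of conditional probability,

$$\Pr(t \le T < t + \Delta t \mid T \ge t) = \frac{\Pr(t \le T < t + \Delta t)}{\Pr(T \ge t)} = \frac{F(t + \Delta t) - F(t)}{1 - F(t)}$$

Since

$$\lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = f(t),$$

we can write

$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}$$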

The hazard function is the ratio of the density function to the survivor function for a random variable T. Simply stated, it gives the probability that someone fails at time t, given that they have survived up to that point.
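
As a quick numerical check of this result, the sketch below compares the limit definition of the hazard with the ratio f(t)/S(t). The CDF F(t) = 1 - exp(-t^2) is an arbitrary illustrative choice, not one taken from the chapter, and all function names are hypothetical. For this particular CDF the hazard works out to 2t.

```python
import math


def cdf(t):
    """F(t) = Pr(T <= t); illustrative choice F(t) = 1 - exp(-t^2)."""
    return 1.0 - math.exp(-t ** 2)


def density(t):
    """f(t) = dF(t)/dt for the CDF above."""
    return 2.0 * t * math.exp(-t ** 2)


def survivor(t):
    """S(t) = 1 - F(t)."""
    return 1.0 - cdf(t)


def hazard_from_ratio(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survivor(t)


def hazard_from_limit(t, dt=1e-6):
    """Approximate Pr(t <= T < t + dt | T >= t) / dt with a small dt."""
    conditional_prob = (cdf(t + dt) - cdf(t)) / survivor(t)
    return conditional_prob / dt


if __name__ == "__main__":
    # The two columns should agree closely and both be near 2t.
    for t in (0.5, 1.0, 1.5):
        print(t, hazard_from_ratio(t), hazard_from_limit(t))
```

The two routes agree up to the discretization error introduced by the finite step dt.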

Some special problems associated with SA

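1. Censoring - A frequently encountered problem in SA is that the data are often censored. In unemployment data, for example, some of the people being followed will have dropped out of the labor force before the study ends, so the full length of their unemployment spell is not observed.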

In estimating the hazard function we have to take into account the censoring problems.
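
To illustrate why censoring matters, here is a minimal sketch that assumes a constant hazard rate (the exponential case covered later in these notes). Under that assumption the maximum likelihood estimate of the rate with right-censored data is the number of observed events divided by the total time at risk. The data and function names are made up for illustration and do not come from the chapter.

```python
def censored_rate_mle(durations, observed):
    """Constant-hazard MLE with right censoring: events / total time at risk.

    durations: time each subject was followed
    observed:  True if the event occurred, False if the spell was censored
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    events = sum(1 for flag in observed if flag)
    return events / sum(durations)


def naive_rate(durations, observed):
    """Estimate that wrongly discards censored spells altogether."""
    completed = [t for t, flag in zip(durations, observed) if flag]
    return len(completed) / sum(completed)


if __name__ == "__main__":
    # Hypothetical sample: three completed spells, three censored at t = 8.
    durations = [2, 3, 5, 8, 8, 8]
    observed = [True, True, True, False, False, False]
    print(censored_rate_mle(durations, observed))  # uses all time at risk
    print(naive_rate(durations, observed))         # ignores censored spells
```

In this made-up sample, dropping the censored spells throws away the longest durations and so overstates the hazard rate.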

2. Hazard function with or without covariates
In SA our interest is not only in estimating the hazard function but also in trying to find out if it depends on some explanatory variables or covariates. We have to determine if the covariates are time-variant or invariant.

3. Duration dependence
If the hazard function is not constant, there is duration dependence. If dh(t)/dt > 0, there is positive duration dependence. In this case the probability of exiting the initial state increases the longer a person is in that state. E.g. under positive duration dependence, the longer a person is unemployed, the higher his or her probability of exiting unemployment.

4. Unobserved heterogeneity
No matter how many covariates we consider, there may be intrinsic heterogeneity among individuals, and we may have to account for this.

Modeling recidivism duration



Exponential probability distribution
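
The exponential distribution is characterized by a constant hazard rate. A minimal sketch, assuming the standard parametrization with rate $\lambda > 0$ (the formulas are not given in these notes): $F(t) = 1 - e^{-\lambda t}$, $S(t) = e^{-\lambda t}$ and $f(t) = \lambda e^{-\lambda t}$, so that $h(t) = f(t)/S(t) = \lambda$ for every t. The function names below are hypothetical.

```python
import math


def exp_survivor(t, lam):
    """S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def exp_density(t, lam):
    """f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def exp_hazard(t, lam):
    """h(t) = f(t) / S(t), which equals lam regardless of t."""
    return exp_density(t, lam) / exp_survivor(t, lam)


if __name__ == "__main__":
    lam = 0.25  # illustrative rate
    for t in (0.5, 2.0, 10.0):
        print(t, exp_survivor(t, lam), exp_hazard(t, lam))
```

The printed hazard stays at the chosen rate for every duration, which is what the absence of duration dependence means.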


Weibull probability distribution

A major drawback of using the exponential probability distribution to model the hazard rate is that it assumes a constant hazard rate - that is, a rate that is independent of time. But if h(t) is not constant, we have a situation of duration dependence - positive duration dependence if the hazard rate increases with duration, and negative duration dependence if it decreases with duration.

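A probability distribution that takes duration dependence into account is the Weibull probability distribution, for which

$$h(t) = \alpha \lambda t^{\alpha - 1}, \quad \alpha > 0, \ \lambda > 0$$

and

$$S(t) = e^{-\lambda t^{\alpha}}$$

With $\alpha = 1$ the hazard is constant, as in the exponential case; $\alpha > 1$ gives positive and $\alpha < 1$ negative duration dependence.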
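
The link between the Weibull shape parameter and duration dependence can be sketched as follows, assuming the parametrization $h(t) = \alpha \lambda t^{\alpha - 1}$ and $S(t) = e^{-\lambda t^{\alpha}}$ with $\alpha, \lambda > 0$. The function names and parameter values are illustrative assumptions.

```python
import math


def weibull_hazard(t, alpha, lam):
    """h(t) = alpha * lam * t^(alpha - 1), defined here for t > 0."""
    return alpha * lam * t ** (alpha - 1.0)


def weibull_survivor(t, alpha, lam):
    """S(t) = exp(-lam * t^alpha)."""
    return math.exp(-lam * t ** alpha)


def duration_dependence(alpha):
    """Classify duration dependence from the shape parameter alpha."""
    if alpha > 1.0:
        return "positive"
    if alpha < 1.0:
        return "negative"
    return "none"


if __name__ == "__main__":
    lam = 0.3  # illustrative scale parameter
    for alpha in (0.5, 1.0, 2.0):
        hazards = [weibull_hazard(t, alpha, lam) for t in (1.0, 2.0, 4.0)]
        print(alpha, duration_dependence(alpha), hazards)
```

With alpha equal to 1 the hazard stays at lam, which is the exponential case; other values make the hazard rise or fall with duration.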

The proportional hazard model

A model that is quite popular in survival analysis is the proportional hazard (PH) model, originally proposed by Cox. The PH model assumes that the hazard rate of the ith individual can be expressed as:

$$h(t \mid X_i) = h_0(t)\, e^{X_i \beta}$$

In the PH model the hazard function consists of two parts in multiplicative form: (1) $h_0(t)$, called the baseline hazard, which is a function of duration time, and (2) $e^{X_i \beta}$, a part that is a function of the explanatory variables.
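
The multiplicative form explains the name of the model: the ratio of the hazards of two individuals depends only on their covariates, not on duration time. A minimal sketch, assuming the hazard is written as a baseline hazard times the exponential of a linear index of the covariates. The baseline function, coefficients and covariate values below are hypothetical and are not estimates from the chapter.

```python
import math


def ph_hazard(t, x, beta, baseline):
    """Baseline hazard at t scaled by exp(linear index of covariates)."""
    linear_index = sum(b * xi for b, xi in zip(beta, x))
    return baseline(t) * math.exp(linear_index)


def hazard_ratio(x_a, x_b, beta):
    """Ratio of two individuals' hazards; the baseline hazard cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))


def example_baseline(t):
    """Hypothetical baseline hazard, chosen only for illustration."""
    return 0.1 * math.sqrt(t)


if __name__ == "__main__":
    beta = [0.4, -0.2]   # hypothetical coefficients
    x_a = [1.0, 3.0]     # hypothetical covariates of individual A
    x_b = [0.0, 1.0]     # hypothetical covariates of individual B
    for t in (1.0, 5.0, 25.0):
        ratio = ph_hazard(t, x_a, beta, example_baseline) / ph_hazard(
            t, x_b, beta, example_baseline
        )
        print(t, ratio, hazard_ratio(x_a, x_b, beta))
```

The ratio is the same at every t, so under this assumption the two hazard paths are proportional to each other.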