Survival Analysis (SA)
1. Survival analysis refers to statistical techniques that go by various names, such as duration analysis (e.g. the length of time a person is unemployed or the length of an industrial strike), reliability or failure time analysis (e.g. how long a light bulb lasts before it burns out), transition analysis (from one qualitative state to another, such as marriage or divorce), hazard rate analysis (e.g. the conditional probability of event occurrence) or survival analysis (e.g. time until death from breast cancer).
2. The primary goals of survival analysis are: 1) to estimate and interpret survivor or hazard functions from survival data, and 2) to assess the impact of explanatory variables on survival time.
Terminology of survival analysis
Event: An event consists of some qualitative change that occurs at a specific point in time. The change must consist of a relatively sharp distinction between what precedes and what follows. An obvious example is death. Less obvious, but nonetheless important
Duration spell: It is the length of time before an event occurs, such as the time
The cumulative distribution function (CDF) of time: Suppose a person is hospitalized and let T denote the time until he or she is discharged from the hospital. If we treat T as a continuous variable, the distribution of T is given by the CDF
![F(t) = \Pr(T \leq t)](https://latex.codecogs.com/gif.latex?\inline&space;F(t)&space;=&space;\Pr(T&space;\leq&space;t))
If F(t) is differentiable, its density function can be expressed as
![f(t) = \frac{\mathrm{d} F(t)}{\mathrm{d} t} = {F}'(t)](https://latex.codecogs.com/gif.latex?\inline&space;f(t)&space;=&space;\frac{\mathrm{d}&space;F(t)}{\mathrm{d}&space;t}&space;=&space;{F}%27(t))
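The survivor function S(t): The probability of surviving past time t, defined as
![S(t) = 1 - F(t) = \Pr(T > t)](https://latex.codecogs.com/gif.latex?\inline&space;S(t)&space;=&space;1&space;-&space;F(t)&space;=&space;\Pr(T&space;>&space;t))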
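The hazard function: consider the following
![h(t) = \lim_{h \to 0} \frac{\Pr(t \leq T < t+h \mid T \geq t)}{h}](https://latex.codecogs.com/gif.latex?\inline&space;h(t)&space;=&space;\lim_{h&space;\to&space;0}&space;\frac{\Pr(t&space;\leq&space;T&space;%3C&space;t+h&space;\mid&space;T&space;\geq&space;t)}{h})
This equation is known as the hazard function. It gives the instantaneous rate of leaving the initial state per unit of time.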
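By definition of conditional probability
![\Pr(t \leq T < t+h \mid T \geq t) = \frac{\Pr(t \leq T < t+h)}{\Pr(T \geq t)} = \frac{F(t+h)-F(t)}{1-F(t)}](https://latex.codecogs.com/gif.latex?\inline&space;\Pr(t&space;\leq&space;T&space;%3C&space;t+h&space;\mid&space;T&space;\geq&space;t)&space;=&space;\frac{\Pr(t&space;\leq&space;T&space;%3C&space;t+h)}{\Pr(T&space;\geq&space;t)}&space;=&space;\frac{F(t+h)-F(t)}{1-F(t)})
Since
![\lim_{h \to 0} \frac{F(t+h)-F(t)}{h} = {F}'(t) = f(t)](https://latex.codecogs.com/gif.latex?\inline&space;\lim_{h&space;\to&space;0}&space;\frac{F(t+h)-F(t)}{h}&space;=&space;{F}%27(t)&space;=&space;f(t))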
we can write
![h(t) = \frac{f(t)}{1-F(t)}](https://latex.codecogs.com/gif.latex?\inline&space;h(t)&space;=&space;\frac{f(t)}{1-F(t)})
The hazard function is the ratio of the density function to the survivor function for a random variable. Simply stated, it gives the rate (probability per unit of time) at which someone fails at time t, given that they have survived up to that point.
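The relationship between the limit definition and the ratio f(t)/(1 - F(t)) can be checked numerically. The sketch below is only an illustration: it assumes an exponential distribution with a rate of 0.5 (both the distribution and the value are chosen here, not taken from these notes), and all function names are hypothetical.

```python
import math

RATE = 0.5  # assumed rate of an illustrative exponential distribution


def cdf(t):
    """F(t) = Pr(T <= t) for the illustrative exponential distribution."""
    return 1.0 - math.exp(-RATE * t)


def density(t):
    """f(t) = F'(t)."""
    return RATE * math.exp(-RATE * t)


def survivor(t):
    """S(t) = 1 - F(t)."""
    return 1.0 - cdf(t)


def hazard_from_ratio(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survivor(t)


def hazard_from_limit(t, step=1e-6):
    """Approximate Pr(t <= T < t + step | T >= t) / step for a small step."""
    conditional_prob = (cdf(t + step) - cdf(t)) / survivor(t)
    return conditional_prob / step


# Both columns should agree closely at every t.
for t in (0.5, 2.0, 5.0):
    print(t, hazard_from_ratio(t), hazard_from_limit(t))
```

For this particular distribution the two values also stay the same across t, because the exponential hazard does not depend on time.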
Some special problems associated with SA
1. Censoring - A frequently encountered problem in SA is that the data are often censored. In unemployment data, for example, some of the people being followed would have dropped out of the labor force, so the end of their unemployment spell is never observed.
In estimating the hazard function we have to take into account the censoring problems.
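A minimal sketch of why censoring matters, assuming a constant hazard (exponential durations), right censoring only, and censoring that is unrelated to the event. Under those assumptions the maximum likelihood estimate of the rate is the number of observed events divided by the total time at risk. The data and the function name are hypothetical.

```python
def exponential_rate_mle(durations, event_observed):
    """Estimate a constant hazard rate from right-censored durations.

    durations[i]      : observed time for subject i (to event or to censoring)
    event_observed[i] : True if the event was seen, False if censored
    """
    if len(durations) != len(event_observed):
        raise ValueError("inputs must have the same length")
    n_events = sum(1 for seen in event_observed if seen)
    # Censored subjects still contribute their time at risk.
    total_time_at_risk = sum(durations)
    return n_events / total_time_at_risk


# Hypothetical unemployment spells in weeks; False marks a censored spell.
spells = [4.0, 10.0, 6.0, 12.0, 8.0]
ended = [True, False, True, False, True]

rate_with_censoring = exponential_rate_mle(spells, ended)

# Naive alternative: discard the censored spells entirely.
completed = [d for d, seen in zip(spells, ended) if seen]
rate_dropping_censored = len(completed) / sum(completed)

print(rate_with_censoring, rate_dropping_censored)
```

Discarding the censored spells throws away time during which no exit occurred, so the naive estimate overstates the hazard in this example.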
2. Hazard function with or without covariates
In SA our interest is not only in estimating the hazard function but also in trying to find out if it depends on some explanatory variables or covariates. We have to determine if the covariates are time-variant or invariant.
3. Duration dependence
If the hazard function is not constant, there is duration dependence. If dh(t)/dt > 0, there is positive duration dependence: the probability of exiting the initial state increases the longer a person is in that state. For example, under positive duration dependence, the longer a person is unemployed, the higher his or her probability of exiting unemployment.
4. Unobserved heterogeneity
No matter how many covariates we consider, there may be intrinsic heterogeneity among individuals, and we may have to account for it.
Modeling recidivism duration
Exponential probability distribution
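For reference, the standard textbook form of the exponential distribution is sketched below. The rate symbol \lambda is an assumption made here; the notes may use a different symbol.

```latex
f(t) = \lambda e^{-\lambda t}, \qquad
F(t) = 1 - e^{-\lambda t}, \qquad
S(t) = 1 - F(t) = e^{-\lambda t}, \qquad \lambda > 0

h(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda
```

The hazard rate is the constant \lambda, so under this distribution the rate of leaving the initial state does not depend on how long the spell has lasted.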
Weibull probability distribution
A major drawback of using the exponential probability distribution to model the hazard rate is that it assumes a constant hazard rate - that is, a rate that is independent of time. But if h(t) is not constant, we have a situation of duration dependence - a positive duration dependence if the hazard rate increases with duration, and a negative duration dependence if this rate decreases with duration.
A probability distribution that takes into account duration dependence is the Weibull probability distribution
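A short sketch of how the Weibull distribution captures duration dependence. It assumes one common parameterization, h(t) = lam * alpha * t^(alpha - 1) with S(t) = exp(-lam * t^alpha); other texts scale the parameters differently, and the function names and parameter values are hypothetical.

```python
import math


def weibull_hazard(t, lam, alpha):
    """Hazard under the assumed form h(t) = lam * alpha * t**(alpha - 1)."""
    return lam * alpha * t ** (alpha - 1.0)


def weibull_survivor(t, lam, alpha):
    """Survivor function under the same assumed form: exp(-lam * t**alpha)."""
    return math.exp(-lam * t ** alpha)


def duration_dependence(lam, alpha, t_early=1.0, t_late=2.0):
    """Classify duration dependence by comparing the hazard at two times."""
    early = weibull_hazard(t_early, lam, alpha)
    late = weibull_hazard(t_late, lam, alpha)
    if math.isclose(early, late):
        return "none"
    return "positive" if late > early else "negative"


for alpha in (0.5, 1.0, 1.5):
    print(alpha, duration_dependence(lam=0.1, alpha=alpha))
```

With alpha = 1 the hazard reduces to the constant lam, which is the exponential case; alpha > 1 gives positive and alpha < 1 gives negative duration dependence.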
The proportional hazard model
A model that is quite popular in survival analysis is the proportional hazard (PH) model, originally proposed by Cox. The PH model assumes that the hazard rate of the ith individual can be expressed as: