Chapter Contents Previous Next
 The LOGISTIC Procedure

## Example 39.11: Complementary Log-Log Model for Interval-censored Survival Times

Often survival times are not observed more precisely than the interval (for instance, a day) within which the event occurred. Survival data of this form are known as grouped or interval-censored data. A discrete analogue of the continuous proportional hazards model (Prentice and Gloeckler 1978; Allison 1982) is used to investigate the relationship between these survival times and a set of explanatory variables.

Suppose Ti is the discrete survival time variable of the ith subject with covariates xi. The discrete-time hazard rate is defined as

Using elementary properties of conditional probabilities, it can be shown that

Suppose ti is the observed survival time of the ith subject. Suppose if Ti=ti is an event time and 0 otherwise. The likelihood for the grouped survival data is given by

where yij=1 if the ith subject experienced an event at time Ti=j and 0 otherwise.

Note that the likelihood L for the grouped survival data is the same as the likelihood of a binary response model with event probabilities .If the data are generated by a continuous-time proportional hazards model, Prentice and Gloeckler (1978) have shown that

where the coefficient vector is identical to that of the continuous-time proportional hazards model, and is a constant related to the conditional survival probability in the interval defined by Ti=j at xi = 0. The grouped data survival model is therefore equivalent to the binary response model with complementary log-log link function. To fit the grouped survival model using PROC LOGISTIC, you must treat each discrete time unit for each subject as a separate observation. For each of these observations, the response is dichotomous, corresponding to whether or not the subject died in the time unit.

Consider a study of the effect of insecticide on flour-beetles. Four different concentrations of an insecticide were sprayed on separate groups of flour-beetles. The numbers of male and female flour-beetles dying in successive intervals were saved in the data set beetles.

   data beetles(keep=time sex conc freq);
input time m20 f20 m32 f32 m50 f50 m80 f80;
conc=.20;
freq= m20; sex=1; output;
freq= f20; sex=2; output;
conc=.32;
freq= m32; sex=1; output;
freq= f32; sex=2; output;
conc=.50;
freq= m50; sex=1; output;
freq= f50; sex=2; output;
conc=.80;
freq= m80; sex=1; output;
freq= f80; sex=2; output;
datalines;
1   3   0  7  1  5  0  4  2
2  11   2 10  5  8  4 10  7
3  10   4 11 11 11  6  8 15
4   7   8 16 10 15  6 14  9
5   4   9  3  5  4  3  8  3
6   3   3  2  1  2  1  2  4
7   2   0  1  0  1  1  1  1
8   1   0  0  1  1  4  0  1
9   0   0  1  1  0  0  0  0
10   0   0  0  0  0  0  1  1
11   0   0  0  0  1  1  0  0
12   1   0  0  0  0  1  0  0
13   1   0  0  0  0  1  0  0
14 101 126 19 47  7 17  2  4
;


The data set beetles contains four variables: time, sex, conc, and freq. time represents the interval death time; for example, time=2 is the interval between day 1 and day 2. Insects surviving the duration (13 days) of the experiment are given a time value of 14. The variable sex represents the sex of the insects (1=male, 2=female), conc represents the concentration of the insecticide (mg/cm2), and freq represents the frequency of the observations.

To use PROC LOGISTIC with the grouped survival data, you must expand the data so that each beetle has a separate record for each day of survival. A beetle that died in the third day (time=3) would contribute three observations to the analysis, one for each day it was alive at the beginning of the day. A beetle that survives the 13-day duration of the experiment (time=14) would contribute 13 observations.

A new data set days that contains the beetle-day observations is created from the data set beetles. In addition to the variables sex, conc and freq, the data set contains an outcome variable y and 13 indicator variables day1, day2, ..., day13. y has a value of 1 if the observation corresponds to the day that the beetle died and has a value of 0 otherwise. An observation for the first day will have a value of 1 for day1 and a value of 0 for day2 -day13; an observation for the second day will have a value of 1 for day2 and a value of 0 for day1 and day2 -day13. For instance, Output 39.11.1 shows an observation in the beetles data set with time=3, and Output 39.11.2 shows the corresponding beetle-day observations in the data set days.

   data days;
retain day1-day13 0;
array dd[13] day1-day13;
set beetles;
if time = 14 then do day=1 to 13;
y=0; dd[day]=1;
output;
dd[day]=0;
end;
else do day=1 to time;
if day=time then y=1;
else y=0;
dd[day]=1;
output;
dd[day]=0;
end;


Output 39.11.1: An Observation with Time=3 in Data Set Beetles

 Obs time conc freq sex 17 3 0.2 10 1

Output 39.11.2: Corresponding Beetle-day Observations in Days

 Obs time conc freq sex day y day1 day2 day3 day4 day5 day6 day7 day8 day9 day10 day11 day12 day13 25 3 0.2 10 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 26 3 0.2 10 1 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 27 3 0.2 10 1 3 1 0 0 1 0 0 0 0 0 0 0 0 0 0

The following SAS statements invoke PROC LOGISTIC to fit a complementary log-log model for binary data with response variable Y and explanatory variables day1 -day13, sex, and conc. Since the values of y are coded 0 and 1, specifying the DESCENDING option ensures that the event (y=1) probability is modeled. The coefficients of day1 -day13 can be used to estimate the baseline survival function. The NOINT option is specified to prevent any redundancy in estimating the coefficients of day1 -day13. The Newton-Raphson algorithm is used for the maximum likelihood estimation of the parameters.

   proc logistic data=days descending outest=est1;
model y= day1-day13 sex conc / noint link=cloglog
technique=newton;
freq freq;
run;


Output 39.11.3: Parameter Estimates for the Grouped Proportional Hazards Model
 Analysis of Maximum Likelihood Estimates Parameter DF Estimate StandardError Chi-Square Pr > ChiSq day1 1 -3.9314 0.2934 179.5602 <.0001 day2 1 -2.8751 0.2412 142.0596 <.0001 day3 1 -2.3985 0.2299 108.8833 <.0001 day4 1 -1.9953 0.2239 79.3960 <.0001 day5 1 -2.4920 0.2515 98.1470 <.0001 day6 1 -3.1060 0.3037 104.5799 <.0001 day7 1 -3.9704 0.4230 88.1107 <.0001 day8 1 -3.7917 0.4007 89.5233 <.0001 day9 1 -5.1540 0.7316 49.6329 <.0001 day10 1 -5.1350 0.7315 49.2805 <.0001 day11 1 -5.1131 0.7313 48.8834 <.0001 day12 1 -5.1029 0.7313 48.6920 <.0001 day13 1 -5.0951 0.7313 48.5467 <.0001 sex 1 -0.5651 0.1141 24.5477 <.0001 conc 1 3.0918 0.2288 182.5665 <.0001

Results of the model fit are given in Output 39.11.3. Both sex and conc are statistically significant for the survival of beetles sprayed by the insecticide. Female beetles are more resilient to the chemical than male beetles, and increased concentration increases the effectiveness of the insecticide.

The coefficients of day1 -day13 are the maximum likelihood estimates of , respectively. The baseline survivor function S0(t) is estimated by

and the survivor function for a given covariate pattern (sex=x1 and conc=x2) is estimated by

The following DATA step computes the survivor curves for male and female flour-beetles exposed to the insecticide of concentrations .20 mg/cm2 and .80 mg/cm2. The GPLOT procedure in SAS/GRAPH software is used to plot the survival curves. Instead of plotting them as step functions, the SPLINE option is used to smooth the curves. These smoothed survival curves are displayed in Output 39.11.4.

   data one (keep=day survival element s_m20 s_f20 s_m80 s_f80);
array dd day1-day13;
array sc[4] m20 f20 m80 f80;
array s_sc[4] s_m20 s_f20 s_m80 s_f80 (1 1 1 1);
set est1;
m20= exp(sex + .20 * conc);
f20= exp(2 * sex + .20 * conc);
m80= exp(sex + .80 * conc);
f80= exp(2 * sex + .80 * conc);
survival=1;
day=0;
output;
do over dd;
element= exp(-exp(dd));
survival= survival * element;
do i=1 to 4;
s_sc[i] = survival ** sc[i];
end;
day + 1;
output;
end;
label s_m20= 'Male at .20 conc.'
s_m80= 'Male at .80 conc.'
s_f20= 'Female at .20 conc.'
s_f80= 'Female at .80 conc.';
run;

title1 'Flour-Beetles Sprayed with Insecticide';
legend1 label=none frame cframe=ligr cborder=black
position=center value=(justify=center);
axis1 label=(angle=90 'Survival Function');

proc gplot data=one;
plot (s_m20 s_f20 s_m80 s_f80) * day
/ overlay legend=legend1 vaxis=axis1 cframe=ligr;
symbol1  v=dot i=spline c=black  height=.8;
symbol2  v=dot i=spline c=red    height=.8;
symbol3  v=dot i=spline c=blue   height=.8;
symbol4  v=dot i=spline c=yellow height=.8;
run;


The probability of survival is displayed on the vertical axis. Notice that most of the insecticide effect occurs by day 6 for both the high and low concentrations.

Output 39.11.4: Predicted Survival at Concentrations of 0.20 and 0.80 mg/cm2

 Chapter Contents Previous Next Top