 The LOGISTIC Procedure

## Example 39.5: Stratified Sampling

Consider the hypothetical example in Fleiss (1981, pp. 6-7) in which a test is applied to a sample of 1000 people known to have a disease and to another sample of 1000 people known not to have the disease. In the diseased sample, 950 test positive; in the nondiseased sample, only 10 test positive. If the true disease rate in the population is 1 in 100, specifying PEVENT=0.01 yields the correct false positive and false negative rates for the stratified sampling scheme. Omitting the PEVENT= option is equivalent to using the overall sample disease rate (1000/2000 = 0.5) as the value of the PEVENT= option, which ignores the stratified sampling.

The SAS code is as follows:

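```
/* Build the 2x2 table: one observation per (Disease, Test) cell */
data Screen;
   do Disease='Present','Absent';
      do Test=1,0;
         input Count @@;   /* cell frequency, read in row order */
         output;
      end;
   end;
   datalines;
950  50
10 990
;

/* Classification table at cutoff 0.5 for two prior event probabilities */
proc logistic order=data data=Screen;
   freq Count;
   model Disease=Test / pevent=.5 .01 ctable pprob=.5;
run;
```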

The ORDER=DATA option causes the Disease level of the first observation in the input data set to be the event. So, Disease='Present' is the event. The CTABLE option is specified to produce a classification table. Specifying PPROB=0.5 indicates a cutoff probability of 0.5. A list of two probabilities, 0.5 and 0.01, is specified for the PEVENT= option; 0.5 corresponds to the overall sample disease rate, and 0.01 corresponds to a true disease rate of 1 in 100.
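As an aside, later SAS releases (SAS 9 and after) let the MODEL statement name the event level directly with the EVENT= response option instead of relying on the order of the input data. The example on this page does not use that option, so the following is only a sketch that assumes a release where EVENT= is available; it repeats the Screen data step so that it can be run on its own.

```sas
/* Same 2x2 data as in the example */
data Screen;
   do Disease='Present','Absent';
      do Test=1,0;
         input Count @@;
         output;
      end;
   end;
   datalines;
950  50
10 990
;

/* Assumption: a SAS release that supports the EVENT= response option. */
/* EVENT='Present' names the event explicitly, so ORDER=DATA is dropped. */
proc logistic data=Screen;
   freq Count;
   model Disease(event='Present')=Test / pevent=.5 .01 ctable pprob=.5;
run;
```

Naming the event explicitly keeps the choice of event from depending on how the input data set happens to be sorted.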

The classification table is shown in Output 39.5.1.

Output 39.5.1: False Positive and False Negative Rates

The LOGISTIC Procedure

Classification Table

| Prob Event | Prob Level | Correct Event | Correct Non-Event | Incorrect Event | Incorrect Non-Event | % Correct | % Sensitivity | % Specificity | % False POS | % False NEG |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.500 | 0.500 | 950 | 990 | 10 | 50 | 97.0 | 95.0 | 99.0 | 1.0 | 4.8 |
| 0.010 | 0.500 | 950 | 990 | 10 | 50 | 99.0 | 95.0 | 99.0 | 51.0 | 0.1 |

In the classification table, the column "Prob Level" represents the cutoff values (the settings of the PPROB= option) for predicting whether an observation is an event. The "Correct" columns list the numbers of subjects that are correctly predicted as events and nonevents, respectively, and the "Incorrect" columns list the number of nonevents incorrectly predicted as events and the number of events incorrectly predicted as nonevents, respectively. For PEVENT=0.5, the false positive rate is 1% and the false negative rate is 4.8%. These results ignore the fact that the samples were stratified and incorrectly assume that the overall sample proportion of disease (which is 0.5) estimates the true disease rate. For a true disease rate of 0.01, the false positive rate and the false negative rate are 51% and 0.1%, respectively, as shown on the second line of the classification table.
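The reported rates can be checked by hand from the counts in the table with Bayes' theorem. The following is a sketch of that arithmetic, not a statement of the procedure's internal computation. The symbols are introduced here for illustration only: $p$ is the prior event probability (the PEVENT= value), $\mathrm{Se} = 950/1000 = 0.95$ is the sensitivity, and $\mathrm{Sp} = 990/1000 = 0.99$ is the specificity. The false positive rate is taken as the proportion of predicted events that are nonevents, and the false negative rate as the proportion of predicted nonevents that are events.

$$
\begin{aligned}
\text{False POS}(p) &= \frac{(1-\mathrm{Sp})(1-p)}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)}, \\[4pt]
\text{False NEG}(p) &= \frac{(1-\mathrm{Se})\,p}{(1-\mathrm{Se})\,p + \mathrm{Sp}\,(1-p)}, \\[4pt]
\text{Correct}(p) &= \mathrm{Se}\,p + \mathrm{Sp}\,(1-p).
\end{aligned}
$$

$$
\begin{aligned}
p = 0.5:&\quad \text{False POS} = \frac{0.005}{0.475 + 0.005} \approx 0.0104, \quad
\text{False NEG} = \frac{0.025}{0.025 + 0.495} \approx 0.0481, \quad
\text{Correct} = 0.970 \\[4pt]
p = 0.01:&\quad \text{False POS} = \frac{0.0099}{0.0095 + 0.0099} \approx 0.5103, \quad
\text{False NEG} = \frac{0.0005}{0.0005 + 0.9801} \approx 0.0005, \quad
\text{Correct} = 0.9896
\end{aligned}
$$

Expressed as percentages and rounded to one decimal place, these values agree with the two rows of the classification table (1.0 and 4.8 for PEVENT=0.5; 51.0 and 0.1 for PEVENT=0.01). Sensitivity and specificity do not depend on $p$, which is why those columns are identical in both rows.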
