The LOGISTIC Procedure

Example 39.8: Overdispersion

In a seed germination test, seeds of two cultivars were planted in pots of two soil conditions. The following SAS statements create the data set seeds, which contains the observed proportion of seeds that germinated for various combinations of cultivar and soil condition. Variable n represents the number of seeds planted in a pot, and variable r represents the number germinated. The indicator variables cult and soil represent the cultivar and soil condition, respectively.

   data seeds;
      input pot n r cult soil;
      datalines;
    1 16     8      0       0
    2 51    26      0       0
    3 45    23      0       0
    4 39    10      0       0
    5 36     9      0       0
    6 81    23      1       0
    7 30    10      1       0
    8 39    17      1       0
    9 28     8      1       0
   10 62    23      1       0
   11 51    32      0       1
   12 72    55      0       1
   13 41    22      0       1
   14 12     3      0       1
   15 13    10      0       1
   16 79    46      1       1
   17 30    15      1       1
   18 51    32      1       1
   19 74    53      1       1
   20 56    12      1       1
   ;

PROC LOGISTIC is used to fit a logit model to the data, with cult, soil, and the cult × soil interaction as explanatory variables. The option SCALE=NONE is specified to display the goodness-of-fit statistics without any adjustment for overdispersion.

   proc logistic data=seeds;
      model r/n=cult soil cult*soil/scale=none;
      title 'Full Model With SCALE=NONE';
   run;

Output 39.8.1: Results of the Model Fit for the Two-Way Layout

   Full Model With SCALE=NONE

   The LOGISTIC Procedure

   Deviance and Pearson Goodness-of-Fit Statistics

   Criterion    DF      Value    Value/DF    Pr > ChiSq
   Deviance     16    68.3465      4.2717        <.0001
   Pearson      16    66.7617      4.1726        <.0001

   Number of events/trials observations: 20

   Model Fit Statistics

                  Intercept    Intercept and
   Criterion           Only       Covariates
   AIC             1256.852         1213.003
   SC              1261.661         1232.240
   -2 Log L        1254.852         1205.003

   Testing Global Null Hypothesis: BETA=0

   Test                Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio       49.8488     3        <.0001
   Score                  49.1682     3        <.0001
   Wald                   47.7623     3        <.0001

   Analysis of Maximum Likelihood Estimates

                                Standard
   Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
   Intercept    1    -0.3788      0.1489       6.4730       0.0110
   cult         1    -0.2956      0.2020       2.1412       0.1434
   soil         1     0.9781      0.2128      21.1234       <.0001
   cult*soil    1    -0.1239      0.2790       0.1973       0.6569
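
Because cult and soil are 0/1 indicators and the model includes their interaction, the full model has exactly one free parameter per cultivar × soil cell, so its maximum likelihood fit reproduces the pooled germination proportion of each cell. The following Python sketch illustrates this outside SAS; SEEDS, cell_log_odds, and full_model_estimates are hypothetical names introduced only for the example. It recovers the estimates in Output 39.8.1 from the empirical cell log-odds, agreeing with the displayed values to within about 0.001.

```python
import math

# (n, r, cult, soil) for each of the 20 pots, copied from the seeds data set
SEEDS = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]


def cell_log_odds(data):
    """Pool events and trials within each (cult, soil) cell and return the
    empirical log-odds of germination for every cell."""
    trials, events = {}, {}
    for n, r, cult, soil in data:
        key = (cult, soil)
        trials[key] = trials.get(key, 0) + n
        events[key] = events.get(key, 0) + r
    return {k: math.log(events[k] / (trials[k] - events[k])) for k in trials}


def full_model_estimates(data):
    """Express the four cell log-odds in the parameterization of
    r/n = cult soil cult*soil (reference cell: cult=0, soil=0)."""
    lo = cell_log_odds(data)
    return {
        "Intercept": lo[(0, 0)],
        "cult": lo[(1, 0)] - lo[(0, 0)],
        "soil": lo[(0, 1)] - lo[(0, 0)],
        "cult*soil": lo[(1, 1)] - lo[(1, 0)] - lo[(0, 1)] + lo[(0, 0)],
    }


for name, value in full_model_estimates(SEEDS).items():
    print(f"{name:10s} {value:8.4f}")
```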

Results of fitting the full factorial model are shown in Output 39.8.1. Both the Pearson χ² and the deviance are highly significant (p < 0.0001), suggesting that the model does not fit well. If the link function and the model specification are correct and there are no outliers, the lack of fit may be due to overdispersion. Without an adjustment for overdispersion, the standard errors are likely to be underestimated, which makes the Wald tests too liberal (too likely to declare an effect significant).

PROC LOGISTIC provides three SCALE= options to accommodate overdispersion: SCALE=DEVIANCE, SCALE=PEARSON, and SCALE=WILLIAMS. When the observations have unequal sample sizes, as they do here, SCALE=WILLIAMS is preferred. The Williams model estimates a scale parameter φ by equating the value of Pearson χ² for the full model to its approximate expected value. The full model considered here is the model with cultivar, soil condition, and their interaction. Using a full model reduces the risk of contaminating φ with lack of fit due to incorrect model specification.
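
The two goodness-of-fit statistics in Output 39.8.1 can be reproduced with the usual binomial formulas, using 20 - 4 = 16 degrees of freedom. This is a Python sketch with hypothetical helper names, not PROC LOGISTIC's own code. It assumes that the fitted probability for each pot is the pooled proportion of its cultivar × soil cell, which holds for this full model.

```python
import math

# (n, r, cult, soil) for each of the 20 pots, copied from the seeds data set
SEEDS = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]


def pooled_cell_proportions(data):
    """Fitted germination probability of each (cult, soil) cell under the
    full model: total germinated / total planted within the cell."""
    trials, events = {}, {}
    for n, r, cult, soil in data:
        key = (cult, soil)
        trials[key] = trials.get(key, 0) + n
        events[key] = events.get(key, 0) + r
    return {k: events[k] / trials[k] for k in trials}


def pearson_and_deviance(data):
    """Unweighted Pearson chi-square and deviance for events/trials data."""
    p = pooled_cell_proportions(data)
    pearson = deviance = 0.0
    for n, r, cult, soil in data:
        pi = p[(cult, soil)]
        mu = n * pi  # expected number germinated in this pot
        pearson += (r - mu) ** 2 / (mu * (1.0 - pi))
        # every pot has 0 < r < n, so both logarithms are finite
        deviance += 2.0 * (r * math.log(r / mu) + (n - r) * math.log((n - r) / (n - mu)))
    return pearson, deviance


N_PARAMS = 4  # intercept, cult, soil, cult*soil
pearson, deviance = pearson_and_deviance(SEEDS)
df = len(SEEDS) - N_PARAMS
print(f"Deviance {df} {deviance:.4f} {deviance / df:.4f}")
print(f"Pearson  {df} {pearson:.4f} {pearson / df:.4f}")
```

Both ratios are roughly 4, far above the value near 1 that an adequately specified binomial model would be expected to give.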

   proc logistic data=seeds;
      model r/n=cult soil cult*soil / scale=williams;
      title 'Full Model With SCALE=WILLIAMS';
   run;
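
At the φ chosen by SCALE=WILLIAMS, the weighted Pearson χ² of the full model equals its 16 degrees of freedom (Output 39.8.2 shows 16.0000). The sketch below finds a φ that satisfies that end condition by bisection. Bisection is a swapped-in numerical method, not the iterative update that PROC LOGISTIC uses, and williams_weights, weighted_pearson, and solve_phi are hypothetical helpers. The sketch assumes that the weighted full-model fit sets each cell's probability to its weighted pooled proportion, which is the case for this cell-saturated model.

```python
# (n, r, cult, soil) for each of the 20 pots, copied from the seeds data set
SEEDS = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]


def williams_weights(data, phi):
    """Williams weight 1 / (1 + phi * (n - 1)) for each pot."""
    return [1.0 / (1.0 + phi * (n - 1)) for n, _, _, _ in data]


def weighted_pearson(data, phi):
    """Weighted Pearson chi-square of the full (cell-saturated) model when
    each pot carries the Williams weight for the given phi."""
    w = williams_weights(data, phi)
    wn, wr = {}, {}
    for wi, (n, r, cult, soil) in zip(w, data):
        key = (cult, soil)
        wn[key] = wn.get(key, 0.0) + wi * n
        wr[key] = wr.get(key, 0.0) + wi * r
    # Weighted MLE of each cell probability: weighted events / weighted trials
    p = {k: wr[k] / wn[k] for k in wn}
    total = 0.0
    for wi, (n, r, cult, soil) in zip(w, data):
        pi = p[(cult, soil)]
        total += wi * (r - n * pi) ** 2 / (n * pi * (1.0 - pi))
    return total


def solve_phi(data, df, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for the phi at which the weighted Pearson chi-square equals df."""
    if weighted_pearson(data, lo) <= df:
        return 0.0  # this sketch's choice: nothing to remove, so no scaling
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weighted_pearson(data, mid) > df:
            lo = mid  # still overdispersed: phi must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


phi = solve_phi(SEEDS, df=len(SEEDS) - 4)
print(f"phi = {phi:.6f}")
```

The result is close to the 0.075941 reported by SCALE=WILLIAMS.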

Output 39.8.2: Williams' Model for Overdispersion

   Full Model With SCALE=WILLIAMS

   The LOGISTIC Procedure

   Model Information

   Data Set                     WORK.SEEDS
   Response Variable (Events)   r
   Response Variable (Trials)   n
   Number of Observations       20
   Weight Variable              1 / ( 1 + 0.075941 * (n - 1) )
   Sum of Weights               198.32164573
   Link Function                Logit
   Optimization Technique       Fisher's scoring

   Response Profile

   Ordered                       Total        Total
     Value   Binary Outcome  Frequency       Weight
         1   Event                 437     92.95346
         2   Nonevent              469    105.36819

   Model Convergence Status

   Convergence criterion (GCONV=1E-8) satisfied.

   Deviance and Pearson Goodness-of-Fit Statistics

   Criterion    DF      Value    Value/DF    Pr > ChiSq
   Deviance     16    16.4402      1.0275        0.4227
   Pearson      16    16.0000      1.0000        0.4530

   Number of events/trials observations: 20

   NOTE: Since the Williams method was used to accommodate overdispersion, the
   Pearson chi-squared statistic and the deviance can no longer be used to
   assess the goodness of fit of the model.

   Model Fit Statistics

                  Intercept    Intercept and
   Criterion           Only       Covariates
   AIC              276.155          273.586
   SC               280.964          292.822
   -2 Log L         274.155          265.586

   Testing Global Null Hypothesis: BETA=0

   Test                Chi-Square    DF    Pr > ChiSq
   Likelihood Ratio        8.5687     3        0.0356
   Score                   8.4856     3        0.0370
   Wald                    8.3069     3        0.0401

   Analysis of Maximum Likelihood Estimates

                                Standard
   Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
   Intercept    1    -0.3926      0.2932       1.7932       0.1805
   cult         1    -0.2618      0.4160       0.3963       0.5290
   soil         1     0.8309      0.4223       3.8704       0.0491
   cult*soil    1    -0.0532      0.5835       0.0083       0.9274

   Association of Predicted Probabilities and Observed Responses

   Percent Concordant      50.6    Somers' D    0.258
   Percent Discordant      24.8    Gamma        0.343
   Percent Tied            24.6    Tau-a        0.129
   Pairs                 204953    c            0.629
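
The Weight Variable entry in Output 39.8.2 gives each pot the weight 1/(1 + 0.075941(n - 1)). In the Python sketch below (weighted_cells and logit are hypothetical helpers), the weighted numbers of trials and of germinated seeds come out at about 198.32 and 92.95, which appear to correspond to Sum of Weights and the Event Total Weight in the output. The log-odds of the weighted pooled cell proportions reproduce the parameter estimates to within about 0.001.

```python
import math

# (n, r, cult, soil) for each of the 20 pots, copied from the seeds data set
SEEDS = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]
PHI = 0.075941  # scale estimate reported in Output 39.8.2


def weighted_cells(data, phi):
    """Weighted trials and weighted events per (cult, soil) cell, using the
    Williams weight 1 / (1 + phi * (n - 1)) for each pot."""
    wn, wr = {}, {}
    for n, r, cult, soil in data:
        w = 1.0 / (1.0 + phi * (n - 1))
        key = (cult, soil)
        wn[key] = wn.get(key, 0.0) + w * n
        wr[key] = wr.get(key, 0.0) + w * r
    return wn, wr


def logit(p):
    return math.log(p / (1.0 - p))


wn, wr = weighted_cells(SEEDS, PHI)
sum_of_weights = sum(wn.values())  # weighted number of seeds planted
event_weight = sum(wr.values())    # weighted number of seeds germinated
lo = {k: logit(wr[k] / wn[k]) for k in wn}
estimates = {
    "Intercept": lo[(0, 0)],
    "cult": lo[(1, 0)] - lo[(0, 0)],
    "soil": lo[(0, 1)] - lo[(0, 0)],
    "cult*soil": lo[(1, 1)] - lo[(1, 0)] - lo[(0, 1)] + lo[(0, 0)],
}
print(f"Sum of Weights {sum_of_weights:.5f}  Event weight {event_weight:.5f}")
for name, value in estimates.items():
    print(f"{name:10s} {value:8.4f}")
```

The weights shrink the large pots most, which is why the standard errors in Output 39.8.2 are roughly twice those in Output 39.8.1.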


Results using Williams' method are shown in Output 39.8.2. The estimate of φ is 0.075941; it appears in the formula for the Weight Variable at the beginning of the displayed output. Because neither cult nor cult × soil is statistically significant (p = 0.5290 and p = 0.9274, respectively), a reduced model that contains only the soil condition factor is fitted, with the observations weighted by 1/(1 + 0.075941(n - 1)). This can be done conveniently in PROC LOGISTIC by supplying the scale estimate in the SCALE=WILLIAMS option as follows:

   proc logistic data=seeds;
      model r/n=soil / scale=williams(0.075941);
      title 'Reduced Model With SCALE=WILLIAMS(0.075941)';
   run;

Output 39.8.3: Reduced Model with Overdispersion Controlled
   Analysis of Maximum Likelihood Estimates

                                Standard
   Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
   Intercept    1    -0.5249      0.2076       6.3949       0.0114
   soil         1     0.7910      0.2902       7.4284       0.0064
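
With φ fixed at 0.075941 and soil as the only explanatory variable, the reduced model has one parameter per soil condition, so the weighted fit reproduces the weighted germination proportion of each soil group. The following Python sketch (reduced_model is a hypothetical helper) recovers the estimates, standard errors, and Wald chi-squares of Output 39.8.3 from those proportions, taking the standard errors from the inverse of the weighted information of each group's log-odds.

```python
import math

# (n, r, cult, soil) for each of the 20 pots, copied from the seeds data set
SEEDS = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]
PHI = 0.075941  # value supplied in SCALE=WILLIAMS(0.075941)


def reduced_model(data, phi):
    """Weighted fit of r/n = soil: returns (estimate, standard error) pairs
    for the intercept and for the soil effect."""
    wn = [0.0, 0.0]  # weighted trials for soil = 0 and soil = 1
    wr = [0.0, 0.0]  # weighted events for soil = 0 and soil = 1
    for n, r, _cult, soil in data:
        w = 1.0 / (1.0 + phi * (n - 1))
        wn[soil] += w * n
        wr[soil] += w * r
    p = [wr[s] / wn[s] for s in (0, 1)]
    log_odds = [math.log(q / (1.0 - q)) for q in p]
    # Weighted Fisher information for each group's log-odds
    info = [wn[s] * p[s] * (1.0 - p[s]) for s in (0, 1)]
    intercept = (log_odds[0], math.sqrt(1.0 / info[0]))
    soil_effect = (log_odds[1] - log_odds[0], math.sqrt(1.0 / info[0] + 1.0 / info[1]))
    return intercept, soil_effect


(b0, se0), (b1, se1) = reduced_model(SEEDS, PHI)
for name, b, se in (("Intercept", b0, se0), ("soil", b1, se1)):
    # Wald chi-square = (estimate / standard error) squared
    print(f"{name:10s} {b:8.4f} {se:8.4f} {(b / se) ** 2:8.4f}")
```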


Results of the reduced model fit are shown in Output 39.8.3. Soil condition remains a significant factor (p = 0.0064) for seed germination.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.