Example 39.8: Overdispersion
In a seed germination test, seeds of two cultivars were
planted in pots of two soil conditions. The
following SAS statements create the data set seeds, which
contains the observed proportion of seeds that germinated
for various combinations of
cultivar and soil condition. Variable n represents the number
of seeds planted in a pot, and variable r represents the number
germinated. The indicator variables cult and soil
represent the cultivar and soil condition, respectively.
data seeds;
input pot n r cult soil;
datalines;
1 16 8 0 0
2 51 26 0 0
3 45 23 0 0
4 39 10 0 0
5 36 9 0 0
6 81 23 1 0
7 30 10 1 0
8 39 17 1 0
9 28 8 1 0
10 62 23 1 0
11 51 32 0 1
12 72 55 0 1
13 41 22 0 1
14 12 3 0 1
15 13 10 0 1
16 79 46 1 1
17 30 15 1 1
18 51 32 1 1
19 74 53 1 1
20 56 12 1 1
;
PROC LOGISTIC is used to fit a logit model to the data,
with cult, soil, and the cult × soil interaction as
explanatory variables. The option SCALE=NONE is specified
to display goodness-of-fit statistics.
proc logistic data=seeds;
model r/n=cult soil cult*soil/scale=none;
title 'Full Model With SCALE=NONE';
run;
Output 39.8.1: Results of the Model Fit for the Two-Way Layout

Full Model With SCALE=NONE

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     16    68.3465      4.2717        <.0001
Pearson      16    66.7617      4.1726        <.0001

Number of events/trials observations: 20

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC                1256.852                    1213.003
SC                 1261.661                    1232.240
-2 Log L           1254.852                    1205.003

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       49.8488     3        <.0001
Score                  49.1682     3        <.0001
Wald                   47.7623     3        <.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.3788            0.1489        6.4730        0.0110
cult          1     -0.2956            0.2020        2.1412        0.1434
soil          1      0.9781            0.2128       21.1234        <.0001
cult*soil     1     -0.1239            0.2790        0.1973        0.6569
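The estimates in Output 39.8.1 can be cross-checked by hand. Because the model has one parameter for each cultivar-by-soil cell, the fitted probability in a cell is the pooled germination proportion for that cell, and the parameters are differences of the cell logits. The following Python sketch is not part of the SAS example; it uses only the counts in the seeds data set, and the names pots and cell_logit are illustrative.

```python
from math import log

# (n, r, cult, soil) for the 20 pots, copied from the seeds data set
pots = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]

def cell_logit(cult, soil):
    """Logit of the pooled germination proportion in one cultivar-by-soil cell."""
    n = sum(p[0] for p in pots if p[2] == cult and p[3] == soil)
    r = sum(p[1] for p in pots if p[2] == cult and p[3] == soil)
    return log(r / (n - r))

# Parameters of the full factorial logit model as contrasts of cell logits
intercept = cell_logit(0, 0)
cult_eff = cell_logit(1, 0) - intercept
soil_eff = cell_logit(0, 1) - intercept
interaction = cell_logit(1, 1) - cell_logit(1, 0) - cell_logit(0, 1) + intercept

for name, value in [("Intercept", intercept), ("cult", cult_eff),
                    ("soil", soil_eff), ("cult*soil", interaction)]:
    print(f"{name:10s} {value:8.4f}")
```

The four values agree with the Estimate column to about three decimal places. For example, the intercept is log(76/111), which is approximately -0.3788, so the intercept, cult, and cult*soil estimates are negative and the soil estimate is positive.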

Results of fitting the full factorial model are shown in
Output 39.8.1. Both the Pearson χ² and the deviance are
highly significant (p < 0.0001), suggesting that the model
does not fit well. If the link function and the model
specification are correct and there are no outliers, then
the lack of fit may be due to overdispersion. Without
adjusting for the overdispersion, the standard errors are
likely to be underestimated, causing the Wald tests to be
too sensitive. In PROC LOGISTIC, there are three SCALE=
options to accommodate overdispersion. With unequal sample
sizes for the observations, SCALE=WILLIAMS is preferred. The
Williams model estimates a scale parameter φ by
equating the value of the Pearson χ² for the full model to
its approximate expected value. The full model considered
here is the model with cultivar, soil condition, and their
interaction. Using a full model reduces the risk of
contaminating φ with lack of fit due to incorrect model
specification.
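The size of the lack of fit can be checked directly from the data. Under a binomial model that fits, the Pearson statistic should be close to its degrees of freedom, so the ratio Value/DF should be near 1. The following Python sketch is not part of the SAS example; it assumes that the fitted probability for each pot is the pooled proportion of its cultivar-by-soil cell, which holds for the full factorial model, and the function name pearson_chi_square is illustrative.

```python
# (n, r, cult, soil) for the 20 pots, copied from the seeds data set
pots = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]

def pearson_chi_square(data):
    """Pearson chi-square for the full factorial model (one probability per cell)."""
    # Pool trials and events within each (cult, soil) cell
    totals = {}
    for n, r, cult, soil in data:
        tn, tr = totals.get((cult, soil), (0, 0))
        totals[(cult, soil)] = (tn + n, tr + r)
    # Sum the squared standardized residuals over the pots
    x2 = 0.0
    for n, r, cult, soil in data:
        tn, tr = totals[(cult, soil)]
        p = tr / tn
        x2 += (r - n * p) ** 2 / (n * p * (1 - p))
    return x2

x2 = pearson_chi_square(pots)
df = len(pots) - 4          # 20 pots minus 4 model parameters
dispersion = x2 / df        # Value/DF in the goodness-of-fit table

print(f"Pearson chi-square = {x2:.4f}, DF = {df}, Value/DF = {dispersion:.4f}")
```

The result matches the Pearson row of Output 39.8.1 (66.7617 on 16 degrees of freedom) up to rounding: the variation between pots is roughly four times what the binomial model allows. The SCALE=WILLIAMS fit in the statements that follow is one way to account for it.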
proc logistic data=seeds;
model r/n=cult soil cult*soil / scale=williams;
title 'Full Model With SCALE=WILLIAMS';
run;
Output 39.8.2: Williams' Model for Overdispersion

Full Model With SCALE=WILLIAMS

Model Information

Data Set                      WORK.SEEDS
Response Variable (Events)    r
Response Variable (Trials)    n
Number of Observations        20
Weight Variable               1 / ( 1 + 0.075941 * (n - 1) )
Sum of Weights                198.32164573
Link Function                 Logit
Optimization Technique        Fisher's scoring

Response Profile

Ordered Value    Binary Outcome    Total Frequency    Total Weight
            1    Event                         437        92.95346
            2    Nonevent                      469       105.36819

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     16    16.4402      1.0275        0.4227
Pearson      16    16.0000      1.0000        0.4530

Number of events/trials observations: 20

NOTE: Since the Williams method was used to accommodate overdispersion, the Pearson chi-squared statistic and the deviance can no longer be used to assess the goodness of fit of the model.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC                 276.155                     273.586
SC                  280.964                     292.822
-2 Log L            274.155                     265.586

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        8.5687     3        0.0356
Score                   8.4856     3        0.0370
Wald                    8.3069     3        0.0401

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.3926            0.2932        1.7932        0.1805
cult          1     -0.2618            0.4160        0.3963        0.5290
soil          1      0.8309            0.4223        3.8704        0.0491
cult*soil     1     -0.0532            0.5835        0.0083        0.9274

Association of Predicted Probabilities and Observed Responses

Percent Concordant      50.6    Somers' D    0.258
Percent Discordant      24.8    Gamma        0.343
Percent Tied            24.6    Tau-a        0.129
Pairs                 204953    c            0.629
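The Weight Variable formula in Output 39.8.2 can be applied by hand to see how the Williams weights act on the data. The following Python sketch is not part of the SAS example. It takes the printed estimate 0.075941 as given (it does not re-estimate it), and it assumes that Sum of Weights and Total Weight are the weights multiplied by the trial and event counts, which is consistent with the printed figures. The names weight and weighted_cell_logit are illustrative.

```python
from math import log

PHI = 0.075941  # Williams scale estimate printed in Output 39.8.2 (taken as given)

# (n, r, cult, soil) for the 20 pots, copied from the seeds data set
pots = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]

def weight(n):
    """Williams weight for a pot with n seeds: 1 / (1 + PHI * (n - 1))."""
    return 1.0 / (1.0 + PHI * (n - 1))

# Weighted totals; larger pots are down-weighted more heavily
sum_of_weights = sum(n * weight(n) for n, r, cult, soil in pots)
event_weight = sum(r * weight(n) for n, r, cult, soil in pots)
nonevent_weight = sum_of_weights - event_weight

def weighted_cell_logit(cult, soil):
    """Logit of the weighted germination proportion in one cell."""
    ev = sum(r * weight(n) for n, r, c, s in pots if (c, s) == (cult, soil))
    non = sum((n - r) * weight(n) for n, r, c, s in pots if (c, s) == (cult, soil))
    return log(ev / non)

intercept = weighted_cell_logit(0, 0)
cult_eff = weighted_cell_logit(1, 0) - intercept
soil_eff = weighted_cell_logit(0, 1) - intercept
interaction = (weighted_cell_logit(1, 1) - weighted_cell_logit(1, 0)
               - weighted_cell_logit(0, 1) + intercept)

print(f"Sum of weights = {sum_of_weights:.4f}")
print(f"Event / nonevent weight = {event_weight:.4f} / {nonevent_weight:.4f}")
print(f"Estimates: {intercept:.4f} {cult_eff:.4f} {soil_eff:.4f} {interaction:.4f}")
```

The totals agree with the Sum of Weights and Total Weight entries up to the rounding of the scale estimate, and the weighted cell logits agree with the Estimate column to about three decimal places. A pot with 81 seeds receives a weight of about 0.14 per seed, whereas a pot with 12 seeds receives about 0.54, so the weighted fit relies less on the large pots than the unweighted fit does.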

Results using Williams' method are
shown in Output 39.8.2.
The estimate of the scale parameter φ is 0.075941 and is
given in the formula for the Weight Variable at the beginning of
the displayed output. Since neither cult nor the cult × soil
interaction is statistically significant
(p=0.5290 and p=0.9274, respectively),
a reduced model that contains only the soil condition factor is fitted,
with the observations weighted by 1/(1 + 0.075941(n - 1)).
This can be
done conveniently in PROC LOGISTIC by including the scale
estimate in the SCALE=WILLIAMS option as follows:
proc logistic data=seeds;
model r/n=soil / scale=williams(0.075941);
title 'Reduced Model With SCALE=WILLIAMS(0.075941)';
run;
Output 39.8.3: Reduced Model with Overdispersion Controlled

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.5249            0.2076        6.3949        0.0114
soil          1      0.7910            0.2902        7.4284        0.0064
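The reduced-model estimates can also be cross-checked by hand, because a model with a single binary factor fits one probability per soil condition. The following Python sketch is not part of the SAS example. It assumes the weights 1/(1 + 0.075941(n - 1)) described above and computes the logit of the weighted germination proportion for each soil condition; the name weighted_logit is illustrative.

```python
from math import exp, log

PHI = 0.075941  # scale estimate supplied in SCALE=WILLIAMS(0.075941)

# (n, r, cult, soil) for the 20 pots, copied from the seeds data set
pots = [
    (16, 8, 0, 0), (51, 26, 0, 0), (45, 23, 0, 0), (39, 10, 0, 0), (36, 9, 0, 0),
    (81, 23, 1, 0), (30, 10, 1, 0), (39, 17, 1, 0), (28, 8, 1, 0), (62, 23, 1, 0),
    (51, 32, 0, 1), (72, 55, 0, 1), (41, 22, 0, 1), (12, 3, 0, 1), (13, 10, 0, 1),
    (79, 46, 1, 1), (30, 15, 1, 1), (51, 32, 1, 1), (74, 53, 1, 1), (56, 12, 1, 1),
]

def weighted_logit(soil):
    """Logit of the Williams-weighted germination proportion for one soil condition."""
    ev = sum(r / (1 + PHI * (n - 1)) for n, r, c, s in pots if s == soil)
    non = sum((n - r) / (1 + PHI * (n - 1)) for n, r, c, s in pots if s == soil)
    return log(ev / non)

intercept = weighted_logit(0)              # log odds of germination when soil = 0
soil_eff = weighted_logit(1) - intercept   # change in log odds when soil = 1
odds_ratio = exp(soil_eff)                 # soil effect on the odds scale

print(f"Intercept = {intercept:.4f}, soil = {soil_eff:.4f}, odds ratio = {odds_ratio:.2f}")
```

The two values agree with the Estimate column of Output 39.8.3 to about three decimal places. On the odds scale, the soil estimate corresponds to odds of germination roughly 2.2 times as high under soil condition 1 as under soil condition 0.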

Results of the reduced model fit are shown in Output 39.8.3.
Soil condition remains a significant factor (p=0.0064) for the
seed germination.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.