Example 35.7: Overdispersed Seeds Germination Data
In a seed germination test, seeds of two cultivars were
planted in pots under two soil conditions. The
following SAS statements create the data set seeds, which
records, for each pot, the counts that give the observed
proportion of seeds that germinated for each combination of
cultivar and soil condition. The variable n represents the number
of seeds planted in a pot, and the variable r represents the number
that germinated. The indicator variables cult and soil
represent the cultivar and soil condition, respectively, and
cxs is their product.
data seeds;
input pot n r cult soil;
cxs= cult * soil;
datalines;
1 16 8 0 0
2 51 26 0 0
3 45 23 0 0
4 39 10 0 0
5 36 9 0 0
6 81 23 1 0
7 30 10 1 0
8 39 17 1 0
9 28 8 1 0
10 62 23 1 0
11 51 32 0 1
12 72 55 0 1
13 41 22 0 1
14 12 3 0 1
15 13 10 0 1
16 79 46 1 1
17 30 15 1 1
18 51 32 1 1
19 74 53 1 1
20 56 12 1 1
;
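Before any model is fitted, it can help to look at the raw germination proportions in each cultivar by soil cell. The following is a minimal sketch, not part of the original example, that pools the counts within each cell; the column names germinated, planted, and proportion are chosen here for illustration.

```sas
/* Sketch: pooled germination proportion for each cultivar x soil cell. */
proc sql;
   select cult,
          soil,
          sum(r) as germinated,                  /* seeds germinated in the cell */
          sum(n) as planted,                     /* seeds planted in the cell    */
          calculated germinated / calculated planted
             as proportion format=6.4            /* pooled observed proportion   */
   from seeds
   group by cult, soil;
quit;
```

Pooling hides the pot-to-pot variation within a cell, which is exactly the variation that the overdispersion analysis examines.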
PROC LOGISTIC is used to fit a logit model to the data,
with cult, soil, and cxs (cult
× soil interaction) as explanatory
variables. The option SCALE=NONE is specified to display
goodness-of-fit statistics.
proc logistic data=seeds;
model r/n=cult soil cxs/scale=none;
title 'Full Model With SCALE=NONE';
run;
Output 35.7.1: Results of the Model Fit for the Two-Way Layout
| Full Model With SCALE=NONE |
| Deviance and Pearson Goodness-of-Fit Statistics |
| Criterion | DF | Value | Value/DF | Pr > ChiSq |
| Deviance | 16 | 68.3465 | 4.2717 | <.0001 |
| Pearson | 16 | 66.7617 | 4.1726 | <.0001 |
| Number of events/trials observations: 20 |
| Model Fit Statistics |
| Criterion | Intercept Only | Intercept and Covariates |
| AIC | 1256.852 | 1213.003 |
| SC | 1261.661 | 1232.240 |
| -2 Log L | 1254.852 | 1205.003 |
| Testing Global Null Hypothesis: BETA=0 |
| Test | Chi-Square | DF | Pr > ChiSq |
| Likelihood Ratio | 49.8488 | 3 | <.0001 |
| Score | 49.1682 | 3 | <.0001 |
| Wald | 47.7623 | 3 | <.0001 |
| Analysis of Maximum Likelihood Estimates |
| Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Standardized Estimate | Odds Ratio |
| Intercept | 1 | -0.3788 | 0.1489 | 6.4730 | 0.0110 | | |
| cult | 1 | -0.2956 | 0.2020 | 2.1412 | 0.1434 | -0.0803 | 0.744 |
| soil | 1 | 0.9781 | 0.2128 | 21.1234 | <.0001 | 0.2693 | 2.659 |
| cxs | 1 | -0.1239 | 0.2790 | 0.1973 | 0.6569 | -0.0319 | 0.883 |
Results of fitting the full factorial model are shown in
Output 35.7.1. Both the Pearson $\chi^2$ and the deviance are
highly significant (p < 0.0001), with Value/DF above 4,
suggesting that the model does not fit well. If the link
function and the model specification are correct and if there
are no outliers, then the lack of fit may be due to
overdispersion. Without adjusting for the overdispersion, the
standard errors are likely to be underestimated, causing the
Wald tests to be too sensitive. In PROC LOGISTIC, there are
three SCALE= options to accommodate overdispersion
(SCALE=DEVIANCE, SCALE=PEARSON, and SCALE=WILLIAMS). With
unequal sample sizes for the observations, SCALE=WILLIAMS is
preferred. The Williams model estimates a scale parameter
$\phi$ by equating the value of the Pearson $\chi^2$ for the
full model to its approximate expected value. The full model
considered here is the model with cultivar, soil condition,
and their interaction. Using a full model reduces the risk of
contaminating $\phi$ with lack of fit due to incorrect model
specification.
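The weighting that SCALE=WILLIAMS applies can be written out explicitly. The following is a sketch of the variance model as it is usually stated for Williams' method; the subscript $i$ (indexing pots) and the symbol $\pi_i$ (the germination probability for pot $i$) are notation introduced here for illustration, and $\phi$ is the scale parameter described in the preceding paragraph.

```latex
\begin{align*}
E(r_i) &= n_i \pi_i, \\
\operatorname{Var}(r_i) &= n_i \pi_i (1 - \pi_i)\,\bigl[1 + (n_i - 1)\phi\bigr], \\
w_i &= \frac{1}{1 + (n_i - 1)\phi}.
\end{align*}
```

With $\phi = 0$ the variance reduces to the ordinary binomial variance and every weight $w_i$ equals 1. For $\phi > 0$, pots with more seeds are downweighted more strongly, which is why this approach suits data with unequal $n_i$.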
proc logistic data=seeds;
model r/n=cult soil cxs / scale=williams;
title 'Full Model With SCALE=WILLIAMS';
run;
Output 35.7.2: Williams' Model for Overdispersion
| Full Model With SCALE=WILLIAMS |
| Model Information |
| Data Set | WORK.SEEDS |
| Response Variable (Events) | r |
| Response Variable (Trials) | n |
| Number of Observations | 20 |
| Weight Variable | 1 / ( 1 + 0.075941 * (n - 1) ) |
| Sum of Weights | 198.32164573 |
| Link Function | Logit |
| Optimization Technique | Fisher's scoring |
| Response Profile |
| Ordered Value | Binary Outcome | Total Frequency | Total Weight |
| 1 | Event | 437 | 92.95346 |
| 2 | Nonevent | 469 | 105.36819 |
| Model Convergence Status |
| Convergence criterion (GCONV=1E-8) satisfied. |
| Deviance and Pearson Goodness-of-Fit Statistics |
| Criterion | DF | Value | Value/DF | Pr > ChiSq |
| Deviance | 16 | 16.4402 | 1.0275 | 0.4227 |
| Pearson | 16 | 16.0000 | 1.0000 | 0.4530 |
| Number of events/trials observations: 20 |
| NOTE: Since the Williams method was used to accommodate overdispersion, the Pearson chi-squared statistic and the deviance can no longer be used to assess the goodness of fit of the model. |
| Model Fit Statistics |
| Criterion | Intercept Only | Intercept and Covariates |
| AIC | 276.155 | 273.586 |
| SC | 280.964 | 292.822 |
| -2 Log L | 274.155 | 265.586 |
| Testing Global Null Hypothesis: BETA=0 |
| Test | Chi-Square | DF | Pr > ChiSq |
| Likelihood Ratio | 8.5687 | 3 | 0.0356 |
| Score | 8.4856 | 3 | 0.0370 |
| Wald | 8.3069 | 3 | 0.0401 |
| Analysis of Maximum Likelihood Estimates |
| Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Standardized Estimate | Odds Ratio |
| Intercept | 1 | -0.3926 | 0.2932 | 1.7932 | 0.1805 | | |
| cult | 1 | -0.2618 | 0.4160 | 0.3963 | 0.5290 | -0.0337 | 0.770 |
| soil | 1 | 0.8309 | 0.4223 | 3.8704 | 0.0491 | 0.1072 | 2.295 |
| cxs | 1 | -0.0532 | 0.5835 | 0.0083 | 0.9274 | -0.00609 | 0.948 |
| Association of Predicted Probabilities and Observed Responses |
| Percent Concordant | 50.6 | Somers' D | 0.258 |
| Percent Discordant | 24.8 | Gamma | 0.343 |
| Percent Tied | 24.6 | Tau-a | 0.129 |
| Pairs | 204953 | c | 0.629 |
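For comparison with the Williams results, PROC LOGISTIC also accepts SCALE=PEARSON and SCALE=DEVIANCE, which rescale the covariance matrix by a single dispersion estimate instead of reweighting each pot. The following is a minimal sketch, not part of the original analysis, that refits the full model with SCALE=PEARSON. Under this option the parameter estimates of Output 35.7.1 are unchanged and the standard errors would be expected to grow by roughly the square root of the Pearson Value/DF, about 2.04 here, which gives a standard error of about 0.43 for soil.

```sas
/* Sketch: Pearson-based dispersion correction for the same full model. */
proc logistic data=seeds;
   model r/n=cult soil cxs / scale=pearson;   /* dispersion = Pearson chi-square / DF */
   title 'Full Model With SCALE=PEARSON';
run;
```

A single multiplier treats every pot alike, whereas the Williams weights depend on the number of seeds in each pot.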
Results using Williams' method are
shown in Output 35.7.2.
The estimate of the scale parameter $\phi$ is 0.075941 and is
given in the formula for the Weight Variable at the beginning of
the displayed output. Since neither cult nor cxs is
statistically significant
(p=0.5290 and p=0.9274, respectively),
a reduced model that contains only the soil condition factor is fitted,
with the observations weighted by 1/(1 + 0.075941 (n - 1)).
This can be
done conveniently in PROC LOGISTIC by including the scale
estimate in the SCALE=WILLIAMS option as follows:
proc logistic data=seeds;
model r/n=soil / scale=williams(0.075941);
title 'Reduced Model With SCALE=WILLIAMS(0.075941)';
run;
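The weight formula can be checked by hand against the Model Information table in Output 35.7.2. The following is a minimal sketch that recomputes the weight for each pot from the reported scale estimate; the data set name seeds_w and the variables phi, w, and wn are hypothetical names chosen here for illustration.

```sas
/* Sketch: recompute the Williams weights from the reported scale estimate. */
data seeds_w;
   set seeds;
   phi = 0.075941;                 /* scale estimate reported in Output 35.7.2 */
   w   = 1 / (1 + phi * (n - 1));  /* weight applied to each pot               */
   wn  = w * n;                    /* weighted number of trials for the pot    */
run;

/* The total of wn should be close to the reported Sum of Weights (about 198.32). */
proc means data=seeds_w sum maxdec=4;
   var wn;
run;
```

The smallest pot (n=12) receives a weight of about 0.54 and the largest (n=81) a weight of about 0.14, so each seed in a large pot contributes less information than a seed in a small pot.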
Output 35.7.3: Reduced Model with Overdispersion Controlled
| Analysis of Maximum Likelihood Estimates |
| Variable | DF | Parameter Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Standardized Estimate | Odds Ratio |
| Intercept | 1 | -0.5249 | 0.2076 | 6.3949 | 0.0114 | | |
| soil | 1 | 0.7910 | 0.2902 | 7.4284 | 0.0064 | 0.1021 | 2.206 |
Results of the reduced model fit are shown in Output 35.7.3.
Soil condition remains a significant factor (p=0.0064) for
seed germination after the overdispersion adjustment.
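The reduced-model estimates can be translated into an odds ratio and fitted germination probabilities. The following worked arithmetic is a sketch that uses only the estimates in Output 35.7.3; the notation $\hat{p}$ for a fitted probability is introduced here for illustration.

```latex
\begin{align*}
\text{odds ratio (soil)} &= e^{0.7910} \approx 2.206, \\
\hat{p}_{\text{soil}=0} &= \frac{1}{1 + e^{0.5249}} \approx 0.372, \\
\hat{p}_{\text{soil}=1} &= \frac{1}{1 + e^{-(-0.5249 + 0.7910)}}
                         = \frac{1}{1 + e^{-0.2661}} \approx 0.566.
\end{align*}
```

Under this fit, the odds of germination in soil condition 1 are a little more than twice those in soil condition 0.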
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.