The LOGISTIC Procedure

Example 35.7: Overdispersed Seeds Germination Data

In a seed germination test, seeds of two cultivars were planted in pots of two soil conditions. The following SAS statements create the data set seeds, which records, for each pot, the number of seeds planted and the number that germinated under each combination of cultivar and soil condition. The variable n is the number of seeds planted in a pot, and the variable r is the number that germinated. The indicator variables cult and soil identify the cultivar and the soil condition, respectively, and cxs is their product, which serves as the interaction term.

   data seeds;
      input pot n r cult soil;
      cxs = cult * soil;   /* cult x soil interaction term */
      datalines;
    1 16     8      0       0
    2 51    26      0       0
    3 45    23      0       0
    4 39    10      0       0
    5 36     9      0       0
    6 81    23      1       0
    7 30    10      1       0
    8 39    17      1       0
    9 28     8      1       0
   10 62    23      1       0
   11 51    32      0       1
   12 72    55      0       1
   13 41    22      0       1
   14 12     3      0       1
   15 13    10      0       1
   16 79    46      1       1
   17 30    15      1       1
   18 51    32      1       1
   19 74    53      1       1
   20 56    12      1       1
   ;
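
Before a model is fitted, it can help to look at the pooled germination proportion in each cultivar by soil condition cell. The following statements are a minimal sketch and are not part of the original example; the column names r_total, n_total, and prop are illustrative choices.

   proc sql;
      /* pool the pots within each cult x soil cell */
      select cult, soil,
             sum(r) as r_total,
             sum(n) as n_total,
             sum(r) / sum(n) as prop format=6.4
         from seeds
         group by cult, soil;
   quit;

For the data above, the pooled counts are 76 of 187 (cult=0, soil=0), 81 of 240 (cult=1, soil=0), 122 of 189 (cult=0, soil=1), and 158 of 290 (cult=1, soil=1).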

PROC LOGISTIC is used to fit a logit model to the data, with cult, soil, and cxs (the cult × soil interaction) as explanatory variables. The SCALE=NONE option requests the deviance and Pearson goodness-of-fit statistics without applying any correction for overdispersion.

   proc logistic data=seeds;
      model r/n=cult soil cxs / scale=none;
      title 'Full Model With SCALE=NONE';
   run;

Output 35.7.1: Results of the Model Fit for the Two-Way Layout

Full Model With SCALE=NONE

The LOGISTIC Procedure

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     16    68.3465      4.2717        <.0001
Pearson      16    66.7617      4.1726        <.0001

Number of events/trials observations: 20

Model Fit Statistics

                Intercept    Intercept and
Criterion            Only       Covariates
AIC              1256.852         1213.003
SC               1261.661         1232.240
-2 Log L         1254.852         1205.003

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       49.8488     3        <.0001
Score                  49.1682     3        <.0001
Wald                   47.7623     3        <.0001

Analysis of Maximum Likelihood Estimates

                   Parameter   Standard         Wald                 Standardized    Odds
Variable     DF     Estimate      Error   Chi-Square   Pr > ChiSq        Estimate   Ratio
Intercept     1      -0.3788     0.1489       6.4730       0.0110
cult          1      -0.2956     0.2020       2.1412       0.1434         -0.0803   0.744
soil          1       0.9781     0.2128      21.1234       <.0001          0.2693   2.659
cxs           1      -0.1239     0.2790       0.1973       0.6569         -0.0319   0.883


Results of fitting the full factorial model are shown in Output 35.7.1. Both the Pearson \chi^2 and the deviance are highly significant (p < 0.0001), which suggests that the model does not fit well. If the link function and the model specification are correct and there are no outliers, then the lack of fit may be due to overdispersion. Without an adjustment for overdispersion, the standard errors are likely to be underestimated, which makes the Wald tests reject too readily.

PROC LOGISTIC provides three SCALE= options to accommodate overdispersion. When the sample sizes differ across observations, as they do here (n ranges from 12 to 81), SCALE=WILLIAMS is preferred. The Williams model estimates a scale parameter \phi by equating the value of the Pearson \chi^2 for the full model to its approximate expected value. The full model considered here contains cultivar, soil condition, and their interaction. Using a full model reduces the risk of contaminating the estimate of \phi with lack of fit that is due to an incorrect model specification.
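
For comparison only, a constant dispersion adjustment can be requested with SCALE=PEARSON, one of the other SCALE= options. The following statements are a sketch and are not part of the original example; no output is shown for them. With this option the parameter estimates stay the same and the covariance matrix is multiplied by the Pearson \chi^2 divided by its degrees of freedom (4.1726 in Output 35.7.1), so each standard error would be expected to grow by a factor of about sqrt(4.1726), or roughly 2.04.

   proc logistic data=seeds;
      /* constant dispersion factor = Pearson chi-square / DF */
      model r/n=cult soil cxs / scale=pearson;
      title 'Full Model With SCALE=PEARSON';
   run;

A single dispersion factor treats every pot alike regardless of its size, which is why the example itself proceeds with the Williams adjustment instead.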

   proc logistic data=seeds;
      model r/n=cult soil cxs / scale=williams;
      title 'Full Model With SCALE=WILLIAMS';
   run;

Output 35.7.2: Williams' Model for Overdispersion

Full Model With SCALE=WILLIAMS

The LOGISTIC Procedure

Model Information

Data Set                      WORK.SEEDS
Response Variable (Events)    r
Response Variable (Trials)    n
Number of Observations        20
Weight Variable               1 / ( 1 + 0.075941 * (n - 1) )
Sum of Weights                198.32164573
Link Function                 Logit
Optimization Technique        Fisher's scoring

Response Profile

Ordered    Binary          Total        Total
  Value    Outcome     Frequency       Weight
      1    Event             437     92.95346
      2    Nonevent          469    105.36819

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    DF      Value    Value/DF    Pr > ChiSq
Deviance     16    16.4402      1.0275        0.4227
Pearson      16    16.0000      1.0000        0.4530

Number of events/trials observations: 20

NOTE: Since the Williams method was used to accommodate overdispersion, the Pearson
chi-squared statistic and the deviance can no longer be used to assess
the goodness of fit of the model.

Model Fit Statistics

                Intercept    Intercept and
Criterion            Only       Covariates
AIC               276.155          273.586
SC                280.964          292.822
-2 Log L          274.155          265.586

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        8.5687     3        0.0356
Score                   8.4856     3        0.0370
Wald                    8.3069     3        0.0401

Analysis of Maximum Likelihood Estimates

                   Parameter   Standard         Wald                 Standardized    Odds
Variable     DF     Estimate      Error   Chi-Square   Pr > ChiSq        Estimate   Ratio
Intercept     1      -0.3926     0.2932       1.7932       0.1805
cult          1      -0.2618     0.4160       0.3963       0.5290         -0.0337   0.770
soil          1       0.8309     0.4223       3.8704       0.0491          0.1072   2.295
cxs           1      -0.0532     0.5835       0.0083       0.9274        -0.00609   0.948

Association of Predicted Probabilities and Observed Responses

Percent Concordant      50.6    Somers' D    0.258
Percent Discordant      24.8    Gamma        0.343
Percent Tied            24.6    Tau-a        0.129
Pairs                 204953    c            0.629


Results using Williams' method are shown in Output 35.7.2. The estimate of \phi is 0.075941, and it appears in the formula for the Weight Variable in the Model Information table at the beginning of the displayed output. Because neither cult nor cxs is statistically significant (p=0.5290 and p=0.9274, respectively), a reduced model that contains only the soil condition factor is fitted, with the observations weighted by 1/(1 + 0.075941 (n-1)). This can be done conveniently in PROC LOGISTIC by including the scale estimate in the SCALE=WILLIAMS option as follows:

   proc logistic data=seeds;
      model r/n=soil / scale=williams(0.075941);
      title 'Reduced Model With SCALE=WILLIAMS(0.075941)';
   run;

Output 35.7.3: Reduced Model with Overdispersion Controlled

Analysis of Maximum Likelihood Estimates

                   Parameter   Standard         Wald                 Standardized    Odds
Variable     DF     Estimate      Error   Chi-Square   Pr > ChiSq        Estimate   Ratio
Intercept     1      -0.5249     0.2076       6.3949       0.0114
soil          1       0.7910     0.2902       7.4284       0.0064          0.1021   2.206


Results of the reduced model fit are shown in Output 35.7.3. Soil condition remains a significant factor for seed germination (p=0.0064), with an estimated odds ratio of 2.206 for soil condition 1 relative to soil condition 0.
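
The weights reported in Output 35.7.2 can also be computed explicitly, which makes the effect of the Williams adjustment easier to see. The following statements are a minimal sketch and are not part of the original example; the data set name seeds_w and the variables w and nw are illustrative names.

   data seeds_w;
      set seeds;
      w  = 1 / (1 + 0.075941 * (n - 1));   /* Williams weight for the pot      */
      nw = n * w;                          /* weighted number of seeds planted */
   run;

   proc means data=seeds_w sum maxdec=4;
      var nw;
   run;

The sum of nw should come out close to 198.32, the Sum of Weights shown in Output 35.7.2, with small differences possible because the estimate of \phi is rounded to six decimal places. Larger pots receive smaller weights per seed: a pot with n=12 has w of about 0.54, whereas a pot with n=81 has w of about 0.14. On the assumption that SCALE=WILLIAMS(0.075941) amounts to weighting each observation by w, supplying w in a WEIGHT statement would be expected to give the same parameter estimates as Output 35.7.3; this equivalence is worth checking against the SCALE=WILLIAMS results rather than taking for granted.

   proc logistic data=seeds_w;
      model r/n=soil;
      weight w;   /* explicit Williams weights (assumed equivalent) */
      title 'Reduced Model With Explicit Williams Weights';
   run;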


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.