The LOGISTIC Procedure
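This section uses the following notation:

$r_j$, $n_j$: $r_j$ is the number of event responses out of $n_j$ trials for the $j$th observation. With single-trial syntax, $n_j = 1$, and $r_j$ is 1 for an event response and 0 otherwise.

$w_j$: the weight of the $j$th observation.

$x_j$: the vector of explanatory variables for the $j$th observation.

$b$: the maximum likelihood estimate (MLE) of the parameter vector $(\alpha, \beta')'$, where $\alpha$ is the intercept and $\beta$ is the vector of slope parameters.

$V_b$: the estimated covariance matrix of $b$.

$\hat{p}_j$, $\hat{q}_j$: $\hat{p}_j$ is the estimated probability of an event response for the $j$th observation, evaluated at $b$, and $\hat{q}_j = 1 - \hat{p}_j$.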
Pregibon suggests using the index plots of several diagnostic statistics to identify influential observations and to quantify the effects on various aspects of the maximum likelihood fit. In an index plot, the diagnostic statistic is plotted against the observation number. In general, the distributions of these diagnostic statistics are not known, so cutoff values cannot be given for determining when the values are large. However, the IPLOTS and INFLUENCE options provide displays of the diagnostic values allowing visual inspection and comparison of the values across observations. In these plots, if the model is correctly specified and fits all observations well, then no extreme points should appear.
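As a rough illustration of what an index plot shows, the sketch below prints one text row per observation: the observation number, the diagnostic value, and a bar proportional to the size of that value. The helper `index_plot` and the numbers in `diagnostics` are hypothetical, and the layout is not the format produced by the IPLOTS option. Because no general cutoffs exist, the sketch only lets you compare bars visually and does not flag values against a threshold.

```python
def index_plot(values, width=40):
    """Return one text row per observation: index, value, and a bar whose
    length is proportional to |value| relative to the largest |value|."""
    biggest = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    rows = []
    for j, v in enumerate(values, start=1):
        bar = "*" * round(width * abs(v) / biggest)
        rows.append(f"{j:4d} {v:9.4f} |{bar}")
    return rows

# Hypothetical diagnostic values; observation 4 stands out from the rest.
diagnostics = [0.12, 0.08, 0.15, 0.91, 0.10]
for row in index_plot(diagnostics):
    print(row)
```

The longest bar marks the observation most worth a closer look; whether it is "too large" remains a judgment call.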
The next five sections give formulas for these diagnostic statistics.

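Hat Matrix Diagonal

The diagonal elements of the hat matrix are useful in detecting extreme points in the design space, where they tend to have larger values. The $j$th diagonal element is

$$h_{jj} = \begin{cases} \widetilde{w}_j\,(1, x_j')\,V_b\,(1, x_j')' & \text{if Fisher scoring is used} \\ \hat{w}_j\,(1, x_j')\,V_b\,(1, x_j')' & \text{if Newton-Raphson is used} \end{cases}$$

where

$$\widetilde{w}_j = \frac{w_j n_j}{\hat{p}_j\hat{q}_j[g'(\hat{p}_j)]^2}$$

$$\hat{w}_j = \widetilde{w}_j + \frac{w_j (r_j - n_j\hat{p}_j)\left[\hat{p}_j\hat{q}_j\, g''(\hat{p}_j) + (\hat{q}_j - \hat{p}_j)\, g'(\hat{p}_j)\right]}{(\hat{p}_j\hat{q}_j)^2\,[g'(\hat{p}_j)]^3}$$

and $g'(\cdot)$ and $g''(\cdot)$ are the first and second derivatives of the link function $g(\cdot)$, respectively.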
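A minimal Python sketch of the two weights, assuming the link derivatives are supplied as plain functions. The names `w_tilde`, `w_hat`, `logit_d1`, `logit_d2`, `probit_d1`, and `probit_d2` are hypothetical. For the logit link the correction term vanishes, because $\hat{p}_j\hat{q}_j g''(\hat{p}_j) + (\hat{q}_j - \hat{p}_j) g'(\hat{p}_j) = 0$ when $g'(p) = 1/(pq)$, so $\hat{w}_j = \widetilde{w}_j$; for a probit link the two weights generally differ whenever $r_j \ne n_j\hat{p}_j$.

```python
from statistics import NormalDist

def w_tilde(p, n, w, g1):
    """Fisher-scoring weight: w*n / (p*q*g'(p)**2)."""
    q = 1.0 - p
    return w * n / (p * q * g1(p) ** 2)

def w_hat(p, r, n, w, g1, g2):
    """Newton-Raphson weight: w_tilde plus the observed-information correction."""
    q = 1.0 - p
    correction = (w * (r - n * p) * (p * q * g2(p) + (q - p) * g1(p))
                  / ((p * q) ** 2 * g1(p) ** 3))
    return w_tilde(p, n, w, g1) + correction

# Logit link g(p) = log(p / (1 - p)) and its first two derivatives.
def logit_d1(p):
    return 1.0 / (p * (1.0 - p))

def logit_d2(p):
    return (2.0 * p - 1.0) / (p * (1.0 - p)) ** 2

# Probit link g(p) = inverse standard normal CDF; with z = g(p),
# g'(p) = 1 / phi(z) and g''(p) = z / phi(z)**2.
_std = NormalDist()

def probit_d1(p):
    return 1.0 / _std.pdf(_std.inv_cdf(p))

def probit_d2(p):
    z = _std.inv_cdf(p)
    return z / _std.pdf(z) ** 2

p, r, n, w = 0.3, 2, 5, 1.0  # hypothetical values for one observation
print("logit :", w_tilde(p, n, w, logit_d1), w_hat(p, r, n, w, logit_d1, logit_d2))
print("probit:", w_tilde(p, n, w, probit_d1), w_hat(p, r, n, w, probit_d1, probit_d2))
```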
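For a binary response logit model, the hat matrix diagonal elements are

$$h_{jj} = w_j n_j \hat{p}_j\hat{q}_j\,(1, x_j')\,V_b\,(1, x_j')'$$

Because $g'(p) = 1/(pq)$ for the logit link, $\widetilde{w}_j$ and $\hat{w}_j$ both reduce to $w_j n_j\hat{p}_j\hat{q}_j$, so Fisher scoring and Newton-Raphson give the same hat diagonal.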


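Pearson Residuals and Deviance Residuals

Pearson and deviance residuals are useful in identifying observations that the model does not explain well. Pearson residuals are components of the Pearson chi-square statistic, and deviance residuals are components of the deviance. The Pearson residual for the $j$th observation is

$$\chi_j = \frac{\sqrt{w_j}\,(r_j - n_j\hat{p}_j)}{\sqrt{n_j\hat{p}_j\hat{q}_j}}$$

The Pearson chi-square statistic is the sum of squares of the Pearson residuals. The deviance residual for the $j$th observation is

$$d_j = \begin{cases} -\sqrt{-2 w_j n_j \log(\hat{q}_j)} & \text{if } r_j = 0 \\ \pm\sqrt{2 w_j \left[ r_j \log\left(\dfrac{r_j}{n_j\hat{p}_j}\right) + (n_j - r_j)\log\left(\dfrac{n_j - r_j}{n_j\hat{q}_j}\right)\right]} & \text{if } 0 < r_j < n_j \\ \sqrt{-2 w_j n_j \log(\hat{p}_j)} & \text{if } r_j = n_j \end{cases}$$

where the plus (minus) in $\pm$ is used if $r_j/n_j$ is greater (less) than $\hat{p}_j$. The deviance is the sum of squares of the deviance residuals.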
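A minimal sketch of both residual types, assuming the usual Pearson residual $\chi_j = \sqrt{w_j}(r_j - n_j\hat{p}_j)/\sqrt{n_j\hat{p}_j\hat{q}_j}$ together with the three-case deviance residual above. The function names and the `(r, n, p_hat)` tuples are hypothetical; in practice $\hat{p}_j$ would come from the fitted model.

```python
import math

def pearson_residual(r, n, p, w=1.0):
    """chi_j = sqrt(w) (r - n p) / sqrt(n p q)."""
    return math.sqrt(w) * (r - n * p) / math.sqrt(n * p * (1.0 - p))

def deviance_residual(r, n, p, w=1.0):
    """d_j, using the three cases r = 0, 0 < r < n, and r = n."""
    q = 1.0 - p
    if r == 0:
        return -math.sqrt(-2.0 * w * n * math.log(q))
    if r == n:
        return math.sqrt(-2.0 * w * n * math.log(p))
    inner = r * math.log(r / (n * p)) + (n - r) * math.log((n - r) / (n * q))
    magnitude = math.sqrt(2.0 * w * max(inner, 0.0))  # clamp tiny negative rounding
    return magnitude if r / n > p else -magnitude      # sign follows r/n versus p

# Hypothetical (r, n, p_hat) triples for four observations.
obs = [(0, 1, 0.2), (1, 1, 0.7), (5, 10, 0.3), (2, 10, 0.4)]
chi = [pearson_residual(r, n, p) for r, n, p in obs]
dev = [deviance_residual(r, n, p) for r, n, p in obs]
pearson_chisq = sum(c * c for c in chi)  # Pearson chi-square statistic
deviance = sum(d * d for d in dev)       # deviance
print(pearson_chisq, deviance)
```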

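DFBETAs

For each parameter estimate, the procedure computes a DFBETA diagnostic for each observation. The DFBETA for an observation is the standardized difference in the parameter estimate due to deleting that observation, and it can be used to assess the effect of an individual observation on each estimated parameter of the fitted model. Instead of refitting the model every time an observation is deleted, PROC LOGISTIC uses the one-step estimate. For the $j$th observation, the DFBETAs are given by

$$\mathrm{DFBETA}i_j = \frac{\Delta_i b_j^1}{\hat{\sigma}(b_i)}$$

where $\hat{\sigma}(b_i)$ is the standard error of the $i$th component of $b$, and $\Delta_i b_j^1$ is the $i$th component of the one-step difference

$$\Delta b_j^1 = \frac{w_j (r_j - n_j\hat{p}_j)}{1 - h_{jj}}\,V_b\,(1, x_j')'$$

$\Delta b_j^1$ is the approximate change $(b - b_j^1)$ in the vector of parameter estimates due to the omission of the $j$th observation. The DFBETAs are useful in detecting observations that are causing instability in the selected coefficients.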
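The one-step DFBETA computation needs only $V_b$, $x_j$, $r_j$, $n_j$, $w_j$, and $\hat{p}_j$ from an existing fit. The sketch below assumes a binary logit model, so the hat diagonal is taken as $w_j n_j\hat{p}_j\hat{q}_j(1, x_j')V_b(1, x_j')'$, and it takes the standard errors $\hat{\sigma}(b_i)$ as square roots of the diagonal of $V_b$. The function name and the covariance matrix `V_b` are hypothetical values chosen for illustration, not PROC LOGISTIC output.

```python
import math

def dfbetas_one_step(V, x_j, r_j, n_j, p_j, w_j=1.0):
    """One-step DFBETAs for observation j of a binary logit fit.

    V   : estimated covariance matrix V_b of b (list of rows), intercept first
    x_j : explanatory values for observation j, without the intercept
    """
    z = [1.0] + list(x_j)                                   # (1, x_j')'
    k = len(z)
    Vz = [sum(V[i][m] * z[m] for m in range(k)) for i in range(k)]
    quad = sum(z[i] * Vz[i] for i in range(k))              # (1, x_j') V_b (1, x_j')'
    h_jj = w_j * n_j * p_j * (1.0 - p_j) * quad             # logit hat diagonal
    scale = w_j * (r_j - n_j * p_j) / (1.0 - h_jj)
    delta_b = [scale * v for v in Vz]                       # approx. b - b_j^1
    return [delta_b[i] / math.sqrt(V[i][i]) for i in range(k)]

V_b = [[0.5, -0.1], [-0.1, 0.2]]  # hypothetical covariance matrix of (alpha, beta)
print(dfbetas_one_step(V_b, [1.0], r_j=1, n_j=1, p_j=0.4))
```

A positive DFBETA means that deleting the observation would lower that estimate, since $\Delta b_j^1$ approximates $b - b_j^1$.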
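
C and CBAR

C and CBAR are confidence interval displacement diagnostics that provide scalar measures of the influence of individual observations on $b$. These diagnostics are based on the same idea as the Cook distance in linear regression theory, and by using the one-step estimate, C and CBAR for the $j$th observation are computed as

$$C_j = \frac{\chi_j^2\, h_{jj}}{(1 - h_{jj})^2}$$

and

$$\bar{C}_j = \frac{\chi_j^2\, h_{jj}}{1 - h_{jj}}$$

respectively, where $\chi_j$ is the Pearson residual and $h_{jj}$ is the hat matrix diagonal for the $j$th observation.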


To use these statistics, you typically plot them against the observation index (as the IPLOTS option does) and look for outliers.
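Assuming the one-step forms $C_j = \chi_j^2 h_{jj}/(1 - h_{jj})^2$ and $\bar{C}_j = \chi_j^2 h_{jj}/(1 - h_{jj})$, where $\chi_j$ is the Pearson residual and $h_{jj}$ the hat diagonal, both diagnostics are simple functions of two per-observation numbers. The helper name `c_and_cbar` and the input pairs below are hypothetical.

```python
def c_and_cbar(chi_j, h_jj):
    """Return (C_j, CBAR_j) from the Pearson residual chi_j and hat diagonal h_jj."""
    cbar = chi_j ** 2 * h_jj / (1.0 - h_jj)  # chi^2 h / (1 - h)
    c = cbar / (1.0 - h_jj)                  # chi^2 h / (1 - h)^2
    return c, cbar

# Hypothetical (chi_j, h_jj) pairs for three observations.
for chi_j, h_jj in [(0.4, 0.10), (2.0, 0.50), (-1.2, 0.05)]:
    print(chi_j, h_jj, c_and_cbar(chi_j, h_jj))
```

Since $1/(1 - h_{jj}) > 1$ for $0 < h_{jj} < 1$, $C_j$ always exceeds $\bar{C}_j$, and the gap widens for high-leverage observations.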
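
DIFDEV and DIFCHISQ

DIFDEV and DIFCHISQ are diagnostics for detecting ill-fitted observations, that is, observations that contribute heavily to the disagreement between the data and the predicted values of the fitted model. DIFDEV is the change in the deviance due to deleting an individual observation, whereas DIFCHISQ is the change in the Pearson chi-square statistic for the same deletion. By using the one-step estimate, DIFDEV and DIFCHISQ for the $j$th observation are computed as

$$\Delta D_j = d_j^2 + \bar{C}_j$$

$$\Delta\chi_j^2 = \frac{\bar{C}_j}{h_{jj}}$$

where $d_j$ is the deviance residual, $\bar{C}_j$ is the CBAR value, and $h_{jj}$ is the hat matrix diagonal for the $j$th observation.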
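A short sketch of the one-step deletion diagnostics, assuming $\Delta D_j = d_j^2 + \bar{C}_j$ and $\Delta\chi_j^2 = \bar{C}_j/h_{jj}$ with $\bar{C}_j = \chi_j^2 h_{jj}/(1 - h_{jj})$. Here $d_j$ is the deviance residual and $\chi_j$ the Pearson residual; the helper name `difdev_difchisq` and the inputs are hypothetical.

```python
def difdev_difchisq(d_j, chi_j, h_jj):
    """One-step DIFDEV and DIFCHISQ for observation j."""
    cbar = chi_j ** 2 * h_jj / (1.0 - h_jj)  # CBAR_j
    difdev = d_j ** 2 + cbar                 # change in the deviance
    difchisq = cbar / h_jj                   # equals chi_j**2 / (1 - h_jj)
    return difdev, difchisq

# Hypothetical residuals and hat diagonal for one observation.
print(difdev_difchisq(d_j=1.5, chi_j=2.0, h_jj=0.5))
```

Rewriting DIFCHISQ as $\chi_j^2/(1 - h_{jj})$ shows that it is the squared Pearson residual inflated by leverage, so a point can score high either through a large residual or through a large $h_{jj}$.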


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.