## Lack of Fit Tests

Two goodness-of-fit tests can be requested from the PROBIT
procedure: a Pearson chi-square test and a log-likelihood
ratio chi-square test.
If there is only a single continuous independent variable, the data
are internally sorted to group response values by the independent
variable. Otherwise, the data are aggregated into groupings that are
delimited whenever a change is observed in one of the independent
variables.

**Note:** Because of this grouping, the data set should be sorted by
the independent variables before the PROBIT procedure is run if the
LACKFIT option is specified.
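The effect of row order on the groupings can be illustrated with a small Python sketch. This is not the procedure's internal code: the function `group_by_changes` and the dose-response rows are hypothetical, and the sketch only assumes the rule stated above, that a new grouping starts whenever the independent variable changes from one row to the next.

```python
from itertools import groupby


def group_by_changes(rows):
    """Aggregate (x, responders, subjects) rows into groupings.

    A new grouping starts whenever x differs from the previous row,
    so identical x values that are not adjacent end up in separate
    groupings (assumed behavior, based on the description above).
    """
    groups = []
    for x, block in groupby(rows, key=lambda row: row[0]):
        block = list(block)
        responders = sum(r for _, r, _ in block)
        subjects = sum(n for _, _, n in block)
        groups.append((x, responders, subjects))
    return groups


# Hypothetical rows: (dose, responders, subjects)
unsorted_rows = [(1.0, 2, 10), (2.0, 5, 10), (1.0, 3, 10), (2.0, 6, 10)]

unsorted_groups = group_by_changes(unsorted_rows)         # four groupings
sorted_groups = group_by_changes(sorted(unsorted_rows))   # two groupings
```

With the unsorted rows, each dose is split across two groupings; after sorting, each dose forms a single grouping with its frequencies pooled.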

If the Pearson goodness-of-fit chi-square test is requested and
the *p*-value for the test is too small (less than the value of the
HPROB= option, which is 0.10 by default), variances and covariances
are adjusted by a heterogeneity factor (the goodness-of-fit
chi-square divided by its degrees of freedom), and a critical value
from the *t* distribution is used to compute the fiducial limits.
The Pearson chi-square test statistic is computed as

$$\chi_P^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{(r_{ij} - n_i p_{ij})^2}{n_i p_{ij}}$$

where the sum on *i* is over groupings, the sum on *j* is over
levels of response, $r_{ij}$ is the frequency of response
level *j* for the *i*th grouping, $n_i$ is the total
frequency for the *i*th grouping, and $p_{ij}$ is the fitted
probability for the *j*th level at the *i*th grouping.
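As a minimal sketch, the Pearson statistic can be computed in Python from these quantities, assuming the standard form in which each cell contributes (observed − expected)² / expected with expected frequency equal to the grouping total times the fitted probability. The function name and the data are hypothetical, and the fitted probabilities are made up for illustration rather than obtained from a fitted model.

```python
def pearson_chi_square(r, n, p):
    """Pearson chi-square over groupings i and response levels j.

    r[i][j]: observed frequency of level j in grouping i
    n[i]:    total frequency in grouping i
    p[i][j]: fitted probability of level j in grouping i
    """
    total = 0.0
    for r_i, n_i, p_i in zip(r, n, p):
        for r_ij, p_ij in zip(r_i, p_i):
            expected = n_i * p_ij
            total += (r_ij - expected) ** 2 / expected
    return total


# Hypothetical binomial data: three groupings, two response levels
r = [[2, 8], [5, 5], [9, 1]]
n = [10, 10, 10]
p = [[0.25, 0.75], [0.5, 0.5], [0.8, 0.2]]  # illustrative, not fitted

chi2_p = pearson_chi_square(r, n, p)
```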
The log-likelihood ratio chi-square test statistic is computed as

$$\chi_D^2 = 2 \sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij} \ln\left(\frac{r_{ij}}{n_i p_{ij}}\right)$$

This quantity is sometimes called the deviance.
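The deviance can be sketched in Python in the same way, assuming the standard form of twice the sum of observed frequency times the log of observed over expected frequency. The function name and data are hypothetical. Treating cells with zero observed frequency as contributing zero is a common convention adopted here as an assumption, not something stated in this section.

```python
import math


def deviance(r, n, p):
    """Log-likelihood ratio chi-square (deviance).

    r[i][j]: observed frequency of level j in grouping i
    n[i]:    total frequency in grouping i
    p[i][j]: fitted probability of level j in grouping i
    """
    total = 0.0
    for r_i, n_i, p_i in zip(r, n, p):
        for r_ij, p_ij in zip(r_i, p_i):
            if r_ij > 0:  # assumption: a zero cell contributes 0
                total += r_ij * math.log(r_ij / (n_i * p_ij))
    return 2.0 * total


# Hypothetical binomial data: three groupings, two response levels
r = [[2, 8], [5, 5], [9, 1]]
n = [10, 10, 10]
p = [[0.25, 0.75], [0.5, 0.5], [0.8, 0.2]]  # illustrative, not fitted

chi2_d = deviance(r, n, p)
```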
If the modeled probabilities fit the data, these statistics
should be approximately distributed as chi-square with
degrees of freedom equal to (*k* - 1) × *m* - *q*, where
*k* is the number of levels of the multinomial or binomial
response, *m* is the number of sets of independent variable values
(covariate patterns),
and *q* is the number of parameters fit in the model.
For the Pearson statistic and the deviance to be approximately
chi-square distributed, there must be sufficient replication within the groupings.
When replication is insufficient, the data are sparse, and the *p*-values for
these statistics are not valid and should be ignored. Similarly, with sparse
data these statistics, divided by their degrees of freedom, cannot serve as
indicators of overdispersion. A large difference between the
Pearson statistic and the deviance provides some evidence that the data
are too sparse to use either statistic.
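The degrees of freedom and the ratio of a statistic to its degrees of freedom can be worked through with a short Python sketch. The function names and all numeric values are hypothetical. The sketch only evaluates the quantities defined above, and it deliberately sets no threshold for what counts as a "large" difference between the two statistics, because none is given here.

```python
def degrees_of_freedom(k, m, q):
    """(k - 1) * m - q for k response levels, m groupings, q parameters."""
    return (k - 1) * m - q


def heterogeneity_factor(chi_square, df):
    """Goodness-of-fit chi-square divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi_square / df


# Hypothetical setting: binary response, six groupings, two parameters
k, m, q = 2, 6, 2
df = degrees_of_freedom(k, m, q)

# Hypothetical statistics, not computed from real data
pearson, dev = 9.2, 8.8

h = heterogeneity_factor(pearson, df)        # Pearson statistic per df
relative_gap = abs(pearson - dev) / pearson  # rough gauge of disagreement
```

These ratios are only meaningful when each grouping has enough replication; with sparse data they should not be read as evidence of overdispersion.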

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.