## Computational Method

The log-likelihood function is maximized by means
of a ridge-stabilized Newton-Raphson algorithm.
By default, initial parameter estimates are set to zero.
The INITIAL= and INTERCEPT= options in the MODEL
statement can be used to supply nonzero initial estimates.
The log-likelihood function, *L*, is computed as

$$L = \sum_{i} w_{i} \ln(p_{i})$$

where the sum is over the observations in the data set,
*w*_{i} is the weight for the *i*th observation, and *p*_{i}
is the modeled probability of the observed response.
In the case of the events/trials syntax in the MODEL statement,
each observation contributes two terms corresponding to the
probability of the event and the probability of its complement:

$$L = \sum_{i} w_{i} \left[ r_{i} \ln(p_{i}) + (n_{i} - r_{i}) \ln(1 - p_{i}) \right]$$

where *r*_{i} is the number of events and *n*_{i} is the number
of trials for observation *i*.
This log-likelihood function differs from the log-likelihood
function for a binomial or multinomial
distribution by additive terms consisting of
the log of binomial or multinomial coefficients.
These terms are parameter-independent and do not affect
the model estimation or the standard errors and tests.
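
As a rough illustration of why the binomial coefficients can be dropped, the sketch below evaluates the events/trials log likelihood with and without them at two parameter settings. The data, the single-covariate linear predictor `b0 + b1 * x`, and all function names are assumptions made for this example; they are not part of the PROBIT procedure.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def kernel_loglik(b0, b1, data):
    """L = sum_i w_i [ r_i ln(p_i) + (n_i - r_i) ln(1 - p_i) ]."""
    total = 0.0
    for x, r, n, w in data:
        p = normal_cdf(b0 + b1 * x)  # modeled event probability
        total += w * (r * math.log(p) + (n - r) * math.log(1.0 - p))
    return total

def binomial_loglik(b0, b1, data):
    """kernel_loglik plus the weighted log binomial coefficients."""
    constant = sum(w * math.log(math.comb(n, r)) for _, r, n, w in data)
    return kernel_loglik(b0, b1, data) + constant

# Hypothetical data: (x, events r, trials n, weight w)
DATA = [(0.0, 2, 10, 1.0), (1.0, 5, 10, 1.0), (2.0, 9, 10, 1.0)]

# The gap between the two functions is the same at any parameter values.
gap_a = binomial_loglik(-1.0, 1.0, DATA) - kernel_loglik(-1.0, 1.0, DATA)
gap_b = binomial_loglik(0.3, 0.5, DATA) - kernel_loglik(0.3, 0.5, DATA)
```

Because the gap does not depend on `b0` or `b1`, both functions have the same maximizer and the same derivatives with respect to the parameters.
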
The estimated covariance matrix, **V**, of the parameter
estimates is computed as the negative inverse of the
matrix of second derivatives of *L* with respect
to the parameters, evaluated at the final parameter estimates.
Thus, the estimated covariance matrix is
derived from the observed information matrix rather than the
expected information matrix (the two are generally not the same).
The standard error estimates for the parameter
estimates are taken as the square roots of the
corresponding diagonal elements of **V**.
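
The following is a minimal sketch of a ridge-stabilized Newton-Raphson fit for a two-parameter normal (probit) model with events/trials data, followed by the covariance matrix taken from the observed information. The particular ridging rule (add a multiple of the identity to the negative Hessian and double it until the log likelihood no longer decreases), the data, and all names are assumptions for illustration; the procedure's internal ridging scheme is not described here.

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def loglik_derivs(b, data):
    """Return L, its gradient, and its Hessian for eta = b[0] + b[1] * x.

    Assumes fitted probabilities stay strictly between 0 and 1.
    """
    L, g, H = 0.0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for x, r, n, w in data:
        eta = b[0] + b[1] * x
        p, f = cdf(eta), pdf(eta)
        q = 1.0 - p
        L += w * (r * math.log(p) + (n - r) * math.log(q))
        d1 = w * (r * f / p - (n - r) * f / q)            # dL/d(eta)
        d2 = w * (r * (-eta * f * p - f * f) / (p * p)    # d2L/d(eta)^2
                  - (n - r) * (f * f - eta * f * q) / (q * q))
        z = (1.0, x)
        for j in range(2):
            g[j] += d1 * z[j]
            for k in range(2):
                H[j][k] += d2 * z[j] * z[k]
    return L, g, H

def fit_probit(data, max_iter=50, tol=1e-8):
    b = [0.0, 0.0]                       # initial estimates set to zero
    L, g, H = loglik_derivs(b, data)
    for _ in range(max_iter):
        ridge = 0.0
        while True:
            # Solve (-H + ridge * I) step = g for the 2x2 case.
            a11, a12, a22 = ridge - H[0][0], -H[0][1], ridge - H[1][1]
            det = a11 * a22 - a12 * a12
            if a11 > 0.0 and det > 0.0:
                step = [(a22 * g[0] - a12 * g[1]) / det,
                        (a11 * g[1] - a12 * g[0]) / det]
                trial = [b[0] + step[0], b[1] + step[1]]
                L_new, g_new, H_new = loglik_derivs(trial, data)
                if L_new >= L - 1e-10:   # accept; slack absorbs rounding
                    break
            ridge = max(2.0 * ridge, 1e-4)
        b, L, g, H = trial, L_new, g_new, H_new
        if max(abs(s) for s in step) < tol:
            break
    return b, L, H

def covariance(H):
    """V = negative inverse of the Hessian (observed information)."""
    a, c, d = -H[0][0], -H[0][1], -H[1][1]
    det = a * d - c * c
    return [[d / det, -c / det], [-c / det, a / det]]

# Hypothetical dose-response data: (x, events r, trials n, weight w)
DATA = [(0.0, 2, 20, 1.0), (1.0, 6, 20, 1.0),
        (2.0, 13, 20, 1.0), (3.0, 18, 20, 1.0)]

b, L, H = fit_probit(DATA)
V = covariance(H)
std_err = [math.sqrt(V[0][0]), math.sqrt(V[1][1])]
```

With a ridge of zero the update is an ordinary Newton-Raphson step, so the ridge only matters when the Hessian is not negative definite or a full step would lower the log likelihood.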

For a classification effect, an overall
chi-square statistic is computed as

$$\chi^{2} = \mathbf{b}_{1}' \mathbf{V}_{11}^{-1} \mathbf{b}_{1}$$

where **V**_{11} is the submatrix of **V**
corresponding to the indicator variables for the
classification effect and **b**_{1} is the vector of
parameter estimates corresponding to the classification effect.
This chi-square statistic has degrees of
freedom equal to the rank of **V**_{11}.
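
A small sketch of this overall test, computed as the quadratic form of **b**_{1} with the inverse of **V**_{11}, is given below. The estimates and covariance submatrix are made-up numbers, the helper names are hypothetical, and **V**_{11} is assumed to be nonsingular, so the degrees of freedom equal the number of parameters in the effect.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def wald_chi_square(b1, V11):
    """Quadratic form b1' V11^{-1} b1, computed without forming the inverse."""
    z = solve(V11, b1)
    return sum(bi * zi for bi, zi in zip(b1, z))

# Hypothetical estimates for a three-level classification effect
# (two indicator variables) and their covariance submatrix.
b1 = [0.8, -0.3]
V11 = [[0.04, 0.01], [0.01, 0.09]]

chi_square = wald_chi_square(b1, V11)
degrees_of_freedom = len(b1)  # valid only because V11 is assumed full rank
```
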

If some of the independent variables are perfectly
correlated with the response pattern, then the
theoretical parameter estimates may be infinite.
Fitted probabilities of 0 and 1 are not especially
pathological in themselves, but infinite parameter
estimates are required to yield these probabilities.
Because computer arithmetic has finite precision,
the actual parameter estimates are not infinite.
Indeed, since the tails of the distributions allowed in the PROBIT
procedure become small rapidly, an argument to
the cumulative distribution function of around 20
is already effectively infinite.
For such parameter estimates, the
standard error estimates and the corresponding
chi-square tests are not trustworthy.
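
To make the point about finite precision concrete, the short sketch below evaluates the standard normal distribution function at arguments of 20 and -20 in double precision. Only the normal distribution is shown, and the function name is an assumption for this example.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

upper = normal_cdf(20.0)    # indistinguishable from 1 in double precision
lower = normal_cdf(-20.0)   # positive, but far below any practical tolerance

saturated = (upper == 1.0)  # True: the fitted probability has reached 1
```

Once the linear predictor is this large, further increases in the parameter leave the fitted probability, and hence the log likelihood, essentially unchanged, which is why the curvature-based standard errors stop being informative.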

The chi-square tests for the individual parameter
values are Wald tests based on the observed
information matrix and the parameter estimates.
The theory behind these tests assumes large samples.
If the samples are not large, it may be better
to base the tests on log-likelihood ratios.
The required change in log likelihood can be obtained by fitting
the model twice, once with all the parameters of interest
and once leaving out the parameters to be tested.
Refer to Cox and Oakes (1984) for a discussion
of the merits of some possible test methods.
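
A likelihood ratio test of a single parameter can be sketched as follows, given the maximized log likelihoods from the two fits. The function name and the input values are hypothetical, and the sketch covers only the one-degree-of-freedom case, where the chi-square tail probability reduces to a complementary error function.

```python
import math

def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """Return the LR statistic and its p-value for one tested parameter.

    Uses P(chi-square with 1 df > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Made-up maximized log likelihoods from the full and reduced fits.
stat, p_value = likelihood_ratio_test_1df(-40.0, -42.0)
```

Testing several parameters at once uses the same statistic with degrees of freedom equal to the number of parameters dropped, which requires a general chi-square tail function not included in this sketch.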

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.