The Hosmer-Lemeshow Goodness-of-Fit Test
Sufficient replication within subpopulations is required to
make the Pearson and deviance goodness-of-fit tests valid.
When there are one or more continuous predictors in the model,
the data are often too sparse to use these statistics. Hosmer and
Lemeshow (1989) proposed a statistic that they show, through
simulation, is distributed as chi-square when there is no replication
in any of the subpopulations. This test is only available for binary response models.
First, the observations are sorted in increasing order of their estimated
event probability. The event is the response level identified in the
"Response Profiles" table as
"Ordered Value 1." The observations are then divided into
approximately ten groups according to the following scheme.
Let N be the total number of subjects. Let M be the target
number of subjects for each group given by
\[ M = [0.1 \times N + 0.5] \]
where [x] denotes the integer part of x. If the
single-trial syntax is used, blocks of subjects are formed
of observations with identical values of the explanatory variables.
Blocks of subjects are not divided when being placed into groups.
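The target size and the block formation described above can be sketched in Python as follows. This is a minimal illustration, not the procedure's actual implementation: the function names `target_group_size` and `block_sizes` and the input layout (one `(covariates, estimated_probability)` pair per subject) are assumptions made for the example.

```python
from itertools import groupby


def target_group_size(n_subjects):
    """Return M = [0.1 * N + 0.5], reading [x] as the integer part of x."""
    # (N + 5) // 10 equals floor(0.1 * N + 0.5) without floating-point error.
    return (n_subjects + 5) // 10


def block_sizes(observations):
    """Count subjects per block, in increasing order of estimated probability.

    `observations` is a list of (covariates, estimated_probability) pairs,
    one per subject, where `covariates` is a tuple of explanatory values.
    """
    # Sort by probability; break ties on covariates so that subjects with
    # identical explanatory values end up next to each other.
    ordered = sorted(observations, key=lambda obs: (obs[1], obs[0]))
    return [len(list(members))
            for _, members in groupby(ordered, key=lambda obs: obs[0])]


obs = [((1.0,), 0.30), ((0.0,), 0.10), ((1.0,), 0.30), ((2.0,), 0.55)]
print(target_group_size(104))  # 10
print(block_sizes(obs))        # [1, 2, 1]
```

The sketch relies on subjects with identical explanatory values receiving the same estimated probability, so each block is contiguous after sorting.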
Suppose there are $n_1$ subjects in the first block and $n_2$
subjects in the second block. The first block of subjects is placed
in the first group. Subjects in the second block are added to
the first group if
\[ n_1 < M \quad \text{and} \quad n_1 + [0.5 \times n_2] \le M \]
Otherwise, they are placed in the second group. In general,
suppose subjects of the $(j-1)$th block have been
placed in the $k$th group. Let $c$ be the total number of
subjects currently in the $k$th group. Subjects for
the $j$th block (containing $n_j$ subjects) are also placed
in the $k$th group if
\[ c < M \quad \text{and} \quad c + [0.5 \times n_j] \le M \]
Otherwise, the $n_j$ subjects are put into the next group. In
addition, if the number of subjects in the last group does not
exceed [0.05 × N] (half the target group size), the last
two groups are collapsed to form only one group.
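The grouping scheme can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the procedure's own code: a block joins the current group when the group holds fewer than M subjects and the group count plus half the block size (integer part) does not exceed M. The function name `assign_groups` and its return layout (a list of groups, each a list of block indices) are hypothetical.

```python
def assign_groups(block_sizes, n_subjects):
    """Place blocks, already sorted by estimated probability, into groups.

    `block_sizes[j]` is the number of subjects in block j. Returns a list
    of groups, each a list of block indices. Blocks are never split.
    """
    m = (n_subjects + 5) // 10      # target size M = [0.1 * N + 0.5]
    groups = []
    count = 0                       # c: subjects in the current group
    for j, n_j in enumerate(block_sizes):
        # Assumed rule: join the current group if c < M and
        # c + [0.5 * n_j] <= M; otherwise start the next group.
        if groups and count < m and count + n_j // 2 <= m:
            groups[-1].append(j)
            count += n_j
        else:
            groups.append([j])
            count = n_j
    # Collapse the last two groups if the last one holds at most
    # [0.05 * N] subjects.
    if len(groups) > 1 and count <= n_subjects // 20:
        last = groups.pop()
        groups[-1].extend(last)
    return groups


groups = assign_groups([1] * 104, 104)
print(len(groups))       # 10
print(len(groups[-1]))   # 14
```

With 104 subjects and no replication, the first 100 subjects fill ten groups of 10. The remaining 4 would form an eleventh group, but 4 does not exceed [0.05 × 104] = 5, so they are merged into the tenth group.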
Note that the number of groups, g, may be smaller than 10 if
there are fewer than 10 patterns of explanatory variables. There
must be at least three groups in order for the Hosmer-Lemeshow
statistic to be computed.
The Hosmer-Lemeshow goodness-of-fit statistic is obtained by
calculating the Pearson chi-square statistic from the
2×g table of observed and expected frequencies, where
g is the number of groups. The statistic is written
\[ \chi^2_{HL} = \sum_{i=1}^{g} \frac{(O_i - N_i \bar{\pi}_i)^2}{N_i \bar{\pi}_i (1 - \bar{\pi}_i)} \]
where $N_i$ is the total frequency of subjects in the $i$th group,
$O_i$ is the total frequency of event outcomes in the $i$th group,
and $\bar{\pi}_i$ is the average estimated probability of
an event outcome for the $i$th group. The Hosmer-Lemeshow
statistic is then compared to a chi-square distribution with
(g-n) degrees of freedom, where the value of n can be specified in
the LACKFIT option in the MODEL statement. The default is n=2.
Large values of $\chi^2_{HL}$ (and small p-values) indicate a lack of fit of the model.
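As a rough illustration of the computation, the following Python sketch evaluates the statistic as the Pearson chi-square over the groups and refers it to a chi-square distribution with (g-n) degrees of freedom. The names `chi2_sf` and `hosmer_lemeshow` and the input layout (one `(size, events, mean_probability)` triple per group) are assumptions made for the example. The tail probability is computed with the standard library only, using the recurrence Q(a+1, z) = Q(a, z) + z^a e^{-z} / Γ(a+1) for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    z = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    else:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    # Step a up to df / 2 with Q(a + 1, z) = Q(a, z) + z**a * e**-z / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def hosmer_lemeshow(groups, n=2):
    """Return (statistic, degrees of freedom, p-value).

    `groups` holds one (size, events, mean_probability) triple per group;
    `n` is the reduction in degrees of freedom (default 2, as in the text).
    """
    g = len(groups)
    if g < 3:
        raise ValueError("at least three groups are required")
    df = g - n
    if df < 1:
        raise ValueError("g - n must be positive")
    stat = sum((events - size * p_bar) ** 2 / (size * p_bar * (1.0 - p_bar))
               for size, events, p_bar in groups)
    return stat, df, chi2_sf(stat, df)


stat, df, p_value = hosmer_lemeshow([(10, 3, 0.5), (10, 5, 0.5), (10, 7, 0.5)])
print(round(stat, 6), df)
```

Every group must have a mean estimated probability strictly between 0 and 1, since the denominator vanishes otherwise.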
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.