The CATMOD Procedure

One word of caution about log-linear model analyses: sampling zeros in the input data set should be replaced by some positive number close to zero (such as 1E-20) to ensure that these sampling zeros are not treated as structural zeros. This can be performed in a DATA step that changes cell counts for sampling zeros to a very small number. Data containing sampling zeros should be analyzed with maximum likelihood estimation. See the "Cautions" section and Example 22.5 for further information and an illustration for both cell count data and raw data.
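The DATA step adjustment described above might look like the following minimal sketch. The data set names counts and counts_adj and the cell counts themselves are hypothetical, chosen only for illustration; the variable names r1, r2, and wt follow the examples in this section.

```sas
/* Hypothetical cell-count data: one row per (r1, r2) cell, wt is the count. */
data counts;
   input r1 r2 wt;
   datalines;
1 1 12
1 2 0
2 1 7
2 2 9
;
run;

/* Replace sampling zeros with a very small positive number so that
   PROC CATMOD does not treat them as structural zeros. */
data counts_adj;
   set counts;
   if wt = 0 then wt = 1e-20;
run;

/* Analyze the adjusted counts with maximum likelihood estimation. */
proc catmod data=counts_adj;
   weight wt;
   model r1*r2=_response_ / ml;
   loglin r1 r2;
run;
```

The original data set is left untouched, so the adjustment is easy to audit or undo.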

When you perform log-linear model analysis, you can request weighted least-squares estimates, maximum likelihood estimates, or both. By default, PROC CATMOD calculates maximum likelihood estimates when the default response functions are used. The following table provides appropriate MODEL statements for the combinations of types of estimates.

| Estimation Desired | MODEL Statement |
|---|---|
| Maximum likelihood | `model a*b=_response_;` |
| Weighted least squares | `model a*b=_response_ / wls;` |
| Maximum likelihood and weighted least squares | `model a*b=_response_ / wls ml;` |

For example, the statements

   proc catmod;
      weight wt;
      model r1*r2=_response_;
      loglin r1|r2;
   run;

yield a maximum likelihood analysis of a saturated log-linear model for the dependent variables r1 and r2.

If you want to fit a reduced model with respect to the dependent variables (for example, a model of independence or conditional independence), specify the reduced model in the LOGLIN statement. For example, the statements

   proc catmod;
      weight wt;
      model r1*r2=_response_ / pred;
      loglin r1 r2;
   run;

yield a main-effects log-linear model analysis of the factors r1 and r2. The output includes Wald statistics for the individual effects r1 and r2, as well as predicted cell probabilities. Moreover, the goodness-of-fit statistic is the likelihood ratio test for the hypothesis of independence between r1 and r2 or, equivalently, a test of r1*r2.

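The MODEL statement can also include interactions between _RESPONSE_ and the independent variables, which lets the log-linear effects differ across populations. For example, suppose the dependent variables r1 and r2 are dichotomous, and the independent variable group has three levels. Then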

   proc catmod;
      weight wt;
      model r1*r2=_response_ group*_response_;
      loglin r1|r2;
   run;

specifies a saturated model (three degrees of freedom for _RESPONSE_ and six degrees of freedom for the interaction between _RESPONSE_ and group). From another point of view, _RESPONSE_*group can be regarded as a main effect for group with respect to the three response functions, while _RESPONSE_ can be regarded as an intercept effect with respect to the functions. In other words, these statements give essentially the same results as the logistic analysis:

   proc catmod;
      weight wt;
      model r1*r2=group;
      run;
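The degree-of-freedom counts quoted for the saturated model can be checked by hand. The following sketch assumes, as stated above, that r1 and r2 each have two levels and that group has three levels; the layout of the arithmetic is illustrative rather than taken from the procedure's output.

```latex
\begin{align*}
\text{response functions per group} &= 2 \times 2 - 1 = 3 \\
\text{df}(\text{\_RESPONSE\_}) &= \underbrace{1}_{\text{r1}} + \underbrace{1}_{\text{r2}} + \underbrace{1}_{\text{r1*r2}} = 3 \\
\text{df}(\text{group*\_RESPONSE\_}) &= (3 - 1) \times 3 = 6 \\
\text{total model df} &= 3 + 6 = 9 = 3 \text{ groups} \times 3 \text{ response functions}
\end{align*}
```

Because the model uses all nine available degrees of freedom, nothing is left for a goodness-of-fit test, which is what makes the model saturated.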

The ability to model the interaction between the independent and the dependent variables becomes particularly useful when a reduced model is specified for the dependent variables. For example,

   proc catmod;
      weight wt;
      model r1*r2=_response_ group*_response_;
      loglin r1 r2;
   run;

specifies a model with two degrees of freedom for _RESPONSE_ (one for r1 and one for r2) and four degrees of freedom for the interaction of _RESPONSE_*group. The likelihood ratio goodness-of-fit statistic (three degrees of freedom) tests the hypothesis that r1 and r2 are independent in each of the three groups.
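The counts for this reduced model can be tallied the same way. The sketch below assumes the same setup (dichotomous r1 and r2, three levels of group), so that each group contributes 2 × 2 − 1 = 3 response functions and 3 × 3 = 9 degrees of freedom are available in total.

```latex
\begin{align*}
\text{df}(\text{\_RESPONSE\_}) &= \underbrace{1}_{\text{r1}} + \underbrace{1}_{\text{r2}} = 2 \\
\text{df}(\text{group*\_RESPONSE\_}) &= (3 - 1) \times 2 = 4 \\
\text{residual (goodness-of-fit) df} &= 9 - (2 + 4) = 3
\end{align*}
```

The three residual degrees of freedom correspond to the omitted r1*r2 association, one for each of the three groups, which is why the goodness-of-fit statistic tests independence within every group.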


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.