The CATMOD Procedure 
   data one;
      input A $ B $ wt @@;
      datalines;
   yes yes 23  yes no 31  no yes 47  no no 50
   ;

   proc catmod;
      weight wt;
      population B;
      model A=(1 0,
               1 1);
   run;
Since the dependent variable A has two levels, there is one response function per population. Since the variable B has two levels, there are two populations. Thus, the MODEL statement is valid because the number of rows in the design matrix (2) equals the total number of response functions (one response function in each of two populations). If the POPULATION statement were omitted, there would be only one population and one response function, and the MODEL statement would be invalid because the two-row design matrix would no longer match.
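The counting argument above can be checked against the data before any model is fit. The following is a minimal sketch, not part of the original example, that uses PROC FREQ to cross-tabulate the population variable B against the response A in data set one. Each row of the table corresponds to one population, and each column to one response level.

```sas
/* Sketch: inspect the populations (rows, defined by B) and the   */
/* response levels (columns, defined by A) in data set ONE.       */
/* The WEIGHT statement treats wt as the cell frequency.          */
proc freq data=one;
   weight wt;
   tables B*A / nopercent nocol;
run;
```

With two rows and two response levels, the table corresponds to 2 x (2 - 1) = 2 response functions, which is the number of rows the design matrix must have.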
To illustrate the second use, suppose that you specify
   data two;
      input A $ B $ Y wt @@;
      datalines;
   yes yes 1 23  yes yes 2 63  yes no 1 31  yes no 2 70
   no  yes 1 47  no  yes 2 80  no  no 1 50  no  no 2 84
   ;

   proc catmod;
      weight wt;
      model Y=A B A*B / wls;
   run;
These statements induce four populations and produce the following design matrix and analysis of variance table.

Since the B and A*B effects are nonsignificant (p>0.10), you may want to fit the reduced model that contains only the A effect. If your new statements are
   proc catmod;
      weight wt;
      model Y=A / wls;
   run;
then only two populations are induced, and the design matrix and the analysis of variance table are as follows.

However, if the new statements are
   proc catmod;
      weight wt;
      population A B;
      model Y=A / wls;
   run;
then four populations are induced, and the design matrix and the analysis of variance table are as follows.

The advantage of the latter analysis is that it retains four populations for the reduced model, thereby creating a built-in goodness-of-fit test: the residual chi-square. Such a test is important because the cumulative (or joint) effect of deleting two or more effects from the model may be significant, even if the individual effects are not.
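The joint effect of the two deleted terms can also be examined directly in the full model. The following is a sketch only, assuming data set two from the earlier example; the CONTRAST label is arbitrary. It requests a single two-degree-of-freedom test that the B and A*B parameters are both zero. Under weighted least squares, this statistic should correspond to the residual chi-square of the reduced model that retains four populations, although that correspondence is stated here as an expectation rather than as verified output.

```sas
/* Sketch: joint test of B and A*B in the full (saturated) model. */
/* Each effect has one parameter here, so each row of the         */
/* contrast selects a single parameter.                           */
proc catmod data=two;
   weight wt;
   model Y=A B A*B / wls;
   contrast 'B and A*B jointly' B 1,
                                A*B 1;
run;
quit;
```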
The resulting differences between the two analyses are due to the fact that the latter analysis uses pure weighted least-squares estimates with respect to the four populations that are actually sampled. The former analysis pools populations and therefore uses parameter estimates that can be regarded as weighted least-squares estimates of maximum likelihood predicted cell frequencies. In any case, the estimation methods are asymptotically equivalent; therefore, the results are very similar. If you specify the ML option (instead of the WLS option) in the MODEL statements, then the parameter estimates are identical for the two analyses.
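The maximum likelihood comparison described above might be run as follows. This is a sketch that reuses data set two and changes only the estimation option; no output is shown because none is given in the original text.

```sas
/* Reduced model, populations induced by the MODEL statement      */
/* (two populations, defined by A).                               */
proc catmod data=two;
   weight wt;
   model Y=A / ml;
run;
quit;

/* Reduced model, populations fixed by the POPULATION statement   */
/* (four populations, defined by A and B).                        */
proc catmod data=two;
   weight wt;
   population A B;
   model Y=A / ml;
run;
quit;
```

According to the text, the parameter estimates from these two runs should agree; the second run additionally reports a goodness-of-fit test based on the four populations.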
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.