
The CATMOD Procedure

If there is more than one dependent variable, and you specify RESPONSE MEANS, then the effective sample size for each response function is the same as the actual sample size. Thus, a sample size of 30 could be sufficient to support four response functions, provided that the functions are the means of four dependent variables.

- You can reduce the number of response functions according to how many can be supported by the populations with the smallest sample sizes.
- If there are three or more levels for any independent variable, you can pool the levels into fewer categories, thereby reducing the number of populations. However, you must interpret the results more cautiously, since such pooling implies a different sampling scheme and masks any differences that existed among the pooled categories.
- If there are two or more independent variables, you can delete at least one of them from the model. However, this is just another form of pooling, and the same cautions that apply to the previous option also apply here.
- If there is one independent variable, then, in some situations, you might simply eliminate the populations that are causing the covariance matrices to be singular.
- You can use the ADDCELL option in the MODEL statement to add a small amount (for example, 0.5) to every cell frequency, but this can seriously bias the results if the cell frequencies are small.
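The bias from adding a constant to every cell is easiest to see with a little arithmetic. The following Python sketch models only the arithmetic of adding a constant. It does not reproduce PROC CATMOD's estimation, and the `proportions` helper and both tables are hypothetical. It adds 0.5 to every cell of two tables that have identical observed proportions but very different totals.

```python
def proportions(counts, add=0.0):
    """Cell proportions after adding `add` to every cell frequency.

    Hypothetical helper: it models only the arithmetic of an
    ADDCELL-style adjustment, not PROC CATMOD's implementation.
    """
    adjusted = [c + add for c in counts]
    total = sum(adjusted)
    return [a / total for a in adjusted]


# Hypothetical tables with the same observed proportions (0, 1/4, 3/4).
small = [0, 1, 3]        # n = 4
large = [0, 100, 300]    # n = 400

for counts in (small, large):
    raw = proportions(counts)
    corrected = proportions(counts, add=0.5)
    print(counts, [round(p, 3) for p in raw], [round(p, 3) for p in corrected])
```

In the four-observation table, the empty cell's estimate rises from 0 to about 0.09 and the largest cell falls from 0.75 to about 0.64. In the 400-observation table, the same correction moves each proportion by less than 0.002.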

For any log-linear model analysis, it is important to remember that PROC CATMOD creates response profiles only for those profiles that are actually observed. Thus, for any log-linear model analysis with one population (the usual case), there are no zeros in the contingency table, which means that the CATMOD procedure treats all zero frequencies as structural zeros. If there is more than one population, then a zero can appear in the body of the contingency table, in which case the zero is treated as a sampling zero (as long as some population has a nonzero count for that profile). If you want zero frequencies that PROC CATMOD would normally treat as structural zeros to be interpreted as sampling zeros, simply insert a one-line statement into the data step that changes each zero to a very small number (such as 1E-20). Refer to Bishop, Fienberg, and Holland (1975) for a discussion of the issues and Example 22.5 for an illustration of a log-linear model analysis of data that contain both structural and sampling zeros.
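Whether a zero is structural or sampling depends only on which profiles are observed. The sketch below is a simplified Python model of the rule stated above, not PROC CATMOD code. The `build_table` helper, the population labels, and the counts are hypothetical. Each record is assumed to stand for a frequency record read with a WEIGHT statement.

```python
def build_table(records):
    """Form response profiles and a population-by-profile table.

    records: (population, profile, count) tuples (hypothetical layout).
    Simplified model: a profile is created only if some population has a
    positive count for it. A zero left in the table is therefore a sampling
    zero, while a profile with no positive count anywhere is absent
    (a structural zero).
    """
    profiles = sorted({prof for _, prof, n in records if n > 0})
    populations = sorted({pop for pop, _, _ in records})
    table = {pop: {prof: 0.0 for prof in profiles} for pop in populations}
    for pop, prof, n in records:
        if prof in table[pop]:  # skip profiles that were never created
            table[pop][prof] += n
    return profiles, table


# One population: the (1, 2) profile has count 0, so it is never created.
one_pop = [("A", (1, 1), 5), ("A", (1, 2), 0), ("A", (2, 1), 3), ("A", (2, 2), 7)]
profiles, table = build_table(one_pop)
print(profiles)  # [(1, 1), (2, 1), (2, 2)]

# Replacing each zero with a tiny positive count (the 1E-20 device) keeps the
# profile, so the cell is treated as a sampling zero instead.
nudged = [(pop, prof, n if n > 0 else 1e-20) for pop, prof, n in one_pop]
profiles, table = build_table(nudged)
print(profiles)  # [(1, 1), (1, 2), (2, 1), (2, 2)]

# Two populations: B observes (1, 2), so A's zero is a sampling zero.
two_pop = one_pop + [("B", (1, 2), 4), ("B", (2, 2), 6)]
profiles, table = build_table(two_pop)
print(table["A"][(1, 2)])  # 0.0
```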

If you perform a weighted least-squares analysis on a contingency table that contains zero cell frequencies, then avoid using the LOG transformation as the first transformation on the observed proportions. In general, it may be better to change the response functions or to pool some of the response categories than to settle for the 0.5 correction or to use the ADDCELL option.
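The following Python sketch shows why a LOG transformation fails on zero cells. It computes the log of each observed proportion together with its delta-method variance, (1 - p)/(np). This is the kind of quantity a weighted least-squares analysis needs for its covariance matrix. The `log_functions` helper and the counts are hypothetical, and this is not how PROC CATMOD is implemented.

```python
import math


def log_functions(counts):
    """Log of each observed proportion with its delta-method variance.

    Hypothetical helper. Var(log p_hat) is approximately (1 - p) / (n * p);
    neither the log nor the variance is defined when a count is zero.
    """
    n = sum(counts)
    result = []
    for c in counts:
        p = c / n
        result.append((math.log(p), (1 - p) / (n * p)))
    return result


print(log_functions([2, 3, 5]))

try:
    log_functions([0, 4, 6])
except ValueError as err:
    # math.log(0.0) raises ValueError before the variance is reached.
    print("zero cell:", err)
```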

Warning: The _RESPONSE_ effect may be testing the wrong hypothesis since the marginal levels of the dependent variables do not coincide. Consult the response profiles and the CATMOD documentation.

The following examples illustrate situations in which the _RESPONSE_ effect tests the wrong hypothesis.

Suppose you specify the following statements:

    data A1;
       input Time1 Time2 @@;
       datalines;
    1 2 2 3 1 3
    ;
    proc catmod;
       response marginals;
       model Time1*Time2=_response_;
       repeated Time 2 / _response_=Time;
    run;

One marginal probability is computed for each dependent variable, resulting in two response functions. The model is a saturated one: one degree of freedom for the intercept and one for the main effect of Time. Except for the warning message, PROC CATMOD produces an analysis with no apparent errors, but the "Response Profiles" table displayed by PROC CATMOD is as follows.

Response Profiles

| Response | Time1 | Time2 |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 2 | 3 |

The observed levels of Time1 are 1 and 2, while those of Time2 are 2 and 3. Since RESPONSE MARGINALS yields marginal probabilities for every level but the last, the two response functions being analyzed are Prob(Time1=1) and Prob(Time2=2). Thus, the Time effect is testing the hypothesis that Prob(Time1=1)=Prob(Time2=2). What it *should* be testing is the hypothesis that

    Prob(Time1=1) = Prob(Time2=1)
    Prob(Time1=2) = Prob(Time2=2)
    Prob(Time1=3) = Prob(Time2=3)

but there are not enough data to support the test (assuming that none of the probabilities are structural zeros by the design of the study).
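The mismatch can be reproduced by hand. This Python sketch is a simplified model of RESPONSE MARGINALS, not PROC CATMOD code. It assumes that each dependent variable's levels are only those observed, sorted as in the "Response Profiles" table, and that the last level is dropped. The `marginal_functions` helper is hypothetical.

```python
from fractions import Fraction


def marginal_functions(pairs):
    """Marginal probabilities formed for two dependent variables.

    Hypothetical helper (simplified model): each variable's levels are only
    those actually observed, in sorted order, and one probability is formed
    for every level except the last.
    """
    n = len(pairs)
    functions = []
    for j, name in enumerate(("Time1", "Time2")):
        values = [pair[j] for pair in pairs]
        levels = sorted(set(values))
        for level in levels[:-1]:  # the last observed level is dropped
            functions.append((f"Prob({name}={level})", Fraction(values.count(level), n)))
    return functions


# The three observations from the first example.
a1 = [(1, 2), (2, 3), (1, 3)]
for label, prob in marginal_functions(a1):
    print(label, prob)
# Prob(Time1=1) 2/3
# Prob(Time2=2) 1/3
```

The two functions refer to different levels (1 for Time1, 2 for Time2), so testing their equality is not a test of marginal homogeneity.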

Suppose you specify the following statements:

    data a1;
       input Time1 Time2 @@;
       datalines;
    2 1 2 2 1 1 1 2 2 1
    ;
    proc catmod order=data;
       response marginals;
       model Time1*Time2=_response_;
       repeated Time 2 / _response_=Time;
    run;

As in the preceding example, one marginal probability is computed for each dependent variable, resulting in two response functions. The model is also the same: one degree of freedom for the intercept and one for the main effect of Time. PROC CATMOD issues the warning message and displays the following "Response Profiles" table.

Response Profiles

| Response | Time1 | Time2 |
|---|---|---|
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 2 |

Although the marginal levels are the same for the two dependent variables, they are not in the same order (2 before 1 for Time1, but 1 before 2 for Time2), because the ORDER=DATA option specified that they be ordered according to their appearance in the input stream. Since RESPONSE MARGINALS yields marginal probabilities for every level except the last, the two response functions being analyzed are Prob(Time1=2) and Prob(Time2=1). Thus, the Time effect is testing the hypothesis that Prob(Time1=2)=Prob(Time2=1). What it *should* be testing is the hypothesis that

    Prob(Time1=1) = Prob(Time2=1)
    Prob(Time1=2) = Prob(Time2=2)

Whenever the warning message appears, look at the "Response Profiles" table or the "One-Way Frequencies" table to determine what hypothesis is actually being tested. For the latter example, a correct analysis can be obtained by deleting the ORDER=DATA option or by reordering the data so that the (1,1) observation is first.
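The condition behind the warning can be checked directly by comparing the ordered level lists of the two dependent variables. The Python sketch below is a simplified model, not the check PROC CATMOD performs internally. `marginal_levels` and `levels_coincide` are hypothetical helpers, and `order="data"` mimics ORDER=DATA by keeping levels in order of first appearance.

```python
def marginal_levels(pairs, order="internal"):
    """Observed levels of each dependent variable, in analysis order.

    Hypothetical helper (simplified model): order="internal" sorts the
    levels; order="data" keeps them in order of first appearance.
    """
    result = []
    for j in range(2):
        values = [pair[j] for pair in pairs]
        if order == "data":
            result.append(list(dict.fromkeys(values)))  # first-appearance order
        else:
            result.append(sorted(set(values)))
    return result


def levels_coincide(pairs, order="internal"):
    """True if Time1 and Time2 have the same levels in the same order."""
    time1_levels, time2_levels = marginal_levels(pairs, order)
    return time1_levels == time2_levels


first_example = [(1, 2), (2, 3), (1, 3)]
second_example = [(2, 1), (2, 2), (1, 1), (1, 2), (2, 1)]
reordered = [(1, 1), (2, 1), (2, 2), (1, 2), (2, 1)]  # (1, 1) moved first

print(levels_coincide(first_example))                 # False: [1, 2] vs [2, 3]
print(levels_coincide(second_example, order="data"))  # False: [2, 1] vs [1, 2]
print(levels_coincide(second_example))                # True: ORDER=DATA removed
print(levels_coincide(reordered, order="data"))       # True: (1, 1) listed first
```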


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.