The CATMOD Procedure

**MODEL** *response-effect* = < *design-effects* > < / *options* > **;**

*response-effect*-
can be either a single variable, a crossed effect with two or more variables joined by asterisks, or _F_. The _F_ specification indicates that the response functions and their estimated covariance matrix are to be read directly into the procedure. The *response-effect* indicates the dependent variables that determine the response categories (the columns of the underlying contingency table).
*design-effects*-
specify potential sources of variation (such as main effects and interactions) in the model. Thus, these effects determine the number of model parameters, as well as the interpretation of such parameters. In addition, if there is no POPULATION statement, PROC CATMOD uses these variables to determine the populations (the rows of the underlying contingency table). When fitting the model, PROC CATMOD adjusts the independent effects in the model for all other independent effects in the model.

*Design-effects* can be any of those described in the section "Specification of Effects", or they can be defined by specifying the actual design matrix, enclosed in parentheses (see the "Specifying the Design Matrix Directly" section). In addition, you can use the keyword _RESPONSE_ alone or as part of an effect. Effects cannot be nested within _RESPONSE_, so effects of the form A(_RESPONSE_) are invalid.

For more information, see the "Log-Linear Model Analysis" section and the "Repeated Measures Analysis" section.
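As a minimal sketch of how the _RESPONSE_ keyword is used in a log-linear analysis, consider the following step. The data set name `crosstab`, the variables `r`, `s`, and `count`, and the counts themselves are hypothetical. The LOGLIN statement, which defines the effects that _RESPONSE_ stands for, is a separate PROC CATMOD statement that is not covered in this section.

```sas
/* Hypothetical 2x2 table of counts, made up for illustration only */
data crosstab;
   input r $ s $ count @@;
   datalines;
yes hi 25  yes lo 15
no  hi 10  no  lo 30
;
run;

proc catmod data=crosstab;
   weight count;            /* each input row is a cell count          */
   model r*s=_response_;    /* crossed response effect, no design vars */
   loglin r s;              /* main effects of r and s only            */
run;
quit;
```

Because no independent variables appear in the MODEL statement and there is no POPULATION statement, this sketch treats the whole table as a single population.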

Some examples of MODEL statements are

`model r=a b;` | main effects only |

`model r=a b a*b;` | main effects with interaction |

`model r=a b(a);` | nested effect |

`model r=a|b;` | complete factorial |

`model r=a b(a=1) b(a=2);` | nested-by-value effects |

`model r*s=_response_;` | log-linear model |

`model r*s=a _response_(a);` | nested repeated measurement factor |

`model _f_=_response_;` | direct input of the response functions |

The relationship between these specifications and the structure of the design matrix **X** is described in the "Generation of the Design Matrix" section.
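To show one of these MODEL statements in context, here is a minimal sketch of a complete step. The data set `trial`, its variables, and its counts are invented for illustration; only the MODEL statement form comes from the list above.

```sas
/* Hypothetical counts for a 2 x 2 x 2 table, made up for illustration */
data trial;
   input a $ b $ r $ count @@;
   datalines;
x p yes 12  x p no 18
x q yes 20  x q no 10
y p yes  9  y p no 21
y q yes 15  y q no 15
;
run;

proc catmod data=trial;
   weight count;      /* each input row is a cell count */
   model r=a b;       /* main effects only              */
run;
quit;
```

Any of the other forms listed above, such as `model r=a|b;`, could be substituted for the MODEL statement in this sketch, provided the data contain the variables it names.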

The following table summarizes the options available in the MODEL statement.

| Task | Options |
|---|---|
| **Specify details of computation** | |
| Generates maximum likelihood estimates | ML |
| Generates weighted least-squares estimates | GLS, WLS |
| Omits intercept term from the model | NOINT |
| Adds a number to each cell frequency | ADDCELL= |
| Averages main effects across response functions | AVERAGED |
| Specifies the convergence criterion for maximum likelihood | EPSILON= |
| Specifies the number of iterations for maximum likelihood | MAXITER= |
| **Request additional computation and tables** | |
| Estimated correlation matrix of estimates | CORRB |
| Covariance matrix of response functions | COV |
| Estimated covariance matrix of estimates | COVB |
| Two-way frequency tables | FREQ |
| One-way frequency tables | ONEWAY |
| Predicted values | PRED=, PREDICT |
| Probability estimates | PROB |
| Crossproducts matrix | XPX |
| Title | TITLE= |
| **Suppress output** | |
| Design matrix | NODESIGN |
| Iterations for maximum likelihood | NOITER |
| Parameter estimates | NOPARM |
| Population and response profiles | NOPROFILE |
| _RESPONSE_ matrix | NORESPONSE |

The following list describes these options in alphabetical order.

**ADDCELL=***number*-
adds *number* to the frequency count in each cell, where *number* is any positive number. This option has no effect on maximum likelihood analysis; it is used only for weighted least-squares analysis.
**AVERAGED**-
specifies that dependent variable effects can be modeled
and that independent variable main effects are averaged
across the response functions in a population.
For further information on the effect of using
(or not using) the AVERAGED option, see
the "Generation of the Design Matrix" section.
Direct input of the design matrix or specification
of the _RESPONSE_ keyword in the MODEL statement
automatically induces an AVERAGED model type.
**CORRB**-
displays the estimated correlation matrix of the parameter
estimates.
**COV**-
displays **S**_{i}, which is the covariance matrix of the response functions for each population.
**COVB**-
displays the estimated covariance matrix of the parameter
estimates.
**EPSILON=***number*-
specifies the convergence criterion for the maximum likelihood estimation of the parameters. The iterative estimation process stops when the proportional change in the log likelihood is less than *number*, or after the number of iterations specified by the MAXITER= option, whichever comes first. By default, EPSILON=1E-8.
**FREQ**-
produces the two-way frequency table for the
cross-classification of populations by responses.
**MAXITER=***number*-
specifies the maximum number of iterations used for
the maximum likelihood estimation of the parameters.
By default, MAXITER=20.
**ML**-
computes maximum likelihood estimates. This option is
available when generalized logits are used, or for the
special case of a single two-level dependent variable where
cumulative logits or adjacent category logits are used. For
generalized logits (the default response functions), ML is
the default estimation method.
**NODESIGN**-
suppresses the display of the design matrix **X**.
**NOINT**-
suppresses the intercept term in the model.
**NOITER**-
suppresses the display of parameter estimates and other
information at each iteration of a maximum likelihood analysis.
**NOPARM**-
suppresses the display of the estimated parameters and
the statistics for testing that each parameter is zero.
**NOPREDVAR**-
suppresses the display of the variable levels in
tables requested with the PRED= option.
**NOPRINT**-
suppresses the normal display of results.
The NOPRINT option is useful when you only want to create
output data sets with the OUT= or OUTEST= option in the
RESPONSE statement. A NOPRINT
option is also available in the PROC CATMOD statement.
Note that this option
temporarily disables the Output Delivery
System (ODS); see Chapter 15, "Using the Output Delivery System," for
more information.
**NOPROFILE**-
suppresses the display of the population
profiles and the response profiles.
**NORESPONSE**-
suppresses the display of the _RESPONSE_ matrix for
log-linear models. For further information, see
the "Log-Linear Model Design Matrices" section.
**ONEWAY**-
produces a one-way table of frequencies for each variable
used in the analysis. This table is useful in determining
the order of the observed levels for each variable.
**PREDICT**
**PRED=FREQ | PROB**-
displays the observed and predicted values of the response functions for each population, together with their standard errors and the residuals (observed - predicted). In addition, if the response functions are the standard ones (generalized logits), then the PRED=FREQ option specifies the computation and display of predicted cell frequencies, while PRED=PROB (or just PREDICT) specifies the computation and display of predicted cell probabilities.

The OUT= data set always contains the predicted probabilities. If the response functions are the generalized logits, the predicted cell probabilities are output unless the option PRED=FREQ is specified, in which case the predicted cell frequencies are output.
**PROB**-
produces the two-way table of probability estimates for
the cross-classification of populations by responses.
These estimates sum to one across the
response categories for each population.
**TITLE=***'title'*-
displays the *title* at the top of certain pages of output that correspond to this MODEL statement.
**WLS**
**GLS**-
computes weighted least-squares estimates. This type of
estimation is also called generalized-least-squares
estimation. For response functions other than the default
(of generalized logits), WLS is the default estimation
method.
**XPX**-
displays
**X**'**S**^{-1}**X**, the crossproducts matrix for the normal equations.
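The following sketch shows how several of these options might be combined after the slash in a MODEL statement. The data set `trial`, its variables, and its counts are hypothetical, and the particular option values (for example, ADDCELL=0.5 and MAXITER=50) are arbitrary choices for illustration, not recommendations.

```sas
/* Hypothetical counts, made up for illustration only */
data trial;
   input a $ b $ r $ count @@;
   datalines;
x p yes 12  x p no 18
x q yes 20  x q no 10
y p yes  9  y p no 21
y q yes 15  y q no 15
;
run;

/* Weighted least squares: ADDCELL= affects only WLS analyses */
proc catmod data=trial;
   weight count;
   model r=a b / wls addcell=0.5 freq prob covb;
run;
quit;

/* Maximum likelihood with a tighter criterion and more iterations */
proc catmod data=trial;
   weight count;
   model r=a b / ml epsilon=1e-10 maxiter=50 noiter noprofile pred=prob;
run;
quit;
```

The two steps are kept separate so that each estimation method and its related options can be read on their own.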

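The design matrix can be specified directly in the MODEL statement, enclosed in parentheses, with adjacent rows separated by commas. For example,

    proc catmod;
       model R=(1 0,
                1 1,
                1 2,
                1 3);
    run;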

These statements are appropriate for the case of one population and for R with five levels (generating four response functions), so that the design matrix needs 4 × 1 = 4 rows. They are also appropriate for a situation with two populations and two response functions per population, giving 2 × 2 = 4 rows of the design matrix. (To induce more than one population, the POPULATION statement is needed.)
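The two-population case might be sketched as follows. The data set `scores`, the population variable `grp`, and the counts are hypothetical. R has three levels here, so the default generalized logits give two response functions per population, and the two levels of `grp` give two populations. The rows of the matrix are taken population by population, so the first two rows apply to the first population and the last two to the second.

```sas
/* Hypothetical counts: 2 groups by 3 response levels, illustration only */
data scores;
   input grp $ R $ count @@;
   datalines;
g1 high  5  g1 low 10  g1 mid 15
g2 high 10  g2 low  8  g2 mid 12
;
run;

proc catmod data=scores;
   weight count;
   population grp;       /* two populations                         */
   model R=(1 0,         /* rows 1-2: first population              */
            1 1,
            1 2,         /* rows 3-4: second population             */
            1 3);        /* 2 response functions x 2 populations    */
run;
quit;
```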

When you input the design matrix directly, you also have the option of specifying that any subsets of the parameters be tested for equality to zero. Indicate each subset by specifying the appropriate column numbers of the design matrix, followed by an equal sign and a label (24 characters or fewer, in quotes) that describes the subset. Adjacent subsets are separated by a comma, and the entire specification is enclosed in parentheses and placed after the design matrix. For example,

    proc catmod;
       population Group Time;
       model R=(1  1  0  0,
                1  1  0  1,
                1  1  0  2,
                1  0  1  0,
                1  0  1  1,
                1  0  1  2,
                1 -1 -1  0,
                1 -1 -1  1,
                1 -1 -1  2)
               (1  ='Intercept',
                2 3='Group main effect',
                4  ='Linear effect of Time');
    run;

The preceding statements are appropriate when Group and Time each have three levels, and R is dichotomous. The POPULATION statement induces nine populations (*s* = 9), and R, being dichotomous, yields one response function per population (*q* = 1), so the design matrix has *q* × *s* = 1 × 9 = 9 rows.

If you input the design matrix directly but do not specify any subsets of the parameters to be tested, then PROC CATMOD tests the effect of MODEL | MEAN, which represents the significance of the model beyond what is explained by an overall mean. For the previous example, the MODEL | MEAN effect is the same as that obtained by specifying

(2 3 4='model|mean');

at the end of the MODEL statement.
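Written out in full, the equivalent MODEL statement for the previous example would look like the following sketch. Like the original example, it assumes that the most recently created data set contains the variables Group, Time, and R; no data set is defined here.

```sas
proc catmod;
   population Group Time;
   model R=(1  1  0  0,
            1  1  0  1,
            1  1  0  2,
            1  0  1  0,
            1  0  1  1,
            1  0  1  2,
            1 -1 -1  0,
            1 -1 -1  1,
            1 -1 -1  2)
           (2 3 4='model|mean');   /* columns 2-4: everything but the intercept */
run;
```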


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.