The CATMOD Procedure

MODEL Statement

MODEL response-effect=<design-effects> </ options> ;
PROC CATMOD requires a MODEL statement. You can specify the following in a MODEL statement:
response-effect
can be either a single variable, a crossed effect with two or more variables joined by asterisks, or _F_. The _F_ specification indicates that the response functions and their estimated covariance matrix are to be read directly into the procedure. The response-effect indicates the dependent variables that determine the response categories (the columns of the underlying contingency table).

design-effects
specify potential sources of variation (such as main effects and interactions) in the model. Thus, these effects determine the number of model parameters, as well as the interpretation of such parameters. In addition, if there is no POPULATION statement, PROC CATMOD uses these variables to determine the populations (the rows of the underlying contingency table). When fitting the model, PROC CATMOD adjusts the independent effects in the model for all other independent effects in the model.

Design-effects can be any of those described in the section "Specification of Effects", or they can be defined by specifying the actual design matrix, enclosed in parentheses (see the "Specifying the Design Matrix Directly" section). In addition, you can use the keyword _RESPONSE_ alone or as part of an effect. Effects cannot be nested within _RESPONSE_, so effects of the form A(_RESPONSE_) are invalid.

For more information, see the "Log-Linear Model Analysis" section and the "Repeated Measures Analysis" section.

Some examples of MODEL statements are
   model r=a b;                    main effects only
   model r=a b a*b;                main effects with interaction
   model r=a b(a);                 nested effect
   model r=a|b;                    complete factorial
   model r=a b(a=1) b(a=2);        nested-by-value effects
   model r*s=_response_;           log-linear model
   model r*s=a _response_(a);      nested repeated measurement factor
   model _f_=_response_;           direct input of the response functions

The relationship between these specifications and the structure of the design matrix X is described in the "Generation of the Design Matrix" section.
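To show where a MODEL statement sits in a complete step, the following is a minimal sketch. The data set name work.trial, the variable names, and the cell counts are all made up for illustration; the WEIGHT statement is assumed to be the way the counts are supplied.

```sas
/* Hypothetical data: one row per cell of a 2 x 2 design */
/* with a two-level response r; counts are invented.     */
data work.trial;
   input a $ b $ r $ count @@;
   datalines;
a1 b1 yes 12  a1 b1 no  8
a1 b2 yes 15  a1 b2 no  5
a2 b1 yes  7  a2 b1 no 13
a2 b2 yes  9  a2 b2 no 11
;
run;

proc catmod data=work.trial;
   weight count;          /* each row is a cell count, not one subject */
   model r=a b a*b;       /* main effects with interaction             */
run;
quit;
```

Because there is no POPULATION statement here, the populations would be determined by the variables in the design-effects (a and b), as described above.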

The following table summarizes the options available in the MODEL statement.

Task                                                         Options

Specify details of computation
   Generates maximum likelihood estimates                    ML
   Generates weighted least-squares estimates                GLS
   Omits intercept term from the model                       NOINT
   Adds a number to each cell frequency                      ADDCELL=
   Averages main effects across response functions           AVERAGED
   Specifies the convergence criterion for maximum likelihood   EPSILON=
   Specifies the number of iterations for maximum likelihood    MAXITER=

Request additional computation and tables
   Estimated correlation matrix of estimates                 CORRB
   Covariance matrix of response functions                   COV
   Estimated covariance matrix of estimates                  COVB
   Two-way frequency tables                                  FREQ
   One-way frequency tables                                  ONEWAY
   Predicted values                                          PRED=
   Probability estimates                                     PROB
   Crossproducts matrix                                      XPX

Suppress output
   Design matrix                                             NODESIGN
   Iterations for maximum likelihood                         NOITER
   Parameter estimates                                       NOPARM
   Population and response profiles                          NOPROFILE

The following list describes these options in alphabetical order.

ADDCELL=number
adds number to the frequency count in each cell, where number is any positive number. This option has no effect on maximum likelihood analysis; it is used only for weighted least-squares analysis.

AVERAGED
specifies that dependent variable effects can be modeled and that independent variable main effects are averaged across the response functions in a population. For further information on the effect of using (or not using) the AVERAGED option, see the "Generation of the Design Matrix" section. Direct input of the design matrix or specification of the _RESPONSE_ keyword in the MODEL statement automatically induces an AVERAGED model type.

CORRB
displays the estimated correlation matrix of the parameter estimates.

COV
displays S_i, the covariance matrix of the response functions for each population.

COVB
displays the estimated covariance matrix of the parameter estimates.

EPSILON=number
specifies the convergence criterion for the maximum likelihood estimation of the parameters. The iterative estimation process stops when the proportional change in the log likelihood is less than number, or after the number of iterations specified by the MAXITER= option, whichever comes first. By default, EPSILON=1E-8.

FREQ
produces the two-way frequency table for the cross-classification of populations by responses.

MAXITER=number
specifies the maximum number of iterations used for the maximum likelihood estimation of the parameters. By default, MAXITER=20.

ML
computes maximum likelihood estimates. This option is available when generalized logits are used, or for the special case of a single two-level dependent variable where cumulative logits or adjacent category logits are used. For generalized logits (the default response functions), ML is the default estimation method.

NODESIGN
suppresses the display of the design matrix X.

NOINT
suppresses the intercept term in the model.

NOITER
suppresses the display of parameter estimates and other information at each iteration of a maximum likelihood analysis.

NOPARM
suppresses the display of the estimated parameters and the statistics for testing that each parameter is zero.

NOPREDVAR
suppresses the display of the variable levels in tables requested with the PRED= option.

NOPRINT
suppresses the normal display of results. The NOPRINT option is useful when you only want to create output data sets with the OUT= or OUTEST= option in the RESPONSE statement. A NOPRINT option is also available in the PROC CATMOD statement. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 15, "Using the Output Delivery System," for more information.

NOPROFILE
suppresses the display of the population profiles and the response profiles.

NORESPONSE
suppresses the display of the _RESPONSE_ matrix for log-linear models. For further information, see the "Log-Linear Model Design Matrices" section.

ONEWAY
produces a one-way table of frequencies for each variable used in the analysis. This table is useful in determining the order of the observed levels for each variable.

PREDICT | PRED=FREQ | PRED=PROB
displays the observed and predicted values of the response functions for each population, together with their standard errors and the residuals (observed - predicted). In addition, if the response functions are the standard ones (generalized logits), then the PRED=FREQ option specifies the computation and display of predicted cell frequencies, while PRED=PROB (or just PREDICT) specifies the computation and display of predicted cell probabilities.

The OUT= data set always contains the predicted probabilities. If the response functions are the generalized logits, the predicted cell probabilities are output unless the option PRED=FREQ is specified, in which case the predicted cell frequencies are output.

PROB
produces the two-way table of probability estimates for the cross-classification of populations by responses. These estimates sum to one across the response categories for each population.

TITLE='title'
displays the title at the top of certain pages of output that correspond to this MODEL statement.

WLS | GLS
computes weighted least-squares estimates. This type of estimation is also called generalized least-squares estimation. For response functions other than the default (of generalized logits), WLS is the default estimation method.

XPX
displays X'S^{-1}X, the crossproducts matrix for the normal equations.
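As an illustration of how several of these options might be combined after the slash in one MODEL statement, consider the following sketch. The data set work.trial, its variables, and its counts are hypothetical, and the particular EPSILON= and MAXITER= values are arbitrary choices, not recommendations.

```sas
/* Hypothetical data: invented cell counts for a 2 x 2 design */
data work.trial;
   input a $ b $ r $ count @@;
   datalines;
a1 b1 yes 12  a1 b1 no  8
a1 b2 yes 15  a1 b2 no  5
a2 b1 yes  7  a2 b1 no 13
a2 b2 yes  9  a2 b2 no 11
;
run;

proc catmod data=work.trial;
   weight count;
   model r=a b / ml                 /* maximum likelihood estimation   */
                 epsilon=1e-6       /* looser convergence criterion    */
                 maxiter=50         /* allow more than 20 iterations   */
                 covb corrb         /* covariance and correlation      */
                 pred=prob          /* predicted cell probabilities    */
                 noiter;            /* hide the iteration history      */
run;
quit;
```

Because r has two levels, the default response function is a generalized logit, so ML and PRED=PROB both apply here.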

Specifying the Design Matrix Directly

If you specify the design matrix directly, adjacent rows of the matrix must be separated by a comma, and the matrix must have q × s rows, where s is the number of populations and q is the number of response functions per population. The first q rows correspond to the response functions for the first population, the second set of q rows corresponds to the functions for the second population, and so forth. The following is an example using direct specification of the design matrix.

   proc catmod;
      model R=(1 0,
               1 1,
               1 2,
               1 3);

These statements are appropriate for the case of one population and for R with five levels (generating four response functions), so that 4 × 1 = 4. These statements are also appropriate for a situation with two populations and two response functions per population, giving 2 × 2 = 4 rows of the design matrix. (To induce more than one population, the POPULATION statement is needed.)
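The two-population case might look like the following sketch. The data set work.survey, the variable group, and the counts are hypothetical; R is assumed to have three levels, so the default generalized logits give q=2 response functions per population, and group is assumed to have two levels, so s=2.

```sas
/* Hypothetical data: 2 groups x 3 response levels, invented counts */
data work.survey;
   do group = 1 to 2;
      do R = 1 to 3;
         input count @@;
         output;
      end;
   end;
   datalines;
20 15 10 12 18 25
;
run;

proc catmod data=work.survey;
   weight count;
   population group;      /* induces two populations (s=2)          */
   model R=(1 0,          /* population 1, response function 1      */
            1 1,          /* population 1, response function 2      */
            1 2,          /* population 2, response function 1      */
            1 3);         /* population 2, response function 2      */
run;
quit;
```

The row comments follow the ordering rule stated earlier: the first q rows belong to the first population, the next q rows to the second.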

When you input the design matrix directly, you also have the option of specifying that any subsets of the parameters be tested for equality to zero. Indicate each subset by specifying the appropriate column numbers of the design matrix, followed by an equal sign and a label (24 characters or less, in quotes) that describes the subset. Adjacent subsets are separated by a comma, and the entire specification is enclosed in parentheses and placed after the design matrix. For example,

   proc catmod;
      population Group Time;
      model R=(1  1  0  0,
               1  1  0  1,
               1  1  0  2,
               1  0  1  0,
               1  0  1  1,
               1  0  1  2,
               1 -1 -1  0,
               1 -1 -1  1,
               1 -1 -1  2) (1  ='Intercept',
                            2 3='Group main effect',
                            4  ='Linear effect of Time');

The preceding statements are appropriate when Group and Time each have three levels, and R is dichotomous. The POPULATION statement induces nine populations, and q=1 (since R is dichotomous), so q × s = 1 × 9 = 9.

If you input the design matrix directly but do not specify any subsets of the parameters to be tested, then PROC CATMOD tests the effect of MODEL | MEAN, which represents the significance of the model beyond what is explained by an overall mean. For the previous example, the MODEL | MEAN effect is the same as that obtained by specifying

   (2 3 4='model|mean');

at the end of the MODEL statement.
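Written out in full, that explicit specification might look like the following sketch. The data set work.study and its counts are hypothetical; Group and Time are assumed to be coded 1 to 3 and R coded 1 to 2, so that the sorted populations line up with the nine rows of the design matrix.

```sas
/* Hypothetical data: 3 groups x 3 times x 2 response levels, */
/* read in population order; all counts are invented.         */
data work.study;
   do Group = 1 to 3;
      do Time = 1 to 3;
         do R = 1 to 2;
            input count @@;
            output;
         end;
      end;
   end;
   datalines;
14 6  12 8  10 10
 9 11 11 9  13 7
 8 12  7 13  6 14
;
run;

proc catmod data=work.study;
   weight count;
   population Group Time;
   model R=(1  1  0  0,
            1  1  0  1,
            1  1  0  2,
            1  0  1  0,
            1  0  1  1,
            1  0  1  2,
            1 -1 -1  0,
            1 -1 -1  1,
            1 -1 -1  2) (2 3 4='model|mean');   /* columns 2-4 tested jointly */
run;
quit;
```

Columns 2 through 4 are every column except the intercept, which is why this single subset reproduces the MODEL | MEAN test.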

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.