## Multinomial Models

This type of model applies to cases where an observation can fall into
one of *k* categories. Binary data occurs in the special case where *k*=2.
If there are *m*_{i} observations in a subpopulation *i*, then the
probability distribution of the
counts falling into the *k* categories,
**y**_{i} = (*y*_{i1}, *y*_{i2}, ..., *y*_{ik}), can be modeled by the
multinomial distribution, defined in the "Response Probability Distributions" section, with
∑_{j} *y*_{ij} = *m*_{i}. The multinomial model is an *ordinal*
model if the categories have a natural order.
The GENMOD procedure orders the response categories for ordinal
multinomial models from lowest to highest by default. This is different
from the binomial distribution, where the response probability for
the higher of the two categories is modeled.
You can change the way GENMOD orders the response levels with the
RORDER= option in the PROC GENMOD statement. The order that GENMOD
uses is shown in the "Response Profiles" output table described in
the section "Response Profile".

The GENMOD procedure supports only the ordinal multinomial
model. If (*p*_{i1}, *p*_{i2}, ... *p*_{ik}) are the category probabilities,
the cumulative category probabilities are modeled with the same
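link functions used for binomial data. Let *P*_{ir} = ∑_{j=1}^{r} *p*_{ij}, *r* = 1, 2, ..., *k*-1, be the cumulative category probabilities
(note that *P*_{ik} = 1).
The ordinal model is

$$g(P_{ir}) = \mu_r + \mathbf{x}_i'\boldsymbol{\beta}, \qquad r = 1, 2, \ldots, k-1$$

where *μ*_{1}, *μ*_{2}, ..., *μ*_{k-1} are intercept terms
that depend only on the categories and **x**_{i} is a
vector of covariates that does not include an intercept term.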
The logit, probit, and complementary log-log link functions *g*
are available. These are obtained by specifying the MODEL
statement options DIST=MULTINOMIAL and LINK=CUMLOGIT (cumulative logit),
LINK=CUMPROBIT (cumulative probit), or LINK=CUMCLL (cumulative
complementary log-log).
Alternatively,

$$P_{ir} = F(\mu_r + \mathbf{x}_i'\boldsymbol{\beta}), \qquad r = 1, 2, \ldots, k-1$$

where *F* = *g*^{-1} is a cumulative distribution
function for the logistic, normal, or extreme value distribution.
PROC GENMOD estimates the intercept parameters
*μ*_{1}, *μ*_{2}, ..., *μ*_{k-1} and the regression parameters **β** by
maximum likelihood.
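
The mapping from parameters to category probabilities can be sketched in a few lines. The Python fragment below is only an illustration of the arithmetic, not GENMOD code or output. It assumes the cumulative model has the form *P*_{ir} = *F*(*μ*_{r} + **x**_{i}'**β**) and recovers the individual category probabilities as successive differences of the cumulative ones. The function names and the numeric values of the intercepts and slope are hypothetical.

```python
import math

# Inverse link functions F = g^{-1}, one for each cumulative link.
def inv_logit(t):
    """Logistic CDF (cumulative logit link)."""
    return 1.0 / (1.0 + math.exp(-t))

def inv_probit(t):
    """Standard normal CDF (cumulative probit link)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def inv_cloglog(t):
    """Extreme value CDF (cumulative complementary log-log link)."""
    return 1.0 - math.exp(-math.exp(t))

def category_probs(mu, beta, x, inv_link=inv_logit):
    """Return (p_1, ..., p_k) for one covariate vector x.

    mu   : k-1 increasing intercepts (hypothetical values)
    beta : regression coefficients, same length as x
    """
    eta = sum(b * v for b, v in zip(beta, x))
    # Cumulative probabilities P_1, ..., P_{k-1}, with P_k = 1 appended.
    cum = [inv_link(m + eta) for m in mu] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cum[0]] + [cum[r] - cum[r - 1] for r in range(1, len(cum))]

mu = [-1.0, 0.5]   # hypothetical intercepts for k = 3 categories
beta = [0.8]       # hypothetical slope for a single covariate
probs = category_probs(mu, beta, x=[1.0])
```

Because the intercepts are increasing and *F* is a distribution function, the resulting probabilities are positive and sum to 1. Passing `inv_probit` or `inv_cloglog` as `inv_link` switches the link without changing the rest of the calculation.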

The subpopulations *i* are defined by constant values of
the variable specified in the AGGREGATE= option of the MODEL statement.
The choice of subpopulations has no effect on the parameter
estimates, but it does affect the deviance and Pearson chi-square
statistics; it also affects the standard errors of the parameter estimates if you specify the
SCALE=DEVIANCE or SCALE=PEARSON option.
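
A small numeric sketch may help show why the grouping matters for the deviance but not for the estimates. The Python fragment below is an illustration only. It assumes the usual multinomial deviance, 2 ∑_{i} ∑_{j} *y*_{ij} log(*y*_{ij} / (*m*_{i} *p*_{ij})), which may differ in detail from what GENMOD reports. The probabilities and counts are hypothetical. The part of the log likelihood that depends on the parameters, ∑ *y*_{ij} log *p*_{ij}, is the same however the observations are grouped, whereas the deviance compares the fit to a saturated model whose form depends on the subpopulations.

```python
import math

def loglik_kernel(groups):
    """Sum of y_ij * log(p_ij) over subpopulations i and categories j."""
    return sum(y * math.log(p)
               for counts, probs in groups
               for y, p in zip(counts, probs) if y > 0)

def deviance(groups):
    """2 * sum of y_ij * log(y_ij / (m_i * p_ij)), taking 0 * log(0) as 0."""
    total = 0.0
    for counts, probs in groups:
        m = sum(counts)  # subpopulation size m_i
        total += sum(y * math.log(y / (m * p))
                     for y, p in zip(counts, probs) if y > 0)
    return 2.0 * total

# Hypothetical fitted probabilities shared by all four observations.
p = [0.2, 0.3, 0.5]

# Four observations with responses in categories 1, 2, 3, 3,
# first as four subpopulations of size 1, then as one of size 4.
individual = [([1, 0, 0], p), ([0, 1, 0], p), ([0, 0, 1], p), ([0, 0, 1], p)]
aggregated = [([1, 1, 2], p)]

same_kernel = math.isclose(loglik_kernel(individual), loglik_kernel(aggregated))
dev_individual = deviance(individual)
dev_aggregated = deviance(aggregated)
```

Here `same_kernel` is true, so maximizing the likelihood gives the same estimates under either grouping, while `dev_individual` and `dev_aggregated` differ.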

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.