
The MODEL Procedure

Variable names are alphanumeric but must start with a letter. The length of a variable name is limited to thirty-two characters for non-SAS data set variables.

PROC MODEL uses several classes of variables, and different
variable classes are treated differently. Variable class is controlled
by *declaration statements*.
These are the VAR, ENDOGENOUS, and EXOGENOUS statements for model variables,
the PARAMETERS statement for parameters, and the CONTROL statement for
control class variables.
These declaration statements have several valid abbreviations.
Various *internal variables* are also made
available to the model program to allow
communication between the model program and the procedure.
RANGE, ID, and BY variables are also available to the model program.
Those variables not declared as any of the preceding classes are
*program variables*.
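
As a rough illustration of how the declaration statements assign variable classes, the following sketch declares one variable of each class. The data set name IN, the variable names, and the starting values are assumptions made for this example only.

```sas
proc model data=in;
   endogenous y;            /* model variable determined by the model   */
   exogenous x;             /* model variable taken as given            */
   parameters a 0.5 b 1;    /* parameters with assumed starting values  */
   control scale 1;         /* control variable, set to 1               */

   xs = scale * x;          /* xs is not declared: a program variable   */
   y = a + b * xs;
   fit y;
run;
```

Here XS falls into the program variable class only because no declaration statement names it.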

Some classes of variables can be lagged; that is, their value at each
observation is remembered, and previous values can be referred to by the
lagging functions. Other classes have only a single value and
are not affected by lagging functions.
For example, parameters have only one value and
are not affected by lagging functions;
therefore, if P is a parameter, DIF*n*(P) is always 0,
and LAG*n*(P) is always the same as P for all values of *n*.
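
The difference between lagged and single-valued classes can be sketched as follows. The data set IN, the variables Y and X, and the parameter P are hypothetical; the comments restate the behavior described above rather than showing output.

```sas
proc model data=in;
   parameters p 2;

   lx = lag1(x);     /* x comes from the data set: previous observation's value */
   dp = dif1(p);     /* p is a parameter: expected to be 0 at every observation */
   lp = lag3(p);     /* p is a parameter: expected to equal p for any lag length */

   y = p * lx;
   fit y;
run;
```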

The different variable classes and their roles in the model are described in the following.

PROC MODEL allows you to use expressions on the left-hand side of the equal sign to define model equations. For example, a log linear model for Y can now be written as

log( y ) = a + b * x;

Previously, only a variable name was allowed on the left-hand side of the equal sign.

The text on the left-hand side of the equation serves as the equation name used to identify the equation in printed output, in the OUT= data sets, and in FIT or SOLVE statements. To refer to equations specified using left-hand side expressions (in the FIT statement, for example), place the left-hand side expression in quotes. For example, the following statements fit a log linear model to the dependent variable Y:

proc model data=in;
   log( y ) = a + b * x;
   fit "log(y)";
run;

Estimation and simulation are performed by transforming the
models into general form equations. No actual or predicted value
is available for general form equations, so no *R*² or
adjusted *R*² is computed.

Equation variable names can appear on parts of the PROC MODEL printed output, and they can be used in the model program. For example, RESID-prefixed variables can be used in LAG functions to define equations with moving-average error terms. See the "Autoregressive Moving-Average Error Processes" section earlier in this chapter for details.
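
A minimal sketch of a first-order moving-average error term built from a RESID-prefixed variable is shown here. The data set IN and the names A, B, and MA1 are assumptions for illustration. The sketch uses ZLAG1, which behaves like LAG1 but returns 0 rather than a missing value when the lagged value is not yet available.

```sas
proc model data=in;
   parameters a b ma1;

   /* structural part plus a moving-average term on the lagged residual of y */
   y = a + b * x + ma1 * zlag1( resid.y );

   fit y;
run;
```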

The meaning of these prefixes is detailed in the "Equation Translations" section.

The PARAMETERS statement declares the parameters of the model. Parameters are not lagged, and they cannot be changed by the model program.

Control variables are not reinitialized before each pass through the data and can thus be used to retain values between passes. You can use control variables to vary the program logic. Control variables are not affected by lagging functions.

For example, if you have two versions of an equation for a variable Y, you could put both versions in the model and, using a CONTROL statement to select one of them, produce two different solutions to explore the effect the choice of equation has on the model:

select (case);
   when (1) y = ...first version of equation... ;
   when (2) y = ...second version of equation... ;
end;
control case 1;
solve / out=case1;
run;
control case 2;
solve / out=case2;
run;

Prior to version 6.11, the BY processing in the SOLVE statement was performed only for the DATA= data set. The last values in the ESTDATA= and SDATA= data sets were used regardless of the existence of BY variables in those two data sets. This constraint is now removed. If the BY variables are identical in the DATA= data set and the ESTDATA= data set, then the two data sets are synchronized and the simulations are performed using the data and parameters for each BY group. This holds for BY variables in the SDATA= data set as well. If, at some point, the BY variables do not match, BY processing is abandoned in either the ESTDATA= data set or the SDATA= data set, whichever has the missing BY value. If the DATA= data set does not contain BY variables and the ESTDATA= data set or the SDATA= data set does, then BY processing is performed for the ESTDATA= data set and the SDATA= data set by reusing the data in the DATA= data set for each BY group.
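
The synchronized case might look like the following sketch, which estimates the model by BY group and then simulates with the matching estimates. The data set IN (assumed to be sorted by REGION), the BY variable REGION, and the output data set names EST and SIM are all hypothetical.

```sas
/* Step 1: estimate by BY group and save the estimates. */
proc model data=in;
   by region;
   parameters a b;
   y = a + b * x;
   fit y / outest=est;
run;
quit;

/* Step 2: solve by BY group, reading parameters from ESTDATA=. */
proc model data=in;
   by region;
   parameters a b;
   y = a + b * x;
   solve y / estdata=est out=sim;
run;
quit;
```

Because REGION appears in both IN and EST, each BY group in IN is expected to be solved with the parameter estimates from the same BY group in EST.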

if _obs_ > 20 then if _iter_ > 10 then _list_ = 1;

Internal variables are not affected by lagging functions, and they cannot be changed by the model program except as noted. The following internal variables are available. The variables are all numeric except where noted.

- _ERRORS_
- a flag that is set to 0 at the start of program execution
and is set to a nonzero value whenever an error occurs.
The program can also set the _ERRORS_ variable.
- _ITER_
- the iteration number.
For FIT tasks, the value of _ITER_ is negative for
preliminary grid-search passes. The iterative phase of the estimation
starts with iteration 0. After the estimates have converged, a final
pass is made to collect statistics with _ITER_ set to a missing value. Note
that at least one pass, and perhaps several subiteration passes as
well, is made for each iteration. For SOLVE tasks,
_ITER_ counts the
iterations used to compute the simultaneous solution of the system.
- _LAG_
- the number of dynamic lags that contribute to the solution at the
current observation. _LAG_ is always 0 for FIT tasks and for STATIC
solutions. _LAG_ is set to a missing value during the lag
starting phase.
- _LIST_
- list flag that is set to 0 at the start of program execution.
The program can set _LIST_ to a nonzero value to request a listing
of the values of all the variables in the program after the program has
finished executing.
- _METHOD_
- the solution method in use for SOLVE tasks.
_METHOD_ is set to a blank value for FIT tasks. _METHOD_ is a
character-valued variable. Values are NEWTON, JACOBI, SIEDEL, or ONEPASS.
- _MODE_
- takes the value ESTIMATE for FIT tasks and the value
SIMULATE or FORECAST for SOLVE tasks. _MODE_ is a character-valued variable.
- _NMISS_
- the number of missing or otherwise unusable observations during the model
estimation.
For FIT tasks,
_NMISS_ is initially set to 0; at the start of each iteration,
_NMISS_ is set to the number of unusable observations for the
previous iteration. For SOLVE tasks,
_NMISS_ is set to a missing value.
- _NUSED_
- the number of nonmissing observations used in the estimation.
For FIT tasks, PROC MODEL initially
sets _NUSED_ to the number of parameters; at the start
of each iteration,
_NUSED_ is reset to the number of observations
used in the previous iteration. For SOLVE tasks,
_NUSED_ is set to a missing
value.
- _OBS_
- counts the observations being processed.
_OBS_ is negative or 0 for
observations in the lag starting phase.
- _REP_
- the replication number for Monte Carlo simulation when the
RANDOM= option is specified in the SOLVE statement.
_REP_ is 0
when the RANDOM= option is not used and for FIT tasks. When _REP_=0, the
random-number generator functions always return 0.
- _WEIGHT_
- the weight of the observation. For FIT tasks, _WEIGHT_ provides a weight for the observation in the estimation. _WEIGHT_ is initialized to 1.0 at the start of execution for FIT tasks. For SOLVE tasks, _WEIGHT_ is ignored.
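
As a hedged illustration of how a model program can read and set internal variables, the following sketch weights each observation in a FIT task and requests a variable listing for the first observation. The data set IN, the variables Y and X, and the weighting formula are assumptions for this example; the formula also assumes X is never 0.

```sas
proc model data=in;
   parameters a b;

   y = a + b * x;

   _weight_ = 1 / x**2;               /* observation weight used by the FIT task */
   if _obs_ = 1 then _list_ = 1;      /* list all program values for observation 1 */

   fit y;
run;
```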

- character variables in a DATA= SAS data set
- program variables assigned a character value
- variables declared to be character by a LENGTH or ATTRIB statement.
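
A short sketch of a character variable declared with a LENGTH statement follows. The data set IN, the variable NOTE, and the threshold of 100 are hypothetical and serve only to show the declaration and a character assignment.

```sas
proc model data=in;
   length note $ 8;                   /* NOTE is declared character by LENGTH */
   parameters a b;

   if x > 100 then note = 'large';    /* character values assigned to NOTE */
   else note = 'small';

   y = a + b * x;
   fit y;
run;
```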


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.