For large problems, most of the memory resources
are required for holding the X'X
matrix of the sums and cross products.
The section "Parameterization of PROC GLM Models" describes how columns of the
X matrix are allocated for various types of effects.
For each level that occurs in the data for a
combination of class variables in a given effect,
a row and a column of X'X are needed.
The following example illustrates the calculation.
Suppose A has 20 levels, B has 4 levels, and C has 3 levels.
Then consider the model:
   class A B C;
   model Y1 Y2 Y3=A B A*B C A*C B*C A*B*C X1 X2;
The X'X matrix (bordered by X'Y
and Y'Y) can have as many as 425 rows and columns:
- 1 for the intercept term
- 20 for A
- 4 for B
- 80 for A*B
- 3 for C
- 60 for A*C
- 12 for B*C
- 240 for A*B*C
- 2 for X1 and X2 (continuous variables)
- 3 for Y1, Y2, and Y3 (dependent variables)
The matrix has 425 rows and columns only if all combinations
of levels occur for each effect in the model.
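To see how many levels of an effect actually occur, you can count the distinct combinations in the data. The following is a minimal sketch; the data set work.cells and its values are hypothetical, constructed so that one cell of A*B is empty.

```sas
/* Hypothetical data: A and B each have 2 levels,
   but the (A=2, B=2) combination never occurs. */
data work.cells;
   input A B Y;
   datalines;
1 1 10
1 2 12
2 1 11
1 1 9
;
run;

/* Count the levels of A*B that occur in the data.
   Only these levels need a row and column in X'X. */
proc sql;
   select count(*) as n_AB_levels
      from (select distinct A, B from work.cells);
quit;
```

Here the query counts three occurring combinations rather than the 2 x 2 = 4 possible ones, so A*B would contribute three columns instead of four.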
For m rows and columns, $8m^2$
bytes are needed for cross products.
In this case, $8 \cdot 425^2$ = 1,445,000 bytes, or about
1,445,000 / 1024 = 1411K.
The required memory grows as the square
of the number of columns of X; most of the memory is for the A*B*C interaction.
Without A*B*C, you have 185 columns
and need 268K for X'X.
Without either A*B*C or A*B, you need 86K.
If A is recoded to have ten levels, then the full
model has only 225 rows and columns (1 + 10 + 4 + 40 + 3 + 30 + 12 + 120 + 2 + 3)
and requires about 396K.
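The arithmetic for this example can be reproduced in a short DATA step. This is only a sketch that assumes every combination of levels occurs; the variable names (nA, m_full, and so on) are illustrative and are not part of PROC GLM.

```sas
/* Sketch: size of the bordered X'X matrix for the example model,
   assuming all combinations of levels occur in the data. */
data _null_;
   nA = 20; nB = 4; nC = 3;            /* levels of A, B, and C */
   nX = 2;  nY = 3;                    /* covariates and dependent variables */

   main     = nA + nB + nC;            /* A, B, C */
   twoway   = nA*nB + nA*nC + nB*nC;   /* A*B, A*C, B*C */
   threeway = nA*nB*nC;                /* A*B*C */

   m_full  = 1 + main + twoway + threeway + nX + nY;  /* full model */
   m_noABC = m_full - threeway;                       /* drop A*B*C */
   m_noAB  = m_noABC - nA*nB;                         /* also drop A*B */

   array m{3} m_full m_noABC m_noAB;
   do i = 1 to 3;
      rows   = m{i};
      bytes  = 8 * rows**2;            /* 8 bytes per cross product */
      kbytes = bytes / 1024;
      put rows= bytes= comma12. kbytes= 8.1;
   end;
run;
```

The log should show 425, 185, and 105 rows and columns for the three models. The K values agree with the figures quoted in the text up to rounding.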
The second time that a large amount of memory
is needed is when Type III, Type IV, or
contrast sums of squares are being calculated.
This memory requirement is a function of the number
of degrees of freedom of the model being analyzed and
the maximum degrees of freedom for any single source.
Let Rank equal the sum of the model degrees of freedom, MaxDF be
the maximum number of degrees of freedom for any single source,
and $N_y$ be the number of dependent variables in the model.
Then the memory requirement in bytes is
Unfortunately, these quantities are not available when the
X'X matrix is being constructed, so PROC GLM may
occasionally request additional memory even after you have
increased the memory allocation available to the program.
If you have a large model that exceeds the memory
capacity of your computer, these are your options:
- Eliminate terms, especially high-level interactions.
- Reduce the number of levels for
variables with many levels.
- Use the ABSORB statement for parts
of the model that are large.
- Use the REPEATED statement for repeated measures variables.
- Use PROC ANOVA or PROC REG rather
than PROC GLM, if your design allows.
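As an illustration of the ABSORB option, the following sketch absorbs the 20-level factor A instead of listing it in the CLASS and MODEL statements. It assumes that A is a nuisance factor whose interactions you are willing to drop, so it fits a reduced model, not the full example model. The data set work.demo and its simulated values are hypothetical.

```sas
/* Hypothetical simulated data: 20 x 4 x 3 design, 2 replicates per cell */
data work.demo;
   do A = 1 to 20;
      do B = 1 to 4;
         do C = 1 to 3;
            do rep = 1 to 2;
               X1 = rannor(1);
               X2 = rannor(1);
               Y1 = A + B + X1 + rannor(1);
               Y2 = B + C + X2 + rannor(1);
               Y3 = A - C + rannor(1);
               output;
            end;
         end;
      end;
   end;
run;

/* The data must be sorted by the ABSORB variables */
proc sort data=work.demo;
   by A;
run;

/* A is absorbed, so it does not appear in CLASS or MODEL,
   and no columns of X'X are allocated for it. */
proc glm data=work.demo;
   absorb A;
   class B C;
   model Y1 Y2 Y3 = B C B*C X1 X2;
run;
quit;
```

Most of the savings in this sketch come from omitting the interactions that involve A; absorption itself removes only the columns for the main effect of A.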
For large problems, two operations consume a lot of
CPU time: the collection of sums and cross products
and the solution of the normal equations.
The time required for collecting sums and
cross products is difficult to calculate because
it is a complicated function of the model.
For a model with m columns and n rows (observations) in
X, the worst case occurs if all columns are continuous
variables, involving $nm^2/2$ multiplications and additions.
If the columns are levels of a classification,
then only m sums may be needed, but a significant
amount of time may be spent in look-up operations.
Solving the normal equations requires time for
approximately $m^3/2$ multiplications and additions.
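These two operation counts can be compared with a rough calculation. In the sketch below, the number of observations n is a hypothetical value, and m = 422 is the number of columns of X in the earlier memory example (425 less the three dependent variables). The first figure is the worst-case bound, which applies only if all columns are continuous.

```sas
/* Sketch: rough operation counts for the two CPU-intensive steps */
data _null_;
   n = 10000;                       /* hypothetical number of observations */
   m = 422;                         /* columns of X in the example model */

   crossprod_ops = n * m**2 / 2;    /* worst case for sums and cross products */
   solve_ops     = m**3 / 2;        /* solving the normal equations */
   ratio         = crossprod_ops / solve_ops;   /* simplifies to n/m */

   put crossprod_ops= comma20. solve_ops= comma20. ratio= 8.1;
run;
```

Because the ratio of the two counts is n/m, the collection of cross products dominates in the worst case whenever there are many more observations than columns.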
Suppose you know that Type IV sums of squares are
appropriate for the model you are analyzing
(for example, if your design has no missing cells).
You can specify the SS4 option in your MODEL
statement, which saves CPU time by requesting
the Type IV sums of squares instead of the more
computationally burdensome Type III sums of squares.
This proves especially useful if you have a factor in your model
that has many levels and is involved in several interactions.
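The SS4 option goes after the slash in the MODEL statement. The following sketch applies it to the example model; the data set work.demo and its simulated values are hypothetical, and the design is generated with no missing cells so that Type IV sums of squares are appropriate.

```sas
/* Hypothetical simulated data: complete 20 x 4 x 3 design, 2 replicates */
data work.demo;
   do A = 1 to 20;
      do B = 1 to 4;
         do C = 1 to 3;
            do rep = 1 to 2;
               X1 = rannor(1);
               X2 = rannor(1);
               Y1 = A + B + X1 + rannor(1);
               Y2 = B + C + X2 + rannor(1);
               Y3 = A - C + rannor(1);
               output;
            end;
         end;
      end;
   end;
run;

/* Request Type IV sums of squares only */
proc glm data=work.demo;
   class A B C;
   model Y1 Y2 Y3 = A B A*B C A*C B*C A*B*C X1 X2 / ss4;
run;
quit;
```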
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.