The TYPES statement controls which of the available class variables PROC MEANS
uses to subgroup the data. The unique combinations of these active class variable
values that occur together in any single observation of the input data set
determine the data subgroups. Each subgroup that PROC MEANS generates for
a given type is called a level of that type. Note that, for all types,
the inactive class variables can still affect the total observation count
through the rejection of observations that have missing class variable values.
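As a minimal sketch of how TYPES restricts the subgroups, the following step assumes that the SASHELP.CLASS sample data set is available (it is not part of the original examples) and uses its variables Sex, Age, and Height. It requests only the one-way type for Sex and the two-way type Sex*Age, so Age is an inactive class variable in the first type.

```sas
/* Assumption: the SASHELP.CLASS sample data set is installed. */
proc means data=sashelp.class n mean;
   class sex age;
   types sex sex*age;   /* request only these two types */
   var height;
run;
```

Each row of the displayed output for a requested type corresponds to one level of that type.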
When you use a WAYS statement, PROC MEANS generates types that correspond
to every possible unique combination of n class variables chosen from the
complete set of class variables, for each value of n that you list. For example,
proc means;
class a b c d e;
ways 2 3;
run;
is equivalent to
proc means;
class a b c d e;
types a*b a*c a*d a*e b*c b*d b*e c*d c*e d*e
a*b*c a*b*d a*b*e a*c*d a*c*e a*d*e
b*c*d b*c*e b*d*e c*d*e;
run;
If you omit the TYPES statement and the WAYS statement, PROC MEANS
uses all class variables to subgroup the data (the NWAY type) for displayed
output and computes all types ($2^k$, where $k$ is the number of class
variables) for the output data set.
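The difference between the displayed output and the output data set can be sketched as follows. The step assumes the SASHELP.CLASS sample data set, and the output data set names ALLTYPES and NWAYONLY are hypothetical. With two class variables, the first output data set should contain every type (identified by the automatic variable _TYPE_), while the NWAY option limits the second to the type that uses all class variables.

```sas
/* Assumption: the SASHELP.CLASS sample data set is installed. */
/* ALLTYPES and NWAYONLY are illustrative data set names.      */
proc means data=sashelp.class noprint;
   class sex age;
   var height;
   output out=alltypes mean=avg_height;   /* _TYPE_ = 0, 1, 2, 3 */
run;

proc means data=sashelp.class noprint nway;
   class sex age;
   var height;
   output out=nwayonly mean=avg_height;   /* _TYPE_ = 3 only */
run;
```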
PROC MEANS determines the order of each class variable in any
type by examining the order of that class variable in the corresponding one-way
type. You see the effect of this behavior when you specify ORDER=DATA or ORDER=FREQ.
When PROC MEANS subdivides the input data set into subsets, the classification
process does not apply ORDER=DATA or ORDER=FREQ independently
for each subgroup. Instead, one frequency and data order is established for
all output based on a nonsubdivided view of the entire data set. For example,
consider the following statements:
data pets;
input Pet $ Gender $;
datalines;
dog m
dog f
dog f
dog f
cat m
cat m
cat f
;
proc means data=pets order=freq;
class pet gender;
run;
In the output that these statements produce, PROC MEANS does not list male
cats before female cats, even though the data contain two male cats and only
one female cat. Instead, it determines the order of gender for all types over
the entire data set, where PROC MEANS found more observations for female pets
(f=4, m=3).
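One way to inspect the one-way counts that drive this ordering is to tabulate each class variable over the whole data set. This is a sketch that uses PROC FREQ on the PETS data set created above; it is offered as a way to view the counts, not as a description of how PROC MEANS computes them internally.

```sas
/* One-way frequency counts over the entire PETS data set. */
proc freq data=pets order=freq;
   tables pet gender / nocum nopercent;
run;
```

For Gender, the table lists f (4 observations) ahead of m (3 observations), which matches the order that PROC MEANS applies within every value of Pet.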
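PROC MEANS employs the same memory allocation scheme across all
host environments. When class variables are involved, PROC MEANS must keep
a copy of each unique value of each class variable in memory. You can estimate
the memory requirements to group the class variables by calculating
$$N_{c_1}(L_{c_1}+K) + N_{c_2}(L_{c_2}+K) + \cdots + N_{c_n}(L_{c_n}+K)$$
where
$N_{c_i}$ is the number of unique values for the class variable $c_i$
$L_{c_i}$ is the combined unformatted and formatted length of $c_i$
$K$ is a constant on the order of 32 bytes (64 for 64-bit architectures).
When you use the GROUPINTERNAL option in the CLASS statement, $L_{c_i}$
is simply the unformatted length of $c_i$.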
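Each unique combination of class variable values, $c_{1_i}\,c_{2_j}\ldots$,
for a given type forms a level in that type (see TYPES Statement).
You can estimate the maximum potential space requirements for all levels of
a given type, when all combinations actually exist in the data (a complete
type), by calculating
$$W \times N_{c_1} \times N_{c_2} \times \cdots \times N_{c_n}$$
where
$W$ is a constant based on the number of variables analyzed and the number of statistics calculated (unless you request QMETHOD=OS to compute the quantiles)
$N_{c_1}, \ldots, N_{c_n}$ are the numbers of unique levels for the active class variables of the given type.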
Clearly, the memory requirements of the levels overwhelm those
of the class variables. For this reason, PROC MEANS may open one or more
utility files and write the levels of one or more types to disk. These types
are either the primary types that PROC MEANS built during the input data scan
or the derived types.
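Because a level is one combination of class variable values, the number of levels in a complete type cannot exceed the product of the numbers of unique values of its active class variables. The following sketch computes that upper bound for the PETS data set from the earlier example; the column names n_pet, n_gender, and max_levels are illustrative, and the query counts levels only, not bytes.

```sas
/* Upper bound on the number of levels in the Pet*Gender type. */
proc sql;
   select count(distinct pet)    as n_pet,
          count(distinct gender) as n_gender,
          calculated n_pet * calculated n_gender as max_levels
      from pets;
quit;
```

For PETS, both class variables have two unique values, so the bound is four levels. For class variables with many unique values, the same product grows quickly, which is why the levels tend to dominate memory use.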
If PROC MEANS must write partially complete primary types to disk while
it processes input data, then one or more merge passes may be required to
combine type levels in memory with those on disk. In addition, if you use
an order other than DATA for any class variable, PROC MEANS groups the completed
type on disk. For this reason, the peak disk space requirements can be more
than twice the memory requirements for a given type.
When PROC MEANS uses a temporary work file, you will receive the following
note in the SAS log:
Processing on disk occurred during summarization.
Peak disk usage was approximately nnn Mbytes.
Adjusting SUMSIZE may improve performance.
In most cases, processing ends normally.
When you specify class variables in a CLASS statement, the amount of
data-dependent memory that PROC MEANS uses before it writes to a utility file
is controlled by the SAS system option and PROC option SUMSIZE=. Like the
system option SORTSIZE=, SUMSIZE= sets the memory threshold where disk-based
operations begin. For best results, set SUMSIZE= to less than the amount of
real memory that is likely to be available for the task. For efficiency reasons,
PROC MEANS may internally round up the value of SUMSIZE=. SUMSIZE= has no
effect unless you specify class variables.
If PROC MEANS reports that there is insufficient memory, increase SUMSIZE=.
A SUMSIZE= value greater than MEMSIZE= will have no effect. Therefore, you
may also need to increase MEMSIZE=. If PROC MEANS reports insufficient disk
space, increase the WORK space allocation. See the SAS documentation for your
operating environment for more information on how to adjust your computation
resource parameters.
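As a sketch of the two places where SUMSIZE= can be set, the following statements use the PETS data set from the earlier example. The memory values are illustrative assumptions only; suitable values depend on the real memory available in your operating environment.

```sas
/* Illustrative values only; choose sizes that fit your environment. */
options sumsize=128M;               /* system option: applies to the session */

proc means data=pets sumsize=256M;  /* PROC option: applies to this step */
   class pet gender;
run;
```

MEMSIZE= does not appear in the sketch because it is normally set when SAS is invoked rather than in an OPTIONS statement.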
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.