The UNIVARIATE Procedure 
Rounding 
When ROUND=1 and the analysis variable values are between -2.5 and 2.5, the intervals are as follows:
i    Interval       Midpoint   Left endpoint rounds to   Right endpoint rounds to
-2   [-2.5, -1.5]   -2         -2                        -2
-1   [-1.5, -0.5]   -1         -2                         0
 0   [-0.5,  0.5]    0          0                         0
 1   [ 0.5,  1.5]    1          0                         2
 2   [ 1.5,  2.5]    2          2                         2
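When ROUND=0.5 and the analysis variable values are between -1.25 and 1.25, the intervals are as follows:
i    Interval         Midpoint   Left endpoint rounds to   Right endpoint rounds to
-2   [-1.25, -0.75]   -1.0       -1                        -1
-1   [-0.75, -0.25]   -0.5       -1                         0
 0   [-0.25,  0.25]    0.0        0                         0
 1   [ 0.25,  0.75]    0.5        0                         1
 2   [ 0.75,  1.25]    1.0        1                         1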
As the rounding unit increases, the interval width also increases. This reduces the number of unique values and decreases the amount of memory that PROC UNIVARIATE needs.
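The interval rule in the tables can be imitated outside SAS. The following Python sketch is an illustration only and is not PROC UNIVARIATE's internal code. It rounds a value to the midpoint u*i of its interval, sends a value that lies exactly on an interval boundary to the even multiple, and counts how many unique values remain. The function names `round_to_unit` and `unique_count` are hypothetical. Inputs pass through `decimal.Decimal` so that boundary values such as 0.75 are treated as exact ties rather than being perturbed by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_unit(x: float, u: float) -> float:
    """Round x to the midpoint u*i of its interval; boundary values go to even i."""
    if u <= 0:
        raise ValueError("this sketch requires a positive rounding unit")
    # str() keeps a decimal literal such as 0.75 exact instead of its binary approximation.
    dx, du = Decimal(str(x)), Decimal(str(u))
    i = (dx / du).to_integral_value(rounding=ROUND_HALF_EVEN)
    # Adding 0.0 turns a negative zero (for example from -0.5) into 0.0.
    return float(i * du) + 0.0


def unique_count(values, u: float) -> int:
    """Number of distinct values left after rounding with unit u."""
    return len({round_to_unit(v, u) for v in values})


if __name__ == "__main__":
    # Interval endpoints for the ROUND=1 and ROUND=0.5 tables.
    for u, endpoints in ((1, [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]),
                         (0.5, [-1.25, -0.75, -0.25, 0.25, 0.75, 1.25])):
        print(u, [round_to_unit(e, u) for e in endpoints])
    data = [0.1, 0.2, 0.3, 0.6, 0.7, 1.1]
    print(unique_count(data, 0.5), unique_count(data, 1))
```

For the sample data, the larger unit merges more values into the same midpoint. This is the reduction in unique values that lowers memory use.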
Generating Line Printer Plots 
To change the number of stems that the plot displays, use PLOTSIZE= to increase or decrease the number of rows. Instructions that appear below the plot explain how to determine the values of the variable. If no instructions appear, you multiply Stem.Leaf by 1 to determine the values of the variable. For example, if the stem value is 10 and the leaf value is 1, then the variable value is approximately 10.1.
For the stem-and-leaf plot, the procedure rounds a variable value to the nearest leaf. If the variable value is exactly halfway between two leaves, the value rounds to the nearest leaf with an even integer value. For example, a variable value of 3.15 has a stem value of 3 and a leaf value of 2.
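The rounding rule can be sketched in Python as follows. The sketch assumes a leaf unit of 0.1, which matches the example and the default "multiply Stem.Leaf by 1" reading. In practice the procedure chooses the scale itself. The sketch also handles only nonnegative values. The function `stem_and_leaf` and its `leaf_unit` parameter are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def stem_and_leaf(x: float, leaf_unit: str = "0.1") -> tuple[int, int]:
    """Split a nonnegative value into (stem, leaf) after rounding to the nearest leaf.

    A value exactly halfway between two leaves goes to the even leaf,
    so 3.15 becomes stem 3, leaf 2.
    """
    if x < 0:
        raise ValueError("this sketch handles nonnegative values only")
    # Count the value in leaf units (3.15 -> 31.5), then round half to even (-> 32).
    units = (Decimal(str(x)) / Decimal(leaf_unit)).to_integral_value(rounding=ROUND_HALF_EVEN)
    stem, leaf = divmod(int(units), 10)
    return stem, leaf


if __name__ == "__main__":
    for value in (3.15, 3.25, 3.35, 10.1):
        print(value, stem_and_leaf(value))
```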
To generate box plots using high-resolution graphics, use the BOXPLOT procedure in SAS/STAT software.
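In the normal probability plot, the vertical coordinate of each point is the data value, and the horizontal coordinate is $\Phi^{-1}(v_i)$, where

$$v_i = \frac{r_i - 3/8}{n + 1/4}$$

and where

$\Phi^{-1}(\cdot)$ is the inverse of the standard normal distribution function.
$r_i$ is the rank of the data value when ordered from smallest to largest.
$n$ is the number of nonmissing data values.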
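For a weighted normal probability plot, the $i$th ordered observation $x_{(i)}$ is plotted against $\Phi^{-1}(v_i)$, where

$$v_i = \frac{\left(1 - \frac{3}{8i}\right)\sum_{j=1}^{i} w_{(j)}}{\left(1 + \frac{1}{4n}\right)\sum_{k=1}^{n} w_k}$$

$w_{(j)}$ is the weight associated with $x_{(j)}$, the $j$th ordered observation, and $\sum_{k=1}^{n} w_k$ is the sum of the individual weights.
When each observation has an identical weight, $w_j = w$, the formula for $v_i$ reduces to the expression for $v_i$ in the unweighted normal probability plot:

$$v_i = \frac{i - 3/8}{n + 1/4}$$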
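The following Python sketch makes the two plotting-position formulas concrete. It is illustrative only: the function names are hypothetical and this is not the procedure's own code. It computes the unweighted and weighted positions, maps them through the inverse standard normal CDF with `statistics.NormalDist().inv_cdf`, and keeps each weight attached to its observation while sorting. This section does not say how tied values are ordered, so the sketch's tie handling is an assumption.

```python
from statistics import NormalDist


def unweighted_positions(n: int) -> list[float]:
    """v_i = (i - 3/8) / (n + 1/4) for the ordered observations i = 1..n."""
    return [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]


def weighted_positions(sorted_weights: list[float]) -> list[float]:
    """Weighted v_i; sorted_weights[j] is the weight of the (j+1)th ordered observation."""
    n = len(sorted_weights)
    total = sum(sorted_weights)
    positions, cumulative = [], 0.0
    for i, w in enumerate(sorted_weights, start=1):
        cumulative += w  # running sum w_(1) + ... + w_(i)
        positions.append((1 - 3 / (8 * i)) * cumulative / ((1 + 1 / (4 * n)) * total))
    return positions


def weighted_normal_plot(values: list[float], weights: list[float]) -> list[tuple[float, float]]:
    """(horizontal, vertical) = (inverse normal CDF of v_i, x_(i)) for each ordered observation."""
    # Sorting the (value, weight) pairs keeps each weight with its value;
    # tied values end up ordered by weight, an arbitrary choice of this sketch.
    pairs = sorted(zip(values, weights))
    xs = [x for x, _ in pairs]
    vs = weighted_positions([w for _, w in pairs])
    inv = NormalDist().inv_cdf
    return [(inv(v), x) for v, x in zip(vs, xs)]


if __name__ == "__main__":
    print(unweighted_positions(5))
    print(weighted_positions([2.0] * 5))  # identical weights: same positions as above
    print(weighted_normal_plot([3.0, 1.0, 2.0], [1.0, 2.0, 1.0]))
```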
When the value of VARDEF= is WDF or WEIGHT, PROC UNIVARIATE draws a reference line with intercept $\hat{\mu}$ and slope $\hat{\sigma}$. When the value of VARDEF= is DF or N, the slope is $\hat{\sigma}/\sqrt{\bar{w}}$, where $\bar{w} = \frac{1}{n}\sum_{i=1}^{n} w_i$ is the average weight.
When each observation has an identical weight and the value of VARDEF= is DF, N, or WEIGHT, the reference line reduces to the usual reference line with intercept $\hat{\mu}$ and slope $\hat{\sigma}$ in the unweighted normal probability plot.
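The reference-line rule can be checked numerically. The sketch below assumes the standard SAS VARDEF= divisors: n-1 for DF, n for N, the sum of the weights minus 1 for WDF, and the sum of the weights for WEIGHT. It also assumes a weighted mean and a weighted sum of squared deviations. `reference_line` is a hypothetical helper, not a SAS interface.

```python
import math


def reference_line(values, weights, vardef="DF"):
    """Return (intercept, slope) of the weighted normal probability plot reference line.

    Assumed VARDEF= divisors: DF -> n-1, N -> n, WDF -> sum(w)-1, WEIGHT -> sum(w).
    """
    n = len(values)
    sum_w = sum(weights)
    mu_hat = sum(w * x for x, w in zip(values, weights)) / sum_w  # weighted mean
    ss = sum(w * (x - mu_hat) ** 2 for x, w in zip(values, weights))
    divisor = {"DF": n - 1, "N": n, "WDF": sum_w - 1, "WEIGHT": sum_w}[vardef]
    sigma_hat = math.sqrt(ss / divisor)
    if vardef in ("DF", "N"):
        w_bar = sum_w / n  # average weight
        return mu_hat, sigma_hat / math.sqrt(w_bar)
    return mu_hat, sigma_hat


if __name__ == "__main__":
    data, wts = [1.0, 2.0, 3.0, 4.0, 5.0], [4.0] * 5
    for option in ("DF", "N", "WDF", "WEIGHT"):
        print(option, reference_line(data, wts, option))
```

Take identical weights of 4 as an example. The DF slope equals the ordinary sample standard deviation, and the N and WEIGHT slopes both equal the standard deviation computed with divisor n. The WDF slope does not reduce in this way, which is why WDF is absent from the list above.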
If the data are normally distributed with mean $\mu$ and standard deviation $\sigma$, and each observation has an identical weight $w$, then, as in the unweighted normal probability plot, the points on the plot should lie approximately on a straight line. The intercept of this line is $\mu$. The slope is $\sigma$ when VARDEF= is WDF or WEIGHT, and $\sigma/\sqrt{w}$ when VARDEF= is DF or N.
For more information about interpreting these plots, see SAS System for Elementary Statistical Analysis and SAS System for Statistical Graphics.
Generating High-Resolution Graphics 
The HISTOGRAM statement generates histograms and comparative histograms that allow you to examine the data distribution. You can optionally fit families of density curves and superimpose kernel density estimates on the histograms. For additional information about the fitted distributions and kernel density estimates, see Formulas for Fitted Continuous Distributions.
The PROBPLOT statement generates a probability plot, which compares ordered values of a variable with percentiles of a specified theoretical distribution. The QQPLOT statement generates a quantile-quantile plot, which compares ordered values of a variable with quantiles of a specified theoretical distribution. You can use these plots to determine how well a theoretical distribution models a set of measurements.
Construction of a Q-Q Plot
First, the $n$ nonmissing values of the variable are ordered from smallest to largest: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. Then the $i$th ordered value $x_{(i)}$ is represented on the plot by a point whose $y$-coordinate is $x_{(i)}$ and whose $x$-coordinate is $F^{-1}\left(\frac{i - 0.375}{n + 0.25}\right)$, where $F(\cdot)$ is the cumulative distribution function of the theoretical distribution with a zero location parameter and a unit scale parameter. For additional information about the theoretical distributions that you can request, see Theoretical Distributions for Quantile-Quantile and Probability Plots.
You can modify the adjustment constants 0.375 and 0.25 with the RANKADJ= and NADJ= options. The default combination is recommended by Blom (1958). For additional information, see Chambers et al. (1983). Because $x_{(i)}$ is a quantile of the empirical cumulative distribution function (ecdf), a Q-Q plot compares quantiles of the ecdf with quantiles of a theoretical distribution. Probability plots are constructed the same way, except that the $x$ axis is scaled nonlinearly in percentiles.
Q-Q plots are more convenient than probability plots for graphical estimation of the location and scale parameters because the $x$ axis of a Q-Q plot is scaled linearly. On the other hand, probability plots are more convenient for estimating percentiles or probabilities. There are many reasons why the point pattern in a Q-Q plot may not be linear. Chambers et al. (1983) and Fowlkes (1987) discuss the interpretations of commonly encountered departures from linearity, and these are summarized in the following table.
Description of Point Pattern  Possible Interpretation 

All but a few points fall on a line  Outliers in the data 
Left end of pattern is below the line; right end of pattern is above the line  Long tails at both ends of the data distribution 
Left end of pattern is above the line; right end of pattern is below the line  Short tails at both ends of the distribution 
Curved pattern with slope increasing from left to right  Data distribution is skewed to the right 
Curved pattern with slope decreasing from left to right  Data distribution is skewed to the left 
Staircase pattern (plateaus and gaps)  Data have been rounded or are discrete 
In some applications, a nonlinear pattern may be more revealing than a linear pattern. However, as noted by Chambers et al. (1983), departures from linearity can also be due to chance variation.
Determining Computer Resources 
The only factor that limits the number of variables that you can analyze is the computer resources that are available. The amount of temporary storage and CPU time that PROC UNIVARIATE requires depends on the statements and the options that you specify. To calculate the computer resources that the procedure needs, let

$N$ be the number of observations in the data set
$V$ be the number of variables in the VAR statement
$U_i$ be the number of unique values for the $i$th variable

Then the minimum memory requirement in bytes to process all variables is $M = 24\sum_{i} U_i$. If $M$ bytes are not available, PROC UNIVARIATE must process the data multiple times to compute all the statistics. This reduces the minimum memory requirement to $M = 24\max_i(U_i)$.
The ROUND= option reduces the number of unique values ($U_i$), thereby reducing memory requirements. The ROBUSTSCALE option requires $40U_i$ bytes of temporary storage.
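As a back-of-the-envelope aid, you can evaluate the memory formulas for a given set of unique-value counts. This sketch assumes the constants from the PROC UNIVARIATE documentation's formulas: 24 bytes per unique value for processing the variables, and 40 bytes per unique value for ROBUSTSCALE. The helper names are hypothetical. Excluding missing values from $U_i$ is also an assumption of the sketch. Actual usage depends on the operating environment.

```python
def unique_counts(columns):
    """U_i for each analysis variable: distinct nonmissing values (None = missing here)."""
    return [len({v for v in col if v is not None}) for col in columns]


def memory_estimate(counts, robustscale=False):
    """Return (all-at-once bytes, multi-pass minimum bytes, ROBUSTSCALE bytes per variable)."""
    all_at_once = 24 * sum(counts)  # M = 24 * sum of U_i
    multi_pass = 24 * max(counts)   # M = 24 * max U_i when the data are reprocessed
    robust = [40 * u for u in counts] if robustscale else []
    return all_at_once, multi_pass, robust


if __name__ == "__main__":
    columns = [[1.0, 2.0, 2.0, 3.0], [5.0, 5.0, None, 6.0]]
    counts = unique_counts(columns)
    print(counts, memory_estimate(counts, robustscale=True))
```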
Several factors affect the CPU time requirement:
Each of these factors has a different constant of proportionality. For additional information on how to optimize CPU performance and memory usage, see the SAS documentation for your operating environment.
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.