The UNIVARIATE Procedure

PROC UNIVARIATE Statement


PROC UNIVARIATE <option(s)>;


Options

ALL
requests all statistics and tables that the FREQ, MODES, NEXTRVAL=5, PLOT, and CIBASIC options generate. If the analysis variables are not weighted, this option also requests the statistics and tables that the CIPCTLDF, CIPCTLNORMAL, LOCCOUNT, NORMAL, ROBUSTSCALE, TRIMMED=.25, and WINSORIZED=.25 options generate. PROC UNIVARIATE also uses any values that you specify for ALPHA=, MU0=, NEXTRVAL=, CIBASIC, CIPCTLDF, CIPCTLNORMAL, TRIMMED=, or WINSORIZED= to produce the output.

ALPHA=value
specifies the default confidence level to compute confidence limits. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: .05
Range: between 0 and 1
Main discussion: Confidence Limits for Parameters
Featured in: Performing a Sign Test Using Paired Data and Examining the Data Distribution and Saving Percentiles

CIBASIC<(<TYPE=keyword> <ALPHA=value>)>
requests confidence limits for the mean, standard deviation, and variance based on the assumption that the data are normally distributed. For large sample sizes, this assumption is not required for the mean because of the Central Limit Theorem.

TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Requirement: You must use the default value of VARDEF=, which is DF.
Main discussion: Confidence Limits for Parameters
Featured in: Performing a Sign Test Using Paired Data and Examining the Data Distribution and Saving Percentiles
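
The interaction between ALPHA= in the PROC statement and ALPHA= in CIBASIC can be sketched as follows. The data set SCORES and the variable SCORE are hypothetical and exist only for illustration. Under the defaults described above, the request below should produce one-sided lower limits at the 99 percent level, because the CIBASIC suboption overrides the statement-level ALPHA=.10.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84
;
run;

/* ALPHA=.10 sets the default level (90 percent);          */
/* CIBASIC(ALPHA=.01) overrides it for the basic limits,   */
/* giving (1 - .01) x 100 = 99 percent lower limits.       */
proc univariate data=scores alpha=.10 cibasic(type=lower alpha=.01);
   var score;
run;
```

VARDEF= is left at its default (DF) here, as the requirement above states.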

CIPCTLDF<(<TYPE=keyword> <ALPHA=value> )>
requests confidence limits for quantiles by using a method that is distribution-free. In other words, no specific parametric distribution such as the normal is assumed for the data. PROC UNIVARIATE uses order statistics (ranks) to compute the confidence limits as described by Hahn and Meeker (1991).

TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, SYMMETRIC, or ASYMMETRIC.
Default: SYMMETRIC

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: CIQUANTDF
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Confidence Limits for Quantiles
Featured in: Performing a Sign Test Using Paired Data
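
A minimal sketch of a distribution-free request, assuming a hypothetical data set SCORES with one analysis variable. No WEIGHT statement is used, in line with the restriction above.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84 77 93
;
run;

/* Distribution-free 90 percent limits for the quantiles, */
/* requesting the ASYMMETRIC type instead of the default. */
proc univariate data=scores cipctldf(type=asymmetric alpha=.10);
   var score;
run;
```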

CIPCTLNORMAL<(<TYPE=keyword> <ALPHA=value>)>
requests confidence limits for quantiles based on the assumption that the data are normally distributed.

TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: CIQUANTNORMAL
Requirement: You must use the default value of VARDEF=, which is DF.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Confidence Limits for Quantiles
Featured in: Examining the Data Distribution and Saving Percentiles

DATA=SAS-data-set
specifies the input SAS data set.
Main discussion: Input Data Sets

EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from the analysis. By default, PROC UNIVARIATE treats observations with negative weights like those with zero weights and counts them in the total number of observations.
Requirement: You must use a WEIGHT statement.
See also: WEIGHT Statement
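
The following sketch shows EXCLNPWGT together with the WEIGHT statement that it requires. The data set WEIGHTED and the variables X and W are hypothetical. Two of the five observations carry nonpositive weights (0 and -1), so with EXCLNPWGT they should be left out of the analysis rather than counted with zero weight.

```sas
/* Hypothetical data: x is the analysis variable, w the weight */
data weighted;
   input x w @@;
   datalines;
10 1  12 2  15 0  9 -1  14 3
;
run;

/* Exclude the observations whose weight is zero or negative */
proc univariate data=weighted exclnpwgt;
   var x;
   weight w;
run;
```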

FREQ
requests a frequency table that consists of the variable values, frequencies, cell percentages, and cumulative percentages.
Interaction: If you specify the WEIGHT statement, PROC UNIVARIATE includes the weighted count in the table and uses this value to compute the percentages.
Featured in: Rounding an Analysis Variable and Identifying Extreme Values

LOCCOUNT
requests a table that shows the number of observations greater than, equal to, and less than the value of MU0=. PROC UNIVARIATE uses these values to construct the sign test and the signed rank test.
Restriction: This option is not available if you specify a WEIGHT statement.
See also: MU0=
Featured in: Performing a Sign Test Using Paired Data
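
As an illustration of how LOCCOUNT depends on MU0=, the sketch below counts observations relative to a hypothesized location of 80. The data set SCORES and the value 80 are assumptions chosen for the example.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 80 88 95 70 81 84
;
run;

/* Counts of observations greater than, equal to, and less than 80 */
proc univariate data=scores mu0=80 loccount;
   var score;
run;
```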

MODES
requests a table of all possible modes. By default, when the data contain multiple modes, PROC UNIVARIATE displays the lowest mode in the table of basic statistical measures. When all the values are unique, PROC UNIVARIATE does not produce a table of modes.
Alias: MODE
Main discussion: Calculating the Mode
Featured in: Performing a Sign Test Using Paired Data

MU0=value(s)
specifies the value of the mean or location parameter ($\mu_0$) in the null hypothesis for tests of location. If you specify one value, PROC UNIVARIATE tests the same null hypothesis for all analysis variables. If you specify multiple values, a VAR statement is required, and PROC UNIVARIATE tests a different null hypothesis for each analysis variable in the corresponding order.
Alias: LOCATION=
Default: 0
Main discussion: Tests for Location
Example: The following statement tests if the mean of the first variable equals 0 and the mean of the second variable equals 0.5.

proc univariate mu0=0 0.5;
Featured in: Examining the Data Distribution and Saving Percentiles

NEXTROBS=n
specifies the number of extreme observations that PROC UNIVARIATE lists in the table of extreme observations. The table lists the n lowest observations and the n highest observations.
Default: 5
Range: an integer between 0 and half the maximum number of observations
Tip: Use NEXTROBS=0 to suppress the table of extreme observations.
Featured in: Rounding an Analysis Variable and Identifying Extreme Values and Creating Schematic Plots and an Output Data Set with BY Groups

NEXTRVAL=n
specifies the number of extreme values that PROC UNIVARIATE lists in the table of extreme values. The table lists the n lowest unique values and the n highest unique values.
Default: 0
Range: an integer between 0 and half the maximum number of observations
Featured in: Rounding an Analysis Variable and Identifying Extreme Values
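
NEXTROBS= and NEXTRVAL= can be combined in one PROC statement. The sketch below uses a hypothetical data set with repeated values so that the two tables differ: the extreme observations table lists individual observations, while the extreme values table lists unique values.

```sas
/* Hypothetical data with tied values, for illustration only */
data scores;
   input score @@;
   datalines;
66 66 70 72 78 81 84 88 95 95
;
run;

/* Three lowest and highest observations, and three lowest */
/* and highest unique values.                              */
proc univariate data=scores nextrobs=3 nextrval=3;
   var score;
run;
```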

NOBYPLOT
suppresses side-by-side box plots when you use the BY statement and the ALL option or the PLOT option in the PROC statement.

NOPRINT
suppresses all the output.
Tip: Use NOPRINT when you want to create an OUT= output data set only.
Featured in: Creating an Output Data Set with Multiple Analysis Variables
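
A minimal sketch of the tip above: NOPRINT suppresses the printed output while an OUTPUT statement writes statistics to a data set. The data set names SCORES and SUMMARY and the output variable names are hypothetical.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84
;
run;

/* No printed tables; the statistics go to WORK.SUMMARY */
proc univariate data=scores noprint;
   var score;
   output out=summary mean=avg_score std=sd_score median=med_score;
run;

/* Inspect the output data set */
proc print data=summary;
run;
```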

NORMAL
requests tests for normality that include the Shapiro-Wilk test and a series of goodness-of-fit tests based on the empirical distribution function.
Alias: NORMALTEST
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Tests for Normality
Featured in: Examining the Data Distribution and Saving Percentiles
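
For example, the following sketch requests the normality tests for a single unweighted variable. The data set SCORES is hypothetical.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84
;
run;

/* Adds the tests for normality to the default output */
proc univariate data=scores normal;
   var score;
run;
```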

PCTLDEF=value
specifies the definition that PROC UNIVARIATE uses to calculate quantiles.
Alias: DEF=
Default: 5
Range: 1, 2, 3, 4, 5
Restriction: You cannot use PCTLDEF= when you compute weighted quantiles.
Main discussion: Percentile and Related Statistics
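
The sketch below switches from the default definition to definition 3 for an unweighted analysis. The data set is hypothetical, and the choice of 3 is arbitrary; see the main discussion for what each definition computes.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84
;
run;

/* Quantiles computed with definition 3 instead of the default 5 */
proc univariate data=scores pctldef=3;
   var score;
run;
```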

PLOTS
produces a stem-and-leaf plot (or a horizontal bar chart), a box plot, and a normal probability plot. If you use a BY statement, side-by-side box plots that are labeled Schematic Plots appear after the univariate analysis for the last BY group.
Alias: PLOT
Main discussion: Plots
Featured in: Examining the Data Distribution and Saving Percentiles and Creating Schematic Plots and an Output Data Set with BY Groups

PLOTSIZE=n
specifies the approximate number of rows that the plots use. If n is larger than the value of the SAS system option PAGESIZE=, PROC UNIVARIATE uses the value of PAGESIZE=. If n is less than eight, PROC UNIVARIATE uses eight rows to draw the plots.
Default: the value of PAGESIZE=
Range: 8 to the value of PAGESIZE=
Featured in: Examining the Data Distribution and Saving Percentiles and Creating Schematic Plots and an Output Data Set with BY Groups
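
As a sketch of PLOT and PLOTSIZE= used together, the following step asks for the plots in roughly 20 rows. The data set is hypothetical, and the value 20 assumes that the PAGESIZE= system option is at least that large; otherwise PROC UNIVARIATE falls back to the value of PAGESIZE= as described above.

```sas
/* Hypothetical sample data, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84 77 93
;
run;

/* Stem-and-leaf (or bar chart), box plot, and normal probability plot */
proc univariate data=scores plot plotsize=20;
   var score;
run;
```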

ROBUSTSCALE
produces a table with robust estimates of scale. The statistics include the interquartile range, Gini's mean difference, the median absolute deviation about the median (MAD), and two statistics proposed by Rousseeuw and Croux (1993), $S_n$ and $Q_n$.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Robust Measures of Scale
Featured in: Computing Robust Estimators
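
A short sketch of a ROBUSTSCALE request on unweighted data. The data set SCORES is hypothetical and includes one deliberately large value so that the robust estimates can be compared with the ordinary standard deviation.

```sas
/* Hypothetical data with one outlying value (150) */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 150
;
run;

/* Adds the table of robust estimates of scale */
proc univariate data=scores robustscale;
   var score;
run;
```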

ROUND=unit(s)
specifies the units to use to round the analysis variables prior to computing statistics. If you specify one unit, PROC UNIVARIATE uses this unit to round all analysis variables. If you specify multiple units, a VAR statement is required, and each unit rounds the values of the corresponding analysis variable. If ROUND=0, no rounding occurs.
Default: 0
Tip: ROUND= reduces the number of unique variable values, thereby reducing the memory requirements.
Range: ≥ 0
Main discussion: Rounding
Example: To make 1 the rounding unit for the first analysis variable and 0.5 the rounding unit for the second analysis variable, submit the statement

proc univariate round=1 0.5;
Featured in: Rounding an Analysis Variable and Identifying Extreme Values

TRIMMED=value(s) <(<TYPE=keyword> <ALPHA=value>)>
requests a table of trimmed means, where value specifies the number or the proportion of observations that PROC UNIVARIATE trims. If value is a proportion p between 0 and .5, the number of observations that PROC UNIVARIATE trims is the smallest integer that is greater than or equal to np, where n is the number of observations.

TYPE=keyword
specifies the type of confidence limit for the mean, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: TRIM=
Range: between 0 and half the number of nonmissing observations. When a proportion is specified, value must be less than .5.
Requirement: To compute confidence limits for the mean and the Student's t test, you must use the default value of VARDEF=, which is DF.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Trimmed Means
Featured in: Computing Robust Estimators
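
The two ways of specifying value can be sketched side by side. The data set SCORES is hypothetical and holds n = 12 observations. The first value, 1, is a count. The second value, .1, is a proportion, so by the rule above the number trimmed is the smallest integer greater than or equal to 12 × .1 = 1.2, which is 2.

```sas
/* Hypothetical sample data with n = 12, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84 77 150
;
run;

/* TRIMMED=1  : count of 1                           */
/* TRIMMED=.1 : proportion, ceil(12 * .1) = 2        */
proc univariate data=scores trimmed=1 .1;
   var score;
run;
```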

VARDEF=divisor
specifies the divisor to use in the calculation of variances and standard deviations. The table Possible Values for VARDEF= shows the possible values for divisor and the associated divisors.

Possible Values for VARDEF=
Value           Divisor                     Formula for Divisor
DF              degrees of freedom          $n - 1$
N               number of observations      $n$
WDF             sum of weights minus one    $(\sum_i w_i) - 1$
WEIGHT | WGT    sum of weights              $\sum_i w_i$

The procedure computes the variance as $\mathrm{CSS}/\mathrm{divisor}$, where $\mathrm{CSS}$ is the corrected sums of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis variables, $\mathrm{CSS}$ equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.
Default: DF
Requirement: To compute the standard error of the mean, confidence limits, and Student's t test, use the default value of VARDEF=.
Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the ith observation is $\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This yields an estimate of the variance of an observation with unit weight.
Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an observation with average weight.
See also: Keywords and Formulas and WEIGHT Statement
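
To see the effect of the divisor, the sketch below runs the same unweighted analysis twice on a hypothetical data set of n = 10 observations. The first step divides the corrected sums of squares by n - 1 = 9 and the second by n = 10. As the requirement above notes, the standard error of the mean, confidence limits, and Student's t test rely on the default, so the second step is useful mainly for comparing the variance and standard deviation.

```sas
/* Hypothetical sample data with n = 10, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84
;
run;

/* Default divisor: degrees of freedom, n - 1 */
proc univariate data=scores vardef=df;
   var score;
run;

/* Divisor: number of observations, n */
proc univariate data=scores vardef=n;
   var score;
run;
```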

WINSORIZED=value(s) <(<TYPE=keyword> <ALPHA=value>)>
requests a table of Winsorized means, where value is the number or the proportion of observations that PROC UNIVARIATE uses to compute the Winsorized mean. If value is a proportion p between 0 and .5, the number of observations that PROC UNIVARIATE uses is equal to the smallest integer that is greater than or equal to np, where n is the number of observations.

TYPE=keyword
specifies the type of confidence limit for the mean, where keyword is LOWER, UPPER, or TWOSIDED.
Default: TWOSIDED

ALPHA=value
specifies the confidence level to compute the confidence limit. The percentage for the confidence limits is (1-value) × 100. For example, ALPHA=.05 results in a 95 percent confidence limit.
Default: The value of ALPHA= in the PROC statement
Range: between 0 and 1

Alias: WINSOR=
Range: between 0 and half the number of nonmissing observations. When a proportion is specified, value must be less than .5.
Requirement: To compute confidence limits and the Student's t test, you must use the default value of VARDEF=, which is DF.
Restriction: This option is not available if you specify a WEIGHT statement.
Main discussion: Winsorized Means
Featured in: Computing Robust Estimators
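
A sketch of a Winsorized request with the optional suboptions, written to follow the syntax shown at the top of this entry. The data set SCORES is hypothetical, with n = 12 observations, so the proportion .2 corresponds to the smallest integer greater than or equal to 12 × .2 = 2.4, which is 3. The suboptions ask for a one-sided upper limit at the 90 percent level.

```sas
/* Hypothetical sample data with n = 12, for illustration only */
data scores;
   input score @@;
   datalines;
72 85 90 66 78 88 95 70 81 84 77 150
;
run;

/* Proportion .2 -> ceil(12 * .2) = 3 observations;          */
/* upper confidence limit at (1 - .10) x 100 = 90 percent.   */
proc univariate data=scores winsorized=.2 (type=upper alpha=.10);
   var score;
run;
```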


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.