The UNIVARIATE Procedure

# Example 7: Creating Schematic Plots and an Output Data Set with BY Groups

Procedure features:
PROC UNIVARIATE statement options:
 NEXTROBS=, PLOT, PLOTSIZE=
BY statement
OUTPUT statement
Other features:
 FORMAT statement, FORMAT procedure, PRINT procedure, SORT procedure
Data set: STATEPOP

This example

• creates a data set with observations that are separated by census year

• sorts the data set by geographic region and census year

• calculates univariate statistics and produces a stem-and-leaf plot, box plot, and normal probability plot for each BY group

• creates schematic plots to compare the BY groups

• creates an output data set with descriptive statistics and percentiles

• prints the output data set.

```sas
options nodate pageno=1 linesize=120 pagesize=80;

/* Map the numeric region codes to region names. */
proc format;
   value Regnfmt 1='Northeast'
                 2='South'
                 3='Midwest'
                 4='West';
run;

/* Restructure STATEPOP: write one observation per census year. */
data metropop;
   set statepop;
   keep Region Decade PopulationCount;
   label PopulationCount='US Census Population (millions)'
         Decade='Census year';
   decade=1980;
   populationcount=sum(citypop_80,noncitypop_80);
   output;
   decade=1990;
   populationcount=sum(citypop_90,noncitypop_90);
   output;
run;

/* BY-group processing requires data sorted by the BY variables. */
proc sort data=metropop;
   by region decade;
run;

/* NEXTROBS=0 suppresses the extreme-observations table; PLOTS requests
   the plots; PLOTSIZE=20 sets the approximate number of plot rows. */
proc univariate data=metropop nextrobs=0 plots plotsize=20;
   var populationcount;
   by region decade;
   output out=censtat sum=PopulationTotal mean=PopulationMean
          std=PopulationStdDeviation
          pctlpts=50 to 100 by 25 pctlpre=Pop_;
   format region regnfmt.;
   title 'United States Census of Population and Housing';
run;

proc print data=censtat;
   title1 'Statistics for Census Data by Decade and Region';
   title2 'Output Dataset From PROC UNIVARIATE';
run;
```

 The BY statement requests a separate report for each BY group. Because the data is sorted by region and then by census year, the first report contains univariate statistics for the Northeast region in the 1980 Census. Using the BY statement together with the PLOTS option in the PROC UNIVARIATE statement also produces schematic (side-by-side box) plots on the last page of the output, which let you compare the distribution of the data for each region-year combination.
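 As a variation on the BY-group processing described here, the following minimal sketch compares the four regions only, pooling both census years within each region. It is not part of the original example; it assumes that METROPOP has already been created and sorted by Region and Decade as in the program above, and the title text is made up for illustration.

```sas
/* Sketch (not part of the original example): compare regions only.
   METROPOP is sorted by Region and Decade, so it is also in Region
   order and needs no additional PROC SORT. */
proc univariate data=metropop nextrobs=0 plots plotsize=20;
   var populationcount;
   by region;                 /* one BY group per region */
   format region regnfmt.;
   title 'Census Population by Region, 1980 and 1990 Combined';
run;
```

 With a single BY variable, the schematic plots should show one box per region rather than one per region-year combination.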

 The CENSTAT data set includes the BY variables Region and Decade and contains eight observations, one for each BY group.
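 To see how the OUTPUT statement names the variables in CENSTAT, the sketch below prints them explicitly. It assumes the program above has run. The percentile names Pop_50, Pop_75, and Pop_100 are formed by combining the PCTLPRE= prefix with each PCTLPTS= value, and the FORMAT statement is included here only as a precaution so that Region displays as a name.

```sas
/* Variable names follow from the OUTPUT statement: the PCTLPRE=
   prefix Pop_ is combined with each PCTLPTS= value (50, 75, 100). */
proc print data=censtat noobs;
   var Region Decade PopulationTotal PopulationMean
       PopulationStdDeviation Pop_50 Pop_75 Pop_100;
   format region regnfmt.;
run;
```

 The listing should contain eight rows, one for each combination of the four regions and two census years.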