The TABULATE Procedure

# Example 13: Using Denominator Definitions to Display Basic Frequency Counts and Percentages

Procedure features:

• TABLE statement: ALL class variable; denominator definitions (angle bracket operators); N statistic; PCTN statistic

Other features:

• FORMAT procedure

Crosstabulation tables (also called contingency tables and stub-and-banner reports) show combined frequency distributions for two or more variables. This table shows frequency counts for females and males within each of four job classes. The table also shows the percentage that each frequency count represents of:

• the total women and men in that job class (row percentage)

• the total for that gender in all job classes (column percentage)

• the total for all employees.

```
options nodate pageno=1 linesize=80 pagesize=60;

data jobclass;
   input Gender Occupation @@;
   datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 3 1 3 1 3 1 3 1 3 1 3 1 3
1 1 1 1 1 1 1 2 1 2 1 2 1 2 1 2 1 2
1 3 1 3 1 4 1 4 1 4 1 4 1 4 1 4
1 1 1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 3 1 3 1 3 1 3 1 4 1 4 1 4 1 4 1 4
1 1 1 3
2 1 2 1 2 1 2 1 2 1 2 1 2 1
2 2 2 2 2 2 2 2 2 2
2 3 2 3 2 3 2 4 2 4 2 4 2 4 2 4 2 4
2 1 2 3 2 3 2 3 2 3 2 3
2 4 2 4 2 4 2 4 2 4
2 1 2 1 2 1 2 1 2 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 2 3 2 4 2 4 2 4
2 1 2 1 2 1 2 1 2 1
2 2 2 2 2 2 2 3 2 3 2 3 2 3 2 4
;

proc format;
   value gendfmt 1='Female'
                 2='Male'
                 other='*** Data Entry Error ***';
   value occupfmt 1='Technical'
                  2='Manager/Supervisor'
                  3='Clerical'
                  4='Administrative'
                  other='*** Data Entry Error ***';
run;

proc tabulate data=jobclass format=8.2;
   class gender occupation;
   table (occupation='Job Class' all='All Jobs')
         *(n='Number of employees'*f=9.
           pctn<gender all>='Percent of row total'
           pctn<occupation all>='Percent of column total'
           pctn='Percent of total'),
         gender='Gender' all='All Employees'/ rts=50;
   format gender gendfmt. occupation occupfmt.;
   title 'Gender Distribution';
   title2 'within Job Classes';
run;
```

The part of the TABLE statement that defines the rows of the table uses the PCTN statistic to calculate three different percentages.

In all calculations of PCTN, the numerator is N, the frequency count for one cell of the table. The denominator for each occurrence of PCTN is determined by the denominator definition. The denominator definition appears in angle brackets after the keyword PCTN. It is a list of one or more expressions. The list tells PROC TABULATE which frequency counts to sum for the denominator.
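As a cross-check on these three percentages, a PROC FREQ crosstabulation reports the same quantities by default. This is a minimal sketch and not part of the original example; it assumes that the JOBCLASS data set and the GENDFMT. and OCCUPFMT. formats from the program above already exist.

```sas
/* Cross-check (not part of the original example).                     */
/* For a two-way table, PROC FREQ prints Frequency, Percent, Row Pct,  */
/* and Col Pct in each cell by default.                                */
proc freq data=jobclass;
   tables occupation*gender;   /* rows = Occupation, columns = Gender */
   format gender gendfmt. occupation occupfmt.;
run;
```

With Occupation in the rows, Row Pct should match the percentages that use `<gender all>` as the denominator definition, Col Pct those that use `<occupation all>`, and Percent the PCTN that has no denominator definition.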

### Analyzing the Structure of the Table

Taking a close look at the structure of the table helps you understand how PROC TABULATE uses the denominator definitions. The following simplified version of the TABLE statement clarifies the basic structure of the table:
```
table occupation='Job Class' all='All Jobs',
      gender='Gender' all='All Employees';
```

The table is a concatenation of four subtables. In this report, each subtable is a crossing of one class variable in the row dimension and one class variable in the column dimension. Each crossing establishes one or more categories. A category is a combination of unique values of class variables, such as `female, technical` or `all, clerical`. Contents of Subtables describes each subtable.

Contents of Subtables

| Class variables contributing to the subtable | Description of frequency counts | Number of categories |
|---|---|---|
| Occupation and Gender | number of females in each job or number of males in each job | 8 |
| All and Gender | number of females or number of males | 2 |
| Occupation and All | number of people in each job | 4 |
| All and All | number of people in all jobs | 1 |

Illustration of the Four Subtables highlights these subtables and the frequency counts for each category.
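One way to see the four subtables separately is to request each crossing in its own TABLE statement. The following sketch is illustrative only; it assumes the JOBCLASS data set and the formats from the example program, and it relies on the default N statistic so that each table shows only frequency counts.

```sas
/* Sketch: one TABLE statement per subtable (default statistic is N). */
/* Assumes JOBCLASS, GENDFMT., and OCCUPFMT. already exist.           */
proc tabulate data=jobclass format=9.;
   class gender occupation;
   table occupation, gender;   /* subtable 1: eight categories */
   table all,        gender;   /* subtable 2: two categories   */
   table occupation, all;      /* subtable 3: four categories  */
   table all,        all;      /* subtable 4: one category     */
   format gender gendfmt. occupation occupfmt.;
run;
```

The concatenated TABLE statement in the example produces the same counts in a single report; splitting it up only makes the category count of each subtable easier to check against the table above.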

### Interpreting Denominator Definitions

The following fragment of the TABLE statement contains the denominator definitions for this report. Each denominator definition appears in angle brackets immediately after the PCTN keyword.
```
table (occupation='Job Class' all='All Jobs')
      *(n='Number of employees'*f=5.
        pctn<gender all>='Row percent'
        pctn<occupation all>='Column percent'
        pctn='Percent of total'),
```

Each use of PCTN nests a row of statistics within each value of Occupation and All. Each denominator definition tells PROC TABULATE which frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.

### Row Percentages

The part of the TABLE statement that calculates the row percentages and that labels the row is
`   pctn<gender all>='Row percent'`

Consider how PROC TABULATE interprets this denominator definition for each subtable.

Subtable 1: Occupation and Gender

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender within the same value of Occupation.

For example, the denominator for the category `female, technical` is the sum of all frequency counts for all categories in this subtable for which the value of Occupation is `technical`. There are two such categories: `female, technical` and `male, technical`. The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18, or 34.

Subtable 2: All and Gender

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender in the subtable.

For example, the denominator for the category `all, female` is the sum of the frequency counts for `all, female` and `all, male`. The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62, or 123.

Subtable 3: Occupation and All

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.

For example, the denominator for the category `clerical, all` is the frequency count for that category, 28.

Note: Because the numerator and the denominator are the same in these table cells, the row percentages in this subtable are all 100.

Subtable 4: All and All

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.

There is only one category in this subtable: `all, all`. The denominator for this category is 123.

Note: Because the numerator and the denominator are the same in this table cell, the row percentage in this subtable is 100.
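The row-percentage arithmetic can be reproduced from the cell counts quoted in this example. The following DATA step is a sketch only: the data set name ROWCHECK and its variable names are hypothetical, and the counts are typed in by hand rather than read from JOBCLASS.

```sas
/* Sketch: recompute the row-percentage denominators from the cell    */
/* counts quoted in this example. ROWCHECK and its variables are      */
/* illustrative names, not part of the original program.              */
data rowcheck;
   input Occupation :$20. Female Male;
   RowTotal  = Female + Male;            /* denominator for <gender all> */
   PctFemale = 100 * Female / RowTotal;  /* row percent, female cell     */
   PctMale   = 100 * Male / RowTotal;    /* row percent, male cell       */
   datalines;
Technical 16 18
Manager/Supervisor 20 15
Clerical 14 14
Administrative 11 15
;

proc print data=rowcheck noobs;
   format PctFemale PctMale 8.2;
run;
```

For the Technical row, RowTotal is 16+18, or 34, so the row percentage for `female, technical` is 100*16/34, which is about 47.06.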

### Column Percentages

The part of the TABLE statement that calculates the column percentages and labels the row is
`   pctn<occupation all>='Column percent'`

Consider how PROC TABULATE interprets this denominator definition for each subtable.

Subtable 1: Occupation and Gender

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation within the same value of Gender.

For example, the denominator for the category `manager/supervisor, male` is the sum of all frequency counts for all categories in this subtable for which the value of Gender is `male`. There are four such categories: `technical, male`; `manager/supervisor, male`; `clerical, male`; and `administrative, male`. The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15, or 62.

Subtable 2: All and Gender

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for All as the denominator.

For example, the denominator for the category `all, female` is the frequency count for that category, 61.

Note: Because the numerator and the denominator are the same in these table cells, the column percentages in this subtable are all 100.

Subtable 3: Occupation and All

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation in the subtable.

For example, the denominator for the category `technical, all` is the sum of the frequency counts for `technical, all`; `manager/supervisor, all`; `clerical, all`; and `administrative, all`. The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26, or 123.

Subtable 4: All and All

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.

There is only one category in this subtable: `all, all`. The frequency count for this category is 123.

Note: Because the numerator and the denominator are the same in this table cell, the column percentage in this subtable is 100.
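The column-percentage denominators can be checked in the same spirit. This sketch assumes a hypothetical data set, COLCHECK, that holds the cell counts quoted in this example; it uses PROC SQL, which in SAS remerges a summary value such as SUM(Male) with every detail row and writes a note to the log when it does so.

```sas
/* Sketch: recompute the column-percentage denominators from the cell */
/* counts quoted in this example. COLCHECK and its variables are      */
/* illustrative names, not part of the original program.              */
data colcheck;
   input Occupation :$20. Female Male;
   datalines;
Technical 16 18
Manager/Supervisor 20 15
Clerical 14 14
Administrative 11 15
;

proc sql;
   /* sum(Female) and sum(Male) are the denominators for              */
   /* <occupation all> within each value of Gender.                   */
   select Occupation,
          Female,
          100 * Female / sum(Female) as PctFemale format=8.2,
          Male,
          100 * Male / sum(Male) as PctMale format=8.2
      from colcheck;
quit;
```

Here SUM(Male) is 18+15+14+15, or 62, so the column percentage for `manager/supervisor, male` is 100*15/62, which is about 24.19. The order of the rows in the output can differ from the order of the input data.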

### Total Percentages

The part of the TABLE statement that calculates the total percentages and labels the row is
`   pctn='Percent of total'`

If you do not specify a denominator definition, PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. Denominators for Total Percentages summarizes the process for all subtables in this example.

Denominators for Total Percentages

| Class variables contributing to the subtable | Frequency counts | Total |
|---|---|---|
| Occupation and Gender | 16, 18, 20, 15, 14, 14, 11, 15 | 123 |
| Occupation and All | 34, 35, 28, 26 | 123 |
| Gender and All | 61, 62 | 123 |
| All and All | 123 | 123 |

Consequently, the denominator for total percentages is always 123.