 The FREQ Procedure

## Frequency Tables and Statistics

The FREQ procedure provides easy access to statistics for testing for association in a crosstabulation table.

In this example, high school students applied for courses in a summer enrichment program: these courses included journalism, art history, statistics, graphic arts, and computer programming. The students accepted were randomly assigned to classes with and without internships in local companies. The following table contains counts of the students who enrolled in the summer program by gender and whether they were assigned an internship slot.

Table 28.1: Summer Enrichment Data

| Gender | Internship | Enrollment: Yes | Enrollment: No | Total |
|---|---|---|---|---|
| boys | yes | 35 | 29 | 64 |
| boys | no | 14 | 27 | 41 |
| girls | yes | 32 | 10 | 42 |
| girls | no | 53 | 23 | 76 |

The SAS data set SummerSchool is created by inputting the summer enrichment data as cell count data, or providing the frequency count for each combination of variable values. The following DATA step statements create the SAS data set SummerSchool.

data SummerSchool;
   input Gender $ Internship $ Enrollment $ Count @@;
   datalines;
boys  yes yes 35   boys  yes no 29
boys   no yes 14   boys   no no 27
girls yes yes 32   girls yes no 10
girls  no yes 53   girls  no no 23
;

The variable Gender takes the values 'boys' or 'girls', the variable Internship takes the values 'yes' and 'no', and the variable Enrollment takes the values 'yes' and 'no'. The variable Count contains the number of students corresponding to each combination of data values. The double at sign (@@) indicates that more than one observation is included on a single data line. In this DATA step, two observations are included on each line.
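To see what the trailing @@ does, the following sketch reads the same eight cells with one observation per data line, so the INPUT statement needs no line-hold specifier. The data set name SummerSchoolByLine is hypothetical and used only for this illustration; the PROC PRINT step is added just to check that all eight observations were read.

```sas
/* Sketch: same cell counts, one observation per data line (no @@). */
/* SummerSchoolByLine is a hypothetical name used for illustration. */
data SummerSchoolByLine;
   input Gender $ Internship $ Enrollment $ Count;
   datalines;
boys  yes yes 35
boys  yes no  29
boys  no  yes 14
boys  no  no  27
girls yes yes 32
girls yes no  10
girls no  yes 53
girls no  no  23
;
run;

/* List the observations to check that all eight cells were read. */
proc print data=SummerSchoolByLine;
run;
```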

Researchers are interested in whether there is an association between internship status and summer program enrollment. The Pearson chi-square statistic is an appropriate statistic to assess the association in the corresponding 2×2 table. The following PROC FREQ statements specify this analysis.

You specify the table for which you want to compute statistics with the TABLES statement. You specify the statistics you want to compute with options after a slash (/) in the TABLES statement.


proc freq data=SummerSchool order=data;
weight count;
tables Internship*Enrollment / chisq;
run;


The ORDER= option controls the order in which variable values are displayed in the rows and columns of the table. By default, the values are arranged according to the alphanumeric order of their unformatted values. If you specify ORDER=DATA, the data are displayed in the same order as they occur in the input data set. Here, since 'yes' appears before 'no' in the data, 'yes' appears first in any table. Other options for controlling order include ORDER=FORMATTED, which orders according to the formatted values, and ORDER=FREQUENCY, which orders by descending frequency count.
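As a rough illustration of the alternatives, the sketch below requests the same crosstabulation with the levels ordered by descending frequency count. The option is written as ORDER=FREQ, the keyword form listed in the PROC FREQ statement syntax. No statistics are requested, so only the layout of the table should change.

```sas
/* Sketch: same crosstabulation, levels ordered by descending frequency. */
proc freq data=SummerSchool order=freq;
   weight Count;
   tables Internship*Enrollment;
run;
```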

In the TABLES statement, Internship*Enrollment specifies a table where the rows are internship status and the columns are program enrollment. Since the input data are in cell count form, the WEIGHT statement is required. The WEIGHT statement names the variable Count, which provides the frequency of each combination of data values. Finally, the CHISQ option requests chi-square statistics for assessing association.
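One way to see why the WEIGHT statement is needed is to go the other way and expand the cell counts into one observation per student. The following sketch assumes the SummerSchool data set created earlier; the data set name SummerSchoolRaw and the loop index i are hypothetical names chosen for this illustration. Because each observation now represents a single student, PROC FREQ can count observations directly and no WEIGHT statement is used.

```sas
/* Sketch: expand cell counts into one observation per student. */
/* SummerSchoolRaw and i are hypothetical names for illustration. */
data SummerSchoolRaw;
   set SummerSchool;
   do i = 1 to Count;   /* write Count copies of each cell */
      output;
   end;
   drop i Count;
run;

/* Each observation is one student, so no WEIGHT statement is needed. */
proc freq data=SummerSchoolRaw order=data;
   tables Internship*Enrollment / chisq;
run;
```

The expanded data set should contain 223 observations, the sum of the eight cell counts, and the resulting table should match the weighted analysis.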

Figure 28.1 presents the crosstabulation of Internship and Enrollment. In each cell, the values that follow the cell count are the table percentage, row percentage, and column percentage, respectively. For example, the row percentages in the first row show that 63.21 percent of the students offered courses with internships enrolled and 36.79 percent did not.
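As a quick check on how the three percentages are formed, the values for the first cell (count 67) can be reproduced from the row total 106, the column total 134, and the table total 223 reported in Figure 28.1:

```latex
\begin{align*}
\text{Percent} &= \frac{67}{223} \approx 30.04\% \\
\text{Row Pct} &= \frac{67}{106} \approx 63.21\% \\
\text{Col Pct} &= \frac{67}{134} = 50.00\%
\end{align*}
```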

 The SAS System
 The FREQ Procedure
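Cell contents: Frequency / Percent / Row Pct / Col Pct

Table of Internship by Enrollment

| Internship | Enrollment: yes | Enrollment: no | Total |
|---|---|---|---|
| yes | 67 / 30.04 / 63.21 / 50.00 | 39 / 17.49 / 36.79 / 43.82 | 106 / 47.53 |
| no | 67 / 30.04 / 57.26 / 50.00 | 50 / 22.42 / 42.74 / 56.18 | 117 / 52.47 |
| Total | 134 / 60.09 | 89 / 39.91 | 223 / 100.00 |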

Figure 28.1: Crosstabulation Table

The next tables display the statistics produced by the CHISQ option. The Pearson chi-square statistic is labeled 'Chi-Square' and has a value of 0.8189 with 1 degree of freedom. The associated p-value is 0.3655, which means that there is no significant evidence of an association between internship status and program enrollment. The other chi-square statistics have similar values and are asymptotically equivalent. The other statistics (Phi Coefficient, Contingency Coefficient, and Cramer's V) are measures of association derived from the Pearson chi-square. For Fisher's exact test, the two-sided p-value is 0.4122, which also provides no evidence of an association between internship status and program enrollment.
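The reported values can be checked by hand. The sketch below uses the standard shortcut formula for a 2×2 table, with cell counts and marginal totals taken from Figure 28.1; the notation (n for counts, a dot for a summed index) is introduced here for illustration and is not part of the PROC FREQ output.

```latex
\begin{align*}
Q_P &= \frac{n\,(n_{11} n_{22} - n_{12} n_{21})^2}{n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}}
     = \frac{223\,(67 \cdot 50 - 39 \cdot 67)^2}{106 \cdot 117 \cdot 134 \cdot 89}
     = \frac{223 \cdot 737^2}{147{,}906{,}252} \approx 0.8189 \\
\phi &= \frac{n_{11} n_{22} - n_{12} n_{21}}{\sqrt{n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}}}
      = \frac{737}{\sqrt{147{,}906{,}252}} \approx 0.0606 \\
Q_{MH} &= (n - 1)\,\phi^2 = 222 \cdot \frac{737^2}{147{,}906{,}252} \approx 0.8153
\end{align*}
```

The first line reproduces the Pearson chi-square, the second the phi coefficient, and the third the Mantel-Haenszel chi-square, which differs from the Pearson statistic only by the factor (n − 1)/n.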

 The FREQ Procedure
 Statistics for Table of Internship by Enrollment

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 0.8189 | 0.3655 |
| Likelihood Ratio Chi-Square | 1 | 0.8202 | 0.3651 |
| Continuity Adj. Chi-Square | 1 | 0.5899 | 0.4425 |
| Mantel-Haenszel Chi-Square | 1 | 0.8153 | 0.3666 |
| Phi Coefficient | | 0.0606 | |
| Contingency Coefficient | | 0.0605 | |
| Cramer's V | | 0.0606 | |

Fisher's Exact Test

| Quantity | Value |
|---|---|
| Cell (1,1) Frequency (F) | 67 |
| Left-sided Pr <= F | 0.8513 |
| Right-sided Pr >= F | 0.2213 |
| Table Probability (P) | 0.0726 |
| Two-sided Pr <= P | 0.4122 |

Sample Size = 223

Figure 28.2: Statistics Produced with the CHISQ Option

The analysis, so far, has ignored gender. However, it may be of interest to ask whether program enrollment is associated with internship status after adjusting for gender. You can address this question by doing an analysis of a set of tables, in this case, by analyzing the set consisting of one for boys and one for girls. The Cochran-Mantel-Haenszel statistic is appropriate for this situation: it addresses whether rows and columns are associated after controlling for the stratification variable. In this case, you would be stratifying by gender.

The FREQ statements for this analysis are very similar to those for the first analysis, except that there is a third variable, Gender, in the TABLES statement. When you cross more than two variables, the two rightmost variables construct the rows and columns of the table, respectively, and the leftmost variables determine the stratification.


proc freq data=SummerSchool;
weight count;
tables Gender*Internship*Enrollment / chisq cmh;
run;
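If only the stratified summary is of interest, a variation on the statements above is to drop the CHISQ option and add the NOPRINT option to the TABLES statement. NOPRINT suppresses the display of the crosstabulation tables while the requested statistics are still displayed. This is a sketch of one possible variation, not part of the original example.

```sas
/* Sketch: stratified analysis with the per-stratum tables suppressed. */
proc freq data=SummerSchool;
   weight Count;
   tables Gender*Internship*Enrollment / cmh noprint;
run;
```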


This execution of PROC FREQ first produces two individual crosstabulation tables of Internship*Enrollment, one for boys and one for girls. Chi-square statistics are produced for each individual table. Note that the chi-square statistic for boys (4.2366, p = 0.0396) is significant at the α = 0.05 level of significance. Boys offered a course with an internship are more likely to enroll than boys who are not.

 The FREQ Procedure
Cell contents: Frequency / Percent / Row Pct / Col Pct

Table 1 of Internship by Enrollment
Controlling for Gender=boys

| Internship | Enrollment: no | Enrollment: yes | Total |
|---|---|---|---|
| no | 27 / 25.71 / 65.85 / 48.21 | 14 / 13.33 / 34.15 / 28.57 | 41 / 39.05 |
| yes | 29 / 27.62 / 45.31 / 51.79 | 35 / 33.33 / 54.69 / 71.43 | 64 / 60.95 |
| Total | 56 / 53.33 | 49 / 46.67 | 105 / 100.00 |

Statistics for Table 1 of Internship by Enrollment
Controlling for Gender=boys

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 4.2366 | 0.0396 |
| Likelihood Ratio Chi-Square | 1 | 4.2903 | 0.0383 |
| Continuity Adj. Chi-Square | 1 | 3.4515 | 0.0632 |
| Mantel-Haenszel Chi-Square | 1 | 4.1963 | 0.0405 |
| Phi Coefficient | | 0.2009 | |
| Contingency Coefficient | | 0.1969 | |
| Cramer's V | | 0.2009 | |

Fisher's Exact Test

| Quantity | Value |
|---|---|
| Cell (1,1) Frequency (F) | 27 |
| Left-sided Pr <= F | 0.9885 |
| Right-sided Pr >= F | 0.0311 |
| Table Probability (P) | 0.0196 |
| Two-sided Pr <= P | 0.0467 |

Sample Size = 105

Figure 28.3: Frequency Table and Statistics for Boys

If you look at the individual table for girls, you see that there is no evidence of association for girls between internship offers and program enrollment.

Cell contents: Frequency / Percent / Row Pct / Col Pct

Table 2 of Internship by Enrollment
Controlling for Gender=girls

| Internship | Enrollment: no | Enrollment: yes | Total |
|---|---|---|---|
| no | 23 / 19.49 / 30.26 / 69.70 | 53 / 44.92 / 69.74 / 62.35 | 76 / 64.41 |
| yes | 10 / 8.47 / 23.81 / 30.30 | 32 / 27.12 / 76.19 / 37.65 | 42 / 35.59 |
| Total | 33 / 27.97 | 85 / 72.03 | 118 / 100.00 |

Statistics for Table 2 of Internship by Enrollment
Controlling for Gender=girls

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 0.5593 | 0.4546 |
| Likelihood Ratio Chi-Square | 1 | 0.5681 | 0.4510 |
| Continuity Adj. Chi-Square | 1 | 0.2848 | 0.5936 |
| Mantel-Haenszel Chi-Square | 1 | 0.5545 | 0.4565 |
| Phi Coefficient | | 0.0688 | |
| Contingency Coefficient | | 0.0687 | |
| Cramer's V | | 0.0688 | |

Fisher's Exact Test

| Quantity | Value |
|---|---|
| Cell (1,1) Frequency (F) | 23 |
| Left-sided Pr <= F | 0.8317 |
| Right-sided Pr >= F | 0.2994 |
| Table Probability (P) | 0.1311 |
| Two-sided Pr <= P | 0.5245 |

Sample Size = 118

Figure 28.4: Frequency Table and Statistics for Girls

These individual table results demonstrate the problems that can arise when you combine information into one table without accounting for other variables such as Gender. Figure 28.5 contains the CMH results. There are three summary (CMH) statistics; which one you use depends on whether the rows and/or columns of your r×c tables are ordered. However, in the case of 2×2 tables, ordering does not matter and all three statistics take the same value. The CMH statistic follows the chi-square distribution under the hypothesis of no association, and here it takes the value 4.0186 with 1 degree of freedom. The associated p-value is 0.0450, which indicates a significant association at the α = 0.05 level.

Thus, when you adjust for the effect of gender in these data, there is an association between internship and program enrollment. If you ignore gender, however, no association is found. Note that the CMH option also produces other statistics, including estimates and confidence limits for relative risks and odds ratios for 2×2 tables, and the Breslow-Day test. These results are not displayed here.
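The CMH value of 4.0186 can be reproduced by hand from the stratum tables in Figures 28.3 and 28.4. The sketch below assumes the usual form of the statistic for a set of 2×2 tables, with no continuity correction: the observed (1,1) cell count in each stratum h is compared with its expected value under no association, and the squared sum of the differences is divided by the sum of the corresponding variances. The subscript notation is introduced here for illustration.

```latex
\begin{align*}
Q_{\mathrm{CMH}} &= \frac{\left[\sum_h \left(n_{h11} - \frac{n_{h1\cdot}\, n_{h\cdot 1}}{n_h}\right)\right]^2}
                        {\sum_h \frac{n_{h1\cdot}\, n_{h2\cdot}\, n_{h\cdot 1}\, n_{h\cdot 2}}{n_h^2\,(n_h - 1)}} \\
\text{boys:}\quad & 27 - \frac{41 \cdot 56}{105} \approx 5.1333, \qquad
                    \frac{41 \cdot 64 \cdot 56 \cdot 49}{105^2 \cdot 104} \approx 6.2797 \\
\text{girls:}\quad & 23 - \frac{76 \cdot 33}{118} \approx 1.7458, \qquad
                     \frac{76 \cdot 42 \cdot 33 \cdot 85}{118^2 \cdot 117} \approx 5.4960 \\
Q_{\mathrm{CMH}} &\approx \frac{(5.1333 + 1.7458)^2}{6.2797 + 5.4960} \approx 4.0186
\end{align*}
```

Both strata contribute a positive difference, so the evidence accumulates across boys and girls. In the pooled table the gender imbalance between the internship groups masks that pattern.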

 The FREQ Procedure
Summary Statistics for Internship by Enrollment
Controlling for Gender

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

| Statistic | Alternative Hypothesis | DF | Value | Prob |
|---|---|---|---|---|
| 1 | Nonzero Correlation | 1 | 4.0186 | 0.0450 |
| 2 | Row Mean Scores Differ | 1 | 4.0186 | 0.0450 |
| 3 | General Association | 1 | 4.0186 | 0.0450 |

Total Sample Size = 223

Figure 28.5: Test for the Hypothesis of No Association
