The FREQ Procedure

Frequency Tables and Statistics

The FREQ procedure provides easy access to statistics for testing association in a cross-classification table.

In this example, high school students applied for courses in a summer enrichment program, including journalism, art history, statistics, graphic arts, and computer programming. The students who were accepted were randomly assigned to classes with and without internships in local companies. The following table contains counts of the students who enrolled in the summer program, by gender and by whether they were assigned an internship slot.

Table 26.1: Summer Enrichment Data

                         Enrollment
   Gender   Internship    Yes    No   Total
   boys     yes            35    29      64
   boys     no             14    27      41
   girls    yes            32    10      42
   girls    no             53    23      76

The following DATA step creates the SAS data set SummerSchool by reading the count that corresponds to each cell of the table.

   data SummerSchool; 
      input Gender $ Internship $ School $ Count @@; 
      datalines;
   boys  yes yes 35   boys  yes no 29 
   boys   no yes 14   boys   no no 27
   girls yes yes 32   girls yes no 10  
   girls  no yes 53   girls  no no 23
   ;
The variable Gender takes the values 'boys' and 'girls', the variable Internship takes the values 'yes' and 'no', and the variable School takes the values 'yes' and 'no'. The variable Count contains the number of students with the characteristics given by the other variable values. The double at sign (@@) indicates that more than one observation is included on a single data line; in this DATA step, two observations are included on each line.

Researchers are interested in whether there is an association between internship status and summer program enrollment. The Pearson chi-square statistic is an appropriate statistic to assess the association in the corresponding 2×2 table. The following PROC FREQ statements specify this analysis.

You specify the table for which you want to compute statistics with the TABLES statement, and you specify the statistics you want to compute with options after a slash (/) in that statement.

 
   proc freq data=SummerSchool order=data;
      weight count;  
      tables Internship*School / chisq;
   run;

The ORDER= option controls the order in which variable values are displayed in the rows and columns of the table. By default, the values are arranged in alphanumeric order. If you specify ORDER=DATA, the values are displayed in the order in which they are first encountered in the input data. Here, because 'yes' appears before 'no' in the data, 'yes' appears first in any table. Another choice is ORDER=FORMATTED, which bases the ordering on formatted values.

In the TABLES statement, Internship*School requests a table whose rows correspond to internship status and whose columns correspond to program enrollment (School). The WEIGHT statement is required when the input data are in count form; the variable named in the WEIGHT statement identifies the count variable. Finally, the CHISQ option requests chi-square statistics for assessing association.
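Because the TABLES request names only Internship and School, the Count weights are summed over both values of Gender. The arithmetic behind the cell frequencies of the collapsed table, using the counts from Table 26.1, is sketched here as a worked check:

$$
\begin{array}{l|cc|c}
\text{Internship} & \text{School} = \text{yes} & \text{School} = \text{no} & \text{Total} \\ \hline
\text{yes} & 35 + 32 = 67 & 29 + 10 = 39 & 106 \\
\text{no}  & 14 + 53 = 67 & 27 + 23 = 50 & 117 \\ \hline
\text{Total} & 134 & 89 & 223
\end{array}
$$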

Figure 26.1 presents the cross-classification of Internship and School. In each cell, the values printed under the cell frequency are the percentage of the total, the row percentage, and the column percentage, respectively. For example, the row percentages in the first row show that 63.21 percent of the students offered courses with internships enrolled in the program and 36.79 percent did not.
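As a worked check on how the three percentages differ, take the first cell (Internship=yes, School=yes), whose frequency is 67. Each percentage divides that same count by a different total: the grand total (223), the row total (106), and the column total (134):

$$
\text{Percent} = \frac{67}{223} \times 100 = 30.04, \qquad
\text{Row Pct} = \frac{67}{106} \times 100 = 63.21, \qquad
\text{Col Pct} = \frac{67}{134} \times 100 = 50.00
$$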

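   The SAS System

   The FREQ Procedure

   Table of Internship by School

   Internship     School

   Frequency|
   Percent  |
   Row Pct  |
   Col Pct  |yes     |no      |  Total
   ---------+--------+--------+
   yes      |     67 |     39 |    106
            |  30.04 |  17.49 |  47.53
            |  63.21 |  36.79 |
            |  50.00 |  43.82 |
   ---------+--------+--------+
   no       |     67 |     50 |    117
            |  30.04 |  22.42 |  52.47
            |  57.26 |  42.74 |
            |  50.00 |  56.18 |
   ---------+--------+--------+
   Total         134      89      223
               60.09   39.91   100.00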

Figure 26.1: Cross-Classification Table

Figure 26.2 displays the statistics produced by the CHISQ option. The Pearson chi-square statistic is labeled 'Chi-Square' and has a value of 0.8189 with 1 degree of freedom. The associated p-value is 0.3655, so there is no significant evidence of an association between internship status and program enrollment. The other chi-square statistics have similar values; they are asymptotically equivalent. Fisher's exact test gives a two-tailed p-value of 0.4122. The remaining statistics (the phi coefficient, the contingency coefficient, and Cramer's V) measure the strength of the association.
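As a worked check, the Pearson chi-square for a 2×2 table has a standard closed form. Here a and b denote the first-row counts, c and d the second-row counts, n_{1+} and n_{2+} the row totals, n_{+1} and n_{+2} the column totals, and n the overall total. This notation is introduced for illustration and does not appear in the PROC FREQ output. Substituting the counts from Figure 26.1 reproduces the reported Pearson and Mantel-Haenszel chi-squares and the phi coefficient:

$$
Q_P = \frac{n\,(ad - bc)^2}{n_{1+}\,n_{2+}\,n_{+1}\,n_{+2}}
    = \frac{223\,(67 \cdot 50 - 39 \cdot 67)^2}{106 \cdot 117 \cdot 134 \cdot 89}
    = \frac{223 \cdot 737^2}{147{,}906{,}252} \approx 0.8189
$$

$$
Q_{MH} = \frac{n-1}{n}\,Q_P = \frac{222 \cdot 737^2}{147{,}906{,}252} \approx 0.8153,
\qquad
\phi = \frac{ad - bc}{\sqrt{n_{1+}\,n_{2+}\,n_{+1}\,n_{+2}}}
     = \frac{737}{\sqrt{147{,}906{,}252}} \approx 0.0606
$$

The factor (n - 1)/n is why the Mantel-Haenszel and Pearson values are so close for a sample of this size. Because ad - bc is positive here, phi also equals the square root of Q_P/n. For a 2×2 table, Cramer's V takes the same value as phi.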

   The FREQ Procedure

   Statistics for Table of Internship by School

   Statistic                     DF       Value      Prob
   ------------------------------------------------------
   Chi-Square                     1      0.8189    0.3655
   Likelihood Ratio Chi-Square    1      0.8202    0.3651
   Continuity Adj. Chi-Square     1      0.5899    0.4425
   Mantel-Haenszel Chi-Square     1      0.8153    0.3666
   Fisher's Exact Test (Left)                      0.8513
                       (Right)                     0.2213
                       (2-Tail)                    0.4122
   Phi Coefficient                       0.0606
   Contingency Coefficient               0.0605
   Cramer's V                            0.0606

   Sample Size = 223

Figure 26.2: Statistics Produced with the CHISQ Option

The analysis so far has ignored gender. However, it may be of interest to ask whether program enrollment is associated with internship status after adjusting for gender. You can address this question by analyzing a set of tables, in this case one table for boys and one for girls. The Cochran-Mantel-Haenszel (CMH) statistic is appropriate for this situation: it tests whether rows and columns are associated after controlling for the stratification variable, which here is Gender.

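The PROC FREQ statements for this analysis are similar to the previous ones, except that the TABLES statement includes a third variable, Gender, and the CMH option requests the Cochran-Mantel-Haenszel statistics. When you cross more than two variables, the two rightmost variables form the rows and columns of the table, respectively, and the variables to their left determine the stratification. Because ORDER=DATA is omitted here, the values appear in the default order, so 'no' precedes 'yes' in the tables that follow.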

 
   proc freq data=SummerSchool;
      weight count;  
      tables Gender*Internship*School / chisq cmh;
   run;

This execution of PROC FREQ first produces two individual tables, one for boys and one for girls, with chi-square statistics for each. Note that the chi-square statistic for boys is significant at the α = 0.05 level: boys offered a course with an internship are more likely to enroll than boys who are not.
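As a worked check, the 2×2 shortcut for the Pearson chi-square, Q_P = n(ad - bc)^2 / (n_{1+} n_{2+} n_{+1} n_{+2}), reproduces the boys' value in Figure 26.3. Here a, b, c, d are the cell counts and the n terms are the margins; this notation is introduced for illustration. The cells are taken with Internship=yes and School=yes first, and reordering rows or columns does not change Q_P. The row percentages show the direction of the association:

$$
Q_P = \frac{105\,(35 \cdot 27 - 29 \cdot 14)^2}{64 \cdot 41 \cdot 49 \cdot 56}
    = \frac{105 \cdot 539^2}{7{,}200{,}256} \approx 4.2366,
\qquad
\frac{35}{64} = 54.69\%
\quad\text{versus}\quad
\frac{14}{41} = 34.15\%
$$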

   The FREQ Procedure

   Table 1 of Internship by School
   Controlling for Gender=boys

   Internship     School

   Frequency|
   Percent  |
   Row Pct  |
   Col Pct  |no      |yes     |  Total
   ---------+--------+--------+
   no       |     27 |     14 |     41
            |  25.71 |  13.33 |  39.05
            |  65.85 |  34.15 |
            |  48.21 |  28.57 |
   ---------+--------+--------+
   yes      |     29 |     35 |     64
            |  27.62 |  33.33 |  60.95
            |  45.31 |  54.69 |
            |  51.79 |  71.43 |
   ---------+--------+--------+
   Total          56      49      105
               53.33   46.67   100.00

   Statistics for Table 1 of Internship by School
   Controlling for Gender=boys

   Statistic                     DF       Value      Prob
   ------------------------------------------------------
   Chi-Square                     1      4.2366    0.0396
   Likelihood Ratio Chi-Square    1      4.2903    0.0383
   Continuity Adj. Chi-Square     1      3.4515    0.0632
   Mantel-Haenszel Chi-Square     1      4.1963    0.0405
   Fisher's Exact Test (Left)                      0.9885
                       (Right)                     0.0311
                       (2-Tail)                    0.0467
   Phi Coefficient                       0.2009
   Contingency Coefficient               0.1969
   Cramer's V                            0.2009

   Sample Size = 105

Figure 26.3: Frequency Table and Statistics for Boys

The individual table for girls, in contrast, shows no evidence of an association between internship offers and program enrollment.
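The same worked check for the girls' stratum uses the 2×2 shortcut Q_P = n(ad - bc)^2 / (n_{1+} n_{2+} n_{+1} n_{+2}). The cell and margin notation is introduced for illustration, and the cells are taken with Internship=yes and School=yes first. It reproduces the value in Figure 26.4, and the two row percentages are much closer than they are for boys:

$$
Q_P = \frac{118\,(32 \cdot 23 - 10 \cdot 53)^2}{42 \cdot 76 \cdot 85 \cdot 33}
    = \frac{118 \cdot 206^2}{8{,}953{,}560} \approx 0.5593,
\qquad
\frac{32}{42} = 76.19\%
\quad\text{versus}\quad
\frac{53}{76} = 69.74\%
$$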

   Table 2 of Internship by School
   Controlling for Gender=girls

   Internship     School

   Frequency|
   Percent  |
   Row Pct  |
   Col Pct  |no      |yes     |  Total
   ---------+--------+--------+
   no       |     23 |     53 |     76
            |  19.49 |  44.92 |  64.41
            |  30.26 |  69.74 |
            |  69.70 |  62.35 |
   ---------+--------+--------+
   yes      |     10 |     32 |     42
            |   8.47 |  27.12 |  35.59
            |  23.81 |  76.19 |
            |  30.30 |  37.65 |
   ---------+--------+--------+
   Total          33      85      118
               27.97   72.03   100.00

   Statistics for Table 2 of Internship by School
   Controlling for Gender=girls

   Statistic                     DF       Value      Prob
   ------------------------------------------------------
   Chi-Square                     1      0.5593    0.4546
   Likelihood Ratio Chi-Square    1      0.5681    0.4510
   Continuity Adj. Chi-Square     1      0.2848    0.5936
   Mantel-Haenszel Chi-Square     1      0.5545    0.4565
   Fisher's Exact Test (Left)                      0.8317
                       (Right)                     0.2994
                       (2-Tail)                    0.5245
   Phi Coefficient                       0.0688
   Contingency Coefficient               0.0687
   Cramer's V                            0.0688

   Sample Size = 118

Figure 26.4: Frequency Table and Statistics for Girls

These stratum-specific results illustrate the problems that can arise when you combine information into a single table without accounting for other variables such as Gender. Figure 26.5 contains the CMH results. There are three summary (CMH) statistics. For r×c tables, which one you use depends on whether your rows and/or columns are ordered. For 2×2 tables, however, ordering does not matter and all three statistics take the same value. Under the hypothesis of no association, the CMH statistic has an asymptotic chi-square distribution; here it takes the value 4.0186 with 1 degree of freedom. The associated p-value is 0.0450, which indicates a significant association at the α = 0.05 level.
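For a set of 2×2 tables indexed by stratum h, the CMH statistic compares the first-cell count n_{h11} in each stratum with its expectation E_h under no association, and pools the comparisons across strata. The following worked check uses the counts from Figures 26.3 and 26.4, taking Internship=yes and School=yes as the first cell. The notation is introduced for illustration: n_{h1+} and n_{h2+} are the row totals, n_{h+1} and n_{h+2} the column totals, and n_h the stratum total.

$$
Q_{CMH} = \frac{\left[\sum_h \left(n_{h11} - E_h\right)\right]^2}{\sum_h V_h},
\qquad
E_h = \frac{n_{h1+}\,n_{h+1}}{n_h},
\qquad
V_h = \frac{n_{h1+}\,n_{h2+}\,n_{h+1}\,n_{h+2}}{n_h^2\,(n_h - 1)}
$$

$$
\begin{aligned}
E_{\text{boys}} &= \frac{64 \cdot 49}{105} \approx 29.8667, &
V_{\text{boys}} &= \frac{64 \cdot 41 \cdot 49 \cdot 56}{105^2 \cdot 104} \approx 6.2797, \\
E_{\text{girls}} &= \frac{42 \cdot 85}{118} \approx 30.2542, &
V_{\text{girls}} &= \frac{42 \cdot 76 \cdot 85 \cdot 33}{118^2 \cdot 117} \approx 5.4960
\end{aligned}
$$

$$
Q_{CMH} = \frac{\left[(35 - 29.8667) + (32 - 30.2542)\right]^2}{6.2797 + 5.4960}
        = \frac{6.8791^2}{11.7757} \approx 4.0186
$$

No continuity correction enters this calculation, and the result matches the value reported for all three CMH statistics. Both strata contribute a positive deviation, so the evidence accumulates when the tables are combined, even though the girls' table alone is not significant.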

Thus, when you adjust for the effect of gender in these data, there is an association between internship status and program enrollment; if you ignore gender, no association is found. Note that the CMH option also produces other statistics for 2×2 tables, including estimates and confidence limits for relative risks and odds ratios, and the Breslow-Day test for homogeneity of the odds ratios. These results are not displayed here.

   The FREQ Procedure

   Summary Statistics for Internship by School
   Controlling for Gender

   Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

   Statistic    Alternative Hypothesis    DF       Value      Prob
   ---------------------------------------------------------------
       1        Nonzero Correlation        1      4.0186    0.0450
       2        Row Mean Scores Differ     1      4.0186    0.0450
       3        General Association        1      4.0186    0.0450

   Total Sample Size = 223

Figure 26.5: Test for the Hypothesis of No Association


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.