The FREQ Procedure

# Statistical Computations

This section gives the formulas PROC FREQ uses to compute the following:

• chi-square tests and statistics (CHISQ option)

• measures of association (MEASURES option)

• binomial proportion (BINOMIAL option)

• risks (or binomial proportions) and risk differences for 2×2 tables (RISKDIFF option)

• odds ratios and relative risks for 2×2 tables (MEASURES or RELRISK option)

• Jonckheere-Terpstra test (JT option)

• Cochran-Armitage test for trend (TREND option)

• tests and measures of agreement (AGREE option)

• Cochran-Mantel-Haenszel statistics (CMH option)

Furthermore, this section describes the computation of exact p-values.

When selecting statistics to analyze your data, consider the study design (which indicates whether the row and column variables are dependent or independent), the measurement scale of the variables (nominal, ordinal, or interval), the type of association that the statistics detect, and the assumptions for valid interpretation of the statistics. For example, the Mantel-Haenszel chi-square statistic requires an ordinal scale for both variables and detects a linear association. On the other hand, the Pearson chi-square is appropriate for all variables and can detect any kind of association, but is less powerful for detecting a linear association. Select tests and measures carefully, choosing those that are appropriate for your data. For more information on when to use a statistic and how to interpret the results, refer to Agresti (1996) and Stokes et al. (1995).

In this chapter, a two-way table represents the crosstabulation of two variables X and Y. Let the rows of the table be labeled by the values $X_i$, $i = 1, 2, \ldots, R$, and the columns by $Y_j$, $j = 1, 2, \ldots, C$. Let $n_{ij}$ denote the cell frequency in the $i$th row and the $j$th column and define the following:

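$$
\begin{aligned}
n_{i\cdot} &= \sum_{j} n_{ij} && \text{(row totals)} \\
n_{\cdot j} &= \sum_{i} n_{ij} && \text{(column totals)} \\
n &= \sum_{i} \sum_{j} n_{ij} && \text{(overall total)} \\
p_{ij} &= n_{ij} / n && \text{(cell percentages)} \\
p_{i\cdot} &= n_{i\cdot} / n && \text{(row percentages)} \\
p_{\cdot j} &= n_{\cdot j} / n && \text{(column percentages)} \\
R_i &= \text{score for row } i \\
C_j &= \text{score for column } j \\
\bar{R} &= \sum_{i} n_{i\cdot} R_i / n && \text{(average row score)} \\
\bar{C} &= \sum_{j} n_{\cdot j} C_j / n && \text{(average column score)} \\
A_{ij} &= \sum_{k>i} \sum_{l>j} n_{kl} + \sum_{k<i} \sum_{l<j} n_{kl} \\
D_{ij} &= \sum_{k>i} \sum_{l<j} n_{kl} + \sum_{k<i} \sum_{l>j} n_{kl} \\
P &= \sum_{i} \sum_{j} n_{ij} A_{ij} && \text{(twice the number of concordances)} \\
Q &= \sum_{i} \sum_{j} n_{ij} D_{ij} && \text{(twice the number of discordances)}
\end{aligned}
$$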

### Scores

PROC FREQ uses row and column scores when computing the Mantel-Haenszel chi-square, Pearson correlation, Cochran-Armitage test for trend, weighted kappa coefficient, and Cochran-Mantel-Haenszel statistics. The SCORES= option in the TABLES statement specifies the score type that PROC FREQ uses. The available score types are TABLE, RANK, RIDIT, and MODRIDIT scores. The default score type is TABLE.

For numeric variables, TABLE scores are the values of the row and column levels. If the row or column variables are formatted, then the TABLE score is the internal numeric value corresponding to that level. If two or more numeric values are classified into the same formatted level, then the internal numeric value for that level is the smallest of these values. For character variables, TABLE scores are defined as the row numbers and column numbers (that is, 1 for the first row, 2 for the second row, and so on).

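RANK scores, which you can use to obtain nonparametric analyses, are defined by

$$
\begin{aligned}
R1_i &= \sum_{k<i} n_{k\cdot} + \frac{n_{i\cdot} + 1}{2}, \qquad i = 1, 2, \ldots, R \\
C1_j &= \sum_{l<j} n_{\cdot l} + \frac{n_{\cdot j} + 1}{2}, \qquad j = 1, 2, \ldots, C
\end{aligned}
$$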

Note that RANK scores yield midranks for tied values.

RIDIT scores (Bross 1958; Mack and Skillings 1980) also yield nonparametric analyses, but they are standardized by the sample size. RIDIT scores are derived from RANK scores as

$$
R2_i = R1_i / n, \qquad C2_j = C1_j / n
$$

Modified ridit (MODRIDIT) scores (van Elteren 1960; Lehmann 1975), which also yield nonparametric analyses, represent the expected values of the order statistics for the uniform distribution on (0,1). Modified ridit scores are derived from RANK scores as

$$
R3_i = R1_i / (n + 1), \qquad C3_j = C1_j / (n + 1)
$$
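The score definitions above can be illustrated with a minimal Python sketch. The function names and the marginal totals are hypothetical, chosen only for illustration; the code is not part of PROC FREQ. It assumes the midrank form of the RANK score (the count of observations in lower levels plus half of the level's own count plus one half).

```python
def rank_scores(totals):
    """Midrank (RANK) scores computed from marginal totals of one variable."""
    scores, below = [], 0
    for t in totals:
        # observations in lower levels, plus the midpoint of this level's ranks
        scores.append(below + (t + 1) / 2)
        below += t
    return scores


def ridit_scores(totals):
    """RIDIT scores: RANK scores divided by the sample size n."""
    n = sum(totals)
    return [s / n for s in rank_scores(totals)]


def modridit_scores(totals):
    """MODRIDIT scores: RANK scores divided by n + 1."""
    n = sum(totals)
    return [s / (n + 1) for s in rank_scores(totals)]


row_totals = [3, 5, 2]  # hypothetical row totals, n = 10
print(rank_scores(row_totals))      # [2.0, 6.0, 9.5]
print(ridit_scores(row_totals))
print(modridit_scores(row_totals))
```

With totals 3, 5, and 2, the three levels occupy ranks 1-3, 4-8, and 9-10, so the midranks are 2, 6, and 9.5.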

When you specify the CHISQ option in the TABLES statement, PROC FREQ performs the following chi-square tests for each two-way table: Pearson chi-square, continuity-adjusted chi-square for 2×2 tables, likelihood-ratio chi-square, Mantel-Haenszel chi-square, and Fisher's exact test for 2×2 tables. Also, PROC FREQ computes the following statistics derived from the Pearson chi-square: the phi coefficient, the contingency coefficient, and Cramer's V. PROC FREQ computes Fisher's exact test for general tables when you specify the FISHER option in the TABLES statement, or, equivalently, when you specify the FISHER option in the EXACT statement.

For one-way frequency tables, PROC FREQ performs a chi-square goodness-of-fit test when you specify the CHISQ option. See Chi-Square Test for One-Way Tables for information. The other chi-square tests and statistics described in this section are defined only for two-way tables, and so are not computed for one-way frequency tables.

All the two-way test statistics described in this section test the null hypothesis of no association between the row variable and the column variable. When the sample size is large, these test statistics are distributed approximately as chi-square when the null hypothesis is true. When the sample size is not large, exact tests may be useful. PROC FREQ computes exact tests for the following chi-square statistics when you specify the corresponding option in the EXACT statement: Pearson chi-square, likelihood-ratio chi-square, and Mantel-Haenszel chi-square. See Exact Statistics for more information.

Note that the Mantel-Haenszel chi-square statistic is appropriate only when both variables lie on an ordinal scale. The other chi-square tests and statistics in this section are appropriate for either nominal or ordinal variables. The following sections give the formulas that PROC FREQ uses to compute the chi-square tests and statistics. For further information on the formulas and on the applicability of each statistic, refer to Agresti (1996), Stokes et al. (1995), and the other references cited for each statistic.

### Chi-Square Test for One-Way Tables

For one-way frequency tables, the CHISQ option in the TABLES statement computes a chi-square goodness-of-fit test. Let $C$ denote the number of classes, or levels, in the one-way table. Let $f_i$ denote the frequency of class $i$ (or the number of observations in class $i$), for $i = 1, 2, \ldots, C$. Then PROC FREQ computes the chi-square statistic as

$$
Q_P = \sum_{i=1}^{C} \frac{(f_i - e_i)^2}{e_i}
$$

where $e_i$ is the expected frequency for class $i$ under the null hypothesis.

In the test for equal proportions, which is the default for the CHISQ option, the null hypothesis specifies equal proportions of the total sample size for each class. Under this null hypothesis, the expected frequency for each class equals the total sample size divided by the number of classes,

$$
e_i = n / C \qquad \text{for } i = 1, 2, \ldots, C
$$

In the test for specified frequencies, which PROC FREQ computes when you input null hypothesis frequencies using the TESTF= option, the expected frequencies are those TESTF= values. In the test for specified proportions, which PROC FREQ computes when you input null hypothesis proportions using the TESTP= option, the expected frequencies are determined from the TESTP= proportions $p_i$ as

$$
e_i = p_i \times n / 100 \qquad \text{for } i = 1, 2, \ldots, C
$$

where the $p_i$ are expressed as percentages (for proportions that sum to 1, this is $e_i = p_i \times n$).

Under the null hypothesis (of equal proportions, specified frequencies, or specified proportions), this test statistic has an asymptotic chi-square distribution with $C - 1$ degrees of freedom. In addition to the asymptotic test, PROC FREQ computes the exact one-way chi-square test when you specify the CHISQ option in the EXACT statement.
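As a rough illustration of the goodness-of-fit computation, the following Python sketch evaluates the statistic for equal proportions or for user-supplied proportions. The function name `one_way_chisq` and the frequencies are hypothetical, and the sketch assumes null proportions given on the 0-1 scale.

```python
def one_way_chisq(freqs, test_props=None):
    """Return (chi-square statistic, degrees of freedom) for a one-way table.

    test_props, if given, holds null-hypothesis proportions that sum to 1
    (an assumption of this sketch); otherwise equal proportions are tested.
    """
    n = sum(freqs)
    c = len(freqs)
    if test_props is None:
        expected = [n / c] * c                    # equal proportions
    else:
        expected = [p * n for p in test_props]    # specified proportions
    q = sum((f - e) ** 2 / e for f, e in zip(freqs, expected))
    return q, c - 1


freqs = [10, 20, 30]  # hypothetical class frequencies
print(one_way_chisq(freqs))                       # (10.0, 2)
print(one_way_chisq(freqs, [0.25, 0.25, 0.5]))
```

For the equal-proportions case each expected frequency is 20, so the statistic is (100 + 0 + 100) / 20 = 10 on 2 degrees of freedom.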

### Chi-Square Test for Two-Way Tables

The Pearson chi-square statistic for two-way tables involves the differences between the observed and expected frequencies, where the expected frequencies are computed under the null hypothesis of independence. The chi-square statistic is computed as

$$
Q_P = \sum_{i} \sum_{j} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}
$$

where

$$
e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}
$$

When the row and column variables are independent, $Q_P$ has an asymptotic chi-square distribution with $(R-1)(C-1)$ degrees of freedom. For large values of $Q_P$, this test rejects the null hypothesis in favor of the alternative hypothesis of general association. In addition to the asymptotic test, PROC FREQ computes the exact chi-square test when you specify the PCHI option or CHISQ option in the EXACT statement.

For a 2×2 table, the Pearson chi-square is also appropriate for testing the equality of two binomial proportions or, for $R \times 2$ and $2 \times C$ tables, the homogeneity of proportions. Refer to Fienberg (1980).
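The Pearson statistic can be sketched in a few lines of Python. The function names and the 2×2 table are hypothetical; the code only mirrors the expected-frequency and chi-square formulas in this section and does not compute p-values.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def pearson_chisq(table):
    """Return (Pearson chi-square, degrees of freedom) for a two-way table."""
    expected = expected_frequencies(table)
    q = sum(
        (n_ij - e_ij) ** 2 / e_ij
        for row, e_row in zip(table, expected)
        for n_ij, e_ij in zip(row, e_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q, df


table = [[10, 20], [30, 40]]  # hypothetical 2x2 frequencies
print(expected_frequencies(table))  # [[12.0, 18.0], [28.0, 42.0]]
print(pearson_chisq(table))
```

For this table every observed count differs from its expected count by 2, giving a statistic of 4/12 + 4/18 + 4/28 + 4/42 = 50/63 on 1 degree of freedom.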

### Likelihood-Ratio Chi-Square Test

The likelihood-ratio chi-square statistic involves the ratios between the observed and expected frequencies. The statistic is computed as

$$
G^2 = 2 \sum_{i} \sum_{j} n_{ij} \ln\!\left(\frac{n_{ij}}{e_{ij}}\right)
$$

When the row and column variables are independent, $G^2$ has an asymptotic chi-square distribution with $(R-1)(C-1)$ degrees of freedom. In addition to the asymptotic test, PROC FREQ computes the exact test when you specify the LRCHI option or the CHISQ option in the EXACT statement.
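A small Python sketch of the likelihood-ratio statistic follows. The function name and data are hypothetical. Treating empty cells as contributing zero is an assumption of this sketch (the limit of n ln n as n goes to 0), not something this section states.

```python
import math


def lr_chisq(table):
    """Return (likelihood-ratio chi-square G^2, degrees of freedom)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # assumption: empty cells contribute nothing
                e_ij = row_tot[i] * col_tot[j] / n
                total += n_ij * math.log(n_ij / e_ij)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2 * total, df


print(lr_chisq([[10, 20], [30, 40]]))   # hypothetical table
print(lr_chisq([[10, 20], [20, 40]]))   # rows proportional, so G^2 is 0
```

When the observed counts match the expected counts exactly, every log ratio is zero and the statistic vanishes.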

### Continuity-Adjusted Chi-Square Test

The continuity-adjusted chi-square statistic for 2×2 tables is similar to the Pearson chi-square, except that it is adjusted for the continuity of the chi-square distribution. The continuity-adjusted chi-square is most useful for small sample sizes. The use of the continuity adjustment is controversial; this chi-square test is more conservative, and more like Fisher's exact test, when your sample size is small. As the sample size increases, the statistic becomes more and more like the Pearson chi-square. The statistic is computed as

$$
Q_C = \sum_{i} \sum_{j} \frac{\left[\max\left(0,\; |n_{ij} - e_{ij}| - 0.5\right)\right]^2}{e_{ij}}
$$

Under the null hypothesis of independence, $Q_C$ has an asymptotic chi-square distribution with $(R-1)(C-1)$ degrees of freedom.

### Mantel-Haenszel Chi-Square Test

The Mantel-Haenszel chi-square statistic tests the alternative hypothesis that there is a linear association between the row variable and the column variable. Both variables must lie on an ordinal scale. The statistic is computed as

$$
Q_{MH} = (n - 1)\, r^2
$$

where $r$ is the Pearson correlation between the row variable and the column variable. For a description of the Pearson correlation, see Pearson Correlation Coefficient. The Pearson correlation, and thus the Mantel-Haenszel chi-square statistic, use the scores you specify in the SCORES= option in the TABLES statement.

Under the null hypothesis of no association, $Q_{MH}$ has an asymptotic chi-square distribution with 1 degree of freedom. In addition to the asymptotic test, PROC FREQ computes the exact test when you specify the MHCHI option or the CHISQ option in the EXACT statement.

Refer to Mantel and Haenszel (1959) and Landis et al. (1978).

### Fisher's Exact Test

For 2×2 tables, Fisher's exact test is the probability of observing a table that gives at least as much evidence of association as the one actually observed, given that the null hypothesis is true. The row and column margins are assumed to be fixed. The hypergeometric probability, $p$, of every possible table is computed, and the p-value is defined as

$$
\mathrm{PROB} = \sum_{A} p
$$

For a two-sided alternative hypothesis, $A$ is the set of tables with $p$ less than or equal to the probability of the observed table. A small two-sided p-value supports the alternative hypothesis of association between the row and column variables.

One-sided tests are defined in terms of the frequency of the cell in the first row and first column (the (1,1) cell). For a left-sided alternative hypothesis, A is the set of tables where the frequency in the (1,1) cell is less than or equal to that of the observed table. A small left-sided p-value supports the alternative hypothesis that the probability of an observation being in the first cell is less than that expected under the null hypothesis of independent row and column variables.

Similarly, for a right-sided alternative hypothesis, A is the set of tables where the frequency in the (1,1) cell is greater than or equal to that of the observed table. A small right-sided p-value supports the alternative that the probability of the first cell is greater than that expected under the null hypothesis.

Because the (1,1) cell frequency completely determines the 2×2 table when the marginal row and column sums are fixed, these one-sided alternatives can be equivalently stated in terms of other cell probabilities or ratios of cell probabilities. The left-sided alternative is equivalent to an odds ratio less than 1, and the right-sided alternative is equivalent to an odds ratio greater than 1, where the odds ratio equals $n_{11} n_{22} / (n_{12} n_{21})$. Additionally, the left-sided alternative is equivalent to the column 1 risk for row 1 being less than the column 1 risk for row 2, $p_{1|1} < p_{1|2}$. Similarly, the right-sided alternative is equivalent to the column 1 risk for row 1 being greater than the column 1 risk for row 2, $p_{1|1} > p_{1|2}$. Refer to Agresti (1996).

Fisher's exact test was extended to general $R \times C$ tables by Freeman and Halton (1951), and this test is also known as the Freeman-Halton test. For $R \times C$ tables, the two-sided p-value is defined the same as it is for 2×2 tables. $A$ is the set of all tables with $p$ less than or equal to the probability of the observed table. A small p-value supports the alternative hypothesis of association between the row and column variables. For $R \times C$ tables, Fisher's exact test is inherently two-sided. The alternative hypothesis is defined only in terms of general, and not linear, association. Therefore, PROC FREQ does not compute right-sided or left-sided p-values for general $R \times C$ tables.

For $R \times C$ tables, PROC FREQ computes Fisher's exact test using the network algorithm of Mehta and Patel (1983), which provides a faster and more efficient solution than direct enumeration. See Exact Statistics for more information.
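For a 2×2 table the p-value definitions above can be checked by direct enumeration. The Python sketch below is only an illustration: it is not the network algorithm, the function name and data are hypothetical, and the small relative tolerance used when comparing table probabilities is an assumption added to guard against floating-point ties.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (left-sided, right-sided, two-sided) p-values for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # hypergeometric probability that the (1,1) cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible (1,1) frequencies
    p_obs = prob(a)
    left = sum(prob(x) for x in range(lo, a + 1))
    right = sum(prob(x) for x in range(a, hi + 1))
    # tolerance factor is an assumption of this sketch
    two_sided = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    return left, right, two_sided


print(fisher_exact_2x2([[3, 1], [1, 3]]))  # hypothetical table
```

For the table shown, the (1,1) cell can take values 0 through 4 with probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the right-sided p-value is 17/70 and the two-sided p-value is 34/70.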

### Phi Coefficient

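The phi coefficient is a measure of association derived from the Pearson chi-square statistic. It has the range $-1 \le \phi \le 1$ for 2×2 tables. Otherwise, the range is $0 \le \phi \le \min(\sqrt{R-1}, \sqrt{C-1})$ (Liebetrau, 1983). The phi coefficient is computed as

$$
\phi =
\begin{cases}
\dfrac{n_{11} n_{22} - n_{12} n_{21}}{\sqrt{n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}}} & \text{for 2×2 tables} \\[2ex]
\sqrt{Q_P / n} & \text{otherwise}
\end{cases}
$$

Refer to Fleiss (1981, pp. 59-60).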

### Contingency Coefficient

The contingency coefficient is a measure of association derived from the Pearson chi-square. It has the range $0 \le P \le \sqrt{(m-1)/m}$, where $m = \min(R, C)$ (Liebetrau, 1983). The contingency coefficient is computed as

$$
P = \sqrt{\frac{Q_P}{Q_P + n}}
$$

Refer to Kendall and Stuart (1979, pp. 587-588).

### Cramer's V

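Cramer's V is a measure of association derived from the Pearson chi-square. It is designed so that the attainable upper bound is always 1. It has the range $-1 \le V \le 1$ for 2×2 tables; otherwise, the range is $0 \le V \le 1$. Cramer's V is computed as

$$
V =
\begin{cases}
\phi & \text{for 2×2 tables} \\[1ex]
\sqrt{\dfrac{Q_P / n}{\min(R-1,\; C-1)}} & \text{otherwise}
\end{cases}
$$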

Refer to Kendall and Stuart (1979, p. 588).
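The three chi-square-based measures can be computed together, as in this hedged Python sketch. The function name `chisq_measures` and the example table are hypothetical; the signed 2×2 form of phi is used when the table has two rows and two columns.

```python
import math


def chisq_measures(table):
    """Return (phi, contingency coefficient, Cramer's V) for a two-way table."""
    rows, cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Pearson chi-square from observed and expected frequencies
    q_p = sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
        / (row_tot[i] * col_tot[j] / n)
        for i in range(rows)
        for j in range(cols)
    )
    if rows == 2 and cols == 2:
        num = table[0][0] * table[1][1] - table[0][1] * table[1][0]
        phi = num / math.sqrt(row_tot[0] * row_tot[1] * col_tot[0] * col_tot[1])
        v = phi                      # signed value for 2x2 tables
    else:
        phi = math.sqrt(q_p / n)
        v = math.sqrt(q_p / n / min(rows - 1, cols - 1))
    cont = math.sqrt(q_p / (q_p + n))
    return phi, cont, v


print(chisq_measures([[10, 20], [30, 40]]))  # hypothetical table
```

For a 2×2 table the square of phi equals the Pearson chi-square divided by n, which gives a quick consistency check on the two branches.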

When you specify the MEASURES option in the TABLES statement, PROC FREQ computes several statistics that describe the association between the two variables of the contingency table. The following are measures of ordinal association that consider whether the variable Y tends to increase as X increases: gamma, Kendall's tau-b, Stuart's tau-c, and Somers' D. These measures are appropriate for ordinal variables, and classify pairs of observations as concordant or discordant. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y. Refer to Agresti (1996) and the other references cited in the discussion of each measure of association.

The Pearson correlation coefficient and the Spearman rank correlation coefficient are also appropriate for ordinal variables. The Pearson correlation describes the strength of the linear association between the row and column variables, and is computed using the row and column scores specified by the SCORES= option in the TABLES statement. The Spearman correlation is computed with rank scores. The polychoric correlation (requested by the PLCORR option) also requires ordinal variables, and assumes that the variables have an underlying bivariate normal distribution. The following measures of association do not require ordinal variables, but are appropriate for nominal variables: lambda asymmetric and symmetric, and the uncertainty coefficients.

PROC FREQ computes estimates of the measures according to the formulas given in the discussion of each measure of association. For each measure, PROC FREQ computes an asymptotic standard error, which is the square root of the asymptotic variance denoted by var in the following sections.

### Confidence Bounds

If you specify the CL option in the TABLES statement, PROC FREQ computes asymptotic confidence bounds for all MEASURES statistics. The confidence coefficient is determined according to the value of the ALPHA= option, which by default equals 0.05 and produces 95 percent confidence bounds. The confidence bounds are computed as

$$
\mathrm{est} \pm \left( z_{\alpha/2} \times \mathrm{ASE} \right)
$$

where $\mathrm{est}$ is the estimate of the measure, $z_{\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution, and ASE is the asymptotic standard error of the estimate.

### Asymptotic Tests

For each measure that you specify in the TEST statement, PROC FREQ computes an asymptotic test of the null hypothesis that the measure equals zero. Asymptotic tests are available for the following measures of association: gamma, Kendall's tau-b, Stuart's tau-c, Somers' $D(C|R)$, Somers' $D(R|C)$, the Pearson correlation coefficient, and the Spearman rank correlation coefficient. To compute an asymptotic test, PROC FREQ uses a standardized test statistic $z$, which has an asymptotic standard normal distribution under the null hypothesis. The standardized test statistic is computed as

$$
z = \frac{\mathrm{est}}{\sqrt{\mathrm{var}_0(\mathrm{est})}}
$$

where $\mathrm{est}$ is the estimate of the measure, and $\mathrm{var}_0(\mathrm{est})$ is the variance of the estimate under the null hypothesis. Formulas for $\mathrm{var}_0(\mathrm{est})$ are given in the discussion of each measure of association.

Note that the ratio of $\mathrm{est}$ to $\sqrt{\mathrm{var}_0(\mathrm{est})}$ is the same for the following measures: gamma, Kendall's tau-b, Stuart's tau-c, Somers' $D(C|R)$, and Somers' $D(R|C)$. Therefore, the tests for these measures are identical. For example, the p-values for the test of $H_0$: gamma = 0 equal the p-values for the test of $H_0$: tau-b = 0.

PROC FREQ computes one-sided and two-sided p-values for each of these tests. When the test statistic z is greater than its null hypothesis expected value of zero, PROC FREQ computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis that the true value of the measure is greater than zero. When the test statistic is less than or equal to zero, PROC FREQ computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. A small left-sided p-value supports the alternative hypothesis that the true value of the measure is less than zero. The one-sided p-value can be expressed as

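$$
P_1 =
\begin{cases}
\mathrm{Prob}(Z > z) & \text{if } z > 0 \\
\mathrm{Prob}(Z < z) & \text{if } z \le 0
\end{cases}
$$

where $Z$ has a standard normal distribution. The two-sided p-value is computed as

$$
P_2 = \mathrm{Prob}(|Z| > |z|)
$$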

### Exact Tests

Exact tests are available for two measures of association, the Pearson correlation coefficient and the Spearman rank correlation coefficient. If you specify the PCORR option in the EXACT statement, PROC FREQ computes the exact test of the hypothesis that the Pearson correlation equals zero. If you specify the SCORR option in the EXACT statement, PROC FREQ computes the exact test of the hypothesis that the Spearman correlation equals zero. See Exact Statistics for information on exact tests.

### Gamma

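The estimator of gamma is based only on the number of concordant and discordant pairs of observations. It ignores tied pairs (that is, pairs of observations that have equal values of X or equal values of Y). Gamma is appropriate only when both variables lie on an ordinal scale. It has the range $-1 \le \Gamma \le 1$. If the two variables are independent, then the estimator of gamma tends to be close to zero. Gamma is estimated by

$$
G = \frac{P - Q}{P + Q}
$$

with

$$
\mathrm{var} = \frac{16}{(P + Q)^4} \sum_{i} \sum_{j} n_{ij} \left( Q A_{ij} - P D_{ij} \right)^2
$$

The variance of the estimator under the null hypothesis that gamma equals zero is computed as

$$
\mathrm{var}_0(G) = \frac{4}{(P + Q)^2} \left( \sum_{i} \sum_{j} n_{ij} (A_{ij} - D_{ij})^2 - \frac{(P - Q)^2}{n} \right)
$$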

For 2×2 tables, gamma is equivalent to Yule's Q. Refer to Goodman and Kruskal (1963; 1972), Brown and Benedetti (1977), and Agresti (1990).
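The following Python sketch shows one way gamma and its standard errors could be computed from the concordance and discordance sums. The function names and data are hypothetical, and the variance expressions are the standard large-sample forms (Brown and Benedetti 1977) written in terms of $A_{ij}$, $D_{ij}$, $P$, and $Q$; treat it as an illustration rather than PROC FREQ's implementation.

```python
import math


def concordance_sums(table):
    """Return matrices A and D of concordant and discordant cell sums."""
    n_rows, n_cols = len(table), len(table[0])
    a = [[0] * n_cols for _ in range(n_rows)]
    d = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_rows):
                for l in range(n_cols):
                    sign = (k - i) * (l - j)
                    if sign > 0:        # both indices larger or both smaller
                        a[i][j] += table[k][l]
                    elif sign < 0:      # one larger, the other smaller
                        d[i][j] += table[k][l]
    return a, d


def gamma_stat(table):
    """Return (gamma, asymptotic standard error, null-hypothesis variance)."""
    a, d = concordance_sums(table)
    cells = [
        (table[i][j], a[i][j], d[i][j])
        for i in range(len(table))
        for j in range(len(table[0]))
    ]
    n = sum(n_ij for n_ij, _, _ in cells)
    p = sum(n_ij * a_ij for n_ij, a_ij, _ in cells)   # twice the concordances
    q = sum(n_ij * d_ij for n_ij, _, d_ij in cells)   # twice the discordances
    g = (p - q) / (p + q)
    var = 16 / (p + q) ** 4 * sum(
        n_ij * (q * a_ij - p * d_ij) ** 2 for n_ij, a_ij, d_ij in cells
    )
    var0 = 4 / (p + q) ** 2 * (
        sum(n_ij * (a_ij - d_ij) ** 2 for n_ij, a_ij, d_ij in cells)
        - (p - q) ** 2 / n
    )
    return g, math.sqrt(var), var0


print(gamma_stat([[10, 20], [30, 40]]))  # hypothetical 2x2 table
```

For this 2×2 table $P = 800$ and $Q = 1200$, so gamma is $-0.2$, which matches Yule's Q, $(400 - 600) / (400 + 600)$.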

### Kendall's Tau-b

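Kendall's tau-b is similar to gamma except that tau-b uses a correction for ties. Tau-b is appropriate only when both variables lie on an ordinal scale. Tau-b has the range $-1 \le \tau_b \le 1$. It is estimated by

$$
t_b = \frac{P - Q}{\sqrt{w_r w_c}}
$$

with

$$
\mathrm{var} = \frac{1}{w^4} \left( \sum_{i} \sum_{j} n_{ij} \left( 2 w d_{ij} + t_b v_{ij} \right)^2 - n^3 t_b^2 (w_r + w_c)^2 \right)
$$

where

$$
\begin{aligned}
w &= \sqrt{w_r w_c} \\
w_r &= n^2 - \sum_{i} n_{i\cdot}^2 \\
w_c &= n^2 - \sum_{j} n_{\cdot j}^2 \\
d_{ij} &= A_{ij} - D_{ij} \\
v_{ij} &= n_{i\cdot} w_c + n_{\cdot j} w_r
\end{aligned}
$$

The variance of the estimator under the null hypothesis that tau-b equals zero is computed as

$$
\mathrm{var}_0(t_b) = \frac{4}{w_r w_c} \left( \sum_{i} \sum_{j} n_{ij} (A_{ij} - D_{ij})^2 - \frac{(P - Q)^2}{n} \right)
$$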

Refer to Kendall (1955) and Brown and Benedetti (1977).
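A compact Python sketch of the tau-b estimate and its null-hypothesis variance is given below. The function name and table are hypothetical, and the formulas are the standard tie-corrected forms using $w_r = n^2 - \sum_i n_{i\cdot}^2$ and $w_c = n^2 - \sum_j n_{\cdot j}^2$; it is an illustration under those assumptions.

```python
import math


def kendall_tau_b(table):
    """Return (tau-b estimate, variance under the null hypothesis)."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(map(sum, table))
    p_minus_q = 0.0   # P - Q
    sum_sq = 0.0      # sum of n_ij * (A_ij - D_ij)^2
    for i in range(n_rows):
        for j in range(n_cols):
            a = d = 0
            for k in range(n_rows):
                for l in range(n_cols):
                    sign = (k - i) * (l - j)
                    if sign > 0:
                        a += table[k][l]     # concordant with cell (i, j)
                    elif sign < 0:
                        d += table[k][l]     # discordant with cell (i, j)
            p_minus_q += table[i][j] * (a - d)
            sum_sq += table[i][j] * (a - d) ** 2
    w_r = n ** 2 - sum(sum(row) ** 2 for row in table)
    w_c = n ** 2 - sum(sum(col) ** 2 for col in zip(*table))
    tau_b = p_minus_q / math.sqrt(w_r * w_c)
    var0 = 4 / (w_r * w_c) * (sum_sq - p_minus_q ** 2 / n)
    return tau_b, var0


tau_b, var0 = kendall_tau_b([[10, 20], [30, 40]])  # hypothetical table
print(tau_b, tau_b / math.sqrt(var0))
```

The standardized statistic printed here equals the one obtained from gamma for the same table, consistent with the statement under Asymptotic Tests that the tests for these measures are identical.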

### Stuart's Tau-c

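Stuart's tau-c makes an adjustment for table size in addition to a correction for ties. Tau-c is appropriate only when both variables lie on an ordinal scale. Tau-c has the range $-1 \le \tau_c \le 1$. It is estimated by

$$
t_c = \frac{m (P - Q)}{n^2 (m - 1)}
$$

with

$$
\mathrm{var} = \frac{4 m^2}{(m - 1)^2 n^4} \left( \sum_{i} \sum_{j} n_{ij} d_{ij}^2 - \frac{(P - Q)^2}{n} \right)
$$

where

$$
m = \min(R, C), \qquad d_{ij} = A_{ij} - D_{ij}
$$

The variance of the estimator under the null hypothesis that tau-c equals zero is the same as in the above equation.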

Refer to Brown and Benedetti (1977).

### Somers' D

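Somers' $D(C|R)$ and Somers' $D(R|C)$ are asymmetric modifications of tau-b. $C|R$ denotes that the row variable X is regarded as an independent variable, while the column variable Y is regarded as dependent. Similarly, $R|C$ denotes that the column variable Y is regarded as an independent variable, while the row variable X is regarded as dependent. Somers' D differs from tau-b in that it uses a correction only for pairs that are tied on the independent variable. Somers' D is appropriate only when both variables lie on an ordinal scale. It has the range $-1 \le D \le 1$. Formulas for Somers' $D(R|C)$ are obtained by interchanging the indices. Somers' $D(C|R)$ is estimated by

$$
D(C|R) = \frac{P - Q}{w_r}
$$

with

$$
\mathrm{var} = \frac{4}{w_r^4} \sum_{i} \sum_{j} n_{ij} \left( w_r d_{ij} - (P - Q)(n - n_{i\cdot}) \right)^2
$$

where

$$
w_r = n^2 - \sum_{i} n_{i\cdot}^2, \qquad d_{ij} = A_{ij} - D_{ij}
$$

The variance of the estimator under the null hypothesis that $D(C|R)$ equals zero is computed as

$$
\mathrm{var}_0(D(C|R)) = \frac{4}{w_r^2} \left( \sum_{i} \sum_{j} n_{ij} (A_{ij} - D_{ij})^2 - \frac{(P - Q)^2}{n} \right)
$$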

Refer to Somers (1962) and Goodman and Kruskal (1972).

### Pearson Correlation Coefficient

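PROC FREQ computes the Pearson correlation coefficient using the scores specified in the SCORES= option. The Pearson correlation is appropriate only when both variables lie on an ordinal scale. It has the range $-1 \le \rho \le 1$. The Pearson correlation coefficient is computed as

$$
r = \frac{v}{w} = \frac{ss_{rc}}{\sqrt{ss_r\, ss_c}}
$$

with

$$
\mathrm{var} = \frac{1}{w^4} \sum_{i} \sum_{j} n_{ij} \left( w (R_i - \bar{R})(C_j - \bar{C}) - \frac{b_{ij}\, v}{2w} \right)^2
$$

The row scores $R_i$ and the column scores $C_j$ are determined by the SCORES= option in the TABLES statement and by

$$
\begin{aligned}
ss_r &= \sum_{i} \sum_{j} n_{ij} (R_i - \bar{R})^2 \\
ss_c &= \sum_{i} \sum_{j} n_{ij} (C_j - \bar{C})^2 \\
ss_{rc} &= \sum_{i} \sum_{j} n_{ij} (R_i - \bar{R})(C_j - \bar{C}) \\
b_{ij} &= (R_i - \bar{R})^2\, ss_c + (C_j - \bar{C})^2\, ss_r \\
v &= ss_{rc} \\
w &= \sqrt{ss_r\, ss_c}
\end{aligned}
$$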

Refer to Snedecor and Cochran (1989) and Brown and Benedetti (1977).

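To compute an asymptotic test for the Pearson correlation, PROC FREQ uses a standardized test statistic $r^*$, which has an asymptotic standard normal distribution under the null hypothesis. The standardized test statistic is computed as

$$
r^* = \frac{r}{\sqrt{\mathrm{var}_0(r)}}
$$

where $\mathrm{var}_0(r)$ is the variance of the correlation under the null hypothesis.

$$
\mathrm{var}_0(r) = \frac{\sum_{i} \sum_{j} n_{ij} (R_i - \bar{R})^2 (C_j - \bar{C})^2 - ss_{rc}^2 / n}{ss_r\, ss_c}
$$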

This asymptotic variance is derived for multinomial sampling in a contingency table framework, and it differs from the form obtained under the assumption that both variables are continuous and normally distributed. Refer to Brown and Benedetti (1977).

PROC FREQ also computes the exact test for the hypothesis that the Pearson correlation equals zero when you specify the PCORR option in the EXACT statement. See Exact Statistics for more information on exact tests.

### Spearman Rank Correlation Coefficient

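The Spearman correlation coefficient is computed using rank scores $R1_i$ and $C1_j$, defined in Scores. It is appropriate only when both variables lie on an ordinal scale. It has the range $-1 \le \rho_s \le 1$. The Spearman correlation coefficient is computed as

$$
r_s = \frac{v}{w}
$$

with

$$
\mathrm{var} = \frac{1}{n^2 w^4} \sum_{i} \sum_{j} n_{ij} (z_{ij} - \bar{z})^2
$$

where

$$
\begin{aligned}
v &= \sum_{i} \sum_{j} n_{ij}\, R(i)\, C(j) \\
w &= \frac{1}{12} \sqrt{F G} \\
F &= n^3 - \sum_{i} n_{i\cdot}^3 \\
G &= n^3 - \sum_{j} n_{\cdot j}^3 \\
R(i) &= R1_i - n/2 \\
C(j) &= C1_j - n/2 \\
\bar{z} &= \frac{1}{n} \sum_{i} \sum_{j} n_{ij} z_{ij} \\
z_{ij} &= w v_{ij} - v w_{ij} \\
v_{ij} &= n \left( R(i) C(j) + \frac{1}{2} \sum_{l} n_{il} C(l) + \frac{1}{2} \sum_{k} n_{kj} R(k) + \sum_{l} \sum_{k>i} n_{kl} C(l) + \sum_{k} \sum_{l>j} n_{kl} R(k) \right) \\
w_{ij} &= \frac{-n}{96 w} \left( F n_{\cdot j}^2 + G n_{i\cdot}^2 \right)
\end{aligned}
$$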

Refer to Snedecor and Cochran (1989) and Brown and Benedetti (1977).

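To compute an asymptotic test for the Spearman correlation, PROC FREQ uses a standardized test statistic $r_s^*$, which has an asymptotic standard normal distribution under the null hypothesis. The standardized test statistic is computed as

$$
r_s^* = \frac{r_s}{\sqrt{\mathrm{var}_0(r_s)}}
$$

where $\mathrm{var}_0(r_s)$ is the variance of the correlation under the null hypothesis.

$$
\mathrm{var}_0(r_s) = \frac{1}{n^2 w^2} \sum_{i} \sum_{j} n_{ij} (v_{ij} - \bar{v})^2
$$

where

$$
\bar{v} = \sum_{i} \sum_{j} n_{ij} v_{ij} / n
$$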

This asymptotic variance is derived for multinomial sampling in a contingency table framework, and it differs from the form obtained under the assumption that both variables are continuous. Refer to Brown and Benedetti (1977).

PROC FREQ also computes the exact test for the hypothesis that the Spearman rank correlation equals zero when you specify the SCORR option in the EXACT statement. See Exact Statistics for more information.

### Polychoric Correlation

When you specify the PLCORR option in the TABLES statement, PROC FREQ computes the polychoric correlation. This measure of association is based on the assumption that the ordered, categorical variables of the frequency table have an underlying bivariate normal distribution. For 2×2 tables, the polychoric correlation is also known as the tetrachoric correlation. Refer to Drasgow (1986) for an overview of polychoric correlation. The polychoric correlation coefficient is the maximum likelihood estimate of the product-moment correlation between the normal variables, estimating thresholds from the observed table frequencies. Olsson (1979) gives the likelihood equations and an asymptotic covariance matrix for the estimates.

To estimate the polychoric correlation, PROC FREQ iteratively solves the likelihood equations by a Newton-Raphson algorithm. Iteration stops when the convergence measure falls below the convergence criterion, or when the maximum number of iterations is reached, whichever occurs first. The CONVERGE= option sets the convergence criterion, and the default is 0.0001. The MAXITER= option sets the maximum number of iterations, and the default is 20.

### Lambda Asymmetric

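Asymmetric lambda, $\lambda(C|R)$, is interpreted as the probable improvement in predicting the column variable Y given knowledge of the row variable X. Asymmetric lambda has the range $0 \le \lambda(C|R) \le 1$. It is computed as

$$
\lambda(C|R) = \frac{\sum_{i} r_i - r}{n - r}
$$

with

$$
\mathrm{var} = \frac{n - \sum_{i} r_i}{(n - r)^3} \left( \sum_{i} r_i + r - 2 \sum_{i} (r_i \mid l_i = l) \right)
$$

where

$$
r_i = \max_{j}(n_{ij}), \qquad r = \max_{j}(n_{\cdot j}), \qquad c_j = \max_{i}(n_{ij})
$$

Also, let $l_i$ be the unique value of $j$ such that $r_i = n_{ij}$, and let $l$ be the unique value of $j$ such that $r = n_{\cdot j}$.

Because of the uniqueness assumptions, ties in the frequencies or in the marginal totals must be broken in an arbitrary but consistent manner. In case of ties, $l$ is defined here as the smallest value of $j$ such that $r = n_{\cdot j}$. For a given $i$, if there is at least one value $j$ such that $n_{ij} = r_i = c_j$, then $l_i$ is defined here to be the smallest such value of $j$. Otherwise, if $n_{il} = r_i$, then $l_i$ is defined to be equal to $l$. If neither condition is true, then $l_i$ is taken to be the smallest value of $j$ such that $n_{ij} = r_i$. The formulas for lambda asymmetric $(R|C)$ can be obtained by interchanging the indices.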

Refer to Goodman and Kruskal (1963).

### Lambda Symmetric

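The nondirectional lambda is the average of the two asymmetric lambdas. Lambda symmetric has the range $0 \le \lambda \le 1$. Lambda symmetric is defined as

$$
\lambda = \frac{\sum_{i} r_i + \sum_{j} c_j - r - c}{2n - r - c} = \frac{w - v}{w}
$$

with

$$
\mathrm{var} = \frac{1}{w^4} \left( w v y - 2 w^2 \left[ n - \sum_{i} \sum_{j} (n_{ij} \mid j = l_i,\ i = k_j) \right] - 2 v^2 (n - n_{kl}) \right)
$$

where

$$
\begin{aligned}
c_j &= \max_{i}(n_{ij}) \\
c &= \max_{i}(n_{i\cdot}) \\
w &= 2n - r - c \\
v &= 2n - \sum_{i} r_i - \sum_{j} c_j \\
x &= \sum_{i} (r_i \mid l_i = l) + \sum_{j} (c_j \mid k_j = k) + r_k + c_l \\
y &= 8n - w - v - 2x
\end{aligned}
$$

Here $k_j$ and $k$ are the row-index counterparts of $l_i$ and $l$, defined in the same way for lambda asymmetric $(R|C)$.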

Refer to Goodman and Kruskal (1963).

### Uncertainty Coefficient Asymmetric

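The uncertainty coefficient, $U(C|R)$, is the proportion of uncertainty (entropy) in the column variable Y that is explained by the row variable X. It has the range $0 \le U(C|R) \le 1$. The formulas for $U(R|C)$ are obtained by interchanging the indices.

$$
U(C|R) = \frac{H(X) + H(Y) - H(XY)}{H(Y)} = \frac{v}{w}
$$

with

$$
\mathrm{var} = \frac{1}{n^2 w^4} \sum_{i} \sum_{j} n_{ij} \left( H(Y) \ln\!\left(\frac{n_{ij}}{n_{i\cdot}}\right) + \left( H(X) - H(XY) \right) \ln\!\left(\frac{n_{\cdot j}}{n}\right) \right)^2
$$

where

$$
\begin{aligned}
v &= H(X) + H(Y) - H(XY) \\
w &= H(Y) \\
H(X) &= -\sum_{i} \frac{n_{i\cdot}}{n} \ln\!\left(\frac{n_{i\cdot}}{n}\right) \\
H(Y) &= -\sum_{j} \frac{n_{\cdot j}}{n} \ln\!\left(\frac{n_{\cdot j}}{n}\right) \\
H(XY) &= -\sum_{i} \sum_{j} \frac{n_{ij}}{n} \ln\!\left(\frac{n_{ij}}{n}\right)
\end{aligned}
$$

Refer to Theil (1972, pp. 115-120) and Goodman and Kruskal (1972).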

### Uncertainty Coefficient Symmetric

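The uncertainty coefficient, $U$, is the symmetric version of the two asymmetric coefficients. It has the range $0 \le U \le 1$. It is defined as

$$
U = \frac{2 \left( H(X) + H(Y) - H(XY) \right)}{H(X) + H(Y)}
$$

with

$$
\mathrm{var} = \frac{4 \sum_{i} \sum_{j} n_{ij} \left( H(XY) \ln\!\left(\dfrac{n_{i\cdot}\, n_{\cdot j}}{n^2}\right) - \left( H(X) + H(Y) \right) \ln\!\left(\dfrac{n_{ij}}{n}\right) \right)^2}{n^2 \left( H(X) + H(Y) \right)^4}
$$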

Refer to Goodman and Kruskal (1972).

When you specify the BINOMIAL option in the TABLES statement, PROC FREQ computes a binomial proportion for one-way tables. This is the proportion of observations for the first variable level, or class, that appears in the output.

$$
\hat{p} = n_1 / n
$$

where $n_1$ is the frequency for the first level, and $n$ is the total frequency for the one-way table. The standard error for the binomial proportion is computed as

$$
\mathrm{se}(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p}) / n}
$$

Using the normal approximation to the binomial distribution, PROC FREQ constructs asymptotic confidence bounds for $p$ according to

$$
\hat{p} \pm \left( z_{\alpha/2} \times \mathrm{se}(\hat{p}) \right)
$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution. The confidence level is determined by the ALPHA= option, which by default equals 0.05 and produces 95 percent confidence bounds. Additionally, PROC FREQ computes exact confidence bounds for the binomial proportion using the F distribution method given in Collett (1991) and also described by Leemis and Trivedi (1996).

PROC FREQ computes an asymptotic test of the hypothesis that the binomial proportion equals $p_0$, where the value of $p_0$ is specified by the P= option in the TABLES statement. If you do not specify a value for P=, PROC FREQ uses $p_0 = 0.5$ by default. The asymptotic test statistic is

$$
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}
$$

PROC FREQ computes one-sided and two-sided p-values for this test. When the test statistic $z$ is greater than its null hypothesis expected value of zero, PROC FREQ computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis that the true value of the proportion is greater than $p_0$. When the test statistic is less than or equal to zero, PROC FREQ computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. A small left-sided p-value supports the alternative hypothesis that the true value of the proportion is less than $p_0$. The one-sided p-value can be expressed as

$$
P_1 =
\begin{cases}
\mathrm{Prob}(Z > z) & \text{if } z > 0 \\
\mathrm{Prob}(Z < z) & \text{if } z \le 0
\end{cases}
$$

where $Z$ has a standard normal distribution. The two-sided p-value is computed as

$$
P_2 = \mathrm{Prob}(|Z| > |z|)
$$

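When you specify the BINOMIAL option in the EXACT statement, PROC FREQ also computes an exact test of the null hypothesis $H_0\colon p = p_0$. To compute this exact test, PROC FREQ uses the binomial probability function

$$
\mathrm{Prob}(X = x \mid p_0) = \binom{n}{x} p_0^{\,x} (1 - p_0)^{n - x}, \qquad x = 0, 1, 2, \ldots, n
$$

where the variable $X$ has a binomial distribution with parameters $n$ and $p_0$. To compute $\mathrm{Prob}(X \le n_1)$, PROC FREQ sums these binomial probabilities over $x$ from zero to $n_1$. To compute $\mathrm{Prob}(X \ge n_1)$, PROC FREQ sums these binomial probabilities over $x$ from $n_1$ to $n$. Then the exact one-sided p-value is

$$
P_1 = \min\left( \mathrm{Prob}(X \le n_1 \mid p_0),\ \mathrm{Prob}(X \ge n_1 \mid p_0) \right)
$$

and the exact two-sided p-value is

$$
P_2 = 2 \times P_1
$$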
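The binomial computations can be sketched in Python with only the standard library. The function name `binomial_tests`, the dictionary keys, and the example counts are hypothetical. The sketch assumes the usual definitions: the sample proportion $n_1/n$, a null standard error based on $p_0$, and an exact two-sided p-value taken as twice the smaller exact tail.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_tests(n1, n, p0=0.5, alpha=0.05):
    """Asymptotic and exact tests for a binomial proportion (illustrative)."""
    nd = NormalDist()
    p_hat = n1 / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # the one-sided tail in the direction of z equals the upper tail at |z|
    asym_one = 1 - nd.cdf(abs(z))

    def pmf(x):
        return comb(n, x) * p0 ** x * (1 - p0) ** (n - x)

    lower = sum(pmf(x) for x in range(0, n1 + 1))   # Prob(X <= n1)
    upper = sum(pmf(x) for x in range(n1, n + 1))   # Prob(X >= n1)
    exact_one = min(lower, upper)
    return {
        "p_hat": p_hat,
        "bounds": (p_hat - z_crit * se, p_hat + z_crit * se),
        "z": z,
        "asym_one_sided": asym_one,
        "asym_two_sided": 2 * asym_one,
        "exact_one_sided": exact_one,
        "exact_two_sided": 2 * exact_one,
    }


print(binomial_tests(7, 10))  # hypothetical: 7 of 10 observations in level 1
```

With 7 of 10 observations in the first level and $p_0 = 0.5$, the upper exact tail is (120 + 45 + 10 + 1) / 1024 = 176/1024.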

The RISKDIFF option in the TABLES statement provides estimates of risks (or binomial proportions) and risk differences for 2×2 tables. This analysis may be appropriate when you are comparing the proportion of some characteristic for two groups, where row 1 and row 2 correspond to the two groups, and the columns correspond to two possible characteristics or outcomes. For example, the row variable might be a treatment or dose, and the column variable might be the response. Refer to Collett (1991), Fleiss (1981), and Stokes et al. (1995).

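Let the frequencies of the 2×2 table be represented as follows:

| | Column 1 | Column 2 | Total |
|---|---|---|---|
| Row 1 | $n_{11}$ | $n_{12}$ | $n_{1\cdot}$ |
| Row 2 | $n_{21}$ | $n_{22}$ | $n_{2\cdot}$ |
| Total | $n_{\cdot 1}$ | $n_{\cdot 2}$ | $n$ |

The column 1 risk for row 1 is the proportion of row 1 observations classified in column 1,

$$
p_{1|1} = n_{11} / n_{1\cdot}
$$

This estimates the conditional probability of the column 1 response, given the first level of the row variable.

The column 1 risk for row 2 is the proportion of row 2 observations classified in column 1,

$$
p_{1|2} = n_{21} / n_{2\cdot}
$$

and the overall column 1 risk is the proportion of all observations classified in column 1,

$$
p_{\cdot 1} = n_{\cdot 1} / n
$$

The column 1 risk difference compares the risks for the two rows, and it is computed as the column 1 risk for row 1 minus the column 1 risk for row 2,

$$
(p\mathit{diff})_1 = p_{1|1} - p_{1|2}
$$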

The risks and risk difference are defined similarly for column 2.

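The standard error of the column 1 risk estimate for row $i$ is computed as

$$
\mathrm{se}(p_{1|i}) = \sqrt{p_{1|i} (1 - p_{1|i}) / n_{i\cdot}}
$$

The standard error of the overall column 1 risk estimate is computed as

$$
\mathrm{se}(p_{\cdot 1}) = \sqrt{p_{\cdot 1} (1 - p_{\cdot 1}) / n}
$$

If the two rows represent independent binomial samples, the standard error for the column 1 risk difference is computed as

$$
\mathrm{se}\left( (p\mathit{diff})_1 \right) = \sqrt{\mathrm{var}(p_{1|1}) + \mathrm{var}(p_{1|2})}
$$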

The standard errors are computed similarly for the column 2 risks and risk difference.

Using the normal approximation to the binomial distribution, PROC FREQ constructs asymptotic confidence bounds for the risks and risk differences according to

$$
\mathrm{est} \pm \left( z_{\alpha/2} \times \mathrm{se}(\mathrm{est}) \right)
$$

where $\mathrm{est}$ is the estimate, $z_{\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution, and $\mathrm{se}(\mathrm{est})$ is the standard error of the estimate. The confidence level is determined from the value of the ALPHA= option, which, by default, equals 0.05 and produces 95 percent confidence bounds.

PROC FREQ computes exact confidence bounds for the column 1, column 2, and overall risks using the F distribution method given in Collett (1991), and also described by Leemis and Trivedi (1996). PROC FREQ does not provide exact confidence bounds for the risk differences. Refer to Agresti (1992) for a discussion of issues involved in constructing exact confidence bounds for differences of proportions.
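The asymptotic risk difference computation can be illustrated with a short Python sketch. The function name and table are hypothetical, and the sketch assumes independent binomial samples in the two rows with nonzero row totals; it does not cover the exact bounds.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference(table, alpha=0.05):
    """Return (column 1 risk difference, standard error, asymptotic bounds)."""
    (n11, n12), (n21, n22) = table
    n1, n2 = n11 + n12, n21 + n22
    p1 = n11 / n1                      # column 1 risk for row 1
    p2 = n21 / n2                      # column 1 risk for row 2
    diff = p1 - p2
    # variances of the two row risks add under independent sampling
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, se, (diff - z * se, diff + z * se)


print(risk_difference([[10, 20], [30, 40]]))  # hypothetical table
```

For this table the row risks are 1/3 and 3/7, so the risk difference is about -0.095.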

### Odds Ratio (Case-Control Studies)

The odds ratio is a useful measure of association for a variety of study designs. For a retrospective design called a case-control study, the odds ratio can be used to estimate the relative risk when the probability of positive response is small (Agresti, 1990). In a case-control study, two independent samples are identified based on a binary (yes-no) response variable, and the conditional distribution of a binary explanatory variable is examined within fixed levels of the response variable. Refer to Stokes et al. (1995) and Agresti (1996).

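The odds of a positive response (column 1) in row 1 is $n_{11} / n_{12}$. Similarly, the odds of positive response in row 2 is $n_{21} / n_{22}$. The odds ratio is formed as the ratio of the row 1 odds to the row 2 odds. The odds ratio for 2×2 tables is defined as

$$
OR = \frac{n_{11} / n_{12}}{n_{21} / n_{22}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}
$$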

The odds ratio can be any nonnegative number. When the row and column variables are independent, the true value of the odds ratio equals 1. An odds ratio greater than 1 indicates that the odds of a positive response are higher in row 1 than in row 2. Values less than 1 indicate the odds of positive response are higher in row 2. The strength of association increases with the deviation from 1.

The transformation $G = (OR - 1)/(OR + 1)$ maps the odds ratio to the range $[-1, 1]$, such that $G = 0$ when $OR = 1$, $G = -1$ when $OR = 0$, and $G = 1$ when $OR$ is missing. $G$ is the gamma statistic, which PROC FREQ computes when you specify the MEASURES option.

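The asymptotic $100(1-\alpha)$ percent confidence bounds for the odds ratio are

$$\left( OR \times \exp(-z\sqrt{v}),\; OR \times \exp(z\sqrt{v}) \right)$$

where

$$v = \mathrm{var}(\ln OR) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$

and $z$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution. If any of the four cell frequencies are zero, the estimates are not computed.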
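As a worked illustration with hypothetical counts (not taken from the source), assume the usual log-scale variance $v = 1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}$ and take $n_{11} = 10$, $n_{12} = 20$, $n_{21} = 5$, $n_{22} = 40$ with $\alpha = 0.05$, so that $z \approx 1.96$:

$$OR = \frac{10 \times 40}{20 \times 5} = 4, \qquad v = 0.1 + 0.05 + 0.2 + 0.025 = 0.375, \qquad \sqrt{v} \approx 0.6124$$

$$\left( 4\, e^{-1.96 \times 0.6124},\; 4\, e^{1.96 \times 0.6124} \right) \approx (1.20,\; 13.28)$$

The interval is symmetric on the log scale, not on the original scale, which is why the upper bound lies much farther from 4 than the lower bound.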

When you specify the OR option in the EXACT statement, PROC FREQ computes exact confidence bounds for the odds ratio using an iterative algorithm based on that presented by Thomas (1971). Because this is a discrete problem, the confidence coefficient for these exact confidence bounds is not exactly $1-\alpha$, but is at least $1-\alpha$. Thus, these confidence bounds are conservative. Refer to Agresti (1992).

### Relative Risks (Cohort Studies)

These measures of relative risk are useful in cohort (prospective) study designs, where two samples are identified based on the presence or absence of an explanatory factor. The two samples are observed in future time for the binary (yes-no) response variable under study. Relative risk measures are also useful in cross-sectional studies, where two variables are observed simultaneously. Refer to Stokes et al. (1995) and Agresti (1996).

The column 1 relative risk is the ratio of the column 1 risks for row 1 to row 2. The column 1 risk for row 1 is the proportion of the row 1 observations classified in column 1,

$$p_{1|1} = \frac{n_{11}}{n_{1\cdot}}$$

Similarly, the column 1 risk for row 2 is

$$p_{1|2} = \frac{n_{21}}{n_{2\cdot}}$$

The column 1 relative risk is then computed as

$$RR_1 = \frac{p_{1|1}}{p_{1|2}}$$

A relative risk greater than 1 indicates that the probability of positive response is greater in row 1 than in row 2. Similarly, a relative risk that is less than 1 indicates that the probability of positive response is less in row 1 than in row 2. The strength of association increases with the deviation from 1.

The asymptotic $100(1-\alpha)$ percent confidence bounds for the column 1 relative risk are

$$\left( RR_1 \times \exp(-z\sqrt{v}),\; RR_1 \times \exp(z\sqrt{v}) \right)$$

where

$$v = \mathrm{var}(\ln RR_1) = \frac{1 - p_{1|1}}{n_{11}} + \frac{1 - p_{1|2}}{n_{21}}$$

and $z$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution. If either $n_{11}$ or $n_{21}$ is zero, PROC FREQ does not compute the relative risks.

The column 2 relative risk is computed similarly.
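The following sketch shows one way to request these measures for a single 2×2 table. The data set name, variable names, and counts are hypothetical; ORDER=DATA is used only so that the "Yes" levels form row 1 and column 1.

```sas
/* Hypothetical 2x2 cohort data: one record per cell, with a count */
data exposure_study;
   input exposure $ response $ count;
   datalines;
Yes Yes 10
Yes No  20
No  Yes  5
No  No  40
;
run;

proc freq data=exposure_study order=data;
   weight count;                         /* treat count as the cell frequency */
   tables exposure*response / relrisk;   /* odds ratio and relative risks */
   exact or;                             /* exact bounds for the odds ratio */
run;
```

Whether the odds ratio or a relative risk is the measure to report depends on the study design, as described above.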

The TREND option in the TABLES statement requests the Cochran-Armitage test for trend, which tests for trend in binomial proportions across levels of a single factor or covariate. This test is appropriate for a contingency table where one variable has two levels and the other variable is ordinal. The two-level variable represents the response, and the other variable represents an explanatory variable with ordered levels. When the contingency table has two columns and R rows, PROC FREQ tests for trend across the R levels of the row variable. When the table has two rows and C columns, PROC FREQ tests for trend across the C levels of the column variable.

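The trend test is based upon the regression coefficient for the weighted linear regression of the binomial proportions on the scores of the levels of the explanatory variable. Refer to Margolin (1988) and Agresti (1990). If the contingency table has two columns and R rows, the trend test statistic is computed as

$$T = \frac{\sum_{i=1}^{R} n_{i1}(R_i - \bar{R})}{\sqrt{p_{\cdot 1}(1 - p_{\cdot 1})\, s^2}}$$

where

$$s^2 = \sum_{i=1}^{R} n_{i\cdot}(R_i - \bar{R})^2$$

Here $R_i$ is the score for row $i$, $\bar{R}$ is the average row score, and $p_{\cdot 1} = n_{\cdot 1}/n$ is the overall column 1 proportion.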

The row scores are determined by the value of the SCORES= option in the TABLES statement. By default, PROC FREQ uses TABLE scores. For character variables, the TABLE scores for the row variable are the row numbers (for example, 1 for the first row, 2 for the second row, and so on). For numeric variables, the TABLE score for each row is the numeric value of the row level. When you perform the trend test, the explanatory variable may be numeric (for example, dose of a test substance), and these variable values may be appropriate scores. If the explanatory variable has ordinal levels that are not numeric, you can assign meaningful scores to the variable levels. Sometimes equidistant scores, such as the TABLE scores for a character variable, may be appropriate. For more information on choosing scores for the trend test, refer to Margolin (1988).

The null hypothesis for the Cochran-Armitage test is no trend, which means the binomial proportion is the same for all levels of the explanatory variable. Under this null hypothesis, the trend test statistic is asymptotically distributed as a standard normal random variable. In addition to this asymptotic test, PROC FREQ can compute the exact test for trend, which you request by specifying the TREND option in the EXACT statement. See the EXACT Statement for information on exact tests.

PROC FREQ computes one-sided and two-sided p-values for the trend test. When the test statistic is greater than its expected value of zero, PROC FREQ computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis of increasing trend in column 1 probability from row 1 to row R. When the test statistic is less than or equal to zero, PROC FREQ computes the left-sided p-value. A small left-sided p-value supports the alternative of decreasing trend. With $T$ denoting the observed value of the trend test statistic, the one-sided p-value $P_1$ can be expressed as

$$P_1 = \begin{cases} \mathrm{Prob}(\text{Trend Statistic} > T) & \text{if } T > 0 \\ \mathrm{Prob}(\text{Trend Statistic} < T) & \text{if } T \le 0 \end{cases}$$

The two-sided p-value $P_2$ is computed as

$$P_2 = \mathrm{Prob}(|\text{Trend Statistic}| > |T|)$$
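A minimal sketch of requesting the trend test follows. The data set, variable names, and counts are hypothetical. Because DOSE is numeric, the default TABLE scores are the dose values 0 through 3, and under the default ordering "Event" sorts before "None", so the event forms column 1.

```sas
/* Hypothetical dose-response counts: 4 dose levels by a binary outcome */
data dose_study;
   input dose outcome $ count;
   datalines;
0 Event  2
0 None  18
1 Event  4
1 None  16
2 Event  7
2 None  13
3 Event 10
3 None  10
;
run;

proc freq data=dose_study;
   weight count;
   tables dose*outcome / trend;   /* asymptotic Cochran-Armitage test */
   exact trend;                   /* exact version of the same test */
run;
```

In these illustrative data the event proportion rises with dose, so the statistic is positive and the one-sided p-value reported is the right-sided one.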

The JT option in the TABLES statement requests the Jonckheere-Terpstra test, which is a nonparametric test for ordered differences among classes. It tests the null hypothesis that the distribution of the response variable does not differ among classes. It is designed to detect alternatives of ordered class differences, which can be expressed as $\tau_1 \le \tau_2 \le \cdots \le \tau_R$ (or $\tau_1 \ge \tau_2 \ge \cdots \ge \tau_R$), with at least one of the inequalities being strict, where $\tau_i$ denotes the effect of class $i$. For such ordered alternatives, the Jonckheere-Terpstra test can be preferable to tests of more general class difference alternatives, such as the Kruskal-Wallis test (requested by the WILCOXON option in the NPAR1WAY procedure). Refer to Pirie (1983) and Hollander and Wolfe (1973) for more information about the Jonckheere-Terpstra test.

The Jonckheere-Terpstra test is appropriate for a contingency table where an ordinal column variable represents the response. The row variable, which can be nominal or ordinal, represents the classification variable. The levels of the row variable should be ordered according to the ordering you want the test to detect. The order of variable levels is determined by the ORDER= option in the PROC FREQ statement. The default is ORDER=INTERNAL, which orders by unformatted value. If you specify ORDER=DATA, PROC FREQ orders values according to their order in the input data set. For more information on how to order variable levels, see the ORDER= option.

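The Jonckheere-Terpstra test statistic is computed by first forming $R(R-1)/2$ Mann-Whitney counts $M_{i,i'}$, where $i < i'$, for pairs of rows in the contingency table,

$$M_{i,i'} = \{\text{number of times } X_{i,j} < X_{i',j'},\ j = 1,\ldots,n_{i\cdot};\ j' = 1,\ldots,n_{i'\cdot}\} + \tfrac{1}{2}\{\text{number of times } X_{i,j} = X_{i',j'},\ j = 1,\ldots,n_{i\cdot};\ j' = 1,\ldots,n_{i'\cdot}\}$$

where $X_{i,j}$ is response $j$ in row $i$. Then the Jonckheere-Terpstra test statistic is computed as

$$J = \sum_{1 \le i < i' \le R} M_{i,i'}$$

This test rejects the null hypothesis of no difference among classes for large values of $J$. Asymptotic p-values for the Jonckheere-Terpstra test are obtained by using the normal approximation for the distribution of the standardized test statistic. The standardized test statistic is computed as

$$J^* = \frac{J - E_0(J)}{\sqrt{\mathrm{var}_0(J)}}$$

where $E_0(J)$ and $\mathrm{var}_0(J)$ are the expected value and variance of the test statistic under the null hypothesis.

$$E_0(J) = \frac{n^2 - \sum_i n_{i\cdot}^2}{4}$$

$$\mathrm{var}_0(J) = \frac{A}{72} + \frac{B}{36\,n(n-1)(n-2)} + \frac{C}{8\,n(n-1)}$$

where

$$A = n(n-1)(2n+5) - \sum_i n_{i\cdot}(n_{i\cdot}-1)(2n_{i\cdot}+5) - \sum_j n_{\cdot j}(n_{\cdot j}-1)(2n_{\cdot j}+5)$$

$$B = \left[ \sum_i n_{i\cdot}(n_{i\cdot}-1)(n_{i\cdot}-2) \right] \left[ \sum_j n_{\cdot j}(n_{\cdot j}-1)(n_{\cdot j}-2) \right]$$

$$C = \left[ \sum_i n_{i\cdot}(n_{i\cdot}-1) \right] \left[ \sum_j n_{\cdot j}(n_{\cdot j}-1) \right]$$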

In addition to this asymptotic test, PROC FREQ can compute the exact Jonckheere-Terpstra test, which you request by specifying the JT option in the EXACT statement. See the EXACT Statement for information on exact tests.

PROC FREQ computes one-sided and two-sided p-values for the Jonckheere-Terpstra test. When the standardized test statistic is greater than its expected value of 0, PROC FREQ computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis of increasing order from row 1 to row R. When the standardized test statistic is less than or equal to 0, PROC FREQ computes the left-sided p-value. A small left-sided p-value supports the alternative of decreasing order from row 1 to row R. With $J^*$ denoting the observed value of the standardized test statistic, the one-sided p-value, $P_1$, can be expressed as

$$P_1 = \begin{cases} \mathrm{Prob}(\text{Std JT Statistic} > J^*) & \text{if } J^* > 0 \\ \mathrm{Prob}(\text{Std JT Statistic} < J^*) & \text{if } J^* \le 0 \end{cases}$$

The two-sided p-value, $P_2$, is computed as

$$P_2 = \mathrm{Prob}(|\text{Std JT Statistic}| > |J^*|)$$
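The sketch below requests the Jonckheere-Terpstra test for hypothetical data (the names and counts are illustrative only). ORDER=DATA is specified so that the classes are tested in the order Low, Mid, High. The first class lists the responses in ascending order, which keeps the column order 1 through 4 under ORDER=DATA.

```sas
/* Hypothetical data: three ordered classes, ordinal response 1-4 */
data jt_study;
   input group $ response count;
   datalines;
Low  1 6
Low  2 4
Low  3 2
Low  4 1
Mid  1 3
Mid  2 5
Mid  3 4
Mid  4 2
High 1 1
High 2 3
High 3 5
High 4 6
;
run;

proc freq data=jt_study order=data;
   weight count;
   tables group*response / jt;   /* asymptotic Jonckheere-Terpstra test */
   exact jt;                     /* exact test; can be slow for large tables */
run;
```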

When you specify the AGREE option in the TABLES statement, PROC FREQ computes tests and measures of agreement for square tables (that is, for tables where the number of rows equals the number of columns). For two-way tables, these tests and measures include McNemar's test for 2×2 tables, Bowker's test of symmetry, the simple kappa coefficient, and the weighted kappa coefficient. For multiple strata (n-way tables, where $n > 2$), PROC FREQ computes the overall simple kappa coefficient and the overall weighted kappa coefficient, as well as tests for equal kappas (simple and weighted) among strata. For multiple strata of 2×2 tables, PROC FREQ computes Cochran's Q.

PROC FREQ computes the kappa coefficients (simple and weighted), their asymptotic standard errors, and their confidence bounds when you specify the AGREE option in the TABLES statement. If you also specify the KAPPA option in the TEST statement, then PROC FREQ computes the asymptotic test of the hypothesis that simple kappa equals zero. Similarly, if you specify WTKAP in the TEST statement, PROC FREQ computes the asymptotic test for weighted kappa.

In addition to the asymptotic tests described in this section, PROC FREQ can also compute the exact p-value for McNemar's test when you specify the keyword MCNEM in the EXACT statement. For the kappa statistic, PROC FREQ can compute an exact test of the hypothesis that kappa (or weighted kappa) equals zero when you specify KAPPA (or WTKAP) in the EXACT statement. See Exact Statistics for more information on these tests.
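The statements described above can be combined as in the following sketch. The data set, variable names, and counts are hypothetical. For this 3×3 table the AGREE option produces Bowker's test and both kappa coefficients, while McNemar's test applies only to 2×2 tables.

```sas
/* Hypothetical paired ratings from two raters on a 3-level scale */
data ratings;
   input rater1 rater2 count;
   datalines;
1 1 20
1 2  5
1 3  1
2 1  4
2 2 15
2 3  6
3 1  2
3 2  3
3 3 18
;
run;

proc freq data=ratings;
   weight count;
   tables rater1*rater2 / agree;   /* Bowker's test, simple and weighted kappa */
   test kappa wtkap;               /* asymptotic tests that each kappa equals zero */
   exact kappa;                    /* exact test for simple kappa */
run;
```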

The discussion of each test and measure of agreement provides the formulas that PROC FREQ uses to compute the AGREE statistics. For information on the use and interpretation of these statistics, refer to Agresti (1990), Agresti (1996), Fleiss (1981), and the references that follow.

### McNemar's Test

PROC FREQ computes McNemar's test for 2×2 tables when you specify the AGREE option. McNemar's test is appropriate when you are analyzing data from matched pairs of subjects with a dichotomous (yes-no) response. It tests for marginal homogeneity, or a null hypothesis of $p_{1\cdot} = p_{\cdot 1}$. McNemar's test is computed as

$$Q_M = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$$

Under the null hypothesis, $Q_M$ has an asymptotic chi-square distribution with one degree of freedom. Refer to McNemar (1947), as well as the references cited in the preceding section. PROC FREQ can also compute an exact p-value for McNemar's test when you specify MCNEM in the EXACT statement.
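As a small worked illustration with hypothetical counts, suppose the two discordant cells are $n_{12} = 15$ and $n_{21} = 5$. Only these off-diagonal cells enter the statistic $(n_{12} - n_{21})^2/(n_{12} + n_{21})$:

$$Q_M = \frac{(15 - 5)^2}{15 + 5} = \frac{100}{20} = 5$$

This exceeds 3.84, the 0.05 critical value of the chi-square distribution with one degree of freedom, so marginal homogeneity would be rejected at that level in this example.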

### Bowker's Test of Symmetry

For Bowker's test of symmetry, the null hypothesis is that the probabilities in the square table satisfy symmetry, or that $p_{ij} = p_{ji}$ for all pairs of table cells. When there are more than two categories for each variable, Bowker's test of symmetry is calculated as

$$Q_B = \sum\sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}}$$

For large samples, $Q_B$ has an asymptotic chi-square distribution with $R(R-1)/2$ degrees of freedom under the null hypothesis of symmetry of the expected counts. Refer to Bowker (1948). For two categories, this test of symmetry is identical to McNemar's test.

### Simple Kappa Coefficient

The simple kappa coefficient, introduced by Cohen (1960), is a measure of interrater agreement:

$$\hat{\kappa} = \frac{P_o - P_e}{1 - P_e}$$

where $P_o = \sum_i p_{ii}$ and $P_e = \sum_i p_{i\cdot}\, p_{\cdot i}$. Viewing the two response variables as two independent ratings of the $n$ subjects, the kappa coefficient equals +1 when there is complete agreement of the raters. When the observed agreement exceeds chance agreement, the kappa coefficient is positive, with its magnitude reflecting the strength of agreement. Although unusual in practice, kappa is negative when the observed agreement is less than chance agreement. The minimum value of kappa is between -1 and 0, depending on the marginal proportions.
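For a worked illustration, take a hypothetical 2×2 table of $n = 100$ subjects with $n_{11} = 40$, $n_{12} = 10$, $n_{21} = 5$, $n_{22} = 45$. The observed agreement is the sum of the diagonal proportions, and the chance agreement is the sum of the products of matching row and column proportions:

$$P_o = 0.40 + 0.45 = 0.85, \qquad P_e = (0.50)(0.45) + (0.50)(0.55) = 0.50$$

$$\hat{\kappa} = \frac{0.85 - 0.50}{1 - 0.50} = 0.70$$

So although the raters agree on 85 percent of the subjects, the agreement beyond chance is 70 percent of the maximum possible.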

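The asymptotic variance of the simple kappa coefficient is estimated by the following, according to Fleiss et al. (1969):

$$\mathrm{var}(\hat{\kappa}) = \frac{A + B - C}{(1 - P_e)^2\, n}$$

where

$$A = \sum_i p_{ii} \left[ 1 - (p_{i\cdot} + p_{\cdot i})(1 - \hat{\kappa}) \right]^2$$

$$B = (1 - \hat{\kappa})^2 \sum\sum_{i \ne j} p_{ij}\, (p_{\cdot i} + p_{j\cdot})^2$$

and

$$C = \left[ \hat{\kappa} - P_e (1 - \hat{\kappa}) \right]^2$$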

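PROC FREQ computes confidence bounds for the simple kappa coefficient according to

$$\hat{\kappa} \pm z_{\alpha/2} \sqrt{\mathrm{var}(\hat{\kappa})}$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution. The value of $\alpha$ is determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95 percent confidence bounds.

To compute an asymptotic test for the kappa coefficient, PROC FREQ uses a standardized test statistic $\hat{\kappa}^*$, which has an asymptotic standard normal distribution under the null hypothesis that kappa equals zero. The standardized test statistic is computed as

$$\hat{\kappa}^* = \frac{\hat{\kappa}}{\sqrt{\mathrm{var}_0(\hat{\kappa})}}$$

where $\mathrm{var}_0(\hat{\kappa})$ is the variance of the kappa coefficient under the null hypothesis.

$$\mathrm{var}_0(\hat{\kappa}) = \frac{P_e + P_e^2 - \sum_i p_{i\cdot}\, p_{\cdot i}\, (p_{i\cdot} + p_{\cdot i})}{(1 - P_e)^2\, n}$$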

Refer to Fleiss (1981).

In addition to the asymptotic test for kappa, PROC FREQ computes an exact test when you specify the KAPPA option or the AGREE option in the EXACT statement. See Exact Statistics for more information on exact tests.

### Weighted Kappa Coefficient

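The weighted kappa coefficient is a generalization of the simple kappa coefficient, using weights to quantify the relative difference between categories. PROC FREQ computes the weights from the column scores, using either the Cicchetti-Allison weight type or the Fleiss-Cohen weight type, which are described below. The weights $w_{ij}$ are constructed so that $0 \le w_{ij} < 1$ for all $i \ne j$, $w_{ii} = 1$ for all $i$, and $w_{ij} = w_{ji}$. The weighted kappa coefficient is defined as

$$\hat{\kappa}_w = \frac{P_{o(w)} - P_{e(w)}}{1 - P_{e(w)}}$$

where

$$P_{o(w)} = \sum_i \sum_j w_{ij}\, p_{ij}$$

and

$$P_{e(w)} = \sum_i \sum_j w_{ij}\, p_{i\cdot}\, p_{\cdot j}$$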

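The asymptotic variance of the weighted kappa coefficient is estimated by the following, according to Fleiss et al. (1969):

$$\mathrm{var}(\hat{\kappa}_w) = \frac{\sum_i \sum_j p_{ij} \left[ w_{ij} - (\bar{w}_{i\cdot} + \bar{w}_{\cdot j})(1 - \hat{\kappa}_w) \right]^2 - \left[ \hat{\kappa}_w - P_{e(w)}(1 - \hat{\kappa}_w) \right]^2}{(1 - P_{e(w)})^2\, n}$$

where

$$\bar{w}_{i\cdot} = \sum_j p_{\cdot j}\, w_{ij}$$

and

$$\bar{w}_{\cdot j} = \sum_i p_{i\cdot}\, w_{ij}$$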

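PROC FREQ computes confidence bounds for the weighted kappa coefficient according to

$$\hat{\kappa}_w \pm z_{\alpha/2} \sqrt{\mathrm{var}(\hat{\kappa}_w)}$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution. The value of $\alpha$ is determined by the value of the ALPHA= option, which by default equals 0.05 and produces 95 percent confidence bounds.

To compute an asymptotic test for the weighted kappa coefficient, PROC FREQ uses a standardized test statistic $\hat{\kappa}_w^*$, which has an asymptotic standard normal distribution under the null hypothesis that weighted kappa equals zero. The standardized test statistic is computed as

$$\hat{\kappa}_w^* = \frac{\hat{\kappa}_w}{\sqrt{\mathrm{var}_0(\hat{\kappa}_w)}}$$

where $\mathrm{var}_0(\hat{\kappa}_w)$ is the variance of the weighted kappa coefficient under the null hypothesis.

$$\mathrm{var}_0(\hat{\kappa}_w) = \frac{\sum_i \sum_j p_{i\cdot}\, p_{\cdot j} \left[ w_{ij} - (\bar{w}_{i\cdot} + \bar{w}_{\cdot j}) \right]^2 - P_{e(w)}^2}{(1 - P_{e(w)})^2\, n}$$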

Refer to Fleiss (1981).

In addition to the asymptotic test for weighted kappa, PROC FREQ computes the exact test when you specify the WTKAP option or the AGREE option in the EXACT statement. See Exact Statistics for more information on exact tests.

PROC FREQ computes kappa coefficient weights using the column scores and one of two available weight types. The column scores are determined by the SCORES= option in the TABLES statement. The two available weight types are Cicchetti-Allison and Fleiss-Cohen. By default, PROC FREQ uses the Cicchetti-Allison type. If you specify WT=FC in the AGREE option, then PROC FREQ uses the Fleiss-Cohen weight type to construct kappa weights.

PROC FREQ computes Cicchetti-Allison kappa coefficient weights using a form similar to that given by Cicchetti and Allison (1971).

$$w_{ij} = 1 - \frac{|C_i - C_j|}{C_C - C_1}$$

where $C_i$ is the score for column $i$, and $C$ is the number of categories. You can specify the type of score using the SCORES= option in the TABLES statement. If you do not specify the SCORES= option, PROC FREQ uses TABLE scores. For numeric variables, TABLE scores are the values of the numeric row and column headings. You can assign numeric values to the categories in a way that reflects their level of similarity. For example, suppose you have four categories and order them according to similarity. If you assign them values of 0, 2, 4, and 10, the following weights are used for computing the weighted kappa coefficient: $w_{12} = 0.8$, $w_{13} = 0.6$, $w_{14} = 0$, $w_{23} = 0.8$, $w_{24} = 0.2$, and $w_{34} = 0.4$. Note that when there are only two categories (that is, $C = 2$), the weighted kappa coefficient is identical to the simple kappa coefficient.

If you specify (WT=FC) with the AGREE option in the TABLES statement, PROC FREQ computes Fleiss-Cohen kappa coefficient weights using a form similar to that given by Fleiss and Cohen (1973).

$$w_{ij} = 1 - \frac{(C_i - C_j)^2}{(C_C - C_1)^2}$$
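To see how the two weight types differ, the four-category example with scores 0, 2, 4, and 10 can be reworked with squared distances. This sketch assumes the usual Fleiss-Cohen form $w_{ij} = 1 - (C_i - C_j)^2 / (C_C - C_1)^2$, where the score range is $10 - 0 = 10$:

$$w_{12} = 1 - \tfrac{4}{100} = 0.96, \quad w_{13} = 1 - \tfrac{16}{100} = 0.84, \quad w_{14} = 1 - \tfrac{100}{100} = 0$$

$$w_{23} = 1 - \tfrac{4}{100} = 0.96, \quad w_{24} = 1 - \tfrac{64}{100} = 0.36, \quad w_{34} = 1 - \tfrac{36}{100} = 0.64$$

Compared with the absolute-distance weights 0.8, 0.6, 0, 0.8, 0.2, and 0.4 for the same scores, the squared form gives more credit to near agreement.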

### Overall Kappa Coefficient

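When there are multiple strata, PROC FREQ combines the stratum-level estimates of kappa into an overall estimate of the supposed common value of kappa. Assume there are $q$ strata, indexed by $h = 1, 2, \ldots, q$, and let $\mathrm{var}(\hat{\kappa}_h)$ denote the variance of $\hat{\kappa}_h$. Then the estimate of the overall kappa, according to Fleiss (1981), is computed as follows:

$$\hat{\kappa}_{\mathrm{overall}} = \frac{\sum_{h=1}^{q} \hat{\kappa}_h / \mathrm{var}(\hat{\kappa}_h)}{\sum_{h=1}^{q} 1 / \mathrm{var}(\hat{\kappa}_h)}$$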

An estimate of the overall weighted kappa is computed similarly.

### Tests for Equal Kappa Coefficients

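The following chi-square statistic, with $q - 1$ degrees of freedom, is used to test whether the values of kappa are equal among the $q$ strata:

$$Q_K = \sum_{h=1}^{q} \frac{(\hat{\kappa}_h - \hat{\kappa}_{\mathrm{overall}})^2}{\mathrm{var}(\hat{\kappa}_h)}$$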

A similar test is done for weighted kappa coefficients.

### Cochran's Q Test

When there are multiple strata and two response categories, Cochran's Q statistic is used to test the homogeneity of the one-dimensional margins. Let $m$ denote the number of variables and $N$ denote the total number of subjects. Then Cochran's Q statistic is computed as follows:

$$Q_C = (m - 1)\, \frac{m \sum_{j=1}^{m} T_j^2 - T^2}{m T - \sum_{k=1}^{N} S_k^2}$$

where $T_j$ is the number of positive responses for variable $j$, $T$ is the total number of positive responses over all variables, and $S_k$ is the number of positive responses for subject $k$. Under the null hypothesis, Cochran's Q is an approximate chi-square statistic with $m - 1$ degrees of freedom. Refer to Cochran (1950). When there are two variables, Cochran's Q simplifies to McNemar's statistic. When there are more than two response categories, you can test for marginal homogeneity using the repeated measures capabilities of the CATMOD procedure.

### Tables with Zero Rows or Columns

For multiway tables, PROC FREQ does not compute CHISQ or MEASURES statistics for a stratum with a zero row or a zero column because most of these statistics are undefined. However, PROC FREQ does compute AGREE statistics for tables with a zero row or a zero column. Therefore, the analysis includes all row and column variable levels that occur in any stratum. It does not include levels that do not occur in any stratum, even if such observations are in the data set with zero weight, because PROC FREQ does not process observations with zero weights (as described in WEIGHT Statement). And, for a two-way table where there is no stratification, the analysis includes only those levels that occur with nonzero weight.

To include a variable level with no observations in the analysis, you can assign an extremely small weight (such as 1E-8) to an observation with that variable level. Then the analysis includes this variable level, but the statistic value remains unchanged because the weight is so small. For example, suppose you need to compute a kappa coefficient for data for two raters. One rater uses all possible ratings (say, 1, 2, 3, 4, and 5), but another rater uses only four of the available ratings (1, 2, 3, and 4). You can create an observation where the second rater uses the rating level 5, and assign it a weight of 1E-8. This forms a 5×5 square table for the analysis.
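The placeholder-observation technique described above might look like the following sketch. The data set, variable names, and counts are hypothetical; only the last record, with weight 1E-8, is the artificial observation that brings rating level 5 into the second rater's margin.

```sas
/* Hypothetical ratings: rater 2 never uses level 5 in the real data */
data two_raters;
   input rater1 rater2 count;
   datalines;
1 1 12
2 2 10
2 3  3
3 3  9
4 3  2
4 4  8
5 4  6
5 5 1E-8
;
run;

proc freq data=two_raters;
   weight count;                   /* the tiny weight keeps level 5 in the table */
   tables rater1*rater2 / agree;   /* kappa is now computed on a 5x5 table */
run;
```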

For n-way crosstabulation tables, consider the following example:

```
proc freq;
   tables a*b*c*d / cmh;
run;
```

The CMH option in the TABLES statement gives a stratified statistical analysis of the relationship between C and D, controlling for A and B. The stratified analysis provides a way to adjust for the possible confounding effects of A and B without being forced to estimate parameters for them. The analysis produces Cochran-Mantel-Haenszel statistics, and for 2×2 tables, it includes estimation of the common odds ratio, common relative risks, and the Breslow-Day test for homogeneity of the odds ratios.

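Let the number of strata be denoted by $q$, indexing the strata by $h = 1, 2, \ldots, q$. Each stratum contains a contingency table with X representing the row variable and Y representing the column variable. For table $h$, denote the cell frequency in row $i$ and column $j$ by $n_{hij}$, with corresponding row and column marginal totals denoted by $n_{hi\cdot}$ and $n_{h\cdot j}$ and the overall stratum total by $n_h$.

Because the formulas for the Cochran-Mantel-Haenszel statistics are more easily defined in terms of matrices, the following notation is used. Vectors are presumed to be column vectors unless they are transposed (′). In particular, $\mathbf{n}_h$ denotes the $RC \times 1$ vector of cell frequencies $n_{hij}$ for stratum $h$, and the row and column proportions and their vectors are

$$P_{hi\cdot} = \frac{n_{hi\cdot}}{n_h}, \qquad P_{h\cdot j} = \frac{n_{h\cdot j}}{n_h}$$

$$\mathbf{P}_{h*\cdot}' = (P_{h1\cdot}, P_{h2\cdot}, \ldots, P_{hR\cdot}), \qquad \mathbf{P}_{h\cdot *}' = (P_{h\cdot 1}, P_{h\cdot 2}, \ldots, P_{h\cdot C})$$

Assume that the strata are independent and that the marginal totals of each stratum are fixed. The null hypothesis, $H_0$, is that there is no association between X and Y in any of the strata. The corresponding model is the multiple hypergeometric, which implies that under $H_0$, the expected value and covariance matrix of the frequencies are, respectively,

$$\mathbf{m}_h = E[\mathbf{n}_h \mid H_0] = n_h \left( \mathbf{P}_{h\cdot *} \otimes \mathbf{P}_{h*\cdot} \right)$$

and

$$\mathrm{var}[\mathbf{n}_h \mid H_0] = c \left( \left( \mathbf{D}_{\mathbf{P}_{h\cdot *}} - \mathbf{P}_{h\cdot *} \mathbf{P}_{h\cdot *}' \right) \otimes \left( \mathbf{D}_{\mathbf{P}_{h*\cdot}} - \mathbf{P}_{h*\cdot} \mathbf{P}_{h*\cdot}' \right) \right)$$

where

$$c = \frac{n_h^2}{n_h - 1}$$

and where $\otimes$ denotes Kronecker product multiplication and $\mathbf{D}_{\mathbf{a}}$ is a diagonal matrix with elements of $\mathbf{a}$ on the main diagonal.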

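The generalized CMH statistic (Landis, Heyman, and Koch 1978) is defined as

$$Q_{CMH} = \mathbf{G}'\, \mathbf{V}_{\mathbf{G}}^{-1}\, \mathbf{G}$$

where

$$\mathbf{G} = \sum_h \mathbf{B}_h (\mathbf{n}_h - \mathbf{m}_h)$$

$$\mathbf{V}_{\mathbf{G}} = \sum_h \mathbf{B}_h \left( \mathrm{var}(\mathbf{n}_h \mid H_0) \right) \mathbf{B}_h'$$

with $\mathbf{n}_h$ the vector of cell frequencies for stratum $h$ and $\mathbf{m}_h$ its expected value under $H_0$, and where

$$\mathbf{B}_h = \mathbf{C}_h \otimes \mathbf{R}_h$$

is a matrix of fixed constants based on column scores $\mathbf{C}_h$ and row scores $\mathbf{R}_h$. When the null hypothesis is true, the CMH statistic has an asymptotic chi-square distribution with degrees of freedom equal to the rank of $\mathbf{B}_h$. If $\mathbf{V}_{\mathbf{G}}$ is found to be singular, PROC FREQ displays a message and sets the value of the CMH statistic to missing.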

PROC FREQ computes three CMH statistics using this formula for the generalized CMH statistic, with different row and column score definitions for each statistic. The CMH statistics that PROC FREQ computes are the correlation statistic, the ANOVA (row mean scores) statistic, and the general association statistic. These statistics test the null hypothesis of no association against different alternative hypotheses. The following sections describe the computation of these CMH statistics.

CAUTION:
CMH statistics have low power for detecting an association when the patterns of association for some of the strata are in the opposite direction of the patterns displayed by other strata. Thus, a nonsignificant CMH statistic suggests either that there is no association or that no pattern of association has enough strength or consistency to dominate any other pattern.

### Correlation Statistic

The correlation statistic, with one degree of freedom, was popularized by Mantel and Haenszel (1959) and Mantel (1963) and is therefore known as the Mantel-Haenszel statistic.

The alternative hypothesis is that there is a linear association between X and Y in at least one stratum. If either X or Y does not lie on an ordinal (or interval) scale, then this statistic is meaningless.

To compute the correlation statistic, PROC FREQ uses the formula for the generalized CMH statistic with the row and column scores determined by the SCORES= option in the TABLES statement. See Scores for more information on the available score types. The matrix of row scores $\mathbf{R}_h$ has dimension $1 \times R$, and the matrix of column scores $\mathbf{C}_h$ has dimension $1 \times C$.

When there is only one stratum, this CMH statistic reduces to $(n-1)r^2$, where $r$ is the Pearson correlation coefficient between X and Y. When you specify nonparametric (RANK, RIDIT, or MODRIDIT) scores, the statistic reduces to $(n-1)r_s^2$, where $r_s$ is the Spearman rank correlation coefficient between X and Y. When there is more than one stratum, then the CMH statistic becomes a stratum-adjusted correlation statistic.

### ANOVA (Row Mean Scores) Statistic

The ANOVA statistic can be used only when the column variable Y lies on an ordinal (or interval) scale so that the mean score of Y is meaningful. For the ANOVA statistic, the mean score is computed for each row of the table, and the alternative hypothesis is that, for at least one stratum, the mean scores of the rows are unequal. In other words, the statistic is sensitive to location differences among the distributions of Y.

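The matrix of column scores $\mathbf{C}_h$ has dimension $1 \times C$, and the scores, one for each column, are specified in the SCORES= option. The matrix $\mathbf{R}_h$ has dimension $(R-1) \times R$, which PROC FREQ creates internally as

$$\mathbf{R}_h = \left[ \mathbf{I}_{R-1},\; -\mathbf{J}_{R-1} \right]$$

where $\mathbf{I}_{R-1}$ is an identity matrix of rank $R-1$, and $\mathbf{J}_{R-1}$ is an $(R-1) \times 1$ vector of ones. This matrix has the effect of forming $R-1$ independent contrasts of the $R$ mean scores.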

When there is only one stratum, this CMH statistic is essentially an analysis-of-variance (ANOVA) statistic in the sense that it is a function of the variance ratio F statistic that would be obtained from a one-way ANOVA on the dependent variable Y. If nonparametric scores are specified in this case, then the ANOVA statistic is a Kruskal-Wallis test.

If there is more than one stratum, then this CMH statistic corresponds to a stratum-adjusted ANOVA or Kruskal-Wallis test. In the special case where there is one subject per row and one subject per column in the contingency table of each stratum, then this CMH statistic is identical to Friedman's chi-square. See Computing Friedman's Chi-Square Statistic for an illustration.

### General Association Statistic

The alternative hypothesis for the general association statistic is that, for at least one stratum, there is some kind of association between X and Y. This statistic is always interpretable because it does not require an ordinal scale for either X or Y.

For the general association statistic, the matrix $\mathbf{R}_h$ is the same as the one used for the ANOVA statistic. The matrix $\mathbf{C}_h$ is defined similarly as

$$\mathbf{C}_h = \left[ \mathbf{I}_{C-1},\; -\mathbf{J}_{C-1} \right]$$

PROC FREQ generates both score matrices internally. When there is only one stratum, then the general association CMH statistic reduces to $Q_P (n-1)/n$, where $Q_P$ is the Pearson chi-square statistic. When there is more than one stratum, then the CMH statistic becomes a stratum-adjusted Pearson chi-square statistic. Note that a similar adjustment is made by summing the Pearson chi-squares across the strata. However, the latter statistic requires a large sample size in each stratum to support the resulting chi-square distribution with $q(R-1)(C-1)$ degrees of freedom. The CMH statistic requires only a large overall sample size because it has only $(R-1)(C-1)$ degrees of freedom.

Refer to Cochran (1954); Mantel and Haenszel (1959); Mantel (1963); Birch (1965); and Landis et al. (1978).

### Adjusted Odds Ratio and Relative Risk Estimates

The CMH option provides adjusted odds ratio and relative risk estimates for stratified 2×2 tables. For each of these measures, PROC FREQ computes the Mantel-Haenszel estimate and the logit estimate. These estimates apply to n-way table requests in the TABLES statement, when the row and column variables both have only two levels. For example,

```
proc freq;
   tables a*b*c*d / cmh;
run;
```

In this example, if the row and column variables C and D both have two levels, PROC FREQ provides odds ratio and relative risk estimates, adjusting for the confounding variables A and B.

The choice of an appropriate measure depends on the study design. For case-control (retrospective) studies, the odds ratio is appropriate. For cohort (prospective) or cross-sectional studies, the relative risk is appropriate. See Odds Ratio and Relative Risks for 2×2 Tables for more information on these measures.

Throughout this section, $z$ is the $100(1-\alpha/2)$ percentile of the standard normal distribution.

### Odds Ratio (Case-control Studies): Mantel-Haenszel Adjusted

The Mantel-Haenszel adjusted odds ratio estimator is given by

$$OR_{MH} = \frac{\sum_h n_{h11}\, n_{h22} / n_h}{\sum_h n_{h12}\, n_{h21} / n_h}$$

It is always computed unless the denominator is zero. Refer to Mantel and Haenszel (1959) and Agresti (1990).
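As a worked illustration with two hypothetical strata, assume the usual Mantel-Haenszel form, in which each stratum contributes $n_{h11} n_{h22}/n_h$ to the numerator and $n_{h12} n_{h21}/n_h$ to the denominator. Let stratum 1 have cells $(10, 20; 5, 40)$ with $n_1 = 75$, and stratum 2 have cells $(8, 12; 6, 24)$ with $n_2 = 50$:

$$OR_{MH} = \frac{\dfrac{10 \times 40}{75} + \dfrac{8 \times 24}{50}}{\dfrac{20 \times 5}{75} + \dfrac{12 \times 6}{50}} = \frac{5.333 + 3.840}{1.333 + 1.440} \approx 3.31$$

The stratum-specific odds ratios are $4$ and $192/72 \approx 2.67$, and the pooled estimate falls between them.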

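Using the estimated variance for $\ln(OR_{MH})$ given by Robins et al. (1986), PROC FREQ computes the corresponding $100(1-\alpha)$ percent confidence bounds for the odds ratio as

$$\left( OR_{MH} \times \exp(-z\hat{\sigma}),\; OR_{MH} \times \exp(z\hat{\sigma}) \right)$$

where

$$\hat{\sigma}^2 = \widehat{\mathrm{var}}[\ln(OR_{MH})] = \frac{\sum_h (n_{h11} + n_{h22})(n_{h11} n_{h22}) / n_h^2}{2 \left( \sum_h n_{h11} n_{h22} / n_h \right)^2} + \frac{\sum_h \left[ (n_{h11} + n_{h22})(n_{h12} n_{h21}) + (n_{h12} + n_{h21})(n_{h11} n_{h22}) \right] / n_h^2}{2 \left( \sum_h n_{h11} n_{h22} / n_h \right) \left( \sum_h n_{h12} n_{h21} / n_h \right)} + \frac{\sum_h (n_{h12} + n_{h21})(n_{h12} n_{h21}) / n_h^2}{2 \left( \sum_h n_{h12} n_{h21} / n_h \right)^2}$$

Note that the Mantel-Haenszel odds ratio estimator is less sensitive to small $n_h$ than the logit estimator.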

### Odds Ratio (Case-control Studies): Adjusted Logit

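The adjusted logit odds ratio estimator (Woolf 1955) is given by

$$OR_L = \exp\left( \frac{\sum_h w_h \ln(OR_h)}{\sum_h w_h} \right)$$

and the corresponding $100(1-\alpha)$ percent confidence bounds are

$$\left( OR_L \times \exp\left( \frac{-z}{\sqrt{\sum_h w_h}} \right),\; OR_L \times \exp\left( \frac{z}{\sqrt{\sum_h w_h}} \right) \right)$$

where $OR_h$ is the odds ratio for stratum $h$, and

$$w_h = \frac{1}{\mathrm{var}(\ln OR_h)}$$

Refer to Woolf (1955).

If any cell frequency in a stratum $h$ is zero, then PROC FREQ adds 0.5 to each cell of the stratum before computing $OR_h$ and $w_h$ (Haldane 1955), and displays a warning.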

### Relative Risks (Cohort Studies)

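The Mantel-Haenszel estimate of the common relative risk for column 1 is computed as

$$RR_{MH} = \frac{\sum_h n_{h11}\, n_{h2\cdot} / n_h}{\sum_h n_{h21}\, n_{h1\cdot} / n_h}$$

It is always computed unless the denominator is zero. Refer to Mantel and Haenszel (1959) and Agresti (1990).

Using the estimated variance for $\ln(RR_{MH})$ given by Greenland and Robins (1985), PROC FREQ computes the corresponding $100(1-\alpha)$ percent confidence bounds for the relative risk as

$$\left( RR_{MH} \times \exp(-z\hat{\sigma}),\; RR_{MH} \times \exp(z\hat{\sigma}) \right)$$

where

$$\hat{\sigma}^2 = \widehat{\mathrm{var}}[\ln(RR_{MH})] = \frac{\sum_h \left( n_{h1\cdot}\, n_{h2\cdot}\, n_{h\cdot 1} - n_{h11}\, n_{h21}\, n_h \right) / n_h^2}{\left( \sum_h n_{h11}\, n_{h2\cdot} / n_h \right) \left( \sum_h n_{h21}\, n_{h1\cdot} / n_h \right)}$$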

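The adjusted logit estimate of the common relative risk for column 1 is computed as

$$RR_L = \exp\left( \frac{\sum_h w_h \ln(RR_h)}{\sum_h w_h} \right)$$

and the corresponding $100(1-\alpha)$ percent confidence bounds are

$$\left( RR_L \times \exp\left( \frac{-z}{\sqrt{\sum_h w_h}} \right),\; RR_L \times \exp\left( \frac{z}{\sqrt{\sum_h w_h}} \right) \right)$$

where $RR_h$ is the column 1 relative risk estimator for stratum $h$, and

$$w_h = \frac{1}{\mathrm{var}(\ln RR_h)}$$

If $n_{h11}$ or $n_{h21}$ is zero, then PROC FREQ adds 0.5 to each cell of the stratum before computing $RR_h$ and $w_h$, and displays a warning.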

Refer to Kleinbaum, Kupper, and Morgenstern (1982, Sections 17.4, 17.5) and Breslow and Day (1994).

### Breslow-Day Test for Homogeneity of the Odds Ratios

When you specify the CMH option, PROC FREQ computes the Breslow-Day test for the stratified analysis of 2×2 tables. It tests the null hypothesis that the odds ratios from the $q$ strata are all equal. When the null hypothesis is true, the statistic has an asymptotic chi-square distribution with $q - 1$ degrees of freedom.

The Breslow-Day statistic is computed as

$$Q_{BD} = \sum_h \frac{\left( n_{h11} - E(n_{h11} \mid OR_{MH}) \right)^2}{\mathrm{var}(n_{h11} \mid OR_{MH})}$$

where E and var denote expected value and variance, respectively. The summation does not include any tables with a zero row or column. If $OR_{MH}$ equals zero or if it is undefined, then PROC FREQ does not compute the statistic, and displays a warning message.

CAUTION:
Unlike the Cochran-Mantel-Haenszel statistics, the Breslow-Day test requires a large sample size within each stratum, and this limits its usefulness. In addition, the validity of the CMH tests does not depend on any assumption of homogeneity of the odds ratios, and therefore, the Breslow-Day test should never be used as an indicator of validity.

Refer to Breslow and Day (1993).

Exact statistics can be useful in situations where the asymptotic assumptions are not met, and so the asymptotic p-values are not close approximations for the true p-values. Standard asymptotic methods involve the assumption that the test statistic follows a particular distribution when the sample size is sufficiently large. When the sample size is not large, asymptotic results may not be valid, with the asymptotic p-values differing perhaps substantially from the exact p-values. Asymptotic results may also be unreliable when the distribution of the data is sparse, skewed, or heavily tied. Refer to Agresti (1996) and Bishop et al. (1975). Exact computations are based on the statistical theory of exact conditional inference for contingency tables, reviewed by Agresti (1992).

PROC FREQ provides exact p-values for the following tests for two-way tables: Pearson chi-square, likelihood-ratio chi-square, Mantel-Haenszel chi-square, Fisher's exact test, Jonckheere-Terpstra test, Cochran-Armitage test for trend, and McNemar's test. PROC FREQ can also compute exact p-values for tests of hypotheses that the following statistics are equal to zero: Pearson correlation coefficient, Spearman correlation coefficient, simple kappa coefficient, and weighted kappa coefficient. Additionally, PROC FREQ can compute exact confidence bounds for the odds ratio for 2×2 tables. For one-way frequency tables, PROC FREQ provides the exact chi-square goodness-of-fit test (for equal proportions, or for proportions or frequencies that you specify). Also for one-way tables, PROC FREQ provides exact confidence bounds for the binomial proportion, and an exact test for the binomial proportion value.

The following sections summarize the computational algorithms, define the p-values that PROC FREQ computes, and discuss the computational resource requirements.

### Computational Algorithms

PROC FREQ computes exact p-values for general R×C tables using the network algorithm developed by Mehta and Patel (1983). This algorithm provides a substantial advantage over direct enumeration, which can be very time-consuming and feasible only for small problems. Refer to Agresti (1992) for a review of algorithms for computation of exact p-values, and refer to Mehta et al. (1984, 1991) for information on the performance of the network algorithm.

The reference set for a given contingency table is the set of all contingency tables with the observed marginal row and column sums. Corresponding to this reference set, the network algorithm forms a directed acyclic network consisting of nodes in a number of stages. A path through the network corresponds to a distinct table in the reference set. The distances between nodes are defined so that the total distance of a path through the network is the corresponding value of the test statistic. At each node, the algorithm computes the shortest and longest path distances for all the paths that pass through that node. For statistics that can be expressed as a linear combination of cell frequencies multiplied by increasing row and column scores, PROC FREQ computes shortest and longest path distances using the algorithm given in Agresti et al. (1990). For statistics of other forms, PROC FREQ computes an upper bound for the longest path and a lower bound for the shortest path following the approach of Valz and Thompson (1994).

The longest and shortest path distances or bounds for a node are compared to the value of the test statistic to determine whether all paths through the node contribute to the p-value, none of the paths through the node contribute to the p-value, or neither of these situations occurs. If all paths through the node contribute, the p-value is incremented accordingly, and these paths are eliminated from further analysis. If no paths contribute, these paths are eliminated from the analysis. Otherwise, the algorithm continues, still processing this node and the associated paths. The algorithm finishes when every node has either been accounted for (with the p-value incremented accordingly) or been eliminated.

In applying the network algorithm, PROC FREQ uses full precision to represent all statistics, row and column scores, and other quantities involved in the computations. Although it is possible to use rounding to improve the speed and memory requirements of the algorithm, PROC FREQ does not do this because it can result in reduced accuracy of the p-values.
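To make the notion of a reference set concrete, the following Python sketch lists every table that shares a given set of row and column totals. It uses direct enumeration, not the network algorithm, and is meant only for very small tables. The function names `reference_set` and `_fill_row` and the example margins are hypothetical choices for this illustration, not part of PROC FREQ.

```python
from typing import Iterator, Sequence, Tuple

Table = Tuple[Tuple[int, ...], ...]


def _fill_row(total: int, caps: Tuple[int, ...]) -> Iterator[Tuple[int, ...]]:
    """Yield every way to split `total` across cells without exceeding `caps`."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in _fill_row(total - first, caps[1:]):
            yield (first,) + rest


def reference_set(row_sums: Sequence[int], col_sums: Sequence[int]) -> Iterator[Table]:
    """Yield all nonnegative integer tables with the given margins (assumed example)."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column totals must agree")

    def build(rows_left: Tuple[int, ...], cols_left: Tuple[int, ...]) -> Iterator[Table]:
        if len(rows_left) == 1:
            # The last row is forced by the remaining column totals.
            yield (cols_left,)
            return
        for row in _fill_row(rows_left[0], cols_left):
            remaining = tuple(c - x for c, x in zip(cols_left, row))
            for rest in build(rows_left[1:], remaining):
                yield (row,) + rest

    yield from build(tuple(row_sums), tuple(col_sums))


# Hypothetical 2x3 example: row totals (3, 2), column totals (2, 2, 1).
tables = list(reference_set((3, 2), (2, 2, 1)))
for table in tables:
    print(table)
```

Even this tiny example has five tables in its reference set. The count grows quickly with the sample size and the table dimensions, which is why direct enumeration is feasible only for small problems.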

PROC FREQ computes exact confidence bounds for the odds ratio according to an iterative algorithm based on that presented by Thomas (1971). Refer also to Gart (1971). Because this is a discrete problem, the confidence coefficient is not exactly $1-\alpha$, but is at least $1-\alpha$. Thus, these confidence bounds are conservative.

For one-way tables, PROC FREQ computes the exact chi-square goodness-of-fit test by the method of Radlow and Alf (1975). PROC FREQ generates all possible one-way tables with the observed total sample size and number of categories. For each possible table, PROC FREQ compares its chi-square value with the value for the observed table. If the table's chi-square value is greater than or equal to the observed chi-square, PROC FREQ increments the exact p-value by the probability of that table, which is calculated under the null hypothesis using the multinomial frequency distribution. By default, the null hypothesis states that all categories have equal proportions. If you specify null hypothesis proportions or frequencies using the TESTP= or TESTF= option in the TABLES statement, then PROC FREQ calculates the exact chi-square test based on that null hypothesis.
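The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PROC FREQ's implementation: the function names are hypothetical, all null proportions are assumed to be positive, and the small tolerance `tol` is an added guard for floating-point comparisons.

```python
from math import factorial
from typing import Iterator, Optional, Sequence, Tuple


def compositions(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """Yield every one-way table with k categories and total sample size n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_prob(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Multinomial probability of `counts` under null proportions `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an integer at every step
    p = 1.0
    for c, q in zip(counts, probs):
        p *= q ** c
    return coef * p


def chi_square(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Pearson chi-square goodness-of-fit statistic."""
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))


def exact_gof_pvalue(observed: Sequence[int],
                     probs: Optional[Sequence[float]] = None,
                     tol: float = 1e-9) -> float:
    """Sum null probabilities of all tables at least as extreme as `observed`."""
    k, n = len(observed), sum(observed)
    if probs is None:
        probs = [1.0 / k] * k  # default null hypothesis: equal proportions
    q_obs = chi_square(observed, probs)
    return sum(multinomial_prob(t, probs)
               for t in compositions(n, k)
               if chi_square(t, probs) >= q_obs - tol)


# Hypothetical data: all three observations fall in the first of three categories.
p_value = exact_gof_pvalue((3, 0, 0))
print(p_value)  # 3 of the 27 equally likely outcomes are this extreme: 1/9
```

Passing a `probs` argument plays the role of specifying null proportions, analogous to what TESTP= does in the TABLES statement.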

For binomial proportions in one-way tables, PROC FREQ computes exact confidence bounds using the F distribution method given in Collett (1991) and also described by Leemis and Trivedi (1996). PROC FREQ computes the exact test for a binomial proportion by summing binomial probabilities over all alternatives. See Binomial Proportion for details. By default, PROC FREQ uses 0.5 as the null hypothesis proportion. Alternatively, you can specify the null hypothesis proportion with the P= option in the TABLES statement.
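As an illustration of summing binomial probabilities, the following Python sketch computes a one-sided exact p-value for a binomial proportion. It assumes the tail is chosen by comparing the observed count with its expected value under the null hypothesis; the function names are hypothetical, and the confidence bounds are not covered here.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def exact_binomial_one_sided(x: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact p-value for H0: proportion = p0 (assumed tail convention)."""
    if x > n * p0:
        # Observed count above its null expectation: sum the right tail.
        return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    # Otherwise sum the left tail.
    return sum(binom_pmf(k, n, p0) for k in range(0, x + 1))


# Hypothetical data: 8 successes in 10 trials, null proportion 0.5.
p_right = exact_binomial_one_sided(8, 10)
print(p_right)  # (45 + 10 + 1) / 1024 = 0.0546875
```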

### Definition of p-Values

For several tests in PROC FREQ, the test statistic is nonnegative, and large values of the test statistic indicate a departure from the null hypothesis. Such tests include the Pearson chi-square, the likelihood-ratio chi-square, the Mantel-Haenszel chi-square, Fisher's exact test for tables larger than 2×2 tables, McNemar's test, and the one-way goodness-of-fit test. The exact p-value for these nondirectional tests is the sum of probabilities for those tables having a test statistic greater than or equal to the value of the observed test statistic.
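For a 2×2 table, this nondirectional definition can be checked by hand, because the reference set is indexed by a single cell count and each table has a hypergeometric probability. The Python sketch below applies the definition to the Pearson chi-square. It is a direct enumeration under stated assumptions (all four margins positive, a small tolerance `tol` added for floating-point comparisons), and the function name is hypothetical.

```python
from math import comb


def exact_pearson_pvalue_2x2(a: int, b: int, c: int, d: int, tol: float = 1e-9) -> float:
    """Exact p-value of the Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals
    c1, c2 = a + c, b + d          # column totals
    n = r1 + r2                    # overall total

    def pearson(x: int) -> float:
        # With margins fixed, the cell n11 = x determines the whole table.
        y, z = r1 - x, c1 - x
        w = r2 - z
        return n * (x * w - y * z) ** 2 / (r1 * r2 * c1 * c2)

    def prob(x: int) -> float:
        # Hypergeometric probability of the table with n11 = x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    q_obs = pearson(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if pearson(x) >= q_obs - tol)


# Hypothetical table [[3, 1], [1, 3]]: tables with n11 in {0, 1, 3, 4} are as extreme.
p_exact = exact_pearson_pvalue_2x2(3, 1, 1, 3)
print(p_exact)  # (1 + 16 + 16 + 1) / 70
```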

There are other tests where it may be appropriate to test against either a one-sided or a two-sided alternative hypothesis. For example, when you test the null hypothesis that the true parameter value equals zero ($H_0\colon \theta = 0$), the alternative of interest may be one-sided ($H_1\colon \theta < 0$ or $H_1\colon \theta > 0$) or two-sided ($H_1\colon \theta \ne 0$). Such tests include the Pearson correlation coefficient, Spearman correlation coefficient, Jonckheere-Terpstra test, Cochran-Armitage test for trend, simple kappa coefficient, and weighted kappa coefficient. For these tests, PROC FREQ computes the right-sided p-value when the observed value of the test statistic is greater than its expected value. The right-sided p-value is the sum of probabilities for those tables having a test statistic greater than or equal to the observed test statistic. Otherwise, when the test statistic is less than or equal to its expected value, PROC FREQ computes the left-sided p-value. The left-sided p-value is the sum of probabilities for those tables having a test statistic less than or equal to the one observed. The one-sided p-value can be expressed as

$$
P_1 =
\begin{cases}
\Pr(T \ge t) & \text{if } t > E_0(T) \\
\Pr(T \le t) & \text{if } t \le E_0(T)
\end{cases}
$$

where $T$ is the test statistic, $t$ is the observed value of the test statistic, and $E_0(T)$ is the expected value of the test statistic under the null hypothesis. PROC FREQ computes the two-sided p-value as the sum of the one-sided p-value and the corresponding area in the opposite tail of the distribution of the statistic, equidistant from the expected value. The two-sided p-value can be expressed as

$$
P_2 = \Pr\bigl(\,|T - E_0(T)| \ge |t - E_0(T)|\,\bigr)
$$
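The one-sided and two-sided definitions above can be illustrated with a short Python sketch. It assumes the exact null distribution of the test statistic is already available as a mapping from statistic values to probabilities; the function name, the tolerance `tol`, and the example distribution are hypothetical.

```python
from typing import Dict, Tuple


def one_and_two_sided(dist: Dict[float, float], t: float,
                      tol: float = 1e-9) -> Tuple[float, float]:
    """Return (one-sided, two-sided) p-values for observed statistic t.

    `dist` maps each possible statistic value to its null probability.
    """
    expected = sum(v * p for v, p in dist.items())
    if t > expected:
        # Right-sided: statistic values at least as large as the observed one.
        p1 = sum(p for v, p in dist.items() if v >= t - tol)
    else:
        # Left-sided: statistic values no larger than the observed one.
        p1 = sum(p for v, p in dist.items() if v <= t + tol)
    # Two-sided: values at least as far from the expected value as t is.
    dev = abs(t - expected)
    p2 = sum(p for v, p in dist.items() if abs(v - expected) >= dev - tol)
    return p1, p2


# Hypothetical symmetric null distribution with expected value 2.
null_dist = {0: 1 / 16, 1: 4 / 16, 2: 6 / 16, 3: 4 / 16, 4: 1 / 16}
p1, p2 = one_and_two_sided(null_dist, 3)
print(p1, p2)  # 5/16 and 10/16
```

When the null distribution is symmetric about its expected value, as in this example, the two-sided p-value is twice the one-sided value. For a skewed distribution the opposite tail can contribute less, or nothing at all.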

### Computational Resources

PROC FREQ uses relatively fast and efficient algorithms for exact computations. These recently developed algorithms, together with improvements in computer power, make it feasible now to perform exact computations for data sets where previously only asymptotic methods could be applied. Nevertheless, there are still large problems that may require a prohibitive amount of time and memory for exact computations, depending on the speed and memory available on your computer. For large problems, consider whether exact methods are really needed or whether asymptotic methods might give results quite close to the exact results, while requiring much less computer time and memory.

No formula exists that can determine in advance how much time or memory PROC FREQ needs to compute an exact p-value for a given problem. The time and memory requirements depend on several factors, including the test that is performed, the total sample size, the number of rows and columns, and the specific arrangement of the observations into table cells. Generally, larger problems (in terms of total sample size, number of rows, and number of columns) tend to require more time and memory. Additionally, for a fixed total sample size, time and memory requirements tend to increase as the number of rows and columns increases, because this corresponds to an increase in the number of tables in the reference set. Also for a fixed sample size, time and memory requirements increase as the marginal row and column totals become more homogeneous. Refer to Agresti et al. (1992) and Gail and Mantel (1977).

At any time while PROC FREQ computes exact p-values, you can terminate the computations by pressing the system interrupt key sequence (refer to the SAS Companion for your operating environment) and choosing to stop computations. After you terminate exact computations, PROC FREQ completes all other remaining tasks that the procedure specifies. The procedure produces the requested output, reporting missing values for any exact p-values that were not computed by the time of termination.

You can also use the MAXTIME= option in the EXACT statement to limit the amount of time PROC FREQ uses for exact computations. You specify a MAXTIME= value that is the maximum amount of time (in seconds) that PROC FREQ can use to compute an exact p-value. If PROC FREQ does not finish computing an exact p-value within that time, it terminates the computation and completes all other remaining tasks.