 SAS/INSIGHT Software

## Multiple Comparison of Means

### Overview

Box plots allow you to examine means in different groups. Statistical questions you might have about the group means include

• Which underlying group means are likely to be different?
• Which group means are better than the mean of a standard group?
• Which group means are statistically indistinguishable from the best?
Multiple comparison tests allow you to test for differences between means and also to construct simultaneous confidence intervals for these differences.

All of the tests implemented in SAS/INSIGHT software are constructed assuming that the displayed variables are independent and normally distributed with identical variance. For details, see Hsu (1996).

### Example: Comparing High School GPAs

As an example, open the GPA data set. Suppose you want to compare the means of the high school mathematics, science, and English scores, and to test whether the true mean math score is the highest of the three. Choose Analyze:Box Plot/Mosaic Plot (Y). Assign HSM, HSS, and HSE the role of Y. To select a multiple comparison test, click on the Output button of the variables dialog. This displays the output options dialog. Select the Means and Multiple Comparison of Means check boxes. Click on the Multiple Comparison Options button to select a comparison test and a confidence level. For this example, choose Hsu's Test for Best, and accept the default 95% confidence level, then click OK in each dialog.

You will have to scroll down in the Box Plot window to see the table that summarizes the results of Hsu's test. The significance level (Alpha), degrees of freedom (DF), root mean square error (Root MSE), and a quantile value are displayed in the first row of the table. The subsequent rows give the names or labels of the groups (also called categories or treatments) being compared, the difference between their means, the lower and upper bounds of a confidence interval for the difference, and a p-value. The p-value is the simultaneously corrected significance level of a test for the difference between the associated means. A p-value of $p$ indicates that $100(1-p)\%$ is the maximum confidence level for which the confidence interval for the difference excludes zero.

For the current example, all of the confidence intervals whose endpoints appear in the columns marked Lower Limit and Upper Limit contain zero, so we cannot conclude that any of the group means is significantly different from the (unknown) true best at the 95% confidence level. The p-values are each slightly larger than 0.11, which indicates that we would have to drop our confidence level to about 88% before we could conclude that any differences exist.
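
The p-value rule described above can be sketched in a few lines of Python. This is a minimal illustration, not SAS/INSIGHT code: the function names are hypothetical, and 0.112 is an assumed stand-in for a p-value "slightly larger than 0.11".

```python
def max_confidence_excluding_zero(p_value: float) -> float:
    """Return 100(1 - p): the largest confidence level (in percent) at which
    the simultaneous interval for a difference still excludes zero."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return 100.0 * (1.0 - p_value)


def significant_at(p_value: float, alpha: float) -> bool:
    """Equivalent check: the 100(1 - alpha)% interval excludes zero
    exactly when the simultaneously corrected p-value is below alpha."""
    return p_value < alpha


# Hypothetical p-value "slightly larger than 0.11", as in the GPA example.
p = 0.112
print(round(max_confidence_excluding_zero(p), 1))  # 88.8
print(significant_at(p, 0.05))  # False: the 95% interval contains zero
```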

### Example: Comparing Carbon-Monoxide Concentrations

As a second example, open the AIR data set. Suppose you are interested in determining which days of the week have essentially the same average carbon-monoxide concentration. Assuming that the variances for each day are approximately the same (possibly not a valid assumption for this data set!) you can run the Tukey-Kramer test.

Figure 1.1: Multiple comparison of means

Choose Analyze:Box Plot/Mosaic Plot (Y). Assign CO the role of Y and DAY the role of X. Click on the Method button, turn off the automatic sorting of X values, and click OK. Click on the Output button of the variables dialog. Click on the Multiple Comparison Options button, choose Tukey-Kramer All Pairs, accept the default 95% confidence level, and then click OK in each dialog. The resulting graphic and table indicate, for example, that the mean carbon-monoxide concentration on Monday is different from that on Wednesday, Friday, Saturday, and Sunday (at the 95% confidence level). We cannot conclude that there is any difference between the average concentrations on Monday, Tuesday, and Thursday. Now click on any comparison circle. The graphical output is recomputed for the newly selected category.

### Multiple Comparison Tests

Each of the tests available in SAS/INSIGHT software is described below. In the descriptions that follow, $k$ is the number of categories (that is, the number of boxes in the box plot), $n_i$ is the number of observations for the $i$th category, $\mu_i$ is the true mean for the $i$th category, $\bar{y}_i$ is the sample mean for the $i$th category, $\nu$ is the total degrees of freedom, and $\hat{\sigma}$ is the root mean square error, also known as the pooled standard deviation. Each test returns a table showing confidence intervals for the differences $\mu_i - \mu_j$, $i \neq j$, $i, j = 1, \ldots, k$.
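
As a sketch of how these quantities relate, the following Python computes $k$, $n_i$, $\bar{y}_i$, $\nu$, and $\hat{\sigma}$ from raw groups. It assumes a one-way layout in which $\nu = N - k$ (total observations minus categories), the usual error degrees of freedom for a pooled standard deviation; the function name and the two small groups are hypothetical.

```python
import math
from typing import Dict, List


def pooled_summary(groups: Dict[str, List[float]]) -> dict:
    """Compute k, n_i, the sample means ybar_i, the error degrees of freedom nu,
    and the root MSE (pooled standard deviation) sigma_hat for a one-way layout."""
    if any(len(ys) == 0 for ys in groups.values()):
        raise ValueError("every category needs at least one observation")
    k = len(groups)
    sizes = {name: len(ys) for name, ys in groups.items()}
    means = {name: sum(ys) / len(ys) for name, ys in groups.items()}
    # Assumption: nu = N - k, the usual one-way ANOVA error degrees of freedom.
    nu = sum(sizes.values()) - k
    if nu <= 0:
        raise ValueError("need more observations than categories")
    # Pool the squared deviations of each observation from its own group mean.
    sse = sum((y - means[name]) ** 2 for name, ys in groups.items() for y in ys)
    return {"k": k, "n": sizes, "means": means, "nu": nu, "root_mse": math.sqrt(sse / nu)}


summary = pooled_summary({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
print(summary["nu"], summary["root_mse"])  # 4 1.0
```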

Figure 1.2: Multiple comparison options

The Pairwise t-test is not a true simultaneous comparison test; rather, it uses a pairwise t-test to provide a confidence interval for the difference between two means. These intervals have a half-width equal to $t_{\alpha/2,\nu}\,\hat{\sigma}\sqrt{1/n_i + 1/n_j}$. Although each confidence interval is computed at the $1-\alpha$ level, the probability that all of your confidence intervals are correct simultaneously is less than $1-\alpha$, and it falls further as the number of groups grows. For example, for 5 groups the actual simultaneous confidence for the t-based intervals is approximately only 75%.
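
A minimal sketch of the unadjusted pairwise interval, assuming the caller supplies the t quantile (for example from a table, since the Python standard library has no t distribution). The function name and inputs are hypothetical; 2.776 is approximately the upper 0.025 point of t with 4 degrees of freedom.

```python
import math
from typing import Tuple


def pairwise_t_interval(mean_i: float, mean_j: float, n_i: int, n_j: int,
                        root_mse: float, t_quantile: float) -> Tuple[float, float]:
    """Unadjusted interval for mu_i - mu_j:
    (ybar_i - ybar_j) +/- t_{alpha/2, nu} * sigma_hat * sqrt(1/n_i + 1/n_j).
    The caller supplies t_quantile, the upper alpha/2 point of t with nu df."""
    diff = mean_i - mean_j
    half_width = t_quantile * root_mse * math.sqrt(1.0 / n_i + 1.0 / n_j)
    return diff - half_width, diff + half_width


# Hypothetical inputs; 2.776 is approximately t_{0.025, 4}.
lower, upper = pairwise_t_interval(5.0, 2.0, 3, 3, 1.0, 2.776)
print(f"({lower:.3f}, {upper:.3f})")
```

Each such interval, taken alone, has confidence $1-\alpha$; nothing in the computation accounts for the other intervals in the table.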

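The Tukey-Kramer method is a true multiple comparison test, appropriate when all pairwise comparisons are of interest; it is the default test. The test is an exact $\alpha$-level test if the sample sizes are the same, and slightly conservative for unequal sample sizes. The confidence interval around the point estimate $\bar{y}_i - \bar{y}_j$ has half-width $\frac{q^*}{\sqrt{2}}\,\hat{\sigma}\sqrt{1/n_i + 1/n_j}$, where $q^*$ is the $1-\alpha$ quantile of the studentized range distribution for $k$ means and $\nu$ degrees of freedom. It is a common convention to report the quantity $q^*/\sqrt{2}$ as the Tukey-Kramer quantile, rather than just $q^*$.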
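
The relationship between $q^*$ and the reported Tukey-Kramer quantile can be sketched as follows. This assumes $q^*$ is obtained elsewhere, for instance from `scipy.stats.studentized_range.ppf(1 - alpha, k, nu)` in SciPy 1.7 or later; the helper names and numeric values are hypothetical.

```python
import math


def tukey_kramer_quantile(q_star: float) -> float:
    """Convert the studentized range quantile q* into the conventionally
    reported Tukey-Kramer quantile q* / sqrt(2)."""
    return q_star / math.sqrt(2.0)


def tukey_kramer_half_width(q_star: float, root_mse: float, n_i: int, n_j: int) -> float:
    """Half-width (q*/sqrt(2)) * sigma_hat * sqrt(1/n_i + 1/n_j)."""
    return tukey_kramer_quantile(q_star) * root_mse * math.sqrt(1.0 / n_i + 1.0 / n_j)


# With equal sizes n, the half-width reduces to Tukey's q* * sigma_hat / sqrt(n).
q_star, sigma_hat, n = 3.5, 2.0, 4  # hypothetical values
print(tukey_kramer_half_width(q_star, sigma_hat, n, n), q_star * sigma_hat / math.sqrt(n))
```

The equal-size check shows why the $\sqrt{2}$ convention is natural: with $n_i = n_j = n$ the half-width collapses to $q^*\hat{\sigma}/\sqrt{n}$, the form used in Tukey's original balanced-design method.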

The Pairwise Bonferroni method is also appropriate when all pairwise comparisons are of interest. It is conservative; that is, Bonferroni tests performed at a nominal significance level of $\alpha$ actually have an overall (familywise) error rate somewhat smaller than $\alpha$. The Bonferroni method uses the t-distribution, like the pairwise t-test, but returns wider intervals, with half-width $t_{\alpha/(k(k-1)),\nu}\,\hat{\sigma}\sqrt{1/n_i + 1/n_j}$. Note that the t probability ($\alpha/2$, since this is a two-sided test) is divided by the total number of pairwise comparisons, $k(k-1)/2$. The Bonferroni test produces wider confidence intervals than the Tukey-Kramer test. For example, on the AIR data set, running the Bonferroni test with a 95% confidence level does not allow you to infer that Monday's mean CO concentration is different from Wednesday's, whereas that inference was valid for the Tukey-Kramer test.
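
The Bonferroni adjustment itself is simple arithmetic, sketched below with a hypothetical helper. To compare interval widths without a t-distribution routine, the example substitutes standard normal quantiles from `statistics.NormalDist`, which approximate the t quantiles only when $\nu$ is large; this illustrates why Bonferroni intervals are wider and is not the quantile SAS/INSIGHT computes.

```python
from statistics import NormalDist


def bonferroni_tail_probability(alpha: float, k: int) -> float:
    """Upper-tail probability for the Bonferroni t quantile: the two-sided
    alpha/2 is divided by the k(k-1)/2 pairwise comparisons, giving alpha/(k(k-1))."""
    if k < 2:
        raise ValueError("need at least two categories")
    comparisons = k * (k - 1) // 2
    return (alpha / 2.0) / comparisons


# Seven categories, e.g. the days of the week in the AIR example.
alpha, k = 0.05, 7
tail = bonferroni_tail_probability(alpha, k)  # 0.05 / 42

# Large-nu illustration only: approximate the t quantiles by normal quantiles
# to compare the multipliers of sigma_hat * sqrt(1/n_i + 1/n_j).
z_pairwise = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - tail)
print(round(z_pairwise, 3), round(z_bonferroni, 3))
```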

Dunnett's Test with Control is a two-sided multiple comparison method used to compare a set of categories to a control group. The quantile that scales the confidence interval is usually denoted |d|. If the ith confidence interval does not include zero, you may infer that the ith group is significantly different from the control. A control group may be a placebo or null treatment, or it might be a standard treatment. While the interactive nature of SAS/INSIGHT allows you to select any category to use as the basis of comparison in Dunnett's test, you should only select a category if it truly is the control group.
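
The decision rule for Dunnett's test can be sketched as below. Computing the quantile $|d|$ itself requires a multivariate t integral and is not shown; the function name, treatment labels, and interval endpoints are hypothetical.

```python
from typing import Dict, List, Tuple


def differs_from_control(intervals: Dict[str, Tuple[float, float]]) -> List[str]:
    """Given simultaneous intervals (lower, upper) for mu_i - mu_control,
    return the categories whose interval does not include zero."""
    return [name for name, (lower, upper) in intervals.items() if lower > 0.0 or upper < 0.0]


# Hypothetical Dunnett intervals for three treatments versus a placebo.
example = {"T1": (0.4, 2.1), "T2": (-1.0, 0.8), "T3": (-3.2, -0.5)}
print(differs_from_control(example))  # ['T1', 'T3']
```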

Hsu's Test for Best can be used to screen out group means which are statistically less than the (unknown) largest true mean. It forms non-symmetric confidence intervals around the difference between the largest sample mean and each of the others. If an interval does not properly contain zero in its interior, then you may infer that the associated group is not among the best.

Similarly, Hsu's Test for Worst can be used to screen out group means that are statistically greater than the (unknown) smallest true mean. If an interval does not properly contain zero in its interior, then you may infer that the true mean of that group is not equal to the (unknown) smallest true mean.
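
Both of Hsu's screening rules reduce to the same check on each interval, sketched below with a hypothetical helper and made-up intervals. Because Hsu's intervals are non-symmetric and can end exactly at zero, the check is strict: an interval whose endpoint is zero does not contain zero in its interior, so the corresponding group is screened out.

```python
from typing import Dict, List, Tuple


def screened_out(intervals: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the groups whose interval does not contain zero in its interior.

    For Hsu's test for best, these groups are inferred not to be among the best;
    for the test for worst, not to be among the worst. An interval with an
    endpoint exactly at zero does not properly contain zero, so it is screened out."""
    return [name for name, (lower, upper) in intervals.items() if not (lower < 0.0 < upper)]


# Hypothetical (non-symmetric) intervals from a test for best.
example = {"G1": (-0.3, 0.9), "G2": (-1.2, 0.0), "G3": (-0.8, 0.4)}
print(screened_out(example))  # ['G2']
```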
