
SAS/INSIGHT Software

Box plots allow you to examine means in different groups. Statistical questions you might have about the group means include:

- Which underlying group means are likely to be different?
- Which group means are better than the mean of a standard group?
- Which group means are statistically indistinguishable from the best?

All of the tests implemented in SAS/INSIGHT software are constructed assuming that the displayed variables are independent and normally distributed with identical variance. For details, see Hsu (1996).

As an example, open the GPA data set. Suppose you want to compare the means of the high school mathematics, science, and English scores, and to test whether the true mean math score is the highest of the three. Choose Analyze:Box Plot/Mosaic Plot (Y). Assign HSM, HSS, and HSE the role of Y. To select a multiple comparison test, click on the Output button of the variables dialog. This displays the output options dialog. Select the Means and Multiple Comparison of Means check boxes. Click on the Multiple Comparison Options button to select a comparison test and a confidence level. For this example, choose Hsu's Test for Best, and accept the default 95% confidence level, then click OK in each dialog.

You will have to scroll down in the Box Plot Window to see the table
that summarizes the results of Hsu's test. The significance
(Alpha) level,
degrees of freedom (DF), root mean square error
(Root MSE),
and a quantile value are displayed in the first row of the table. The
subsequent columns specify the names or labels of the groups
(also called categories or *treatments*)
being compared, the difference between their
means, the lower and upper bounds of a confidence interval about the
difference, and a *p*-value.
The *p*-value is the simultaneously corrected significance level of
a test for the difference between the associated means. A *p*-value
of *p* indicates that 100(1-*p*)% is the maximum confidence level
for which the confidence interval for the difference excludes zero.

For the current example, all of the confidence intervals whose
endpoints appear in the columns marked Lower Limit and
Upper Limit contain zero, so we cannot conclude that any of
them are significantly different from the (unknown) true best at the
95%
confidence level. The *p*-values are each slightly larger than
0.11, which indicates that we would have to
drop our confidence level to about 88% before we could conclude any
differences exist.
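The decision rule and the *p*-value interpretation above reduce to simple arithmetic. The following Python sketch (the function names are hypothetical, not part of SAS/INSIGHT) checks whether an interval excludes zero and converts a simultaneously corrected *p*-value into the largest confidence level at which the interval still excludes zero. A *p*-value of exactly 0.11 corresponds to 89%, so *p*-values slightly above 0.11 require dropping the confidence level just below 89%.

```python
def excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval [lower, upper] lies entirely on one side of zero."""
    return lower > 0 or upper < 0


def max_confidence_level(p_value: float) -> float:
    """Largest confidence level, in percent, at which the interval excludes zero."""
    return 100.0 * (1.0 - p_value)


print(round(max_confidence_level(0.11), 1))  # 89.0
```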

As a second example, open the AIR data set. Suppose you are interested in determining which days of the week have essentially the same average carbon-monoxide concentration. Assuming that the variances for each day are approximately the same (possibly not a valid assumption for this data set!) you can run the Tukey-Kramer test.

Choose Analyze:Box Plot/Mosaic Plot (Y). Assign CO the role of Y and DAY the role of X. Click on the Method button, turn off the automatic sorting of X values, and click OK. Click on the Output button of the variables dialog. Click on the Multiple Comparison Options button, choose Tukey-Kramer All Pairs, and accept the default 95% confidence level, then click OK in each dialog. The resulting graphic and table indicate, for example, that the mean carbon-monoxide concentration on Monday is different from that on Wednesday, Friday, Saturday, and Sunday (at the 95% confidence level). We cannot conclude that there is any difference between the average concentrations on Monday, Tuesday, and Thursday. Now click on any comparison circle. The graphical output is recomputed for the newly selected category.

Each of the tests available in SAS/INSIGHT software is described below.
In the descriptions that follow,
*k* is the number of categories
(i.e., the number of boxes in the box plot),
$n_i$ is the number of observations for the *i*th category,
$\mu_i$ is the true mean for the *i*th category, $\bar{y}_i$ is the
sample mean for the *i*th category, $\nu$ is the
total degrees of freedom, and $s$ is the root mean square
error, also known as the pooled standard deviation. Each test returns
a table showing
confidence intervals for the differences
$\mu_i - \mu_j$, $i \neq j$, $i, j = 1, \ldots, k$.
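The quantities used by every test can be computed from raw data in a few lines. This Python sketch assumes a one-way layout, so that the degrees of freedom are $\nu = \sum_i n_i - k$ and the root mean square error $s$ is the square root of the pooled within-group sum of squares divided by $\nu$. The function name `pooled_summary` and the data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pooled_summary(groups: Sequence[Sequence[float]]) -> Tuple[List[float], int, float]:
    """Return (sample means, degrees of freedom nu, root MSE s) for k groups."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    nu = n_total - len(groups)  # assumed error degrees of freedom: N - k
    # Sum of squared deviations of each observation from its own group mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    s = math.sqrt(sse / nu)  # pooled standard deviation
    return means, nu, s


# Hypothetical data: three small groups.
means, nu, s = pooled_summary([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [5.0, 6.0, 7.0]])
print(means, nu, s)  # [5.0, 7.0, 6.0] 6 1.0
```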

The Pairwise *t*-test is not a true simultaneous comparison
test, but rather uses a pairwise *t*-test to provide confidence intervals
about the difference between two means. These intervals
have a half-width equal to $t_{\alpha/2,\,\nu}\, s \sqrt{1/n_i + 1/n_j}$, where $t_{\alpha/2,\,\nu}$ denotes the point with upper-tail probability $\alpha/2$ under the *t* distribution with $\nu$ degrees of freedom. Although each confidence interval
was computed at the $1-\alpha$ level, the probability that
all of your confidence intervals are correct *simultaneously*
is less than $1-\alpha$. The actual simultaneous confidence for the *t*-based intervals is
approximately $1 - k\alpha$. For example, for 5 groups at $\alpha = 0.05$ the
actual simultaneous confidence for the *t*-based intervals is
approximately only 75%.
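The pairwise *t* half-width $t_{\alpha/2,\,\nu}\, s\sqrt{1/n_i + 1/n_j}$ can be computed without external libraries. The sketch below evaluates the *t* tail probability through the regularized incomplete beta function (a continued-fraction evaluation in the style of *Numerical Recipes*, which is a swapped-in numerical technique, not SAS's) and inverts it by bisection. It also evaluates the rough approximation $1 - k\alpha$ for the simultaneous confidence of unadjusted intervals, which gives 75% for 5 groups at $\alpha = 0.05$. All function names are hypothetical; in practice a statistics library would normally supply the quantile.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300  # guards against division by zero
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        # Even-numbered term, then odd-numbered term of the fraction.
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:  # last factor close to 1: converged
            break
    return h


def regularized_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_upper_tail(t: float, nu: float) -> float:
    """P(T > t) for t >= 0, where T has a Student t distribution with nu df."""
    return 0.5 * regularized_beta(nu / 2.0, 0.5, nu / (nu + t * t))


def t_quantile_upper(p: float, nu: float) -> float:
    """The t value with upper-tail probability p (0 < p < 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > p:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, nu) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pairwise_t_half_width(alpha: float, nu: float, s: float, n_i: int, n_j: int) -> float:
    """Half-width t_{alpha/2, nu} * s * sqrt(1/n_i + 1/n_j) of one unadjusted interval."""
    return t_quantile_upper(alpha / 2.0, nu) * s * math.sqrt(1.0 / n_i + 1.0 / n_j)


def approx_simultaneous_confidence(alpha: float, k: int) -> float:
    """Rough simultaneous confidence 1 - k*alpha of the unadjusted t intervals."""
    return 1.0 - k * alpha


print(round(t_quantile_upper(0.025, 10), 3))               # 2.228
print(round(approx_simultaneous_confidence(0.05, 5), 2))   # 0.75
```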

The Tukey-Kramer method is a true "multiple comparison" test,
appropriate when all pairwise comparisons are of interest; it is
the default test used.
The test is an exact $\alpha$-level test if the sample sizes are
the same, and slightly conservative for unequal sample sizes.
The confidence interval around the point estimate $\bar{y}_i - \bar{y}_j$
has half-width $q^{*} s \sqrt{1/n_i + 1/n_j}$. It is a common
convention to report the quantity $\sqrt{2}\, q^{*}$ (the $1-\alpha$ quantile of the studentized range distribution for *k* means and $\nu$ degrees of freedom) as the Tukey-Kramer
quantile, rather than just $q^{*}$.
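The Tukey-Kramer intervals can be sketched as follows. The studentized range quantile has no simple closed form, so this hypothetical helper takes it as an argument (`q_range`, assumed to be the $1-\alpha$ quantile for *k* means and $\nu$ degrees of freedom, looked up in a table or computed by a statistics library) and divides it by $\sqrt{2}$ to obtain $q^{*}$. The value 4.0 in the example is a placeholder, not a tabulated quantile, and the means are hypothetical.

```python
import math


def tukey_kramer_half_width(q_range: float, s: float, n_i: int, n_j: int) -> float:
    """Half-width q* * s * sqrt(1/n_i + 1/n_j), with q* = q_range / sqrt(2)."""
    q_star = q_range / math.sqrt(2.0)
    return q_star * s * math.sqrt(1.0 / n_i + 1.0 / n_j)


def tukey_kramer_pairs(means, sizes, s, q_range):
    """Return (i, j, diff, lower, upper, significant) for every pair i < j."""
    rows = []
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = means[i] - means[j]
            h = tukey_kramer_half_width(q_range, s, sizes[i], sizes[j])
            lower, upper = diff - h, diff + h
            # The pair differs significantly when the interval excludes zero.
            rows.append((i, j, diff, lower, upper, lower > 0 or upper < 0))
    return rows


# Hypothetical inputs; q_range = 4.0 is a placeholder quantile.
for row in tukey_kramer_pairs([5.0, 7.0, 6.0], [3, 3, 3], s=1.0, q_range=4.0):
    print(row)
```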

The Pairwise Bonferroni method is also appropriate when all
pairwise comparisons are of interest. It is conservative; that is,
Bonferroni tests performed at a nominal significance level of $\alpha$ actually have a somewhat smaller true significance level. The
Bonferroni method uses the *t*-distribution, like the pairwise
*t*-test, but returns wider intervals with half-width
$t_{\alpha/(k(k-1)),\,\nu}\, s \sqrt{1/n_i + 1/n_j}$. Note that the *t* probability ($\alpha/2$, since this is a two-sided
test) is divided by the total number of pairwise comparisons
($k(k-1)/2$). The Bonferroni test
produces wider confidence intervals than the Tukey-Kramer
test. For example, on the AIR data set, running the Bonferroni test
with a 95% confidence level does not allow you to infer that Monday's
mean CO concentration is different from Wednesday's, whereas
that inference was valid for the Tukey-Kramer test.
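The Bonferroni adjustment changes only the tail probability at which the *t* quantile is taken. In this sketch (function names hypothetical), `bonferroni_tail_probability` divides $\alpha/2$ by the $k(k-1)/2$ pairwise comparisons, and the caller is assumed to supply the matching *t* quantile `t_crit` with $\nu$ degrees of freedom from a table or a statistics library.

```python
import math


def bonferroni_tail_probability(alpha: float, k: int) -> float:
    """Upper-tail t probability: alpha/2 divided by the k(k-1)/2 pairwise comparisons."""
    n_comparisons = k * (k - 1) // 2
    return (alpha / 2.0) / n_comparisons


def bonferroni_half_width(t_crit: float, s: float, n_i: int, n_j: int) -> float:
    """Half-width t_crit * s * sqrt(1/n_i + 1/n_j).

    t_crit is assumed to be the t quantile (nu degrees of freedom) whose upper-tail
    probability equals bonferroni_tail_probability(alpha, k).
    """
    return t_crit * s * math.sqrt(1.0 / n_i + 1.0 / n_j)


# Five groups at alpha = 0.05: 10 comparisons, so each tail gets 0.025 / 10.
print(bonferroni_tail_probability(0.05, 5))
```

With $k = 2$ there is only one comparison, so the adjusted tail equals $\alpha/2$ and the Bonferroni interval coincides with the pairwise *t* interval; for larger *k* the smaller tail probability yields a larger `t_crit` and therefore wider intervals.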

Dunnett's Test with Control is a two-sided multiple comparison
method used to
compare a set of categories to a control group. The quantile that
scales the confidence interval is usually denoted |*d*|. If the *i*th
confidence interval does not include zero, you may infer that the
*i*th group is significantly different from the control. A
control group may be a placebo or null treatment, or it might be a
standard treatment. While the interactive nature of SAS/INSIGHT
allows you to select any category to use as the basis of comparison in
Dunnett's test, you should only select a category if it truly is the
control group.
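A comparison against a control can be sketched in the same way: each interval is centered at $\bar{y}_i - \bar{y}_c$, where *c* is the control category, and scaled by $|d|\, s\sqrt{1/n_i + 1/n_c}$. The two-sided Dunnett quantile `d_abs` is assumed to be supplied by the caller; the function name, data, and quantile in the example are hypothetical.

```python
import math


def dunnett_intervals(means, sizes, s, control, d_abs):
    """Two-sided intervals for mu_i - mu_control for every i != control.

    d_abs is assumed to be Dunnett's two-sided quantile |d| for k - 1 comparisons
    with nu degrees of freedom, supplied from a table or a statistics library.
    Returns {i: (lower, upper, differs_from_control)}.
    """
    result = {}
    for i, (mean_i, n_i) in enumerate(zip(means, sizes)):
        if i == control:
            continue
        diff = mean_i - means[control]
        h = d_abs * s * math.sqrt(1.0 / n_i + 1.0 / sizes[control])
        lower, upper = diff - h, diff + h
        # Group i differs from the control when its interval excludes zero.
        result[i] = (lower, upper, lower > 0 or upper < 0)
    return result


# Hypothetical data; d_abs = 2.5 is a placeholder quantile.
print(dunnett_intervals([5.0, 9.0, 5.5], [4, 4, 4], s=1.0, control=0, d_abs=2.5))
```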

Hsu's Test for Best can be used to screen out group means which
are statistically less than the (unknown) largest true mean. It
forms *non-symmetric* confidence intervals around the difference
between the largest sample mean and each of the others. If an
interval does not properly contain zero in its interior, then you may
infer that the associated group is not among the best.

Similarly, Hsu's Test for Worst can be used to screen out group means that are statistically greater than the (unknown) smallest true mean. If an interval does not properly contain zero in its interior, then you may infer that the true mean of that group is not equal to the (unknown) smallest true mean.
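Both of Hsu's tests can be sketched with the constrained multiple-comparisons-with-the-best (MCB) intervals described by Hsu (1996), restricted here to equal sample sizes *n*. For each group, the interval for $\mu_i - \max_{j \neq i}\mu_j$ is $[\min(0, D_i - h), \max(0, D_i + h)]$ with $D_i = \bar{y}_i - \max_{j \neq i}\bar{y}_j$ and $h = d\, s\sqrt{2/n}$; an upper bound of exactly zero screens group *i* out as not the best. The worst-group version is obtained by negating the means. The one-sided Dunnett quantile `d` is assumed to be supplied by the caller, the function names and data are hypothetical, and SAS/INSIGHT may lay out or sign its intervals differently (for example, as differences between the largest sample mean and each of the others).

```python
import math


def hsu_best_intervals(means, n, s, d):
    """Constrained MCB intervals for mu_i - max_{j != i} mu_j, equal sample size n.

    d is assumed to be the one-sided Dunnett quantile for k - 1 comparisons.
    An upper bound of 0 screens group i out as not the best; a lower bound of 0
    (with a positive upper bound) indicates that group i is the best.
    """
    h = d * s * math.sqrt(2.0 / n)
    intervals = []
    for i, mean_i in enumerate(means):
        best_other = max(m for j, m in enumerate(means) if j != i)
        diff = mean_i - best_other
        intervals.append((min(0.0, diff - h), max(0.0, diff + h)))
    return intervals


def hsu_worst_intervals(means, n, s, d):
    """Intervals for mu_i - min_{j != i} mu_j, obtained by negating the means.

    A lower bound of 0 screens group i out as not the worst.
    """
    flipped = hsu_best_intervals([-m for m in means], n, s, d)
    # Negate and swap the endpoints; adding 0.0 turns -0.0 into 0.0.
    return [(-upper + 0.0, -lower + 0.0) for lower, upper in flipped]


# Hypothetical data with equal group sizes; d = 2.0 is a placeholder quantile.
print(hsu_best_intervals([10.0, 5.0, 9.5], n=4, s=1.0, d=2.0))
print(hsu_worst_intervals([10.0, 5.0, 9.5], n=4, s=1.0, d=2.0))
```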


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.