
The STEPDISC Procedure

The STEPDISC procedure displays the following output:

- Class Level Information, including the values of the classification variable, the Frequency of each value, the Weight of each value, and the Proportion of each value in the total sample

- Within-Class SSCP Matrices for each group
- Pooled Within-Class SSCP Matrix
- Between-Class SSCP Matrix
- Total-Sample SSCP Matrix
- Within-Class Covariance Matrices for each group
- Pooled Within-Class Covariance Matrix
- Between-Class Covariance Matrix, equal to the between-class SSCP matrix divided by *n*(*c*-1)/*c*, where *n* is the number of observations and *c* is the number of classes
- Total-Sample Covariance Matrix
- Within-Class Correlation Coefficients and Pr > |*r*| to test the hypothesis that the within-class population correlation coefficients are zero
- Pooled Within-Class Correlation Coefficients and Pr > |*r*| to test the hypothesis that the partial population correlation coefficients are zero
- Between-Class Correlation Coefficients and Pr > |*r*| to test the hypothesis that the between-class population correlation coefficients are zero
- Total-Sample Correlation Coefficients and Pr > |*r*| to test the hypothesis that the total population correlation coefficients are zero
- descriptive Simple Statistics, including *N* (the number of observations), Sum, Mean, Variance, and Standard Deviation for the total sample and within each class
- Total-Sample Standardized Class Means, obtained by subtracting the grand mean from each class mean and dividing by the total-sample standard deviation
- Pooled Within-Class Standardized Class Means, obtained by subtracting the grand mean from each class mean and dividing by the pooled within-class standard deviation
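
The SSCP, covariance, and standardized-class-mean quantities listed above can be illustrated with a minimal sketch in plain Python. The toy data and all function names are hypothetical. Only the between-class divisor *n*(*c*-1)/*c* comes from the text; the divisors *n* - *c* (pooled within-class) and *n* - 1 (total sample) are the usual unweighted conventions and are assumed here, not taken from this page.

```python
from math import sqrt

def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sscp(rows, center):
    """Sums of squares and crossproducts of the rows about a center vector."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - c for x, c in zip(row, center)]
        for i in range(p):
            for j in range(p):
                out[i][j] += dev[i] * dev[j]
    return out

def combine(a, b, sign=1.0):
    """Elementwise a + sign * b for two matrices of the same shape."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(m, divisor):
    """Divide every element of a matrix by a scalar."""
    return [[v / divisor for v in row] for row in m]

# Hypothetical toy data: two classes, two variables, three observations each.
classes = {
    "a": [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]],
    "b": [[5.0, 7.0], [6.0, 9.0], [7.0, 8.0]],
}
all_rows = [row for rows in classes.values() for row in rows]
n, c = len(all_rows), len(classes)
grand_mean = mean_vector(all_rows)
class_means = {k: mean_vector(rows) for k, rows in classes.items()}

# Total-sample, pooled within-class, and between-class SSCP matrices.
p = len(grand_mean)
total_sscp = sscp(all_rows, grand_mean)
pooled_sscp = [[0.0] * p for _ in range(p)]
for k, rows in classes.items():
    pooled_sscp = combine(pooled_sscp, sscp(rows, class_means[k]))
between_sscp = combine(total_sscp, pooled_sscp, sign=-1.0)

between_cov = scale(between_sscp, n * (c - 1) / c)  # divisor stated in the text
pooled_cov = scale(pooled_sscp, n - c)              # assumed divisor
total_cov = scale(total_sscp, n - 1)                # assumed divisor

def standardized_means(cov):
    """(class mean - grand mean) / standard deviation taken from cov's diagonal."""
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return {k: [(m - g) / s for m, g, s in zip(mu, grand_mean, sd)]
            for k, mu in class_means.items()}

total_std_means = standardized_means(total_cov)
pooled_std_means = standardized_means(pooled_cov)
print(between_cov, total_std_means, pooled_std_means)
```

The between-class SSCP matrix is obtained here as the total-sample SSCP minus the pooled within-class SSCP, which is algebraically the same as summing the class-size-weighted outer products of the class-mean deviations.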

- for each variable considered for entry or removal: Partial R-Square, the squared (partial) correlation, the *F* statistic, and Pr > *F*, the probability level, from a one-way analysis of covariance
- the minimum Tolerance for entering each variable. A variable is entered only if its tolerance and the tolerances for all variables already in the model are greater than the value specified in the SINGULAR= option. The tolerance for the entering variable is 1 - *R*² from regressing the entering variable on the other variables already in the model. The tolerance for a variable already in the model is 1 - *R*² from regressing that variable on the entering variable and the other variables already in the model. With *m* variables already in the model, for each entering variable, *m* + 1 multiple regressions are performed using the entering variable and each of the *m* variables already in the model as a dependent variable. These *m* + 1 tolerances are computed for each entering variable, and the minimum tolerance is displayed for each.

  The tolerance is computed using the total-sample correlation matrix. It is customary to compute tolerance using the pooled within-class correlation matrix (Jennrich 1977), but it is possible for a variable with excellent discriminatory power to have a high total-sample tolerance and a low pooled within-class tolerance. For example, PROC STEPDISC enters a variable that yields perfect discrimination (that is, produces a canonical correlation of one), but a program using pooled within-class tolerance does not.
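
The minimum-tolerance rule described above can be sketched as follows. This is an illustration, not the procedure's actual implementation. It relies on the standard identity that, for a correlation matrix restricted to a set of variables, 1 - *R*² for each variable regressed on the others equals the reciprocal of the corresponding diagonal element of the inverse, so all *m* + 1 tolerances come from one matrix inversion. The correlation matrix, the function names, and the threshold value are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def tolerances(corr, in_model, candidate):
    """Tolerance (1 - R^2) of the candidate and of each variable already in the model.

    Each variable is treated as the dependent variable in a regression on all
    the other variables in the set {in_model + candidate}.
    """
    subset = list(in_model) + [candidate]
    sub = [[corr[i][j] for j in subset] for i in subset]
    inv = invert(sub)
    return {var: 1.0 / inv[k][k] for k, var in enumerate(subset)}

def min_tolerance(corr, in_model, candidate):
    """Smallest of the m + 1 tolerances; the value displayed for the candidate."""
    return min(tolerances(corr, in_model, candidate).values())

# Hypothetical total-sample correlation matrix for three variables (indices 0, 1, 2).
corr = [
    [1.0, 0.5, 0.9],
    [0.5, 1.0, 0.3],
    [0.9, 0.3, 1.0],
]
singular = 1e-8  # illustrative threshold standing in for the SINGULAR= value

for candidate in (1, 2):
    tol = min_tolerance(corr, in_model=[0], candidate=candidate)
    print(candidate, tol, tol > singular)
```

With variable 0 already in the model, the candidate that is more strongly correlated with it (variable 2) has the lower minimum tolerance.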

- the variable Label, if any
- the name of the variable chosen
- the variables already selected or removed
- Wilks' Lambda and the associated *F* approximation with degrees of freedom and Pr < *F*, the associated probability level after the selected variable has been entered or removed. Wilks' lambda is the likelihood ratio statistic for testing the hypothesis that the means of the classes on the selected variables are equal in the population (see the "Multivariate Tests" section in Chapter 3, "Introduction to Regression Procedures"). Lambda is close to zero if any two groups are well separated.
- Pillai's Trace and the associated *F* approximation with degrees of freedom and Pr > *F*, the associated probability level after the selected variable has been entered or removed. Pillai's trace is a multivariate statistic for testing the hypothesis that the means of the classes on the selected variables are equal in the population (see the "Multivariate Tests" section in Chapter 3).
- Average Squared Canonical Correlation (ASCC). The ASCC is Pillai's trace divided by the number of groups minus 1. The ASCC is close to 1 if all groups are well separated and if all or most directions in the discriminant space show good separation for at least two groups.
- Summary to give statistics associated with the variable chosen at each step. The summary includes the following:
  - Step number
  - Variable Entered or Removed
  - Number In, the number of variables in the model
  - Partial R-Square
  - the *F* Value for entering or removing the variable
  - Pr > *F*, the probability level for the *F* statistic
  - Wilks' Lambda
  - Pr < Lambda based on the *F* approximation to Wilks' Lambda
  - Average Squared Canonical Correlation
  - Pr > ASCC based on the *F* approximation to Pillai's trace
  - the variable Label, if any
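
Wilks' lambda, Pillai's trace, and the ASCC reported at each step can be illustrated with a short sketch. The formulas used here, lambda = det(W) / det(T) and Pillai's trace = trace(B T⁻¹), with W the pooled within-class SSCP matrix, B the between-class SSCP matrix, and T = W + B, are the standard textbook definitions and are an assumption of this example; the page itself states only that the ASCC is Pillai's trace divided by the number of groups minus 1. The matrices are hypothetical values for two classes and two variables, and the *F* approximations are not computed.

```python
def det_and_inverse(matrix):
    """Determinant and inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = aug[col][col]
        det *= pv
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return det, [row[n:] for row in aug]

def wilks_lambda(within, total):
    """det(W) / det(T): near zero when the classes are well separated."""
    return det_and_inverse(within)[0] / det_and_inverse(total)[0]

def pillai_trace(between, total):
    """trace(B T^-1), the sum of the squared canonical correlations."""
    _, t_inv = det_and_inverse(total)
    p = len(between)
    return sum(between[i][k] * t_inv[k][i] for i in range(p) for k in range(p))

# Hypothetical SSCP matrices for c = 2 classes and 2 selected variables.
c = 2
within = [[4.0, 2.0], [2.0, 4.0]]
between = [[24.0, 30.0], [30.0, 37.5]]
total = [[w + b for w, b in zip(rw, rb)] for rw, rb in zip(within, between)]

lam = wilks_lambda(within, total)
pillai = pillai_trace(between, total)
ascc = pillai / (c - 1)  # definition given in the text
print(lam, pillai, ascc)
```

With only two classes there is a single canonical correlation, so Pillai's trace and the ASCC coincide and both equal 1 minus Wilks' lambda, which gives a quick consistency check on the arithmetic.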


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.