## Canonical Discriminant Analysis

Canonical discriminant analysis is a dimension-reduction technique
related to principal component analysis and canonical correlation.
Given a classification variable and several interval variables,
canonical discriminant analysis
derives *canonical variables* (linear combinations of the
interval variables) that summarize between-class variation in much
the same way that principal components summarize total variation.

Given two or more groups of observations with measurements on several
interval variables, canonical discriminant analysis derives a linear
combination of the variables that has the highest possible multiple
correlation with the groups. This maximal multiple correlation is called
the first canonical correlation. The coefficients
of the linear combination are the canonical coefficients or
canonical weights. The variable defined by the linear combination is
the first canonical variable or canonical component.

The second canonical correlation is obtained by finding the linear
combination that is uncorrelated with the first canonical variable and
has the highest possible multiple correlation with the groups. The process of
extracting canonical variables can be repeated until the number of
canonical variables equals the number of original variables or the
number of classes minus one, whichever is smaller.
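
The extraction described above can be sketched in Python with NumPy.
This is a minimal illustration, not the procedure's own implementation:
the function name `canonical_discriminant`, the use of a Cholesky
factor of the within-class matrix, and the synthetic data are all
assumptions made for the example. The scaling of the returned
coefficients is arbitrary here.

```python
import numpy as np

def canonical_discriminant(X, y):
    """Return canonical correlations and (unscaled) canonical coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    grand_mean = X.mean(axis=0)

    # Within-class (W) and between-class (B) sums of squares and cross-products.
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)

    # Solve the generalized eigenproblem B a = lam W a through a symmetric
    # matrix built from the Cholesky factor of W (assumes W is nonsingular).
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, V = np.linalg.eigh(L_inv @ B @ L_inv.T)

    # Keep min(p, classes - 1) canonical variables, largest eigenvalue first.
    k = min(p, len(classes) - 1)
    order = np.argsort(eigvals)[::-1][:k]
    lam = np.clip(eigvals[order], 0.0, None)
    coefs = L_inv.T @ V[:, order]

    # Each eigenvalue maps to a squared canonical correlation lam / (1 + lam).
    return np.sqrt(lam / (1.0 + lam)), coefs

# Synthetic example: 3 classes, 4 interval variables (hypothetical data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 4)) + y[:, None] * np.array([1.0, 0.5, 0.0, -0.5])
canon_corr, coefs = canonical_discriminant(X, y)
scores = X @ coefs  # canonical variable scores, one column per canonical variable
```

With three classes and four variables the sketch returns two canonical
variables, the smaller of 4 and 3 - 1, and their scores are
uncorrelated with each other.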

The first canonical correlation is at least as large as the multiple
correlation between the groups and any of the original variables.
If the original variables have high within-group
correlations, the first canonical correlation can be large even if the
multiple correlation of each original variable with the groups is small.
In other words,
the first canonical variable can show substantial differences
among the classes, even if none of the original variables does.
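
A small simulation can illustrate this point. The sketch below uses
made-up data with two classes and two variables that share a strong
common component, so their within-group correlation is high. It relies
on the fact that, with only two classes, the first canonical
correlation equals the multiple correlation from regressing a 0/1 class
indicator on all of the variables. The sample size, effect sizes, and
variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                   # observations per class
g = np.repeat([0.0, 1.0], n)              # 0/1 class indicator

# A shared component z gives x1 and x2 a high within-group correlation;
# the class means shift slightly in opposite directions.
z = rng.normal(size=2 * n)
x1 = z + 0.2 * rng.normal(size=2 * n) + 0.3 * g
x2 = z + 0.2 * rng.normal(size=2 * n) - 0.3 * g
X = np.column_stack([x1, x2])

# Correlation of each original variable, taken alone, with the groups.
r_single = [abs(np.corrcoef(X[:, j], g)[0, 1]) for j in range(2)]

# Multiple correlation of the indicator with both variables together,
# which is the first canonical correlation in the two-class case.
A = np.column_stack([np.ones(2 * n), X])
beta, *_ = np.linalg.lstsq(A, g, rcond=None)
resid = g - A @ beta
centered = g - g.mean()
r_canonical = np.sqrt(1.0 - (resid @ resid) / (centered @ centered))

print("single-variable correlations:", r_single)
print("first canonical correlation:", r_canonical)
```

Each variable alone separates the classes poorly, but their difference
cancels the shared component, so the canonical variable separates the
classes much better.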

For each canonical correlation, canonical discriminant analysis
tests the hypothesis that
it and all smaller canonical correlations are zero in the population.
An *F* approximation is used that gives
better small-sample results than the usual χ² approximation.
The variables should have an approximate multivariate normal
distribution within each class, with a common covariance matrix
in order for the probability levels to be valid.
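
The text does not name the *F* approximation. As an illustration only,
the sketch below assumes Rao's *F* approximation to Wilks' lambda,
which is a common choice for this sequence of tests. The function name
`rao_f_tests` and its arguments are hypothetical. For test *i* the
statistic is built from the product of (1 - r²) over the *i*th and all
smaller canonical correlations.

```python
import math

def rao_f_tests(canon_corrs, n_obs, n_vars, n_classes):
    """Sequential tests that the i-th and all smaller canonical
    correlations are zero, using Rao's F approximation (an assumption)."""
    p, q = n_vars, n_classes - 1
    results = []
    for i in range(len(canon_corrs)):
        # Wilks' lambda for the i-th and all smaller canonical correlations.
        wilks = math.prod(1.0 - r * r for r in canon_corrs[i:])
        pi, qi = p - i, q - i              # dimensions remaining at step i
        denom = pi * pi + qi * qi - 5
        t = math.sqrt((pi * pi * qi * qi - 4) / denom) if denom > 0 else 1.0
        df1 = pi * qi
        df2 = (n_obs - 1.5 - (p + q) / 2.0) * t - df1 / 2.0 + 1.0
        root = wilks ** (1.0 / t)
        f_stat = (1.0 - root) / root * df2 / df1
        results.append({"wilks": wilks, "F": f_stat, "df1": df1, "df2": df2})
    return results

# Hypothetical input: two canonical correlations from 150 observations,
# 4 variables, and 3 classes.
tests = rao_f_tests([0.9, 0.4], n_obs=150, n_vars=4, n_classes=3)
```

A *p*-value for each row would come from the upper tail of the *F*
distribution with `df1` and `df2` degrees of freedom. With two classes
the approximation reduces to the exact *F* test for a multiple
correlation.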

The canonical variable scores
in canonical discriminant analysis are scaled to have either
pooled within-class variances equal to one (**Std Pooled Variance**)
or total-sample variances equal to one (**Std Total Variance**).
You specify the selection in the method options dialog
as shown in Figure 40.3.
By default, canonical variable scores have pooled within-class
variances equal to one.
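
The difference between the two scalings can be sketched as follows.
The function `standardize_scores` and its `method` argument are
hypothetical names for this example, not part of the software.
Centering the scores at zero is also an assumption made here, and the
pooled variance is taken as the within-class sum of squares divided by
the number of observations minus the number of classes.

```python
import numpy as np

def standardize_scores(scores, y, method="pooled"):
    """Rescale one canonical variable to unit pooled or total variance."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if method == "pooled":
        # Within-class sum of squares over (n - number of classes).
        ss_within = sum(
            ((scores[y == c] - scores[y == c].mean()) ** 2).sum()
            for c in classes
        )
        var = ss_within / (len(scores) - len(classes))
    elif method == "total":
        # Ordinary sample variance, ignoring class membership.
        var = scores.var(ddof=1)
    else:
        raise ValueError("method must be 'pooled' or 'total'")
    return (scores - scores.mean()) / np.sqrt(var)

# Hypothetical scores for 3 classes of 20 observations each.
rng = np.random.default_rng(2)
y = np.repeat([0, 1, 2], 20)
raw = rng.normal(size=60) + 2.0 * y
z_pooled = standardize_scores(raw, y, "pooled")
z_total = standardize_scores(raw, y, "total")
```

When the classes differ, scores scaled to unit pooled within-class
variance have a total-sample variance greater than one, because the
total variance also includes the between-class variation.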

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.