PROC DISCRIM Statement
 PROC DISCRIM < options > ;
This statement invokes the DISCRIM procedure.
You can specify the following options in the PROC DISCRIM statement.
Task   Options

Specify Input Data Set   DATA= 
   TESTDATA= 
Specify Output Data Set   OUTSTAT= 
   OUT= 
   OUTCROSS= 
   OUTD= 
   TESTOUT= 
   TESTOUTD= 
Discriminant Analysis   METHOD= 
   POOL= 
   SLPOOL= 
Nonparametric Methods   K= 
   R= 
   KERNEL= 
   METRIC= 
Classification Rule   THRESHOLD= 
Determine Singularity   SINGULAR= 
Canonical Discriminant Analysis   CANONICAL 
   CANPREFIX= 
   NCAN= 
Resubstitution Classification   LIST 
   LISTERR 
   NOCLASSIFY 
Cross Validation Classification   CROSSLIST 
   CROSSLISTERR 
   CROSSVALIDATE 
Test Data Classification   TESTLIST 
   TESTLISTERR 
Estimate Error Rate   POSTERR 
Control Displayed Output   
 Correlations   BCORR 
   PCORR 
   TCORR 
   WCORR 
 Covariances   BCOV 
   PCOV 
   TCOV 
   WCOV 
 SSCP Matrix   BSSCP 
   PSSCP 
   TSSCP 
   WSSCP 
 Miscellaneous   ALL 
   ANOVA 
   DISTANCE 
   MANOVA 
   SIMPLE 
   STDMEAN 
 Suppress Output   NOPRINT 
   SHORT 
 ALL

activates all options that control displayed output.
When the derived classification criterion is used to classify
observations, the ALL option also activates the
POSTERR option.
 ANOVA

displays univariate statistics for testing the hypothesis that
the class means are equal in the population for each variable.
 BCORR

displays between-class correlations.
 BCOV

displays between-class covariances.
The between-class covariance matrix equals the between-class
SSCP matrix divided by n(c-1)/c, where n is the number
of observations and c is the number of classes.
You should interpret the between-class covariances
in comparison with the total-sample and within-class
covariances, not as formal estimates of population parameters.
 BSSCP

displays the between-class SSCP matrix.
 CANONICAL
 CAN

performs canonical discriminant analysis.
 CANPREFIX=name

specifies a prefix for naming the canonical variables.
By default, the names are Can1, Can2, ..., Cann.
If you specify CANPREFIX=ABC, the components
are named ABC1, ABC2, ABC3, and so on.
The number of characters in the prefix, plus
the number of digits required to designate the
canonical variables, should not exceed 32.
The prefix is truncated if the combined length exceeds 32.
The CANONICAL option is activated when you specify either
the NCAN= or the CANPREFIX= option.
A discriminant criterion is always derived in PROC DISCRIM.
If you want canonical discriminant analysis without the
use of discriminant criteria, you should use PROC CANDISC.
 CROSSLIST

displays the cross validation classification
results for each observation.
 CROSSLISTERR

displays the cross validation classification
results for misclassified observations only.
 CROSSVALIDATE

specifies the cross validation
classification of the input DATA= data set.
When a parametric method is used, PROC DISCRIM classifies each
observation in the DATA= data set using a discriminant
function computed from the other observations in the
DATA= data set, excluding the observation being classified.
When a nonparametric method is used, the covariance matrices used
to compute the distances are based on all observations in the
data set and do not exclude the observation being classified.
However, the observation being classified is excluded
from the nonparametric density estimation (if you specify the R=
option) or the k nearest neighbors
(if you specify the K= option) of that observation.
The CROSSVALIDATE option is set when you specify the CROSSLIST,
CROSSLISTERR, or OUTCROSS= option.
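
As an illustration of the cross validation options, the following sketch requests a listing of misclassified observations and an output data set. The data set WORK.TRAIN, the class variable GROUP, the variables X1-X3, and the output name WORK.CV_SCORED are hypothetical and stand in for your own names.

```sas
/* Hypothetical names: work.train, group, x1-x3, work.cv_scored */
proc discrim data=work.train method=normal pool=yes
             crosslisterr              /* list misclassified observations only */
             outcross=work.cv_scored;  /* posterior probabilities and assigned class */
   class group;                        /* classification variable */
   var x1 x2 x3;                       /* quantitative variables */
run;
```

Because CROSSLISTERR and OUTCROSS= each set the CROSSVALIDATE option, it does not need to be specified separately in this sketch.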
 DATA=SASdataset

specifies the data set to be analyzed.
The data set can be an ordinary SAS data set or one of several
specially structured data sets created by SAS/STAT procedures.
These specially structured data sets include
TYPE=CORR, TYPE=COV, TYPE=CSSCP, TYPE=SSCP,
TYPE=LINEAR, TYPE=QUAD, and TYPE=MIXED.
The input data set must be an ordinary
SAS data set if you specify METHOD=NPAR.
If you omit the DATA= option, the procedure
uses the most recently created SAS data set.
 DISTANCE
 MAHALANOBIS
displays the squared Mahalanobis distances between the group
means, F statistics, and the corresponding probabilities of
greater Mahalanobis squared distances between the group means.
The squared distances are based on the
specification of the POOL=
and METRIC= options.
 K=k

specifies a k value for the k-nearest-neighbor rule.
An observation x is classified into a group based on
the information from the k nearest neighbors of x.
Do not specify both the K= and R= options.
 KERNEL=BIWEIGHT | BIW
 KERNEL=EPANECHNIKOV | EPA
 KERNEL=NORMAL | NOR
 KERNEL=TRIWEIGHT | TRI
 KERNEL=UNIFORM | UNI

specifies a kernel density to estimate the group-specific densities.
You can specify the KERNEL= option only when the
R= option is specified.
The default is KERNEL=UNIFORM.
 LIST

displays the resubstitution classification
results for each observation.
You can specify this option only when the
input data set is an ordinary SAS data set.
 LISTERR

displays the resubstitution classification
results for misclassified observations only.
You can specify this option only when the
input data set is an ordinary SAS data set.
 MANOVA

displays multivariate statistics for testing the hypothesis
that the class means are equal in the population.
 METHOD=NORMAL | NPAR

determines the method to use in
deriving the classification criterion.
When you specify METHOD=NORMAL, a parametric method based
on a multivariate normal distribution within each class is
used to derive a linear or quadratic discriminant function.
The default is METHOD=NORMAL.
When you specify METHOD=NPAR, a nonparametric method
is used and you must also specify either the
K= or R= option.
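
A minimal sketch of a nonparametric request, assuming a hypothetical ordinary SAS data set WORK.TRAIN with class variable GROUP and variables X1-X3. The value K=5 is arbitrary and is not a recommendation.

```sas
/* Hypothetical names: work.train, group, x1-x3 */
proc discrim data=work.train
             method=npar   /* nonparametric method: K= or R= is required */
             k=5;          /* 5 nearest neighbors (arbitrary choice) */
   class group;
   var x1 x2 x3;
run;
```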
 METRIC=DIAGONAL | FULL | IDENTITY

specifies the metric in which the computations
of squared distances are performed.
If you specify METRIC=FULL, PROC DISCRIM uses either the pooled covariance matrix
(POOL=YES) or individual within-group covariance
matrices (POOL=NO) to compute the squared distances.
If you specify METRIC=DIAGONAL, PROC DISCRIM uses either the diagonal matrix of
the pooled covariance matrix (POOL=YES) or diagonal
matrices of individual within-group covariance
matrices (POOL=NO) to compute the squared distances.
If you specify METRIC=IDENTITY, PROC DISCRIM uses Euclidean distance.
The default is METRIC=FULL.
When you specify METHOD=NORMAL, the option METRIC=FULL is used.
 NCAN=number

specifies the number of canonical variables to compute.
The value of number must be less than
or equal to the number of variables.
If you specify the option NCAN=0, the procedure
displays the canonical correlations but not the
canonical coefficients, structures, or means.
Let v be the number of variables in the VAR
statement and c be the number of classes.
If you omit the NCAN= option, only
min(v, c-1) canonical variables are generated.
If you request an output data set (OUT=, OUTCROSS=, TESTOUT=),
v canonical variables are generated.
In this case, the last v-(c-1)
canonical variables have missing values.
The CANONICAL option is activated when you specify either
the NCAN= or the CANPREFIX= option.
A discriminant criterion is always derived in PROC DISCRIM.
If you want canonical discriminant analysis without the
use of discriminant criterion, you should use PROC CANDISC.
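
The canonical options can be combined as in the following sketch. It assumes a hypothetical data set WORK.TRAIN with at least three classes in GROUP and three variables X1-X3, so that two canonical variables can be computed. The prefix Disc and the output name WORK.SCORED are also illustrative.

```sas
/* Hypothetical names: work.train, group, x1-x3, work.scored */
proc discrim data=work.train
             ncan=2            /* activates CANONICAL; compute two canonical variables */
             canprefix=Disc    /* canonical variables are named Disc1 and Disc2 */
             out=work.scored;  /* output data set includes canonical variable scores */
   class group;
   var x1 x2 x3;
run;
```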
 NOCLASSIFY

suppresses the resubstitution
classification of the input DATA= data set.
You can specify this option only when the
input data set is an ordinary SAS data set.
 NOPRINT

suppresses the normal display of results. Note that this option
temporarily disables the Output Delivery System
(ODS); see Chapter 15, "Using the Output Delivery System," for more information.
 OUT=SASdataset

creates an output SAS data set containing all the data from the
DATA= data set, plus the posterior probabilities and the class
into which each observation is classified by resubstitution.
When you specify the CANONICAL option, the data set also
contains new variables with canonical variable scores.
See the "OUT= Data Set" section.
 OUTCROSS=SASdataset

creates an output SAS data set containing all the data from the
DATA= data set, plus the posterior probabilities and the class
into which each observation is classified by cross validation.
When you specify the CANONICAL option, the data set also
contains new variables with canonical variable scores.
See the "OUT= Data Set" section.
 OUTD=SASdataset

creates an output SAS data set containing all the
data from the DATA= data set, plus the group-specific
density estimates for each observation.
See the "OUT= Data Set" section.
 OUTSTAT=SASdataset

creates an output SAS data set containing various statistics
such as means, standard deviations, and correlations.
When the input data set is an ordinary SAS data set or when
TYPE=CORR, TYPE=COV, TYPE=CSSCP, or TYPE=SSCP, this
option can be used to generate discriminant statistics.
When you specify the CANONICAL option, canonical correlations,
canonical structures, canonical coefficients, and means of
canonical variables for each class are included in the data set.
If you specify METHOD=NORMAL, the output data set also
includes coefficients of the discriminant functions,
and the output data set is TYPE=LINEAR (POOL=YES),
TYPE=QUAD (POOL=NO), or TYPE=MIXED (POOL=TEST).
If you specify METHOD=NPAR, this output data set is TYPE=CORR.
This data set also holds calibration information
that can be used to classify new observations.
See the "Saving and Using Calibration Information" section and the "OUT= Data Set" section.
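
One way to use the calibration information is sketched below in two steps: the first step saves the statistics, and the second step reads them back through the DATA= option to classify new observations. The data set names WORK.TRAIN, WORK.CALIB, WORK.NEWOBS, and WORK.NEWSCORED and the variable names are hypothetical.

```sas
/* Step 1: derive the criterion and save calibration information. */
/* Hypothetical names throughout. */
proc discrim data=work.train method=normal pool=yes
             outstat=work.calib;       /* TYPE=LINEAR data set under POOL=YES */
   class group;
   var x1 x2 x3;
run;

/* Step 2: classify new observations from the saved statistics. */
proc discrim data=work.calib
             testdata=work.newobs      /* ordinary SAS data set to classify */
             testout=work.newscored;   /* posterior probabilities and assigned class */
   class group;
   var x1 x2 x3;
run;
```

This avoids recomputing the discriminant criterion from the original observations each time new data arrive.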
 PCORR

displays pooled within-class correlations.
 PCOV

displays pooled within-class covariances.
 POOL=NO | TEST | YES

determines whether the pooled or within-group covariance
matrix is the basis of the measure of the squared distance.
If you specify POOL=YES, PROC DISCRIM uses the pooled covariance matrix in
calculating the (generalized) squared distances.
Linear discriminant functions are computed.
If you specify POOL=NO, the procedure uses the individual within-group
covariance matrices in calculating the distances.
Quadratic discriminant functions are computed.
The default is POOL=YES.
When you specify METHOD=NORMAL, the option POOL=TEST requests
Bartlett's modification of the likelihood ratio
test (Morrison 1976; Anderson 1984) of the
homogeneity of the within-group covariance matrices.
The test is unbiased (Perlman 1980).
However, it is not robust to nonnormality.
If the test statistic is significant at the level specified by
the SLPOOL= option, the within-group covariance matrices are used.
Otherwise, the pooled covariance matrix is used.
The discriminant function coefficients are displayed
only when the pooled covariance matrix is used.
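
The following sketch lets the homogeneity test choose between the pooled and within-group covariance matrices. The data set and variable names are hypothetical, and SLPOOL=0.05 is an arbitrary level chosen only to show the syntax.

```sas
/* Hypothetical names: work.train, group, x1-x3 */
proc discrim data=work.train method=normal
             pool=test     /* test homogeneity of within-group covariance matrices */
             slpool=0.05   /* significance level for that test (arbitrary here) */
             wcov pcov;    /* display within-class and pooled covariances */
   class group;
   var x1 x2 x3;
run;
```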
 POSTERR

displays the posterior probability error-rate estimates of the
classification criterion based on the classification results.
 PSSCP

displays the pooled within-class corrected SSCP matrix.
 R=r

specifies a radius r value for kernel density estimation.
With uniform, Epanechnikov, biweight, or triweight
kernels, an observation x is classified into a
group based on the information from observations y
in the training set within the radius r of x,
that is, the group t observations y with squared
distance (x - y)' V_{t}^{-1} (x - y) <= r^{2}.
When a normal kernel is used, the classification
of an observation x is based on the
information of the estimated group-specific
densities from all observations in the training set.
The matrix r^{2} V_{t} is used as the group t covariance
matrix in the normal-kernel density, where V_{t}
is the matrix used in calculating the squared distances.
Do not specify both the K= and R= options.
For more information on selecting r, see
the "Nonparametric Methods" section.
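
A kernel density request might look like the following sketch. The data set and variable names are hypothetical, and the radius R=0.5 is an arbitrary placeholder rather than a suggested value.

```sas
/* Hypothetical names: work.train, group, x1-x3, work.density */
proc discrim data=work.train method=npar
             r=0.5                /* kernel radius (arbitrary placeholder) */
             kernel=epanechnikov  /* allowed only when R= is specified */
             outd=work.density;   /* group-specific density estimates */
   class group;
   var x1 x2 x3;
run;
```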
 SHORT

suppresses the display of certain items in the default output.
If you specify METHOD=NORMAL, PROC DISCRIM suppresses the display of
determinants, generalized squared distances between class
means, and discriminant function coefficients.
When you specify the CANONICAL option, PROC DISCRIM suppresses
the display of canonical structures, canonical coefficients,
and class means on canonical variables; only tables of
canonical correlations are displayed.
 SIMPLE

displays simple descriptive statistics for
the total sample and within each class.
 SINGULAR=p

specifies the criterion for determining
the singularity of a matrix, where 0 < p < 1.
The default is SINGULAR=1E-8.
Let S be the total-sample correlation matrix.
If the R^{2} for predicting a quantitative variable
in the VAR statement from the variables preceding
it exceeds 1-p, then S is considered singular.
If S is singular, the probability levels for the
multivariate test statistics and canonical correlations are
adjusted for the number of variables with R^{2} exceeding 1-p.
Let S_{t} be the group t covariance matrix
and S_{p} be the pooled covariance matrix.
In group t, if the R^{2} for predicting a quantitative
variable in the VAR statement from the variables preceding
it exceeds 1-p, then S_{t} is considered singular.
Similarly, if the partial R^{2} for predicting a quantitative
variable in the VAR statement from the variables preceding it,
after controlling for the effect of the CLASS variable,
exceeds 1-p, then S_{p} is considered singular.
If PROC DISCRIM needs to compute either the inverse or the determinant
of a matrix that is considered singular, then it uses a quasi-inverse
or a quasi-determinant.
For details, see the "Quasi-Inverse" section.
 SLPOOL=p

specifies the significance level for the test of homogeneity.
You can specify the SLPOOL= option only when POOL=TEST is also specified.
If you specify POOL=TEST but omit the SLPOOL= option,
PROC DISCRIM uses 0.10 as the significance level for the test.
 STDMEAN

displays total-sample and pooled
within-class standardized class means.
 TCORR

displays total-sample correlations.
 TCOV

displays total-sample covariances.
 TESTDATA=SASdataset

names an ordinary SAS data set with
observations that are to be classified.
The quantitative variable names in this data
set must match those in the DATA= data set.
When you specify the TESTDATA= option, you can also specify
the TESTCLASS, TESTFREQ, and TESTID statements.
When you specify the TESTDATA= option, you can use the
TESTOUT= and TESTOUTD= options to generate
classification results and group-specific density
estimates for observations in the test data set.
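
The test data options fit together as in this sketch, which assumes hypothetical data sets WORK.TRAIN and WORK.HOLDOUT that share the variables X1-X3 and a class variable GROUP.

```sas
/* Hypothetical names: work.train, work.holdout, group, x1-x3 */
proc discrim data=work.train
             testdata=work.holdout            /* observations to classify */
             testout=work.holdout_scored      /* posterior probabilities and class */
             testoutd=work.holdout_density;   /* group-specific density estimates */
   class group;
   testclass group;   /* known class in the test data, used for error counts */
   var x1 x2 x3;      /* names must match in both data sets */
run;
```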
 TESTLIST

lists classification results for all
observations in the TESTDATA= data set.
 TESTLISTERR

lists only misclassified observations in the TESTDATA=
data set, but only if a TESTCLASS statement is also used.
 TESTOUT=SASdataset

creates an output SAS data set containing all the data from
the TESTDATA= data set, plus the posterior probabilities
and the class into which each observation is classified.
When you specify the CANONICAL option, the data set also
contains new variables with canonical variable scores.
See the "OUT= Data Set" section.
 TESTOUTD=SASdataset

creates an output SAS data set containing all
the data from the TESTDATA= data set, plus the
group-specific density estimates for each observation.
See the "OUT= Data Set" section.
 THRESHOLD=p

specifies the minimum acceptable posterior
probability for classification, where 0 <= p <= 1.
If the largest posterior probability of group
membership is less than the THRESHOLD value,
the observation is classified into group OTHER.
The default is THRESHOLD=0.
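
For example, the following sketch assigns an observation to a class only when its largest posterior probability is at least 0.8. The cutoff and all data set and variable names are hypothetical.

```sas
/* Hypothetical names: work.train, group, x1-x3, work.scored */
proc discrim data=work.train
             threshold=0.8     /* below this, the observation goes to group OTHER */
             listerr           /* list misclassified observations only */
             out=work.scored;
   class group;
   var x1 x2 x3;
run;
```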
 TSSCP

displays the total-sample corrected SSCP matrix.
 WCORR

displays within-class correlations for each class level.
 WCOV

displays within-class covariances for each class level.
 WSSCP

displays the within-class corrected SSCP matrix for each class level.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.