PROC PLS Statement
 PROC PLS < options > ;
You use the PROC PLS statement to invoke the PLS procedure and,
optionally, to indicate the analysis data and method. The following
options are available.
 DATA=SASdataset

names the SAS data set to be used by PROC PLS. The default
is the most recently created data set.
 METHOD=PLS < ( PLSoptions ) >
 METHOD=SIMPLS
 METHOD=PCR
 METHOD=RRR

specifies the general factor extraction method to be used. The value PLS
requests partial least squares, SIMPLS requests the SIMPLS method of
de Jong (1993), PCR requests principal components regression, and RRR
requests reduced rank regression. The default is METHOD=PLS. You can also
specify the following optional PLSoptions in parentheses
after METHOD=PLS:

ALGORITHM=NIPALS | SVD | EIG | RLGW

names the specific algorithm used to compute extracted PLS factors.
NIPALS requests the usual iterative NIPALS algorithm, SVD bases the
extraction on the singular value decomposition of X'Y, EIG bases the
extraction on the eigenvalue decomposition of Y'XX'Y, and RLGW is an
iterative approach that is efficient when there are many predictors
(Rännar et al. 1994). ALGORITHM=SVD is the most accurate but least efficient
approach; the default is ALGORITHM=NIPALS.

MAXITER=n

specifies the maximum number of iterations for the NIPALS and RLGW
algorithms. The default value is 200.

EPSILON=n

specifies the convergence criterion for the NIPALS and RLGW algorithms.
The default value is 10^{-12}.
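
As an illustration of how METHOD= and its PLSoptions combine, the following sketch requests partial least squares with the NIPALS algorithm and nondefault iteration settings. The data set name sample, the variables y1, y2, and x1-x10, and the particular MAXITER= and EPSILON= values are assumptions made for this example. The MODEL statement is part of the procedure's syntax but is described outside this section.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc pls data=sample
         method=pls(algorithm=nipals maxiter=500 epsilon=1e-10);
   model y1 y2 = x1-x10;   /* responses = predictors */
run;
```

Because PLSoptions apply only to METHOD=PLS, a call with METHOD=SIMPLS, METHOD=PCR, or METHOD=RRR would omit the parenthesized list.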
 CV=ONE
 CV=SPLIT < (n) >
 CV=BLOCK < (n) >
 CV=RANDOM < (cvrandomopts) >
 CV=TESTSET(SASdataset)

specifies the cross validation method to be used. By default, no cross
validation is performed. The method CV=ONE requests one-at-a-time cross
validation, CV=SPLIT requests that every nth observation be excluded,
CV=BLOCK requests that blocks of n observations be excluded, CV=RANDOM
requests that observations be excluded at random, and
CV=TESTSET(SASdataset) specifies a test set of observations to be used for
validation (formally, this is called "test set validation" rather
than "cross validation"). You can, optionally, specify n for
CV=SPLIT and CV=BLOCK; the default is n=7. You can also specify the
following optional cvrandomopts in parentheses after
the CV=RANDOM option:

NITER=n

specifies the number of random subsets to exclude. The default value is 10.

NTEST=n

specifies the number of observations in each random subset chosen for
exclusion. The default value is one-tenth of the total number of
observations.

SEED=n

specifies the seed value for random number generation (the clock time
is used by default).
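
The two sketches below show how the CV= forms might be written in practice. The first requests random-subset cross validation with an explicit seed so that the excluded subsets are reproducible; the second requests test set validation. The data set names sample and holdout, the variable names, and the NITER=, NTEST=, and SEED= values are all hypothetical.

```sas
/* Random-subset cross validation: 20 subsets of 15 observations each, */
/* with a fixed seed instead of the default clock time                 */
proc pls data=sample cv=random(niter=20 ntest=15 seed=12345);
   model y1 y2 = x1-x10;
run;

/* Test set validation against a separate (hypothetical) data set */
proc pls data=sample cv=testset(holdout);
   model y1 y2 = x1-x10;
run;
```

Writing CV=SPLIT or CV=BLOCK without a parenthesized value would use the default n=7 described above.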
 CVTEST < (cvtestoptions) >

specifies that van der Voet's (1994) randomization-based model
comparison test be performed to test models with different numbers of
extracted factors against the model that minimizes the predicted
residual sum of squares; see the "Cross Validation" section for more information.
You can also specify the following cvtestoptions in
parentheses after the CVTEST option:

PVAL=n

specifies the cutoff probability for declaring an insignificant
difference. The default value is 0.10.

STAT=teststatistic

specifies the test statistic for the model comparison. You can
specify either T2, for Hotelling's T^{2} statistic, or PRESS,
for the predicted residual sum of squares. The default value is T2.

NSAMP=n

specifies the number of randomizations to perform. The default value is
1000.

SEED=n

specifies the seed value for generating the randomizations (the clock time is
used by default).
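
A minimal sketch of CVTEST used together with a CV= method follows. It assumes one-at-a-time cross validation and overrides each cvtestoption; the data set name, variable names, and option values are invented for the example and are not recommendations.

```sas
/* Model comparison test on top of one-at-a-time cross validation */
proc pls data=sample cv=one
         cvtest(pval=0.05 stat=press nsamp=5000 seed=12345);
   model y1 y2 = x1-x10;
run;
```

Fixing SEED= makes the randomization test repeatable from run to run, which the default clock-based seed does not.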
 NFAC=n

specifies the number of factors to extract. The default is
min{15, p, N}, where p is the number of predictors (or, for METHOD=RRR,
the number of dependent variables) and N is the number of runs
(observations). This is probably more than you need for
most applications. Extracting too many factors can lead to an
overfit model, one that matches the training data too well,
sacrificing predictive ability. Thus, if you use the default NFAC=
specification, you should also either use the CV= option to select the
appropriate number of factors for the final model or consider the
analysis to be preliminary and examine the results to determine the
appropriate number of factors for a subsequent analysis.
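
The two-stage workflow described above can be sketched as follows. With the hypothetical data used here (10 predictors and, say, 50 observations), the default would be min{15, 10, 50} = 10 factors. The first step leaves NFAC= at its default and lets cross validation indicate how many factors are useful; the second step refits with an explicit count. The value NFAC=3 is a made-up outcome, as are the data set and variable names.

```sas
/* Step 1: preliminary fit with the default NFAC=, using split-sample */
/* cross validation (every 7th observation excluded by default)       */
proc pls data=sample cv=split;
   model y1 y2 = x1-x10;
run;

/* Step 2: refit with the number of factors chosen after examining */
/* the step 1 results (3 is an assumed value)                      */
proc pls data=sample nfac=3;
   model y1 y2 = x1-x10;
run;
```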
 NOPRINT

suppresses the normal display of results.
This is useful when you want only the output
statistics saved in a data set.
Note that this option temporarily disables the Output
Delivery System (ODS); see Chapter 15, "Using the Output Delivery System," for more information.
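
For instance, NOPRINT might be combined with an OUTPUT statement when only a data set of results is wanted. The sketch below assumes the OUTPUT statement's OUT=, PREDICTED=, and XSCORE= keywords, which are documented outside this section; the data set names, variable names, and NFAC= value are hypothetical.

```sas
/* Suppress displayed results and keep only the saved statistics */
proc pls data=sample nfac=3 noprint;
   model y1 y2 = x1-x10;
   output out=plsout            /* hypothetical output data set     */
          predicted=yhat1 yhat2 /* predicted values, one per response */
          xscore=t;             /* prefix for the X-score variables */
run;
```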
 NOSCALE

suppresses scaling of the responses and predictors before fitting.
This is useful if the analysis variables are already scaled.
See the "Centering and Scaling" section for more information.
 NOCENTER

suppresses centering of the responses and predictors before fitting.
This is useful if the analysis variables are already centered.
See the "Centering and Scaling" section for more information.
 NOCVSTDIZE

suppresses recentering and rescaling of the responses and predictors
before each model is fit in the cross validation. See
the "Centering and Scaling" section for more information.
 CENSCALE

lists the centering and scaling information for each response and
predictor.
 VARSCALE

specifies that continuous model variables should be centered and scaled
prior to centering and scaling the model effects in which they are
involved.
The rescaling specified by the VARSCALE option may be more
appropriate if the model involves cross products between
model variables; however, the VARSCALE option still may not
produce the model you expect.
See the "Centering and Scaling" section for more information.
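
As a sketch of the situation VARSCALE is aimed at, the following call fits a model that contains a cross product and also requests CENSCALE so that the centering and scaling actually applied can be inspected. The data set name, the variable names, and the choice of effects are assumptions for illustration.

```sas
/* Model with a cross-product effect; VARSCALE standardizes x1 and x2 */
/* before the effects they appear in are centered and scaled          */
proc pls data=sample varscale censcale;
   model y = x1 x2 x1*x2;
run;
```

Comparing the CENSCALE listing with and without VARSCALE is one way to check whether the rescaling matches what you intend.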
 VARSS

lists, in addition to the average response and predictor sum of
squares accounted for by each successive factor, the amount of
variation accounted for in each response and predictor.
 DETAILS

lists the details of the fitted model for each successive factor. The
details listed are different for different extraction
methods; see the "Displayed Output" section for more information.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.