
The FACTOR Procedure

The following statements

   proc factor;
   run;

result in a principal component analysis. The output includes all the eigenvalues and the pattern matrix for eigenvalues greater than one.
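As a sketch of what that minimal call implies, the same analysis can be requested with the relevant options written out. This assumes the documented defaults for an ordinary correlation-matrix analysis (principal extraction, prior communalities of one, and a minimum eigenvalue of one); check the PROC FACTOR syntax reference for your release before relying on it.

```sas
/* Assumed explicit equivalent of the bare "proc factor; run;" call.     */
/* METHOD=, PRIORS=, and MINEIGEN= are spelled out here only to show     */
/* which defaults produce a principal component analysis.                */
proc factor method=principal priors=one mineigen=1;
run;
```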

Most applications require additional output.
For example, you may want to compute principal component
scores for use in subsequent analyses or obtain a
graphical aid to help decide how many components to keep.
You can save the results of the analysis in a
permanent SAS data library by using the OUTSTAT= option.
(Refer to the *SAS Language Reference: Dictionary*
for more information on permanent SAS data libraries and librefs.)
Assuming that your SAS data library has the libref
save and that the data are in a SAS data set called raw,
you could do a principal component analysis as follows:

   proc factor data=raw method=principal scree mineigen=0 score
               outstat=save.fact_all;
   run;

The SCREE option produces a plot of the eigenvalues that is helpful in deciding how many components to use. The MINEIGEN=0 option causes all components with variance greater than zero to be retained. The SCORE option requests that scoring coefficients be computed. The OUTSTAT= option saves the results in a specially structured SAS data set. The name of the data set, in this case fact_all, is arbitrary. To compute principal component scores, use the SCORE procedure.

   proc score data=raw score=save.fact_all out=save.scores;
   run;

The SCORE procedure uses the data and the scoring
coefficients that are saved in save.fact_all
to compute principal component scores.
The component scores are placed in variables
named Factor1, Factor2, ... , Factor*n*
and are saved in the data set save.scores.
If you know ahead of time how many principal components
you want to use, you can obtain the scores directly from
PROC FACTOR by specifying the NFACTORS= and OUT= options.
To get scores from three principal components, specify

   proc factor data=raw method=principal nfactors=3 out=save.scores;
   run;

To plot the scores for the first three components, use the PLOT procedure.

   proc plot;
      plot factor2*factor1 factor3*factor1 factor3*factor2;
   run;
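The principal component steps above assume that the libref save is already assigned, and they leave the eigenvalues in the OUTSTAT= data set. The following is a minimal sketch of assigning the libref and listing only the eigenvalue observation of save.fact_all. The directory path is a placeholder and not part of the original example; the LIBNAME statement must run before any step that refers to save.

```sas
/* Assign the libref; the path is a hypothetical placeholder. */
libname save '/path/to/sas/library';

/* List only the eigenvalue observation of the saved statistics. */
/* The OUTSTAT= data set identifies each row by the _TYPE_ variable. */
proc print data=save.fact_all;
   where _type_ = 'EIGENVAL';
run;
```

Printing the data set without the WHERE statement shows the other saved statistics as well, such as the correlations, the pattern, and the scoring coefficients.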

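To perform a principal factor analysis instead of a principal component analysis, add the PRIORS= option to the initial analysis:

   proc factor data=raw method=principal scree mineigen=0 priors=smc
               outstat=save.fact_all;
   run;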

The squared multiple correlations (SMC) of each variable with all the other variables are used as the prior communality estimates. If your correlation matrix is singular, you should specify PRIORS=MAX instead of PRIORS=SMC. The SCREE and MINEIGEN= options serve the same purpose as in the preceding principal component analysis. Saving the results with the OUTSTAT= option enables you to examine the eigenvalues and scree plot before deciding how many factors to rotate and to try several different rotations without re-extracting the factors. The OUTSTAT= data set is automatically marked TYPE=FACTOR, so the FACTOR procedure realizes that it contains statistics from a previous analysis instead of raw data.
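For the singular case mentioned above, a sketch of the same initial analysis with PRIORS=MAX in place of PRIORS=SMC might look as follows. The data set names are the ones used in the preceding example.

```sas
/* Same initial principal factor analysis, but with PRIORS=MAX,      */
/* the setting the text recommends when the correlation matrix is    */
/* singular and squared multiple correlations cannot be computed.    */
proc factor data=raw method=principal scree mineigen=0 priors=max
            outstat=save.fact_all;
run;
```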

After looking at the eigenvalues to estimate the number of factors, you can try some rotations. Two and three factors can be rotated with the following statements:

   proc factor data=save.fact_all method=principal n=2
               rotate=promax reorder score outstat=save.fact_2;
   run;
   proc factor data=save.fact_all method=principal n=3
               rotate=promax reorder score outstat=save.fact_3;
   run;

The output data set from the previous run is used as input for these analyses. The options N=2 and N=3 specify the number of factors to be rotated. The specification ROTATE=PROMAX requests a promax rotation, which has the advantage of providing both orthogonal and oblique rotations with only one invocation of PROC FACTOR. The REORDER option causes the variables to be reordered in the output so that variables associated with the same factor appear next to each other.
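Because the initial results are saved, other rotations can be tried from the same input. The sketch below requests a varimax rotation of two factors; the output data set name save.fact_2v is a hypothetical name chosen so that save.fact_2 is not overwritten.

```sas
/* Try a different rotation from the saved TYPE=FACTOR data set.     */
/* save.fact_2v is an illustrative name, not from the original text. */
proc factor data=save.fact_all method=principal n=2
            rotate=varimax reorder score outstat=save.fact_2v;
run;
```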

You can now compute and plot factor scores for the two-factor promax-rotated solution as follows:

   proc score data=raw score=save.fact_2 out=save.scores;
   run;
   proc plot;
      plot factor2*factor1;
   run;

The maximum likelihood (ML) solution is equivalent to Rao's (1955) canonical factor solution and Howe's solution maximizing the determinant of the partial correlation matrix (Morrison 1976). Thus, as a descriptive method, ML factor analysis does not require a multivariate normal distribution. The validity of Bartlett's test for the number of factors does require approximate normality plus additional regularity conditions that are usually satisfied in practice (Geweke and Singleton 1980).

The ML method is more computationally demanding than principal factor analysis for two reasons. First, the communalities are estimated iteratively, and each iteration takes about as much computer time as principal factor analysis. The number of iterations typically ranges from about five to twenty. Second, if you want to extract different numbers of factors, as is often the case, you must run the FACTOR procedure once for each number of factors. Therefore, an ML analysis can take 100 times as long as a principal factor analysis.

You can use principal factor analysis to get a rough idea of the number of factors before doing an ML analysis. If you think that there are between one and three factors, you can use the following statements for the ML analysis:

   proc factor data=raw method=ml n=1
               outstat=save.fact1;
   run;
   proc factor data=raw method=ml n=2 rotate=promax
               outstat=save.fact2;
   run;
   proc factor data=raw method=ml n=3 rotate=promax
               outstat=save.fact3;
   run;

The output data sets can be used for trying different rotations, computing scoring coefficients, or restarting the procedure in case it does not converge within the allotted number of iterations.
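As a sketch of the restart case, the run below reads the three-factor output data set back in and continues the ML analysis. It assumes that PRIORS=INPUT picks up the communality estimates saved in the input data set and that MAXITER= raises the iteration limit, as described in the PROC FACTOR syntax documentation. The value 100 and the output name save.fact3b are illustrative choices only.

```sas
/* Hypothetical restart of the three-factor ML analysis.               */
/* PRIORS=INPUT: start from the communalities saved in save.fact3.     */
/* MAXITER=100:  illustrative iteration limit, not a recommendation.   */
proc factor data=save.fact3 method=ml n=3 priors=input maxiter=100
            rotate=promax outstat=save.fact3b;
run;
```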

The ML method cannot be used with a singular correlation matrix, and it is especially prone to Heywood cases. (See the section "Heywood Cases and Other Anomalies" for a discussion of Heywood cases.) If you have problems with ML, the best alternative is to use the METHOD=ULS option for unweighted least-squares factor analysis.
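A sketch of the suggested alternative follows: the two-factor analysis from the earlier ML example with METHOD=ULS substituted for METHOD=ML. The output data set name save.fact2_uls is a hypothetical name used here for illustration.

```sas
/* Unweighted least-squares factor analysis as a fallback for ML.      */
/* save.fact2_uls is an illustrative name, not from the original text. */
proc factor data=raw method=uls n=2 rotate=promax
            outstat=save.fact2_uls;
run;
```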


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.