The PRINQUAL Procedure

Getting Started

In the following example, PROC PRINQUAL uses the MTV (maximum total variance) method, which is the default. Suppose that the problem is to linearize a curve through three-dimensional space. Let

\begin{aligned}
{\rm X}_1 &= {\rm X}^3 \\
{\rm X}_2 &= {\rm X}_1 - {\rm X}^5 \\
{\rm X}_3 &= {\rm X}_2 - {\rm X}^6
\end{aligned}

where X = -1.00, -0.98, -0.96, ..., 1.00.

These three variables define a curve in three-dimensional space. The GPLOT procedure is used to display two-dimensional views of this curve (see Figure 53.1). These data are completely described by three linear components, but because they lie along a single curve, they could also be described by a single nonlinear component.
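The claim that the untransformed data need all three linear components, even though one component already carries most of the variance, can be checked numerically. The following Python sketch is only an illustration outside of SAS: it regenerates the 101 points on the exact grid X = i/50 (instead of the DATA step's floating-point DO loop), forms the covariance matrix (analogous to the COVARIANCE option), and uses power iteration to find the largest eigenvalue. The helper names are hypothetical. A positive determinant indicates that all three components are needed, and the first component's share of the total variance should come out close to the 0.71369 reported in the first row of the iteration history in Figure 53.2.

```python
# Illustrative check of the untransformed curve data (not part of the SAS example).

def curve_data(half_steps=50):
    """Return (X1, X2, X3) for X = -1, -1 + 1/half_steps, ..., 1 (101 points by default)."""
    rows = []
    for i in range(-half_steps, half_steps + 1):
        x = i / half_steps
        x1 = x ** 3
        x2 = x1 - x ** 5
        x3 = x2 - x ** 6
        rows.append((x1, x2, x3))
    return rows


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length tuples."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def largest_eigenvalue(m, iterations=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix by power iteration."""
    size = len(m)
    v = [1.0] * size
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
        scale = max(abs(c) for c in w)
        v = [c / scale for c in w]
    w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
    return sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)  # Rayleigh quotient


rows = curve_data()
cov = covariance_matrix(rows)
share = largest_eigenvalue(cov) / sum(cov[i][i] for i in range(3))
print(f"points: {len(rows)}, det(cov) = {det3(cov):.3e}, first-component share = {share:.4f}")
```

Power iteration is used only because it keeps the sketch dependency-free; any symmetric eigensolver would give the same share.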

PROC PRINQUAL is used to attempt to straighten the curve into a one-dimensional line by applying a continuous transformation to each variable. The N=1 option in the PROC PRINQUAL statement requests one principal component, the MAXITER=50 option allows up to 50 iterations, and the COVARIANCE option bases the analysis on the covariance matrix rather than the correlation matrix. The TRANSFORM statement requests a cubic spline transformation with nine knots.

Splines are curves that are usually required to be continuous and smooth. They are usually defined as piecewise polynomials of degree n whose function values and first n-1 derivatives agree at the points where the pieces join. The abscissa values of the join points are called knots. The term "spline" is also used for polynomials (splines with no knots) and for piecewise polynomials with more than one discontinuous derivative. Splines with no knots are generally smoother than splines with knots, which in turn are generally smoother than splines with multiple discontinuous derivatives. Splines with few knots are generally smoother than splines with many knots; however, increasing the number of knots usually improves the fit of the spline function to the data, because knots give the curve freedom to bend to follow the data more closely. Refer to Smith (1979) for an excellent introduction to splines. For another example of using splines, see Example 65.1 in Chapter 65, "The TRANSREG Procedure."
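The definition of a spline can be made concrete with the truncated power basis, one common way to write a degree-n spline with knots. The following Python sketch is illustrative only: it is not necessarily the basis that PROC PRINQUAL uses internally, and the evenly spaced knot locations in KNOTS are a hypothetical choice (the TRANSREG documentation describes how NKNOTS= actually places knots). Each knot k contributes a hinge term (x - k)_+^3; at the knot, the hinge's value and its first two derivatives are continuous, while its third derivative jumps. This is exactly the "function values and first n-1 derivatives agree" property for n = 3.

```python
# Illustrative cubic spline in the truncated power basis (hypothetical helper names).

def truncated_power_basis(x, knots, degree=3):
    """One basis row: x, x**2, ..., x**degree, then (x - k)_+ ** degree for each knot."""
    row = [x ** p for p in range(1, degree + 1)]        # global polynomial part
    row += [max(0.0, x - k) ** degree for k in knots]   # one hinge term per knot
    return row


def spline_value(x, coefs, knots, degree=3, intercept=0.0):
    """A spline is an intercept plus a linear combination of the basis columns."""
    basis = truncated_power_basis(x, knots, degree)
    return intercept + sum(c * b for c, b in zip(coefs, basis))


def cubic_hinge_derivatives(x, k):
    """Value and first three derivatives of the cubic hinge (x - k)_+ ** 3."""
    if x <= k:
        return (0.0, 0.0, 0.0, 0.0)
    t = x - k
    return (t ** 3, 3 * t ** 2, 6 * t, 6.0)


# Hypothetical knot placement: nine knots spread evenly inside (-1, 1).
KNOTS = [-1.0 + 0.2 * j for j in range(1, 10)]

eps = 1e-9
left = cubic_hinge_derivatives(KNOTS[4] - eps, KNOTS[4])
right = cubic_hinge_derivatives(KNOTS[4] + eps, KNOTS[4])
print("columns per row:", len(truncated_power_basis(0.0, KNOTS)))  # 3 polynomial + 9 hinge
print("left  limits (f, f', f'', f'''):", left)
print("right limits (f, f', f'', f'''):", right)
```

Adding knots adds hinge columns, which is why more knots give the fitted transformation more freedom to bend.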

One component accounts for 71 percent of the variance of the untransformed data; after 50 iterations, one component accounts for over 98 percent of the variance of the transformed data (see Figure 53.2). The algorithm did not converge within 50 iterations, so more iterations (a larger MAXITER= value) may be needed for this problem.
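The slow progress is visible in the iteration history. In the rows of Figure 53.2, the Criterion Change column appears to equal the increase in the Proportion of Variance from the previous iteration, so by iteration 50 each iteration adds only about 0.00017. The following Python sketch checks that reading against a few rows copied from Figure 53.2. The tolerance allows for five-decimal rounding of the printed values, and the function name is hypothetical.

```python
# Rows copied from Figure 53.2: (iteration, proportion of variance, criterion change).
ROWS = [
    (1, 0.71369, None),
    (2, 0.79035, 0.07667),
    (3, 0.86334, 0.07299),
    (4, 0.91379, 0.05045),
    (5, 0.94204, 0.02825),
    (48, 0.98124, 0.00017),
    (49, 0.98141, 0.00017),
    (50, 0.98158, 0.00017),
]


def check_criterion_changes(rows, tol=2e-5):
    """Return iterations whose reported criterion change differs from the increase
    in proportion of variance since the previous iteration (empty list if none)."""
    mismatches = []
    for (it_prev, p_prev, _), (it, p, reported) in zip(rows, rows[1:]):
        if it != it_prev + 1 or reported is None:
            continue  # only compare consecutive iterations
        if abs((p - p_prev) - reported) > tol:
            mismatches.append(it)
    return mismatches


total_gain = ROWS[-1][1] - ROWS[0][1]
print("mismatches:", check_criterion_changes(ROWS))
print(f"gain from iteration 1 to 50: {total_gain:.5f}")
```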

PROC PRINQUAL creates an output data set (which is not displayed) that contains both the original and transformed variables. The original variables have the names X1, X2, and X3. Transformed variables are named TX1, TX2, and TX3. All observations in the output data set have _TYPE_='SCORE', since the CORRELATIONS option is not specified in the PROC PRINQUAL statement. The GPLOT procedure uses this output data set and displays the nonlinear transformations of all three variables and the nearly one-dimensional scatter plot (see Figure 53.3 and Figure 53.4).

PROC PRINQUAL tries to project each variable onto the first principal component. Notice that from some views the curve in this example is closer to a circle than to a function (see the plot of X3 vs. X2 in Figure 53.1), and that the first component does not run approximately from one end point of the curve to the other (see Figure 53.4). Because the curve has these characteristics, PROC PRINQUAL linearizes the scatter plot by collapsing the scatter around the principal axis rather than by straightening the curve into a single line. For simpler curves, PROC PRINQUAL would straighten the curve itself.
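The limitation described above can already be seen in two dimensions. The following Python sketch is an illustration only. It is not PRINQUAL's algorithm, which also transforms the variables, and first_pc_share_2d is a hypothetical helper. For points spread evenly around a circle, the first principal component carries only half of the variance, whichever direction it takes. For points on a straight line, it carries all of the variance. Projecting onto the principal axis can therefore summarize a nearly straight curve well, but it cannot unroll a closed, circle-like one.

```python
import math


def first_pc_share_2d(points):
    """Share of total variance carried by the first principal component of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form largest eigenvalue of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return (half_trace + radius) / (sxx + syy)


circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
          for k in range(360)]
line = [(t / 10, 2 * t / 10) for t in range(-10, 11)]
print("circle:", first_pc_share_2d(circle))  # 0.5 up to rounding
print("line:  ", first_pc_share_2d(line))    # 1.0 up to rounding
```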

The following statements produce Figure 53.1 through Figure 53.4:

   * Generate a Three-Dimensional Curve;
   data X;
      do X = -1 to 1 by 0.02;
         X1 =      X ** 3;
         X2 = X1 - X ** 5;
         X3 = X2 - X ** 6;
         output;
      end;
      drop X;
   run;

   goptions goutmode=replace nodisplay;
   %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
   /* Depending on your goptions, these plot options may work better: */
   /* %let opts = haxis=axis2 vaxis=axis1 frame;                      */

   proc gplot data=X;
      title;
      axis1 minor=none label=(angle=90 rotate=0)
            order=(-1 to 1);
      axis2 minor=none order=(-1 to 1);
      plot X1*X2 / &opts name='prqin1';
      plot X3*X2 / &opts name='prqin2' vreverse;
      plot X1*X3 / &opts name='prqin3';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin1 2:prqin2 3:prqin3;
   run; quit;

   * Try to Straighten the Curve;
   proc prinqual data=X n=1 maxiter=50 covariance;
      title 'Iteratively Derive Variable Transformations';
      transform spline(X1-X3 / nknots=9);
   run;

   * Plot the Transformations;
   goptions nodisplay;
   proc gplot;
      title;
      axis1 minor=none label=(angle=90 rotate=0);
      axis2 minor=none;
      plot TX1*X1 / &opts name='prqin4';
      plot TX2*X2 / &opts name='prqin5';
      plot TX3*X3 / &opts name='prqin6';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin4 2:prqin6 3:prqin5;
   run; quit;

   * Plot the Straightened Scatter Plot;
   goptions nodisplay;
   proc gplot;
      axis1 minor=none label=(angle=90 rotate=0)
            order=(-1 to 1);
      axis2 minor=none order=(-1 to 1);
      plot TX1*TX2 / &opts name='prqin7';
      plot TX3*TX2 / &opts name='prqin8' vreverse;
      plot TX1*TX3 / &opts name='prqin9';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin7 2:prqin8 3:prqin9;
   run; quit;

[Graphic: X1 vs. X2, X3 vs. X2 (vertical axis reversed), and X1 vs. X3]

Figure 53.1: Three-Dimensional Curve Example Output

Iteratively Derive Variable Transformations

The PRINQUAL Procedure

PRINQUAL MTV Algorithm Iteration History

Iteration   Average   Maximum   Proportion  Criterion
   Number    Change    Change  of Variance     Change   Note
        1   0.16253   1.33045      0.71369
        2   0.07871   0.94549      0.79035    0.07667
        3   0.06518   0.80219      0.86334    0.07299
        4   0.05322   0.57928      0.91379    0.05045
        5   0.04154   0.38404      0.94204    0.02825
        6   0.03181   0.24391      0.95640    0.01436
        7   0.02461   0.15397      0.96349    0.00709
        8   0.01982   0.10205      0.96704    0.00355
        9   0.01662   0.07393      0.96894    0.00189
       10   0.01439   0.06232      0.97005    0.00112
       11   0.01288   0.05436      0.97081    0.00075
       12   0.01189   0.04911      0.97139    0.00058
       13   0.01119   0.04531      0.97188    0.00049
       14   0.01068   0.04276      0.97232    0.00044
       15   0.01027   0.04115      0.97273    0.00041
       16   0.00993   0.04039      0.97313    0.00040
       17   0.00965   0.04249      0.97351    0.00038
       18   0.00940   0.04400      0.97388    0.00037
       19   0.00919   0.04509      0.97423    0.00036
       20   0.00900   0.04587      0.97458    0.00034
       21   0.00883   0.04643      0.97491    0.00033
       22   0.00867   0.04681      0.97523    0.00032
       23   0.00852   0.04705      0.97555    0.00031
       24   0.00839   0.04719      0.97585    0.00031
       25   0.00827   0.04724      0.97615    0.00030
       26   0.00816   0.04722      0.97644    0.00029
       27   0.00805   0.04713      0.97672    0.00028
       28   0.00795   0.04699      0.97700    0.00027
       29   0.00785   0.04680      0.97726    0.00027
       30   0.00776   0.04656      0.97752    0.00026
       31   0.00768   0.04629      0.97777    0.00025
       32   0.00760   0.04598      0.97802    0.00025
       33   0.00752   0.04564      0.97826    0.00024
       34   0.00745   0.04528      0.97849    0.00023
       35   0.00739   0.04489      0.97872    0.00023
       36   0.00733   0.04448      0.97894    0.00022
       37   0.00729   0.04405      0.97915    0.00022
       38   0.00724   0.04361      0.97936    0.00021
       39   0.00720   0.04315      0.97957    0.00021
       40   0.00716   0.04268      0.97977    0.00020
       41   0.00713   0.04219      0.97997    0.00020
       42   0.00709   0.04170      0.98016    0.00019
       43   0.00706   0.04120      0.98035    0.00019
       44   0.00703   0.04070      0.98054    0.00019
       45   0.00699   0.04019      0.98072    0.00018
       46   0.00696   0.03967      0.98090    0.00018
       47   0.00693   0.03916      0.98107    0.00017
       48   0.00690   0.03864      0.98124    0.00017
       49   0.00687   0.03812      0.98141    0.00017
       50   0.00684   0.03760      0.98158    0.00017   Not Converged

ERROR: Failed to converge.

Figure 53.2: PROC PRINQUAL MTV Iteration History

[Graphic: transformation plots TX1 vs. X1, TX3 vs. X3, and TX2 vs. X2]

Figure 53.3: Variable Transformation Plots

[Graphic: TX1 vs. TX2, TX3 vs. TX2 (vertical axis reversed), and TX1 vs. TX3]

Figure 53.4: Plots of the Nearly One-Dimensional Curve


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.