Example 64.2: Spline Model With Higher-Order Penalty
The following example continues the analysis of the data set Measure to
illustrate how you can use PROC TPSPLINE to fit a spline model with a
higher-order penalty term. Spline models with higher-order penalty terms
move low-order polynomial terms into the null space. Hence, there is no
penalty for these terms, and they can vary without constraint.
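For a thin-plate spline in d smoothing variables with an Mth-order penalty, the null space consists of all polynomials of total degree less than M, so its dimension is C(M+d-1, d). A minimal sketch of this standard formula (the function name is ours, not part of PROC TPSPLINE):

```python
from math import comb

def null_space_dim(m, d):
    # Polynomials in d variables of total degree < m:
    # dimension C(m + d - 1, d)
    return comb(m + d - 1, d)

# Default penalty (M=2) with two smoothing variables: 1, x1, x2
print(null_space_dim(2, 2))   # 3
# M=3 moves the quadratics (x1^2, x1*x2, x2^2) into the null space
print(null_space_dim(3, 2))   # 6
```

The value 6 for M=3 matches the "Dimension of Polynomial Space" reported by PROC TPSPLINE in this example.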
As shown in the previous analyses, the final model for the data set
Measure must include quadratic terms for both x1 and
x2. This example fits the following model:

   y = b0 + b1*x1 + b2*x1^2 + b3*x2 + b4*x2^2 + b5*x1*x2 + g(x1, x2) + e

The model includes quadratic terms for both variables, although it differs
from the usual linear model. The nonparametric term
g(x1, x2) explains the variation of the data unaccounted
for by a simple quadratic surface.
To modify the order of the derivative in the penalty term, specify the
M= option. The following statements specify the option
M=3 in order to include
the quadratic terms in the null space:
data measure;
   set measure;
   x1sq = x1*x1;
   x2sq = x2*x2;
   x1x2 = x1*x2;
run;

proc tpspline data=measure;
   model y = (x1 x2) / m=3;
   score data=pred out=predy;
run;
The output resulting from these statements is displayed in Output 64.2.1.
Output 64.2.1: Output from PROC TPSPLINE with M=3
The TPSPLINE Procedure
Dependent Variable: y

Summary of Input Data Set
   Number of Non-Missing Observations    50
   Number of Missing Observations         0
   Unique Smoothing Design Points        25

Summary of Final Model
   Number of Regression Variables         0
   Number of Smoothing Variables          2
   Order of Derivative in the Penalty     3
   Dimension of Polynomial Space          6

Summary Statistics of Final Estimation
   log10(n*Lambda)         -3.783100
   Smoothing Penalty     2092.449492
   Residual SS              0.273146
   Tr(I-A)                 29.171609
   Model DF                20.828391
   Standard Deviation       0.096765
The model contains six terms in the null space. Compare
Output 64.2.1 with Figure 64.2: the LOGNLAMBDA value and
the smoothing penalty differ significantly.
Note that, in general, these terms are not directly comparable
for different models.
The final estimate
based on this model is close to the estimate based on the model using
the default, M=2.
In the following statements, the REG procedure fits
a quadratic surface model to the data set Measure.
proc reg data=measure;
   model y = x1 x1sq x2 x2sq x1x2;
run;
The results are displayed in Output 64.2.2.
Output 64.2.2: Quadratic Surface Model: The REG Procedure
The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance
   Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
   Model               5         443.20502       88.64100     436.33    <.0001
   Error              44           8.93874        0.20315
   Corrected Total    49         452.14376

   Root MSE          0.45073    R-Square    0.9802
   Dependent Mean   15.08548    Adj R-Sq    0.9780
   Coeff Var         2.98781
Parameter Estimates
   Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
   Intercept     1              14.90834           0.12519     119.09      <.0001
   x1            1               0.01292           0.09015       0.14      0.8867
   x1sq          1              -4.85194           0.15237     -31.84      <.0001
   x2            1               0.02618           0.09015       0.29      0.7729
   x2sq          1              -5.20624           0.15237     -34.17      <.0001
   x1x2          1              -0.04814           0.12748      -0.38      0.7076

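The derived statistics in Output 64.2.2 follow from the ANOVA sums of squares in the usual way. A quick consistency check (values transcribed from the table; the variable names are ours):

```python
# Reproduce the derived statistics in the REG output from the ANOVA table.
ss_model, ss_error, ss_total = 443.20502, 8.93874, 452.14376
df_model, df_error = 5, 44

ms_model = ss_model / df_model     # Mean Square, Model
mse = ss_error / df_error          # Mean Square, Error
f_value = ms_model / mse           # F Value, close to the printed 436.33
root_mse = mse ** 0.5              # Root MSE, close to the printed 0.45073
r_square = ss_model / ss_total     # R-Square, close to the printed 0.9802
# Adjusted R-square penalizes for the 5 model degrees of freedom
adj_rsq = 1 - (1 - r_square) * (df_model + df_error) / df_error

print(f_value, root_mse, r_square, adj_rsq)
```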
The REG procedure produces slightly different results. To fit a
similar model with PROC TPSPLINE, you can use a MODEL
statement specifying the degrees of freedom with the DF= option. You
can also use a large value for the LOGNLAMBDA0= option to force
a parametric model fit.
Because there is one degree of freedom for each of the terms
Intercept, x1, x2, x1sq, x2sq, and x1x2, the DF=6 option is used.
proc tpspline data=measure;
   model y = (x1 x2) / m=3 df=6 lognlambda=(-4 to 1 by 0.2);
   score data=pred out=predy;
run;
The results are displayed in Output 64.2.3. PROC TPSPLINE displays
the list of GCV values for comparison.
Output 64.2.3: Output from PROC TPSPLINE Using M=3 and DF=6
The TPSPLINE Procedure
Dependent Variable: y

Summary of Input Data Set
   Number of Non-Missing Observations    50
   Number of Missing Observations         0
   Unique Smoothing Design Points        25

Summary of Final Model
   Number of Regression Variables         0
   Number of Smoothing Variables          2
   Order of Derivative in the Penalty     3
   Dimension of Polynomial Space          6
GCV Function
   log10(n*Lambda)         GCV
         -4.000000    0.016330
         -3.800000    0.016051  *
         -3.600000    0.016363
         -3.400000    0.017770
         -3.200000    0.021071
         -3.000000    0.027496
         -2.800000    0.038707
         -2.600000    0.056292
         -2.400000    0.080613
         -2.200000    0.109714
         -2.000000    0.139642
         -1.800000    0.166338
         -1.600000    0.187437
         -1.400000    0.202625
         -1.200000    0.212871
         -1.000000    0.219512
         -0.800000    0.223727
         -0.600000    0.226377
         -0.400000    0.228041
         -0.200000    0.229085
          0.000000    0.229740
          0.200000    0.230153
          0.400000    0.230413
          0.600000    0.230576
          0.800000    0.230680
          1.000000    0.230745

Note: * indicates minimum GCV value.
Summary Statistics of Final Estimation
   log10(n*Lambda)        2.382955
   Smoothing Penalty      0.000000646
   Residual SS            8.938429
   Tr(I-A)               43.999670
   Model DF               6.000330
   Standard Deviation     0.450719
The final estimate is based on 6.000330 degrees of freedom
because there are already 6 degrees of freedom in the null space
and the search range for lambda is not large enough
(in this case, specifying DF=6 is equivalent to setting lambda = infinity).
The standard deviation and RSS (Output 64.2.3) are close to the
root MSE and the sum of squares for the error term from the
linear regression model (Output 64.2.2), respectively.
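The TPSPLINE summary statistics are related by Model DF = n - Tr(I-A) and standard deviation = sqrt(RSS / Tr(I-A)). Checking both fits against the printed values (the helper function is ours):

```python
# Relationships among the PROC TPSPLINE summary statistics.
n = 50  # number of non-missing observations

def summarize(rss, tr_i_minus_a):
    model_df = n - tr_i_minus_a            # Model DF = tr(A) = n - tr(I-A)
    std_dev = (rss / tr_i_minus_a) ** 0.5  # sqrt(RSS / tr(I-A))
    return model_df, std_dev

# GCV-selected fit (Output 64.2.1): DF about 20.83, sd about 0.0968
print(summarize(0.273146, 29.171609))
# DF=6 fit (Output 64.2.3): sd about 0.4507, close to the REG root MSE
print(summarize(8.938429, 43.999670))
```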
For this model, the optimal LOGNLAMBDA is around -3.8, which produces
a standard deviation estimate of 0.096765 (see Output 64.2.1) and a GCV
value of 0.016051, while the model specifying DF=6
results in a LOGNLAMBDA larger than 1 and a GCV
value larger than 0.23074.
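The GCV criterion tabulated above can be written as GCV(lambda) = n * RSS / Tr(I-A)^2, and plugging in the printed summary statistics reproduces the reported magnitudes (the function name is ours):

```python
# Generalized cross validation criterion, written in terms of the
# quantities PROC TPSPLINE prints: GCV(lambda) = n * RSS / tr(I-A)**2
def gcv(rss, tr_i_minus_a, n=50):
    return n * rss / tr_i_minus_a ** 2

# GCV-selected fit (Output 64.2.1): close to the tabulated minimum 0.016051
print(gcv(0.273146, 29.171609))
# DF=6 fit (Output 64.2.3): larger than every value in the GCV table
print(gcv(8.938429, 43.999670))
```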
The nonparametric model, based on the GCV, should provide better
prediction, but the linear regression model can be more easily interpreted.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.