## The Penalized Least Squares Estimate

Penalized least squares estimates provide a way to balance fitting the
data closely and avoiding excessive roughness or rapid variation. A
penalized least squares estimate is a surface that minimizes the
penalized least squares over the class of all surfaces satisfying
sufficient regularity conditions.

Define **x**_{i} as a *d*-dimensional covariate
vector, **z**_{i} as a *p*-dimensional covariate
vector, and *y*_{i} as the observation associated with
(**x**_{i}, **z**_{i}). Assuming that the relation
between **z**_{i} and *y*_{i} is linear but the relation
between **x**_{i} and *y*_{i} is unknown,
you can fit the data using a semiparametric model as follows:

$$y_i = f(\mathbf{x}_i) + \mathbf{z}_i^{\mathrm{T}}\boldsymbol{\beta} + \epsilon_i$$

where *f* is an unknown function that is
assumed to be reasonably smooth, the $\epsilon_i$ are independent, zero-mean random errors,
and $\boldsymbol{\beta}$ is a *p*-dimensional unknown parameter
vector.
This model consists of two parts.
The $\mathbf{z}_i^{\mathrm{T}}\boldsymbol{\beta}$ is the parametric part of the
model, and the **z**_{i} are the regression variables. The
*f*(**x**_{i}) is the nonparametric part of the model,
and the **x**_{i} are the smoothing variables.

The ordinary least squares method estimates *f*(**x**_{i}) and
$\boldsymbol{\beta}$ by minimizing the quantity:

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(\mathbf{x}_i) - \mathbf{z}_i^{\mathrm{T}}\boldsymbol{\beta}\right)^2$$

However, the functional space of *f*(**x**) is so large
that you can always find a function *f* that interpolates
the data points. In order to obtain an estimate that
fits the data well and has some degree of smoothness,
you can use the penalized least squares method.
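The interpolation problem is easy to see in one dimension. The following sketch (an illustration, not part of the original text) fits a degree *n*−1 polynomial to *n* noisy points; the residuals vanish, so the unpenalized criterion is driven to zero by a curve that simply chases the noise:

```python
import numpy as np

# With a flexible enough f, the least squares criterion can always be
# driven to zero: a degree n-1 polynomial interpolates n data points.
rng = np.random.default_rng(0)
n = 8
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

coef = np.polyfit(x, y, deg=n - 1)   # n coefficients for n points: exact fit
residuals = y - np.polyval(coef, x)

print(np.max(np.abs(residuals)))     # numerically ~0: perfect but rough fit
```

Zero residual says nothing about the quality of the estimate between the data points, which is exactly the problem the roughness penalty addresses.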

The penalized least squares function is defined as

$$S_\lambda(f) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(\mathbf{x}_i) - \mathbf{z}_i^{\mathrm{T}}\boldsymbol{\beta}\right)^2 + \lambda J_2(f)$$

where *J*_{2}(*f*) is the penalty on the roughness of *f* and
is defined, in most cases, as the integral of the square of the
second derivative of *f*.
The first term measures the goodness of fit and
the second term measures the smoothness associated with
*f*. The term $\lambda$ is the smoothing parameter, which governs
the tradeoff between smoothness and goodness of fit. When
$\lambda$ is large, it heavily penalizes estimates
with large second derivatives. Conversely, a small value
of $\lambda$ puts more emphasis on the goodness of fit.
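The effect of $\lambda$ can be demonstrated with a simple discrete analogue of the criterion (an illustration only, not the thin-plate spline computation itself): estimate the fitted values directly, replacing $J_2(f)$ with the sum of squared second differences of the fit.

```python
import numpy as np

# Discrete analogue of penalized least squares: choose fitted values f
# minimizing (1/n) * sum (y_i - f_i)^2 + lam * ||D2 f||^2, where D2 is
# the second-difference operator, a stand-in for J_2(f).
def penalized_fit(y, lam):
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)   # (D2 f)_i = f_i - 2 f_{i+1} + f_{i+2}
    # Setting the gradient to zero gives (I + n*lam * D2'D2) f = y.
    return np.linalg.solve(np.eye(n) + n * lam * D2.T @ D2, y)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(50)

rough = penalized_fit(y, lam=1e-8)    # small lambda: nearly interpolates the data
smooth = penalized_fit(y, lam=10.0)   # large lambda: second differences shrink to ~0

print(np.max(np.abs(rough - y)))                  # close to 0
print(np.sum(np.diff(smooth, n=2) ** 2))          # far smaller roughness than `rough`
```

As $\lambda \to 0$ the solution approaches the interpolant; as $\lambda \to \infty$ the second differences are forced toward zero and the fit approaches a straight line.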

The estimate of *f* is selected from a reproducing
kernel Hilbert space, and it can be represented as a linear
combination of a sequence of basis functions. Hence, the
final estimate of *f* can be written as

$$\hat{f}(\mathbf{x}) = \theta_0 + \boldsymbol{\theta}^{\mathrm{T}}\mathbf{x} + \sum_{j=1}^{n}\delta_j B_j(\mathbf{x})$$

where *B*_{j} is the basis function, which depends on where
the data point **x**_{j} is located, and $\boldsymbol{\theta}$ and $\boldsymbol{\delta}$ are the coefficients that need to be estimated.

For a fixed $\lambda$, the coefficients $\boldsymbol{\theta}$ and $\boldsymbol{\delta}$ can be estimated by solving an *n*×*n* system.
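One common way to set up this solve, sketched here for the one-dimensional case with no **z** covariates (an assumed simplification, not the procedure's exact algorithm), uses the radial basis $B_j(x) = |x - x_j|^3$ with the linear functions spanning the null space of $J_2$, and solves the augmented linear system for $\boldsymbol{\delta}$ and $\boldsymbol{\theta}$ jointly:

```python
import numpy as np

# Sketch: fit a 1-D smoothing spline for fixed lambda by solving
#   [K + n*lam*I   T] [delta]   [y]
#   [T'            0] [theta] = [0]
# where K_ij = B_j(x_i) = |x_i - x_j|^3 and T = [1, x] spans the
# null space of the roughness penalty (illustrative setup).
def fit_spline(x, y, lam):
    n = len(x)
    K = np.abs(x[:, None] - x[None, :]) ** 3    # basis functions at the data
    T = np.column_stack([np.ones(n), x])        # unpenalized linear part
    top = np.hstack([K + n * lam * np.eye(n), T])
    bot = np.hstack([T.T, np.zeros((2, 2))])
    sol = np.linalg.solve(np.vstack([top, bot]),
                          np.concatenate([y, np.zeros(2)]))
    return sol[:n], sol[n:]                     # delta, theta

def predict(x_new, x, delta, theta):
    K = np.abs(x_new[:, None] - x[None, :]) ** 3
    return theta[0] + theta[1] * x_new + K @ delta

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

delta, theta = fit_spline(x, y, lam=1e-6)
fitted = predict(x, x, delta, theta)
print(np.mean((fitted - y) ** 2))   # small in-sample error for small lambda
```

The side condition $\mathbf{T}^{\mathrm{T}}\boldsymbol{\delta} = \mathbf{0}$ keeps the penalized and unpenalized parts of the expansion identifiable.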

The smoothing parameter $\lambda$ can be chosen by minimizing the
generalized cross validation (GCV) function.

If you write

$$\hat{\mathbf{y}} = \mathbf{A}(\lambda)\,\mathbf{y}$$

then $\mathbf{A}(\lambda)$ is referred to as the *hat* or *smoothing* matrix,
and the GCV function $V(\lambda)$ is defined as

$$V(\lambda) = \frac{(1/n)\left\|\left(\mathbf{I} - \mathbf{A}(\lambda)\right)\mathbf{y}\right\|^2}{\left[(1/n)\,\mathrm{tr}\left(\mathbf{I} - \mathbf{A}(\lambda)\right)\right]^2}$$
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.