
Robust Regression Examples

SAS/IML has three subroutines
that can be used for outlier detection and robust regression.
The Least Median of Squares (LMS) and Least Trimmed
Squares (LTS) subroutines perform *robust regression*
(sometimes called *resistant regression*).
These subroutines are able to detect outliers and perform
a least-squares regression on the remaining observations.
The Minimum Volume Ellipsoid Estimation (MVE) subroutine
computes the minimum volume ellipsoid estimator,
a robust estimate of location and covariance that
can be used for constructing confidence regions and for
detecting multivariate outliers and leverage points.
Moreover, the MVE subroutine provides a table of
robust distances and classical Mahalanobis distances.
The LMS, LTS, and MVE subroutines and some other
robust estimation theories and methods were developed
by Rousseeuw (1984) and Rousseeuw and Leroy (1987).
Some statistical applications for MVE are
described in Rousseeuw and Van Zomeren (1990).

Whereas robust regression methods like L1
or Huber *M*-estimators reduce the influence of outliers
only (compared to least-squares or L2 regression),
resistant regression methods like LMS and LTS can
completely disregard influential outliers (sometimes
called *leverage points*) from the fit of the model.
The algorithms used in the LMS and LTS subroutines are
based on the PROGRESS program by Rousseeuw and Leroy (1987).
Rousseeuw and Hubert (1996) prepared a new version of
PROGRESS to facilitate its inclusion in SAS software,
and they have incorporated several recent developments.
Among other things, the new version of PROGRESS now yields the
exact LMS for simple regression, and the program uses a new
definition of the robust coefficient of determination (*R*^{2}).
Therefore, the outputs may differ slightly from those
given in Rousseeuw and Leroy (1987) or those obtained
from software based on the older version of PROGRESS.
The MVE algorithm is based on the algorithm
used in the MINVOL program by Rousseeuw (1984).

The three SAS/IML subroutines are designed for

- LMS: minimizing the *h*th ordered squared residual
- LTS: minimizing the sum of the *h* smallest squared residuals
- MVE: minimizing the volume of an ellipsoid containing *h* points

For each parameter vector **b** = (*b*_{1}, ... , *b*_{n}),
the residual of observation *i* is *r*_{i} = *y*_{i} - **x**_{i}**b**.
You then estimate the unknown parameter vector **b**
by minimizing the following objective functions:

- LMS: minimize the *h*th ordered squared residual,

  *F*_{LMS} = (*r*^{2})_{h:N} → min

  where (*r*^{2})_{1:N} <= ... <= (*r*^{2})_{N:N} are the ordered
  squared residuals. For *h* = *N*/2 + 1, the *h*th quantile is the
  median of the squared residuals. The default *h* in PROGRESS is
  *h* = [(*N* + *n* + 1)/2], which yields the breakdown value
  (where [*k*] denotes the integer part of *k*).

- LTS: minimize the sum of the *h* smallest squared residuals,

  *F*_{LTS} = sqrt( (1/*h*) Σ_{i=1}^{h} (*r*^{2})_{i:N} ) → min

- MVE: minimize the volume of an ellipsoid containing *h* points.
  The objective function for the MVE optimization problem is based on the
  *h*th quantile *d*_{h:N} of the Mahalanobis-type distances
  **d** = (*d*_{1}, ... , *d*_{N}),

  *F*_{MVE} = *d*_{h:N} sqrt( det(**C**) ) → min

  where **C** is the scatter matrix estimate, and the
  Mahalanobis-type distances are computed as

  **d** = diag( sqrt( (**X** - **T**)' **C**^{-1} (**X** - **T**) ) )

  where **T** is the location estimate.
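To make the LMS and LTS criteria concrete, here is a minimal Python sketch (not SAS/IML code; the function names and data are hypothetical) that evaluates both objective functions for a given coefficient vector, along with the PROGRESS default for *h*:

```python
import numpy as np

def lms_objective(b, X, y, h):
    """hth smallest squared residual (the LMS criterion)."""
    r2 = np.sort((y - X @ b) ** 2)    # ordered squared residuals
    return r2[h - 1]                  # hth ordered value (1-based h)

def lts_objective(b, X, y, h):
    """Root mean of the h smallest squared residuals (the LTS criterion)."""
    r2 = np.sort((y - X @ b) ** 2)
    return np.sqrt(r2[:h].mean())

# PROGRESS default: h = [(N + n + 1)/2], the integer part
N, n = 20, 3
h = (N + n + 1) // 2
```

Both criteria ignore the *N* - *h* largest squared residuals entirely, which is what makes them resistant to leverage points.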

Because of the nonsmooth form of these objective
functions, the estimates cannot be obtained
with traditional optimization algorithms.
For LMS and LTS, the algorithm, as in the PROGRESS program,
selects a number of subsets of *n* observations out of the
*N* given observations, evaluates the objective function,
and saves the subset with the lowest objective function.
As long as the problem size enables you to evaluate
all such subsets, the result is a global optimum.
If computer time does not permit you to evaluate all the
different subsets, a random collection of subsets is evaluated.
In such a case, you may not obtain the global optimum.

Note that the LMS, LTS, and MVE subroutines are
executed only when the number *N* of observations is
over twice the number *n* of explanatory variables
*x*_{j} (including the intercept), that is, if *N* > 2*n*.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.