Robust Regression Examples

Example 9.1: LMS and LTS with Substantial Leverage Points: Hertzsprung-Russell Star Data

The following data are reported in Rousseeuw and Leroy (1987, p. 27) and are based on Humphrey (1978) and Vansina and De Greve (1982). The 47 observations correspond to the 47 stars of the CYG OB1 cluster in the direction of Cygnus. The regressor variable x (column 2) is the logarithm of the effective temperature at the surface of the star (Te), and the response variable y (column 3) is the logarithm of its light intensity (L/L0). The results for LS and LMS on page 28 of Rousseeuw and Leroy (1987) are based on a more precise (five decimal places) version of the data set. This data set is remarkable in that it contains four substantial leverage points (giant stars), observations 11, 20, 30, and 34, that greatly affect the results of L2 and even L1 regression.

   ab =  { 1  4.37  5.23,   2  4.56  5.74,   3  4.26  4.93,
           4  4.56  5.74,   5  4.30  5.19,   6  4.46  5.46,
           7  3.84  4.65,   8  4.57  5.27,   9  4.26  5.57,
          10  4.37  5.12,  11  3.49  5.73,  12  4.43  5.45,
          13  4.48  5.42,  14  4.01  4.05,  15  4.29  4.26,
          16  4.42  4.58,  17  4.23  3.94,  18  4.42  4.18,
          19  4.23  4.18,  20  3.49  5.89,  21  4.29  4.38,
          22  4.29  4.22,  23  4.42  4.42,  24  4.49  4.85,
          25  4.38  5.02,  26  4.42  4.66,  27  4.29  4.66,
          28  4.38  4.90,  29  4.22  4.39,  30  3.48  6.05,
          31  4.38  4.42,  32  4.56  5.10,  33  4.45  5.22,
          34  3.49  6.29,  35  4.23  4.34,  36  4.62  5.62,
          37  4.53  5.10,  38  4.45  5.22,  39  4.53  5.18,
          40  4.43  5.57,  41  4.38  4.62,  42  4.45  5.06,
          43  4.50  5.34,  44  4.45  5.34,  45  4.55  5.54,
          46  4.45  4.98,  47  4.42  4.50 } ;

   a = ab[,2]; b = ab[,3];
The following code specifies that most of the output be printed.
   print "*** Hertzsprung-Russell Star Data: Do LMS ***";
      optn = j(8,1,.);
      optn[2]= 3;    /* ipri */
      optn[3]= 3;    /* ilsq */
      optn[8]= 3;    /* icov */
   call lms(sc,coef,wgt,optn,b,a);
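The criterion behind the LMS call can be sketched in a few lines: for every pair of observations, fit the exact line through the pair and record the h-th ordered squared residual of the full sample; the pair with the smallest such value wins. The following Python sketch (an illustration only, not the PROGRESS implementation, which additionally adjusts the intercept on the best subset) assumes numpy is available:

```python
import numpy as np

def lms_enumerate(x, y):
    """Complete enumeration for simple-regression LMS (a sketch):
    fit the exact line through every non-singular pair of points and
    keep the fit whose h-th ordered squared residual is smallest."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, p = len(x), 2
    h = (n + p + 1) // 2              # default LMS quantile
    best = (np.inf, None)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if x[i] == x[j]:          # singular subset: skip it
                continue
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            crit = np.sort((y - (slope * x + intercept)) ** 2)[h - 1]
            if crit < best[0]:
                best = (crit, (slope, intercept))
    return best                       # (criterion, (slope, intercept))
```

Because PROGRESS performs an intercept adjustment on the best subset, this sketch need not reproduce the SAS coefficients exactly.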

Output 9.1.1: Some Simple Statistics
 
 
                Median and Mean

                  Median           Mean
   VAR1             4.42           4.31
   Intercep            1              1
   Response          5.1   5.0121276596

        Dispersion and Standard Deviation

              Dispersion         StdDev
   VAR1     0.163086244   0.2908234187
   Intercep           0              0
   Response 0.6671709983  0.5712493409

Partial output for LS regression is shown in Output 9.1.2.

Output 9.1.2: Table of Unweighted LS Regression
 
Unweighted Least-Squares Estimation

LS Parameter Estimates
Variable     Estimate   Approx Std Err   t Value   Pr > |t|    Lower WCI    Upper WCI
VAR1       -0.4133039       0.28625748     -1.44     0.1557   -0.9743582   0.14775048
Intercep    6.7934673       1.23651563      5.49     <.0001    4.3699412   9.21699339
 
Sum of Squares = 14.346394626
 
Degrees of Freedom = 45
 
LS Scale Estimate = 0.5646315343
Cov Matrix of Parameter Estimates

                   VAR1       Intercep
VAR1       0.0819433428   -0.353175807
Intercep   -0.353175807   1.5289708954
 
R-squared = 0.0442737441
 
F(1,45) Statistic = 2.0846120667
 
Probability = 0.1557164396

Looking at the Best Criterion column of the iteration history table, you can see that with complete enumeration the optimal solution is found very early.

Output 9.1.3: History of Iteration Process
 
 
Complete Enumeration for LMS

Subset   Singular   Best Criterion   Percent
   271          5         0.392791        25
   541          8         0.392791        50
   811         27         0.392791        75
  1081         45         0.392791       100
 
 
Minimum Criterion = 0.3927910898
 
Least Median of Squares (LMS) Method
 
Minimizing 25th Ordered Squared Residual.
 
Highest Possible Breakdown Value = 48.94 %
 
Selection of All 1081 Subsets of 2 Cases Out of 47
 
Among 1081 subsets 45 are singular.
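The subset count, the quantile being minimized, and the breakdown value quoted above all follow from closed-form expressions with n = 47 observations and p = 2 parameters (the formulas below follow Rousseeuw and Leroy 1987, as I understand them). A quick check in Python:

```python
from math import comb, floor

n, p = 47, 2                               # observations, parameters
subsets = comb(n, p)                       # all p-point subsets to enumerate
h = (n + p + 1) // 2                       # index of the ordered squared residual minimized
breakdown = (floor((n - p) / 2) + 1) / n   # highest possible breakdown value

print(subsets, h, round(100 * breakdown, 2))   # 1081 25 48.94
```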

Output 9.1.4: Results of Optimization
 
Observations of Best Subset
       2       29

Estimated Coefficients
        VAR1       Intercep
3.9705882353   -12.62794118
 
LMS Objective Function = 0.2620588235
 
Preliminary LMS Scale = 0.3987301586
 
Robust R Squared = 0.5813148789
 
Final LMS Scale = 0.3645644492

 
 

The output for WLS regression follows. Because of their large scaled residuals, six observations (numbers 7, 9, 11, 20, 30, and 34) are assigned zero weight in the following WLS analysis.
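The reweighting step itself is simple: observations whose residuals from the robust fit are large relative to the robust scale estimate get weight zero, and ordinary least squares is rerun on the remaining points. A minimal Python sketch (the 2.5 cutoff is an assumption, though it is a common choice; numpy assumed):

```python
import numpy as np

def reweighted_ls(x, y, coefs, scale, cutoff=2.5):
    """Zero-weight points whose scaled robust residual exceeds the cutoff,
    then refit by ordinary LS on the remaining points (a sketch of the
    reweighting step; the 2.5 cutoff is an assumption)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = y - (coefs[0] * x + coefs[1])      # residuals from the robust fit
    w = (np.abs(r / scale) <= cutoff).astype(float)
    keep = w > 0
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    return slope, intercept, w
```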

Output 9.1.5: Table of Weighted LS Regression
 
 
Weighted Least-Squares Estimation

RLS Parameter Estimates Based on LTS
Variable     Estimate   Approx Std Err   t Value   Pr > |t|    Lower WCI    Upper WCI
VAR1       3.04615694       0.43733923      6.97     <.0001   2.18898779   3.90332608
Intercep   -8.5000549       1.92630783     -4.41     <.0001   -12.275549   -4.7245609
 
 
Weighted Sum of Squares = 4.52819451
 
Degrees of Freedom = 39
 
RLS Scale Estimate = 0.3407455818
Cov Matrix of Parameter Estimates

                   VAR1       Intercep
VAR1       0.1912656038   -0.842128459
Intercep   -0.842128459   3.7106618752
 
 
Weighted R-squared = 0.5543573521
 
F(1,39) Statistic = 48.514065776
 
Probability = 2.3923178E-8
 
There are 41 points with nonzero weight.
 
Average Weight = 0.8723404255

 

The LTS regression leads to similar results:

   print "*** Hertzsprung-Russell Star Data: Do LTS ***";
      optn = j(8,1,.);
      optn[2]= 3;    /* ipri */
      optn[3]= 3;    /* ilsq */
      optn[8]= 3;    /* icov */
   call lts(sc,coef,wgt,optn,b,a);
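LTS differs from LMS only in the criterion: instead of the h-th ordered squared residual, it minimizes the sum of the h smallest squared residuals. A hedged Python sketch of that objective (numpy assumed; the same pairwise enumeration loop as for LMS would then minimize it over candidate fits):

```python
import numpy as np

def lts_criterion(x, y, slope, intercept, h):
    """Sum of the h smallest squared residuals: the quantity LTS minimizes
    over candidate fits (a sketch, not the SAS implementation)."""
    r2 = np.sort((np.asarray(y, float) - (slope * np.asarray(x, float) + intercept)) ** 2)
    return r2[:h].sum()
```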

Output 9.1.6: History of Iteration Process
 
Complete Enumeration for LTS

Subset   Singular   Best Criterion   Percent
   271          5         0.274268        25
   541          8         0.274268        50
   811         27         0.274268        75
  1081         45         0.274268       100
 
Minimum Criterion = 0.274268243
 
 
 
Least Trimmed Squares (LTS) Method
 
Minimizing Sum of 25 Smallest Squared Residuals.
 
Highest Possible Breakdown Value = 48.94 %
 
Selection of All 1081 Subsets of 2 Cases Out of 47
 
Among 1081 subsets 45 are singular.

 

Output 9.1.7: Results of Optimization
 
Observations of Best Subset
      22       36

Estimated Coefficients
        VAR1       Intercep
4.2424242424   -13.72633939
 
LTS Objective Function = 0.1829838175
 
Preliminary LTS Scale = 0.4525412929
 
Robust R Squared = 0.4038660847
 
Final LTS Scale = 0.3743418666

 
 


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.