Robust Regression Examples

Example 9.5: MVE: Stackloss Data

This example analyzes the three regressors of Brownlee's (1965) stackloss data. By default, the MVE subroutine, like the MINVOL subroutine, tries only 2000 randomly selected subsets in its search. In total there are 5985 subsets of 4 cases out of 21 cases, and setting the nrep option, optn[6], to -1 requests that all of them be used.

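   title2 "***MVE for Stackloss Data***";
   title3 "*** Use All Subsets***";
      /* aa holds the stackloss data; columns 2-4 are the three regressors */
      a = aa[,2:4];
      optn = j(8,1,.);         /* 8x1 option vector; missing entries request defaults */
      optn[1]= 2;              /* ipri: amount of printed output */
      optn[2]= 1;              /* pcov: print COV */
      optn[3]= 1;              /* pcor: print CORR */
      optn[6]= -1;             /* nrep: use all subsets */
   call mve(sc,xmve,dist,optn,a);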
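The subset search described above can be illustrated with a short Python sketch (Python is used only for illustration; it is not SAS/IML). It enumerates every subset of p + 1 = 4 of the n = 21 cases, which is what optn[6]= -1 requests, skips singular subsets, and scores each remaining subset by a quantity proportional to the volume of the ellipsoid centered at the subset mean, shaped by the subset covariance, and inflated until it covers h cases. Two assumptions are labeled in the code: the coverage rule h = floor((n + p + 1)/2), which gives the 12 cases reported in Output 9.5.1, and the unscaled volume criterion. The stackloss values are assumed to be the published ones. The sketch omits the scaling and local improvement that the MVE subroutine applies, so its criterion values are not comparable with Output 9.5.2, and it is not claimed to reproduce the reported best subset.

```python
import itertools
import math

# Stackloss regressors (Rate, Temperature, AcidConcent), Brownlee (1965)
X = [
    (80, 27, 89), (80, 27, 88), (75, 25, 90), (62, 24, 87), (62, 22, 87),
    (62, 23, 87), (62, 24, 93), (62, 24, 93), (58, 23, 87), (58, 18, 80),
    (58, 18, 89), (58, 17, 88), (58, 18, 82), (58, 19, 93), (50, 18, 89),
    (50, 18, 86), (50, 19, 72), (50, 19, 79), (50, 20, 80), (56, 20, 82),
    (70, 20, 91),
]
n, p = len(X), 3
h = (n + p + 1) // 2  # assumed coverage rule: 12 cases for n = 21, p = 3


def det3(a):
    """Determinant of a 3x3 matrix (exact when the entries are ints)."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def inv3(a, d):
    """Inverse of a nonsingular 3x3 matrix with determinant d (adjugate formula)."""
    return [[(a[(j + 1) % 3][(i + 1) % 3] * a[(j + 2) % 3][(i + 2) % 3]
              - a[(j + 1) % 3][(i + 2) % 3] * a[(j + 2) % 3][(i + 1) % 3]) / d
             for j in range(3)] for i in range(3)]


def subset_stats(rows):
    """Mean, covariance (divisor k - 1), its determinant, and an exact singularity flag."""
    k = len(rows)
    s = [sum(r[j] for r in rows) for j in range(p)]
    dev = [[k * r[j] - s[j] for j in range(p)] for r in rows]  # k * (x - mean), ints
    scat = [[sum(d[i] * d[j] for d in dev) for j in range(p)] for i in range(p)]
    scale = k * k * (k - 1)
    mean = [s[j] / k for j in range(p)]
    cov = [[scat[i][j] / scale for j in range(p)] for i in range(p)]
    det_scat = det3(scat)  # exact integer determinant
    return mean, cov, det_scat / scale ** p, det_scat == 0


def mve_search(X, h):
    """Score every (p + 1)-subset; return (tried, singular, (criterion, best subset))."""
    tried = singular = 0
    best = None
    for idx in itertools.combinations(range(len(X)), p + 1):
        tried += 1
        mean, cov, det_cov, is_singular = subset_stats([X[i] for i in idx])
        if is_singular:
            singular += 1
            continue
        ci = inv3(cov, det_cov)
        d2 = sorted(
            sum((x[i] - mean[i]) * ci[i][j] * (x[j] - mean[j])
                for i in range(p) for j in range(p))
            for x in X)
        m2 = d2[h - 1]  # h-th smallest squared distance: ellipsoid covers at least h cases
        crit = math.sqrt(det_cov * m2 ** p)  # proportional to the ellipsoid volume
        if best is None or crit < best[0]:
            best = (crit, [i + 1 for i in idx])  # 1-based observation numbers
    return tried, singular, best


tried, singular, best = mve_search(X, h)
print(tried, singular, best)
```

Drawing 2000 subsets with random.sample(range(n), p + 1) instead of enumerating all of them would mimic the default random search described above.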
The first part of the output shows the classical covariance matrix, correlation matrix, and mean.

Output 9.5.1: Some Simple Statistics
Minimum Volume Ellipsoid (MVE) Estimation

Consider Ellipsoids Containing 12 Cases.

Classical Covariance Matrix
  VAR1 VAR2 VAR3
VAR1 84.057142857 22.657142857 24.571428571
VAR2 22.657142857 9.9904761905 6.6214285714
VAR3 24.571428571 6.6214285714 28.714285714

Classical Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1 0.781852333 0.5001428749
VAR2 0.781852333 1 0.3909395378
VAR3 0.5001428749 0.3909395378 1

Classical Mean
VAR1 60.428571429
VAR2 21.095238095
VAR3 86.285714286
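The classical statistics in Output 9.5.1 are the ordinary sample mean, the covariance matrix with divisor n - 1 = 20, and the corresponding correlation matrix. The following Python sketch, which assumes the published stackloss values for the three regressors, recomputes them for comparison with the printed output.

```python
import math

# Stackloss regressors (Rate, Temperature, AcidConcent), Brownlee (1965)
X = [
    (80, 27, 89), (80, 27, 88), (75, 25, 90), (62, 24, 87), (62, 22, 87),
    (62, 23, 87), (62, 24, 93), (62, 24, 93), (58, 23, 87), (58, 18, 80),
    (58, 18, 89), (58, 17, 88), (58, 18, 82), (58, 19, 93), (50, 18, 89),
    (50, 18, 86), (50, 19, 72), (50, 19, 79), (50, 20, 80), (56, 20, 82),
    (70, 20, 91),
]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows):
    """Sample covariance with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def correlation(cov):
    """Scale a covariance matrix to a correlation matrix."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


mean = mean_vector(X)
cov = covariance(X)
corr = correlation(cov)
print(mean)
print(cov)
print(corr)
```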


The second part of the output shows the results of the optimization (complete subset sampling).

Output 9.5.2: Iteration History
Random Subsampling for MVE

Subset Singular Best Criterion Percent
500 23 165.830053 25
1000 55 165.634363 50
1500 79 165.634363 75
2000 103 165.634363 100

Minimum Criterion= 165.63436284

Among 2103 subsets 103 are singular.

Observations of Best Subset
14 20 7 10

Initial MVE Location Estimates
VAR1 58.5
VAR2 20.25
VAR3 87

Initial MVE Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 34.829014749 28.413143611 62.32560534
VAR2 28.413143611 38.036950318 58.659393261
VAR3 62.32560534 58.659393261 267.63348175
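Output 9.5.2 reports observations 14, 20, 7, and 10 as the best subset. In this output the initial location is simply the mean of those four cases, and every entry of the initial scatter matrix is the same constant multiple (about 5.4993) of their sample covariance; the constant presumably reflects the scaling the subroutine applies so that the ellipsoid covers the required number of cases, and this sketch does not try to derive it. The Python check below recomputes both from the four rows of the stackloss data (assumed to be the published values).

```python
# Rows 7, 10, 14 and 20 (1-based) of the stackloss regressors
# (Rate, Temperature, AcidConcent): the best subset reported in Output 9.5.2
best = {
    7: (62, 24, 93),
    10: (58, 18, 80),
    14: (58, 19, 93),
    20: (56, 20, 82),
}
rows = list(best.values())
k, p = len(rows), 3

# Subset mean and sample covariance (divisor k - 1)
location = [sum(r[j] for r in rows) / k for j in range(p)]
subset_cov = [[sum((r[i] - location[i]) * (r[j] - location[j]) for r in rows) / (k - 1)
               for j in range(p)] for i in range(p)]

# Initial MVE scatter matrix as printed in Output 9.5.2
printed = [
    [34.829014749, 28.413143611, 62.32560534],
    [28.413143611, 38.036950318, 58.659393261],
    [62.32560534, 58.659393261, 267.63348175],
]
# Entrywise ratio printed / subset covariance
ratios = [printed[i][j] / subset_cov[i][j] for i in range(p) for j in range(p)]
print(location, min(ratios), max(ratios))
```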


The third part of the output shows the optimization results after local improvement.

Output 9.5.3: Table of MVE Results
Final MVE Estimates (Using Local Improvement)

Number of Points with Nonzero Weight=17

Robust MVE Location Estimates
VAR1 56.705882353
VAR2 20.235294118
VAR3 85.529411765

Robust MVE Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 23.470588235 7.5735294118 16.102941176
VAR2 7.5735294118 6.3161764706 5.3676470588
VAR3 16.102941176 5.3676470588 32.389705882

Eigenvalues of Robust Scatter Matrix
VAR1 46.597431018
VAR2 12.155938483
VAR3 3.423101087

Robust Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1 0.6220269501 0.5840361335
VAR2 0.6220269501 1 0.375278187
VAR3 0.5840361335 0.375278187 1
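In this output the final estimates coincide with the ordinary mean and sample covariance (divisor 17 - 1 = 16) of the 17 observations that receive weight 1 in Output 9.5.4, that is, all cases except 1, 2, 3, and 21. The Python sketch below recomputes them from the stackloss regressors (assumed to be the published values) and checks that the trace of the scatter matrix equals the sum of the printed eigenvalues; the VAR1 to VAR3 labels on the eigenvalues are only positions, since eigenvalues do not belong to individual variables.

```python
# Stackloss regressors (Rate, Temperature, AcidConcent), Brownlee (1965)
X = [
    (80, 27, 89), (80, 27, 88), (75, 25, 90), (62, 24, 87), (62, 22, 87),
    (62, 23, 87), (62, 24, 93), (62, 24, 93), (58, 23, 87), (58, 18, 80),
    (58, 18, 89), (58, 17, 88), (58, 18, 82), (58, 19, 93), (50, 18, 89),
    (50, 18, 86), (50, 19, 72), (50, 19, 79), (50, 20, 80), (56, 20, 82),
    (70, 20, 91),
]
# Weight column of Output 9.5.4: cases 1, 2, 3 and 21 get weight 0
weights = [0, 0, 0] + [1] * 17 + [0]


def weighted_mean_cov(rows, w):
    """Weighted mean and covariance with divisor sum(w) - 1 (0/1 weights here)."""
    sw = sum(w)
    p = len(rows[0])
    m = [sum(wi * r[j] for r, wi in zip(rows, w)) / sw for j in range(p)]
    c = [[sum(wi * (r[i] - m[i]) * (r[j] - m[j]) for r, wi in zip(rows, w)) / (sw - 1)
          for j in range(p)] for i in range(p)]
    return m, c


location, scatter = weighted_mean_cov(X, weights)
trace = sum(scatter[i][i] for i in range(3))
eigen_sum = 46.597431018 + 12.155938483 + 3.423101087  # printed eigenvalues
print(location)
print(scatter)
print(trace, eigen_sum)
```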


The final output presents a table containing the classical Mahalanobis distances, the robust distances, and the weights identifying the outlying observations (that is, the leverage points when explaining y with these three regressor variables).
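Each classical distance in Output 9.5.4 is the unsquared Mahalanobis distance d_i = sqrt((x_i - m)' S^(-1) (x_i - m)), where m and S are the classical mean and covariance of Output 9.5.1; the robust distances are computed in the same way from a robust location and scatter estimate. The Python sketch below (stackloss values assumed as published) computes the classical distances with a small Gaussian-elimination solver instead of an explicit inverse. Because S uses the divisor n - 1, the squared distances always sum to (n - 1)p = 60, which gives a convenient check.

```python
import math

# Stackloss regressors (Rate, Temperature, AcidConcent), Brownlee (1965)
X = [
    (80, 27, 89), (80, 27, 88), (75, 25, 90), (62, 24, 87), (62, 22, 87),
    (62, 23, 87), (62, 24, 93), (62, 24, 93), (58, 23, 87), (58, 18, 80),
    (58, 18, 89), (58, 17, 88), (58, 18, 82), (58, 19, 93), (50, 18, 89),
    (50, 18, 86), (50, 19, 72), (50, 19, 79), (50, 20, 80), (56, 20, 82),
    (70, 20, 91),
]


def mean_cov(rows):
    """Mean vector and sample covariance with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = [sum(r[j] for r in rows) / n for j in range(p)]
    c = [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
          for j in range(p)] for i in range(p)]
    return m, c


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


mean, cov = mean_cov(X)
dist = []
for row in X:
    d = [row[j] - mean[j] for j in range(3)]
    y = solve(cov, d)  # y = S^(-1) d without forming the inverse
    dist.append(math.sqrt(sum(di * yi for di, yi in zip(d, y))))
print([round(v, 6) for v in dist])
```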

Output 9.5.4: Mahalanobis and Robust Distances

Classical Distances and Robust (Rousseeuw) Distances
Unsquared Mahalanobis Distance and Unsquared Rousseeuw Distance of Each Observation
N Mahalanobis Distances Robust Distances Weight
1 2.253603 5.528395 0
2 2.324745 5.637357 0
3 1.593712 4.197235 0
4 1.271898 1.588734 1.000000
5 0.303357 1.189335 1.000000
6 0.772895 1.308038 1.000000
7 1.852661 1.715924 1.000000
8 1.852661 1.715924 1.000000
9 1.360622 1.226680 1.000000
10 1.745997 1.936256 1.000000
11 1.465702 1.493509 1.000000
12 1.841504 1.913079 1.000000
13 1.482649 1.659943 1.000000
14 1.778785 1.689210 1.000000
15 1.690241 2.230109 1.000000
16 1.291934 1.767582 1.000000
17 2.700016 2.431021 1.000000
18 1.503155 1.523316 1.000000
19 1.593221 1.710165 1.000000
20 0.807054 0.675124 1.000000
21 2.176761 3.657281 0

Distribution of Robust Distances

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes
0.6751244996 1.5084120761 1.7159242054 2.2282960174 2.0831826658 5.6373573538

Cutoff Value = 3.0575159206

The cutoff value is the square root of the 0.975 quantile of the chi-square distribution with 3 degrees of freedom.

There are 4 points with large robust distances receiving zero weights. These may include boundary cases. Only points whose robust distances are substantially larger than the cutoff value should be considered outliers.
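The cutoff is the square root of the 0.975 quantile of the chi-square distribution with 3 degrees of freedom. For 3 degrees of freedom the chi-square CDF has the closed form F(x) = erf(sqrt(x/2)) - sqrt(2x/pi) exp(-x/2), so the quantile can be found by bisection without a statistics library. The Python sketch below does that, then applies the rule "weight 1 if the robust distance is at most the cutoff, 0 otherwise"; that rule is an assumption, but it reproduces the Weight column of Output 9.5.4 for these data. It also recomputes the median and mean from the distribution summary.

```python
import math
import statistics


def chi2_cdf_3df(x):
    """Chi-square CDF with 3 degrees of freedom (closed form)."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_quantile_3df(prob, lo=0.0, hi=100.0, iters=200):
    """Invert the increasing CDF by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_cdf_3df(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


cutoff = math.sqrt(chi2_quantile_3df(0.975))

# Robust distances from Output 9.5.4, observations 1..21
robust = [5.528395, 5.637357, 4.197235, 1.588734, 1.189335, 1.308038,
          1.715924, 1.715924, 1.226680, 1.936256, 1.493509, 1.913079,
          1.659943, 1.689210, 2.230109, 1.767582, 2.431021, 1.523316,
          1.710165, 0.675124, 3.657281]

# Assumed weighting rule: weight 1 inside the cutoff, 0 outside
weights = [1 if d <= cutoff else 0 for d in robust]
zero_weight = [i + 1 for i, w in enumerate(weights) if w == 0]

print(f"cutoff = {cutoff:.10f}, zero weight: {zero_weight}")
print("median", statistics.median(robust), "mean", statistics.mean(robust))
```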


The following specification generates three bivariate plots of the classical and robust tolerance ellipsoids, one plot for each pair of variables:

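      optn = j(8,1,.); optn[6]= -1;                     /* use all subsets, as above */
      vnam = { "Rate", "Temperature", "AcidConcent" };  /* variable labels for the plots */
      filn = "stl";                                     /* base name for the plot files */
      titl = "Stackloss Data: Use All Subsets";
   call scatmve(2,optn,.9,a,vnam,titl,1,filn);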

The output follows.

Output 9.5.5: Stackloss Data: Rate vs. Temperature
[Plot stl21: classical and robust tolerance ellipses]

Output 9.5.6: Stackloss Data: Rate vs. Acid Concent
[Plot stl31: classical and robust tolerance ellipses]

Output 9.5.7: Stackloss Data: Temperature vs. Acid Concent
[Plot stl32: classical and robust tolerance ellipses]
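Each plot overlays a classical and a robust tolerance ellipse for one pair of variables. Assuming that the .9 argument of SCATMVE is the probability content of the ellipses (an assumption, not confirmed in this example), an ellipse is the set of points whose squared Mahalanobis distance equals the 0.9 quantile of the chi-square distribution with 2 degrees of freedom, which has the closed form -2 ln(1 - 0.9), about 4.605. The Python sketch below traces that boundary for the classical Rate and Temperature mean and covariance of Output 9.5.1 by mapping a circle through a Cholesky factor; substituting the robust estimates of Output 9.5.3 would give the robust ellipse.

```python
import math


def tolerance_ellipse(mean, cov, prob, n_points=64):
    """Boundary of {x : (x - mean)' cov^(-1) (x - mean) = q}, q = chi-square(2) quantile.

    For 2 degrees of freedom the quantile at `prob` is -2 * ln(1 - prob).
    """
    q = -2.0 * math.log(1.0 - prob)
    r = math.sqrt(q)
    # 2x2 Cholesky factor L with cov = L L'
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    pts = []
    for k in range(n_points):
        t = 2 * math.pi * k / n_points
        u, v = r * math.cos(t), r * math.sin(t)  # point on a circle of radius r
        pts.append((mean[0] + l11 * u, mean[1] + l21 * u + l22 * v))
    return q, pts


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for a 2x2 covariance (explicit inverse)."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


# Classical Rate/Temperature mean and covariance from Output 9.5.1
mean = (60.428571429, 21.095238095)
cov = [[84.057142857, 22.657142857], [22.657142857, 9.9904761905]]
q, boundary = tolerance_ellipse(mean, cov, 0.9)
print(round(q, 6), boundary[:3])
```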


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.