Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Robust Regression Examples

Example 9.4: Brainlog Data

The following data, consisting of the body weights (in kilograms) and brain weights (in grams) of N=28 animals, are reported by Jerison (1973) and can be found also in Rousseeuw and Leroy (1987, p. 57). Instead of the original data, this example uses the logarithms of the measurements of the two variables.

   title "*** Brainlog Data: Do MVE ***";
      aa={ 1.303338E-001  9.084851E-001 ,
           2.6674530         2.6263400  ,
           1.5602650         2.0773680  ,
           1.4418520         2.0606980  ,
        1.703332E-002  7.403627E-001    ,
           4.0681860         1.6989700  ,
           3.4060290         3.6630410  ,
           2.2720740         2.6222140  ,
           2.7168380         2.8162410  ,
           1.0000000         2.0606980  ,
        5.185139E-001         1.4082400 ,
           2.7234560         2.8325090  ,
           2.3159700         2.6085260  ,
           1.7923920         3.1205740  ,
           3.8230830         3.7567880  ,
           3.9731280         1.8450980  ,
        8.325089E-001         2.2528530 ,
           1.5440680         1.7481880  ,
       -9.208187E-001          .0000000 ,
          -1.6382720 -3.979400E-001     ,
        3.979400E-001         1.0827850 ,
           1.7442930         2.2430380  ,
           2.0000000         2.1959000  ,
           1.7173380         2.6434530  ,
           4.9395190         2.1889280  ,
       -5.528420E-001  2.787536E-001    ,
       -9.136401E-001  4.771213E-001    ,
           2.2833010         2.2552720  };
By default, the MVE subroutine (like the MINVOL subroutine) uses only 1500 randomly selected subsets rather than all subsets. The following specification of the options vector requires that all 3276 subsets of 3 cases out of 28 cases are generated and evaluated:

   title2 "***MVE for BrainLog Data***";
   title3 "*** Use All Subsets***";
      optn = j(8,1,.);
      optn[1]= 3;              /* ipri */
      optn[2]= 1;              /* pcov: print COV */
      optn[3]= 1;              /* pcor: print CORR */
      optn[6]= -1;             /* nrep: all subsets */
   call mve(sc,xmve,dist,optn,aa);

Specifying optn[1]=3, optn[2]=1, and optn[3]=1 requests that all output be printed. Therefore, the first part of the output shows the classical scatter and correlation matrix.

Output 9.4.1: Some Simple Statistics
Minimum Volume Ellipsoid (MVE) Estimation

Consider Ellipsoids Containing 15 Cases.

Classical Covariance Matrix
  VAR1 VAR2
VAR1 2.6816512357 1.3300846932
VAR2 1.3300846932 1.0857537552

Classical Correlation Matrix
  VAR1 VAR2
VAR1 1 0.7794934643
VAR2 0.7794934643 1

Classical Mean
VAR1 1.6378572186
VAR2 1.9219465964


The second part of the output shows the results of the combinatoric optimization (complete subset sampling).

Output 9.4.2: Iteration History for MVE
There are 3276 subsets of 3 cases out of 28 cases.

The algorithm will draw 1500 random subsets of 3 cases.

Random Subsampling for MVE

Subset Singular Best
Criterion
Percent
375 0 0.453147 25
750 0 0.453147 50
1125 0 0.453147 75
1500 0 0.453147 100

Minimum Criterion= 0.4531471437

Among 1500 subsets 0 are singular.

Observations of Best Subset
1 22 2

Initial MVE Location
Estimates
VAR1 1.5140266
VAR2 1.9259543667

Initial MVE Scatter Matrix
  VAR1 VAR2
VAR1 7.5699818309 5.2533273517
VAR2 5.2533273517 3.7329226108


The third part of the output shows the optimization results after local improvement.

Output 9.4.3: Table of MVE Results
Final MVE Estimates (Using Local Improvement)

Number of Points with Nonzero Weight=23

Robust MVE Location Estimates
VAR1 1.3154029661
VAR2 1.8568731174

Robust MVE Scatter Matrix
  VAR1 VAR2
VAR1 2.139986054 1.6068556533
VAR2 1.6068556533 1.2520384784

Eigenvalues of Robust
Scatter Matrix
VAR1 3.363074897
VAR2 0.0289496354

Robust Correlation Matrix
  VAR1 VAR2
VAR1 1 0.9816633012
VAR2 0.9816633012 1


The final output presents a table containing the classical Mahalanobis distances, the robust distances, and the weights identifying the outlier observations.

Output 9.4.4: Mahalanobis and Robust Distances

Classical Distances and Robust (Rousseeuw) Distances
Unsquared Mahalanobis Distance and
Unsquared Rousseeuw Distance of Each Observation
N Mahalanobis Distances Robust Distances Weight
1 1.006591 0.855347 1.000000
2 0.695261 1.477050 1.000000
3 0.300831 0.239828 1.000000
4 0.380817 0.517719 1.000000
5 1.146485 1.108362 1.000000
6 2.644176 10.599341 0
7 1.708334 1.808455 1.000000
8 0.706522 0.690099 1.000000
9 0.858404 1.052423 1.000000
10 0.798698 2.077131 1.000000
11 0.686485 0.888545 1.000000
12 0.874349 1.035824 1.000000
13 0.677791 0.683978 1.000000
14 1.721526 4.257963 0
15 1.761947 1.716065 1.000000
16 2.369473 9.584992 0
17 1.222253 3.571700 0
18 0.203178 1.323783 1.000000
19 1.855201 1.741064 1.000000
20 2.266268 2.026528 1.000000
21 0.831416 0.743545 1.000000
22 0.416158 0.419923 1.000000
23 0.264182 0.944610 1.000000
24 1.046120 2.289334 1.000000
25 2.911101 11.471953 0
26 1.586458 1.518721 1.000000
27 1.582124 2.054593 1.000000
28 0.394664 1.675651 1.000000

Distribution of Robust Distances

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes
0.2398280857 0.8719458303 1.4978856181 2.4419474031 2.0658619775 11.471952618

Cutoff Value = 2.7162030315

The cutoff value is the square root of the 0.975 quantile of the chi square distribution
with 2 degrees of freedom.

There are 5 points with large robust distances receiving zero weights. These may include
boundary cases. Only points whose robust distances are subs tantially larger than the
cutoff value should be considered outliers.


Again, you can call the subroutine scatmve(), which is included in the sample library in file robustmc.sas, to plot the classical and robust confidence ellipsoids:

      optn = j(8,1,.); optn[6]= -1;
      vnam = { "Log Body Wgt","Log Brain Wgt" };
      filn = "brl";
      titl = "BrainLog Data: Use All Subsets";
   call scatmve(2,optn,.9,aa,vnam,titl,1,filn);

The output follows.

Output 9.4.5: BrainLog Data: Classical and Robust Ellipsoid
brl21.gif (4372 bytes)

Chapter Contents
Chapter Contents
Previous
Previous
Next
Next
Top
Top

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.