Chapter Contents Previous Next
 Robust Regression Examples

## Example 9.4: Brainlog Data

The following data, consisting of the body weights (in kilograms) and brain weights (in grams) of N=28 animals, are reported by Jerison (1973) and can be found also in Rousseeuw and Leroy (1987, p. 57). Instead of the original data, this example uses the logarithms of the measurements of the two variables.

```   title "*** Brainlog Data: Do MVE ***";
aa={ 1.303338E-001  9.084851E-001 ,
2.6674530         2.6263400  ,
1.5602650         2.0773680  ,
1.4418520         2.0606980  ,
1.703332E-002  7.403627E-001    ,
4.0681860         1.6989700  ,
3.4060290         3.6630410  ,
2.2720740         2.6222140  ,
2.7168380         2.8162410  ,
1.0000000         2.0606980  ,
5.185139E-001         1.4082400 ,
2.7234560         2.8325090  ,
2.3159700         2.6085260  ,
1.7923920         3.1205740  ,
3.8230830         3.7567880  ,
3.9731280         1.8450980  ,
8.325089E-001         2.2528530 ,
1.5440680         1.7481880  ,
-9.208187E-001          .0000000 ,
-1.6382720 -3.979400E-001     ,
3.979400E-001         1.0827850 ,
1.7442930         2.2430380  ,
2.0000000         2.1959000  ,
1.7173380         2.6434530  ,
4.9395190         2.1889280  ,
-5.528420E-001  2.787536E-001    ,
-9.136401E-001  4.771213E-001    ,
2.2833010         2.2552720  };
```
By default, the MVE subroutine (like the MINVOL subroutine) uses only 1500 randomly selected subsets rather than all subsets. The following specification of the options vector requires that all 3276 subsets of 3 cases out of 28 cases are generated and evaluated:

```   title2 "***MVE for BrainLog Data***";
title3 "*** Use All Subsets***";
optn = j(8,1,.);
optn[1]= 3;              /* ipri */
optn[2]= 1;              /* pcov: print COV */
optn[3]= 1;              /* pcor: print CORR */
optn[6]= -1;             /* nrep: all subsets */
call mve(sc,xmve,dist,optn,aa);
```

Specifying optn[1]=3, optn[2]=1, and optn[3]=1 requests that all output be printed. Therefore, the first part of the output shows the classical scatter and correlation matrix.

Output 9.4.1: Some Simple Statistics
 Minimum Volume Ellipsoid (MVE) Estimation

 Consider Ellipsoids Containing 15 Cases.

 Classical Covariance Matrix VAR1 VAR2 VAR1 2.6816512357 1.3300846932 VAR2 1.3300846932 1.0857537552

 Classical Correlation Matrix VAR1 VAR2 VAR1 1 0.7794934643 VAR2 0.7794934643 1

 Classical Mean VAR1 1.6378572186 VAR2 1.9219465964

The second part of the output shows the results of the combinatoric optimization (complete subset sampling).

Output 9.4.2: Iteration History for MVE
 There are 3276 subsets of 3 cases out of 28 cases.

 The algorithm will draw 1500 random subsets of 3 cases.

 Random Subsampling for MVE

 Subset Singular BestCriterion Percent 375 0 0.453147 25 750 0 0.453147 50 1125 0 0.453147 75 1500 0 0.453147 100

 Minimum Criterion= 0.4531471437

 Among 1500 subsets 0 are singular.

 Observations of Best Subset 1 22 2

 Initial MVE LocationEstimates VAR1 1.5140266 VAR2 1.9259543667

 Initial MVE Scatter Matrix VAR1 VAR2 VAR1 7.5699818309 5.2533273517 VAR2 5.2533273517 3.7329226108

The third part of the output shows the optimization results after local improvement.

Output 9.4.3: Table of MVE Results
 Final MVE Estimates (Using Local Improvement)

 Number of Points with Nonzero Weight=23

 Robust MVE Location Estimates VAR1 1.3154029661 VAR2 1.8568731174

 Robust MVE Scatter Matrix VAR1 VAR2 VAR1 2.139986054 1.6068556533 VAR2 1.6068556533 1.2520384784

 Eigenvalues of RobustScatter Matrix VAR1 3.363074897 VAR2 0.0289496354

 Robust Correlation Matrix VAR1 VAR2 VAR1 1 0.9816633012 VAR2 0.9816633012 1

The final output presents a table containing the classical Mahalanobis distances, the robust distances, and the weights identifying the outlier observations.

Output 9.4.4: Mahalanobis and Robust Distances

 Classical Distances and Robust (Rousseeuw) Distances Unsquared Mahalanobis Distance and Unsquared Rousseeuw Distance of Each Observation N Mahalanobis Distances Robust Distances Weight 1 1.006591 0.855347 1.000000 2 0.695261 1.477050 1.000000 3 0.300831 0.239828 1.000000 4 0.380817 0.517719 1.000000 5 1.146485 1.108362 1.000000 6 2.644176 10.599341 0 7 1.708334 1.808455 1.000000 8 0.706522 0.690099 1.000000 9 0.858404 1.052423 1.000000 10 0.798698 2.077131 1.000000 11 0.686485 0.888545 1.000000 12 0.874349 1.035824 1.000000 13 0.677791 0.683978 1.000000 14 1.721526 4.257963 0 15 1.761947 1.716065 1.000000 16 2.369473 9.584992 0 17 1.222253 3.571700 0 18 0.203178 1.323783 1.000000 19 1.855201 1.741064 1.000000 20 2.266268 2.026528 1.000000 21 0.831416 0.743545 1.000000 22 0.416158 0.419923 1.000000 23 0.264182 0.944610 1.000000 24 1.046120 2.289334 1.000000 25 2.911101 11.471953 0 26 1.586458 1.518721 1.000000 27 1.582124 2.054593 1.000000 28 0.394664 1.675651 1.000000

 Distribution of Robust Distances

 MinRes 1st Qu. Median Mean 3rd Qu. MaxRes 0.2398280857 0.8719458303 1.4978856181 2.4419474031 2.0658619775 11.471952618

 Cutoff Value = 2.7162030315

 The cutoff value is the square root of the 0.975 quantile of the chi square distributionwith 2 degrees of freedom.

 There are 5 points with large robust distances receiving zero weights. These may includeboundary cases. Only points whose robust distances are subs tantially larger than thecutoff value should be considered outliers.

Again, you can call the subroutine scatmve(), which is included in the sample library in file robustmc.sas, to plot the classical and robust confidence ellipsoids:

```      optn = j(8,1,.); optn[6]= -1;
vnam = { "Log Body Wgt","Log Brain Wgt" };
filn = "brl";
titl = "BrainLog Data: Use All Subsets";
call scatmve(2,optn,.9,aa,vnam,titl,1,filn);
```

The output follows.

Output 9.4.5: BrainLog Data: Classical and Robust Ellipsoid

 Chapter Contents Previous Next Top