The CLUSTER Procedure

Example 23.1: Cluster Analysis of Flying Mileages between Ten American Cities

This first example clusters ten American cities based on the flying mileages between them. Six clustering methods are shown, each with a corresponding tree diagram produced by the TREE procedure. The EML method cannot be used because it requires coordinate data. Each of the other omitted methods produces the same clusters as one of the illustrated methods, although not the same distances between clusters: complete linkage and the flexible-beta method yield the same clusters as Ward's method, McQuitty's similarity analysis yields the same clusters as average linkage, and the median method yields the same clusters as the centroid method.

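All of the methods suggest a division of the cities into two clusters along the east-west dimension. They disagree, however, about which cluster Denver belongs to. Average linkage and Ward's method place Denver with the eastern cities. The centroid, density linkage, single linkage, and two-stage density linkage methods place it with the western cities. Average linkage and Ward's method also join Denver and Houston before either city joins a larger cluster, which suggests a possible third cluster containing these two cities. The following statements produce Output 23.1.1: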

   title 'Cluster Analysis of Flying Mileages Between 10 American Cities';
   data mileages(type=distance);
      input (atlanta chicago denver houston losangeles
            miami newyork sanfran seattle washdc) (5.)
            @55 city $15.;
      datalines;
       0                                                 ATLANTA
     587    0                                            CHICAGO
    1212  920    0                                       DENVER
     701  940  879    0                                  HOUSTON
    1936 1745  831 1374    0                             LOS ANGELES
     604 1188 1726  968 2339    0                        MIAMI
     748  713 1631 1420 2451 1092    0                   NEW YORK
    2139 1858  949 1645  347 2594 2571    0              SAN FRANCISCO
    2182 1737 1021 1891  959 2734 2408  678    0         SEATTLE
     543  597 1494 1220 2300  923  205 2442 2329    0    WASHINGTON D.C.
   ;


   /*---------------------- Average linkage --------------------*/
   proc cluster data=mileages method=average pseudo;
      id city;
   run;

   proc tree horizontal spaces=2;
      id city;
   run;

   /*---------------------- Centroid method --------------------*/
   proc cluster data=mileages method=centroid pseudo;
      id city;
   run;

   proc tree horizontal spaces=2;
      id city;
   run;

   /*-------- Density linkage with 3rd-nearest-neighbor --------*/
   proc cluster data=mileages method=density k=3;
      id city;
   run;

   proc tree horizontal spaces=2;
      id city;
   run;

   /*--------------------- Single linkage ----------------------*/
   proc cluster data=mileages method=single;
      id city;
   run;

   proc tree horizontal spaces=2;
      id city;
   run;

   /*--- Two-stage density linkage with 3rd-nearest-neighbor ---*/
   proc cluster data=mileages method=twostage k=3;
      id city;
   run;

   proc tree horizontal spaces=2;
      id city;
   run;

   /* Ward's minimum variance with pseudo F and t**2 statistics */
   proc cluster data=mileages method=ward pseudo;
      id city;
   run;

   proc tree horizontal spaces=2;
      id city;
   run;
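The Norm columns in the average linkage, centroid, and single linkage histories below divide each joining distance by a summary of the input mileages, which is printed above each history. That summary is the root-mean-square distance between observations (1580.242) or, for single linkage, the mean distance (1417.133). As an independent cross-check, the following sketch recomputes both summaries over the 45 distinct city pairs in the lower triangle of the DATALINES. It is written in Python rather than SAS and is not part of the SAS example. The name `LOWER` is our own.

```python
import math

# Lower triangle of the mileage matrix, one list per DATALINES row
# (the zero diagonal is dropped). Row order matches the INPUT statement.
LOWER = [
    [],
    [587],
    [1212, 920],
    [701, 940, 879],
    [1936, 1745, 831, 1374],
    [604, 1188, 1726, 968, 2339],
    [748, 713, 1631, 1420, 2451, 1092],
    [2139, 1858, 949, 1645, 347, 2594, 2571],
    [2182, 1737, 1021, 1891, 959, 2734, 2408, 678],
    [543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329],
]

# Each distinct city pair appears exactly once in the lower triangle.
pairs = [v for row in LOWER for v in row]
rms = math.sqrt(sum(v * v for v in pairs) / len(pairs))
mean = sum(pairs) / len(pairs)

print(f"{len(pairs)} pairs")                     # 45 pairs
print(f"Root-mean-square distance = {rms:.3f}")  # 1580.242
print(f"Mean distance = {mean:.3f}")             # 1417.133
print(f"NEW YORK to WASHINGTON D.C., normalized = {205 / rms:.4f}")  # 0.1297
```

Dividing the 205-mile New York to Washington distance by the RMS distance gives the 0.1297 that appears in the first row of the average linkage and centroid histories.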

Output 23.1.1: Statistics and Tree Diagrams for Six Different Clustering Methods

Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure
Average Linkage Cluster Analysis

Root-Mean-Square Distance Between Observations = 1580.242

Cluster History
   NCL  Clusters Joined              FREQ    PSF   PST2  Norm RMS Dist  Tie
     9  NEW YORK     WASHINGTON D.C.    2   66.7      .         0.1297
     8  LOS ANGELES  SAN FRANCISCO      2   39.2      .         0.2196
     7  ATLANTA      CHICAGO            2   21.7      .         0.3715
     6  CL7          CL9                4   14.5    3.4         0.4149
     5  CL8          SEATTLE            3   12.4    7.3         0.5255
     4  DENVER       HOUSTON            2   13.9      .         0.5562
     3  CL6          MIAMI              5   15.5    3.8         0.6185
     2  CL3          CL4                7   16.0    5.3         0.8005
     1  CL2          CL5               10      .   16.0         1.2967


[Tree diagram for average linkage, produced by PROC TREE]
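With these distance data, the Norm RMS Dist values above match the root mean square of the mileages between members of the two clusters, divided by 1580.242. For example, CL7 (Atlanta, Chicago) and CL9 (New York, Washington) join at the RMS of their four cross-cluster mileages, about 655.6 miles, or 0.4149. The following Python sketch runs a naive agglomeration under that rule and reproduces the column from top to bottom. It is an illustration outside SAS, and the helper names `full_matrix` and `rms_average_linkage` are hypothetical.

```python
import math

LOWER = [
    [],
    [587],
    [1212, 920],
    [701, 940, 879],
    [1936, 1745, 831, 1374],
    [604, 1188, 1726, 968, 2339],
    [748, 713, 1631, 1420, 2451, 1092],
    [2139, 1858, 949, 1645, 347, 2594, 2571],
    [2182, 1737, 1021, 1891, 959, 2734, 2408, 678],
    [543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329],
]

def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a zero diagonal."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, v in enumerate(row):
            d[i][j] = d[j][i] = float(v)
    return d

def rms_average_linkage(d):
    """Brute-force agglomeration in which the between-cluster distance is the
    root mean square of all member-to-member distances. Returns join heights."""
    clusters = [[i] for i in range(len(d))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sq = [d[i][j] ** 2 for i in clusters[a] for j in clusters[b]]
                msq = sum(sq) / len(sq)
                if best is None or msq < best[0]:
                    best = (msq, a, b)
        msq, a, b = best
        heights.append(math.sqrt(msq))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return heights

d = full_matrix(LOWER)
pairs = [d[i][j] for i in range(len(d)) for j in range(i)]
rms = math.sqrt(sum(v * v for v in pairs) / len(pairs))
norm_rms = [round(h / rms, 4) for h in rms_average_linkage(d)]
print(norm_rms)
# [0.1297, 0.2196, 0.3715, 0.4149, 0.5255, 0.5562, 0.6185, 0.8005, 1.2967]
```

The exhaustive search over all cluster pairs is adequate for ten cities but scales poorly. It is meant only to make the joining rule concrete.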

Centroid Hierarchical Cluster Analysis

Root-Mean-Square Distance Between Observations = 1580.242

Cluster History
   NCL  Clusters Joined              FREQ    PSF   PST2  Norm Cent Dist  Tie
     9  NEW YORK     WASHINGTON D.C.    2   66.7      .          0.1297
     8  LOS ANGELES  SAN FRANCISCO      2   39.2      .          0.2196
     7  ATLANTA      CHICAGO            2   21.7      .          0.3715
     6  CL7          CL9                4   14.5    3.4          0.3652
     5  CL8          SEATTLE            3   12.4    7.3          0.5139
     4  DENVER       CL5                4   12.4    2.1          0.5337
     3  CL6          MIAMI              5   14.2    3.8          0.5743
     2  CL3          HOUSTON            6   22.1    2.6          0.6091
     1  CL2          CL4               10      .   22.1           1.173


[Tree diagram for the centroid method, produced by PROC TREE]
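Centroid distances can be obtained from distances alone, without coordinates. The squared distance between two cluster centroids equals the mean squared distance across the clusters, minus half the mean squared distance within each cluster, where the within-cluster means are taken over all ordered pairs, including each point with itself. This identity is exact for Euclidean distances, and it reproduces the normalized values printed above. The following Python sketch checks three rows. It is an illustration outside SAS, and the helper names and city-index constants are our own.

```python
import math

LOWER = [
    [],
    [587],
    [1212, 920],
    [701, 940, 879],
    [1936, 1745, 831, 1374],
    [604, 1188, 1726, 968, 2339],
    [748, 713, 1631, 1420, 2451, 1092],
    [2139, 1858, 949, 1645, 347, 2594, 2571],
    [2182, 1737, 1021, 1891, 959, 2734, 2408, 678],
    [543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329],
]

def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a zero diagonal."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, v in enumerate(row):
            d[i][j] = d[j][i] = float(v)
    return d

def mean_sq(d, p, q):
    """Mean squared distance over all ordered pairs (i in p, j in q), i == j allowed."""
    return sum(d[i][j] ** 2 for i in p for j in q) / (len(p) * len(q))

def centroid_sq(d, a, b):
    """Squared distance between the centroids of clusters a and b."""
    return mean_sq(d, a, b) - 0.5 * mean_sq(d, a, a) - 0.5 * mean_sq(d, b, b)

d = full_matrix(LOWER)
pairs = [d[i][j] for i in range(len(d)) for j in range(i)]
rms = math.sqrt(sum(v * v for v in pairs) / len(pairs))

# City indices follow the INPUT order of the data step (hypothetical constants).
ATL, CHI, DEN, LA, NY, SF, SEA, DC = 0, 1, 2, 4, 6, 7, 8, 9

def norm_cent(a, b):
    return round(math.sqrt(centroid_sq(d, a, b)) / rms, 4)

print(norm_cent([ATL, CHI], [NY, DC]))   # NCL 6, CL7 with CL9: 0.3652
print(norm_cent([LA, SF], [SEA]))        # NCL 5, CL8 with SEATTLE: 0.5139
print(norm_cent([DEN], [LA, SF, SEA]))   # NCL 4, DENVER with CL5: 0.5337
```

The first value, 0.3652, is smaller than the 0.3715 at which CL7 itself was formed. Such reversals can occur with the centroid method because its joining distances are not guaranteed to increase from one step to the next.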

Density Linkage Cluster Analysis

K = 3

Cluster History
                                                              Maximum Density
                                           Normalized         in Each Cluster
   NCL  Clusters Joined              FREQ  Fusion Density     Lesser    Greater  Tie
     9  ATLANTA      WASHINGTON D.C.    2        96.106      92.5043      100.0
     8  CL9          CHICAGO            3        95.263      90.9548      100.0
     7  CL8          NEW YORK           4        86.465      76.1571      100.0
     6  CL7          MIAMI              5        74.079      58.8299      100.0  T
     5  CL6          HOUSTON            6        74.079      61.7747      100.0
     4  LOS ANGELES  SAN FRANCISCO      2        71.968      65.3430    80.0885
     3  CL4          SEATTLE            3        66.341      56.6215    80.0885
     2  CL3          DENVER             4        63.509      61.7747    80.0885
     1  CL5          CL2               10        61.775 *    80.0885      100.0

* indicates fusion of two modal or multimodal clusters

2 modal clusters have been formed.


[Tree diagram for density linkage, produced by PROC TREE]
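The normalized densities in this history can be reproduced with a simple reading of the third-nearest-neighbor estimate:

1. Take each city's radius to be the distance to its third-nearest city, counting the city itself. This is the distance to its second-nearest other city.
2. Let density be inversely proportional to that radius.
3. Scale the densities so that the largest is 100.

A linked pair of cities then fuses at the harmonic mean of its two densities, and clusters merge in order of decreasing fusion density. These rules, including the test for when two cities count as linked, are our inference from the printed values, not a quotation of the PROC CLUSTER formulas. The following Python sketch applies them. It is an illustration outside SAS with hypothetical names.

```python
LOWER = [
    [],
    [587],
    [1212, 920],
    [701, 940, 879],
    [1936, 1745, 831, 1374],
    [604, 1188, 1726, 968, 2339],
    [748, 713, 1631, 1420, 2451, 1092],
    [2139, 1858, 949, 1645, 347, 2594, 2571],
    [2182, 1737, 1021, 1891, 959, 2734, 2408, 678],
    [543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329],
]

def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a zero diagonal."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, v in enumerate(row):
            d[i][j] = d[j][i] = float(v)
    return d

K = 3
d = full_matrix(LOWER)
n = len(d)

# ASSUMPTION: radius = distance to the K-th nearest city counting the city itself
# (the zero diagonal sorts first), and density is proportional to 1 / radius.
radius = [sorted(row)[K - 1] for row in d]
r_min = min(radius)
density = [100.0 * r_min / r for r in radius]  # scaled so the maximum is 100

def fusion(i, j):
    """Harmonic mean of the two densities."""
    return 2.0 / (1.0 / density[i] + 1.0 / density[j])

# ASSUMPTION: two cities are linked when their distance does not exceed
# the larger of their two radii.
edges = sorted(
    ((fusion(i, j), i, j) for i in range(n) for j in range(i)
     if d[i][j] <= max(radius[i], radius[j])),
    reverse=True,
)

parent = list(range(n))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

# Single linkage on fusion density: accept the densest links first.
fusion_history = []
for f, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj
        fusion_history.append(round(f, 3))

print(fusion_history)
# [96.106, 95.263, 86.465, 74.079, 74.079, 71.968, 66.341, 63.509, 61.775]
```

For example, Washington's radius is 543 miles (to Atlanta) and Atlanta's is 587 miles (to Chicago). The densities are therefore 100 and 92.5043, and the first fusion density is 2 × 100 × 92.5043 / (100 + 92.5043), or about 96.106.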

Single Linkage Cluster Analysis

Mean Distance Between Observations = 1417.133

Cluster History
   NCL  Clusters Joined              FREQ  Norm Min Dist  Tie
     9  NEW YORK     WASHINGTON D.C.    2         0.1447
     8  LOS ANGELES  SAN FRANCISCO      2         0.2449
     7  ATLANTA      CL9                3         0.3832
     6  CL7          CHICAGO            4         0.4142
     5  CL6          MIAMI              5         0.4262
     4  CL8          SEATTLE            3         0.4784
     3  CL5          HOUSTON            6         0.4947
     2  DENVER       CL4                4         0.5864
     1  CL3          CL2               10         0.6203


[Tree diagram for single linkage, produced by PROC TREE]
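Single linkage joins clusters at the smallest mileage between their members. Its joining distances are therefore the edge lengths of a minimum spanning tree of the ten cities, normalized here by the mean distance 1417.133. The following Python sketch scans city pairs from nearest to farthest with a union-find structure and reproduces the Norm Min Dist column. It is an illustration outside SAS, and the helper names are our own.

```python
LOWER = [
    [],
    [587],
    [1212, 920],
    [701, 940, 879],
    [1936, 1745, 831, 1374],
    [604, 1188, 1726, 968, 2339],
    [748, 713, 1631, 1420, 2451, 1092],
    [2139, 1858, 949, 1645, 347, 2594, 2571],
    [2182, 1737, 1021, 1891, 959, 2734, 2408, 678],
    [543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329],
]

def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a zero diagonal."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, v in enumerate(row):
            d[i][j] = d[j][i] = float(v)
    return d

def single_linkage_heights(d):
    """Kruskal-style single linkage: record a pair's distance whenever it
    links two clusters that are not yet connected."""
    n = len(d)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    heights = []
    for dist, i, j in sorted((d[i][j], i, j) for i in range(n) for j in range(i)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            heights.append(dist)
    return heights

d = full_matrix(LOWER)
pairs = [d[i][j] for i in range(len(d)) for j in range(i)]
mean = sum(pairs) / len(pairs)
norm_min = [round(h / mean, 4) for h in single_linkage_heights(d)]
print(norm_min)
# [0.1447, 0.2449, 0.3832, 0.4142, 0.4262, 0.4784, 0.4947, 0.5864, 0.6203]
```

The last join, at 879 miles, is the Denver to Houston link. That is why single linkage puts Denver on the western side of the two-cluster split.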

Two-Stage Density Linkage Clustering

K = 3

Cluster History
                                                              Maximum Density
                                           Normalized         in Each Cluster
   NCL  Clusters Joined              FREQ  Fusion Density     Lesser    Greater  Tie
     9  ATLANTA      WASHINGTON D.C.    2        96.106      92.5043      100.0
     8  CL9          CHICAGO            3        95.263      90.9548      100.0
     7  CL8          NEW YORK           4        86.465      76.1571      100.0
     6  CL7          MIAMI              5        74.079      58.8299      100.0  T
     5  CL6          HOUSTON            6        74.079      61.7747      100.0
     4  LOS ANGELES  SAN FRANCISCO      2        71.968      65.3430    80.0885
     3  CL4          SEATTLE            3        66.341      56.6215    80.0885
     2  CL3          DENVER             4        63.509      61.7747    80.0885
     1  CL5          CL2               10        61.775      80.0885      100.0

2 modal clusters have been formed.


[Tree diagram for two-stage density linkage, produced by PROC TREE]

Ward's Minimum Variance Cluster Analysis

Root-Mean-Square Distance Between Observations = 1580.242

Cluster History
   NCL  Clusters Joined              FREQ   SPRSQ    RSQ    PSF   PST2  Tie
     9  NEW YORK     WASHINGTON D.C.    2  0.0019   .998   66.7      .
     8  LOS ANGELES  SAN FRANCISCO      2  0.0054   .993   39.2      .
     7  ATLANTA      CHICAGO            2  0.0153   .977   21.7      .
     6  CL7          CL9                4  0.0296   .948   14.5    3.4
     5  DENVER       HOUSTON            2  0.0344   .913   13.2      .
     4  CL8          SEATTLE            3  0.0391   .874   13.9    7.3
     3  CL6          MIAMI              5  0.0586   .816   15.5    3.8
     2  CL3          CL5                7  0.1488   .667   16.0    5.3
     1  CL2          CL4               10  0.6669   .000      .   16.0


[Tree diagram for Ward's minimum variance method, produced by PROC TREE]
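Ward's method merges the pair of clusters whose union increases the within-cluster sum of squares least. With distance data, that increase can be computed from squared mileages alone: it is nA*nB/(nA+nB) times the squared centroid distance. The statistics in the history follow from that increase:

- SPRSQ is the increase divided by the total sum of squares.
- RSQ is one minus the accumulated SPRSQ.
- The pseudo F statistic is [RSQ/(G-1)] / [(1-RSQ)/(n-G)] for G clusters and n = 10 cities.

The following Python sketch computes these quantities and matches the first five rows of the history. It is an illustration outside SAS with hypothetical helper names, and it prints RSQ with a leading zero.

```python
LOWER = [
    [],
    [587],
    [1212, 920],
    [701, 940, 879],
    [1936, 1745, 831, 1374],
    [604, 1188, 1726, 968, 2339],
    [748, 713, 1631, 1420, 2451, 1092],
    [2139, 1858, 949, 1645, 347, 2594, 2571],
    [2182, 1737, 1021, 1891, 959, 2734, 2408, 678],
    [543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329],
]

def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a zero diagonal."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, v in enumerate(row):
            d[i][j] = d[j][i] = float(v)
    return d

def mean_sq(d, p, q):
    """Mean squared distance over all ordered pairs (i in p, j in q), i == j allowed."""
    return sum(d[i][j] ** 2 for i in p for j in q) / (len(p) * len(q))

def centroid_sq(d, a, b):
    """Squared distance between the centroids of clusters a and b."""
    return mean_sq(d, a, b) - 0.5 * mean_sq(d, a, a) - 0.5 * mean_sq(d, b, b)

def ward_history(d):
    """Brute-force Ward agglomeration. Returns (SPRSQ, RSQ, pseudo F) per merge."""
    n = len(d)
    # Total SS about the grand centroid = (sum of squared pair distances) / n
    total_ss = sum(d[i][j] ** 2 for i in range(n) for j in range(i)) / n
    clusters = [[i] for i in range(n)]
    rsq = 1.0
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                # Increase in within-cluster SS if ca and cb are merged
                cost = len(ca) * len(cb) / (len(ca) + len(cb)) * centroid_sq(d, ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        g = len(clusters)
        sprsq = cost / total_ss
        rsq -= sprsq
        psf = (rsq / (g - 1)) / ((1.0 - rsq) / (n - g)) if g > 1 else None
        history.append((sprsq, rsq, psf))
    return history

for sprsq, rsq, psf in ward_history(full_matrix(LOWER))[:5]:
    print(f"{sprsq:.4f} {rsq:.3f} {psf:.1f}")
# 0.0019 0.998 66.7
# 0.0054 0.993 39.2
# 0.0153 0.977 21.7
# 0.0296 0.948 14.5
# 0.0344 0.913 13.2
```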

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.