 The CLUSTER Procedure

## Example 23.1: Cluster Analysis of Flying Mileages between Ten American Cities

This first example clusters ten American cities based on the flying mileages between them. Six clustering methods are shown with corresponding tree diagrams produced by the TREE procedure. The EML method cannot be used because it requires coordinate data. The other omitted methods produce the same clusters, although not the same distances between clusters, as one of the illustrated methods: complete linkage and the flexible-beta method yield the same clusters as Ward's method, McQuitty's similarity analysis produces the same clusters as average linkage, and the median method corresponds to the centroid method.

All of the methods suggest a division of the cities into two clusters along the east-west dimension. There is disagreement, however, about which cluster Denver should belong to. Some of the methods indicate a possible third cluster containing Denver and Houston. The following statements produce Output 23.1.1:

```
title 'Cluster Analysis of Flying Mileages Between 10 American Cities';

/* Lower-triangular distance matrix: ten 5-column fields, */
/* with the city name starting in column 55               */
data mileages(type=distance);
   input (atlanta chicago denver houston losangeles
          miami newyork sanfran seattle washdc) (5.)
          @55 city $15.;
   datalines;
    0                                                 ATLANTA
  587    0                                            CHICAGO
 1212  920    0                                       DENVER
  701  940  879    0                                  HOUSTON
 1936 1745  831 1374    0                             LOS ANGELES
  604 1188 1726  968 2339    0                        MIAMI
  748  713 1631 1420 2451 1092    0                   NEW YORK
 2139 1858  949 1645  347 2594 2571    0              SAN FRANCISCO
 2182 1737 1021 1891  959 2734 2408  678    0         SEATTLE
  543  597 1494 1220 2300  923  205 2442 2329    0    WASHINGTON D.C.
;

/*---------------------- Average linkage --------------------*/
proc cluster data=mileages method=average pseudo;
id city;
run;

proc tree horizontal spaces=2;
id city;
run;

/*---------------------- Centroid method --------------------*/
proc cluster data=mileages method=centroid pseudo;
id city;
run;

proc tree horizontal spaces=2;
id city;
run;

/*-------- Density linkage with 3rd-nearest-neighbor --------*/
proc cluster data=mileages method=density k=3;
id city;
run;

proc tree horizontal spaces=2;
id city;
run;

/*---------------------- Single linkage ---------------------*/
proc cluster data=mileages method=single;
id city;
run;

proc tree horizontal spaces=2;
id city;
run;

/*--- Two-stage density linkage with 3rd-nearest-neighbor ---*/
proc cluster data=mileages method=twostage k=3;
id city;
run;

proc tree horizontal spaces=2;
id city;
run;

/* Ward's minimum variance with pseudo F and t-squared statistics */
proc cluster data=mileages method=ward pseudo;
id city;
run;

proc tree horizontal spaces=2;
id city;
run;
```
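The tree diagrams show the hierarchy but not an explicit assignment of cities to clusters. One way to see where Denver and Houston land is to save the tree with the OUTTREE= option and cut it at a chosen number of clusters with the NCLUSTERS= option of PROC TREE. The following is a minimal sketch for average linkage only; the data set names `tree_avg`, `avg2`, and `avg3` are illustrative and are not part of the original example.

```sas
/* Save the average-linkage tree instead of printing the history */
proc cluster data=mileages method=average outtree=tree_avg noprint;
   id city;
run;

/* Cut the tree at two clusters (the east-west split) */
proc tree data=tree_avg nclusters=2 out=avg2 noprint;
   id city;
run;

/* Cut the same tree at three clusters */
proc tree data=tree_avg nclusters=3 out=avg3 noprint;
   id city;
run;

/* List the three-cluster membership, grouped by cluster */
proc sort data=avg3;
   by cluster city;
run;

proc print data=avg3 noobs;
   var city cluster clusname;
run;
```

Going by the average-linkage cluster history in Output 23.1.1, the three-cluster cut should place Denver and Houston in a cluster of their own. Repeating the sketch with a different METHOD= value shows how the assignment of Denver changes.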

Output 23.1.1: Statistics and Tree Diagrams for Six Different Clustering Methods

 Cluster Analysis of Flying Mileages Between 10 American Cities

 The CLUSTER Procedure Average Linkage Cluster Analysis

 Root-Mean-Square Distance Between Observations = 1580.242

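Cluster History

| NCL | Clusters Joined | FREQ | PSF | PST2 | Norm RMS Dist | Tie |
|---|---|---|---|---|---|---|
| 9 | NEW YORK, WASHINGTON D.C. | 2 | 66.7 | . | 0.1297 | |
| 8 | LOS ANGELES, SAN FRANCISCO | 2 | 39.2 | . | 0.2196 | |
| 7 | ATLANTA, CHICAGO | 2 | 21.7 | . | 0.3715 | |
| 6 | CL7, CL9 | 4 | 14.5 | 3.4 | 0.4149 | |
| 5 | CL8, SEATTLE | 3 | 12.4 | 7.3 | 0.5255 | |
| 4 | DENVER, HOUSTON | 2 | 13.9 | . | 0.5562 | |
| 3 | CL6, MIAMI | 5 | 15.5 | 3.8 | 0.6185 | |
| 2 | CL3, CL4 | 7 | 16.0 | 5.3 | 0.8005 | |
| 1 | CL2, CL5 | 10 | . | 16.0 | 1.2967 | |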
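The normalized distances in this history appear to be the fusion distances divided by the root-mean-square distance printed above the table: for the first join, 205 / 1580.242 is approximately 0.1297. The sketch below recomputes the mean and root-mean-square distance from the lower triangle of the `mileages` data set as a check. The variable names `n`, `total`, `totalsq`, `mean`, `rms`, and `first_join` are illustrative, and the sketch assumes that the normalizer is taken over the 45 distinct city pairs.

```sas
data _null_;
   set mileages end=last;
   array d{10} atlanta chicago denver houston losangeles
               miami newyork sanfran seattle washdc;

   /* Row _n_ holds the distances to cities 1 through _n_-1, */
   /* so this loop visits each city pair exactly once        */
   do j = 1 to _n_ - 1;
      n + 1;                 /* number of pairs              */
      total + d{j};          /* sum of distances             */
      totalsq + d{j}**2;     /* sum of squared distances     */
   end;

   if last then do;
      mean = total / n;
      rms = sqrt(totalsq / n);
      first_join = 205 / rms;   /* New York to Washington D.C. */
      put mean= 10.3 rms= 10.3 first_join= 8.4;
   end;
run;
```

The values written to the log should agree with the 1580.242 reported here and with the mean distance of 1417.133 that the single linkage output uses as its normalizer.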

 The CLUSTER Procedure Centroid Hierarchical Cluster Analysis

 Root-Mean-Square Distance Between Observations = 1580.242

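Cluster History

| NCL | Clusters Joined | FREQ | PSF | PST2 | Norm Cent Dist | Tie |
|---|---|---|---|---|---|---|
| 9 | NEW YORK, WASHINGTON D.C. | 2 | 66.7 | . | 0.1297 | |
| 8 | LOS ANGELES, SAN FRANCISCO | 2 | 39.2 | . | 0.2196 | |
| 7 | ATLANTA, CHICAGO | 2 | 21.7 | . | 0.3715 | |
| 6 | CL7, CL9 | 4 | 14.5 | 3.4 | 0.3652 | |
| 5 | CL8, SEATTLE | 3 | 12.4 | 7.3 | 0.5139 | |
| 4 | DENVER, CL5 | 4 | 12.4 | 2.1 | 0.5337 | |
| 3 | CL6, MIAMI | 5 | 14.2 | 3.8 | 0.5743 | |
| 2 | CL3, HOUSTON | 6 | 22.1 | 2.6 | 0.6091 | |
| 1 | CL2, CL4 | 10 | . | 22.1 | 1.173 | |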

 The CLUSTER Procedure Density Linkage Cluster Analysis

 K = 3

Cluster History

| NCL | Clusters Joined | FREQ | Normalized Fusion Density | Maximum Density in Each Cluster: Lesser | Maximum Density in Each Cluster: Greater | Tie |
|---|---|---|---|---|---|---|
| 9 | ATLANTA, WASHINGTON D.C. | 2 | 96.106 | 92.5043 | 100.0 | |
| 8 | CL9, CHICAGO | 3 | 95.263 | 90.9548 | 100.0 | |
| 7 | CL8, NEW YORK | 4 | 86.465 | 76.1571 | 100.0 | |
| 6 | CL7, MIAMI | 5 | 74.079 | 58.8299 | 100.0 | T |
| 5 | CL6, HOUSTON | 6 | 74.079 | 61.7747 | 100.0 | |
| 4 | LOS ANGELES, SAN FRANCISCO | 2 | 71.968 | 65.3430 | 80.0885 | |
| 3 | CL4, SEATTLE | 3 | 66.341 | 56.6215 | 80.0885 | |
| 2 | CL3, DENVER | 4 | 63.509 | 61.7747 | 80.0885 | |
| 1 | CL5, CL2 | 10 | 61.775 * | 80.0885 | 100.0 | |

 * indicates fusion of two modal or multimodal clusters

 2 modal clusters have been formed.

 The CLUSTER Procedure Single Linkage Cluster Analysis

 Mean Distance Between Observations = 1417.133

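Cluster History

| NCL | Clusters Joined | FREQ | Norm Min Dist | Tie |
|---|---|---|---|---|
| 9 | NEW YORK, WASHINGTON D.C. | 2 | 0.1447 | |
| 8 | LOS ANGELES, SAN FRANCISCO | 2 | 0.2449 | |
| 7 | ATLANTA, CL9 | 3 | 0.3832 | |
| 6 | CL7, CHICAGO | 4 | 0.4142 | |
| 5 | CL6, MIAMI | 5 | 0.4262 | |
| 4 | CL8, SEATTLE | 3 | 0.4784 | |
| 3 | CL5, HOUSTON | 6 | 0.4947 | |
| 2 | DENVER, CL4 | 4 | 0.5864 | |
| 1 | CL3, CL2 | 10 | 0.6203 | |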

 The CLUSTER Procedure Two-Stage Density Linkage Clustering

 K = 3

Cluster History

| NCL | Clusters Joined | FREQ | Normalized Fusion Density | Maximum Density in Each Cluster: Lesser | Maximum Density in Each Cluster: Greater | Tie |
|---|---|---|---|---|---|---|
| 9 | ATLANTA, WASHINGTON D.C. | 2 | 96.106 | 92.5043 | 100.0 | |
| 8 | CL9, CHICAGO | 3 | 95.263 | 90.9548 | 100.0 | |
| 7 | CL8, NEW YORK | 4 | 86.465 | 76.1571 | 100.0 | |
| 6 | CL7, MIAMI | 5 | 74.079 | 58.8299 | 100.0 | T |
| 5 | CL6, HOUSTON | 6 | 74.079 | 61.7747 | 100.0 | |
| 4 | LOS ANGELES, SAN FRANCISCO | 2 | 71.968 | 65.3430 | 80.0885 | |
| 3 | CL4, SEATTLE | 3 | 66.341 | 56.6215 | 80.0885 | |
| 2 | CL3, DENVER | 4 | 63.509 | 61.7747 | 80.0885 | |
| 1 | CL5, CL2 | 10 | 61.775 | 80.0885 | 100.0 | |

 2 modal clusters have been formed.

 The CLUSTER Procedure Ward's Minimum Variance Cluster Analysis

 Root-Mean-Square Distance Between Observations = 1580.242

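Cluster History

| NCL | Clusters Joined | FREQ | SPRSQ | RSQ | PSF | PST2 | Tie |
|---|---|---|---|---|---|---|---|
| 9 | NEW YORK, WASHINGTON D.C. | 2 | 0.0019 | .998 | 66.7 | . | |
| 8 | LOS ANGELES, SAN FRANCISCO | 2 | 0.0054 | .993 | 39.2 | . | |
| 7 | ATLANTA, CHICAGO | 2 | 0.0153 | .977 | 21.7 | . | |
| 6 | CL7, CL9 | 4 | 0.0296 | .948 | 14.5 | 3.4 | |
| 5 | DENVER, HOUSTON | 2 | 0.0344 | .913 | 13.2 | . | |
| 4 | CL8, SEATTLE | 3 | 0.0391 | .874 | 13.9 | 7.3 | |
| 3 | CL6, MIAMI | 5 | 0.0586 | .816 | 15.5 | 3.8 | |
| 2 | CL3, CL5 | 7 | 0.1488 | .667 | 16.0 | 5.3 | |
| 1 | CL2, CL4 | 10 | 0.6669 | .000 | . | 16.0 | |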
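The introduction to this example states that complete linkage yields the same clusters as Ward's method. One way to check a claim of this kind is to cut both trees at the same number of clusters and cross-tabulate the memberships. The following is a sketch under stated assumptions: the data set names (`tree_ward`, `tree_complete`, `ward3`, `complete3`, `compare`) and the renamed variables `ward` and `complete` are illustrative, and the choice of three clusters is arbitrary.

```sas
/* Build both trees without printing the cluster histories */
proc cluster data=mileages method=ward outtree=tree_ward noprint;
   id city;
run;

proc cluster data=mileages method=complete outtree=tree_complete noprint;
   id city;
run;

/* Cut each tree at three clusters; rename CLUSTER so the */
/* two assignments can sit side by side after the merge   */
proc tree data=tree_ward nclusters=3 noprint
          out=ward3(drop=clusname rename=(cluster=ward));
   id city;
run;

proc tree data=tree_complete nclusters=3 noprint
          out=complete3(drop=clusname rename=(cluster=complete));
   id city;
run;

proc sort data=ward3;
   by city;
run;

proc sort data=complete3;
   by city;
run;

data compare;
   merge ward3 complete3;
   by city;
run;

/* Cluster numbers are arbitrary labels, so identical partitions */
/* show up as one nonzero cell in each row and each column       */
proc freq data=compare;
   tables ward*complete / norow nocol nopercent;
run;
```

The same pattern can be applied to the other omitted methods by changing the METHOD= value, for example METHOD=MCQUITTY against METHOD=AVERAGE or METHOD=MEDIAN against METHOD=CENTROID.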
