 The STEPDISC Procedure

## Example 60.1: Performing a Stepwise Discriminant Analysis

The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on fifty iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica.

```
proc format;
   /* Map the numeric species codes to readable names */
   value specname
      1='Setosa    '
      2='Versicolor'
      3='Virginica ';
run;

data iris;
   title 'Fisher (1936) Iris Data';
   /* The trailing @@ holds the input line so that several */
   /* observations can be read from each data record       */
   input SepalLength SepalWidth PetalLength PetalWidth
         Species @@;
   format Species specname.;
   label SepalLength='Sepal Length in mm.'
         SepalWidth ='Sepal Width in mm.'
         PetalLength='Petal Length in mm.'
         PetalWidth ='Petal Width in mm.';
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;
```
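Before running the analysis, it can be worth confirming that the DATA step read all 150 observations, 50 from each species. The following is a minimal sketch of such a check and is not part of the original example; it assumes only the `iris` data set created above.

```
/* Sanity check (not part of the original example): */
/* each species should show a frequency of 50       */
proc freq data=iris;
   tables Species;
run;

/* Per-species summary statistics for the four measurements, in mm */
proc means data=iris n mean std min max maxdec=2;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```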

The following analysis selects variables by stepwise selection, the method the procedure uses when none is specified (Output 60.1.1 reports the method as STEPWISE).

In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 60.1.1 through Output 60.1.8:

```
proc stepdisc data=iris bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```
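The statements above rely on default settings: Output 60.1.1 reports stepwise selection with significance levels of 0.15 both to enter and to stay. As a sketch, the same settings can be written out explicitly. This assumes the METHOD=, SLENTRY=, and SLSTAY= options of the PROC STEPDISC statement and is expected to give the same selection as the original call.

```
/* Same analysis with the selection settings spelled out.    */
/* The values mirror those reported in Output 60.1.1.        */
proc stepdisc data=iris method=stepwise
              slentry=0.15 slstay=0.15
              bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Writing the levels out makes it easier to experiment, for example by lowering SLENTRY= to make entry more conservative.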

Output 60.1.1: Iris Data: Summary Information

 Fisher (1936) Iris Data

 The STEPDISC Procedure

| Summary item | Value |
|---|---|
| The Method for Selecting Variables is | STEPWISE |
| Observations | 150 |
| Variable(s) in the Analysis | 4 |
| Class Levels | 3 |
| Variable(s) will be Included | 0 |
| Significance Level to Enter | 0.15 |
| Significance Level to Stay | 0.15 |

Class Level Information

| Species | Variable Name | Frequency | Weight | Proportion |
|---|---|---|---|---|
| Setosa | Setosa | 50 | 50.0000 | 0.333333 |
| Versicolor | Versicolor | 50 | 50.0000 | 0.333333 |
| Virginica | Virginica | 50 | 50.0000 | 0.333333 |

Output 60.1.2: Iris Data: Between-Class and Total-Sample SSCP Matrices

 Fisher (1936) Iris Data

 The STEPDISC Procedure

Between-Class SSCP Matrix

| Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 6321.21333 | -1995.26667 | 16524.84000 | 7127.93333 |
| SepalWidth | Sepal Width in mm. | -1995.26667 | 1134.49333 | -5723.96000 | -2293.26667 |
| PetalLength | Petal Length in mm. | 16524.84000 | -5723.96000 | 43710.28000 | 18677.40000 |
| PetalWidth | Petal Width in mm. | 7127.93333 | -2293.26667 | 18677.40000 | 8041.33333 |

Total-Sample SSCP Matrix

| Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 10216.83333 | -632.26667 | 18987.30000 | 7692.43333 |
| SepalWidth | Sepal Width in mm. | -632.26667 | 2830.69333 | -4911.88000 | -1812.42667 |
| PetalLength | Petal Length in mm. | 18987.30000 | -4911.88000 | 46432.54000 | 19304.58000 |
| PetalWidth | Petal Width in mm. | 7692.43333 | -1812.42667 | 19304.58000 | 8656.99333 |

In Step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. Variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
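With no variables yet in the model, the analysis of covariance has no covariates and reduces to a one-way analysis of variance on Species. As a cross-check that is not part of the original example, a PROC GLM run along the following lines should give an overall F value for Species that agrees with the PetalLength row of Output 60.1.3.

```
/* One-way ANOVA of PetalLength on Species (no covariates yet). */
/* The F test for Species has 2 and 147 degrees of freedom.     */
proc glm data=iris;
   class Species;
   model PetalLength = Species;
run;
quit;
```

Replacing PetalLength with each of the other three variables gives the remaining rows of the Step 1 entry table.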

Output 60.1.3: Iris Data: Stepwise Selection Step 1

 Fisher (1936) Iris Data

 The STEPDISC Procedure Stepwise Selection: Step 1

Statistics for Entry, DF = 2, 147

| Variable | Label | R-Square | F Value | Pr > F | Tolerance |
|---|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 0.6187 | 119.26 | <.0001 | 1.0000 |
| SepalWidth | Sepal Width in mm. | 0.4008 | 49.16 | <.0001 | 1.0000 |
| PetalLength | Petal Length in mm. | 0.9414 | 1180.16 | <.0001 | 1.0000 |
| PetalWidth | Petal Width in mm. | 0.9289 | 960.01 | <.0001 | 1.0000 |

 Variable PetalLength will be entered.

Variable(s) that have been Entered: PetalLength

Multivariate Statistics

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.058628 | 1180.16 | 2 | 147 | <.0001 |
| Pillai's Trace | 0.941372 | 1180.16 | 2 | 147 | <.0001 |
| Average Squared Canonical Correlation | 0.470686 | | | | |
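The Step 1 values for PetalLength can be recovered by hand from the diagonal entries of the two SSCP matrices in Output 60.1.2. The following is an arithmetic sketch rather than procedure output; $B$ and $T$ denote the between-class and total-sample SSCP entries for PetalLength, and the within-class sum of squares is taken as $T - B$.

$$
R^2 = \frac{B}{T} = \frac{43710.28}{46432.54} \approx 0.9414
$$

$$
F = \frac{B/2}{(T - B)/147} = \frac{43710.28/2}{2722.26/147} \approx \frac{21855.14}{18.5188} \approx 1180.16
$$

With a single variable in the model, the multivariate statistics follow directly from the same $R^2$:

$$
\Lambda = 1 - R^2 \approx 0.058628, \qquad
\text{Pillai's trace} = R^2 \approx 0.941372, \qquad
\text{ASCC} = \frac{0.941372}{3 - 1} \approx 0.470686
$$

Here the divisor $3 - 1$ is the number of classes minus one.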

In Step 2, with variable PetalLength already in the model, PetalLength is tested for removal before selecting a new variable for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. Variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in Steps 3 and 4. Variable PetalWidth is entered in Step 3, and variable SepalLength is entered in Step 4.
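The Step 2 entry test for SepalWidth can be viewed as an analysis of covariance with PetalLength as the covariate. As a cross-check that is not part of the original example, the sketch below fits that model in PROC GLM; the Type III F test for Species, with 2 and 146 degrees of freedom, should correspond to the SepalWidth entry F value in Output 60.1.4.

```
/* ANCOVA: SepalWidth is the dependent variable,            */
/* PetalLength (already in the model) acts as the covariate */
proc glm data=iris;
   class Species;
   /* SS3 requests Type III sums of squares, so the test for */
   /* Species is adjusted for the covariate                  */
   model SepalWidth = PetalLength Species / ss3;
run;
quit;
```

The same pattern extends to later steps by listing every variable already in the model as a covariate.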

Output 60.1.4: Iris Data: Stepwise Selection Step 2

 Fisher (1936) Iris Data

 The STEPDISC Procedure Stepwise Selection: Step 2

Statistics for Removal, DF = 2, 147

| Variable | Label | R-Square | F Value | Pr > F |
|---|---|---|---|---|
| PetalLength | Petal Length in mm. | 0.9414 | 1180.16 | <.0001 |

 No variables can be removed.

Statistics for Entry, DF = 2, 146

| Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
|---|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 0.3198 | 34.32 | <.0001 | 0.2400 |
| SepalWidth | Sepal Width in mm. | 0.3709 | 43.04 | <.0001 | 0.8164 |
| PetalWidth | Petal Width in mm. | 0.2533 | 24.77 | <.0001 | 0.0729 |

 Variable SepalWidth will be entered.

Variable(s) that have been Entered: SepalWidth, PetalLength

Multivariate Statistics

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.036884 | 307.10 | 4 | 292 | <.0001 |
| Pillai's Trace | 1.119908 | 93.53 | 4 | 294 | <.0001 |
| Average Squared Canonical Correlation | 0.559954 | | | | |
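The tolerance values in the Step 2 entry table are consistent with one minus the squared total-sample correlation between each candidate variable and PetalLength, the only variable in the model at that point. The following arithmetic sketch, which is not procedure output, uses the total-sample SSCP entries from Output 60.1.2:

$$
\text{Tolerance}_{\text{SepalWidth}} = 1 - \frac{(-4911.88)^2}{2830.69333 \times 46432.54} \approx 1 - 0.1836 = 0.8164
$$

$$
\text{Tolerance}_{\text{SepalLength}} = 1 - \frac{(18987.30)^2}{10216.83333 \times 46432.54} \approx 1 - 0.7600 = 0.2400
$$

$$
\text{Tolerance}_{\text{PetalWidth}} = 1 - \frac{(19304.58)^2}{8656.99333 \times 46432.54} \approx 1 - 0.9271 = 0.0729
$$

The low tolerance for PetalWidth reflects its strong correlation with PetalLength.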

Output 60.1.5: Iris Data: Stepwise Selection Step 3

 Fisher (1936) Iris Data

 The STEPDISC Procedure Stepwise Selection: Step 3

Statistics for Removal, DF = 2, 146

| Variable | Label | Partial R-Square | F Value | Pr > F |
|---|---|---|---|---|
| SepalWidth | Sepal Width in mm. | 0.3709 | 43.04 | <.0001 |
| PetalLength | Petal Length in mm. | 0.9384 | 1112.95 | <.0001 |

 No variables can be removed.

Statistics for Entry, DF = 2, 145

| Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
|---|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 0.1447 | 12.27 | <.0001 | 0.1323 |
| PetalWidth | Petal Width in mm. | 0.3229 | 34.57 | <.0001 | 0.0662 |

 Variable PetalWidth will be entered.

Variable(s) that have been Entered: SepalWidth, PetalLength, PetalWidth

Multivariate Statistics

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.024976 | 257.50 | 6 | 290 | <.0001 |
| Pillai's Trace | 1.189914 | 71.49 | 6 | 292 | <.0001 |
| Average Squared Canonical Correlation | 0.594957 | | | | |

Output 60.1.6: Iris Data: Stepwise Selection Step 4

 Fisher (1936) Iris Data

 The STEPDISC Procedure Stepwise Selection: Step 4

Statistics for Removal, DF = 2, 145

| Variable | Label | Partial R-Square | F Value | Pr > F |
|---|---|---|---|---|
| SepalWidth | Sepal Width in mm. | 0.4295 | 54.58 | <.0001 |
| PetalLength | Petal Length in mm. | 0.3482 | 38.72 | <.0001 |
| PetalWidth | Petal Width in mm. | 0.3229 | 34.57 | <.0001 |

 No variables can be removed.

Statistics for Entry, DF = 2, 144

| Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
|---|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 0.0615 | 4.72 | 0.0103 | 0.0320 |

 Variable SepalLength will be entered.

 All variables have been entered.

Multivariate Statistics

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.023439 | 199.15 | 8 | 288 | <.0001 |
| Pillai's Trace | 1.191899 | 53.47 | 8 | 290 | <.0001 |
| Average Squared Canonical Correlation | 0.595949 | | | | |

Since no more variables can be added to or removed from the model, the procedure stops at Step 5 and displays a summary of the selection process.
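When only the final summary is of interest, the step-by-step tables can be suppressed. The following minimal sketch is not part of the original example; it assumes the SHORT option of the PROC STEPDISC statement, which suppresses the output displayed for each step.

```
/* Run the same selection but suppress the per-step tables, */
/* leaving the stepwise selection summary                   */
proc stepdisc data=iris short;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```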

Output 60.1.7: Iris Data: Stepwise Selection Step 5

 Fisher (1936) Iris Data

 The STEPDISC Procedure Stepwise Selection: Step 5

Statistics for Removal, DF = 2, 144

| Variable | Label | Partial R-Square | F Value | Pr > F |
|---|---|---|---|---|
| SepalLength | Sepal Length in mm. | 0.0615 | 4.72 | 0.0103 |
| SepalWidth | Sepal Width in mm. | 0.2335 | 21.94 | <.0001 |
| PetalLength | Petal Length in mm. | 0.3308 | 35.59 | <.0001 |
| PetalWidth | Petal Width in mm. | 0.2570 | 24.90 | <.0001 |

 No variables can be removed.

 No further steps are possible.

Output 60.1.8: Iris Data: Stepwise Selection Summary

 Fisher (1936) Iris Data

 The STEPDISC Procedure

Stepwise Selection Summary

| Step | Number In | Entered | Removed | Label | Partial R-Square | F Value | Pr > F | Wilks' Lambda | Pr < Lambda | Average Squared Canonical Correlation | Pr > ASCC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | PetalLength | | Petal Length in mm. | 0.9414 | 1180.16 | <.0001 | 0.05862828 | <.0001 | 0.47068586 | <.0001 |
| 2 | 2 | SepalWidth | | Sepal Width in mm. | 0.3709 | 43.04 | <.0001 | 0.03688411 | <.0001 | 0.55995394 | <.0001 |
| 3 | 3 | PetalWidth | | Petal Width in mm. | 0.3229 | 34.57 | <.0001 | 0.02497554 | <.0001 | 0.59495691 | <.0001 |
| 4 | 4 | SepalLength | | Sepal Length in mm. | 0.0615 | 4.72 | 0.0103 | 0.02343863 | <.0001 | 0.59594941 | <.0001 |
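The Wilks' lambda column of the summary can be tied to the partial R-square column. Each time a variable enters, the reported values are consistent with the previous lambda being multiplied by one minus that variable's partial R-square. The following is an arithmetic sketch based on the rounded values in the summary, not procedure output:

$$
\Lambda_k = \Lambda_{k-1}\,\bigl(1 - R^2_{\text{partial},k}\bigr)
$$

$$
\begin{aligned}
\text{Step 2:}\quad & 0.058628 \times (1 - 0.3709) \approx 0.03688 \\
\text{Step 3:}\quad & 0.036884 \times (1 - 0.3229) \approx 0.02497 \\
\text{Step 4:}\quad & 0.024976 \times (1 - 0.0615) \approx 0.02344
\end{aligned}
$$

Any difference in the last digit comes from using partial R-square values rounded to four decimal places. The small drop at Step 4 shows that SepalLength adds comparatively little once the other three variables are in the model.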
