 The SURVEYSELECT Procedure

## Stratified Sampling

In this section, stratification is added to the sample design for the customer satisfaction survey. The sampling frame, or list of all customers, is stratified by State and Type. This divides the sampling frame into nonoverlapping subgroups formed from the values of the State and Type variables. Samples are then selected independently within the strata.

PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the Customers data set by the stratification variables State and Type.

```sas
/* Sort the sampling frame by the stratification variables */
proc sort data=Customers;
   by State Type;
run;
```

The following PROC FREQ statements display the crosstabulation of the Customers data set by State and Type.

```sas
/* Crosstabulate customers by State and Type to show the strata */
proc freq data=Customers;
   tables State*Type;
run;
```

Figure 63.4 presents the table of State by Type for the 13,471 customers. There are four states and two levels of Type, forming a total of eight strata.

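The FREQ Procedure

Table of State by Type

| State | Type | Frequency | Percent | Row Pct | Col Pct |
|---|---|---:|---:|---:|---:|
| AL | New | 1238 | 9.19 | 63.68 | 14.43 |
| AL | Old | 706 | 5.24 | 36.32 | 14.43 |
| AL | Total | 1944 | 14.43 | | |
| FL | New | 2170 | 16.11 | 61.30 | 25.29 |
| FL | Old | 1370 | 10.17 | 38.70 | 28.01 |
| FL | Total | 3540 | 26.28 | | |
| GA | New | 3488 | 25.89 | 64.26 | 40.65 |
| GA | Old | 1940 | 14.40 | 35.74 | 39.66 |
| GA | Total | 5428 | 40.29 | | |
| SC | New | 1684 | 12.50 | 65.81 | 19.63 |
| SC | Old | 875 | 6.50 | 34.19 | 17.89 |
| SC | Total | 2559 | 19.00 | | |
| Total | New | 8580 | 63.69 | | |
| Total | Old | 4891 | 36.31 | | |
| Total | Total | 13471 | 100.00 | | |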

Figure 63.4: Stratification of Customers by State and Type

The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set according to the stratified sample design.

```sas
title1 'Customer Satisfaction Survey';
title2 'Stratified Sampling';
/* Select 15 customers by simple random sampling in each stratum */
proc surveyselect data=Customers method=srs n=15
                  seed=1953 out=SampleStrata;
   strata State Type;
run;
```

The STRATA statement names the stratification variables State and Type. In the PROC SURVEYSELECT statement, the METHOD=SRS option specifies simple random sampling. The N=15 option specifies a sample size of 15 customers for each stratum. If you want to specify different sample sizes for different strata, you can use the N=SAS-data-set option to name a secondary data set that contains the stratum sample sizes. The SEED=1953 option specifies '1953' as the initial seed for random number generation.
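A minimal sketch of the N=SAS-data-set approach follows. It assumes that Type is a character variable with the values New and Old shown in Figure 63.4; the data set names StrataSizes and SampleAlloc and the allocation of 20 new and 10 old customers per state are hypothetical. The secondary data set is built from Customers itself so that State and Type keep the same type, length, and sort order as in the input data set, and the stratum sample sizes are stored in the variable _NSIZE_, which PROC SURVEYSELECT reads as the sample size variable.

```sas
/* One observation per State*Type stratum, taken from Customers so */
/* that State and Type keep their original attributes and order.   */
proc sort data=Customers(keep=State Type) out=StrataSizes nodupkey;
   by State Type;
run;

/* Hypothetical allocation stored in _NSIZE_ (assumed values) */
data StrataSizes;
   set StrataSizes;
   if Type = 'New' then _NSIZE_ = 20;
   else _NSIZE_ = 10;
run;

/* Stratum sample sizes now come from the secondary data set */
proc surveyselect data=Customers method=srs n=StrataSizes
                  seed=1953 out=SampleAlloc;
   strata State Type;
run;
```

With these assumed sizes the total sample is still 4 × 20 + 4 × 10 = 120 customers, but new customers are sampled at a higher rate than old ones. Every assumed size is smaller than the smallest stratum population (706), so selection without replacement is possible in each stratum.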

Figure 63.5 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 120 customers are selected.

Customer Satisfaction Survey
Stratified Sampling

The SURVEYSELECT Procedure

| Setting | Value |
|---|---|
| Selection Method | Simple Random Sampling |
| Strata Variables | State Type |
| Input Data Set | CUSTOMERS |
| Random Number Seed | 1953 |
| Stratum Sample Size | 15 |
| Number of Strata | 8 |
| Total Sample Size | 120 |
| Output Data Set | SAMPLESTRATA |

Figure 63.5: Sample Selection Summary

The following PROC PRINT statements display the first 30 observations of the output data set SampleStrata.

```sas
title1 'Customer Satisfaction Survey';
title2 'Sample Selected by Stratified Design';
title3 '(First 30 Observations)';
/* Print the first 30 selected customers */
proc print data=SampleStrata(obs=30);
run;
```

Figure 63.6 displays the first 30 observations of the output data set SampleStrata, which contains the sample of 120 customers, 15 from each of the 8 strata. The variable SelectionProb contains the selection probability for each customer in the sample. Because customers are selected with equal probability within each stratum in this design, the selection probability equals the stratum sample size (15) divided by the stratum population size. The selection probabilities differ from stratum to stratum because the population sizes differ. The selection probability is 0.012116 for each customer in the first stratum (State='AL' and Type='New') and 0.021246 for each customer in the second stratum (State='AL' and Type='Old'). The variable SamplingWeight contains the sampling weights, which are the inverses of the selection probabilities.
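The values in Figure 63.6 can be reproduced from the stratum counts in Figure 63.4. Writing $n_h$ for the sample size and $N_h$ for the population size of stratum $h$ (notation introduced here for illustration), the selection probability and sampling weight are

$$
\pi_h = \frac{n_h}{N_h}, \qquad w_h = \frac{1}{\pi_h} = \frac{N_h}{n_h}
$$

For the first two strata:

$$
\begin{aligned}
\text{AL, New:}\quad & \pi = \frac{15}{1238} \approx 0.012116, & w &= \frac{1238}{15} \approx 82.5333 \\
\text{AL, Old:}\quad & \pi = \frac{15}{706} \approx 0.021246, & w &= \frac{706}{15} \approx 47.0667
\end{aligned}
$$

Summing the weights over the $n_h$ sampled customers of a stratum gives $n_h \cdot N_h / n_h = N_h$, the stratum population size.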

Customer Satisfaction Survey
Sample Selected by Stratified Design
(First 30 Observations)

| Obs | State | Type | CustomerID | Usage | SelectionProb | SamplingWeight |
|---:|---|---|---|---:|---:|---:|
| 1 | AL | New | 002-26-1498 | 1189 | 0.012116 | 82.5333 |
| 2 | AL | New | 070-86-8494 | 106 | 0.012116 | 82.5333 |
| 3 | AL | New | 121-28-6895 | 76 | 0.012116 | 82.5333 |
| 4 | AL | New | 131-79-7630 | 265 | 0.012116 | 82.5333 |
| 5 | AL | New | 211-88-4991 | 108 | 0.012116 | 82.5333 |
| 6 | AL | New | 222-81-3742 | 83 | 0.012116 | 82.5333 |
| 7 | AL | New | 238-46-3776 | 278 | 0.012116 | 82.5333 |
| 8 | AL | New | 370-01-0671 | 123 | 0.012116 | 82.5333 |
| 9 | AL | New | 407-07-5479 | 1580 | 0.012116 | 82.5333 |
| 10 | AL | New | 550-90-3188 | 177 | 0.012116 | 82.5333 |
| 11 | AL | New | 582-40-9610 | 46 | 0.012116 | 82.5333 |
| 12 | AL | New | 672-59-9114 | 66 | 0.012116 | 82.5333 |
| 13 | AL | New | 848-60-3119 | 28 | 0.012116 | 82.5333 |
| 14 | AL | New | 886-83-4909 | 170 | 0.012116 | 82.5333 |
| 15 | AL | New | 993-31-7677 | 64 | 0.012116 | 82.5333 |
| 16 | AL | Old | 124-60-0495 | 80 | 0.021246 | 47.0667 |
| 17 | AL | Old | 128-54-9590 | 56 | 0.021246 | 47.0667 |
| 18 | AL | Old | 204-05-4017 | 17 | 0.021246 | 47.0667 |
| 19 | AL | Old | 210-68-8704 | 4363 | 0.021246 | 47.0667 |
| 20 | AL | Old | 239-75-4343 | 430 | 0.021246 | 47.0667 |
| 21 | AL | Old | 317-70-6496 | 452 | 0.021246 | 47.0667 |
| 22 | AL | Old | 365-37-1340 | 21 | 0.021246 | 47.0667 |
| 23 | AL | Old | 399-78-7900 | 108 | 0.021246 | 47.0667 |
| 24 | AL | Old | 404-90-6273 | 824 | 0.021246 | 47.0667 |
| 25 | AL | Old | 421-04-8548 | 1332 | 0.021246 | 47.0667 |
| 26 | AL | Old | 604-48-0587 | 16 | 0.021246 | 47.0667 |
| 27 | AL | Old | 774-04-0162 | 318 | 0.021246 | 47.0667 |
| 28 | AL | Old | 849-66-4156 | 79 | 0.021246 | 47.0667 |
| 29 | AL | Old | 937-69-9106 | 182 | 0.021246 | 47.0667 |
| 30 | AL | Old | 985-09-8691 | 24 | 0.021246 | 47.0667 |

Figure 63.6: Customer Sample (First 30 Observations)
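Because each sampling weight equals the stratum population size divided by the stratum sample size of 15, the weights within each stratum add up to that stratum's population size. A minimal sketch of this check, assuming the SampleStrata data set created above, uses PROC MEANS with a CLASS statement; the eight stratum sums it prints should match the State by Type frequencies in Figure 63.4, up to rounding.

```sas
/* Sum the sampling weights within each State*Type stratum;      */
/* each sum should reproduce the stratum population size.        */
proc means data=SampleStrata sum maxdec=1;
   class State Type;
   var SamplingWeight;
run;
```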
