Chapter Contents Previous Next
 The SURVEYSELECT Procedure

## Example 63.3: PPS (Dollar-Unit) Sampling

A small company wants to audit employee travel expenses in an effort to improve the expense reporting procedure and possibly reduce expenses. The company does not have resources to examine all expense reports and wants to use statistical sampling to objectively select expense reports for audit.

The data set TravelExpense contains the dollar amount of all employee travel expense transactions during the past month.

```   data TravelExpense;
input ID\$ Amount @@;
if (Amount < 500) then Level='1_Low ';
else if (Amount > 1500) then Level='3_High';
else Level='2_Avg ';
datalines;
110  237.18   002  567.89   234  118.50
743   74.38   411 1287.23   782  258.10
216  325.36   174  218.38   568 1670.80
302  134.71   285 2020.70   314   47.80
139 1183.45   775  330.54   425  780.10
506  895.80   239  620.10   011  420.18
672  979.66   142  810.25   738  670.85
192  314.58   243   87.50   263 1893.40
496  753.30   332  540.65   486 2580.35
614  230.56   654  185.60   308  688.43
784  505.14   017  205.48   162  650.42
289 1348.34   691   30.50   545 2214.80
517  940.35   382  217.85   024  142.90
478  806.90   107  560.72
;
```

In the SAS data set TravelExpense, the variable ID identifies the travel expense report. The variable Amount contains the dollar amount of the reported expense. The variable Level equals `1_Low', `2_Avg', or `3_High', depending on the value of Amount.

In the sample design for this audit, expense reports are stratified by Level. This ensures that each of these expense levels is included in the sample and also permits a disproportionate allocation of the sample, selecting proportionately more of the expense reports from the higher levels. Within strata, the sample of expense reports is selected with probability proportional to the amount of the expense, thus giving a greater chance of selection to larger expenses. In auditing terms, this is known as monetary-unit sampling. Refer to Wilburn (1984).

PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the TravelExpense data set by the stratification variable Level.

```   proc sort data=TravelExpense;
by Level;
run;
```

The following PROC PRINT statements display the sampling frame data set TravelExpense, which contains 41 observations.

```   title1 'Travel Expense Audit';
proc print data=TravelExpense;
run;
```

Output 63.3.1: Sampling Frame

 Travel Expense Audit

 Obs ID Amount Level 1 110 237.18 1_Low 2 234 118.50 1_Low 3 743 74.38 1_Low 4 782 258.10 1_Low 5 216 325.36 1_Low 6 174 218.38 1_Low 7 302 134.71 1_Low 8 314 47.80 1_Low 9 775 330.54 1_Low 10 011 420.18 1_Low 11 192 314.58 1_Low 12 243 87.50 1_Low 13 614 230.56 1_Low 14 654 185.60 1_Low 15 017 205.48 1_Low 16 691 30.50 1_Low 17 382 217.85 1_Low 18 024 142.90 1_Low 19 002 567.89 2_Avg 20 411 1287.23 2_Avg 21 139 1183.45 2_Avg 22 425 780.10 2_Avg 23 506 895.80 2_Avg 24 239 620.10 2_Avg 25 672 979.66 2_Avg 26 142 810.25 2_Avg 27 738 670.85 2_Avg 28 496 753.30 2_Avg 29 332 540.65 2_Avg 30 308 688.43 2_Avg 31 784 505.14 2_Avg 32 162 650.42 2_Avg 33 289 1348.34 2_Avg 34 517 940.35 2_Avg 35 478 806.90 2_Avg 36 107 560.72 2_Avg 37 568 1670.80 3_High 38 285 2020.70 3_High 39 263 1893.40 3_High 40 486 2580.35 3_High 41 545 2214.80 3_High

The following PROC SURVEYSELECT statements select a probability sample of expense reports from the TravelExpense data set using the stratified design with PPS selection within strata.

```   title1 'Travel Expense Audit';
proc surveyselect data=TravelExpense method=pps
n=(6 10 4)
seed=47279 out=AuditSample;
size Amount;
strata Level;
run;
```

The STRATA statement names the stratification variable Level. The SIZE statement specifies the size measure variable Amount. In the PROC SURVEYSELECT statement, the METHOD=PPS option requests sample selection with probability proportional to size and without replacement. The N=(6 10 4) option specifies the stratum sample sizes, listing the sample sizes in the same order that the strata apear in the TravelExpense data set. The sample size of 6 corresponds to the first stratum, Level = `1_Low', the sample size of 10 corresponds to the second stratum, Level = `2_Avg', and 4 corresponds to the last stratum, Level = `3_High'. The SEED=47279 option specifies '47279' as the initial seed for random number generation.

Figure 63.3.2 displays the output from PROC SURVEYSELECT. A total of 20 expense reports is selected for audit. The data set AuditSample contains the sample of travel expense reports.

Output 63.3.2: Sample Selection Summary

 Travel Expense Audit

 The SURVEYSELECT Procedure

 Selection Method PPS, Without Replacement Size Measure Amount Strata Variable Level

 Input Data Set TRAVELEXPENSE Random Number Seed 47279 Number of Strata 3 Total Sample Size 20 Output Data Set AUDITSAMPLE

The following PROC PRINT statements display the audit sample.

```   title1 'Travel Expense Audit Sample';
title2 'Sample Selected by Stratified PPS Design';
proc print data=AuditSample;
run;
```

Output 63.3.3: Audit Sample

 Travel Expense Audit Sample Selected by Stratified PPS Design

 Obs Level ID Amount SelectionProb SamplingWeight 1 1_Low 654 185.60 0.31105 3.21489 2 1_Low 017 205.48 0.34437 2.90385 3 1_Low 382 217.85 0.36510 2.73896 4 1_Low 614 230.56 0.38640 2.58797 5 1_Low 782 258.10 0.43256 2.31183 6 1_Low 775 330.54 0.55396 1.80518 7 2_Avg 784 505.14 0.34623 2.88823 8 2_Avg 332 540.65 0.37057 2.69853 9 2_Avg 002 567.89 0.38924 2.56909 10 2_Avg 239 620.10 0.42503 2.35278 11 2_Avg 738 670.85 0.45981 2.17479 12 2_Avg 496 753.30 0.51633 1.93676 13 2_Avg 425 780.10 0.53470 1.87022 14 2_Avg 478 806.90 0.55307 1.80810 15 2_Avg 672 979.66 0.67148 1.48925 16 2_Avg 139 1183.45 0.81116 1.23280 17 3_High 568 1670.80 0.64385 1.55316 18 3_High 263 1893.40 0.72963 1.37056 19 3_High 285 2020.70 0.77869 1.28421 20 3_High 486 2580.35 0.99435 1.00568

 Chapter Contents Previous Next Top