 The SURVEYREG Procedure

Example 62.6: Stratum Collapse

In a stratified sample, some strata might contain only one sampling unit. When this happens, PROC SURVEYREG collapses the strata that contain a single sampling unit into one pooled stratum. For more information, see the section "Stratum Collapse".

Suppose that you have the following data.

```
data Sample;
   input Stratum X Y;
   datalines;
10 0 0
10 1 1
11 1 1
11 1 2
12 3 3
33 4 4
14 6 7
12 3 4
;
```

The variable Stratum is the stratification variable, the variable X is the independent variable, and the variable Y is the dependent variable. You want to regress Y on X. In the data set Sample, both Stratum=33 and Stratum=14 contain one observation. By default, PROC SURVEYREG collapses these strata into one pooled stratum in the regression analysis.
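To confirm which strata contain a single observation before running the analysis, you can count the observations in each stratum. The following is a minimal sketch, not part of the procedure itself; the output data set names StratumCounts and SingleUnitStrata are hypothetical, and the check assumes that each observation is one sampling unit (no CLUSTER statement), as in this example.

```sas
/* Count observations per stratum. OUT= writes one row per     */
/* Stratum value, with the frequency stored in variable COUNT. */
proc freq data=Sample noprint;
   tables Stratum / out=StratumCounts;
run;

/* Keep only the strata that contain a single observation. */
data SingleUnitStrata;
   set StratumCounts;
   where count = 1;
run;

proc print data=SingleUnitStrata;
run;
```

For the data set Sample, SingleUnitStrata contains the two strata 14 and 33.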

To supply the finite population correction information, create the SAS data set StratumTotal.

```
data StratumTotal;
   input Stratum _TOTAL_;
   datalines;
10 10
11 20
12 32
33 40
33 45
14 50
15  .
66 70
;
```

The variable Stratum is the stratification variable, and the variable _TOTAL_ contains the stratum totals. The data set StratumTotal contains more strata than the data set Sample. In addition, StratumTotal contains more than one observation for Stratum=33:

```
33 40
33 45
```
PROC SURVEYREG allows this type of input. The procedure ignores strata that are not present in the data set Sample (here, Stratum=15 and Stratum=66), and when a stratum has multiple entries, it uses the first one. In this example, Stratum=33 therefore has the stratum total _TOTAL_=40.
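If you want to see which totals are matched to the sample, you can reproduce the rule described above with a DATA step. This is a minimal sketch of the matching rule, not the procedure's internal logic; the data set names SortedTotal, FirstTotal, SampleStrata, and UsedTotal are hypothetical. It relies on PROC SORT keeping the input order of observations that share a BY value, which is the default (EQUALS) behavior.

```sas
/* Sort the totals by stratum. By default (EQUALS), observations */
/* with the same Stratum value keep their original order.        */
proc sort data=StratumTotal out=SortedTotal;
   by Stratum;
run;

/* Keep only the first entry for each stratum (33 40, not 33 45). */
data FirstTotal;
   set SortedTotal;
   by Stratum;
   if first.Stratum;
run;

/* List the distinct strata that appear in Sample. */
proc sort data=Sample(keep=Stratum) out=SampleStrata nodupkey;
   by Stratum;
run;

/* Keep only the totals for strata present in Sample. */
data UsedTotal;
   merge SampleStrata(in=inSample) FirstTotal;
   by Stratum;
   if inSample;
run;

proc print data=UsedTotal;
run;
```

UsedTotal contains the totals 10, 20, 32, 50, and 40 for strata 10, 11, 12, 14, and 33, which match the Population Total column in the stratum information tables.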

The following SAS statements perform the regression analysis.

```
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'With Stratum Collapse';
proc surveyreg data=Sample total=StratumTotal;
   strata Stratum / list;
   model Y = X;
run;
```

Output 62.6.1: Summary of Data and Regression

Stratified Sample with Single Sampling Unit in Strata

With Stratum Collapse

The SURVEYREG Procedure

Regression Analysis for Dependent Variable Y

| Data Summary | |
|---|---:|
| Number of Observations | 8 |
| Mean of Y | 2.75000 |
| Sum of Y | 22.00000 |

| Design Summary | |
|---|---:|
| Number of Strata | 5 |
| Number of Strata Collapsed | 2 |

| Fit Statistics | |
|---|---:|
| R-square | 0.9555 |
| Root MSE | 0.5129 |
| Denominator DF | 4 |

Output 62.6.1 shows that the input data set contains a total of 5 strata, 2 of which are collapsed into a pooled stratum. Because of the collapse, the denominator degrees of freedom are 4 (see the section "Denominator Degrees of Freedom").
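As a check, assuming the usual rule that the denominator degrees of freedom equal the number of sampling units $n$ minus the number of strata $H$ used in variance estimation (with a pooled stratum counted once), the values can be reproduced by hand. Here each observation is a sampling unit, so $n = 8$; with collapse, the 3 uncollapsed strata plus the pooled stratum give $H = 4$, and without collapse (the NOCOLLAPSE analysis later in this example) $H = 5$:

$$
\begin{aligned}
\text{with collapse:}\quad & H = 3 + 1 = 4, & \mathrm{df} &= n - H = 8 - 4 = 4 \\
\text{without collapse:}\quad & H = 5, & \mathrm{df} &= n - H = 8 - 5 = 3
\end{aligned}
$$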

Output 62.6.2: Stratification Information


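Stratum Information

| Stratum Index | Collapsed | Stratum | N Obs | Population Total | Sampling Rate |
|---:|:---:|---:|---:|---:|---:|
| 1 | | 10 | 2 | 10 | 0.20 |
| 2 | | 11 | 2 | 20 | 0.10 |
| 3 | | 12 | 2 | 32 | 0.06 |
| 4 | Yes | 14 | 1 | 50 | 0.02 |
| 5 | Yes | 33 | 1 | 40 | 0.03 |
| 0 | | Pooled | 2 | 90 | 0.02 |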

 NOTE: Strata with only one observation are collapsed into the stratum with Stratum Index "0".

Output 62.6.2 displays the stratification information, including stratum collapse. In the Collapsed column, the fourth stratum (Stratum Index=4) and the fifth stratum (Stratum Index=5) are marked "Yes," which indicates that these two strata are collapsed into the pooled stratum (Stratum Index=0). The sampling rate for the pooled stratum, 2%, is computed by combining the fourth and fifth strata (see the section "Sampling Rate of the Pooled Stratum from Collapse").
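The sampling rates in Output 62.6.2 are consistent with $f_h = n_h / N_h$, where $n_h$ is the number of sampled units and $N_h$ is the population total of stratum $h$ (this notation is introduced here for illustration). As a sketch, assuming the pooled rate is formed from the combined sample sizes and totals of the collapsed strata:

$$
\begin{aligned}
f_{10} &= 2/10 = 0.20, \quad f_{11} = 2/20 = 0.10, \quad f_{12} = 2/32 = 0.0625 \approx 0.06, \\
f_{14} &= 1/50 = 0.02, \quad f_{33} = 1/40 = 0.025 \ (\text{displayed as } 0.03), \\
f_{\text{pooled}} &= \frac{n_{14} + n_{33}}{N_{14} + N_{33}} = \frac{1 + 1}{50 + 40} = \frac{2}{90} \approx 0.022 \approx 0.02
\end{aligned}
$$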

Output 62.6.3: Parameter Estimates and Effect Tests


Tests of Model Effects

| Effect | Num DF | F Value | Pr > F |
|---|---:|---:|---:|
| Model | 1 | 155.62 | 0.0002 |
| Intercept | 1 | 0.24 | 0.6503 |
| X | 1 | 155.62 | 0.0002 |

 NOTE: The denominator degrees of freedom for the F tests is 4.

Estimated Regression Coefficients

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---:|---:|---:|---:|
| Intercept | 0.13004484 | 0.26578532 | 0.49 | 0.6503 |
| X | 1.10313901 | 0.08842825 | 12.47 | 0.0002 |

 NOTE: The denominator degrees of freedom for the t tests is 4.

Output 62.6.3 displays the parameter estimates and the tests of the significance of the model effects.

Alternatively, if you prefer not to collapse strata that have a single sampling unit, you can specify the NOCOLLAPSE option in the STRATA statement.

```
title1 'Stratified Sample with Single Sampling Unit in Strata';
title2 'Without Stratum Collapse';
proc surveyreg data=Sample total=StratumTotal;
   strata Stratum / list nocollapse;
   model Y = X;
run;
```

Output 62.6.4: Summary of Data and Regression

Stratified Sample with Single Sampling Unit in Strata

Without Stratum Collapse

The SURVEYREG Procedure

Regression Analysis for Dependent Variable Y

| Data Summary | |
|---|---:|
| Number of Observations | 8 |
| Mean of Y | 2.75000 |
| Sum of Y | 22.00000 |

| Design Summary | |
|---|---:|
| Number of Strata | 5 |

| Fit Statistics | |
|---|---:|
| R-square | 0.9555 |
| Root MSE | 0.5129 |
| Denominator DF | 3 |

Unlike Output 62.6.1, Output 62.6.4 contains no stratum collapse information, and the denominator degrees of freedom are 3 instead of 4.

Output 62.6.5: Stratification Information


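Stratum Information

| Stratum Index | Stratum | N Obs | Population Total | Sampling Rate |
|---:|---:|---:|---:|---:|
| 1 | 10 | 2 | 10 | 0.20 |
| 2 | 11 | 2 | 20 | 0.10 |
| 3 | 12 | 2 | 32 | 0.06 |
| 4 | 14 | 1 | 50 | 0.02 |
| 5 | 33 | 1 | 40 | 0.03 |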

In Output 62.6.5, the fourth and fifth strata each contain only one observation, but unlike in Output 62.6.2, they are not collapsed.

Output 62.6.6: Parameter Estimates and Effect Tests


Tests of Model Effects

| Effect | Num DF | F Value | Pr > F |
|---|---:|---:|---:|
| Model | 1 | 391.94 | 0.0003 |
| Intercept | 1 | 0.25 | 0.6508 |
| X | 1 | 391.94 | 0.0003 |

 NOTE: The denominator degrees of freedom for the F tests is 3.

Estimated Regression Coefficients

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---:|---:|---:|---:|
| Intercept | 0.13004484 | 0.25957741 | 0.50 | 0.6508 |
| X | 1.10313901 | 0.05572135 | 19.80 | 0.0003 |

 NOTE: The denominator degrees of freedom for the t tests is 3.

The parameter estimates are identical in both analyses. Because the strata are not collapsed, however, the standard error estimates of the parameters differ from those in Output 62.6.3, and so do the tests of the significance of the model effects.
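For a single-parameter effect, the F value in Tests of Model Effects is the square of the t value, and each t value is the estimate divided by its standard error. Recomputing these for X from the two sets of estimates in Output 62.6.3 and Output 62.6.6 shows how the smaller standard error without collapse produces the larger test statistic:

$$
\begin{aligned}
\text{with collapse:}\quad t_X &= \frac{1.10313901}{0.08842825} \approx 12.47, & F_X &= t_X^2 \approx 155.62 \\
\text{without collapse:}\quad t_X &= \frac{1.10313901}{0.05572135} \approx 19.80, & F_X &= t_X^2 \approx 391.94
\end{aligned}
$$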
