Introduction to Survey Sampling and Analysis
Design Information for Survey Procedures
Survey sampling is the process of selecting a
probability-based sample from a finite population according
to a sample design. You then collect data from these
selected units and use them to estimate characteristics of the entire
population.
A sample design encompasses the rules and operations by
which you select sampling units from the population and
the computation of sample statistics, which are estimates of
the population values of interest. The objective of your
survey often determines appropriate sample designs and valid
data collection methodology. A complex sample design often
includes stratification, clustering, multiple stages of
selection, and unequal weighting.
For more detailed information, refer to Cochran (1977),
Kalton (1983), Kish (1965), and Hansen, Hurwitz, and Madow.
To select a sample with the SURVEYSELECT procedure and
analyze your survey data with the SURVEYMEANS and SURVEYREG
procedures, you need to supply sample design information to
those procedures. This information includes design strata,
clusters, and sampling weights.
Population refers to the target population or group of
individuals of interest for study. Often, the primary
objective is to estimate certain characteristics of this
population, called population values. A sampling
unit is an element or an individual in the target
population. A sample is a subset of the population that is
selected for the study.
Before you use the survey procedures, you should have a
well-defined target population, sampling units, and an
appropriate sample design.
In order to select a sample according to your sample design,
you need to have a list of sampling units in the population.
This is called a sampling frame. PROC SURVEYSELECT
selects a sample using this sampling frame.
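As a minimal sketch of the idea, assuming the frame is simply a Python list of unit identifiers (this is not PROC SURVEYSELECT; the function name `select_srs` and the frame contents are hypothetical), a simple random sample without replacement can be drawn from a sampling frame like this:

```python
import random

def select_srs(frame, n, seed=None):
    """Draw a simple random sample of n units without replacement from a sampling frame.

    Hypothetical helper for illustration only.
    """
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    rng = random.Random(seed)  # seeded generator so the selection is reproducible
    return rng.sample(frame, n)  # each unit can be selected at most once

# Hypothetical sampling frame: a list of 1,000 unit identifiers
frame = [f"unit{i:04d}" for i in range(1000)]
sample = select_srs(frame, 50, seed=2024)
print(len(sample), len(set(sample)))  # 50 50
```

Because selection works only from the list, any population unit missing from the frame has no chance of being selected, which is why a complete frame matters.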
Stratified sampling involves selecting samples
independently within strata, which are nonoverlapping subgroups of
the survey population. Stratification controls the
distribution of the sample size in the strata. It is widely
used in practice to meet a variety of survey
objectives. For example, with stratification you can ensure
adequate sample sizes for subgroups of interest, including
small subgroups, or you can use stratification to
improve the precision of overall estimates. To improve
precision, units within strata should be as homogeneous as
possible for the characteristics of interest.
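The following sketch illustrates stratified selection under simple assumptions: each unit carries a stratum label, and a simple random sample of a fixed size is drawn separately within each stratum. The function `stratified_srs`, the "urban"/"rural" strata, and the sample sizes are hypothetical; the small rural stratum is deliberately given a larger share to guarantee an adequate subgroup sample.

```python
import random
from collections import defaultdict

def stratified_srs(frame, stratum_of, sizes, seed=None):
    """Select a simple random sample separately within each stratum.

    frame: list of units; stratum_of: function mapping a unit to its stratum;
    sizes: dict mapping stratum -> number of units to select in that stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)  # nonoverlapping subgroups
    sample = {}
    for h, units in strata.items():
        # Each stratum is sampled on its own, so stratum h always gets sizes[h] units.
        sample[h] = rng.sample(units, sizes[h])
    return sample

# Hypothetical frame of (id, region) pairs: 900 urban units, 100 rural units
frame = [(i, "urban" if i < 900 else "rural") for i in range(1000)]
sample = stratified_srs(frame, lambda u: u[1], {"urban": 45, "rural": 15}, seed=1)
print({h: len(s) for h, s in sample.items()})  # {'urban': 45, 'rural': 15}
```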
Cluster sampling involves selecting clusters, which are
groups of sampling units. For example, clusters may be
schools, hospitals, or geographical areas, and sampling
units may be students, patients, or citizens. Cluster
sampling can provide efficiency in frame construction and
other survey operations. However, it can also result in a
loss in precision of your estimates, compared to a
nonclustered sample of the same size. To minimize this
effect, units within clusters should be as heterogeneous as
possible for the characteristics of interest.
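A minimal sketch of one-stage cluster sampling, assuming clusters are given as a mapping from cluster id to its member units: the random draw is made over cluster ids, and every unit in a selected cluster enters the sample. The helper `cluster_sample` and the school/student data are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m clusters by simple random sampling and keep every unit in them.

    clusters: dict mapping cluster id -> list of sampling units (hypothetical layout).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # sample cluster ids, not individual units
    return {c: clusters[c] for c in chosen}

# Hypothetical population: 20 schools with 30 students each
schools = {f"school{j}": [f"s{j}_{k}" for k in range(30)] for j in range(20)}
sample = cluster_sample(schools, 4, seed=7)
students = [s for members in sample.values() for s in members]
print(len(sample), len(students))  # 4 120
```

Only the 20 school ids need to be listed to make the selection, which reflects the frame-construction savings described above; the cost is that students within the same school tend to resemble each other.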
In multistage sampling, you select an initial or
first-stage sample based on groups of elements in the
population, called primary sampling units or PSUs.
Then you create a second-stage sample by drawing a
subsample from each selected PSU in the first-stage
sample. By repeating this operation, you can select a
sample with additional stages.
If you include all the elements from each selected primary
sampling unit, then the two-stage design reduces to cluster sampling.
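The two-stage procedure can be sketched as follows, assuming simple random sampling at both stages (an assumption; real designs often use unequal-probability selection of PSUs). The helper `two_stage_sample` and the hospital/patient data are hypothetical. Setting the second-stage size to the full PSU size keeps every element, which reproduces the cluster-sampling special case.

```python
import random

def two_stage_sample(psus, m, k, seed=None):
    """Select m PSUs, then a simple random subsample of k units within each selected PSU.

    psus: dict mapping PSU id -> list of units (hypothetical layout).
    If k >= a PSU's size, all its units are kept (cluster sampling).
    """
    rng = random.Random(seed)
    first_stage = rng.sample(sorted(psus), m)  # first stage: select PSUs
    # Second stage: subsample within each selected PSU only.
    return {p: rng.sample(psus[p], min(k, len(psus[p]))) for p in first_stage}

# Hypothetical population: 12 hospitals (PSUs) with 50 patients each
hospitals = {f"hosp{j}": [f"pt{j}_{i}" for i in range(50)] for j in range(12)}
sample = two_stage_sample(hospitals, 3, 10, seed=3)
print(sorted(len(v) for v in sample.values()))  # [10, 10, 10]
```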
Sampling weights, or survey weights, are
positive values associated with each unit in your
sample. Ideally, the weight of a sampling unit should be the
"frequency" that the sampling unit represents
in the target population. Therefore, the sum of the weights
over the sample should estimate the population size N. If
you normalize the weights such that the sum of the weights
over the sample equals the population size N, then the
weighted sum of a characteristic y estimates the
population total value Y.
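To make the role of the weights concrete, here is a small sketch with made-up numbers: the sum of the weights estimates N, the weighted sum of y estimates Y, and, if N is known, the weights can be rescaled so that they sum to N exactly. The function `weighted_estimates` and the data are hypothetical illustrations, not the survey procedures' code.

```python
def weighted_estimates(weights, y, N=None):
    """Return (estimated population size, estimated population total of y).

    If a known population size N is supplied, the weights are first
    normalized so that they sum to N exactly.
    """
    if len(weights) != len(y):
        raise ValueError("weights and y must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("sampling weights must be positive")
    if N is not None:
        scale = N / sum(weights)
        weights = [w * scale for w in weights]
    n_hat = sum(weights)  # estimates the population size N
    total_hat = sum(w * v for w, v in zip(weights, y))  # estimates the total Y
    return n_hat, total_hat

# Hypothetical sample of 4 units; each weight is the number of population units it represents
w = [10, 10, 20, 20]
y = [3.0, 5.0, 2.0, 4.0]
print(weighted_estimates(w, y))         # (60, 200.0)
print(weighted_estimates(w, y, N=120))  # (120.0, 400.0)
```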
Often, sampling weights are the reciprocals of the selection
probabilities for the sampling units. When you use PROC
SURVEYSELECT, the procedure generates the sampling weight
component for each stage of the design, and you can multiply
these sampling weight components to obtain the final
sampling weights. Sometimes, sampling weights also include
nonresponse adjustments, poststratification, or
regression adjustments that use supplemental information.
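A minimal sketch of how stage-level weight components combine, assuming equal-probability (simple random) selection at each stage so that each component is simply the reciprocal of n_selected / n_available. The helper `stage_weight` and the design (3 of 12 hospitals, then 10 of 50 patients) are hypothetical; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def stage_weight(n_selected, n_available):
    """Weight component for one equal-probability stage:
    the reciprocal of the selection probability n_selected / n_available."""
    return Fraction(n_available, n_selected)

# Hypothetical two-stage design: 3 of 12 hospitals, then 10 of the 50 patients
# in each selected hospital.
w1 = stage_weight(3, 12)    # first-stage component: 4
w2 = stage_weight(10, 50)   # second-stage component: 5
final_weight = w1 * w2      # each sampled patient represents 20 patients
print(final_weight)             # 20
print(final_weight * 3 * 10)    # 600, which equals 12 * 50 patients
```

Summing the final weights over all 30 sampled patients recovers the population size exactly here because every hospital has the same size; with unequal PSU sizes the sum is an estimate of N rather than N itself.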
When the sampling units have unequal weights, you must
provide the weights to the survey analysis procedures.
If you do not specify sampling weights, the procedures
use equal weights in the analysis.
Population Totals and Sampling Rates
The ratio of the sample size n (the number of sampling units
in the sample) to the population size N (the total number
of sampling units in the target population) is written as
$$f = \frac{n}{N}$$
This ratio is called the sampling rate or the sampling
fraction. If you select a sample without replacement, the
extra efficiency compared to selecting a sample with
replacement can be measured by the finite population
correction (fpc) factor, $1 - f$.
If your analysis should include a finite population
correction factor, you can input either the sampling rate f or
the population total N. Otherwise, the procedures do not use
the fpc when computing variance estimates. For fairly small
sampling fractions, it is appropriate to ignore this
correction. Refer to Cochran (1977) and Kish (1965).
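As an illustration of the fpc, the following sketch applies it to the textbook variance of a sample mean under simple random sampling without replacement, (1 - f) s²/n. The function `srs_mean_variance` and the data are hypothetical, and this single-stage formula is not the procedures' estimator for complex designs. As in the text, you can supply either N or the rate f; supplying neither ignores the correction.

```python
import statistics

def srs_mean_variance(y, N=None, rate=None):
    """Estimated variance of the sample mean under simple random sampling.

    Supply the population size N or the sampling rate f (not both) to apply
    the finite population correction (1 - f); supply neither to ignore it.
    """
    n = len(y)
    if N is not None and rate is not None:
        raise ValueError("give either N or rate, not both")
    if N is not None:
        f = n / N
    elif rate is not None:
        f = rate
    else:
        f = 0.0  # fpc ignored: (1 - f) becomes 1
    s2 = statistics.variance(y)  # sample variance with divisor n - 1
    return (1 - f) * s2 / n

y = [4.0, 6.0, 5.0, 7.0, 8.0]            # hypothetical sample, s^2 = 2.5
print(srs_mean_variance(y))              # 0.5   (no fpc)
print(srs_mean_variance(y, N=20))        # 0.375 (f = 5/20 = 0.25)
print(srs_mean_variance(y, rate=0.25))   # 0.375 (same f supplied directly)
```

With f = 0.25 the variance drops by a quarter; for a sampling fraction of, say, 0.01 the factor is 0.99, which is why ignoring the fpc is reasonable when f is small.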
As stated in the section "Variance Estimation," for a multistage
sample design, the variance estimation method depends only
on the first stage of the sample design. Therefore, if you
specify the sampling rate, you should input the
first-stage sampling rate, which is the ratio of the number
of PSUs in the sample to the total number of PSUs in the
population.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.