The PROC SURVEYSELECT statement also specifies the sample selection
method, the sample size, and other sample design parameters.
If you specify neither a selection method nor a SIZE statement,
PROC SURVEYSELECT uses simple random sampling (METHOD=SRS).
If you specify a SIZE statement but do not specify a selection
method, PROC SURVEYSELECT uses probability proportional to size
selection without replacement (METHOD=PPS). You must specify the
sample size or sampling rate
unless you request a method that selects two units
from each stratum (METHOD=PPS_BREWER or METHOD=PPS_MURTHY).
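The two defaults described above can be sketched as follows. This is only an illustration: the frame Stores, its variables StoreID and Sales, the seed, and the output data set names are hypothetical and do not come from the original documentation.

```sas
/* Hypothetical sampling frame: 8 stores with a size measure (Sales) */
data Stores;
   input StoreID $ Sales;
   datalines;
S01 120
S02 340
S03 95
S04 410
S05 220
S06 180
S07 305
S08 150
;

/* No METHOD= option and no SIZE statement: METHOD=SRS is used */
proc surveyselect data=Stores n=3 seed=20231 out=SampleSRS;
run;

/* No METHOD= option, but a SIZE statement: METHOD=PPS is used */
proc surveyselect data=Stores n=3 seed=20231 out=SamplePPS;
   size Sales;
run;
```

Naming the method explicitly (METHOD=SRS or METHOD=PPS) should give the same design and makes the intent visible to later readers of the program.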
You can specify the following options in the PROC
SURVEYSELECT statement. Descriptions follow in alphabetical order.
-
DATA=SAS-data-set
-
names the SAS data set from which PROC SURVEYSELECT
selects the sample. If you omit the DATA= option,
the procedure uses the most recently created SAS
data set. In sampling terminology, the input data
set is the sampling frame, or list of units
from which the sample is selected.
-
JTPROBS
-
includes joint probabilities of selection in the OUT= output data set.
This option is available for the following probability
proportional to size selection methods: METHOD=PPS,
METHOD=PPS_SAMPFORD, and METHOD=PPS_WR.
By default, PROC SURVEYSELECT outputs joint selection probabilities
for METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select two units
per stratum.
For more information on the contents of the output
data set, see the section "Output Data Set".
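As a brief sketch of requesting joint selection probabilities, the following step uses Sampford's method with the JTPROBS option. The frame Stores, the variable names, and the seed are hypothetical.

```sas
/* Hypothetical sampling frame with a size measure (Sales) */
data Stores;
   input StoreID $ Sales;
   datalines;
S01 120
S02 340
S03 95
S04 410
S05 220
S06 180
;

/* JTPROBS adds joint selection probabilities to the OUT= data set */
proc surveyselect data=Stores method=pps_sampford n=2 jtprobs
                  seed=1953 out=SampleJT;
   size Sales;
run;

/* Inspect the selected units and the added variables */
proc print data=SampleJT;
run;
```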
-
MAXSIZE
-
requests size measure adjustment by stratum maximum size measures, which
you provide in the secondary input data set variable _MAXSIZE_.
Use the MAXSIZE option when you have
already named the secondary input data set in another option, such as
SAMPSIZE=SAS-data-set, SAMPRATE=SAS-data-set, or
MINSIZE=SAS-data-set. You can name only one secondary
input data set in each invocation of the procedure.
If any size measure exceeds the maximum size measure for its stratum,
then PROC SURVEYSELECT adjusts this size measure downward to equal
the maximum size measure. Each maximum size measure must be a
positive number. The MAXSIZE option is available whenever you specify a
SIZE statement for probability proportional to size selection
and a STRATA statement for stratification.
If you want to specify a single maximum size value in the PROC
SURVEYSELECT statement, use the MAXSIZE=max
option.
-
MAXSIZE=max
-
specifies the maximum allowable size measure. If any size measure
exceeds the value max, then PROC SURVEYSELECT adjusts
this size measure to equal max. The maximum size measure
must be a positive number. This option is available whenever you
specify a SIZE statement for selection with probability proportional
to size.
If you request a stratified sample design with
a STRATA statement and specify the MAXSIZE= option,
PROC SURVEYSELECT uses the maximum size max for all
strata. If you do not want to use the same maximum size
for all strata, use the MAXSIZE=SAS-data-set
option to specify a maximum size for each stratum.
-
MAXSIZE=SAS-data-set
-
names a SAS data set that contains the maximum allowable size measures
for the strata. If any size measure exceeds the maximum size measure
for its stratum, then PROC SURVEYSELECT adjusts this size measure
downward to equal the maximum size measure. Each maximum size
measure must be a positive number. This option is available
whenever you specify a SIZE statement for probability proportional
to size selection and a STRATA statement for stratified selection.
The MAXSIZE= input data set should contain all the STRATA variables,
with the same type and length as in the DATA= data set. The STRATA
groups should appear in the same order in the MAXSIZE= data set
as in the DATA= data set. The MAXSIZE= data set should have
a variable named _MAXSIZE_ that contains the maximum size measure
for each stratum.
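The following sketch shows one way to supply stratum-level maximum size measures. The frame Frame, the secondary data set MaxBySt, and all variable names other than _MAXSIZE_ are hypothetical. Both data sets read Region with the same type and length, and both list the strata in the same order.

```sas
/* Hypothetical stratified frame, sorted by Region */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
East E01 120
East E02 340
East E03 95
East E04 410
West W01 220
West W02 180
West W03 305
West W04 150
;

/* One row per stratum; _MAXSIZE_ holds the stratum maximum */
data MaxBySt;
   input Region $ _MAXSIZE_;
   datalines;
East 300
West 250
;

/* Size measures above the stratum maximum are adjusted downward */
proc surveyselect data=Frame method=pps_wr n=2 seed=4711
                  maxsize=MaxBySt out=SampleMax;
   strata Region;
   size Sales;
run;
```

With these illustrative values, the Sales values 340 and 410 in the East stratum would be treated as 300, and 305 in the West stratum as 250, before selection.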
-
METHOD=name
- M=name
-
specifies the method for sample selection.
If you specify neither the METHOD= option nor a SIZE statement,
PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) by default.
If you specify a SIZE statement, the default selection method is
probability proportional to size without replacement (METHOD=PPS).
Valid values for name are as follows:
- PPS
-
requests selection with probability proportional to size
and without replacement.
See the section "PPS Sampling without Replacement" for details.
If you specify METHOD=PPS, you must name the size measure
variable in the SIZE statement.
- PPS_BREWER | BREWER
-
requests selection according to Brewer's method. Brewer's
method selects two units from each stratum with probability
proportional to size and without replacement.
See the section "Brewer's PPS Method" for details.
If you specify METHOD=PPS_BREWER, you must name the size measure
variable in the SIZE statement. You do not need to specify the
sample size with the SAMPSIZE= option, since Brewer's method selects
two units from each stratum.
- PPS_MURTHY | MURTHY
-
requests selection according to Murthy's method. Murthy's
method selects two units from each stratum with probability
proportional to size and without replacement.
See the section "Murthy's PPS Method" for details.
If you specify METHOD=PPS_MURTHY, you must name the size measure
variable in the SIZE statement. You do not need to specify the
sample size with the SAMPSIZE= option, since Murthy's method selects
two units from each stratum.
- PPS_SAMPFORD | SAMPFORD
-
requests selection according to Sampford's method. Sampford's method
selects units with probability proportional to size and without
replacement.
See the section "Sampford's PPS Method" for details.
If you specify METHOD=PPS_SAMPFORD, you must name the size measure
variable in the SIZE statement.
- PPS_SEQ | CHROMY
-
requests sequential selection with probability proportional to size
and with minimum replacement. This method is also known as Chromy's
method.
See the section "PPS Sequential Sampling" for details.
If you specify METHOD=PPS_SEQ, you must name the size measure
variable in the SIZE statement.
- PPS_SYS
-
requests systematic selection with probability proportional to
size.
See the section "PPS Systematic Sampling" for details on this method.
If you specify METHOD=PPS_SYS, you must name the size measure
variable in the SIZE statement.
- PPS_WR
-
requests selection with probability proportional to size and
with replacement.
See the section "PPS Sampling with Replacement" for details on this method.
If you specify METHOD=PPS_WR, you must name the size measure
variable in the SIZE statement.
- SEQ
-
requests sequential selection according to Chromy's method.
If you specify METHOD=SEQ and do not specify a size measure
with the SIZE statement, PROC SURVEYSELECT uses sequential
zoned selection with equal probability and without replacement.
See the section "Sequential Random Sampling" for details on this method.
If you specify METHOD=SEQ and also name a size measure in the
SIZE statement, PROC SURVEYSELECT uses METHOD=PPS_SEQ, which
is sequential selection with probability proportional to size
and with minimum replacement.
See the section "PPS Sequential Sampling" for details on this method.
- SRS
-
requests simple random sampling, which is selection with
equal probability and without replacement.
See the section "Simple Random Sampling" for details.
This method is the default if you do not specify
the METHOD= option and also do not specify a SIZE statement.
- SYS
-
requests systematic random sampling.
If you specify METHOD=SYS and do not specify a size measure
with the SIZE statement, PROC SURVEYSELECT uses systematic
selection with equal probability.
See the section "Systematic Random Sampling" for details on this method.
If you specify METHOD=SYS and also name a size measure in the
SIZE statement, PROC SURVEYSELECT uses METHOD=PPS_SYS, which
is systematic selection with probability proportional to size.
See the section "PPS Systematic Sampling" for details.
- URS
-
requests unrestricted random sampling, which is selection
with equal probability and with replacement.
See the section "Unrestricted Random Sampling" for details.
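As an illustration of a method that needs no sample size, the following sketch requests Brewer's method in a stratified design. The frame, variable names, and seed are hypothetical. The size measures were chosen so that no unit accounts for half or more of its stratum total, on the assumption that Brewer's method cannot accommodate such units.

```sas
/* Hypothetical stratified frame, sorted by Region */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
East E01 120
East E02 340
East E03 95
East E04 410
West W01 220
West W02 180
West W03 305
West W04 150
;

/* METHOD=PPS_BREWER selects two units per stratum, so no    */
/* SAMPSIZE= option is given; the SIZE statement is required */
proc surveyselect data=Frame method=pps_brewer seed=99
                  out=SampleBrewer;
   strata Region;
   size Sales;
run;
```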
-
MINSIZE
-
requests size measure adjustment by the stratum minimum size measures, which
you provide in the secondary input data set variable _MINSIZE_.
Use the MINSIZE option when you have
already named the secondary input data set in another option, such as
SAMPSIZE=SAS-data-set, SAMPRATE=SAS-data-set, or
MAXSIZE=SAS-data-set. You can name only one secondary
input data set in each invocation of the procedure.
If any size measure is less than the minimum size measure for its stratum,
then PROC SURVEYSELECT adjusts this size measure upward to equal
the minimum size measure. Each minimum size measure must be a
positive number. The MINSIZE option is available whenever you specify a
SIZE statement for probability proportional to size selection
and a STRATA statement for stratification.
If you want to specify a single minimum size value in the PROC
SURVEYSELECT statement, use the MINSIZE=min
option.
-
MINSIZE=min
-
specifies the minimum allowable size measure. If any size measure
is less than the value min, then PROC SURVEYSELECT adjusts
this size measure upward to equal min. The minimum size measure
must be a positive number. This option is available whenever you
specify a SIZE statement for selection with probability proportional
to size.
If you request a stratified sample design with
a STRATA statement and specify the MINSIZE= option,
PROC SURVEYSELECT uses the minimum size min for all
strata. If you do not want to use the same minimum size
for all strata, use the MINSIZE=SAS-data-set
option to specify a minimum size for each stratum.
-
MINSIZE=SAS-data-set
-
names a SAS data set that contains the minimum allowable size measures
for the strata. If any size measure is less than the minimum size measure
for its stratum, then PROC SURVEYSELECT adjusts this size measure
upward to equal the minimum size measure. Each minimum size
measure must be a positive number. This option is available
whenever you specify a SIZE statement for probability proportional
to size selection and a STRATA statement for stratified selection.
The MINSIZE= input data set should contain all the STRATA variables,
with the same type and length as in the DATA= data set. The STRATA
groups should appear in the same order in the MINSIZE= data set
as in the DATA= data set. The MINSIZE= data set should have
a variable named _MINSIZE_ that contains the minimum size measure
for each stratum.
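Because only one secondary input data set can be named per invocation, stratum minimums and maximums can share a data set. The following sketch names the data set in MINSIZE= and then uses the bare MAXSIZE option so that _MAXSIZE_ is read from the same data set. The data set Limits, the frame, and the limit values are hypothetical.

```sas
/* Hypothetical stratified frame, sorted by Region */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
East E01 120
East E02 340
East E03 95
East E04 410
West W01 220
West W02 180
West W03 305
West W04 150
;

/* One secondary data set carrying both limits for each stratum */
data Limits;
   input Region $ _MINSIZE_ _MAXSIZE_;
   datalines;
East 100 350
West 160 300
;

/* MINSIZE= names the secondary data set; MAXSIZE reuses it */
proc surveyselect data=Frame method=pps_wr n=2 seed=31
                  minsize=Limits maxsize out=SampleBounded;
   strata Region;
   size Sales;
run;
```

With these illustrative limits, the East values 95 and 410 would be adjusted to 100 and 350, and the West values 150 and 305 to 160 and 300.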
-
NOPRINT
-
suppresses the display of all output.
You can use the NOPRINT option when you want to create only an
output data set. Note that this option
temporarily disables the Output Delivery System (ODS).
For more information, see the chapter titled "Using the
Output Delivery System" in SAS/STAT User's Guide.
-
OUT=SAS-data-set
-
names the output data set that contains the sample.
If you omit the OUT= option,
the data set is named DATAn, where n is the
smallest integer that makes the name unique.
The output data set contains the units selected for the
sample, as well as design information and selection statistics,
depending on the selection method and output options you specify.
See the descriptions for the options
JTPROBS, OUTSIZE,
and STATS.
For information on the contents of the output
data set, see the section "Output Data Set".
-
OUTSIZE
-
includes additional design and sampling frame parameters in the output data set.
If you specify the OUTSIZE option, PROC SURVEYSELECT includes the
sample size or sampling rate in the output data set. When you request
the OUTSIZE option and also specify the SIZE statement, the procedure
outputs the size measure total for the sampling frame.
If you do not
specify the SIZE statement, the procedure outputs the total
number of sampling units in the frame. Also, PROC SURVEYSELECT
includes the minimum size measure if you specify the MINSIZE=
option and the maximum size measure if you specify the MAXSIZE=
option.
If you have a stratified design, the output data set includes
the stratum-level values of these parameters. Otherwise, the
output data set includes the overall population-level values.
For information on the contents of the output
data set, see the section "Output Data Set".
-
OUTSORT=SAS-data-set
-
names an output data set that contains the sorted input data
set. This option is available when you specify a CONTROL
statement for systematic or sequential selection methods
(METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ).
PROC SURVEYSELECT sorts
the input data set by the CONTROL variables within strata
before selecting the sample.
If you specify CONTROL variables but do not name an output
data set with the OUTSORT= option, then the sorted data
set replaces the input data set.
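The following sketch keeps the input data set unchanged by routing the sorted frame to a separate data set. The frame, the CONTROL variable, and the output data set names are hypothetical.

```sas
/* Hypothetical stratified frame */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
East E01 120
East E02 340
East E03 95
East E04 410
West W01 220
West W02 180
West W03 305
West W04 150
;

/* Systematic selection after sorting by Sales within Region.  */
/* OUTSORT= stores the sorted frame in FrameSorted, so Frame   */
/* itself is not replaced by its sorted version.               */
proc surveyselect data=Frame method=sys n=2 seed=7
                  outsort=FrameSorted out=SampleSys;
   strata Region;
   control Sales;
run;
```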
-
REP=nrep
-
specifies the number of sample replicates. If you
specify the REP= option, PROC SURVEYSELECT selects
nrep independent samples, each with the same
specified sample size or sampling rate
and the same sample design.
You can use replicated sampling to provide a
simple method of variance estimation for any form of
statistic, as well as to evaluate variable nonsampling
errors such as interviewer differences. Refer to
Kish (1965), Kish (1987), and Kalton (1983) for
information on replicated sampling.
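A minimal sketch of replicated sampling follows, using a hypothetical frame and seed. Each replicate uses the same design: two units per stratum by simple random sampling.

```sas
/* Hypothetical stratified frame, sorted by Region */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
East E01 120
East E02 340
East E03 95
East E04 410
West W01 220
West W02 180
West W03 305
West W04 150
;

/* Three independent replicates of the same stratified design */
proc surveyselect data=Frame method=srs n=2 rep=3 seed=2024
                  out=Replicates;
   strata Region;
run;

/* List the selected units for all replicates */
proc print data=Replicates;
run;
```

The output data set should contain 12 observations here (3 replicates, 2 strata, 2 units each), with a variable that identifies the replicate to which each selected unit belongs.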
-
SAMPRATE=r
- RATE=r
-
specifies the sampling rate, which is the proportion of units selected
for the sample. The sampling rate r must be a positive number.
You can specify r as a number between 0 and
1. Or you can specify r in percentage form as a number between
1 and 100, and PROC SURVEYSELECT converts that number to a proportion.
The procedure treats the value 1 as 100%, and not the percentage
form 1%.
The SAMPRATE= option is available only for equal probability
selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses
the inverse of the sampling rate r as the interval.
See the section "Systematic Random Sampling" for details.
For other selection methods, PROC SURVEYSELECT converts the
sampling rate r to the sample size before selection,
multiplying the rate by the number of units in the stratum
or frame and rounding up to the nearest integer.
If you request a stratified sample design with
a STRATA statement and specify the SAMPRATE=r option,
PROC SURVEYSELECT uses the sampling rate r for each
stratum. If you do not want to use the same sampling rate
for each stratum, use the SAMPRATE=(values) option or the
SAMPRATE=SAS-data-set option to specify a sampling rate
for each stratum.
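The two forms of the sampling rate can be compared with a small sketch. The frame Stores and the seed are hypothetical. With 8 units in the frame, a rate of 0.25 and its percentage form 25 both correspond to a sample size of 2. Under the rounding rule described above, a rate of 0.3 would give 2.4, which rounds up to 3.

```sas
/* Hypothetical frame with 8 units */
data Stores;
   input StoreID $ Sales;
   datalines;
S01 120
S02 340
S03 95
S04 410
S05 220
S06 180
S07 305
S08 150
;

/* Rate given as a proportion */
proc surveyselect data=Stores method=srs samprate=0.25 seed=11
                  out=SampleProp;
run;

/* Same rate given in percentage form */
proc surveyselect data=Stores method=srs samprate=25 seed=11
                  out=SamplePct;
run;
```

To sample 1% of the frame, write SAMPRATE=0.01, because the value 1 is read as 100%.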
-
SAMPRATE=(values)
- RATE=(values)
-
specifies sampling rates for the strata. You can separate values
with blanks or commas. The number of SAMPRATE= values must equal the
number of strata in the input data set.
List the stratum sampling rate values in the order in which the strata
appear in the input data set.
If you use the SAMPRATE=(values) option,
the input data set must be sorted by the STRATA variables in ascending
order. You cannot use the DESCENDING or NOTSORTED options in the STRATA
statement.
Each stratum sampling rate value must be a positive number.
You can specify each value as a number between 0 and
1. Or you can specify a value in percentage form as a number between
1 and 100, and PROC SURVEYSELECT converts that number to a proportion.
The procedure treats the value 1 as 100%, and not the percentage
form 1%.
The SAMPRATE= option is available only for equal probability
selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses
the inverse of the stratum sampling rate as the interval for the stratum.
See the section "Systematic Random Sampling" for details on systematic sampling.
For other selection methods, PROC SURVEYSELECT converts the
stratum sampling rate to a stratum sample size before selection,
multiplying the rate by the number of units in the stratum
and rounding up to the nearest integer.
-
SAMPRATE=SAS-data-set
- RATE=SAS-data-set
-
names a SAS data set that contains sampling rates for the strata.
This input data set should contain all the STRATA variables, with
the same type and length as in the DATA= data set. The STRATA
groups should appear in the same order in the SAMPRATE= data set
as in the DATA= data set. The SAMPRATE= data set should have
a variable _RATE_ that contains the sampling rate for each stratum.
Each sampling rate value must be a positive number.
You can specify each value as a number between 0 and
1. Or you can specify a value in percentage form as a number between
1 and 100, and PROC SURVEYSELECT converts that number to a proportion.
The procedure treats the value 1 as 100%, and not the percentage
form 1%.
The SAMPRATE= option is available only for equal probability
selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses
the inverse of the stratum sampling rate as the interval for the stratum.
See the section "Systematic Random Sampling" for details.
For other selection methods, PROC SURVEYSELECT converts the
stratum sampling rate to the stratum sample size before selection,
multiplying the rate by the number of units in the stratum
and rounding up to the nearest integer.
-
SAMPSIZE=n
- N=n
-
specifies the sample size, which is the number of units selected
for the sample. The sample size n must be a positive integer.
For methods that select without replacement, the sample size n
must not exceed the number of units in the input data set.
If you request a stratified sample design with
a STRATA statement and specify the SAMPSIZE=n option,
PROC SURVEYSELECT selects n units from each
stratum. For methods that select without replacement, the
sample size n must not exceed the number of units in any
stratum. If you do not want to select the same number of units
from each stratum, use the SAMPSIZE=(values) option or the
SAMPSIZE=SAS-data-set option to specify different
sample sizes for the strata.
-
SAMPSIZE=(values)
- N=(values)
-
specifies sample sizes for the strata. You can separate values
with blanks or commas. The number of SAMPSIZE= values must equal the
number of strata in the input data set.
List the stratum sample size values in the order in which the strata
appear in the input data set.
If you use the SAMPSIZE=(values) option,
the input data set must be sorted by the STRATA variables in ascending
order. You cannot use the DESCENDING or NOTSORTED options in the STRATA
statement.
Each stratum sample size value must be
a positive integer. For methods that select without replacement,
the sample size for a stratum must not exceed the number of units in
that stratum.
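As a sketch of listing stratum sample sizes directly, the following step first sorts a hypothetical frame by the STRATA variable in ascending order and then gives one value per stratum in that order (East, then West). All names and values are illustrative.

```sas
/* Hypothetical stratified frame */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
West W01 220
West W02 180
West W03 305
West W04 150
East E01 120
East E02 340
East E03 95
East E04 410
;

/* SAMPSIZE=(values) requires ascending order of the STRATA variables */
proc sort data=Frame;
   by Region;
run;

/* 1 unit from East and 3 units from West */
proc surveyselect data=Frame method=srs n=(1 3) seed=5
                  out=SampleByStratum;
   strata Region;
run;
```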
-
SAMPSIZE=SAS-data-set
- N=SAS-data-set
-
names a SAS data set that contains the sample sizes for the strata.
This input data set should contain all the STRATA variables, with
the same type and length as in the DATA= data set. The STRATA
groups should appear in the same order in the SAMPSIZE= data set
as in the DATA= data set. The SAMPSIZE= data set should have
a variable _NSIZE_ that contains the sample size for each stratum.
Each sample size value must be a positive integer. For methods
that select without replacement, the stratum sample size must
not exceed the number of units in the stratum.
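The same allocation can be supplied through a secondary input data set, which keeps the sample sizes next to the stratum identifiers instead of relying on list position. The data set SizeBySt and the frame are hypothetical; only the variable name _NSIZE_ comes from the description above.

```sas
/* Hypothetical stratified frame, sorted by Region */
data Frame;
   input Region $ StoreID $ Sales;
   datalines;
East E01 120
East E02 340
East E03 95
East E04 410
West W01 220
West W02 180
West W03 305
West W04 150
;

/* One row per stratum, in the same order as in Frame */
data SizeBySt;
   input Region $ _NSIZE_;
   datalines;
East 1
West 3
;

proc surveyselect data=Frame method=srs n=SizeBySt seed=5
                  out=SampleAlloc;
   strata Region;
run;
```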
-
SEED=number
-
specifies the initial seed for random number generation.
The value of the SEED= option must be a positive integer.
If you do not specify the SEED= option, PROC SURVEYSELECT uses
the time of day from the computer's clock to obtain the initial
seed.
-
SORT=NEST | SERP
-
specifies the type of sorting by CONTROL variables.
The option SORT=NEST requests nested sorting, and SORT=SERP requests
hierarchic serpentine sorting.
The default is SORT=SERP.
See the section "Sorting by CONTROL Variables" for descriptions of
serpentine and nested sorting.
When there is only one CONTROL variable, the two types of
sorting are equivalent.
This option is available when you specify a CONTROL
statement for systematic or sequential selection methods
(METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ).
PROC SURVEYSELECT sorts
the input data set by the CONTROL variables within strata
before selecting the sample.
-
STATS
-
includes selection probabilities and sampling weights in the
OUT= output data set for equal probability selection methods
when you do not specify a STRATA statement.
This option is available for the following equal probability selection
methods: METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ.
For PPS selection methods and stratified designs, the
output data set contains selection probabilities and sampling
weights by default.
For more information on the contents of the output
data set, see the section "Output Data Set".
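A short sketch of the STATS option in an unstratified, equal probability design follows. The frame and seed are hypothetical.

```sas
/* Hypothetical frame with 8 units */
data Stores;
   input StoreID $ Sales;
   datalines;
S01 120
S02 340
S03 95
S04 410
S05 220
S06 180
S07 305
S08 150
;

/* STATS adds selection probabilities and sampling weights */
proc surveyselect data=Stores method=srs n=3 stats seed=8
                  out=SampleStats;
run;

proc print data=SampleStats;
run;
```

For simple random sampling of 3 units from 8, each selection probability would be 3/8 and each sampling weight its inverse, 8/3.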