Reads an observation from one or more SAS data sets
Valid: in a DATA step
Category: File-handling
Type: Executable
SET <SAS-data-set(s) <(data-set-option(s))>> <options>;
When you do
not specify an argument, the SET statement
reads an observation from the most recently created data set.
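A minimal sketch of the argument-free form follows. The data set SCORES, its variables, and the output name PASSED are hypothetical and are used only for illustration:

```sas
/* SCORES is a hypothetical data set used only for illustration */
data scores;
   input id score;
   datalines;
1 72
2 55
3 90
;
run;

/* SET with no arguments reads the most recently created data set (SCORES) */
data passed;
   set;
   if score>=60;
run;
```

Naming the input data set explicitly is usually easier to maintain, because the argument-free form depends on which data set happened to be created last.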
-
SAS-data-set
-
specifies a one-level name, a two-level
name, or one of the special SAS data set names.
-
(data-set-options)
-
specifies actions SAS is to take when it
reads variables or observations into the program data vector for processing.
-
END=variable
-
creates and names a temporary variable
that contains an end-of-file indicator. The variable, which is initialized
to zero, is set to 1 when SET reads the last observation of the last data
set listed. This variable is not added to any new data set.
-
KEY=index
-
provides nonsequential access to observations
in a SAS data set, based on the value of an index variable or a
key.
-
NOBS=variable
-
creates and names a temporary variable whose
value is usually the total number of observations in the input data set or
data sets. If more than one data set is listed in the SET statement, the value
of the NOBS= variable is the total number of observations in all the data sets
that are listed. The number of observations includes those that are marked for
deletion but are not yet deleted.
-
OPEN=(IMMEDIATE | DEFER)
-
allows you to delay the opening of any concatenated
SAS data sets until they are ready to be processed.
-
IMMEDIATE
-
during the compilation phase, opens all
data sets that are listed in the SET statement.
-
DEFER
-
opens the first data set during the compilation
phase, and opens subsequent data sets during the execution phase. When the
DATA step reads and processes all observations in a data set, it closes the
data set and opens the next data set in the list.
Restriction: When you specify
the DEFER option, you cannot use the KEY= statement option, the POINT= statement
option, or the BY statement. These constructs imply either random processing
or interleaving of observations from the data sets, which is not possible
unless all data sets are open.
Requirement: You can use the
DROP=, KEEP=, or RENAME= data set options to process a set of variables, but
the set of variables that are processed for each data set must be identical.
In most cases, if the set of variables defined by any subsequent data set
differs from that defined by the first data set, SAS prints a warning message
to the log but does not stop execution. Exceptions to this behavior are:
-
If a variable on a subsequent data set is of a
different type (character versus numeric, for example) than that of the same-named
variable on the first data set, the DATA step will stop processing and produce
an error message.
-
If a variable on a subsequent data set was not
defined by the first data set in the SET statement, but was defined previously
in the DATA step program, the DATA step will stop processing and produce an
error message. In this case, the value of the variable in previous iterations
may be incorrect because the semantic behavior of SET requires this variable
to be set to missing when processing the first observation of the first data
set.
-
POINT=variable
-
specifies a temporary variable whose numeric
value determines which observation is read. POINT= causes the SET statement
to use random (direct) access to read a SAS data set.
CAUTION: Continuous loops can occur when
you use the POINT= option.
When you use the
POINT= option, you must include a STOP statement to stop DATA step processing,
programming logic that checks for an invalid value of the POINT= variable,
or both. Because POINT= reads only those observations that are specified in
the DO statement, SAS cannot read an end-of-file indicator as it would if
the file were being read sequentially. Because reading an end-of-file indicator
ends a DATA step automatically, failure to substitute another means of ending
the DATA step when you use POINT= can cause the DATA step to go into a continuous
loop. If SAS reads an invalid value of the POINT= variable, it sets the automatic
variable _ERROR_ to 1. Use this information to check for conditions that cause
continuous DO-loop processing, or include a STOP statement at the end of the
DATA step, or both.
-
UNIQUE
-
causes a KEY= search always to begin at
the top of the index for the data set that is being read.
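The OPEN=DEFER option described above can be sketched as follows. The data sets JAN, FEB, and MAR and the output name QUARTER are hypothetical; they are given identical variables because DEFER requires the processed variables to match across data sets:

```sas
/* Hypothetical monthly data sets with identical variables */
data jan; month=1; sales=100; run;
data feb; month=2; sales=120; run;
data mar; month=3; sales=90;  run;

/* OPEN=DEFER: JAN is opened during compilation;                */
/* FEB and MAR are opened in turn during execution.             */
/* KEY=, POINT=, and BY cannot be combined with this option.    */
data quarter;
   set jan feb mar open=defer;
run;
```

Deferring the open is mainly of interest when many data sets are concatenated, since only one input data set needs to be open at a time.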
Each time the SET statement is executed, SAS reads one
observation into the program data vector. SET reads all variables and all
observations from the input data sets unless you tell SAS to do otherwise.
A SET statement can contain multiple data sets; a DATA step can contain multiple
SET statements. See Combining and Modifying SAS Data Sets: Examples.
The SET statement is flexible and has a variety of uses
in SAS programming. These uses are determined by the options and statements
that you use with the SET statement. They include BY-group processing,
concatenating data sets, interleaving data sets, and one-to-one reading.
Only one BY statement can accompany each SET statement
in a DATA step. The BY statement should immediately follow the SET statement
to which it applies. The data sets that are listed in the SET statement must
be sorted by the values of the variables that are listed in the BY statement,
or they must have an appropriate index. When SET is used with a BY statement,
it interleaves data sets. The observations in the new data set are arranged by
the values of the BY variable or variables, and within each BY group, by the
order of the data sets in which they occur. See Interleaving SAS Data Sets
for an example of BY group processing with the SET statement.
Use a single SET statement with multiple data sets
to concatenate the specified data sets. That is, the number of observations
in the new data set is the sum of the number of observations in the original
data sets, and the order is all the observations from the first data set followed
by all observations from the second data set, and so on. See Concatenating SAS Data Sets
for an example of concatenating data sets.
Use a single SET statement with a BY statement to interleave
the specified data sets. The observations in the new data set are arranged
by the values of the BY variable or variables, and within each BY group, by
the order of the data sets in which they occur. See Interleaving SAS Data Sets
for an example of interleaving data sets.
Use multiple SET statements to perform one-to-one reading
(also called one-to-one matching) of the specified data sets. The new data
set contains all the variables from all the input data sets. The number of
observations in the new data set is the number of observations in the smallest
original data set. If the data sets contain common variables, the values that
are read in from the last data set replace those read in from earlier ones.
See Combining One Observation with Many, Performing a Table-Lookup, and
Performing a Table-Lookup When the Master File Contains Duplicate Observations
for examples of one-to-one reading of data sets.
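One-to-one reading as described above can be sketched with two small data sets. The names NAMES, AGES, and BOTH and their contents are hypothetical:

```sas
/* Hypothetical input: NAMES has three observations, AGES has two */
data names;
   input id name $;
   datalines;
1 Ann
2 Bob
3 Cy
;
run;

data ages;
   input id age;
   datalines;
1 34
2 41
;
run;

/* Each SET statement reads one observation per iteration.        */
/* The step ends when either SET statement reaches end of file,   */
/* so BOTH has as many observations as the smaller input (AGES).  */
/* ID is common to both inputs; the value from AGES, which is     */
/* read last, replaces the value read from NAMES.                 */
data both;
   set names;
   set ages;
run;
```

No matching on ID takes place here: observations are paired purely by position, which is why this technique suits data sets that are already aligned.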
For extensive examples, see
Combining and Modifying SAS Data Sets: Examples.
If more than one data set name appears in the SET statement,
the resulting output data set is a concatenation of all the data sets that
are listed. SAS reads all observations from the first data set, then all from
the second data set, and so on until all observations from all the data sets
have been read. This example concatenates the three SAS data sets into one
output data set named FITNESS:
data fitness;
set health exercise well;
run;
To interleave two or more SAS data sets, use a BY statement
after the SET statement:
data april;
set payable recvable;
by account;
run;
In this DATA step, each observation in the data set
NC.MEMBERS is read into the program data vector. Only those observations
whose value of CITY is Raleigh are output to the new data
set RALEIGH.MEMBERS:
data raleigh.members;
set nc.members;
if city='Raleigh';
run;
An observation to be merged into an existing data set
can be one that is created by a SAS procedure or another DATA step. In this
example, the data set AVGSALES has only one observation:
data national;
if _n_=1 then set avgsales;
set totsales;
run;
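The following sketch shows one way that such a single-observation data set might be produced and used. The contents of TOTSALES, the variable names REGION, SALES, AVG, and DIFF, and the use of PROC MEANS are assumptions made for illustration:

```sas
/* TOTSALES is hypothetical sample data */
data totsales;
   input region $ sales;
   datalines;
East 100
West 140
;
run;

/* One way to create a single-observation data set: the overall mean */
proc means data=totsales noprint;
   var sales;
   output out=avgsales(keep=avg) mean=avg;
run;

data national;
   if _n_=1 then set avgsales;   /* read once; AVG is retained afterward */
   set totsales;
   diff=sales-avg;               /* illustrative use of the retained value */
run;
```

The technique works because variables that are read with a SET statement are retained across iterations, so the value of AVG that is read in the first iteration remains available for every observation of TOTSALES.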
In this example, SAS treats each SET statement independently;
that is, it reads from one data set as if it were reading from two separate
data sets:
data drugxyz;
set trial5(keep=sample);
if sample>2;
set trial5;
run;
For each iteration of the DATA step, the first SET statement
reads one observation. The next time the first SET statement is executed,
it reads the next observation. Each SET statement can read a different observation
in the same iteration of the DATA step.
You can subset observations from one data set and combine
them with observations from another data set by using direct access methods,
as follows:
data south;
set revenue;
if region=4;
set expense point=_n_;
run;
This example illustrates using the KEY= option to perform
a table-lookup. The DATA step reads a primary data set that is named INVTORY
and a lookup data set that is named PARTCODE. It uses the index PARTNO to
read PARTCODE nonsequentially, by looking for a match between the PARTNO value
in each data set. The purpose is to obtain the appropriate description, which
is available only in the variable DESC in the lookup data set, for each part
that is listed in the primary data set:
data combine;
set invtory(keep=partno instock price);
set partcode(keep=partno desc) key=partno;
run;
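The lookup above does not show what happens when a PARTNO value has no match. The sketch below adds one common way to handle that case by testing the automatic variable _IORC_, which is nonzero when the KEY= search fails. The sample data and the INDEX= data set option used to build the index are assumptions for illustration:

```sas
/* Hypothetical lookup data set, indexed on PARTNO */
data partcode(index=(partno));
   input partno desc $;
   datalines;
101 bolt
102 nut
;
run;

/* Hypothetical primary data set; part 103 has no lookup entry */
data invtory;
   input partno instock price;
   datalines;
101 50 0.25
103 10 1.50
;
run;

data combine;
   set invtory(keep=partno instock price);
   set partcode(keep=partno desc) key=partno;
   if _iorc_ ne 0 then do;   /* no matching PARTNO in the lookup data set  */
      _error_=0;             /* reset the error flag set by the failed search */
      desc=' ';              /* clear the value retained from a prior match */
   end;
run;
```

Clearing DESC matters because variables read with SET are retained, so without it an unmatched part would carry the description of the previously matched part.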
This example uses the KEY= option to perform a table
lookup. The DATA step reads a primary data set that is named INVTORY, which
is indexed on PARTNO, and a lookup data set named PARTCODE. PARTCODE contains
quantities of new stock (variable NEW_STK). The UNIQUE option ensures that,
if there are any duplicate observations in INVTORY, values of NEW_STK are
added only to the first observation of the group:
data combine;
set partcode(keep=partno new_stk);
set invtory(keep=partno instock price)
key=partno/unique;
instock=instock+new_stk;
run;
These statements select a subset of 50 observations
from the data set DRUGTEST by using the POINT= option to access observations
directly by number:
data sample;
do obsnum=1 to 100 by 2;
set drugtest point=obsnum;
if _error_ then abort;
output;
end;
stop;
run;
These statements use NOBS= to set the termination value
for DO-loop processing. The value of the temporary variable LAST is the sum
of the numbers of observations in SURVEY1 and SURVEY2:
data subset;   /* output data set name is illustrative */
do obsnum=1 to last by 100;
set survey1 survey2 point=obsnum nobs=last;
output;
end;
stop;
run;
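A related sketch, offered as a common idiom rather than as part of the example above: because the value of the NOBS= variable is assigned during compilation, it can be used without reading any observations. The data set SURVEY1 is taken from the example above; the variable name N is hypothetical:

```sas
data _null_;
   if 0 then set survey1 nobs=n;   /* never executed, but NOBS= is still assigned */
   put 'Number of observations in SURVEY1: ' n;
   stop;                           /* end the step after a single iteration */
run;
```

The condition IF 0 is always false, so the SET statement never reads an observation; it is present only so that the compiler sets up N.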
This example uses the END= variable LAST to tell SAS
to assign a value to the variable REVENUE and write an observation only after
the last observation of RENTAL has been read:
data total;   /* output data set name is illustrative */
set rental end=last;
totdays + days;
if last then
do;
revenue=totdays*65.78;
output;
end;
run;
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.