Creating Variables

Ways to Create Variables

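You can create variables in a DATA step in the following ways:

   - by using an assignment statement
   - by reading data with the INPUT statement
   - by specifying a new variable in a FORMAT or an INFORMAT statement
   - by specifying a new variable in a LENGTH statement
   - by specifying a new variable in an ATTRIB statement
   - by using the IN= data set option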

Note:   You can also create variables with the FGET function. See SAS Language Reference: Dictionary for more information.

Using an Assignment Statement

In a DATA step, you can create a new variable and assign it a value by using it for the first time on the left side of an assignment statement. SAS determines the length of a variable from its first occurrence in the DATA step. The new variable gets the same type and length as the expression on the right side of the assignment statement.

When the type and length of a variable are not explicitly set, SAS gives the variable a default type and length as shown in the examples in the following table.

Resulting Variable Types and Lengths Produced When Not Explicitly Set

Expression                              Example                                   Resulting Type of X  Resulting Length of X  Explanation
--------------------------------------  ----------------------------------------  -------------------  ---------------------  -----------
Numeric variable                        length a 4; x=a;                          Numeric variable     8                      Default numeric length (8 bytes unless otherwise specified)
Character variable                      length a $ 4; x=a;                        Character variable   4                      Length of source variable
Character literal                       x='ABC'; x='ABCDE';                       Character variable   3                      Length of first literal encountered
Concatenation of variables              length a $ 4 b $ 6 c $ 2; x=a||b||c;      Character variable   12                     Sum of the lengths of all variables
Concatenation of variables and literal  length a $ 4; x=a||'CAT'; x=a||'CATNIP';  Character variable   7                      Sum of the lengths of variables and literals encountered in first assignment statement
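
The length rules for assignment statements can be checked with a short DATA step. The following is a minimal sketch, not part of the original examples: the variable names and literal values are illustrative, and it assumes the VLENGTH function (not discussed in this section) to report the length that SAS assigned at compile time.

```sas
data _null_;
   length a $ 4;
   a = 'ABCD';
   x = 'ABC';        /* first assignment: X becomes character, length 3 */
   x = 'ABCDE';      /* later value is truncated to 'ABC'               */
   y = a || 'CAT';   /* length 4 + 3 = 7                                */
   len_x = vlength(x);
   len_y = vlength(y);
   put x= len_x= len_y=;   /* writes x=ABC len_x=3 len_y=7 to the log */
run;
```

Because the length is fixed by the first assignment, a LENGTH statement placed before that assignment is the usual way to reserve room for longer values.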

If a variable appears for the first time on the right side of an assignment statement, SAS assumes that it is a numeric variable and that its value is missing. If no later statement gives it a value, SAS prints a note in the log that the variable is uninitialized.
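
As a small illustration of this behavior (the variable names Y and Z are hypothetical), the following sketch uses Z for the first time on the right side of an assignment statement:

```sas
data _null_;
   y = z + 1;   /* Z is created as a numeric variable with a missing value */
   put y= z=;   /* writes y=. z=. to the log                               */
run;
```

Running this step should produce a log note that the variable Z is uninitialized, and Y is missing because arithmetic on a missing value yields a missing value.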

Note:   A RETAIN statement initializes a variable and can assign it an initial value, even if the RETAIN statement appears after the assignment statement.
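
The following sketch shows the effect of the placement described in the note. The variable name Count is hypothetical. The RETAIN statement appears after the assignment statement, yet it still supplies the initial value because it takes effect when the step is compiled.

```sas
data _null_;
   count = count + 1;   /* COUNT starts at 0, not missing  */
   put count=;          /* writes count=1 to the log       */
   retain count 0;      /* initial value set at compile time */
run;
```

Without the RETAIN statement, Count would start as missing and the sum would also be missing.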

Reading Data with the INPUT Statement in a DATA Step

When you read raw data in SAS by using an INPUT statement, you define variables based on positions in the raw data. You can use one of the following methods with the INPUT statement to provide information to SAS about how the raw data is organized:

   - column input
   - list input (simple or modified)
   - formatted input
   - named input

See SAS Language Reference: Dictionary for more information about using each method.

The following example uses simple list input to create a SAS data set named GEMS and defines four variables based on the data provided:

data gems;
   input Name $ Color $ Carats Owner $;
   datalines;
emerald green 1 smith
sapphire blue 2 johnson
ruby red 1 clark
;
run;

Specifying a New Variable in a FORMAT or an INFORMAT Statement

You can create a variable and specify its format or informat with a FORMAT or an INFORMAT statement. For example, the following FORMAT statement creates a variable named Sale_Price with a format of 6.2 in a new data set named SALES:

data sales;
   format Sale_Price 6.2;
run;

SAS creates a numeric variable with the name Sale_Price and a length of 8.

See SAS Language Reference: Dictionary for more information about using the FORMAT and INFORMAT statements.

Specifying a New Variable in a LENGTH Statement

You can use the LENGTH statement to create a variable and set the length of the variable, as in the following example:

data sales;
   length Salesperson $20;
run;

For character variables, you must allow for the longest possible value in the first statement that uses the variable, because you cannot change the length with a subsequent LENGTH statement within the same DATA step. The maximum length of any character variable in the SAS System is 32,767 bytes. For numeric variables, you can change the length of the variable by using a subsequent LENGTH statement.

When SAS assigns a value to a character variable, it pads the value with blanks or truncates the value on the right side, if necessary, to make it match the length of the target variable. Consider the following statements:

length address1 address2 address3 $ 200;
address3=address1||address2;

Because the length of ADDRESS3 is 200 bytes, only the first 200 bytes of the concatenation (the value of ADDRESS1) are assigned to ADDRESS3. You might be able to avoid this problem by using the TRIM function to remove trailing blanks from ADDRESS1 before performing the concatenation, as follows:

address3=trim(address1)||address2;

See SAS Language Reference: Dictionary for more information about using the LENGTH statement.
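
The padding and truncation behavior can be seen in a scaled-down sketch. The lengths of 12 bytes and the address values below are assumptions chosen to keep the output short; only the variable names come from the text above.

```sas
data _null_;
   length address1 address2 address3 $ 12;
   address1 = 'Elm St';
   address2 = 'Apt 4';

   /* ADDRESS1 is padded to 12 bytes, so it fills ADDRESS3 entirely */
   address3 = address1 || address2;
   put address3=;   /* writes address3=Elm St */

   /* TRIM removes the trailing blanks before the concatenation */
   address3 = trim(address1) || ' ' || address2;
   put address3=;   /* writes address3=Elm St Apt 4 */
run;
```

The second result fits only because the trimmed values and the separating blank total 12 bytes; longer values would still be truncated on the right.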

Specifying a New Variable in an ATTRIB Statement

The ATTRIB statement enables you to specify one or more of the following variable attributes for an existing variable:

   - FORMAT=
   - INFORMAT=
   - LABEL=
   - LENGTH=

If the variable does not already exist, one or more of the FORMAT=, INFORMAT=, and LENGTH= attributes can be used to create a new variable. For example, the following DATA step creates a variable named Flavor in a data set named LOLLIPOPS:

data lollipops;
   attrib Flavor format=$10.;
run;

Note:   You cannot create a new variable by using a LABEL statement or the ATTRIB statement's LABEL= attribute by itself; labels can only be applied to existing variables.

See SAS Language Reference: Dictionary for more information about using the ATTRIB statement.

Using the IN= Data Set Option

The IN= data set option creates a special Boolean variable that indicates whether the data set contributed data to the current observation. The variable has a value of 1 when true, and a value of 0 when false. You can use IN= on the SET, MERGE, and UPDATE statements in a DATA step.

The following example shows a merge of the OLD and NEW data sets where the IN= option is used to create a variable named X that indicates whether the NEW data set contributed data to the observation:

data master missing;
   merge old new(in=x);
   by id;
   if x=0 then output missing;
   else output master;
run;
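
To try the merge, the OLD and NEW data sets must exist and be sorted by ID. The following sketch builds two small input data sets first. Their contents and the variable names Value and NewValue are hypothetical and serve only to make the step runnable.

```sas
data old;
   input id value $;
   datalines;
1 a
2 b
3 c
;
run;

data new;
   input id newvalue $;
   datalines;
2 x
3 y
;
run;

data master missing;
   merge old new(in=x);
   by id;
   if x=0 then output missing;   /* NEW contributed nothing for this ID */
   else output master;
run;
```

With this sample data, MISSING should receive the observation for ID 1, and MASTER should receive the observations for IDs 2 and 3. The variable X is available during the DATA step but is not written to either output data set.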


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.