| INFILE |
| Valid: | in a DATA Step |
| Category: | File-handling |
| Type: | Executable |
Syntax |
| INFILE file-specification <options> <host-options>; |
| INFILE DBMS-specifications; |
| Arguments |
| Requirement: | You must have previously associated the fileref with an external file in a FILENAME statement, FILENAME function, or an appropriate operating environment command. |
| See: | FILENAME |
Operating Environment Information:
Different operating environments call an aggregate grouping of files by different names, such as a directory, a MACLIB, or a partitioned data set. For details on how to specify external files, see the SAS documentation for your operating environment.
| Requirement: | You must have previously associated the fileref with an external file in a FILENAME statement, a FILENAME function, or an appropriate operating environment command. |
| See: | FILENAME |
| Alias: | DATALINES | DATALINES4 |
| Alias: | CARDS | CARDS4 |
| Featured in: | Changing How Delimiters are Treated |
| Default: | dependent on the operating environment. Operating Environment Information: For details, see the SAS documentation for your operating environment. |
| Alias: | COL= |
| See Also: | LINE= |
| Featured in: | Listing the Pointer Location |
| Requirement: | Enclose the list of characters in quotation marks. |
| Featured in: | Changing How Delimiters are Treated |
| Alias: | DLM= |
| Default: | blank space |
| Tip: | DELIMITER= allows you to use list input even when the data are separated by characters other than spaces. |
| See: | Reading Delimited Data |
| See Also: | DSD option |
| Featured in: | Changing How Delimiters are Treated |
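| Restriction: | You cannot use the END= option with the UNBUFFERED option, the DATALINES or DATALINES4 statement, or an INPUT statement that reads multiple input data records. |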
| Tip: | Use the option EOF= when END= is invalid. |
| Featured in: | Reading from Multiple Input Files |
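| Interaction: | Use EOF= instead of the END= option with the UNBUFFERED option, the DATALINES or DATALINES4 statement, or an INPUT statement that reads multiple input data records. |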
| Tip: | The EOF= option is useful when you read from multiple input files sequentially. |
| See Also: | END=, EOV=, and UNBUFFERED |
| Tip: | Reset the EOV= variable back to 0 after SAS encounters each boundary. |
| See Also: | END= and EOF= |
| Default: | NOEXPANDTABS |
| Tip: | EXPANDTABS is useful when you read data that contain the tab character that is native to your operating environment. |
| Tip: | Use a LENGTH statement to make the variable length long enough to contain the value of the filename. |
| See Also: | FILEVAR= |
| Featured in: | Reading from Multiple Input Files |
| Default: | 1 |
| Tip: | Use FIRSTOBS= with OBS= to read a range of records from the middle of a file. |
| Example: | This statement processes record 50 through record 100: infile file-specification firstobs=50 obs=100; |
| See: | Reading Past the End of a Line |
| See Also: | MISSOVER, STOPOVER, and TRUNCOVER |
| Tip: | This option in conjunction with the $VARYING informat is useful when the field width varies. |
| Featured in: | Reading Files That Contain Variable-Length Records and Truncating Copied Records |
Operating Environment Information:
Values for line-size are dependent on the operating environment record size. For details, see the SAS documentation for your operating environment.
Operating Environment Information:
Values for logical-record-length are dependent on the operating environment. For details, see the SAS documentation for your operating environment.
| Default: | dependent on the operating environment's file characteristics. |
| Tip: | LRECL= specifies the physical line length of the file. LINESIZE= tells the INPUT statement how much of the line to read. |
| Tip: | Use OBS= with FIRSTOBS= to read a range of records from the middle of a file. |
| Example: | This statement processes only the first 100 records in the file: infile file-specification obs=100; |
| Default: | NOPAD |
| See Also: | LRECL= |
| Tip: | To read a print file in a DATA step without having to remove the carriage control characters, specify PRINT. To read the carriage control characters as data values, specify NOPRINT. |
Operating Environment Information:
Values for record-format are dependent on the operating environment. For details, see the SAS documentation for your operating environment.
| See Also: | _INFILE_ option in the PUT statement |
| Tip: | Use FLOWOVER to reset the default behavior. |
| See: | Reading Past the End of a Line |
| See Also: | FLOWOVER, MISSOVER, SCANOVER, and TRUNCOVER |
| Featured in: | Handling Missing Values and Short Records with List Input |
| Alias: | UNBUF |
| Interaction: | When you use UNBUFFERED, SAS never sets the END= variable to 1. |
| Tip: | When you read in-stream data with a DATALINES statement, UNBUFFERED is in effect. |
| Details |
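You can read from multiple input files in a single iteration of the DATA step in one of two ways: use multiple INFILE statements to keep several files open and switch between them, or use the FILEVAR= option in a single INFILE statement to change the current input file dynamically.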
To update individual fields within a record instead of the entire record, use the SHAREBUFFERS option.
For example, this statement
   put _infile_ $hex100.;
outputs the contents of the input buffer using a hexadecimal format.
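As a minimal sketch of how the input buffer can be inspected, the following step loads each record with a null INPUT statement and writes the buffer to the SAS log, first as text and then in hexadecimal. The in-stream records and the $HEX20. width are illustrative assumptions and do not come from the original example.

```sas
data _null_;
   infile datalines;
   input;                  /* null INPUT: load the record into the input buffer only */
   put _infile_;           /* write the record to the log as text */
   put _infile_ $hex20.;   /* write the first 10 bytes of the buffer in hexadecimal */
   datalines;
AB 12
CD 34
;
```

Because no variables are read, this pattern is convenient for checking what a record actually contains before writing the real INPUT statement.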
data scores;
   infile datalines delimiter=',';
   input test1 test2 test3;
   datalines;
91,87,95
97,,92
,1,1
;
With the FLOWOVER option in effect, the data set SCORES contains two, not three, observations. The second observation is built incorrectly:
| OBS | TEST1 | TEST2 | TEST3 |
|---|---|---|---|
| 1 | 91 | 87 | 95 |
| 2 | 97 | 92 | 1 |
To correct the problem, use the DSD option in the INFILE statement:
   infile datalines dsd;
Now the INPUT statement detects the two consecutive delimiters and therefore assigns a missing value to variable TEST2 in the second observation.
| OBS | TEST1 | TEST2 | TEST3 |
|---|---|---|---|
| 1 | 91 | 87 | 95 |
| 2 | 97 | . | 92 |
| 3 | . | 1 | 1 |
By default, if the INPUT statement tries to read past the end of the current input data record, it moves the input pointer to column 1 of the next record to read the remaining values. This default behavior is specified by the FLOWOVER option. A message is written to the SAS log:
NOTE: SAS went to a new line when INPUT @'CHARACTER_STRING' scanned past the end of a line.
The STOPOVER option treats this condition as an error and stops building the data set. The MISSOVER option sets the remaining INPUT statement variables to missing values. The SCANOVER option scans the input record until it finds the specified character-string. The FLOWOVER option restores the default behavior.
For example, an external file with variable-length records contains these records:
----+----1----+----2
1
22
333
4444
55555
The following DATA step reads these data to create a SAS data set. Only one of the input records is as long as the informatted length of the variable TESTNUM.
data numbers;
   infile 'external-file';
   input testnum 5.;
run;
This DATA step creates three observations from the five input records because by default the FLOWOVER option is used to read the input records.
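If you use the TRUNCOVER option in the INFILE statement, SAS assigns the contents of a short record to the variable instead of flowing over to the next record:
   infile 'external-file' truncover;
The DATA step now reads the same input records and creates five observations. See the table The Value of TESTNUM Using Different INFILE Statement Options to compare the SAS data sets.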
| OBS | FLOWOVER | MISSOVER | TRUNCOVER |
|---|---|---|---|
| 1 | 22 | . | 1 |
| 2 | 4444 | . | 22 |
| 3 | 55555 | . | 333 |
| 4 | | . | 4444 |
| 5 | | 55555 | 55555 |
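The comparison in the table can be reproduced with a sketch like the one below. It assumes an operating environment that writes variable-length records without blank padding. The fileref SHORTREC and the data set names FLOW, MISS, and TRUNC are hypothetical names introduced for illustration. An external file is used rather than in-stream data because in-stream data lines are typically padded to a fixed length, which would hide the short-record behavior.

```sas
/* SHORTREC is a hypothetical fileref; TEMP requests a temporary external file */
filename shortrec temp;

/* Write the five variable-length records, one per line */
data _null_;
   file shortrec;
   put '1' / '22' / '333' / '4444' / '55555';
run;

/* Default behavior (FLOWOVER): short records flow over to the next record */
data flow;
   infile shortrec;
   input testnum 5.;
run;

/* MISSOVER: a record shorter than the informat width yields a missing value */
data miss;
   infile shortrec missover;
   input testnum 5.;
run;

/* TRUNCOVER: a short record yields whatever the input buffer contains */
data trunc;
   infile shortrec truncover;
   input testnum 5.;
run;

proc print data=flow;  run;
proc print data=miss;  run;
proc print data=trunc; run;

filename shortrec clear;
```

Only the INFILE option changes between the three reading steps, so any difference in the printed data sets comes from how each option handles records shorter than the 5. informat.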
| Comparisons |
| Examples |
data num;
   infile datalines dsd;
   input x y z;
   datalines;
,2,3
4,5,6
7,8,9
;
The argument DATALINES in the INFILE statement allows you to use an INFILE statement option to read in-stream data lines. The DSD option sets the comma as the default delimiter. Because a comma precedes the first value in the first dataline, a missing value is assigned to variable X in the first observation, and the value 2 is assigned to variable Y.
If the data uses multiple delimiters or a single delimiter other than a comma, simply specify the delimiter values with the DELIMITER= option. In this example, the characters a and b function as delimiters:
data nums;
infile datalines dsd delimiter='ab';
input X Y Z;
datalines;
1aa2ab3
4b5bab6
7a8b9
;
The output that PROC PRINT generates shows the resulting NUMS data set. Values are missing for variables in the first and second observations because DSD causes list input to detect two consecutive delimiters. If you omit DSD, the characters a, b, aa, ab, ba, or bb function as the delimiter and no variables are assigned missing values.
The SAS System                1

OBS    X    Y    Z
  1    1    .    2
  2    4    5    .
  3    7    8    9
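For comparison, here is a sketch of the same step with DSD omitted. The data set name NUMS_NODSD is a hypothetical name chosen for illustration. Without DSD, list input treats any run of the characters a and b as a single delimiter, so each observation should contain three nonmissing values.

```sas
data nums_nodsd;
   infile datalines delimiter='ab';   /* same delimiters, DSD omitted */
   input X Y Z;
   datalines;
1aa2ab3
4b5bab6
7a8b9
;

proc print data=nums_nodsd;
run;
```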
This DATA step uses modified list input and the DSD option to read data that are separated by commas and that may contain commas as part of a character value:
data scores;
infile datalines dsd;
input Name : $9. Score
Team : $25. Div $;
datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;
The output that PROC PRINT generates shows the resulting SCORES data set. The delimiter (comma) is stored as part of the value of TEAM while the quotation marks are not.
The SAS System                                         1

OBS    NAME         SCORE    TEAM                       DIV
  1    Joseph        76      Red Racers, Washington     AAA
  2    Mitchel       82      Blue Bunnies, Richmond     AAA
  3    Sue Ellen     74      Green Gazelles, Atlanta    AA
To retain the quotation marks in character data, use the tilde (~) format modifier in the INPUT statement.
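A minimal sketch of the tilde (~) format modifier follows, reusing the SCORES data. The data set name SCORES_QUOTED is a hypothetical name chosen for illustration. With DSD in effect, the tilde is expected to keep both the quotation marks and the embedded comma as part of the value of TEAM. The $25. width is just wide enough for the longest quoted value here.

```sas
data scores_quoted;
   infile datalines dsd;
   /* ~ keeps the quotation marks and the embedded comma in TEAM */
   input Name : $9. Score
         Team ~ $25. Div $;
   datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;

proc print data=scores_quoted;
run;
```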
data weather;
   infile datalines missover;
   input temp1-temp5;
   datalines;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;
SAS reads the three values on the first data line as the values of TEMP1, TEMP2, and TEMP3. The MISSOVER option causes SAS to set the values of TEMP4 and TEMP5 to missing for the first observation because no values for those variables are in the current input data record.
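When you omit the MISSOVER option or use FLOWOVER, SAS moves the input pointer to the next data line to read values for TEMP4 and TEMP5, and writes this message to the SAS log:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
If you instead specify the STOPOVER option, the DATA step halts when an INPUT statement does not find enough values in a record:
   infile datalines stopover;
Because SAS does not find a TEMP4 value in the first data record, it sets _ERROR_ to 1, stops building the data set, and prints data line 1.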
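data a;
   infile file-specification length=linelen;
   input firstvar 1-10 @;   /* assign LINELEN */
   varlen=linelen-10;       /* Calculate VARLEN */
   input @11 secondvar $varying500. varlen;
run;
The following occurs in this DATA step:
The LENGTH= option names the variable LINELEN. When the first INPUT statement executes, SAS assigns the length of the current record to LINELEN, and the trailing @ holds the record for the next INPUT statement.
The assignment statement subtracts the 10 columns that FIRSTVAR occupies from LINELEN to compute VARLEN.
The second INPUT statement uses the value of VARLEN with the $VARYING500. informat to read SECONDVAR, starting at column 11.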
See the informat $VARYINGw. for more information.
data qtrtot(drop=jansale febsale marsale
aprsale maysale junsale);
/* identify location of 1st file */
infile file-specification-1;
/* read values from 1st file */
input name $ jansale febsale marsale;
qtr1tot=sum(jansale,febsale,marsale);
/* identify location of 2nd file */
infile file-specification-2;
/* read values from 2nd file */
input @7 aprsale maysale junsale;
qtr2tot=sum(aprsale,maysale,junsale);
run;
The DATA step terminates when SAS reaches an end-of-file on the
shortest input file.
This DATA step uses FILEVAR= to read from a different file during each iteration of the DATA step:
data allsales;
length fileloc myinfile $ 300;
input fileloc $ ; /* read instream data */
/* The INFILE statement closes the current file
and opens a new one if FILELOC changes value
when INFILE executes */
infile file-specification filevar=fileloc
filename=myinfile end=done;
/* DONE set to 1 when last input record read */
do while(not done);
/* Read all input records from the currently */
/* opened input file, write to ALLSALES */
input name $ jansale febsale marsale;
output;
end;
put 'Finished reading ' myinfile=;
datalines;
external-file-1
external-file-2
external-file-3
;
The FILENAME= option assigns the name of the current input file
to the variable MYINFILE. The LENGTH statement ensures that the FILENAME=
variable and FILEVAR= variable have a length long enough to contain the value
of the filename. The PUT statement prints the physical name of the currently
open input file to the SAS log.
data _null_;
/* The INFILE and FILE statements */
/* must specify the same file. */
infile file-specification-1 sharebuffers;
file file-specification-1;
input state $ 1-2 phone $ 5-16;
/* Replace area code for NC exchanges */
if state= 'NC' and substr(phone,5,3)='333' then
phone='910-'||substr(phone,5,8);
put phone 5-16;
run;
data _null_;
   infile file-specification-1 length=a;
   input;
   a=a-20;
   file file-specification-2;
   put _infile_;
run;
data _null_;
   infile file-specification start=s;
   input;
   s=11;
   file file-specification-2;
   put _infile_;
run;
data _null_;
   infile datalines n=2 line=Linept col=Columnpt;
   input name $ 1-15 #2 @3 id;
   put linept= columnpt=;
   datalines;
J. Brooks
  40974
T. R. Ansen
  4032
;
These statements produce the following line for each execution of the DATA step because the input pointer is on the second line in the input buffer when the PUT statement executes:
Linept=2 Columnpt=9
Linept=2 Columnpt=8
| See Also |
Statements:
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.