SAS Companion for the Microsoft Windows Environment

Advanced Performance Tuning Methods

This section presents some advanced performance topics, such as improving the performance of the SORT procedure and calculating data set size. Use these methods only if you are an experienced SAS user and you are familiar with the way SAS is configured on your machine.


Improving Performance of the SORT Procedure

Two options for the PROC SORT statement are available under Windows: SORTSIZE= and TAGSORT. These two options control the amount of memory the SORT procedure uses during a sort and are discussed in the next two sections. Also included is a discussion of determining where the sorting process occurs for a given data set and how much disk space you need for the sort. For more information about the SORT procedure, see SORT.

SORTSIZE= Option

The PROC SORT statement supports the SORTSIZE= option, which limits the amount of memory available for PROC SORT to use.

If you do not use the SORTSIZE= option in the PROC SORT statement, PROC SORT uses the value of the SORTSIZE= system option. If the system option is not set, PROC SORT uses all available memory, which can cause unnecessary swapping. If you use the SORTSIZE= option to limit the amount of available memory to about 1 or 2 megabytes, most of the unneeded SAS files and operating system files are swapped out, and the 1 to 2 megabytes of sort buffers stay in memory for an optimum sort. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your SASWORK directory to complete the sort.

The default value of this option is 2 megabytes (MB), which is optimal. If your machine has more than 12 MB of physical memory and you are sorting large data sets, setting this option to a value between 4 MB and 8 MB may improve performance.

Note:   You can also use the SORTSIZE= system option, which has the same effect as the SORTSIZE= option in the PROC SORT statement.
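
As a minimal sketch of the two ways to set the limit, the statements below assume a hypothetical data set WORK.SALES with a BY variable named region. Neither name comes from this document, and the 4M value is simply the low end of the 4 MB to 8 MB range suggested above for machines with more memory.

```sas
/* Hypothetical data set and variable names, for illustration only. */

/* Limit memory for a single sort with the PROC SORT statement option. */
proc sort data=work.sales out=work.sales_sorted sortsize=4M;
  by region;
run;

/* Or set the SORTSIZE= system option, which PROC SORT uses
   when the statement option is omitted. */
options sortsize=4M;

proc sort data=work.sales out=work.sales_sorted;
  by region;
run;
```

Whether a larger value helps depends on the physical memory available, so treat the value as something to experiment with rather than a recommendation.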

TAGSORT Option

The TAGSORT option is useful in situations where there may not be enough disk space to sort a large SAS data set. When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. However, you should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that although using the TAGSORT option can reduce temporary disk use, the processing time may be much higher.
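
The following sketch shows where the TAGSORT option goes in the PROC SORT statement. It assumes a hypothetical wide data set, WORK.SURVEY, sorted by a single short key named respondent_id; both names are illustrative and do not appear elsewhere in this document.

```sas
/* Hypothetical data set: many variables, one short sort key.      */
/* With TAGSORT, only the BY variable and the observation number   */
/* are written to the temporary files during the sort.             */
proc sort data=work.survey out=work.survey_sorted tagsort;
  by respondent_id;
run;
```

This arrangement (long records, short key) is the case where the text above expects the largest saving in temporary disk space, at the cost of possibly longer processing time.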

Determining Where the Sort Occurs

Where the physical sort occurs for a given data set depends on how you reference the data set name and whether you use the OUT= option in the PROC SORT statement. You may want to know where the sort occurs if you think there may not be enough disk space available for the sort. You always need free disk space that equals 3 to 4 times the SAS data set size. For example, if your SAS data set takes up 1 MB of disk space, you need 3 to 4 MB of disk space to complete the sort.

When you sort a SAS data set, a temporary utility file is opened in the WORK data library (that is, in a subdirectory of the SASWORK directory) if there is not enough memory to hold the data set during the sort. This file has a .sas7butl file extension and can be several times as large as your data set. Before you sort, be sure your WORK data library has room for this temporary utility file.
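
To check free space you first need to know which directory the WORK data library currently points to. One way to find out, offered here as a small sketch rather than a required step, is to write the physical path to the SAS log with the PATHNAME function:

```sas
/* Write the physical location of the WORK data library to the log. */
%put WORK library path: %sysfunc(pathname(work));
```

You can then inspect the free space on that drive with the usual Windows tools before you submit the sort.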

Note:   If you work with especially large data sets and you use a Windows NT NTFS disk volume, you should redirect your WORK data library to that volume. Windows NT with NTFS is not restricted by the 2-gigabyte file size limit that you might encounter under other Windows systems. For more information, see Using Large Data Sets with Windows NT and NTFS.

A second file with a .SU7 file extension is also created. If the sort completes successfully, this file is renamed to the data set name of the file being sorted (with a .sas7bdat file extension), and the original data set is then deleted. This technique ensures data integrity. Be sure that you have space for this .SU7 file. Use the following rules to determine where the .SU7 file and the resulting sorted data set are created:


Calculating Data Set Size

To estimate the amount of disk space needed for a SAS data set:

  1. create a dummy SAS data set containing one observation and the variables you need

  2. run the CONTENTS procedure using the dummy data set

  3. determine the data set size by performing simple math using information from the CONTENTS procedure output

For example, for a data set with one character variable and four numeric variables, you would submit the following statements:

data oranges;
  input variety $ flavor texture looks;
  total=flavor+texture+looks;
  datalines;
navel 9 8 6
;
proc contents data=oranges;
  title 'Example for Calculating Data Set Size';
run;

These statements generate the output shown in Example for Calculating Data Set Size with PROC CONTENTS.

Example for Calculating Data Set Size with PROC CONTENTS
                    Example for Calculating Data Set Size                             1
                                                07:44 Tuesday, February 2, 1999
                                                           
                          The CONTENTS Procedure
                  
Data Set Name: WORK.ORANGES                          Observations:         1
Member Type:   DATA                                  Variables:            5
Engine:        V8                                    Indexes:              0
Created:       7:45 Tuesday February 2, 1999         Observation Length:   40
Last Modified: 7:45 Tuesday February 2, 1999         Deleted Observations: 0
Protection:                                          Compressed:           NO
Data Set Type:                                       Sorted:               NO
Label:

                -----Engine/Host Dependent Information-----

      Data Set Page Size:         4096
      Number of Data Set Pages:   1
      First Data Page:            1
      Max Obs per Page:           101
      Obs in First Data Page:     1
      Number of Data Set Repairs: 0
      File Name:                  C:\TEMP\SAS Temporary Files\_Td200\oranges.sas7bdat
      Release Created:            8.00.008
      Host Created:               WIN_NT


            -----Alphabetic List of Variables and Attributes-----

                     #    Variable    Type    Len    Pos
                     ---------------------------------------------
                     2    flavor      Num       8      0
                     4    looks       Num       8     16 
                     3    texture     Num       8      8
                     5    total       Num       8     24
                     1    variety     Char      8     32

The size of the resulting data set depends on the data set page size and the number of observations. The following formula can be used to estimate the data set size:
number of data pages = 1 + (floor(number of obs / Max Obs per Page))
size = 256 + ( Data Set Page Size * number of data pages)
(floor represents a function that rounds the value down to the nearest integer.)

Taking the information shown in Example for Calculating Data Set Size with PROC CONTENTS, you can calculate the size of the example data set:
number of data pages = 1 + (floor(1/101)) = 1 + 0 = 1
size = 256 + (4096 * 1) = 4352

Thus, the example data set uses 4,352 bytes of storage space.
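
The same arithmetic can be done in a DATA step when you want an estimate for a planned number of observations. The sketch below takes the page size and Max Obs per Page from the PROC CONTENTS output above; the observation count of 250,000 and the variable names are hypothetical values chosen for illustration.

```sas
/* Estimate data set size from PROC CONTENTS values.          */
/* nobs is a hypothetical planned observation count.          */
data _null_;
  page_size        = 4096;    /* Data Set Page Size           */
  max_obs_per_page = 101;     /* Max Obs per Page             */
  nobs             = 250000;  /* hypothetical, for illustration */

  pages = 1 + floor(nobs / max_obs_per_page);
  size  = 256 + page_size * pages;

  /* Write the results to the SAS log. */
  put pages= size=;
run;
```

For these inputs the log shows pages=2476 size=10141952, or roughly 10 MB. Setting nobs to 1 reproduces the 4,352 bytes calculated above.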


Increasing the Efficiency of Interactive Processing

If you are running a SAS job using the SAS System interactively and the job generates numerous log messages or extensive output, consider using the AUTOSCROLL command to suppress the scrolling of windows. This makes your job run faster because the SAS System does not have to use resources to update the display of the LOG and OUTPUT windows during the job. For example, issuing autoscroll 0 in the LOG window causes the LOG window not to scroll until your job is finished. (For the OUTPUT window, AUTOSCROLL is set to 0 by default.)
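
If you prefer to issue the command from a program instead of typing it on the command line, one option is the DM statement, which submits display manager commands to a named window. The line below is a minimal sketch under that assumption; place it before the step that generates the heavy log traffic.

```sas
/* Stop the LOG window from scrolling while the job runs. */
dm log 'autoscroll 0';
```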


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.