SAS Companion for the OpenVMS Environment
The information that is presented in this section applies to reading and writing SAS data sets. In general, the larger your data sets, the greater the
potential performance gain for your entire SAS job. The performance gains that are described here were observed on data sets of approximately 100,000 blocks.
In Version 7, the SAS System initially allocates enough space for 10 pages of data for a data set. Each time the data set is extended, another 5 pages of space is allocated on the disk.
OpenVMS maintains a bit map on each disk that identifies the blocks that are available for use. When a data set is written and then extended,
OpenVMS alternates between scanning the bit map to locate free blocks and actually writing
the data set. However, if the data sets were written with larger initial and extent allocations, then write operations to the data set would proceed uninterrupted for longer periods of time. At the
hardware level, this means that disk activity is concentrated on the data set, and disk head seek operations that alternate between the bit map and the data set are minimized. The user sees fewer I/Os and a shorter elapsed time.
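To get a feel for how often a data set is extended under the default allocation, the following sketch counts the extensions for a data set of 2,000 pages. The page count is a hypothetical value chosen for illustration; the 10-page initial allocation and 5-page extent are the defaults described above.

```sas
/* Illustration only: count how often a data set is extended.    */
/* total_pages is a hypothetical size, not a measured value.     */
data _null_;
   total_pages   = 2000;   /* hypothetical data set size in pages */
   initial_pages = 10;     /* default initial allocation          */
   extent_pages  = 5;      /* default extent                      */
   extensions = ceil(max(total_pages - initial_pages, 0) / extent_pages);
   put extensions=;
run;
```

For these values the step reports 398 extensions, each of which sends OpenVMS back to the bit map. Raising the initial allocation or the extent size in the sketch shows how quickly that count drops.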
Large initial and extent values can also reduce disk fragmentation. SAS data sets are written using the RMS algorithm "contiguous best
try." With large preallocation, the space is reserved to the data set and does not become fragmented as it does when inappropriate ALQ= and DEQ= values are used.
SAS Institute recommends setting ALQ= to the size of the data set to be written. If you are uncertain of the size, underestimate and use DEQ= for extents. Values of DEQ= larger than 5000 blocks are not recommended. For information about predicting data set size, see Estimating the Size of a SAS Data Set.
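As a rough illustration of such an estimate, the following sketch converts an observation count and observation length into a block count. It uses the layout from the examples in this section (13,000 observations of 26 character variables of length 200 plus one 8-byte numeric variable). The 32768-byte page size is an assumption, and the calculation ignores any per-page or header overhead, so treat the result as approximate.

```sas
/* Rough size estimate in 512-byte disk blocks.                  */
/* page_size is an assumed value; overhead is ignored.           */
data _null_;
   nobs       = 13000;         /* observations to be written      */
   obs_len    = 26*200 + 8;    /* bytes per observation (5208)    */
   page_size  = 32768;         /* assumed data set page size      */
   block_size = 512;           /* bytes per OpenVMS disk block    */
   obs_per_page = floor(page_size / obs_len);
   pages  = ceil(nobs / obs_per_page);
   blocks = pages * page_size / block_size;
   put pages= blocks=;
run;
```

Under these assumptions the estimate is 2,167 pages, or 138,688 blocks. An ALQ= value somewhat below that figure, with DEQ= covering the remainder, follows the advice to underestimate.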
The following is an example of using the ALQ= and DEQ= options:
libname x '[]';
/* Know this is a big data set. */
data x.big (alq=100000 deq=5000);
   length a b c d e f g h i j k l m
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
Note: If you do not want to specify an exact number of blocks for the data set, use the ALQMULT= and DEQMULT= options.
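The following is a minimal sketch of that alternative. It assumes that ALQMULT= and DEQMULT= take a count of data set pages, which is consistent with the 10-page and 5-page defaults described earlier; confirm the units in the data set option reference for your release. The values 2000 and 50 are hypothetical.

```sas
libname x '[]';

/* Sketch: allocate by pages rather than by blocks.              */
/* 2000 and 50 are hypothetical values; tune them for your data. */
data x.big (alqmult=2000 deqmult=50);
   length a b c d e f g h i j k l m
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
```

Because the allocation is expressed in pages, the same option values scale with the page size of the data set instead of fixing a block count.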
Highwater marking is an OpenVMS security feature that is enabled by default. It forces prezeroing of disk blocks for files that are opened for random access. All SAS data sets are random-access files and, therefore, pay the performance penalty of prezeroing, increased I/Os, and increased elapsed time.
Two DCL commands can be used independently to disable highwater marking on a disk. When initializing a new volume, use the NOHIGHWATER_MARKING qualifier to disable the highwater function as in the following example:
$ initialize/nohighwater $DKA470 mydisk
To disable volume highwater marking on an active disk, use a command similar to the following:
$ set volume/nohighwater $DKA200
Any software that reads from and writes to disk benefits from a well-managed disk, and SAS data sets are no exception. On an unfragmented disk, files are kept contiguous; thus, after one I/O operation, the disk head is well positioned for the next I/O operation.
A disk drive that is frequently defragmented can provide performance benefits. Use a frequently defragmented disk to store commonly accessed SAS data sets. In some situations, adding an inexpensive SCSI drive to the configuration allows the system manager to maintain a clean, unfragmented environment more easily than using a large disk farm. Data sets maintained on an unfragmented SCSI disk may perform better than heavily fragmented data sets on larger disks.
By defragmenting, we mean a process that runs the OpenVMS Backup Facility after regular business hours. SAS Institute does not recommend using dynamic defragmenting tools that run in the background of an active system because such programs can corrupt files.
The BUFSIZE= data set option sets the SAS internal page size for the data set. Once set, this becomes a permanent attribute of the file that cannot be changed. This option is meaningful only when you are creating a data set. If you do not specify a BUFSIZE= option, the SAS System selects a value that contains as many observations as possible with the least amount of wasted space.
An observation cannot span page boundaries. Therefore, unused space at the end of a page may occur unless the observations pack evenly into the page. By default, the SAS System tries to choose a page size between 8192 and 32768 bytes if an explicit BUFSIZE= option has not been specified. If you increase the BUFSIZE= value, more observations can be stored on a page, and the same amount of data can be accessed with fewer I/Os.
When explicitly choosing a BUFSIZE= value, choose one that leaves little unused space at the end of each data set page, because unused page space is wasted disk space. The highest recommended value for BUFSIZE= is 65024.
The following is an example of an efficiently written large data set, using the BUFSIZE= data set option. Note that in the following example, BUFSIZE=63488 becomes a permanent attribute of the data set:
libname buf '[]';
data buf.big (bufsize=63488);
   length a b c d e f g h i j k l m
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
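A rough way to compare candidate BUFSIZE= values is to compute how many observations fit on a page and how many bytes are left over. The sketch below does this for the observation layout used in the example above (26 character variables of length 200 plus one 8-byte numeric variable, or 5,208 bytes). It ignores any per-page overhead that SAS reserves, which is an assumption, so the figures are approximate. The candidate sizes are illustrative.

```sas
/* Compare candidate page sizes for a 5208-byte observation.     */
/* Per-page overhead is ignored, so results are approximate.     */
data _null_;
   obs_len = 26*200 + 8;             /* bytes per observation     */
   do bufsize = 8192, 16384, 32768, 63488, 65024;
      obs_per_page = floor(bufsize / obs_len);
      unused = bufsize - obs_per_page * obs_len;
      put bufsize= obs_per_page= unused=;
   end;
run;

/* PROC CONTENTS reports the page size actually in use.          */
proc contents data=buf.big;
run;
```

In this approximation, 63488 and 65024 both hold 12 observations per page, but 63488 leaves fewer unused bytes, which illustrates why the largest value is not automatically the most space-efficient one.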
For each SAS file that you open, the SAS System maintains a set of caches to
buffer the data set pages. The size of each of these caches is controlled
by the CACHESIZ= option. The number of caches used for each open file is controlled by the CACHENUM= option. The ability
to maintain more data pages in memory potentially reduces the number of I/Os that are required to access the data. The number of caches that are used to access a
file is a temporary attribute. It may be changed each time you access the file.
By default, up to 5 caches are used for each SAS file that is opened; the size of each cache, in bytes, is the value of CACHESIZ=. On a memory-constrained system you may wish to reduce the number of caches used in order to conserve memory.
The following example uses the CACHESIZ= and CACHENUM= options to specify that 8 caches of 65024 bytes each are used to buffer data pages in memory:
proc sort data=cache.big (cachesiz=65024 cachenum=8);
   by serial;
run;
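On a memory-constrained system the same options can be turned down rather than up. The sketch below first estimates the buffer memory per open file as the number of caches multiplied by the cache size, then reads a data set with two caches. The value 2 is an arbitrary illustration, the 65024-byte cache size is assumed, and the estimate ignores any other per-file memory that SAS uses. The step also assumes that the CACHE.BIG data set from the examples in this section exists.

```sas
libname cache '[]';

/* Estimate: cache memory per open file = caches * cache size.   */
data _null_;
   default_bytes = 5 * 65024;   /* up to 5 caches by default      */
   reduced_bytes = 2 * 65024;   /* hypothetical reduced setting   */
   put default_bytes= reduced_bytes=;
run;

/* Read the data set with fewer caches to conserve memory.       */
data _null_;
   set cache.big (cachenum=2);
run;
```

Because the number of caches is a temporary attribute, the reduced setting applies only to this step; a later step can open the same file with a different CACHENUM= value.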
The SAS System maintains a cache that is used to buffer multiple data set pages in memory. This reduces the number of I/O operations by enabling SAS to read or write multiple pages in a single operation. SAS maintains multiple caches for each data set that is opened. The CACHESIZ= data set option specifies the size of each cache.
The CACHESIZ= value is a temporary attribute that applies only to the data set that is currently open. You can use a different CACHESIZ= value at different times when accessing the same file. To conserve memory, a maximum of 65024 bytes is allocated for the cache by default. The default allows as many pages as can be completely contained in the 65024-byte cache to be buffered and accessed with a single I/O. The largest CACHESIZ= value that you can specify is also 65024 bytes, which is the largest amount that can be accessed in a single I/O in an OpenVMS operating environment.
Here is an example that uses the CACHESIZ= data set option to write a large data set efficiently. Note that in the following example, the CACHESIZ= value is not a permanent attribute of the data set:
libname cache '[]';
data cache.big (cachesiz=65024);
   length a b c d e f g h i j k l m
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
In Version 7, asynchronous I/O is enabled by default. There are no additional options that need to be specified to use this feature. For all SAS files that use a data cache, SAS performs asynchronous I/O.
Since multiple caches are now available for each SAS file, while an I/O is being performed on one cache of data, the SAS System may continue processing using
other caches. For example, when SAS writes to a file, once the first cache becomes full an asynchronous I/O is initiated on that cache, but the SAS System does not have to wait for the I/O to complete. While that I/O is in progress, the SAS System can continue processing new data pages and store them in one of the other available caches. When that cache is full, an asynchronous I/O may be initiated on that cache as well.
Similarly, when SAS reads a file,
additional caches of data may be read from the file asynchronously in anticipation of those pages being requested by the SAS System. When those pages are required, they will have already been read
from disk, and no I/O wait need occur.
Because caching (with multiple caches) needs to be enabled in order for asynchronous I/O to be effective, if the cache is disabled with the CACHESIZ=0 option or the CACHENUM=0 option, no asynchronous I/O can occur.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.