So far, this paper has been about
SAS files and how an application uses them. You will be a more effective
application developer if, in addition to understanding how to make optimum
use of SAS files, you also understand the computer resources that a server
consumes. That understanding allows you to design your applications to
make optimum use of a server as well as of SAS files.
A server is an independently running SAS session that
brokers requests for data from other SAS sessions. A server consumes four kinds
of computer resources: CPU, I/O, memory, and messages.
CPU, I/O, and memory resources are consumed by every
SAS session. Messages is a name for one measurable aspect of the complex area
of communications resources; communications resources are consumed by SAS/SHARE
software and SAS/CONNECT software because these two products enable SAS sessions
to communicate with one another.
Any work done by a server consumes more than one kind
of resource (if you are looking for simple, uncomplicated truths, you may want
to skip this section). A server can do several kinds of work and, as you might
expect, not all kinds of work consume resources in the same relative amounts.
For example, some work a server can do consumes much of the CPU resource but
little of the other resources, while other work consumes much of the memory
resource, less of the CPU resource, and very little of the other resources.
A server creates processes as users connect to it and execute DATA steps, procedures,
and windows. These processes (created on users' behalf) are assigned the work
that is actually performed in the server's SAS session. This allows a process
in a server's session to do work requested by one user and then yield control
so that another process can do work for another user.
Most requests handled by the processes in a server require
small bursts of CPU time. But several kinds of requests can consume
especially large amounts of CPU time: evaluating WHERE clauses, interpreting
views, and compressing or decompressing observations.
When a SAS data set is accessed through a server, every
WHERE clause used to select observations from that data set is evaluated by
a process in the server's SAS session. This increases the server's overall
use of the CPU resource in order to reduce its use of the messages resource,
because only the selected observations are transmitted to the user's session. Often,
evaluation of a WHERE clause can be optimized by using an index to locate
the desired observations. But when an index is not used, or selects more observations
than satisfy the WHERE clause, the process in the server's session must search
for observations that completely satisfy the WHERE clause. Searching can consume
a significant amount of the CPU resource. While a process conducts a search,
it yields periodically to allow other processes in the server's session to
do work for other users.
A PROC SQL view can consume quite a bit of the CPU resource.
The SQL view engine may join tables, it may need to sort intermediate files,
and there may be several WHERE clauses in the view that require evaluation.
The process in which the SQL view engine executes yields periodically while
a view is interpreted.
DATA step views and SAS/ACCESS views also consume the
CPU resource. The process in which either of these engines executes does not
yield to allow other processes to run, although the server itself allows other
processes to run when a group of observations has been prepared for transmission
to a user's SAS session. A DATA step view that does a great deal of calculation
while preparing each observation can have a visibly harmful impact on a server's
response time to other users' requests.
When a compressed SAS data file is read, processes in
the server's session decompress each observation; when a compressed SAS data
file is created or replaced, a process in the server's session compresses
each observation. In many cases the time required to decompress (or compress)
is shorter than the time required to read the additional pages of an uncompressed
file. In other words, trading increased use of the CPU resource for decreased
use of the I/O resource can, on balance, reduce the length of time users wait
for a server to respond. While a user processes a compressed data file through
a server, other processes in the server's session may execute between groups
of observations requested by that user; a SAS data file is not compressed
or decompressed in its entirety in a single operation.
The "Programming Techniques" section of this paper offers
ideas for reducing the CPU consumption of processes in a server's session.
Since most work done by the processes in a server's SAS session involves I/O activity,
those processes can spend a significant amount of time waiting for I/O activity
to complete. (This time includes moving the head of a disk drive to the correct
position, waiting for the disk to spin around to the position of the requested
data, and transferring the data from the disk to the computer's working storage.)
In the current release of SAS/SHARE software, while a process in a server's
session waits for I/O activity to complete, other processes in the server's
session do not perform other work that uses a different (CPU, memory, or messages)
resource.
That waiting could, it would seem, become a bottleneck
for a server, and in a few situations it does. But in practice
most of a server's memory is used for I/O buffers, and processes in a server's
session typically satisfy most requests for data from I/O buffers that are
already in memory.
A server typically allocates memory for one page of
a file each time the file is opened, up to the number of pages in the file.
For example, if the application being executed by a user opens a file twice,
enough of the server's memory to contain two pages of the file is allocated;
if ten users run the application, space for 20 pages of the file is allocated
in the server's memory. The number of buffers allocated for a file will not
exceed the number of pages in the file.
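The allocation rule above can be worked through as simple arithmetic. The following is a minimal sketch in Python, not SAS code; the function names and the 16,384-byte page size are assumptions chosen for illustration, and the sketch models only the typical behavior described above (one buffer per open, capped at the number of pages in the file).

```python
def buffers_allocated(opens_per_user: int, users: int, pages_in_file: int) -> int:
    """Buffers held for one file: one per open, never more than the file has pages."""
    total_opens = opens_per_user * users
    return min(total_opens, pages_in_file)


def buffer_memory_bytes(opens_per_user: int, users: int,
                        pages_in_file: int, page_size_bytes: int) -> int:
    """Server memory used for that file's buffers (page size is a hypothetical input)."""
    return buffers_allocated(opens_per_user, users, pages_in_file) * page_size_bytes


# One user whose application opens the file twice: two pages' worth of buffers.
print(buffers_allocated(2, 1, 1000))    # 2
# Ten users running the same application: 20 pages' worth.
print(buffers_allocated(2, 10, 1000))   # 20
# A small 12-page file: the count is capped at the number of pages.
print(buffers_allocated(2, 10, 12))     # 12
# Memory for the 20-buffer case, assuming a 16,384-byte page size.
print(buffer_memory_bytes(2, 10, 1000, 16384))  # 327680
```

The same arithmetic shows why both the number of opens and the page size matter: doubling either one doubles the memory held for the file until the cap is reached.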
Of course, the pages of the file maintained in memory
are not the same set of pages all the time. As users request pages of the
file that are not in memory, a page that is in memory is replaced: if it has
been modified, it is first written back to the file on disk; if it has not
been modified, its buffer is simply reused to read the new page.
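The buffer reuse just described can be sketched as follows. This is an illustrative Python model, not the server's implementation: the `Buffer` class, the `load_page` function, and the dictionary standing in for the file on disk are all hypothetical, and the sketch does not model how the server chooses which buffer to reuse.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    page_no: int            # which page of the file this buffer currently holds
    data: bytes             # in-memory contents of that page
    modified: bool = False  # True if the page was changed since it was read


def load_page(buf: Buffer, new_page_no: int, disk: dict) -> bool:
    """Reuse buf for new_page_no. Returns True if the old page had to be written back."""
    wrote_back = False
    if buf.modified:
        # A modified page must reach the file on disk before its buffer is reused.
        disk[buf.page_no] = buf.data
        wrote_back = True
    # An unmodified page needs no write; the buffer is simply overwritten.
    buf.page_no = new_page_no
    buf.data = disk[new_page_no]
    buf.modified = False
    return wrote_back


disk = {0: b"old page 0", 1: b"page 1", 2: b"page 2"}
buf = Buffer(page_no=0, data=b"new page 0", modified=True)
print(load_page(buf, 1, disk))  # True: page 0 was modified, so it was written back
print(load_page(buf, 2, disk))  # False: page 1 was not modified, so no write occurred
```

The point of the sketch is the cost difference: reusing a buffer that holds a modified page costs two I/O operations (a write and a read), while reusing an unmodified one costs only the read.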
A larger page size can reduce the number of I/O operations
required to process a SAS data file. But it takes longer to read a large page
than it takes to read a small one, so unless most of the observations in a
large page are likely to be accessed by users, large page sizes can increase
the amount of time required to perform I/O activity in the server's SAS session.
There are two patterns in which data is read from or
written to SAS files: sequential and random.
When an application processes a SAS file in sequential
order, no page of the file is read into or written from the server's memory
more than once each time the file is read or written. Also, observations are
transmitted to and from users' sessions in groups, which conserves the messages
resource.
In many applications that are used with concurrently
accessed files, data is accessed in random order, i.e., a user reads the 250th
observation, then the 10,000th observation, then the 5th observation, and
so forth. When a file is processed in random order, it is much more difficult
to predict how many times each page of the file will be read into or written
from the memory of a server's SAS session. In addition, only one observation
is transmitted on each message between server and user, which does not conserve
the messages resource.
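The difference in message counts between the two access patterns can be illustrated with a short calculation. This is a sketch in Python under stated assumptions: the function names are hypothetical, and the group size of 50 observations per message is an invented figure for illustration, not a SAS/SHARE default.

```python
import math


def messages_sequential(n_obs: int, obs_per_message: int) -> int:
    """Sequential access: observations travel in groups, one group per message."""
    return math.ceil(n_obs / obs_per_message)


def messages_random(n_obs: int) -> int:
    """Random access: each observation travels on its own message."""
    return n_obs


n_obs = 10_000
print(messages_sequential(n_obs, 50))  # 200
print(messages_random(n_obs))          # 10000
```

Under these assumed numbers, reading the same 10,000 observations takes 50 times as many messages in random order as in sequential order, which is why the access pattern matters so much for the messages resource.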
The "Programming Techniques" section of this paper offers
ideas for reducing the I/O load of a server.
A computer's working storage is used by a server to load programs, hold
I/O buffers, and maintain control information. When a server's working set
becomes large compared to the amount of memory installed on a computer, a
significant amount of the server's working storage may be stored on disk by
the operating system's virtual memory manager.
Large amounts of a server's memory are consumed by I/O buffers, by the
sorting done for views that use the ORDER BY clause, and by indexes.
Because the ORDER BY clause causes the observations produced
by a view to be sorted every time the view is interpreted, memory
must be used as a work area for the sort. Your application should
use this clause in its views only when it has a clear benefit for your users.
When a SAS data file is opened, all indexes on the file
are opened. Therefore, when a SAS data file has many indexes, a large amount
of memory in the server's SAS session can be used to store pages of the index
file and related control information. Of course, when many SAS data files
that are accessed through a server each have many indexes, this effect is
multiplied.
At SAS Institute, we have observed that the majority
of servers' memory has been consumed by I/O buffers. Carefully selecting the
number of times each file is opened by your application and the page size
of each file can have considerable impact on the amount of memory required
by a server.
The "Programming Techniques" section of this paper offers
ideas for reducing the memory requirements of a server.
Messages are the communication events between users' SAS sessions and a server. Whenever
a piece of information (for example, an observation) is moved from a server
to a user, a message is sent from the user to the server and a reply is sent
back from the server to the user.
Messages and replies are transmitted by communications
access methods. The cost of a message varies greatly with the access method. Memory-to-memory
communication within a single computer, for example, via the Cross-Memory Services
(COMAMID=XMS) or Inter-User Communications Vehicle (COMAMID=IUCV) access methods,
is very rapid, while messages that flow on cables between computers, for example,
via the DECnet (COMAMID=DECNET) or TCP/IP (COMAMID=TCP) access methods, take
much longer to travel between SAS sessions.
SAS Institute has observed that the cost of sending
data via most communications access methods is more directly a function of
the number of messages than the amount of data. In other words, to move a
million characters of data between a user and a server, it takes less time
to send the data in 100 messages than to send the data in 10,000 messages.
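A simple cost model makes this observation concrete. The sketch below is in Python and rests entirely on assumed figures: a fixed cost of 5 milliseconds per message and 1 microsecond per character are invented for illustration and are not measurements of any access method. The function name is likewise hypothetical.

```python
def transfer_seconds(n_messages: int, n_chars: int,
                     per_message_s: float = 0.005,
                     per_char_s: float = 0.000001) -> float:
    """Total time = fixed cost per message + cost proportional to the data volume."""
    return n_messages * per_message_s + n_chars * per_char_s


one_million = 1_000_000
print(f"{transfer_seconds(100, one_million):.2f}")     # 1.50
print(f"{transfer_seconds(10_000, one_million):.2f}")  # 51.00
```

In this model the data volume contributes the same one second in both cases; the difference comes entirely from the per-message cost. Whenever the per-message cost is large compared to the per-character cost, the number of messages dominates the total, which is consistent with the observation above.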
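SAS/SHARE software conserves the messages resource
by, for example, evaluating WHERE clauses in the server's session and
transmitting observations in groups when a file is processed sequentially.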
The "Programming Techniques" section of this paper offers
some ideas for conserving the messages resource.
The "Tuning Options" section shows options you can use
to control the grouping of observations on messages between servers and
users.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.