|SAS/SHARE User's Guide|
Now that you understand how SAS and SAS/SHARE software use files and computer resources, it's time to apply that knowledge to the design and implementation of your applications.
The most productive way to optimize the performance of your application is programming it to work as efficiently as possible. You can almost always realize more performance improvement by coding your application to exploit features of SAS than you can gain by adjusting the operation of SAS.
When you decide to adjust SAS to operate differently, remember that tuning is a balancing act and invariably requires compromise. Of course, to effectively tune SAS you must understand what your application's bottlenecks are.
This section will first list some programming techniques that are based on the information presented earlier in this paper. After that, the tuning options of SAS/SHARE software and SAS will be described.
The most obvious way to reduce the amount of work done by a server is eliminating unused variables and observations from the files that are accessed through the server. To make sure that your files are no larger than they need to be, periodically remove or archive unused data.
As a SAS data file matures, users add new observations, update existing observations, and forget about old observations. In most cases the level of activity is greatest on the newest observations. If the users of your application do not frequently access older information, consider moving older observations from files that are concurrently updated to archive files that are accessed directly (instead of through a server).
Also as a SAS data file matures, new variables are added, some variables turn out to be larger than they need to be, and some variables lose their usefulness. Periodically check your application's SAS data files for variables that are longer than they need to be and for variables that are no longer used.
While compressing a SAS data file reduces the number of pages in it, compression can not be as efficient at eliminating unused space as you can be by deleting unused observations and variables and by shortening variables that are longer than necessary.
Smaller data files improve the performance of all SAS
sessions by reducing the amount of disk space required by each file, by reducing
the number of I/O operations required to process the data in each file, and
by reducing the number and size of messages required to transmit the data
in a file between a server and its users.
Creating a subset of the observations in a SAS file can consume large amounts of the I/O and messages resources. There are several subsetting techniques available in SAS:
When an index is not used to locate directly the observations that satisfy a WHERE clause, the process in the server's session must read observations from the data file until it finds one that matches the WHERE clause. This can consume a very large amount of the I/O and CPU resources. Those resource requirements can be greatly reduced when the variables in the WHERE clause are indexed.
The subsetting IF statement of the DATA step and the FIND, SEARCH, and LOCATE commands of SAS/FSP procedures perform the specified comparison in the user's SAS session instead of in a process in a server. This requires that every observation of the SAS data set be transmitted from the server's session to the user's session, which can consume a very large amount of the messages resource, in addition to the I/O and CPU resources required to read the data file. Since the comparisons of a WHERE clause are performed in the server's session, only the desired observations are transmitted to the user's session and the message resource is conserved.
The I/O resource consumption is the same unoptimized
WHERE, subsetting IF, and FSP's FIND, SEARCH, and LOCATE. Using WHERE clauses
is recommended, however, because the messages resource consumption is higher
for the subsetting IF statement and the FIND, SEARCH, and LOCATE commands.
Indexing is a tool that optimizes WHERE clause selection of observations from SAS data sets. A WHERE clause without an index requires the process in the server to read every observation in a SAS data set to find the observations that match the WHERE selection criteria. An index often allows the server to locate the records that satisfy a WHERE clause without having to read the records that do not match.
Adding indexes may be a good idea if your application seems to be taking too long to execute WHERE clauses. However, indexes require extra memory and may present a problem for a server that is memory constrained.
A complete description of index usage may be found in
the paper "Effective Use of Indexes in the SAS System," in the Proceedings
of the SAS User's Group International Sixteenth Annual Conference.
When a SAS data file is accessed through a server, creating
an index on it prevents the server from responding to other users' requests.
While it can be useful to create an index while a data file is accessed through
a server, indexes on large files should be created after hours. Indexes on
large data files should not be created while a server is expected to respond
quickly to users' requests.
Larger page sizes can be used to reduce the number of I/O operations required to process a SAS data file. But it takes longer to read a large page than it takes to read a small one and larger pages can increase the memory load on a server.
Large page sizes can be useful if most of the observations
on each page are likely to be accessed each time the page is read into the
server's memory, or if a large page size causes all or most of a SAS data
file to be kept in the server's memory. Otherwise, large page sizes can increase
the amount of time required to perform I/O activity in the server's SAS session
to the detriment of the server's ability to provide timely response to users'
It is often worth the effort to study the order in which the users of your application access the data in your application's files. That tells you how widely dispersed your users' patterns of reference are. Sometimes you can reduce the amount of dispersal by sorting one or more files by a variable (like date last updated) that happens to correlate (even to a small degree) with your users' pattern of access.
The components of SAS that are used most frequently to access data in a random order are:
The SCL OPEN() function allows you to specify that a file will be sequentially accessed (the default is random access). There are two types of sequential access that may be requested with SCL OPEN().
The server will by default transmit multiple observations per read for either of 'IS' or 'IN' open modes.
If the application's use of data is predominantly sequential, but you occasionally need to re-read a previously read observation then use a mode of 'IN' or 'UN' in your SCL OPEN() function. If the application's use of data is strictly sequential (you will never revisit a previously read observation) then use the open mode 'IS' or 'US'. The 'IS' and 'US' open modes are the most efficient for SCL. A 'IS' or 'US' open mode, however, will restrict an SCL application to those functions which access data sequentially. The SCL functions which access data in a random pattern are:
Specifying an access pattern on an SCL OPEN() function is documented in SAS Component Language: Reference. An example of specifying a sequential access pattern on an SCL OPEN() function is:
DSID = OPEN( 'MYLIB.A', 'IN' );
An open file consumes memory in both users' and servers' SAS sessions. If a server consumes too much memory, check the applications that access data through that server to see if any of them open files before they are needed or leave files open when they are not being used.
There are three strategies for using SAS data sets in an SCL program:
The initialization code of an application is the place to open the SAS data sets that will be used throughout the execution of the application. But if an application's initialization code must open a large number of files, the time it takes to get started may be long. By studying how an application is used, you may discover some SAS data sets that can be opened as functions are requested while the application executes, which can reduce the amount of time the application takes to initialize and reduces the concentration of time required to open files.
Whether they are opened during initialization or later, lookup tables that are small should usually not be closed until an application terminates because the single I/O buffer required by such a lookup table does not require a large amount of memory. In such a case it is frequently economical to use a small amount of the memory resource to conserve the CPU resource that would be required to open and close the lookup table over and over.
Larger SAS data sets, and SAS data sets that are used
extremely infrequently (for example, once during initialization) or during
a seldom-used function (for example, a lookup table on a rarely updated field),
should usually be left closed whenever they are not being used.
Consider how frequently each of your application's reports is generated and how timely the data summarized by the report must be. If a report must be based on current information, it must be based on files that are concurrently updated. A report that does not require up to the second information can be generated from files that are directly (and inexpensively) accessed instead of files that are accessed through a server.
For example, a travel agent making reservations or a stock broker making trades require every query to show up-to-the-second information. On the other hand, daily reports or analysis of long-term trends can use data that are out of date by several hours, several days, or even several weeks or months.
When copying data from a server, it may be subset horizontally with a WHERE clause and it may be subset vertically with a DROP= or KEEP= data set option. (In relational terminology, the horizontal subsetting is selection and vertical subsetting is projection.) Be sure to take advantage of both methods when copying a file from a server to make the copy of the file as small as possible and thus ensure that reports and generated as efficiently as possible.
Don't forget that files can be stored in users' WORK libraries. It can be most efficient to copy a file that is concurrently updated from a server to a user's WORK library and then use that file more than one time to generate reports. Such a copy of a file contains very timely data yet is not especially expensive to create or use.
A SAS data file that is accessed directly is almost
always less costly to use than a file that is accessed through a server.
Many applications contain one or more query functions that use a lookup file to offer a user a set of values that are valid to enter into a field. Such a file is read, but never updated, by the majority of the users of the application. Occasionally, values must be added to and removed from the lookup files as the valid input data for the application changes.
A lookup file that is used frequently and updated less often than, say, once a week is likely to be a good candidate for not being accessed through a server if it would be easy to find some time during the week when the files can be updated because the application is not being used. On the other hand, a lookup file that is updated many times each day should, in many cases, be accessed through a server because updating the file will be convenient: the lookup file can be updated while users use it to perform queries.
SAS catalog entries illustrate another way that update frequency can change:
An application may use only a few or many catalog entries. Like lookup files, catalog entries that are updated frequently are likely candidates for access through a server. But catalog entries that are never changed, or only changed very infrequently, should not be accessed through a server.
The update frequency may change for some of an application's catalog entries over time. For example, while an application is under development and being tested, it can be extremely convenient for the developers of the application to be able to update any catalog entry while those testing the application continue with their work. During this phase, the convenience of accessing the catalog entries through a server can more than pay for the cost of the overhead of server access. After the testing is completed and the application has stabilized, some or all of the application's catalogs can be moved to a SAS library that is accessed directly by the users of the application; in this phase efficient access by the users is more important than being able to update the catalog entries conveniently.
Remember that not all of an application's catalog entries
must be accessed the same way. Catalog entries that must be frequently updated
can continue to be accessed through a server, while other catalog entries
that change very seldom can be stored in SAS catalogs that are accessed directly
by the users of the application.
While it is creating each observation, a process in a server's session that is interpreting a DATA step view does not yield control to allow other processes in the server to execute other users' requests. While DATA step views can be useful in a server, they must be used carefully. A DATA step view that requires a small amount of processing to create each observation will not prevent other processes in a server's SAS session from responding to other users' requests. But a DATA step view that contains many DO loops with many calculations and reads (or even writes) many records in external files or SAS data sets can take a very long time to create each observation. Such a DATA step view should not be interpreted in a server's session because it does not yield control until each observation is created.
If it is advantageous to your application for its DATA step views to be interpreted in a server's session, be sure that any external files read by the DATA step view are available to the server's SAS session.
|Tuning Options in SAS/SHARE Software|
SAS/SHARE software makes some assumptions about the relative values of resources. For example, SAS/SHARE software considers messages to be more expensive than memory so it attempts to use more memory to reduce the number of messages. The default values and behavior may not be optimum for your application, so you have the opportunity to adjust:
SAS/SHARE software automatically attempts to conserve the message resource by transmitting observations in groups whenever possible. Observations can always be transmitted in groups when a data set is being created or replaced, but when a data set is opened for update access it is never appropriate to transmit more than one observation at a time. The grouping of observations when a data set is opened for input depends on the situation; you control whether observations are transmitted in groups according to:
The factors that control how many observations are transmitted in each group are:
A server allocates a set of multi-observation transfer buffers, called MOTBs, during its initialization that are shared by all SAS files accessed through that server. On all Release 6.09E and 6.11 hardware platforms the default size of each MOTB is 32,768 bytes. To change this size from the default value, use the TBUFSIZE= option on the PROC SERVER statement.
The value of the TBUFSIZE= option is used by a server to automatically calculate how many observations to combine into each group for all of the data sets accessed through it. Therefore, this option can be the easiest way to cause a server to generally combine more or fewer observations into each group.
When the data sets accessed through a server have large observations, the number of observations transferred in each message is small when not many observations fit into 32,768 bytes. A small number of observations per message conserves the memory resource at the expense of the messages resource, but since messages tend to be expensive relative to memory that trade-off will probably not allow your application to perform as well as it could.
The number of observations per group is calculated by subtracting a small amount of overhead from the MOTB size and dividing the result by the length of one observation. For example, consider a SAS data set in which each observation is 10,000 bytes long. The default MOTB size is 32,768. In this case three observations are transmitted in each message when the application opens the data set for sequential access. To increase the number of observations per message to 6, specify the option TBUFSIZE=64k on the PROC SERVER statement.
An example of using the TBUFSIZE= option is:
PROC SERVER TBUFSIZE=128K <other PROC SERVER options>;
Independently of the TBUFSIZE= option's effect on a server's overall behavior, you can control the number of observations per group for individual data sets that are accessed through the server. For example, if you specify TOBSNO=3, three observations will be sent in each message.
The TOBSNO= option may be specified wherever SAS data set options are accepted: as an argument to the OPEN() function of SAS Screen Control Language, on the DATA= option of a SAS procedure, and on the SET, MERGE, UPDATE, and MODIFY statements of the DATA step. It must be specified for each data set for which you want grouping behavior different from the default.
When a data set is opened for input with a sequential access pattern, a server calculates the number of observations per group as the smallest of:
When a SAS data set is opened for input with a random access pattern, the default behavior is transmitting observations individually (the group size is one). This ensures that a user always receives up-to-date data when they position to an observation, and it reduces wasted communications bandwidth because no observations are transmitted to a user's session except the specific observations requested.
At other times, the TOBSNO= data set option may be used to increase the number of observations transferred in each group. For example, consider an SCL program in which a SAS data set dsid is passed to a DATALISTC() or DATALISTN() function. The data set is read from beginning to end by the function, and then the observation chosen by the user is reread. Since by default the OPEN() function of SCL specifies a random access pattern, observations for that dsid are transmitted individually. But the access pattern of the DATALISTC() and DATALISTN() functions is really skip sequential, so transmitting observations individually is not optimum. TOBSNO=4 could be specified in a case like this to reduce the number of messages by three-quarters. (Note that the user could change the open mode from 'I' to 'IN' as an alternative to specifying a TOBSNO data set option.)
The number of observations transmitted when a data set is opened for input is summarized below: An example of using the TOBSNO= data set option is:
PROC FSVIEW DATA=MYLIB.A(TOBSNO=10);
Since MOTBs are shared by all data sets accessed through a server, it is possible not to have enough MOTBs. The number of MOTBs a server allocates is specified by the TBUFNO= option on the PROC SERVER statement. By default, four MOTBs are allocated when a server initializes. When WHERE clauses are used frequently to request a subset of data from a server and evaluation of those WHERE clauses is not optimized by an index, a server can spend a long time satisfying each user's request for observations that satisfy a WHERE clause.
Each MOTB is assigned when a request for data is made and is not released until the request is satisfied. Therefore, when a process in a server's session takes a long time to satisfy a WHERE clause, an MOTB is assigned to that process for a long time. Such a situation reduces the server's ability to reuse MOTBs by assigning them to other processes; one solution to that problem is specifying a larger value for the TBUFNO= option. Of course, you should check the WHERE clauses being used to ensure that they can be evaluated as efficiently as possible.
When a user requests one or more observations and a server cannot assign an MOTB to the process that will execute the user's request, the server writes a message to its SAS log, waits for a brief period of time, and tries again to assign an MOTB to the process for the user's request. If there are still no MOTBs available, the sequence repeats. You can tell when the number of MOTBs is not sufficient by the presence of these messages in a server's SAS log.
An example of using the TBUFNO= option is:
PROC SERVER TBUFNO=8 <other PROC SERVER options>;
Consider each SAS data view used by your application and determine whether the view should be interpreted in the server's SAS session or the users' SAS sessions. You decide where to have a view interpreted according to:
Some PROC SQL views are especially good candidates for interpretation in a server's SAS session because the number of observations produced by the view is much smaller than the number of observations read by the view, the data sets read by the view are available to the server and the amount of processing necessary to build each observation is not large.
Other PROC SQL views should be interpreted in users' SAS sessions because the number of observations produced by the view is not appreciably smaller than the number of observations read by the view, some of the data sets read by the view can be directly accessed by the users' SAS sessions, and the amount of processing done by the view is considerable.
By default, SAS data views are interpreted in a server's SAS session, but the RMTVIEW= option of the LIBNAME statement enables you to have the views in a library interpreted in users' SAS sessions instead. The NORMTVIEW option on the PROC SERVER statement enables you to prevent all SAS data views from being interpreted in the server's session.
SAS/ACCESS views do not provide update access to the underlying database when they are interpreted in a server's session, so it is often more practical to interpret SAS/ACCESS views in users' SAS session.
If it is useful for your application to have a SAS/ACCESS view interpreted in a server's session, make sure that all of the necessary database interface components are available to the server's session.
If a user's SAS session is capable of using a SAS/ACCESS interface engine to access the underlying database, it is more efficient to execute the SAS/ACCESS interface engine in the user's SAS session. Note that in this case it may be convenient to store the view file in a SAS library that is accessed through a server if the view will be updated frequently and used by more than one user.
Like SAS/ACCESS views, DATA step views are very often most useful when interpreted in users' SAS sessions. See "Know your application's DATA step views", above, for more information about interpreting DATA step views in a server's session.
See Introductionfor a complete description of the RMTVIEW= option of the LIBNAME statement.
Examples of specifying the RMTVIEW= and NORMTVIEW options are:
LIBNAME MYLIB 'my SAS data library' RMTVIEW=YES <other LIBNAME options>; PROC SERVER NORMTVIEW <other PROC SERVER options>;
Some components of SAS yield control periodically and can be directed to do so more or less frequently than their default rate. These components are called long-running processes and include evaluating WHERE clauses and interpreting PROC SQL views.
Changing the rate at which control is yielded is delicate because the act of yielding control consumes some CPU resource: increasing the frequency at which control is yielded increases a server's CPU consumption all by itself. You can change the rate at which the processes in a server yield control by varying the value of the PROC SERVER option LRPYIELD=. The default value of this option is 10,000; the option has no units.
To make long-running processes yield relatively more frequently, specify a value greater than 10,000. While a higher value may have the effect of providing more even response time to a server's users, this comes at the expense of increased consumption of the server's CPU resource. Also, the processes that run for a long time run even longer when they are asked to yield more frequently.
To make a long-running process yield less frequently, specify a value smaller than 10,000. A lower LRPYEILD= value may make some individual user requests (like an SQL join with a sort) complete sooner, but the server's other users are forced to wait as the long-running process does more work before it yields control. Response time can become more uneven when long-running processes are allowed to yield less frequently.
This option is documented in The SERVER Procedure.
An example of specifying the LRPYIELD= option is:
PROC SERVER LRPYIELD=5000 <other PROC SERVER options>;
This is not an option you specify on a SAS program statement; instead it is a method of managing the workload of concurrent access to SAS data sets.
If you determine that a server is consuming too much of a resource and you can not reduce the server's consumption of that resource any further, creating an additional server allows you to divide your applications' workload among several servers.
SAS/SHARE software includes a family of SAS macros that help you manage SAS file access through multiple servers. Those macros are documented in Using SAS/SHARE Macros for Server Access
|SAS System Options|
SAS software has several SAS I/O tuning options. The options that are most relevant to applications that access data through a server are:
When a file is created, use the BUFSIZE= data set option to specify the size of the pages of the file. The SAS default page size is optimum for files that are processed sequentially, but it may not be optimum when the observations of a file are accessed in random order. PROC CONTENTS shows the page size of a SAS data file.
You might find it useful to balance the pattern in which a file is randomly accessed against the number of observations stored on each page of the file. If most random access sequences access observations in very different locations in the file, then a small page size will improve performance because most of the observations on each page are not used. On the other hand, if most random access sequences are likely to be to observations that are physically near each other in the file, you might be able to take advantage of a large page size to have many of the observations read from the file into the server's memory at once.
If you want to keep all or most of a SAS data file in memory, you can choose a very large page size. Of course, this can consume a lot of the server's memory so you should only use such a page size when you really want to. If you expect that not much data from a large file will need to be in memory at one time, choose a small page size to make reading and writing each page as fast as possible.
If you find that your server is spending a significant amount of time waiting for I/O operations to complete, consider recreating the files that are not used for sequential access with a smaller page size.
An example of using the BUFSIZE= data set option is:
DATA MYLIB.A(BUFSIZE=6K); SET MYLIB.A; RUN;
This option is used to cause a SAS data file to be stored in compressed format. Compressing files usually makes them smaller, so a server is able to read more observations per I/O request when a file is compressed. The reduction in file size for a compressed file (which conserves the I/O resource) is paid for, though, by an increase in the consumption of the CPU resource (which is used to decompress the observations as the file is read). If your server is CPU-bound, compression could do more harm than good, but if your server is I/O-bound, compression could reduce its consumption of the I/O resource.
Top of Page
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.