SAS Companion for the OpenVMS Environment
The concurrency (CONCUR) engine allows concurrent read and write access to data sets. Note that the concurrency engine supports only SAS data sets. It does not support SAS files of member types other than DATA, such as INDEX or CATALOG.
In contrast to the V7 engine, the CONCUR engine does not support indexing
and compression of observations. The CONCUR engine can only access files within a single machine or OpenVMS cluster; access to SAS data sets on other operating
environments and concurrent read/write access to SAS data sets across DECnet are features that are provided by SAS/SHARE
software. For more information
about using SAS/SHARE software, refer to SAS/SHARE User's Guide. The CONCUR engine is optimized for random
concurrent access, while the V7 engine is better suited to sequential access. So, for example, if you intend to use the FSEDIT procedure or the POINT= option in the SET statement to access your data
randomly, the CONCUR engine may be the best choice for you, even if you do not need any of the concurrent
access capabilities.
Version 7 of the SAS System introduces support for several new features related to data sets. The CONCUR engine
supports many of these features: member names with lengths up to 32 characters; variable names with lengths up to 32 characters; and member or variable labels with lengths up to 256 characters. Note
that while the CONCUR engine supports the creation and access of Version 6 format
files, the long character strings are not allowed when accessing or creating a Version 6 concurrency engine file. For more information about support for these
longer character strings, see SAS Language Reference: Concepts.
There
are three ways to select the CONCUR engine:
The CONCUR engine creates and accesses SAS data sets in a format that allows record-locking and file-sharing.
- CAUTION:
- SAS data sets that are created with the CONCUR engine are not interchangeable with SAS data
sets that are accessed and created with any other engine.
If you plan to share a
particular SAS data set, create it using the CONCUR engine.
If you have a SAS data set that you want to share after it is
created, you can copy it, using the CONCUR engine as the output engine. Then it will be in the correct format for sharing. For example, if you want shared update access to a data set that was created
using the V7 engine, you can use the following statements to convert it:
libname inlib v7 '[mydir.base]';
libname outlib concur '[mydir.share]';
proc copy in=inlib out=outlib;
run;
After you run this SAS program, all SAS data sets that are created with the V7 engine in the data library that is referenced by INLIB are copied to the data library
referenced by OUTLIB using the CONCUR engine. To create data sets using the CONCUR
engine, your directory must have a version limit greater than 1.
The CONCUR engine supports the Version 7 member type
DATA.
Several concurrency engine options control the creation and access of SAS data
sets. Most of these options have direct correlation to options available through OpenVMS Record Management Services (RMS). The
CONCUR engine creates relative organization files with
record-locking enabled.
Note:
Data sets created with the CONCUR engine have a maximum observation length of 32,767 bytes.
You can use the
following engine/host options with the CONCUR engine:
-
ALQ=
-
specifies the number of OpenVMS disk blocks to allocate initially to a data set when it is created. The value can range from 0 to
2,147,483,647. If the value is 0, the minimum number of blocks required for a sequential file is used. The ALQ= option defaults to the bucket size. OpenVMS RMS
always rounds the value up to the next disk cluster boundary.
The ALQ= option (allocation quantity) corresponds to the FAB$L_ALQ
field in OpenVMS RMS. For additional details, see the data set option ALQ= and
Guide to OpenVMS File Applications.
-
BKS=
-
specifies the number of OpenVMS disk blocks in each bucket of the file. The value can range from 0 to 63. If the value is 0,
the bucket size used is the minimum number of blocks needed to contain a single observation. The default value is 32.
When deciding on the bucket size to use, consider
whether the file is usually accessed randomly (small bucket size), sequentially (large bucket size), or both (medium bucket size). The bucket size is a permanent attribute of the file, so this option
applies to output files only.
The BKS= option (bucket size) corresponds to the FAB$B_BKS field in
OpenVMS RMS or the FILE BUCKET_SIZE attribute when using File Definition Language (FDL). For additional details, see the
data set option BKS= and Guide to OpenVMS File
Applications.
-
DEQ=
-
specifies the number of OpenVMS disk blocks to add each time OpenVMS RMS automatically
extends a data set during a write operation. The value can range from 0 to 65,535. OpenVMS RMS always rounds the value up
to the next disk cluster boundary. A large value can result in fewer file extensions over the life of the file; a small
value results in numerous file extensions over the life of the file. A file with numerous file extensions that may be noncontiguous slows record access.
If the value
specified is 0, OpenVMS RMS uses the default value for the process. The DEQ= option defaults to the bucket size.
The DEQ= option
(default file extension quantity) corresponds to the FAB$W_DEQ field in OpenVMS RMS. For additional details, see the data
set option DEQ= and Guide to OpenVMS File
Applications.
-
FILEFMT=
-
specifies the file format, or version of the engine, to use. Allowed values are 606, 607, and 701. The default value is 701. There was an internal file format change
between Release 6.06 and Release 6.07, and again between Version 6 and Version 7. The concurrency (CONCUR) engine can create and access all versions of the file format. When you access a file for
input or update, the CONCUR engine detects the correct version of the existing file. When you create a new file, the CONCUR engine defaults
to creating a Version 7 format file unless overridden by the FILEFMT= option.
The following example shows how to create a file in Release 6.07 format:
libname clib concur '[]';
data clib.v607 (filefmt=607);
. . . more SAS statements . . .
run;
-
HOSTFMT=
-
specifies the host platform format for a data set. The concurrency (CONCUR) engine can create and access files for both OpenVMS Alpha and OpenVMS VAX. Valid values
are ALPHA or VAX, respectively. By default the data set is created in the native format of the platform on which SAS is running. You may use the HOSTFMT= option to specify that the data set should be
created in a different representation. This is similar to using the Version 7 data set option OUTREP= to specify a data representation
in a non-native format. The use of HOSTFMT= and OUTREP= options is equivalent. HOSTFMT= is supported for compatibility with Version 6.
In the following example, the two
data steps produce the same results:
data clib.vaxfile (hostfmt=vax);
. . . more SAS statements . . .
run;
data clib.vaxfile (outrep=vax_vms);
. . . more SAS statements . . .
run;
For more information about the OUTREP= data set option, see SAS Language Reference:
Dictionary.
-
MBF=
-
specifies the number of I/O buffers you want OpenVMS RMS to allocate for a particular
file. The value can range from 0 to 127, and it represents the number of buffers to use. By default, this option is set to 2 for files opened for update and 1 for files opened for input or output. If
the value 0 is specified, the process' default value is used.
The MBF= option (multibuffer count) corresponds to the RAB$B_MBF field
in OpenVMS RMS or the CONNECT MULTIBUFFER_COUNT attribute when using FDL. For additional details, see the data set option
MBF= and Guide to OpenVMS File
Applications.
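The following is a minimal sketch of how these engine/host options might be combined in a LIBNAME statement. The libref CLIB, the directory, the data set name, and all numeric values are assumptions chosen only for illustration; they fall within the ranges listed above but are not tuned recommendations.

```sas
/* Hypothetical libref and directory; option values are illustrative only. */
libname clib concur '[mydir.share]'
        alq=2000   /* initial allocation, in disk blocks        */
        deq=500    /* blocks added at each automatic extension  */
        bks=8      /* blocks per bucket (0 to 63)               */
        mbf=4;     /* I/O buffers that RMS allocates (0 to 127) */

/* Data sets created through CLIB pick up the options above. */
data clib.orders;
   do orderid = 1 to 1000;
      amount = orderid * 10;
      output;
   end;
run;
```

Because OpenVMS RMS rounds ALQ= and DEQ= up to the next disk cluster boundary, the space actually allocated may be somewhat larger than the values requested.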
The CONCUR engine recognizes all data set options that are documented in SAS Language Reference:
Dictionary except the FILECLOSE=, COMPRESS=, and REUSE= options. Of special
importance to the CONCUR engine is the portable data set option CNTLLEV=. (For details,
see CNTLLEV=.) Other data set options that are likely to be useful include
LOCKREAD= and LOCKWAIT=. (For details, see
LOCKREAD= and LOCKWAIT=.) For more information,
refer to SAS Language Reference: Dictionary.
The engine/host options that are discussed in
Engine/Host Options for the CONCUR Engine can also be used as data set options when you use the CONCUR engine.
For details, see Specifying Data Set Options.
The CONCUR engine does not use the values of any SAS system
options.
The CONCUR engine supports both creation and reading of files across
DECnet, but not the updating of files across DECnet. You are allowed to create and read files because the engine uses
multistreaming only when the file is opened for update. Support of DECnet
access means you can now specify a node name in the physical pathname of your SAS data library, as long as you do not plan to update the data sets stored in the data library. The following is an
example:
libname mylib concur 'mynode::bldgc:[testdata]';
The CONCUR engine supports SAS System passwords. The syntax and behavior are the same as for passwords used with the V7 (BASE) engine.
This section describes the internal structure of a concurrency engine data set. If you are familiar
with OpenVMS RMS, it may be helpful to know the internal file format of a concurrency engine data set.
A concurrency engine data set
is a relative format file. The record length is determined by the length of one observation, with a minimum length of 8 bytes. Because the data set is a relative
format file, the maximum observation length of a concurrency engine data set is 32,767 bytes. The first portion of the file contains header records that provide
information to the engine concerning the number of observations
in the file, the number of variables, some positioning information to optimize access, the date and time, SAS System release, operating environment the data set was created on, and so
forth.
Following the header information is information pertaining to each individual variable in the file. A NAMESTR is stored for each variable on the data set. The
NAMESTR includes the variable name, type, label, and size. Multiple NAMESTRs are stored in a single record, up to the maximum number of NAMESTRs that the record length
accommodates.
After the NAMESTRs, the observations begin. There is always one observation per record. With one exception, the record length is the observation length. If
the observation length is less than 8 bytes, the record length defaults to 8. If you delete a record in a relative format
file, the record still exists in the file, but it is marked as deleted.
Note:
In a concurrency engine data set, a data set of deleted observations takes the same amount of
disk space as a data set of valid observations. To remove the deleted observations, you must use the COPY procedure and copy the data set to a new data set type, such as a data set created with the V7
or V7TAPE engine.
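As a sketch of the approach described in the note, the following statements copy a concurrency engine data set to a V7 library so that the copy contains only the valid observations. The librefs, directories, and the data set name MASTER are hypothetical.

```sas
/* Hypothetical librefs and directories. */
libname shared concur '[mydir.share]';
libname packed v7     '[mydir.base]';

/* Copy one data set; the V7 copy does not carry the deleted observations. */
proc copy in=shared out=packed;
   select master;
run;
```

If the data set needs to be shared again afterward, it can be copied back with the CONCUR engine as the output engine, as described earlier for converting V7 data sets.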
Although all record-locking capabilities are provided through the use of OpenVMS RMS features, some
file-sharing capabilities are provided by OpenVMS RMS and some are provided by the engine itself. The engine can correctly set the share options of a file when
the file is opened for input or update, because the SAS System uses the name of the existing data set directly. However, output data sets are created with
a temporary name and then renamed to the actual data set name after the data set is closed. This ensures the integrity of existing data sets of the same name in case an error occurs during creation of
the new data set. Therefore, the engine must handle all file-sharing issues that disallow sharing of output files. This is done through the locking of specific filenames, which is why your directory
must have a version limit of at least 2 to create concurrency engine data sets.
Engine performance is often a trade-off between various factors. This section provides
you with the necessary information so that you can optimize the performance of the CONCUR engine in your operating environment. By controlling the size and
number of buffers, you can specify how the SAS System accesses your
data. By specifying the data set options, you can control the level and amount of data that are accessed. The amount of disk space available for these operations also affects engine
performance.
Depending on the type of record access your SAS application performs, you need to consider both the size of buffers (bucket size) and the number of
buffers (multibuffer count). For complete details about specifying the size and number of buffers, see the BKS= and MBF= data set options.
The two extremes of record access are records that are accessed completely sequentially
or completely randomly. For example, many SAS procedures typically access data sets sequentially, processing the records from first to last. On the other hand, you may access observations in a
completely random order when using the FSEDIT procedure to edit or browse observations in a data set.
There are also cases in which records are accessed randomly but may be
reaccessed frequently. One example is an application that uses a data set in which particular observations contain information that is referred to frequently. Again, using the FSEDIT procedure as an
example, the data set can be designed in such a way that you must access the first observation followed by observation 200, then the first observation again followed by observation 300, and so
on.
Finally, there are cases in which records are accessed randomly, but then adjacent records are likely to be accessed. An application can use the POINT= option in a SET
statement to selectively input the first 10 observations out of every 100 observations.
Most often, an application accesses a data set by a combination of several of these
methods. The following list gives suggestions for the number of buffers and bucket size you should use for each method:
-
completely sequential or random access
-
is most efficient with a single buffer. However, the bucket size differs:
-
random access
-
is more efficient with a smaller bucket size.
-
sequential access
-
is more efficient with a larger bucket size.
-
random access with reaccessed records
-
is most efficient with multiple buffers to keep the reaccessed records in the buffer cache. You should use a small bucket size in this
instance.
-
random access with subsequent adjacent access
-
is most efficient with a single buffer. However, use a larger bucket size so that more records are stored in the buffer cache. This increases the probability that the
required records have been read into memory with a single I/O.
If your program accesses the data
set by several methods, you must find a compromise between the number of buffers and bucket sizes. This is what the SAS System attempts to do with the defaults, because the intended use of the file is
unknown. Because you know the intended use of your CONCUR engine data sets, you can improve the CONCUR engine's
performance by optimizing the buffer settings.
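These guidelines can be illustrated with a short sketch. The libref, data set names, variables, and option values below are assumptions for illustration only; suitable values depend on your observation length and access pattern.

```sas
libname clib concur '[mydir.share]';

/* The bucket size is fixed when the file is created:         */
/* a small bucket size for a file that is accessed randomly.  */
data clib.lookup (bks=2);
   do key = 1 to 500;
      value = key * 2;
      output;
   end;
run;

/* Random access with reaccessed records: request several     */
/* buffers so that observation 1 can stay in the buffer cache. */
data work.hits;
   do obsnum = 1, 200, 1, 300;
      set clib.lookup (mbf=8) point=obsnum;
      output;
   end;
   stop;   /* POINT= does not detect end of file, so stop explicitly */
run;
```

For a file that is mostly read from first to last, the same pattern would instead use a larger BKS= value at creation and a single buffer.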
Several data set options are portable options that are available for all engines, but they
are particularly useful in conjunction with the concurrency engine.
-
CNTLLEV=
-
specifies the level of access (control level) to the data set, whether concurrent or exclusive. If you decide to create a concurrency engine data set to take
advantage of its random access optimizations, but you do not need to provide for concurrent access at this time, you can use the CNTLLEV= data set option to
further improve performance. By default, when using the concurrency engine, data sets that are opened for input allow shared read
access, data sets that are opened for output allow no sharing, and data sets that are opened for update allow shared update access. When sharing is allowed, record-level locking is enabled. When you
do not need this feature, you can reduce the overhead of record locking by using CNTLLEV=MEM to disable the sharing.
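The CNTLLEV= data set option takes one of two values: MEM (member-level control, which gives exclusive access to the data set and disables record-level locking) or REC (record-level control, which allows concurrent access with record-level locking).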
Each SAS procedure specifies a required control level to the engine, depending on
the intended access of the observations. If you use CNTLLEV=REC and the SAS procedure requires member-level control to ensure the integrity of the data during
processing, a warning is written to the SAS log indicating that inaccurate or unpredictable results can occur if the data set is being updated by another process during the
analysis.
A common example of improving performance by overruling the CNTLLEV default of the procedure is with the FSEDIT procedure, which uses a default of CNTLLEV=REC. A session using the FSEDIT procedure with a concurrency engine data set does not need to incur the overhead of record-level locking if concurrent
access is not required. By using the data set option CNTLLEV=MEM, the application tells the engine to override
the control level specification of the procedure because exclusive access at the member level is desired. This disables record-level locking, decreases the overhead for processing the data set, and
improves performance. In tests using the SET statement to input a concurrency engine data set, a step that used the CNTLLEV=MEM option ran in one-third
of the CPU time of the same step using the CNTLLEV=REC option.
For syntax and usage examples for the
CNTLLEV= data set option, see CNTLLEV= and SAS Language Reference:
Dictionary.
-
FIRSTOBS= and OBS=
-
specify a beginning and ending observation to subset your data set.
The value of the FIRSTOBS= data
set option specifies the first observation that should be included for processing in the SAS DATA step. Some engines have to read the records sequentially, discarding them until the requested
observation is reached. Because a concurrency engine data set is a relative format file, the engine can directly access the beginning observation without having to first read any other observations in
the file.
Using the OBS= data set option to specify the last observation that you want to process can improve performance by terminating the input of observations without
having to read records until the end-of-file character is reached.
For more information about the FIRSTOBS= and OBS= data set options, see SAS Language Reference:
Dictionary.
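The data set options described above might be combined as in the following sketch. The libref and the data set name MASTER are hypothetical, and the observation numbers are arbitrary.

```sas
libname clib concur '[mydir.share]';

/* CNTLLEV=MEM requests exclusive member-level control, so     */
/* record-level locking is disabled for this step.             */
/* FIRSTOBS= and OBS= limit input to observations 101 to 200.  */
data work.slice;
   set clib.master (cntllev=mem firstobs=101 obs=200);
run;
```

CNTLLEV=MEM is appropriate here only if no other process needs to update CLIB.MASTER while the step runs.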
You can use the POINT= option in a SET statement to access contiguous ranges of
observations. For example, with the POINT= option, the SAS program can read observations
10 through 50, then observations 90 through 150, and so on. Reading only the records that you actually need improves performance by decreasing the number of records that you must access. Because
a concurrency engine data set is a relative format file, the engine can access the required records directly.
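A minimal sketch of this kind of range access follows. The libref and data set name are hypothetical, and the step assumes that CLIB.MASTER contains at least 150 observations.

```sas
libname clib concur '[mydir.share]';

data work.ranges;
   /* Read observations 10-50, then 90-150, by observation number. */
   do obsnum = 10 to 50, 90 to 150;
      set clib.master point=obsnum;
      output;
   end;
   stop;   /* POINT= does not detect end of file, so stop explicitly */
run;
```

The POINT= variable (OBSNUM here) is temporary and is not written to the output data set.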
For most data sets, the disk space that is required for a CONCUR engine data set is comparable to that required for a V7 engine data set.
For data sets in which the number of observations is greater than the number of variables, concurrency engine data sets are usually smaller. Conversely, for a concurrency
engine data set that has many variables and only a few observations, space may be wasted.
However, there is a file format for both uncompressed and compressed
data sets that makes the V7 engine disk space usage more efficient.
Performance is a
main concern for many applications, so it is useful to know how the CONCUR engine compares to the V7 engine when various features of the SAS System are used:
-
Creating data sets
-
When you compare the creation and sequential input of data sets using each engine, the V7 engine tends to be faster when the data sets are small. However, as the size
of the data set increases, the V7 and CONCUR engines are comparable in CPU time used. In all cases, the page faults that are incurred for the CONCUR engine are substantially fewer than for the V7
engine.
-
Accessing existing data sets
-
When you compare random access of an existing file using both engines, the concurrency engine is much faster. When you use a large bucket size in the concurrency
engine, with a comparable page size in the V7 engine, the concurrency engine takes approximately one-half as much CPU time. When the bucket size and page size are small, the concurrency engine takes
about one-third as much CPU time. Again, page faults for the concurrency engine are substantially fewer.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.