|SAS/ACCESS Software for Relational Databases: Reference|
Many of the new SAS/ACCESS LIBNAME options improve performance when you access data on your DBMS. This section provides details on some of these options. See SAS/ACCESS LIBNAME Statement and SAS/ACCESS Data Set Options for additional information.
Note: Some of the options
discussed in this section might not be available for your DBMS. See your DBMS
chapter for more information.
|Passing WHERE Clauses to the DBMS|
The DBKEY= data set option should be used in contexts in which you want to join a large DBMS table and a relatively small SAS data set. If these conditions are not satisfied, the options will not improve performance and, in some cases, they might impact performance negatively.
Here is an example of how performance is improved by using this option:
data keyvalues; deptno=30; output; deptno=10; output; run; libname dblib oracle user=testuser password=testpass path='myorapath'; proc sql; select bigtab.deptno, bigtab.loc from dblib.dept bigtab, keyvalues smallds where bigtab.deptno = smallds.deptno; quit;
In this example, the SQL procedure retrieves all the rows in the large ORACLE table DEPT and applies the WHERE clause during SQL procedure processing in SAS. Processing can be both CPU-intensive and I/O-intensive, if MYTABLE is large. Use the DBKEY= option with the previous example and compare performance:
proc sql; select bigtab.deptno, bigtab.loc from dblib.dept(dbkey=deptno) bigtab, keyvalues smallds where bigtab.deptno = smallds.deptno; quit;
In this example, the DBKEY= option instructs the SQL
procedure to pass the WHERE clause to the SAS/ACCESS engine in a form similar
where deptno=host-variable. The engine then passes this optimized
query to the DBMS server. The host-variable is substituted, one
at a time, with DEPTNO values from the observations in the SAS data file KEYVALUES.
As a result, only rows that match the WHERE clause are retrieved from the
DBMS. Without this option, PROC SQL retrieves all the rows from the DEPT table.
The SQL statement that is created by the SAS/ACCESS engine and passed to the DBMS is similar to the following;
select deptno, loc from bigtab.deptno where deptno=:hostvariable;The host-variable takes the value of the DEPTNO variable from the SAS data file KEYVALUES. The number of SELECT statements issued is equal to the number of rows in the data file. Therefore, for improved performance, the SAS data file should contain relatively fewer rows than the DBMS table to which it is being joined.
The DBKEY= option can also be used in a SAS DATA step, with the KEY= option in the SET statement, to improve the performance of joins. Specify a value of KEY=DBKEY in this situation. The following DATA step creates a new data file by joining the data file KEYVALUES with the DBMS table MYTABLE. The variable DEPTNO is used with the DBKEY= option to cause a WHERE clause to be issued by the SAS/ACCESS engine.
data sasuser.new; set sasuser.keyvalues; set dblib.mytable(dbkey=deptno) key=dbkey; run;
You can also use the DBINDEX= option instead of the DBKEY= option if you know that the DBMS table has one or more indexes that use the column(s) on which the join is being performed. Use DBINDEX=index-name if you know the name of the index, or use DBINDEX=YES if you do not know the name.
|Sorting DBMS Data|
Sorting DBMS data can be resource-intensive, whether it is done using the SORT procedure, a BY statement, or an ORDER BY clause in an SQL procedure's SELECT statement. You should sort data only when sorted data are needed for your program. The following list includes guidelines and information about sorting data:
If you reference this same data set in a SAS program and the program includes a BY statement for a variable that corresponds to a column in the DBMS table, the SAS/ACCESS LIBNAME engine automatically generates an ORDER BY clause for that variable. Thus, the ORDER BY clause causes the DBMS to sort the data before the SAS procedure or DATA step uses the data in a SAS program. If the DBMS table is very large, this sorting can adversely affect your performance. Use a BY variable that is based on an indexed DBMS column to help reduce this negative impact.
libname mydblib oracle user=sasha password=bee path="hrdb_023" schema=hrdept; data exec_employees; set mydblib.staff (keep lastname firstname idnum); by seniority; where salary >= 150000; run; proc sql; select * from exec_employees order by salary;
|Passing Joins to the DBMS|
In versions prior to Version 7, an SQL query involving one or more DBMS tables (or view descriptors, in Version 6) was processed by SAS as if the DBMS tables were individual SAS data files. The SQL procedure fetched all the rows from each DBMS table and performed the join within the SAS. This algorithm performed poorly, especially if the tables were large and SAS and the DBMS communicated over a network.
A more efficient method is to let the DBMS perform the join and return only the results of the join to the client, in this case, SAS. In Version 7 and later, this is how joins are processed which provides a major performance enhancement for programs that perform joins across tables in a single DBMS.
For example, assume two large DBMS tables named TABLE1 and TABLE2 have a column named DEPTNO, and you want to retrieve the rows from an inner join of these tables where the DEPTNO value in TABLE1 is equal to the DEPTNO value in TABLE2.
proc sql; select tab1.deptno, dname from dblib.table1 tab1 dblib.table2 tab2 where tabl.deptno = tab2.deptno using libname dblib oracle user=scott password=tiger path='myserver'; quit;
In Version 8, this join between two tables within the same library (where the libref references a database) is detected by the SQL procedure and passed by the SAS/ACCESS engine directly to the DBMS. The DBMS processes the inner join between the two tables, and only the resulting rows are passed back to the SAS System. Both inner and outer joins between two or more DBMS tables are supported in this enhancement.
Top of Page
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.