|SAS/ACCESS Software for Relational Databases: Reference|
|Joining Data from Various Sources|
The SQL procedure provides another way to select and combine data. For example, suppose you have three data sets: two view descriptors, VLIB.CUSPHON and VLIB.CUSORDR, which are based on the ORACLE tables CUSTOMERS and ORDERS, respectively, and a SAS data file, DLIB.OUTOFSTK, which contains product names and numbers that are out of stock.
Note: See the appendix for a description of DLIB.OUTOFSTK.
You can use the SQL procedure to create a view that joins the data from these three sources and displays their output. The SAS WHERE or subsetting IF statements would not be appropriate in this case because you want to compare variables from several sources, rather than simply merging or concatenating the data.
The following SAS statements select and combine data from the view descriptors and the SAS data file to create a PROC SQL view, SLIB.BADORDR. SLIB.BADORDR retrieves customer and product information that the sales department uses to notify customers of unavailable products.
Note: Although this example shows you an alternate way to join data by using
access descriptors, SAS/ACCESS now provides more efficient ways to join your
data by using the LIBNAME statement.
proc access dbms=oracle; /* create access descriptor */ create adlib.customr.access; user=scott; orapw=tiger; path='myorapath'; table=customers; list all; /* create vlib.cusphon view */ create vlib.cusphon.view; select customer phone name; rename customer = custnum; run; /* create access descriptor */ proc access dbms=oracle; create adlib.orders.access; user=scott; orapw=tiger; path='myorapath'; table=orders; list all; /* create vlib.cusordr view */ create vlib.cusordr.view; select ordernum stocknum shipto; rename ordernum ordnum; format ordernum 5.0 stocknum 4.0; run; proc sql; create view slib.badordr as select distinct cusphon.custnum, cusphon.name, cusphon.phone, cusordr.stocknum, outofstk.fibernam as product from vlib.cusphon, vlib.cusordr, dlib.outofstk where cusordr.stocknum=outofstk.fibernum and cusphon.custnum=cusordr.shipto; quit;
The CREATE VIEW statement incorporates a WHERE clause as part of its SELECT clause. The DISTINCT keyword eliminates any duplicate rows of customer numbers that occur when companies order an unavailable product more than once.
Note: It is recommended that you not include an ORDER BY clause in a CREATE VIEW statement. This causes
the output data to be sorted every time the PROC SQL view is submitted and
might have a negative impact on performance. It is more efficient to add
an ORDER BY clause to a SELECT statement that displays your output data, as
options linesize=120; title 'Data Described by SLIB.BADORDR'; select * from slib.badordr order by custnum, product;
This SELECT statement uses the PROC SQL view SLIB.BADORDR to display joined ORACLE and SAS data in ascending order by the CUSTNUM column and then by the PRODUCT (that is, FIBERNAM) column. The data is ordered by PRODUCT because one customer might have ordered more than one product. To select all the columns from the view, use an asterisk (*) in place of column names. When an asterisk is used, the columns are displayed in the order specified in the SLIB.BADORDR view. Data Described by the PROC SQL View SLIB.BADORDR shows the data described by the SLIB.BADORDR view.
Data Described by the PROC SQL View SLIB.BADORDR
Data Described by SLIB.BADORDER CUSTOMER NAME PHONE STOCKNUM PRODUCT -------------------------------------------------------------------------------------------------------- 15432147 GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS 616/582-3906 4789 dacron 18543489 LONE STAR STATE RESEARCH SUPPLIERS 512/478-0788 8934 gold 29834248 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY (0552)715311 3478 olefin 31548901 NATIONAL COUNCIL FOR MATERIALS RESEARCH 406/422-3413 8934 gold 43459747 RESEARCH OUTFITTERS 03/734-5111 8934 gold
Although the query uses SAS variable names like CUSTNUM, you may notice that the output uses DBMS column names like CUSTOMER. By default, PROC SQL displays SAS variable labels, which default to DBMS column names. (You can use the NOLABEL option to change this default.)
|Creating New Columns and Using the GROUP BY Clause|
Instead of creating a new PROC SQL view, you might want to summarize your data and create new columns in a report. Although you cannot use the ACCESS procedure to create new columns, you can easily do this by using the SQL procedure with data that is described by a view descriptor.
This example uses the SQL procedure to retrieve and manipulate data from the view descriptor VLIB.ALLEMP, which is based on the DB2 table SASDEMO.EMPLOYEES. When this query (as a SELECT statement is often called) is submitted, it calculates and displays the average salary for each department. The query enables you to manipulate your data and display the results without creating a SAS data set.
Because this example reports on employees' salaries, the view descriptor VLIB.ALLEMP is assigned a SAS System password (MONEY) using the DATASETS procedure. Because of the READ= level of protection, the password must be specified in the PROC SQL SELECT statement before you can see the DB2 data accessed by VLIB.ALLEMP.
In the following example, the DISTINCT keyword in the SELECT statement removes duplicate rows. The AVG function in the SQL procedure is equivalent to the SAS MEAN function.
options linesize=80; proc access dbms=db2; /* create access descriptor */ create adlib.employ.access; ssid=db2; table=sasdemo.employees; assign=yes; format empid 6.0 salary dollar12.2 jobcode 5.0 birthdate date9. hiredate date9.; list all; run; /* create vlib.allemp view */ proc access dbms=db2 accdesc=adlib.employ; create vlib.allemp.view; select all; run; /* assign a password */ proc datasets library=vlib memtype=view; modify allemp (read=money); run; /* example */ title 'Average Salary Per ACC Department'; proc sql; select distinct dept, avg(salary) label='Average Salary' format=dollar12.2 from vlib.allemp(pw=money) where dept like 'ACC%' group by dept;
The columns are displayed in the order specified in the SELECT clause of the query. Data Retrieved by an SQL Procedure Query shows the result of the query.
Data Retrieved by an SQL Procedure Query
Average Salary Per ACC Department Average DEPT Salary -------------------- ACC013 $54,591.33 ACC024 $55,370.55 ACC043 $75,000.34
To delete a password on an access or view descriptor or any SAS data set, put a slash after the password:
/* delete the password */ proc datasets library=vlib memtype=view; modify allemp (read=money/); run;
|Using Advanced PROC SQL Features|
This example combines a number of PROC SQL components including summary functions, a HAVING clause with a subquery, and an inline view within that subquery. The reason you use each component and the way in which each is evaluated are described following the example. The example displays the employees who took the most orders that have already shipped. It uses the VLIB.ALLEMP and VLIB.ALLORDR view descriptors to join data from the tables SASDEMO.EMPLOYEES and SASDEMO.ORDERS.
options ls=80; proc access dbms=db2; /* create access descriptor */ create adlib.employ.access; database=sample; table=sasdemo.employees; assign=yes; format empid 6.0 salary dollar12.2 jobcode 5.0 birthdate date9. hiredate date9.; /* create vlib.allemp view */ create vlib.allemp.view; select all; list view; run; proc access dbms=db2; /* create access descriptor */ create adlib.order.access; database=sample; table=sasdemo.orders; assign=yes; rename dateorderd = dateord processdby = procesby; format dateorderd datetime9. shipped datetime9. ordernum 5.0 length 4.0 stocknum 4.0 takenby 6.0 processdby 6.0 fabcharges 12.2; /* create vlib.allordr view */ create vlib.allordr.view; select all; list view; run;
proc sql; title 'Employees Who Took the Most Orders That Shipped'; select distinct lastname label "Took Orders", takenby, count(shipped) as ordship from vlib.allemp, vlib.allordr where takenby=empid group by takenby having ordship= (select max(ordship) from (select distinct takenby, count(shipped) as ordship from vlib.allemp, vlib.allordr where takenby=empid group by takenby)); quit;You begin to evaluate the query at its innermost level. Here the query begins by evaluating the inline view:
from (select distinct takenby, count(shipped) as ordship from vlib.allemp, vlib.allordr where takenby=empid group by takenby));
The inline view lists employees who have taken orders and counts the number of orders that have been shipped. A column alias ORDSHIP is assigned to the number of orders and is used elsewhere in the query.
TAKENBY ORDSHIP ----------------- 119012 6 212916 0 234967 0 321783 6 456910 5
The SELECT MAX(ORDSHIP) clause uses the results of the inline view to determine the highest number of orders taken and shipped, 6. When this amount is supplied to the outer query, it evaluates as if the query were written:
proc sql; select distinct lastname label "Took Orders", takenby, count(shipped) as ordship from vlib.allemp, vlib.allordr where takenby=empid group by takenby having ordship=6;
The first part of the outer query adds the names of
the employees who took the orders and joins the data from the two view descriptors
with matching EMPLOYEE numbers (
The COUNT function computes the number of shipped orders that were taken by
each employee to reduce the number of rows to one per employee. The HAVING
expression then selects rows if the number of orders is 6. Data on Orders Shipped shows the results of the PROC
Data on Orders Shipped
Employees Who Took the Most Orders That Shipped Took Orders TAKENBY ORDSHIP ------------------------------------- GONZALES 321783 6 WOLF-PROVENZA 119012 6
|Accessing Tables That Are Located on Different Nodes or Databases|
In a networking environment, you can often access data from DBMS tables that is stored on different machines or in different databases. When using the SAS/ACCESS Interface to SYBASE, for example, you use the SERVER= and DATABASE= statements to specify the locations of the tables that you want to access. Use TABLE= to specify the names of the SYBASE tables.
In the following example, you create access descriptors and view descriptors for SYBASE tables that have different owners and are stored in databases that reside on different machines. The USER= and PASSWORD= statements identify the owners of the EMPLOYEES and INVOICE tables and their passwords.
After creating the descriptors, you use the SQL procedure to join the tables' data. Because the database identification information is stored permanently in each descriptor, PROC SQL can use the view descriptors to access and join the remote SYBASE data:
/* create access descriptor */ proc access dbms=sybase; create work.employ.access; server=server1; database=personnel; user=carmen; password=aria; table=employees; /* create vlib.emp_acc view */ create vlib.emp_acc.view; select all; format empid 6.0 salary dollar12.2 jobcode 5.0 hiredate date9. birthdate date9.; subset where DEPT like 'ACC%'; list all; run; /* create access descriptor */ proc access dbms=sybase; create work.invoice.access; server=server2; database=inventory; user=joachim; password=machauf; table=invoice; /* create vlib.sainv view */ create vlib.sainv.view; select all; rename invoicenum invnum amtbilled amtbilld amountinus amtinus; format invoicenum 5. billedby 6. amtbilled 15.2 amountinus 15.2 billedon date9. paidon date9.; subset where COUNTRY in ('Argentina','Brazil'); list all; run;SYBASE objects, such as table names and columns, are case sensitive. The WHERE clauses in PROC ACCESS SUBSET statements are passed to SYBASE exactly as you type them, so you must use the correct case for SYBASE column names. The database identification statements and column names in all other statements are converted to uppercase unless they are enclosed in quotes.
options linesize=120; title 'South American Invoices and Who Submitted Them'; proc sql; select invnum, country, billedon, paidon, billedby, lastname, firstnam from vlib.emp_acc, vlib.sainv where emp_acc.empid=sainv.billedby; quit;
Data Joined from Tables in Different Databases shows the results of the PROC SQL query.
Data Joined from Tables in Different Databases
South American Invoices and Who Submitted Them INVOICENUM COUNTRY BILLEDON PAIDON BILLEDBY LASTNAME FIRSTNAME -------------------------------------------------------------------------------------------------- 12476 Argentina 24DEC1998 . 135673 HEMESLY STEPHANIE 11270 Brazil 05OCT1998 18OCT1998 239185 DOS REMEDIOS LEONARD 11285 Argentina 10OCT1998 30NOV1998 239185 DOS REMEDIOS LEONARD 11280 Brazil 07OCT1998 20OCT1998 423286 MIFUNE YUKIO 12051 Brazil 02NOV1998 . 457232 LOVELL WILLIAM 12471 Brazil 27DEC1998 . 457232 LOVELL WILLIAM
Top of Page
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.