|Doing More with SAS/ASSIST Software|
You often need data that are stored in separate tables. For example, you may want to produce a report that gets information about flight delays from one table and boarding capacity from another. Information from separate tables can be combined by matching values in columns that relate the tables. This is called joining tables. You can join a maximum of 16 tables in one query. Note that some limitations apply when you join DB2 tables. For more information, see Additional Considerations for Joining DB2 Tables.
|Introducing Table Joins|
A join combines two or more tables side by side. If you do not specify how to join the tables, you get a Cartesian product. This means that SQL combines each row from the first table with every row from the second table, the second row of the first table is then matched with every row of the second table, and so on. Therefore, two tables with 100 rows each produce 100 * 100, or 10,000, rows. You seldom want all this information, and it often causes you to exceed the available disk space.
To avoid a Cartesian product, you must specify how the tables should be combined. Typically, you want to pair rows based on matching values in one or more key columns of each table. The following example uses the tables in the following figure to illustrate a join.
Joining Two Tables
Pair the SALARY information in Table B with the NAME of the person in Table A by matching the values in the ID columns. Therefore, your join criterion is A.ID=B.ID. In SQL, the code is:
select name, salary from a, b where a.id = b.id;
Only rows with matching values of ID are output. This is known as an inner join. The following figure shows the results of this query.
Output from Joined Tables
You can also join tables that have missing values. For example, ID 3 is missing from Table B. A left join outputs all the names in Table A (the left table) and matches them to the available salary information in Table B. If a row in Table B is missing, missing values are supplied in the joined output. In this case, Michael from Table A is output with a missing value for SALARY next to his name. The following figure shows the results of this query.
Output from Left Joined Tables
A right join works similarly. All the rows from Table B (the right table) are output, and any missing rows from table A are output as missing values (or blanks, if the values are character data). The following figure shows the results of this query.
Output from Right Joined Tables
If you join two tables that both have missing rows, you can combine the effect of a left and a right join by using a full join. The following figure shows the results of this query.
Output from Fully Joined Tables
Inner joins can be performed on up to 16 tables at a time, whereas outer joins (left, right, full) work on only two tables at a time.
|Joining Tables Manually|
If tables selected for joining have a
relation defined between
them, Query and Reporting joins them automatically; see Joining Tables Automatically for more information. If
the tables do not have a relation defined between them, or if they have more
than one such relation defined, you have to join the tables manually. The
following sections show examples of how to join tables manually in both cases,
as well as an example of performing outer joins, which must be done manually.
Follow these instructions to join two tables manually:
The Incomplete Inner join message appears under Join. The warning Not all tables are joined appears because the tables do not contain a relation defined on them. Therefore, Query and Reporting has not automatically joined the tables. The following display shows the Query window with the two tables to be joined.
clear previously-defined subsets or expressions, select subset or Expression. Select Clear from the
Edit menu, then select Close from the File menu.
Query Window with Tables for Manual Join
Note: If you are joining more than five tables, not all of the table names appear
in the Join window; overwrite the first number
in the Tables field to see more tables.
The following display shows the Join Manager window with AIRLINE.JOBCODES and AIRLINE.PAYROLL before they are joined. The Group field indicates whether groups of tables are joined. 1,2 indicates that this group is not joined.
Join Manager Window With Two Tables To Be Joined
1next to A.JOBCODE and
1next to B.JOBCODE and press ENTER. Note that the column names for the two tables are the same in this example. The column names do not have to be the same, but their data types and values must match. You can select Sort by from the Edit menu to reorder columns in the tables by their names, join number, index, or matching names.
After you press ENTER and the tables are joined, the highlighting of the table names is removed, the Group field changes to 1,1 (indicating one group), and a message appears saying All tables are joined. The following display shows the Join Manager window after the tables have been joined.
Join Manager Window With Two Joined Tables
To add a third table to the joined AIRLINE.JOBCODES and AIRLINE.PAYROLL tables, follow these instructions:
The table names are displayed at the top of the Join Manager window. AIRLINE.SCHEDULE is highlighted because it is not yet joined with another table. AIRLINE.SCHEDULE has no table alias under the JOINS/relations field for the same reason. This table is also listed as a separate group in the Group field.
The following display shows the Join Manager window before the third table AIRLINE.SCHEDULE is joined with AIRLINE.JOBCODES and AIRLINE.PAYROLL.
Join Manager Window With Third Table Before Joining
Bin the first Alias field and the alias
Cin the second Alias field. When you press ENTER, the tables' columns are listed. Type
1next to B.IDNUM and C.IDNUM, respectively, and press ENTER.
When the three tables are joined, the highlighting is removed, the Group field changes to 1,1,1 (indicating one group), and a message appears saying All tables are joined, as shown in the following display.
Join Manager Table With Three Joined Tables
When you join tables, Query and Reporting uses an inner join by default. However, you can choose another type of join if you are joining only two tables. In DB2, outer joins are allowed on only two tables. Note that there may be other restrictions under DB2; for further information, see Additional Considerations for Joining DB2 Tables. Outer joins must be performed manually.
1in the A.IDNUM column and
1in the B.SUPID column and press ENTER. (These steps are described earlier in Joining Tables Without Defined Relations.)
Show SQL Window for Right Join
Exit the Show SQL window.
Results of Right Join
|Defining Relations on Tables|
When you query SAS tables, relations provide a way to identify rows and to relate (or join) rows in one table to rows in another table. You define a relation on one or more columns in a table. Relations are based on the database concept of primary and foreign keys; see SAS/ACCESS Software for Relational Databases: Reference for more information.
You define a relation on tables based on one or more columns that tables contain. For example, the SAS AIRLINE.PAYROLL table has a column named IDNUM that has the same attributes as the IDNUM column in AIRLINE.STAFF. Therefore, you would define and name a relation for IDNUM on the two tables.
Defining a relation between or among tables enables you to build intelligence into your query creation process. When one relation is defined on tables that you join in a query, Query and Reporting automatically generates a WHERE or ON clause for that query. This saves you from defining a WHERE predicate for each join query. (See Joining Tables Automatically for more information.) You can, of course, still select Subset from the Query window to add other predicates to your WHERE clause; see Retrieving a Subset of Data for more information.
If no relations are defined or if more than one relation is defined for the tables in the join, you must join the tables manually, as described in Joining Tables Manually and Joining Tables with Multiple Relations.
The examples in this section are used to demonstrate joining tables automatically and joining tables with multiple relations later in this chapter.
1next to the A.FLIGHT and B.FLIGHT columns and press ENTER. A message indicates that the tables are joined.
Relations for Current Tables Window
MARFLT. A relation name can be up to eight characters long.
March - Flight Info Relation. A description can be up to 32 characters long.
Repeat steps 2 through 9 above to define a relation
on the AIRLINE.DEST and AIRLINE.FLINFO tables. Define your relation on the
A.DEST and B.DEST columns. Name this relation
FLDEST and give it a description of
Destination Name Join. Then, repeat steps 5 through 9 to define a second
relation on the AIRLINE.DEST and AIRLINE.FLINFO tables. By typing
2 (instead of
1) next to the columns, define this relation on the A.DEST and B.ORIG
columns. Name this relation
FLORIG and give it a description of
Origin Name Join.
In order to add, edit, or delete a relation, you must have the correct permissions to do so. Permission to modify relations is determined by the Query Manager in effect. Contact your SAS Support Consultant for details.
To modify a relation, follow these instructions:
?in the Relation name field and select the relation from the resulting list.
To delete a relation, follow these instuctions:
?in the Relation name field and select the relation from the resulting list.
|Joining Tables Automatically|
The Query window can join tables automatically if a relation is defined on the tables. In addition, as already discussed, relations can be defined either by a DBA or by a user. (Note that for DB2, if any referential constraints are defined, these constraints are used by the Query Manager to join tables automatically.)
Query and Reporting automatically joins tables that have exactly one relation defined on them. The default action is to create an inner join. (See Joining Tables Manually for information on joining tables without a defined relation and for performing outer joins.)
In this section you join the AIRLINE.MARCH and AIRLINE.FLINFO tables on which you defined a relation in the previous section.
Inext to the table name in the Select window. The Table Information window appears, as shown in the following display.
Table Information Window
At this point, you can customize your query, run it, and save it as described in Saving a Query. Alternatively, you can save the query's output to a table, as described in Creating New Tables. If you intend to access the joined data frequently, it is often more efficient to save the output to a SAS table and then specify the new SAS table in your SAS code.
|Joining Tables with Multiple Relations|
Tables with multiple relations must be joined manually because you have to decide which columns to use to join the tables. This example uses the relations you defined on the AIRLINE.DEST and AIRLINE.FLINFO tables in Defining Relations on Tables.
Join Manager Window With Multiple Relations
Join Manager Window With Multiple Relations After Joining
Return to the Query window to customize, save, or run your query.
|Additional Considerations for Joining DB2 Tables|
you use the Query window, you can perform inner
joins or outer joins (left, right, and full) on DB2 tables. If one primary
key/foreign key relation is defined between the tables, Query and Reporting
joins the related tables automatically. It generates a WHERE or ON clause
that is based on the related column(s) in the tables, such as
If no primary key/foreign key relation is defined or if more than one relation exists between the contributing tables, you must specify manually which columns are used to join the tables (see Joining Tables Manually).
Versions 4 and later of DB2 includes outer join operators. Therefore, when you use the Query window and specify a left, right, or full join, Query and Reporting generates the WHERE or ON clause automatically.
DB2 versions prior to Version 4 lack outer join operators. You can still specify an outer join type in Query and Reporting. If relations are defined on the tables, a WHERE or ON clause is automatically generated. If no relations are defined, you have to join the tables manually. When you review your SQL code (by selecting Show SQL from the View menu), you see that DB2 SQL has used set operators (such as UNION) to produce an outer join result, instead of using outer join operators.
Because earlier versions of DB2 SQL do not have outer join operators, certain limitations apply to outer joins in Query and Reporting:
The following table shows the default values in outer joins.
|Data Type||Default Value|
|Character data||blank (' ')|
|Date||January 1, 0001|
|Timestamp||January 1, 0001, midnight|
Referential integrity is implemented in the DB2 database management systems by specifying conditions where all references from one database column to another column must be validated.
These referential constraints are established by defining primary keys and foreign keys for the columns in the relationship. A primary key defines a unique identifier for the rows of a table. A foreign key is a column where each value contains a value in the primary key of another defined table. After the relationship between a primary key and a foreign key has been established, DB2 enforces the referential integrity for you according to both the rules that DB2 has established and the referential constraints that you have specified.
If referential constraints have been established, the Select window uses
them to generate automatic joins between
tables. In the Query, Select,
or Report Engine windows, you can enter the
I command next to a table or column name to open the Select Table information or Column information window. These windows display information on
for your convenience.
Top of Page
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.