|SAS/ACCESS Interface to SYSTEM 2000 Data Management Software: Reference|
A SYSTEM 2000 database is hierarchical in that you store and access data according to organized relationships between groups of associated data. When a SYSTEM 2000 database is created, a plan is devised, called a database definition. In a database definition
Typically, a database is organized according to the types of data and how you want to use the data. You must understand and be familiar with your database's organization in order to retrieve and update information accurately and efficiently. And you must be familiar with the organization and contents of the database to create descriptor files for the SAS/ACCESS interface. Database Definition shows an excerpt of the EMPLOYEE database definition, which is the output of a SYSTEM 2000 DESCRIBE statement. (For a complete listing of the EMPLOYEE database definition, see Example Data).
SYSTEM RELEASE NUMBER 12.1 DATA BASE NAME IS EMPLOYEE DEFINITION NUMBER 2 DATA BASE CYCLE NUMBER 25 1* EMPLOYEE NUMBER (INTEGER NUMBER 9999) 2* LAST NAME (CHAR X(10) WITH FEW FUTURE OCCURRENCES ) 3* FORENAME (NON-KEY CHAR X(20)) . . . 16* ZIP CODE (CHAR X(5) WITH FEW FUTURE OCCURRENCES ) 100* POSITION WITHIN COMPANY (RECORD) 101* POSITION TITLE (NON-KEY CHAR X(10) IN 100) 102* DEPARTMENT (CHAR X(14) IN 100 WITH SOME FUTURE OCCURRENCES ) 103* MANAGER (CHAR XXX IN 100 WITH FEW FUTURE OCCURRENCES ) 104* POSITION TYPE (CHAR X(12) IN 100 WITH SOME FUTURE OCCURRENCES ) 105* START DATE (DATE IN 100) 106* END DATE (NON-KEY DATE IN 100) 110* SALARY WITHIN POSITION (RECORD IN 100) 111* PAY RATE (MONEY $9999.99 IN 110) 112* PAY SCHEDULE (CHAR X(7) IN 110) 113* EFFECTIVE DATE (DATE IN 110) 114* CURRENT DEDUCTION (NON-KEY MONEY $9999.99 IN 110) . . . 400* EDUCATIONAL BACKGROUND (RECORD) 410* EDUCATION (RECORD IN 400) 411* SCHOOL (CHAR X(15) IN 410) 412* DEGREE/CERTIFICATE (CHAR X(7) IN 410 WITH FEW FUTURE OCCUR RENCES ) 413* DATE COMPLETED (DATE IN 410) 414* MAJOR FIELD (NON-KEY CHAR X(16) IN 410) 415* MINOR FIELD (NON-KEY CHAR X(12) IN 410)
The database name is a unique name, from one to 16 characters long, that is assigned to a specific SYSTEM 2000 database definition. Each database also has one or more passwords associated with it.
To create descriptor files for the SAS/ACCESS interface, you must know the name and a password for the SYSTEM 2000 database you want to access.
A database definition consists of schema records and schema items, which describe a blueprint for the type of data to be stored. For example, in the preceding definition, EMPLOYEE NUMBER is a schema item and POSITION WITHIN COMPANY is a schema record.
A schema item names and defines the characteristics of a group of values; that is, it has a name, a type, and a picture (length). Each value stored in a SYSTEM 2000 database corresponds to a schema item. For example, the following schema item describes the numbers used as employee identification numbers. The four 9s indicate that each employee number can contain up to four digits.
1* EMPLOYEE NUMBER (INTEGER NUMBER 9999)
A schema record groups associated schema items; it is explained in the next section.
Schema items and records are referred to as schema components, and each is identified by both a component number and a component name, as shown below. (A component number can also be referred to as a C-number, for example, C101.)
To access data stored in a SYSTEM 2000 database, you must refer to either the component number or the component name. Both are unique in the database definition to avoid ambiguity. Each line in a database definition begins with the component number and the component name.
When you create descriptor files for SYSTEM 2000 databases, the ACCESS procedure creates corresponding SAS variable names from the SYSTEM 2000 item names. You can then use the variable names in SAS procedures.
In a SYSTEM 2000 database, associated schema items are grouped by schema record. That is, different schema records store different groups of data, and a schema item belongs to only one schema record. Grouping associated schema items into schema records is similar to planning a form; a form is usually divided into sections with one section for each set of related data.
For example, looking at the EMPLOYEE database definition in Database Definition, schema items C1 through C16 pertain to personal information on each employee. These items are grouped in one record, the ENTRY record or C0 record. (The component number and name for the ENTRY record is not listed in a definition unless it has been renamed.) The schema items for information about an employee's position (C101 through C106) are grouped in schema record C100, POSITION WITHIN COMPANY.
|Establishing Relationships between Schema Records|
Relationships between schema records are established by arranging the schema records into levels. That is, each schema record is placed at a certain level, which creates a hierarchical structure.
These levels are achieved by ranking schema items with values that occur once per employee (such as an employee's name and address) at a higher level than schema items with multiple values (such as an employee's job titles and salaries). That is, schema items having a one-to-many relationship with other schema items rank higher in the database hierarchy than those other schema items.
For example, using the EMPLOYEE database, notice the levels illustrated below. (In the database definition, shown in Levels in Database Definition, schema items are indented under their parent schema record, and schema records are stair-stepped to the right, reflecting the relationships between the records.)
Levels in Database Definition
The next set of terms refers to the relationships between the levels, which are like relationships in a family.
Schema records are disjoint if their paths are different. For example, records C110 and C410 are disjoint. When you create a view descriptor, you cannot include items that are from disjoint schema records. That is, items from C110 and items from C410 cannot be included in the same view descriptor.
From your point of view, a database consists of groups of logically related data called logical entries. The database definition serves as a pattern to create logical entries for the database and to interpret them. A logical entry contains groups of related data called data records. A data record is an identifiable set of values treated as a unit and associated with a schema record.
For example, in the EMPLOYEE database, logical entries contain data about employees; that is, all data records pertaining to a single employee make up a single logical entry. Each logical entry has a data record for personal data (such as the employee's name, address, and birthday) and a data record pertaining to the position that the employee holds in the company (such as title, department, manager, and pertinent dates). If the employee has held several positions, there is a data record for each position, as illustrated in Logical Entry.
Using the layout of the database definition, the following figure replaces the schema items with values for one employee. David Reid has held two positions: programmer and assistant programmer. In addition, he has three salary data records for his programmer position.
When you use a view descriptor in a SAS program to access a SYSTEM 2000 database, you must be familiar with the database structure in order to understand how the interface view engine maps a SYSTEM 2000 logical entry into multiple SAS observations and back again. Mapping data between the SAS System and SYSTEM 2000 software is explained next.
|Mapping Data between the SAS System and SYSTEM 2000 Software|
When you create a view descriptor to access data stored in a SYSTEM 2000 database, you define one path in the database. For example, using the EMPLOYEE database, you could define a view descriptor that includes the items LAST NAME, FORENAME, POSITION TITLE, and PAY RATE. When you access the data using the view descriptor, the interface view engine maps the specified path for each logical entry into multiple observations. For example, the logical entry for David Reid (shown above) would be mapped into the following SAS observations:
Logical Entry Mapped into SAS Observations
LASTNAME FORENAME POSITION PAYRATE REID DAVID G. ASSISTANT PROGRAMMER $1,000.00 REID DAVID G. PROGRAMMER $1,100.00 REID DAVID G. PROGRAMMER $1,200.00 REID DAVID G. PROGRAMMER $1,300.00
When browsing SYSTEM 2000 data, such as with the FSVIEW procedure, the results would be similar to Logical Entry Mapped into SAS Observations, with LASTNAME and FORENAME values repeated for each POSITION value and PAYRATE value. Actually, though, the value DAVID G. REID is stored in the database only once. For retrievals, the results are straightforward. When updating, however, you need to keep in mind that values at higher levels in the database generally do not exist as often as they seem to.
Using the FSVIEW procedure, suppose you need to change the spelling of a last name, for example, from Reid to Reed. All you need to do is type 'REED' over one of the values 'REID,' and with a single update operation, the last names are all corrected. On the other hand, using the FSEDIT procedure, suppose you want to delete an observation for David Reid. Each observation for his positions and salary data would display his last name and first name. If you deleted an observation, for example the one for Assistant Programmer, the deletion would not affect his last name and forename values, but the POSITION and PAYRATE values would be physically removed. See Browsing and Updating SYSTEM 2000 Data for more information and an example of deleting an observation from a SYSTEM 2000 logical entry.
|Null Data -- Missing Values|
A logical entry does not have to have data at every level of the database definition. A logical entry can contain nulls; that is, missing values or empty records.
Note: In the SAS System, nulls are referred to as missing
values. SYSTEM 2000 software and the SAS System handle nulls (missing values)
differently. The interface view engine, however, takes care of the differences
in a predictable, useful way. See Missing Values (Nulls)
for a discussion of the differences.
Top of Page
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.