Chapter Contents


SAS/ACCESS Software for PC File Formats: Reference

Calculating Statistics

You can also use SAS statistical procedures on PC files data. This section shows examples using the FREQ, MEANS, and RANK procedures.

Using the FREQ Procedure

Suppose you want to find the percentages of your invoices that went to each country so that you can decide where to increase your overseas marketing. The following example uses the view descriptor VLIB.INV to calculate the percentage of invoices for each country that appears in the INVOICE.DBF file.

proc access dbms=dbf;
   /* create access descriptor   */
   create adlib.invoice.access;
   rename invoicenum = invnum
          amtbilled = amtbilld ;
   format paidon      date9.
          invoicenum  5.0
          billedby    6.0;

   create vlib.inv.view;
   /* create vlib.inv view      */
   select invoicenum amtbilled
          country billedby paidon;
   list all;

proc freq data=vlib.inv;
   /* example             */
   tables country;
   title 'Data Described by VLIB.INV';

Frequency Table for Variable COUNTRY Described by View Descriptor VLIB.INV shows the one-way frequency table that this example generates.

Frequency Table for Variable COUNTRY Described by View Descriptor VLIB.INV
                      Data Described by VLIB.INV                          6


                                                 Cumulative  Cumulative
         COUNTRY            Frequency   Percent   Frequency    Percent
         Argentina                 2      11.76           2       11.76
         Australia                 1       5.88           3       17.65
         Brazil                    4      23.53           7       41.18
         USA                      10      58.82          17      100.00

For more information on the FREQ procedure, see SAS Language and Procedures: Usage and SAS Procedures Guide.

Using the MEANS Procedure

In your analysis of recent orders, suppose you also want to calculate some statistics for each U.S. customer. From the ORDERS.XLS file, the view descriptor VLIB.USAORDR selects a subset of observations that have a SHIPTO value beginning with a 1, indicating a U.S. customer.

Using the OUT= option in the SORT procedure, the data from the DBF file are extracted, placed in a SAS data file, and then sorted.

The following example generates the means and sums of the length of material ordered (in yards) and the fabric charges (in dollars) for each U.S. customer. Also included are the number of observations (N) and the number of missing values (NMISS). The MAXDEC= option specifies the number of decimal places (0-8) for PROC MEANS to use in printing the results.

proc access dbms=xls;
    create adlib.order.access;       
    /* create access descriptor    */
    rename dateorderd = dateord
           processdby = procesby;
    format dateorderd  date9.
           shipped     date9.
           ordernum    5.0
           length      4.0
           stocknum    4.0
           takenby     6.0
           processdby  6.0
           fabcharges  12.2;
    list all;

    create vlib.usaordr.view;        
    /* create vlib.usaordr view  */
    select ordernum stocknum length
           fabcharges shipto;
    subset where shipto like '1%';

proc sort data=vlib.usaordr out=work.usaorder;
    by shipto;

proc means data=work.usaordr mean    
     /* example             */
     sum n nmiss maxdec=0;
   by shipto;
   var length fabcharg;
title 'Data Described by VLIB.USAORDR';

Statistics on Fabric Length and Charges for Each U.S. Customer shows the output for this example.

Statistics on Fabric Length and Charges for Each U.S. Customer
                         Data Described by VLIB.USAORDR                        7

-------------------------------- SHIPTO=14324742 -------------------------------

          Variable  Label               Mean           Sum   N  Nmiss
          LENGTH    LENGTH              1095          4380   4      0
          FABCHARG  FABCHARGES       1934460       3868920   2      2

-------------------------------- SHIPTO=14898029 -------------------------------

          Variable  Label               Mean           Sum   N  Nmiss
          LENGTH    LENGTH              2500          5000   2      0
          FABCHARG  FABCHARGES       1400825       2801650   2      0

-------------------------------- SHIPTO=15432147 -------------------------------

          Variable  Label               Mean           Sum   N  Nmiss
          LENGTH    LENGTH               725          2900   4      0
          FABCHARG  FABCHARGES        252149        504297   2      2

-------------------------------- SHIPTO=18543489 -------------------------------

          Variable  Label               Mean           Sum   N  Nmiss
          LENGTH    LENGTH               303          1820   6      0
          FABCHARG  FABCHARGES      11063836      44255344   4      2

-------------------------------- SHIPTO=19783482 -------------------------------

          Variable  Label               Mean           Sum   N  Nmiss
          LENGTH    LENGTH               450          1800   4      0
          FABCHARG  FABCHARGES        252149       1008594   4      0

-------------------------------- SHIPTO=19876078 -------------------------------

          Variable  Label               Mean           Sum   N  Nmiss
          LENGTH    LENGTH               690          1380   2      0
          FABCHARG  FABCHARGES             .             .   0      2

For more information on the MEANS procedure, see SAS Procedures Guide.

Using the RANK Procedure

You can also use more advanced statistical procedures on PC files data. The following example uses the RANK procedure to calculate the order of birthdays for a set of employees who are listed in the EMPLOYEES.DBF file. The OUT= option creates a SAS data file, DLIB.RANKEXAM, from the view descriptor VLIB.EMPS so that the data in the SAS file can be sorted by the SORT procedure. The RANKS statement assigns the name DATERANK to the new variable (in the SAS data file) that is created by the procedure. The PRINT procedure then prints the data that are described by DLIB.RANKEXAM. You can also use the PRINT procedure to print all or some of the PC file data values described by view descriptors.

proc access dbms=dbf;
   create adlib.employ.access;             
   /* create access descriptor  */
   drop salary;
   list all;

   create vlib.emps.view;                   
   /* create vlib.emps view     */
   select empid jobcode birthdate 
      lastname jobcode;
   format birthdate date9.
          empid     6.0;
   subset where jobcode=602;

proc rank data=vlib.emps out=dlib.rankexam;   
   /* example          */
   var birthdat;
   ranks daterank;

proc sort data=dlib.rankexam;
   by lastname;

proc print data=dlib.rankexam(drop=jobcode);
   title 'Order of Dept 602 Employee Birthdays';

Data stored in the DBF file must be extracted and placed in a SAS data set before they can be sorted with a SAS procedure. (This restriction also applies to data from other PC files.) The DROP= data set option is used in the PROC PRINT statement because the JOBCODE variable is not needed in the output. The JOBCODE variable is required in the SELECT statement so it can be used in the WHERE statement. The JOBCODE variable is then included in the view descriptor, even though it is not needed in the output. Ranking of Employee Birthdays shows the result of this example.

Ranking of Employee Birthdays
                      Order of Dept 602 Employee Birthdays

          OBS     EMPID    BIRTHDAT    LASTNAME              DATERANK

            1    456910    24SEP1953     ARDIS                     5
            2    237642    13MAR1954     BATTERSBY                 6
            3    239185    28AUG1959     DOS REMEDIOS              7
            4    321783    03JUN1935     GONZALES                  2
            5    120591    12FEB1946     HAMMERSTEIN               4
            6    135673    21MAR1961     HEMESLY                   8
            7    456921    12MAY1962     KRAUSE                    9
            8    457232    15OCT1963     LOVELL                   11
            9    423286    31OCT1964     MIFUNE                   12
           10    216382    24JUL1963     PURINTON                 10
           11    234967    21DEC1967     SMITH                    13
           12    212916    29MAY1928     WACHBERGER                1
           13    119012    05JAN1946     WOLF-PROVENZA             3

When you use the PRINT procedure, you may want to take advantage of the SAS data set option OBS=, which enables you to limit the number of observations to be processed. This option is especially useful when the view descriptor describes a large amount of data, the SAS data file is large, or when you just want to see an example of the output. You cannot use OBS= if the view descriptor contains a WHERE clause in the SUBSET statement.

For more information on RANK, on other advanced statistical procedures, and for the PRINT procedure, see SAS Procedures Guide. For more information on the OBS= and FIRSTOBS= options, see SAS Language Reference: Dictionary.

Chapter Contents



Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.