
The COMPARE Procedure |

PROC COMPARE first compares the following:

- data set attributes (set by the data set options TYPE= and LABEL=).
- variables. PROC COMPARE checks each variable in one data set to determine whether it matches a variable in the other data set.
- attributes (type, length, labels, formats, and informats) of matching variables.
- observations. PROC COMPARE checks each observation in one data set to determine whether it matches an observation in the other data set. PROC COMPARE matches observations either by their position in the data sets or by the values of the ID variable.

After making these comparisons, PROC COMPARE compares the values in the parts of the data sets that match. PROC COMPARE compares the data either by the position of observations or by the values of an ID variable.

A Comparison by Position of Observations |

*Comparison by the Positions of Observations*

When you use PROC COMPARE to compare data set TWO with data set ONE, the procedure compares the first observation in data set ONE with the first observation in data set TWO, the second observation in data set ONE with the second observation in data set TWO, and so on. In each observation that it compares, the procedure compares the values of the variables IDNUM, NAME, GENDER, and GPA.

The procedure does not report on the values of the last two observations or the variable YEAR in data set TWO because there is nothing to compare them with in data set ONE.
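The positional matching described above can be sketched in Python. This is an illustration of the rule only, not SAS code and not how PROC COMPARE is implemented; the function name `compare_by_position` and the sample rows (including the names and GPA values) are hypothetical and do not reproduce the original figure.

```python
def compare_by_position(base, comp):
    """Pair observations by position and compare the variables common to both."""
    # Matching variables: names that appear in both data sets (base order).
    common = [v for v in base[0] if v in comp[0]]
    # Only the first min(len) observations of each data set can be paired.
    n = min(len(base), len(comp))
    diffs = []
    for i in range(n):
        for v in common:
            if base[i][v] != comp[i][v]:
                # Record (observation number, variable, base value, comparison value).
                diffs.append((i + 1, v, base[i][v], comp[i][v]))
    return common, n, diffs


# Hypothetical rows: TWO has an extra variable (YEAR) and an extra observation.
one = [
    {"IDNUM": 2998, "NAME": "Ann", "GENDER": "F", "GPA": 3.5},
    {"IDNUM": 9866, "NAME": "Bob", "GENDER": "M", "GPA": 3.1},
]
two = [
    {"IDNUM": 2998, "NAME": "Ann", "GENDER": "F", "GPA": 3.5, "YEAR": 1},
    {"IDNUM": 9866, "NAME": "Bob", "GENDER": "M", "GPA": 3.3, "YEAR": 2},
    {"IDNUM": 7565, "NAME": "Cy", "GENDER": "M", "GPA": 2.9, "YEAR": 1},
]

common, n, diffs = compare_by_position(one, two)
print(common, n, diffs)
```

In this sketch, YEAR and the third observation of `two` are never examined, which mirrors the statement that the procedure has nothing in data set ONE to compare them with.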

A Comparison with an ID Variable |

For the two data sets shown in Comparison by the Value of the ID Variable, assume that IDNUM is an ID variable and that IDNUM has the same type in both data sets. The procedure compares the observations that have the same value for IDNUM. The data inside the shaded boxes show the part of the data sets that the procedure compares.

*Comparison by the Value of the ID Variable*

The data sets contain three matching variables: NAME, GENDER, and GPA. They also contain five matching observations: the observations with values of **2998**, **9866**, **2118**, **3847**, and **2342** for IDNUM.

Data set TWO contains two observations (IDNUM=**7565** and IDNUM=**1755**) for which data set ONE contains no matching observations. Similarly, no variable in data set ONE matches the variable YEAR in data set TWO.

See Comparing Observations with an ID Variable for an example that uses an ID variable.
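As a rough illustration of matching by an ID variable, the following Python sketch pairs observations that share an IDNUM value and lists the ones left over. It is not SAS code; the helper name `match_by_id` is hypothetical, the order of the rows is assumed, only the IDNUM values come from the text above, and the sketch assumes each ID value occurs at most once per data set.

```python
def match_by_id(base, comp, id_var):
    """Pair observations that have the same value of id_var."""
    base_ids = {row[id_var] for row in base}
    comp_by_id = {row[id_var]: row for row in comp}
    # Observations present in both data sets are the ones that get compared.
    matched = [(row, comp_by_id[row[id_var]])
               for row in base if row[id_var] in comp_by_id]
    # Observations with no partner are reported as unmatched, not compared.
    only_base = [row[id_var] for row in base if row[id_var] not in comp_by_id]
    only_comp = [row[id_var] for row in comp if row[id_var] not in base_ids]
    return matched, only_base, only_comp


# Only the ID values are taken from the text; all other variables are omitted.
one = [{"IDNUM": i} for i in (2998, 9866, 2118, 3847, 2342)]
two = [{"IDNUM": i} for i in (2998, 9866, 2118, 3847, 2342, 7565, 1755)]

matched, only_base, only_comp = match_by_id(one, two, "IDNUM")
print(len(matched), only_base, only_comp)
```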

The Equality Criterion |

- The EXACT method tests for exact equality.
- The ABSOLUTE method compares the absolute difference to the value
specified by CRITERION=.
- The RELATIVE method compares the absolute relative difference
to the value specified by CRITERION=.
- The PERCENT method compares the absolute percent difference to
the value specified by CRITERION=.

For a numeric variable compared, let **x** be its value in the base data set and let **y** be its value in the comparison data set. If both **x** and **y** are nonmissing, the values are judged unequal according to the value of METHOD= and the value of CRITERION= ($\gamma$) as follows:

- If METHOD=EXACT, the values are unequal if **y** does not equal **x**.
- If METHOD=ABSOLUTE, the values are unequal if $|y - x| > \gamma$.
- If METHOD=RELATIVE, the values are unequal if $\dfrac{|y - x|}{(|x| + |y|)/2 + \delta} > \gamma$, where $\delta$ is a value added to the denominator that defaults to 0. The values are equal if **x**=**y**=0.
- If METHOD=PERCENT, the values are unequal if $100 \cdot |y - x| / |x| > \gamma$ for $x \neq 0$, or $y \neq 0$ for $x = 0$.
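The four tests can be sketched as a single Python function. This is an illustration, not SAS code. It assumes the following forms of the tests: ABSOLUTE uses |y - x| > criterion; RELATIVE uses |y - x| / ((|x| + |y|)/2 + delta) > criterion, with x = y = 0 treated as equal; PERCENT uses 100|y - x| / |x| > criterion when x is nonzero and y != 0 when x is 0. The function name `values_unequal` and its parameter names are hypothetical.

```python
def values_unequal(x, y, method="EXACT", criterion=0.0, delta=0.0):
    """Return True if nonmissing numeric values x (base) and y (comparison)
    are judged unequal under the assumed form of the given method."""
    diff = abs(y - x)
    if method == "EXACT":
        return y != x
    if method == "ABSOLUTE":
        return diff > criterion
    if method == "RELATIVE":
        if x == 0 and y == 0:
            return False  # both zero: equal by definition
        return diff / ((abs(x) + abs(y)) / 2 + delta) > criterion
    if method == "PERCENT":
        if x == 0:
            return y != 0
        return 100 * diff / abs(x) > criterion
    raise ValueError(f"unknown method: {method}")


# The same pair of values can pass one test and fail another.
print(values_unequal(100.0, 101.0, "EXACT"))
print(values_unequal(100.0, 101.0, "ABSOLUTE", criterion=2.0))
print(values_unequal(100.0, 101.0, "RELATIVE", criterion=0.005))
print(values_unequal(100.0, 101.0, "PERCENT", criterion=0.5))
```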

If **x** or **y** is missing, then the comparison depends on the NOMISSING option. If NOMISSING is in effect, a missing value always compares equal to anything. Otherwise, a missing value is judged equal only to a missing value of the same type (that is, .=., .^=.A, .A=.A, .A^=.B, and so on).
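The missing-value rule can be illustrated with a short Python sketch. It is not SAS code. As an assumption made only for this example, missing values are written as the strings `"."`, `".A"`, `".B"`, and so on, and nonmissing values are ordinary numbers; the function names are hypothetical.

```python
def is_missing(value):
    """Assumed representation: missing values are strings such as '.' or '.A'."""
    return isinstance(value, str) and value.startswith(".")


def equal_with_missing(a, b, nomissing=False):
    """Apply the missing-value rule when at least one value is missing."""
    if not (is_missing(a) or is_missing(b)):
        raise ValueError("both values are nonmissing; use the METHOD= test")
    if nomissing:
        return True  # a missing value compares equal to anything
    # Otherwise only a missing value of the same type is equal.
    return is_missing(a) and is_missing(b) and a == b


print(equal_with_missing(".", "."))                  # .=.
print(equal_with_missing(".", ".A"))                 # .^=.A
print(equal_with_missing(".A", 3.5))                 # missing vs. nonmissing
print(equal_with_missing(".A", 3.5, nomissing=True))
```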

If the value specified for CRITERION= is negative, the actual criterion used, $\gamma$, is made equal to the absolute value of the specified value times a very small number $\epsilon$ (epsilon) that depends on the numerical precision of the computer. This number $\epsilon$ is defined as the smallest positive floating-point value such that, using machine arithmetic, $1 - \epsilon < 1 < 1 + \epsilon$. Round-off or truncation error in floating-point computations is typically a few orders of magnitude larger than $\epsilon$. This means that CRITERION=-1000 often provides a reasonable test of the equality of computed results at the machine level of precision.
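The effect of a negative criterion can be illustrated in Python. This sketch is not SAS code. It assumes that Python's double-precision machine epsilon (`sys.float_info.epsilon`) is an adequate stand-in for the platform-dependent epsilon described above, and it applies the scaled criterion to a plain absolute difference purely for demonstration; `effective_criterion` is a hypothetical name.

```python
import sys

# Assumption: this constant stands in for the machine-dependent epsilon.
EPS = sys.float_info.epsilon


def effective_criterion(criterion):
    """Return the criterion actually applied: a negative value is
    replaced by its absolute value times epsilon."""
    if criterion < 0:
        return abs(criterion) * EPS
    return criterion


x = 0.3
y = 0.1 + 0.2  # differs from 0.3 only by floating-point round-off
gamma = effective_criterion(-1000)

print(y != x)              # an exact test sees a difference
print(abs(y - x) > gamma)  # the scaled criterion tolerates round-off
```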

The value $\delta$ added to the denominator in the RELATIVE method is specified in parentheses after the method name: METHOD=RELATIVE($\delta$). If not specified in METHOD=, $\delta$ defaults to 0. The value of $\delta$ can be used to control the behavior of the error measure when both **x** and **y** are very close to 0. If $\delta$ is not given and **x** and **y** are very close to 0, any error produces a large relative error (in the limit, 2).

Specifying a value for $\delta$ avoids this extreme sensitivity of the RELATIVE method for small values. If you specify METHOD=RELATIVE($\delta$) CRITERION=$\gamma$ when both **x** and **y** are much smaller than $\delta$ in absolute value, the comparison is as if you had specified METHOD=ABSOLUTE CRITERION=$\delta\gamma$. However, when either **x** or **y** is much larger than $\delta$ in absolute value, the comparison is like METHOD=RELATIVE CRITERION=$\gamma$. For moderate values of **x** and **y**, METHOD=RELATIVE($\delta$) CRITERION=$\gamma$ is, in effect, a compromise between METHOD=ABSOLUTE CRITERION=$\delta\gamma$ and METHOD=RELATIVE CRITERION=$\gamma$.
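A small numeric experiment in Python shows the behavior described above. It is not SAS code, and it assumes that the RELATIVE error measure has the form |y - x| / ((|x| + |y|)/2 + delta), which is consistent with the limiting value of 2 mentioned above. The function name `relative_error` is hypothetical.

```python
def relative_error(x, y, delta=0.0):
    """Assumed RELATIVE error measure with delta added to the denominator."""
    if x == 0 and y == 0:
        return 0.0  # both zero: the values are equal
    return abs(y - x) / ((abs(x) + abs(y)) / 2 + delta)


# Near zero with no delta: a tiny error yields the limiting value 2.
print(relative_error(0.0, 1e-12))

# Near zero with delta=1: the measure behaves like |y - x| / delta.
print(relative_error(0.0, 1e-12, delta=1.0))

# Far from zero: delta=1 barely changes the plain relative error.
print(relative_error(1e6, 1e6 + 1))
print(relative_error(1e6, 1e6 + 1, delta=1.0))
```

The second call shows why the comparison then resembles an absolute test: dividing by roughly delta and comparing with the criterion is the same as comparing |y - x| with delta times the criterion.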

For character variables, if one value has a greater length than the
other, the shorter value is padded with blanks for the comparison. Nonblank
character values are judged equal only if they agree at each character. If
NOMISSING is in effect, blank character values compare equal to anything.
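The character rule can be sketched in a few lines of Python. This is an illustration only, not SAS code; `char_values_equal` is a hypothetical name, and a value is treated as blank when it is empty or consists solely of spaces.

```python
def char_values_equal(a, b, nomissing=False):
    """Compare two character values after padding the shorter with blanks."""
    if nomissing and (a.strip(" ") == "" or b.strip(" ") == ""):
        return True  # a blank value compares equal to anything
    width = max(len(a), len(b))
    # ljust pads on the right with blanks, so trailing blanks never matter.
    return a.ljust(width) == b.ljust(width)


print(char_values_equal("abc", "abc  "))
print(char_values_equal("abc", "abd"))
print(char_values_equal("", "abc"))
print(char_values_equal("", "abc", nomissing=True))
```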

Difference = $y - x$

Percent Difference = $\dfrac{y - x}{x} \times 100$ for $x \neq 0$

Percent Difference = missing for $x = 0$
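The two reported quantities can be sketched in Python. This is not SAS code. It assumes that the difference is y - x and that the percent difference is (y - x) / x × 100, with the base value x as the reference and a missing result when x is 0; here `None` stands in for a missing value, and the function names are hypothetical.

```python
def difference(x, y):
    """Difference between the comparison value y and the base value x."""
    return y - x


def percent_difference(x, y):
    """Percent difference relative to the base value x; None when x is 0."""
    if x == 0:
        return None  # stands in for a missing value
    return (y - x) / x * 100


print(difference(4.0, 3.0))
print(percent_difference(4.0, 3.0))
print(percent_difference(0.0, 3.0))
```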

Formatted Values |


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.