 The NPAR1WAY Procedure

### Statistics Based on the Empirical Distribution Function

If you specify the EDF option, PROC NPAR1WAY computes statistics based on the empirical distribution function. These include Kolmogorov-Smirnov and Cramer-von Mises statistics, and also Kuiper statistics for two-sample data. This section gives formulas for these statistics. For further information on the formulas and the interpretation of EDF statistics, refer to Hollander and Wolfe (1973) and Gibbons and Chakraborti (1992). For details on the k-sample analogues of the Kolmogorov-Smirnov and Cramer-von Mises statistics used by NPAR1WAY, refer to Kiefer (1959).

The empirical distribution function (EDF) of a sample $\{x_j\}$, $j = 1, 2, \ldots, n$, is defined as the following function:

$$F(x) = \frac{1}{n} \sum_{j=1}^{n} I(x_j \le x)$$

where $I(\cdot)$ is the indicator function. PROC NPAR1WAY uses the subsample of values within the $i$th class level to generate an EDF, $F_i$. The EDF for the pooled sample can also be expressed as

$$F(x) = \frac{1}{n} \sum_{i} n_i F_i(x)$$

where $n_i$ is the number of observations in the $i$th class level, and $n$ is the total number of observations.
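As a rough illustration of these definitions, the following Python sketch computes a class-level EDF and checks that the pooled EDF equals the size-weighted average of the class EDFs. The helper `edf`, the class labels, and the data values are hypothetical and chosen only for illustration; this is not how PROC NPAR1WAY is implemented.

```python
def edf(sample, x):
    """Empirical distribution function: fraction of sample values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

# Hypothetical data with two class levels (illustrative values only).
classes = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0]}

pooled = [v for values in classes.values() for v in values]
n = len(pooled)

x = 2.0
# Pooled EDF computed directly from all n observations.
direct = edf(pooled, x)
# Pooled EDF computed as (1/n) * sum over classes of n_i * F_i(x).
weighted = sum(len(values) * edf(values, x) for values in classes.values()) / n

print(direct, weighted)
```

The two values agree because each class EDF counts only that class's observations, so weighting by class size and dividing by the total recovers the pooled count.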

#### Kolmogorov-Smirnov Statistics

The Kolmogorov-Smirnov statistic measures the maximum deviation of the EDF within the classes from the pooled EDF. PROC NPAR1WAY computes the Kolmogorov-Smirnov statistic as

$$KS = \max_j \sqrt{\frac{1}{n} \sum_i n_i \left( F_i(x_j) - F(x_j) \right)^2} \quad \text{where } j = 1, 2, \ldots, n$$

The asymptotic Kolmogorov-Smirnov statistic is computed as

$$KS_a = KS \times \sqrt{n}$$

In addition to the overall Kolmogorov-Smirnov statistic and the asymptotic statistic, PROC NPAR1WAY displays the values of the $F_i$ at the maximum deviation from $F$, the values $\sqrt{n_i}\,(F_i - F)$ at the maximum deviation from $F$, the value of $F$ at the maximum deviation, and the point where this maximum deviation occurs.
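The computation can be sketched in Python as follows. The sketch assumes that the statistic is the maximum, over the observed values, of the square root of (1/n) Σ n_i (F_i − F)², and that the asymptotic statistic is that maximum multiplied by √n. The function name `ks_statistic` and the data are hypothetical, and the code is an illustration rather than the procedure's actual algorithm.

```python
import math

def edf(sample, x):
    """Empirical distribution function: fraction of sample values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_statistic(classes):
    """Return (KS, KSa, x_at_max) for a dict mapping class label -> values."""
    pooled = [v for values in classes.values() for v in values]
    n = len(pooled)
    ks, x_at_max = -1.0, None
    # Evaluating at distinct observed values is enough: EDFs only change there.
    for x in sorted(set(pooled)):
        f = edf(pooled, x)
        dev = math.sqrt(
            sum(len(vals) * (edf(vals, x) - f) ** 2 for vals in classes.values()) / n
        )
        if dev > ks:
            ks, x_at_max = dev, x
    return ks, ks * math.sqrt(n), x_at_max

# Hypothetical, fully separated samples.
ks, ksa, x_at_max = ks_statistic({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
print(ks, ksa, x_at_max)
```

With two classes of equal size, this weighted deviation works out to half the largest absolute difference between the two class EDFs, which is why fully separated samples give 0.5 here.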

If there are only two class levels, PROC NPAR1WAY computes the two-sample Kolmogorov statistic as

$$D = \max_j \left| F_1(x_j) - F_2(x_j) \right| \quad \text{where } j = 1, 2, \ldots, n$$

PROC NPAR1WAY also computes the asymptotic probability of observing a larger test statistic. The quality of this approximation has been studied by Hodges (1957). For tables of the exact distribution of D when the two sample sizes are equal, refer to Lehmann (1975, p. 413). For tables of the exact distribution for unequal sample sizes, refer to Kim and Jennrich (1970, pp. 79-170).
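A minimal Python sketch of the two-sample statistic D, evaluated at every distinct observed value, might look like this. The function name `two_sample_d` and the data are hypothetical, and the asymptotic probability is not computed here.

```python
def edf(sample, x):
    """Empirical distribution function: fraction of sample values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def two_sample_d(sample1, sample2):
    """Largest absolute difference between the two class EDFs."""
    points = sorted(set(sample1) | set(sample2))
    return max(abs(edf(sample1, x) - edf(sample2, x)) for x in points)

# Hypothetical samples that partially overlap.
d = two_sample_d([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
print(d)
```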

#### Cramer-von Mises Statistics

The Cramer-von Mises statistic is defined as

$$CM = \frac{1}{n^2} \sum_i \left( n_i \sum_{j=1}^{p} t_j \left( F_i(x_j) - F(x_j) \right)^2 \right)$$

where $t_j$ is the number of ties at the $j$th distinct value and $p$ is the number of distinct values. CM measures the integrated deviation of the EDF within the classes from the pooled EDF. PROC NPAR1WAY displays the contribution of each class to the sum, together with the sum itself, which is the asymptotic value formed by multiplying the Cramer-von Mises statistic by the number of observations.
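The following Python sketch illustrates the calculation. It assumes the statistic has the form (1/n²) Σ_i n_i Σ_j t_j (F_i(x_j) − F(x_j))², with the inner sum running over the distinct values, and that the asymptotic value is the statistic multiplied by n. The function name `cramer_von_mises` and the data are hypothetical. The per-class contributions are reported on the scale of the asymptotic value, which is an assumption about how the displayed contributions are scaled.

```python
from collections import Counter

def edf(sample, x):
    """Empirical distribution function: fraction of sample values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def cramer_von_mises(classes):
    """Return (CM, CMa, contributions) for a dict mapping class label -> values."""
    pooled = [v for values in classes.values() for v in values]
    n = len(pooled)
    ties = Counter(pooled)  # distinct value -> number of tied observations t_j
    contributions = {}
    for label, vals in classes.items():
        inner = sum(t * (edf(vals, x) - edf(pooled, x)) ** 2 for x, t in ties.items())
        # Contribution of this class to the asymptotic value n * CM.
        contributions[label] = len(vals) * inner / n
    cma = sum(contributions.values())
    return cma / n, cma, contributions

# Hypothetical data with one tied value (2.0 appears in both classes).
cm, cma, contributions = cramer_von_mises({"A": [1.0, 2.0], "B": [2.0, 3.0]})
print(cm, cma, contributions)
```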

#### Kuiper Statistics

For data with two class levels, PROC NPAR1WAY computes the Kuiper statistic, its scaled value for the asymptotic distribution, and the asymptotic p-value. The Kuiper statistic is computed as

$$K = \max_j \left( F_1(x_j) - F_2(x_j) \right) - \min_j \left( F_1(x_j) - F_2(x_j) \right) \quad \text{where } j = 1, 2, \ldots, n$$

The asymptotic value is

$$K_a = K \sqrt{\frac{n_1 n_2}{n}}$$

The p-value is the probability of observing a larger value of $K_a$ under the null hypothesis of no difference between the two classes.
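As an illustration, the Kuiper statistic and a scaled value can be sketched in Python as follows. The scaling K·√(n₁n₂/n) used here is an assumption about the form of the asymptotic value, the function name `kuiper` and the data are hypothetical, and the p-value is not computed.

```python
import math

def edf(sample, x):
    """Empirical distribution function: fraction of sample values <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def kuiper(sample1, sample2):
    """Return (K, Ka) for two samples."""
    points = sorted(set(sample1) | set(sample2))
    diffs = [edf(sample1, x) - edf(sample2, x) for x in points]
    # K adds the largest positive and largest negative excursions of F1 - F2.
    k = max(diffs) - min(diffs)
    n1, n2 = len(sample1), len(sample2)
    ka = k * math.sqrt(n1 * n2 / (n1 + n2))  # assumed scaling for the asymptotic value
    return k, ka

# Hypothetical interleaved samples, so F1 - F2 takes both signs.
k, ka = kuiper([1.0, 4.0, 5.0], [2.0, 3.0, 6.0])
print(k, ka)
```

Unlike the two-sample statistic D, which keeps only the single largest absolute difference, K accumulates deviations in both directions, so interleaved samples like these yield K larger than D.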
