Printed Output
If you request a fitted parametric distribution,
printed output summarizing the fit is produced
in addition to the graphical
display. Figure 4.9 shows the printed
output for a fitted lognormal distribution requested by
the following statements:
proc capability;
   spec target=14 lsl=13.95 usl=14.05;
   histogram / lognormal(indices midpercents);
run;
The summary is organized into the following parts:
 Parameters
 Chi-Square Goodness-of-Fit Test
 EDF Goodness-of-Fit Tests
 Specifications
 Indices Using the Fitted Curve
 Histogram Intervals
 Quantiles
These parts are described in the sections that follow.
Parameters
This section lists the parameters for the fitted curve
as well as the estimated mean and estimated standard
deviation. See "Formulas for Fitted Curves".
The CAPABILITY Procedure
Fitted Lognormal Distribution for width

Parameters for Lognormal Distribution
Parameter    Symbol    Estimate
Threshold    Theta     0
Scale        Zeta      2.638966
Shape        Sigma     0.001497
Mean                   13.99873
Std Dev                0.020952

Goodness-of-Fit Tests for Lognormal Distribution
Test                  Statistic             DF    p Value
Kolmogorov-Smirnov    D        0.09148348         Pr > D         >0.150
Cramer-von Mises      W-Sq     0.05040427         Pr > W-Sq      >0.500
Anderson-Darling      A-Sq     0.33476355         Pr > A-Sq      >0.500
Chi-Square            Chi-Sq   2.87938822   3     Pr > Chi-Sq     0.411

Capability Indices Based on Lognormal Distribution
Cp     0.795463
CPL    0.776822
CPU    0.814021
Cpk    0.776822
Cpm    0.792237

Histogram Bin Percents for Lognormal Distribution
                     Percent
Bin Midpoint    Observed    Estimated
13.95              4.000        2.963
13.97             18.000       15.354
13.99             26.000       33.872
14.01             38.000       32.055
14.03             10.000       13.050
14.05              4.000        2.281

Quantiles for Lognormal Distribution
                Quantile
Percent    Observed    Estimated
 1.0        13.9440      13.9501
 5.0        13.9656      13.9643
10.0        13.9710      13.9719
25.0        13.9860      13.9846
50.0        14.0018      13.9987
75.0        14.0129      14.0129
90.0        14.0218      14.0256
95.0        14.0241      14.0332
99.0        14.0470      14.0475

Figure 4.9: Sample Summary of Fitted Distribution
Chi-Square Goodness-of-Fit Test
The chi-square goodness-of-fit statistic for a fitted
parametric distribution is computed as
follows:
\chi^{2} = \sum_{i=1}^{m} \frac{(O_{i} - E_{i})^{2}}{E_{i}}
where
O_{i} = observed percentage in i^{th} histogram interval
E_{i} = expected percentage in i^{th} histogram interval
m = number of histogram intervals
p = number of estimated parameters
The degrees of freedom for the chi-square test is
equal to m - p - 1. You can save the observed and
expected interval percentages in the OUTFIT= data
set discussed in "Output Data Sets".
Note that
empty intervals are not combined, and the range of intervals used to
compute \chi^{2} begins with the first interval containing
observations and ends with the final interval
containing observations.
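As a rough check of the chi-square arithmetic, the following Python sketch (not part of the CAPABILITY procedure) applies the sum of (O_{i} - E_{i})^{2}/E_{i} to the bin percentages listed in Figure 4.9. The sample size of 50 is an assumption: it is not stated in this section and is only inferred from the observed percentages all being multiples of 2. Under that assumption, rescaling the percentage-scale sum by n/100 gives a value close to the 2.879 reported in the figure, with m - p - 1 = 3 degrees of freedom.

```python
# Bin percentages copied from the "Histogram Bin Percents" table in Figure 4.9.
observed_pct = [4.000, 18.000, 26.000, 38.000, 10.000, 4.000]
estimated_pct = [2.963, 15.354, 33.872, 32.055, 13.050, 2.281]

N_OBS = 50     # ASSUMPTION: sample size, inferred and not stated in the text
N_PARAMS = 2   # zeta and sigma are estimated; theta is fixed at 0


def chi_square(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over the histogram intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Statistic on the percentage scale, then rescaled to the count scale (n/100).
chi_sq_pct = chi_square(observed_pct, estimated_pct)
chi_sq_counts = chi_sq_pct * N_OBS / 100.0

# Degrees of freedom: m - p - 1.
df = len(observed_pct) - N_PARAMS - 1

print(f"chi-square (percent scale) = {chi_sq_pct:.4f}")
print(f"chi-square (count scale)   = {chi_sq_counts:.4f}")
print(f"degrees of freedom         = {df}")
```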
EDF Goodness-of-Fit Tests
When you fit a parametric distribution, the HISTOGRAM statement
provides a series of goodness-of-fit tests based on the empirical
distribution function (EDF). The EDF tests offer advantages over
the chi-square goodness-of-fit test, including improved power and
invariance with respect to the histogram midpoints. For a
thorough discussion, refer to D'Agostino and Stephens (1986).
The empirical distribution function is defined for a set
of n independent observations X_{1}, ... ,X_{n} with a
common distribution function F(x). Denote the observations
ordered from smallest to largest as X_{(1)}, ... ,X_{(n)}.
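The empirical distribution function, F_{n}(x), is defined as
F_{n}(x) = 0 for x < X_{(1)}
F_{n}(x) = i/n for X_{(i)} \le x < X_{(i+1)}, i = 1, ..., n-1
F_{n}(x) = 1 for X_{(n)} \le x
Note that F_{n}(x) is a step function that takes a
step of height 1/n at each observation.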
This function estimates the distribution function
F(x). At any value x, F_{n}(x) is the proportion
of observations less than or equal to x, while F(x)
is the probability of an observation less than or equal
to x. EDF statistics measure the discrepancy between
F_{n}(x) and F(x).
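The step-function behavior of F_{n}(x) can be sketched in a few lines of Python. This is only an illustration, not the procedure's implementation; the function name `edf` and the data values are hypothetical.

```python
from bisect import bisect_right


def edf(sample):
    """Return F_n, the empirical distribution function of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_n(x):
        # bisect_right counts the observations that are <= x.
        return bisect_right(ordered, x) / n

    return f_n


# Hypothetical measurements (note the tie at 14.01).
widths = [13.97, 14.01, 13.99, 14.03, 14.01]
f_n = edf(widths)

for x in (13.96, 13.97, 14.01, 14.05):
    print(f"F_n({x}) = {f_n(x):.1f}")
```

With tied observations the function rises by k/n at the tied value, where k is the number of ties.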
The computational formulas for the EDF statistics make
use of the probability integral transformation U=F(X).
If F(X) is the distribution function of X, the random
variable U is uniformly distributed between 0 and 1.
Given n observations X_{(1)}, ... ,X_{(n)},
the values U_{(i)}=F(X_{(i)}) are computed by
applying the transformation, as shown in the
following sections.
The HISTOGRAM statement provides three EDF tests:
 Kolmogorov-Smirnov
 Anderson-Darling
 Cramér-von Mises
These tests are based on
various measures of the discrepancy between the
empirical distribution function F_{n}(x) and the
proposed parametric cumulative distribution function F(x).
The following sections provide formal definitions of the
EDF statistics.
Kolmogorov-Smirnov Statistic
The Kolmogorov-Smirnov statistic (D) is defined as
D = \sup_{x} |F_{n}(x) - F(x)|
The Kolmogorov-Smirnov statistic belongs to the
supremum class of EDF statistics.
This class of statistics is based on the
largest vertical difference between F(x) and F_{n}(x).
The Kolmogorov-Smirnov statistic is computed as the
maximum of D^{+} and D^{-}, where D^{+} is the
largest vertical distance between the EDF and the
distribution function when the EDF is greater than
the distribution function, and D^{-} is the largest
vertical distance when the EDF is less than the
distribution function.
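A minimal Python sketch of this computation follows, using the standard computational forms D^{+} = max_{i}(i/n - U_{(i)}) and D^{-} = max_{i}(U_{(i)} - (i-1)/n) with U_{(i)} = F(X_{(i)}). The function name, the data, and the fully specified normal model are hypothetical, and the sketch does not produce p-values.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Return (D+, D-, D) for a sample and a fully specified CDF."""
    # U_(i) = F(X_(i)); sorting the transformed values gives the same order
    # as transforming the sorted sample because a CDF is nondecreasing.
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    # enumerate() is 0-based, so (i + 1)/n is the EDF just after U_(i)
    # and i/n is the EDF just before it.
    d_plus = max((i + 1) / n - u_i for i, u_i in enumerate(u))
    d_minus = max(u_i - i / n for i, u_i in enumerate(u))
    return d_plus, d_minus, max(d_plus, d_minus)


# Hypothetical measurements and a hypothetical fully specified normal model.
widths = [13.97, 13.99, 14.00, 14.01, 14.03]
model = NormalDist(mu=14.0, sigma=0.02)
d_plus, d_minus, d = ks_statistic(widths, model.cdf)
print(f"D+ = {d_plus:.4f}, D- = {d_minus:.4f}, D = {d:.4f}")
```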
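Anderson-Darling Statistic
The Anderson-Darling statistic and the Cramér-von Mises
statistic belong to the quadratic class of EDF statistics.
This class of statistics is based on the squared difference
(F_{n}(x) - F(x))^{2}. Quadratic statistics have
the following general form:
Q = n \int_{-\infty}^{+\infty} (F_{n}(x) - F(x))^{2} \psi(x) dF(x)
The function \psi(x) weights the squared difference
(F_{n}(x) - F(x))^{2}.
The Anderson-Darling statistic (A^{2}) is defined as
A^{2} = n \int_{-\infty}^{+\infty} (F_{n}(x) - F(x))^{2} [F(x)(1 - F(x))]^{-1} dF(x)
Here the weight function is
\psi(x) = [F(x)(1 - F(x))]^{-1}. The Anderson-Darling statistic is computed as
A^{2} = -n - \frac{1}{n} \sum_{i=1}^{n} [(2i - 1) \log U_{(i)} + (2n + 1 - 2i) \log(1 - U_{(i)})]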
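The computational form A^{2} = -n - (1/n) \sum [(2i - 1) \log U_{(i)} + (2n + 1 - 2i) \log(1 - U_{(i)})] can be sketched in Python as follows. The function name, data, and normal model are hypothetical, and the sketch assumes every U_{(i)} lies strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def anderson_darling(sample, cdf):
    """Anderson-Darling A^2 for a sample and a fully specified CDF."""
    u = sorted(cdf(x) for x in sample)  # U_(1) <= ... <= U_(n)
    n = len(u)
    total = sum(
        (2 * i - 1) * math.log(u_i) + (2 * n + 1 - 2 * i) * math.log(1.0 - u_i)
        for i, u_i in enumerate(u, start=1)  # i runs from 1 to n
    )
    return -n - total / n


# Hypothetical measurements and a hypothetical fully specified normal model.
widths = [13.97, 13.99, 14.00, 14.01, 14.03]
a_sq = anderson_darling(widths, NormalDist(mu=14.0, sigma=0.02).cdf)
print(f"A-Sq = {a_sq:.4f}")
```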
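Cramér-von Mises Statistic
The Cramér-von Mises statistic (W^{2}) is defined as
W^{2} = n \int_{-\infty}^{+\infty} (F_{n}(x) - F(x))^{2} dF(x)
Here the weight function is \psi(x) = 1. The Cramér-von Mises statistic is computed as
W^{2} = \sum_{i=1}^{n} (U_{(i)} - \frac{2i - 1}{2n})^{2} + \frac{1}{12n}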
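The computational form W^{2} = \sum (U_{(i)} - (2i - 1)/(2n))^{2} + 1/(12n) translates directly into a short Python sketch. As before, the function name, data, and normal model are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import NormalDist


def cramer_von_mises(sample, cdf):
    """Cramer-von Mises W^2 for a sample and a fully specified CDF."""
    u = sorted(cdf(x) for x in sample)  # U_(1) <= ... <= U_(n)
    n = len(u)
    squared = sum(
        (u_i - (2 * i - 1) / (2 * n)) ** 2
        for i, u_i in enumerate(u, start=1)  # i runs from 1 to n
    )
    return squared + 1.0 / (12 * n)


# Hypothetical measurements and a hypothetical fully specified normal model.
widths = [13.97, 13.99, 14.00, 14.01, 14.03]
w_sq = cramer_von_mises(widths, NormalDist(mu=14.0, sigma=0.02).cdf)
print(f"W-Sq = {w_sq:.4f}")
```

The statistic reaches its minimum, 1/(12n), when each U_{(i)} equals (2i - 1)/(2n).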
Probability Values for EDF Tests
Once the EDF test statistics are computed, the associated
probability values (p-values) must be calculated. The
CAPABILITY procedure uses internal tables of probability
levels similar to those given by D'Agostino and
Stephens (1986).
If the value is between two probability
levels, then linear interpolation is used to estimate the
probability value.
The probability value depends upon the parameters that
are known and the parameters that are estimated for the
distribution you are fitting. Table 4.17 summarizes
different combinations of estimated parameters for which EDF tests
are available.
Note: The threshold (THETA=) parameter for the beta,
exponential, gamma, lognormal, and Weibull distributions
is assumed to be known. If you do not specify its value,
it is assumed to be zero and known. Likewise, the SIGMA=
parameter, which determines the upper threshold (SIGMA)
for the beta distribution, is assumed to be known; if you
do not specify its value, it is assumed to be one. These
parameters are not listed in Table 4.17 because
they are assumed to be known in all cases, and they do not affect
which EDF statistics are computed.
Table 4.17: Availability of EDF Tests
Distribution    Parameters                       EDF Tests Available
Beta            \alpha and \beta unknown         none
                \alpha known, \beta unknown      none
                \alpha unknown, \beta known      none
                \alpha and \beta known           all
Exponential     \sigma unknown                   all
                \sigma known                     all
Gamma           \alpha and \sigma unknown        none
                \alpha known, \sigma unknown     none
                \alpha unknown, \sigma known     none
                \alpha and \sigma known          all
Lognormal       \zeta and \sigma unknown         all
                \zeta known, \sigma unknown      A^{2} and W^{2}
                \zeta unknown, \sigma known      A^{2} and W^{2}
                \zeta and \sigma known           all
Normal          \mu and \sigma unknown           all
                \mu known, \sigma unknown        A^{2} and W^{2}
                \mu unknown, \sigma known        A^{2} and W^{2}
                \mu and \sigma known             all
Weibull         c and \sigma unknown             A^{2} and W^{2}
                c known, \sigma unknown          A^{2} and W^{2}
                c unknown, \sigma known          A^{2} and W^{2}
                c and \sigma known               all
Specifications
This section is included in the summary only if
you provide specification limits,
and it tabulates the limits as well as the
observed percentages and estimated percentages
outside the limits.
The estimated percentages are computed
only if fitted distributions are requested. They are based on
the probability, under the fitted distribution, that an
observed value falls outside the specification limits. The
observed percentages are the percentages of
observations outside the specification limits.
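To illustrate how estimated percentages outside the limits follow from a fitted curve, this Python sketch evaluates the lognormal distribution function at the specification limits, using the parameter estimates from Figure 4.9 and the limits from the SPEC statement. It assumes the three-parameter lognormal form F(x) = \Phi((\log(x - \theta) - \zeta)/\sigma); the helper name `lognormal_cdf` is hypothetical, and the results are approximate because the printed estimates are rounded.

```python
import math
from statistics import NormalDist

# Parameter estimates from Figure 4.9 and limits from the SPEC statement.
THETA, ZETA, SIGMA = 0.0, 2.638966, 0.001497
LSL, USL = 13.95, 14.05


def lognormal_cdf(x, theta=THETA, zeta=ZETA, sigma=SIGMA):
    """CDF of a lognormal with threshold theta, scale zeta, shape sigma."""
    if x <= theta:
        return 0.0
    return NormalDist().cdf((math.log(x - theta) - zeta) / sigma)


# Estimated percentages below LSL and above USL under the fitted curve.
pct_below_lsl = 100.0 * lognormal_cdf(LSL)
pct_above_usl = 100.0 * (1.0 - lognormal_cdf(USL))

print(f"Estimated % < LSL: {pct_below_lsl:.3f}")
print(f"Estimated % > USL: {pct_above_usl:.3f}")
```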
Indices Using the Fitted Curve
This section is included in the summary only if
you specify the INDICES option in parentheses after
a distribution option, as in the statements that produce Figure 4.9.
Standard process capability
indices, such as C_{p} and C_{pk}, are not
appropriate if the data are not normally distributed.
The INDICES option computes generalizations of the
standard indices using the fact that for the normal
distribution, 3\sigma is both the distance from the
lower 0.135 percentile to the median (or mean) and the
distance from the median (or mean) to the upper 99.865
percentile. These percentiles are estimated from the
fitted distribution, and the appropriate
percentile-to-median distances are substituted for
3\sigma in the standard formulas.
Writing T for the target, LSL and USL
for the lower and upper specification limits, and
P_{\alpha} for the 100\alpha^{th} percentile, the
generalized capability indices are as follows:

C_{pl} = \frac{P_{0.5} - LSL}{P_{0.5} - P_{0.00135}}

C_{pu} = \frac{USL - P_{0.5}}{P_{0.99865} - P_{0.5}}

C_{p} = \frac{USL - LSL}{P_{0.99865} - P_{0.00135}}

C_{pk} = \min(\frac{P_{0.5} - LSL}{P_{0.5} - P_{0.00135}}, \frac{USL - P_{0.5}}{P_{0.99865} - P_{0.5}})

K = 2 \times \frac{\frac{1}{2}(USL + LSL) - P_{0.5}}{USL - LSL}
If the data are normally distributed, these formulas reduce
to the formulas for the standard capability indices, which
are given in "Standard Capability Indices".
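As a worked illustration, the Python sketch below evaluates the percentile-based formulas for the lognormal fit in Figure 4.9, assuming the fitted quantile function P_{\alpha} = \theta + \exp(\zeta + \sigma z_{\alpha}), where z_{\alpha} is the standard normal quantile. The helper name `lognormal_quantile` is hypothetical. Because the printed parameter estimates are rounded, the results agree with the Cp, CPL, CPU, and Cpk values in the figure only to about three decimal places.

```python
import math
from statistics import NormalDist

# Parameter estimates from Figure 4.9 and limits from the SPEC statement.
THETA, ZETA, SIGMA = 0.0, 2.638966, 0.001497
LSL, USL = 13.95, 14.05


def lognormal_quantile(p, theta=THETA, zeta=ZETA, sigma=SIGMA):
    """Quantile of a lognormal with threshold theta, scale zeta, shape sigma."""
    return theta + math.exp(zeta + sigma * NormalDist().inv_cdf(p))


p_low = lognormal_quantile(0.00135)   # lower 0.135 percentile
p_med = lognormal_quantile(0.5)       # median
p_high = lognormal_quantile(0.99865)  # upper 99.865 percentile

cpl = (p_med - LSL) / (p_med - p_low)
cpu = (USL - p_med) / (p_high - p_med)
cp = (USL - LSL) / (p_high - p_low)
cpk = min(cpl, cpu)
k = 2.0 * (0.5 * (USL + LSL) - p_med) / (USL - LSL)

print(f"Cp = {cp:.4f}  CPL = {cpl:.4f}  CPU = {cpu:.4f}  Cpk = {cpk:.4f}")
print(f"K  = {k:.4f}")
```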
The following guidelines apply to the use of
generalized capability indices requested with
the INDICES option:

When you choose the family of parametric distributions
for the fitted curve,
consider whether an appropriate family can be derived
from assumptions about the process.

Whenever possible,
examine the data distribution with
a histogram, probability plot, or quantile-quantile plot.

Apply
goodness-of-fit tests to
assess how well the parametric distribution models
the data.

Consider whether a
generalized index
has a meaningful practical interpretation in your application.
At the time of this writing, research on the application of
generalized capability indices is ongoing. Other approaches
can also be used with
nonnormal data:

Transform the data to normality, then compute and report
standard capability indices on the transformed scale.

Report the proportion
of nonconforming output
estimated from the fitted
distribution.

If it is not possible to adequately model the data distribution
with a parametric density,
smooth the data
distribution with a kernel density estimate
and simply report the proportion of nonconforming output.
Refer to Rodriguez (1992) for additional discussion.
Histogram Intervals
This section is included in the summary only
if you specify the MIDPERCENTS option in parentheses
after the distribution option, as in the statements that produce Figure 4.9.
This table lists the interval midpoints along with the
observed and estimated percentages of the observations
that lie in the interval. The estimated percentages
are based on the fitted distribution.
In addition, you can specify MIDPERCENTS as a general
HISTOGRAM statement option (outside the distribution parentheses)
to request a table of interval
midpoints with the observed percentage of observations
that lie in each interval. See the entry for the
MIDPERCENTS option.
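The estimated bin percentages can be approximated from the fitted curve as differences of the distribution function at the interval endpoints. The Python sketch below does this for Figure 4.9, assuming that each interval extends 0.01 on either side of its midpoint (half the midpoint spacing) and that the lognormal distribution function is \Phi((\log(x - \theta) - \zeta)/\sigma). The helper name is hypothetical, and small differences from the figure are expected because the printed parameters are rounded.

```python
import math
from statistics import NormalDist

THETA, ZETA, SIGMA = 0.0, 2.638966, 0.001497   # estimates from Figure 4.9
HALF_WIDTH = 0.01                              # ASSUMPTION: half the midpoint spacing
midpoints = [13.95, 13.97, 13.99, 14.01, 14.03, 14.05]


def lognormal_cdf(x):
    """CDF of the fitted lognormal (threshold THETA, scale ZETA, shape SIGMA)."""
    if x <= THETA:
        return 0.0
    return NormalDist().cdf((math.log(x - THETA) - ZETA) / SIGMA)


# Estimated percentage of observations in each histogram interval.
estimated_pct = [
    100.0 * (lognormal_cdf(m + HALF_WIDTH) - lognormal_cdf(m - HALF_WIDTH))
    for m in midpoints
]

for m, pct in zip(midpoints, estimated_pct):
    print(f"{m:.2f}  {pct:7.3f}")
```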
Quantiles
This table lists observed and estimated quantiles. You can use
the PERCENTS= option to specify the list of quantiles that
appear in this table. The list in Figure 4.9
is the default list.
See the entry for the
PERCENTS= option.
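The estimated quantiles follow from the fitted parameters. The Python sketch below reproduces the Estimated column of Figure 4.9 to within rounding, assuming the lognormal quantile function \theta + \exp(\zeta + \sigma z_{p}), where z_{p} is the standard normal quantile; the helper name is hypothetical.

```python
import math
from statistics import NormalDist

THETA, ZETA, SIGMA = 0.0, 2.638966, 0.001497   # estimates from Figure 4.9
percents = [1, 5, 10, 25, 50, 75, 90, 95, 99]  # default list shown in the figure


def lognormal_quantile(p):
    """Quantile of the fitted lognormal for a probability p in (0, 1)."""
    return THETA + math.exp(ZETA + SIGMA * NormalDist().inv_cdf(p))


# Map each percent to its estimated quantile under the fitted curve.
estimated = {pct: lognormal_quantile(pct / 100.0) for pct in percents}

for pct, q in estimated.items():
    print(f"{pct:5.1f}  {q:.4f}")
```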
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.