HISTOGRAM Statement

Formulas for Fitted Curves

The following sections provide information on the families of parametric distributions that you can fit with the HISTOGRAM statement. Properties of these distributions are discussed by Johnson and Kotz (1970).

Beta Distribution

The fitted density function is
p(x) = \begin{cases}
 \dfrac{(x-\theta)^{\alpha-1}(\sigma+\theta-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}} \, h \times 100\% & \text{for } \theta < x < \theta + \sigma \\
 0 & \text{for } x \leq \theta \text{ or } x \geq \theta + \sigma
\end{cases}

where B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} and

		 \theta = lower threshold parameter (lower endpoint parameter)
		 \sigma = scale parameter (\sigma > 0)
		 \alpha = shape parameter (\alpha > 0)
		 \beta = shape parameter (\beta > 0)
		 h = width of histogram interval
Note: This notation is consistent with that of other distributions that you can fit with the HISTOGRAM statement. However, many texts, including Johnson and Kotz (1970), write the beta density function as
p(x) = \begin{cases}
 \dfrac{(x-a)^{p-1}(b-x)^{q-1}}{B(p,q)\,(b-a)^{p+q-1}} & \text{for } a < x < b \\
 0 & \text{for } x \leq a \text{ or } x \geq b
\end{cases}

The two notations are related as follows: 
 
		 \sigma = b - a 
		 \theta = a 
		 \alpha = p 
		 \beta = q
The range of the beta distribution is bounded below by a threshold parameter \theta = a and above by \theta + \sigma = b. If you specify a fitted beta curve using the BETA option, \theta must be less than the minimum data value, and \theta + \sigma must be greater than the maximum data value. You can specify \theta and \sigma with the THETA= and SIGMA= beta-options in parentheses after the keyword BETA. By default, \sigma = 1 and \theta = 0. If you specify THETA=EST and SIGMA=EST, maximum likelihood estimates are computed for \theta and \sigma.

In addition, you can specify \alpha and \beta with the ALPHA= and BETA= beta-options, respectively. By default, the procedure calculates maximum likelihood estimates for \alpha and \beta. For example, to fit a beta density curve to a set of data bounded below by 32 and above by 212 with maximum likelihood estimates for \alpha and \beta, use the following statement:

   histogram length / beta(theta=32 sigma=180);
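When the endpoints are not known in advance, the THETA=EST and SIGMA=EST settings described above could be used instead. The following is a minimal variation on the preceding statement, again assuming an analysis variable named length:

```sas
   /* Request maximum likelihood estimates for the threshold and scale; */
   /* alpha and beta are also estimated, since that is the default.     */
   histogram length / beta(theta=est sigma=est);
```
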
The beta distributions are also referred to as Pearson Type I or II distributions. These include the power-function distribution (\beta = 1), the arc-sine distribution (\alpha = \beta = \frac{1}{2}), and the generalized arc-sine distributions (\alpha + \beta = 1, \beta \neq \frac{1}{2}).

You can use the DATA step function BETAINV to compute beta quantiles and the DATA step function PROBBETA to compute beta probabilities.
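
As a sketch of how these two functions might be combined with the threshold and scale parameters: PROBBETA and BETAINV work on the standard beta distribution over the unit interval, so a value on the data scale is first mapped to (x - \theta)/\sigma, and a quantile is mapped back with \theta + \sigma q. The parameter values below are hypothetical and chosen only for illustration.

```sas
data _null_;
   /* Hypothetical parameter values, for illustration only */
   theta = 32;  sigma = 180;
   alpha = 2;   beta  = 3;

   /* P(X <= 100): rescale x to the unit interval before calling PROBBETA */
   x = 100;
   p = probbeta((x - theta) / sigma, alpha, beta);

   /* Median of X: map the standard beta quantile back to the data scale */
   median = theta + sigma * betainv(0.5, alpha, beta);

   put p= median=;
run;
```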

Exponential Distribution

The fitted density function is
p(x) = \begin{cases}
 \dfrac{h \times 100\%}{\sigma} \exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)\right) & \text{for } x \geq \theta \\
 0 & \text{for } x < \theta
\end{cases}

where 
 
		 \theta = threshold parameter
		 \sigma = scale parameter (\sigma > 0)
		 h = width of histogram interval
The threshold parameter \theta must be less than or equal to the minimum data value. You can specify \theta with the THRESHOLD= exponential-option. By default, \theta = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for \theta. In addition, you can specify \sigma with the SCALE= exponential-option. By default, the procedure calculates a maximum likelihood estimate for \sigma. Note that some authors define the scale parameter as \frac{1}{\sigma}.
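
For instance, a statement along the following lines would request an exponential curve with an estimated threshold. It is a sketch that reuses the analysis variable length from the beta example and assumes the curve is requested with the EXPONENTIAL keyword, which this section does not spell out.

```sas
   /* Estimate the threshold by maximum likelihood instead of using the  */
   /* default of 0; the scale is estimated as well, which is the default. */
   histogram length / exponential(theta=est);
```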

The exponential distribution is a special case of both the gamma distribution (with \alpha=1) and the Weibull distribution (with c=1). A related distribution is the extreme value distribution. If Y = exp(-X) has an exponential distribution, then X has an extreme value distribution.

Gamma Distribution

The fitted density function is
p(x) = \begin{cases}
 \dfrac{h \times 100\%}{\Gamma(\alpha)\,\sigma} \left(\dfrac{x-\theta}{\sigma}\right)^{\alpha-1} \exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)\right) & \text{for } x > \theta \\
 0 & \text{for } x \leq \theta
\end{cases}

where 
 
		 \theta = threshold parameter
		 \sigma = scale parameter (\sigma > 0)
		 \alpha = shape parameter (\alpha > 0)
		 h = width of histogram interval
The threshold parameter \theta must be less than the minimum data value. You can specify \theta with the THRESHOLD= gamma-option. By default, \theta = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for \theta. In addition, you can specify \sigma and \alpha with the SCALE= and ALPHA= gamma-options. By default, the procedure calculates maximum likelihood estimates for \sigma and \alpha.

The gamma distributions are also referred to as Pearson Type III distributions, and they include the chi-square, exponential, and Erlang distributions. The probability density function for the chi-square distribution is
p(x) = \begin{cases}
 \dfrac{1}{2\,\Gamma\left(\frac{\nu}{2}\right)} \left(\dfrac{x}{2}\right)^{\frac{\nu}{2}-1} \exp\left(-\dfrac{x}{2}\right) & \text{for } x > 0 \\
 0 & \text{for } x \leq 0
\end{cases}
Notice that this is a gamma distribution with \alpha = \frac{\nu}{2}, \sigma = 2, and \theta = 0. The exponential distribution is a gamma distribution with \alpha = 1, and the Erlang distribution is a gamma distribution in which \alpha is a positive integer. A related distribution is the Rayleigh distribution. If X has a \chi^2_{\nu} distribution, then \sqrt{X} has a \chi_{\nu} distribution, whose probability density function is
p(x) = \begin{cases}
 \left[ 2^{\frac{\nu}{2}-1}\,\Gamma\left(\frac{\nu}{2}\right) \right]^{-1} x^{\nu-1} \exp\left(-\dfrac{x^2}{2}\right) & \text{for } x > 0 \\
 0 & \text{for } x \leq 0
\end{cases}
If \nu=2, the preceding distribution is referred to as the Rayleigh distribution.
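
As a quick check of the \nu = 2 case, substituting into the preceding density (a worked substitution, not part of the original text) gives:

```latex
p(x) = \left[ 2^{\frac{2}{2}-1}\,\Gamma\!\left(\tfrac{2}{2}\right) \right]^{-1}
       x^{2-1} \exp\!\left(-\frac{x^2}{2}\right)
     = x \exp\!\left(-\frac{x^2}{2}\right), \qquad x > 0
```

The bracketed factor drops out because 2^0 = 1 and \Gamma(1) = 1, leaving the Rayleigh density with unit scale.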

You can use the DATA step function GAMINV to compute gamma quantiles and the DATA step function PROBGAM to compute gamma probabilities.
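
A minimal sketch of that usage follows, with hypothetical parameter values. PROBGAM and GAMINV work on the standard gamma distribution (unit scale, zero threshold), so the data value is standardized as (x - \theta)/\sigma first. The values \alpha = 1.5, \sigma = 2, and \theta = 0 are chosen to match a chi-square distribution with 3 degrees of freedom, so the PROBCHI function offers a cross-check.

```sas
data _null_;
   /* Hypothetical parameter values: alpha = nu/2 with nu = 3, sigma = 2 */
   theta = 0;  sigma = 2;  alpha = 1.5;

   /* P(X <= 3): standardize x before calling PROBGAM */
   x = 3;
   p = probgam((x - theta) / sigma, alpha);

   /* Cross-check against the chi-square CDF with 3 degrees of freedom */
   p_chi = probchi(x, 3);

   /* 90th percentile: map the standard gamma quantile back to the data scale */
   q90 = theta + sigma * gaminv(0.90, alpha);

   put p= p_chi= q90=;
run;
```

The values of p and p_chi should agree, reflecting the chi-square special case described above.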

Lognormal Distribution

The fitted density function is
p(x) = \begin{cases}
 \dfrac{h \times 100\%}{\sigma\sqrt{2\pi}\,(x-\theta)} \exp\left(-\dfrac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right) & \text{for } x > \theta \\
 0 & \text{for } x \leq \theta
\end{cases}

where 
 
		 \theta = threshold parameter
		 \zeta = scale parameter (-\infty < \zeta < \infty)
		 \sigma = shape parameter (\sigma > 0)
		 h = width of histogram interval

The threshold parameter \theta must be less than the minimum data value. You can specify \theta with the THRESHOLD= lognormal-option. By default, \theta = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for \theta. You can specify \zeta and \sigma with the SCALE= and SHAPE= lognormal-options, respectively. By default, the procedure calculates maximum likelihood estimates for these parameters.

Note: This book uses \sigma to denote the shape parameter of the lognormal distribution, whereas \sigma is used to denote the scale parameter of the beta, exponential, gamma, normal, and Weibull distributions. The use of \sigma to denote the lognormal shape parameter is based on the fact that \frac{1}{\sigma}(\log(X-\theta)-\zeta) has a standard normal distribution if X is lognormally distributed.
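
The standardization in the preceding note suggests one way to compute lognormal probabilities and quantiles with the normal DATA step functions PROBNORM and PROBIT. The following is a sketch under that assumption, with hypothetical parameter values:

```sas
data _null_;
   /* Hypothetical parameter values, for illustration only */
   theta = 0;  zeta = 1;  sigma = 0.5;

   /* P(X <= 4): (log(x - theta) - zeta) / sigma is standard normal */
   x = 4;
   p = probnorm((log(x - theta) - zeta) / sigma);

   /* 95th percentile: invert the same transformation */
   q95 = theta + exp(zeta + sigma * probit(0.95));

   put p= q95=;
run;
```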

Normal Distribution

The fitted density function is
p(x) = \frac{h \times 100\%}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \quad \text{for } -\infty < x < \infty

where 
 
		 \mu = mean
		 \sigma = standard deviation (\sigma > 0)
		 h = width of histogram interval
You can specify \mu and \sigma with the MU= and SIGMA= normal-options, respectively. By default, the procedure estimates \mu with the sample mean and \sigma with the sample standard deviation.

You can use the DATA step function PROBIT to compute normal quantiles and the DATA step function PROBNORM to compute probabilities.
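
Both functions refer to the standard normal distribution, so values are standardized with \mu and \sigma on the way in and rescaled on the way out. A minimal sketch with hypothetical parameter values:

```sas
data _null_;
   /* Hypothetical parameter values, for illustration only */
   mu = 10;  sigma = 2;

   /* P(X <= 13): standardize, then apply the standard normal CDF */
   x = 13;
   p = probnorm((x - mu) / sigma);

   /* 97.5th percentile: rescale the standard normal quantile */
   q975 = mu + sigma * probit(0.975);

   put p= q975=;
run;
```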

Weibull Distribution

The fitted density function is
p(x) = \begin{cases}
 \dfrac{c\,h \times 100\%}{\sigma} \left(\dfrac{x-\theta}{\sigma}\right)^{c-1} \exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)^{c}\right) & \text{for } x > \theta \\
 0 & \text{for } x \leq \theta
\end{cases}


where 
 
		 \theta = threshold parameter
		 \sigma = scale parameter (\sigma > 0)
		 c = shape parameter (c > 0)
		 h = width of histogram interval

The threshold parameter \theta must be less than the minimum data value. You can specify \theta with the THRESHOLD= Weibull-option. By default, \theta = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for \theta. You can specify \sigma and c with the SCALE= and SHAPE= Weibull-options, respectively. By default, the procedure calculates maximum likelihood estimates for \sigma and c.
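
As an illustration, a known threshold could be supplied while leaving \sigma and c to be estimated. The sketch below reuses the analysis variable length from the beta example, assumes the curve is requested with the WEIBULL keyword, and uses a hypothetical threshold of 10, which must lie below the smallest data value.

```sas
   /* Fix the threshold at a hypothetical value of 10; the scale and shape */
   /* are then estimated by maximum likelihood, which is the default.      */
   histogram length / weibull(threshold=10);
```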

The exponential distribution is a special case of the Weibull distribution where c=1.
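
To see the reduction explicitly, set c = 1 in the Weibull density given earlier in this section (a worked substitution added for illustration):

```latex
p(x) = \frac{1 \cdot h \times 100\%}{\sigma}
       \left(\frac{x-\theta}{\sigma}\right)^{0}
       \exp\!\left(-\left(\frac{x-\theta}{\sigma}\right)^{1}\right)
     = \frac{h \times 100\%}{\sigma}
       \exp\!\left(-\frac{x-\theta}{\sigma}\right), \qquad x > \theta
```

This matches the exponential density, apart from the boundary point x = \theta, which the exponential form includes and the Weibull form excludes.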


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.