The VARIOGRAM Procedure
The VARIOGRAM procedure produces three data sets: the OUTVAR=SAS-data-set, the OUTPAIR=SAS-data-set, and the OUTDIST=SAS-data-set. These data sets are described in the following sections.
The details of the computation of the variogram, the robust variogram, and the covariance are described in the section "Theoretical and Computational Details of the Semivariogram".
The OUTVAR= data set contains the following variables:
The bandwidth variable, BANDW, is not included in the data set if no bandwidth specification is given in the COMPUTE statement or in a DIRECTIONS statement.
For plotting and estimation purposes, it is desirable to have as many points as possible for a variogram plot. However, a rule of thumb used in computing sample semivariograms is to use at least 30 point pairs in each interval whenever possible. Because narrower intervals contain fewer pairs, there is a lower limit to the value of the LAGDISTANCE= option.
Since the distribution of pairwise distances is seldom known in advance, the information contained in the OUTDIST= data set enables you to choose, in an iterative fashion, a value for the LAGDISTANCE= parameter. The value you choose is a compromise between the number of pairs making up each variogram point and the number of variogram points.
In some cases, the pattern of measured points may result in some lag or distance classes having a small number of pairs, while the remaining classes have a large number of pairs. By adjusting the value of the LAGDISTANCE= option to honor the rule of thumb (at least 30 pairs), you are "wasting" pairs in the other distance classes.
One strategy for solving this problem is to use fewer than 30 pairs for these distance classes. Then, either delete the corresponding variogram points or use them and accept the increased uncertainty. Unfortunately, the deficient distance classes are usually those close to the origin (h=0). This is the crucial portion of the experimental variogram curve for determining the form of the theoretical variogram and for detecting the presence of a nugget effect.
Another alternative is to force distance classes to contain approximately the same number of pairs. This results in distance classes of unequal widths.
While PROC VARIOGRAM does not produce such distance classes directly, the OUTPAIR= data set, described in the section "OUTPAIR=SAS-data-set", contains information on all distinct pairs of points. You can use this data set, along with the RANK procedure, to produce an experimental variogram based on equal numbers of pairs in each distance class.
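The idea of equal-count distance classes can be illustrated outside SAS. The following Python sketch is not PROC RANK or PROC VARIOGRAM code; it only shows the grouping logic, assuming two-dimensional coordinates. The function name `equal_count_classes` is hypothetical.

```python
import math
from itertools import combinations

def equal_count_classes(points, n_classes):
    """Split sorted pairwise distances into n_classes groups of near-equal size.

    Returns one (min_distance, max_distance, pair_count) tuple per nonempty group.
    Illustrative only: tied distances may be split across adjacent groups.
    """
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    n = len(dists)
    classes = []
    for g in range(n_classes):
        lo = g * n // n_classes          # first sorted index in group g
        hi = (g + 1) * n // n_classes    # one past the last index in group g
        group = dists[lo:hi]
        if group:
            classes.append((group[0], group[-1], len(group)))
    return classes
```

Each class holds roughly the same number of pairs, so the class widths (maximum minus minimum distance) generally differ from one class to the next.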
To request an OUTDIST= data set, you specify the OUTDIST= data set in the PROC VARIOGRAM statement and the NOVARIOGRAM option in the COMPUTE statement. The NOVARIOGRAM option prevents any variogram or covariance computation from being performed.
The simplest way of determining the distribution of pairwise distances is to determine the maximum distance h_{max} between pairs and divide this distance by some number N of intervals to produce distance classes of length \delta = h_{max}/N. The distance between each pair of points P_{1}, P_{2}, denoted | P_{1}P_{2} |, is computed, and the pair P_{1}P_{2} is counted in the kth distance class if (k-1)\delta < | P_{1}P_{2} | \le k\delta for k = 1, ... ,N.
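A minimal Python sketch of this simple scheme follows. It is an illustration, not the procedure's implementation, and it assumes that each class has width h_{max}/N and is open on the left and closed on the right. The function name `distance_class_counts` is hypothetical.

```python
import math
from itertools import combinations

def distance_class_counts(points, n_classes):
    """Count point pairs in n_classes equal-width distance classes.

    Assumes class k (1-based) covers ((k-1)*delta, k*delta], with
    delta = h_max / n_classes. Returns (delta, counts).
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    h_max = max(dists)
    if h_max == 0:
        raise ValueError("all points coincide; no distance classes can be formed")
    delta = h_max / n_classes
    counts = [0] * n_classes
    for d in dists:
        if d == 0:
            continue  # coincident points lie outside every class
        # ceil maps ((k-1)*delta, k*delta] to k; clamp guards against rounding
        k = min(math.ceil(d / delta), n_classes)
        counts[k - 1] += 1
    return delta, counts
```

Scanning the returned counts for several values of N shows how quickly the smallest class falls below the 30-pair rule of thumb.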
The actual computation is a slight variation of this. A bound, rather than the actual maximum distance, is computed. This bound is the length of the diagonal of a bounding rectangle for the data points. This bounding rectangle is found by using the maximum and minimum x and y coordinates, x_{max}, x_{min}, y_{max}, y_{min}, and forming the rectangle determined by the points
(x_{max}, y_{max}), (x_{max}, y_{min}), (x_{min}, y_{min}), (x_{min}, y_{max})
See Figure 70.16 for an illustration of the bounding rectangle.
The pairwise distance bound, denoted by h_{b}, is given by
h_{b} = \sqrt{(x_{max} - x_{min})^{2} + (y_{max} - y_{min})^{2}}
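Because the bound is the length of the bounding rectangle's diagonal, it can be computed from the coordinate extremes alone, without examining any pairs. The following Python sketch illustrates this; the function name `pairwise_distance_bound` is hypothetical.

```python
import math

def pairwise_distance_bound(points):
    """Length of the diagonal of the axis-aligned bounding rectangle."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))
```

For the points (0,0), (3,1), and (1,4), the bound is 5, while the largest actual pairwise distance is about 4.12, so the bound is never smaller than the true maximum.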
The lag classes corresponding to h_{0}=1 are shown in Figure 70.17.
By increasing or decreasing the value of the NHCLASSES= option, you can adjust the lag or distance class with the smallest count so that this count is around 30 or some other value that you judge appropriate.
Once you determine an appropriate value for the NHCLASSES= option, you can use the width of the lag classes as a candidate value for the LAGDIST= option in the COMPUTE statement. The width of the lag classes is determined by the upper bound (UB) and lower bound (LB) variables.
For example, read the observation from the OUTDIST= data set corresponding to lag 1 and compute the quantity UB-LB. Use this value for the LAGDIST= option in the COMPUTE statement.
Note: Do not use the 0th lag class; it is half the length of the other intervals. Use lag 1 instead.
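The following Python sketch illustrates how a candidate lag distance can be read off a table of lag classes. It rests on assumptions that are not stated on this page: that the basic distance unit is h_{0} = h_{b}/N, where N is the NHCLASSES= value, that lag class k is centered on k h_{0} with tolerance h_{0}/2, and that lag 0 covers the first half-width interval. The function names and dictionary keys are hypothetical, and the handling of boundary values may differ from the procedure's.

```python
import math
from itertools import combinations

def lag_class_table(points, nhclasses):
    """Build an OUTDIST-like table of lag classes (assumed layout, see text)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    h_b = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    if h_b == 0:
        raise ValueError("all points coincide")
    h_0 = h_b / nhclasses                      # assumed basic distance unit
    counts = [0] * (nhclasses + 1)             # lags 0..nhclasses
    for p, q in combinations(points, 2):
        # nearest multiple of h_0; distance 0 is counted in lag 0 for simplicity
        lag = min(int(math.dist(p, q) / h_0 + 0.5), nhclasses)
        counts[lag] += 1
    return [
        {"lag": k, "lb": max(0.0, (k - 0.5) * h_0), "ub": (k + 0.5) * h_0, "count": c}
        for k, c in enumerate(counts)
    ]

def candidate_lagdistance(table):
    """Width UB-LB of lag 1; lag 0 is skipped because it is half as wide."""
    return table[1]["ub"] - table[1]["lb"]
```

Calling `lag_class_table` with several values of `nhclasses` and inspecting the smallest count among lags 1 and higher mirrors the iterative choice described above.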
The following variables are written to the OUTDIST= data set:
If you specify OUTPDISTANCE=D_{max} in the COMPUTE statement, all pairs P_{1}, P_{2} in the original data set that satisfy the relation | P_{1}P_{2} | \le D_{max} are written to the OUTPAIR= data set.
Note that the OUTPAIR= data set can be very large even for a moderately sized DATA= data set.
For example, if the DATA= data set has NOBS=500 observations, the OUTPAIR= data set has NOBS(NOBS-1)/2 = 124,750 observations if no OUTPDISTANCE= restriction is given in the COMPUTE statement.
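The growth in the number of pairs, and the effect of a distance cutoff, can be checked with a short Python sketch. It is illustrative only and assumes that a pair is kept when its distance does not exceed the cutoff. The function names are hypothetical.

```python
import math
from itertools import combinations

def n_distinct_pairs(nobs):
    """Number of distinct unordered pairs among nobs points."""
    return nobs * (nobs - 1) // 2

def pairs_within(points, d_max=None):
    """List (i, j, distance) for distinct pairs, optionally limited to d_max.

    Assumes a pair is kept when distance <= d_max; d_max=None keeps all pairs.
    """
    kept = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        if d_max is None or d <= d_max:
            kept.append((i, j, d))
    return kept
```

With 500 observations, `n_distinct_pairs(500)` gives 124750, which matches the figure above; a modest cutoff can reduce the pair list considerably.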
The OUTPAIR= data set contains information on the distance and orientation for each point pair, and you can use it for specialized continuity measure calculations.
The OUTPAIR= data set contains the following variables:
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.