The SIM2D Procedure |
A particularly simple method available for Gaussian spatial random fields is the LU decomposition method. This method is computationally efficient. For a given covariance matrix C, the decomposition C = LU = LL^{T} is computed once, and the simulation proceeds by repeatedly generating a vector of independent N(0,1) random variables and multiplying by the matrix L.
One problem with this technique is its memory requirement: the full data and grid covariance matrix must be held in core. While this is especially limiting in the three-dimensional case, you can use PROC SIM2D, which handles only two-dimensional data, for moderately sized simulation problems.
It is a simple matter to generate a vector W = (W_{1},W_{2}, ... ,W_{k})^{T} of independent N(0,1) random variables; its covariance matrix is the k×k identity matrix I_{k}. Rather than W, what is required is the generation of a vector Z, that is,

Z = (Z_{1},Z_{2}, ... ,Z_{k})^{T}

with covariance matrix

C = (C_{ij})
If the covariance matrix is symmetric and positive definite, it has a Cholesky root L such that C can be factored as

C = LL^{T}
where L is lower triangular. Refer to Ralston and Rabinowitz (1978, Chapter 9, Section 3-3) for details. This vector Z can be generated by the transformation Z = LW. Note that this is where the assumption of a Gaussian SRF is crucial. When W ~ N_{k}(0,I_{k}), then Z = LW is also Gaussian. The mean of Z is

E(Z) = L(E(W)) = 0

and the variance is

Var(Z) = E(ZZ^{T}) = E(LWW^{T}L^{T}) = L(E(WW^{T}))L^{T} = LL^{T} = C

so Z has the required covariance matrix.
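The factor-and-multiply step described above can be sketched in a few lines of code. This is a minimal illustration, not PROC SIM2D's implementation; the covariance values and function names are hypothetical.

```python
import random

def cholesky(C):
    # Lower-triangular L with C = L L^T; C must be symmetric positive definite
    k = len(C)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = (C[i][i] - s) ** 0.5
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

# A small symmetric positive definite covariance matrix (hypothetical values)
C = [[2.0, 0.8, 0.3],
     [0.8, 1.5, 0.5],
     [0.3, 0.5, 1.0]]
L = cholesky(C)

# One draw Z = L W from N(0, C): generate independent N(0,1) variables W
W = [random.gauss(0.0, 1.0) for _ in range(3)]
Z = [sum(L[i][j] * W[j] for j in range(i + 1)) for i in range(3)]

# Sanity check: L L^T reproduces C
recon = [[sum(L[i][m] * L[j][m] for m in range(3)) for j in range(3)]
         for i in range(3)]
```

Because L is computed only once, each additional realization costs only one vector of N(0,1) draws and one triangular matrix-vector multiply.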
Consider now an SRF Z(s), s ∈ D ⊂ R^{2}, with spatial covariance function C(h). Fix locations s_{1},s_{2}, ... ,s_{k}, and let Z denote the random vector

Z = (Z(s_{1}),Z(s_{2}), ... ,Z(s_{k}))^{T}

with corresponding covariance matrix

C = (C(s_{i}-s_{j}))
Since this covariance matrix is symmetric and positive definite, it has a Cholesky root, and the Z(s_{i}), i = 1, ... ,k can be simulated as described previously. This is how the SIM2D procedure implements unconditional simulation in the zero-mean case. More generally, when a nonzero mean function μ(s) is specified in the MEAN statement, the unconditional simulation is produced by adding μ(s_{i}) to each simulated Z(s_{i}).
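A covariance matrix of this form, with (i,j)th entry C(s_{i}-s_{j}), can be built from a covariance function and a set of fixed locations. The sketch below uses an exponential covariance function as one common stationary choice; the scale, range, and locations are all illustrative assumptions.

```python
import math

# A common stationary covariance function: exponential, C(h) = c0 * exp(-|h|/a0)
# (the scale c0 and range a0 values are hypothetical)
def exp_cov(h, c0=1.0, a0=10.0):
    return c0 * math.exp(-h / a0)

# Fixed locations s_1, ..., s_k in the plane (hypothetical grid points)
locs = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]

# C_ij = C(s_i - s_j) depends only on the distance between the two locations
k = len(locs)
C = [[exp_cov(math.dist(locs[i], locs[j])) for j in range(k)]
     for i in range(k)]
```

The resulting matrix is symmetric with the scale c0 on the diagonal; it is then factored and multiplied by a standard normal vector exactly as in the zero-mean argument above.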
For a conditional simulation, this distribution of

Z = (Z(s_{1}),Z(s_{2}), ... ,Z(s_{k}))^{T}

must be conditioned on the observed data. The relevant general result concerning conditional distributions of multivariate normal random variables is the following. Let X ~ N_{m}(μ,Σ), where

X = (X_{1}^{T},X_{2}^{T})^{T},  μ = (μ_{1}^{T},μ_{2}^{T})^{T}

and

Σ = ( Σ_{11}  Σ_{12} )
    ( Σ_{21}  Σ_{22} )
The subvector X_{1} is k×1, X_{2} is n×1, Σ_{11} is k×k, Σ_{22} is n×n, and Σ_{12} is k×n, with k+n = m. The full vector X is partitioned into the two subvectors X_{1} and X_{2}, and Σ is similarly partitioned into covariances and cross covariances.
With this notation, the distribution of X_{1} conditioned on X_{2} = x_{2} is N_{k}(μ̃,Σ̃), with

μ̃ = μ_{1} + Σ_{12}Σ_{22}^{-1}(x_{2}-μ_{2})

and

Σ̃ = Σ_{11} - Σ_{12}Σ_{22}^{-1}Σ_{21}
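The conditional mean and covariance of a partitioned multivariate normal can be checked numerically on a small case. The sketch below takes k = 1 and n = 2 so the 2×2 inverse can be written out explicitly; every covariance value is hypothetical.

```python
# Conditional mean and variance of X_1 given X_2 = x_2, for k = 1 and n = 2
mu1 = 0.0                # mean of X_1 (k = 1, so a scalar)
mu2 = [1.0, 2.0]         # mean of X_2 (n x 1)
S11 = 2.0                # Sigma_11 (k x k, here a scalar)
S12 = [0.8, 0.5]         # Sigma_12 (k x n)
S22 = [[1.5, 0.4],       # Sigma_22 (n x n)
       [0.4, 1.2]]

# Invert the 2 x 2 matrix Sigma_22 directly
det = S22[0][0] * S22[1][1] - S22[0][1] * S22[1][0]
S22inv = [[ S22[1][1] / det, -S22[0][1] / det],
          [-S22[1][0] / det,  S22[0][0] / det]]

# Sigma_12 Sigma_22^{-1}, a 1 x n row vector
w = [sum(S12[j] * S22inv[j][i] for j in range(2)) for i in range(2)]

x2 = [1.4, 1.7]          # observed value of X_2
resid = [x2[i] - mu2[i] for i in range(2)]

cond_mean = mu1 + sum(w[i] * resid[i] for i in range(2))  # mu_1 + S12 S22^{-1} (x2 - mu2)
cond_var = S11 - sum(w[i] * S12[i] for i in range(2))     # S11 - S12 S22^{-1} S21
```

Note that the conditional variance is smaller than the unconditional Σ_{11}: conditioning on data can only reduce uncertainty.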
Refer to Searle (1971, pp. 46-47) for details. The correspondence with the conditional spatial simulation problem is as follows. Let the coordinates of the observed data points be denoted s̃_{1},s̃_{2}, ... ,s̃_{n}, with values z̃ = (z̃_{1}, ... ,z̃_{n})^{T}. Let Z̃ denote the random vector

Z̃ = (Z(s̃_{1}),Z(s̃_{2}), ... ,Z(s̃_{n}))^{T}
The random vector Z̃ corresponds to X_{2}, while Z corresponds to X_{1}. Then (Z^{T},Z̃^{T})^{T} ~ N_{k+n}(μ,Σ) as in the previous distribution, where C_{11} denotes the grid-grid covariance matrix of Z, C_{22} the data-data covariance matrix of Z̃, and C_{12} the grid-data cross covariance. The matrix

Σ̃ = C_{11} - C_{12}C_{22}^{-1}C_{21}

is again symmetric and positive definite, so it also has a Cholesky root.
The dimension n for Z̃ is simply the number of nonmissing observations for the VAR= variable; the values z̃ are the values of this variable. The coordinates s̃_{i} are also found in the DATA= data set, with the variables corresponding to the x and y coordinates identified in the COORDINATES statement. Note that all VAR= variables use the same set of conditioning coordinates; this fixes the matrix C_{22} for all simulations.
The dimension k for Z is the number of grid points specified in the GRID statement. Since there is a single GRID statement, this fixes the matrix C_{11} for all simulations. Similarly, C_{12} is fixed.
The Cholesky factorization Σ̃ = LL^{T} is computed once, as is the mean correction

μ̃ = μ_{1} + C_{12}C_{22}^{-1}(z̃-μ_{2})

Each conditional realization is then produced as μ̃ + LW, where W is a vector of k independent N(0,1) random variables.
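The "computed once, drawn many times" structure can be sketched as follows. The conditional covariance and mean correction are taken as given hypothetical values, as if already obtained from the formulas in the text.

```python
import random

def cholesky(C):
    # Lower-triangular L with C = L L^T (C symmetric positive definite)
    k = len(C)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = ((C[i][i] - s) ** 0.5) if i == j else (C[i][j] - s) / L[j][j]
    return L

# Hypothetical conditional covariance and mean correction, as if already
# computed from C11 - C12 C22^{-1} C21 and the mean-correction formula
Sigma_cond = [[1.2, 0.3],
              [0.3, 0.9]]
mu_cond = [0.4, -0.1]

L = cholesky(Sigma_cond)        # factored once, outside the simulation loop

def realization(rng=random):
    # Each realization reuses L and mu_cond; only W is regenerated
    W = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [mu_cond[i] + sum(L[i][j] * W[j] for j in range(i + 1))
            for i in range(len(L))]

sims = [realization() for _ in range(5)]
```

Only the standard normal vector W changes between realizations, so the per-realization cost is independent of the number of conditioning data points.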
For conditional simulations, the largest matrix held in core at any one time depends on the number of grid points and data points. Using the previous notation, the data-data covariance matrix C_{22} is n×n, where n is the number of nonmissing observations for the VAR= variable in the DATA= data set. The grid-data cross covariance C_{12} is k×n, where k is the number of grid points. The grid-grid covariance C_{11} is k×k. The maximum memory required at any one time for storing these matrices is

max(k(k+n), n(n+k)) × 8 bytes
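A back-of-envelope calculation makes the storage accounting concrete: holding C_{11} together with C_{12} requires k(k+n) entries, holding C_{22} together with C_{12} requires n(n+k), and each entry is an 8-byte double. The problem sizes below are hypothetical.

```python
# Hypothetical problem size: k grid points, n nonmissing data values
k, n = 2500, 300

# C11 with C12 needs k*k + k*n entries; C22 with C12 needs n*n + n*k.
# At 8 bytes per double, the larger of the two drives peak memory.
peak_entries = max(k * (k + n), n * (n + k))
bytes_needed = peak_entries * 8
megabytes = bytes_needed / 1024 / 1024
```

For a large grid (k much greater than n), the k(k+n) term dominates, so peak memory grows roughly with the square of the number of grid points.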
There are additional memory requirements that add to the total memory usage, but usually these matrix calculations dominate, especially when the number of grid points is large.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.