
SAS/SPECTRAVIEW Software User's Guide |

Loading a Data Set with Only Three Variables |

SAS/SPECTRAVIEW requires four variables in order to load a SAS data set. However, with the following procedure, it is possible to load a data set that has only three variables.

- Create a temporary SAS data set with the following DATA step code, replacing *yourdatasetname* with the name of your data set:

      data temp;
         set yourdatasetname;
         dummy=1; output;
         dummy=2; output;
      run;

- Load the temporary data set TEMP into SAS/SPECTRAVIEW.
- Select the X and Y axis variables that you are interested in, then select DUMMY as the Z variable.
- Select the Response variable that you want.
- Select [**Read data**].
- Use the data for your analysis.

Changing Axis Variables |

Because of memory constraints, data sometimes loads with one choice of axis
and response variables but fails to load with another. Choose the **best**
variables as the axis variables, that is, the ones that build as complete a
volume grid as possible from actual data points. In particular, avoid axis
variables that are sparsely valued or that have continuous data.

For example, the sample data set MORTGAGE loads without
problems if YEARS, RATE, and AMOUNT are specified as the axis variables.
However, if you specify PAYMENT for an axis and either YEARS, RATE, or AMOUNT
as the response variable, the data may not load, because there are 16,400
unique values for PAYMENT. **Note that if a data set fails to load, the
error message in the text window specifies the number of unique values found
for each axis.**

See Specifying SAS/SPECTRAVIEW Variables for details on specifying variables and determining which variables are best.

Categorizing Data |

One of the main reasons that a data set will not load is that the data does not represent a complete grid, which most often occurs with random data or if the axis values are continuous rather than discrete. The data set may fail to load due to memory constraints, even when a larger data set loaded successfully. The problem is the number of resulting data points in the volume grid, not the number of observations.

Memory requirements for a data set depend on the number of unique X, Y, and Z values, which determines the number of data points that are created. If the number of data points becomes large, the data set may fail to load without additional memory. Of course, it takes thousands and thousands of data points to cause data loading problems.

To make the data clearer and easier to use in SAS/SPECTRAVIEW,
you can **categorize** the data, which groups numeric data to create
distinct ranges (called categories) for each axis. Instructions on how to
categorize data are in Categorizing Data.
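
The idea behind categorizing can also be sketched in a DATA step. The following is only an illustration of grouping a numeric variable into ranges, not the procedure described in Categorizing Data. It assumes that the MORTGAGE sample data is available as WORK.MORTGAGE with a numeric variable PAYMENT; the output names MORTGAGE_CAT and PAYMENT_CAT and the range width of 100 are hypothetical choices.

```sas
/* Assumption: WORK.MORTGAGE exists and PAYMENT is numeric.          */
/* MORTGAGE_CAT, PAYMENT_CAT, and the width of 100 are illustrative. */
data mortgage_cat;
   set mortgage;
   /* Round each payment to the nearest multiple of 100 so that many */
   /* distinct values collapse into a much smaller set of categories */
   payment_cat = round(payment, 100);
run;
```

Using PAYMENT_CAT instead of PAYMENT as an axis variable reduces the number of unique axis values, and therefore the number of data points in the grid, at the cost of some precision.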

Changing Duplicate Values Handling |

Specifying how the software handles duplicate values
can cause data not to load. For example, if you select either [**Count**] or [**Nmiss**] under the label **Duplicate
Values** and the data you want to load comprises a complete grid
having no missing x,y,z locations and no duplicate observations for the same
x,y,z location, the data would fail to load. That is,

- With [**Count**] specified, the response value for every data point would be 1. The data would fail to load because [**Count**] requires at least two different response values for an x,y,z location.
- With [**Nmiss**] specified, the response value for every data point would be 0. The data would fail to load because [**Nmiss**] requires at least two different response values for an x,y,z location.

Instructions for specifying how the software handles duplicate values are in Handling Duplicate Values.
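
Before choosing one of these settings, it may help to check whether the data has any duplicate observations or missing responses at all. The following PROC SQL step is a minimal sketch. It assumes a data set WORK.MYDATA with axis variables X, Y, and Z and a response variable RESPONSE (all hypothetical names), and it assumes that [**Count**] corresponds to the number of observations at a location and [**Nmiss**] to the number of missing response values there.

```sas
/* Assumption: WORK.MYDATA has axis variables X, Y, Z and a response */
/* variable RESPONSE. All names here are illustrative.               */
proc sql;
   /* One row per x,y,z location with its observation count and its  */
   /* number of missing response values                              */
   create table dupcheck as
   select x, y, z,
          count(*)        as n_obs,
          nmiss(response) as n_missing
   from mydata
   group by x, y, z;

   /* If the minimum equals the maximum, every location would        */
   /* receive the same response value under that setting             */
   select min(n_obs)     as min_count,
          max(n_obs)     as max_count,
          min(n_missing) as min_nmiss,
          max(n_missing) as max_nmiss
   from dupcheck;
quit;
```

If MIN_COUNT equals MAX_COUNT, or MIN_NMISS equals MAX_NMISS, the corresponding setting would produce a single response value for the whole grid, which is the situation described above.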

Removing BY Variable Specification |

Removing the BY variable specification reduces the amount of storage required by a factor equal to the number of BY groups in the data set.

To calculate storage requirements when a BY variable is specified, multiply the storage required for a single grid (which depends on the product of the numbers of unique values for the three axis variables) by the number of BY groups. For example, if you have five BY groups, you would need five times as much storage, because a grid is created for each value of the BY variable.

More information on BY variable processing is in Grouping Observations with a BY Variable.
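
The arithmetic can be sketched in a short DATA step. The figures are assumptions chosen for illustration: 21 unique values on each axis and approximately four bytes per data point (the figures used in the storage calculation section of this chapter), with five BY groups as in the example above.

```sas
/* Illustrative figures only: 21 unique values per axis, about 4     */
/* bytes per data point, and 5 BY groups                             */
data _null_;
   nx = 21; ny = 21; nz = 21;
   bytes_per_point = 4;
   n_by_groups = 5;

   grid_points    = nx * ny * nz;                /* points in one grid */
   bytes_one_grid = grid_points * bytes_per_point;
   bytes_all      = bytes_one_grid * n_by_groups; /* one grid per BY group */

   put grid_points= bytes_one_grid= bytes_all=;
run;
```

With these figures, one grid has 9,261 data points and needs about 37,044 bytes, so five BY groups need about 185,220 bytes.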

Using G4GRID Procedure to Create a Complete Grid |

PROC G4GRID enables the loading of a data set that could not otherwise be loaded due to memory constraints. By using PROC G4GRID, you can fill in missing values with interpolated values or resize the data set as required. PROC G4GRID is useful when

- the response values were sampled at discrete locations,
for example, measurements of air pollution.
- the response data is functionally related to the
axis variables. That is, the response is either analytically or physically
a function of the axis variables. Air pollution measurements are a function
of discrete locations identified by axis values, but a stock's price is not
a function of a stock's name. That is, just because Granny's Kitchen stock
price is high does not mean Gerry's Garage stock price is high even though
they fall next to each other in the grid. Smoothing with PROC G4GRID would
lower Granny's stock and raise Gerry's stock because they would be assumed
to influence each other.
- you want a complete grid of values and can accept
some changes from your original values.

Complete documentation for PROC G4GRID is in Appendix 1, "The G4GRID Procedure."

Calculating Volume Grid Storage Requirements |

To understand how to calculate storage requirements, compare the following two DATA step examples.

The first example produces 9,261 observations and would load with no problems. In fact, it is a relatively small data set by SAS/SPECTRAVIEW standards. There are 21 unique values for each axis, which results in a grid that has 9,261 data points (21x21x21). Each data point requires approximately four bytes of storage on most machines. Therefore, it requires 4x9,261=~36KB of storage for the grid.

    data load;
       drop a b c;
       a=0.3; b=0.2; c=0.1;
       do x = -1 to 1 by 0.1;
          do y = -1 to 1 by 0.1;
             do z = -1 to 1 by 0.1;
                response = x**2/a**2 + y**2/b**2 + z**2/c**2;
                output;
             end;
          end;
       end;
    run;

The second example, however, may not load, even though it has only 100 observations. The exact number of unique X, Y, and Z values is not known in advance, but because the values are generated with the RANUNI function, it can be assumed to be close to 100 for each variable. The grid, therefore, requires 100x100x100=1,000,000 data points, or about 108 times the storage of the first example (~4MB).

    data noload;
       drop seed I a b c;
       seed = -1;
       a = 0.3; b = 0.2; c = 0.1;
       do I = 1 to 100;
          x = 2.0*ranuni(seed) - 1.0;
          y = 2.0*ranuni(seed) - 1.0;
          z = 2.0*ranuni(seed) - 1.0;
          response = x**2/a**2 + y**2/b**2 + z**2/c**2;
          output;
       end;
    run;
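
Rather than assuming the number of unique values, you can count them. The following PROC SQL step is a sketch that estimates the grid size for the NOLOAD data set from the second example. The figure of four bytes per data point is the approximation quoted above, and the column names NX, NY, NZ, GRID_POINTS, and APPROX_BYTES are illustrative.

```sas
/* Estimate the volume grid size for WORK.NOLOAD.                    */
/* Assumption: about 4 bytes per data point, as quoted above.        */
proc sql;
   select count(distinct x) as nx,
          count(distinct y) as ny,
          count(distinct z) as nz,
          /* data points in the grid = product of unique axis values */
          count(distinct x) * count(distinct y) * count(distinct z)
             as grid_points,
          4 * count(distinct x) * count(distinct y) * count(distinct z)
             as approx_bytes
   from noload;
quit;
```

Running the same query against the LOAD data set from the first example allows a direct comparison of the two grids. The estimate covers only the grid itself, not any additional overhead.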

Specifying Larger Memory Size |

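To make more memory available, specify a larger value for the MEMSIZE system option when you invoke SAS. For example, the following option requests 100 megabytes: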
**-memsize 100m**

Note that SAS/SPECTRAVIEW also requires additional memory for overhead, some of which is proportional to the size of the data set. Even when there is enough memory to build the grid, another memory request may fail, which prevents the SAS data set from loading.


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.