|SAS/SPECTRAVIEW Software User's Guide|
The first step in the visualization process is selecting and reading your data into SAS/SPECTRAVIEW. The interface guides you through the process.
When you first invoke SAS/SPECTRAVIEW, [Data] is selected by default, ready for you to load data. Note that you can load data at any time during a SAS/SPECTRAVIEW session by reselecting [Data].
|Selecting a Libref|
To display the session's assigned librefs:
Figure: Selecting a Libref
|Selecting a Data Set|
SAS/SPECTRAVIEW works as well with small data sets (such as 20 observations) as it does with large data sets (such as a quarter million observations). The SAS data set that you select must meet three requirements: it must have at least four variables, which you specify as the three axis variables and the response variable; the response variable must be numeric; and each variable specified for SAS/SPECTRAVIEW must contain at least two unique values. If you want to use a BY variable, the data set must have a fifth variable as well. To load a data set that has only three variables, see Loading a Data Set with Only Three Variables.
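The requirements above can be checked before loading. The following Python sketch is illustrative only and is not part of SAS/SPECTRAVIEW; the function name check_selection and the list-of-dicts representation of a data set are assumptions made for this example.

```python
from numbers import Number

def check_selection(rows, x, y, z, response):
    """Return a list of problems with the chosen variables (empty if none).

    rows is a list of dicts, one per observation (a stand-in for a SAS data set).
    """
    problems = []
    # The data set needs at least four variables: three axes plus the response.
    all_names = {name for row in rows for name in row}
    if len(all_names) < 4:
        problems.append("data set has fewer than four variables")
    # Each selected variable must contain at least two unique values.
    for name in (x, y, z, response):
        unique = {row[name] for row in rows if row.get(name) is not None}
        if len(unique) < 2:
            problems.append(f"{name} has fewer than two unique values")
    # The response variable must be numeric.
    responses = [row[response] for row in rows if row.get(response) is not None]
    if not all(isinstance(v, Number) for v in responses):
        problems.append(f"{response} must be numeric")
    return problems
```

An empty list means the selection satisfies the three requirements stated above; it does not guarantee that the data set will load.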
Select the input data set from the list of names. Use the scroll bar if the list contains more than 10 names. Once you select the input data set, the software lists the data set's variables in columns from which you can select SAS/SPECTRAVIEW variables.
Figure: Selecting a Data Set
|Specifying SAS/SPECTRAVIEW Variables|
You must specify a different data set variable for each SAS/SPECTRAVIEW variable. That is, you must select a different variable from each of the X Variable, Y Variable, Z Variable, and Response variable columns. The axis variables can be either numeric or character, but the response variable must be numeric.
To help you select appropriate variables, you can place your cursor on a variable name, and the software will display a short description of it in the text window. For example, for the EPA data set, which contains the variables HOUR, LEVEL, LNGITUDE, LATITUDE, SULFATE, and OZONE, their descriptions provide the following information:
Type: Num, Label: Sulfate (ppm).
Note that any variable that is appropriate as a Response variable is not a valid choice as an axis variable, and any variable that is appropriate as an axis variable is not a valid choice for a Response variable. Attempting to read a data set with inappropriate variables selected could result in the data set failing to load. Choose axis variables that build as complete a volume grid of actual data points as possible, and avoid axis variables that are sparsely valued or contain continuous data.
Figure: Specifying SAS/SPECTRAVIEW Variables
Once you select the four required variables, the software highlights [Read data], but you still have the option of specifying BY variable processing, duplicate values handling, data categorizing, automatic axis scaling, and data subsetting with a WHERE clause, which are discussed in the following sections.
|Grouping Observations with a BY Variable|
In addition to the four required variables, you have the option of specifying a fifth variable as a BY variable. The values of a BY variable define groups of observations, such as hour, month, or year. Specifying a BY variable allows you to animate an image so that you can see how response values change according to some grouping, such as over time.
A BY variable can be either character or numeric. BY data usually includes multiple response values for a single data point.
For example, in the EPA data set, the variable HOUR contains hour values, which would be useful as a BY variable. If you imagine that the first four variables would generate a cube of data values, then specifying a BY variable would generate a sequence of cubes of data values that can be cycled through to determine how response values change over time (in this case).
If you select LNGITUDE, LATITUDE, and LEVEL as the axis variables, SULFATE as the Response variable, and HOUR as the BY variable, you will create a sequence of volumes of data to be displayed and analyzed.
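The idea of a "sequence of cubes" can be sketched in Python. This is only an illustration of the grouping concept, not the software's implementation; the function name split_by and the dictionary layout are assumptions, and any data values used with it are made up.

```python
from collections import defaultdict

def split_by(rows, axes, response, by):
    """Group observations into one volume per BY value.

    Each volume maps an (x, y, z) point to its response value.
    Volumes are returned in sorted order of the BY values.
    """
    volumes = defaultdict(dict)
    for row in rows:
        point = tuple(row[a] for a in axes)
        volumes[row[by]][point] = row[response]
    return dict(sorted(volumes.items()))

# Hypothetical call using the EPA variable names from the text:
# volumes = split_by(rows, ("LNGITUDE", "LATITUDE", "LEVEL"), "SULFATE", "HOUR")
```

Stepping through the returned volumes in order corresponds to cycling through the cubes to see how the response changes from one BY group to the next.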
Figure: Specifying a BY Variable
Note: If you do not specify a BY variable but your data contains BY data (like a time variable), you may receive a message in the text window after loading the data. The message warns that there is more than one response value for an x,y,z coordinate. When this occurs, the software handles the response values according to the setting on the Duplicate Values panel.
|Handling Duplicate Values|
Duplicate values occur when the data has more than one observation for the same x,y,z coordinate, which could result in more than one response value for a data point. Note that if you also categorize the data or if you have specified a BY variable, the instances of duplicate values may increase.
You determine how the software handles duplicate values by selecting one of the choices under the label Duplicate Values. The default is [Last], which means that the last response value encountered for a data point is used as that location's response value.
Figure: Handling Duplicate Values
To specify how the software handles duplicate values, select one of the following options:
With [Count] specified, each response value for the resulting data points represents a count of the observations for that location. If there are no duplicate observations for a particular x,y,z location, the response value is 1, indicating that only one observation was found for that location. Similarly, if the data includes no observations for a particular x,y,z location, the response value would be 0, meaning that the data point is missing. [Count] allows you to find the number of response values that were used to calculate other values, for example, [Mean] or [Sum]. If you load data with [Mean], you may want to know how many values were used to calculate the mean value shown at a particular x,y,z location. You can load again using [Count], then probe the data to reveal the number used for the mean.
With [Nmiss] specified, every data point has a response value indicating how many missing response values were encountered for that location. If a valid data point has five observations and only three had response values, then that data point's response value is 2, meaning two observations were found missing a response value for that location. [Nmiss] only counts valid data points having no response value. It does not count filler points generated by the software. If the data does not contain an observation for an x,y,z location, the software inserts a data point that has a missing response value. This means that if you load a data set, display it as a point cloud, and discover there are several missing values in the volume grid, you can reload the data with [Nmiss] selected and determine which missing values are caused by missing response values as opposed to missing axis values.
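The options named in this section ([Last], [Count], [Nmiss], [Mean], and [Sum]) can be illustrated with a short Python sketch. It is not the software's implementation. The function name resolve_duplicates is hypothetical, None stands in for a missing response value, and two behaviors are assumptions: "count" counts every observation at a location, including those with a missing response, and "mean" and "sum" ignore missing responses. Filler points for locations with no observations are not modeled here.

```python
from collections import defaultdict

def resolve_duplicates(observations, method="last"):
    """Reduce duplicate observations to one response value per point.

    observations: iterable of ((x, y, z), response) pairs in input order;
    response may be None to represent a missing value.
    """
    groups = defaultdict(list)
    for point, response in observations:
        groups[point].append(response)

    result = {}
    for point, responses in groups.items():
        present = [r for r in responses if r is not None]
        if method == "last":
            result[point] = responses[-1]      # last value encountered
        elif method == "count":
            result[point] = len(responses)     # observations at this point
        elif method == "nmiss":
            result[point] = len(responses) - len(present)
        elif method == "sum":
            result[point] = sum(present) if present else None
        elif method == "mean":
            result[point] = sum(present) / len(present) if present else None
        else:
            raise ValueError(f"unknown method: {method}")
    return result
```

For the case described above, a point with five observations of which three have response values, the "nmiss" method yields 2 for that point.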
Categorizing data is an option that groups numeric data to create distinct ranges (called categories) for each axis. You cannot categorize character variables. The result is a reduced number of data points in the volume grid. By categorizing all three axes, you can set exactly how many data points the software will create. Categorizing data is useful
Continuous data (values with few gaps that vary slightly over a large range, such as weight and height) are good candidates for categorizing. For example, to analyze the heart rates of a group of people based on their age, activity level, and weight, the weight values, which would be in pounds, such as 139.5 and 143.6, would be considered continuous. That is, it is not likely that any two people (let alone several) would have the same weight but a different age and activity level. Categorizing the weight values, by creating categories for ranges of weight with one value to represent each category, would make the data clearer and easier to use.
Discrete data (containing natural gaps like patient IDs and years) would probably not be as useful to categorize. But discrete data such as hour could be categorized into groups if the degree of precision can be reduced without losing data integrity.
To categorize data:
|[Lower]||Uses the lower bound value in each range.|
|[Midpoint]||Uses the midpoint value in each range. This is the default setting.|
|[Upper]||Uses the upper bound value in each range.|
|[Bounds]||Uses both the upper and lower bound values in each range.
The values display as a range, for example,
Categorizing data makes it more likely that the software encounters more than one response value for a given x,y,z coordinate. (Uncategorized data usually contain only one response value for each x,y,z coordinate.) When one or more of the axes are categorized, some of the data points become duplicates within a group, which could result in more than one response value for a single data point.
For example, suppose values for the X variable are integers from 1 to 100. If you categorize the X values into groups of 10 values, 1-10 would be a single category. The data points 1,1,1 and 2,1,1 and 3,1,1 and so forth are viewed by the software as the same data point in the volume grid, because they would all have the same X, Y, and Z values.
The response values for the 10 data points would appear to be 10 different response values for the same data point. The response values for the duplicate locations are handled according to the method specified for duplicate values handling, with the default being to use the last response value found as the category's response value.
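The X-axis example above can be sketched in Python. This is an illustration under stated assumptions, not the software's algorithm: the function name categorize_int is hypothetical, it handles only integer values grouped into runs of equal size, and the way each representative value is computed (for example, the midpoint as the average of the lower and upper bounds) is an assumption.

```python
def categorize_int(value, first=1, size=10, represent="midpoint"):
    """Map an integer to the representative value of its category.

    Categories are consecutive runs of `size` integers starting at `first`,
    for example 1-10, 11-20, and so on.
    """
    index = (value - first) // size
    lower = first + index * size
    upper = lower + size - 1
    if represent == "lower":
        return lower
    if represent == "upper":
        return upper
    if represent == "midpoint":
        return (lower + upper) / 2
    if represent == "bounds":
        return (lower, upper)
    raise ValueError(f"unknown representation: {represent}")

# The points 1,1,1 and 2,1,1 and 3,1,1 collapse to a single grid point
# once X is categorized, so their responses become duplicates.
points = [(1, 1, 1), (2, 1, 1), (3, 1, 1)]
collapsed = {(categorize_int(x), y, z) for x, y, z in points}
```

Here collapsed contains a single point, which is why the duplicate values setting then decides which response value that point receives.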
|Automatically Scaling Axes|
By selecting [Auto scale], you can automatically scale the volume's three axes to the same length. By default, the length of each axis is determined by the range of its values. For example, an axis with values from 1 to 100 is about ten times as long as an axis with values from 1 to 10.
Note: Once a data set is loaded, [Auto scale] is deselected. To load a subsequent data set with automatic scaling, you must select [Auto scale] again.
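The difference between default and automatic scaling can be sketched in Python. This is only a conceptual illustration. The function name axis_lengths is hypothetical, and the sketch assumes that the default length of an axis is proportional to its maximum value minus its minimum value, which the text does not state precisely.

```python
def axis_lengths(ranges, auto_scale=False):
    """Return a relative display length for each axis.

    ranges maps an axis name to its (minimum, maximum) values.
    With auto_scale, every axis gets the same length; otherwise the
    length is taken as maximum minus minimum (an assumption).
    """
    if auto_scale:
        return {axis: 1.0 for axis in ranges}
    return {axis: float(hi - lo) for axis, (lo, hi) in ranges.items()}

ranges = {"x": (0, 100), "y": (0, 10), "z": (0, 50)}
default_lengths = axis_lengths(ranges)                  # x is ten times y
scaled_lengths = axis_lengths(ranges, auto_scale=True)  # all axes equal
```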
|Subsetting Data with a WHERE Clause|
Optionally, you can specify a subset of data to be loaded into SAS/SPECTRAVIEW by specifying condition(s) that observations must meet. You can subset response values by specifying criteria for the response variable, and you can subset data points by specifying criteria for the axis variables.
Subsetting can change the size and shape of the volume grid. For example, subsetting data can create holes that are replaced with filler points, or subsetting can remove holes in data.
Prior to selecting [Read data], you can specify subsetting conditions using a SAS WHERE clause:
sulfate > .00005060.
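The effect of a subsetting condition on the volume grid can be sketched in Python. This is a conceptual illustration, not SAS code and not the software's implementation. The function name subset_and_grid is hypothetical, None marks a filler point, and the sketch assumes that the volume grid is the cross product of the distinct axis values that remain after subsetting.

```python
from itertools import product

def subset_and_grid(rows, axes, response, condition):
    """Keep rows that meet the condition, then build the volume grid.

    Grid points with no remaining observation are filler points (None).
    """
    kept = [row for row in rows if condition(row)]
    axis_values = [sorted({row[a] for row in kept}) for a in axes]
    grid = {point: None for point in product(*axis_values)}
    for row in kept:
        grid[tuple(row[a] for a in axes)] = row[response]
    return grid

# Made-up observations; the condition mirrors sulfate > .00005060.
rows = [
    {"x": 1, "y": 1, "z": 1, "sulfate": 0.00006},
    {"x": 2, "y": 1, "z": 1, "sulfate": 0.00004},
    {"x": 1, "y": 2, "z": 1, "sulfate": 0.00007},
    {"x": 2, "y": 2, "z": 1, "sulfate": 0.00008},
]
grid = subset_and_grid(rows, ("x", "y", "z"), "sulfate",
                       lambda row: row["sulfate"] > 0.0000506)
```

In this made-up example, the observation at 2,1,1 fails the condition, so that location becomes a hole in the grid and is represented by a filler point.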
|Reading the Data Set|
To have the software read the data, select [Read data].
The software loads the input data, applying any optional specifications. For example, if a WHERE clause is specified, the software loads only those observations meeting the criteria, and if categorizing is specified, the software changes the number of data points accordingly. Once the data set is loaded, the variable list disappears, and the software is ready for you to
If you have loading problems, see Resolving Data Loading Problems.
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.