 Macros for the Design and Analysis of Experiments

## ADXTRANS: Determine an Optimal Box-Cox Power Transformation

%adxtrans(dsin, dsout, resp, model, intvllo, intvlhi, numintvl)

where

- dsin names the SAS data set that contains the coded design and the original, untransformed values of the response variable.
- dsout names the output SAS data set to contain the coded design and the transformed values of the response. The value for dsout can be the same as dsin. In this case, the original values of the response variable are replaced by the transformed values.
- resp names the response variable for analysis. The Box-Cox family of transformations requires all values of resp to be positive. If resp has zero or negative values but you still want to estimate an optimal transformation, add an amount c to each response, where c is greater than the absolute value of the most negative value of resp.
- model lists the independent variables for analysis.
- intvllo is the bottom end of the range for computing the likelihood. The default value is -2.
- intvlhi is the top end of the range for computing the likelihood. The default value is 2.
- numintvl is the number of intervals tested in the range for computing the likelihood. The default value is 21.
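The positivity requirement on resp can be met by shifting the response before the transformation is estimated. The following Python sketch illustrates that shift only; it is not part of the ADX macros, and the function name `shift_to_positive` and the `margin` argument are hypothetical.

```python
def shift_to_positive(values, margin=1.0):
    """Return (shifted_values, c) so that every shifted value is positive.

    c is chosen as |min(values)| + margin, so it is strictly greater than the
    absolute value of the most negative response. margin is a hypothetical
    tuning argument and must be positive.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    lowest = min(values)
    if lowest > 0:
        # All responses are already positive: no shift is needed.
        return list(values), 0.0
    c = -lowest + margin
    return [v + c for v in values], c


shifted, c = shift_to_positive([-3.0, 0.0, 2.5])
print(c, shifted)
```

The size of the shift affects the estimated power, so the choice of c is worth reporting alongside the result.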

The ADXTRANS macro uses maximum likelihood theory to estimate an optimal transformation within the class of power transformations of the form

$$z = \frac{y^{\lambda} - 1}{\lambda}, \qquad \lambda \neq 0.$$

When $\lambda = 0$, a limit argument justifies using the transformation z = log(y). (Refer to Box and Cox, 1964.) The algorithm computes the likelihood of the data for several values of $\lambda$ in the test range and takes the value for which the likelihood is maximized as the estimated optimal transformation. By default, the test range is $[-2, 2]$.
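The search described above can be sketched in Python. This illustrates a grid search over the Box-Cox profile log-likelihood and is not the macro's actual implementation. It assumes the standard Box-Cox form z = (y^λ − 1)/λ (with z = log(y) at λ = 0), reads numintvl as the number of evenly spaced grid points including both endpoints, and uses hypothetical function names and a synthetic 2^3 design.

```python
import itertools
import math


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(X, z):
    """Residual sum of squares from the least-squares fit of z on the columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(p)]
    beta = solve(xtx, xtz)
    return sum((zi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, zi in zip(X, z))


def boxcox(y, lam):
    """Box-Cox transform; the log is used as the limiting case at lambda = 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(X, y, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(y)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss(X, boxcox(y, lam)) / n) + jacobian


def grid_search(X, y, lo=-2.0, hi=2.0, num=21):
    """Evaluate the log-likelihood on an even grid and return (best_lambda, table)."""
    grid = [round(lo + i * (hi - lo) / (num - 1), 10) for i in range(num)]
    table = [(lam, profile_loglik(X, y, lam)) for lam in grid]
    best_lambda = max(table, key=lambda row: row[1])[0]
    return best_lambda, table


# Synthetic data (an assumption for illustration): log(y) is linear in three
# coded factors plus small fixed noise, so a log transform should fit well.
design = list(itertools.product([-1.0, 1.0], repeat=3))
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.05, 0.01]
X = [[1.0, t1, t2, t3] for t1, t2, t3 in design]
y = [math.exp(2.0 + 0.8 * t1 + 0.5 * t2 - 0.3 * t3 + e)
     for (t1, t2, t3), e in zip(design, noise)]

best_lambda, table = grid_search(X, y)
print("estimated lambda:", best_lambda)
```

A coarse grid keeps the estimate to a small set of interpretable powers (for example 0 for the log or 0.5 for the square root), which is usually sufficient for choosing a transformation.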

The ADXTRANS macro is useful in situations where the original form of the measurements for the response variable is not the best one to use when analyzing the data. For example, in many situations the original data are not normally distributed, but after applying a log transformation, the transformed data are normally distributed.

Suppose the RESULT data set contains factors T1, T2, and T3 along with values for a response variable BURST. To estimate an optimal Box-Cox power transformation using the defaults for the number of intervals and the ends of the range, use the following statements:

   %adxgen
   %adxtrans(result, tresult, burst, t1 t2 t3)

The design with the transformed values for the response is stored in the TRESULT data set.

The ADXTRANS macro produces an output listing as well as the ADXREG output data set. The ADXREG data set contains the following variables for each value of $\lambda$ in the test range:

- ADXCONF is a character variable of length 1. The value of ADXCONF is an asterisk (*) if the associated value of $\lambda$ is within a 95% confidence interval of the estimated optimum. Otherwise, the value of ADXCONF is a blank.
- ADXLAM is the value of $\lambda$.
- ADXLIKE is the log-likelihood based on the fit of the model to the transformed response.
- _RMSE_ is the root mean squared error based on the fit of the model to the transformed response.
- effect stands for the variables that hold t-values for estimates of parameters for effects in the model. The names for effect depend on the model. If the parameters in the model are T1 and T2, the ADXREG data set contains new variables T1 and T2, whose values are the t-values for the parameter estimates. The variable that contains t-values for the intercept parameter is named INTERCEP.
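One common way to build such a confidence set for λ is a likelihood-ratio cutoff: a value of λ is retained if its log-likelihood is within half the 0.95 quantile of a chi-square distribution with 1 degree of freedom (about 1.92) of the maximum. The Python sketch below shows that rule; whether ADXTRANS computes ADXCONF in exactly this way is an assumption, and the function name and the example log-likelihood values are hypothetical.

```python
# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_95_1DF = 3.841458820694124


def flag_confidence(rows):
    """Attach an ADXCONF-style flag to (lambda, log-likelihood) pairs.

    A row is flagged with "*" when its log-likelihood lies within
    CHI2_95_1DF / 2 of the largest log-likelihood; otherwise the flag is " ".
    """
    best = max(loglik for _, loglik in rows)
    cutoff = best - CHI2_95_1DF / 2.0
    return [("*" if loglik >= cutoff else " ", lam, loglik)
            for lam, loglik in rows]


# Made-up log-likelihood values, for illustration only.
example = [(-0.2, 2.2), (0.0, 12.3), (0.2, 11.0)]
for flag, lam, loglik in flag_confidence(example):
    print(f"{flag} {lam:5.1f} {loglik:8.3f}")
```

If the flagged set includes a conventional power such as 0 or 0.5, that value is often preferred over the exact maximizer because it is easier to interpret.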