The REG Procedure |
In most applications, many of the variables considered have some predictive power, however small. If you want to choose the model that provides the best prediction using the sample estimates, you need only guard against estimating more parameters than can be reliably estimated with the given sample size. You should therefore use a moderate significance level for entering and retaining variables, perhaps in the range of 10 percent to 25 percent.
In addition to R^{2}, the C_{p} statistic is displayed for each model generated by the model-selection methods. The C_{p} statistic was proposed by Mallows (1973) as a criterion for selecting a model. It is a measure of total squared error, defined as

$$ C_{p} = \frac{\mathrm{SSE}_{p}}{s^{2}} - (n - 2p) $$

where s^{2} is the MSE for the full model, SSE_{p} is the error sum of squares for a model with p parameters including the intercept, if any, and n is the number of observations. If C_{p} is plotted against p, Mallows recommends the model where C_{p} first approaches p. When the right model is chosen, the parameter estimates are unbiased, and this is reflected in C_{p} near p: for an unbiased model, SSE_{p} is approximately (n - p)s^{2}, so C_{p} is approximately (n - p) - (n - 2p) = p. For further discussion, refer to Daniel and Wood (1980).
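The arithmetic behind C_{p} can be sketched in a few lines. This is a minimal illustration, not PROC REG's own code: the helper name `mallows_cp` and every SSE value below are hypothetical, chosen only to show how C_{p} behaves as parameters are added. The sketch assumes s^{2} is computed as SSE_full / (n - p_full).

```python
def mallows_cp(sse_p: float, s2_full: float, n: int, p: int) -> float:
    """Hypothetical helper: Mallows' C_p = SSE_p / s^2 - (n - 2p).

    sse_p   -- error sum of squares of the candidate model
    s2_full -- MSE of the full model, SSE_full / (n - p_full)
    n       -- number of observations used in the fit
    p       -- number of parameters in the candidate model, intercept included
    """
    if s2_full <= 0:
        raise ValueError("s2_full must be positive")
    return sse_p / s2_full - (n - 2 * p)


# Hypothetical data for illustration only: 30 observations and a full model
# with 5 parameters whose SSE is 50.0, so s^2 = 50 / (30 - 5) = 2.0.
n = 30
p_full, sse_full = 5, 50.0
s2 = sse_full / (n - p_full)

# Hypothetical candidate models: label -> (p, SSE_p)
candidates = {
    "x1": (2, 120.0),
    "x1 x2": (3, 56.0),
    "x1 x2 x3": (4, 52.0),
    "full": (p_full, sse_full),
}

# Print C_p next to p so the "C_p first approaches p" rule can be read off.
for label, (p, sse_p) in candidates.items():
    cp = mallows_cp(sse_p, s2, n, p)
    print(f"{label:10s} p={p}  C_p={cp:6.2f}")
```

With these made-up numbers, C_{p} falls from 34 for x1 alone to 4 for x1 x2 (still above p = 3) and equals p = 4 for x1 x2 x3, the point where C_{p} first reaches p. The full model's C_{p} always equals its own p, because SSE_full / s^{2} = n - p_full by construction.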
The adjusted R^{2} statistic is an alternative to R^{2} that is adjusted for the number of parameters in the model. It is calculated as

$$ R^{2}_{\mathrm{adj}} = 1 - \frac{(n - i)(1 - R^{2})}{n - p} $$

where n is the number of observations used in fitting the model, p is the number of parameters in the model (including the intercept, if any), and i is an indicator variable that is 1 if the model includes an intercept and 0 otherwise.
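A short sketch of the adjusted R^{2} formula shows how the (n - p) denominator penalizes extra parameters. The helper name `adjusted_r2` and the R^{2} values are hypothetical; PROC REG computes this statistic itself, and the code only reproduces the arithmetic under the definitions of n, p, and i given above.

```python
def adjusted_r2(r2: float, n: int, p: int, intercept: bool = True) -> float:
    """Hypothetical helper: adjusted R^2 = 1 - (n - i)(1 - R^2) / (n - p).

    r2        -- R^2 of the fitted model
    n         -- number of observations used in the fit
    p         -- number of parameters, intercept included if present
    intercept -- True if the model has an intercept (sets i = 1, else i = 0)
    """
    if n <= p:
        raise ValueError("need more observations than parameters (n > p)")
    i = 1 if intercept else 0
    return 1.0 - (n - i) * (1.0 - r2) / (n - p)


# Hypothetical comparison with 30 observations and an intercept:
# the larger model adds one regressor for a tiny gain in R^2.
small = adjusted_r2(0.800, n=30, p=4)  # intercept + 3 regressors
large = adjusted_r2(0.805, n=30, p=5)  # intercept + 4 regressors
print(f"smaller model: {small:.4f}, larger model: {large:.4f}")
```

Here the extra regressor raises R^{2} from 0.800 to 0.805 but lowers adjusted R^{2} from about 0.777 to about 0.774, because the loss of one error degree of freedom outweighs the small reduction in unexplained variation.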
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.