The REG Procedure
As with most other interactive statements, the PLOT statement implicitly refits the model. For example, if a PLOT statement is preceded by a REWEIGHT statement, the model is recomputed, and the plot reflects the new model.
The PLOT statement cannot be used when TYPE=CORR, TYPE=COV, or TYPE=SSCP data sets are used as input to PROC REG.
You can specify several PLOT statements for each MODEL statement, and you can specify more than one plot in each PLOT statement. For detailed examples of using the PLOT statement and its options, see the section "Producing Scatter Plots".
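More than one yvariable*xvariable pair can be specified to request multiple plots. The yvariables and xvariables can be variables specified in the VAR or MODEL statements, or statistic keywords (names ending in a period) such as PREDICTED. and RESIDUAL. For example, the statement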
plot predicted.*residual.;
generates one plot of the predicted values by the residuals for each dependent variable in the MODEL statement. These statistics can also be plotted against any of the variables in the VAR or MODEL statements.
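The sketch below shows these requests inside a complete PROC REG step. It is a minimal illustration that assumes a hypothetical data set WORK.SAMPLE with numeric variables Y, X1, and X2; none of these names come from this section.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1, X2 */
proc reg data=work.sample;
   model y = x1 x2;
   plot predicted.*residual.;   /* predicted values against residuals */
   plot residual.*x1;           /* a statistic against a MODEL variable */
run;
quit;
```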
The yvariable and xvariable specifications can be replaced by a set of variables and statistics enclosed in parentheses. When this occurs, all possible combinations of yvariable and xvariable are generated. For example, the following two statements are equivalent.
plot (y1 y2)*(x1 x2);
plot y1*x1 y1*x2 y2*x1 y2*x2;
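Parenthesized lists can also mix statistic keywords and variables. In this sketch, which assumes a hypothetical data set WORK.SAMPLE with numeric variables Y and X1, the single PLOT statement should expand into four plots in the order given in the comment.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1 */
proc reg data=work.sample;
   model y = x1;
   /* Expands to: residual.*predicted.  residual.*x1
                  student.*predicted.   student.*x1  */
   plot (residual. student.)*(predicted. x1);
run;
quit;
```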
The statement
plot;
is equivalent to respecifying the most recent PLOT statement without any options. However, the line printer options COLLECT, HPLOTS=, SYMBOL=, and VPLOTS=, described in the "Line Printer Plots" section, apply across PLOT statements and remain in effect if they have been previously specified.
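The following sketch combines the implicit refit described at the start of this section with the bare PLOT statement. It assumes a hypothetical data set WORK.SAMPLE, and it assumes that excluding observation 5 with REWEIGHT is a meaningful change for that data. After the REWEIGHT statement, plot; redraws residuals against predicted values for the refitted model, this time without the NOSTAT option.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1, X2 */
proc reg data=work.sample;
   model y = x1 x2;
   plot residual.*predicted. / nostat;
run;
   reweight obs.=5;   /* weight of observation 5 set to zero (the REWEIGHT default) */
   plot;              /* model is refit; residual.*predicted. is redrawn without NOSTAT */
run;
quit;
```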
Options for high resolution graphics plots are described in the following section; for information on line printer plots, see "Line Printer Plots".
The display of high resolution graphics plots is described in the following paragraphs. The options are summarized in Table 50.3 and described in the section "Dictionary of PLOT Statement Options", and the "Examples" section contains several examples of the graphics output.
Several line printer statements and options are not supported for high resolution graphics. In particular, the PAINT statement is disabled, as are the PLOT statement options CLEAR, COLLECT, HPLOTS=, NOCOLLECT, SYMBOL=, and VPLOTS=. To display more than one plot per page or to collect plots from multiple PLOT statements, use the GREPLAY procedure (refer to SAS/GRAPH Software: Reference). Also note that high resolution graphics options are not recognized for line printer plots.
The fitted model equation and a label are displayed in the top margin of the plot; this display can be suppressed with the NOMODEL option. If the label is requested but cannot fit on one line, it is not displayed. The equation and label are displayed on one line when possible; if more lines are required, the label is displayed in the first line with the model equation in successive lines. If displaying the entire equation causes the plot to be unacceptably small, the equation is truncated. Table 50.4 lists options to control the display of the equation. The "Examples" section illustrates the display of the model equation.
Four statistics are displayed by default in the right margin: the number of observations, R², the adjusted R², and the root mean square error (see Output 50.4.1). The display of these statistics can be suppressed with the NOSTAT option. You can specify other options to request the display of various statistics in the right margin; see Table 50.4.
A default reference line at zero is displayed if residuals are plotted; see Output 50.7.1. If the dependent variable is plotted against the independent variable in a simple linear regression model, the fitted regression line is displayed by default. (See Output 50.4.1.) Default reference lines can be suppressed with the NOLINE option; the lines are not displayed if the OVERLAY option is specified.
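As an illustration of the margin and line options, the sketch below replaces the four default margin statistics with AIC and SBC, labels the model, and recolors the default fitted line. A second plot suppresses both the zero reference line and the model equation. The data set WORK.SAMPLE and its variables are hypothetical.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X */
proc reg data=work.sample;
   model y = x;
   /* Simple linear regression: the fitted line is drawn by default.
      NOSTAT drops the default statistics; AIC and SBC are shown instead. */
   plot y*x / nostat aic sbc modellab='Simple model' cline=red;
   /* Residual plot without the default zero line or the model equation */
   plot residual.*predicted. / noline nomodel;
run;
quit;
```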
Specialized plots are requested with special options.
For each coefficient, the RIDGEPLOT option plots the ridge estimates against the ridge values k; see the description of the RIDGEPLOT option in the section "Dictionary of PLOT Statement Options" and Example 50.10 for more details. The CONF option plots $100(1-\alpha)\%$ confidence intervals for the mean, while the PRED option plots $100(1-\alpha)\%$ prediction intervals; see the descriptions of these options in the section "Dictionary of PLOT Statement Options" and Example 50.9 for more details.
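The sketch below requests each of these specialized plots. It assumes a hypothetical data set WORK.SAMPLE. The ridge constants (0 to 0.02 by 0.002) are arbitrary choices for illustration. Ridge estimates are written to the OUTEST= data set, which is why OUTEST= accompanies RIDGE= here.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X, X1-X3 */

/* Ridge trace: one curve per coefficient across the ridge values k */
proc reg data=work.sample outest=work.ridge_est ridge=0 to 0.02 by 0.002;
   model y = x1 x2 x3;
   plot / ridgeplot nomodel nostat;
run;
quit;

/* Confidence limits for the mean and prediction limits, simple linear regression */
proc reg data=work.sample;
   model y = x;
   plot y*x / conf pred;
run;
quit;
```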
If a SELECTION= method is requested, the fitted model equation and the statistics displayed in the margin correspond to the selected model. For the ADJRSQ and CP methods, the selected model is treated as a submodel of the full model. If a CP.*NP. plot is requested, the CHOCKING= and CMALLOWS= options display model selection reference lines; see the descriptions of these options in the section "Dictionary of PLOT Statement Options" and Example 50.5 for more details.
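A minimal sketch of a CP.*NP. plot with both kinds of model selection reference lines follows. It assumes a hypothetical data set WORK.SAMPLE with numeric variables Y and X1 through X5. The colors match the CHOCKING=RED, CMALLOWS=BLUE combination discussed under the CHOCKING= option.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1-X5 */
proc reg data=work.sample;
   model y = x1-x5 / selection=cp;
   /* Cp against number of parameters, with model selection reference lines */
   plot cp.*np. / chocking=red cmallows=blue;
run;
quit;
```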
| Keyword | Description |
| Diagnostic Statistics | |
| COOKD. | Cook's D influence statistics |
| COVRATIO. | standard influence of observation on covariance of betas |
| DFFITS. | standard influence of observation on predicted value |
| H. | leverage |
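| LCL. | lower bound of a $100(1-\alpha)\%$ confidence interval for an individual prediction |
| LCLM. | lower bound of a $100(1-\alpha)\%$ confidence interval for the expected value (mean) of the dependent variable |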
| PREDICTED. or PRED. or P. | predicted values |
| PRESS. | residuals from refitting the model with current observation deleted |
| RESIDUAL. or R. | residuals |
| RSTUDENT. | studentized residuals with the current observation deleted |
| STDI. | standard error of the individual predicted value |
| STDP. | standard error of the mean predicted value |
| STDR. | standard error of the residual |
| STUDENT. | residuals divided by their standard errors |
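| UCL. | upper bound of a $100(1-\alpha)\%$ confidence interval for an individual prediction |
| UCLM. | upper bound of a $100(1-\alpha)\%$ confidence interval for the expected value (mean) of the dependent variable |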
| Other Keywords used with Diagnostic Statistics | |
| NPP. | normal probability-probability plot |
| NQQ. | normal quantile-quantile plot |
| OBS. | observation number (cannot plot against OUTEST= statistics) |
| Model Fit Summary Statistics | |
| ADJRSQ. | adjusted R-square |
| AIC. | Akaike's information criterion |
| BIC. | Sawa's Bayesian information criterion |
| CP. | Mallows' Cp statistic |
| EDF. | error degrees of freedom |
| GMSEP. | estimated MSE of prediction, assuming multivariate normality |
| IN. | number of regressors in the model not including the intercept |
| JP. | final prediction error |
| MSE. | mean squared error |
| NP. | number of parameters in the model (including the intercept) |
| PC. | Amemiya's prediction criterion |
| RMSE. | root MSE |
| RSQ. | R-square |
| SBC. | SBC statistic |
| SP. | SP statistic |
| SSE. | error sum of squares |
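Keywords from this table can appear on either side of the asterisk. The sketch below plots a few of the diagnostic statistics against each other and against observation number. The data set WORK.SAMPLE and its variables are hypothetical.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1, X2 */
proc reg data=work.sample;
   model y = x1 x2;
   plot cookd.*obs.;          /* Cook's D by observation number */
   plot rstudent.*h.;         /* studentized deleted residuals against leverage */
   plot student.*predicted.;  /* studentized residuals against predicted values */
run;
quit;
```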
| Option | Description |
| General Graphics Options | |
| ANNOTATE= SAS-data-set | specifies the annotate data set |
| CHOCKING=color | requests reference lines for Cp model selection criteria |
| CMALLOWS=color | requests a reference line for the Cp model selection criterion |
| CONF | requests plots of $100(1-\alpha)\%$ confidence intervals for the mean |
| DESCRIPTION= 'string' | specifies a description for graphics catalog member |
| NAME='string' | names the plot in graphics catalog |
| OVERLAY | overlays plots from the same model |
| PRED | requests plots of $100(1-\alpha)\%$ prediction intervals |
| RIDGEPLOT | requests the ridge trace for ridge regression |
| Axis and Legend Options | |
| LEGEND=LEGENDn | specifies LEGEND statement to be used |
| HAXIS=values | specifies tick mark values for horizontal axis |
| VAXIS=values | specifies tick mark values for vertical axis |
| Reference Line Options | |
| HREF=values | specifies reference lines perpendicular to horizontal axis |
| LHREF=linetype | specifies line style for HREF= lines |
| LLINE=linetype | specifies line style for lines displayed by default |
| LVREF=linetype | specifies line style for VREF= lines |
| NOLINE | suppresses display of any default reference line |
| VREF=values | specifies reference lines perpendicular to vertical axis |
| Color Options | |
| CAXIS=color | specifies color for axis line and tick marks |
| CFRAME=color | specifies color for frame |
| CHREF=color | specifies color for HREF= lines |
| CLINE=color | specifies color for lines displayed by default |
| CTEXT=color | specifies color for text |
| CVREF=color | specifies color for VREF= lines |
| Options for Displaying the Fitted Model Equation | |
| MODELFONT=font | specifies font of model equation and model label |
| MODELHT=value | specifies text height of model equation and model label |
| MODELLAB='label' | specifies model label |
| NOMODEL | suppresses display of the fitted model and the label |
| Options for Displaying Statistics in the Plot Margin | |
| AIC | displays Akaike's information criterion |
| BIC | displays Sawa's Bayesian information criterion |
| CP | displays Mallows' Cp statistic |
| EDF | displays the error degrees of freedom |
| GMSEP | displays the estimated MSE of prediction assuming multivariate normality |
| IN | displays the number of regressors in the model not including the intercept |
| JP | displays the Jp statistic |
| MSE | displays the mean squared error |
| NOSTAT | suppresses display of the default statistics: the number of observations, R-square, adjusted R-square, and the root mean square error |
| NP | displays the number of parameters in the model including the intercept, if any |
| PC | displays the PC statistic |
| SBC | displays the SBC statistic |
| SP | displays the S(p) statistic |
| SSE | displays the error sum of squares |
| STATFONT=font | specifies font of text displayed in the margin |
| STATHT=value | specifies height of text displayed in the margin |
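As a rough illustration of the reference line and color options, the sketch below draws horizontal reference lines at -2 and 2 on a plot of studentized residuals by observation number. It assumes that VREF= accepts a space-separated list of values and that line type 2 is an acceptable LVREF= value. The data set WORK.SAMPLE is hypothetical.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1, X2 */
proc reg data=work.sample;
   model y = x1 x2;
   /* Assumed list syntax for VREF=; line type 2 and blue chosen arbitrarily */
   plot student.*obs. / vref=-2 2 lvref=2 cvref=blue nostat;
run;
quit;
```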
For the purpose of parameter estimation, Hocking (1976) suggests selecting a model where $C_p \le 2p - p_{\mathrm{full}}$. For the purpose of prediction, Hocking suggests the criterion $C_p \le p$. You can request the single reference line $C_p = p$ with the CMALLOWS= option. If, for example, you specify both CHOCKING=RED and CMALLOWS=BLUE, then the $C_p = 2p - p_{\mathrm{full}}$ line is red and the $C_p = p$ line is blue (see Example 50.5).
Mallows (1973) suggests that all subset models with Cp small and near p be considered for further study. See the CHOCKING= option for related model selection criteria.
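For reference, Mallows' Cp for a subset model is conventionally defined as shown below. Here $\mathrm{SSE}_p$ is the error sum of squares of a subset model with $p$ parameters (including the intercept), $\hat{\sigma}^2$ is the mean squared error of the full model, and $n$ is the number of observations. The numbers in the second line are made up purely to show the arithmetic.

```latex
\begin{aligned}
C_p &= \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p \\
\text{e.g. } n = 30,\ \hat{\sigma}^2 = 4,\ \mathrm{SSE}_p = 104,\ p = 3:\quad
C_p &= \frac{104}{4} - 30 + 2 \cdot 3 = 26 - 30 + 6 = 2 \le p
\end{aligned}
```

With these illustrative values, the subset model would satisfy the prediction criterion $C_p \le p$.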
If a character variable is used for the symbol, the first (left-most) nonblank character in the formatted value of the variable is used as the plotting symbol. If a character in quotes is specified, that character becomes the plotting symbol. If a character is used as the plotting symbol, and if there are different plotting symbols needed at the same point, the symbol '?' is used at that point.
If an unformatted numeric variable is used for the symbol, the symbols '1', '2', ... , '9' are used for variable values 1, 2, ... , 9. For noninteger values, only the integer portion is used as the plotting symbol. For values of 10 or greater, the symbol '*' is used. For negative values, a '?' is used. If a numeric variable is used, and if there is more than one plotting symbol needed at the same point, the sum of the variable values is used at that point. If the sum exceeds 9, the symbol '*' is used.
If a symbol is not specified, the number of replicates at the point is displayed. The symbol '*' is used if there are ten or more replicates.
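Line printer plots are requested with the LINEPRINTER option in the PROC REG statement. The sketch below contrasts the default replicate-count symbols with a quoted plotting symbol. It assumes a hypothetical data set WORK.SAMPLE.

```sas
/* Hypothetical data set and variable names: WORK.SAMPLE, Y, X1, X2 */
proc reg data=work.sample lineprinter;
   model y = x1 x2;
   plot residual.*predicted.;   /* no symbol: replicate counts 1-9, '*' for ten or more */
   plot residual.*x1='+';       /* every point drawn with '+' */
run;
quit;
```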
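If the LINEPRINTER option is used, you can specify the following options in the PLOT statement after a slash (/): CLEAR, COLLECT, HPLOTS=, NOCOLLECT, OVERLAY, SYMBOL=, and VPLOTS=. For example, with the COLLECT option, the statements
plot residual.*predicted. y*x / collect; run;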
produce two plots. If these statements are then followed by
plot residual.*x; run;
two plots are again produced. The first plot shows residuals against X overlaid on residuals against predicted values. The second plot is the same as the second plot (Y against X) produced by the first PLOT statement.
Axes are scaled for the first plot or plots collected. The axes are not rescaled as more plots are collected.
Once specified, the COLLECT option remains in effect until the NOCOLLECT option is specified.
For more information, see the COLLECT option.
If the SYMBOL= option has not been specified, the default symbol is '1' for positions with one observation, '2' for positions with two observations, and so on. For positions with more than 9 observations, '*' is used. The SYMBOL= option (or a plotting symbol) is needed to avoid any confusion caused by this default convention. Specifying a particular symbol is especially important when either the OVERLAY or COLLECT option is being used.
If you specify the SYMBOL= option and use a number as the character, that number is used for all points in the plot. For example, the statement
plot y*x / symbol='1';
produces a plot with the symbol '1' used for all points.
If you specify a plotting symbol and the SYMBOL= option, the plotting symbol overrides the SYMBOL= option. For example, in the statements
plot y*x y*v='.' / symbol='*';
the symbol used for the plot of Y against X is '*', and a '.' is used for the plot of Y against V.
If a paint symbol is defined with a PAINT statement, the paint symbol takes precedence over both the SYMBOL= option and the default plotting symbol for the PLOT statement.
For example, to specify a total of six plots per page, with two rows of three plots, use the HPLOTS= and VPLOTS= options as follows:
plot y1*x1 y1*x2 y1*x3 y2*x1 y2*x2 y2*x3 /
hplots=3 vplots=2;
run;
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.