 The REG Procedure

## Example 55.2: Predicting Weight by Height and Age

In this example, the weights of school children are modeled as a function of their heights and ages. Modeling is performed separately for boys and girls. The example shows the use of a BY statement with PROC REG, multiple MODEL statements, and the OUTEST= and OUTSSCP= options, which create data sets. Since the BY statement is used, interactive processing is not possible in this example; no statements can appear after the first RUN statement. The following statements produce Output 55.2.1 through Output 55.2.4:

```
*------------Data on Age, Weight, and Height of Children-------*
| Age (months), height (inches), and weight (pounds) were      |
| recorded for a group of school children.                     |
| From Lewis and Taylor (1967).                                |
*--------------------------------------------------------------*;

data htwt;
/* The 3.1 informat gives undotted ages one implied decimal: 143 is read as 14.3 */
input sex $ age :3.1 height weight @@;
datalines;
f 143 56.3  85.0 f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0  92.0
f 191 62.5 112.5 f 171 62.5 112.0 f 185 59.0 104.0 f 142 56.5  69.0
f 160 62.0  94.5 f 140 53.8  68.5 f 139 61.5 104.0 f 178 61.5 103.5
f 157 64.5 123.5 f 149 58.3  93.0 f 143 51.3  50.5 f 145 58.8  89.0
f 191 65.3 107.0 f 150 59.5  78.5 f 147 61.3 115.0 f 180 63.3 114.0
f 141 61.8  85.0 f 140 53.5  81.0 f 164 58.0  83.5 f 176 61.3 112.0
f 185 63.3 101.0 f 166 61.5 103.5 f 175 60.8  93.5 f 180 59.0 112.0
f 210 65.5 140.0 f 146 56.3  83.5 f 170 64.3  90.0 f 162 58.0  84.0
f 149 64.3 110.5 f 139 57.5  96.0 f 186 57.8  95.0 f 197 61.5 121.0
f 169 62.3  99.5 f 177 61.8 142.5 f 185 65.3 118.0 f 182 58.3 104.5
f 173 62.8 102.5 f 166 59.3  89.5 f 168 61.5  95.0 f 169 62.0  98.5
f 150 61.3  94.0 f 184 62.3 108.0 f 139 52.8  63.5 f 147 59.8  84.5
f 144 59.5  93.5 f 177 61.3 112.0 f 178 63.5 148.5 f 197 64.8 112.0
f 146 60.0 109.0 f 145 59.0  91.5 f 147 55.8  75.0 f 145 57.8  84.0
f 155 61.3 107.0 f 167 62.3  92.5 f 183 64.3 109.5 f 143 55.5  84.0
f 183 64.5 102.5 f 185 60.0 106.0 f 148 56.3  77.0 f 147 58.3 111.5
f 154 60.0 114.0 f 156 54.5  75.0 f 144 55.8  73.5 f 154 62.8  93.5
f 152 60.5 105.0 f 191 63.3 113.5 f 190 66.8 140.0 f 140 60.0  77.0
f 148 60.5  84.5 f 189 64.3 113.5 f 143 58.3  77.5 f 178 66.5 117.5
f 164 65.3  98.0 f 157 60.5 112.0 f 147 59.5 101.0 f 148 59.0  95.0
f 177 61.3  81.0 f 171 61.5  91.0 f 172 64.8 142.0 f 190 56.8  98.5
f 183 66.5 112.0 f 143 61.5 116.5 f 179 63.0  98.5 f 186 57.0  83.5
f 182 65.5 133.0 f 182 62.0  91.5 f 142 56.0  72.5 f 165 61.3 106.5
f 165 55.5  67.0 f 154 61.0 122.5 f 150 54.5  74.0 f 155 66.0 144.5
f 163 56.5  84.0 f 141 56.0  72.5 f 147 51.5  64.0 f 210 62.0 116.0
f 171 63.0  84.0 f 167 61.0  93.5 f 182 64.0 111.5 f 144 61.0  92.0
f 193 59.8 115.0 f 141 61.3  85.0 f 164 63.3 108.0 f 186 63.5 108.0
f 169 61.5  85.0 f 175 60.3  86.0 f 180 61.3 110.5 m 165 64.8  98.0
m 157 60.5 105.0 m 144 57.3  76.5 m 150 59.5  84.0 m 150 60.8 128.0
m 139 60.5  87.0 m 189 67.0 128.0 m 183 64.8 111.0 m 147 50.5  79.0
m 146 57.5  90.0 m 160 60.5  84.0 m 156 61.8 112.0 m 173 61.3  93.0
m 151 66.3 117.0 m 141 53.3  84.0 m 150 59.0  99.5 m 164 57.8  95.0
m 153 60.0  84.0 m 206 68.3 134.0 m 250 67.5 171.5 m 176 63.8  98.5
m 176 65.0 118.5 m 140 59.5  94.5 m 185 66.0 105.0 m 180 61.8 104.0
m 146 57.3  83.0 m 183 66.0 105.5 m 140 56.5  84.0 m 151 58.3  86.0
m 151 61.0  81.0 m 144 62.8  94.0 m 160 59.3  78.5 m 178 67.3 119.5
m 193 66.3 133.0 m 162 64.5 119.0 m 164 60.5  95.0 m 186 66.0 112.0
m 143 57.5  75.0 m 175 64.0  92.0 m 175 68.0 112.0 m 175 63.5  98.5
m 173 69.0 112.5 m 170 63.8 112.5 m 174 66.0 108.0 m 164 63.5 108.0
m 144 59.5  88.0 m 156 66.3 106.0 m 149 57.0  92.0 m 144 60.0 117.5
m 147 57.0  84.0 m 188 67.3 112.0 m 169 62.0 100.0 m 172 65.0 112.0
m 150 59.5  84.0 m 193 67.8 127.5 m 157 58.0  80.5 m 168 60.0  93.5
m 140 58.5  86.5 m 156 58.3  92.5 m 156 61.5 108.5 m 158 65.0 121.0
m 184 66.5 112.0 m 156 68.5 114.0 m 144 57.0  84.0 m 176 61.5  81.0
m 168 66.5 111.5 m 149 52.5  81.0 m 142 55.0  70.0 m 188 71.0 140.0
m 203 66.5 117.0 m 142 58.8  84.0 m 189 66.3 112.0 m 188 65.8 150.5
m 200 71.0 147.0 m 152 59.5 105.0 m 174 69.8 119.5 m 166 62.5  84.0
m 145 56.5  91.0 m 143 57.5 101.0 m 163 65.3 117.5 m 166 67.3 121.0
m 182 67.0 133.0 m 173 66.0 112.0 m 155 61.8  91.5 m 162 60.0 105.0
m 177 63.0 111.0 m 177 60.5 112.0 m 175 65.5 114.0 m 166 62.0  91.0
m 150 59.0  98.0 m 150 61.8 118.0 m 188 63.3 115.5 m 163 66.0 112.0
m 171 61.8 112.0 m 162 63.0  91.0 m 141 57.5  85.0 m 174 63.0 112.0
m 142 56.0  87.5 m 148 60.5 118.0 m 140 56.8  83.5 m 160 64.0 116.0
m 144 60.0  89.0 m 206 69.5 171.5 m 159 63.3 112.0 m 149 56.3  72.0
m 193 72.0 150.0 m 194 65.3 134.5 m 152 60.8  97.0 m 146 55.0  71.5
m 139 55.0  73.5 m 186 66.5 112.0 m 161 56.8  75.0 m 153 64.8 128.0
m 196 64.5  98.0 m 164 58.0  84.0 m 159 62.8  99.0 m 178 63.8 112.0
m 153 57.8  79.5 m 155 57.3  80.5 m 178 63.5 102.5 m 142 55.0  76.0
m 164 66.5 112.0 m 189 65.0 114.0 m 164 61.5 140.0 m 167 62.0 107.5
m 151 59.3  87.0
;

title '----- Data on age, weight, and height of children ------';
proc reg outest=est1 outsscp=sscp1 rsquare;
by sex;
eq1: model  weight=height;
eq2: model  weight=height age;
proc print data=sscp1;
title2 'SSCP type data set';
proc print data=est1;
title2 'EST type data set';
run;
```
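As a rough illustration of how the INPUT statement above reads the data lines, the following Python sketch mimics the list input. The trailing `@@` lets several four-field records share one line, and the `3.1` informat treats an age written without a decimal point as having one implied decimal place, so 143 is stored as 14.3. The function name `parse_htwt` and the two sample lines are assumptions chosen for illustration; this is not how SAS itself is implemented.

```python
def parse_htwt(lines, implied_decimals=1):
    """Parse 'sex age height weight' records packed several per line."""
    records = []
    for line in lines:
        tokens = line.split()
        # The trailing @@ holds the line, so records come in groups of 4 tokens.
        for i in range(0, len(tokens) - 3, 4):
            sex, age, height, weight = tokens[i:i + 4]
            # w.d informat: d applies only when the field has no decimal point.
            if "." in age:
                age_value = float(age)
            else:
                age_value = int(age) / 10 ** implied_decimals
            records.append((sex, age_value, float(height), float(weight)))
    return records


# Two lines copied from the DATALINES block (first and last).
sample = [
    "f 143 56.3  85.0 f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0  92.0",
    "m 151 59.3  87.0",
]
records = parse_htwt(sample)
for rec in records:
    print(rec)
```

The scaled ages are what PROC REG actually uses, which is why the age sums in the OUTSSCP= data set carry one decimal place.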

Output 55.2.1: Height and Weight Data: Female Children

 ----- Data on age, weight, and height of children ------

 The REG Procedure Model: EQ1 Dependent Variable: weight

 sex=f

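Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 21507 | 21507 | 141.09 | <.0001 |
| Error | 109 | 16615 | 152.42739 | | |
| Corrected Total | 110 | 38121 | | | |

| Statistic | Value |
|---|---|
| Root MSE | 12.3461 |
| Dependent Mean | 98.8784 |
| Coeff Var | 12.4862 |
| R-Square | 0.5642 |
| Adj R-Sq | 0.5602 |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -153.12891 | 21.24814 | -7.21 | <.0001 |
| height | 1 | 4.16361 | 0.35052 | 11.88 | <.0001 |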

 ----- Data on age, weight, and height of children ------

 The REG Procedure Model: EQ2 Dependent Variable: weight

 sex=f

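Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 22432 | 11216 | 77.21 | <.0001 |
| Error | 108 | 15689 | 145.26700 | | |
| Corrected Total | 110 | 38121 | | | |

| Statistic | Value |
|---|---|
| Root MSE | 12.0527 |
| Dependent Mean | 98.8784 |
| Coeff Var | 12.1894 |
| R-Square | 0.5884 |
| Adj R-Sq | 0.5808 |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -150.59698 | 20.76730 | -7.25 | <.0001 |
| height | 1 | 3.60378 | 0.40777 | 8.84 | <.0001 |
| age | 1 | 1.90703 | 0.75543 | 2.52 | 0.0130 |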

Output 55.2.2: Height and Weight Data: Male Children

 ----- Data on age, weight, and height of children ------

 The REG Procedure Model: EQ1 Dependent Variable: weight

 sex=m

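Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 31126 | 31126 | 206.24 | <.0001 |
| Error | 124 | 18714 | 150.92222 | | |
| Corrected Total | 125 | 49840 | | | |

| Statistic | Value |
|---|---|
| Root MSE | 12.285 |
| Dependent Mean | 103.448 |
| Coeff Var | 11.8755 |
| R-Square | 0.6245 |
| Adj R-Sq | 0.6215 |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -125.69807 | 15.99362 | -7.86 | <.0001 |
| height | 1 | 3.68977 | 0.25693 | 14.36 | <.0001 |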

 ----- Data on age, weight, and height of children ------

 The REG Procedure Model: EQ2 Dependent Variable: weight

 sex=m

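Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 32975 | 16487 | 120.24 | <.0001 |
| Error | 123 | 16866 | 137.11922 | | |
| Corrected Total | 125 | 49840 | | | |

| Statistic | Value |
|---|---|
| Root MSE | 11.7098 |
| Dependent Mean | 103.448 |
| Coeff Var | 11.3194 |
| R-Square | 0.6616 |
| Adj R-Sq | 0.6561 |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -113.71346 | 15.59021 | -7.29 | <.0001 |
| height | 1 | 2.68075 | 0.36809 | 7.28 | <.0001 |
| age | 1 | 3.08167 | 0.83927 | 3.67 | 0.0004 |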

For both females and males, the overall F statistics for both models are significant, indicating that each model explains a significant portion of the variation in the data. For females, the full model is

weight = -150.60 + 3.60 × height + 1.91 × age

and, for males, the full model is

weight = -113.71 + 2.68 × height + 3.08 × age
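The F statistics and R-Square values discussed above follow directly from the sums of squares in the Analysis of Variance tables. The Python sketch below recomputes them from the printed values; because the listing rounds the sums of squares to whole numbers, the results agree with the output only to within rounding. The names `anova`, `summarize`, and `results` are assumptions made for this illustration.

```python
# (label, model SS, model DF, error SS, error DF, corrected total SS)
# Values copied from Output 55.2.1 and Output 55.2.2.
anova = [
    ("f EQ1", 21507, 1, 16615, 109, 38121),
    ("f EQ2", 22432, 2, 15689, 108, 38121),
    ("m EQ1", 31126, 1, 18714, 124, 49840),
    ("m EQ2", 32975, 2, 16866, 123, 49840),
]


def summarize(model_ss, model_df, error_ss, error_df, total_ss):
    """Return (F value, R-square, adjusted R-square) for one model."""
    f_value = (model_ss / model_df) / (error_ss / error_df)
    r_square = model_ss / total_ss
    total_df = model_df + error_df  # n - 1 for a model with an intercept
    adj_r_square = 1 - (1 - r_square) * total_df / error_df
    return f_value, r_square, adj_r_square


results = {label: summarize(*row) for label, *row in anova}
for label, (f_value, r_square, adj_r_square) in results.items():
    print(f"{label}: F={f_value:.2f} R-Square={r_square:.4f} "
          f"Adj R-Sq={adj_r_square:.4f}")
```

In each BY group, adding age raises the adjusted R-square slightly, which is consistent with the significant t test for age in the EQ2 parameter estimates.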

Output 55.2.3: SSCP Matrix

----- Data on age, weight, and height of children ------

SSCP type data set

| Obs | sex | `_TYPE_` | `_NAME_` | Intercept | height | weight | age |
|---|---|---|---|---|---|---|---|
| 1 | f | SSCP | Intercept | 111.0 | 6718.40 | 10975.50 | 1824.90 |
| 2 | f | SSCP | height | 6718.4 | 407879.32 | 669469.85 | 110818.32 |
| 3 | f | SSCP | weight | 10975.5 | 669469.85 | 1123360.75 | 182444.95 |
| 4 | f | SSCP | age | 1824.9 | 110818.32 | 182444.95 | 30363.81 |
| 5 | f | N | | 111.0 | 111.00 | 111.00 | 111.00 |
| 6 | m | SSCP | Intercept | 126.0 | 7825.00 | 13034.50 | 2072.10 |
| 7 | m | SSCP | height | 7825.0 | 488243.60 | 817919.60 | 129432.57 |
| 8 | m | SSCP | weight | 13034.5 | 817919.60 | 1398238.75 | 217717.45 |
| 9 | m | SSCP | age | 2072.1 | 129432.57 | 217717.45 | 34515.95 |
| 10 | m | N | | 126.0 | 126.00 | 126.00 | 126.00 |

The OUTSSCP= data set is shown in Output 55.2.3. Note how the BY groups are separated. Observations with `_TYPE_='N'` contain the number of observations in the associated BY group. Observations with `_TYPE_='SSCP'` contain the rows of the uncorrected sums of squares and crossproducts matrix. The observations with `_NAME_='Intercept'` contain crossproducts for the intercept.
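Because the SSCP rows hold the uncorrected crossproducts, they contain everything needed to solve the least squares normal equations. The Python sketch below does this for the female BY group and should reproduce the EQ1 and EQ2 parameter estimates up to rounding. The helper `solve` is a plain Gaussian elimination written for this illustration; it is an assumption of the sketch, not a description of the algorithm PROC REG uses internally.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


# Female BY group from Output 55.2.3; rows and columns are ordered
# Intercept, height, age. The weight column supplies the right-hand side.
xtx_f = [
    [111.0, 6718.40, 1824.90],
    [6718.40, 407879.32, 110818.32],
    [1824.90, 110818.32, 30363.81],
]
xty_f = [10975.50, 669469.85, 182444.95]

# EQ1 uses only Intercept and height; EQ2 uses all three columns.
eq1_f = solve([row[:2] for row in xtx_f[:2]], xty_f[:2])
eq2_f = solve(xtx_f, xty_f)
print("EQ1 (Intercept, height):", eq1_f)
print("EQ2 (Intercept, height, age):", eq2_f)
```

The same steps applied to observations 6 through 9 give the estimates for the male BY group.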

Output 55.2.4: OUTEST Data Set

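----- Data on age, weight, and height of children ------

EST type data set

| Obs | sex | `_MODEL_` | `_TYPE_` | `_DEPVAR_` | `_RMSE_` | Intercept | height | weight | age | `_IN_` | `_P_` | `_EDF_` | `_RSQ_` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | f | EQ1 | PARMS | weight | 12.3461 | -153.129 | 4.16361 | -1 | . | 1 | 2 | 109 | 0.56416 |
| 2 | f | EQ2 | PARMS | weight | 12.0527 | -150.597 | 3.60378 | -1 | 1.90703 | 2 | 3 | 108 | 0.58845 |
| 3 | m | EQ1 | PARMS | weight | 12.2850 | -125.698 | 3.68977 | -1 | . | 1 | 2 | 124 | 0.62451 |
| 4 | m | EQ2 | PARMS | weight | 11.7098 | -113.713 | 2.68075 | -1 | 3.08167 | 2 | 3 | 123 | 0.66161 |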

The OUTEST= data set is displayed in Output 55.2.4; again, the BY groups are separated. The `_MODEL_` column contains the labels for models from the MODEL statements. If no labels were specified, the defaults MODEL1 and MODEL2 would appear as values for `_MODEL_`. Note that `_TYPE_='PARMS'` for all observations, indicating that all observations contain parameter estimates. The `_DEPVAR_` column displays the dependent variable, and the `_RMSE_` column gives the root mean square error for the associated model. The Intercept column gives the estimate of the intercept for the associated model, and variables with the same name as variables in the original data set (height, age) give parameter estimates for those variables; a missing value (.) indicates that the variable is not in the model. The dependent variable, weight, is shown with a value of -1. The `_IN_` column contains the number of regressors in the model, not including the intercept; `_P_` contains the number of parameters in the model; `_EDF_` contains the error degrees of freedom; and `_RSQ_` contains the R² statistic. Finally, note that the `_IN_`, `_P_`, `_EDF_`, and `_RSQ_` columns appear in the OUTEST= data set because the RSQUARE option is specified in the PROC REG statement.
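To make the column relationships concrete, the Python sketch below copies the four observations of Output 55.2.4 into dictionaries, checks that `_P_` equals `_IN_` plus one and that `_EDF_` equals the BY-group size minus `_P_`, and uses a row to score a new child. The function `predict`, the use of `None` for the SAS missing value, and the example child (height 60, age 15.0 in the scaled units read by the `3.1` informat) are assumptions made for illustration.

```python
# Rows of the OUTEST= data set; None stands in for the SAS missing value (.).
outest = [
    {"sex": "f", "_MODEL_": "EQ1", "Intercept": -153.129, "height": 4.16361,
     "weight": -1, "age": None, "_IN_": 1, "_P_": 2, "_EDF_": 109},
    {"sex": "f", "_MODEL_": "EQ2", "Intercept": -150.597, "height": 3.60378,
     "weight": -1, "age": 1.90703, "_IN_": 2, "_P_": 3, "_EDF_": 108},
    {"sex": "m", "_MODEL_": "EQ1", "Intercept": -125.698, "height": 3.68977,
     "weight": -1, "age": None, "_IN_": 1, "_P_": 2, "_EDF_": 124},
    {"sex": "m", "_MODEL_": "EQ2", "Intercept": -113.713, "height": 2.68075,
     "weight": -1, "age": 3.08167, "_IN_": 2, "_P_": 3, "_EDF_": 123},
]

# BY-group sizes taken from the _TYPE_='N' rows of the OUTSSCP= data set.
n_obs = {"f": 111, "m": 126}


def predict(row, **regressors):
    """Predicted weight from one OUTEST row; regressors not in the model are skipped."""
    total = row["Intercept"]
    for name, value in regressors.items():
        coefficient = row[name]
        if coefficient is None:  # variable is not part of this model
            continue
        total += coefficient * value
    return total


checks = [
    row["_P_"] == row["_IN_"] + 1 and row["_EDF_"] == n_obs[row["sex"]] - row["_P_"]
    for row in outest
]
print("column relationships hold:", all(checks))

predictions = {
    (row["sex"], row["_MODEL_"]): predict(row, height=60.0, age=15.0)
    for row in outest
}
for key, value in predictions.items():
    print(key, round(value, 2))
```

The -1 stored under weight marks the dependent variable and plays no part in the prediction, which is why `predict` is only ever passed the regressors.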
