 General Statistics Examples

Example 8.1: Correlation

This example defines two modules: one computes the correlation matrix for a set of numeric variables, and the other standardizes the data so that each column has mean 0 and standard deviation 1.

```
/* Module to compute correlations */
start corr;
n=nrow(x);                      /* number of observations */
sum=x[+,] ;                        /* compute column sums */
xpx=t(x)*x-t(sum)*sum/n;         /* compute sscp matrix   */
s=diag(1/sqrt(vecdiag(xpx)));           /* scaling matrix */
corr=s*xpx*s;                       /* correlation matrix */
print "Correlation Matrix",,corr[rowname=nm colname=nm] ;
finish corr;
/* Module to standardize data */
start std;
n=nrow(x);                      /* number of observations */
mean=x[+,] /n;                       /* means for columns */
x=x-repeat(mean,n,1);            /* center x to mean zero */
ss=x[##,] ;                 /* sum of squares for columns */
std=sqrt(ss/(n-1));         /* standard deviation estimate*/
x=x*diag(1/std);                  /* scaling to std dev 1 */
print ,"Standardized Data",,X[colname=nm] ;
finish std;

/* Sample run */
x = { 1 2 3,
3 2 1,
4 2 1,
0 4 1,
24 1 0,
1 3 8};
nm={age weight height};
run corr;
run std;
```
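In matrix terms, the two modules appear to implement the standard formulas sketched below. The symbols $\mathbf{s}$, $C$, $S$, $R$, $\bar{x}_j$, $d_j$, and $z_{ij}$ are notation introduced here for illustration; they do not appear in the original code. Here $X$ is the $n \times p$ data matrix.

```latex
\begin{align*}
\mathbf{s} &= \mathbf{1}^{\top} X
  && \text{row vector of column sums (\texttt{x[+,]})} \\
C &= X^{\top} X - \frac{\mathbf{s}^{\top}\mathbf{s}}{n}
  && \text{corrected SSCP matrix (\texttt{xpx})} \\
S &= \operatorname{diag}\!\left(\frac{1}{\sqrt{c_{jj}}}\right)
  && \text{scaling matrix (\texttt{s})} \\
R &= S\,C\,S, \qquad r_{jk} = \frac{c_{jk}}{\sqrt{c_{jj}\,c_{kk}}}
  && \text{correlation matrix (\texttt{corr})} \\[6pt]
\bar{x}_j &= \frac{1}{n}\sum_{i=1}^{n} x_{ij}
  && \text{column means (\texttt{mean})} \\
d_j &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^2}
  && \text{standard deviation estimate (\texttt{std})} \\
z_{ij} &= \frac{x_{ij}-\bar{x}_j}{d_j}
  && \text{standardized data (final \texttt{x})}
\end{align*}
```

Note that $c_{jj} = (n-1)\,d_j^{2}$, so both modules scale each column by the same quantity up to the constant factor $n-1$.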
The results are shown below.

 Correlation Matrix

| CORR   | AGE       | WEIGHT    | HEIGHT    |
|--------|-----------|-----------|-----------|
| AGE    | 1         | -0.717102 | -0.436558 |
| WEIGHT | -0.717102 | 1         | 0.3508232 |
| HEIGHT | -0.436558 | 0.3508232 | 1         |

 Standardized Data

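| X | AGE       | WEIGHT    | HEIGHT    |
|---|-----------|-----------|-----------|
|   | -0.490116 | -0.322749 | 0.2264554 |
|   | -0.272287 | -0.322749 | -0.452911 |
|   | -0.163372 | -0.322749 | -0.452911 |
|   | -0.59903  | 1.6137431 | -0.452911 |
|   | 2.0149206 | -1.290994 | -0.792594 |
|   | -0.490116 | 0.6454972 | 1.924871  |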
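As a hand check on the printed values, the arithmetic for the AGE column and for the AGE and WEIGHT correlation can be worked directly from the sample data. This is only a verification sketch using the usual sample formulas; the symbols $\bar{x}$, $d$, $z$, and $r$ are illustrative notation, not names from the code.

```latex
\begin{align*}
\sum x_{\text{AGE}} &= 1+3+4+0+24+1 = 33,
  \qquad \bar{x}_{\text{AGE}} = \tfrac{33}{6} = 5.5 \\
\sum x_{\text{AGE}}^2 - \frac{33^2}{6} &= 603 - 181.5 = 421.5 \\
d_{\text{AGE}} &= \sqrt{\tfrac{421.5}{5}} = \sqrt{84.3} \approx 9.1815 \\
z_{1,\text{AGE}} &= \frac{1 - 5.5}{9.1815} \approx -0.4901,
  \qquad z_{5,\text{AGE}} = \frac{24 - 5.5}{9.1815} \approx 2.0149 \\[6pt]
\sum x_{\text{AGE}}\,x_{\text{WEIGHT}} - \frac{33 \cdot 14}{6} &= 43 - 77 = -34 \\
\sum x_{\text{WEIGHT}}^2 - \frac{14^2}{6} &= 38 - 32.6667 \approx 5.3333 \\
r_{\text{AGE},\text{WEIGHT}} &= \frac{-34}{\sqrt{421.5 \times 5.3333}}
  \approx \frac{-34}{47.4131} \approx -0.7171
\end{align*}
```

These agree with the first and fifth AGE entries of the standardized data and with the AGE and WEIGHT entry of the correlation matrix.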

