General Statistics Examples

Example 8.5: Categorical Linear Models

This example fits a linear model to a function of the response probabilities

K \log \pi = X \beta + e

where K is a matrix that compares each response category to the last. The data are from Kastenbaum and Lamphiear (1959). First, the Grizzle-Starmer-Koch (1969) approach is used to obtain generalized least-squares estimates of \beta; these serve as initial values for the Newton-Raphson solution of the maximum-likelihood estimates. The CATMOD procedure can also be used to analyze these categorical data (refer to Cox 1970).
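The following sketch is not part of the original example, but it may help in reading the program. With r = 3 response categories, K compares each category to the last, so for a single population

K \log \pi = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix} \log \begin{pmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \end{pmatrix} = \begin{pmatrix} \log(\pi_1 / \pi_3) \\ \log(\pi_2 / \pi_3) \end{pmatrix}

Stacking the q = r - 1 functions for all s populations into f and expanding the design matrix to Z = X \otimes I_q, the GSK step in the program computes the weighted least-squares estimate

b = (Z' S^{-1} Z)^{-1} Z' S^{-1} f

where S^{-1} is block diagonal with block n_i \left( \mathrm{diag}(p_i) - p_i p_i' \right) for population i, p_i holding the first q observed proportions and n_i the population total.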
      /* Categorical Linear Models                    */
      /* by Least Squares and Maximum Likelihood      */
      /*  CATLIN                                      */
      /*  Input:                                      */
      /*     n the s by p matrix of response counts   */
      /*     x the s by r design matrix               */
      /*                                              */

   proc iml;

   start catlin;

      /*---find dimensions---*/
      s=nrow(n);                       /* number of populations */
      r=ncol(n);                         /* number of responses */
      q=r-1;                       /* number of function values */
      d=ncol(x);                 /* number of design parameters */
      qd=q*d;                     /* total number of parameters */

      /*---get probability estimates---*/
      rown=n[,+];                                 /* row totals */
      pr=n/(rown*repeat(1,1,r));       /* probability estimates */
      p=shape(pr[,1:q],0,1);        /* cut and shaped to vector */
      print "INITIAL PROBABILITY ESTIMATES" ,pr; 

         /*   estimate by the GSK method   */

         /* function of probabilities */
      f=log(p)-log(pr[,r])@repeat(1,q,1);

         /* inverse covariance of f */
      si=(diag(p)-p*p`)#(diag(rown)@repeat(1,q,q));
      z=x@i(q);                       /* expanded design matrix */
      h=z`*si*z;                        /* crossproducts matrix */
      g=z`*si*f;                                /* cross with f */
      beta=solve(h,g);                /* least squares solution */
      stderr=sqrt(vecdiag(inv(h)));          /* standard errors */
      run prob;
      print ,"GSK ESTIMATES" , beta stderr ,pi;

         /*   iterations for ML solution   */
      crit=1;
      do it=1 to 8 while(crit>.0005); /* iterate until converge */

         /* block diagonal weighting  */
         si=(diag(pi)-pi*pi`)#(diag(rown)@repeat(1,q,q));
         g=z`*(rown@repeat(1,q,1)#(p-pi));          /* gradient */
         h=z`*si*z;                                  /* hessian */
         delta=solve(h,g);              /* solve for correction */
         beta=beta+delta;               /* apply the correction */
         run prob;                    /* compute prob estimates */
         crit=max(abs(delta));         /* convergence criterion */
      end;
      stderr=sqrt(vecdiag(inv(h)));          /* standard errors */
      print , "ML Estimates", beta stderr, pi;
      print , "Iterations" it "Criterion" crit;
   finish catlin;

      /* subroutine to compute new prob estimates given the parameters */
   start prob;
      la=exp(x*shape(beta,0,q));
      pi=la/((1+la[,+])*repeat(1,1,q));
      pi=shape(pi,0,1);
   finish prob;
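
For reference (again not part of the original listing), the PROB module inverts the generalized-logit transform, and each pass through the loop in CATLIN is a Newton-Raphson (equivalently, Fisher scoring) update of the multinomial log likelihood:

\pi_{ij} = \frac{\exp(x_i' \beta_j)}{1 + \sum_{k=1}^{q} \exp(x_i' \beta_k)}, \qquad j = 1, \ldots, q

g = Z' \left[ (n \otimes 1_q) \# (p - \pi) \right], \qquad H = Z' S^{-1} Z, \qquad \beta \leftarrow \beta + H^{-1} g

Here \beta_j is the jth column of the d \times q matrix shape(beta,0,q), \# is elementwise multiplication as in the code, n is the vector of population totals (rown in the code), and S^{-1} is the block-diagonal weight matrix recomputed at the current \pi.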


      /*---prepare frequency data and design matrix---*/
   n= { 58 11 05,
        75 19 07,
        49 14 10,
        58 17 08,
        33 18 15,
        45 22 10,
        15 13 15,
        39 22 18,
        04 12 17,
        05 15 08};     /* frequency counts*/

   x= { 1 1 1 0 0 0,
        1 -1 1 0 0 0,
        1 1 0 1 0 0,
        1 -1 0 1 0 0,
        1 1 0 0 1 0,
        1 -1 0 0 1 0,
        1 1 0 0 0 1,
        1 -1 0 0 0 1,
        1 1 -1 -1 -1 -1,
        1 -1 -1 -1 -1 -1};    /* design matrix*/

   run catlin;
   quit;
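
Since the text notes that the CATMOD procedure can also analyze these data, the following is a hedged sketch, not part of the original example, of one way to do so. The data set and variable names (kasten, a, b, level, count) and the factor layout (a two-level factor alternating across rows and a five-level factor, inferred from the design matrix above) are assumptions; with CATMOD's default effect coding and maximum-likelihood fitting of generalized logits, the estimates should be comparable to the ML solution above, up to parameter ordering.

   data kasten;                 /* hypothetical data set name       */
      input a b c1-c3;          /* two factors and the three counts */
      array c{3} c1-c3;
      do level=1 to 3;          /* one observation per response     */
         count=c{level};
         output;
      end;
      drop c1-c3;
      datalines;
1 1 58 11  5
2 1 75 19  7
1 2 49 14 10
2 2 58 17  8
1 3 33 18 15
2 3 45 22 10
1 4 15 13 15
2 4 39 22 18
1 5  4 12 17
2 5  5 15  8
;

   proc catmod data=kasten;
      weight count;             /* cell frequencies                  */
      response logits;          /* generalized logits vs. last level */
      model level = a b;        /* main-effects model; ML by default */
   run;
   quit;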

The results are shown below.

INITIAL PROBABILITY ESTIMATES
PR
0.7837838 0.1486486 0.0675676
0.7425743 0.1881188 0.0693069
0.6712329 0.1917808 0.1369863
0.6987952 0.2048193 0.0963855
0.5 0.2727273 0.2272727
0.5844156 0.2857143 0.1298701
0.3488372 0.3023256 0.3488372
0.4936709 0.278481 0.2278481
0.1212121 0.3636364 0.5151515
0.1785714 0.5357143 0.2857143

GSK ESTIMATES

BETA STDERR
0.9454429 0.1290925
0.4003259 0.1284867
-0.277777 0.1164699
-0.278472 0.1255916
1.4146936 0.267351
0.474136 0.294943
0.8464701 0.2362639
0.1526095 0.2633051
0.1952395 0.2214436
0.0723489 0.2366597
-0.514488 0.2171995
-0.400831 0.2285779

PI
0.7402867
0.1674472
0.7704057
0.1745023
0.6624811
0.1917744
0.7061615
0.2047033
0.516981
0.2648871
0.5697446
0.2923278
0.3988695
0.2589096
0.4667924
0.3034204
0.1320359
0.3958019
0.1651907
0.4958784


ML Estimates

BETA STDERR
0.9533597 0.1286179
0.4069338 0.1284592
-0.279081 0.1156222
-0.280699 0.1252816
1.4423195 0.2669357
0.4993123 0.2943437
0.8411595 0.2363089
0.1485875 0.2635159
0.1883383 0.2202755
0.0667313 0.236031
-0.527163 0.216581
-0.414965 0.2299618

PI
0.7431759
0.1673155
0.7723266
0.1744421
0.6627266
0.1916645
0.7062766
0.2049216
0.5170782
0.2646857
0.5697771
0.292607
0.3984205
0.2576653
0.4666825
0.3027898
0.1323243
0.3963114
0.165475
0.4972044

  IT   CRIT
Iterations 3 Criterion 0.0004092

