sasLM for SAS Users

Kyun-Seop Bae

Why sasLM?

Many statisticians move between SAS and R, and they expect the same numbers from both. For unbalanced or complex designs, however, popular R functions often produce sums of squares, standard errors, or least squares means that differ from those of SAS PROC GLM. The differences are not necessarily errors - they come from different conventions (types of sums of squares, coding of singular designs, denominators, quantile definitions, and so on).

sasLM implements the conventions of SAS so that the results match SAS PROC GLM, REG, ANOVA, TTEST, FREQ (2x2 tables), and UNIVARIATE. The package is written in base R with only mvtnorm as an additional dependency, and its results have been validated against SAS outputs and the textbooks listed in ?sasLM.

Procedure-to-function map

SAS sasLM
PROC GLM GLM, aov1, aov2, aov3, EMS
PROC GLM SOLUTION GLM(..., BETA=TRUE), REG
PROC GLM LSMEANS LSM, GLM(..., EMEAN=TRUE)
PROC GLM LSMEANS / PDIFF PDIFF, Diffogram
PROC GLM ESTIMATE/CONTRAST est, ESTM, CIest, CONTR
PROC GLM RANDOM / TEST RanTest, EMS, T3test
PROC GLM SLICE SLICE
PROC REG REG, lr, lr0, regD, Coll
PROC ANOVA aov1
PROC TTEST TTEST, tmtest, mtest, vtest
PROC UNIVARIATE UNIV and the descriptive functions
PROC FREQ (2x2, stratified) RD, RR, OR, RDmn, RRmn, ORmn, ORcmh

PROC GLM

The SAS code

PROC GLM DATA=np;
  CLASS block N P K;
  MODEL yield = block N*P*K / SOLUTION;
  LSMEANS N*P / CL;
RUN;

corresponds to:

GLM(yield ~ block + N*P*K, npk)
$ANOVA
Response : yield
                Df Sum Sq Mean Sq F value  Pr(>F)  
MODEL           11 691.08  62.825  4.0688 0.01156 *
RESIDUALS       12 185.29  15.441                  
CORRECTED TOTAL 23 876.37                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE yield Mean Coef Var  R-square  Adj R-sq
 3.929447     54.875 7.160724 0.7885736 0.5947661

$`Type I`
      Df Sum Sq Mean Sq F value   Pr(>F)   
block  5 343.29  68.659  4.4467 0.015939 * 
N      1 189.28 189.282 12.2587 0.004372 **
P      1   8.40   8.402  0.5441 0.474904   
N:P    1  21.28  21.282  1.3783 0.263165   
K      1  95.20  95.202  6.1657 0.028795 * 
N:K    1  33.14  33.135  2.1460 0.168648   
P:K    1   0.48   0.482  0.0312 0.862752   
N:P:K  0                                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type II`
      Df  Sum Sq Mean Sq F value   Pr(>F)   
block  4 306.293  76.573  4.9592 0.013587 * 
N      1 189.282 189.282 12.2587 0.004372 **
P      1   8.402   8.402  0.5441 0.474904   
N:P    1  21.282  21.282  1.3783 0.263165   
K      1  95.202  95.202  6.1657 0.028795 * 
N:K    1  33.135  33.135  2.1460 0.168648   
P:K    1   0.482   0.482  0.0312 0.862752   
N:P:K  0                                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Type III`
CAUTION: Singularity Exists !
      Df  Sum Sq Mean Sq F value   Pr(>F)   
block  4 306.293  76.573  4.9592 0.013587 * 
N      1 189.282 189.282 12.2587 0.004372 **
P      1   8.402   8.402  0.5441 0.474904   
N:P    1  21.282  21.282  1.3783 0.263165   
K      1  95.202  95.202  6.1657 0.028795 * 
N:K    1  33.135  33.135  2.1460 0.168648   
P:K    1   0.482   0.482  0.0312 0.862752   
N:P:K  0                                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

GLM returns the overall ANOVA, fit statistics, and the Type I, II, and III tables in one call. Note the agreement with SAS for this unbalanced design - this is the core purpose of the package.

Coefficients as with the SOLUTION option:

GLM(yield ~ block + N*P*K, npk, BETA=TRUE)$Parameter
            Estimate Estimable Std. Error Df t value  Pr(>|t|)    
(Intercept)   54.600         0     2.7785 12 19.6506 1.713e-10 ***
block1        -2.325         0     2.7785 12 -0.8368   0.41907    
block2         1.100         0     2.7785 12  0.3959   0.69913    
block3         4.425         0     2.7785 12  1.5926   0.13724    
block4        -6.225         0     2.7785 12 -2.2404   0.04477 *  
block5        -5.825         0     2.7785 12 -2.0964   0.05791 .  
block6         0.000         0     0.0000 12                      
N0            -1.383         0     2.7785 12 -0.4979   0.62758    
N1             0.000         0     0.0000 12                      
P0             2.783         0     2.7785 12  1.0017   0.33625    
P1             0.000         0     0.0000 12                      
N0:P0         -3.767         0     3.2084 12 -1.1740   0.26317    
N0:P1          0.000         0     0.0000 12                      
N1:P0          0.000         0     0.0000 12                      
N1:P1          0.000         0     0.0000 12                      
K0             6.050         0     2.7785 12  2.1774   0.05013 .  
K1             0.000         0     0.0000 12                      
N0:K0         -4.700         0     3.2084 12 -1.4649   0.16865    
N0:K1          0.000         0     0.0000 12                      
N1:K0          0.000         0     0.0000 12                      
N1:K1          0.000         0     0.0000 12                      
P0:K0          0.567         0     3.2084 12  0.1766   0.86275    
P0:K1          0.000         0     0.0000 12                      
P1:K0          0.000         0     0.0000 12                      
P1:K1          0.000         0     0.0000 12                      
N0:P0:K0       0.000         0     0.0000 12                      
N0:P0:K1       0.000         0     0.0000 12                      
N0:P1:K0       0.000         0     0.0000 12                      
N0:P1:K1       0.000         0     0.0000 12                      
N1:P0:K0       0.000         0     0.0000 12                      
N1:P0:K1       0.000         0     0.0000 12                      
N1:P1:K0       0.000         0     0.0000 12                      
N1:P1:K1       0.000         0     0.0000 12                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Least squares means with confidence limits:

LSM(weight ~ feed, chickwts, "feed")
          Group   LSmean  LowerCL  UpperCL       SE Df
sunflower  A    328.9167 297.2942 360.5392 15.83391 65
casein     A    323.5833 291.9608 355.2058 15.83391 65
meatmeal    B   276.9091 243.8805 309.9377 16.53798 65
soybean     BC  246.4286 217.1518 275.7053 14.65936 65
linseed      C  218.7500 187.1275 250.3725 15.83391 65
horsebean     D 160.2000 125.5593 194.8407 17.34518 65

Pairwise differences as with LSMEANS / PDIFF, including the Tukey adjustment:

PDIFF(weight ~ feed, chickwts, "feed", adj="tukey")
                      Estimate Lower CL Upper CL Std. Error t value Df  Pr(>|t|)    
casein - horsebean     163.383   94.420  232.347     23.485  6.9568 65 3.070e-08 ***
casein - linseed       104.833   39.079  170.587     22.392  4.6816 65 0.0002100 ***
casein - meatmeal       46.674  -20.558  113.906     22.896  2.0386 65 0.3324584    
casein - soybean        77.155   13.792  140.517     21.578  3.5756 65 0.0083653 ** 
casein - sunflower      -5.333  -71.087   60.421     22.392 -0.2382 65 0.9998902    
horsebean - linseed    -58.550 -127.514   10.414     23.485 -2.4930 65 0.1413329    
horsebean - meatmeal  -116.709 -187.083  -46.335     23.966 -4.8698 65 0.0001062 ***
horsebean - soybean    -86.229 -152.915  -19.542     22.710 -3.7969 65 0.0042167 ** 
horsebean - sunflower -168.717 -237.680  -99.753     23.485 -7.1839 65 1.220e-08 ***
linseed - meatmeal     -58.159 -125.391    9.073     22.896 -2.5402 65 0.1276965    
linseed - soybean      -27.679  -91.041   35.684     21.578 -1.2827 65 0.7932853    
linseed - sunflower   -110.167 -175.921  -44.413     22.392 -4.9198 65 8.843e-05 ***
meatmeal - soybean      30.481  -34.414   95.375     22.100  1.3792 65 0.7391356    
meatmeal - sunflower   -52.008 -119.240   15.224     22.896 -2.2715 65 0.2206962    
soybean - sunflower    -82.488 -145.850  -19.126     21.578 -3.8228 65 0.0038845 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The Dunnett test uses the first level as the control when ref is not given, as SAS does:

PDIFF(weight ~ feed, chickwts, "feed", adj="dunnett")
                   Estimate Lower CL Upper CL Std. Error t value Df  Pr(>|t|)    
horsebean - casein -163.383 -223.959 -102.808     23.485 -6.9568 65 4.559e-09 ***
linseed - casein   -104.833 -162.590  -47.077     22.392 -4.6816 65 6.376e-05 ***
meatmeal - casein   -46.674 -105.728   12.380     22.896 -2.0386 65   0.16711    
soybean - casein    -77.155 -132.810  -21.500     21.578 -3.5756 65   0.00304 ** 
sunflower - casein    5.333  -52.423   63.090     22.392  0.2382 65   0.99945    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Expected mean squares and random effects

EMS produces the expected mean squares table, and RanTest performs the tests with a random factor as the RANDOM / TEST statement of SAS PROC GLM:

EMS(yield ~ block + N*P*K, npk)
      block  N  P  K N:P N:K P:K N:P:K
block     4  0  0  0   0   0   0     0
N         0 12  0  0   6   6   0     3
P         0  0 12  0   6   0   6     3
K         0  0  0 12   0   6   6     3
N:P       0  0  0  0   6   0   0     3
N:K       0  0  0  0   0   6   0     3
P:K       0  0  0  0   0   0   6     3
N:P:K     0  0  0  0   0   0   0     0
RanTest(yield ~ block + N*P*K, npk, Random="block")
$ANOVA
      Df  Sum Sq Mean Sq F value   Pr(>F)   
block  4 306.293  76.573  4.9592 0.013587 * 
N      1 189.282 189.282 12.2587 0.004372 **
P      1   8.402   8.402  0.5441 0.474904   
K      1  95.202  95.202  6.1657 0.028795 * 
N:P    1  21.282  21.282  1.3783 0.263165   
N:K    1  33.135  33.135  2.1460 0.168648   
P:K    1   0.482   0.482  0.0312 0.862752   
N:P:K  0                                    
Error 12 185.287  15.441                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$EMS
      block N P K N:P N:K P:K N:P:K eps
block     4                           1
N           Q       Q   Q         Q   1
P             Q     Q       Q     Q   1
K               Q       Q   Q     Q   1
N:P                 Q             Q   1
N:K                     Q         Q   1
P:K                         Q     Q   1
N:P:K                                 1

attr(,"Caution")
[1] "Test assumption: Q's in the corresponding EMS row are 0."

PROC REG

REG(mpg ~ wt + hp, mtcars)
$ANOVA
Response : mpg
                Df  Sum Sq Mean Sq F value    Pr(>F)    
MODEL            2  931.00  465.50  69.211 9.109e-12 ***
RESIDUALS       29  195.05    6.73                      
CORRECTED TOTAL 31 1126.05                              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE mpg Mean Coef Var  R-square  Adj R-sq    PRESS    R2pred
 2.593412 20.09062 12.90857 0.8267855 0.8148396 246.5063 0.7810871

$Coefficients
            Estimate Std. Error Df Lower CL Upper CL t value  Pr(>|t|)    
(Intercept)   37.227    1.59879 29   33.957   40.497 23.2847 < 2.2e-16 ***
wt            -3.878    0.63273 29   -5.172   -2.584 -6.1287  1.12e-06 ***
hp            -0.032    0.00903 29   -0.050   -0.013 -3.5187  0.001451 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Heteroscedasticity-consistent standard errors (HC0, HC3) and White’s test:

REG(mpg ~ wt, mtcars, HC=TRUE)
$ANOVA
Response : mpg
                Df  Sum Sq Mean Sq F value    Pr(>F)    
MODEL            1  847.73  847.73  91.375 1.294e-10 ***
RESIDUALS       30  278.32    9.28                      
CORRECTED TOTAL 31 1126.05                              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$Fitness
 Root MSE mpg Mean Coef Var  R-square  Adj R-sq    PRESS    R2pred
 3.045882 20.09062 15.16071 0.7528328 0.7445939 328.0228 0.7086954

$Coefficients
            Estimate Std. Error Df Lower CL Upper CL t value  Pr(>|t|)    
(Intercept)   37.285     1.8776 30   33.450   41.120  19.858 < 2.2e-16 ***
wt            -5.344     0.5591 30   -6.486   -4.203  -9.559 1.294e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$HC0
            Estimate Std. Error Df Lower CL Upper CL t value  Pr(>|t|)    
(Intercept)   37.285     2.1253 30   32.945   41.626 17.5436 < 2.2e-16 ***
wt            -5.344     0.6337 30   -6.639   -4.050 -8.4337 2.062e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$HC3
            Estimate Std. Error Df Lower CL Upper CL t value  Pr(>|t|)    
(Intercept)   37.285    2.42678 30   32.329   42.241 15.3640 8.882e-16 ***
wt            -5.344    0.73811 30   -6.852   -3.837 -7.2408 4.636e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`White Test`
    Chisq        Df         p 
1.0546002 2.0000000 0.5901963 

Weighted least squares follows the SAS WEIGHT statement: observations with nonpositive weights are excluded, and Resid=TRUE returns fitted values and residuals in the original scale as OUTPUT P= R= does.

w = mtcars$cyl
head(REG(mpg ~ wt + hp, mtcars, Weights=w, Resid=TRUE)$Fitted)
[1] 23.16925 22.25023 24.76409 21.02487 18.25006 20.29296

PROC TTEST

TTEST(mtcars$mpg[mtcars$am==0], mtcars$mpg[mtcars$am==1])
$`Statistics by group`
            Mean       SD  N       SEM      LCL      UCL
Test    17.14737 3.833966 19 0.8795722 15.29946 18.99528
Control 24.39231 6.166504 13 1.7102804 20.66593 28.11869

$`Two groups t-test for the difference of means`
Choose one row according to the variance equality assumption.
                     PE     SE     Df     LCL     UCL t value Pr(>|t|)    
If Equal Var.   -7.2449 1.7644 30.000 -10.848 -3.6415 -4.1061 0.000285 ***
If Unequal Var. -7.2449 1.9232 18.332 -11.280 -3.2097 -3.7671 0.001374 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$`Variance confidence intervals by group`
        Variance    VarLCL    VarUCL
Test    14.69930  8.392571  32.14622
Control 38.02577 19.553319 103.61743

$`F-test for the ratio of variances`
                PE     LCL    UCL Num Df Denom Df F value  Pr(>F)  
Var. Ratio 0.38656 0.12437 1.0703     18       12  0.3866 0.06691 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With summarized input (mean, SD, n) only:

tmtest(5.4, 2.2, 30, 4.8, 2.0, 28)

    Welch Two Sample t-test

data:  mean of the test (m1) and mean of the control (m0)
t = 1.0879, df = 55.965, p-value = 0.2813
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.504872  1.704872
sample estimates:
   Test Control 
    5.4     4.8 

PROC UNIVARIATE

UNIV(mtcars$mpg)
         nAll           nNA       nFinite          Mean      Variance            SD            CV 
  32.00000000    0.00000000   32.00000000   20.09062500   36.32410282    6.02694805   29.99880816 
          SEM       LowerCL       UpperCL   TrimmedMean           Min            Q1        Median 
   1.06542396   17.91767851   22.26357149   19.95333333   10.40000000   15.35000000   19.20000000 
           Q3           Max         Range           IQR           MAD         VarLL         VarUL 
  22.80000000   33.90000000   23.50000000    7.45000000    5.41149000   23.34652855   64.20343072 
     Skewness    SkewnessSE      Kurtosis    KurtosisSE GeometricMean   GeometricCV 
   0.67237714    0.41445735   -0.02200629    0.80937129   19.25006404   30.44976349 

The quartiles and the interquartile range use quantile type 2, the SAS default definition.

2x2 tables

Risk difference, relative risk, and odds ratio with score confidence intervals:

RD(7, 10, 3, 10)
   p1  p2  RD       SE        lower     upper
1 0.7 0.3 0.4 0.204939 -0.001673089 0.8016731
RR(7, 10, 3, 10)
   p1  p2       RR     SElog     lower    upper
1 0.7 0.3 2.333333 0.5255383 0.8329862 6.536056
OR(7, 10, 3, 10)
      odd1      odd2       OR     SElog     lower    upper
1 2.333333 0.4285714 5.444444 0.9759001 0.8040183 36.86729

See the vignette Stratified 2x2 Tables for the stratified Miettinen-Nurminen methods and meta-analysis.

Notes