Acknowledgement: the materials below are partially based on Montgomery, D. C., Peck, E. A., Vining, G. G., Introduction to Linear Regression Analysis (5th Edition), Wiley Series in Probability and Statistics, 2012.

Example

Let us start with a data set. Suppose we want to model and predict patients' total body fat percentage from body fat measurements taken on the triceps, thigh, and mid-arm.

y: total body fat percentages

x1: triceps body fat percentages

x2: thigh body fat percentages

x3: mid-arm body fat percentages

BF <- read.csv("data_BodyFat.csv", header = TRUE)  # body fat data for 20 patients
dim(BF)[1]
## [1] 20
names(BF)
## [1] "x1" "x2" "x3" "y"
head(BF)
##     x1   x2   x3    y
## 1 19.5 43.1 29.1 11.9
## 2 24.7 49.8 28.2 22.8
## 3 30.7 51.9 37.0 18.7
## 4 29.8 54.3 31.1 20.1
## 5 19.1 42.2 30.9 12.9
## 6 25.6 53.9 23.7 21.7
pairs(BF,pch=20)

cor(BF)
##           x1        x2        x3         y
## x1 1.0000000 0.9238425 0.4577772 0.8432654
## x2 0.9238425 1.0000000 0.0846675 0.8780896
## x3 0.4577772 0.0846675 1.0000000 0.1424440
## y  0.8432654 0.8780896 0.1424440 1.0000000
model1 <- lm(y ~ x1 + x2 + x3, data = BF)
summary(model1)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = BF)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7263 -1.6111  0.3923  1.4656  4.1277 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  117.085     99.782   1.173    0.258
## x1             4.334      3.016   1.437    0.170
## x2            -2.857      2.582  -1.106    0.285
## x3            -2.186      1.595  -1.370    0.190
## 
## Residual standard error: 2.48 on 16 degrees of freedom
## Multiple R-squared:  0.8014, Adjusted R-squared:  0.7641 
## F-statistic: 21.52 on 3 and 16 DF,  p-value: 7.343e-06

The triceps (x1) and thigh (x2) measurements are highly correlated (r = 0.92), while the mid-arm measurement (x3) is only weakly related to the rest. Note also a classic symptom of multicollinearity in the fit above: the overall F-test is highly significant (p = 7.3e-06) and R-squared is about 0.80, yet none of the individual coefficients is significant at the 5% level.

Multicollinearity

Issues with Multicollinearity


Informal Diagnostics

Variance Inflation Factors

\[ VIF_j=\frac{1}{1-R^2_j}, \] where \(R^2_j\) is the coefficient of determination from regressing \(x_j\) on all of the other regressors. A common rule of thumb is that \(VIF_j > 10\) indicates serious multicollinearity.
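
As a check on this formula, \(VIF_1\) can be computed by hand from the auxiliary regression of x1 on the other regressors; a minimal sketch using the BF data loaded above:

R2_1 <- summary(lm(x1 ~ x2 + x3, data = BF))$r.squared  # R^2 of the auxiliary regression
1 / (1 - R2_1)  # reproduces the VIF for x1 reported below (about 709)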

Dealing with Multicollinearity

Methods for Dealing with Multicollinearity

Ridge Regression

Ridge regression modifies the method of least squares to produce biased estimators of the regression coefficients, trading a small increase in bias for a substantial reduction in variance.

Traditional least squares estimation: \[ \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\Big)^2 \]

Ridge regression estimation: \[ \hat{\beta}_\lambda = \arg\min_{\beta} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \]
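
For centered and scaled regressors, with the intercept left unpenalized, the ridge criterion has a closed-form solution that makes the shrinkage explicit:

\[ \hat{\beta}_\lambda = (X^\top X + \lambda I)^{-1} X^\top y \]

As \(\lambda \to 0\) this approaches the least squares estimator; as \(\lambda \to \infty\) the coefficients shrink toward zero.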

The ridge trace plots the coefficient estimates against \(\lambda\)

It shows the impact of multicollinearity on the stability of the coefficients

Choose \(\lambda\) such that \(\hat{\beta}_\lambda\) is stable and the MSE is acceptable

Ridge regression is a good alternative when the analyst wants to keep all regressors in the model

#install.packages("glmnet")
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 3.0-1
library(MASS)
# Generalized variance inflation factors (this helper follows the car package's vif)
vif <- function(mod, ...) {
    if (any(is.na(coef(mod)))) 
        stop ("there are aliased coefficients in the model")
    v <- vcov(mod)
    assign <- attr(model.matrix(mod), "assign")
    if (names(coefficients(mod)[1]) == "(Intercept)") {
        v <- v[-1, -1]
        assign <- assign[-1]
    }
    else warning("No intercept: vifs may not be sensible.")
    terms <- labels(terms(mod))
    n.terms <- length(terms)
    if (n.terms < 2) stop("model contains fewer than 2 terms")
    R <- cov2cor(v)
    detR <- det(R)
    result <- matrix(0, n.terms, 3)
    rownames(result) <- terms
    colnames(result) <- c("GVIF", "Df", "GVIF^(1/(2*Df))")
    for (term in 1:n.terms) {
        subs <- which(assign == term)
        result[term, 1] <- det(as.matrix(R[subs, subs])) *
            det(as.matrix(R[-subs, -subs])) / detR
        result[term, 2] <- length(subs)
    }
    if (all(result[, 2] == 1)) result <- result[, 1]
    else result[, 3] <- result[, 1]^(1/(2 * result[, 2]))
    result
}
vif(model1)
##       x1       x2       x3 
## 708.8429 564.3434 104.6060
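
All three VIFs are far above the usual cutoff of 10, confirming severe multicollinearity among the regressors.
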
# alpha = 0 selects the ridge penalty; fit over a grid of lambda values
rreg <- glmnet(x = as.matrix(BF[, -4]), y = BF[, 4], alpha = 0, lambda = seq(0, 0.5, 0.01))
rbind(rreg$lambda, rreg$beta)  # lambda values with the corresponding coefficients
## 4 x 51 sparse Matrix of class "dgCMatrix"
##    [[ suppressing 51 column names 's0', 's1', 's2' ... ]]
##                                                                     
##     0.5000000  0.4900000  0.4800000  0.4700000  0.4600000  0.4500000
## x1  0.4312373  0.4316196  0.4322645  0.4330645  0.4339617  0.4349253
## x2  0.4370659  0.4375796  0.4378913  0.4380835  0.4382005  0.4382662
## x3 -0.1141151 -0.1146001 -0.1152155 -0.1159092 -0.1166533 -0.1174327
##                                                                     
##     0.4400000  0.4300000  0.4200000  0.4100000  0.4000000  0.3900000
## x1  0.4359388  0.4369938  0.4380862  0.4392147  0.4403794  0.4415814
## x2  0.4382931  0.4382877  0.4382527  0.4381892  0.4380969  0.4379748
## x3 -0.1182396 -0.1190699 -0.1199218 -0.1207947 -0.1216887 -0.1226044
##                                                                     
##     0.3800000  0.3700000  0.3600000  0.3500000  0.3400000  0.3300000
## x1  0.4428224  0.4441045  0.4454302  0.4468025  0.4482245  0.4496999
## x2  0.4378214  0.4376348  0.4374127  0.4371527  0.4368520  0.4365074
## x3 -0.1235429 -0.1245053 -0.1254929 -0.1265073 -0.1275503 -0.1286238
##                                                                     
##     0.3200000  0.3100000  0.3000000  0.2900000  0.2800000  0.2700000
## x1  0.4512324  0.4528265  0.4547170  0.4566152  0.4585564  0.4605643
## x2  0.4361157  0.4356731  0.4349902  0.4342984  0.4335692  0.4327830
## x3 -0.1297298 -0.1308707 -0.1321646 -0.1334671 -0.1347957 -0.1361629
##                                                                     
##     0.2600000  0.2500000  0.2400000  0.2300000  0.2200000  0.2100000
## x1  0.4626572  0.4651257  0.4676284  0.4704937  0.4733934  0.4766835
## x2  0.4319249  0.4307581  0.4295587  0.4280588  0.4265246  0.4246646
## x3 -0.1375778 -0.1391880 -0.1408211 -0.1426441 -0.1444910 -0.1465431
##                                                                     
##     0.2000000  0.1900000  0.1800000  0.1700000  0.1600000  0.1500000
## x1  0.4800450  0.4838628  0.4881014  0.4927671  0.4979024  0.5035803
## x2  0.4227384  0.4204291  0.4177636  0.4147346  0.4113047  0.4074109
## x3 -0.1486386 -0.1509747 -0.1535340 -0.1563209 -0.1593584 -0.1626850
##                                                                     
##     0.1400000  0.1300000  0.1200000  0.1100000  0.1000000  0.0900000
## x1  0.5099019  0.5172989  0.5258012  0.5358791  0.5476771  0.5616054
## x2  0.4029663  0.3976080  0.3913056  0.3836600  0.3745417  0.3635975
## x3 -0.1663544 -0.1705907 -0.1754120 -0.1810646 -0.1876272 -0.1953170
##                                                                     
##     0.0800000  0.0700000  0.0600000  0.0500000  0.0400000  0.0300000
## x1  0.5792731  0.6012218  0.6302994  0.6697241  0.7271691  0.8186159
## x2  0.3494620  0.3316591  0.3077599  0.2750022  0.2268213  0.1495351
## x3 -0.2049759 -0.2168954 -0.2325715 -0.2537031 -0.2843309 -0.3328752
##                                     
##     0.020000000  0.0100000  .       
## x1  0.987221926  1.4028661  4.113612
## x2  0.006197198 -0.3486234 -2.668242
## x3 -0.422073583 -0.6414275 -2.069967
# Ridge trace: coefficient paths as a function of lambda
matplot(rreg$lambda, t(rreg$beta), type = "l", ylab = "coefficient", xlab = "lambda")
abline(h = 0)
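
Note how unstable the coefficients are near \(\lambda = 0\): x1 drops from about 4.11 at \(\lambda = 0\) to about 0.99 at \(\lambda = 0.02\), and x2 changes sign, while all three paths flatten out as \(\lambda\) grows.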

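MASS offers an alternative ridge implementation, lm.ridge(), whose select() method reports data-driven choices of \(\lambda\) (the HKB and L-W estimators and the GCV minimizer); a minimal sketch on the same data, assuming MASS is loaded as above:

rr <- lm.ridge(y ~ x1 + x2 + x3, data = BF, lambda = seq(0, 0.5, 0.01))
select(rr)  # prints the HKB, L-W, and smallest-GCV choices of lambda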

Comments on Ridge Regression

\(R^2\) can be defined analogously to the least squares case

Ridge estimates are more stable than least squares estimates and are little affected by small changes in the data

Ridge estimates can provide good estimates of mean responses at levels of the predictor variables outside the region of the observations

Standard inferential procedures (t-tests, confidence intervals) are not applicable, since the exact distributions of the ridge estimators are not known

The selection of \(\lambda\) is largely judgmental.
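
Cross-validation offers a more automatic, though not definitive, way to choose \(\lambda\). A minimal sketch with glmnet's cv.glmnet on the body fat data; the selected value depends on the random fold assignment and need not match what a ridge-trace inspection would suggest:

set.seed(1)  # cross-validation folds are random
cvfit <- cv.glmnet(x = as.matrix(BF[, -4]), y = BF[, 4], alpha = 0, nfolds = 5)  # 5 folds, since n is only 20
cvfit$lambda.min               # lambda minimizing the cross-validated error
coef(cvfit, s = "lambda.min")  # ridge coefficients at that lambda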

Another Example

#Webster data
#install.packages("car")
#library(car)
W <- read.csv("data_Webster.csv", header = TRUE)
pairs(W,pch=20)

n <- dim(W)[1]
# Jitter x1-x4 (columns 2-5) slightly so nearly coincident points separate in the scatterplot matrix
pairs(W + cbind(0, matrix(1.5 * runif(n * 4), n, 4), 0, 0), pch = 20)

model1 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = W)
summary(model1)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6, data = W)
## 
## Residuals:
##          1          2          3          4          5          6 
## -3.698e-15 -1.545e+00  1.545e+00  7.649e-01 -2.517e-01 -5.132e-01 
##          7          8          9         10         11         12 
## -6.030e-01  4.730e-01  1.300e-01  1.510e-01 -2.371e-01  8.610e-02 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  16.6599    14.0465   1.186 0.288885    
## x1           -0.5313     1.3418  -0.396 0.708482    
## x2           -0.8385     1.4206  -0.590 0.580722    
## x3           -0.7753     1.4094  -0.550 0.605914    
## x4           -0.8440     1.4031  -0.601 0.573745    
## x5            1.0232     0.3909   2.617 0.047247 *  
## x6            5.0470     0.7277   6.936 0.000956 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.129 on 5 degrees of freedom
## Multiple R-squared:  0.9457, Adjusted R-squared:  0.8806 
## F-statistic: 14.52 on 6 and 5 DF,  p-value: 0.004993
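
The coefficients of x1 through x4 all have large standard errors and are individually insignificant despite a strong overall fit (R-squared of about 0.95), again suggesting collinearity among those regressors; the vif() helper defined earlier can quantify this (output omitted here):

vif(model1)  # generalized VIFs for the Webster fit
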
# Ridge path for the Webster data; alpha = 0 again selects the ridge penalty
rreg <- glmnet(as.matrix(W[, -1]), W[, 1], alpha = 0, lambda = seq(0, 0.2, 0.01))
rbind(rreg$lambda, rreg$beta)
## 7 x 21 sparse Matrix of class "dgCMatrix"
##    [[ suppressing 21 column names 's0', 's1', 's2' ... ]]
##                                                                
##     0.200000000  0.19000000  0.18000000  0.17000000  0.16000000
## x1  0.171841980  0.17147381  0.17141456  0.17135186  0.17109254
## x2 -0.082476146 -0.08346788 -0.08412549 -0.08478592 -0.08565685
## x3 -0.009277846 -0.01080200 -0.01205498 -0.01334375 -0.01485757
## x4 -0.097257945 -0.09808449 -0.09862875 -0.09918995 -0.09995534
## x5  1.001994679  1.00454668  1.00701669  1.00945447  1.01188741
## x6  4.689062458  4.70627701  4.72374209  4.74140345  4.75922489
##                                                                           
##     0.15000000  0.14000000  0.13000000  0.12000000  0.11000000  0.10000000
## x1  0.17083233  0.17053514  0.17017082  0.16971425  0.16891288  0.16755448
## x2 -0.08652962 -0.08744168 -0.08842518 -0.08950679 -0.09095557 -0.09299885
## x3 -0.01638751 -0.01798007 -0.01966612 -0.02147204 -0.02365695 -0.02642894
## x4 -0.10072965 -0.10155258 -0.10245452 -0.10346147 -0.10482746 -0.10676485
## x5  1.01431184  1.01671515  1.01909850  1.02146117  1.02381419  1.02617989
## x6  4.77719648  4.79534417  4.81366687  4.83216556  4.85082235  4.86959199
##                                                                           
##     0.09000000  0.08000000  0.07000000  0.06000000  0.05000000  0.04000000
## x1  0.16614853  0.16384486  0.16116009  0.15711833  0.15157243  0.14295745
## x2 -0.09509029 -0.09813670 -0.10158390 -0.10647031 -0.11294738 -0.12267351
## x3 -0.02928527 -0.03308809 -0.03732844 -0.04300549 -0.05029319 -0.06082395
## x4 -0.10877334 -0.11170891 -0.11506473 -0.11983615 -0.12619561 -0.13576981
## x5  1.02849637  1.03081248  1.03306319  1.03527721  1.03740169  1.03940733
## x6  4.88857033  4.90765970  4.92696009  4.94637967  4.96596429  4.98563943
##                                                 
##     0.03000000  0.0200000  0.01000000  .        
## x1  0.12892511  0.1021866  0.03485326 -0.4249941
## x2 -0.13813117 -0.1670323 -0.23888023 -0.7259783
## x3 -0.07707419 -0.1066999 -0.17901478 -0.6638667
## x4 -0.15102595 -0.1795933 -0.25068390 -0.7328694
## x5  1.04118335  1.0424608  1.04216009  1.0267822
## x6  5.00535961  5.0249027  5.04346264  5.0507129
# Ridge trace for the Webster fit
matplot(rreg$lambda, t(rreg$beta), type = "l", ylab = "coefficient", xlab = "lambda")
abline(h = 0)
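
As in the body fat example, the coefficients of x1 through x4 are unstable near \(\lambda = 0\): x1 even changes sign (about -0.42 at \(\lambda = 0\) versus 0.03 at \(\lambda = 0.01\)), while x5 and x6 are hardly affected by the penalty.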