Acknowledgement: the materials below are partially based on Montgomery, D. C., Peck, E. A., Vining, G. G., Introduction to Linear Regression Analysis (5th Edition), Wiley Series in Probability and Statistics, 2012.
Let us start with a data set. Suppose we want to model and predict patients' total body fat percentage using body fat measurements taken on the triceps, thigh, and mid-arm.
y: total body fat percentages
x1: triceps body fat percentages
x2: thigh body fat percentages
x3: mid-arm body fat percentages
BF <- read.csv("data_BodyFat.csv",h=T)
dim(BF)[1]
## [1] 20
names(BF)
## [1] "x1" "x2" "x3" "y"
head(BF)
## x1 x2 x3 y
## 1 19.5 43.1 29.1 11.9
## 2 24.7 49.8 28.2 22.8
## 3 30.7 51.9 37.0 18.7
## 4 29.8 54.3 31.1 20.1
## 5 19.1 42.2 30.9 12.9
## 6 25.6 53.9 23.7 21.7
pairs(BF,pch=20)
cor(BF)
## x1 x2 x3 y
## x1 1.0000000 0.9238425 0.4577772 0.8432654
## x2 0.9238425 1.0000000 0.0846675 0.8780896
## x3 0.4577772 0.0846675 1.0000000 0.1424440
## y 0.8432654 0.8780896 0.1424440 1.0000000
model1=lm(y~x1+x2+x3,data=BF)
summary(model1)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = BF)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7263 -1.6111 0.3923 1.4656 4.1277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 117.085 99.782 1.173 0.258
## x1 4.334 3.016 1.437 0.170
## x2 -2.857 2.582 -1.106 0.285
## x3 -2.186 1.595 -1.370 0.190
##
## Residual standard error: 2.48 on 16 degrees of freedom
## Multiple R-squared: 0.8014, Adjusted R-squared: 0.7641
## F-statistic: 21.52 on 3 and 16 DF, p-value: 7.343e-06
The triceps (x1) and thigh (x2) measurements are highly correlated (r = 0.92), whereas the mid-arm measurement (x3) is much more weakly correlated with the other two. Note also that the overall F-test is highly significant (p-value 7.3e-06) even though none of the individual coefficients is significant at the 5% level.
A serious problem that can dramatically reduce the usefulness of a regression model is multicollinearity, that is, near-linear dependence among the regressors. Its presence can severely impair our ability to estimate individual regression coefficients and can compromise other uses of the model.
Issues with Multicollinearity
Adding or deleting a predictor variable can change the remaining regression coefficients substantially (see the sketch after this list)
Standard errors of the regression coefficients become large when the predictor variables are highly correlated
(IMPORTANT: when the regressors are nearly linearly dependent, the fitted plane has a large amount of "wiggle room" in three or more dimensions, which is what the inflated standard errors reflect)
The estimated regression coefficients may be individually non-significant even though a strong statistical relationship exists between the set of predictor variables and the response
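As a quick illustration of the first point, the sketch below (using the BF data already loaded above) refits the body fat model with one, two, and all three predictors; comparing the estimate for x1 across the three fits shows how much a coefficient can move when correlated predictors enter the model.
# Sketch: watch how the coefficient of x1 changes as correlated predictors are added.
coef(lm(y ~ x1, data = BF))
coef(lm(y ~ x1 + x2, data = BF))
coef(lm(y ~ x1 + x2 + x3, data = BF))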
Typical symptoms of multicollinearity:
Large changes in the estimated regression coefficients when a predictor variable is added or deleted
Nonsignificant results in individual tests of regression coefficients for important predictors
Regression coefficients with a sign that is opposite of that expected
Large coefficients of simple correlation between pairs of predictor variables
Wide confidence intervals for the regression coefficients of important predictors (checked below for the body fat model)
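The last symptom can be checked directly for the body fat model fitted above; the sketch below simply prints the default 95% confidence intervals for the coefficients of model1.
# Sketch: 95% confidence intervals for the coefficients of the body fat model.
confint(model1)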
A formal diagnostic is the variance inflation factor (VIF):
\[ VIF_j=\frac{1}{1-R^2_j} \]
where \(R^2_j\) is the coefficient of determination obtained by regressing the \(j\)-th covariate, \(x_j\), on the other covariates.
If \(x_j\) is highly correlated with any other regressor variables, then \(R^2_j\) will be large.
The square root of the variance inflation factor indicates how much larger the standard error of a coefficient is compared with what it would be if that variable were uncorrelated with the other predictor variables in the model. For example, if VIF = 10, then \(\sqrt{VIF} \approx 3.2\), so the confidence interval is about 3.2 times wider than it would be if the regressors were orthogonal.
The largest VIF is commonly used as an indicator of the overall severity of multicollinearity
Guideline: VIFs greater than 5 to 10 are considered serious (a direct computation for the body fat model is sketched below)
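The sketch below computes the VIFs for the body fat model directly from the definition above, regressing each covariate on the other two; it should reproduce the values returned by the vif() function used later in this section.
# Sketch: VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other covariates.
r2_x1 <- summary(lm(x1 ~ x2 + x3, data = BF))$r.squared
r2_x2 <- summary(lm(x2 ~ x1 + x3, data = BF))$r.squared
r2_x3 <- summary(lm(x3 ~ x1 + x2, data = BF))$r.squared
1 / (1 - c(x1 = r2_x1, x2 = r2_x2, x3 = r2_x3))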
Methods for Dealing with Multicollinearity
Collect more data
Specify the model differently
Use Ridge regression
Ridge regression modifies the method of least squares to produce biased estimators of the regression coefficients, trading a small amount of bias for a substantial reduction in variance.
Traditional least-squares estimation: \[ \hat{\beta} = \arg\min \sum_{i=1}^{n} (y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij})^2 \]
Ridge regression estimation: \[ \hat{\beta}_\lambda = \arg\min \sum_{i=1}^{n} (y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij})^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \]
The ridge trace plots the coefficient estimates against \(\lambda\)
It shows the impact of multicollinearity on the stability of the coefficients
Choose \(\lambda\) such that \(\hat{\beta}_\lambda\) is stable and the MSE is acceptable
Ridge regression is a good alternative if the model user wants to keep all regressors in the model (a closed-form sketch is given below, before the glmnet implementation)
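For centered and scaled regressors, the ridge estimator also has the closed form \(\hat{\beta}_\lambda=(X^TX+\lambda I)^{-1}X^Ty\). The sketch below applies this formula to the body fat data with an arbitrary \(\lambda = 1\); note that glmnet, used next, scales and parameterizes the penalty differently, so its numbers will not match this output exactly.
# Sketch: closed-form ridge estimate (X'X + lambda*I)^{-1} X'y on standardized predictors.
Z  <- scale(as.matrix(BF[, c("x1", "x2", "x3")]))  # standardize the regressors
yc <- BF$y - mean(BF$y)                            # center the response
lambda <- 1                                        # arbitrary illustrative value
solve(t(Z) %*% Z + lambda * diag(ncol(Z)), t(Z) %*% yc)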
#install.packages("glmnet")
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 3.0-1
library(MASS)
# Generalized variance inflation factors (GVIF); this reproduces the vif() function from the car package.
vif <- function(mod, ...) {
  if (any(is.na(coef(mod))))
    stop("there are aliased coefficients in the model")
  v <- vcov(mod)                               # estimated covariance matrix of the coefficients
  assign <- attr(model.matrix(mod), "assign")  # maps design-matrix columns to model terms
  if (names(coefficients(mod)[1]) == "(Intercept)") {
    v <- v[-1, -1]                             # drop the intercept
    assign <- assign[-1]
  }
  else warning("No intercept: vifs may not be sensible.")
  terms <- labels(terms(mod))
  n.terms <- length(terms)
  if (n.terms < 2) stop("model contains fewer than 2 terms")
  R <- cov2cor(v)                              # correlation matrix of the coefficient estimates
  detR <- det(R)
  result <- matrix(0, n.terms, 3)
  rownames(result) <- terms
  colnames(result) <- c("GVIF", "Df", "GVIF^(1/(2*Df))")
  for (term in 1:n.terms) {
    subs <- which(assign == term)
    result[term, 1] <- det(as.matrix(R[subs, subs])) *
      det(as.matrix(R[-subs, -subs])) / detR   # generalized VIF for this term
    result[term, 2] <- length(subs)            # degrees of freedom of the term
  }
  if (all(result[, 2] == 1)) result <- result[, 1]  # ordinary VIFs when every term has 1 df
  else result[, 3] <- result[, 1]^(1/(2 * result[, 2]))
  result
}
vif(model1)
## x1 x2 x3
## 708.8429 564.3434 104.6060
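All three VIFs are far above the 5 to 10 guideline, confirming severe multicollinearity. As a cross-check, the same values can be read off the diagonal of the inverse of the predictors' correlation matrix (a standard identity for VIFs); a minimal sketch:
# Sketch: VIFs as the diagonal of the inverse correlation matrix of the predictors.
diag(solve(cor(BF[, c("x1", "x2", "x3")])))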
library(glmnet)
rreg=glmnet(x=as.matrix(BF[,-4]), y=BF[,4],alpha=0,lambda = seq(0,0.5,0.01))
rbind(rreg$lambda,rreg$beta)
## 4 x 51 sparse Matrix of class "dgCMatrix"
## [[ suppressing 51 column names 's0', 's1', 's2' ... ]]
##
## 0.5000000 0.4900000 0.4800000 0.4700000 0.4600000 0.4500000
## x1 0.4312373 0.4316196 0.4322645 0.4330645 0.4339617 0.4349253
## x2 0.4370659 0.4375796 0.4378913 0.4380835 0.4382005 0.4382662
## x3 -0.1141151 -0.1146001 -0.1152155 -0.1159092 -0.1166533 -0.1174327
##
## 0.4400000 0.4300000 0.4200000 0.4100000 0.4000000 0.3900000
## x1 0.4359388 0.4369938 0.4380862 0.4392147 0.4403794 0.4415814
## x2 0.4382931 0.4382877 0.4382527 0.4381892 0.4380969 0.4379748
## x3 -0.1182396 -0.1190699 -0.1199218 -0.1207947 -0.1216887 -0.1226044
##
## 0.3800000 0.3700000 0.3600000 0.3500000 0.3400000 0.3300000
## x1 0.4428224 0.4441045 0.4454302 0.4468025 0.4482245 0.4496999
## x2 0.4378214 0.4376348 0.4374127 0.4371527 0.4368520 0.4365074
## x3 -0.1235429 -0.1245053 -0.1254929 -0.1265073 -0.1275503 -0.1286238
##
## 0.3200000 0.3100000 0.3000000 0.2900000 0.2800000 0.2700000
## x1 0.4512324 0.4528265 0.4547170 0.4566152 0.4585564 0.4605643
## x2 0.4361157 0.4356731 0.4349902 0.4342984 0.4335692 0.4327830
## x3 -0.1297298 -0.1308707 -0.1321646 -0.1334671 -0.1347957 -0.1361629
##
## 0.2600000 0.2500000 0.2400000 0.2300000 0.2200000 0.2100000
## x1 0.4626572 0.4651257 0.4676284 0.4704937 0.4733934 0.4766835
## x2 0.4319249 0.4307581 0.4295587 0.4280588 0.4265246 0.4246646
## x3 -0.1375778 -0.1391880 -0.1408211 -0.1426441 -0.1444910 -0.1465431
##
## 0.2000000 0.1900000 0.1800000 0.1700000 0.1600000 0.1500000
## x1 0.4800450 0.4838628 0.4881014 0.4927671 0.4979024 0.5035803
## x2 0.4227384 0.4204291 0.4177636 0.4147346 0.4113047 0.4074109
## x3 -0.1486386 -0.1509747 -0.1535340 -0.1563209 -0.1593584 -0.1626850
##
## 0.1400000 0.1300000 0.1200000 0.1100000 0.1000000 0.0900000
## x1 0.5099019 0.5172989 0.5258012 0.5358791 0.5476771 0.5616054
## x2 0.4029663 0.3976080 0.3913056 0.3836600 0.3745417 0.3635975
## x3 -0.1663544 -0.1705907 -0.1754120 -0.1810646 -0.1876272 -0.1953170
##
## 0.0800000 0.0700000 0.0600000 0.0500000 0.0400000 0.0300000
## x1 0.5792731 0.6012218 0.6302994 0.6697241 0.7271691 0.8186159
## x2 0.3494620 0.3316591 0.3077599 0.2750022 0.2268213 0.1495351
## x3 -0.2049759 -0.2168954 -0.2325715 -0.2537031 -0.2843309 -0.3328752
##
## 0.020000000 0.0100000 .
## x1 0.987221926 1.4028661 4.113612
## x2 0.006197198 -0.3486234 -2.668242
## x3 -0.422073583 -0.6414275 -2.069967
matplot(rreg$lambda,t(rreg$beta),type="l",ylab="coefficient",xlab="lambda")
abline(h=0)
#select(rreg)
#rreg
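One data-driven way to choose \(\lambda\), rather than eyeballing the ridge trace, is cross-validation. The sketch below uses cv.glmnet from the glmnet package already loaded, with its default 10-fold split; with only 20 observations the chosen value varies noticeably across random seeds, so treat it as a rough guide.
# Sketch: pick lambda by 10-fold cross-validation (results depend on the random folds).
set.seed(1)
cvfit <- cv.glmnet(x = as.matrix(BF[, -4]), y = BF[, 4], alpha = 0)
cvfit$lambda.min               # lambda with the smallest cross-validated MSE
coef(cvfit, s = "lambda.min")  # coefficients at that lambda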
#Webster data
#install.packages("car")
#library(car)
W <- read.csv("data_Webster.csv",h=T)
pairs(W,pch=20)
n=dim(W)[1]
# Same pairs plot with a small uniform jitter added to four predictor columns
# (x1-x4, assuming the column order y, x1, ..., x6) so that overlapping points are visible.
pairs(W+cbind(0,matrix(1.5*runif(n*4),n,4),0,0),pch=20)
model1=lm(y~x1+x2+x3+x4+x5+x6,data=W)
summary(model1)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6, data = W)
##
## Residuals:
## 1 2 3 4 5 6
## -3.698e-15 -1.545e+00 1.545e+00 7.649e-01 -2.517e-01 -5.132e-01
## 7 8 9 10 11 12
## -6.030e-01 4.730e-01 1.300e-01 1.510e-01 -2.371e-01 8.610e-02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.6599 14.0465 1.186 0.288885
## x1 -0.5313 1.3418 -0.396 0.708482
## x2 -0.8385 1.4206 -0.590 0.580722
## x3 -0.7753 1.4094 -0.550 0.605914
## x4 -0.8440 1.4031 -0.601 0.573745
## x5 1.0232 0.3909 2.617 0.047247 *
## x6 5.0470 0.7277 6.936 0.000956 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.129 on 5 degrees of freedom
## Multiple R-squared: 0.9457, Adjusted R-squared: 0.8806
## F-statistic: 14.52 on 6 and 5 DF, p-value: 0.004993
rreg <- glmnet(as.matrix(W[,-1]),W[,1],alpha=0,lambda=seq(0,0.2,.01))
rbind(rreg$lambda,rreg$beta)
## 7 x 21 sparse Matrix of class "dgCMatrix"
## [[ suppressing 21 column names 's0', 's1', 's2' ... ]]
##
## 0.200000000 0.19000000 0.18000000 0.17000000 0.16000000
## x1 0.171841980 0.17147381 0.17141456 0.17135186 0.17109254
## x2 -0.082476146 -0.08346788 -0.08412549 -0.08478592 -0.08565685
## x3 -0.009277846 -0.01080200 -0.01205498 -0.01334375 -0.01485757
## x4 -0.097257945 -0.09808449 -0.09862875 -0.09918995 -0.09995534
## x5 1.001994679 1.00454668 1.00701669 1.00945447 1.01188741
## x6 4.689062458 4.70627701 4.72374209 4.74140345 4.75922489
##
## 0.15000000 0.14000000 0.13000000 0.12000000 0.11000000 0.10000000
## x1 0.17083233 0.17053514 0.17017082 0.16971425 0.16891288 0.16755448
## x2 -0.08652962 -0.08744168 -0.08842518 -0.08950679 -0.09095557 -0.09299885
## x3 -0.01638751 -0.01798007 -0.01966612 -0.02147204 -0.02365695 -0.02642894
## x4 -0.10072965 -0.10155258 -0.10245452 -0.10346147 -0.10482746 -0.10676485
## x5 1.01431184 1.01671515 1.01909850 1.02146117 1.02381419 1.02617989
## x6 4.77719648 4.79534417 4.81366687 4.83216556 4.85082235 4.86959199
##
## 0.09000000 0.08000000 0.07000000 0.06000000 0.05000000 0.04000000
## x1 0.16614853 0.16384486 0.16116009 0.15711833 0.15157243 0.14295745
## x2 -0.09509029 -0.09813670 -0.10158390 -0.10647031 -0.11294738 -0.12267351
## x3 -0.02928527 -0.03308809 -0.03732844 -0.04300549 -0.05029319 -0.06082395
## x4 -0.10877334 -0.11170891 -0.11506473 -0.11983615 -0.12619561 -0.13576981
## x5 1.02849637 1.03081248 1.03306319 1.03527721 1.03740169 1.03940733
## x6 4.88857033 4.90765970 4.92696009 4.94637967 4.96596429 4.98563943
##
## 0.03000000 0.0200000 0.01000000 .
## x1 0.12892511 0.1021866 0.03485326 -0.4249941
## x2 -0.13813117 -0.1670323 -0.23888023 -0.7259783
## x3 -0.07707419 -0.1066999 -0.17901478 -0.6638667
## x4 -0.15102595 -0.1795933 -0.25068390 -0.7328694
## x5 1.04118335 1.0424608 1.04216009 1.0267822
## x6 5.00535961 5.0249027 5.04346264 5.0507129
matplot(rreg$lambda,t(rreg$beta),type="l",ylab="coefficient",xlab="lambda")
abline(h=0)
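Once a value of \(\lambda\) has been picked from the trace, the corresponding coefficient vector can be extracted from the fitted glmnet object; the value 0.05 below is purely illustrative.
# Sketch: coefficients of the Webster-data ridge fit at an illustrative lambda = 0.05.
coef(rreg, s = 0.05)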
Comments on Ridge Regression
\(R^2\) can be defined analogously to its least-squares counterpart (see the sketch after this list)
Ridge estimates are more stable than least-squares estimates and are little affected by small changes in the data
Ridge estimates can provide good estimates of mean responses for levels of the predictor variables outside the region of the observations
Standard inferential procedures are not applicable since the exact distributions of the ridge estimators are not known
Selection of \(\lambda\) is judgmental; cross-validation (as sketched earlier with cv.glmnet) offers one data-driven alternative.
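To illustrate the first comment, an \(R^2\) analogue for the ridge fit can be computed as \(1-\mathrm{SSE}/\mathrm{SST}\) using the glmnet predictions; the sketch below does this for the Webster data at the same illustrative \(\lambda = 0.05\) used above.
# Sketch: R^2 analogue for the ridge fit, 1 - SSE/SST, at an illustrative lambda = 0.05.
yhat <- predict(rreg, newx = as.matrix(W[, -1]), s = 0.05)
1 - sum((W$y - yhat)^2) / sum((W$y - mean(W$y))^2)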