library(AER)
library(dynlm)
library(tidyverse)
library(fpp)
library(fpp2)
library(forecast)
library(readxl)
library(stargazer)
library(scales)
library(quantmod)
library(urca)
library(vars)
library(tseries)
When a time series \(Z_t\) has a unit autoregressive root, \(Z_t\) is integrated of order one. This is often denoted by \(Z_t \sim I(1)\). We simply say that \(Z_t\) is \(I(1)\). If \(Z_t\) is \(I(1)\), its first difference \(\nabla Z_t\) is stationary.
\(Z_t\) is \(I(2)\) when \(Z_t\) needs to be differenced twice in order to obtain a stationary series. Using the notation introduced here, if \(Z_t\) is \(I(2)\), its first difference \(\nabla Z_t\) is \(I(1)\) and its second difference \(\nabla^2 Z_t\) is stationary. \(Z_t\) is \(I(d)\) when \(Z_t\) must be differenced \(d\) times to obtain a stationary series.
When \(Z_t\) is stationary, it is integrated of order 0 so \(Z_t\) is \(I(0)\).
When \(Z_t\) and \(Y_t\) are both \(I(1)\) and there is a \(\theta\) such that \(Y_t-\theta Z_t\) is \(I(0)\), \(Z_t\) and \(Y_t\) are cointegrated. Put differently, cointegration of \(Z_t\) and \(Y_t\) means that the two series share a common stochastic trend, and that this trend can be eliminated by taking a specific linear combination of the series such that the resulting series is stationary.
R functions for cointegration analysis are implemented in the package urca.
It is fairly easy to obtain differences of time series in R. For example, the function diff() returns suitably lagged and iterated differences of numeric vectors, matrices, and time series objects of class ts.
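For instance, on a toy series (values chosen only for illustration):

```r
# First and second differences with diff()
z <- ts(c(1, 4, 9, 16, 25))        # z_t = t^2
diff(z)                            # first differences: 3 5 7 9
diff(z, differences = 2)           # second differences: 2 2 2
```

Note that diff(z, differences = 2) is the same as diff(diff(z)).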
Two series are cointegrated when their trends are not too far apart and are in some sense similar. This vague statement can be made precise by conducting a cointegration test, which tests whether the residuals from regressing one series on the other are stationary. If they are, the series are cointegrated. A cointegration test is thus, in effect, a Dickey-Fuller stationarity test on the residuals, with a null hypothesis of no cointegration. In other words, as in a stationarity test, we would like to reject the null hypothesis.
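Before applying it to real data, the logic of this two-step procedure can be sketched on simulated series using base R only (the series, the parameter 0.5, and the no-constant Dickey-Fuller regression below are purely illustrative):

```r
set.seed(123)
n <- 500
w <- cumsum(rnorm(n))          # common stochastic trend (a random walk)
z <- w + rnorm(n)              # Z_t: trend plus stationary noise
y <- 0.5 * w + rnorm(n)        # Y_t shares the same trend
# Step 1: cointegrating regression; save the residuals
ehat <- resid(lm(y ~ z))
# Step 2: Dickey-Fuller regression on the residuals (no constant)
df.reg <- lm(diff(ehat) ~ ehat[-n] - 1)
tstat  <- coef(summary(df.reg))[1, "t value"]
tstat   # far below the Engle-Granger 5% critical value (about -3.37)
```

Because the two series were built around the same random walk, the residuals are stationary and the test statistic is strongly negative.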
Let us apply this method to test for cointegration between the series f and b in the dataset usa.rda, which contains 104 quarterly observations (1984:Q1 to 2009:Q4) on the following variables:
gdp: real US gross domestic product
inf: annual inflation rate
f: Federal Funds rate
b: 3-year Bond rate
load("usa.rda")
usa
## gdp inf f b
## 1 3807.4 9.47 9.69 11.19
## 2 3906.3 10.03 10.56 12.64
## 3 3976.0 10.83 11.39 12.64
## 4 4034.0 11.51 9.27 11.10
## 5 4117.2 10.51 8.48 10.68
## 6 4175.7 9.24 7.92 9.76
## 7 4258.3 8.37 7.90 9.29
## 8 4318.7 7.00 8.10 8.84
## 9 4382.4 6.16 7.83 7.94
## 10 4423.2 5.90 6.92 7.18
## 11 4491.3 5.37 6.21 6.66
## 12 4543.3 4.95 6.27 6.48
## 13 4611.1 5.69 6.22 6.52
## 14 4686.7 6.62 6.65 7.72
## 15 4764.5 6.47 6.84 8.15
## 16 4883.1 6.40 6.92 8.29
## 17 4948.6 6.40 6.66 7.58
## 18 5059.3 6.73 7.16 8.10
## 19 5142.8 7.65 7.98 8.59
## 20 5251.0 8.57 8.47 8.75
## 21 5360.3 9.30 9.44 9.38
## 22 5453.6 10.20 9.73 8.92
## 23 5532.9 11.11 9.08 8.07
## 24 5581.7 11.92 8.61 7.86
## 25 5708.1 13.35 8.25 8.38
## 26 5797.4 13.55 8.24 8.62
## 27 5850.6 12.10 8.16 8.25
## 28 5846.0 11.91 7.74 7.76
## 29 5880.2 10.65 6.43 7.27
## 30 5962.0 9.33 5.86 7.25
## 31 6033.7 10.29 5.64 6.89
## 32 6092.5 9.12 4.82 5.84
## 33 6190.7 7.32 4.02 5.77
## 34 6295.2 6.53 3.77 5.78
## 35 6389.7 5.61 3.26 4.68
## 36 6493.6 4.42 3.04 5.00
## 37 6544.5 3.54 3.04 4.64
## 38 6622.7 3.28 3.00 4.41
## 39 6688.3 2.59 3.06 4.32
## 40 6813.8 3.25 2.99 4.41
## 41 6916.3 4.43 3.21 4.90
## 42 7044.3 4.25 3.94 6.20
## 43 7131.8 4.17 4.49 6.56
## 44 7248.2 4.00 5.17 7.40
## 45 7307.7 3.52 5.81 7.27
## 46 7355.8 3.67 6.02 6.25
## 47 7452.5 3.29 5.80 5.96
## 48 7542.5 3.45 5.72 5.58
## 49 7638.2 3.04 5.36 5.38
## 50 7800.0 1.60 5.24 6.29
## 51 7892.7 1.62 5.31 6.36
## 52 8023.0 1.28 5.28 5.94
## 53 8137.0 2.17 5.28 6.19
## 54 8276.8 3.69 5.52 6.42
## 55 8409.9 4.10 5.53 6.01
## 56 8505.7 4.40 5.51 5.78
## 57 8600.6 3.89 5.52 5.46
## 58 8698.6 3.84 5.50 5.57
## 59 8847.2 4.03 5.53 5.11
## 60 9027.5 4.22 4.86 4.41
## 61 9148.6 4.71 4.73 4.87
## 62 9252.6 5.09 4.75 5.35
## 63 9405.1 4.57 5.09 5.71
## 64 9607.7 4.50 5.31 6.00
## 65 9709.5 5.10 5.68 6.56
## 66 9949.1 4.48 6.27 6.52
## 67 10017.5 5.39 6.52 6.16
## 68 10129.8 6.04 6.47 5.63
## 69 10165.1 5.15 5.59 4.64
## 70 10301.3 4.73 4.33 4.43
## 71 10305.2 3.80 3.50 3.93
## 72 10373.1 2.95 2.13 3.33
## 73 10498.7 2.83 1.73 3.75
## 74 10601.9 3.05 1.75 3.77
## 75 10701.7 3.05 1.74 2.62
## 76 10766.9 3.00 1.44 2.27
## 77 10888.4 3.15 1.25 2.07
## 78 11008.1 3.10 1.25 1.77
## 79 11255.7 2.71 1.02 2.20
## 80 11416.5 2.69 1.00 2.38
## 81 11597.2 2.48 1.00 2.17
## 82 11778.4 2.35 1.01 2.98
## 83 11950.5 2.84 1.43 2.92
## 84 12144.9 2.62 1.95 3.05
## 85 12379.5 2.80 2.47 3.61
## 86 12516.8 3.05 2.94 3.73
## 87 12741.6 2.61 3.46 3.98
## 88 12915.6 2.62 3.98 4.37
## 89 13183.5 2.70 4.46 4.58
## 90 13347.8 2.81 4.91 4.98
## 91 13452.9 2.90 5.25 4.87
## 92 13611.5 3.14 5.25 4.65
## 93 13789.5 2.90 5.26 4.68
## 94 14008.2 2.32 5.25 4.76
## 95 14158.2 2.18 5.07 4.41
## 96 14291.3 1.85 4.50 3.50
## 97 14328.4 1.45 3.18 2.17
## 98 14471.8 1.59 2.09 2.67
## 99 14484.9 1.58 1.94 2.63
## 100 14191.2 1.54 0.51 1.48
## 101 14049.7 1.65 0.18 1.27
## 102 14034.5 2.09 0.18 1.49
## 103 14114.7 2.32 0.16 1.56
## 104 14277.3 2.59 0.12 1.39
usa.ts <- ts(usa, start=c(1984,1), end=c(2009,4),
frequency=4)
Dgdp <- diff(usa.ts[,1])
Dinf <- diff(usa.ts[,"inf"])
Df <- diff(usa.ts[,"f"])
Db <- diff(usa.ts[,"b"])
usa.ts.df <- ts.union(gdp=usa.ts[,1], # ts.union() is in package stats
inf=usa.ts[,2],
f=usa.ts[,3],
b=usa.ts[,4],
Dgdp,Dinf,Df,Db,
dframe=TRUE)
plot(usa.ts.df$f)
points(usa.ts.df$b,type="l",lty=2)
acf(usa.ts.df$f)
adf.test(usa.ts.df$f, k=10)
##
## Augmented Dickey-Fuller Test
##
## data: usa.ts.df$f
## Dickey-Fuller = -3.3726, Lag order = 10, p-value = 0.06283
## alternative hypothesis: stationary
acf(usa.ts.df$b)
adf.test(usa.ts.df$b, k=10)
##
## Augmented Dickey-Fuller Test
##
## data: usa.ts.df$b
## Dickey-Fuller = -2.9838, Lag order = 10, p-value = 0.1687
## alternative hypothesis: stationary
fb.dyn <- dynlm(usa.ts.df$b~usa.ts.df$f)
ehat.fb <- resid(fb.dyn)
adf.test(ehat.fb)
##
## Augmented Dickey-Fuller Test
##
## data: ehat.fb
## Dickey-Fuller = -4.0009, Lag order = 4, p-value = 0.01184
## alternative hypothesis: stationary
test <- ur.df(ehat.fb, type = "trend", lags = 4)
summary(test)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8577 -0.2790 -0.0320 0.2075 1.1325
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.10356 0.09883 1.048 0.297445
## z.lag.1 -0.31001 0.07748 -4.001 0.000127 ***
## tt -0.00242 0.00169 -1.432 0.155511
## z.diff.lag1 0.36889 0.09844 3.747 0.000312 ***
## z.diff.lag2 -0.01423 0.10540 -0.135 0.892884
## z.diff.lag3 0.21686 0.09715 2.232 0.028021 *
## z.diff.lag4 -0.07490 0.09910 -0.756 0.451713
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4027 on 92 degrees of freedom
## Multiple R-squared: 0.2577, Adjusted R-squared: 0.2093
## F-statistic: 5.323 on 6 and 92 DF, p-value: 9.43e-05
##
##
## Value of test-statistic is: -4.0009 5.6324 8.3633
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2 6.22 4.75 4.07
## phi3 8.43 6.49 5.47
test <- ur.df(ehat.fb, type = "none", lags = 1)
summary(test)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.91006 -0.31171 -0.08908 0.18918 1.10640
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.2245 0.0535 -4.196 5.89e-05 ***
## z.diff.lag 0.2540 0.0937 2.711 0.00789 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4173 on 100 degrees of freedom
## Multiple R-squared: 0.1689, Adjusted R-squared: 0.1523
## F-statistic: 10.16 on 2 and 100 DF, p-value: 9.598e-05
##
##
## Value of test-statistic is: -4.1961
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
In conclusion, we reject the null hypothesis that the residuals have a unit root; therefore, the series f and b are cointegrated.
R has a special function to perform cointegration tests, po.test in package tseries. (The name comes from the method it uses, the Phillips-Ouliaris test.) The main argument of the function is a matrix having in its first column the dependent variable of the cointegration equation and the independent variables in the other columns. Let me illustrate its application to the same series, b and f.
bfx <- as.matrix(cbind(usa.ts.df$b, usa.ts.df$f)) # first column: dependent variable b
po.test(bfx)
##
## Phillips-Ouliaris Cointegration Test
##
## data: bfx
## Phillips-Ouliaris demeaned = -20.508, Truncation lag parameter =
## 1, p-value = 0.04986
The PO test marginally rejects the null of no cointegration at the 5 percent level.
If two I(1) time series \(Z_t\) and \(Y_t\) are cointegrated, their first differences are stationary and can be modeled in a VAR augmented by the lagged regressor \(Z_{t−1}− \beta_0 - \beta_1 Y_{t−1}\). This is called a vector error correction (VEC) model, and \(Z_t − \beta_0 - \beta_1 Y_t\) is called the error correction term. Lagged values of the error correction term are useful for predicting \(\nabla Z_t\) and/or \(\nabla Y_t\).
\[ \nabla Z_t = c_1 + \gamma_1 (Z_{t-1} - \beta_0 - \beta_1 Y_{t-1}) + a_{1,t}\\ \nabla Y_t = c_2 + \gamma_2 (Z_{t-1} - \beta_0 - \beta_1 Y_{t-1}) + a_{2,t} \]
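Once the error correction term is in hand, these two equations can be estimated by ordinary OLS. A minimal base-R sketch on simulated cointegrated series (all series and parameter values are illustrative; in practice \(\beta_0\) and \(\beta_1\) come from the cointegrating regression, as below):

```r
set.seed(42)
n <- 300
w <- cumsum(rnorm(n))             # shared stochastic trend
z <- w + rnorm(n)
y <- 1 + 0.8 * w + rnorm(n)
# Error correction term: residuals of the cointegrating regression
e <- resid(lm(y ~ z))
# Regress each difference on the lagged error correction term
eq.y <- lm(diff(y) ~ e[-n])       # intercept and error correction coefficient
eq.z <- lm(diff(z) ~ e[-n])
coef(eq.y)                        # negative coefficient: y adjusts back toward equilibrium
```

A negative error correction coefficient in the y equation means that when y is above its cointegrating level, it tends to fall back toward it in the next period.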
Example
The dataset gdp includes real GDP series for Australia and the USA for the period 1970:Q1 to 2000:Q4. First we determine the order of integration of the two series.
load("gdp.rda")
gdp
## usa aus
## 1 38.3011 38.2355
## 2 38.3734 38.7551
## 3 38.7137 38.7706
## 4 38.2991 38.8948
## 5 39.3615 39.5621
## 6 39.5836 39.6402
## 7 39.8973 40.8614
## 8 40.0114 40.6741
## 9 40.7224 40.2642
## 10 41.6840 41.1712
## 11 42.0813 40.9545
## 12 42.7699 41.4527
## 13 43.8558 42.2458
## 14 44.3631 42.5829
## 15 44.1267 42.8018
## 16 44.5485 44.1979
## 17 44.1624 44.2659
## 18 44.2897 42.9056
## 19 43.8609 43.6730
## 20 43.6887 43.6216
## 21 43.1662 43.7671
## 22 43.4819 45.1109
## 23 44.2184 44.8353
## 24 44.7980 44.2290
## 25 45.8065 45.8493
## 26 46.1477 46.1597
## 27 46.3688 46.6023
## 28 46.7009 46.8704
## 29 47.2652 46.5686
## 30 48.1932 47.2920
## 31 49.0560 47.1294
## 32 49.0509 47.0068
## 33 49.2088 47.1513
## 34 51.1483 47.7485
## 35 51.6525 48.2949
## 36 52.3319 48.8370
## 37 52.4338 50.6371
## 38 52.4837 49.4096
## 39 52.8616 50.0276
## 40 53.0175 51.0850
## 41 53.1866 51.0754
## 42 52.1129 51.1573
## 43 52.0263 51.6731
## 44 52.9910 52.4480
## 45 54.0647 52.5540
## 46 53.6429 53.4546
## 47 54.2918 54.4216
## 48 53.6154 54.4135
## 49 52.7363 53.8024
## 50 53.0195 54.1406
## 51 52.8188 53.8286
## 52 52.8657 52.9478
## 53 53.5176 52.3897
## 54 54.7247 52.3426
## 55 55.8055 53.8554
## 56 56.9474 54.9471
## 57 58.0608 56.2341
## 58 59.0601 56.4813
## 59 59.6346 56.9522
## 60 60.1246 57.7030
## 61 60.6797 58.6346
## 62 61.1982 60.0244
## 63 62.1547 60.7548
## 64 62.6325 60.9999
## 65 63.2315 61.1380
## 66 63.4820 60.7377
## 67 64.0902 61.3044
## 68 64.4131 61.8770
## 69 64.8368 62.6626
## 70 65.5499 63.4032
## 71 66.1448 64.4135
## 72 67.2999 65.7845
## 73 67.6289 66.1885
## 74 68.4887 66.4363
## 75 68.8544 66.9270
## 76 69.7630 67.9759
## 77 70.4710 68.6255
## 78 70.9334 69.8922
## 79 71.4387 70.6371
## 80 71.6200 70.6526
## 81 72.4471 71.4488
## 82 72.6325 71.4836
## 83 72.6376 70.7650
## 84 72.0886 70.9464
## 85 71.7209 70.5541
## 86 72.1864 70.3144
## 87 72.5347 70.2865
## 88 72.8750 70.4235
## 89 73.6298 71.1010
## 90 74.3398 71.1492
## 91 75.0691 71.9904
## 92 75.8963 73.1570
## 93 75.9880 74.0137
## 94 76.3730 74.4343
## 95 76.7652 74.2963
## 96 77.7981 75.8883
## 97 78.5896 77.3219
## 98 79.6143 78.2573
## 99 80.0605 79.1948
## 100 80.9987 79.4731
## 101 81.2238 80.0478
## 102 81.3695 80.7312
## 103 82.0326 81.9085
## 104 82.6326 82.4945
## 105 83.2153 83.8590
## 106 84.5792 84.3439
## 107 85.2882 85.1460
## 108 86.2855 86.0921
## 109 86.9527 86.1376
## 110 88.2739 88.3177
## 111 89.3730 88.7009
## 112 90.0320 89.7535
## 113 91.0283 91.0137
## 114 91.6303 91.7078
## 115 92.6856 93.1574
## 116 94.0934 94.8436
## 117 94.8920 95.4114
## 118 95.6774 96.4576
## 119 96.7938 96.8290
## 120 98.5143 98.4477
## 121 98.7639 99.2766
## 122 100.3150 100.4380
## 123 100.2000 100.6040
## 124 100.7210 99.6807
gdp <- ts(gdp, start=c(1970,1), end=c(2000,4), frequency=4)
ts.plot(gdp[,"usa"],gdp[,"aus"], type="l",
lty=c(1,2), col=c(1,2))
legend("topleft", border=NULL, legend=c("USA","AUS"),
lty=c(1,2), col=c(1,2))
The two series in levels reveal a common trend and, therefore, suggest that the series are nonstationary.
adf.test(gdp[,"usa"])
##
## Augmented Dickey-Fuller Test
##
## data: gdp[, "usa"]
## Dickey-Fuller = -0.90827, Lag order = 4, p-value = 0.9492
## alternative hypothesis: stationary
adf.test(gdp[,"aus"])
##
## Augmented Dickey-Fuller Test
##
## data: gdp[, "aus"]
## Dickey-Fuller = -0.61238, Lag order = 4, p-value = 0.9755
## alternative hypothesis: stationary
adf.test(diff(gdp[,"usa"]))
## Warning in adf.test(diff(gdp[, "usa"])): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: diff(gdp[, "usa"])
## Dickey-Fuller = -4.2929, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
adf.test(diff(gdp[,"aus"]))
## Warning in adf.test(diff(gdp[, "aus"])): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: diff(gdp[, "aus"])
## Dickey-Fuller = -4.4168, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
The stationarity tests indicate that both series are I(1). Let us now test them for cointegration: we regress aus on usa, obtain the residuals, and check whether they are stationary.
cint1.dyn <- dynlm(aus~usa-1, data=gdp)
summary(cint1.dyn)
##
## Time series regression with "ts" data:
## Start = 1970(1), End = 2000(4)
##
## Call:
## dynlm(formula = aus ~ usa - 1, data = gdp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.72821 -1.06932 -0.08543 0.91592 2.26603
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## usa 0.985350 0.001657 594.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.219 on 123 degrees of freedom
## Multiple R-squared: 0.9997, Adjusted R-squared: 0.9996
## F-statistic: 3.538e+05 on 1 and 123 DF, p-value: < 2.2e-16
ehat <- resid(cint1.dyn)
plot(ehat)
adf.test(ehat)
##
## Augmented Dickey-Fuller Test
##
## data: ehat
## Dickey-Fuller = -2.7767, Lag order = 4, p-value = 0.2538
## alternative hypothesis: stationary
summary(ur.df(ehat,type="trend",lags = 4))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4582 -0.3237 0.0162 0.3902 1.4508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.124966 0.122120 -1.023 0.30837
## z.lag.1 -0.149378 0.053796 -2.777 0.00644 **
## tt 0.001642 0.001676 0.979 0.32951
## z.diff.lag1 0.011419 0.097235 0.117 0.90673
## z.diff.lag2 0.055311 0.096916 0.571 0.56934
## z.diff.lag3 0.072081 0.096755 0.745 0.45784
## z.diff.lag4 -0.057948 0.096178 -0.603 0.54805
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6127 on 112 degrees of freedom
## Multiple R-squared: 0.08504, Adjusted R-squared: 0.03602
## F-statistic: 1.735 on 6 and 112 DF, p-value: 0.1193
##
##
## Value of test-statistic is: -2.7767 2.658 3.9866
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2 6.22 4.75 4.07
## phi3 8.43 6.49 5.47
summary(ur.df(ehat,type="drift",lags = 4))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49763 -0.33712 0.01271 0.38747 1.43283
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01898 0.05657 -0.335 0.73790
## z.lag.1 -0.14038 0.05300 -2.649 0.00923 **
## z.diff.lag1 0.01405 0.09718 0.145 0.88530
## z.diff.lag2 0.05693 0.09688 0.588 0.55799
## z.diff.lag3 0.07521 0.09668 0.778 0.43824
## z.diff.lag4 -0.05722 0.09616 -0.595 0.55296
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6126 on 113 degrees of freedom
## Multiple R-squared: 0.0772, Adjusted R-squared: 0.03637
## F-statistic: 1.891 on 5 and 113 DF, p-value: 0.1014
##
##
## Value of test-statistic is: -2.6489 3.5086
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1 6.52 4.63 3.81
summary(ur.df(ehat,type="none",lags = 4))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.51392 -0.35572 -0.00387 0.36798 1.41733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.13830 0.05243 -2.638 0.00951 **
## z.diff.lag1 0.01255 0.09670 0.130 0.89695
## z.diff.lag2 0.05532 0.09639 0.574 0.56712
## z.diff.lag3 0.07362 0.09619 0.765 0.44564
## z.diff.lag4 -0.05866 0.09569 -0.613 0.54107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6102 on 114 degrees of freedom
## Multiple R-squared: 0.0763, Adjusted R-squared: 0.03579
## F-statistic: 1.883 on 5 and 114 DF, p-value: 0.1026
##
##
## Value of test-statistic is: -2.638
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
The test without constant or trend rejects the null of no cointegration (statistic -2.638 against the 1 percent critical value -2.58), so we conclude that the series are cointegrated. With cointegrated series we can construct a VEC model to better understand the dynamic relationship between the two variables.
vecaus<- dynlm(d(aus)~L(ehat), data=gdp)
vecusa <- dynlm(d(usa)~L(ehat), data=gdp)
summary(vecaus)
##
## Time series regression with "ts" data:
## Start = 1970(2), End = 2000(4)
##
## Call:
## dynlm(formula = d(aus) ~ L(ehat), data = gdp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82072 -0.41633 0.01881 0.41823 1.73368
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.49171 0.05791 8.491 6.12e-14 ***
## L(ehat) -0.09870 0.04752 -2.077 0.0399 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6409 on 121 degrees of freedom
## Multiple R-squared: 0.03443, Adjusted R-squared: 0.02645
## F-statistic: 4.315 on 1 and 121 DF, p-value: 0.03989
summary(vecusa)
##
## Time series regression with "ts" data:
## Start = 1970(2), End = 2000(4)
##
## Call:
## dynlm(formula = d(usa) ~ L(ehat), data = gdp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.54329 -0.30388 0.03483 0.36117 1.47005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.50988 0.04668 10.92 <2e-16 ***
## L(ehat) 0.03025 0.03830 0.79 0.431
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5166 on 121 degrees of freedom
## Multiple R-squared: 0.005129, Adjusted R-squared: -0.003093
## F-statistic: 0.6238 on 1 and 121 DF, p-value: 0.4312
The coefficient on the error correction term ( L(ehat) ) is significant for Australia, suggesting that changes in the US economy do affect the Australian economy; the error correction coefficient in the US equation is not statistically significant, suggesting that changes in Australia do not influence the American economy. To interpret the sign of the error correction coefficient, remember that the residual measures the deviation of the Australian economy from its cointegrating level of 0.985 times the US economy (see the value of the slope in cint1.dyn ).
On the other hand, a VAR model in first differences can be used when the two variables \(Z_t\) and \(Y_t\) under study are I(1) but not cointegrated. Here is an example.
Let us look at the income-consumption relationship based on the fred dataset, where consumption and income are already in logs and the period is 1960:Q1 to 2009:Q4. There are 200 observations on the following two variables:
c: log of real consumption expenditure
y: log of real disposable income
load("fred.rda")
fred
## c y
## 1 7.479017 7.578401
## 2 7.491590 7.583807
## 3 7.487622 7.584671
## 4 7.488965 7.584061
## 5 7.488685 7.593122
## 6 7.503455 7.608077
## 7 7.508294 7.621636
## 8 7.528117 7.641084
## 9 7.538708 7.650502
## 10 7.550924 7.661621
## 11 7.558986 7.667158
## 12 7.573069 7.675360
## 13 7.579781 7.683634
## 14 7.589285 7.693071
## 15 7.602801 7.704316
## 16 7.611150 7.720728
## 17 7.630704 7.740490
## 18 7.648120 7.767306
## 19 7.666316 7.780178
## 20 7.669122 7.791688
## 21 7.691109 7.802782
## 22 7.702104 7.813794
## 23 7.719130 7.840942
## 24 7.746863 7.860995
## 25 7.761532 7.870319
## 26 7.764084 7.872722
## 27 7.775486 7.885254
## 28 7.779634 7.896627
## 29 7.785430 7.911471
## 30 7.798974 7.919647
## 31 7.804088 7.929162
## 32 7.810231 7.936446
## 33 7.833838 7.953740
## 34 7.849051 7.970222
## 35 7.867680 7.972225
## 36 7.872265 7.978791
## 37 7.883409 7.980503
## 38 7.889759 7.990543
## 39 7.894579 8.010658
## 40 7.902524 8.017934
## 41 7.908644 8.022930
## 42 7.913228 8.040286
## 43 7.921935 8.053696
## 44 7.919211 8.050416
## 45 7.938160 8.070062
## 46 7.947290 8.084963
## 47 7.955215 8.089390
## 48 7.971707 8.099919
## 49 7.984974 8.106183
## 50 8.003898 8.115551
## 51 8.019218 8.135816
## 52 8.042410 8.174844
## 53 8.060540 8.186102
## 54 8.060035 8.195941
## 55 8.063598 8.200947
## 56 8.060666 8.213084
## 57 8.051883 8.197621
## 58 8.055348 8.188995
## 59 8.059466 8.192128
## 60 8.044723 8.187577
## 61 8.053060 8.183649
## 62 8.069593 8.229324
## 63 8.083761 8.215710
## 64 8.094287 8.225771
## 65 8.114085 8.240385
## 66 8.123202 8.248581
## 67 8.133734 8.257567
## 68 8.146651 8.264724
## 69 8.158029 8.265522
## 70 8.163542 8.276853
## 71 8.173040 8.291747
## 72 8.187994 8.311693
## 73 8.193815 8.317864
## 74 8.214979 8.329272
## 75 8.219164 8.337373
## 76 8.227188 8.345170
## 77 8.232227 8.356062
## 78 8.231642 8.349035
## 79 8.241413 8.354745
## 80 8.244071 8.362712
## 81 8.242335 8.365486
## 82 8.219380 8.351398
## 83 8.230044 8.361638
## 84 8.243283 8.382083
## 85 8.248738 8.379791
## 86 8.248738 8.379906
## 87 8.252785 8.401872
## 88 8.245201 8.403890
## 89 8.251638 8.405054
## 90 8.255231 8.412010
## 91 8.262869 8.416400
## 92 8.280939 8.419889
## 93 8.290694 8.427750
## 94 8.310341 8.435007
## 95 8.327871 8.449920
## 96 8.343601 8.470332
## 97 8.352130 8.491957
## 98 8.366347 8.509040
## 99 8.374015 8.524487
## 100 8.387107 8.533952
## 101 8.403935 8.531490
## 102 8.412988 8.551150
## 103 8.431810 8.544945
## 104 8.434007 8.555240
## 105 8.442319 8.567088
## 106 8.452911 8.578382
## 107 8.470248 8.583599
## 108 8.476246 8.584197
## 109 8.474724 8.590332
## 110 8.488032 8.579417
## 111 8.499111 8.597150
## 112 8.501511 8.611248
## 113 8.518073 8.623731
## 114 8.525320 8.633072
## 115 8.533185 8.640737
## 116 8.544886 8.649712
## 117 8.548556 8.661016
## 118 8.553024 8.656920
## 119 8.563332 8.663369
## 120 8.568209 8.670995
## 121 8.576085 8.678478
## 122 8.579379 8.685061
## 123 8.583168 8.685771
## 124 8.575368 8.679006
## 125 8.572514 8.682080
## 126 8.580112 8.689650
## 127 8.583917 8.691751
## 128 8.583468 8.698247
## 129 8.600523 8.713237
## 130 8.606430 8.720868
## 131 8.617419 8.725929
## 132 8.629557 8.740049
## 133 8.633589 8.725264
## 134 8.643138 8.740705
## 135 8.653942 8.742846
## 136 8.662799 8.757485
## 137 8.673872 8.753466
## 138 8.681266 8.770377
## 139 8.689246 8.777694
## 140 8.699065 8.790878
## 141 8.700231 8.797337
## 142 8.708392 8.797428
## 143 8.717289 8.804895
## 144 8.724305 8.810788
## 145 8.733417 8.821393
## 146 8.744663 8.832092
## 147 8.750620 8.840566
## 148 8.758742 8.846036
## 149 8.768761 8.854936
## 150 8.772796 8.862413
## 151 8.789660 8.873804
## 152 8.801033 8.887584
## 153 8.810937 8.910073
## 154 8.827996 8.924257
## 155 8.841173 8.934719
## 156 8.856390 8.942003
## 157 8.866201 8.948768
## 158 8.881822 8.951051
## 159 8.893765 8.957498
## 160 8.907775 8.973060
## 161 8.922832 8.993850
## 162 8.932186 9.004042
## 163 8.941925 9.014605
## 164 8.950727 9.016100
## 165 8.954712 9.023589
## 166 8.958476 9.020837
## 167 8.962866 9.045996
## 168 8.978408 9.034259
## 169 8.981845 9.060865
## 170 8.986922 9.066343
## 171 8.993676 9.062907
## 172 8.997221 9.065268
## 173 9.002369 9.068927
## 174 9.011621 9.083926
## 175 9.025468 9.097776
## 176 9.030974 9.103468
## 177 9.040453 9.107854
## 178 9.045843 9.117677
## 179 9.054365 9.124336
## 180 9.065800 9.138135
## 181 9.073260 9.125828
## 182 9.082836 9.132703
## 183 9.089934 9.138630
## 184 9.092514 9.144062
## 185 9.103490 9.162599
## 186 9.108861 9.171319
## 187 9.115007 9.176008
## 188 9.124957 9.188994
## 189 9.134010 9.193215
## 190 9.136855 9.194485
## 191 9.141590 9.198662
## 192 9.144585 9.198895
## 193 9.143089 9.192869
## 194 9.143239 9.216223
## 195 9.134291 9.194038
## 196 9.126448 9.202349
## 197 9.127958 9.202953
## 198 9.125762 9.218060
## 199 9.132660 9.208779
## 200 9.136640 9.211190
fred <- ts(fred, start=c(1960,1),end=c(2009,4),frequency=4)
ts.plot(fred[,"c"],fred[,"y"], type="l",
lty=c(1,2), col=c(1,2))
legend("topleft", border=NULL, legend=c("c","y"),
lty=c(1,2), col=c(1,2))
Are the two series cointegrated?
acf(fred[,"c"])
acf(fred[,"y"])
adf.test(fred[,"c"])
##
## Augmented Dickey-Fuller Test
##
## data: fred[, "c"]
## Dickey-Fuller = -2.6202, Lag order = 5, p-value = 0.3163
## alternative hypothesis: stationary
adf.test(fred[,"y"])
##
## Augmented Dickey-Fuller Test
##
## data: fred[, "y"]
## Dickey-Fuller = -2.2905, Lag order = 5, p-value = 0.4544
## alternative hypothesis: stationary
adf.test(diff(fred[,"c"]))
## Warning in adf.test(diff(fred[, "c"])): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: diff(fred[, "c"])
## Dickey-Fuller = -4.713, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
adf.test(diff(fred[,"y"]))
## Warning in adf.test(diff(fred[, "y"])): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: diff(fred[, "y"])
## Dickey-Fuller = -5.7751, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
cointcy <- dynlm(c~y, data=fred)
ehat <- resid(cointcy)
adf.test(ehat)
##
## Augmented Dickey-Fuller Test
##
## data: ehat
## Dickey-Fuller = -2.5617, Lag order = 5, p-value = 0.3409
## alternative hypothesis: stationary
The ACF plots show long serial correlation sequences; therefore, I let R choose the lag order in the ADF tests. As the results of the above ADF and cointegration tests show, both series are I(1), but they fail the cointegration test: the series are not cointegrated. (Please remember that the adf.test function uses a constant and trend in the test equation; therefore, the critical values are not the same as in the textbook. The conclusions of the tests, however, should usually be the same.)
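When the argument k is not supplied, adf.test sets the lag order to trunc((length(x)-1)^(1/3)) (the tseries default); with the 200 observations in fred this gives 5, matching the "Lag order = 5" lines in the output above:

```r
n <- 200                       # observations in the fred dataset
k <- trunc((n - 1)^(1/3))      # default lag order used by tseries::adf.test
k                              # 5
```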
Dc <- diff(fred[,"c"])
Dy <- diff(fred[,"y"])
varmat <- as.matrix(cbind(Dc,Dy))
varfit <- VAR(varmat) # `VAR()` from package `vars`
summary(varfit)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Dc, Dy
## Deterministic variables: const
## Sample size: 198
## Log Likelihood: 1400.444
## Roots of the characteristic polynomial:
## 0.3441 0.3425
## Call:
## VAR(y = varmat)
##
##
## Estimation results for equation Dc:
## ===================================
## Dc = Dc.l1 + Dy.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Dc.l1 0.2156068 0.0747486 2.884 0.00436 **
## Dy.l1 0.1493798 0.0577343 2.587 0.01040 *
## const 0.0052776 0.0007573 6.969 4.81e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.006575 on 195 degrees of freedom
## Multiple R-Squared: 0.1205, Adjusted R-squared: 0.1115
## F-statistic: 13.36 on 2 and 195 DF, p-value: 3.661e-06
##
##
## Estimation results for equation Dy:
## ===================================
## Dy = Dc.l1 + Dy.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Dc.l1 0.4754276 0.0973264 4.885 2.15e-06 ***
## Dy.l1 -0.2171679 0.0751730 -2.889 0.0043 **
## const 0.0060367 0.0009861 6.122 4.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.008562 on 195 degrees of freedom
## Multiple R-Squared: 0.1118, Adjusted R-squared: 0.1027
## F-statistic: 12.27 on 2 and 195 DF, p-value: 9.53e-06
##
##
##
## Covariance matrix of residuals:
## Dc Dy
## Dc 4.324e-05 2.508e-05
## Dy 2.508e-05 7.330e-05
##
## Correlation matrix of residuals:
## Dc Dy
## Dc 1.0000 0.4456
## Dy 0.4456 1.0000
Function VAR(), which is part of the package vars (Pfaff 2013), accepts the following main arguments: y, a matrix containing the endogenous variables in the VAR model; p, the desired lag order (default 1); and exogen, a matrix of exogenous variables. (VAR() is a more powerful tool than I imply here; please type ?VAR for more information.)
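Since a VAR(1) amounts to two OLS equations sharing the same right-hand side, the point estimates produced by VAR() can be reproduced equation by equation with lm(). A base-R sketch on simulated stationary series (the data and AR parameters are illustrative):

```r
set.seed(7)
n  <- 250
x1 <- as.numeric(arima.sim(list(ar = 0.4), n))   # two stationary AR(1) series
x2 <- as.numeric(arima.sim(list(ar = 0.2), n))
# Current values regressed on the first lags of both series
eq1 <- lm(x1[-1] ~ x1[-n] + x2[-n])              # equation for x1
eq2 <- lm(x2[-1] ~ x1[-n] + x2[-n])              # equation for x2
coef(eq1)
```

Each equation returns an intercept plus one coefficient per lagged variable, exactly the layout VAR() reports for Dc and Dy above.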
Nonstationarity can lead to spurious regression: an apparent relationship between variables that are, in reality, not related. The following code sequence generates two independent random walk processes, y and x, and regresses y on x.
T <- 1000
set.seed(1357)
y <- ts(rep(0,T))
vy <- ts(rnorm(T))
for (t in 2:T){
y[t] <- y[t-1]+vy[t]
}
set.seed(4365)
x <- ts(rep(0,T))
vx <- ts(rnorm(T))
for (t in 2:T){
x[t] <- x[t-1]+vx[t]
}
y <- ts(y[300:1000])
x <- ts(x[300:1000])
ts.plot(y,x, ylab="y and x")
spurious.ols <- lm(y~x)
summary(spurious.ols)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.554 -5.973 -2.453 4.508 24.678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.38711 1.61958 -12.588 < 2e-16 ***
## x -0.28188 0.04331 -6.508 1.45e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.954 on 699 degrees of freedom
## Multiple R-squared: 0.05713, Adjusted R-squared: 0.05578
## F-statistic: 42.35 on 1 and 699 DF, p-value: 1.454e-10
The summary output of the regression shows a strong relationship between the two variables, though they have been generated independently. (Not every pair of independently generated random walks produces a spurious regression, though.) Figure 12.3 depicts the two time series, y and x, and Figure 12.4 shows them in a scatterplot.
plot(x, y, type="p", col="grey")