library(AER)
library(dynlm)
library(tidyverse)
library(fpp)
library(fpp2)
library(forecast)
library(readxl)
library(stargazer)
library(scales)
library(quantmod)
library(urca)
library(vars)
library(tseries)

1 Orders of Integration

When a time series \(Z_t\) has a unit autoregressive root, \(Z_t\) is integrated of order one. This is often denoted by \(Z_t \sim I(1)\). We simply say that \(Z_t\) is \(I(1)\). If \(Z_t\) is \(I(1)\), its first difference \(\nabla Z_t\) is stationary.

\(Z_t\) is \(I(2)\) when \(Z_t\) needs to be differenced twice in order to obtain a stationary series. Using the notation introduced here, if \(Z_t\) is \(I(2)\), its first difference \(\nabla Z_t\) is \(I(1)\) and its second difference \(\nabla^2 Z_t\) is stationary. \(Z_t\) is \(I(d)\) when \(Z_t\) must be differenced \(d\) times to obtain a stationary series.

When \(Z_t\) is stationary, it is integrated of order 0 so \(Z_t\) is \(I(0)\).
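A quick simulated sketch (base R only; the series here are illustrative and not from any dataset used below) makes the definitions concrete: differencing an I(1) random walk once, or an I(2) series twice, recovers the stationary innovations.

```r
# Simulated illustration: a random walk is I(1); its first difference
# recovers the underlying white noise, which is I(0).
set.seed(123)
e   <- rnorm(500)     # stationary innovations
rw  <- cumsum(e)      # random walk: integrated of order one
drw <- diff(rw)       # first difference: stationary again

rw2 <- cumsum(rw)                   # I(2): needs differencing twice
d2  <- diff(rw2, differences = 2)   # stationary after two differences
```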

2 Cointegration

When \(Z_t\) and \(Y_t\) are \(I(1)\) and there is a \(\theta\) such that \(Y_t-\theta Z_t\) is \(I(0)\), \(Z_t\) and \(Y_t\) are cointegrated. Put differently, cointegration of \(Z_t\) and \(Y_t\) means that \(Z_t\) and \(Y_t\) share a common stochastic trend, and that this trend can be eliminated by taking a specific linear combination of the series such that the resulting series is stationary.

R functions for cointegration analysis are implemented in the package urca.

It is fairly easy to obtain differences of time series in R. For example, the function diff() returns suitably lagged and iterated differences of numeric vectors, matrices and time series objects of the class ts.
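For instance (toy values chosen purely for illustration), the lag and differences arguments control which difference diff() computes:

```r
z <- ts(c(1, 4, 9, 16, 25))
diff(z)                    # first differences: 3 5 7 9
diff(z, differences = 2)   # iterated (second) differences: 2 2 2
diff(z, lag = 2)           # lag-2 differences z_t - z_{t-2}: 8 12 16
```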

Two series are cointegrated when their trends are not too far apart and are in some sense similar. This vague statement can be made precise by conducting a cointegration test, which tests whether the residuals from regressing one series on the other are stationary. If they are, the series are cointegrated. Thus, a cointegration test is in fact a Dickey-Fuller stationarity test on residuals, and its null hypothesis is one of no cointegration. In other words, we would like to reject the null hypothesis in a cointegration test, just as in a stationarity test.
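This two-step procedure can be sketched in base R on simulated data; everything below (seed, coefficients, series names) is an illustrative assumption, and in practice the t-value from the second step must be compared against Engle-Granger cointegration critical values, not the usual t table.

```r
# Step 1: cointegrating regression on simulated cointegrated series
set.seed(42)
n <- 300
z <- cumsum(rnorm(n))        # an I(1) series
y <- 1 + 2 * z + rnorm(n)    # shares z's stochastic trend: cointegrated

step1 <- lm(y ~ z)
e <- resid(step1)

# Step 2: Dickey-Fuller regression on the residuals,
# Delta e_t = rho * e_{t-1} + u_t; a large negative t-value on rho
# indicates stationary residuals
de   <- diff(e)
elag <- e[-n]
df.reg <- lm(de ~ 0 + elag)
coef(summary(df.reg))["elag", "t value"]
```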

Let us apply this method to determine whether the series f and b in the dataset usa.rda are cointegrated. The dataset contains 104 quarterly observations (1984:Q1 - 2009:Q4) on the following variables:

gdp : real US gross domestic product

inf : annual inflation rate

f : Federal Funds rate

b : 3-year Bond rate

load("usa.rda")
usa
##         gdp   inf     f     b
## 1    3807.4  9.47  9.69 11.19
## 2    3906.3 10.03 10.56 12.64
## 3    3976.0 10.83 11.39 12.64
## 4    4034.0 11.51  9.27 11.10
## 5    4117.2 10.51  8.48 10.68
## 6    4175.7  9.24  7.92  9.76
## 7    4258.3  8.37  7.90  9.29
## 8    4318.7  7.00  8.10  8.84
## 9    4382.4  6.16  7.83  7.94
## 10   4423.2  5.90  6.92  7.18
## 11   4491.3  5.37  6.21  6.66
## 12   4543.3  4.95  6.27  6.48
## 13   4611.1  5.69  6.22  6.52
## 14   4686.7  6.62  6.65  7.72
## 15   4764.5  6.47  6.84  8.15
## 16   4883.1  6.40  6.92  8.29
## 17   4948.6  6.40  6.66  7.58
## 18   5059.3  6.73  7.16  8.10
## 19   5142.8  7.65  7.98  8.59
## 20   5251.0  8.57  8.47  8.75
## 21   5360.3  9.30  9.44  9.38
## 22   5453.6 10.20  9.73  8.92
## 23   5532.9 11.11  9.08  8.07
## 24   5581.7 11.92  8.61  7.86
## 25   5708.1 13.35  8.25  8.38
## 26   5797.4 13.55  8.24  8.62
## 27   5850.6 12.10  8.16  8.25
## 28   5846.0 11.91  7.74  7.76
## 29   5880.2 10.65  6.43  7.27
## 30   5962.0  9.33  5.86  7.25
## 31   6033.7 10.29  5.64  6.89
## 32   6092.5  9.12  4.82  5.84
## 33   6190.7  7.32  4.02  5.77
## 34   6295.2  6.53  3.77  5.78
## 35   6389.7  5.61  3.26  4.68
## 36   6493.6  4.42  3.04  5.00
## 37   6544.5  3.54  3.04  4.64
## 38   6622.7  3.28  3.00  4.41
## 39   6688.3  2.59  3.06  4.32
## 40   6813.8  3.25  2.99  4.41
## 41   6916.3  4.43  3.21  4.90
## 42   7044.3  4.25  3.94  6.20
## 43   7131.8  4.17  4.49  6.56
## 44   7248.2  4.00  5.17  7.40
## 45   7307.7  3.52  5.81  7.27
## 46   7355.8  3.67  6.02  6.25
## 47   7452.5  3.29  5.80  5.96
## 48   7542.5  3.45  5.72  5.58
## 49   7638.2  3.04  5.36  5.38
## 50   7800.0  1.60  5.24  6.29
## 51   7892.7  1.62  5.31  6.36
## 52   8023.0  1.28  5.28  5.94
## 53   8137.0  2.17  5.28  6.19
## 54   8276.8  3.69  5.52  6.42
## 55   8409.9  4.10  5.53  6.01
## 56   8505.7  4.40  5.51  5.78
## 57   8600.6  3.89  5.52  5.46
## 58   8698.6  3.84  5.50  5.57
## 59   8847.2  4.03  5.53  5.11
## 60   9027.5  4.22  4.86  4.41
## 61   9148.6  4.71  4.73  4.87
## 62   9252.6  5.09  4.75  5.35
## 63   9405.1  4.57  5.09  5.71
## 64   9607.7  4.50  5.31  6.00
## 65   9709.5  5.10  5.68  6.56
## 66   9949.1  4.48  6.27  6.52
## 67  10017.5  5.39  6.52  6.16
## 68  10129.8  6.04  6.47  5.63
## 69  10165.1  5.15  5.59  4.64
## 70  10301.3  4.73  4.33  4.43
## 71  10305.2  3.80  3.50  3.93
## 72  10373.1  2.95  2.13  3.33
## 73  10498.7  2.83  1.73  3.75
## 74  10601.9  3.05  1.75  3.77
## 75  10701.7  3.05  1.74  2.62
## 76  10766.9  3.00  1.44  2.27
## 77  10888.4  3.15  1.25  2.07
## 78  11008.1  3.10  1.25  1.77
## 79  11255.7  2.71  1.02  2.20
## 80  11416.5  2.69  1.00  2.38
## 81  11597.2  2.48  1.00  2.17
## 82  11778.4  2.35  1.01  2.98
## 83  11950.5  2.84  1.43  2.92
## 84  12144.9  2.62  1.95  3.05
## 85  12379.5  2.80  2.47  3.61
## 86  12516.8  3.05  2.94  3.73
## 87  12741.6  2.61  3.46  3.98
## 88  12915.6  2.62  3.98  4.37
## 89  13183.5  2.70  4.46  4.58
## 90  13347.8  2.81  4.91  4.98
## 91  13452.9  2.90  5.25  4.87
## 92  13611.5  3.14  5.25  4.65
## 93  13789.5  2.90  5.26  4.68
## 94  14008.2  2.32  5.25  4.76
## 95  14158.2  2.18  5.07  4.41
## 96  14291.3  1.85  4.50  3.50
## 97  14328.4  1.45  3.18  2.17
## 98  14471.8  1.59  2.09  2.67
## 99  14484.9  1.58  1.94  2.63
## 100 14191.2  1.54  0.51  1.48
## 101 14049.7  1.65  0.18  1.27
## 102 14034.5  2.09  0.18  1.49
## 103 14114.7  2.32  0.16  1.56
## 104 14277.3  2.59  0.12  1.39
usa.ts <- ts(usa, start=c(1984,1), end=c(2009,4),
               frequency=4)
Dgdp <- diff(usa.ts[,1])
Dinf <- diff(usa.ts[,"inf"])
Df <- diff(usa.ts[,"f"])
Db <- diff(usa.ts[,"b"])
usa.ts.df <- ts.union(gdp=usa.ts[,1], # ts.union() is in the base package stats
                      inf=usa.ts[,2], 
                      f=usa.ts[,3],
                      b=usa.ts[,4],
                      Dgdp,Dinf,Df,Db,
                      dframe=TRUE)

plot(usa.ts.df$f)
points(usa.ts.df$b,type="l",lty=2)

acf(usa.ts.df$f)

adf.test(usa.ts.df$f, k=10)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  usa.ts.df$f
## Dickey-Fuller = -3.3726, Lag order = 10, p-value = 0.06283
## alternative hypothesis: stationary
acf(usa.ts.df$b)

adf.test(usa.ts.df$b, k=10)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  usa.ts.df$b
## Dickey-Fuller = -2.9838, Lag order = 10, p-value = 0.1687
## alternative hypothesis: stationary
fb.dyn <- dynlm(usa.ts.df$b~usa.ts.df$f)
ehat.fb <- resid(fb.dyn)
adf.test(ehat.fb)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ehat.fb
## Dickey-Fuller = -4.0009, Lag order = 4, p-value = 0.01184
## alternative hypothesis: stationary
test <- ur.df(ehat.fb, type="trend", lags=4)
summary(test)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8577 -0.2790 -0.0320  0.2075  1.1325 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.10356    0.09883   1.048 0.297445    
## z.lag.1     -0.31001    0.07748  -4.001 0.000127 ***
## tt          -0.00242    0.00169  -1.432 0.155511    
## z.diff.lag1  0.36889    0.09844   3.747 0.000312 ***
## z.diff.lag2 -0.01423    0.10540  -0.135 0.892884    
## z.diff.lag3  0.21686    0.09715   2.232 0.028021 *  
## z.diff.lag4 -0.07490    0.09910  -0.756 0.451713    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4027 on 92 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2093 
## F-statistic: 5.323 on 6 and 92 DF,  p-value: 9.43e-05
## 
## 
## Value of test-statistic is: -4.0009 5.6324 8.3633 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47
test <- ur.df(ehat.fb, type="none", lags=1)
summary(test)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.91006 -0.31171 -0.08908  0.18918  1.10640 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1     -0.2245     0.0535  -4.196 5.89e-05 ***
## z.diff.lag   0.2540     0.0937   2.711  0.00789 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4173 on 100 degrees of freedom
## Multiple R-squared:  0.1689, Adjusted R-squared:  0.1523 
## F-statistic: 10.16 on 2 and 100 DF,  p-value: 9.598e-05
## 
## 
## Value of test-statistic is: -4.1961 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

In conclusion, we reject the null hypothesis that the residuals have a unit root; therefore, the series f and b are cointegrated.

R has a special function to perform cointegration tests, function po.test in package tseries. (The name comes from the method it uses, the Phillips-Ouliaris test.) The main argument of the function is a matrix having in its first column the dependent variable of the cointegration equation and the independent variables in the other columns. Let me illustrate its application in the case of the same series b and f.

bfx <- as.matrix(cbind(usa.ts.df$b, usa.ts.df$f)) # demean is an argument of po.test(), not as.matrix()
po.test(bfx)
## 
##  Phillips-Ouliaris Cointegration Test
## 
## data:  bfx
## Phillips-Ouliaris demeaned = -20.508, Truncation lag parameter =
## 1, p-value = 0.04986

The PO test marginally rejects the null of no cointegration at the 5 percent level.

3 Vector Error Correction

If two I(1) time series \(Z_t\) and \(Y_t\) are cointegrated, their differences are stationary and can be modeled in a VAR which is augmented by the regressor \(Z_{t−1}− \beta_0 - \beta_1 Y_{t−1}\). This is called a vector error correction model (VEC) and \(Z_t − \beta_0 - \beta_1 Y_t\) is called the error correction term. Lagged values of the error correction term are useful for predicting \(\nabla Z_t\) and/or \(\nabla Y_t\).

\[ \nabla Z_t = c_1 + \gamma_1 (Z_{t-1} - \beta_0 - \beta_1 Y_{t-1}) + a_{1,t}\\ \nabla Y_t = c_2 + \gamma_2 (Z_{t-1} - \beta_0 - \beta_1 Y_{t-1}) + a_{2,t} \]
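Once the error correction term is formed, these two equations can be estimated by OLS; here is a minimal base-R sketch on simulated cointegrated data (the seed and the true parameter values are illustrative assumptions, not results from the text).

```r
# Simulated cointegrated pair with beta0 = 0.5, beta1 = 1 (assumed known here)
set.seed(7)
n <- 400
z <- cumsum(rnorm(n))
y <- 0.5 + z + rnorm(n)

ec  <- y - 0.5 - z        # error correction term
dy  <- diff(y)
dz  <- diff(z)
ecl <- ec[-n]             # lagged error correction term

vec.y <- lm(dy ~ ecl)     # Delta y_t = c1 + gamma1 * ec_{t-1} + error
vec.z <- lm(dz ~ ecl)     # Delta z_t = c2 + gamma2 * ec_{t-1} + error
coef(vec.y); coef(vec.z)  # gamma1 clearly negative: y adjusts back toward z
```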

Example

The dataset gdp includes real GDP series for Australia and the USA for the period 1970:Q1 to 2000:Q4. First we determine the order of integration of the two series.

load("gdp.rda")
gdp
##          usa      aus
## 1    38.3011  38.2355
## 2    38.3734  38.7551
## 3    38.7137  38.7706
## 4    38.2991  38.8948
## 5    39.3615  39.5621
## 6    39.5836  39.6402
## 7    39.8973  40.8614
## 8    40.0114  40.6741
## 9    40.7224  40.2642
## 10   41.6840  41.1712
## 11   42.0813  40.9545
## 12   42.7699  41.4527
## 13   43.8558  42.2458
## 14   44.3631  42.5829
## 15   44.1267  42.8018
## 16   44.5485  44.1979
## 17   44.1624  44.2659
## 18   44.2897  42.9056
## 19   43.8609  43.6730
## 20   43.6887  43.6216
## 21   43.1662  43.7671
## 22   43.4819  45.1109
## 23   44.2184  44.8353
## 24   44.7980  44.2290
## 25   45.8065  45.8493
## 26   46.1477  46.1597
## 27   46.3688  46.6023
## 28   46.7009  46.8704
## 29   47.2652  46.5686
## 30   48.1932  47.2920
## 31   49.0560  47.1294
## 32   49.0509  47.0068
## 33   49.2088  47.1513
## 34   51.1483  47.7485
## 35   51.6525  48.2949
## 36   52.3319  48.8370
## 37   52.4338  50.6371
## 38   52.4837  49.4096
## 39   52.8616  50.0276
## 40   53.0175  51.0850
## 41   53.1866  51.0754
## 42   52.1129  51.1573
## 43   52.0263  51.6731
## 44   52.9910  52.4480
## 45   54.0647  52.5540
## 46   53.6429  53.4546
## 47   54.2918  54.4216
## 48   53.6154  54.4135
## 49   52.7363  53.8024
## 50   53.0195  54.1406
## 51   52.8188  53.8286
## 52   52.8657  52.9478
## 53   53.5176  52.3897
## 54   54.7247  52.3426
## 55   55.8055  53.8554
## 56   56.9474  54.9471
## 57   58.0608  56.2341
## 58   59.0601  56.4813
## 59   59.6346  56.9522
## 60   60.1246  57.7030
## 61   60.6797  58.6346
## 62   61.1982  60.0244
## 63   62.1547  60.7548
## 64   62.6325  60.9999
## 65   63.2315  61.1380
## 66   63.4820  60.7377
## 67   64.0902  61.3044
## 68   64.4131  61.8770
## 69   64.8368  62.6626
## 70   65.5499  63.4032
## 71   66.1448  64.4135
## 72   67.2999  65.7845
## 73   67.6289  66.1885
## 74   68.4887  66.4363
## 75   68.8544  66.9270
## 76   69.7630  67.9759
## 77   70.4710  68.6255
## 78   70.9334  69.8922
## 79   71.4387  70.6371
## 80   71.6200  70.6526
## 81   72.4471  71.4488
## 82   72.6325  71.4836
## 83   72.6376  70.7650
## 84   72.0886  70.9464
## 85   71.7209  70.5541
## 86   72.1864  70.3144
## 87   72.5347  70.2865
## 88   72.8750  70.4235
## 89   73.6298  71.1010
## 90   74.3398  71.1492
## 91   75.0691  71.9904
## 92   75.8963  73.1570
## 93   75.9880  74.0137
## 94   76.3730  74.4343
## 95   76.7652  74.2963
## 96   77.7981  75.8883
## 97   78.5896  77.3219
## 98   79.6143  78.2573
## 99   80.0605  79.1948
## 100  80.9987  79.4731
## 101  81.2238  80.0478
## 102  81.3695  80.7312
## 103  82.0326  81.9085
## 104  82.6326  82.4945
## 105  83.2153  83.8590
## 106  84.5792  84.3439
## 107  85.2882  85.1460
## 108  86.2855  86.0921
## 109  86.9527  86.1376
## 110  88.2739  88.3177
## 111  89.3730  88.7009
## 112  90.0320  89.7535
## 113  91.0283  91.0137
## 114  91.6303  91.7078
## 115  92.6856  93.1574
## 116  94.0934  94.8436
## 117  94.8920  95.4114
## 118  95.6774  96.4576
## 119  96.7938  96.8290
## 120  98.5143  98.4477
## 121  98.7639  99.2766
## 122 100.3150 100.4380
## 123 100.2000 100.6040
## 124 100.7210  99.6807
gdp <- ts(gdp, start=c(1970,1), end=c(2000,4), frequency=4)

ts.plot(gdp[,"usa"],gdp[,"aus"], type="l", 
        lty=c(1,2), col=c(1,2))
legend("topleft", border=NULL, legend=c("USA","AUS"), 
       lty=c(1,2), col=c(1,2))

The two series in levels reveal a common trend and, therefore, suggest that the series are nonstationary.

adf.test(gdp[,"usa"])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  gdp[, "usa"]
## Dickey-Fuller = -0.90827, Lag order = 4, p-value = 0.9492
## alternative hypothesis: stationary
adf.test(gdp[,"aus"])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  gdp[, "aus"]
## Dickey-Fuller = -0.61238, Lag order = 4, p-value = 0.9755
## alternative hypothesis: stationary
adf.test(diff(gdp[,"usa"]))
## Warning in adf.test(diff(gdp[, "usa"])): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(gdp[, "usa"])
## Dickey-Fuller = -4.2929, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
adf.test(diff(gdp[,"aus"]))
## Warning in adf.test(diff(gdp[, "aus"])): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(gdp[, "aus"])
## Dickey-Fuller = -4.4168, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

The stationarity tests indicate that both series are I(1). Let us now test them for cointegration. We regress aus on usa and check the residuals for stationarity.

cint1.dyn <- dynlm(aus~usa-1, data=gdp)
summary(cint1.dyn)
## 
## Time series regression with "ts" data:
## Start = 1970(1), End = 2000(4)
## 
## Call:
## dynlm(formula = aus ~ usa - 1, data = gdp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.72821 -1.06932 -0.08543  0.91592  2.26603 
## 
## Coefficients:
##     Estimate Std. Error t value Pr(>|t|)    
## usa 0.985350   0.001657   594.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.219 on 123 degrees of freedom
## Multiple R-squared:  0.9997, Adjusted R-squared:  0.9996 
## F-statistic: 3.538e+05 on 1 and 123 DF,  p-value: < 2.2e-16
ehat <- resid(cint1.dyn)
plot(ehat)

adf.test(ehat)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ehat
## Dickey-Fuller = -2.7767, Lag order = 4, p-value = 0.2538
## alternative hypothesis: stationary
summary(ur.df(ehat,type="trend",lags = 4))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4582 -0.3237  0.0162  0.3902  1.4508 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.124966   0.122120  -1.023  0.30837   
## z.lag.1     -0.149378   0.053796  -2.777  0.00644 **
## tt           0.001642   0.001676   0.979  0.32951   
## z.diff.lag1  0.011419   0.097235   0.117  0.90673   
## z.diff.lag2  0.055311   0.096916   0.571  0.56934   
## z.diff.lag3  0.072081   0.096755   0.745  0.45784   
## z.diff.lag4 -0.057948   0.096178  -0.603  0.54805   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6127 on 112 degrees of freedom
## Multiple R-squared:  0.08504,    Adjusted R-squared:  0.03602 
## F-statistic: 1.735 on 6 and 112 DF,  p-value: 0.1193
## 
## 
## Value of test-statistic is: -2.7767 2.658 3.9866 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47
summary(ur.df(ehat,type="drift",lags = 4))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.49763 -0.33712  0.01271  0.38747  1.43283 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.01898    0.05657  -0.335  0.73790   
## z.lag.1     -0.14038    0.05300  -2.649  0.00923 **
## z.diff.lag1  0.01405    0.09718   0.145  0.88530   
## z.diff.lag2  0.05693    0.09688   0.588  0.55799   
## z.diff.lag3  0.07521    0.09668   0.778  0.43824   
## z.diff.lag4 -0.05722    0.09616  -0.595  0.55296   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6126 on 113 degrees of freedom
## Multiple R-squared:  0.0772, Adjusted R-squared:  0.03637 
## F-statistic: 1.891 on 5 and 113 DF,  p-value: 0.1014
## 
## 
## Value of test-statistic is: -2.6489 3.5086 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81
summary(ur.df(ehat,type="none",lags = 4))
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.51392 -0.35572 -0.00387  0.36798  1.41733 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## z.lag.1     -0.13830    0.05243  -2.638  0.00951 **
## z.diff.lag1  0.01255    0.09670   0.130  0.89695   
## z.diff.lag2  0.05532    0.09639   0.574  0.56712   
## z.diff.lag3  0.07362    0.09619   0.765  0.44564   
## z.diff.lag4 -0.05866    0.09569  -0.613  0.54107   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6102 on 114 degrees of freedom
## Multiple R-squared:  0.0763, Adjusted R-squared:  0.03579 
## F-statistic: 1.883 on 5 and 114 DF,  p-value: 0.1026
## 
## 
## Value of test-statistic is: -2.638 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The test without constant or trend (type="none") rejects the null of no cointegration, meaning that the series are cointegrated. With cointegrated series we can construct a VEC model to better understand the causal relationship between the two variables.

vecaus<- dynlm(d(aus)~L(ehat), data=gdp)
vecusa <- dynlm(d(usa)~L(ehat), data=gdp)
summary(vecaus)
## 
## Time series regression with "ts" data:
## Start = 1970(2), End = 2000(4)
## 
## Call:
## dynlm(formula = d(aus) ~ L(ehat), data = gdp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82072 -0.41633  0.01881  0.41823  1.73368 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.49171    0.05791   8.491 6.12e-14 ***
## L(ehat)     -0.09870    0.04752  -2.077   0.0399 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6409 on 121 degrees of freedom
## Multiple R-squared:  0.03443,    Adjusted R-squared:  0.02645 
## F-statistic: 4.315 on 1 and 121 DF,  p-value: 0.03989
summary(vecusa)
## 
## Time series regression with "ts" data:
## Start = 1970(2), End = 2000(4)
## 
## Call:
## dynlm(formula = d(usa) ~ L(ehat), data = gdp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.54329 -0.30388  0.03483  0.36117  1.47005 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.50988    0.04668   10.92   <2e-16 ***
## L(ehat)      0.03025    0.03830    0.79    0.431    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5166 on 121 degrees of freedom
## Multiple R-squared:  0.005129,   Adjusted R-squared:  -0.003093 
## F-statistic: 0.6238 on 1 and 121 DF,  p-value: 0.4312

The coefficient on the error correction term ( L(ehat) ) is significant for Australia, suggesting that changes in the US economy do affect the Australian economy; the error correction coefficient in the US equation is not statistically significant, suggesting that changes in Australia do not influence the American economy. To interpret the sign of the error correction coefficient, one should remember that the residual measures the deviation of the Australian economy from its cointegrating level of 0.985 times the US level (see the value of the slope in cint1.dyn).

4 VAR as a Solution to Non-cointegrated Time Series

On the other hand, the VAR model can be used when the two variables \(Z_t\) and \(Y_t\) under study are I(1) but not cointegrated. Here is an example.

Let us look at the income-consumption relationship based on the fred dataset, where consumption and income are already in logs and the period is 1960:Q1 to 2009:Q4. There are 200 observations on two variables: c, the log of real consumption expenditure, and y, the log of real disposable income.

load("fred.rda")
fred
##            c        y
## 1   7.479017 7.578401
## 2   7.491590 7.583807
## 3   7.487622 7.584671
## 4   7.488965 7.584061
## 5   7.488685 7.593122
## 6   7.503455 7.608077
## 7   7.508294 7.621636
## 8   7.528117 7.641084
## 9   7.538708 7.650502
## 10  7.550924 7.661621
## 11  7.558986 7.667158
## 12  7.573069 7.675360
## 13  7.579781 7.683634
## 14  7.589285 7.693071
## 15  7.602801 7.704316
## 16  7.611150 7.720728
## 17  7.630704 7.740490
## 18  7.648120 7.767306
## 19  7.666316 7.780178
## 20  7.669122 7.791688
## 21  7.691109 7.802782
## 22  7.702104 7.813794
## 23  7.719130 7.840942
## 24  7.746863 7.860995
## 25  7.761532 7.870319
## 26  7.764084 7.872722
## 27  7.775486 7.885254
## 28  7.779634 7.896627
## 29  7.785430 7.911471
## 30  7.798974 7.919647
## 31  7.804088 7.929162
## 32  7.810231 7.936446
## 33  7.833838 7.953740
## 34  7.849051 7.970222
## 35  7.867680 7.972225
## 36  7.872265 7.978791
## 37  7.883409 7.980503
## 38  7.889759 7.990543
## 39  7.894579 8.010658
## 40  7.902524 8.017934
## 41  7.908644 8.022930
## 42  7.913228 8.040286
## 43  7.921935 8.053696
## 44  7.919211 8.050416
## 45  7.938160 8.070062
## 46  7.947290 8.084963
## 47  7.955215 8.089390
## 48  7.971707 8.099919
## 49  7.984974 8.106183
## 50  8.003898 8.115551
## 51  8.019218 8.135816
## 52  8.042410 8.174844
## 53  8.060540 8.186102
## 54  8.060035 8.195941
## 55  8.063598 8.200947
## 56  8.060666 8.213084
## 57  8.051883 8.197621
## 58  8.055348 8.188995
## 59  8.059466 8.192128
## 60  8.044723 8.187577
## 61  8.053060 8.183649
## 62  8.069593 8.229324
## 63  8.083761 8.215710
## 64  8.094287 8.225771
## 65  8.114085 8.240385
## 66  8.123202 8.248581
## 67  8.133734 8.257567
## 68  8.146651 8.264724
## 69  8.158029 8.265522
## 70  8.163542 8.276853
## 71  8.173040 8.291747
## 72  8.187994 8.311693
## 73  8.193815 8.317864
## 74  8.214979 8.329272
## 75  8.219164 8.337373
## 76  8.227188 8.345170
## 77  8.232227 8.356062
## 78  8.231642 8.349035
## 79  8.241413 8.354745
## 80  8.244071 8.362712
## 81  8.242335 8.365486
## 82  8.219380 8.351398
## 83  8.230044 8.361638
## 84  8.243283 8.382083
## 85  8.248738 8.379791
## 86  8.248738 8.379906
## 87  8.252785 8.401872
## 88  8.245201 8.403890
## 89  8.251638 8.405054
## 90  8.255231 8.412010
## 91  8.262869 8.416400
## 92  8.280939 8.419889
## 93  8.290694 8.427750
## 94  8.310341 8.435007
## 95  8.327871 8.449920
## 96  8.343601 8.470332
## 97  8.352130 8.491957
## 98  8.366347 8.509040
## 99  8.374015 8.524487
## 100 8.387107 8.533952
## 101 8.403935 8.531490
## 102 8.412988 8.551150
## 103 8.431810 8.544945
## 104 8.434007 8.555240
## 105 8.442319 8.567088
## 106 8.452911 8.578382
## 107 8.470248 8.583599
## 108 8.476246 8.584197
## 109 8.474724 8.590332
## 110 8.488032 8.579417
## 111 8.499111 8.597150
## 112 8.501511 8.611248
## 113 8.518073 8.623731
## 114 8.525320 8.633072
## 115 8.533185 8.640737
## 116 8.544886 8.649712
## 117 8.548556 8.661016
## 118 8.553024 8.656920
## 119 8.563332 8.663369
## 120 8.568209 8.670995
## 121 8.576085 8.678478
## 122 8.579379 8.685061
## 123 8.583168 8.685771
## 124 8.575368 8.679006
## 125 8.572514 8.682080
## 126 8.580112 8.689650
## 127 8.583917 8.691751
## 128 8.583468 8.698247
## 129 8.600523 8.713237
## 130 8.606430 8.720868
## 131 8.617419 8.725929
## 132 8.629557 8.740049
## 133 8.633589 8.725264
## 134 8.643138 8.740705
## 135 8.653942 8.742846
## 136 8.662799 8.757485
## 137 8.673872 8.753466
## 138 8.681266 8.770377
## 139 8.689246 8.777694
## 140 8.699065 8.790878
## 141 8.700231 8.797337
## 142 8.708392 8.797428
## 143 8.717289 8.804895
## 144 8.724305 8.810788
## 145 8.733417 8.821393
## 146 8.744663 8.832092
## 147 8.750620 8.840566
## 148 8.758742 8.846036
## 149 8.768761 8.854936
## 150 8.772796 8.862413
## 151 8.789660 8.873804
## 152 8.801033 8.887584
## 153 8.810937 8.910073
## 154 8.827996 8.924257
## 155 8.841173 8.934719
## 156 8.856390 8.942003
## 157 8.866201 8.948768
## 158 8.881822 8.951051
## 159 8.893765 8.957498
## 160 8.907775 8.973060
## 161 8.922832 8.993850
## 162 8.932186 9.004042
## 163 8.941925 9.014605
## 164 8.950727 9.016100
## 165 8.954712 9.023589
## 166 8.958476 9.020837
## 167 8.962866 9.045996
## 168 8.978408 9.034259
## 169 8.981845 9.060865
## 170 8.986922 9.066343
## 171 8.993676 9.062907
## 172 8.997221 9.065268
## 173 9.002369 9.068927
## 174 9.011621 9.083926
## 175 9.025468 9.097776
## 176 9.030974 9.103468
## 177 9.040453 9.107854
## 178 9.045843 9.117677
## 179 9.054365 9.124336
## 180 9.065800 9.138135
## 181 9.073260 9.125828
## 182 9.082836 9.132703
## 183 9.089934 9.138630
## 184 9.092514 9.144062
## 185 9.103490 9.162599
## 186 9.108861 9.171319
## 187 9.115007 9.176008
## 188 9.124957 9.188994
## 189 9.134010 9.193215
## 190 9.136855 9.194485
## 191 9.141590 9.198662
## 192 9.144585 9.198895
## 193 9.143089 9.192869
## 194 9.143239 9.216223
## 195 9.134291 9.194038
## 196 9.126448 9.202349
## 197 9.127958 9.202953
## 198 9.125762 9.218060
## 199 9.132660 9.208779
## 200 9.136640 9.211190
fred <- ts(fred, start=c(1960,1),end=c(2009,4),frequency=4)
ts.plot(fred[,"c"],fred[,"y"], type="l", 
        lty=c(1,2), col=c(1,2))
legend("topleft", border=NULL, legend=c("c","y"), 
       lty=c(1,2), col=c(1,2))

Are the two series cointegrated?

acf(fred[,"c"])

acf(fred[,"y"])

adf.test(fred[,"c"])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  fred[, "c"]
## Dickey-Fuller = -2.6202, Lag order = 5, p-value = 0.3163
## alternative hypothesis: stationary
adf.test(fred[,"y"])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  fred[, "y"]
## Dickey-Fuller = -2.2905, Lag order = 5, p-value = 0.4544
## alternative hypothesis: stationary
adf.test(diff(fred[,"c"]))
## Warning in adf.test(diff(fred[, "c"])): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(fred[, "c"])
## Dickey-Fuller = -4.713, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
adf.test(diff(fred[,"y"]))
## Warning in adf.test(diff(fred[, "y"])): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(fred[, "y"])
## Dickey-Fuller = -5.7751, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
cointcy <- dynlm(c~y, data=fred)
ehat <- resid(cointcy)
adf.test(ehat)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ehat
## Dickey-Fuller = -2.5617, Lag order = 5, p-value = 0.3409
## alternative hypothesis: stationary

The ACF plots show a long serial correlation sequence; therefore, I will let R calculate the lag order in the ADF test. As the results of the ADF and cointegration tests above show, the series are both I(1), but they fail the cointegration test: the series are not cointegrated. (Please remember that the adf.test function uses a constant and trend in the test equation; therefore, the critical values are not the same as in the textbook. However, the results of the tests should be the same most of the time.)
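For reference, the tseries documentation gives the default lag order of adf.test() as trunc((n - 1)^(1/3)) when k is not supplied, which reproduces the lag orders printed in the output above:

```r
# Default lag order used by adf.test() when k is not supplied
k.default <- function(n) trunc((n - 1)^(1/3))
k.default(124)   # the gdp series: 4, matching "Lag order = 4" above
k.default(200)   # the fred series: 5, matching "Lag order = 5" above
```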

Dc <- diff(fred[,"c"])
Dy <- diff(fred[,"y"])
varmat <- as.matrix(cbind(Dc,Dy))
varfit <- VAR(varmat) # `VAR()` from package `vars`
summary(varfit)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Dc, Dy 
## Deterministic variables: const 
## Sample size: 198 
## Log Likelihood: 1400.444 
## Roots of the characteristic polynomial:
## 0.3441 0.3425
## Call:
## VAR(y = varmat)
## 
## 
## Estimation results for equation Dc: 
## =================================== 
## Dc = Dc.l1 + Dy.l1 + const 
## 
##        Estimate Std. Error t value Pr(>|t|)    
## Dc.l1 0.2156068  0.0747486   2.884  0.00436 ** 
## Dy.l1 0.1493798  0.0577343   2.587  0.01040 *  
## const 0.0052776  0.0007573   6.969 4.81e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.006575 on 195 degrees of freedom
## Multiple R-Squared: 0.1205,  Adjusted R-squared: 0.1115 
## F-statistic: 13.36 on 2 and 195 DF,  p-value: 3.661e-06 
## 
## 
## Estimation results for equation Dy: 
## =================================== 
## Dy = Dc.l1 + Dy.l1 + const 
## 
##         Estimate Std. Error t value Pr(>|t|)    
## Dc.l1  0.4754276  0.0973264   4.885 2.15e-06 ***
## Dy.l1 -0.2171679  0.0751730  -2.889   0.0043 ** 
## const  0.0060367  0.0009861   6.122 4.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.008562 on 195 degrees of freedom
## Multiple R-Squared: 0.1118,  Adjusted R-squared: 0.1027 
## F-statistic: 12.27 on 2 and 195 DF,  p-value: 9.53e-06 
## 
## 
## 
## Covariance matrix of residuals:
##           Dc        Dy
## Dc 4.324e-05 2.508e-05
## Dy 2.508e-05 7.330e-05
## 
## Correlation matrix of residuals:
##        Dc     Dy
## Dc 1.0000 0.4456
## Dy 0.4456 1.0000

Function VAR(), which is part of the package vars (Pfaff 2013), accepts the following main arguments: y=, a matrix containing the endogenous variables in the VAR model; p=, the desired lag order (default is 1); and exogen=, a matrix of exogenous variables. (VAR() is a more powerful tool than I imply here; please type ?VAR for more information.)
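Under the hood, each equation of a VAR is just an OLS regression of one variable on a lag of every variable plus a constant, so the fit can be sketched equation by equation with lm(). The simulated stationary series below are illustrative assumptions, not the fred data (note stats::filter, since dplyr masks filter when the tidyverse is loaded).

```r
# Hand-rolled VAR(1): one lm() per equation, mirroring what VAR() estimates
set.seed(99)
n  <- 200
u1 <- rnorm(n); u2 <- rnorm(n)
x1 <- as.numeric(stats::filter(u1 + 0.4 * u2, 0.3, method = "recursive"))
x2 <- as.numeric(stats::filter(u2, 0.5, method = "recursive"))

x1.l1 <- x1[-n]; x2.l1 <- x2[-n]
eq1 <- lm(x1[-1] ~ x1.l1 + x2.l1)   # analogue of the Dc equation above
eq2 <- lm(x2[-1] ~ x1.l1 + x2.l1)   # analogue of the Dy equation above
coef(eq1); coef(eq2)
```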

5 Spurious Regression

Nonstationarity can lead to spurious regression: an apparent relationship between variables that are, in reality, not related. The following code sequence generates two independent random walk processes, y and x, and regresses y on x.

T <- 1000
set.seed(1357)
y <- ts(rep(0,T))
vy <- ts(rnorm(T))
for (t in 2:T){
  y[t] <- y[t-1]+vy[t]
}

set.seed(4365)
x <- ts(rep(0,T))
vx <- ts(rnorm(T))
for (t in 2:T){
  x[t] <- x[t-1]+vx[t]
}
y <- ts(y[300:1000])
x <- ts(x[300:1000])
ts.plot(y,x, ylab="y and x")

spurious.ols <- lm(y~x)
summary(spurious.ols)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.554  -5.973  -2.453   4.508  24.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.38711    1.61958 -12.588  < 2e-16 ***
## x            -0.28188    0.04331  -6.508 1.45e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.954 on 699 degrees of freedom
## Multiple R-squared:  0.05713,    Adjusted R-squared:  0.05578 
## F-statistic: 42.35 on 1 and 699 DF,  p-value: 1.454e-10

The summary output of the regression shows a highly significant relationship between the two variables, though they have been generated independently. (Not every pair of independently generated random walks will produce a spurious regression, though.) Figure 12.3 depicts the two time series, y and x, and Figure 12.4 shows them in a scatterplot.

plot(x, y, type="p", col="grey")
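Since the problem stems from the unit roots, a natural check is to rerun the regression in first differences, where the spurious relationship should vanish. The sketch below is self-contained with a fresh, arbitrary seed:

```r
# Two fresh independent random walks; the regression in differences
# should show a slope close to zero
set.seed(2468)
n <- 700
y <- cumsum(rnorm(n))
x <- cumsum(rnorm(n))

diffs.ols <- lm(diff(y) ~ diff(x))
coef(diffs.ols)[["diff(x)"]]     # near zero: no spurious relationship left
```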