Time series arise in many fields of economics, especially in macroeconomics and financial economics
The main feature of time series is the temporal ordering of observations, i.e., we have a natural ordering and thus the data may not be arbitrarily reordered
Typical features and problems: serial correlation (non-independence of observations) and non-stationarity, which cause problems for test statistics and asymptotic theory and can lead to the phenomenon of spurious regressions
The outcome of economic variables (e.g. GNP, Dow Jones) is uncertain; they should therefore be modeled as random variables
Thus, time series are seen as sequences of random variables, i.e., stochastic processes
Randomness does not come from sampling from a population - A “sample” is the one realized path of the time series out of the many possible paths the stochastic process could have taken
Time series analysis focuses on modeling the dependency of a variable on its own past, and on the present and past values of other variables
8.1.1 Several popular time series models, example: Phillips Curve
Static model
In static time series models, the current value of one variable is modeled as the result of the current values of explanatory variables
(\pi_t \ldots inflation rate at time t, \; u_t \ldots unemployment rate at time t, \; e_t \ldots error term at time t)
Long run effect, i.e., the effect of a permanent shock of unemployment on inflation: \dfrac {\beta_1+\beta_2}{1-\beta_3}
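A sketch of where this long-run multiplier comes from, assuming a dynamic specification of the form \pi_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t-1} + \beta_3 \pi_{t-1} + e_t (the exact specification is not reproduced here): in a steady state with \pi_t = \pi_{t-1} = \pi and u_t = u_{t-1} = u we have
\pi = \beta_0 + (\beta_1 + \beta_2)\, u + \beta_3 \, \pi \ \ \Rightarrow \ \ \dfrac{\partial \pi}{\partial u} = \dfrac{\beta_1+\beta_2}{1-\beta_3}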
8.1.2 Implementation of time series data in R
Dealing with time series in R is a little bit more tricky than dealing with cross section data. The reason is that the program must know the beginning and end dates (times) of the time series and whether there are some “holes” in the data, in order to handle time series specific operations like lags or seasonal effects
Thus, time series need some additional infrastructure to do this job
The built-in infrastructure in R for time series is the class “ts”. Basically, ts-objects are numeric data vectors or matrices, augmented with an additional attribute, “tsp”, which stores the starting time, end time and frequency of the data; 1 for yearly, 4 for quarterly and 12 for monthly data. The disadvantage of this class is that it can only deal with “regular” data, meaning equispaced data points with no “holes”, and it does not support weekly, daily or high frequency data
Alternative infrastructures are, among others, the “zoo” class and the “xts” class. Both classes augment the data vectors or matrices with an additional index variable, which maps the corresponding time or date to each data point. Both classes support non-regular (irregularly spaced) data as well as weekly, daily and high frequency data. Furthermore, merging data with different ranges is possible, as is aggregating data to another frequency. The disadvantage of these classes is that they are a bit more complicated to set up – you generally have to deal with Date classes, especially for the xts class. However, often an appropriate index variable is included in the data set
You want to have functions which support the special features of time series. For instance, the workhorse function lm() does not support basic time series operations like lags or differences. You can use dynlm() from the package dynlm instead, which supports time trends, seasonal dummies and the d() (difference) and L() (lag) operators
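A minimal sketch of both infrastructures and of dynlm() with simulated data (all object names are made up for illustration):
library(zoo)     # "zoo" class for (ir)regular time series
library(dynlm)   # lm() with the time series operators d(), L() and trend()

# "ts": regular quarterly series starting in 1990 Q1
y_ts <- ts(rnorm(40), start = c(1990, 1), frequency = 4)
tsp(y_ts)        # the "tsp" attribute: start, end, frequency

# "zoo": the same data with an explicit yearqtr index
y_zoo <- zoo(coredata(y_ts), order.by = as.yearqtr(time(y_ts)))
x_zoo <- zoo(rnorm(40), order.by = index(y_zoo))
dat <- merge(y_zoo, x_zoo)          # one multivariate zoo object
colnames(dat) <- c("y", "x")

# dynlm() understands differences d() and lags L() of both classes
summary(dynlm(d(y) ~ L(d(y), 1) + L(x, 1), data = dat))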
8.1.3 Example: Analysis of Wage and Price Inflation Interactions with R
# zoo representation
l <- length(wageprc$lprice)  # we need the length of the data
# As there is no particular index variable in the data, we have to define an index variable m
# We begin at 1980 M3. Thus we add to 1980: 2/12 : (2+l-1)/12
# (For more info about zoo and dates: http://www.princeton.edu/~otorres/TimeSeriesR101.pdf)
m <- as.yearmon(1980 + seq(from = 2, to = 2 + (l - 1)) / 12)
data <- zoo(wageprc[, c(4, 5)], order.by = m)
class(data)
[1] "zoo"
data[1:8,]
lprice lwage
Mar 1980 4.528289 0.8415672
Apr 1980 4.527209 0.8415672
May 1980 4.528289 0.8415672
Jun 1980 4.529368 0.8501509
Jul 1980 4.529368 0.8544153
Aug 1980 4.531524 0.8586616
Sep 1980 4.533674 0.8586616
Oct 1980 4.532599 0.8628899
We are regressing \Delta lwage_t on several lags of \Delta lwage_t and several lags of \Delta lprice_{t} (and likewise some lags of additional control variables) and check whether the lags of \Delta lprice_{t} have additional predictive power for \Delta lwage_t, beyond (conditional on) the lags of \Delta lwage_t (and the other lagged control variables). If this is the case, we say that \Delta lprice is Granger-causal for \Delta lwage, or \Delta lprice \Rightarrow \Delta lwage
summary(wage <- dynlm(d(lwage) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4), data = data))
Time series regression with "zoo" data:
Start = Aug 1980, End = Dec 2003
Call:
dynlm(formula = d(lwage) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4),
data = data)
Residuals:
Min 1Q Median 3Q Max
-0.0122246 -0.0028638 -0.0005247 0.0018289 0.0135806
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0031931 0.0006746 4.733 3.56e-06 ***
L(d(lprice), 1:4)1 0.1519367 0.1132631 1.341 0.180894
L(d(lprice), 1:4)2 -0.0348319 0.1232691 -0.283 0.777723
L(d(lprice), 1:4)3 0.3106324 0.1239067 2.507 0.012760 *
L(d(lprice), 1:4)4 0.0349598 0.1150571 0.304 0.761476
L(d(lwage), 1:4)1 -0.1942465 0.0596269 -3.258 0.001266 **
L(d(lwage), 1:4)2 -0.1303027 0.0598510 -2.177 0.030332 *
L(d(lwage), 1:4)3 0.0017028 0.0599862 0.028 0.977375
L(d(lwage), 1:4)4 0.2125389 0.0589520 3.605 0.000371 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.004549 on 272 degrees of freedom
Multiple R-squared: 0.1637, Adjusted R-squared: 0.1391
F-statistic: 6.655 on 8 and 272 DF, p-value: 5.943e-08
Conditioned on lagged wage inflation, does lagged price inflation affect current wage inflation?
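The F-test below was presumably produced with linearHypothesis() from the car package (loaded with AER); a sketch of such a call, testing that all four lags of \Delta lprice have zero coefficients in the wage equation:
# joint test of all coefficients whose name contains "lprice"
linearHypothesis(wage, matchCoefs(wage, "lprice"))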
Linear hypothesis test
Hypothesis:
L(d(lprice),4)1 = 0
L(d(lprice),4)2 = 0
L(d(lprice),4)3 = 0
L(d(lprice),4)4 = 0
Model 1: restricted model
Model 2: d(lwage) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 276 0.0060694
2 272 0.0056277 4 0.00044169 5.337 0.0003755 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Result: Price inflation does Granger cause wage inflation (has predictive power for wage inflation)
Problems: Instantaneous causality, missing additional control variables and expectation effects - post hoc, ergo propter hoc fallacy
We are regressing \Delta lprice_t on the very same variables as above and check whether the lags of \Delta lwage_{t} have additional predictive power for \Delta lprice_t, beyond (conditional on) the lags of \Delta lprice_t (and the other lagged control variables). If this is the case, we say that \Delta lwage is Granger-causal for \Delta lprice, or \Delta lwage \Rightarrow \Delta lprice
summary(price <- dynlm(d(lprice) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4), data = data))
Time series regression with "zoo" data:
Start = Aug 1980, End = Dec 2003
Call:
dynlm(formula = d(lprice) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4),
data = data)
Residuals:
Min 1Q Median 3Q Max
-0.0089767 -0.0012300 -0.0001126 0.0013557 0.0134337
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0008842 0.0003622 2.441 0.01528 *
L(d(lprice), 1:4)1 0.4398115 0.0608146 7.232 4.84e-12 ***
L(d(lprice), 1:4)2 0.1959992 0.0661871 2.961 0.00333 **
L(d(lprice), 1:4)3 0.0936098 0.0665294 1.407 0.16056
L(d(lprice), 1:4)4 0.0514227 0.0617778 0.832 0.40592
L(d(lwage), 1:4)1 0.0241188 0.0320156 0.753 0.45189
L(d(lwage), 1:4)2 -0.0332092 0.0321359 -1.033 0.30234
L(d(lwage), 1:4)3 -0.0039232 0.0322085 -0.122 0.90314
L(d(lwage), 1:4)4 0.0455599 0.0316532 1.439 0.15120
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.002442 on 272 degrees of freedom
Multiple R-squared: 0.4937, Adjusted R-squared: 0.4788
F-statistic: 33.16 on 8 and 272 DF, p-value: < 2.2e-16
Conditioned on lagged price inflation, does lagged wage inflation affect current price inflation?
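Analogously, the F-test below can be obtained by testing that all four lags of \Delta lwage have zero coefficients in the price equation (a sketch, not the original call):
linearHypothesis(price, matchCoefs(price, "lwage"))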
Linear hypothesis test
Hypothesis:
L(d(lwage),4)1 = 0
L(d(lwage),4)2 = 0
L(d(lwage),4)3 = 0
L(d(lwage),4)4 = 0
Model 1: restricted model
Model 2: d(lprice) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 276 0.0016500
2 272 0.0016224 4 2.7552e-05 1.1548 0.3312
Result: Wage inflation does not Granger cause price inflation (has no predictive power for price inflation)
Problems: Instantaneous causality, missing additional control variables and expectation effects - post hoc, ergo propter hoc fallacy
8.2 Auto-correlated Errors
One of the most common problems in regressions with time series is autocorrelation of the errors. This violates assumption MLR.5: (compare the chapter “Multiple Regression”, slide 48)
\operatorname{Var}(\mathbf u \, | \, \mathbf X) = E (\mathbf{u u}' \, | \, \mathbf{X}) = \sigma^2 \mathbf I
\tag{8.1}
This has basically three consequences for the OLS estimates
OLS estimates are not consistent any more if, in addition to autocorrelated errors, the explanatory variables are not strictly exogenous, E(u_t \mid \mathbf X)=0
This is especially the case for lagged endogenous variables y_{t-s} as right hand side variables!
The usual formula for the covariance matrix of \hat {\boldsymbol \beta} is invalid, and therefore so are all test statistics
A remedy for this is a Heteroskedasticity and Autocorrelation Consistent covariance matrix estimate, HAC covariance matrix, leading to HAC standard errors for \hat {\boldsymbol \beta} and asymptotically valid tests
These are computed in a manner similar to, but much more complicated than, the Heteroskedasticity Consistent covariance matrix, HC (based on the residuals \hat u_t, we have to estimate the autocorrelations of the errors over time; see the chapter “Heteroskedasticity”). As a formal derivation of this is quite tedious, we simply point to the literature, like Green (2017)
OLS loses its efficiency
8.2.1 HAC Covariance Matrix
Regarding the second problem, we try to estimate the true covariance matrix of \hat {\boldsymbol \beta}. Basically, the residuals of the regression are used to estimate (\mathbf X' \boldsymbol \Omega \mathbf X), with \hat \sigma^2 \boldsymbol \Omega being the covariance matrix of the error term u_t, leading to an estimate of the covariance matrix for \hat {\boldsymbol \beta}
Using this HAC estimate of the covariance matrix for \hat {\boldsymbol \beta}, which is often called Newey-West estimator, the usual tests can be carried out
The big advantages of using HAC standard errors are that
it is easily implemented (usually by an option in the software package)
it works with heteroskedasticity as well as autocorrelation in the errors
we do not have to assume a particular form of heteroskedasticity or autocorrelation
Major disadvantages
Tests are only asymptotically valid (meaning we need a lot of observations)
OLS estimates of \boldsymbol \beta remain inefficient
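A minimal, self-contained sketch of HAC (Newey-West) standard errors with the sandwich and lmtest packages (the data generating process is made up for illustration; see Section 8.4 for the application to Okun's Law):
library(sandwich)   # vcovHAC(), NeweyWest()
library(lmtest)     # coeftest()

set.seed(1)
n <- 200
x <- rnorm(n)
u <- as.numeric(stats::filter(rnorm(n), filter = 0.7, method = "recursive"))  # AR(1) errors
y <- 1 + 0.5 * x + u
fit <- lm(y ~ x)

coeftest(fit)                           # usual (here invalid) IID standard errors
coeftest(fit, vcov. = NeweyWest(fit))   # HAC / Newey-West standard errors
coeftest(fit, vcov. = vcovHAC(fit))     # alternative HAC estimator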
8.2.2 Loss of Efficiency with Auto-Correlated Errors \Rightarrow FGLS
Regarding the third problem, we already know that autocorrelated errors lead to a loss of efficiency; OLS estimates are not BLUE any more!
A possible remedy for this is to use Feasible Generalized Least Squares (FGLS), as we did in the case of heteroskedastic errors (through weighting the observations to offset the varying variances \to WLS)
For doing this, one needs a hypothesis about the particular form of the autocorrelation. Usually, one assumes an autoregressive process of order one or two for the error term. (Later, we will discuss such processes more deeply.) Suppose e_t is not autocorrelated and u_t follows an autoregressive process of order one, u_t = \rho u_{t-1} + e_t
Applying FGLS, in a first step, we have to estimate \rho with the help of a regression of the OLS residuals on the residuals of the previous period (according to the assumed error model above)
Having an estimate for \rho, in a second step, we transform the original model, y_t = \mathbf x_t \boldsymbol \beta + u_t, u_t autocorrelated, in the following way
Note, the error term e_t of this model, with the transformed variables y_t^* and \mathbf x_t^*, is not autocorrelated. Hence, the parameters \boldsymbol \beta can be estimated efficiently with this transformed model
This model transformation (taking quasi-differences of the variables) is called Koyck transformation and leads to an FGLS estimator, which is called the Cochrane-Orcutt estimator (a small R sketch of this two-step procedure follows at the end of this subsection)
Remark: One can use the estimated parameters of this transformed model and plug these back into the original model to compute new residuals. With these new residuals we can get an updated estimate for \rho and transform the variables once more, but this time with this new \hat \rho. This algorithm can be repeated until \hat \rho doesn’t change any more. However, it is not guaranteed that the iterations just described actually lead to better (more efficient) estimates; thus, often no iterations are performed
Applying the Cochrane-Orcutt procedure, at least one (the first) observation is lost (because of the lags used in the transformation). The Prais-Winsten procedure avoids this loss of efficiency by weighting the first observation with \sqrt {1 - \hat \rho^2}
Both methods only work if the explanatory variables \mathbf x_t are strictly exogenous, as otherwise the estimate \hat \rho from the first step is even asymptotically biased (but there is a remedy for this, which is discussed in the section about “testing for autocorrelation”)
There are several other methods for obtaining FGLS estimates. By means of these procedures, \rho and \boldsymbol \beta are estimated simultaneously, either by Maximum Likelihood estimation or with restricted non-linear least squares
For both approaches, one has to rely on numerical methods. As with modern computers there are no real costs in terms of computation time, these numerical methods are the ones typically used in today’s software packages
The non-linear approach is based on the transformed model, but rearranged, and estimates
Note, the coefficient vector of \mathbf x_{t-1} is non-linear in the parameters \boldsymbol \beta and \rho and furthermore, non-linear restrictions are imposed on them
These restrictions can be tested against a general autoregressive/distributed lag model with non-autocorrelated errors – common root test
This approach only requires weak exogeneity of the explanatory variables and not strict exogeneity to yield consistent estimates
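A rough sketch of the two-step Cochrane-Orcutt idea described above, with simulated data and dynlm() (not the original code of this chapter):
library(dynlm)

set.seed(2)
n <- 200
x <- ts(rnorm(n))
u <- ts(as.numeric(stats::filter(rnorm(n), filter = 0.6, method = "recursive")))  # AR(1) error
y <- 1 + 0.5 * x + u
dat <- ts.union(y = y, x = x)

# Step 1: OLS, then estimate rho by regressing residuals on lagged residuals
ols  <- dynlm(y ~ x, data = dat)
uhat <- ols$residuals
rho  <- coef(dynlm(uhat ~ L(uhat) - 1))[1]   # slope of the residual regression (no intercept)

# Step 2: re-estimate with quasi-differenced variables (Cochrane-Orcutt)
summary(dynlm((y - rho * L(y)) ~ I(x - rho * L(x)), data = dat))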
8.2.3 Significance of Auto-Correlated Residuals
Generally, autocorrelation in the residuals indicates that the model does not exploit all the information contained in the data
Autocorrelation in the residuals of an estimated regression model can have several causes
A general misspecification of the model, e.g., omitted variables or non-linearities not taken into account
A misspecified dynamic structure of the model, i.e., omitted lags of the explanatory variables or missing autoregressive terms like y_{t-1}. In this case, the model is not dynamically complete, i.e., the error term is correlated with lags of the explanatory variables or missing autoregressive terms
Structural breaks or other non-stationarities
Autocorrelated errors
Hence, before applying the remedies for autocorrelated errors as discussed above, one has to check whether the first three causes for autocorrelated residuals can be ruled out – if not, these have to be tackled first!
As autocorrelated residuals might be a strong signal for all these problems mentioned above, it is important to have a test procedure for detecting autocorrelation
8.3 Testing for Autocorrelation
8.3.1 The Durbin-Watson Test
The simplest method to check for autocorrelated residuals is to regress the residuals of the original model on the lagged residuals and test whether \rho = 0 with a simple t-test:
\hat u_t = \rho \hat u_{t-1} + e_t
\tag{8.5}
This asymptotically valid test is particularly powerful against an autoregressive process of order one in the error term. But to some extent, it can detect more general forms of autocorrelations too, as long as these lead to a correlation between \hat u_t and \hat u_{t-1} – for instance even structural breaks
Another asymptotically equivalent variant of this test is the very popular Durbin-Watson test with the test statistic
DW = \dfrac{\sum_{t=2}^{n} (\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{n} \hat u_t^2} \approx 2\,(1-\hat \rho)
The numerator is small (DW<2) if consecutive residuals stick together (have the same sign), which means that they are positively autocorrelated
The numerator is large (DW>2) if the residuals change the sign very often, which means that they are negatively autocorrelated
Durbin and Watson have tabulated the distribution of DW even for small samples; it depends on n and k and further exhibits a range of indeterminacy, which lowers its practicality
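A minimal sketch of the DW statistic computed directly from a residual vector (dwtest() from the lmtest package does this, including p-values, and is used in Section 8.4):
# DW statistic: squared changes of the residuals relative to their squared level;
# values near 2 indicate no first order autocorrelation
dw_stat <- function(uhat) sum(diff(uhat)^2) / sum(uhat^2)

set.seed(3)
e     <- rnorm(200)                                               # white noise residuals
u_pos <- as.numeric(stats::filter(e, 0.7, method = "recursive"))  # positively autocorrelated
dw_stat(e)      # close to 2
dw_stat(u_pos)  # clearly below 2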
8.3.2 The Breusch-Godfrey test
However, both of the test procedures above only work if all explanatory variables of the original model are strictly exogenous. Suppose this is not the case and the original model is
y_t = \mathbf x_t \boldsymbol \beta + \gamma \, y_{t-1} + u_t
with the lagged endogenous variable y_{t-1} and an autocorrelated error u_t=\rho u_{t-1}+e_t
In this case, because y_{t-1} is by definition strongly correlated with u_{t-1}, the parameter estimate \hat \gamma will capture most of the autocorrelation in u_t. This implies that the residuals \hat u_t of this regression will look much more like a non-autocorrelated process. This is due to the first problem of autocorrelated errors mentioned above in Section 8.2
So, if we run a regression \hat u_t = \rho \hat u_{t-1} + e_t to test for autocorrelation, the estimate \hat \rho will therefore be heavily biased toward zero, rendering the whole test procedure pointless
However, the argument above also points to a remedy for this problem: why not directly estimate, in some way, the correct version of the model above?
So, we try the following specification for the test equation (compare to Equation 8.5):
\hat u_t = \mathbf x_t \boldsymbol \delta_1 + \delta_2 \, y_{t-1} + \rho \hat u_{t-1} + e_t
Although not obvious, one can show that this procedure works, even if we have weakly exogenous variables like y_{t-1} and therefore “biased” residuals from the original model, see Breusch and Godfrey (1981). The reason is that \rho is now estimated conditional on y_{t-1} and \mathbf x_t (FWL theorem), thereby correcting for the “biased” residuals. Note, if \rho=0, \boldsymbol \delta_1 and \delta_2 have to be zero as well! (orthogonality property of OLS)
The test is whether \rho = 0, which can be examined with a usual t-test, or with an F-test if we test for higher order autocorrelations with \rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \ldots \, as autoregressive terms in the test equation. In the latter case we test whether \rho_1 = \rho_2 = \ldots = 0
More popular and powerful is a Lagrange Multiplier version of the test which is called Breusch-Godfrey test. In this case, the test statistic is n \cdot R^2, with the R^2 from the above test equation; (basically, this tests whether all coefficients of the test equation are zero and might therefore be more powerful). The test statistic is \chi^2 distributed with degrees of freedom equal to the number of autoregressive terms in the test equation
The Breusch-Godfrey test is today’s standard test for autocorrelation. It can be used with weak exogenous variables, is flexible as it can test for higher order autocorrelations and could easily be made robust against heteroskedasticity using HC standard errors
8.4 Example: Estimating a dynamic version of Okun’s Law
Reading in data from Excel table and converting them to time series
library(zoo)
library(AER)
library(dynlm)
# For reading in Excel tables
library(readxl)
# Reading in Excel table with data from 1957 Q1 - 2013 Q4
USMacro <- read_xlsx("us_macro_quarterly.xlsx")
New names:
• `` -> `...1`
# The first column of the data is a string variable of dates with format 1957:01,
# but had no name. It automatically gets the name: ...1
# Define date variable from the first column with the zoo function as.yearqtr()
date <- as.yearqtr(USMacro$...1, format = "%Y:0%q")
# Convert to zoo data, but important: remove the first column as it is a non-numeric variable
# Remember, time series objects are matrices and can contain only numbers
usdata <- zoo(USMacro[, c("GDPC96", "UNRATE")], order.by = date)
class(usdata)
[1] "zoo"
Defining and plotting the data
# We want to estimate Okun's Law, the relationship between the change in the unemployment
# rate and the GDP growth rate
dgdp <- diff(log(usdata$GDPC96)) * 100  # define quarterly growth rates of gdp
usdata <- cbind(usdata, dgdp)           # merging to usdata, not necessary in this case
plot(cbind(usdata$dgdp, usdata$UNRATE))
Estimation
# Estimating Okun's Law with dynlm() (without an autoregressive term L(d(UNRATE)))
model0 <- dynlm(d(UNRATE) ~ L(dgdp, 1:3) + trend(usdata), data = usdata)
summary(model0)
Time series regression with "zoo" data:
Start = 1958 Q1, End = 2013 Q4
Call:
dynlm(formula = d(UNRATE) ~ L(dgdp, 1:3) + trend(usdata), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.84119 -0.15431 -0.01462 0.14159 1.19502
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3593795 0.0485308 7.405 2.79e-12 ***
L(dgdp, 1:3)1 -0.2124797 0.0223698 -9.499 < 2e-16 ***
L(dgdp, 1:3)2 -0.0961287 0.0232396 -4.136 5.03e-05 ***
L(dgdp, 1:3)3 -0.0268484 0.0222796 -1.205 0.22948
trend(usdata) -0.0008135 0.0002941 -2.766 0.00616 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2772 on 219 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.4274, Adjusted R-squared: 0.4169
F-statistic: 40.86 on 4 and 219 DF, p-value: < 2.2e-16
Plotting residuals and autocorrelation function for residuals
# Plotting residuals; they look somewhat positively autocorrelated
plot(model0$residuals, main = "Residuals - Base Model")
abline(0, 0)
acf(model0$residuals)
Testing for autocorrelation in residuals
# Durbin-Watson test points to autocorrelation
dwtest(model0)
Durbin-Watson test
data: model0
DW = 1.6085, p-value = 0.001252
alternative hypothesis: true autocorrelation is greater than 0
# Simple test for autocorrelation, regressing uhat on lagged uhat;
# clear sign of autocorrelation
u0 <- model0$residuals
coefs <- coeftest(dynlm(u0 ~ L(u0)))
coefs
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0040673 0.0177612 -0.2290 0.819083
L(u0) 0.1722748 0.0647150 2.6621 0.008337 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Breusch-Godfrey test (option fill = NA seems to be necessary);
# clear sign of autocorrelation
bgtest(model0, fill = NA)
Breusch-Godfrey test for serial correlation of order up to 1
data: model0
LM test = 17.027, df = 1, p-value = 3.685e-05
8.4.1 First remedy: Testing results with HAC standard errors
But caution, you should also consider misspecifications, as these lead to autocorrelated errors, see Section 8.2.3
# First remedy for autocorrelated errors: HAC standard errors,
# correcting biased standard errors, t-statistics and p-values
# But caution, you should also consider misspecifications, as these lead to autocorrelated errors too
coeftest(model0, vcov. = vcovHAC)
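The FGLS step below relies on an estimate rho from the simple residual regression above; the line actually storing it is not shown in this extract, but it presumably takes the slope coefficient of L(u0), e.g.:
# assumed: rho = estimated AR(1) coefficient of the residuals (row L(u0), column Estimate)
rho <- coefs[2, 1]
rho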
# Estimate Okun's Law by FGLS. But this depends on a consistent estimate of rho,
# which we have stored above
model0GLS <- dynlm((d(UNRATE) - rho * L(d(UNRATE))) ~ L((dgdp - rho * L(dgdp)), 1:3) + trend(usdata), data = usdata)
summary(model0GLS)
Time series regression with "zoo" data:
Start = 1958 Q2, End = 2013 Q4
Call:
dynlm(formula = (d(UNRATE) - rho * L(d(UNRATE))) ~ L((dgdp -
rho * L(dgdp)), 1:3) + trend(usdata), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.8008 -0.1405 -0.0207 0.1361 1.1586
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2435411 0.0461056 5.282 3.08e-07 ***
L((dgdp - rho * L(dgdp)), 1:3)1 -0.1616469 0.0215529 -7.500 1.59e-12 ***
L((dgdp - rho * L(dgdp)), 1:3)2 -0.1044128 0.0213147 -4.899 1.88e-06 ***
L((dgdp - rho * L(dgdp)), 1:3)3 -0.0274345 0.0212663 -1.290 0.1984
trend(usdata) -0.0004783 0.0002819 -1.697 0.0912 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2636 on 218 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.3163, Adjusted R-squared: 0.3038
F-statistic: 25.22 on 4 and 218 DF, p-value: < 2.2e-16
Okun’s Law: Comparing OLS, OLS with HAC-CovMat and FGLS (t-statistics in brackets)
                 OLS          OLS w HAC    FGLS
L1_gGDP          -0.212***    -0.212***    -0.162***
                 (-9.499)     (-5.422)     (-7.500)
L2_gGDP          -0.096***    -0.096***    -0.104***
                 (-4.136)     (-3.996)     (-4.899)
L3_gGDP          -0.027       -0.027       -0.027
                 (-1.205)     (-0.989)     (-1.290)
trend(usdata)    -0.001**     -0.001*       0.000+
                 (-2.766)     (-2.286)     (-1.697)
Num.Obs.          224          224          223
R2                0.427        0.427        0.316
Std.Errors        IID          Custom       IID
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Estimation with a lagged endogenous variable (autoregressive term)
armodel <- dynlm(d(UNRATE) ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata), data = usdata)
summary(armodel)
Time series regression with "zoo" data:
Start = 1958 Q1, End = 2013 Q4
Call:
dynlm(formula = d(UNRATE) ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata),
data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.79307 -0.13517 -0.02653 0.13349 1.07227
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1815164 0.0640424 2.834 0.00502 **
L(dgdp, 1:3)1 -0.1313215 0.0293821 -4.469 1.26e-05 ***
L(dgdp, 1:3)2 -0.0438121 0.0258604 -1.694 0.09166 .
L(dgdp, 1:3)3 0.0041877 0.0228314 0.183 0.85464
L(d(UNRATE)) 0.3578653 0.0877720 4.077 6.39e-05 ***
trend(usdata) -0.0004030 0.0003015 -1.337 0.18269
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2678 on 218 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.468, Adjusted R-squared: 0.4557
F-statistic: 38.35 on 5 and 218 DF, p-value: < 2.2e-16
Plotting residuals
They don't look positively autocorrelated; but note the possible heteroskedasticity!
u <- armodel$residuals
plot(u, main = "Residuals - Autoregressive Model")
abline(0, 0)
Simple test for autocorrelation in model with autoregressive term
Seems to be no autocorrelation – but this test is biased (why?)
usdata <- cbind(usdata, u)  # merging to usdata, but not necessary in this case
summary(dynlm(u ~ L(u), data = usdata))
Time series regression with "zoo" data:
Start = 1958 Q2, End = 2013 Q4
Call:
dynlm(formula = u ~ L(u), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.77741 -0.13650 -0.02195 0.13431 1.06558
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.003797 0.017393 -0.218 0.827
L(u) 0.019182 0.065716 0.292 0.771
Residual standard error: 0.2597 on 221 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.0003854, Adjusted R-squared: -0.004138
F-statistic: 0.0852 on 1 and 221 DF, p-value: 0.7706
Breusch-Godfrey test for autocorrelation
Points to strong autocorrelation of order = 1
BG <- dynlm(u ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata) + L(u), data = usdata)
summary(BG)
Time series regression with "zoo" data:
Start = 1958 Q2, End = 2013 Q4
Call:
dynlm(formula = u ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata) +
L(u), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.70784 -0.13588 -0.00744 0.13031 1.03772
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1635626 0.0955163 1.712 0.0883 .
L(dgdp, 1:3)1 0.0135892 0.0285452 0.476 0.6345
L(dgdp, 1:3)2 -0.1271705 0.0529555 -2.401 0.0172 *
L(dgdp, 1:3)3 -0.0530382 0.0315176 -1.683 0.0939 .
L(d(UNRATE)) -0.5636989 0.2307361 -2.443 0.0154 *
trend(usdata) -0.0002997 0.0003356 -0.893 0.3728
L(u) 0.6102879 0.2362478 2.583 0.0104 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2585 on 216 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.03231, Adjusted R-squared: 0.005429
F-statistic: 1.202 on 6 and 216 DF, p-value: 0.3064
Breusch-Godfrey LM-test for autocorrelation, order = 1, chi^2 variant
# by hand
# n * R^2
nR <- length(BG$residuals) * summary(BG)$r.squared
# p-value
pnR <- 1 - pchisq(nR, df = 1)
# Printing result
print(paste("Breusch-Godfrey Test, order=1: n * R^2 =", sprintf("%.3f", nR), " p-value =", sprintf("%.4f", pnR)))
# Attention: the R function bgtest() seems to not work properly
# without the option fill = NA !!!
bgtest(armodel, order = 1, fill = NA)
Breusch-Godfrey test for serial correlation of order up to 1
data: armodel
LM test = 7.2514, df = 1, p-value = 0.007085
8.5 General Properties of OLS with Time Series
Under our standard assumptions MLR.1 – MLR.5, OLS estimates are (see Wooldridge (2019))
Unbiased
Efficient - Gauss-Markov Theorem
Consistent
Asymptotically normally distributed
With MLR.6, normally distributed in finite samples
However, assumption MLR.2 – Random Sample – is almost never fulfilled with time series! This assumption is therefore replaced by TSMLR.2: Weak Dependence
Weak dependence means that correlations between successive observations become smaller with more time distance between the observations and eventually converge to zero (fast “enough”)
Under MLR.4 – strong exogeneity – E(u_t|\mathbf X)=0, OLS estimators are unbiased
However, contrary to the case of cross section data and random samples, in a time series context this is an extremely strong assumption. It means that u_t is uncorrelated not only with \mathbf x_t, but also with all past and future values \mathbf x_{t+s}, \mathbf x_{t-s}. This rules out all autoregressive models with y_{t-s} as explanatory variable, as current y_{t}, which is obviously correlated with u_t, is part of the future regressors \mathbf x_{t+s} in this case. Furthermore, policy feedback is ruled out too
Hence, generally only weak exogeneity is assumed: E(u_t|\mathbf x_t)=0. This, together with weak dependence, is sufficient for consistency and asymptotic normality, but not for unbiasedness
Furthermore, MLR.1 (linearity in parameters) is modified as we additionally assume that all variables follow a (weak) stationary stochastic process
A stochastic process, \{\ldots, y_{t-1}, \mathbf x_{t-1}, u_{t-1},\ \ y_t, \mathbf x_t, u_t, \ \ y_{t+1}, \mathbf x_{t+1}, u_{t+1}, \ldots \}, is a sequence of random variables over time
Strong (or strict) stationarity means that the joint distribution of the random variables is independent of time, meaning the distribution F(y_t, \mathbf x_t, u_t)=F(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau
Weak stationarity (covariance stationarity) means that only the first two moments of the joint distribution – expectations, variances and (auto)covariances – are independent of time and are finite for all t. For instance, we have: E(y_t, \mathbf x_t, u_t)=E(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau, \text{ } E(y_t \cdot y_s)=E(y_{t+\tau} \cdot y_{s+\tau}), \, \forall \tau,s, \ E(y_{t} \cdot \mathbf x_{s})=E(y_{t+\tau} \cdot \mathbf x_{s+\tau}), \, \forall \tau,s, \ \ldots
This stationarity assumption, and its related assumption of weak dependence, are needed to be able to apply the LLN (Theorem A.2) and the CLT (Theorem A.3) and are therefore important for the asymptotic properties of the estimators. (Remark: These two assumptions are the counterparts of the i.i.d. assumption for cross section data). Unfortunately, these stationarity assumptions are quite often violated with macroeconomic or financial time series
However, as we will see later, there are tests for some important forms of non-stationarity and lack of weak dependence
8.5.1 Some important stationary stochastic processes
White Noise Process
The most basic stationary stochastic process is the White Noise process – WN process (German: Reiner Zufallsprozess, RZ), with the following properties
E(e_t)=0, \ \ \ \operatorname{Var}(e_t)=\sigma_e^2, \ \ \ E(e_t \cdot e_s)=0, \ \forall t \ne s
Moving Average process
A Moving Average process is a weighted sum of current and past values of a white noise process
Thus, we arrive at the important result that for an AR(1) process the covariance of (u_t, u_{t-s}) is \, \operatorname{Cov}(u_t, u_{t-s}) = E(u_t \cdot u_{t-s}) = \rho^s \sigma_u^2, \ \ \forall s. This is a geometrically declining function in s if |\rho|<1
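This geometric decline can be checked against the theoretical autocorrelation function; a small sketch using the base R function ARMAacf():
rho <- 0.8
ARMAacf(ar = rho, lag.max = 5)   # theoretical ACF of an AR(1) with coefficient 0.8
rho^(0:5)                        # equals rho^s for s = 0, 1, ..., 5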
MA representation of AR process
Every stationary AR(p) process has a MA representation, typically of infinite order (the reverse is generally not true, hence MA processes are of a more general type than AR processes!)
By a recursive expansion of Equation 8.9, the AR(1) process, we get
The coefficients of this MA representation constitute the so called Impulse Response of u_t regarding to a temporary unit shock in e_t. Such impulse responses are a very important tool in analyzing the reaction of variables to shocks, especially in a multivariate context \to VARs (Vector Autoregressive Processes)
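A minimal sketch: the base R function ARMAtoMA() returns exactly these MA coefficients, i.e., the impulse response of a stationary AR process to a unit shock in e_t:
ARMAtoMA(ar = 0.8, lag.max = 8)           # AR(1): impulse response 0.8, 0.8^2, ...
ARMAtoMA(ar = c(0.6, 0.3), lag.max = 8)   # AR(2) example u_t = 0.6 u_{t-1} + 0.3 u_{t-2} + e_t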
8.5.2 Non-stationary AR processes – Random Walk, Trends and Unit Roots
Given an AR(1) process
u_t = \rho u_{t-1} + e_t
a necessary and sufficient condition for stationarity is |\rho|<1. This can be proved with the corresponding MA representation Equation 8.10
If \rho=1, we get a very specific and important process, a Random Walk
u_t = u_{t-1} + e_t
\tag{8.11}
The MA representation for a random walk with a starting value u_0 at time t=0 is
u_t = u_0 + \sum_{s=1}^{t} e_s
\tag{8.12}
with variance \sigma_u^2 = \operatorname{Var}(u_t) = t \cdot \sigma_e^2, which depends on time t. If t \to \infty, \sigma_u^2 \to \infty, so this is clearly non-stationary
Random Walks also violate the weak dependence assumption as Equation 8.12 clearly shows. A random walk is therefore a long memory process; shocks long ago, e_{t-s} with s very large, do not diminish as in an AR(1) process with |\rho|<1, but retain their influence on u_t
Further note that the best prediction of the random walk u_t is E(u_t|u_{t-1}, u_{t-2}, \ldots ) = u_{t-1}. Hence, the difference between u_t and u_{t-1} is unpredictable. This property is of fundamental importance in economics and finance
8.5.3 Trends in time series
Deterministic Trend
Very often, time series exhibit time trends or even breaks. In this case, the assumption of stationarity is violated and the resulting complications for econometric analysis, like testing hypothesis, depend on the specific type of the non-stationarity, see Nelson and Plosser (1982)
Time series can exhibit a deterministic trend, a stochastic trend or both
An example for a deterministic trend is
y_t = \beta_0 + \beta_1 x_t + \beta_2 trend + e_t
This poses no problem for estimation of \beta_0, \beta_1, as by the FWL-theorem, this is the same as when y_t and x_t are first regressed on the trend variable and the detrended residuals are then used to estimate \beta_0, \beta_1. If these residuals are stationary, y_t and x_t are called trend stationary
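A minimal sketch of this FWL argument with simulated trend-stationary series (all names and numbers are made up):
set.seed(4)
n     <- 200
trend <- 1:n
x <- 0.05 * trend + rnorm(n)                 # trend-stationary regressor
y <- 1 + 0.5 * x + 0.03 * trend + rnorm(n)   # trend-stationary dependent variable

coef(lm(y ~ x + trend))["x"]    # estimation with the trend included

# FWL: detrend y and x first, then regress the detrended residuals on each other
y_dt <- residuals(lm(y ~ trend))
x_dt <- residuals(lm(x ~ trend))
coef(lm(y_dt ~ x_dt))["x_dt"]   # identical slope estimate for x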
Stochastic Trend
An example for a stochastic trend (and an additional deterministic trend) is
The first term represents the deterministic trend. It depends on \beta_0, which is called a drift in this case; because the constant is added up t times in an autoregressive model with a unit root, this leads to a deterministic trend
The last sum represents the stochastic trend, because this is the MA representation of a random walk, compare Equation 8.12
The first sum could also represent a stochastic trend component, if x_t is treated as random
Note, if we specify Equation 8.13 as \Delta y_t = \beta_0 + \beta_1 x_t + e_t (estimation in 1^{st} differences), no trends arise
Unit roots and Lag-Operator
Stochastic trends, like the ones above, are called unit roots. Why this labeling?
In theoretical time series analysis, the so called lag operator L (or backshift operator B) is commonly used as an extraordinarily helpful tool
The lag operator L is a function applied to time series which shifts the whole time series one period back into the past. In general, L can be manipulated like an ordinary algebraic symbol. We have
L \, y_t = y_{t-1} \qquad \text{or} \qquad \Delta y_t = y_t - y_{t-1} = (1-L)\, y_t
The left hand term of the right variant is an autoregressive lag-polynomial. And the root of this polynomial is (1-L)=0 \ \Rightarrow \ L = 1, a unit root
The lag operator makes it very convenient to handle AR processes of higher orders (and of MA processes of course). For instance, consider the following AR(3) model
with a(L) the corresponding AR-polynomial. This is an extremely compact notation for an AR process
If e_t is not WN, but follow itself an MA process, e_t=b(L)\epsilon_t with \epsilon_t WN, we have
a(L)y_t = b(L)\epsilon_t
\tag{8.15}
which is called an ARMA(p,q) process, with p equal to the order of the AR part and q equal to the order of the MA part
Factorization of the lag-polynomial
If it is possible to factor the lag-polynomial a(L) into (1-L) \, \tilde a(L), with \tilde a(L) representing a stationary lag-polynomial, then a(L) contains a unit root and the process is non-stationary and not weakly dependent
One can show that if all roots of a(L) are outside the complex unit circle, a factorization (1-L) \, \tilde a(L) is not possible and the AR process is stationary; a necessary condition for this is that the sum of the coefficients of a(L) is strictly positive and less than 2, i.e., 0<a(1)<2 (if the first term of a(L) is 1)
Furthermore, in this case (stationarity) the lag-polynomial a(L) is invertible and the AR process Equation 8.14 has a MA representation with finite second moments, i.e., is stationary
y_t = a(L)^{-1}e_t
Remark: The inverse of a lag-polynomial a(L) is defined by:
which can be proven by multiplying out \, a(L)^{-1}a(L)
If the AR polynomial is invertible, even an ARMA process Equation 8.15 has a MA representation
y_t = a(L)^{-1}b(L)\epsilon_t
On the other hand, if b(L) is invertible, the MA (or ARMA) process has an AR representation. But this has nothing to do with stationarity as every MA process of finite order is stationary. Hence, it is possible that a stationary MA process has no AR representation \rightarrow MA processes are more general
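A small sketch of checking the roots of an AR lag-polynomial with base R's polyroot() (the coefficients are illustrative):
# AR(2): y_t = 0.6 y_{t-1} + 0.3 y_{t-2} + e_t  =>  a(L) = 1 - 0.6 L - 0.3 L^2
Mod(polyroot(c(1, -0.6, -0.3)))   # all moduli > 1  => stationary

# AR(2) with a unit root: a(L) = 1 - 1.3 L + 0.3 L^2 = (1 - L)(1 - 0.3 L)
Mod(polyroot(c(1, -1.3, 0.3)))    # one root equal to 1  => non-stationary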
8.5.4 Some simulated stochastic processes
Code
# Simulating several stochastic processes
library(matrixStats)  # for matrix cumsums
set.seed(12345)       # for reproducibility
n <- 100              # number of observations
# MA(2) process with theta1 = 0.6, theta2 = 0.4
e <- matrix(rnorm(300), ncol = 3)
e <- ts(e, 1, frequency = 1)  # convert to a ts object to be able to use the lag function
u <- e + 0.6 * lag(e, -1) + 0.4 * lag(e, -2)
Code
# Stable AR(1) process
ar <- NULL  # to initialize ar
for (i in 1:3) {
  ar_t <- arima.sim(list(order = c(1, 0, 0), ar = 0.8), n = n)
  ar <- cbind(ar_t, ar)
}
# Trend stationary AR(1) process; we simply add a deterministic trend to the last AR(1) from above
ar_t <- ar_t + cumsum(rep(0.5, 100))
Code
# Random walk; cumulative sum of WN process
e <- matrix(rnorm(n * 10), ncol = 10)
rw <- colCumsums(e)
# Random walk with drift; cumulative sum of WN process with d = 0.5
ed <- e + 0.5
rwd <- colCumsums(ed)
# Random walk with stronger drift; cumulative sum of WN process with d = 1.5
ed1 <- e + 1.5
rwd1 <- colCumsums(ed1)
Plotting of MA(2) processes
Code
matplot(u, type ="l", lty =1, lwd =1.5, main="MA(2) process; with theta1 = 0.6, theta2 = 0.4"); abline(0,0); acf(u[,2], main="ACF");
Plotting of AR(1) processes
Code
matplot(ar, type ="l", lty =1, lwd =1.5, main ="AR(1), with rho = 0.8"); abline(0,0); acf(ar[,1], main="ACF");
Plotting Random Walks
Code
matplot(rw, type ="l", lty =1, lwd =1.5, main ="Random Walk"); abline(0,0); acf(rw[,1], main="ACF")
Plotting AR(1) with deterministic trend
Code
matplot(ar_t, type ="l", lty =1, lwd =1.5, main ="AR(1) + determ. trend, with rho = 0.8"); abline(0,0.5); acf(ar_t, main="ACF")
Plotting Random Walks with drift
Code
matplot(rwd, type ="l", lty =1, lwd =1.5, main="Random Walk with drift"); abline(0,.5); acf(rwd[,1], main="ACF")
Plotting Random Walks with strong drift
Code
matplot(rwd1, type ="l", lty =1, lwd =1.5, main="Random Walk with strong drift"); abline(0,1.5); acf(rwd1[,1], main="ACF");
8.6 Consequences of unit roots for regressions
In this section, we will present some important theory
Confirm the theory with simulation experiments
Introduce the concepts of Integration and Cointegration
8.6.1 Theory
Non-stationarity in general, and unit roots in particular have serious consequences for regression analyses with time series, see Pagan and Wickens (1989)
Regression equations containing variables with unit roots can lead to
Spurious regression
Non-standard distributions of estimated coefficients with severe consequences for tests
Super-consistency of estimated coefficients
In the following, we lay out the statistical reasons for these complications. To keep the analysis simple we consider a model with only one explanatory variable. Furthermore, y_i and x_i should be demeaned (so we need no intercept)
y_i = \beta \, x_i + u_i
\tag{8.16}
For this case the usual OLS estimator for \beta is
Case 1: all variables and the error term u are well behaved
Assuming the usual case that the variables are well behaved (our former shortcut for stationarity and weak dependency) we can show by applying the LLN and CLT (Theorem A.2, Theorem A.3) that
\hat \beta is consistent and
\hat \beta is asymptotically normally distributed
In particular, regarding consistency of \hat {\beta}, by the LLN and according to Equation 8.17, we have
The situation is different if x_i follows a random walk but u_i is still stationary and weakly dependent. In this case the conditions for the LLN and CLT are not satisfied any more and one can show that (with NSDRV for a non-standard distributed random variable)
We have no convergence to \beta even if we divide and multiply by n^3 as the first inverse moment will go to +\infty and the second to zero, so that \hat \beta \to \beta + \infty \cdot 0, which could be anything. Hence, \hat \beta does not converge to the true \beta but to a non-standard distributed random variable even with a very large sample \Rightarrow Spurious Regression Problem, see C. W. Granger and Newbold (1974)
8.6.2 Simulations
To underpin our theoretical results we simulate the convergence of moments M_{xx}^{-1}M_{xu}=(\hat \beta - \beta), depending on the ts-properties
Code
rm(.Random.seed, envir = globalenv())
betahat_minus_beta <- list()
x_random_walk <- 0   # x random walk or white noise
rho <- 0.5           # AR parameter for u; AR(1) with rho = 0.5 or random walk
# for 20 and 200 observations, 20000 replications
for (n in c(20, 200)) {
  Mxx <- NULL
  Mxu <- NULL
  for (i in 1:20000) {
    e1 <- rnorm(n)
    e2 <- rnorm(n)
    ifelse(x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {
      u[t] <- rho * u[t-1] + e2[t] * 2
    }
    mxx <- sum(x * x) / n
    mxu <- sum(x * u) / n
    Mxx[i] <- mxx
    Mxu[i] <- mxu
  }
  if (n == 20)  { n1 <- 20;  betahat_minus_beta[[1]] <- Mxu / Mxx }
  if (n == 200) { n2 <- 200; betahat_minus_beta[[2]] <- Mxu / Mxx }
}
Simulation 1: both x and u stationary
Normal rate of convergence
With n=200, already normally distributed
Code
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
Clear deviations from normal distribution with both n=20 and n=200
Code
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
No convergence to the true value at all, in fact the distribution is independent of n ==> spurious regression
Clear deviations from normal distribution with both n=20 and n=200
Code
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-8, 8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-8, 8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
8.6.3 Order of Integration and Cointegration
We have seen that unit roots in the variables can have dramatic consequences for the asymptotic behavior of the OLS estimators. Especially, we learned that the features of the error process are of great relevance. To be more clear, we have to make some important definitions
A variable is integrated of order one, I(1), if it is non-stationary but becomes stationary after applying the difference operator \Delta = (1-L) once. In this case, the variable contains one unit root in its autoregressive lag polynomial: a(L)=\Delta \tilde a(L), with \tilde a(L) stationary (invertible)
For an I(2) process – integrated of order two – we have to apply the difference operator \Delta twice for it to become stationary. Further, an I(0) process is already stationary
If we have two I(1) variables y and x and there exists a linear combination (y - \alpha x) which is I(0) – i.e., is stationary – then the variables y and x are said to be cointegrated
The concept of cointegration is of enormous importance in modern macroeconomic theory and macro-econometrics because it has a clear economic interpretation as a long run equilibrium relationship
If two variables are I(1), they exhibit a stochastic trend (random walk components) and one would expect that they are drifting further away from each other in the course of time
If this is not the case, if they stay side by side, then there must be some economic forces at work – adjustments toward a long run equilibrium – which accomplish that. In this case, there exists a cointegration relationship (y - \alpha x) which is stationary, I(0), and so they are cointegrated
8.6.4 Several important special cases
Let us reconsider our model Equation 9.14, \text{ } y_i = x_i \beta + u_i
If both x and y are I(1), but are cointegrated, \beta corresponds to \alpha and OLS will estimate the cointegration relationship by minimizing the sum of squared residuals; Equation 9.14 is then called the cointegration equation (but note, a causal interpretation still requires x_i to be exogenous). Furthermore, the error term u in Equation 9.14 is stationary and the results of Equation 8.19 and Equation 8.20 apply. Thus we have
OLS consistently estimates \beta, with a convergence rate n, i.e., OLS is super consistent
The distribution of \hat \beta is generally non-standard and therefore the usual tests do not apply. One has to rely on simulated distributions
If both x and y are I(1), but are not cointegrated, then they are drifting away from each other in the course of time, which implies a non-stationary error term u. In this case the result of Equation 8.21 applies and
regressing y on x typically leads to a spurious regression. The result is unpredictable as \hat \beta is a non-standard distributed random variable whose variance does not shrink with larger n
However, there could be a valid “short run” relationship in first differences, \Delta y_i = \Delta x_i \beta + v_i. As the first differences are stationary, this relationship can be estimated by OLS in the usual manner
If y is I(0), stationary, and x is I(1), then \hat \beta will converge in a super consistent manner to zero, i.e., results Equation 8.19 and Equation 8.20 apply. This feature is utilized by the so called Unit Root Tests, see Section 8.7 below
Some further cases and qualifications on our present analysis
Assume u is stationary
If x is strictly exogenous, the whole analysis could be conditioned on x, and in this case the time series properties of x play no role. We can think of x as fixed numbers in repeated samples and all our asymptotic standard results apply. However, strict exogeneity of the right hand side variables is rare in a time series context
If x exhibits a unit root (stochastic trend) with drift, the standard results apply likewise. The reason is that in this case the deterministic trend dominates the stochastic one asymptotically. However, this is only the case, if there is no other right hand side variable with a deterministic trend or stochastic trend with drift. The reason is that in this case, according to the FWL theorem, all other variables behave as they were detrended
Furthermore, all standard results apply if the left and right hand side variables are either stationary or cointegrated. In this case we can always find reparameterizations so that every coefficient can be assigned to a stationary variable. This is especially true for VARs with more than one lag, as a variable and its lagged values are always cointegrated. Hence, there is generally no need to estimate VARs in differences (as many people believe)
The spurious regression problem remains even if x is strictly exogenous – the error term u is still non-stationary in this case. If the variables are trend-stationary, i.e., only exhibit a deterministic trend, the simple remedy to avoid spurious regressions is detrending
8.7 Unit Root Test and Error Correction Form
8.7.1 Dickey-Fuller (DF) Test
We have seen that a possible non-stationarity of time series variables in regression models determines the statistical properties of the estimators. Hence, we clearly need a test for stationarity or non-stationarity
Consider a pure random walk
y_t = y_{t-1} + e_t \ \ \Rightarrow \ \ \Delta y_t = e_t
\tag{8.22}
To test for non-stationarity we add the variable y_{t-1} to the right hand side of the right variant of Equation 8.22. If the process is actually a random walk, results Equation 8.19 and Equation 8.20 apply and the coefficient of y_{t-1} will converge with rate n to the true value zero. This is the Dickey-Fuller (DF) Test:
\Delta y_t = \delta \, y_{t-1} + e_t
\tag{8.23}
Thus, we are testing whether \delta = 0 by means of the usual t-statistic. However, under H_0: \delta = 0 we presuppose non-stationarity. Hence, under H_0, \hat\delta is non-standard distributed, Equation 8.20 applies, and its t-statistic follows the so called Dickey-Fuller distribution, which has been tabulated by Dickey and Fuller with Monte-Carlo simulations. Typically, critical values of the Dickey-Fuller distribution are notably larger in absolute value than the usual ones; one-sided 5%: -2.86 without a time trend and -3.41 with a time trend included in the test equation
To allow for a proper alternative if H_0 is not true and y is actually stationary, we usually add a constant \beta_0 and even a time trend (\gamma \cdot t) to the test equation Equation 8.23, especially if y exhibits a trend
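A minimal sketch of the DF test regression run "by hand" with dynlm() (the convenience function adf.test() from the tseries package is used in Section 8.8):
library(dynlm)
library(lmtest)

set.seed(5)
y <- ts(cumsum(rnorm(250)))     # a pure random walk

# DF test equation with a constant: Delta y_t = beta_0 + delta * y_{t-1} + e_t
coeftest(dynlm(d(y) ~ L(y)))
# the t-statistic of L(y) has to be compared with the Dickey-Fuller critical value
# (about -2.86 at the one-sided 5% level), not with the usual t/normal critical value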
8.7.2 Augmented Dickey-Fuller Unit Root (ADF) Test
If y does not follow a pure random walk but an AR process with a unit root, additional terms, which take care of this more complicated process, have to be added. As we presuppose an AR process with a unit root, we augment the test Equation 8.23 by autoregressive terms \Delta y_{t-s}, leading to the so called Augmented Dickey-Fuller (ADF) Test. Once again, we test \delta = 0 by means of the usual t-statistic
Because of its simplicity and quite good statistical properties, the ADF test is the most popular unit root test (many other unit root tests have been developed in the literature, but many of them are variants of the ADF test)
The ADF test could even be used as a test for cointegration, which is then called the Engle-Granger procedure, see Engle and Granger (1987). To do this
We first have to check whether both (or more) variables are actually I(1), using the ADF test
If both are I(1), and an I(2) property is ruled out, we estimate the hypothetical cointegration regression Equation 9.14
After that, we check whether the residuals of the cointegration regression Equation 9.14 are indeed stationary. We can employ the ADF test to accomplish this
However, because residuals of a regression always look somewhat stationary – OLS minimizes the residual variance – the critical values for this test are different from those for the ADF test applied to ordinary (non-residual) time series; these critical values are even larger than the usual ADF ones
This procedure can easily be extended to more explanatory variables (with even larger critical values). But in this case, usually more advanced multivariate methods, like the Johansen procedure, Johansen (1991), are used to estimate and test for multiple cointegration relationships
8.7.3 Error Correction Form
A cointegration relationship describes the long run connection between variables
However, it is possible to estimate short run as well as long run relationships in one equation. This is called the error correction form
The term in brackets is the error correction term. In the long run, this equilibrium relationship must hold and therefore be stationary. This relationship could be estimated by Equation 8.25 “freely” or in a two step procedure:
First, estimate a cointegration regression in levels without any lags as in Equation 9.14
Test for cointegration, i.e., whether the residuals of the cointegration regression are stationary
If they are stationary take the lagged residuals of the cointegration regression, representing (y_{t-1} - \hat\alpha x_{t-1}), as an additional variable in Equation 8.25; then \delta measures the adjustment speed toward the long run equilibrium
The autoregressive and distributed-lag terms in differences in Equation 8.25 describe the short run interactions between the variables. The lag length k is often chosen so that the error term e_t is not autocorrelated
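A rough sketch of this two-step error correction estimation with simulated cointegrated series and dynlm() (names and lag choices are illustrative, not the chapter's code):
library(dynlm)

set.seed(6)
n <- 250
x <- ts(cumsum(rnorm(n)))                        # I(1) regressor
y <- 1 + 0.8 * x + arima.sim(list(ar = 0.5), n)  # cointegrated with x (alpha = 0.8)

# Step 1: cointegration regression in levels, keep the residuals
coint <- dynlm(y ~ x)
ec    <- ts(as.numeric(residuals(coint)), start = start(y))  # deviations from the long run relationship
dat   <- ts.union(y = y, x = x, ec = ec)

# Step 2: error correction form in differences with the lagged residual
ecm <- dynlm(d(y) ~ L(d(y), 1) + d(x) + L(d(x), 1) + L(ec, 1), data = dat)
summary(ecm)    # coefficient of L(ec, 1): adjustment speed delta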
8.8 Examples of regressions with non-stationary time series
8.8.1 Spurious Regression
Code
# Needed libraries
library(matrixStats)  # for matrix colCumsums
library(tseries)      # for the Augmented Dickey-Fuller unit root test
library(egcm)         # Engle-Granger cointegration tests
library(zoo)          # time series infrastructure
set.seed(1111)        # for reproducibility; will always generate the same random numbers
Code
# Simulating two independent random walks
# First, generate two white noise processes
n <- 250                              # number of observations
e <- matrix(rnorm(n * 2), ncol = 2)   # random matrix with n rows and two columns
e <- zooreg(e, 1, n, frequency = 1)   # convert to a ts-zoo object to be able to use the lag function
# generate two random walks: cumulative sums of the columns of the WN matrix
u <- colCumsums(e)                    # u consists of two columns with the two random walks
colnames(u) <- c("y", "x")            # giving names
u <- zooreg(u, 1, n, frequency = 1)   # convert to a ts-zoo object
Plot the two independently generated random walks
Code
matplot(u, type ="l", lty =1, lwd =1.5, main ="Two independent random walks")abline(0,0)
Regressing the two independently generated random walks
As this is a regression of I(1) variables without lags this is a so called Engle and Granger (1987) cointegration regression, like Equation 9.14
mod <- dynlm(u$y ~ u$x)
coeftest(mod)  # printing the coefficients with std-errors and test statistics
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.030243 0.815726 7.3925 2.208e-12 ***
u$x -2.578410 0.078267 -32.9438 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We get a highly significant relationship, but we know that this is totally spurious. How can this happen?
Test of autocorrelation in residuals: The DW test points to highly autocorrelated residuals, a first hint of non-stationarity of the residuals
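The output below was presumably produced with dwtest() from the lmtest package applied to the spurious regression mod:
dwtest(mod)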
Durbin-Watson test
data: mod
DW = 0.13832, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
Scatterplot of the two independently generated random walks
A spurious connection between the two independent random walks is reflected even in the scatterplot
Code
plot(u$x, u$y, main = "Scatterplot of the two independent random walks")
abline(mod)  # add regression line of the estimated model to the plot
Residual Plot
A plot of the residuals reveals a very high degree of autocorrelation ==> possibly non-stationary
Code
plot(mod$residuals, main = "Residuals extremely autocorrelated")
abline(0, 0)
Dickey-Fuller test
DF test equation, Equation 8.23: \Delta(residuals) on lagged residuals ==> nearly a random walk. But what critical value should we use for the t-statistic of \delta ?
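The coefficient output below was presumably produced by a regression of this form (a sketch, assuming the model object mod from above):
coeftest(dynlm(d(mod$residuals) ~ L(mod$residuals)))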
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.065411 0.163973 0.3989 0.6903
L(mod$residuals) -0.071646 0.023257 -3.0806 0.0023 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Augmented Dickey-Fuller test
ADF test, test Equation 8.24, a formal test for non-stationarity – H0 is non-stationarity
But caution, Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance. But nonetheless H0, non-stationarity, cannot be rejected
The lag length k used for the autoregressive correction terms in Equation 8.24 is 6 in this case. There are other procedures for ADF tests in R which choose an “optimal” lag length according to some information criterion
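A sketch of calls that would produce the ADF test and the Engle-Granger cointegration output shown below (the exact calls are not shown in this extract):
adf.test(mod$residuals, k = 6)   # ADF test on the residuals, lag length k = 6
summary(egcm(u$x, u$y))          # Engle-Granger cointegration test from the egcm package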
Y[i] = -2.5784 X[i] + 6.0302 + R[i], R[i] = 0.9542 R[i-1] + eps[i], eps ~ N(0, 2.5887^2)
(0.0783) (1.5299) (0.0232)
R[250] = 6.7085 (t = 0.950)
WARNING: X and Y do not appear to be cointegrated.
Unit Root Tests of Residuals
Statistic p-value
Augmented Dickey Fuller (ADF) -2.963 0.10724
Phillips-Perron (PP) -18.214 0.09059
Pantula, Gonzales-Farias and Fuller (PGFF) 0.932 0.08524
Elliott, Rothenberg and Stock DF-GLS (ERSD) -1.605 0.30010
Johansen's Trace Test (JOT) -16.280 0.17832
Schmidt and Phillips Rho (SPR) -17.815 0.22337
Variances
SD(diff(X)) = 0.957716
SD(diff(Y)) = 1.058080
SD(diff(residuals)) = 2.631344
SD(residuals) = 7.063248
SD(innovations) = 2.588656
Half life = 14.786804
R[last] = 6.708500 (t=0.95)
8.8.2 Cointegration
# Simulating two cointegrated time series
set.seed(1234)   # for reproducibility; will always generate the same random numbers
# First, generate two white noise processes
v1 = rnorm(250);  v2 = rnorm(250)*2
# Generating one MA(1) process w1 and one AR(2) process w2; first generate two vectors with zeros
w1 = rep(0, 250);  w2 = rep(0, 250)
# Loop to generate the MA(1) and AR(2) process
for (t in 3:250) {
  w1[t] = 0.4*v1[t] + 0.4*v1[t-1]             # MA(1)
  w2[t] = 0.6*w2[t-1] + 0.3*w2[t-2] + v2[t]   # AR(2)
}
# Second, generate a time series with a unit root: cumsum of the MA(1) process; more general than a random walk
rw = cumsum(w1)   # actually, this is an ARMA(1,1) with unit root ( or an ARIMA(0,1,1) )
# x and y exhibit a common stochastic trend, hence they are cointegrated
x = rw
y = rw + w2
# transform to ts-zoo objects to make available the time series functions
x = zooreg(x, 1, 250, frequency = 1)
y = zooreg(y, 1, 250, frequency = 1)
Plot of the two cointegrated time series
Although both are I(1), they stick together
matplot(cbind(y, x), type = "l", lty = 1, lwd = 1.5, main = "Two cointegrated time series")
Testing whether both variables are non-stationary
Before doing a cointegration test, we have to check, whether the variables are actually I(1)
adf.test(y)   # testing whether I(1); I(1) is not rejected
Augmented Dickey-Fuller Test
data: y
Dickey-Fuller = -2.1743, Lag order = 6, p-value = 0.5024
alternative hypothesis: stationary
adf.test(diff(y))   # ruling out I(2); I(2) is rejected
Augmented Dickey-Fuller Test
data: diff(y)
Dickey-Fuller = -7.1506, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
The same for variable x
adf.test(x)   # testing whether I(1); I(1) is not rejected
Augmented Dickey-Fuller Test
data: x
Dickey-Fuller = -2.5146, Lag order = 6, p-value = 0.3592
alternative hypothesis: stationary
adf.test(diff(x))   # ruling out I(2); I(2) is rejected
Augmented Dickey-Fuller Test
data: diff(x)
Dickey-Fuller = -4.6994, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Hence, both y and x are probably I(1). Therefore, we can check whether they are cointegrated
Cointegration regression
Regressing the two I(1) processes y on x, without any lags ==> Engle and Granger (1987) cointegration regression, like Equation 9.14
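The estimation chunk is folded; given the Call line in the output and the object name used in the residual analysis below, it was presumably:

cointreg <- dynlm(y ~ x)    # Engle-Granger cointegration regression in levels
summary(cointreg)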
Time series regression with "zooreg" data:
Start = 1, End = 250
Call:
dynlm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-8.3602 -1.9828 -0.0212 2.1864 7.0777
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.005508 0.404782 0.014 0.989
x 1.029490 0.036939 27.870 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.235 on 248 degrees of freedom
Multiple R-squared: 0.758, Adjusted R-squared: 0.757
F-statistic: 776.7 on 1 and 248 DF, p-value: < 2.2e-16
The true alpha of the simulated cointegration regression is 1 and the Engle-Granger estimate for alpha is very close: 1.029, even though we have only 250 observations ==> super consistency
Plotting residuals of cointegration regression
Residuals of cointegration regression; they look stationary although clearly autocorrelated
plot(cointreg$residuals, main = "Residuals of cointegration regression")
abline(0, 0)
Unit root test for residuals
DF test of residuals
DF test equation, Equation 8.23, H0 is non-stationarity:
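As above, the test regression is not shown; a sketch of what was presumably run, analogous to the spurious-regression case:

coeftest( dynlm( d(cointreg$residuals) ~ L(cointreg$residuals) ) )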
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.021665 0.134920 0.1606 0.8726
L(cointreg$residuals) -0.238571 0.042206 -5.6525 4.35e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\Delta(residuals) on lagged residuals ==> seems to be stationary
Note the t-value of L(cointreg$residuals) = -5.65. Seems to support cointegration. But see comments below about critical values
ADF test of residuals
Equation 8.24, test for non-stationarity, H0 is non-stationarity. Seems to support cointegration
But caution, Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance
The lag length k used for the autoregressive correction terms is 6 in this case. There are other procedures for ADF tests in R which choose an “optimal” lag length according to some information criteria
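The ADF call itself is folded; judging from the output below (data: cointreg$residuals, lag order 6, the tseries default for 250 observations), it was presumably:

adf.test(cointreg$residuals)   # ADF test of the cointegration-regression residuals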
Augmented Dickey-Fuller Test
data: cointreg$residuals
Dickey-Fuller = -3.7125, Lag order = 6, p-value = 0.02389
alternative hypothesis: stationary
Testing for cointegration with the package “egcm”
Engle-Granger cointegration test. Basically an ADF test for the residuals of the cointegration regression
But the egcm() procedure uses correct critical values for residuals
Furthermore, besides an ADF test, a whole battery of other tests are carried out
Most of them can reject the H0 of non-stationarity of the residuals at the usual significance level of 5%
Especially look at the Elliott, Rothenberg and Stock (ERSD) test, which is a variant of the ADF test with local detrending of the data before testing and is said to be the most powerful unit root test
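Neither the egcm output nor the chunk estimating the regression below is shown. A minimal sketch under these assumptions: the egcm test is run on the simulated series with egcm() from the egcm package, and the regression below is an unrestricted error-correction regression whose object name, ercfree, is taken from the code fragment further down:

library(egcm)
summary(egcm(x, y))     # Engle-Granger test with correct critical values for residuals

# Unrestricted error-correction regression (referred to as 'ercfree' below)
ercfree <- dynlm( d(y) ~ L(d(y)) + L(d(x)) + L(y) + L(x) )
summary(ercfree)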
Time series regression with "zooreg" data:
Start = 3, End = 250
Call:
dynlm(formula = d(y) ~ L(d(y)) + L(d(x)) + L(y) + L(x))
Residuals:
Min 1Q Median 3Q Max
-6.8906 -1.4647 0.0212 1.4193 5.7937
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.03871 0.27236 0.142 0.887098
L(d(y)) -0.22562 0.06494 -3.474 0.000606 ***
L(d(x)) 1.23692 0.23909 5.173 4.81e-07 ***
L(y) -0.19221 0.04558 -4.217 3.50e-05 ***
L(x) 0.19889 0.05253 3.786 0.000193 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.139 on 243 degrees of freedom
Multiple R-squared: 0.2162, Adjusted R-squared: 0.2033
F-statistic: 16.75 on 4 and 243 DF, p-value: 3.839e-12
# Implied estimate of alpha:
alpha <- -coef(ercfree)[5] / coef(ercfree)[4]
The true alpha of the simulated cointegration regression is 1 and the estimate for alpha is very close: 1.035 ==> super consistency
Furthermore, it is also very similar to the Engle-Granger estimate for alpha from above, which was 1.029
Breusch, Trevor S., and Leslie G. Godfrey. 1981. “A Review of Recent Work on Testing for Autocorrelation in Dynamic Simultaneous Models.” Macroeconomic Analysis, Essays in Macroeconomics and Economics 1: 63–100.
Engle, Robert F., and Clive W. J. Granger. 1987. “Co-Integration and Error Correction: Representation, Estimation, and Testing.” Econometrica: Journal of the Econometric Society, 251–76.
Granger, C. W. J. 1969. “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods.” Econometrica 37 (3): 424. https://doi.org/10.2307/1912791.
Granger, Clive W. J., and Paul Newbold. 1974. “Spurious Regressions in Econometrics.” Journal of Econometrics 2 (2): 111–20.
Greene, William H. 2017. Econometric Analysis. Eighth ed. Pearson.
Johansen, Søren. 1991. “Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models.” Econometrica 59 (6): 1551. https://doi.org/10.2307/2938278.
Nelson, Charles R., and Charles R. Plosser. 1982. “Trends and Random Walks in Macroeconomic Time Series.” Journal of Monetary Economics 10 (2): 139–62. https://doi.org/10.1016/0304-3932(82)90012-5.
Pagan, A. R., and M. R. Wickens. 1989. “A Survey of Some Recent Econometric Methods.” The Economic Journal 99 (398): 962. https://doi.org/10.2307/2234084.
Wooldridge, Jeffrey M. 2019. Introductory Econometrics: A Modern Approach. Seventh ed. Boston: Cengage.
---title-block-banner: true# title: "Time Series"subtitle: "Mainly based on @wooldridge_Intro_Econometrics, Chapters 10, 11 and 12"---# Time Series\pagebreak## General aspects of time series data analysis- Time series arise in many fields of economics, especially in *macroeconomics* and *financial economics*- The main feature of time series is *temporal ordering of observations*, i.e., we have a natural ordering and thus, data may not be arbitrarily reordered- Typical features and problems: **serial correlation**, **non-independence** of observations and **non-stationarity**, which pose problems for test-statistics, asymptotic theory and the phenomenon of *spurious regressions*- The outcome of economic variables (e.g. GNP, Dow Jones) is uncertain; they should therefore be modeled as random variables- Thus, time series are seen as a **sequences of random variables**, i,e., **stochastic processes**- Randomness does not come from sampling from a population - A "sample" *is the one realized path of the time series out of the many possible paths the stochastic process could have taken*- *Time series analysis* focuses on modeling the *dependency* of a variable on its *own past*, and on the *present and past values of other variables*------------------------------------------------------------------------### Several popular times series models, example: Phillips Curve- *Static model* - In *static* time series models, the current value of one variable is modeled as the result of the current values of explanatory variables\ ($\pi_t \ldots$ inflation rate at time *t*, $\; u_t \ldots$ unemployment rate at time *t*, $\; e_t \ldots$ error term at time *t*) - $$\pi_t = \beta_0 + \beta_1 u_t + e_t$$- *Finite distributed lag* model - $$\pi_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t-1} + e_t$$ - *Impact effect*, resp. *transitory shock* of unemployment on inflation: $\beta_1$, $\beta_2$ - *Long run effect*, resp. effect of a *permanent shock* of unemployment on inflation (set all time indexes to $t$ and sum coefficients): $\beta_1+\beta_2$- *Autoregressive model* -- model with the *lagged endogenous variable* $\pi_{t-1}$ - $$\pi_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t-1} + \beta_3 \pi_{t-1} + e_t$$ - *Long run effect*, resp. effect of a *permanent shock* of unemployment on inflation: $\dfrac {\beta_1+\beta_2}{1-\beta_3}$------------------------------------------------------------------------### Implementation of time series data in R- Dealing with time series in R is a little bit more tricky than dealing with cross section data. The reason is that the program must know the begin and end dates (time) of the time series and whether there are some "holes" in the data for treating times series specific operations like lags or seasonal effects- Thus, times series need some additional infrastructure to to this job - The built in infrastructure in R for times series is the class "**ts**". Basically, ts-objects are numeric data vectors or matrices, augmented with an additional attribute, "tsp", which store the starting time, end time and frequency of the data; 1 for yearly, 4 for quarterly and 12 for monthly data. The *disadvantage* of this class is that it can only deal with "regular" data, meaning *equispaced* data points with no "holes" in the data, and does not support weekly, daily or high frequency data - Alternative infrastructures are, among others, the "**zoo**" class and the "**xts**" class. 
Both classes augment the data vectors or matrices with an *additional* **index variable**, which maps the corresponding time or date to each data point. Both classes supports non-regular (irregular spaced) data as well as weekly, daily and high frequency data. Furthermore, *merging* data with different ranges are possible as well as *aggregating* data to another frequency. The *disadvantage* of these classes are that they are a bit more complicated to set up - you generally have to deal with Date classes, especially for the xts class. However, often an appropriate index variable is included in the data set- You want to have functions which support the special features of time series. For instance, the workhorse function lm() does *not* support basic time series operations like *lags* or *diffs*. You could use dynlm() from the package dynlm instead, which supports time trends, seasonal dummies and d and L operators------------------------------------------------------------------------### Example: Analysis of Wage and Price Inflation Interactions with R#### Defining data as time series```{r, message=FALSE}#| label: tbl-inflationdata#| tbl-cap: "Price and Wage Data"#| tbl-cap-location: bottom#| column: pagerm(list=ls()) # clear alllibrary(wooldridge); library(AER); library(zoo); library(dynlm); library(gt) # for beautiful tablesdata("wageprc")class(wageprc)gt(wageprc[1:10,])```Now, we convert them to time series with the "ts" command```{r}#| comment: " "# ts representationdata <- ( ts(wageprc[, c(4,5)], start=c(1980,3), frequency=12)) # start 1980 M3class(data)data[1:8,]```The same with the "zoo" package```{r}#| comment: " "# zoo representation l <-length(wageprc$lprice) # we need the length of the data # As there is no particular index variable in the data, we have to define an index variable m# We begin at 1980 M3. Thus we add to 1980: 2/12 : (2+l-1)/12# (For more info about zoo and dates: <http://www.princeton.edu/~otorres/TimeSeriesR101.pdf>)m <-as.yearmon( 1980+seq( from =2, to =2+(l-1) ) /12 ) data <-zoo( wageprc[, c(4,5)], order.by = m )class(data)data[1:8,]tail(data)```------------------------------------------------------------------------#### Plotting time series```{r}#| fig-width: 6#| fig-align: center#| fig-asp: 0.75# With zoo data, you can use the $ operator. # With ts data, you would need matrix-indexing: data[, "lprice"]plot( diff( data$lprice ) )``````{r}#| fig-width: 6#| fig-align: center#| fig-asp: 0.75# Plotting the whole data matrixplot( diff(data) )```------------------------------------------------------------------------\pagebreak#### ***Granger Causality Test***: $\Delta lprice \Rightarrow \Delta lwage$We are regressing $\Delta lwage_t$ on several **lags** of $\Delta lwage_t$ and several **lags** of $\Delta lprice_{t}$ (and likewise some other lags of additional control variables) and look whether the lags of $\Delta lprice_{t}$ have some additional predictive power for $\Delta lwage_t$, beyond (*conditioned on*) the lags of $\Delta lwage_t$ (and the other lagged control variables). 
If this is the case, we say that $\Delta lprice$ is *Granger-Causal* for $\Delta lwage$, or $\Delta lprice \Rightarrow \Delta lwage$```{r}#| comment: " "summary( wage <-dynlm( d(lwage) ~L(d(lprice), 1:4) +L(d(lwage), 1:4), data = data) )```------------------------------------------------------------------------##### Conditioned on lagged wage inflation, does lagged price inflation affect current wage inflation?See @granger1969```{r}#| comment: " "lht(wage, matchCoefs(wage,"lprice"))```**Result**: Price inflation does *Granger cause* wage inflation (has *predictive power* for wage inflation)\Problems: Instantaneous causality, missing additional control variables and expectation effects - *post hoc, ergo propter hoc fallacy*------------------------------------------------------------------------\pagebreak#### ***Granger Causality Test***: $\Delta lwage \Rightarrow \Delta lprice$We are regressing $\Delta lprice_t$ on the very same variables as above and look whether the lags of $\Delta lwage_{t}$ have some additional predictive power for $\Delta lprice_t$, beyond (*conditioned on*) the lags of $\Delta lprice_t$ (and the other lagged control variables). If this is the case, we say that $\Delta lprice$ is *Granger-Causal* for $\Delta lwage$, or $\Delta lprice \Rightarrow \Delta lwage$```{r}#| comment: " "summary( price <-dynlm( d(lprice) ~L(d(lprice), 1:4) +L(d(lwage), 1:4), data = data) )```------------------------------------------------------------------------##### Conditioned on lagged price inflation, does lagged wage inflation affect current price inflation?(See @granger1969)```{r}#| comment: " "lht(price, matchCoefs(price,"lwage"))```**Result**: Wage inflation does *not Granger cause* price inflation (has *no predictive power* for price inflation)\Problems: Instantaneous causality, missing additional control variables and expectation effects - *post hoc, ergo propter hoc fallacy*------------------------------------------------------------------------\pagebreak## Auto-correlated Erros {#sec-autocorrelatederros}- One of the most common problems in regressions with time series is **autocorrelation of the errors**. This *violates* assumption MLR.5: (compare the chapter "Multiple Regression", slide 48)$$\operatorname{Var}(\mathbf u \, | \, \mathbf X) = E (\mathbf{u u}' \, | \, \mathbf{X}) = \sigma^2 \mathbf I$$ {#eq-MLR.5}- This has basically **three consequences** for the OLS estimates 1. OLS estimates are *not consistent* any more if, additionally to autocorrelated errors, the explanatory variables are **not** *strict exogenous*, $E(u_t|\mathbf X=0)$ - This is especially the case for *lagged endogenous variables* $y_{t-s}$ as right hand side variables! 2. The usual *formula for the covariance matrix* of $\hat {\boldsymbol \beta}$ is *false*, and therefore all *test statistics* - A remedy for this is a **H**eteroskedasticity and **A**utocorrelation **C**onsistent covariance matrix estimate, **HAC** covariance matrix, leading to **HAC standard errors** for $\hat {\boldsymbol \beta}$ and ***asymptotically*** valid tests - These are computed in a similar, but much more complicated manner (based on the residuals $\hat u_t$ we have to estimate the autocorrelations of the errors over time) as we did for the **H**eteroskedasticity **C**onsistent covariance matrix, **HC** (see the chapter "Heteroskedasticity"). As a formal derivation of this is quite tedious, we simply point to the literature, like @green2017 3. 
OLS looses its effiency------------------------------------------------------------------------### HAC Covariance Matrix- Regarding the **second** problem, we try to estimate the true covariance matrix of $\hat {\boldsymbol \beta}$. Basically, the residuals of the regression are used to estimate $(\mathbf X' \boldsymbol \Omega \mathbf X)$, with $\hat \sigma^2 \boldsymbol \Omega$ being the covariance matrix of the error term $u_t$, leading to an estimate of the covariance matrix for $\hat {\boldsymbol \beta}$$$\widehat {\operatorname{Var}(\hat {\boldsymbol \beta} \, | \, \mathbf X)} \ = \ (\mathbf{X} \mathbf{X})^{-1} \, {(\widehat{\hat \sigma^2\mathbf X' \boldsymbol\Omega \mathbf X})} \, (\mathbf{X} \mathbf{X})^{-1}$$ {#eq-HAC-CovM}- Using this HAC estimate of the covariance matrix for $\hat {\boldsymbol \beta}$, which is often called **Newey-West** estimator, the usual tests can be carried out- The big advantages of using HAC standard errors is that - it is easily implemented (usually by an option in the software package) - is working for heteroskedasticity as well as autocorrelation in the errors - we do not have to assume a particular form of heteroskedasticity or autocorrelation- Major disadvantages - Tests are only *asymptotically valid* (meaning we need a lot of observations) - OLS estimates of $\boldsymbol \beta$ are still not *efficient* any more------------------------------------------------------------------------### Loss of Efficiency with Auto-Correlated Erros $\Rightarrow$ *FGLS*- Regarding the **third** problem, we already know that autocorrelated errors lead to a **loss of efficiency**; OLS estimates are not **BLUE** any more! - A possible remedy for this is using *Feasible Generalized Least Square* (FGLS), as we did in the case of heteroskedastic errors (through weighting the observations to offset the varying variance $\to$ WLS) - For doing this, one needs a hypothesis about the *particular form* of the autocorrelation. Usually, one assumes an *autoregressive process* of order one or two for the error term. (Later, we will discuss such processes more deeply.) Suppose $e_t$ is *not* autocorrelated, and $u_t$ follows an *autoregressive process* of order one$$u_t = \rho u_{t-1} + e_t, \quad |\rho|<1$$ {#eq-ar1}- Applying **FGLS**, in a **first step**, we have to *estimate* $\rho$ with he help of a regression of the OLS residuals on the residuals of the previous period (according to the alleged error model above)- Having an estimate for $\rho$, in a **second step**, we *transform* the original model, $y_t = \mathbf x_t \boldsymbol \beta + u_t$, $u_t$ autocorrelated, in the following way$$ y_t - \hat \rho y_{t-1} = \boldsymbol \beta (\mathbf x_t - \hat \rho \mathbf x_{t-1}) + \underbrace {u_t - \hat \rho u_{t-1}}_{e_t} \ \ \ \Rightarrow \ \ \ y_t^* = \boldsymbol \beta \mathbf x_t^* + e_t$$ {#eq-Koyeck}- Note, the error term $e_t$ of this model, with the *transformed variables* $y_t^*$ and $\mathbf x_t^*$, is **not autocorrelated**. Hence, the parameters $\boldsymbol \beta$ can be *estimated efficiently with this transformed model*------------------------------------------------------------------------- This model transformation (taking *quasi differences* of the variables) is called *Koyeck Transformation* and leads to a FGLS estimator, which is called *Cochrane-Orcutt* estimator - Remark: One can use the estimated parameters of this *transformed* model and plug these back into the *original* model to compute new residuals. 
With these new residuals we can get an updated estimate for $\rho$ and transform the variables once more, but this time with this new $\hat \rho$. This algorithm can be repeated until $\hat \rho$ doesn't change any more. However, it is not guaranteed that the iterations just described actually lead to better (more efficient) estimates thus, often no iterations are performed- Applying the Cochrane-Orcutt procedure, at least one (the first) observation is lost (because of the used lags in the transformation). The *Prais-Winsten* procedure avoids this loss of efficiency by weighting the first observation with $\sqrt {1 - \hat \rho^2}$- Both methods only work if the explanatory variables $\mathbf x_t$ are *strict exogenous* as otherwise the estimate $\hat \rho$ from the *first step* is even asymptotically biased (but there is a remedy for this, which is discussed in the section about "testing for autocorrelation")------------------------------------------------------------------------There are several *other methods* for obtaining FGLS estimates. By means of these procedures, $\rho$ and $\boldsymbol \beta$ are estimated *simultaneously*, either by *Maximum Likelihood* estimation or with restricted *non-linear least square*- For both approaches, one has to rely on numerical methods. As with modern computers there are no real costs in terms of computation time, these numerical methods are the ones typically used in today's software packages- The non-linear approach is based on the transformed model, but rearranged, and estimates$$ y_t = \ \rho y_{t-1} + \boldsymbol \beta \mathbf x_t - \rho \boldsymbol \beta \mathbf x_{t-1} + e_t$$- Note, the coefficient vector of $\mathbf x_{t-1}$ is non-linear in the parameters $\boldsymbol \beta$ and $\rho$ and furthermore, non-linear restrictions are imposed on them - This restrictions can be tested against a general autoregressive/distributed lag model with ***no*** *autocorrelated errors* -- common root test - This approach only requires *weak* exogeneity of the explanatory variables and not *strict* exogeneity to yield consistent estimates------------------------------------------------------------------------### Significance of Auto-Correlated Residuals {#sec-caution}Generally, *autocorrelation in the residuals indicates that the model does not exploit all the information contained in the data*- Autocorrelation in the **residuals** of an *estimated regression model* can have several *causes* 1) A **general misspecification of the model**, e.g., omitted variables or non-linearities not taken into account 2) A *misspecified* ***dynamic*** *structure* of the model, i.e., omitted lags of the explanatory variables or missing autoregressive terms like $y_{t-1}$. 
In this case, the model is not **dynamically complete**, i.e., the error term is correlated with lags of the explanatory variables or missing autoregressive terms 3) **Structural breaks** or other **non-stationarities** 4) **Autocorrelated errors**> Hence, *before applying the remedies for autocorrelated errors as discussed above, one has to check whether the first three causes for autocorrelated residuals can be ruled out* -- *if not,* ***these*** *have to be tackled first!*> As autocorrelated residuals might be a *strong signal* for all these problems mentioned above, it is important to have a *test procedure* for detecting autocorrelation------------------------------------------------------------------------## Testing for Autocorrelation### The Durbin-Watson TestThe simplest method to check for autocorrelated residuals is to regress the residuals of the original model on the lagged residuals and test whether $\rho = 0$ with a simple t-test:$$\hat u_t = \rho \hat u_{t-1} + e_t$$ {#eq-simpleartest}- This *asymptotically* valid test is particularly powerful against an autoregressive process of order one in the error term. But to some extent, it can detect more general forms of autocorrelations too, as long as these lead to a correlation between $\hat u_t$ and $\hat u_{t-1}$ -- for instance even structural breaksAnother asymptotically equivalent variant of this test is the very popular **Durbin-Watson** test with the test statistic$$ DW = \frac{ \sum_{i=2}^{n}\left(\hat{u}_{i}-\hat{u}_{i-1}\right)^{2}}{\sum_{i=1}^{n} \hat{u}_{i}^{2}} \approx 2-2 \hat{\rho}, \ \ \in [0,4]$$ {#eq-DWtest}- The numerator is small (*DW*\<2) if consecutive residuals stick together (have the same sign), which means that they are *positively autocorrelated*\- The numerator is large (*DW*\>2) if the residuals change the sign very often, which means that they are *negatively autocorrelated*- Durbin and Watson have tabulated the distribution of *DW* even *for small samples* which depends on *n* and *k* and further exhibits a range of indeterminacy, which lowers its practicability------------------------------------------------------------------------### The Breusch-Godfrey testHowever, *both of the test procedures above are only working if all explanatory variables of the original model are strict exogenous*. Suppose not and the original model is$$y_t = \mathbf x_t \boldsymbol \beta + \textcolor{red}{\gamma y_{t-1}} + u_t \quad \Rightarrow \quad y_t = \mathbf x_t \boldsymbol \beta + \gamma y_{t-1} + \underbrace {(\rho u_{t-1} + e_t)}_{u_t}$$ {#eq-model_with_ar_term}with the lagged endogenous variable $y_{t-1}$ and an autocorrelated error $u_t=\rho u_{t-1}+e_t$- In this case, because $y_{t-1}$ is strongly correlated with $u_{t-1}$ per definition, the parameter estimate $\hat \gamma$ will capture most of the autocorrelation in $u_t$. This implies that the residuals of this regression $\hat u_t$ will look like much more as a *not* autocorrelated process. 
This is due to the first problem of autocorrelated errors mentioned above in @sec-autocorrelatederros- So, if we run a regression $\hat u_t = \rho \hat u_{t-1} + e_t$ to test for autocorrelation, the estimate $\hat \rho$ will therefore be heavily biased toward zero rendering the whole test procedure pointless- However, the argument above also points to a remedy for this problem; why not *directly* estimating in some way the right version of the model aboveSo, we try the following specification for the test equation (compare to @eq-simpleartest):$$\hat u_t = \textcolor{red} {\mathbf x_t \boldsymbol \delta_1 + \delta_2 y_{t-1}} + \rho \hat u_{t-1} + e_t$$ {#eq-BGtest}- Although not obvious, one can show that this procedure works, even if we have *weak exogenous* variables like $y_{t-1}$ and therefore "biased" residuals from the original model, see @breusch1981review. The reason is that $\rho$ is now estimated conditional on $y_{t-1}$ and $\mathbf x_t$ (FWL-Theorem) and thereby correcting for the "biased" residuals. Note, if $\rho=0$, $\boldsymbol \delta_1$ and $\delta_2$ *have to be zero as well*! (orthogonality property of OLS)- The test is whether $\rho = 0$, which can be examined with an usual t-test, or with a F-Test if we test for higher order autocorrelations with $\rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \ldots \,$ as autoregressive terms in the test equation. In the latter case we test whether $\rho_1, \ \rho_2, \ldots = 0$- More popular and powerful is a *Lagrange Multiplier* version of the test which is called **Breusch-Godfrey** test. In this case, the test statistic is $n \cdot R^2$, with the $R^2$ from the above test equation; (basically, this tests whether *all* coefficients of the test equation are zero and might therefore be more powerful). The test statistic is $\chi^2$ distributed with degrees of freedom equal to the number of autoregressive terms in the test equation> The *Breusch-Godfrey* test is today's standard test for autocorrelation. It can be used with *weak exogenous variables*, is flexible as it can test for *higher order autocorrelations* and could easily be made robust against heteroskedasticity using HC standard errors------------------------------------------------------------------------## Example: Estimating a dynamic version of Okun's Law#### Reading in data from Excel table and converting them to time series```{r}#| comment: " "#| warning: truelibrary(zoo)library(AER)library(dynlm) # For reading in Excel tableslibrary(readxl) # Reading in Excel table with data from 1957 Q1 - 2013 Q4USMacro <-read_xlsx("us_macro_quarterly.xlsx")# The first column of the data is a string variable of dates with format 1957:01, # but had no name. 
It automatically gets the name: ...1 # Define date variable from the first column with the zoo function as.yearqtr() date <-as.yearqtr(USMacro$...1, format ="%Y:0%q")# Convert to zoo data, but: important, remove the first column as it is a non-numeric variable # Remember, time series objects are matrices and can contain only numbersusdata <-zoo(USMacro[, c("GDPC96", "UNRATE") ], order.by=date) class(usdata)```#### Defining and plotting the data```{r}#| fig-width: 6#| fig-asp: 0.75#| fig-align: center# We want to estimate Okun's Law, the relationship between the change in unemployment # rate dependent on the GDP growth ratedgdp =diff( log(usdata$GDPC96) )*100# define quarterly growth rates of gdp usdata <-cbind(usdata, dgdp) # merging to usdata, not necessary in this caseplot( cbind( usdata$dgdp, usdata$UNRATE) )```------------------------------------------------------------------------#### Estimation```{r}#| comment: " "#| label: "okun"# Estimating Okun's Law with dynlm(). ( without an autoregressive term L(d(UNRATE)) ) model0 <-dynlm( d(UNRATE) ~L(dgdp,1:3) +trend(usdata), data = usdata )summary(model0)```------------------------------------------------------------------------#### Plotting residuals and autocorrelation function for residuals```{r}#| layout-ncol: 2#| fig-asp: 0.75# Plotting residuals, looks they are somewhat positively autocorrelated plot(model0$residuals, main ="Residuals - Base Model")abline(0,0)acf(model0$residuals)```------------------------------------------------------------------------#### Testing for autocorrelation in residuals```{r}#| comment: " "# Durbin-Watson test points to autocorrelation dwtest(model0)``````{r}#| comment: " "# Simple test for autocorrelation, regressing uhat on lagged uhat, # clear sign for autocorrelationu0 = model0$residualscoefs <-coeftest( dynlm( u0 ~L(u0) ) )coefs# storing rhorho <- coefs[2,1]round(rho, digits=3)``````{r}#| comment: " "# Breusch-Godfrey test (option fill = NA seems to be necessary)# clear sign for autocorrelationbgtest(model0, fill =NA)```------------------------------------------------------------------------### First remedy: Testing results with HAC standard errors**But caution, you should also consider misspecifications, as these lead to autocorrelated errors**, see @sec-caution```{r}#| comment: " "# First remedy for autocorrelated errors: HAC standard errors, # correcting biased standard errors, t-statistics and p-values# But caution, you should also consider misspecifications, as these leads to autocorrelated errors too coeftest( model0, vcov. = vcovHAC )```------------------------------------------------------------------------#### Comparing results```{r}#| code-fold: TRUE#| label: tbl-HAC#| tbl-cap: "Okun's Law: Comparing default and HAC se, t-statistics and p-values"library(modelsummary)modelsummary( list("default statistics "=model0, " HAC statistics"=model0), fmt =3,stars =TRUE, statistic =c("std.error", "[ {statistic} ]", "[ {p.value} ]"),vcov =c("classical", vcovHAC),gof_omit ="A|L|B|F|R",coef_omit =1,coef_rename =c('L(dgdp, 1:3)1'='L1_gGDP', 'L(dgdp, 1:3)2'='L2_gGDP','L(dgdp, 1:3)3'='L3_gGDP'),align ="cdd",output ="flextable")```------------------------------------------------------------------------### Second remedy: Estimation by FGLS with *rho*=`r round(rho, digits=3)`(see @eq-Koyeck)```{r}#| comment: " "# Estimate Okun's Law by FGLS. 
But this depends on a consistent estimate of rho, # which we have stored above model0GLS <-dynlm( ( d(UNRATE) - rho*L(d(UNRATE)) ) ~L( ( dgdp - rho*L(dgdp) ) ,1:3) +trend(usdata), data = usdata )summary(model0GLS)```------------------------------------------------------------------------#### Comparing results```{r}#| code-fold: true#| label: tbl-HAC_FGLS#| tbl-cap: "Okun's Law: Comparing OLS, OLS with HAC-CovMat and FGLS, (t-statistics in brackets)"library(modelsummary)modelsummary( list("OLS "= model0, " OLS w HAC "= model0, " FGLS"= model0GLS), fmt =3,stars =TRUE, statistic =c("statistic"),vcov =c("classical", vcovHAC, "classical"),gof_omit ="A|L|B|F|RM",coef_omit =1,coef_rename =c('L(dgdp, 1:3)1'='L1_gGDP', 'L(dgdp, 1:3)2'='L2_gGDP','L(dgdp, 1:3)3'='L3_gGDP','L((dgdp - rho * L(dgdp)), 1:3)1'='L1_gGDP','L((dgdp - rho * L(dgdp)), 1:3)2'='L2_gGDP','L((dgdp - rho * L(dgdp)), 1:3)3'='L3_gGDP' ),align ="lddd",output ="flextable")```------------------------------------------------------------------------#### Estimation with a lagged endogenous variable (autoregressive term)```{r}#| comment: " "#| fig-width: 6#| fig-asp: 0.75#| fig-align: centerarmodel <-dynlm( d(UNRATE) ~L(dgdp,1:3) +L( d(UNRATE) ) +trend(usdata), data = usdata )summary(armodel)```------------------------------------------------------------------------#### Plotting residualsDoesn't looks that they are positively autocorrelated; but possible heteroskedasticity!```{r}#| fig-width: 6#| fig-align: center#| fig-asp: 0.75u <- armodel$residualsplot(u, main ="Residuals - Autoregressive Model")abline(0,0)```------------------------------------------------------------------------#### Simple test for autocorrelation in model with autoregressive termSeems to be no autocorrelation -- but this test is biased (why?)```{r}#| comment: " "usdata <-cbind(usdata, u) # merging to usdata, but not necessary in this case summary( dynlm( u ~L(u), data = usdata ) )```------------------------------------------------------------------------#### Breusch-Godfrey test for autocorrelationPoints to strong autocorrelation of order = 1```{r}#| comment: " "BG <-dynlm( u ~L(dgdp,1:3) +L( d(UNRATE) ) +trend(usdata) +L(u) , data = usdata )summary(BG) ```------------------------------------------------------------------------#### Breusch-Godfrey LM-test for autocorrelation, order = 1, chi\^2 variant```{r}#| comment: " "# by hand# n * R^2nR <-length(BG$residuals) *summary(BG)$r.squared# p-valuepnR <-1-pchisq(nR, df=1) # p-value# Printing resultprint( paste( "Breusch-Godfrey Test, order=1: n * R^2 =", sprintf( "%.3f",nR ), " p-value =", sprintf( "%.4f",pnR ) ))```#### Built in function bgtest()```{r}#| comment: " "# Attention: !!! the R function bgtest() seems to not work properly # without the option fill=NA !!!bgtest(armodel, order =1, fill=NA)```------------------------------------------------------------------------\pagebreak## General Properties of OLS with Time Series- Under our *standard* assumptions MLR.1 - MLR.5 OLS, estimates are (see, @wooldridge_Intro_Econometrics - Unbiased\ - Efficient - Gauss-Markov Theorem\ - Consistent\ - Asymptotic normal distributed\ - With MLR.6, normal distributed in finite samples- However, assumption MLR.2 -- *Random Sample* -- is almost never fulfilled with time series! 
This assumption is therefore replaced by TSMLR.2: **Weak Dependence** - *Weak dependence* means that correlations between *successive* observations become smaller with *more time distance* between the observations and eventually converge to zero (fast "enough")- Under MLR.4 -- **strong exogeneity** -- $E(u_t|\mathbf X)=0$, OLS estimators are unbiased - However, contrary to the case of cross section data and random samples, in a time series context, this is an *extreme strong assumption*. It means that $u_t$ is uncorrelated not only with $\mathbf x_t$, but also with *all past and future values* $\mathbf x_{t+s}, \mathbf x_{t-s}$. This rules out all *autoregressive model*s with $y_{t-s}$ as *explanatory* variable, as current $y_{t}$, which is obvously correlatet with $u_t$, is part of future $\mathbf x_{t}$ in this case. Furthermore, policy feedback are ruled out too- > Hence, generally only **weak exogeneity** is assumed: $E(u_t|\mathbf x_t)=0$. This, together with weak dependence, is *sufficient* for *consistency* and *asymptotic normality*, but not for unbiasedness------------------------------------------------------------------------- Furthermore, MLR.1 (linearity in parameters) is modified as we additionally assume that all variables follow a (**weak**) **stationary stochastic process**- A **stochastic process**, $\{\ldots, y_{t-1}, \mathbf x_{t-1}, u_{t-1},\ \ y_t, \mathbf x_t, u_t, \ \ y_{t+1}, \mathbf x_{t+1}, u_{t+1}, \ldots \}$, is a *sequence* of random variables over time - *Strong (or strict) stationarity* means, that the *joint distribution* of the random variables is independent from time, meaning the distribution $F(y_t, \mathbf x_t, u_t)=F(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau$ - *Weak stationarity* or *variance stationarity* means that only the *first two moments* of the joint distribution -- *expectations*, *variances* and *(auto)covariances* -- are independent from time and are *finite* for all $t$. For instance, we have: $E(y_t, \mathbf x_t, u_t)=E(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau$, $\text{ } E(y_t \cdot y_s)=E(y_{t+\tau} \cdot y_{s+\tau}), \, \forall \tau,s$, $E(y_{t} \cdot \mathbf x_{s})=E(y_{t+\tau} \cdot \mathbf x_{s+\tau}), \, \forall \tau,s, \ \ldots$- This **stationarity** assumption, and its related assumption of **weak dependence**, are needed to be able to apply the LLN (@thm-LLN) and the CLT (@thm-CLT) and are therefore important for the asymptotic properties of the estimators. (*Remark*: These two assumptions are the *counterparts* of the i.i.d. assumption for cross section data). 
Unfortunately, these stationarity assumptions are quite often violated with macroeconomic or financial time series- However, as we will see later, there are tests to reject some important forms of non-stationarity and weak dependence------------------------------------------------------------------------### Some important stationary stochastic processes#### **White Noise Process**The most basic stationary stochastic processes is the **White Noise process** -- **WN process** (German: Reiner Zufallsprozess, RZ), with the following properties$$E(e_t)=0, \ \ \ \operatorname{Var}(e_t)=\sigma_e^2, \ \ \ E(e_t \cdot e_s)=0, \ \forall t \ne s $$------------------------------------------------------------------------#### **Moving Average process**A **Moving Average** process is a weighted sum of current and past values of a white noise process$$\operatorname{MA}(q):= \ u_t = e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} $$For a MA(1) process we get$$ E(u_t) = E(e_t) + \theta E(e_{t-1})=0$$$$\operatorname{Var}(u_t) = \operatorname{Var}(e_t) + \theta^2\operatorname{Var}(e_t) = \sigma_e^2 (1+\theta^2)$$$$E(u_t\cdot u_{t-1})=E\left( (e_t+\theta e_{t-1})(e_{t-1}+\theta e_{t-2})\right)=\theta\sigma_e^2$$$$E(u_t\cdot u_{t-2})=E\left( (e_t+\theta e_{t-1})(e_{t-2}+\theta e_{t-3})\right)=0 $$because $e_t$ is a WN-process with $E(e_t\cdot e_{t-s})=0, s \ne 0$> Thus, we arrived to the *important result* that for a MA(q) process, the *covariance* of $(u_t, u_{t-s})$ is *zero for* $s > q$------------------------------------------------------------------------#### **Autoregressive Process**An **Autoregressive** process of *order* $p$ is defined as$$\operatorname{AR}(p):= \ u_t = \rho_1 u_{t-1} + \cdots + \rho_p u_{t-p} + e_t $$with $e_t$ being a WN process. For on AR(1) process$$u_t = \rho u_{t-1} + e_t$$ {#eq-ar_1} we get (we assume stationarity)$$E(u_t)=\rho E(u_{t-1})+E(e_t) \ \ \Rightarrow \ \ E(u_t)(1-\rho)=E(e_t) \ \ \Rightarrow $$$$E(u_t)=\dfrac {1}{1-\rho}E(e_t)=0$$and$$\operatorname{Var}(u_t) = \rho^2 \operatorname{Var}(u_{t-1}) + \sigma_e^2 \ \ \Rightarrow \ \ \operatorname{Var}(u_t) := \sigma_u^2 =\dfrac {1}{1-\rho^2}\sigma_e^2 \$$Finally, we calculate the auto-covariance of an AR(1) process$$E(u_t\cdot u_{t-1})=E ( \, \underbrace {(\rho u_{t-1} + e_t)}_{u_t} \, u_{t-1} ) = \rho E(u_{t-1}^2) + \underbrace {E(e_t u_{t-1})}_{0} = \rho \sigma_u^2$$$$E(u_t\cdot u_{t-2})=E\left( (\rho u_{t-1}+e_t) \, u_{t-2} \right) = \rho \underbrace {E(u_{t-1}u_{t-2})}_{\rho \sigma_u^2} + \underbrace {E(e_t u_{t-2})}_{0} = \rho ( \rho \sigma_u^2) = \rho^2 \sigma_u^2 $$> Thus, we arrived to the *important result that for an AR(1) process, the covariance* of $(u_t, u_{t-s})$, is: $\, E(u_t, u_{t-s}) = \rho^s \sigma_u^2, \ \ \forall s.$ This is a geometrically declining function in $s$, if $|\rho|<1$------------------------------------------------------------------------#### **MA representation of AR process**- Every *stationary* AR(p) process has a MA representation, typically of *infinite* order (the reverse is generally not true, hence MA processes are of a more general type than AR processes!)- By a *recursive expansion* of @eq-ar_1, the AR(1) process, we get$$u_t = \rho u_{t-1} + e_t = \rho \underbrace {(\rho u_{t-2} + e_{t-1})}_{u_{t-1}} + e_t = e_t + \rho e_{t-1} + \rho^2 u_{t-2} \quad \text{ and}$$$$u_t = e_t + \rho e_{t-1} + \rho^2 \underbrace {(\rho u_{t-3} + e_{t-2})}_{u_{t-2}} = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \rho^3 u_{t-3}$$- And finally, as $\lim \limits_{s \to \infty} \rho^{s} = 0$, we arrive to a 
MA representation of infinite order$$u_t = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \rho^3 e_{t-3} + \cdots \ = \ \sum\nolimits_{i=0}^{\infty} \rho^i e_{t-i} $$ {#eq-MA_Representation_of_AR}> The coefficients of this MA representation constitute the so called **Impulse Response** of $u_t$ regarding to a temporary unit shock in $e_t$. Such impulse responses are a very important tool in analyzing the reaction of variables to shocks, especially in a *multivariate* context $\to$ VARs (Vector Autoregressive Processes)------------------------------------------------------------------------### Non-stationary AR processes -- Random Walk, Trends and Unit RootsGiven an AR(1) process$$u_t = \rho u_{t-1} + e_t$$a *necessary and sufficient condition for stationarity* is $|\rho|<1$. This can be proved with the corresponding MA representation @eq-MA_Representation_of_AR- If $\rho=1$, we get a very specific and important process, a **Random Walk** $$ u_t = u_{t-1} + e_t $$ {#eq-RandomWalk}- The MA representation for a random walk with a staring value $u_0$ at time $t=0$ is$$ u_t = e_t + e_{t-1} + e_{t-2} + e_{t-3} + \cdots + e_{1} + u_{0} = \sum\nolimits_{i=1}^t e_i + u_0 $$ {#eq-MA_Rep_of_RandomWalk}- And the variance of $u_t$ is$$\sigma_u^2 = t\cdot \sigma_e^2 + \operatorname {Var} (u_{0})$$- which depends on time $t$. If $t \to \infty$, $\sigma_u^2 \to \infty$, so this clearly *non-stationary*- Random Walks also violate the weak dependence assumption as @eq-MA_Rep_of_RandomWalk clearly shows. A random walk is therefore a **long memory process**; shocks long ago, $e_{t-s}$ with $s$ very large, do not diminish as in an AR(1) process with $|\rho|<1$, but retain their influence on $u_t$- Further note that the *best prediction* of the random walk $u_t$ is $E(u_t|u_{t-1}, u_{t-2}, \ldots ) = u_{t-1}$. Hence, the *difference* between $u_t$ and $u_{t-1}$ *is unpredictable*. This property is of fundamental importance in economics and finance------------------------------------------------------------------------### Trends in time series#### **Deterministic Trend**- Very often, time series exhibit *time trends* or even breaks. In this case, the assumption of stationarity is violated and the resulting complications for econometric analysis, like testing hypothesis, depend on the *specific type of the non-stationarity*, see @nelson1982- Time series can exhibit a **deterministic trend**, a **stochastic trend** or *both*- An example for a **deterministic trend** is$$y_t = \beta_0 + \beta_1 x_t + \beta_2 trend + e_t$$- This poses *no problem for estimation* of $\beta_0, \beta_1$, as by the FWL-theorem, this is the same as when $y_t$ and $x_t$ are first regressed on the trend variable and the detrended *residuals* are then used to estimate $\beta_0, \beta_1$. If these residuals are stationary, $y_t$ and $x_t$ are called **trend stationary**------------------------------------------------------------------------#### **Stochastic Trend**An example for a **stochastic trend** (and an additional deterministic trend) is$$y_t = \beta_0 + y_{t-1} + \beta_1 x_t + e_t $$ {#eq-AR_Model_with_Drift}- This is an *autoregressive model* with an **unit root** (to this later). After a $t$ times *recursive expansions* of the equation above we get$$ y_t = \beta_0 \cdot t + \beta_1 \sum_{i=1}^t x_{i} + \sum_{i=1}^t e_{i} + y_0 $$- The first term represents the *deterministic* trend. 
It depends on $\beta_0$, which is called a **drift** in this case; because the constant is *added up* *t* times in an autoregressive model with a *unit root* this leads to a deterministic trend- The last sum represents the *stochastic* trend, because this is the MA representation of a random walk, compare @eq-MA_Rep_of_RandomWalk- The first sum could also represent a stochastic trend component, if $x_t$ is treated as random- Note, if we specify @eq-AR_Model_with_Drift as $\Delta y_t = \beta_0 + \beta_1 x_t + e_t$ (estimation in $1^{st}$ differences), no trends arise------------------------------------------------------------------------#### **Unit roots and Lag-Operator**Stochastic trends, like the ones above, are called *unit roots*. Why this labeling?- In theoretical time series analysis, the so called *lag operator L*, (or backshift operator *B*) is commonly used as an extraordinary helpful tool - The **lag operator L** is a function *applied* to time series and shifts the whole times series *one period back* to the past. In general, *L* can be *manipulated like an ordinary algebraic symbol*. We have$$L x_t := x_{t-1}, \quad L^2 x_t = x_{t-2}, \quad L^s x_t = x_{t-s} , \quad L^{-1} x_t = x_{t+1} $$- The difference operator $\Delta$, or *D*, is defined as$$\Delta:= (1-L), \ \ \Rightarrow \ \ \Delta x_t \, = \, x_t - L x_t \, = \, x_t - x_{t-1}$$- Thus, our autoregressive model, @eq-AR_Model_with_Drift, can be written as$$y_t = \beta_0 + L y_{t} + \beta_1 x_t + e_t \ \ \Rightarrow \ \ (1-L)y_t = \beta_0 + \beta_1 x_t + e_t$$- The left hand term of the right variant is an autoregressive **lag-polynomial**. And the root of this polynomial is $(1-L)=0 \ \Rightarrow \ L = 1$, a **unit root**- The lag operator makes it very convenient to handle AR processes of higher orders (and of MA processes of course). For instance, consider the following AR(3) model$$ y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \rho_3 y_{t-3} + e_t \ \ \Rightarrow $$\$$ \underbrace {(1-\rho_1L - \rho_2L^2 -\rho_3L^3)}_{a(L)} y_t = e_t \ \ \Rightarrow $$$$ a(L)y_t = e_t $$ {#eq-ar_lag_process}- with $a(L)$ the corresponding AR-polynomial. 
This is an extremely compact notation for an AR process- If $e_t$ is not WN, but follow itself an MA process, $e_t=b(L)\epsilon_t$ with $\epsilon_t$ WN, we have$$a(L)y_t = b(L)\epsilon_t $$ {#eq-arma}- which is called an **ARMA**(*p*,*q*) process, with *p* equal to the order of the AR part and *q* equal to the order of the MA part------------------------------------------------------------------------#### Factorization of the lag-polynomial- If it is possible to factor the lag-polynomial $a(L)$ into $(1-L) \, \tilde a(L)$, with $\tilde a(L)$ representing a stationary lag-polynomial, then $a(L)$ *contains an unit root* and the process is non-stationary and not weakly dependent- One can show that if all roots of $a(L)$ are outside the complex unit circle, a factorization $(1-L) \, \tilde a(L)$ is not possible and the AR process is *stationary*; a necessary condition for this is that the *sum of the coefficients of* $a(L)$ is *strict positive* and less than 2, i.e., $0<a(1)<2$ (if the first term of $a(L)$ is 1)- Furthermore, in this case (stationarity) the lag-polynomial $a(L)$ is *invertible* and the *AR process* @eq-ar_lag_process *has a MA representation* with *finite* second moments, i.e., is stationary$$ y_t = a(L)^{-1}e_t $$- *Remark*: The **inverse of a lag-polynomial** $a(L)$ is defined by:$$a(L)^{-1}a(L)=1$$- For example:$$(1-\rho L)^{-1} = (1 + \rho L + \rho^2 L^2 + \rho^3 L^3 + \ldots) $$- which can be proven by multiplying out $\, a(L)^{-1}a(L)$- If the AR polynomial is invertible, even an ARMA process @eq-arma has a MA representation$$y_t = a(L)^{-1}b(L)\epsilon_t $$- On the other hand, if $b(L)$ is invertible, the MA (or ARMA) process has an *AR representation*. But this has nothing to do with stationarity as *every* MA process of *finite order* is stationary. 
Hence, it is possible that a stationary MA process has *no* AR representation $\rightarrow$ MA processes are more general------------------------------------------------------------------------### Some simulated stochastic processes```{r}#| code-fold: true#| comment: " "# Simulating several stochastic processeslibrary(matrixStats) # for matrix cumSumset.seed(12345) # for reproducibility n=100# number of Observations# MA(2) process; with theta1 = 0.6, theta2 = 0.4e =matrix( rnorm(300), ncol =3 )e =ts(e, 1, frequency =1 ) # convert to a ts object to be able to use the lag functionu = e +0.6*lag(e,-1) +0.4*lag(e,-2)``````{r}#| code-fold: true#| comment: " "# Stable AR(1) processar=NULL# to initialize ar for (i in1:3) { ar_t =arima.sim( list( order(1,0,0), ar=c(0.8) ), n=n ) ar =cbind(ar_t, ar)}# Trend stationary AR(1) process; we simply add a deterministic trend to last AR(1) from abovear_t = ar_t +cumsum( rep(0.5, 100) )``````{r}#| code-fold: true#| comment: " "# Random walk; cumulative sum of WN processe =matrix( rnorm(n*10), ncol =10)rw =colCumsums(e)# Random walk with drift; cumulative sum of WN process with d=0.5ed = e +0.5rwd =colCumsums(ed)# Random walk with stronger drift; cumulative sum of WN process with d=0.5ed1 = e +1.5rwd1 =colCumsums(ed1)```------------------------------------------------------------------------#### **Plotting of MA(2) processes**```{r}#| layout-ncol: 2#| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(u, type ="l", lty =1, lwd =1.5, main="MA(2) process; with theta1 = 0.6, theta2 = 0.4"); abline(0,0); acf( u[,2], main="ACF"); ```------------------------------------------------------------------------#### **Plotting of AR(1) processes**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(ar, type ="l", lty =1, lwd =1.5, main ="AR(1), with rho = 0.8"); abline(0,0); acf( ar[,1], main="ACF"); ```------------------------------------------------------------------------#### **Plotting Random Walks**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(rw, type ="l", lty =1, lwd =1.5, main ="Random Walk"); abline(0,0); acf( rw[,1], main="ACF")```------------------------------------------------------------------------#### **Plotting AR(1) with deterministic trend**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(ar_t, type ="l", lty =1, lwd =1.5, main ="AR(1) + determ. 
trend, with rho = 0.8"); abline(0,0.5); acf( ar_t, main="ACF" )```------------------------------------------------------------------------#### **Plotting Random Walks with drift**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(rwd, type ="l", lty =1, lwd =1.5, main="Random Walk with drift"); abline(0,.5); acf( rwd[,1], main="ACF")```------------------------------------------------------------------------#### **Plotting Random Walks with strong drift**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(rwd1, type ="l", lty =1, lwd =1.5, main="Random Walk with strong drift"); abline(0,1.5); acf( rwd1[,1], main="ACF"); ```------------------------------------------------------------------------\pagebreak## Consequences of unit roots for regressions- In this section, we will present some important **theory**- Confirm the theory with **simulation experiments**- Introduce the concepts of **Integration** and **Cointegration**### Theory- **Non-stationarity** in general, and **unit roots** in particular have serious consequences for regression analyses with time series, see @pagan1989- Regression equations containing variables with unit roots can lead to - *Spurious regression* - *Non-standard distributions* of estimated coefficients with severe consequences for tests - *Super-consistency* of estimated coefficients- In the following, we lay out the statistical reasons for these complications. To keep the analysis simple we consider a model with only one explanatory variable. Furthermore, $y_i$ and $x_i$ should be *demeaned* (so we need no intercept) $$ y_i = \beta \, x_i + u_i $$ {#eq-baseReg}- For this case the usual OLS estimator for $\beta$ is$$\hat \beta \, = \, \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)^{-1} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i y_i \right) \, = \, \\ \qquad \quad \quad $$ $$\beta + \underbrace{\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)^{-1}}_{M_{xx}^{-1}} \underbrace{\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{M_{xu}}$$ {#eq-beta}------------------------------------------------------------------------#### **Case 1: all variables and the error term** $u$ **are well behaved**- Assuming the usual case that the variables are *well behaved* (our former shortcut for *stationarity* and *weak dependency*) we can show by applying the LLN and CLT (@thm-LLN, @thm-CLT) that - $\hat \beta$ is **consistent** and - $\hat \beta$ is **asymptotically normally distributed**- In particular, regarding *consistency* of $\hat {\beta}$, by the LLN and according @eq-beta, we have$$\hat {\beta} \ \stackrel{p}{\longrightarrow} \ \beta + {\underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{p}{\longrightarrow} \ \sigma_x^2}} ^{-1} \underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{p}{\longrightarrow} \ 0} \ = \ \beta$$- Regarding the *asymptotic normal distribution* of $\hat \beta$, by the CLT we have$$\sqrt n \, (\hat { \beta} - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{p}{\longrightarrow} \ \sigma_x^2}} ^{-1} \underbrace {\left( \frac {1}{\sqrt n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NDRV} \ = \ NDRV $$ {#eq-betaND} with $NDRV$ meaning "Normal Distributed Random Variable"------------------------------------------------------------------------#### **Case 2:** $x$ **follows a random walk but** $u$ **is stationary**- The 
situation is different if $x_i$ follows a **random walk** but $u_i$ is still stationary and weakly dependent. In this case the conditions for the LLN and CLT are not satisfied any more and one can show that (with $NSDRV$ for a non-standard distributed random variable)$$ \frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$$$ \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right) \ \stackrel{d}{\longrightarrow} \ NSDRV$$- Regarding *consistency* of $\hat \beta$, according to @eq-beta we have$$\hat \beta \ \stackrel{p}{\longrightarrow} \ \beta + {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{p}{\longrightarrow} \ \frac {1}{n}NSDRV \to \ 0 } \ = \ \beta $$ {#eq-betaconsisten}- The rate of convergence actually is $n$ instead of the usual rate $\sqrt n$, compare @eq-betaND.\ Hence, $\hat \beta$ is **super consistent**- We further reach to the conclusion that the *distribution* of $\hat \beta$ is **non-standard distributed**$$ n \, (\hat \beta - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV} \ = \ NSDRV$$ {#eq-betanormal}------------------------------------------------------------------------#### **Case 3: both** $x$ **and** $u$ **are random walks**- The situation dramatically change if **both** $x_i$ and $u_i$ follow a **random walk**$$\frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$$$ \frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$The first result is the same as above but the second result is remarkable: even dividing by $n^2$ does not lead to a convergence to zero- Applying this result to the formula for $\hat \beta$, @eq-beta, we get the reason for **Spurious Regressions**$$(\hat \beta - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV} \ = \ NSDRV $$ {#eq-spurReg}We have no convergence to $\beta$ even if we divide and multiply by $n^3$ as the first inverse moment will go to $+\infty$ and the second to zero, so that $\hat \beta \to \beta + \infty \cdot 0$, which could be anything. 
------------------------------------------------------------------------

#### **Case 3: both** $x$ **and** $u$ **are random walks**

-   The situation changes dramatically if **both** $x_i$ and $u_i$ follow a **random walk**

$$\frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$

$$ \frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$

The first result is the same as above, but the second result is remarkable: even dividing by $n^2$ does not lead to convergence to zero

-   Applying this result to the formula for $\hat \beta$, @eq-beta, we get the reason for **Spurious Regressions**

$$(\hat \beta - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV} \ = \ NSDRV $$ {#eq-spurReg}

We get no convergence to $\beta$ even if we normalized by $n^3$ instead: the first (inverted) moment would then go to $+\infty$ and the second to zero, so that $\hat \beta \to \beta + \infty \cdot 0$, which could be anything. Hence, $\hat \beta$ does not converge to the true $\beta$ but to *a non-standard distributed random variable*, even with a very large sample $\Rightarrow$ **Spurious Regressions Problem**, see @granger1974spurious

------------------------------------------------------------------------

### Simulations

To underpin our theoretical results we simulate the convergence of the moments $M_{xx}^{-1}M_{xu}=(\hat \beta - \beta)$, depending on the time series properties

```{r}
#| comment: " "
#| code-fold: true
rm(.Random.seed, envir = globalenv())
betahat_minus_beta = list()
x_random_walk = 0   # x random walk or white noise
rho = 0.5           # AR parameter for u; AR(1) with rho = 0.5 or random walk
# for 20 and 200 observations, 20000 replications
for (n in c(20, 200)) {
  Mxx = NULL
  Mxu = NULL
  for (i in 1:20000) {
    e1 = rnorm(n)
    e2 = rnorm(n)
    ifelse( x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {u[t] = rho * u[t-1] + e2[t] * 2}
    mxx <- sum(x * x) / (n)
    mxu <- sum(x * u) / (n)
    Mxx[i] = mxx
    Mxu[i] = mxu
  }
  if (n == 20)  {n1 = 20;  betahat_minus_beta[[1]] = Mxu/Mxx}
  if (n == 200) {n2 = 200; betahat_minus_beta[[2]] = Mxu/Mxx}
}
```

------------------------------------------------------------------------

#### **Simulation 1: both x and u stationary**

-   Normal rate of convergence
-   With n = 200, already normally distributed

```{r}
#| layout-ncol: 2
#| fig-asp: 0.75
#| code-fold: true
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
```

------------------------------------------------------------------------

#### **Simulation 2: x random walk, u stationary**

```{r}
#| comment: " "
#| code-fold: true
rm(.Random.seed, envir = globalenv())
betahat_minus_beta = list()
x_random_walk = 1
rho = 0.5
for (n in c(20, 200)) {
  Mxx = NULL
  Mxu = NULL
  for (i in 1:20000) {
    e1 = rnorm(n)
    e2 = rnorm(n)
    ifelse( x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {u[t] = rho * u[t-1] + e2[t] * 2}
    mxx <- sum(x * x) / (n^2)
    mxu <- sum(x * u) / (n^2)
    Mxx[i] = mxx
    Mxu[i] = mxu
  }
  if (n == 20)  {n1 = 20;  betahat_minus_beta[[1]] = Mxu/Mxx}
  if (n == 200) {n2 = 200; betahat_minus_beta[[2]] = Mxu/Mxx}
}
```

-   Super consistency
-   Clear deviations from the normal distribution with both n = 20 and n = 200

```{r}
#| layout-ncol: 2
#| fig-asp: 0.75
#| code-fold: true
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
```

------------------------------------------------------------------------

#### **Simulation 3: x random walk, u random walk**

```{r}
#| comment: " "
#| code-fold: true
rm(.Random.seed, envir = globalenv())
betahat_minus_beta = list()
x_random_walk = 1
rho = 1
for (n in c(20, 200)) {
  Mxx = NULL
  Mxu = NULL
  for (i in 1:20000) {
    e1 = rnorm(n)
    e2 = rnorm(n)
    ifelse( x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {u[t] = rho * u[t-1] + e2[t] * 2}
    mxx <- sum(x * x) / (n^2)
    mxu <- sum(x * u) / (n^2)
    Mxx[i] = mxx
    Mxu[i] = mxu
  }
  if (n == 20)  {n1 = 20;  betahat_minus_beta[[1]] = Mxu/Mxx}
  if (n == 200) {n2 = 200; betahat_minus_beta[[2]] = Mxu/Mxx}
}
```
-   No convergence to the true value at all; in fact, the distribution is independent of n ==\> spurious regression
-   Clear deviations from the normal distribution with both n = 20 and n = 200

```{r}
#| layout-ncol: 2
#| fig-asp: 0.75
#| code-fold: true
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-8,8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-8,8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
```

------------------------------------------------------------------------

### Order of Integration and Cointegration

-   We have seen that *unit roots* in the variables can have dramatic consequences for the asymptotic behavior of the OLS estimators. In particular, we learned that the *features of the error process* are of great relevance. To be more precise, we need some important *definitions*

    1)  A variable is **integrated of order one**, I(1), if it is non-stationary but becomes stationary after applying the difference operator $\Delta = (1-L)$ once. In this case, the variable contains *one unit root* in its autoregressive lag polynomial: $a(L)=\Delta \tilde a(L)$, with $\tilde a(L)$ stationary (invertible)

    2)  For an I(2) process -- *integrated of order two* -- we have to apply the difference operator $\Delta$ twice for the series to become stationary. Further, an I(0) process is already stationary

    3)  If we have two I(1) variables $y$ and $x$ and there exists a linear combination $(y - \alpha x)$ which is I(0) -- i.e., is stationary -- then the variables $y$ and $x$ are said to be **cointegrated**

-   The concept of *cointegration* is of enormous importance in modern macroeconomic theory and macro-econometrics because it has a *clear economic interpretation* as a **long run equilibrium relationship**

    -   If two variables are I(1), they exhibit a stochastic trend (random walk components) and one would expect that they are *drifting further away* from each other in the course of time

    -   If this is not the case, if they *stay side by side*, then there must be some economic forces at work -- adjustments toward a *long run* equilibrium -- which accomplish that. In this case, there exists a **cointegration relationship** $(y - \alpha x)$ which is stationary, I(0), and so they are *cointegrated* (a short simulated sketch of these definitions follows below)
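A minimal sketch of these definitions in base R (simulated, hypothetical series, not part of the worked examples later in this chapter; the cointegration parameter $\alpha = 1$ is assumed known here): an I(1) series becomes stationary after differencing once, an I(2) series after differencing twice, and a cointegrating linear combination of two I(1) series is I(0)

```{r}
#| code-fold: true
# Minimal sketch of the integration / cointegration definitions (hypothetical simulated series)
set.seed(42)
e  <- rnorm(300)
x  <- cumsum(e)                        # I(1): one unit root
x2 <- cumsum(x)                        # I(2): integrated of order two
y  <- x + rnorm(300)                   # y and x share the stochastic trend -> cointegrated with alpha = 1
d1 <- diff(x)                          # I(0) after differencing once
d2 <- diff(x2, differences = 2)        # I(0) after differencing twice
z  <- y - 1 * x                        # cointegration relationship (y - alpha*x), stationary
```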
------------------------------------------------------------------------

### Several important special cases

-   Let us reconsider our model @eq-baseReg, $\text{ } y_i = x_i \beta + u_i$

-   If *both* $x$ and $y$ are I(1), but are *cointegrated*, $\beta$ corresponds to $\alpha$ and OLS will estimate the cointegration relationship by minimizing the sum of squared residuals; @eq-baseReg is then called a **cointegration equation** (but note, a causal interpretation still requires $x_i$ to be exogenous). Furthermore, the **error term** $u$ in @eq-baseReg is **stationary** and the results of @eq-betaconsisten and @eq-betanormal apply. Thus we have

    -   OLS *consistently* estimates $\beta$, with a convergence rate of $n$, i.e., OLS is *super consistent*

    -   The distribution of $\hat \beta$ is generally *non-standard* and therefore the usual tests do not apply. One has to rely on simulated distributions

-   If *both* $x$ and $y$ are I(1), but are **not** *cointegrated*, then they are drifting away from each other in the course of time, which implies a **non-stationary error term** $u$. In this case the result of @eq-spurReg applies and

    -   regressing $y$ on $x$ typically leads to a *spurious regression*. The result is *unpredictable*, as $\hat \beta$ is a non-standard distributed random variable whose variance does not shrink with larger $n$

    -   However, there could be a valid "*short run*" relationship in *first differences*, $\Delta y_i = \Delta x_i \beta + v_i$. As the first differences are stationary, this relationship can be estimated by OLS in the usual manner

-   If $y$ is I(0), stationary, and $x$ is I(1), then $\hat \beta$ will *converge in a super consistent manner to zero*, i.e., the results @eq-betaconsisten and @eq-betanormal apply. This feature is utilized by the so-called **Unit Root Tests**, see @sec-unitroottest below

------------------------------------------------------------------------

#### **Some further cases and qualifications on our present analysis**

-   Assume $u$ is stationary

    -   If $x$ is *strictly* exogenous, the whole analysis could be conditioned on $x$, and in this case the time series properties of $x$ play no role. We can think of $x$ as fixed numbers in repeated samples and all our asymptotic standard results apply. However, strict exogeneity of the right hand side variables *is rare* in a time series context

    -   If $x$ exhibits a unit root (stochastic trend) *with drift*, the standard results apply likewise. The reason is that in this case the deterministic trend dominates the stochastic one asymptotically. However, this is only the case *if there is no other right hand side variable with a deterministic trend or a stochastic trend with drift*. The reason is that in this case, according to the FWL theorem, all other variables behave as if they were detrended

    -   Furthermore, all standard results apply if the left and right hand side variables are *either stationary or cointegrated*. In this case we can always find reparameterizations so that every coefficient can be assigned to a stationary variable. This is especially true for VARs with more than one lag, as a variable and its lagged values are always cointegrated. Hence, there is generally no need to estimate VARs in differences (as many people believe)

-   The spurious regression problem remains even if $x$ is strictly exogenous -- the error term $u$ is still non-stationary in this case. If the variables are trend-stationary, i.e., *only* exhibit a deterministic trend, the simple remedy avoiding spurious regressions is detrending (see the sketch below)
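The following minimal sketch (with hypothetical trend-stationary series, not taken from the chapter's examples) illustrates the detrending remedy mentioned in the last point: two independent series that both trend upward appear strongly related in levels, but after removing the deterministic trends the regression no longer picks up a systematic relationship

```{r}
#| code-fold: true
# Minimal sketch: detrending two independent trend-stationary series (hypothetical data)
set.seed(7)
tt   <- 1:200
y_tr <- 0.05 * tt + arima.sim(model = list(ar = 0.5), n = 200)   # trend-stationary
x_tr <- 0.03 * tt + arima.sim(model = list(ar = 0.5), n = 200)   # trend-stationary, independent of y_tr
summary(lm(y_tr ~ x_tr))             # spurious in levels: both series trend upward
y_dt <- residuals(lm(y_tr ~ tt))     # detrended y
x_dt <- residuals(lm(x_tr ~ tt))     # detrended x
summary(lm(y_dt ~ x_dt))             # no systematic relationship left after detrending
```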
------------------------------------------------------------------------

\pagebreak

## Unit Root Test and Error Correction Form {#sec-unitroottest}

### Dickey-Fuller (DF) Test

-   We have seen that possible non-stationarity of time series variables in regression models determines the statistical properties of the estimators. Hence, we clearly need a test for stationarity versus non-stationarity

-   Consider a pure random walk

$$ y_t = y_{t-1} + e_t \ \ \Rightarrow \ \ \Delta y_t = e_t $$ {#eq-DF0}

-   To test for non-stationarity we add the variable $y_{t-1}$ to the right hand side of the second form of @eq-DF0. If the **process is actually a random walk**, the results @eq-betaconsisten and @eq-betanormal apply and the coefficient of $y_{t-1}$ would converge with rate $n$ to the true value zero. This is the **Dickey-Fuller (DF) Test**:

$$\Delta y_t = \delta y_{t-1} + \beta_0 + (\gamma \cdot t) + e_t $$ {#eq-DF1}

-   Thus, we are testing whether $\delta = 0$ by means of the **usual t-statistic**. However, $H_0$: $\delta = 0$, so we presuppose non-stationarity. Hence, under $H_0$, $\delta$ is non-standard distributed, @eq-betanormal applies, and its t-statistic follows the so-called Dickey-Fuller distribution, which Dickey and Fuller tabulated with Monte Carlo simulations. Typically, **critical values** of the Dickey-Fuller distribution are *notably larger in absolute value* than the usual ones; one-sided 5%: **-2.86** without a time trend and **-3.41** with a time trend included in the test equation

-   To allow for a *proper alternative* if $H_0$ is not true and $y$ is actually stationary, we usually add a constant $\beta_0$ and possibly a time trend $(\gamma \cdot t)$ to the test equation @eq-DF0, especially if $y$ exhibits a trend

------------------------------------------------------------------------

### Augmented Dickey-Fuller Unit Root (ADF) Test

-   If $y$ does not follow a *pure* random walk but an AR process with a unit root, additional terms, which take care of this more complicated process, have to be added. As we presuppose an AR process with a unit root, we augment the test @eq-DF1 by autoregressive terms $\Delta y_{t-s}$, leading to the so-called **Augmented Dickey-Fuller (ADF) Test**. Once again, we test $\delta = 0$ by means of the usual t-statistic (a sketch of an information-criterion-based choice of the lag length follows at the end of this subsection)

$$\Delta y_t = \delta y_{t-1} + \beta_0 + \textcolor{red} {\beta_1 \Delta y_{t-1} + \beta_2 \Delta y_{t-2} + \cdots + \beta_k \Delta y_{t-k}} + (\gamma \cdot t) + e_t $$ {#eq-ADF}

-   Because of its simplicity and quite good statistical properties, the ADF test is the most popular unit root test (many other unit root tests have been developed in the literature, but many of them are variants of the ADF test)

-   The ADF test can even be used as a *test* for **cointegration**, which is then called the **Engle-Granger** procedure, see @engle1987co. To do this

    1)  We first have to check whether both (or more) variables are actually I(1), using the ADF test

    2)  If both are I(1), and an I(2) property is ruled out, we estimate the hypothetical **cointegration regression** @eq-baseReg

    3)  After that, we check whether the residuals of the cointegration regression @eq-baseReg are indeed stationary. We can employ the ADF test to accomplish this

    4)  However, because residuals of a regression always look somewhat stationary -- OLS minimizes the residual variance -- the critical values for this test are different from those for the ADF test applied to ordinary (non-residual) time series; these critical values are even larger in absolute value than the usual ADF ones

-   This procedure can easily be extended to more explanatory variables (with even larger critical values). But in this case, usually more advanced multivariate methods, like the **Johansen** procedure, @johansen1991, are used to estimate and test for *multiple* cointegration relationships
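Regarding the lag length $k$ in @eq-ADF: the examples below rely on `tseries::adf.test()` with its default lag. As a minimal sketch (assuming the additional package `urca` is installed; it is not used elsewhere in this chapter), the lag length can instead be chosen by an information criterion with `ur.df()`

```{r}
#| eval: false
#| code-fold: true
# Minimal sketch (assumes the urca package is installed): ADF test with AIC-based lag selection
library(urca)
set.seed(1)
y_rw <- cumsum(rnorm(250))   # hypothetical I(1) series
summary( ur.df(y_rw, type = "drift", lags = 8, selectlags = "AIC") )
# type = "drift" corresponds to including beta_0 in the test equation;
# type = "trend" additionally includes the time trend (gamma * t)
```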
------------------------------------------------------------------------

### Error Correction Form

-   A cointegration relationship describes the long run connection between variables

-   However, there is a possibility to estimate short run as well as long run relationships in one equation. This is called the **error correction form**

$$\Delta y_t = \beta_0 + \beta_1 \Delta y_{t-1} + \beta_2 \Delta y_{t-2} + \cdots + \beta_k \Delta y_{t-k} + \\
\gamma_1 \Delta x_{t-1} + \gamma_2 \Delta x_{t-2} + \cdots + \gamma_k \Delta x_{t-k} + \\
\delta \textcolor{red} { (y_{t-1} - \alpha x_{t-1})} + \\
\gamma \cdot t + e_t $$ {#eq-ERC}

-   The term in brackets is the *error correction term*. In the *long run*, this *equilibrium relationship* must hold and therefore be stationary. This relationship could be estimated by @eq-ERC "freely" or in a two-step procedure:

    1)  First, estimate a cointegration regression in levels *without any lags*, as in @eq-baseReg

    2)  Test for cointegration, i.e., whether the residuals of the cointegration regression are stationary

    3)  If they are stationary, take the *lagged* residuals of the cointegration regression, representing $(y_{t-1} - \hat\alpha x_{t-1})$, as an additional variable in @eq-ERC; then $\delta$ measures the *adjustment speed* toward the long run equilibrium

-   The autoregressive and distributed-lag terms in differences in @eq-ERC describe the *short run interactions* between the variables. The lag length $k$ is often chosen so that the error term $e_t$ is not autocorrelated

------------------------------------------------------------------------

\pagebreak

## Examples of regressions with non-stationary time series

### Spurious Regression

```{r, message=FALSE}
#| code-fold: true
#| comment: " "
# Needed Libraries
library(matrixStats)   # for matrix colCumsums
library(tseries)       # for the Augmented Dickey-Fuller Unit Root test
library(egcm)          # Engle-Granger cointegration tests
library(zoo)           # time series infrastructure
library(dynlm)         # dynamic linear models: d() and L() operators (used below)
library(lmtest)        # coeftest() and dwtest() (used below)
set.seed(1111)         # for reproducibility; will always generate the same random numbers
```

```{r}
#| code-fold: true
#| comment: " "
# Simulating two independent random walks
# First, generate two white noise processes
n = 250                               # number of observations
e = matrix( rnorm(n*2), ncol = 2 )    # random matrix with n rows and two columns
e = zooreg(e, 1, n, frequency = 1)    # convert to a ts-zoo object to be able to use the lag function
# generate two random walks: cumulative sums of the columns of the white noise matrix
u <- colCumsums(e)                    # u consists of two columns with the two random walks
colnames(u) <- c("y", "x")            # giving names
u <- zooreg(u, 1, n, frequency = 1)   # convert to a ts-zoo object
```

#### Plot the two independently generated random walks

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
#| code-fold: true
matplot(u, type = "l", lty = 1, lwd = 1.5, main = "Two independent random walks")
abline(0,0)
```

------------------------------------------------------------------------

#### Regressing the two independently generated random walks

-   As this is a regression of I(1) variables without lags, it is a so-called @engle1987co cointegration regression, like @eq-baseReg

```{r}
#| comment: " "
#| fig-width: 6
#| fig-asp: 0.72
#| fig-align: center
mod <- dynlm( u$y ~ u$x )
coeftest(mod)   # printing the coefficients with std-errors and test statistics
```
-   We get a highly significant relationship, but we know that this is totally spurious. How can this happen?

-   Test of autocorrelation in the residuals: the DW test points to highly autocorrelated residuals, a first hint of *non-stationarity* of the residuals

```{r}
#| comment: " "
dwtest(mod)
```

------------------------------------------------------------------------

#### Scatterplot of the two independently generated random walks

A spurious connection between the two independent random walks is reflected even in the scatterplot

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
#| code-fold: true
plot(u$x, u$y, main = "Scatterplot of the two independent random walks")
abline(mod)   # add regression line of the estimated model to the plot
```

------------------------------------------------------------------------

#### Residual Plot

The plot of the residuals reveals a very high degree of autocorrelation ==\> possibly non-stationary

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
#| code-fold: true
plot( mod$residuals, main = "Residuals extremely autocorrelated" )
abline(0,0)
```

------------------------------------------------------------------------

#### Dickey-Fuller test

DF test equation, @eq-DF1: $\Delta$(residuals) on lagged residuals ==\> nearly a random walk. But what critical value should we use for the t-statistic of $\delta$?

```{r}
#| comment: " "
coeftest( dynlm( d(mod$residuals) ~ L(mod$residuals) ) )
```
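As a side remark (not part of the original example): essentially the same Dickey-Fuller regression can be run with `tseries::adf.test()` by setting the lag order to zero, in which case the reported p-value is based on tabulated Dickey-Fuller critical values -- though, as stressed below, those critical values are not strictly valid for regression residuals. A minimal sketch, assuming `mod` from above:

```{r}
#| comment: " "
# Minimal sketch: plain DF test (no augmentation terms) on the residuals via adf.test with k = 0
adf.test(mod$residuals, k = 0)
```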
------------------------------------------------------------------------

#### Augmented Dickey-Fuller test

-   ADF test, test @eq-ADF, a formal test for non-stationarity -- H~0~ is non-stationarity

-   But caution: the Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance. Nonetheless, H~0~, non-stationarity, cannot be rejected

-   The lag length k used for the autoregressive correction terms in @eq-ADF is 6 in this case. There are other procedures for ADF tests in R, which choose an "optimal" lag length according to some information criteria

```{r}
#| comment: " "
adf.test(mod$residuals)
```

------------------------------------------------------------------------

#### Engle-Granger cointegration test

-   Basically an ADF test for the residuals of the cointegration regression

-   But the egcm() procedure uses correct critical values for residuals

-   Furthermore, besides an ADF test, a whole battery of other tests is carried out

-   None of them can reject the H~0~ of non-stationarity of the residuals at the usual significance level of 5%

==\> no cointegration\
==\> spurious regression

```{r, warning=FALSE}
#| comment: " "
summary( erg <- egcm(u$x, u$y) )
```

------------------------------------------------------------------------

### Cointegration

```{r}
#| code-fold: true
#| comment: " "
# Simulating two cointegrated time series
set.seed(1234)   # for reproducibility; will always generate the same random numbers
# First, generate two white noise processes
v1 = rnorm(250); v2 = rnorm(250)*2
# Generating one MA(1) process w1 and one AR(2) process w2; first generate two vectors with zeros
w1 = rep(0, 250); w2 = rep(0, 250)
# Loop to generate the MA(1) and AR(2) process
for (t in 3:250) {
  w1[t] = 0.4 * v1[t] + 0.4 * v1[t-1]              # MA(1)
  w2[t] = 0.6 * w2[t-1] + 0.3 * w2[t-2] + v2[t]    # AR(2)
}
# Second, generate a time series with a unit root: cumsum of the MA(1) process; more general than a random walk
rw = cumsum(w1)   # actually, this is an ARMA(1,1) with unit root (or an ARIMA(0,1,1))
# x and y exhibit a common stochastic trend, hence they are cointegrated
x = rw
y = rw + w2
# transform to ts-zoo objects to make the time series functions available
x = zooreg(x, 1, 250, frequency = 1)   # transform to a ts-zoo object
y = zooreg(y, 1, 250, frequency = 1)   # transform to a ts-zoo object
```

#### Plot of the two cointegrated time series

Although both are I(1), they stick together

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
matplot(cbind(y,x), type = "l", lty = 1, lwd = 1.5, main = "Two cointegrated time series")
```

------------------------------------------------------------------------

#### Testing whether both variables are non-stationary

-   Before doing a cointegration test, we have to check whether the variables are actually I(1)

```{r}
#| comment: " "
adf.test(y)         # testing whether I(1); I(1) is not rejected
adf.test(diff(y))   # ruling out I(2); I(2) is rejected
```

-   The same for variable x

```{r}
#| comment: " "
adf.test(x)         # testing whether I(1); I(1) is not rejected
adf.test(diff(x))   # ruling out I(2); I(2) is rejected
```
-   Hence, both y and x are probably I(1). Therefore, we can check whether they are cointegrated

------------------------------------------------------------------------

#### Cointegration regression

-   Regressing the two I(1) processes y on x, without any lags ==\> @engle1987co cointegration regression, like @eq-baseReg

```{r}
#| comment: " "
summary( cointreg <- dynlm(y ~ x) )
```

-   The true alpha of the simulated cointegration regression is 1 and the *Engle-Granger* estimate for alpha is very close: `r round(coef(cointreg)[2],3)`, although with only 250 observations ==\> super consistency

------------------------------------------------------------------------

#### Plotting residuals of cointegration regression

Residuals of the cointegration regression; they look stationary, although clearly autocorrelated

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
plot(cointreg$residuals, main = "Residuals of cointegration regression")
abline(0,0)
```

------------------------------------------------------------------------

#### Unit root test for residuals

##### DF test of residuals

-   DF test equation, @eq-DF1, H~0~ is non-stationarity:

```{r}
#| comment: " "
coefDF1 <- coeftest( dynlm( d(cointreg$residuals) ~ L(cointreg$residuals) ) )
coefDF1
```

-   $\Delta$(residuals) on lagged residuals ==\> seems to be stationary\
-   Note the t-value of L(cointreg\$residuals) = `r round(coefDF1[2,3],2)`. This seems to support cointegration. But see the comments below about critical values

##### ADF test of residuals

-   @eq-ADF, test for non-stationarity, H~0~ is non-stationarity. Seems to support cointegration

-   But caution: the Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance

-   The lag length k used for the autoregressive correction terms is 6 in this case. There are other procedures for ADF tests in R, which choose an "optimal" lag length according to some information criteria

```{r}
#| comment: " "
adf.test(cointreg$residuals)
```

------------------------------------------------------------------------

#### Testing for cointegration with the package "egcm"

-   *Engle-Granger cointegration test*. Basically an ADF test for the residuals of the cointegration regression

-   But the egcm() procedure uses *correct* critical values for residuals

-   Furthermore, besides an ADF test, a whole battery of other tests is carried out

    -   Most of them can reject the H~0~ of non-stationarity of the residuals at the usual significance level of 5%\
    -   Especially look at the Elliott, Rothenberg and Stock (ERSD) test, which is a variant of the ADF test with local detrending of the data before testing and is said to be the most powerful unit root test
    -   ==\> variables are cointegrated
    -   ==\> no spurious regression

```{r, warning=FALSE}
#| comment: " "
summary( erg <- egcm(x, y, i1test = "adf") )
```

------------------------------------------------------------------------

### Estimating an Error Correction Model

#### Estimating an error correction model with lagged residuals from the cointegration regression above as error-correction term

See @eq-ERC

```{r}
#| comment: " "
erc <- dynlm( d(y) ~ L(d(y)) + L(d(x)) + L(cointreg$residuals) )
summary(erc)
```
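A side note on interpreting the adjustment coefficient $\delta$: ignoring the short-run terms, a deviation from the long run equilibrium decays roughly at rate $(1+\delta)$ per period, so an implied half-life can be computed from the estimated error-correction coefficient. A minimal sketch (assuming the coefficient of the lagged residual is named `L(cointreg$residuals)` in the summary output above; otherwise address it by position):

```{r}
#| eval: false
# Minimal sketch: implied adjustment speed / half-life from the error-correction coefficient
delta <- unname(coef(erc)["L(cointreg$residuals)"])   # adjustment coefficient, should be negative
half_life <- log(0.5) / log(1 + delta)                # periods until half of a deviation is corrected
half_life
```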
------------------------------------------------------------------------

#### Estimating the error correction model in "free form"

-   The implied parameter $\alpha$ of the cointegration equation, @eq-baseReg, is minus the coefficient of L(x) divided by the coefficient of L(y). See @eq-ERC

```{r}
#| comment: " "
ercfree <- dynlm( d(y) ~ L(d(y)) + L(d(x)) + L(y) + L(x) )
summary(ercfree)
# Implied estimate of alpha:
alpha <- -coef(ercfree)[5] / coef(ercfree)[4]
```

-   The true alpha of the simulated cointegration regression is 1 and the estimate for alpha is very close: `r round(alpha,3)` ==\> super consistency

-   Furthermore, it is also very similar to the *Engle-Granger* estimate for alpha from above, which was `r round(coef(cointreg)[2],3)`

\pagebreak