Time series arise in many fields of economics, especially in macroeconomics and financial economics
The main feature of time series is the temporal ordering of observations, i.e., we have a natural ordering and thus the data may not be arbitrarily reordered
Typical features and problems: serial correlation (non-independence of observations) and non-stationarity, which cause problems for test statistics and asymptotic theory and can lead to the phenomenon of spurious regressions
The outcome of economic variables (e.g. GNP, Dow Jones) is uncertain; they should therefore be modeled as random variables
Thus, time series are seen as sequences of random variables, i.e., stochastic processes
Randomness does not come from sampling from a population - A “sample” is the one realized path of the time series out of the many possible paths the stochastic process could have taken
Time series analysis focuses on modeling the dependency of a variable on its own past, and on the present and past values of other variables
8.1.1 Several popular time series models, example: Phillips Curve
Static model
In static time series models, the current value of one variable is modeled as the result of the current values of explanatory variables
(\pi_t \ldots inflation rate at time t, \; u_t \ldots unemployment rate at time t, \; e_t \ldots error term at time t)
Long run effect, i.e., the effect of a permanent shock of unemployment on inflation: \dfrac {\beta_1+\beta_2}{1-\beta_3}
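A sketch of where this long-run multiplier comes from, assuming a dynamic specification of the form \pi_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t-1} + \beta_3 \pi_{t-1} + e_t (the exact specification is not reproduced here): in a steady state with \pi_t = \pi_{t-1} = \pi and u_t = u_{t-1} = u we have
\pi = \beta_0 + (\beta_1 + \beta_2)\, u + \beta_3 \, \pi \ \ \Rightarrow \ \ \dfrac{\partial \pi}{\partial u} = \dfrac{\beta_1+\beta_2}{1-\beta_3}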
8.1.2 Implementation of time series data in R
Dealing with time series in R is a little bit more tricky than dealing with cross section data. The reason is that the program must know the beginning and end dates (times) of the time series and whether there are some “holes” in the data, in order to handle time series specific operations like lags or seasonal effects
Thus, time series need some additional infrastructure to do this job
The built-in infrastructure in R for time series is the class “ts”. Basically, ts-objects are numeric data vectors or matrices, augmented with an additional attribute, “tsp”, which stores the starting time, end time and frequency of the data; 1 for yearly, 4 for quarterly and 12 for monthly data. The disadvantage of this class is that it can only deal with “regular” data, meaning equispaced data points with no “holes”, and it does not support weekly, daily or high frequency data
Alternative infrastructures are, among others, the “zoo” class and the “xts” class. Both classes augment the data vectors or matrices with an additional index variable, which maps the corresponding time or date to each data point. Both classes support non-regular (irregularly spaced) data as well as weekly, daily and high frequency data. Furthermore, merging data with different ranges is possible, as is aggregating data to another frequency. The disadvantage of these classes is that they are a bit more complicated to set up – you generally have to deal with Date classes, especially for the xts class. However, often an appropriate index variable is included in the data set
You want to have functions which support the special features of time series. For instance, the workhorse function lm() does not support basic time series operations like lags or differences. You can use dynlm() from the package dynlm instead, which supports time trends, seasonal dummies and the d() (difference) and L() (lag) operators
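A minimal sketch of both infrastructures and of dynlm() with simulated data (all object names are made up for illustration):
library(zoo)     # "zoo" class for (ir)regular time series
library(dynlm)   # lm() with the time series operators d(), L() and trend()

# "ts": regular quarterly series starting in 1990 Q1
y_ts <- ts(rnorm(40), start = c(1990, 1), frequency = 4)
tsp(y_ts)        # the "tsp" attribute: start, end, frequency

# "zoo": the same data with an explicit yearqtr index
y_zoo <- zoo(coredata(y_ts), order.by = as.yearqtr(time(y_ts)))
x_zoo <- zoo(rnorm(40), order.by = index(y_zoo))
dat <- merge(y_zoo, x_zoo)          # one multivariate zoo object
colnames(dat) <- c("y", "x")

# dynlm() understands differences d() and lags L() of both classes
summary(dynlm(d(y) ~ L(d(y), 1) + L(x, 1), data = dat))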
8.1.3 Example: Analysis of Wage and Price Inflation Interactions with R
# zoo representation
l <- length(wageprc$lprice)  # we need the length of the data
# As there is no particular index variable in the data, we have to define an index variable m
# We begin at 1980 M3. Thus we add to 1980: 2/12 : (2+l-1)/12
# (For more info about zoo and dates: http://www.princeton.edu/~otorres/TimeSeriesR101.pdf)
m <- as.yearmon(1980 + seq(from = 2, to = 2 + (l - 1)) / 12)
data <- zoo(wageprc[, c(4, 5)], order.by = m)
class(data)
[1] "zoo"
data[1:8,]
lprice lwage
Mar 1980 4.528289 0.8415672
Apr 1980 4.527209 0.8415672
May 1980 4.528289 0.8415672
Jun 1980 4.529368 0.8501509
Jul 1980 4.529368 0.8544153
Aug 1980 4.531524 0.8586616
Sep 1980 4.533674 0.8586616
Oct 1980 4.532599 0.8628899
We are regressing \Delta lwage_t on several lags of \Delta lwage_t and several lags of \Delta lprice_{t} (and likewise some lags of additional control variables) and check whether the lags of \Delta lprice_{t} have additional predictive power for \Delta lwage_t, beyond (conditional on) the lags of \Delta lwage_t (and the other lagged control variables). If this is the case, we say that \Delta lprice is Granger-causal for \Delta lwage, or \Delta lprice \Rightarrow \Delta lwage
summary(wage <- dynlm(d(lwage) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4), data = data))
Time series regression with "zoo" data:
Start = Aug 1980, End = Dec 2003
Call:
dynlm(formula = d(lwage) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4),
data = data)
Residuals:
Min 1Q Median 3Q Max
-0.0122246 -0.0028638 -0.0005247 0.0018289 0.0135806
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0031931 0.0006746 4.733 3.56e-06 ***
L(d(lprice), 1:4)1 0.1519367 0.1132631 1.341 0.180894
L(d(lprice), 1:4)2 -0.0348319 0.1232691 -0.283 0.777723
L(d(lprice), 1:4)3 0.3106324 0.1239067 2.507 0.012760 *
L(d(lprice), 1:4)4 0.0349598 0.1150571 0.304 0.761476
L(d(lwage), 1:4)1 -0.1942465 0.0596269 -3.258 0.001266 **
L(d(lwage), 1:4)2 -0.1303027 0.0598510 -2.177 0.030332 *
L(d(lwage), 1:4)3 0.0017028 0.0599862 0.028 0.977375
L(d(lwage), 1:4)4 0.2125389 0.0589520 3.605 0.000371 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.004549 on 272 degrees of freedom
Multiple R-squared: 0.1637, Adjusted R-squared: 0.1391
F-statistic: 6.655 on 8 and 272 DF, p-value: 5.943e-08
Conditioned on lagged wage inflation, does lagged price inflation affect current wage inflation?
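The F-test below was presumably produced with linearHypothesis() from the car package (loaded with AER); a sketch of such a call, testing that all four lags of \Delta lprice have zero coefficients in the wage equation:
# joint test of all coefficients whose name contains "lprice"
linearHypothesis(wage, matchCoefs(wage, "lprice"))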
Linear hypothesis test
Hypothesis:
L(d(lprice),4)1 = 0
L(d(lprice),4)2 = 0
L(d(lprice),4)3 = 0
L(d(lprice),4)4 = 0
Model 1: restricted model
Model 2: d(lwage) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 276 0.0060694
2 272 0.0056277 4 0.00044169 5.337 0.0003755 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Result: Price inflation does Granger cause wage inflation (has predictive power for wage inflation)
Problems: Instantaneous causality, missing additional control variables and expectation effects - post hoc, ergo propter hoc fallacy
We are regressing \Delta lprice_t on the very same variables as above and check whether the lags of \Delta lwage_{t} have additional predictive power for \Delta lprice_t, beyond (conditional on) the lags of \Delta lprice_t (and the other lagged control variables). If this is the case, we say that \Delta lwage is Granger-causal for \Delta lprice, or \Delta lwage \Rightarrow \Delta lprice
summary(price <- dynlm(d(lprice) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4), data = data))
Time series regression with "zoo" data:
Start = Aug 1980, End = Dec 2003
Call:
dynlm(formula = d(lprice) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4),
data = data)
Residuals:
Min 1Q Median 3Q Max
-0.0089767 -0.0012300 -0.0001126 0.0013557 0.0134337
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0008842 0.0003622 2.441 0.01528 *
L(d(lprice), 1:4)1 0.4398115 0.0608146 7.232 4.84e-12 ***
L(d(lprice), 1:4)2 0.1959992 0.0661871 2.961 0.00333 **
L(d(lprice), 1:4)3 0.0936098 0.0665294 1.407 0.16056
L(d(lprice), 1:4)4 0.0514227 0.0617778 0.832 0.40592
L(d(lwage), 1:4)1 0.0241188 0.0320156 0.753 0.45189
L(d(lwage), 1:4)2 -0.0332092 0.0321359 -1.033 0.30234
L(d(lwage), 1:4)3 -0.0039232 0.0322085 -0.122 0.90314
L(d(lwage), 1:4)4 0.0455599 0.0316532 1.439 0.15120
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.002442 on 272 degrees of freedom
Multiple R-squared: 0.4937, Adjusted R-squared: 0.4788
F-statistic: 33.16 on 8 and 272 DF, p-value: < 2.2e-16
Conditioned on lagged price inflation, does lagged wage inflation affect current price inflation?
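Analogously, the F-test below can be obtained by testing that all four lags of \Delta lwage have zero coefficients in the price equation (a sketch, not the original call):
linearHypothesis(price, matchCoefs(price, "lwage"))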
Linear hypothesis test
Hypothesis:
L(d(lwage),4)1 = 0
L(d(lwage),4)2 = 0
L(d(lwage),4)3 = 0
L(d(lwage),4)4 = 0
Model 1: restricted model
Model 2: d(lprice) ~ L(d(lprice), 1:4) + L(d(lwage), 1:4)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 276 0.0016500
2 272 0.0016224 4 2.7552e-05 1.1548 0.3312
Result: Wage inflation does not Granger cause price inflation (has no predictive power for price inflation)
Problems: Instantaneous causality, missing additional control variables and expectation effects - post hoc, ergo propter hoc fallacy
8.2 Auto-correlated Errors
One of the most common problems in regressions with time series is autocorrelation of the errors. This violates assumption MLR.5: (compare the chapter “Multiple Regression”, slide 48)
\operatorname{Var}(\mathbf u \, | \, \mathbf X) = E (\mathbf{u u}' \, | \, \mathbf{X}) = \sigma^2 \mathbf I
\tag{8.1}
This has basically three consequences for the OLS estimates
OLS estimates are not consistent any more if, in addition to autocorrelated errors, the explanatory variables are not strictly exogenous, E(u_t \mid \mathbf X)=0
This is especially the case for lagged endogenous variables y_{t-s} as right hand side variables!
The usual formula for the covariance matrix of \hat {\boldsymbol \beta} is invalid, and therefore so are all test statistics
A remedy for this is a Heteroskedasticity and Autocorrelation Consistent covariance matrix estimate, HAC covariance matrix, leading to HAC standard errors for \hat {\boldsymbol \beta} and asymptotically valid tests
These are computed in a manner similar to, but much more complicated than, the Heteroskedasticity Consistent covariance matrix, HC (based on the residuals \hat u_t, we have to estimate the autocorrelations of the errors over time; see the chapter “Heteroskedasticity”). As a formal derivation of this is quite tedious, we simply point to the literature, like Green (2017)
OLS loses its efficiency
8.2.1 HAC Covariance Matrix
Regarding the second problem, we try to estimate the true covariance matrix of \hat {\boldsymbol \beta}. Basically, the residuals of the regression are used to estimate (\mathbf X' \boldsymbol \Omega \mathbf X), with \hat \sigma^2 \boldsymbol \Omega being the covariance matrix of the error term u_t, leading to an estimate of the covariance matrix for \hat {\boldsymbol \beta}
Using this HAC estimate of the covariance matrix for \hat {\boldsymbol \beta}, which is often called Newey-West estimator, the usual tests can be carried out
The big advantages of using HAC standard errors are that
it is easily implemented (usually by an option in the software package)
it works with heteroskedasticity as well as autocorrelation in the errors
we do not have to assume a particular form of heteroskedasticity or autocorrelation
Major disadvantages
Tests are only asymptotically valid (meaning we need a lot of observations)
OLS estimates of \boldsymbol \beta remain inefficient
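A minimal, self-contained sketch of HAC (Newey-West) standard errors with the sandwich and lmtest packages (the data generating process is made up for illustration; see Section 8.4 for the application to Okun's Law):
library(sandwich)   # vcovHAC(), NeweyWest()
library(lmtest)     # coeftest()

set.seed(1)
n <- 200
x <- rnorm(n)
u <- as.numeric(stats::filter(rnorm(n), filter = 0.7, method = "recursive"))  # AR(1) errors
y <- 1 + 0.5 * x + u
fit <- lm(y ~ x)

coeftest(fit)                           # usual (here invalid) IID standard errors
coeftest(fit, vcov. = NeweyWest(fit))   # HAC / Newey-West standard errors
coeftest(fit, vcov. = vcovHAC(fit))     # alternative HAC estimator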
8.2.2 Loss of Efficiency with Auto-Correlated Errors \Rightarrow FGLS
Regarding the third problem, we already know that autocorrelated errors lead to a loss of efficiency; OLS estimates are not BLUE any more!
A possible remedy for this is to use Feasible Generalized Least Squares (FGLS), as we did in the case of heteroskedastic errors (through weighting the observations to offset the varying variances \to WLS)
For doing this, one needs a hypothesis about the particular form of the autocorrelation. Usually, one assumes an autoregressive process of order one or two for the error term. (Later, we will discuss such processes more deeply.) Suppose e_t is not autocorrelated and u_t follows an autoregressive process of order one, u_t = \rho u_{t-1} + e_t
Applying FGLS, in a first step, we have to estimate \rho with the help of a regression of the OLS residuals on the residuals of the previous period (according to the assumed error model above)
Having an estimate for \rho, in a second step, we transform the original model, y_t = \mathbf x_t \boldsymbol \beta + u_t, u_t autocorrelated, in the following way
Note, the error term e_t of this model, with the transformed variables y_t^* and \mathbf x_t^*, is not autocorrelated. Hence, the parameters \boldsymbol \beta can be estimated efficiently with this transformed model
This model transformation (taking quasi-differences of the variables) is called Koyck transformation and leads to an FGLS estimator, which is called the Cochrane-Orcutt estimator (a small R sketch of this two-step procedure follows at the end of this subsection)
Remark: One can use the estimated parameters of this transformed model and plug these back into the original model to compute new residuals. With these new residuals we can get an updated estimate for \rho and transform the variables once more, but this time with this new \hat \rho. This algorithm can be repeated until \hat \rho doesn’t change any more. However, it is not guaranteed that the iterations just described actually lead to better (more efficient) estimates; thus, often no iterations are performed
Applying the Cochrane-Orcutt procedure, at least one (the first) observation is lost (because of the lags used in the transformation). The Prais-Winsten procedure avoids this loss of efficiency by weighting the first observation with \sqrt {1 - \hat \rho^2}
Both methods only work if the explanatory variables \mathbf x_t are strictly exogenous, as otherwise the estimate \hat \rho from the first step is even asymptotically biased (but there is a remedy for this, which is discussed in the section about “testing for autocorrelation”)
There are several other methods for obtaining FGLS estimates. By means of these procedures, \rho and \boldsymbol \beta are estimated simultaneously, either by Maximum Likelihood estimation or with restricted non-linear least squares
For both approaches, one has to rely on numerical methods. As with modern computers there are no real costs in terms of computation time, these numerical methods are the ones typically used in today’s software packages
The non-linear approach is based on the transformed model, but rearranged, and estimates
Note, the coefficient vector of \mathbf x_{t-1} is non-linear in the parameters \boldsymbol \beta and \rho and furthermore, non-linear restrictions are imposed on them
These restrictions can be tested against a general autoregressive/distributed lag model with non-autocorrelated errors – common root test
This approach only requires weak exogeneity of the explanatory variables and not strict exogeneity to yield consistent estimates
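A rough sketch of the two-step Cochrane-Orcutt idea described above, with simulated data and dynlm() (not the original code of this chapter):
library(dynlm)

set.seed(2)
n <- 200
x <- ts(rnorm(n))
u <- ts(as.numeric(stats::filter(rnorm(n), filter = 0.6, method = "recursive")))  # AR(1) error
y <- 1 + 0.5 * x + u
dat <- ts.union(y = y, x = x)

# Step 1: OLS, then estimate rho by regressing residuals on lagged residuals
ols  <- dynlm(y ~ x, data = dat)
uhat <- ols$residuals
rho  <- coef(dynlm(uhat ~ L(uhat) - 1))[1]   # slope of the residual regression (no intercept)

# Step 2: re-estimate with quasi-differenced variables (Cochrane-Orcutt)
summary(dynlm((y - rho * L(y)) ~ I(x - rho * L(x)), data = dat))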
8.2.3 Significance of Auto-Correlated Residuals
Generally, autocorrelation in the residuals indicates that the model does not exploit all the information contained in the data
Autocorrelation in the residuals of an estimated regression model can have several causes
A general misspecification of the model, e.g., omitted variables or non-linearities not taken into account
A misspecified dynamic structure of the model, i.e., omitted lags of the explanatory variables or missing autoregressive terms like y_{t-1}. In this case, the model is not dynamically complete, i.e., the error term is correlated with lags of the explanatory variables or missing autoregressive terms
Structural breaks or other non-stationarities
Autocorrelated errors
Hence, before applying the remedies for autocorrelated errors as discussed above, one has to check whether the first three causes for autocorrelated residuals can be ruled out – if not, these have to be tackled first!
As autocorrelated residuals might be a strong signal for all these problems mentioned above, it is important to have a test procedure for detecting autocorrelation
8.3 Testing for Autocorrelation
8.3.1 The Durbin-Watson Test
The simplest method to check for autocorrelated residuals is to regress the residuals of the original model on the lagged residuals and test whether \rho = 0 with a simple t-test:
\hat u_t = \rho \hat u_{t-1} + e_t
\tag{8.5}
This asymptotically valid test is particularly powerful against an autoregressive process of order one in the error term. But to some extent, it can detect more general forms of autocorrelations too, as long as these lead to a correlation between \hat u_t and \hat u_{t-1} – for instance even structural breaks
Another asymptotically equivalent variant of this test is the very popular Durbin-Watson test with the test statistic
DW = \dfrac{\sum_{t=2}^{n} (\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{n} \hat u_t^2} \approx 2\,(1-\hat \rho)
The numerator is small (DW<2) if consecutive residuals stick together (have the same sign), which means that they are positively autocorrelated
The numerator is large (DW>2) if the residuals change the sign very often, which means that they are negatively autocorrelated
Durbin and Watson have tabulated the distribution of DW even for small samples; it depends on n and k and further exhibits a range of indeterminacy, which lowers its practicality
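A minimal sketch of the DW statistic computed directly from a residual vector (dwtest() from the lmtest package does this, including p-values, and is used in Section 8.4):
# DW statistic: squared changes of the residuals relative to their squared level;
# values near 2 indicate no first order autocorrelation
dw_stat <- function(uhat) sum(diff(uhat)^2) / sum(uhat^2)

set.seed(3)
e     <- rnorm(200)                                               # white noise residuals
u_pos <- as.numeric(stats::filter(e, 0.7, method = "recursive"))  # positively autocorrelated
dw_stat(e)      # close to 2
dw_stat(u_pos)  # clearly below 2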
8.3.2 The Breusch-Godfrey test
However, both of the test procedures above only work if all explanatory variables of the original model are strictly exogenous. Suppose this is not the case and the original model is
y_t = \mathbf x_t \boldsymbol \beta + \gamma \, y_{t-1} + u_t
with the lagged endogenous variable y_{t-1} and an autocorrelated error u_t=\rho u_{t-1}+e_t
In this case, because y_{t-1} is by definition strongly correlated with u_{t-1}, the parameter estimate \hat \gamma will capture most of the autocorrelation in u_t. This implies that the residuals \hat u_t of this regression will look much more like a non-autocorrelated process. This is due to the first problem of autocorrelated errors mentioned above in Section 8.2
So, if we run a regression \hat u_t = \rho \hat u_{t-1} + e_t to test for autocorrelation, the estimate \hat \rho will therefore be heavily biased toward zero, rendering the whole test procedure pointless
However, the argument above also points to a remedy for this problem: why not directly estimate, in some way, the correct version of the model above?
So, we try the following specification for the test equation (compare to Equation 8.5):
\hat u_t = \mathbf x_t \boldsymbol \delta_1 + \delta_2 \, y_{t-1} + \rho \hat u_{t-1} + e_t
Although not obvious, one can show that this procedure works, even if we have weakly exogenous variables like y_{t-1} and therefore “biased” residuals from the original model, see Breusch and Godfrey (1981). The reason is that \rho is now estimated conditional on y_{t-1} and \mathbf x_t (FWL theorem), thereby correcting for the “biased” residuals. Note, if \rho=0, \boldsymbol \delta_1 and \delta_2 have to be zero as well! (orthogonality property of OLS)
The test is whether \rho = 0, which can be examined with a usual t-test, or with an F-test if we test for higher order autocorrelations with \rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \ldots \, as autoregressive terms in the test equation. In the latter case we test whether \rho_1 = \rho_2 = \ldots = 0
More popular and powerful is a Lagrange Multiplier version of the test which is called Breusch-Godfrey test. In this case, the test statistic is n \cdot R^2, with the R^2 from the above test equation; (basically, this tests whether all coefficients of the test equation are zero and might therefore be more powerful). The test statistic is \chi^2 distributed with degrees of freedom equal to the number of autoregressive terms in the test equation
The Breusch-Godfrey test is today’s standard test for autocorrelation. It can be used with weak exogenous variables, is flexible as it can test for higher order autocorrelations and could easily be made robust against heteroskedasticity using HC standard errors
8.4 Example: Estimating a dynamic version of Okun’s Law
Reading in data from Excel table and converting them to time series
library(zoo)
library(AER)
library(dynlm)
# For reading in Excel tables
library(readxl)
# Reading in Excel table with data from 1957 Q1 - 2013 Q4
USMacro <- read_xlsx("us_macro_quarterly.xlsx")
New names:
• `` -> `...1`
# The first column of the data is a string variable of dates with format 1957:01,
# but had no name. It automatically gets the name: ...1
# Define date variable from the first column with the zoo function as.yearqtr()
date <- as.yearqtr(USMacro$...1, format = "%Y:0%q")
# Convert to zoo data, but important: remove the first column as it is a non-numeric variable
# Remember, time series objects are matrices and can contain only numbers
usdata <- zoo(USMacro[, c("GDPC96", "UNRATE")], order.by = date)
class(usdata)
[1] "zoo"
Defining and plotting the data
# We want to estimate Okun's Law, the relationship between the change in the unemployment
# rate and the GDP growth rate
dgdp <- diff(log(usdata$GDPC96)) * 100  # define quarterly growth rates of gdp
usdata <- cbind(usdata, dgdp)           # merging to usdata, not necessary in this case
plot(cbind(usdata$dgdp, usdata$UNRATE))
Estimation
# Estimating Okun's Law with dynlm() (without an autoregressive term L(d(UNRATE)))
model0 <- dynlm(d(UNRATE) ~ L(dgdp, 1:3) + trend(usdata), data = usdata)
summary(model0)
Time series regression with "zoo" data:
Start = 1958 Q1, End = 2013 Q4
Call:
dynlm(formula = d(UNRATE) ~ L(dgdp, 1:3) + trend(usdata), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.84119 -0.15431 -0.01462 0.14159 1.19502
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3593795 0.0485308 7.405 2.79e-12 ***
L(dgdp, 1:3)1 -0.2124797 0.0223698 -9.499 < 2e-16 ***
L(dgdp, 1:3)2 -0.0961287 0.0232396 -4.136 5.03e-05 ***
L(dgdp, 1:3)3 -0.0268484 0.0222796 -1.205 0.22948
trend(usdata) -0.0008135 0.0002941 -2.766 0.00616 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2772 on 219 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.4274, Adjusted R-squared: 0.4169
F-statistic: 40.86 on 4 and 219 DF, p-value: < 2.2e-16
Plotting residuals and autocorrelation function for residuals
# Plotting residuals; they look somewhat positively autocorrelated
plot(model0$residuals, main = "Residuals - Base Model")
abline(0, 0)
acf(model0$residuals)
Testing for autocorrelation in residuals
# Durbin-Watson test points to autocorrelation
dwtest(model0)
Durbin-Watson test
data: model0
DW = 1.6085, p-value = 0.001252
alternative hypothesis: true autocorrelation is greater than 0
# Simple test for autocorrelation, regressing uhat on lagged uhat;
# clear sign of autocorrelation
u0 <- model0$residuals
coefs <- coeftest(dynlm(u0 ~ L(u0)))
coefs
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0040673 0.0177612 -0.2290 0.819083
L(u0) 0.1722748 0.0647150 2.6621 0.008337 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Breusch-Godfrey test (option fill = NA seems to be necessary);
# clear sign of autocorrelation
bgtest(model0, fill = NA)
Breusch-Godfrey test for serial correlation of order up to 1
data: model0
LM test = 17.027, df = 1, p-value = 3.685e-05
8.4.1 First remedy: Testing results with HAC standard errors
But caution, you should also consider misspecifications, as these lead to autocorrelated errors, see Section 8.2.3
# First remedy for autocorrelated errors: HAC standard errors,
# correcting biased standard errors, t-statistics and p-values
# But caution, you should also consider misspecifications, as these lead to autocorrelated errors too
coeftest(model0, vcov. = vcovHAC)
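The FGLS step below relies on an estimate rho from the simple residual regression above; the line actually storing it is not shown in this extract, but it presumably takes the slope coefficient of L(u0), e.g.:
# assumed: rho = estimated AR(1) coefficient of the residuals (row L(u0), column Estimate)
rho <- coefs[2, 1]
rho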
# Estimate Okun's Law by FGLS. But this depends on a consistent estimate of rho,
# which we have stored above
model0GLS <- dynlm((d(UNRATE) - rho * L(d(UNRATE))) ~ L((dgdp - rho * L(dgdp)), 1:3) + trend(usdata), data = usdata)
summary(model0GLS)
Time series regression with "zoo" data:
Start = 1958 Q2, End = 2013 Q4
Call:
dynlm(formula = (d(UNRATE) - rho * L(d(UNRATE))) ~ L((dgdp -
rho * L(dgdp)), 1:3) + trend(usdata), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.8008 -0.1405 -0.0207 0.1361 1.1586
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2435411 0.0461056 5.282 3.08e-07 ***
L((dgdp - rho * L(dgdp)), 1:3)1 -0.1616469 0.0215529 -7.500 1.59e-12 ***
L((dgdp - rho * L(dgdp)), 1:3)2 -0.1044128 0.0213147 -4.899 1.88e-06 ***
L((dgdp - rho * L(dgdp)), 1:3)3 -0.0274345 0.0212663 -1.290 0.1984
trend(usdata) -0.0004783 0.0002819 -1.697 0.0912 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2636 on 218 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.3163, Adjusted R-squared: 0.3038
F-statistic: 25.22 on 4 and 218 DF, p-value: < 2.2e-16
Okun’s Law: Comparing OLS, OLS with HAC-CovMat and FGLS (t-statistics in brackets)
                 OLS          OLS w HAC    FGLS
L1_gGDP          -0.212***    -0.212***    -0.162***
                 (-9.499)     (-5.422)     (-7.500)
L2_gGDP          -0.096***    -0.096***    -0.104***
                 (-4.136)     (-3.996)     (-4.899)
L3_gGDP          -0.027       -0.027       -0.027
                 (-1.205)     (-0.989)     (-1.290)
trend(usdata)    -0.001**     -0.001*       0.000+
                 (-2.766)     (-2.286)     (-1.697)
Num.Obs.          224          224          223
R2                0.427        0.427        0.316
Std.Errors        IID          Custom       IID
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Estimation with a lagged endogenous variable (autoregressive term)
armodel <- dynlm(d(UNRATE) ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata), data = usdata)
summary(armodel)
Time series regression with "zoo" data:
Start = 1958 Q1, End = 2013 Q4
Call:
dynlm(formula = d(UNRATE) ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata),
data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.79307 -0.13517 -0.02653 0.13349 1.07227
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1815164 0.0640424 2.834 0.00502 **
L(dgdp, 1:3)1 -0.1313215 0.0293821 -4.469 1.26e-05 ***
L(dgdp, 1:3)2 -0.0438121 0.0258604 -1.694 0.09166 .
L(dgdp, 1:3)3 0.0041877 0.0228314 0.183 0.85464
L(d(UNRATE)) 0.3578653 0.0877720 4.077 6.39e-05 ***
trend(usdata) -0.0004030 0.0003015 -1.337 0.18269
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2678 on 218 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.468, Adjusted R-squared: 0.4557
F-statistic: 38.35 on 5 and 218 DF, p-value: < 2.2e-16
Plotting residuals
They don't look positively autocorrelated; but note the possible heteroskedasticity!
u <- armodel$residuals
plot(u, main = "Residuals - Autoregressive Model")
abline(0, 0)
Simple test for autocorrelation in model with autoregressive term
Seems to be no autocorrelation – but this test is biased (why?)
usdata <- cbind(usdata, u)  # merging to usdata, but not necessary in this case
summary(dynlm(u ~ L(u), data = usdata))
Time series regression with "zoo" data:
Start = 1958 Q2, End = 2013 Q4
Call:
dynlm(formula = u ~ L(u), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.77741 -0.13650 -0.02195 0.13431 1.06558
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.003797 0.017393 -0.218 0.827
L(u) 0.019182 0.065716 0.292 0.771
Residual standard error: 0.2597 on 221 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.0003854, Adjusted R-squared: -0.004138
F-statistic: 0.0852 on 1 and 221 DF, p-value: 0.7706
Breusch-Godfrey test for autocorrelation
Points to strong autocorrelation of order = 1
BG <- dynlm(u ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata) + L(u), data = usdata)
summary(BG)
Time series regression with "zoo" data:
Start = 1958 Q2, End = 2013 Q4
Call:
dynlm(formula = u ~ L(dgdp, 1:3) + L(d(UNRATE)) + trend(usdata) +
L(u), data = usdata)
Residuals:
Min 1Q Median 3Q Max
-0.70784 -0.13588 -0.00744 0.13031 1.03772
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1635626 0.0955163 1.712 0.0883 .
L(dgdp, 1:3)1 0.0135892 0.0285452 0.476 0.6345
L(dgdp, 1:3)2 -0.1271705 0.0529555 -2.401 0.0172 *
L(dgdp, 1:3)3 -0.0530382 0.0315176 -1.683 0.0939 .
L(d(UNRATE)) -0.5636989 0.2307361 -2.443 0.0154 *
trend(usdata) -0.0002997 0.0003356 -0.893 0.3728
L(u) 0.6102879 0.2362478 2.583 0.0104 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2585 on 216 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.03231, Adjusted R-squared: 0.005429
F-statistic: 1.202 on 6 and 216 DF, p-value: 0.3064
Breusch-Godfrey LM-test for autocorrelation, order = 1, chi^2 variant
# by hand
# n * R^2
nR <- length(BG$residuals) * summary(BG)$r.squared
# p-value
pnR <- 1 - pchisq(nR, df = 1)
# Printing result
print(paste("Breusch-Godfrey Test, order=1: n * R^2 =", sprintf("%.3f", nR), " p-value =", sprintf("%.4f", pnR)))
# Attention: the R function bgtest() seems to not work properly
# without the option fill = NA !!!
bgtest(armodel, order = 1, fill = NA)
Breusch-Godfrey test for serial correlation of order up to 1
data: armodel
LM test = 7.2514, df = 1, p-value = 0.007085
8.5 General Properties of OLS with Time Series
Under our standard assumptions MLR.1 – MLR.5, OLS estimates are (see Wooldridge (2019))
Unbiased
Efficient - Gauss-Markov Theorem
Consistent
Asymptotically normally distributed
With MLR.6, normally distributed in finite samples
However, assumption MLR.2 – Random Sample – is almost never fulfilled with time series! This assumption is therefore replaced by TSMLR.2: Weak Dependence
Weak dependence means that correlations between successive observations become smaller with more time distance between the observations and eventually converge to zero (fast “enough”)
Under MLR.4 – strong exogeneity – E(u_t|\mathbf X)=0, OLS estimators are unbiased
However, contrary to the case of cross section data and random samples, in a time series context this is an extremely strong assumption. It means that u_t is uncorrelated not only with \mathbf x_t, but also with all past and future values \mathbf x_{t+s}, \mathbf x_{t-s}. This rules out all autoregressive models with y_{t-s} as explanatory variable, as current y_{t}, which is obviously correlated with u_t, is part of the future regressors \mathbf x_{t+s} in this case. Furthermore, policy feedback is ruled out too
Hence, generally only weak exogeneity is assumed: E(u_t|\mathbf x_t)=0. This, together with weak dependence, is sufficient for consistency and asymptotic normality, but not for unbiasedness
Furthermore, MLR.1 (linearity in parameters) is modified as we additionally assume that all variables follow a (weak) stationary stochastic process
A stochastic process, \{\ldots, y_{t-1}, \mathbf x_{t-1}, u_{t-1},\ \ y_t, \mathbf x_t, u_t, \ \ y_{t+1}, \mathbf x_{t+1}, u_{t+1}, \ldots \}, is a sequence of random variables over time
Strong (or strict) stationarity means that the joint distribution of the random variables is independent of time, meaning the distribution F(y_t, \mathbf x_t, u_t)=F(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau
Weak stationarity (covariance stationarity) means that only the first two moments of the joint distribution – expectations, variances and (auto)covariances – are independent of time and are finite for all t. For instance, we have: E(y_t, \mathbf x_t, u_t)=E(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau, \text{ } E(y_t \cdot y_s)=E(y_{t+\tau} \cdot y_{s+\tau}), \, \forall \tau,s, \ E(y_{t} \cdot \mathbf x_{s})=E(y_{t+\tau} \cdot \mathbf x_{s+\tau}), \, \forall \tau,s, \ \ldots
This stationarity assumption, and its related assumption of weak dependence, are needed to be able to apply the LLN (Theorem A.2) and the CLT (Theorem A.3) and are therefore important for the asymptotic properties of the estimators. (Remark: These two assumptions are the counterparts of the i.i.d. assumption for cross section data). Unfortunately, these stationarity assumptions are quite often violated with macroeconomic or financial time series
However, as we will see later, there are tests for some important forms of non-stationarity and lack of weak dependence
8.5.1 Some important stationary stochastic processes
White Noise Process
The most basic stationary stochastic process is the White Noise process – WN process (German: Reiner Zufallsprozess, RZ), with the following properties
E(e_t)=0, \ \ \ \operatorname{Var}(e_t)=\sigma_e^2, \ \ \ E(e_t \cdot e_s)=0, \ \forall t \ne s
Moving Average process
A Moving Average process is a weighted sum of current and past values of a white noise process
Thus, we arrive at the important result that for an AR(1) process the covariance of (u_t, u_{t-s}) is \, \operatorname{Cov}(u_t, u_{t-s}) = E(u_t \cdot u_{t-s}) = \rho^s \sigma_u^2, \ \ \forall s. This is a geometrically declining function in s if |\rho|<1
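This geometric decline can be checked against the theoretical autocorrelation function; a small sketch using the base R function ARMAacf():
rho <- 0.8
ARMAacf(ar = rho, lag.max = 5)   # theoretical ACF of an AR(1) with coefficient 0.8
rho^(0:5)                        # equals rho^s for s = 0, 1, ..., 5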
MA representation of AR process
Every stationary AR(p) process has a MA representation, typically of infinite order (the reverse is generally not true, hence MA processes are of a more general type than AR processes!)
By a recursive expansion of Equation 8.9, the AR(1) process, we get
The coefficients of this MA representation constitute the so called Impulse Response of u_t regarding to a temporary unit shock in e_t. Such impulse responses are a very important tool in analyzing the reaction of variables to shocks, especially in a multivariate context \to VARs (Vector Autoregressive Processes)
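A minimal sketch: the base R function ARMAtoMA() returns exactly these MA coefficients, i.e., the impulse response of a stationary AR process to a unit shock in e_t:
ARMAtoMA(ar = 0.8, lag.max = 8)           # AR(1): impulse response 0.8, 0.8^2, ...
ARMAtoMA(ar = c(0.6, 0.3), lag.max = 8)   # AR(2) example u_t = 0.6 u_{t-1} + 0.3 u_{t-2} + e_t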
8.5.2 Non-stationary AR processes – Random Walk, Trends and Unit Roots
Given an AR(1) process
u_t = \rho u_{t-1} + e_t
a necessary and sufficient condition for stationarity is |\rho|<1. This can be proved with the corresponding MA representation Equation 8.10
If \rho=1, we get a very specific and important process, a Random Walk
u_t = u_{t-1} + e_t
\tag{8.11}
The MA representation for a random walk with a starting value u_0 at time t=0 is
u_t = u_0 + \sum_{s=1}^{t} e_s
\tag{8.12}
with variance \sigma_u^2 = \operatorname{Var}(u_t) = t \cdot \sigma_e^2, which depends on time t. If t \to \infty, \sigma_u^2 \to \infty, so this is clearly non-stationary
Random Walks also violate the weak dependence assumption as Equation 8.12 clearly shows. A random walk is therefore a long memory process; shocks long ago, e_{t-s} with s very large, do not diminish as in an AR(1) process with |\rho|<1, but retain their influence on u_t
Further note that the best prediction of the random walk u_t is E(u_t|u_{t-1}, u_{t-2}, \ldots ) = u_{t-1}. Hence, the difference between u_t and u_{t-1} is unpredictable. This property is of fundamental importance in economics and finance
8.5.3 Trends in time series
Deterministic Trend
Very often, time series exhibit time trends or even breaks. In this case, the assumption of stationarity is violated and the resulting complications for econometric analysis, like testing hypothesis, depend on the specific type of the non-stationarity, see Nelson and Plosser (1982)
Time series can exhibit a deterministic trend, a stochastic trend or both
An example for a deterministic trend is
y_t = \beta_0 + \beta_1 x_t + \beta_2 trend + e_t
This poses no problem for estimation of \beta_0, \beta_1, as by the FWL-theorem, this is the same as when y_t and x_t are first regressed on the trend variable and the detrended residuals are then used to estimate \beta_0, \beta_1. If these residuals are stationary, y_t and x_t are called trend stationary
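A minimal sketch of this FWL argument with simulated trend-stationary series (all names and numbers are made up):
set.seed(4)
n     <- 200
trend <- 1:n
x <- 0.05 * trend + rnorm(n)                 # trend-stationary regressor
y <- 1 + 0.5 * x + 0.03 * trend + rnorm(n)   # trend-stationary dependent variable

coef(lm(y ~ x + trend))["x"]    # estimation with the trend included

# FWL: detrend y and x first, then regress the detrended residuals on each other
y_dt <- residuals(lm(y ~ trend))
x_dt <- residuals(lm(x ~ trend))
coef(lm(y_dt ~ x_dt))["x_dt"]   # identical slope estimate for x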
Stochastic Trend
An example for a stochastic trend (and an additional deterministic trend) is
The first term represents the deterministic trend. It depends on \beta_0, which is called a drift in this case; because the constant is added up t times in an autoregressive model with a unit root, this leads to a deterministic trend
The last sum represents the stochastic trend, because this is the MA representation of a random walk, compare Equation 8.12
The first sum could also represent a stochastic trend component, if x_t is treated as random
Note, if we specify Equation 8.13 as \Delta y_t = \beta_0 + \beta_1 x_t + e_t (estimation in 1^{st} differences), no trends arise
Unit roots and Lag-Operator
Stochastic trends, like the ones above, are called unit roots. Why this labeling?
In theoretical time series analysis, the so called lag operator L (or backshift operator B) is commonly used as an extraordinarily helpful tool
The lag operator L is a function applied to time series which shifts the whole time series one period back into the past. In general, L can be manipulated like an ordinary algebraic symbol. We have
L \, y_t = y_{t-1} \qquad \text{or} \qquad \Delta y_t = y_t - y_{t-1} = (1-L)\, y_t
The left hand term of the right variant is an autoregressive lag-polynomial. And the root of this polynomial is (1-L)=0 \ \Rightarrow \ L = 1, a unit root
The lag operator makes it very convenient to handle AR processes of higher orders (and of MA processes of course). For instance, consider the following AR(3) model
with a(L) the corresponding AR-polynomial. This is an extremely compact notation for an AR process
If e_t is not WN, but follow itself an MA process, e_t=b(L)\epsilon_t with \epsilon_t WN, we have
a(L)y_t = b(L)\epsilon_t
\tag{8.15}
which is called an ARMA(p,q) process, with p equal to the order of the AR part and q equal to the order of the MA part
Factorization of the lag-polynomial
If it is possible to factor the lag-polynomial a(L) into (1-L) \, \tilde a(L), with \tilde a(L) representing a stationary lag-polynomial, then a(L) contains a unit root and the process is non-stationary and not weakly dependent
One can show that if all roots of a(L) are outside the complex unit circle, a factorization (1-L) \, \tilde a(L) is not possible and the AR process is stationary; a necessary condition for this is that the sum of the coefficients of a(L) is strictly positive and less than 2, i.e., 0<a(1)<2 (if the first term of a(L) is 1)
Furthermore, in this case (stationarity) the lag-polynomial a(L) is invertible and the AR process Equation 8.14 has a MA representation with finite second moments, i.e., is stationary
y_t = a(L)^{-1}e_t
Remark: The inverse of a lag-polynomial a(L) is defined by:
which can be proven by multiplying out \, a(L)^{-1}a(L)
If the AR polynomial is invertible, even an ARMA process Equation 8.15 has a MA representation
y_t = a(L)^{-1}b(L)\epsilon_t
On the other hand, if b(L) is invertible, the MA (or ARMA) process has an AR representation. But this has nothing to do with stationarity as every MA process of finite order is stationary. Hence, it is possible that a stationary MA process has no AR representation \rightarrow MA processes are more general
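A small sketch of checking the roots of an AR lag-polynomial with base R's polyroot() (the coefficients are illustrative):
# AR(2): y_t = 0.6 y_{t-1} + 0.3 y_{t-2} + e_t  =>  a(L) = 1 - 0.6 L - 0.3 L^2
Mod(polyroot(c(1, -0.6, -0.3)))   # all moduli > 1  => stationary

# AR(2) with a unit root: a(L) = 1 - 1.3 L + 0.3 L^2 = (1 - L)(1 - 0.3 L)
Mod(polyroot(c(1, -1.3, 0.3)))    # one root equal to 1  => non-stationary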
8.5.4 Some simulated stochastic processes
Code
# Simulating several stochastic processes
library(matrixStats)  # for matrix cumsums
set.seed(12345)       # for reproducibility
n <- 100              # number of observations
# MA(2) process with theta1 = 0.6, theta2 = 0.4
e <- matrix(rnorm(300), ncol = 3)
e <- ts(e, 1, frequency = 1)  # convert to a ts object to be able to use the lag function
u <- e + 0.6 * lag(e, -1) + 0.4 * lag(e, -2)
Code
# Stable AR(1) process
ar <- NULL  # to initialize ar
for (i in 1:3) {
  ar_t <- arima.sim(list(order = c(1, 0, 0), ar = 0.8), n = n)
  ar <- cbind(ar_t, ar)
}
# Trend stationary AR(1) process; we simply add a deterministic trend to the last AR(1) from above
ar_t <- ar_t + cumsum(rep(0.5, 100))
Code
# Random walk; cumulative sum of WN process
e <- matrix(rnorm(n * 10), ncol = 10)
rw <- colCumsums(e)
# Random walk with drift; cumulative sum of WN process with d = 0.5
ed <- e + 0.5
rwd <- colCumsums(ed)
# Random walk with stronger drift; cumulative sum of WN process with d = 1.5
ed1 <- e + 1.5
rwd1 <- colCumsums(ed1)
Plotting of MA(2) processes
Code
matplot(u, type ="l", lty =1, lwd =1.5, main="MA(2) process; with theta1 = 0.6, theta2 = 0.4"); abline(0,0); acf(u[,2], main="ACF");
Plotting of AR(1) processes
Code
matplot(ar, type ="l", lty =1, lwd =1.5, main ="AR(1), with rho = 0.8"); abline(0,0); acf(ar[,1], main="ACF");
Plotting Random Walks
Code
matplot(rw, type ="l", lty =1, lwd =1.5, main ="Random Walk"); abline(0,0); acf(rw[,1], main="ACF")
Plotting AR(1) with deterministic trend
Code
matplot(ar_t, type ="l", lty =1, lwd =1.5, main ="AR(1) + determ. trend, with rho = 0.8"); abline(0,0.5); acf(ar_t, main="ACF")
Plotting Random Walks with drift
Code
matplot(rwd, type ="l", lty =1, lwd =1.5, main="Random Walk with drift"); abline(0,.5); acf(rwd[,1], main="ACF")
Plotting Random Walks with strong drift
Code
matplot(rwd1, type ="l", lty =1, lwd =1.5, main="Random Walk with strong drift"); abline(0,1.5); acf(rwd1[,1], main="ACF");
8.6 Consequences of unit roots for regressions
In this section, we will present some important theory
Confirm the theory with simulation experiments
Introduce the concepts of Integration and Cointegration
8.6.1 Theory
Non-stationarity in general, and unit roots in particular have serious consequences for regression analyses with time series, see Pagan and Wickens (1989)
Regression equations containing variables with unit roots can lead to
Spurious regression
Non-standard distributions of estimated coefficients with severe consequences for tests
Super-consistency of estimated coefficients
In the following, we lay out the statistical reasons for these complications. To keep the analysis simple we consider a model with only one explanatory variable. Furthermore, y_i and x_i should be demeaned (so we need no intercept)
y_i = \beta \, x_i + u_i
\tag{8.16}
For this case the usual OLS estimator for \beta is
Case 1: all variables and the error term u are well behaved
Assuming the usual case that the variables are well behaved (our former shortcut for stationarity and weak dependency) we can show by applying the LLN and CLT (Theorem A.2, Theorem A.3) that
\hat \beta is consistent and
\hat \beta is asymptotically normally distributed
In particular, regarding consistency of \hat {\beta}, by the LLN and according to Equation 8.17, we have
The situation is different if x_i follows a random walk but u_i is still stationary and weakly dependent. In this case the conditions for the LLN and CLT are not satisfied any more and one can show that (with NSDRV for a non-standard distributed random variable)
We have no convergence to \beta even if we divide and multiply by n^3 as the first inverse moment will go to +\infty and the second to zero, so that \hat \beta \to \beta + \infty \cdot 0, which could be anything. Hence, \hat \beta does not converge to the true \beta but to a non-standard distributed random variable even with a very large sample \Rightarrow Spurious Regression Problem, see C. W. Granger and Newbold (1974)
8.6.2 Simulations
To underpin our theoretical results we simulate the convergence of moments M_{xx}^{-1}M_{xu}=(\hat \beta - \beta), depending on the ts-properties
Code
rm(.Random.seed, envir = globalenv())
betahat_minus_beta <- list()
x_random_walk <- 0   # x random walk or white noise
rho <- 0.5           # AR parameter for u; AR(1) with rho = 0.5 or random walk
# for 20 and 200 observations, 20000 replications
for (n in c(20, 200)) {
  Mxx <- NULL
  Mxu <- NULL
  for (i in 1:20000) {
    e1 <- rnorm(n)
    e2 <- rnorm(n)
    ifelse(x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {
      u[t] <- rho * u[t-1] + e2[t] * 2
    }
    mxx <- sum(x * x) / n
    mxu <- sum(x * u) / n
    Mxx[i] <- mxx
    Mxu[i] <- mxu
  }
  if (n == 20)  { n1 <- 20;  betahat_minus_beta[[1]] <- Mxu / Mxx }
  if (n == 200) { n2 <- 200; betahat_minus_beta[[2]] <- Mxu / Mxx }
}
Simulation 1: both x and u stationary
Normal rate of convergence
With n=200, already normally distributed
Code
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
Clear deviations from normal distribution with both n=20 and n=200
Code
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2, 2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
No convergence to the true value at all, in fact the distribution is independent of n ==> spurious regression
Clear deviations from normal distribution with both n=20 and n=200
Code
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-8, 8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-8, 8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
8.6.3 Order of Integration and Cointegration
We have seen that unit roots in the variables can have dramatic consequences for the asymptotic behavior of the OLS estimators. Especially, we learned that the features of the error process are of great relevance. To be more clear, we have to make some important definitions
A variable is integrated of order one, I(1), if it is non-stationary but becomes stationary after applying the difference operator \Delta = (1-L) once. In this case, the variable contains one unit root in its autoregressive lag polynomial: a(L)=\Delta \tilde a(L), with \tilde a(L) stationary (invertible)
For an I(2) process – integrated of order two – we have to apply the difference operator \Delta twice for it to become stationary. Further, an I(0) process is already stationary
If we have two I(1) variables y and x and there exists a linear combination (y - \alpha x) which is I(0) – i.e., is stationary – then the variables y and x are said to be cointegrated
The concept of cointegration is of enormous importance in modern macroeconomic theory and macro-econometrics because it has a clear economic interpretation as a long run equilibrium relationship
If two variables are I(1), they exhibit a stochastic trend (random walk components) and one would expect that they are drifting further away from each other in the course of time
If this is not the case, if they stay side by side, then there must be some economic forces at work – adjustments toward a long run equilibrium – which accomplish that. In this case, there exists a cointegration relationship (y - \alpha x) which is stationary, I(0), and so they are cointegrated
8.6.4 Several important special cases
Let us reconsider our model Equation 9.14, \text{ } y_i = x_i \beta + u_i
If both x and y are I(1), but are cointegrated, \beta corresponds to \alpha and OLS will estimate the cointegration relationship by minimizing the sum of squared residuals; Equation 9.14 is then called the cointegration equation (but note, a causal interpretation still requires x_i to be exogenous). Furthermore, the error term u in Equation 9.14 is stationary and the results of Equation 8.19 and Equation 8.20 apply. Thus we have
OLS consistently estimates \beta, with a convergence rate n, i.e., OLS is super consistent
The distribution of \hat \beta is generally non-standard and therefore the usual tests do not apply. One has to rely on simulated distributions
If both x and y are I(1), but are not cointegrated, then they are drifting away from each other in the course of time, which implies a non-stationary error term u. In this case the result of Equation 8.21 applies and
regressing y on x typically leads to a spurious regression. The result is unpredictable as \hat \beta is a non-standard distributed random variable whose variance does not shrink with larger n
However, there could be a valid “short run” relationship in first differences, \Delta y_i = \Delta x_i \beta + v_i. As the first differences are stationary, this relationship can be estimated by OLS in the usual manner
If y is I(0), stationary, and x is I(1), then \hat \beta will converge in a super consistent manner to zero, i.e., results Equation 8.19 and Equation 8.20 apply. This feature is utilized by the so called Unit Root Tests, see Section 8.7 below
Some further cases and qualifications on our present analysis
Assume u is stationary
If x is strictly exogenous, the whole analysis could be conditioned on x, and in this case the time series properties of x play no role. We can think of x as fixed numbers in repeated samples and all our asymptotic standard results apply. However, strict exogeneity of the right hand side variables is rare in a time series context
If x exhibits a unit root (stochastic trend) with drift, the standard results apply likewise. The reason is that in this case the deterministic trend dominates the stochastic one asymptotically. However, this is only the case, if there is no other right hand side variable with a deterministic trend or stochastic trend with drift. The reason is that in this case, according to the FWL theorem, all other variables behave as they were detrended
Furthermore, all standard results apply if the left and right hand side variables are either stationary or cointegrated. In this case we can always find reparameterizations so that every coefficient can be assigned to a stationary variable. This is especially true for VARs with more than one lag, as a variable and its lagged values are always cointegrated. Hence, there is generally no need to estimate VARs in differences (as many people believe)
The spurious regression problem remains even if x is strictly exogenous – the error term u is still non-stationary in this case. If the variables are trend-stationary, i.e., only exhibit a deterministic trend, the simple remedy to avoid spurious regressions is detrending
8.7 Unit Root Test and Error Correction Form
8.7.1 Dickey-Fuller (DF) Test
We have seen that a possible non-stationarity of time series variables in regression models determines the statistical properties of the estimators. Hence, we clearly need a test for stationarity or non-stationarity
Consider a pure random walk
y_t = y_{t-1} + e_t \ \ \Rightarrow \ \ \Delta y_t = e_t
\tag{8.22}
To test for non-stationarity we add the variable y_{t-1} to the right hand side of the right variant of Equation 8.22. If the process is actually a random walk, results Equation 8.19 and Equation 8.20 apply and the coefficient of y_{t-1} will converge with rate n to the true value zero. This is the Dickey-Fuller (DF) Test:
\Delta y_t = \delta \, y_{t-1} + e_t
\tag{8.23}
Thus, we are testing whether \delta = 0 by means of the usual t-statistic. However, under H_0: \delta = 0 we presuppose non-stationarity. Hence, under H_0, \hat\delta is non-standard distributed, Equation 8.20 applies, and its t-statistic follows the so called Dickey-Fuller distribution, which has been tabulated by Dickey and Fuller with Monte-Carlo simulations. Typically, critical values of the Dickey-Fuller distribution are notably larger in absolute value than the usual ones; one-sided 5%: -2.86 without a time trend and -3.41 with a time trend included in the test equation
To allow for a proper alternative if H_0 is not true and y is actually stationary, we usually add a constant \beta_0 and even a time trend (\gamma \cdot t) to the test equation Equation 8.23, especially if y exhibits a trend
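A minimal sketch of the DF test regression run "by hand" with dynlm() (the convenience function adf.test() from the tseries package is used in Section 8.8):
library(dynlm)
library(lmtest)

set.seed(5)
y <- ts(cumsum(rnorm(250)))     # a pure random walk

# DF test equation with a constant: Delta y_t = beta_0 + delta * y_{t-1} + e_t
coeftest(dynlm(d(y) ~ L(y)))
# the t-statistic of L(y) has to be compared with the Dickey-Fuller critical value
# (about -2.86 at the one-sided 5% level), not with the usual t/normal critical value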
8.7.2 Augmented Dickey-Fuller Unit Root (ADF) Test
If y does not follow a pure random walk but an AR process with a unit root, additional terms, which take care of this more complicated process, have to be added. As we presuppose an AR process with a unit root, we augment the test Equation 8.23 by autoregressive terms \Delta y_{t-s}, leading to the so called Augmented Dickey-Fuller (ADF) Test. Once again, we test \delta = 0 by means of the usual t-statistic
Because of its simplicity and quite good statistical properties, the ADF test is the most popular unit root test (many other unit root tests have been developed in the literature, but many of them are variants of the ADF test)
The ADF test could even be used as a test for cointegration, which is then called the Engle-Granger procedure, see Engle and Granger (1987). To do this
We first have to check whether both (or more) variables are actually I(1), using the ADF test
If both are I(1), and an I(2) property is ruled out, we estimate the hypothetical cointegration regression Equation 9.14
After that, we check whether the residuals of the cointegration regression Equation 9.14 are indeed stationary. We can employ the ADF test to accomplish this
However, because residuals of a regression always look somewhat stationary – OLS minimizes the residual variance – the critical values for this test are different from those for the ADF test applied to ordinary (non-residual) time series; these critical values are even larger than the usual ADF ones
This procedure can easily be extended to more explanatory variables (with even larger critical values). But in this case, usually more advanced multivariate methods, like the Johansen procedure, Johansen (1991), are used to estimate and test for multiple cointegration relationships
8.7.3 Error Correction Form
A cointegration relationship describes the long run connection between variables
However, it is possible to estimate short run as well as long run relationships in one equation. This is called the error correction form
The term in brackets is the error correction term. In the long run, this equilibrium relationship must hold and therefore be stationary. This relationship could be estimated by Equation 8.25 “freely” or in a two step procedure:
First, estimate a cointegration regression in levels without any lags as in Equation 9.14
Test for cointegration, i.e., whether the residuals of the cointegration regression are stationary
If they are stationary take the lagged residuals of the cointegration regression, representing (y_{t-1} - \hat\alpha x_{t-1}), as an additional variable in Equation 8.25; then \delta measures the adjustment speed toward the long run equilibrium
The autoregressive and distributed-lag terms in differences in Equation 8.25 describe the short run interactions between the variables. The lag length k is often chosen so that the error term e_t is not autocorrelated
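A rough sketch of this two-step error correction estimation with simulated cointegrated series and dynlm() (names and lag choices are illustrative, not the chapter's code):
library(dynlm)

set.seed(6)
n <- 250
x <- ts(cumsum(rnorm(n)))                        # I(1) regressor
y <- 1 + 0.8 * x + arima.sim(list(ar = 0.5), n)  # cointegrated with x (alpha = 0.8)

# Step 1: cointegration regression in levels, keep the residuals
coint <- dynlm(y ~ x)
ec    <- ts(as.numeric(residuals(coint)), start = start(y))  # deviations from the long run relationship
dat   <- ts.union(y = y, x = x, ec = ec)

# Step 2: error correction form in differences with the lagged residual
ecm <- dynlm(d(y) ~ L(d(y), 1) + d(x) + L(d(x), 1) + L(ec, 1), data = dat)
summary(ecm)    # coefficient of L(ec, 1): adjustment speed delta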
8.8 Examples of regressions with non-stationary time series
8.8.1 Spurious Regression
Code
# Needed libraries
library(matrixStats)  # for matrix colCumsums
library(tseries)      # for the Augmented Dickey-Fuller unit root test
library(egcm)         # Engle-Granger cointegration tests
library(zoo)          # time series infrastructure
set.seed(1111)        # for reproducibility; will always generate the same random numbers
Code
# Simulating two independent random walks
# First, generate two white noise processes
n <- 250                              # number of observations
e <- matrix(rnorm(n * 2), ncol = 2)   # random matrix with n rows and two columns
e <- zooreg(e, 1, n, frequency = 1)   # convert to a ts-zoo object to be able to use the lag function
# generate two random walks: cumulative sums of the columns of the WN matrix
u <- colCumsums(e)                    # u consists of two columns with the two random walks
colnames(u) <- c("y", "x")            # giving names
u <- zooreg(u, 1, n, frequency = 1)   # convert to a ts-zoo object
Plot the two independently generated random walks
Code
matplot(u, type ="l", lty =1, lwd =1.5, main ="Two independent random walks")abline(0,0)
Regressing the two independently generated random walks
As this is a regression of I(1) variables without lags this is a so called Engle and Granger (1987) cointegration regression, like Equation 9.14
mod <- dynlm(u$y ~ u$x)
coeftest(mod)  # printing the coefficients with std-errors and test statistics
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.030243 0.815726 7.3925 2.208e-12 ***
u$x -2.578410 0.078267 -32.9438 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We get a highly significant relationship, but we know that this is totally spurious. How can this happen?
Test of autocorrelation in residuals: The DW test points to highly autocorrelated residuals, a first hint of non-stationarity of the residuals
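The output below was presumably produced with dwtest() from the lmtest package applied to the spurious regression mod:
dwtest(mod)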
Durbin-Watson test
data: mod
DW = 0.13832, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
Scatterplot of the two independently generated random walks
A spurious connection between the two independent random walks is reflected even in the scatterplot
Code
plot(u$x, u$y, main = "Scatterplot of the two independent random walks")
abline(mod)  # add regression line of the estimated model to the plot
Residual Plot
A plot of the residuals reveals a very high degree of autocorrelation ==> possibly non-stationary
Code
plot(mod$residuals, main = "Residuals extremely autocorrelated")
abline(0, 0)
Dickey-Fuller test
DF test equation, Equation 8.23: \Delta(residuals) on lagged residuals ==> nearly a random walk. But what critical value should we use for the t-statistic of \delta ?
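The coefficient output below was presumably produced by a regression of this form (a sketch, assuming the model object mod from above):
coeftest(dynlm(d(mod$residuals) ~ L(mod$residuals)))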
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.065411 0.163973 0.3989 0.6903
L(mod$residuals) -0.071646 0.023257 -3.0806 0.0023 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Augmented Dickey-Fuller test
ADF test, test Equation 8.24, a formal test for non-stationarity – H0 is non-stationarity
But caution, Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance. But nonetheless H0, non-stationarity, cannot be rejected
The lag length k used for the autoregressive correction terms in Equation 8.24 is 6 in this case. There are other procedures for ADF tests in R which choose an “optimal” lag length according to some information criterion
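A sketch of calls that would produce the ADF test and the Engle-Granger cointegration output shown below (the exact calls are not shown in this extract):
adf.test(mod$residuals, k = 6)   # ADF test on the residuals, lag length k = 6
summary(egcm(u$x, u$y))          # Engle-Granger cointegration test from the egcm package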
Y[i] = -2.5784 X[i] + 6.0302 + R[i], R[i] = 0.9542 R[i-1] + eps[i], eps ~ N(0, 2.5887^2)
(0.0783) (1.5299) (0.0232)
R[250] = 6.7085 (t = 0.950)
WARNING: X and Y do not appear to be cointegrated.
Unit Root Tests of Residuals
Statistic p-value
Augmented Dickey Fuller (ADF) -2.963 0.10724
Phillips-Perron (PP) -18.214 0.09059
Pantula, Gonzales-Farias and Fuller (PGFF) 0.932 0.08524
Elliott, Rothenberg and Stock DF-GLS (ERSD) -1.605 0.30010
Johansen's Trace Test (JOT) -16.280 0.17832
Schmidt and Phillips Rho (SPR) -17.815 0.22337
Variances
SD(diff(X)) = 0.957716
SD(diff(Y)) = 1.058080
SD(diff(residuals)) = 2.631344
SD(residuals) = 7.063248
SD(innovations) = 2.588656
Half life = 14.786804
R[last] = 6.708500 (t=0.95)
8.8.2 Cointegration
# Simulating two cointegrated time series
set.seed(1234)   # for reproducibility; will always generate the same random numbers
# First, generate two white noise processes
v1 = rnorm(250);  v2 = rnorm(250)*2
# Generating one MA(1) process w1 and one AR(2) process w2; first generate two vectors with zeros
w1 = rep(0, 250);  w2 = rep(0, 250)
# Loop to generate the MA(1) and AR(2) process
for (t in 3:250) {
  w1[t] = 0.4*v1[t] + 0.4*v1[t-1]             # MA(1)
  w2[t] = 0.6*w2[t-1] + 0.3*w2[t-2] + v2[t]   # AR(2)
}
# Second, generate a time series with a unit root: cumsum of the MA(1) process; more general than a random walk
rw = cumsum(w1)   # actually, this is an ARMA(1,1) with unit root ( or an ARIMA(0,1,1) )
# x and y exhibit a common stochastic trend, hence they are cointegrated
x = rw
y = rw + w2
# transform to ts-zoo objects to make available the time series functions
x = zooreg(x, 1, 250, frequency = 1)
y = zooreg(y, 1, 250, frequency = 1)
Plot of the two cointegrated time series
Although both are I(1), they stick together
matplot(cbind(y, x), type = "l", lty = 1, lwd = 1.5, main = "Two cointegrated time series")
Testing whether both variables are non-stationary
Before doing a cointegration test, we have to check, whether the variables are actually I(1)
adf.test(y)   # testing whether I(1); I(1) is not rejected
Augmented Dickey-Fuller Test
data: y
Dickey-Fuller = -2.1743, Lag order = 6, p-value = 0.5024
alternative hypothesis: stationary
adf.test(diff(y))   # ruling out I(2); I(2) is rejected
Augmented Dickey-Fuller Test
data: diff(y)
Dickey-Fuller = -7.1506, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
The same for variable x
adf.test(x)   # testing whether I(1); I(1) is not rejected
Augmented Dickey-Fuller Test
data: x
Dickey-Fuller = -2.5146, Lag order = 6, p-value = 0.3592
alternative hypothesis: stationary
adf.test(diff(x))   # ruling out I(2); I(2) is rejected
Augmented Dickey-Fuller Test
data: diff(x)
Dickey-Fuller = -4.6994, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Hence, both y and x are probably I(1). Therefore, we can check whether they are cointegrated
Cointegration regression
Regressing the two I(1) processes y on x, without any lags ==> Engle and Granger (1987) cointegration regression, like Equation 9.14
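The estimation chunk is folded; given the Call line in the output and the object name used in the residual analysis below, it was presumably:

cointreg <- dynlm(y ~ x)    # Engle-Granger cointegration regression in levels
summary(cointreg)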
Time series regression with "zooreg" data:
Start = 1, End = 250
Call:
dynlm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-8.3602 -1.9828 -0.0212 2.1864 7.0777
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.005508 0.404782 0.014 0.989
x 1.029490 0.036939 27.870 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.235 on 248 degrees of freedom
Multiple R-squared: 0.758, Adjusted R-squared: 0.757
F-statistic: 776.7 on 1 and 248 DF, p-value: < 2.2e-16
The true alpha of the simulated cointegration regression is 1 and the Engle-Granger estimate for alpha is very close: 1.029, even though we have only 250 observations ==> super consistency
Plotting residuals of cointegration regression
Residuals of cointegration regression; they look stationary although clearly autocorrelated
plot(cointreg$residuals, main = "Residuals of cointegration regression")
abline(0, 0)
Unit root test for residuals
DF test of residuals
DF test equation, Equation 8.23, H0 is non-stationarity:
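As above, the test regression is not shown; a sketch of what was presumably run, analogous to the spurious-regression case:

coeftest( dynlm( d(cointreg$residuals) ~ L(cointreg$residuals) ) )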
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.021665 0.134920 0.1606 0.8726
L(cointreg$residuals) -0.238571 0.042206 -5.6525 4.35e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\Delta(residuals) on lagged residuals ==> seems to be stationary
Note the t-value of L(cointreg$residuals) = -5.65. Seems to support cointegration. But see comments below about critical values
ADF test of residuals
Equation 8.24, test for non-stationarity, H0 is non-stationarity. Seems to support cointegration
But caution, Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance
The lag length k used for the autoregressive correction terms is 6 in this case. There are other procedures for ADF tests in R which choose an “optimal” lag length according to some information criteria
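The ADF call itself is folded; judging from the output below (data: cointreg$residuals, lag order 6, the tseries default for 250 observations), it was presumably:

adf.test(cointreg$residuals)   # ADF test of the cointegration-regression residuals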
Augmented Dickey-Fuller Test
data: cointreg$residuals
Dickey-Fuller = -3.7125, Lag order = 6, p-value = 0.02389
alternative hypothesis: stationary
Testing for cointegration with the package “egcm”
Engle-Granger cointegration test. Basically an ADF test for the residuals of the cointegration regression
But the egcm() procedure uses correct critical values for residuals
Furthermore, besides an ADF test, a whole battery of other tests are carried out
Most of them can reject the H0 of non-stationarity of the residuals at the usual significance level of 5%
Especially look at the Elliott, Rothenberg and Stock (ERSD) test, which is a variant of the ADF test with local detrending of the data before testing and is said to be the most powerful unit root test
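Neither the egcm output nor the chunk estimating the regression below is shown. A minimal sketch under these assumptions: the egcm test is run on the simulated series with egcm() from the egcm package, and the regression below is an unrestricted error-correction regression whose object name, ercfree, is taken from the code fragment further down:

library(egcm)
summary(egcm(x, y))     # Engle-Granger test with correct critical values for residuals

# Unrestricted error-correction regression (referred to as 'ercfree' below)
ercfree <- dynlm( d(y) ~ L(d(y)) + L(d(x)) + L(y) + L(x) )
summary(ercfree)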
Time series regression with "zooreg" data:
Start = 3, End = 250
Call:
dynlm(formula = d(y) ~ L(d(y)) + L(d(x)) + L(y) + L(x))
Residuals:
Min 1Q Median 3Q Max
-6.8906 -1.4647 0.0212 1.4193 5.7937
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.03871 0.27236 0.142 0.887098
L(d(y)) -0.22562 0.06494 -3.474 0.000606 ***
L(d(x)) 1.23692 0.23909 5.173 4.81e-07 ***
L(y) -0.19221 0.04558 -4.217 3.50e-05 ***
L(x) 0.19889 0.05253 3.786 0.000193 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.139 on 243 degrees of freedom
Multiple R-squared: 0.2162, Adjusted R-squared: 0.2033
F-statistic: 16.75 on 4 and 243 DF, p-value: 3.839e-12
# Implied estimate of alpha:
alpha <- -coef(ercfree)[5] / coef(ercfree)[4]
The true alpha of the simulated cointegration regression is 1 and the estimate for alpha is very close: 1.035 ==> super consistency
Furthermore, it is also very similar to the Engle-Granger estimate for alpha from above, which was 1.029
Breusch, Trevor S., and Leslie G. Godfrey. 1981. “A Review of Recent Work on Testing for Autocorrelation in Dynamic Simultaneous Models.” Macroeconomic Analysis, Essays in Macroeconomics and Economics 1: 63–100.
Engle, Robert F., and Clive W. J. Granger. 1987. “Co-Integration and Error Correction: Representation, Estimation, and Testing.” Econometrica: Journal of the Econometric Society, 251–76.
Granger, C. W. J. 1969. “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods.” Econometrica 37 (3): 424. https://doi.org/10.2307/1912791.
Granger, Clive W. J., and Paul Newbold. 1974. “Spurious Regressions in Econometrics.” Journal of Econometrics 2 (2): 111–20.
Greene, William H. 2017. Econometric Analysis. Eighth ed. Pearson.
Johansen, Søren. 1991. “Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models.” Econometrica 59 (6): 1551. https://doi.org/10.2307/2938278.
Nelson, Charles R., and Charles R. Plosser. 1982. “Trends and Random Walks in Macroeconomic Time Series.” Journal of Monetary Economics 10 (2): 139–62. https://doi.org/10.1016/0304-3932(82)90012-5.
Pagan, A. R., and M. R. Wickens. 1989. “A Survey of Some Recent Econometric Methods.” The Economic Journal 99 (398): 962. https://doi.org/10.2307/2234084.
Wooldridge, Jeffrey M. 2019. Introductory Econometrics: A Modern Approach. Seventh ed. Boston: Cengage.
---title-block-banner: true# title: "Time Series"subtitle: "Mainly based on @wooldridge_Intro_Econometrics, Chapters 10, 11 and 12"---# Time Series\pagebreak## General aspects of time series data analysis- Time series arise in many fields of economics, especially in *macroeconomics* and *financial economics*- The main feature of time series is *temporal ordering of observations*, i.e., we have a natural ordering and thus, data may not be arbitrarily reordered- Typical features and problems: **serial correlation**, **non-independence** of observations and **non-stationarity**, which pose problems for test-statistics, asymptotic theory and the phenomenon of *spurious regressions*- The outcome of economic variables (e.g. GNP, Dow Jones) is uncertain; they should therefore be modeled as random variables- Thus, time series are seen as a **sequences of random variables**, i,e., **stochastic processes**- Randomness does not come from sampling from a population - A "sample" *is the one realized path of the time series out of the many possible paths the stochastic process could have taken*- *Time series analysis* focuses on modeling the *dependency* of a variable on its *own past*, and on the *present and past values of other variables*------------------------------------------------------------------------### Several popular times series models, example: Phillips Curve- *Static model* - In *static* time series models, the current value of one variable is modeled as the result of the current values of explanatory variables\ ($\pi_t \ldots$ inflation rate at time *t*, $\; u_t \ldots$ unemployment rate at time *t*, $\; e_t \ldots$ error term at time *t*) - $$\pi_t = \beta_0 + \beta_1 u_t + e_t$$- *Finite distributed lag* model - $$\pi_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t-1} + e_t$$ - *Impact effect*, resp. *transitory shock* of unemployment on inflation: $\beta_1$, $\beta_2$ - *Long run effect*, resp. effect of a *permanent shock* of unemployment on inflation (set all time indexes to $t$ and sum coefficients): $\beta_1+\beta_2$- *Autoregressive model* -- model with the *lagged endogenous variable* $\pi_{t-1}$ - $$\pi_t = \beta_0 + \beta_1 u_t + \beta_2 u_{t-1} + \beta_3 \pi_{t-1} + e_t$$ - *Long run effect*, resp. effect of a *permanent shock* of unemployment on inflation: $\dfrac {\beta_1+\beta_2}{1-\beta_3}$------------------------------------------------------------------------### Implementation of time series data in R- Dealing with time series in R is a little bit more tricky than dealing with cross section data. The reason is that the program must know the begin and end dates (time) of the time series and whether there are some "holes" in the data for treating times series specific operations like lags or seasonal effects- Thus, times series need some additional infrastructure to to this job - The built in infrastructure in R for times series is the class "**ts**". Basically, ts-objects are numeric data vectors or matrices, augmented with an additional attribute, "tsp", which store the starting time, end time and frequency of the data; 1 for yearly, 4 for quarterly and 12 for monthly data. The *disadvantage* of this class is that it can only deal with "regular" data, meaning *equispaced* data points with no "holes" in the data, and does not support weekly, daily or high frequency data - Alternative infrastructures are, among others, the "**zoo**" class and the "**xts**" class. 
Both classes augment the data vectors or matrices with an *additional* **index variable**, which maps the corresponding time or date to each data point. Both classes supports non-regular (irregular spaced) data as well as weekly, daily and high frequency data. Furthermore, *merging* data with different ranges are possible as well as *aggregating* data to another frequency. The *disadvantage* of these classes are that they are a bit more complicated to set up - you generally have to deal with Date classes, especially for the xts class. However, often an appropriate index variable is included in the data set- You want to have functions which support the special features of time series. For instance, the workhorse function lm() does *not* support basic time series operations like *lags* or *diffs*. You could use dynlm() from the package dynlm instead, which supports time trends, seasonal dummies and d and L operators------------------------------------------------------------------------### Example: Analysis of Wage and Price Inflation Interactions with R#### Defining data as time series```{r, message=FALSE}#| label: tbl-inflationdata#| tbl-cap: "Price and Wage Data"#| tbl-cap-location: bottom#| column: pagerm(list=ls()) # clear alllibrary(wooldridge); library(AER); library(zoo); library(dynlm); library(gt) # for beautiful tablesdata("wageprc")class(wageprc)gt(wageprc[1:10,])```Now, we convert them to time series with the "ts" command```{r}#| comment: " "# ts representationdata <- ( ts(wageprc[, c(4,5)], start=c(1980,3), frequency=12)) # start 1980 M3class(data)data[1:8,]```The same with the "zoo" package```{r}#| comment: " "# zoo representation l <-length(wageprc$lprice) # we need the length of the data # As there is no particular index variable in the data, we have to define an index variable m# We begin at 1980 M3. Thus we add to 1980: 2/12 : (2+l-1)/12# (For more info about zoo and dates: <http://www.princeton.edu/~otorres/TimeSeriesR101.pdf>)m <-as.yearmon( 1980+seq( from =2, to =2+(l-1) ) /12 ) data <-zoo( wageprc[, c(4,5)], order.by = m )class(data)data[1:8,]tail(data)```------------------------------------------------------------------------#### Plotting time series```{r}#| fig-width: 6#| fig-align: center#| fig-asp: 0.75# With zoo data, you can use the $ operator. # With ts data, you would need matrix-indexing: data[, "lprice"]plot( diff( data$lprice ) )``````{r}#| fig-width: 6#| fig-align: center#| fig-asp: 0.75# Plotting the whole data matrixplot( diff(data) )```------------------------------------------------------------------------\pagebreak#### ***Granger Causality Test***: $\Delta lprice \Rightarrow \Delta lwage$We are regressing $\Delta lwage_t$ on several **lags** of $\Delta lwage_t$ and several **lags** of $\Delta lprice_{t}$ (and likewise some other lags of additional control variables) and look whether the lags of $\Delta lprice_{t}$ have some additional predictive power for $\Delta lwage_t$, beyond (*conditioned on*) the lags of $\Delta lwage_t$ (and the other lagged control variables). 
If this is the case, we say that $\Delta lprice$ is *Granger-Causal* for $\Delta lwage$, or $\Delta lprice \Rightarrow \Delta lwage$```{r}#| comment: " "summary( wage <-dynlm( d(lwage) ~L(d(lprice), 1:4) +L(d(lwage), 1:4), data = data) )```------------------------------------------------------------------------##### Conditioned on lagged wage inflation, does lagged price inflation affect current wage inflation?See @granger1969```{r}#| comment: " "lht(wage, matchCoefs(wage,"lprice"))```**Result**: Price inflation does *Granger cause* wage inflation (has *predictive power* for wage inflation)\Problems: Instantaneous causality, missing additional control variables and expectation effects - *post hoc, ergo propter hoc fallacy*------------------------------------------------------------------------\pagebreak#### ***Granger Causality Test***: $\Delta lwage \Rightarrow \Delta lprice$We are regressing $\Delta lprice_t$ on the very same variables as above and look whether the lags of $\Delta lwage_{t}$ have some additional predictive power for $\Delta lprice_t$, beyond (*conditioned on*) the lags of $\Delta lprice_t$ (and the other lagged control variables). If this is the case, we say that $\Delta lprice$ is *Granger-Causal* for $\Delta lwage$, or $\Delta lprice \Rightarrow \Delta lwage$```{r}#| comment: " "summary( price <-dynlm( d(lprice) ~L(d(lprice), 1:4) +L(d(lwage), 1:4), data = data) )```------------------------------------------------------------------------##### Conditioned on lagged price inflation, does lagged wage inflation affect current price inflation?(See @granger1969)```{r}#| comment: " "lht(price, matchCoefs(price,"lwage"))```**Result**: Wage inflation does *not Granger cause* price inflation (has *no predictive power* for price inflation)\Problems: Instantaneous causality, missing additional control variables and expectation effects - *post hoc, ergo propter hoc fallacy*------------------------------------------------------------------------\pagebreak## Auto-correlated Erros {#sec-autocorrelatederros}- One of the most common problems in regressions with time series is **autocorrelation of the errors**. This *violates* assumption MLR.5: (compare the chapter "Multiple Regression", slide 48)$$\operatorname{Var}(\mathbf u \, | \, \mathbf X) = E (\mathbf{u u}' \, | \, \mathbf{X}) = \sigma^2 \mathbf I$$ {#eq-MLR.5}- This has basically **three consequences** for the OLS estimates 1. OLS estimates are *not consistent* any more if, additionally to autocorrelated errors, the explanatory variables are **not** *strict exogenous*, $E(u_t|\mathbf X=0)$ - This is especially the case for *lagged endogenous variables* $y_{t-s}$ as right hand side variables! 2. The usual *formula for the covariance matrix* of $\hat {\boldsymbol \beta}$ is *false*, and therefore all *test statistics* - A remedy for this is a **H**eteroskedasticity and **A**utocorrelation **C**onsistent covariance matrix estimate, **HAC** covariance matrix, leading to **HAC standard errors** for $\hat {\boldsymbol \beta}$ and ***asymptotically*** valid tests - These are computed in a similar, but much more complicated manner (based on the residuals $\hat u_t$ we have to estimate the autocorrelations of the errors over time) as we did for the **H**eteroskedasticity **C**onsistent covariance matrix, **HC** (see the chapter "Heteroskedasticity"). As a formal derivation of this is quite tedious, we simply point to the literature, like @green2017 3. 
OLS looses its effiency------------------------------------------------------------------------### HAC Covariance Matrix- Regarding the **second** problem, we try to estimate the true covariance matrix of $\hat {\boldsymbol \beta}$. Basically, the residuals of the regression are used to estimate $(\mathbf X' \boldsymbol \Omega \mathbf X)$, with $\hat \sigma^2 \boldsymbol \Omega$ being the covariance matrix of the error term $u_t$, leading to an estimate of the covariance matrix for $\hat {\boldsymbol \beta}$$$\widehat {\operatorname{Var}(\hat {\boldsymbol \beta} \, | \, \mathbf X)} \ = \ (\mathbf{X} \mathbf{X})^{-1} \, {(\widehat{\hat \sigma^2\mathbf X' \boldsymbol\Omega \mathbf X})} \, (\mathbf{X} \mathbf{X})^{-1}$$ {#eq-HAC-CovM}- Using this HAC estimate of the covariance matrix for $\hat {\boldsymbol \beta}$, which is often called **Newey-West** estimator, the usual tests can be carried out- The big advantages of using HAC standard errors is that - it is easily implemented (usually by an option in the software package) - is working for heteroskedasticity as well as autocorrelation in the errors - we do not have to assume a particular form of heteroskedasticity or autocorrelation- Major disadvantages - Tests are only *asymptotically valid* (meaning we need a lot of observations) - OLS estimates of $\boldsymbol \beta$ are still not *efficient* any more------------------------------------------------------------------------### Loss of Efficiency with Auto-Correlated Erros $\Rightarrow$ *FGLS*- Regarding the **third** problem, we already know that autocorrelated errors lead to a **loss of efficiency**; OLS estimates are not **BLUE** any more! - A possible remedy for this is using *Feasible Generalized Least Square* (FGLS), as we did in the case of heteroskedastic errors (through weighting the observations to offset the varying variance $\to$ WLS) - For doing this, one needs a hypothesis about the *particular form* of the autocorrelation. Usually, one assumes an *autoregressive process* of order one or two for the error term. (Later, we will discuss such processes more deeply.) Suppose $e_t$ is *not* autocorrelated, and $u_t$ follows an *autoregressive process* of order one$$u_t = \rho u_{t-1} + e_t, \quad |\rho|<1$$ {#eq-ar1}- Applying **FGLS**, in a **first step**, we have to *estimate* $\rho$ with he help of a regression of the OLS residuals on the residuals of the previous period (according to the alleged error model above)- Having an estimate for $\rho$, in a **second step**, we *transform* the original model, $y_t = \mathbf x_t \boldsymbol \beta + u_t$, $u_t$ autocorrelated, in the following way$$ y_t - \hat \rho y_{t-1} = \boldsymbol \beta (\mathbf x_t - \hat \rho \mathbf x_{t-1}) + \underbrace {u_t - \hat \rho u_{t-1}}_{e_t} \ \ \ \Rightarrow \ \ \ y_t^* = \boldsymbol \beta \mathbf x_t^* + e_t$$ {#eq-Koyeck}- Note, the error term $e_t$ of this model, with the *transformed variables* $y_t^*$ and $\mathbf x_t^*$, is **not autocorrelated**. Hence, the parameters $\boldsymbol \beta$ can be *estimated efficiently with this transformed model*------------------------------------------------------------------------- This model transformation (taking *quasi differences* of the variables) is called *Koyeck Transformation* and leads to a FGLS estimator, which is called *Cochrane-Orcutt* estimator - Remark: One can use the estimated parameters of this *transformed* model and plug these back into the *original* model to compute new residuals. 
With these new residuals we can get an updated estimate for $\rho$ and transform the variables once more, but this time with this new $\hat \rho$. This algorithm can be repeated until $\hat \rho$ doesn't change any more. However, it is not guaranteed that the iterations just described actually lead to better (more efficient) estimates thus, often no iterations are performed- Applying the Cochrane-Orcutt procedure, at least one (the first) observation is lost (because of the used lags in the transformation). The *Prais-Winsten* procedure avoids this loss of efficiency by weighting the first observation with $\sqrt {1 - \hat \rho^2}$- Both methods only work if the explanatory variables $\mathbf x_t$ are *strict exogenous* as otherwise the estimate $\hat \rho$ from the *first step* is even asymptotically biased (but there is a remedy for this, which is discussed in the section about "testing for autocorrelation")------------------------------------------------------------------------There are several *other methods* for obtaining FGLS estimates. By means of these procedures, $\rho$ and $\boldsymbol \beta$ are estimated *simultaneously*, either by *Maximum Likelihood* estimation or with restricted *non-linear least square*- For both approaches, one has to rely on numerical methods. As with modern computers there are no real costs in terms of computation time, these numerical methods are the ones typically used in today's software packages- The non-linear approach is based on the transformed model, but rearranged, and estimates$$ y_t = \ \rho y_{t-1} + \boldsymbol \beta \mathbf x_t - \rho \boldsymbol \beta \mathbf x_{t-1} + e_t$$- Note, the coefficient vector of $\mathbf x_{t-1}$ is non-linear in the parameters $\boldsymbol \beta$ and $\rho$ and furthermore, non-linear restrictions are imposed on them - This restrictions can be tested against a general autoregressive/distributed lag model with ***no*** *autocorrelated errors* -- common root test - This approach only requires *weak* exogeneity of the explanatory variables and not *strict* exogeneity to yield consistent estimates------------------------------------------------------------------------### Significance of Auto-Correlated Residuals {#sec-caution}Generally, *autocorrelation in the residuals indicates that the model does not exploit all the information contained in the data*- Autocorrelation in the **residuals** of an *estimated regression model* can have several *causes* 1) A **general misspecification of the model**, e.g., omitted variables or non-linearities not taken into account 2) A *misspecified* ***dynamic*** *structure* of the model, i.e., omitted lags of the explanatory variables or missing autoregressive terms like $y_{t-1}$. 
In this case, the model is not **dynamically complete**, i.e., the error term is correlated with lags of the explanatory variables or missing autoregressive terms 3) **Structural breaks** or other **non-stationarities** 4) **Autocorrelated errors**> Hence, *before applying the remedies for autocorrelated errors as discussed above, one has to check whether the first three causes for autocorrelated residuals can be ruled out* -- *if not,* ***these*** *have to be tackled first!*> As autocorrelated residuals might be a *strong signal* for all these problems mentioned above, it is important to have a *test procedure* for detecting autocorrelation------------------------------------------------------------------------## Testing for Autocorrelation### The Durbin-Watson TestThe simplest method to check for autocorrelated residuals is to regress the residuals of the original model on the lagged residuals and test whether $\rho = 0$ with a simple t-test:$$\hat u_t = \rho \hat u_{t-1} + e_t$$ {#eq-simpleartest}- This *asymptotically* valid test is particularly powerful against an autoregressive process of order one in the error term. But to some extent, it can detect more general forms of autocorrelations too, as long as these lead to a correlation between $\hat u_t$ and $\hat u_{t-1}$ -- for instance even structural breaksAnother asymptotically equivalent variant of this test is the very popular **Durbin-Watson** test with the test statistic$$ DW = \frac{ \sum_{i=2}^{n}\left(\hat{u}_{i}-\hat{u}_{i-1}\right)^{2}}{\sum_{i=1}^{n} \hat{u}_{i}^{2}} \approx 2-2 \hat{\rho}, \ \ \in [0,4]$$ {#eq-DWtest}- The numerator is small (*DW*\<2) if consecutive residuals stick together (have the same sign), which means that they are *positively autocorrelated*\- The numerator is large (*DW*\>2) if the residuals change the sign very often, which means that they are *negatively autocorrelated*- Durbin and Watson have tabulated the distribution of *DW* even *for small samples* which depends on *n* and *k* and further exhibits a range of indeterminacy, which lowers its practicability------------------------------------------------------------------------### The Breusch-Godfrey testHowever, *both of the test procedures above are only working if all explanatory variables of the original model are strict exogenous*. Suppose not and the original model is$$y_t = \mathbf x_t \boldsymbol \beta + \textcolor{red}{\gamma y_{t-1}} + u_t \quad \Rightarrow \quad y_t = \mathbf x_t \boldsymbol \beta + \gamma y_{t-1} + \underbrace {(\rho u_{t-1} + e_t)}_{u_t}$$ {#eq-model_with_ar_term}with the lagged endogenous variable $y_{t-1}$ and an autocorrelated error $u_t=\rho u_{t-1}+e_t$- In this case, because $y_{t-1}$ is strongly correlated with $u_{t-1}$ per definition, the parameter estimate $\hat \gamma$ will capture most of the autocorrelation in $u_t$. This implies that the residuals of this regression $\hat u_t$ will look like much more as a *not* autocorrelated process. 
This is due to the first problem of autocorrelated errors mentioned above in @sec-autocorrelatederros- So, if we run a regression $\hat u_t = \rho \hat u_{t-1} + e_t$ to test for autocorrelation, the estimate $\hat \rho$ will therefore be heavily biased toward zero rendering the whole test procedure pointless- However, the argument above also points to a remedy for this problem; why not *directly* estimating in some way the right version of the model aboveSo, we try the following specification for the test equation (compare to @eq-simpleartest):$$\hat u_t = \textcolor{red} {\mathbf x_t \boldsymbol \delta_1 + \delta_2 y_{t-1}} + \rho \hat u_{t-1} + e_t$$ {#eq-BGtest}- Although not obvious, one can show that this procedure works, even if we have *weak exogenous* variables like $y_{t-1}$ and therefore "biased" residuals from the original model, see @breusch1981review. The reason is that $\rho$ is now estimated conditional on $y_{t-1}$ and $\mathbf x_t$ (FWL-Theorem) and thereby correcting for the "biased" residuals. Note, if $\rho=0$, $\boldsymbol \delta_1$ and $\delta_2$ *have to be zero as well*! (orthogonality property of OLS)- The test is whether $\rho = 0$, which can be examined with an usual t-test, or with a F-Test if we test for higher order autocorrelations with $\rho_1 \hat u_{t-1} + \rho_2 \hat u_{t-2} + \ldots \,$ as autoregressive terms in the test equation. In the latter case we test whether $\rho_1, \ \rho_2, \ldots = 0$- More popular and powerful is a *Lagrange Multiplier* version of the test which is called **Breusch-Godfrey** test. In this case, the test statistic is $n \cdot R^2$, with the $R^2$ from the above test equation; (basically, this tests whether *all* coefficients of the test equation are zero and might therefore be more powerful). The test statistic is $\chi^2$ distributed with degrees of freedom equal to the number of autoregressive terms in the test equation> The *Breusch-Godfrey* test is today's standard test for autocorrelation. It can be used with *weak exogenous variables*, is flexible as it can test for *higher order autocorrelations* and could easily be made robust against heteroskedasticity using HC standard errors------------------------------------------------------------------------## Example: Estimating a dynamic version of Okun's Law#### Reading in data from Excel table and converting them to time series```{r}#| comment: " "#| warning: truelibrary(zoo)library(AER)library(dynlm) # For reading in Excel tableslibrary(readxl) # Reading in Excel table with data from 1957 Q1 - 2013 Q4USMacro <-read_xlsx("us_macro_quarterly.xlsx")# The first column of the data is a string variable of dates with format 1957:01, # but had no name. 
It automatically gets the name: ...1 # Define date variable from the first column with the zoo function as.yearqtr() date <-as.yearqtr(USMacro$...1, format ="%Y:0%q")# Convert to zoo data, but: important, remove the first column as it is a non-numeric variable # Remember, time series objects are matrices and can contain only numbersusdata <-zoo(USMacro[, c("GDPC96", "UNRATE") ], order.by=date) class(usdata)```#### Defining and plotting the data```{r}#| fig-width: 6#| fig-asp: 0.75#| fig-align: center# We want to estimate Okun's Law, the relationship between the change in unemployment # rate dependent on the GDP growth ratedgdp =diff( log(usdata$GDPC96) )*100# define quarterly growth rates of gdp usdata <-cbind(usdata, dgdp) # merging to usdata, not necessary in this caseplot( cbind( usdata$dgdp, usdata$UNRATE) )```------------------------------------------------------------------------#### Estimation```{r}#| comment: " "#| label: "okun"# Estimating Okun's Law with dynlm(). ( without an autoregressive term L(d(UNRATE)) ) model0 <-dynlm( d(UNRATE) ~L(dgdp,1:3) +trend(usdata), data = usdata )summary(model0)```------------------------------------------------------------------------#### Plotting residuals and autocorrelation function for residuals```{r}#| layout-ncol: 2#| fig-asp: 0.75# Plotting residuals, looks they are somewhat positively autocorrelated plot(model0$residuals, main ="Residuals - Base Model")abline(0,0)acf(model0$residuals)```------------------------------------------------------------------------#### Testing for autocorrelation in residuals```{r}#| comment: " "# Durbin-Watson test points to autocorrelation dwtest(model0)``````{r}#| comment: " "# Simple test for autocorrelation, regressing uhat on lagged uhat, # clear sign for autocorrelationu0 = model0$residualscoefs <-coeftest( dynlm( u0 ~L(u0) ) )coefs# storing rhorho <- coefs[2,1]round(rho, digits=3)``````{r}#| comment: " "# Breusch-Godfrey test (option fill = NA seems to be necessary)# clear sign for autocorrelationbgtest(model0, fill =NA)```------------------------------------------------------------------------### First remedy: Testing results with HAC standard errors**But caution, you should also consider misspecifications, as these lead to autocorrelated errors**, see @sec-caution```{r}#| comment: " "# First remedy for autocorrelated errors: HAC standard errors, # correcting biased standard errors, t-statistics and p-values# But caution, you should also consider misspecifications, as these leads to autocorrelated errors too coeftest( model0, vcov. = vcovHAC )```------------------------------------------------------------------------#### Comparing results```{r}#| code-fold: TRUE#| label: tbl-HAC#| tbl-cap: "Okun's Law: Comparing default and HAC se, t-statistics and p-values"library(modelsummary)modelsummary( list("default statistics "=model0, " HAC statistics"=model0), fmt =3,stars =TRUE, statistic =c("std.error", "[ {statistic} ]", "[ {p.value} ]"),vcov =c("classical", vcovHAC),gof_omit ="A|L|B|F|R",coef_omit =1,coef_rename =c('L(dgdp, 1:3)1'='L1_gGDP', 'L(dgdp, 1:3)2'='L2_gGDP','L(dgdp, 1:3)3'='L3_gGDP'),align ="cdd",output ="flextable")```------------------------------------------------------------------------### Second remedy: Estimation by FGLS with *rho*=`r round(rho, digits=3)`(see @eq-Koyeck)```{r}#| comment: " "# Estimate Okun's Law by FGLS. 
But this depends on a consistent estimate of rho, # which we have stored above model0GLS <-dynlm( ( d(UNRATE) - rho*L(d(UNRATE)) ) ~L( ( dgdp - rho*L(dgdp) ) ,1:3) +trend(usdata), data = usdata )summary(model0GLS)```------------------------------------------------------------------------#### Comparing results```{r}#| code-fold: true#| label: tbl-HAC_FGLS#| tbl-cap: "Okun's Law: Comparing OLS, OLS with HAC-CovMat and FGLS, (t-statistics in brackets)"library(modelsummary)modelsummary( list("OLS "= model0, " OLS w HAC "= model0, " FGLS"= model0GLS), fmt =3,stars =TRUE, statistic =c("statistic"),vcov =c("classical", vcovHAC, "classical"),gof_omit ="A|L|B|F|RM",coef_omit =1,coef_rename =c('L(dgdp, 1:3)1'='L1_gGDP', 'L(dgdp, 1:3)2'='L2_gGDP','L(dgdp, 1:3)3'='L3_gGDP','L((dgdp - rho * L(dgdp)), 1:3)1'='L1_gGDP','L((dgdp - rho * L(dgdp)), 1:3)2'='L2_gGDP','L((dgdp - rho * L(dgdp)), 1:3)3'='L3_gGDP' ),align ="lddd",output ="flextable")```------------------------------------------------------------------------#### Estimation with a lagged endogenous variable (autoregressive term)```{r}#| comment: " "#| fig-width: 6#| fig-asp: 0.75#| fig-align: centerarmodel <-dynlm( d(UNRATE) ~L(dgdp,1:3) +L( d(UNRATE) ) +trend(usdata), data = usdata )summary(armodel)```------------------------------------------------------------------------#### Plotting residualsDoesn't looks that they are positively autocorrelated; but possible heteroskedasticity!```{r}#| fig-width: 6#| fig-align: center#| fig-asp: 0.75u <- armodel$residualsplot(u, main ="Residuals - Autoregressive Model")abline(0,0)```------------------------------------------------------------------------#### Simple test for autocorrelation in model with autoregressive termSeems to be no autocorrelation -- but this test is biased (why?)```{r}#| comment: " "usdata <-cbind(usdata, u) # merging to usdata, but not necessary in this case summary( dynlm( u ~L(u), data = usdata ) )```------------------------------------------------------------------------#### Breusch-Godfrey test for autocorrelationPoints to strong autocorrelation of order = 1```{r}#| comment: " "BG <-dynlm( u ~L(dgdp,1:3) +L( d(UNRATE) ) +trend(usdata) +L(u) , data = usdata )summary(BG) ```------------------------------------------------------------------------#### Breusch-Godfrey LM-test for autocorrelation, order = 1, chi\^2 variant```{r}#| comment: " "# by hand# n * R^2nR <-length(BG$residuals) *summary(BG)$r.squared# p-valuepnR <-1-pchisq(nR, df=1) # p-value# Printing resultprint( paste( "Breusch-Godfrey Test, order=1: n * R^2 =", sprintf( "%.3f",nR ), " p-value =", sprintf( "%.4f",pnR ) ))```#### Built in function bgtest()```{r}#| comment: " "# Attention: !!! the R function bgtest() seems to not work properly # without the option fill=NA !!!bgtest(armodel, order =1, fill=NA)```------------------------------------------------------------------------\pagebreak## General Properties of OLS with Time Series- Under our *standard* assumptions MLR.1 - MLR.5 OLS, estimates are (see, @wooldridge_Intro_Econometrics - Unbiased\ - Efficient - Gauss-Markov Theorem\ - Consistent\ - Asymptotic normal distributed\ - With MLR.6, normal distributed in finite samples- However, assumption MLR.2 -- *Random Sample* -- is almost never fulfilled with time series! 
This assumption is therefore replaced by TSMLR.2: **Weak Dependence** - *Weak dependence* means that correlations between *successive* observations become smaller with *more time distance* between the observations and eventually converge to zero (fast "enough")- Under MLR.4 -- **strong exogeneity** -- $E(u_t|\mathbf X)=0$, OLS estimators are unbiased - However, contrary to the case of cross section data and random samples, in a time series context, this is an *extreme strong assumption*. It means that $u_t$ is uncorrelated not only with $\mathbf x_t$, but also with *all past and future values* $\mathbf x_{t+s}, \mathbf x_{t-s}$. This rules out all *autoregressive model*s with $y_{t-s}$ as *explanatory* variable, as current $y_{t}$, which is obvously correlatet with $u_t$, is part of future $\mathbf x_{t}$ in this case. Furthermore, policy feedback are ruled out too- > Hence, generally only **weak exogeneity** is assumed: $E(u_t|\mathbf x_t)=0$. This, together with weak dependence, is *sufficient* for *consistency* and *asymptotic normality*, but not for unbiasedness------------------------------------------------------------------------- Furthermore, MLR.1 (linearity in parameters) is modified as we additionally assume that all variables follow a (**weak**) **stationary stochastic process**- A **stochastic process**, $\{\ldots, y_{t-1}, \mathbf x_{t-1}, u_{t-1},\ \ y_t, \mathbf x_t, u_t, \ \ y_{t+1}, \mathbf x_{t+1}, u_{t+1}, \ldots \}$, is a *sequence* of random variables over time - *Strong (or strict) stationarity* means, that the *joint distribution* of the random variables is independent from time, meaning the distribution $F(y_t, \mathbf x_t, u_t)=F(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau$ - *Weak stationarity* or *variance stationarity* means that only the *first two moments* of the joint distribution -- *expectations*, *variances* and *(auto)covariances* -- are independent from time and are *finite* for all $t$. For instance, we have: $E(y_t, \mathbf x_t, u_t)=E(y_{t+\tau}, \mathbf x_{t+\tau}, u_{t+\tau}), \, \forall \tau$, $\text{ } E(y_t \cdot y_s)=E(y_{t+\tau} \cdot y_{s+\tau}), \, \forall \tau,s$, $E(y_{t} \cdot \mathbf x_{s})=E(y_{t+\tau} \cdot \mathbf x_{s+\tau}), \, \forall \tau,s, \ \ldots$- This **stationarity** assumption, and its related assumption of **weak dependence**, are needed to be able to apply the LLN (@thm-LLN) and the CLT (@thm-CLT) and are therefore important for the asymptotic properties of the estimators. (*Remark*: These two assumptions are the *counterparts* of the i.i.d. assumption for cross section data). 
Unfortunately, these stationarity assumptions are quite often violated with macroeconomic or financial time series- However, as we will see later, there are tests to reject some important forms of non-stationarity and weak dependence------------------------------------------------------------------------### Some important stationary stochastic processes#### **White Noise Process**The most basic stationary stochastic processes is the **White Noise process** -- **WN process** (German: Reiner Zufallsprozess, RZ), with the following properties$$E(e_t)=0, \ \ \ \operatorname{Var}(e_t)=\sigma_e^2, \ \ \ E(e_t \cdot e_s)=0, \ \forall t \ne s $$------------------------------------------------------------------------#### **Moving Average process**A **Moving Average** process is a weighted sum of current and past values of a white noise process$$\operatorname{MA}(q):= \ u_t = e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} $$For a MA(1) process we get$$ E(u_t) = E(e_t) + \theta E(e_{t-1})=0$$$$\operatorname{Var}(u_t) = \operatorname{Var}(e_t) + \theta^2\operatorname{Var}(e_t) = \sigma_e^2 (1+\theta^2)$$$$E(u_t\cdot u_{t-1})=E\left( (e_t+\theta e_{t-1})(e_{t-1}+\theta e_{t-2})\right)=\theta\sigma_e^2$$$$E(u_t\cdot u_{t-2})=E\left( (e_t+\theta e_{t-1})(e_{t-2}+\theta e_{t-3})\right)=0 $$because $e_t$ is a WN-process with $E(e_t\cdot e_{t-s})=0, s \ne 0$> Thus, we arrived to the *important result* that for a MA(q) process, the *covariance* of $(u_t, u_{t-s})$ is *zero for* $s > q$------------------------------------------------------------------------#### **Autoregressive Process**An **Autoregressive** process of *order* $p$ is defined as$$\operatorname{AR}(p):= \ u_t = \rho_1 u_{t-1} + \cdots + \rho_p u_{t-p} + e_t $$with $e_t$ being a WN process. For on AR(1) process$$u_t = \rho u_{t-1} + e_t$$ {#eq-ar_1} we get (we assume stationarity)$$E(u_t)=\rho E(u_{t-1})+E(e_t) \ \ \Rightarrow \ \ E(u_t)(1-\rho)=E(e_t) \ \ \Rightarrow $$$$E(u_t)=\dfrac {1}{1-\rho}E(e_t)=0$$and$$\operatorname{Var}(u_t) = \rho^2 \operatorname{Var}(u_{t-1}) + \sigma_e^2 \ \ \Rightarrow \ \ \operatorname{Var}(u_t) := \sigma_u^2 =\dfrac {1}{1-\rho^2}\sigma_e^2 \$$Finally, we calculate the auto-covariance of an AR(1) process$$E(u_t\cdot u_{t-1})=E ( \, \underbrace {(\rho u_{t-1} + e_t)}_{u_t} \, u_{t-1} ) = \rho E(u_{t-1}^2) + \underbrace {E(e_t u_{t-1})}_{0} = \rho \sigma_u^2$$$$E(u_t\cdot u_{t-2})=E\left( (\rho u_{t-1}+e_t) \, u_{t-2} \right) = \rho \underbrace {E(u_{t-1}u_{t-2})}_{\rho \sigma_u^2} + \underbrace {E(e_t u_{t-2})}_{0} = \rho ( \rho \sigma_u^2) = \rho^2 \sigma_u^2 $$> Thus, we arrived to the *important result that for an AR(1) process, the covariance* of $(u_t, u_{t-s})$, is: $\, E(u_t, u_{t-s}) = \rho^s \sigma_u^2, \ \ \forall s.$ This is a geometrically declining function in $s$, if $|\rho|<1$------------------------------------------------------------------------#### **MA representation of AR process**- Every *stationary* AR(p) process has a MA representation, typically of *infinite* order (the reverse is generally not true, hence MA processes are of a more general type than AR processes!)- By a *recursive expansion* of @eq-ar_1, the AR(1) process, we get$$u_t = \rho u_{t-1} + e_t = \rho \underbrace {(\rho u_{t-2} + e_{t-1})}_{u_{t-1}} + e_t = e_t + \rho e_{t-1} + \rho^2 u_{t-2} \quad \text{ and}$$$$u_t = e_t + \rho e_{t-1} + \rho^2 \underbrace {(\rho u_{t-3} + e_{t-2})}_{u_{t-2}} = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \rho^3 u_{t-3}$$- And finally, as $\lim \limits_{s \to \infty} \rho^{s} = 0$, we arrive to a 
MA representation of infinite order$$u_t = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \rho^3 e_{t-3} + \cdots \ = \ \sum\nolimits_{i=0}^{\infty} \rho^i e_{t-i} $$ {#eq-MA_Representation_of_AR}> The coefficients of this MA representation constitute the so called **Impulse Response** of $u_t$ regarding to a temporary unit shock in $e_t$. Such impulse responses are a very important tool in analyzing the reaction of variables to shocks, especially in a *multivariate* context $\to$ VARs (Vector Autoregressive Processes)------------------------------------------------------------------------### Non-stationary AR processes -- Random Walk, Trends and Unit RootsGiven an AR(1) process$$u_t = \rho u_{t-1} + e_t$$a *necessary and sufficient condition for stationarity* is $|\rho|<1$. This can be proved with the corresponding MA representation @eq-MA_Representation_of_AR- If $\rho=1$, we get a very specific and important process, a **Random Walk** $$ u_t = u_{t-1} + e_t $$ {#eq-RandomWalk}- The MA representation for a random walk with a staring value $u_0$ at time $t=0$ is$$ u_t = e_t + e_{t-1} + e_{t-2} + e_{t-3} + \cdots + e_{1} + u_{0} = \sum\nolimits_{i=1}^t e_i + u_0 $$ {#eq-MA_Rep_of_RandomWalk}- And the variance of $u_t$ is$$\sigma_u^2 = t\cdot \sigma_e^2 + \operatorname {Var} (u_{0})$$- which depends on time $t$. If $t \to \infty$, $\sigma_u^2 \to \infty$, so this clearly *non-stationary*- Random Walks also violate the weak dependence assumption as @eq-MA_Rep_of_RandomWalk clearly shows. A random walk is therefore a **long memory process**; shocks long ago, $e_{t-s}$ with $s$ very large, do not diminish as in an AR(1) process with $|\rho|<1$, but retain their influence on $u_t$- Further note that the *best prediction* of the random walk $u_t$ is $E(u_t|u_{t-1}, u_{t-2}, \ldots ) = u_{t-1}$. Hence, the *difference* between $u_t$ and $u_{t-1}$ *is unpredictable*. This property is of fundamental importance in economics and finance------------------------------------------------------------------------### Trends in time series#### **Deterministic Trend**- Very often, time series exhibit *time trends* or even breaks. In this case, the assumption of stationarity is violated and the resulting complications for econometric analysis, like testing hypothesis, depend on the *specific type of the non-stationarity*, see @nelson1982- Time series can exhibit a **deterministic trend**, a **stochastic trend** or *both*- An example for a **deterministic trend** is$$y_t = \beta_0 + \beta_1 x_t + \beta_2 trend + e_t$$- This poses *no problem for estimation* of $\beta_0, \beta_1$, as by the FWL-theorem, this is the same as when $y_t$ and $x_t$ are first regressed on the trend variable and the detrended *residuals* are then used to estimate $\beta_0, \beta_1$. If these residuals are stationary, $y_t$ and $x_t$ are called **trend stationary**------------------------------------------------------------------------#### **Stochastic Trend**An example for a **stochastic trend** (and an additional deterministic trend) is$$y_t = \beta_0 + y_{t-1} + \beta_1 x_t + e_t $$ {#eq-AR_Model_with_Drift}- This is an *autoregressive model* with an **unit root** (to this later). After a $t$ times *recursive expansions* of the equation above we get$$ y_t = \beta_0 \cdot t + \beta_1 \sum_{i=1}^t x_{i} + \sum_{i=1}^t e_{i} + y_0 $$- The first term represents the *deterministic* trend. 
It depends on $\beta_0$, which is called a **drift** in this case; because the constant is *added up* *t* times in an autoregressive model with a *unit root* this leads to a deterministic trend- The last sum represents the *stochastic* trend, because this is the MA representation of a random walk, compare @eq-MA_Rep_of_RandomWalk- The first sum could also represent a stochastic trend component, if $x_t$ is treated as random- Note, if we specify @eq-AR_Model_with_Drift as $\Delta y_t = \beta_0 + \beta_1 x_t + e_t$ (estimation in $1^{st}$ differences), no trends arise------------------------------------------------------------------------#### **Unit roots and Lag-Operator**Stochastic trends, like the ones above, are called *unit roots*. Why this labeling?- In theoretical time series analysis, the so called *lag operator L*, (or backshift operator *B*) is commonly used as an extraordinary helpful tool - The **lag operator L** is a function *applied* to time series and shifts the whole times series *one period back* to the past. In general, *L* can be *manipulated like an ordinary algebraic symbol*. We have$$L x_t := x_{t-1}, \quad L^2 x_t = x_{t-2}, \quad L^s x_t = x_{t-s} , \quad L^{-1} x_t = x_{t+1} $$- The difference operator $\Delta$, or *D*, is defined as$$\Delta:= (1-L), \ \ \Rightarrow \ \ \Delta x_t \, = \, x_t - L x_t \, = \, x_t - x_{t-1}$$- Thus, our autoregressive model, @eq-AR_Model_with_Drift, can be written as$$y_t = \beta_0 + L y_{t} + \beta_1 x_t + e_t \ \ \Rightarrow \ \ (1-L)y_t = \beta_0 + \beta_1 x_t + e_t$$- The left hand term of the right variant is an autoregressive **lag-polynomial**. And the root of this polynomial is $(1-L)=0 \ \Rightarrow \ L = 1$, a **unit root**- The lag operator makes it very convenient to handle AR processes of higher orders (and of MA processes of course). For instance, consider the following AR(3) model$$ y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \rho_3 y_{t-3} + e_t \ \ \Rightarrow $$\$$ \underbrace {(1-\rho_1L - \rho_2L^2 -\rho_3L^3)}_{a(L)} y_t = e_t \ \ \Rightarrow $$$$ a(L)y_t = e_t $$ {#eq-ar_lag_process}- with $a(L)$ the corresponding AR-polynomial. 
This is an extremely compact notation for an AR process- If $e_t$ is not WN, but follow itself an MA process, $e_t=b(L)\epsilon_t$ with $\epsilon_t$ WN, we have$$a(L)y_t = b(L)\epsilon_t $$ {#eq-arma}- which is called an **ARMA**(*p*,*q*) process, with *p* equal to the order of the AR part and *q* equal to the order of the MA part------------------------------------------------------------------------#### Factorization of the lag-polynomial- If it is possible to factor the lag-polynomial $a(L)$ into $(1-L) \, \tilde a(L)$, with $\tilde a(L)$ representing a stationary lag-polynomial, then $a(L)$ *contains an unit root* and the process is non-stationary and not weakly dependent- One can show that if all roots of $a(L)$ are outside the complex unit circle, a factorization $(1-L) \, \tilde a(L)$ is not possible and the AR process is *stationary*; a necessary condition for this is that the *sum of the coefficients of* $a(L)$ is *strict positive* and less than 2, i.e., $0<a(1)<2$ (if the first term of $a(L)$ is 1)- Furthermore, in this case (stationarity) the lag-polynomial $a(L)$ is *invertible* and the *AR process* @eq-ar_lag_process *has a MA representation* with *finite* second moments, i.e., is stationary$$ y_t = a(L)^{-1}e_t $$- *Remark*: The **inverse of a lag-polynomial** $a(L)$ is defined by:$$a(L)^{-1}a(L)=1$$- For example:$$(1-\rho L)^{-1} = (1 + \rho L + \rho^2 L^2 + \rho^3 L^3 + \ldots) $$- which can be proven by multiplying out $\, a(L)^{-1}a(L)$- If the AR polynomial is invertible, even an ARMA process @eq-arma has a MA representation$$y_t = a(L)^{-1}b(L)\epsilon_t $$- On the other hand, if $b(L)$ is invertible, the MA (or ARMA) process has an *AR representation*. But this has nothing to do with stationarity as *every* MA process of *finite order* is stationary. 
Hence, it is possible that a stationary MA process has *no* AR representation $\rightarrow$ MA processes are more general------------------------------------------------------------------------### Some simulated stochastic processes```{r}#| code-fold: true#| comment: " "# Simulating several stochastic processeslibrary(matrixStats) # for matrix cumSumset.seed(12345) # for reproducibility n=100# number of Observations# MA(2) process; with theta1 = 0.6, theta2 = 0.4e =matrix( rnorm(300), ncol =3 )e =ts(e, 1, frequency =1 ) # convert to a ts object to be able to use the lag functionu = e +0.6*lag(e,-1) +0.4*lag(e,-2)``````{r}#| code-fold: true#| comment: " "# Stable AR(1) processar=NULL# to initialize ar for (i in1:3) { ar_t =arima.sim( list( order(1,0,0), ar=c(0.8) ), n=n ) ar =cbind(ar_t, ar)}# Trend stationary AR(1) process; we simply add a deterministic trend to last AR(1) from abovear_t = ar_t +cumsum( rep(0.5, 100) )``````{r}#| code-fold: true#| comment: " "# Random walk; cumulative sum of WN processe =matrix( rnorm(n*10), ncol =10)rw =colCumsums(e)# Random walk with drift; cumulative sum of WN process with d=0.5ed = e +0.5rwd =colCumsums(ed)# Random walk with stronger drift; cumulative sum of WN process with d=0.5ed1 = e +1.5rwd1 =colCumsums(ed1)```------------------------------------------------------------------------#### **Plotting of MA(2) processes**```{r}#| layout-ncol: 2#| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(u, type ="l", lty =1, lwd =1.5, main="MA(2) process; with theta1 = 0.6, theta2 = 0.4"); abline(0,0); acf( u[,2], main="ACF"); ```------------------------------------------------------------------------#### **Plotting of AR(1) processes**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(ar, type ="l", lty =1, lwd =1.5, main ="AR(1), with rho = 0.8"); abline(0,0); acf( ar[,1], main="ACF"); ```------------------------------------------------------------------------#### **Plotting Random Walks**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(rw, type ="l", lty =1, lwd =1.5, main ="Random Walk"); abline(0,0); acf( rw[,1], main="ACF")```------------------------------------------------------------------------#### **Plotting AR(1) with deterministic trend**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(ar_t, type ="l", lty =1, lwd =1.5, main ="AR(1) + determ. 
trend, with rho = 0.8"); abline(0,0.5); acf( ar_t, main="ACF" )```------------------------------------------------------------------------#### **Plotting Random Walks with drift**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(rwd, type ="l", lty =1, lwd =1.5, main="Random Walk with drift"); abline(0,.5); acf( rwd[,1], main="ACF")```------------------------------------------------------------------------#### **Plotting Random Walks with strong drift**```{r}#| layout-ncol: 2#| #| fig-width: 6#| fig-asp: 0.75#| code-fold: truematplot(rwd1, type ="l", lty =1, lwd =1.5, main="Random Walk with strong drift"); abline(0,1.5); acf( rwd1[,1], main="ACF"); ```------------------------------------------------------------------------\pagebreak## Consequences of unit roots for regressions- In this section, we will present some important **theory**- Confirm the theory with **simulation experiments**- Introduce the concepts of **Integration** and **Cointegration**### Theory- **Non-stationarity** in general, and **unit roots** in particular have serious consequences for regression analyses with time series, see @pagan1989- Regression equations containing variables with unit roots can lead to - *Spurious regression* - *Non-standard distributions* of estimated coefficients with severe consequences for tests - *Super-consistency* of estimated coefficients- In the following, we lay out the statistical reasons for these complications. To keep the analysis simple we consider a model with only one explanatory variable. Furthermore, $y_i$ and $x_i$ should be *demeaned* (so we need no intercept) $$ y_i = \beta \, x_i + u_i $$ {#eq-baseReg}- For this case the usual OLS estimator for $\beta$ is$$\hat \beta \, = \, \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)^{-1} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i y_i \right) \, = \, \\ \qquad \quad \quad $$ $$\beta + \underbrace{\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)^{-1}}_{M_{xx}^{-1}} \underbrace{\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{M_{xu}}$$ {#eq-beta}------------------------------------------------------------------------#### **Case 1: all variables and the error term** $u$ **are well behaved**- Assuming the usual case that the variables are *well behaved* (our former shortcut for *stationarity* and *weak dependency*) we can show by applying the LLN and CLT (@thm-LLN, @thm-CLT) that - $\hat \beta$ is **consistent** and - $\hat \beta$ is **asymptotically normally distributed**- In particular, regarding *consistency* of $\hat {\beta}$, by the LLN and according @eq-beta, we have$$\hat {\beta} \ \stackrel{p}{\longrightarrow} \ \beta + {\underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{p}{\longrightarrow} \ \sigma_x^2}} ^{-1} \underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{p}{\longrightarrow} \ 0} \ = \ \beta$$- Regarding the *asymptotic normal distribution* of $\hat \beta$, by the CLT we have$$\sqrt n \, (\hat { \beta} - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{p}{\longrightarrow} \ \sigma_x^2}} ^{-1} \underbrace {\left( \frac {1}{\sqrt n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NDRV} \ = \ NDRV $$ {#eq-betaND} with $NDRV$ meaning "Normal Distributed Random Variable"------------------------------------------------------------------------#### **Case 2:** $x$ **follows a random walk but** $u$ **is stationary**- The 
situation is different if $x_i$ follows a **random walk** but $u_i$ is still stationary and weakly dependent. In this case the conditions for the LLN and CLT are not satisfied any more and one can show that (with $NSDRV$ for a non-standard distributed random variable)$$ \frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$$$ \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right) \ \stackrel{d}{\longrightarrow} \ NSDRV$$- Regarding *consistency* of $\hat \beta$, according to @eq-beta we have$$\hat \beta \ \stackrel{p}{\longrightarrow} \ \beta + {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{p}{\longrightarrow} \ \frac {1}{n}NSDRV \to \ 0 } \ = \ \beta $$ {#eq-betaconsisten}- The rate of convergence actually is $n$ instead of the usual rate $\sqrt n$, compare @eq-betaND.\ Hence, $\hat \beta$ is **super consistent**- We further reach to the conclusion that the *distribution* of $\hat \beta$ is **non-standard distributed**$$ n \, (\hat \beta - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV} \ = \ NSDRV$$ {#eq-betanormal}------------------------------------------------------------------------#### **Case 3: both** $x$ **and** $u$ **are random walks**- The situation dramatically change if **both** $x_i$ and $u_i$ follow a **random walk**$$\frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$$$ \frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$The first result is the same as above but the second result is remarkable: even dividing by $n^2$ does not lead to a convergence to zero- Applying this result to the formula for $\hat \beta$, @eq-beta, we get the reason for **Spurious Regressions**$$(\hat \beta - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV} \ = \ NSDRV $$ {#eq-spurReg}We have no convergence to $\beta$ even if we divide and multiply by $n^3$ as the first inverse moment will go to $+\infty$ and the second to zero, so that $\hat \beta \to \beta + \infty \cdot 0$, which could be anything. 
------------------------------------------------------------------------

#### **Case 3: both** $x$ **and** $u$ **are random walks**

-   The situation changes dramatically if **both** $x_i$ and $u_i$ follow a **random walk**

$$\frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i^2 \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$

$$ \frac {1}{n} \left( \frac {1}{n}\sum\nolimits_{i=1}^n x_i u_i \right) \ \stackrel{d}{\longrightarrow} \ NSDRV $$

The first result is the same as above, but the second result is remarkable: even dividing by $n^2$ does not lead to convergence to zero

-   Applying this result to the formula for $\hat \beta$, @eq-beta, we get the reason for **Spurious Regressions**

$$(\hat \beta - \beta) \ \stackrel{d}{\longrightarrow} \ {\underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i^2 \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV}} ^{-1} \underbrace {\left( \frac {1}{n^2}\sum\nolimits_{i=1}^n x_i u_i \right)}_{\stackrel{d}{\longrightarrow} \ NSDRV} \ = \ NSDRV $$ {#eq-spurReg}

We get no convergence to $\beta$ even if we normalized by $n^3$ instead: the first (inverted) moment would then go to $+\infty$ and the second to zero, so that $\hat \beta \to \beta + \infty \cdot 0$, which could be anything. Hence, $\hat \beta$ does not converge to the true $\beta$ but to *a non-standard distributed random variable*, even with a very large sample $\Rightarrow$ **Spurious Regressions Problem**, see @granger1974spurious

------------------------------------------------------------------------

### Simulations

To underpin our theoretical results we simulate the convergence of the moments $M_{xx}^{-1}M_{xu}=(\hat \beta - \beta)$, depending on the time series properties

```{r}
#| comment: " "
#| code-fold: true
rm(.Random.seed, envir = globalenv())
betahat_minus_beta = list()
x_random_walk = 0   # x random walk or white noise
rho = 0.5           # AR parameter for u; AR(1) with rho = 0.5 or random walk
# for 20 and 200 observations, 20000 replications
for (n in c(20, 200)) {
  Mxx = NULL
  Mxu = NULL
  for (i in 1:20000) {
    e1 = rnorm(n)
    e2 = rnorm(n)
    ifelse( x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {u[t] = rho * u[t-1] + e2[t] * 2}
    mxx <- sum(x * x) / (n)
    mxu <- sum(x * u) / (n)
    Mxx[i] = mxx
    Mxu[i] = mxu
  }
  if (n == 20)  {n1 = 20;  betahat_minus_beta[[1]] = Mxu/Mxx}
  if (n == 200) {n2 = 200; betahat_minus_beta[[2]] = Mxu/Mxx}
}
```

------------------------------------------------------------------------

#### **Simulation 1: both x and u stationary**

-   Normal rate of convergence
-   With n = 200, already normally distributed

```{r}
#| layout-ncol: 2
#| fig-asp: 0.75
#| code-fold: true
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
```

------------------------------------------------------------------------

#### **Simulation 2: x random walk, u stationary**

```{r}
#| comment: " "
#| code-fold: true
rm(.Random.seed, envir = globalenv())
betahat_minus_beta = list()
x_random_walk = 1
rho = 0.5
for (n in c(20, 200)) {
  Mxx = NULL
  Mxu = NULL
  for (i in 1:20000) {
    e1 = rnorm(n)
    e2 = rnorm(n)
    ifelse( x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {u[t] = rho * u[t-1] + e2[t] * 2}
    mxx <- sum(x * x) / (n^2)
    mxu <- sum(x * u) / (n^2)
    Mxx[i] = mxx
    Mxu[i] = mxu
  }
  if (n == 20)  {n1 = 20;  betahat_minus_beta[[1]] = Mxu/Mxx}
  if (n == 200) {n2 = 200; betahat_minus_beta[[2]] = Mxu/Mxx}
}
```

-   Super consistency
-   Clear deviations from the normal distribution with both n = 20 and n = 200

```{r}
#| layout-ncol: 2
#| fig-asp: 0.75
#| code-fold: true
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-2,2))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
```

------------------------------------------------------------------------

#### **Simulation 3: x random walk, u random walk**

```{r}
#| comment: " "
#| code-fold: true
rm(.Random.seed, envir = globalenv())
betahat_minus_beta = list()
x_random_walk = 1
rho = 1
for (n in c(20, 200)) {
  Mxx = NULL
  Mxu = NULL
  for (i in 1:20000) {
    e1 = rnorm(n)
    e2 = rnorm(n)
    ifelse( x_random_walk == 1, x <- cumsum(e1), x <- e1)
    u <- rep(0, n)
    for (t in 2:n) {u[t] = rho * u[t-1] + e2[t] * 2}
    mxx <- sum(x * x) / (n^2)
    mxu <- sum(x * u) / (n^2)
    Mxx[i] = mxx
    Mxu[i] = mxu
  }
  if (n == 20)  {n1 = 20;  betahat_minus_beta[[1]] = Mxu/Mxx}
  if (n == 200) {n2 = 200; betahat_minus_beta[[2]] = Mxu/Mxx}
}
```
-   No convergence to the true value at all; in fact, the distribution is independent of n ==\> spurious regression
-   Clear deviations from the normal distribution with both n = 20 and n = 200

```{r}
#| layout-ncol: 2
#| fig-asp: 0.75
#| code-fold: true
hist(betahat_minus_beta[[1]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n1), xlim = c(-8,8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[1]])), add = TRUE, col = "red")
hist(betahat_minus_beta[[2]], breaks = 50, freq = FALSE, main = paste0("Mxu/Mxx with n = ", n2), xlim = c(-8,8))
curve(dnorm(x, mean = 0, sd = sd(betahat_minus_beta[[2]])), add = TRUE, col = "red")
```

------------------------------------------------------------------------

### Order of Integration and Cointegration

-   We have seen that *unit roots* in the variables can have dramatic consequences for the asymptotic behavior of the OLS estimators. In particular, we learned that the *features of the error process* are of great relevance. To be more precise, we need some important *definitions*

    1)  A variable is **integrated of order one**, I(1), if it is non-stationary but becomes stationary after applying the difference operator $\Delta = (1-L)$ once. In this case, the variable contains *one unit root* in its autoregressive lag polynomial: $a(L)=\Delta \tilde a(L)$, with $\tilde a(L)$ stationary (invertible)

    2)  For an I(2) process -- *integrated of order two* -- we have to apply the difference operator $\Delta$ twice for the series to become stationary. Further, an I(0) process is already stationary

    3)  If we have two I(1) variables $y$ and $x$ and there exists a linear combination $(y - \alpha x)$ which is I(0) -- i.e., is stationary -- then the variables $y$ and $x$ are said to be **cointegrated**

-   The concept of *cointegration* is of enormous importance in modern macroeconomic theory and macro-econometrics because it has a *clear economic interpretation* as a **long run equilibrium relationship**

    -   If two variables are I(1), they exhibit a stochastic trend (random walk components) and one would expect that they are *drifting further away* from each other in the course of time

    -   If this is not the case, if they *stay side by side*, then there must be some economic forces at work -- adjustments toward a *long run* equilibrium -- which accomplish that. In this case, there exists a **cointegration relationship** $(y - \alpha x)$ which is stationary, I(0), and so they are *cointegrated* (a short simulated sketch of these definitions follows below)
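A minimal sketch of these definitions in base R (simulated, hypothetical series, not part of the worked examples later in this chapter; the cointegration parameter $\alpha = 1$ is assumed known here): an I(1) series becomes stationary after differencing once, an I(2) series after differencing twice, and a cointegrating linear combination of two I(1) series is I(0)

```{r}
#| code-fold: true
# Minimal sketch of the integration / cointegration definitions (hypothetical simulated series)
set.seed(42)
e  <- rnorm(300)
x  <- cumsum(e)                        # I(1): one unit root
x2 <- cumsum(x)                        # I(2): integrated of order two
y  <- x + rnorm(300)                   # y and x share the stochastic trend -> cointegrated with alpha = 1
d1 <- diff(x)                          # I(0) after differencing once
d2 <- diff(x2, differences = 2)        # I(0) after differencing twice
z  <- y - 1 * x                        # cointegration relationship (y - alpha*x), stationary
```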
------------------------------------------------------------------------

### Several important special cases

-   Let us reconsider our model @eq-baseReg, $\text{ } y_i = x_i \beta + u_i$

-   If *both* $x$ and $y$ are I(1), but are *cointegrated*, $\beta$ corresponds to $\alpha$ and OLS will estimate the cointegration relationship by minimizing the sum of squared residuals; @eq-baseReg is then called a **cointegration equation** (but note, a causal interpretation still requires $x_i$ to be exogenous). Furthermore, the **error term** $u$ in @eq-baseReg is **stationary** and the results of @eq-betaconsisten and @eq-betanormal apply. Thus we have

    -   OLS *consistently* estimates $\beta$, with a convergence rate of $n$, i.e., OLS is *super consistent*

    -   The distribution of $\hat \beta$ is generally *non-standard* and therefore the usual tests do not apply. One has to rely on simulated distributions

-   If *both* $x$ and $y$ are I(1), but are **not** *cointegrated*, then they are drifting away from each other in the course of time, which implies a **non-stationary error term** $u$. In this case the result of @eq-spurReg applies and

    -   regressing $y$ on $x$ typically leads to a *spurious regression*. The result is *unpredictable*, as $\hat \beta$ is a non-standard distributed random variable whose variance does not shrink with larger $n$

    -   However, there could be a valid "*short run*" relationship in *first differences*, $\Delta y_i = \Delta x_i \beta + v_i$. As the first differences are stationary, this relationship can be estimated by OLS in the usual manner

-   If $y$ is I(0), stationary, and $x$ is I(1), then $\hat \beta$ will *converge in a super consistent manner to zero*, i.e., the results @eq-betaconsisten and @eq-betanormal apply. This feature is utilized by the so-called **Unit Root Tests**, see @sec-unitroottest below

------------------------------------------------------------------------

#### **Some further cases and qualifications on our present analysis**

-   Assume $u$ is stationary

    -   If $x$ is *strictly* exogenous, the whole analysis could be conditioned on $x$, and in this case the time series properties of $x$ play no role. We can think of $x$ as fixed numbers in repeated samples and all our asymptotic standard results apply. However, strict exogeneity of the right hand side variables *is rare* in a time series context

    -   If $x$ exhibits a unit root (stochastic trend) *with drift*, the standard results apply likewise. The reason is that in this case the deterministic trend dominates the stochastic one asymptotically. However, this is only the case *if there is no other right hand side variable with a deterministic trend or a stochastic trend with drift*. The reason is that in this case, according to the FWL theorem, all other variables behave as if they were detrended

    -   Furthermore, all standard results apply if the left and right hand side variables are *either stationary or cointegrated*. In this case we can always find reparameterizations so that every coefficient can be assigned to a stationary variable. This is especially true for VARs with more than one lag, as a variable and its lagged values are always cointegrated. Hence, there is generally no need to estimate VARs in differences (as many people believe)

-   The spurious regression problem remains even if $x$ is strictly exogenous -- the error term $u$ is still non-stationary in this case. If the variables are trend-stationary, i.e., *only* exhibit a deterministic trend, the simple remedy avoiding spurious regressions is detrending (see the sketch below)
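The following minimal sketch (with hypothetical trend-stationary series, not taken from the chapter's examples) illustrates the detrending remedy mentioned in the last point: two independent series that both trend upward appear strongly related in levels, but after removing the deterministic trends the regression no longer picks up a systematic relationship

```{r}
#| code-fold: true
# Minimal sketch: detrending two independent trend-stationary series (hypothetical data)
set.seed(7)
tt   <- 1:200
y_tr <- 0.05 * tt + arima.sim(model = list(ar = 0.5), n = 200)   # trend-stationary
x_tr <- 0.03 * tt + arima.sim(model = list(ar = 0.5), n = 200)   # trend-stationary, independent of y_tr
summary(lm(y_tr ~ x_tr))             # spurious in levels: both series trend upward
y_dt <- residuals(lm(y_tr ~ tt))     # detrended y
x_dt <- residuals(lm(x_tr ~ tt))     # detrended x
summary(lm(y_dt ~ x_dt))             # no systematic relationship left after detrending
```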
------------------------------------------------------------------------

\pagebreak

## Unit Root Test and Error Correction Form {#sec-unitroottest}

### Dickey-Fuller (DF) Test

-   We have seen that possible non-stationarity of time series variables in regression models determines the statistical properties of the estimators. Hence, we clearly need a test for stationarity versus non-stationarity

-   Consider a pure random walk

$$ y_t = y_{t-1} + e_t \ \ \Rightarrow \ \ \Delta y_t = e_t $$ {#eq-DF0}

-   To test for non-stationarity we add the variable $y_{t-1}$ to the right hand side of the second form of @eq-DF0. If the **process is actually a random walk**, the results @eq-betaconsisten and @eq-betanormal apply and the coefficient of $y_{t-1}$ would converge with rate $n$ to the true value zero. This is the **Dickey-Fuller (DF) Test**:

$$\Delta y_t = \delta y_{t-1} + \beta_0 + (\gamma \cdot t) + e_t $$ {#eq-DF1}

-   Thus, we are testing whether $\delta = 0$ by means of the **usual t-statistic**. However, $H_0$: $\delta = 0$, so we presuppose non-stationarity. Hence, under $H_0$, $\delta$ is non-standard distributed, @eq-betanormal applies, and its t-statistic follows the so-called Dickey-Fuller distribution, which Dickey and Fuller tabulated with Monte Carlo simulations. Typically, **critical values** of the Dickey-Fuller distribution are *notably larger in absolute value* than the usual ones; one-sided 5%: **-2.86** without a time trend and **-3.41** with a time trend included in the test equation

-   To allow for a *proper alternative* if $H_0$ is not true and $y$ is actually stationary, we usually add a constant $\beta_0$ and possibly a time trend $(\gamma \cdot t)$ to the test equation @eq-DF0, especially if $y$ exhibits a trend

------------------------------------------------------------------------

### Augmented Dickey-Fuller Unit Root (ADF) Test

-   If $y$ does not follow a *pure* random walk but an AR process with a unit root, additional terms, which take care of this more complicated process, have to be added. As we presuppose an AR process with a unit root, we augment the test @eq-DF1 by autoregressive terms $\Delta y_{t-s}$, leading to the so-called **Augmented Dickey-Fuller (ADF) Test**. Once again, we test $\delta = 0$ by means of the usual t-statistic (a sketch of an information-criterion-based choice of the lag length follows at the end of this subsection)

$$\Delta y_t = \delta y_{t-1} + \beta_0 + \textcolor{red} {\beta_1 \Delta y_{t-1} + \beta_2 \Delta y_{t-2} + \cdots + \beta_k \Delta y_{t-k}} + (\gamma \cdot t) + e_t $$ {#eq-ADF}

-   Because of its simplicity and quite good statistical properties, the ADF test is the most popular unit root test (many other unit root tests have been developed in the literature, but many of them are variants of the ADF test)

-   The ADF test can even be used as a *test* for **cointegration**, which is then called the **Engle-Granger** procedure, see @engle1987co. To do this

    1)  We first have to check whether both (or more) variables are actually I(1), using the ADF test

    2)  If both are I(1), and an I(2) property is ruled out, we estimate the hypothetical **cointegration regression** @eq-baseReg

    3)  After that, we check whether the residuals of the cointegration regression @eq-baseReg are indeed stationary. We can employ the ADF test to accomplish this

    4)  However, because residuals of a regression always look somewhat stationary -- OLS minimizes the residual variance -- the critical values for this test are different from those for the ADF test applied to ordinary (non-residual) time series; these critical values are even larger in absolute value than the usual ADF ones

-   This procedure can easily be extended to more explanatory variables (with even larger critical values). But in this case, usually more advanced multivariate methods, like the **Johansen** procedure, @johansen1991, are used to estimate and test for *multiple* cointegration relationships
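Regarding the lag length $k$ in @eq-ADF: the examples below rely on `tseries::adf.test()` with its default lag. As a minimal sketch (assuming the additional package `urca` is installed; it is not used elsewhere in this chapter), the lag length can instead be chosen by an information criterion with `ur.df()`

```{r}
#| eval: false
#| code-fold: true
# Minimal sketch (assumes the urca package is installed): ADF test with AIC-based lag selection
library(urca)
set.seed(1)
y_rw <- cumsum(rnorm(250))   # hypothetical I(1) series
summary( ur.df(y_rw, type = "drift", lags = 8, selectlags = "AIC") )
# type = "drift" corresponds to including beta_0 in the test equation;
# type = "trend" additionally includes the time trend (gamma * t)
```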
------------------------------------------------------------------------

### Error Correction Form

-   A cointegration relationship describes the long run connection between variables

-   However, there is a possibility to estimate short run as well as long run relationships in one equation. This is called the **error correction form**

$$\Delta y_t = \beta_0 + \beta_1 \Delta y_{t-1} + \beta_2 \Delta y_{t-2} + \cdots + \beta_k \Delta y_{t-k} + \\
\gamma_1 \Delta x_{t-1} + \gamma_2 \Delta x_{t-2} + \cdots + \gamma_k \Delta x_{t-k} + \\
\delta \textcolor{red} { (y_{t-1} - \alpha x_{t-1})} + \\
\gamma \cdot t + e_t $$ {#eq-ERC}

-   The term in brackets is the *error correction term*. In the *long run*, this *equilibrium relationship* must hold and therefore be stationary. This relationship could be estimated by @eq-ERC "freely" or in a two-step procedure:

    1)  First, estimate a cointegration regression in levels *without any lags*, as in @eq-baseReg

    2)  Test for cointegration, i.e., whether the residuals of the cointegration regression are stationary

    3)  If they are stationary, take the *lagged* residuals of the cointegration regression, representing $(y_{t-1} - \hat\alpha x_{t-1})$, as an additional variable in @eq-ERC; then $\delta$ measures the *adjustment speed* toward the long run equilibrium

-   The autoregressive and distributed-lag terms in differences in @eq-ERC describe the *short run interactions* between the variables. The lag length $k$ is often chosen so that the error term $e_t$ is not autocorrelated

------------------------------------------------------------------------

\pagebreak

## Examples of regressions with non-stationary time series

### Spurious Regression

```{r, message=FALSE}
#| code-fold: true
#| comment: " "
# Needed Libraries
library(matrixStats)   # for matrix colCumsums
library(tseries)       # for the Augmented Dickey-Fuller Unit Root test
library(egcm)          # Engle-Granger cointegration tests
library(zoo)           # time series infrastructure
library(dynlm)         # dynamic linear models: d() and L() operators (used below)
library(lmtest)        # coeftest() and dwtest() (used below)
set.seed(1111)         # for reproducibility; will always generate the same random numbers
```

```{r}
#| code-fold: true
#| comment: " "
# Simulating two independent random walks
# First, generate two white noise processes
n = 250                               # number of observations
e = matrix( rnorm(n*2), ncol = 2 )    # random matrix with n rows and two columns
e = zooreg(e, 1, n, frequency = 1)    # convert to a ts-zoo object to be able to use the lag function
# generate two random walks: cumulative sums of the columns of the white noise matrix
u <- colCumsums(e)                    # u consists of two columns with the two random walks
colnames(u) <- c("y", "x")            # giving names
u <- zooreg(u, 1, n, frequency = 1)   # convert to a ts-zoo object
```

#### Plot the two independently generated random walks

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
#| code-fold: true
matplot(u, type = "l", lty = 1, lwd = 1.5, main = "Two independent random walks")
abline(0,0)
```

------------------------------------------------------------------------

#### Regressing the two independently generated random walks

-   As this is a regression of I(1) variables without lags, it is a so-called @engle1987co cointegration regression, like @eq-baseReg

```{r}
#| comment: " "
#| fig-width: 6
#| fig-asp: 0.72
#| fig-align: center
mod <- dynlm( u$y ~ u$x )
coeftest(mod)   # printing the coefficients with std-errors and test statistics
```
-   We get a highly significant relationship, but we know that this is totally spurious. How can this happen?

-   Test of autocorrelation in the residuals: the DW test points to highly autocorrelated residuals, a first hint of *non-stationarity* of the residuals

```{r}
#| comment: " "
dwtest(mod)
```

------------------------------------------------------------------------

#### Scatterplot of the two independently generated random walks

A spurious connection between the two independent random walks is reflected even in the scatterplot

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
#| code-fold: true
plot(u$x, u$y, main = "Scatterplot of the two independent random walks")
abline(mod)   # add regression line of the estimated model to the plot
```

------------------------------------------------------------------------

#### Residual Plot

The plot of the residuals reveals a very high degree of autocorrelation ==\> possibly non-stationary

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
#| code-fold: true
plot( mod$residuals, main = "Residuals extremely autocorrelated" )
abline(0,0)
```

------------------------------------------------------------------------

#### Dickey-Fuller test

DF test equation, @eq-DF1: $\Delta$(residuals) on lagged residuals ==\> nearly a random walk. But what critical value should we use for the t-statistic of $\delta$?

```{r}
#| comment: " "
coeftest( dynlm( d(mod$residuals) ~ L(mod$residuals) ) )
```
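As a side remark (not part of the original example): essentially the same Dickey-Fuller regression can be run with `tseries::adf.test()` by setting the lag order to zero, in which case the reported p-value is based on tabulated Dickey-Fuller critical values -- though, as stressed below, those critical values are not strictly valid for regression residuals. A minimal sketch, assuming `mod` from above:

```{r}
#| comment: " "
# Minimal sketch: plain DF test (no augmentation terms) on the residuals via adf.test with k = 0
adf.test(mod$residuals, k = 0)
```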
------------------------------------------------------------------------

#### Augmented Dickey-Fuller test

-   ADF test, test @eq-ADF, a formal test for non-stationarity -- H~0~ is non-stationarity

-   But caution: the Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance. Nonetheless, H~0~, non-stationarity, cannot be rejected

-   The lag length k used for the autoregressive correction terms in @eq-ADF is 6 in this case. There are other procedures for ADF tests in R, which choose an "optimal" lag length according to some information criteria

```{r}
#| comment: " "
adf.test(mod$residuals)
```

------------------------------------------------------------------------

#### Engle-Granger cointegration test

-   Basically an ADF test for the residuals of the cointegration regression

-   But the egcm() procedure uses correct critical values for residuals

-   Furthermore, besides an ADF test, a whole battery of other tests is carried out

-   None of them can reject the H~0~ of non-stationarity of the residuals at the usual significance level of 5%

==\> no cointegration\
==\> spurious regression

```{r, warning=FALSE}
#| comment: " "
summary( erg <- egcm(u$x, u$y) )
```

------------------------------------------------------------------------

### Cointegration

```{r}
#| code-fold: true
#| comment: " "
# Simulating two cointegrated time series
set.seed(1234)   # for reproducibility; will always generate the same random numbers
# First, generate two white noise processes
v1 = rnorm(250); v2 = rnorm(250)*2
# Generating one MA(1) process w1 and one AR(2) process w2; first generate two vectors with zeros
w1 = rep(0, 250); w2 = rep(0, 250)
# Loop to generate the MA(1) and AR(2) process
for (t in 3:250) {
  w1[t] = 0.4 * v1[t] + 0.4 * v1[t-1]              # MA(1)
  w2[t] = 0.6 * w2[t-1] + 0.3 * w2[t-2] + v2[t]    # AR(2)
}
# Second, generate a time series with a unit root: cumsum of the MA(1) process; more general than a random walk
rw = cumsum(w1)   # actually, this is an ARMA(1,1) with unit root (or an ARIMA(0,1,1))
# x and y exhibit a common stochastic trend, hence they are cointegrated
x = rw
y = rw + w2
# transform to ts-zoo objects to make the time series functions available
x = zooreg(x, 1, 250, frequency = 1)   # transform to a ts-zoo object
y = zooreg(y, 1, 250, frequency = 1)   # transform to a ts-zoo object
```

#### Plot of the two cointegrated time series

Although both are I(1), they stick together

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
matplot(cbind(y,x), type = "l", lty = 1, lwd = 1.5, main = "Two cointegrated time series")
```

------------------------------------------------------------------------

#### Testing whether both variables are non-stationary

-   Before doing a cointegration test, we have to check whether the variables are actually I(1)

```{r}
#| comment: " "
adf.test(y)         # testing whether I(1); I(1) is not rejected
adf.test(diff(y))   # ruling out I(2); I(2) is rejected
```

-   The same for variable x

```{r}
#| comment: " "
adf.test(x)         # testing whether I(1); I(1) is not rejected
adf.test(diff(x))   # ruling out I(2); I(2) is rejected
```
-   Hence, both y and x are probably I(1). Therefore, we can check whether they are cointegrated

------------------------------------------------------------------------

#### Cointegration regression

-   Regressing the two I(1) processes y on x, without any lags ==\> @engle1987co cointegration regression, like @eq-baseReg

```{r}
#| comment: " "
summary( cointreg <- dynlm(y ~ x) )
```

-   The true alpha of the simulated cointegration regression is 1 and the *Engle-Granger* estimate for alpha is very close: `r round(coef(cointreg)[2],3)`, although with only 250 observations ==\> super consistency

------------------------------------------------------------------------

#### Plotting residuals of cointegration regression

Residuals of the cointegration regression; they look stationary, although clearly autocorrelated

```{r}
#| fig-width: 6
#| fig-align: center
#| fig-asp: 0.75
plot(cointreg$residuals, main = "Residuals of cointegration regression")
abline(0,0)
```

------------------------------------------------------------------------

#### Unit root test for residuals

##### DF test of residuals

-   DF test equation, @eq-DF1, H~0~ is non-stationarity:

```{r}
#| comment: " "
coefDF1 <- coeftest( dynlm( d(cointreg$residuals) ~ L(cointreg$residuals) ) )
coefDF1
```

-   $\Delta$(residuals) on lagged residuals ==\> seems to be stationary\
-   Note the t-value of L(cointreg\$residuals) = `r round(coefDF1[2,3],2)`. This seems to support cointegration. But see the comments below about critical values

##### ADF test of residuals

-   @eq-ADF, test for non-stationarity, H~0~ is non-stationarity. Seems to support cointegration

-   But caution: the Dickey-Fuller critical values are not valid for residuals, as OLS residuals always look somewhat stationary due to the minimization of the residual variance

-   The lag length k used for the autoregressive correction terms is 6 in this case. There are other procedures for ADF tests in R, which choose an "optimal" lag length according to some information criteria

```{r}
#| comment: " "
adf.test(cointreg$residuals)
```

------------------------------------------------------------------------

#### Testing for cointegration with the package "egcm"

-   *Engle-Granger cointegration test*. Basically an ADF test for the residuals of the cointegration regression

-   But the egcm() procedure uses *correct* critical values for residuals

-   Furthermore, besides an ADF test, a whole battery of other tests is carried out

    -   Most of them can reject the H~0~ of non-stationarity of the residuals at the usual significance level of 5%\
    -   Especially look at the Elliott, Rothenberg and Stock (ERSD) test, which is a variant of the ADF test with local detrending of the data before testing and is said to be the most powerful unit root test
    -   ==\> variables are cointegrated
    -   ==\> no spurious regression

```{r, warning=FALSE}
#| comment: " "
summary( erg <- egcm(x, y, i1test = "adf") )
```

------------------------------------------------------------------------

### Estimating an Error Correction Model

#### Estimating an error correction model with lagged residuals from the cointegration regression above as error-correction term

See @eq-ERC

```{r}
#| comment: " "
erc <- dynlm( d(y) ~ L(d(y)) + L(d(x)) + L(cointreg$residuals) )
summary(erc)
```
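A side note on interpreting the adjustment coefficient $\delta$: ignoring the short-run terms, a deviation from the long run equilibrium decays roughly at rate $(1+\delta)$ per period, so an implied half-life can be computed from the estimated error-correction coefficient. A minimal sketch (assuming the coefficient of the lagged residual is named `L(cointreg$residuals)` in the summary output above; otherwise address it by position):

```{r}
#| eval: false
# Minimal sketch: implied adjustment speed / half-life from the error-correction coefficient
delta <- unname(coef(erc)["L(cointreg$residuals)"])   # adjustment coefficient, should be negative
half_life <- log(0.5) / log(1 + delta)                # periods until half of a deviation is corrected
half_life
```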
------------------------------------------------------------------------

#### Estimating the error correction model in "free form"

-   The implied parameter $\alpha$ of the cointegration equation, @eq-baseReg, is minus the coefficient of L(x) divided by the coefficient of L(y). See @eq-ERC

```{r}
#| comment: " "
ercfree <- dynlm( d(y) ~ L(d(y)) + L(d(x)) + L(y) + L(x) )
summary(ercfree)
# Implied estimate of alpha:
alpha <- -coef(ercfree)[5] / coef(ercfree)[4]
```

-   The true alpha of the simulated cointegration regression is 1 and the estimate for alpha is very close: `r round(alpha,3)` ==\> super consistency

-   Furthermore, it is also very similar to the *Engle-Granger* estimate for alpha from above, which was `r round(coef(cointreg)[2],3)`

\pagebreak