# loading basic and useful packages
rm(list=ls()) # clear all
library(AER) # AER is a general econometric package
library(plm) # Panel data package
library(modelsummary) # fancy tables
library(wooldridge) # the wooldridge package includes all data sets used in the textbook
9.1 Topics of this chapter
- Basic Concepts and Motivation
- Pooled Cross-Section Models
- Cluster-Robust Standard Errors
- Panel Data Models
- Difference Estimator
- Dummy Variable Estimator
- Fixed Effects Estimator – Within Estimator
- Between Estimator
- Random Effects Estimator
- Correlated Random Effects Estimator (CRE)
- Hausman-Taylor IV Estimator
- Hausman Test
9.2 Basic Concepts and Motivation
Pooled cross-section and panel data exhibit two dimensions:
- a time-dimension
- a cross-section-dimension
Examples:
- a survey of 10,000 households (or firms), collected every year for a total of five years (wide panel)
- national accounts data for 25 countries over 150 quarters (long panel)
The principal advantages of this special data structure are
- more observations and more variation in the data due to less aggregation
- allowing one to control for unobserved individual heterogeneity, thereby avoiding omitted variable bias
- testing for the effects of policy measures (DiD-estimator, see Section 5.5) or testing for structural breaks (Chow Test, see Section 5.6)
Possible disadvantages are
- a more complicated error term, especially certain types of auto-correlation and heteroscedasticity
- we most often assume that parameters are constant over cross-section units, as well as over time
9.2.1 Empirical example: Loading a panel data set of wages and explanatory variables
data(wagepan) # loading data set wagepan of the `wooldridge` package
wagepan[ 1:20, c("nr", "year", "lwage", "educ", "married", "union", "hisp", "black", "hours") ]
nr year lwage educ married union hisp black hours
1 13 1980 1.1975402 14 0 0 0 0 2672
2 13 1981 1.8530600 14 0 1 0 0 2320
3 13 1982 1.3444617 14 0 0 0 0 2940
4 13 1983 1.4332134 14 0 0 0 0 2960
5 13 1984 1.5681251 14 0 0 0 0 3071
6 13 1985 1.6998910 14 0 0 0 0 2864
7 13 1986 -0.7202626 14 0 0 0 0 2994
8 13 1987 1.6691879 14 0 0 0 0 2640
9 17 1980 1.6759624 13 0 0 0 0 2484
10 17 1981 1.5183982 13 0 0 0 0 2804
11 17 1982 1.5591905 13 0 0 0 0 2530
12 17 1983 1.7254101 13 0 0 0 0 2340
13 17 1984 1.6220223 13 0 0 0 0 2486
14 17 1985 1.6085882 13 0 0 0 0 2164
15 17 1986 1.5723854 13 0 0 0 0 2749
16 17 1987 1.8203338 13 0 0 0 0 2476
17 18 1980 1.5159627 12 1 0 0 0 2332
18 18 1981 1.7353791 12 1 0 0 0 2116
19 18 1982 1.6317437 12 1 0 0 0 2500
20 18 1983 1.9982288 12 1 0 0 0 2474
# Note the two index-variables (identifiers) nr (for i) and year (for t)
nr | year | agric | black | bus | construc | ent | exper | fin | hisp | poorhlth | hours | manuf | married | min | nrthcen | nrtheast | occ1 | occ2 | occ3 | occ4 | occ5 | occ6 | occ7 | occ8 | occ9 | per | pro | pub | rur | south | educ | tra | trad | union | lwage | d81 | d82 | d83 | d84 | d85 | d86 | d87 | expersq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | 1980 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2672 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 1.198 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
13 | 1981 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2320 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 1 | 1.853 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
13 | 1982 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 2940 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 1.344 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 9 |
13 | 1983 | 0 | 0 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 2960 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 1.433 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 16 |
13 | 1984 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 3071 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 1.568 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 25 |
13 | 1985 | 0 | 0 | 1 | 0 | 0 | 6 | 0 | 0 | 0 | 2864 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 1.700 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 36 |
13 | 1986 | 0 | 0 | 1 | 0 | 0 | 7 | 0 | 0 | 0 | 2994 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | -0.720 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 49 |
13 | 1987 | 0 | 0 | 1 | 0 | 0 | 8 | 0 | 0 | 0 | 2640 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 1.669 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 64 |
17 | 1980 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 2484 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 1 | 0 | 1.676 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 |
17 | 1981 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 2804 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 1 | 0 | 1.518 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 25 |
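A quick check of the panel structure, as a minimal sketch (using the plm package loaded above): nr and year are declared as the panel indexes and the panel dimensions are reported.
# a small sketch: declare the panel structure and check its dimensions
wagepan_p <- pdata.frame(wagepan, index = c("nr", "year"))
pdim(wagepan_p)   # should report a balanced panel with n = 545 individuals and T = 8 years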
9.3 Pooled Cross-Section Models
9.3.1 Basic Model Structure
Index i represents the particular cross-section unit (firms, households, counties, etc.) and the second one, t, is the time-index. Our model is:
y_{i t} = \mathbf{x}_{i t} \boldsymbol{\beta} + e_{i t} \tag{9.1}
\text{with} \ \ i=1, \ldots, n, \ \ t=1, \ldots, T, \ \ N = n \times T
Stacking over time t for a fixed i leads to the data vectors and matrices of the i^{th} cross-section unit:
\underset{T \times 1} {\mathbf{y}_i} = \left[\begin{array}{c} y_{i 1} \\ y_{i 2} \\ \vdots \\ y_{i T} \end{array}\right] \quad \underset{T \times (k+1)} {\mathbf{X}_i}=\left[\begin{array}{ccccc} 1 &x_{i 1}^1 & x_{i 1}^2 & \cdots & x_{i 1}^k \\ 1 & x_{i 2}^1 & x_{i 2}^2 & \cdots & x_{i 2}^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{i T}^1 & x_{i T}^2 & \cdots & x_{i T}^k \end{array}\right] \quad \underset{T \times 1} {\mathbf{e}_{i}}=\left[\begin{array}{c} e_{i 1} \\ e_{i 2} \\ \vdots \\ e_{i T} \end{array}\right] \tag{9.2}
Stacking over units i we get:
\underset{N \times 1}{\mathbf{y}}=\left[\begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_n \end{array}\right] \quad \underset{N \times (k+1) } {\mathbf{X}}=\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right] \quad \underset{N \times 1}{\mathbf{e}}= \left[\begin{array}{c} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \vdots \\ \mathbf{e}_n \end{array}\right] \ \ \ \Rightarrow \tag{9.3}
\mathbf{y}=\mathbf{X} \boldsymbol{\beta}+\mathbf{e} \tag{9.4}
This looks like a common regression model.
9.3.2 Estimation with Pooled OLS
If assumption MLR.4, i.e., E(e_{it} \mid \mathbf X)=0, is fulfilled, the model can be estimated consistently and without bias by OLS \rightarrow Pooled OLS, i.e., by simply applying OLS to the stacked model:
\mathbf{y}=\mathbf{X} \boldsymbol{\beta}+\mathbf{e} \ \ \ \Rightarrow \ \ \ \hat{\boldsymbol{\beta}} = (\mathbf{X'}\mathbf{X})^{-1}\mathbf{X'}\mathbf{y} \tag{9.5}
And with the addition of MLR.5 – \operatorname {Var}(\boldsymbol e \mid \mathbf X)=\sigma^2_e \mathbf I – we have:
{\operatorname {Var}(\hat{\boldsymbol{\beta}})} = {{\sigma}}_e^2 (\mathbf{X'}\mathbf{X})^{-1} \tag{9.6}
However, as the error term e_{it} in such a model structure is regularly not iid (MLR.5 is violated), Pooled OLS estimation is often
- Not efficient
- The estimated covariance matrix of \hat{\boldsymbol \beta} is biased and inconsistent in this case, i.e., simply wrong. In particular, this invalidates all statistical tests based on it
- Assumption MLR.4 is often violated, i.e., E(e_{it} \mid \mathbf X) depends on \mathbf X, and thus Pooled OLS is biased and inconsistent in this case
All three problems, especially the last one, can be addressed with appropriate procedures which will be discussed in the following. In doing so, typically the special panel structure is exploited.
The special thing about panels is that every cross-section in the sample contains the very same units i (households, firms, counties, countries, etc.)
Why is the error term regularly not iid in this setting?
We can think of the following error structure: We assume that the whole error process e_{it} is the sum of three independent (and unobserved) components – this is called the Error Component Model:
- An idiosyncratic error u_{it}, which varies across both dimensions but meets assumption MLR.5 (iid)
- A unit specific error a_i, which is different for each unit i but constant over time
- A time specific error v_t, which varies over time t but hits all cross-section units in the same way
e_{it} = v_t + a_i+ u_{it} \tag{9.7}
We will often ignore the time specific \, v_t, as these can easily be handled by the inclusion of time-dummy variables, especially in short panels. So, we actually will proceed from
e_{it} = a_i + u_{it} \tag{9.8}
The most interesting error component is the unit specific error a_i, because this reflects unobserved heterogeneity of the units, for instance ability of workers or general propensity for crime of cities.
As this error is constant over time for each i, a_i is heavily auto-correlated along the time-dimension
Furthermore, a correlation of a_i with some explanatory variables is possible (thereby violating MLR.4) and precisely this aspect is in the center of interest of many panel data methods. Think about our standard example: Correlation of ability and education in a wage equation
Due to the autocorrelation in a_i, e_{it} is not iid (MLR.5 is violated) and therefore the true formula for the covariance matrix of \hat {\boldsymbol \beta} is not Equation 9.6 but
\operatorname{Var}(\hat{\boldsymbol\beta}) = (\mathbf X' \mathbf X)^{-1} (\mathbf X' \boldsymbol{\Omega}_e\mathbf X) (\mathbf X' \mathbf X)^{-1} \tag{9.9}
HAC or HC estimate of the covariance matrix
If we assume
- u_{it} to be iid
- a_i not to be correlated across units i, \operatorname{Cov}\left(a_{i}, a_{j}\right)=0 \ \text{ for } i \neq j (why should they be correlated? maybe in a spatial sense \rightarrow Spatial Econometrics 1)
we have a block-diagonal, cluster-structure, of \boldsymbol{\Omega}_e:
\boldsymbol{\Omega}_e=\left[\begin{array}{cccc} \boldsymbol\Sigma_1 & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \boldsymbol\Sigma_i & \cdots & \mathbf 0 \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf 0 & \mathbf 0 & \cdots & \boldsymbol\Sigma_n \end{array}\right] \tag{9.10}
If we further assume homoscedasticity of a_i and that the idiosyncratic errors u_{it} are uncorrelated with the unit specific a_i, Equation 9.8 implies
\operatorname {Var}(e_{it})=\sigma_a^2 + \sigma_u^2
\operatorname{Cov}(e_{it},e_{is}) = \operatorname{Cov}(u_{it},u_{is}) + \operatorname{Cov}(a_{i},a_{i}) = 0 + E(a_{i}^2) = \sigma_a^2 \quad \text{for } t \neq s
It follows for each unit i
\boldsymbol{\Sigma} = \sigma_{u}^2 \mathbf{I}_{_T}+\sigma_a^2 \mathbf{i i'} = \left[\begin{array}{cccc} (\sigma_a^2+\sigma_{u}^2) & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & (\sigma_a^2+\sigma_{u}^2) & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & (\sigma_a^2+\sigma_{u}^2) \end{array}\right] \tag{9.11}
This is a T \times T matrix with (\sigma_{a}^2+\sigma_u^2) on the principal diagonal and \sigma_a^2 on the non-diagonal elements. (\mathbf i \ldots T \times 1 vector of ones and the outer product \mathbf{i i' \ldots} T \times T matrix of ones).
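To make this structure concrete, here is a minimal sketch (with arbitrarily assumed values for the two variances) that builds the matrix of Equation 9.11 in R and shows the implied constant within-unit error correlation:
# illustrative values (assumptions, not estimates): sigma_a^2 = 0.10, sigma_u^2 = 0.15, T = 4
T_len <- 4; s2_a <- 0.10; s2_u <- 0.15
Sigma <- s2_u * diag(T_len) + s2_a * matrix(1, T_len, T_len)   # Equation 9.11
Sigma
cov2cor(Sigma)   # off-diagonal correlation equals s2_a / (s2_a + s2_u)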
- If we want to estimate an asymptotically correct version of Equation 9.9, a HAC-covariance matrix, one can exploit the special cluster-structure of Equation 9.10 and Equation 9.11 and try to estimate the middle of Equation 9.9 with the help of the residuals \hat e_{it} of regression model Equation 9.4, respectively Equation 9.5
\widehat {(\mathbf X'\boldsymbol{\Omega}_e \mathbf X)} \ \ \text{ with } \ \ {\boldsymbol {\hat \Sigma}_i} \ = \left[\begin{array}{cccc} \hat e_{i1}^2 & \hat e_{i1}\hat e_{i2} & \cdots & \hat e_{i1}\hat e_{iT} \\ \hat e_{i2}\hat e_{i1} & \hat e_{i2}^2 & \cdots & \hat e_{i2}\hat e_{iT}\\ \vdots & \vdots & \ddots & \vdots \\ \hat e_{iT}\hat e_{i1} & \hat e_{iT}\hat e_{i2} & \cdots & \hat e_{iT}^2 \end{array}\right] \tag{9.12}
This approach is called the clustered HC estimator of the covariance matrix.
- The procedure plm() will do that with the vcov=vcovHC option. Note that heteroscedastic errors a_i are allowed with this formulation
- If we consider \boldsymbol{\Omega}_e as a whole, without cluster structure, we can calculate the usual HAC estimator – we have to fill the zeros in Equation 9.10 with (\hat e_{it} \hat e_{js}) (but for consistency reasons you have to cut off the order of cross-correlations). The procedure lm() will do that with the vcov=vcovHAC option. With the option vcov=vcovHC it will only handle heteroscedasticity, which makes a big difference
- And finally, one can try to estimate \sigma_a^2 and \sigma_u^2 from Equation 9.11 directly. Later we will see that the so-called Random Effects estimator does exactly that
9.3.3 Example: Pooled OLS of a wage equation
With the standard procedure lm()
ols_lm <- lm(lwage ~ educ + black + hisp + exper + I(exper^2) +
married + union + factor(year),
data=wagepan)
coeftest(ols_lm)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.09205578 0.07827010 1.1761 0.2396076
educ 0.09134979 0.00523738 17.4419 < 2.2e-16 ***
black -0.13923421 0.02357956 -5.9049 3.799e-09 ***
hisp 0.01601951 0.02079714 0.7703 0.4411788
exper 0.06723450 0.01369484 4.9095 9.467e-07 ***
I(exper^2) -0.00241170 0.00081995 -2.9413 0.0032860 **
married 0.10825295 0.01568942 6.8997 5.962e-12 ***
union 0.18246128 0.01715677 10.6349 < 2.2e-16 ***
factor(year)1981 0.05831999 0.03035363 1.9214 0.0547528 .
factor(year)1982 0.06277442 0.03321407 1.8900 0.0588251 .
factor(year)1983 0.06201174 0.03666013 1.6915 0.0908072 .
factor(year)1984 0.09046719 0.04009071 2.2566 0.0240849 *
factor(year)1985 0.10924630 0.04335248 2.5200 0.0117725 *
factor(year)1986 0.14195959 0.04642297 3.0580 0.0022421 **
factor(year)1987 0.17383343 0.04943305 3.5165 0.0004417 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same with HC-robust se
# coeftest(ols_lm, vcov = vcovHC)
# Same with HAC-robust se
# coeftest(ols_lm, vcov = vcovHAC)
With plm()
from the plm package
ols_plm <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) +
married + union + year,
data=wagepan,
index = c("nr", "year"), model = "pooling")
coeftest(ols_plm)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.09205578 0.07827010 1.1761 0.2396076
educ 0.09134979 0.00523738 17.4419 < 2.2e-16 ***
black -0.13923421 0.02357956 -5.9049 3.799e-09 ***
hisp 0.01601951 0.02079714 0.7703 0.4411788
exper 0.06723450 0.01369484 4.9095 9.467e-07 ***
I(exper^2) -0.00241170 0.00081995 -2.9413 0.0032860 **
married 0.10825295 0.01568942 6.8997 5.962e-12 ***
union 0.18246128 0.01715677 10.6349 < 2.2e-16 ***
year1981 0.05831999 0.03035363 1.9214 0.0547528 .
year1982 0.06277442 0.03321407 1.8900 0.0588251 .
year1983 0.06201174 0.03666013 1.6915 0.0908072 .
year1984 0.09046719 0.04009071 2.2566 0.0240849 *
year1985 0.10924630 0.04335248 2.5200 0.0117725 *
year1986 0.14195959 0.04642297 3.0580 0.0022421 **
year1987 0.17383343 0.04943305 3.5165 0.0004417 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same with (clustered) robust se
# vcovHC is object orientated, will apply a different procedure for a plm-object (clustering)
# With a lm-object, pure heteroscedasticity correction only treats the principal diagonal
# coeftest(ols_plm, vcov = vcovHC)
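To see what the clustered estimator of Equation 9.12 actually computes, here is a minimal sketch that builds the sandwich of Equation 9.9 by hand from the pooled OLS residuals and compares it with plm's vcovHC() (which should coincide up to possible small-sample adjustments):
X  <- model.matrix(ols_plm)                    # stacked regressor matrix of Equation 9.3
e  <- residuals(ols_plm)                       # pooled OLS residuals e_it
id <- index(ols_plm)[[1]]                      # unit identifier (nr)
meat <- Reduce("+", lapply(split(seq_along(e), id), function(rows) {
  Xi <- X[rows, , drop = FALSE]; ei <- e[rows]
  t(Xi) %*% (ei %o% ei) %*% Xi                 # X_i' Sigma_i_hat X_i, see Equation 9.12
}))
bread <- solve(crossprod(X))
V_cluster <- bread %*% meat %*% bread          # Equation 9.9
sqrt(diag(V_cluster))                          # hand-made clustered standard errors
sqrt(diag(vcovHC(ols_plm, method = "arellano", type = "HC0")))   # plm's clustered estimator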
Summarizing the results with the modelsummary package
modelsummary(list(" lm() "=ols_lm, "lm()_HC "=ols_lm, "lm()_HAC "=ols_lm, "plm() "=ols_plm, "plm()_HC "=ols_plm),
vcov = c("classical", vcovHC, vcovHAC, "classical", vcovHC),
stars = TRUE,
fmt = 3,
statistic = c('std.error', 'statistic'),
gof_omit = "A|L|B|F|RM",
coef_omit = "Inter|year|hisp",
align = "lddddd",
output = "flextable"
)
| | lm() | lm()_HC | lm()_HAC | plm() | plm()_HC |
|---|---|---|---|---|---|
| educ | 0.091*** | 0.091*** | 0.091*** | 0.091*** | 0.091*** |
| | (0.005) | (0.005) | (0.010) | (0.005) | (0.011) |
| | (17.442) | (17.230) | (8.892) | (17.442) | (8.264) |
| black | -0.139*** | -0.139*** | -0.139** | -0.139*** | -0.139** |
| | (0.024) | (0.025) | (0.052) | (0.024) | (0.050) |
| | (-5.905) | (-5.658) | (-2.704) | (-5.905) | (-2.763) |
| exper | 0.067*** | 0.067*** | 0.067*** | 0.067*** | 0.067*** |
| | (0.014) | (0.013) | (0.019) | (0.014) | (0.020) |
| | (4.909) | (5.035) | (3.523) | (4.909) | (3.440) |
| I(exper^2) | -0.002** | -0.002** | -0.002* | -0.002** | -0.002* |
| | (0.001) | (0.001) | (0.001) | (0.001) | (0.001) |
| | (-2.941) | (-3.110) | (-2.285) | (-2.941) | (-2.358) |
| married | 0.108*** | 0.108*** | 0.108*** | 0.108*** | 0.108*** |
| | (0.016) | (0.015) | (0.026) | (0.016) | (0.026) |
| | (6.900) | (7.087) | (4.137) | (6.900) | (4.169) |
| union | 0.182*** | 0.182*** | 0.182*** | 0.182*** | 0.182*** |
| | (0.017) | (0.016) | (0.027) | (0.017) | (0.027) |
| | (10.635) | (11.183) | (6.863) | (10.635) | (6.665) |
| Num.Obs. | 4360 | 4360 | 4360 | 4360 | 4360 |
| R2 | 0.189 | 0.189 | 0.189 | 0.189 | 0.189 |
| Std.Errors | IID | Custom | Custom | IID | Custom |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 (standard errors in the first and t-statistics in the second row of parentheses below each estimate)
With the stargazer package
# calculating robust se, etc., using coeftest() with vcov. option
robust_table <- coeftest(ols_plm, vcov. = vcovHC )
se_robust <- robust_table[,2] # se
t_robust <- robust_table[,3] # t-values
p_robust <- robust_table[,4] # prob-values
library(stargazer) # loading the package
stargazer(ols_lm, ols_plm, ols_plm, # function call with: model names
type = "text", # important option: output text, not LaTex code
digits = 5,
report = "vcs*", # output string: v=variables, c=coeffs, s=se, t=t-vals, p=p-vals, "vcs*" default
se = list(NULL, NULL, se_robust), # for the third model, we replace stand. se with robust
t = list(NULL, NULL, t_robust), # for the third model, we replace stand. t with robust
p = list(NULL, NULL, p_robust), # for the third model, we replace stand. p with robust
model.names=FALSE, # no caption for OLS and panel linear
model.numbers = FALSE, # no (1) (2) (3)
column.labels = c("pOLS\\_lm() ", " pOLS\\_plm() ", " pOLS\\_plm()\\_HCse "),
dep.var.labels.include = TRUE, # Name of dep. variable
dep.var.caption = "", # no dep. var. caption: dependent variable =
keep = c("educ", "black", "exper", "I(exper2)", "married", "union" ), # evident
keep.stat = c("n", "rsq") ) # displayed model statistics
With the texreg package and screenreg procedure
# calculating robust se, etc., using coeftest() with vcov. option
robust_table <- coeftest(ols_plm, vcov. = vcovHC )
se_robust <- robust_table[,2] # se
t_robust <- robust_table[,3] # t-values (not easily possible in texreg, an exchange with se necessary)
p_robust <- robust_table[,4] # prob-values
library(texreg) # loading the package
# if you want output as LaTex code, use texreg()
screenreg( list(ols_lm, ols_plm, ols_plm), # function call with: list(model names), main diff. to stargazer!
digits = 5,
override.se = list(0, 0, se_robust), # for the third model, we replace stand. se with robust
override.pvalues = list(0, 0, p_robust), # for the third model, we replace stand. p with robust
custom.model.names = c("pOLS_lm() ", " pOLS_plm() ", " pOLS_plm()_HCse"), # custom model names
booktabs = FALSE, # some nice formatting for LaTex (for LaTex, not used with screenreg)
# bold = 0.05, # significant coeffs at 0,05 sig.level in bold (for LaTex, not used with screenreg)
omit.coef = "Inter|hisp|year", # string: excluded variables from table, here intercept, hisp, year
include.adjrs = FALSE, # which model statistic should be excluded, here adjusted R2
caption = FALSE) # no caption (for LaTex only?)
9.4 Basic Panel Data Models
We have identified two major problems with pooled data
- The error of the model, Equation 9.4, is typically heavily auto-correlated along the time-dimension, especially due to a_i from Equation 9.8, which is constant over time for each unit i
  - This problem could be partly addressed with clustered HC estimates of the covariance matrix. However, the efficiency loss remains
- Furthermore, a correlation of a_i with some explanatory variables is possible (violating MLR.4)
  - Think of our standard example: correlation of unobserved ability with education in a wage equation
  - The error component a_i mainly represents unobserved heterogeneity across the units, and precisely this aspect is at the center of interest of panel data methods
Panel data techniques generally address one broad issue: How to model unobserved heterogeneity, so that we can avoid biased and/or inefficient estimates.
Consider the following base regression model with data from one cross section at time t.
y_{it} = \beta_0 + \beta_1 x_{it} + a_{i} + u_{it} \tag{9.13}
- We can think of our standard example, where
  - y_{it} could be the earned wage
  - x_{it} could be education, whose effect on wages we want to estimate
- a_{i} could represent other important factors for the earned wage, like general ability, which are constant over time (therefore a_i and not a_{it})
  - But a_{i} is generally unobservable for the econometrician
- u_{it} is the error term representing other unobservable components and is assumed to be uncorrelated with x (MLR.4)
The basic problem is that unobserved a_i is often correlated with x_{it} and we actually can only estimate
y_{it} = \beta_0 + \beta_1 x_{it} + \underbrace {(a_{i} + u_{it})}_{e_{it}} \tag{9.14}
The composite error e_{it} of this regression has exactly the structure of Equation 9.8 and due to its possible correlation with x_{it} via a_i, the estimate \hat \beta_1 will be biased and inconsistent in this case – omitted variable bias
9.4.1 The Difference (DIF) Estimator
So how can the panel data structure be helpful for this estimation problem?
- Suppose, we have not only one but several cross sections, for example for two years, year t-1 and year t, observed for the very same units i \rightarrow panel data. Then we have two relationships for the two periods
y_{i,t-1} = \beta_0 + \beta_1 x_{i,t-1} + a_{i} + u_{i,t-1}
y_{i,t} = \beta_0 + \beta_1 x_{i,t} + a_{i} + u_{i,t}
- We can take the first difference of the variables (we subtract the equation for period t-1 from the equation for period t) and get, assuming constant \betas
(y_{i,t} - y_{i,t-1}) = (\beta_0 - \beta_0) + \beta_1 ( x_{i,t} - x_{i,t-1} ) + (a_{i}- a_{i}) + (u_{i,t} - u_{i,t-1}) \ \Rightarrow
\Delta y_{i,t} = \beta_1 \Delta x_{i,t} + \Delta u_{i,t} \tag{9.15}
The estimation Equation 9.15 is called Difference (DIF) Estimator for panel data
What was the trick?
- We assumed that the variable a_i is constant over time and only varies across the units i. Hence, taking the first difference in the data yields (a_{i}- a_{i})=0, i.e., the problematic unobservable variable a_i simply drops out
- Thus, Equation 9.15, i.e. \beta_1, can be consistently estimated with OLS; if we have more than two periods, by pooled OLS
- General ability in a wage equation is a good example for such an interpretation of a_i – it varies over units (persons) but doesn’t change much over time
So, we learned that
- if we have an error structure like Equation 9.8, as in Equation 9.14, with a unit specific variable a_i representing unobserved unit specific heterogeneity,
- and this variable is probably correlated with the explanatory variables (violating assumption MLR.4),
- we can take the first difference in the data (unit by unit), thereby simply removing a_i and with it the problem of correlation with the other explanatory variables
Final remark: The error term in Equation 9.15, \Delta u_{it}, is probably heavily autocorrelated and therefore (pooled) OLS is not efficient and we further will need an HAC-estimator of the covariance matrix.
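As a small illustration (a sketch, not part of the textbook example), the DIF estimator of Equation 9.15 can be computed with plm() via model = "fd"; time-constant regressors such as educ, black or hisp would drop out by differencing and are therefore omitted here:
# first-difference estimation of a simple wage equation (Equation 9.15)
fd_wage <- plm(lwage ~ married + union, data = wagepan,
               index = c("nr", "year"), model = "fd")
# Delta u_it is autocorrelated by construction, so use a (clustered) robust covariance matrix
coeftest(fd_wage, vcov = vcovHC)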
9.4.2 The Dummy Variable Estimator – LSDV Methods
Another possibility to account for time-invariant individual components is to explicitly introduce them into the model specification in the form of individual intercepts
- These individual intercepts capture the unit’s heterogeneity. The time dimension of panel data allows in fact to estimate the a_is as additional (fixed) parameters, together with the \betas, the other parameters of interest
Our starting point is Equation 9.13, which we rewrite here for convenience (but leaving out the general intercept \beta_0 and allowing for k explanatory variables, so \mathbf x_{it} is a row vector with k variables).
y_{it} = \mathbf x_{it} \boldsymbol \beta + a_{i} + u_{it} \tag{9.16}
As we did in Equation 9.2, we are stacking this equation over time. This yields an equation for every unit i with T observations: 2
\mathbf y_{i} = \mathbf X_{i} \boldsymbol \beta + \mathbf i \, a_{i} + \mathbf u_{i} \tag{9.17}
Compared to Equation 9.2, the new element is the vector \mathbf i \, a_i. Thereby, \mathbf i is a column vector of ones of length T and
\underset{T \times 1} {\mathbf{i} \, a_{i}} \ = \ \left[\begin{array}{c} 1 \\ 1 \\ \vdots \\ 1 \end{array}\right] a_i \ = \ \left[\begin{array}{c} a_i \\ a_i \\ \vdots \\ a_i \end{array}\right]
The important point is that {a}_{i} is constant over time for each unit i. So, we can interpret {\mathbf{i} \, a_{i}} as intercept
Clearly, the elements of \mathbf{X}_{i} and \mathbf{u}_{i} are generally allowed to vary over time!
In a final step, we are stacking Equation 9.17 over the n units i to get the equivalent of Equation 9.4
- By using the block-diagonal matrix \mathbf D, and not simply stacking the \mathbf i, we allow for different intercepts across the units
\left[\begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_n \end{array}\right] = \underbrace {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]}_{\underset {(T \times n) \times k} {\mathbf X}} \left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{array}\right] + \underbrace {\left[\begin{array}{cccc} \mathbf{i} & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \mathbf{i} & \cdots & \mathbf 0 \\ & & \vdots & \\ \mathbf 0 & \mathbf 0 & \cdots & \mathbf{i} \end{array}\right]}_{\underset {(T \times n) \times n} {\mathbf D}} \underbrace {\left[\begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_n \end{array}\right]}_{\underset {n \times 1}{ \boldsymbol \alpha}} + \left[\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_n \end{array}\right] \tag{9.18}
which is
\mathbf{y} = \mathbf{X} \boldsymbol{\beta}+ \mathbf{D} \boldsymbol{\alpha}+\boldsymbol{u} \tag{9.19}
Using Equation 9.19, we estimate the coefficient vectors \boldsymbol \beta and \boldsymbol \alpha by pooled OLS.
This estimator is referred to as Least Squares Dummy Variables, or LSDV, Estimator.
- Note, the columns of matrix \mathbf D are simply unit-specific dummy variables, one dummy variable for each unit i
- The elements of vector \boldsymbol \alpha are the coefficients of the corresponding dummy variables, representing the individual intercepts
- Furthermore, the columns of \mathbf D, the unit-specific dummy variables, are mutually orthogonal 3 (see the small sketch below)
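A small sketch of the dummy matrix \mathbf D of Equation 9.18 for (assumed) n = 3 units and T = 4 periods; the diagonal cross-product confirms that the columns are mutually orthogonal:
n <- 3; T_len <- 4
D <- kronecker(diag(n), rep(1, T_len))   # (n*T) x n matrix of unit dummies, Equation 9.18
D
crossprod(D)                             # = T * I_n: the dummy columns are mutually orthogonal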
Similar to the DIF estimator, with the help of the LSDV Estimator, we get rid of the unobservable unit specific errors a_i; in this case, we turned them to coefficients of dummy variables and try to explicitly estimate them.
It should be noted that the degrees of freedom for the estimation now reduce to (N - k - n) because of the extra n parameters of the n dummy variables
Moreover, while the vector \hat {\boldsymbol \beta} is n-consistent, the estimates of the individual intercepts \hat a_i are only T-consistent, as they rely only on the time dimension. So, in a panel with large n and small T (wide panel) the individual intercepts \hat a_i are not consistent estimates, although unbiased
Remark: If we have a full set of n dummy variables, the regression equation Equation 9.19 must not contain an intercept, because the n dummy variables sum to a vector of ones \rightarrow perfect collinearity in this case.
Alternatively, you can leave out one dummy variable for one unit, which then becomes the “benchmark” unit; the coefficients of the other dummy variables then measure the difference to the benchmark unit. (But plm() also offers other options.)
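A minimal sketch (using the wagepan data from above, with only two time-varying regressors for brevity): with lm(), the unit-specific dummies are generated by factor(nr) and one individual automatically serves as the benchmark:
# LSDV regression: unit dummies via factor(nr)
lsdv_wage <- lm(lwage ~ married + union + factor(year) + factor(nr), data = wagepan)
coef(lsdv_wage)[c("married", "union")]   # slopes; 544 individual dummy coefficients are estimated as well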
Summarizing, the LSDV estimator is a very convenient way to handle unobserved unit specific heterogeneity by introducing dummy variables to control for them. However, comparing the LSDV estimator with the DIF estimator from above is quite a subtle task:
- Both estimators need strictly exogenous explanatory variables to be n-consistent – relevant for wide panels 4
- If the explanatory variables are only weakly exogenous, both estimators are not n-consistent. LSDV is at least T-consistent (if the idiosyncratic error u_{it} is not autocorrelated) – relevant for long panels.5
  - The reason for the n-inconsistency is that the short run bias, for instance due to an autoregressive term y_{i,t-1}, only diminishes with larger T for the LSDV estimator; a larger n doesn’t help because the short run bias is present in every unit i
  - Regarding the DIF estimator, the short run bias does not even diminish with larger T
- If the explanatory variables are strictly exogenous and the idiosyncratic error u_{it} is not autocorrelated (which are our standard assumptions), then the LSDV estimator is more efficient than the DIF estimator6
  - That’s the reason why the LSDV estimator (and the equivalent fixed effects estimator) is generally more often used in applied work. There are at least two qualifications:
    - For T=2 (only two time periods) both estimators are identical (if we have a time-dummy in the LSDV equation)
    - It is sometimes the case that the idiosyncratic error u_{it} is autocorrelated as well (e.g., unobserved variables, which are not constant over time, are often autocorrelated). The more positively autocorrelated these are, the less advantageous the LSDV estimator is. At the extreme, if u_{it} is a random walk, the DIF estimator is best and robust against spurious regressions
- And lastly, and this is very important in practice, both estimators can be very sensitive to measurement errors in the explanatory variables
9.4.3 The Fixed Effects (FE) Estimator
A big disadvantage of the LSDV estimator arises with very large n: For instance suppose we have 50000 units i. In this case we would have to add 50000 unit specific dummy variables to the regression Equation 9.19, which could lead to serious numerical and practical problems.
However, first differencing and unit specific dummy variables are just two of several ways to eliminate the fixed heterogeneity a_i.
An alternative method, which typically works better, is called the fixed effects or within transformation. To see what this method does, reconsider our basic specification of interest, Equation 9.16:
y_{it} = \mathbf x_{it} \boldsymbol \beta + a_{i} + u_{it} \tag{9.20}
Now, for each unit i, we average this equation over time and get
\bar y_{i} = \mathbf {\bar x}_{i} \boldsymbol \beta + a_{i} + \bar u_{i} \tag{9.21}
where for instance \, \bar y_{i} = \frac{1}{T} \sum_{t=1}^T y_{it}, and so on. As a_i is fixed over time, the time average \, \bar a_{i} = \frac{1}{T} \sum_{t=1}^T a_{i} = a_i.
If we subtract Equation 9.21 from Equation 9.20 we arrive at
y_{it} - \bar y_{i} = (\mathbf x_{it} - \mathbf {\bar x}_{i} ) \boldsymbol \beta + 0 + u_{it} - \bar u_{i} \tag{9.22}
or, with a new notation
\ddot y_{it} := (y_{it} - \bar y_{i}), \quad \mathbf {\ddot x}_{it} := (\mathbf x_{it} - \mathbf {\bar x}_{i}) \tag{9.23}
(and likewise for \ddot u_{it}) we have
\ddot y_{it} = \mathbf {\ddot x}_{it} \boldsymbol \beta + \ddot u_{it} \tag{9.24}
Thus, a within-transformed variable \ddot x_{it} is simply the deviation of the variable x_{it} from its unit specific mean \bar x_i. Usually, this procedure is called time-demeaning.
In a final step, we are stacking Equation 9.24 over the n units i and get Equation 9.25, the equivalent of Equation 9.4 and Equation 9.19
\mathbf{\ddot y} = \mathbf{\ddot X} \boldsymbol{ \beta} + \boldsymbol{\ddot u} \tag{9.25}
Applying pooled OLS to the within transformed Equation 9.25 results in the Fixed Effects (FE) estimator \boldsymbol {\hat \beta}_{FE}
\hat{\boldsymbol{\beta}}_{F E}=\left(\mathbf{\ddot X}^{\prime} \mathbf{\ddot X} \right)^{-1} \mathbf{\ddot X}^{\prime} \mathbf{\ddot y} \tag{9.26}
with the usual covariance matrix
\operatorname {Var} (\hat{\boldsymbol{\beta}}_{F E}) = \sigma_{\ddot u}^2 \left(\mathbf{\ddot X}^{\prime} \mathbf{\ddot X} \right)^{-1} \tag{9.27}
This whole time-demeaning procedure has become the standard way of estimating models with individual, time-invariant heterogeneity.
- The name within transformation comes from the fact that estimating Equation 9.25 with pooled OLS uses the time variation in y and \mathbf x within each cross-sectional unit and \boldsymbol {\hat \beta}_{FE} is basically a weighted average of these within effects
The important thing about Equation 9.22 to Equation 9.25 is that the unobserved effect a_i (which was the essential problem because of a possible correlation with x_{it}) has dropped out. This suggests that we actually can estimate Equation 9.25 by pooled OLS, as we did in Equation 9.26.
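A minimal sketch (again with the wagepan data and only two time-varying regressors): time-demeaning the variables by hand with plm's Within() and running pooled OLS should reproduce the within estimator of Equation 9.26:
wp <- pdata.frame(wagepan, index = c("nr", "year"))
wp$W_lwage   <- Within(wp$lwage)      # time-demeaned variables
wp$W_married <- Within(wp$married)
wp$W_union   <- Within(wp$union)
fe_manual <- lm(W_lwage ~ 0 + W_married + W_union, data = wp)
fe_plm    <- plm(lwage ~ married + union, data = wp, model = "within")
coef(fe_manual)   # should equal coef(fe_plm); only the df correction of the standard errors differs
coef(fe_plm)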
- Sometimes, the estimated fixed effects (intercepts) \hat a_i are of interest. This is the case if we want to study the distribution of the \hat a_i across i, or if we want to see whether a particular unit’s \hat a_i is above or below the average
- These estimates would be directly available from an LSDV regression, but they are rarely reported by FE routines (however, the plm package has a special function, fixef(), for that purpose)
- After a FE estimation, the \hat a_i can be computed with the help of the time-averaged Equation 9.21 (note that the expectation of \bar u_{i} is zero in that equation)
\hat a_{i} = \bar y_{i} - \mathbf {\bar x}_{i} \boldsymbol {\hat \beta}_{FE} \tag{9.28}
A byproduct of the time-demeaning procedure is the so-called between (BTW) estimator. This is obtained as the OLS estimation of the time-averaged equation Equation 9.21 (with an additional intercept)
- Thereby, the time averages \bar y_{i} and \mathbf {\bar x}_{i} of the units i are used – this transformation is called between transformation – to run a regression on these data, which is now basically a cross-sectional regression
- Note, there is no special treatment for a_i and thus a_i remains a part of the error term, similar to our base equation, Equation 9.14 (remember, a_i is unobservable). Hence, if a_i is correlated with \mathbf {\bar x}_{i} the BTW estimator is biased and inconsistent
- On the other hand, if we think a_i is uncorrelated with \mathbf {\bar x}_{i}, it is better to use the random effects estimator, which we review in a later section. The reason is that the BTW estimator uses only cross-section information – therefore the name BTW estimator – and ignores the information on how the variables change over time; this important information is averaged out by the between transformation
- However, a possible bias through measurement errors is mitigated (averaged out) by the between transformation
Figure 9.1 shows the effects of the between and within transformations on the data; this example is about traffic fatality rates in different US states – our units i – different states have different colors:
library(AER)
data("Fatalities")
library(ggplot2)
Fatalities <- pdata.frame(Fatalities, index= c("state", "year")) # pdata.frame
Fatalities1 <- Fatalities[1:77,] # only the first 77 elements = 11 states (units)
Fatalities1$frate <- with (Fatalities1, fatal / pop * 10000) # calculating fatalities rate
Bfrate = Between(Fatalities1$frate) # Between transformation
Wfrate = Within(Fatalities1$frate) # Within transformation
p1 <- ggplot( Fatalities1 ) +
geom_point( aes(y=1:length(frate), x=frate, color = state ),
size=2.2, show.legend = FALSE ) +
labs(y="Time") +
scale_y_reverse() +
theme_bw()
p2 <- ggplot( Fatalities1 ) +
geom_point( aes(y=1:length(frate), x=Bfrate, color = state ),
size=2.2, show.legend = FALSE ) +
xlim(1.4, 3.1) +
labs(y="Time") +
scale_y_reverse() +
theme_bw()
p3 <- ggplot( Fatalities1 ) +
geom_point( aes(y=1:length(frate), x=Wfrate, color = state ),
size=2.2, show.legend = FALSE ) +
xlim(-1.75/2, 1.75/2) +
labs(y="Time") +
scale_y_reverse() +
theme_bw()
p1; p2; p3
The general properties of the FE estimator are identical to the LSDV estimator discussed above (in fact, they deliver identical estimates, as we will show below).
The main point of the FE estimator is that arbitrary correlations between a_i and the explanatory variables in any time periods are allowed, just as with first differencing and LSDV estimators
However, because of this, any explanatory variable that is constant over time for every i gets swept away by the within transformation: \ddot x_{it} = 0 if the variable x_{it} is constant over time! (See also Figure 9.1 above.)
Hence, the effects of variables like gender, black, place of birth or city's proximity to the sea cannot be estimated with such procedures. The effects of these variables are absorbed by the estimated fixed effects (the unit specific intercepts with the LSDV estimator). This is a major drawback of FE, DIF and LSDV estimators
As mentioned above, the FE estimator is identical to the LSDV estimator. In fact, we have the following
Proposition 1: Applying the within transformation to some variables is the very same as regressing these variables on all unit specific dummy variables and taking the residuals.
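A minimal sketch (with the wagepan data) checking Proposition 1: the within-transformed wage equals the residuals from a regression of the wage on the full set of individual dummy variables:
wp <- pdata.frame(wagepan, index = c("nr", "year"))
w_direct <- Within(wp$lwage)                               # within (time-demeaning) transformation
w_dummy  <- residuals(lm(lwage ~ factor(nr), data = wp))   # residuals from the unit-dummy regression
all.equal(as.numeric(w_direct), as.numeric(w_dummy))       # should be TRUE (up to numerical precision)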
9.4.4 Example for a panel data set
The Fatalities dataset from Stock and Watson (2020) (here taken from the Kleiber and Zeileis (2008) AER package) is an often used example of the importance of individual heterogeneity and time effects in a panel setting.
The research question is whether taxing alcohol can reduce the traffic death toll. The basic specification relates the traffic fatality rate, frate, to the tax rate on beer, beertax, in a regression model:8
frate_{it} = \beta_0 + \beta_1 beertax_{it} + e_{it}
Data are from 1982 to 1988 for each of the continental US states. In the following, the model is specified in its simplest form as a bivariate relation between the death rate and the beer tax.
rm(list=ls()) # clear all
library(wooldridge)
library(AER)
data("Fatalities")
library(modelsummary)
library(ggplot2)
library(plm)
# Calculation of fatality rates
Fatalities$frate <- with (Fatalities, fatal / pop * 10000)
# Creating a pdata.frame
Fatalities <- pdata.frame(Fatalities, index= c("state", "year"))
class(Fatalities$frate)
[1] "pseries" "numeric"
# Between and within transformations - needed for graphics with ggplot
Fatalities$Bfrate <- Between(Fatalities$frate)
Fatalities$Bbeertax <- Between(Fatalities$beertax)
Fatalities$Wfrate <- Within(Fatalities$frate)
Fatalities$Wbeertax <- Within(Fatalities$beertax)
class(Fatalities$Bfrate)
[1] "pseries" "numeric"
Pooled OLS
In a first step we estimate the model with pooled OLS without considering any form of individual effect.
coeftest( Pooled <- plm(frate ~ beertax, index= c("state", "year"),
model="pooling",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.853308 0.043567 42.5391 < 2.2e-16 ***
beertax 0.364605 0.062170 5.8647 1.082e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Taxing beer seems to increase the number of deaths from traffic accidents in a highly statistically significant way! Hence, one could even argue that free beer might lead to safer driving
Similar results, contradicting our most basic intuition, appear for any single year in the corresponding cross section samples
Between (BTW) Estimator
coeftest( Between <- plm(frate ~ beertax, index= c("state", "year"),
model="between",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.84622 0.11080 16.663 <2e-16 ***
beertax 0.37842 0.15860 2.386 0.0212 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- We reach the same conclusion using the BTW estimator – we get a very significant positive coefficient of beertax. As with pooled OLS, individual heterogeneity is not handled here
Difference Estimator with first and last observation
The results above might be biased because we ignored individual heterogeneity, which ended up in the error term
- Panel data analysis will provide a solution for the unexpected results. In fact, we suspect
  - the presence of unobserved heterogeneity in the different states, which is probably correlated with the way the tax rates are set in the different states
A more appropriate model would be:
frate_{it} = \beta_1 beertax_{it} + \textcolor {red} {a_i} + u_{it}
Here, a_i represents the unobserved heterogeneity as described in the text above
As outlined in the text, the simplest way to get rid of the individual intercepts a_i is to estimate the model in differenced variables. Here, we take the difference of the last (1988) to the first (1982) year
coeftest( DIF <- plm(diff(frate, 6) ~ diff(beertax, 6) -1, index= c("state", "year"),
model="pooling",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
diff(beertax, 6) -0.86892 0.39299 -2.2111 0.03193 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Estimation on the six-year differences eventually yields a sensible result: after controlling for state heterogeneity, higher taxation on beer is associated with a lower number of fatalities
LSDV estimator
As discussed, another way to control for time-invariant unobservables is to estimate them out explicitly, here with state-specific dummy variables and applying pooled OLS. This can easily be done in R with the plm package by simply adding state as an additional variable (with lm() you would have to add factor(state)).
coeftest( LSDV <- plm(frate ~ beertax + state, index= c("state", "year"),
model="pooling",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.47763 0.31336 11.0980 < 2.2e-16 ***
beertax -0.65587 0.18785 -3.4915 0.0005560 ***
stateaz -0.56773 0.26667 -2.1290 0.0341073 *
statear -0.65495 0.21902 -2.9904 0.0030280 **
stateca -1.50947 0.30435 -4.9596 1.211e-06 ***
stateco -1.48428 0.28735 -5.1654 4.497e-07 ***
statect -1.86226 0.28053 -6.6383 1.583e-10 ***
statede -1.30760 0.29395 -4.4484 1.239e-05 ***
statefl -0.26813 0.13933 -1.9245 0.0552840 .
statega 0.52460 0.18395 2.8519 0.0046613 **
stateid -0.66902 0.25797 -2.5934 0.0099895 **
stateil -1.96162 0.29150 -6.7295 9.234e-11 ***
statein -1.46154 0.27254 -5.3627 1.689e-07 ***
stateia -1.54393 0.25344 -6.0919 3.580e-09 ***
stateks -1.22322 0.24544 -4.9838 1.080e-06 ***
stateky -1.21752 0.28717 -4.2398 3.020e-05 ***
statela -0.84712 0.18867 -4.4900 1.033e-05 ***
stateme -1.10795 0.19112 -5.7971 1.776e-08 ***
statemd -1.70644 0.28322 -6.0252 5.169e-09 ***
statema -2.10975 0.27610 -7.6413 3.238e-13 ***
statemi -1.48453 0.23602 -6.2899 1.182e-09 ***
statemn -1.89721 0.26509 -7.1570 6.917e-12 ***
statems -0.02908 0.14845 -0.1959 0.8448392
statemo -1.29626 0.26669 -4.8606 1.930e-06 ***
statemt -0.36039 0.26396 -1.3653 0.1732250
statene -1.52218 0.24928 -6.1063 3.304e-09 ***
statenv -0.60077 0.28595 -2.1010 0.0365169 *
statenh -1.25445 0.20969 -5.9826 6.526e-09 ***
statenj -2.10575 0.30720 -6.8547 4.370e-11 ***
statenm 0.42638 0.25432 1.6765 0.0947242 .
stateny -2.18667 0.29890 -7.3156 2.574e-12 ***
statenc -0.29046 0.11984 -2.4238 0.0159792 *
statend -1.62344 0.25381 -6.3962 6.452e-10 ***
stateoh -1.67442 0.25381 -6.5970 2.016e-10 ***
stateok -0.54506 0.16912 -3.2229 0.0014154 **
stateor -1.16800 0.28571 -4.0880 5.654e-05 ***
statepa -1.76747 0.27610 -6.4016 6.255e-10 ***
stateri -2.26505 0.29376 -7.7106 2.068e-13 ***
statesc 0.55717 0.11000 5.0654 7.305e-07 ***
statesd -1.00372 0.20962 -4.7883 2.698e-06 ***
statetn -0.87566 0.26802 -3.2672 0.0012182 **
statetx -0.91747 0.24556 -3.7363 0.0002253 ***
stateut -1.16395 0.19642 -5.9259 8.884e-09 ***
statevt -0.96604 0.21113 -4.5756 7.075e-06 ***
stateva -1.29018 0.20416 -6.3196 9.986e-10 ***
statewa -1.65952 0.28347 -5.8544 1.307e-08 ***
statewv -0.89675 0.24661 -3.6363 0.0003277 ***
statewi -1.75927 0.29395 -5.9850 6.442e-09 ***
statewy -0.22850 0.31290 -0.7303 0.4658106
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results are similar to the DIF estimator – but with a much lower estimated standard error of beertax. This is quite often the case, as the error term of the DIF estimator is autocorrelated by construction (if u_{it} is actually iid) and therefore the LSDV and fixed effects estimators are more efficient
Note that this estimator delivers explicit estimates of the state-individual intercepts – the fixed effects. Further note that most of these are highly significant, which highlights the importance of individual heterogeneity
Within (FE) estimator
Now we use the FE (within) estimator. This should be the same as with the LSDV estimator.
coeftest( Within <- plm(frate ~ beertax, index= c("state", "year"),
model="within",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.65587 0.18785 -3.4915 0.000556 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
az ar ca co ct de
-0.5677269 -0.6549516 -1.5094688 -1.4842804 -1.8622573 -1.3076020
FE (within) estimation yields actually the very same result as the LSDV estimator (we know they are equivalent), but in a more compact and efficient way
- If we want the state-individual intercepts – the fixed effects – we can calculate these with the function fixef() (using Equation 9.28). These are identical to the ones shown above
- The FE model requires only minimal assumptions on the nature of heterogeneity
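A small sketch of how the fixed effects can be retrieved from the estimated within model with the fixef() function of the plm package:
head(fixef(Within))                   # individual intercepts (levels), see Equation 9.28
head(fixef(Within, type = "dmean"))   # the same, expressed as deviations from their mean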
Within (FE) estimator with time-dummies
Now we check whether the inclusion of time specific effects plays a role here. This is done by simply adding time-dummy variables – adding the variable year.
coeftest( Within_t <- plm(frate ~ beertax + year, index= c("state", "year"),
model="within",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.639980 0.197377 -3.2424 0.001328 **
year1983 -0.079903 0.038354 -2.0833 0.038126 *
year1984 -0.072421 0.038352 -1.8883 0.060012 .
year1985 -0.123976 0.038442 -3.2250 0.001408 **
year1986 -0.037864 0.038588 -0.9813 0.327312
year1987 -0.050902 0.038974 -1.3061 0.192600
year1988 -0.051804 0.039623 -1.3074 0.192145
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
az ar ca co ct de
-0.5677269 -0.6549516 -1.5094688 -1.4842804 -1.8622573 -1.3076020
- In this case, the time-dummy variables don’t play an important role. But this is rather the exception than the rule, and therefore the inclusion of time-dummy variables should always be tried, at least.
Within (FE) estimator, effect = twoways
Another way to consider time specific heterogeneity is the option effect = "twoways". This should be identical to the inclusion of time-dummy variables, which fortunately is actually the case.
coeftest( Within_2w <- plm(frate ~ beertax + year, index= c("state", "year"),
model="within",
effect = "twoways",
data = Fatalities))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.63998 0.19738 -3.2424 0.001328 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
az ar ca co ct de
-0.5677269 -0.6549516 -1.5094688 -1.4842804 -1.8622573 -1.3076020
Summarizing the results
Table 9.2 summarizes all results regarding the variable beertax.
- Look at the very high R^2 of the LSDV estimator, especially compared to the within estimator, which yields the very same parameter estimates and t-values. Why is that the case?
Table 9.2: Effects of a beer tax on traffic fatalities
modelsummary(list("Pooled"=Pooled, "BTW"=Between, "DIF"=DIF, "LSDV"=LSDV, "FE"=Within, "FE_t"=Within_t, "FE_2w"=Within_2w),
vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
stars = TRUE,
fmt = 3,
statistic = c('statistic'),
gof_omit = "A|L|B|F|RM",
coef_omit = "Inter|year|state",
align = "lddddddd",
output = "flextable"
)
| | Pooled | BTW | DIF | LSDV | FE | FE_t | FE_2w |
|---|---|---|---|---|---|---|---|
| beertax | 0.365** | 0.378* | | -0.656*** | -0.656*** | -0.640** | -0.640** |
| | (3.083) | (2.386) | | (-3.491) | (-3.491) | (-3.242) | (-3.242) |
| diff(beertax, 6) | | | -0.869* | | | | |
| | | | (-2.211) | | | | |
| Num.Obs. | 336 | 48 | 48 | 336 | 336 | 336 | 336 |
| R2 | 0.093 | 0.110 | 0.119 | 0.905 | 0.041 | 0.080 | 0.036 |
| Std.Errors | Custom | IID | IID | IID | IID | IID | IID |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 (t-statistics in parentheses)
Figure 9.2 summarizes all results regarding the variable beertax in a graphical way.
9.4.5 Some additional graphics
The following data graphics should help to explain the estimation results of the different estimators.
ggplot(as.data.frame(Fatalities), aes(y=frate, x=beertax)) +
geom_point( aes(color=state), show.legend = FALSE ) +
geom_smooth(method=lm, se=F, aes(color=year), linewidth=0.4, show.legend = FALSE) +
geom_smooth(method=lm, aes(y=frate, x=beertax), color="black", linewidth=0.5) +
theme_bw()
ggplot(as.data.frame(Fatalities), aes(y=frate, x=beertax)) +
geom_point( aes(color=state), show.legend = FALSE ) +
geom_smooth(method=lm, se=F, aes(color=state), linewidth=0.4, show.legend = FALSE) +
theme_bw()
ggplot(as.data.frame(Fatalities), aes(y=Bfrate, x=Bbeertax)) +
geom_point( aes(color=state), show.legend = FALSE) +
geom_smooth(method=lm, aes(y=Bfrate, x=Bbeertax), color="black", linewidth=0.5) +
geom_text(aes(y=Bfrate, x=Bbeertax, label=state), nudge_y = 0.09, size=2, color="grey") +
theme_bw()
ggplot(as.data.frame(Fatalities), aes(y=frate, x=beertax, color=state)) +
geom_point( show.legend = FALSE) +
geom_smooth(aes(y=frate, x=beertax), color="black", linewidth=0.4, method=lm) +
facet_wrap(vars(state)) +
theme_bw()
ggplot(as.data.frame(Fatalities), aes(y=Wfrate, x=Wbeertax, color=state)) +
geom_point( show.legend = FALSE) +
geom_smooth(method=lm ,se=TRUE, show.legend = FALSE) +
facet_wrap(vars(state)) +
theme_bw()
ggplot(as.data.frame(Fatalities), aes(y=Wfrate, x=Wbeertax)) +
geom_point( aes(color=state), show.legend = FALSE) +
geom_smooth(method=lm, aes(y=Wfrate, x=Wbeertax), color="black", linewidth=0.5) +
theme_bw()
9.5 Some more advanced panel data methods
9.5.1 The Random Effects (RE) Estimator
The within (FE) estimator is a regression on data that have been transformed so that the individual effects a_i vanish (they are “transformed out” – the same is true for the DIF estimator)
- The equivalent LSDV estimator considers the individual effects a_i as parameters to be estimated (they are “estimated out”)
- Both give identical estimates for \boldsymbol \beta
- Both allow the individual effects a_i to be correlated with the explanatory variables
- But both are not efficient estimates as they throw away all cross section information, which is unnecessary if the individual error a_i is not correlated with the explanatory variables
- On the contrary, the random effects (RE) estimator treats the individual effects as random and, more importantly, as part of the error term, and seeks to obtain efficient estimates for \boldsymbol \beta via a GLS estimator
- However, because the individual effects a_i do not vanish with this approach and remain a part of the error term, we have to assume that a_i is uncorrelated with the explanatory variables (as otherwise MLR.4 would be violated)
- If a_i is actually not correlated with the explanatory variables (which can be tested under certain circumstances), the RE estimator justifiably uses both within and cross section information to estimate \boldsymbol \beta in an optimal way
- Compared to pooled OLS and the BTW estimator, which also have to assume that a_i is uncorrelated with the explanatory variables, the RE estimator is more efficient, because it takes care of the autocorrelation in the error term due to the presence of a_i, i.e., it is a GLS estimator and thus BLUE
For the procedure to have good properties, n should be large relative to T (wide panels). The properties for the reverse case are largely unknown, Wooldridge (2019), Chapt. 14
Generally, GLS seeks for a transformation of the model so that the error term of the transformed model becomes iid and simple OLS can be applied to obtain efficient estimates
- Weighted LS to handle heteroscedasticity and the Koyck transformation to handle autocorrelation have already been discussed as GLS procedures in former sections
Our starting point is Equation 9.4 again together with the error component model Equation 9.8. Under the assumption that u_{it} is iid and that a_i is not correlated across units the covariance matrix of \mathbf e has a block-diagonal, cluster-structure:
\operatorname {Var} (\mathbf e) = \boldsymbol{\Omega}=\left[\begin{array}{cccc} \boldsymbol\Sigma_1 & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \boldsymbol\Sigma_i & \cdots & \mathbf 0 \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf 0 & \mathbf 0 & \cdots & \boldsymbol\Sigma_n \end{array}\right] \tag{9.43}
Assuming homoscedasticity of a_i and that the idiosyncratic errors u_{it} are uncorrelated with the unit specific a_i, Equation 9.8 then implies
\operatorname {Var}(e_{it}) = \sigma_a^2 + \sigma_u^2, \ \text{ and } \ \operatorname{Cov}(e_{it},e_{is}) = \sigma_a^2
So we got for each unit i
\boldsymbol{\Sigma} = \left[\begin{array}{cccc} (\sigma_a^2+\sigma_{u}^2) & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & (\sigma_a^2+\sigma_{u}^2) & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & (\sigma_a^2+\sigma_{u}^2) \end{array}\right] \tag{9.44}
Hence, the covariance matrix of \mathbf e depends only on two parameters,
the variance of the individual effect \sigma_a^2
the variance of the idiosyncratic error \sigma_u^2
Given the covariance matrix of \mathbf e, Equation 9.43, the RE estimator, which is the GLS estimator of model Equation 9.4, is given by:
\hat{\boldsymbol{\beta}}_{R E} = \left(\mathbf{X}^{\prime} \mathbf \Omega^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf \Omega^{-1} \mathbf{y} \tag{9.45}
- The formula for the RE estimator of Equation 9.45 can be directly applied to the data if we have an estimate of the covariance matrix \mathbf \Omega of the error \mathbf e, which in our case only depends on \sigma^2_a and \sigma^2_u
  - However, the covariance matrix \mathbf \Omega is a (n \times T)\times (n \times T) matrix, which could easily become very large in a wide panel setting
- Therefore, it is much more convenient to search for a variable transformation along the lines shown above, like in Equation 9.50 and Equation 9.51. This transformation is called quasi time-demeaning
Quasi time-demeaning
As shown in Equation 9.49, a GLS estimation can be carried out by pre-multiplying the variables with \, \mathbf \Omega^{-\frac{1}{2}} to get a transformed model with iid errors.
- \mathbf \Omega is defined in Equation 9.43 and equals:
\, \mathbf \Omega = [\mathbf I_{_n} \otimes \mathbf \Sigma ]
- Thus, we search for:
\mathbf \Omega^{-\frac{1}{2}} = \ [\mathbf I_{_n} \otimes \mathbf \Sigma ] ^{-\frac{1}{2}} = \ \mathbf I_{_n} \otimes \mathbf \Sigma^{-\frac{1}{2}}
Hence, we actually only need the inverse square root matrix \,\mathbf \Sigma^{-\frac {1}{2}} \, from Equation 9.44, which is a tremendous facilitation.
One can show this is: 9
\mathbf \Sigma ^{-\frac{1}{2}} = \frac{1}{\sigma_u} \left [ \mathbf I_{_T} - \theta \tfrac {1}{T} \mathbf i \, \mathbf i' \right] \tag{9.61}
with \theta = 1 - \left [ \frac {\sigma_u^2} { \sigma_u^2 + T\sigma_a^2 } \right ]^{\frac {1}{2}}, \ \ \in [0,1] \tag{9.62}
Thereby, \mathbf i is a T-vector of ones and the outer product \, \mathbf i \, \mathbf i' is a \,T \times T matrix of ones. Thus, the term \, \frac {1}{T} \mathbf i \, \mathbf i' of Equation 9.61 is simply the mean function
The GLS transformation of \, \mathbf y_i and likewise of \, \mathbf X_i therefore is:
\mathbf \Sigma ^{-\frac{1}{2}} \mathbf y_i \ = \ \frac{1}{\sigma_u} \left[\begin{array}{c} y_{i 1} - \theta \bar y_i \\ y_{i 2} - \theta \bar y_i\\ \vdots \\ y_{i T} - \theta \bar y_i \end{array}\right] \tag{9.63}
This transformation is called quasi time-demeaning. Thereby a fraction \theta of the time averages are subtracted from the variables. And the fraction \theta only depends on the two variances \sigma_a^2 and \sigma_u^2.
If \theta=1, the quasi time-demeaning becomes time-demeaning, which is the within transformation.
Applying Equation 9.63 to the variables, we arrive at the following transformed model (the factor (1 / \sigma_u) cancels):
\underbrace {(y_{it} - \theta \bar y_i)}_{\breve y_{it}} = \underbrace {( \mathbf x_{it} - \theta \mathbf {\bar x}_i)}_{\mathbf {\breve {x}}_{it}} \boldsymbol \beta + \underbrace {(e_{it} - \theta \bar e_i)}_{\breve e_{it}} \tag{9.64}
or \breve y_{it} = \breve {\mathbf x}_{it} \boldsymbol \beta + \breve e_{it} \tag{9.65}
Applying pooled OLS to the stacked Equation 9.65 yields the RE estimator:
\boldsymbol {\hat \beta}_{R E} = (\mathbf {\breve {X'}} \mathbf {\breve X})^{-1} \mathbf {\breve {X'}} \mathbf {\breve y}
\operatorname {Var} (\boldsymbol {\hat \beta}_{R E}) = \sigma_{\breve e}^2 (\mathbf {\breve {X'}} \mathbf {\breve X})^{-1}
- Depending on \theta, the RE estimator becomes more and more similar to pooled OLS if \, \theta is small – limiting case: \theta = 0
- In this case, the variance of the individual heterogeneity, \sigma_a^2, is small relative to \sigma_u^2, the variance of the idiosyncratic error u_{it}
- On the other hand, if \theta is near one, the RE estimator becomes more and more similar to the FE estimator – limiting case: \theta = 1
- In this case, the variance of the individual heterogeneity, \sigma_a^2, strongly dominates the variance of the idiosyncratic error \sigma_u^2
If \theta \rightarrow 1, a possible bias caused by a correlation of the individual heterogeneity a_i with some explanatory variables becomes smaller and smaller
Estimating \sigma_a^2 and \sigma_u^2
Up to now, we pretended to know the two variances \sigma_a^2 and \sigma_u^2 that determine \mathbf \Sigma, Equation 9.44, and therefore \theta of the quasi time-demeaning transformation.
However, these two variances have to be estimated. There are several possibilities to do this.
All of them are based on estimated residuals variances (or covariances) of different panel regressions:
One can show that the 10
- estimated residual variance of the pooled OLS estimator converges to
\operatorname {plim} \left[ \mathbf{e'}_{{pooled }} \ \mathbf{e}_{ {pooled}} / (n T) \right]=\sigma_a^2 + \sigma_{u}^2
- estimated residual variance of the FE (LSDV) estimator converges to
\operatorname {plim} \left[ \mathbf{\ddot u'}_{{FE}} \ \mathbf{\ddot u}_{ {FE}} / (n(T-1-K)) \right]=\sigma_{u}^2
- estimated residual variance of the DIF estimator converges to
\operatorname {plim}\left[ \Delta\mathbf{u'}_{{DIF}} \ \Delta \mathbf {u}_{{DIF}} /(n(T-1)) \right]=2 \sigma_{u}^2
- estimated residual variance of the between (BTW) estimator converges to
\operatorname {plim}\left[ \mathbf{\bar e'}_{{BTW}} \ \mathbf{\bar e}_{{BTW}} /(n T) \right] = \sigma_a^2 + \sigma_{u}^2 / T
We could further obtain an estimate for \sigma_{a}^2 by calculating the variance of the estimated fixed effects from FE (LSDV) regressions, Nerlove (1971). However, if there are time-invariant explanatory variables in the model, like gender or race, the effects of these variables are totally absorbed by the fixed effects (individual intercepts), which could lead to an overestimation of \sigma_{a}^2.
Baltagi and Baltagi (2005) suggests yet another method, which is based on a pooled OLS regression. Considering Equation 9.44, we note that within each group i all covariances are equal to \sigma_a^2: \operatorname {Cov}(e_{it}, e_{is}) = \sigma_a^2. For each unit i, there are T(T - 1)/2 pairs of residuals \hat e_{it} \hat e_{is} that can be used to estimate these covariances. We average these products over all pairs within each unit and finally average over all units i to obtain the following estimate for \sigma_a^2:
\operatorname{plim} \frac{1}{n} \sum_{i=1}^n \frac{\sum_{t=2}^T \sum_{s=1}^{t-1} \hat e_{i t} \hat e_{i s}}{T(T-1) / 2}=\sigma_a^2
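A minimal sketch (not part of the original code) of this computation, assuming Pooled is the pooled OLS fit of the Fatalities example from the earlier sections (estimated with plm(..., model = "pooling"), so that index() can recover the unit identifier):

# Average the cross-products of the pooled OLS residuals within each unit i,
# then average over the units: an estimate of sigma_a^2 as in the display above
e  <- residuals(Pooled)
id <- index(Pooled)[[1]]                        # unit identifier of each residual
cov_i <- tapply(seq_along(e), id, function(idx) {
  pairs <- combn(length(idx), 2)                # all (t, s) pairs with t != s
  mean(e[idx][pairs[1, ]] * e[idx][pairs[2, ]])
})
mean(cov_i)                                     # estimate of sigma_a^2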
We recognize from above that we need at least two relationships (equations) to determine the two variances:
- A popular variant is using the between estimator residuals to estimate \sigma_a^2 + \sigma_{u}^2 / T and the FE estimator residuals to get an estimate for \sigma_u^2. This method is the default of the `plm()` procedure (Swamy and Arora (1972), `random.method = "swar"`). With the option `random.method` you can choose several other variants
As we have to estimate the two variances to be able to apply the GLS transformation from Equation 9.63, the RE estimator actually is a feasible GLS estimator.
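A minimal sketch of this default (Swamy-Arora type) computation for the Fatalities example, assuming the Within (FE) and Between fits from the earlier sections are still in memory; up to degrees-of-freedom conventions, the result should be close to the variance components and theta reported by plm() in the next subsection.

# sigma_u^2 from the FE (within) residuals,
# sigma_a^2 + sigma_u^2 / T from the between residuals
T_per <- 7                                                  # years per state
s2_u  <- sum(residuals(Within)^2)  / df.residual(Within)
s2_b  <- sum(residuals(Between)^2) / df.residual(Between)   # estimates sigma_a^2 + sigma_u^2 / T
s2_a  <- s2_b - s2_u / T_per
theta <- 1 - sqrt(s2_u / (s2_u + T_per * s2_a))             # Equation 9.62
round(c(sigma2_u = s2_u, sigma2_a = s2_a, theta = theta), 4)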
9.5.2 Example with Fatalities dataset
We continue our example with the Fatalities dataset from Stock and Watson (2020)
RE estimator
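The corresponding call (reconstructed from the Call: line of the output below):

# RE estimator (oneway, individual effects) for the Fatalities data
summary( RE <- plm(frate ~ beertax, index = c("state", "year"),
                   model = "random",
                   data = Fatalities) )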
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = frate ~ beertax, data = Fatalities, model = "random",
index = c("state", "year"))
Balanced Panel: n = 48, T = 7, N = 336
Effects:
var std.dev share
idiosyncratic 0.03605 0.18986 0.119
individual 0.26604 0.51579 0.881
theta: 0.8622
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.471090 -0.120045 -0.021530 0.091011 0.964350
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) 2.067141 0.099971 20.6773 <2e-16 ***
beertax -0.052016 0.124176 -0.4189 0.6753
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 12.648
Residual Sum of Squares: 12.642
R-Squared: 0.00052508
Adj. R-Squared: -0.0024674
Chisq: 0.175467 on 1 DF, p-value: 0.6753
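As a check (a minimal sketch, not part of the original code), the RE coefficients can be reproduced by quasi time-demeaning the variables by hand with the theta reported above and running OLS on the transformed data, Equation 9.64:

# Manual quasi time-demeaning with the reported theta = 0.8622
theta <- 0.8622
qd    <- function(v, g) v - theta * ave(v, g)   # subtract a fraction theta of the unit mean
y_t   <- qd(Fatalities$frate,   Fatalities$state)
x_t   <- qd(Fatalities$beertax, Fatalities$state)
const <- rep(1 - theta, length(y_t))            # the intercept is transformed as well
coef( lm(y_t ~ 0 + const + x_t) )               # approximately 2.067 and -0.052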
RE estimator – twoways
summary( RE_2w <- plm(frate ~ beertax, index= c("state", "year"),
model="random",
effect = "twoways",
data = Fatalities))
Twoways effects Random Effect Model
(Swamy-Arora's transformation)
Call:
plm(formula = frate ~ beertax, data = Fatalities, effect = "twoways",
model = "random", index = c("state", "year"))
Balanced Panel: n = 48, T = 7, N = 336
Effects:
var std.dev share
idiosyncratic 0.035300 0.187883 0.117
individual 0.266148 0.515895 0.880
time 0.001031 0.032116 0.003
theta: 0.8636 (id) 0.3548 (time) 0.3531 (total)
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.479379 -0.114279 -0.017247 0.093782 0.946156
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) 2.059236 0.100884 20.4120 <2e-16 ***
beertax -0.036614 0.125073 -0.2927 0.7697
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 12.321
Residual Sum of Squares: 12.318
R-Squared: 0.00025652
Adj. R-Squared: -0.0027367
Chisq: 0.0856989 on 1 DF, p-value: 0.76972
Code
modelsummary(list("Pooled"=Pooled, "BTW"=Between, "DIF"=DIF, "FE"=Within, "FE_2w"=Within_2w, "RE"=RE, "RE_2w"=RE_2w),
vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
stars = TRUE,
fmt = 3,
statistic = c('statistic'),
gof_omit = "A|L|B|F|RM",
coef_omit = "Inter|year|state",
align = "lddddddd",
output = "flextable"
)
| | Pooled | BTW | DIF | FE | FE_2w | RE | RE_2w |
|---|---|---|---|---|---|---|---|
| beertax | 0.365** | 0.378* |  | -0.656*** | -0.640** | -0.052 | -0.037 |
|  | (3.083) | (2.386) |  | (-3.491) | (-3.242) | (-0.419) | (-0.293) |
| diff(beertax, 6) |  |  | -0.869* |  |  |  |  |
|  |  |  | (-2.211) |  |  |  |  |
| Num.Obs. | 336 | 48 | 48 | 336 | 336 | 336 | 336 |
| R2 | 0.093 | 0.110 | 0.119 | 0.041 | 0.036 | 0.001 | 0.000 |
| Std.Errors | Custom | IID | IID | IID | IID | IID | IID |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
9.5.4 The Hausman test and a CRE based specification test
We have learned much about FE and RE estimators. The latter has some advantages, as it is efficient and we can estimate the effects of time-constant variables, which are often at the center of interest
- However, we need to treat the individual effects as uncorrelated with the other regressors, which is often hard to justify
- The RE estimator, therefore, may suffer from inconsistency due to this correlation
The FE estimator doesn’t have this problem
The question is, which one should we use?
The specification test proposed by Hausman (1978) can be used to test for such correlations of the individual heterogeneity a_i with the explanatory variables. 12
The test is based on the idea that under the hypothesis of no correlation, both FE and RE (FGLS) estimators are consistent, but RE is asymptotically efficient and FE is not
However, under the alternative, FE is consistent, but RE is not
Therefore, under the null hypothesis, the two estimates should not differ systematically, and a test can be based on the difference (\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE})
Hausman showed that the covariance matrix of this difference is (provided that RE is actually efficient, e.g., u_{it} is actually iid) 13
\operatorname {Var}(\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE}) \ = \ \operatorname {Var}(\boldsymbol {\hat \beta}_{FE}) - \operatorname {Var}(\boldsymbol {\hat \beta}_{RE}) = \mathbf \Psi \tag{9.69}
The resulting Wald test statistic is:
W = (\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE})' \mathbf \Psi^{-1}(\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE}) \ \sim \ \chi ^2(k-1)
Let’s apply this test to our previous example of the relationship between traffic fatality rates and a beer tax (recall Table 9.3). The results of the FE and RE estimators are quite different, but is the difference statistically significant? The following Hausman test should answer this question.
# Hausman test with the phtest() function from the plm package
phtest(Within, RE)
Hausman Test
data: frate ~ beertax
chisq = 18.353, df = 1, p-value = 1.835e-05
alternative hypothesis: one model is inconsistent
The H0 that both models are consistent is overwhelmingly rejected – the difference between these two estimators is too large to be accidental. Thus, the FE estimator has to be used in this case.
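The statistic can also be computed by hand from Equation 9.69 (a minimal sketch, using the Within and RE fits from above):

# Hausman statistic by hand for the single coefficient of beertax
b_fe <- unname(coef(Within)["beertax"])
b_re <- unname(coef(RE)["beertax"])
Psi  <- vcov(Within)["beertax", "beertax"] - vcov(RE)["beertax", "beertax"]  # Equation 9.69
W    <- (b_fe - b_re)^2 / Psi
c(W = W, p.value = pchisq(W, df = 1, lower.tail = FALSE))   # approx. 18.35 and 1.8e-05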
Now, consider our last example which estimates a wage equation; recall Table 9.4.
Here, the difference between the FE and RE estimates is not so obvious. Note that we can only compare the coefficients of time-varying variables.
Carrying out a Hausman test yields the following result:
phtest(FE_, RE_)
Hausman Test
data: lwage ~ educ + black + hisp + exper + expersq + married + union + ...
chisq = 31.707, df = 10, p-value = 0.000448
alternative hypothesis: one model is inconsistent
Also in this case, the H0 that both models are consistent is overwhelmingly rejected. Hence, the FE estimator should be used.
One weakness of this test is that it cannot be made robust against serial correlation of the idiosyncratic error u_{it}. If u_{it} violates the iid assumption, the formula for the covariance matrix Equation 9.69 is incorrect, leading to biased test results.
However, there is a variant of the Hausman test, based on a CRE model, that can be made robust against autocorrelated errors and is also more intuitive than the original test.
The CRE approach provides a simple formal way of choosing between FE and RE approaches:
- The CRE model differs from a pure RE model because of the additional inclusion of the time-averaged variables \mathbf {\bar x}_i of Equation 9.68 into the model
If the parameter vector \boldsymbol \delta of \mathbf {\bar x}_i is \mathbf 0 in Equation 9.68, the CRE becomes a RE model
Furthermore, in this case, as Equation 9.67 shows, there is no correlation between \mathbf {\bar x}_i and hence \mathbf {x}_{it} on the one hand and the individual heterogeneity a_i on the other, which is the main assumption for applying RE
Thus, the H0 that RE is sufficient (that we do not need the CRE and therefore the FE approach) is \boldsymbol \delta = \mathbf 0 14
- To carry out the test, we estimate a CRE model, like Equation 9.68, and test whether all coefficients of the time-averaged variables \mathbf {\bar x}_i are zero (a sketch of how such a model can be set up follows below)
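The following sketch illustrates how such a CRE model can be set up (an illustration, not the original code, which was run in an earlier section): the unit-specific time averages of the time-varying regressors are added with the prefix B_, matching the coefficient names in the output below, and the augmented model is estimated by RE.

# Sketch of a CRE specification for the wage equation
# (assumes pwagepan is the pdata.frame of the wagepan data used above)
pwagepan$B_exper   <- Between(pwagepan$exper)    # Between() from plm: unit means, repeated per period
pwagepan$B_expersq <- Between(pwagepan$expersq)
pwagepan$B_married <- Between(pwagepan$married)
pwagepan$B_union   <- Between(pwagepan$union)
CRE <- plm(lwage ~ educ + black + hisp + exper + expersq + married + union +
             year + B_exper + B_expersq + B_married + B_union,
           data = pwagepan, model = "random")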
Let’s do that for our wage equation:
# We have stored the results of the CRE model in the list CRE
# We test, whether all between transformed (time averaged) variables
# have a zero coefficient
lht(CRE, matchCoefs(CRE,"B_"))
Linear hypothesis test
Hypothesis:
B_expersq = 0
B_married = 0
B_union = 0
Model 1: restricted model
Model 2: lwage ~ educ + black + hisp + exper + expersq + married + union +
year + B_exper + B_expersq + B_married + B_union
Res.Df Df Chisq Pr(>Chisq)
1 4345
2 4342 3 26.361 8.013e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Once again, the H0 is clearly rejected, meaning that we need a FE or CRE model.
Now the same with a cluster-robust covariance matrix. We get an even stronger rejection of the H0.
lht(CRE, matchCoefs(CRE,"B_"), vcov=vcovHC)
Linear hypothesis test
Hypothesis:
B_expersq = 0
B_married = 0
B_union = 0
Model 1: restricted model
Model 2: lwage ~ educ + black + hisp + exper + expersq + married + union +
year + B_exper + B_expersq + B_married + B_union
Note: Coefficient covariance matrix supplied.
Res.Df Df Chisq Pr(>Chisq)
1 4345
2 4342 3 30.038 1.355e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Finally, we tried a CRE model estimated by pooled OLS, also with a cluster-robust covariance matrix. Surprisingly, we get exactly the same result as above (without a cluster-robust covariance matrix, we get a clearly biased outcome)
lht(CRE1, matchCoefs(CRE1,"B_"), vcov=vcovHC)
Linear hypothesis test
Hypothesis:
B_expersq = 0
B_married = 0
B_union = 0
Model 1: restricted model
Model 2: lwage ~ educ + black + hisp + exper + expersq + married + union +
year + B_exper + B_expersq + B_married + B_union
Note: Coefficient covariance matrix supplied.
Res.Df Df Chisq Pr(>Chisq)
1 4345
2 4342 3 30.038 1.355e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Remark: The `phtest()` function has an option to carry out a CRE-based Hausman test, using the option `method = "aux"`. However, this procedure doesn’t seem to work properly; it crashed when using time-dummies.
Code
# `phtest()` function with "aux" option
# You have to use a model formula as first argument
## phtest(lwage ~ educ + black + hisp + exper + expersq + married + union + year,
## data = pwagepan, method = "aux", vcov = vcovHC)
# This would lead to an error message:
# Error in vcov(auxmod)[range, range] : subscript out of bounds
9.5.5 The Hausman-Taylor IV Estimator 15
We considered the FE, LSDV and DIF estimators, which provide consistent estimates, but not for the coefficients of time-invariant regressors, because these are swept away by the respective data transformations (absorbed into the fixed effects)
The Model
The Hausman-Taylor estimator is an IV estimator that additionally enables the consistent estimation of the coefficients of time-invariant regressors. It does so by assuming that some specified regressors are uncorrelated with the unobserved individual heterogeneity a_i
In particular, we have the model
y_{it} = \mathbf x_{1it} \boldsymbol \beta_1 + \mathbf x_{2it} \boldsymbol \beta_2 + \mathbf z_{1i} \boldsymbol \gamma_{1} + \mathbf z_{2i} \boldsymbol \gamma_{2} + \underbrace {(a_i + u_{it})}_{e_{it}} \tag{9.70}
- Or stacked to matrices
\mathbf{y}=\mathbf{X}_1 \boldsymbol\beta_1+\mathbf{X}_2 \boldsymbol\beta_2+\mathbf{Z}_1 \boldsymbol\gamma_1+\mathbf{Z}_2 \boldsymbol\gamma_2 + \underbrace {(\mathbf{a}+\mathbf{u})}_{\mathbf e} \tag{9.71}
- Thereby, the time-varying regressors are denoted by \mathbf X_{1 \text { or } 2} and the time-invariant ones (the observed individual heterogeneity) are denoted by \mathbf Z_{1 \text { or } 2}
- The regressors [\mathbf X_1, \mathbf Z_1] are strictly exogenous with respect to a_i, whereas the regressors [\mathbf X_2, \mathbf Z_2] are allowed to be correlated with a_i
All regressors have to be exogenous 16 with respect to the idiosyncratic error u_{it}. If this is not the case, you have to employ a usual IV estimator with external instruments, which are exogenous with respect to u_{it} 17
The Instruments
As [\mathbf X_1, \mathbf Z_1] are exogenous, they need not be instrumented, i.e., they can serve as their own (internal) instruments
- So, we need “external” instruments for [\mathbf X_2, \mathbf Z_2]
- Hausman and Taylor (1981) suggest using the within-transformed \mathbf {\ddot X}_2 as instruments for \mathbf X_2 (because the \mathbf {\ddot X}_2 are uncorrelated with a_i by construction 18) and
- the between-transformed \mathbf {\bar X}_1 as instruments for \mathbf Z_2, because \mathbf X_1 is strictly exogenous by assumption and therefore the same is true for their time averages \mathbf {\bar X}_1
- \mathbf {\ddot X}_2 and \mathbf {\bar X}_1 are external instruments, because they are not part of the original model, Equation 9.71, i.e., they contain data information from other time periods
So, we have the instrument matrix: \mathbf W = [\mathbf X_1, \mathbf {\ddot X}_2, \mathbf Z_1, \mathbf {\bar X}_1]. Because \mathbf {\ddot X}_1 is a linear combination of \mathbf {X}_1, this is equivalent to
\mathbf W = [\mathbf {\ddot X}_1, \mathbf {\ddot X}_2, \mathbf Z_1, \mathbf {\bar X}_1] \tag{9.72}
The Estimation
One could now proceed and estimate Equation 9.71 by an IV procedure
However, Hausman and Taylor (1981) instead proceeded with the quasi time-demeaned RE model, Equation 9.73, for efficiency reasons 19
\mathbf{\breve y}=\mathbf{\breve X}_1 \boldsymbol\beta_1+\mathbf{\breve X}_2 \boldsymbol\beta_2 + \mathbf{\breve Z}_1 \boldsymbol\gamma_1 + \mathbf{\breve Z}_2 \boldsymbol\gamma_2 + \underbrace {(\mathbf{\breve a}+\mathbf{\breve u})}_{\mathbf {\breve e}} \tag{9.73}
- The IV estimation procedure with instruments \mathbf W applied to Equation 9.73 should now deliver consistent and efficient estimates of \boldsymbol \beta and \boldsymbol \gamma. Yet, we have to note:
The model is only identified, if the number of time-varying exogenous variables in \mathbf X_1 is at least as large as the number of time-constant endogenous variables in \mathbf Z_2
- Before applying IV estimation to Equation 9.73, we nevertheless have to estimate the variances \sigma_u^2 and \sigma_a^2. These determine \theta from Equation 9.62, which is necessary to adopt the quasi time-demeaning transformation from Equation 9.64 to Equation 9.73. To this end, Hausman and Taylor (1981) proceeded in the following way 20
i. \sigma_u^2 is estimated, as usual, from the residuals of an FE estimation of Equation 9.71
ii. They adopted an IV estimation on the group means of the FE residuals from step i. with respect to [\mathbf {Z}_1, \mathbf Z_2], using the instruments [\mathbf {X}_1, \mathbf Z_1]. The residual variance of this regression delivers a consistent estimate of (\sigma_a^2 + \frac {1}{T}\sigma_u^2)
iii. Using the results from steps i. and ii., both variances can be determined and thus we have the necessary estimate \hat \theta
The Results
- The resulting Hausman-Taylor IV estimator of Equation 9.73 is, 21
\boldsymbol {\hat \beta}_{HT} = \left({ {\mathbf{\breve X}'}} \mathbf P_{_{\mathbf W}} {\mathbf{\breve X}}\right)^{-1} \left({\mathbf{\breve X}'} \mathbf P_{_{\mathbf W}} {\mathbf{\breve y}}\right) \tag{9.74}
- This is the usual 2SLS formula applied to the quasi time-demeaned variables with
\mathbf X := [\mathbf X_1, \mathbf X_2, \mathbf Z_1, \mathbf Z_2]
\boldsymbol \beta := [\boldsymbol \beta_1, \boldsymbol \beta_2, \boldsymbol \gamma_1, \boldsymbol \gamma_2]'
\mathbf W := [\mathbf {\ddot X}_1, \mathbf {\ddot X}_2, \mathbf Z_1, \mathbf {\bar X}_1]
\ \mathbf P_{_{\mathbf W}} := {\mathbf{W}} \left({\mathbf{W}'} {\mathbf{W}}\right)^{-1} {\mathbf{W}'}
Note that \mathbf P_{_{\mathbf W}} is the idempotent projection matrix, or hat matrix, with respect to the instruments \mathbf W 22
We further get the usual 2SLS covariance matrix:
\operatorname {Var} (\boldsymbol {\hat \beta}_{HT}) = \sigma_{\breve e}^2\left({ {\mathbf{\breve X}'}} \mathbf P_{_{\mathbf W}} {\mathbf{\breve X}}\right)^{-1} \tag{9.75}
- With `plm()`, one can also estimate a variant of the Hausman-Taylor estimator with an augmented set of instruments, suggested by Amemiya and MaCurdy (1986), using the option `inst.method = "am"`. This estimator should be a little more efficient, as it might mitigate a possible weak instrument problem
Empirical example – wage equation
# Hausman-Taylor IV estimator
# Only educ is endogenous
# The formula now has three parts, separated by | .
# 1. part: the usual formula: y ~ explanatory variables.
# 2. part: the list of all instruments, internal + external.
# 3. part: the list of all endogenous variables.
# There is no need to discriminate between time-varying or time-constant variables,
# the procedure will do that for you.
# But you have to check whether the number of time-varying exogenous variables
# is at least as large as the number of time-constant endogenous variables:
# In our case, we have 4 varying exogenous vars: exper, expersq, married, union,
# and one time-constant endogenous variable: educ
# So, the model is identified
HT <- plm(lwage ~ educ + black + hisp +
exper + expersq + married + union |
black + hisp + exper + expersq + married + union |
educ,
data = pwagepan,
model = "random",
random.method = "ht",
inst.method = "baltagi"
)
summary(HT)
Oneway (individual) effect Random Effect Model
(Hausman-Taylor's transformation)
Instrumental variable estimation
(Baltagi's transformation)
Call:
plm(formula = lwage ~ educ + black + hisp + exper + expersq +
married + union | black + hisp + exper + expersq + married +
union | educ, data = pwagepan, model = "random", random.method = "ht",
inst.method = "baltagi")
Balanced Panel: n = 545, T = 8, N = 4360
Effects:
var std.dev share
idiosyncratic 0.1233 0.3511 0.527
individual 0.1108 0.3329 0.473
theta: 0.6506
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-13.03022 -0.40651 0.06607 0.52849 4.39476
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) -0.26424642 0.18936474 -1.3954 0.1628843
educ 0.11435287 0.01566574 7.2996 2.887e-13 ***
black -0.13971986 0.04875583 -2.8657 0.0041608 **
hisp 0.03291060 0.04518652 0.7283 0.4664130
exper 0.11159180 0.00827684 13.4824 < 2.2e-16 ***
expersq -0.00398352 0.00059813 -6.6600 2.738e-11 ***
married 0.06086456 0.01683741 3.6148 0.0003005 ***
union 0.10649729 0.01784304 5.9686 2.394e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 653.17
Residual Sum of Squares: 4357.1
R-Squared: 0.17788
Adj. R-Squared: 0.17656
Chisq: 872.42 on 7 DF, p-value: < 2.22e-16
# educ, married and exper are endogenous
HT1 <- plm(lwage ~ educ + black + hisp +
exper + expersq + married + union |
black + hisp + expersq + union |
educ + married + exper,
data = pwagepan,
model = "random",
random.method = "ht",
)
# educ, married and exper are endogenous, Amemiya and MaCurdy (1986) instruments
HT1AM <- plm(lwage ~ educ + black + hisp +
exper + expersq + married + union |
black + hisp + expersq + union |
educ + married + exper,
data = pwagepan,
model = "random",
random.method = "ht",
inst.method = "am"
)
Code
modelsummary(list("Pooled"=Pooled_, "RE"=RE_, "FE"=FE_, "CRE"=CRE , "HT"=HT, "HT_1"=HT1, "HT_1_AM"=HT1AM),
vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
stars = TRUE,
fmt = 3,
statistic = c('statistic'),
gof_omit = "A|L|B|F|RM",
coef_omit = "Inter|year",
align = "cddddddd",
output = "flextable"
)
| | Pooled | RE | FE | CRE | HT | HT_1 | HT_1_AM |
|---|---|---|---|---|---|---|---|
| educ | 0.099*** | 0.101*** |  | 0.095*** | 0.114*** | 0.101*** | 0.111*** |
|  | (10.812) | (11.357) |  | (8.676) | (7.300) | (11.147) | (7.102) |
| black | -0.144** | -0.144** |  | -0.139** | -0.140** | -0.144** | -0.144** |
|  | (-2.875) | (-3.027) |  | (-2.840) | (-2.866) | (-2.969) | (-2.959) |
| hisp | 0.016 | 0.020 |  | 0.005 | 0.033 | 0.020 | 0.029 |
|  | (0.401) | (0.473) |  | (0.112) | (0.728) | (0.465) | (0.645) |
| exper | 0.089*** | 0.112*** | 0.117*** | 0.117*** | 0.112*** | 0.112*** | 0.113*** |
|  | (7.179) | (13.572) | (13.878) | (13.878) | (13.482) | (13.622) | (13.621) |
| expersq | -0.003** | -0.004*** | -0.004*** | -0.004*** | -0.004*** | -0.004*** | -0.004*** |
|  | (-3.278) | (-6.875) | (-7.106) | (-7.106) | (-6.660) | (-6.904) | (-6.786) |
| married | 0.108*** | 0.063*** | 0.045* | 0.045* | 0.061*** | 0.062*** | 0.046* |
|  | (4.135) | (3.744) | (2.474) | (2.474) | (3.615) | (3.703) | (2.518) |
| union | 0.180*** | 0.107*** | 0.082*** | 0.082*** | 0.106*** | 0.106*** | 0.107*** |
|  | (6.540) | (6.022) | (4.255) | (4.255) | (5.969) | (5.966) | (5.997) |
| B_exper |  |  |  | -0.167** |  |  |  |
|  |  |  |  | (-3.278) |  |  |  |
| B_expersq |  |  |  | 0.009** |  |  |  |
|  |  |  |  | (2.884) |  |  |  |
| B_married |  |  |  | 0.098* |  |  |  |
|  |  |  |  | (2.182) |  |  |  |
| B_union |  |  |  | 0.189*** |  |  |  |
|  |  |  |  | (3.742) |  |  |  |
| Num.Obs. | 4360 | 4360 | 4360 | 4360 | 4360 | 4360 | 4360 |
| R2 | 0.187 | 0.178 | 0.178 | 0.183 | 0.178 | 0.178 | 0.178 |
| Std.Errors | Custom | IID | IID | IID | IID | IID | IID |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
In this text we always assume that there are no error correlations across units i. Prominent examples for correlations across units are geographic vicinity or some other sorts of clusters. Typically, there is a natural ordering of the units or clusters in this case.↩︎
In this text, we are always assuming a balanced panel, which means that every unit i has the same number of observations over time – no missing observations. Unbalanced panels are somewhat harder to treat, but don’t lead to major different insights.↩︎
The block-diagonal matrix \mathbf D equals (\mathbf I_n \otimes \mathbf i). Thereby the Kronecker product \mathbf A \otimes \mathbf B means that every element of \mathbf A is multiplied by the matrix \mathbf B.↩︎
n-consistent means consistency if n \rightarrow \infty and T is constant.↩︎
T-consistent means consistency if T \rightarrow \infty and n is constant.↩︎
As the DIF estimator leads to autocorrelated errors \Delta u_{it} in this case, i.e. to an MA(1) process, see Equation 9.15.↩︎
See the slides: Multiple Regression, or Wooldridge (2019) chapter 3.↩︎
The presentation of this example follows very narrowly Croissant and Millo (2018), pp. 3-5.↩︎
See below for a very detailed proof or Wooldridge (2010), Chapt. 10, p. 326, or Hansen (2022), Chapt. 17.15, p. 614 ff.↩︎
The variable `exper` could be a special case if we had estimated the equation with a full set of time-dummy variables (variable `year`). This variable is time-varying, but in a special manner: experience increases at a constant rate for every individual. And if we had estimated the model with a full set of time-dummy variables (variable `year`), the variable `exper` would have become a constant variable (FWL theorem).↩︎
See the short and concise treatment in Cameron and Trivedi (2010).↩︎
Strict exogenous for the RE and FE transformations.↩︎
These IV-procedures usually pose no special difficulties, see Green (2017), Chapt. 11.8.1. For a general treatment of IV estimators see Section 7.5.2.↩︎
We have: E((x_{it}-\bar x_i)a_i) = E(x_{it} a_i-\bar x_i a_i) = E_a[E(x_{it} a_i-\bar x_i a_i|a_i)] = E_a[E (x_{it} a_i | a_i) - E( \bar x_i a_i|a_i) ] = E_a[a_i E (x_{it} | a_i) - a_i E( \bar x_i |a_i) ] = E_a[a_i E (x_{it} | a_i) - a_i E( x_i |a_i) ] = E_a[0] = 0.↩︎
Note that quasi time-demeaning of time-invariant variables like \mathbf Z or \mathbf a leads to (1-\theta )\mathbf Z. But the factor (1-\theta) cancels in a balanced data set and therefore \mathbf {\breve Z} would be equal to \mathbf Z in this case.↩︎
For details see Green (2017), Chapt. 11.8.2, or Baltagi and Baltagi (2005), p. 126 ff.↩︎
For unbalanced panels, there are minor changes as \theta is different for the units i in this case, see Hansen (2022), p. 633.↩︎
Projecting into the space spanned by the instruments \mathbf W.↩︎