Mainly based on Wooldridge (2019), Chapters 13 and 14, Croissant and Millo (2018), and Greene (2017)

9.1 Topics of this chapter

  • Basic Concepts and Motivation
  • Pooled Cross-Section Models
    • Cluster-Robust Standard Errors
  • Panel Data Models
    • Difference Estimator
    • Dummy Variable Estimator
    • Fixed Effects Estimator – Within Estimator
    • Between Estimator
    • Random Effects Estimator
    • Correlated Random Effects Estimator (CRE)
    • Hausman-Taylor IV Estimator
  • Hausman Test

9.2 Basic Concepts and Motivation

Pooled cross-section and panel data exhibit two dimensions

  • a time-dimension

  • a cross-section-dimension

Examples:

  • a household (or firm) survey of 10,000 households (firms), collected every year for a total of five years (wide panel)

  • national accounts data for 25 countries over 150 quarters (long panel)

The principal advantages of this special data structure are

  • more observations and more variation in the data due to less aggregation

  • the possibility to control for unobserved individual heterogeneity, thereby avoiding omitted variable bias

  • testing for the effects of policy measures (DiD-estimator, see Section 5.5) or testing for structural breaks (Chow Test, see Section 5.6)

Possible disadvantages are

  • a more complicated error term, especially certain types of auto-correlation and heteroscedasticity

  • we most often have to assume that the parameters are constant across cross-section units as well as over time


9.2.1 Empirical example: Loading a panel data set of wages and explaining variables

# loading basic and useful packages
rm(list=ls()) # clear all
library(AER)  # AER is a general econometric package 
library(plm)  # Panel data package
library(modelsummary) # fancy tables
library(wooldridge)  # the wooldridge package includes all data sets used in the textbook
data(wagepan)  # loading data set wagepan of the `wooldridge`  package 
wagepan[ 1:20, c("nr", "year", "lwage", "educ", "married", "union", "hisp", "black", "hours") ] 
         nr year      lwage educ married union hisp black hours
      1  13 1980  1.1975402   14       0     0    0     0  2672
      2  13 1981  1.8530600   14       0     1    0     0  2320
      3  13 1982  1.3444617   14       0     0    0     0  2940
      4  13 1983  1.4332134   14       0     0    0     0  2960
      5  13 1984  1.5681251   14       0     0    0     0  3071
      6  13 1985  1.6998910   14       0     0    0     0  2864
      7  13 1986 -0.7202626   14       0     0    0     0  2994
      8  13 1987  1.6691879   14       0     0    0     0  2640
      9  17 1980  1.6759624   13       0     0    0     0  2484
      10 17 1981  1.5183982   13       0     0    0     0  2804
      11 17 1982  1.5591905   13       0     0    0     0  2530
      12 17 1983  1.7254101   13       0     0    0     0  2340
      13 17 1984  1.6220223   13       0     0    0     0  2486
      14 17 1985  1.6085882   13       0     0    0     0  2164
      15 17 1986  1.5723854   13       0     0    0     0  2749
      16 17 1987  1.8203338   13       0     0    0     0  2476
      17 18 1980  1.5159627   12       1     0    0     0  2332
      18 18 1981  1.7353791   12       1     0    0     0  2116
      19 18 1982  1.6317437   12       1     0    0     0  2500
      20 18 1983  1.9982288   12       1     0    0     0  2474
# Note the two index-variables (identifiers) nr (for i) and year (for t) 
Code
# A more fancy table type, which is interactive in Browsers
library(gt)
gt(head( round( wagepan[ ,], 3), 10 ) )
(Rendered gt table: the first ten rows of the wagepan data set with all 44 variables, rounded to three digits; not reproduced here.)

9.3 Pooled Cross-Section Models

9.3.1 Basic Model Structure

The first index, i, represents the particular cross-section unit (firms, households, counties, etc.) and the second one, t, is the time index. Our model is:

y_{i t} = \mathbf{x}_{i t} \boldsymbol{\beta} + e_{i t} \tag{9.1}

\text{with} \ \ i=1, \ldots, n, \ \ t=1, \ldots, T, \ \ N = n \times T

Stacking over time t for a fixed i leads to the data vectors and matrices of the i^{th} cross-section unit:

\underset{T \times 1} {\mathbf{y}_i} = \left[\begin{array}{c} y_{i 1} \\ y_{i 2} \\ \vdots \\ y_{i T} \end{array}\right] \quad \underset{T \times (k+1)} {\mathbf{X}_i}=\left[\begin{array}{ccccc} 1 &x_{i 1}^1 & x_{i 1}^2 & \cdots & x_{i 1}^k \\ 1 & x_{i 2}^1 & x_{i 2}^2 & \cdots & x_{i 2}^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{i T}^1 & x_{i T}^2 & \cdots & x_{i T}^k \end{array}\right] \quad \underset{T \times 1} {\mathbf{e}_{i}}=\left[\begin{array}{c} e_{i 1} \\ e_{i 2} \\ \vdots \\ e_{i T} \end{array}\right] \tag{9.2}

Stacking over units i we get:

\underset{N \times 1}{\mathbf{y}}=\left[\begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_n \end{array}\right] \quad \underset{N \times (k+1) } {\mathbf{X}}=\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right] \quad \underset{N \times 1}{\mathbf{e}}= \left[\begin{array}{c} \mathbf{e}_1 \\ \mathbf{e}_2 \\ \vdots \\ \mathbf{e}_n \end{array}\right] \ \ \ \Rightarrow \tag{9.3}

\mathbf{y}=\mathbf{X} \boldsymbol{\beta}+\mathbf{e} \tag{9.4}

This looks like a common regression model.


9.3.2 Estimation with Pooled OLS

If assumption MLR.4, i.e., E(e_{it} \mid \mathbf X)=0, is fulfilled, the model can be estimated consistently and without bias by OLS \rightarrow Pooled OLS, i.e., by simply applying OLS to the stacked model:

\mathbf{y}=\mathbf{X} \boldsymbol{\beta}+\mathbf{e} \ \ \ \Rightarrow \ \ \ \hat{\boldsymbol{\beta}} = (\mathbf{X'}\mathbf{X})^{-1}\mathbf{X'}\mathbf{y} \tag{9.5}

And with the addition of MLR.5 – \operatorname {Var}(\boldsymbol e \mid \mathbf X)=\sigma^2_e \mathbf I – we have:

{\operatorname {Var}(\hat{\boldsymbol{\beta}})} = {{\sigma}}_e^2 (\mathbf{X'}\mathbf{X})^{-1} \tag{9.6}

However, as the error term e_{it} in such a model structure is regularly not iid (MLR.5 is violated), Pooled OLS estimation faces the following problems:

  1. It is often not efficient

  2. The estimated covariance matrix of \hat{\boldsymbol \beta} is biased and inconsistent in this case, i.e., simply false. In particular, this affects all statistical tests

  3. Assumption MLR.4 is often violated as well, i.e., E(e_{it} \mid \mathbf X) depends on \mathbf X, and then Pooled OLS is biased and inconsistent

All three problems, especially the last one, can be addressed with appropriate procedures which will be discussed in the following. In doing so, typically the special panel structure is exploited.

The special thing about panels is that every cross-section in the sample contains the very same units i (households, firms, counties, countries, etc.)


Why is the error term regularly not iid in this setting?

We can think of the following error structure: We assume that the whole error process e_{it} is the sum of three independent (and unobserved) components – this is called the Error Component Model:

  • An idiosyncratic error u_{it}, which is varying across both dimensions, but meets assumption MLR.5 (iid)

  • A unit specific error a_i, which is different for each unit i but constant over time

  • A time specific error v_t, which is varying over time t but hits all cross-section units in the same way

e_{it} = v_t + a_i+ u_{it} \tag{9.7}

We will often ignore the time specific \, v_t, as these can easily be handled by the inclusion of time-dummy variables, especially in short panels. So, we actually will proceed from

e_{it} = a_i + u_{it} \tag{9.8}

The most interesting error component is the unit specific error a_i, because this reflects unobserved heterogeneity of the units, for instance ability of workers or general propensity for crime of cities.

  • As this error is constant over time for each i, a_i is heavily auto-correlated along the time-dimension

  • Furthermore, a correlation of a_i with some explanatory variables is possible (thereby violating MLR.4) and precisely this aspect is in the center of interest of many panel data methods. Think about our standard example: Correlation of ability and education in a wage equation

Due to the autocorrelation in a_i, e_{it} is not iid (MLR.5 is violated) and therefore the true formula for the covariance matrix of \hat {\boldsymbol \beta} is not Equation 9.6 but

\operatorname{Var}(\hat{\boldsymbol\beta}) = (\mathbf X' \mathbf X)^{-1} (\mathbf X' \boldsymbol{\Omega}_e\mathbf X) (\mathbf X' \mathbf X)^{-1} \tag{9.9}


HAC or HC estimate of the covariance matrix

If we assume

  • u_{it} to be iid

  • a_i not to be correlated across units i, \operatorname{Cov}\left(a_{i}, a_{j}\right)=0 \ \text{ for } \ i \neq j (why should they be correlated? maybe in a spatial sense \rightarrow Spatial Econometrics 1)

we have a block-diagonal (cluster) structure of \boldsymbol{\Omega}_e:

\boldsymbol{\Omega}_e=\left[\begin{array}{cccc} \boldsymbol\Sigma_1 & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \boldsymbol\Sigma_2 & \cdots & \mathbf 0 \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf 0 & \mathbf 0 & \cdots & \boldsymbol\Sigma_n \end{array}\right] \tag{9.10}

If we further assume homoscedasticity of a_i and that the idiosyncratic errors u_{it} are uncorrelated with the unit specific a_i, Equation 9.8 implies

\operatorname {Var}(e_{it})=\sigma_a^2 + \sigma_u^2

\operatorname{Cov}(e_{it},e_{is}) = \operatorname{Cov}(u_{it},u_{is}) + \operatorname{Cov}(a_{i},a_{i}) = 0 + E(a_{i}^2) = \sigma_a^2, \quad t \neq s

It follows for each unit i

\boldsymbol{\Sigma}_i = \sigma_{u}^2 \mathbf{I}_{_T}+\sigma_a^2 \mathbf{i i'} = \left[\begin{array}{cccc} (\sigma_a^2+\sigma_{u}^2) & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & (\sigma_a^2+\sigma_{u}^2) & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & (\sigma_a^2+\sigma_{u}^2) \end{array}\right] \tag{9.11}

This is a T \times T matrix with (\sigma_{a}^2+\sigma_u^2) on the principal diagonal and \sigma_a^2 on the off-diagonal elements. (\mathbf i denotes a T \times 1 vector of ones, and the outer product \mathbf{i i'} is a T \times T matrix of ones.)
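The equi-correlation structure in Equation 9.11 can be illustrated with a small simulation. A minimal sketch (assumptions for illustration only: n = 500 units, T = 8 periods, \sigma_a = \sigma_u = 1, so that the implied correlation between any two periods of the same unit is 0.5):

Code
set.seed(1)
n <- 500; T <- 8
a <- rep(rnorm(n), each = T)  # unit specific error, constant over time
u <- rnorm(n * T)             # idiosyncratic iid error
e <- a + u                    # composite error as in Equation 9.8

# correlation of e between period 1 and period 2 of the same units:
# should be roughly sigma_a^2 / (sigma_a^2 + sigma_u^2) = 0.5
cor(e[seq(1, n * T, by = T)], e[seq(2, n * T, by = T)])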

For a cluster-robust covariance estimate, the middle term of Equation 9.9 is estimated by plugging in the empirical counterparts of the \boldsymbol\Sigma_i:

\widehat {(\mathbf X'\boldsymbol{\Omega}_e \mathbf X)} \ = \ \sum_{i=1}^n \mathbf X_i' {\boldsymbol {\hat \Sigma}_i} \mathbf X_i \ \ \text{ with } \ \ {\boldsymbol {\hat \Sigma}_i} \ = \left[\begin{array}{cccc} \hat e_{i1}^2 & \hat e_{i1}\hat e_{i2} & \cdots & \hat e_{i1}\hat e_{iT} \\ \hat e_{i2}\hat e_{i1} & \hat e_{i2}^2 & \cdots & \hat e_{i2}\hat e_{iT}\\ \vdots & \vdots & \ddots & \vdots \\ \hat e_{iT}\hat e_{i1} & \hat e_{iT}\hat e_{i2} & \cdots & \hat e_{iT}^2 \end{array}\right] \tag{9.12}

  1. This approach is called the clustered HC estimator of the covariance matrix. For plm objects it is obtained with the vcov = vcovHC option (see the examples below). Note that heteroscedastic errors a_i are allowed with this formulation

  2. If we consider \boldsymbol{\Omega}_e as a whole, without cluster structure, we can calculate the usual HAC estimator – we have to fill the zeros in Equation 9.10 with (\hat e_{it} \hat e_{js}) (but for consistency reasons the order of the included cross-correlations has to be cut off). For lm objects this is done with the vcov = vcovHAC option. With the option vcov = vcovHC only heteroscedasticity is handled, which makes a big difference

  3. And finally, one can try to estimate \sigma_a^2 and \sigma_u^2 from Equation 9.11 directly. Later we will see that the so-called random effects estimator does exactly that


9.3.3 Example: Pooled OLS of a wage equation

With the standard procedure lm()

ols_lm <- lm(lwage ~ educ + black + hisp + exper + I(exper^2) + 
               married + union + factor(year), 
             data=wagepan)
coeftest(ols_lm)
       
       t test of coefficients:
       
                           Estimate  Std. Error t value  Pr(>|t|)    
       (Intercept)       0.09205578  0.07827010  1.1761 0.2396076    
       educ              0.09134979  0.00523738 17.4419 < 2.2e-16 ***
       black            -0.13923421  0.02357956 -5.9049 3.799e-09 ***
       hisp              0.01601951  0.02079714  0.7703 0.4411788    
       exper             0.06723450  0.01369484  4.9095 9.467e-07 ***
       I(exper^2)       -0.00241170  0.00081995 -2.9413 0.0032860 ** 
       married           0.10825295  0.01568942  6.8997 5.962e-12 ***
       union             0.18246128  0.01715677 10.6349 < 2.2e-16 ***
       factor(year)1981  0.05831999  0.03035363  1.9214 0.0547528 .  
       factor(year)1982  0.06277442  0.03321407  1.8900 0.0588251 .  
       factor(year)1983  0.06201174  0.03666013  1.6915 0.0908072 .  
       factor(year)1984  0.09046719  0.04009071  2.2566 0.0240849 *  
       factor(year)1985  0.10924630  0.04335248  2.5200 0.0117725 *  
       factor(year)1986  0.14195959  0.04642297  3.0580 0.0022421 ** 
       factor(year)1987  0.17383343  0.04943305  3.5165 0.0004417 ***
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same with HC-robust se
# coeftest(ols_lm, vcov = vcovHC)

# Same with HAC-robust se
# coeftest(ols_lm, vcov = vcovHAC)
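For illustration, the clustered covariance matrix of Equation 9.9 with the blocks of Equation 9.12 can also be computed by hand. A minimal sketch (assumptions: the ols_lm object from above and clustering on the person identifier nr; no small-sample correction is applied, so the standard errors will differ slightly from the vcovHC defaults):

Code
X  <- model.matrix(ols_lm)   # stacked regressor matrix
e  <- resid(ols_lm)          # pooled OLS residuals
id <- wagepan$nr             # cluster identifier (unit i)

# "meat": sum over units of X_i' e_i e_i' X_i, see Equation 9.12
meat <- Reduce(`+`, lapply(split(seq_along(e), id), function(idx) {
  s <- crossprod(X[idx, , drop = FALSE], e[idx])   # X_i' e_i
  tcrossprod(s)                                    # (X_i' e_i)(X_i' e_i)'
}))
bread <- solve(crossprod(X))                       # (X'X)^{-1}
V_cluster <- bread %*% meat %*% bread              # sandwich of Equation 9.9
sqrt(diag(V_cluster))                              # clustered standard errors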

With plm() from the plm package

ols_plm <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) + 
                 married + union  + year,
               data=wagepan, 
               index = c("nr", "year"), model = "pooling")
coeftest(ols_plm)
       
       t test of coefficients:
       
                      Estimate  Std. Error t value  Pr(>|t|)    
       (Intercept)  0.09205578  0.07827010  1.1761 0.2396076    
       educ         0.09134979  0.00523738 17.4419 < 2.2e-16 ***
       black       -0.13923421  0.02357956 -5.9049 3.799e-09 ***
       hisp         0.01601951  0.02079714  0.7703 0.4411788    
       exper        0.06723450  0.01369484  4.9095 9.467e-07 ***
       I(exper^2)  -0.00241170  0.00081995 -2.9413 0.0032860 ** 
       married      0.10825295  0.01568942  6.8997 5.962e-12 ***
       union        0.18246128  0.01715677 10.6349 < 2.2e-16 ***
       year1981     0.05831999  0.03035363  1.9214 0.0547528 .  
       year1982     0.06277442  0.03321407  1.8900 0.0588251 .  
       year1983     0.06201174  0.03666013  1.6915 0.0908072 .  
       year1984     0.09046719  0.04009071  2.2566 0.0240849 *  
       year1985     0.10924630  0.04335248  2.5200 0.0117725 *  
       year1986     0.14195959  0.04642297  3.0580 0.0022421 ** 
       year1987     0.17383343  0.04943305  3.5165 0.0004417 ***
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Same with (clustered) robust se

# vcovHC is object oriented: applied to a plm object it uses a different method (clustering by unit)
# Applied to an lm object, it only corrects for heteroscedasticity (treats only the principal diagonal)

# coeftest(ols_plm, vcov = vcovHC)

Summarizing the results with the modelsummary package

Code
modelsummary(list(" lm() "=ols_lm, "lm()_HC "=ols_lm, "lm()_HAC "=ols_lm, "plm() "=ols_plm, "plm()_HC "=ols_plm),
            vcov = c("classical", vcovHC, vcovHAC, "classical", vcovHC),
            stars = TRUE, 
            fmt = 3,
            statistic = c('std.error', 'statistic'),
            gof_omit = "A|L|B|F|RM",
            coef_omit = "Inter|year|hisp",
            align = "lddddd",
            output = "flextable"
            )
Table 9.1: Pooled OLS results for wage equation with different packages and different vcov options (se and t-statistics in brackets)

              lm()       lm()_HC    lm()_HAC   plm()      plm()_HC
educ           0.091***   0.091***   0.091***   0.091***   0.091***
              (0.005)    (0.005)    (0.010)    (0.005)    (0.011)
             (17.442)   (17.230)    (8.892)   (17.442)    (8.264)
black         -0.139***  -0.139***  -0.139**   -0.139***  -0.139**
              (0.024)    (0.025)    (0.052)    (0.024)    (0.050)
             (-5.905)   (-5.658)   (-2.704)   (-5.905)   (-2.763)
exper          0.067***   0.067***   0.067***   0.067***   0.067***
              (0.014)    (0.013)    (0.019)    (0.014)    (0.020)
              (4.909)    (5.035)    (3.523)    (4.909)    (3.440)
I(exper^2)    -0.002**   -0.002**   -0.002*    -0.002**   -0.002*
              (0.001)    (0.001)    (0.001)    (0.001)    (0.001)
             (-2.941)   (-3.110)   (-2.285)   (-2.941)   (-2.358)
married        0.108***   0.108***   0.108***   0.108***   0.108***
              (0.016)    (0.015)    (0.026)    (0.016)    (0.026)
              (6.900)    (7.087)    (4.137)    (6.900)    (4.169)
union          0.182***   0.182***   0.182***   0.182***   0.182***
              (0.017)    (0.016)    (0.027)    (0.017)    (0.027)
             (10.635)   (11.183)    (6.863)   (10.635)    (6.665)
Num.Obs.      4360       4360       4360       4360       4360
R2             0.189      0.189      0.189      0.189      0.189
Std.Errors     IID        Custom     Custom     IID        Custom

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001


With the stargazer package
Code
# calculating robust se, etc., using coeftest() with vcov. option 
robust_table <- coeftest(ols_plm, vcov. = vcovHC ) 
se_robust <- robust_table[,2]  # se
t_robust <-  robust_table[,3]  # t-values
p_robust <-  robust_table[,4]  # prob-values

library(stargazer) # loading the package

stargazer(ols_lm, ols_plm, ols_plm,  # function call with: model names
          type = "text",  # important option: output text, not LaTex code
          digits = 5,
          report = "vcs*",  # output string: v=variables, c=coeffs, s=se, t=t-vals, p=p-vals, "vcs*" default
          se = list(NULL, NULL, se_robust),  # for the third model, we replace stand. se with robust
          t  = list(NULL, NULL,  t_robust),   # for the third model, we replace stand. t with robust
          p  = list(NULL, NULL,  p_robust), # for the third model, we replace stand. p with robust
          model.names=FALSE,  # no caption for OLS and panel linear
          model.numbers = FALSE,  # no (1) (2) (3)
          column.labels = c("pOLS\\_lm() ", " pOLS\\_plm() ", " pOLS\\_plm()\\_HCse "), 
          dep.var.labels.include = TRUE,  # Name of dep. variable
          dep.var.caption = "",  # no dep. var. caption: dependent variable =
          keep = c("educ", "black", "exper", "I(exper2)", "married", "union" ),  # evident
          keep.stat = c("n", "rsq")  )  # displayed model statistics 
With the texreg package and screenreg procedure
Code
# calculating robust se, etc., using coeftest() with vcov. option 
robust_table <- coeftest(ols_plm, vcov. = vcovHC ) 
se_robust <- robust_table[,2]  # se
t_robust <-  robust_table[,3]  # t-values (not easily displayed in texreg; one would have to swap them in for the se)
p_robust <-  robust_table[,4]  # prob-values

library(texreg)  # loading the package

# if you want output as LaTex code, use texreg()
screenreg( list(ols_lm, ols_plm, ols_plm),  # function call with: list(model names), main diff. to stargazer!
           digits = 5,
           override.se = list(0, 0, se_robust),     # for the third model, we replace stand. se with robust
           override.pvalues = list(0, 0, p_robust),  # for the third model, we replace stand. p with robust
           custom.model.names = c("pOLS_lm() ", " pOLS_plm() ", " pOLS_plm()_HCse"),  # custom model names
           booktabs = FALSE,  # some nice formatting for LaTex (for LaTex, not used with screenreg)
           # bold = 0.05,  # significant coeffs at 0,05 sig.level in bold (for LaTex, not used with screenreg)
           omit.coef = "Inter|hisp|year",  # string: excluded variables from table, here intercept, hisp, year
           include.adjrs = FALSE,  # which model statistic should be excluded, here adjusted R2
           caption = FALSE)  # no caption (for LaTex only?)

9.4 Basic Panel Data Models

We have identified two major problems with pooled data

  • The error of the model, Equation 9.4, is typically heavily auto-correlated along the time-dimension, especially due to a_i from Equation 9.8, which is constant over time for each unit i

    • This problem could be in part addressed with clustered HC estimates of the covariance matrix. However, the efficiency loss remains
  • Furthermore, a correlation of a_i with some explanatory variables is possible (violating MLR.4)

    • Think about our standard example: Correlation of unobserved ability with education in a wage equation

    • The error component a_i mainly represents unobserved heterogeneity across the units and precisely this aspect is in the center of interest of panel data methods

Panel data techniques generally address one broad issue: How to model unobserved heterogeneity, so that we can avoid biased and/or inefficient estimates.

Consider the following base regression model with data from one cross section at time t.

y_{it} = \beta_0 + \beta_1 x_{it} + a_{i} + u_{it} \tag{9.13}

  • We can think of our standard example, where

    • y_{it} could be the earned wage

    • x_{it} could be education, whose effect on wages we want to estimate

    • a_{i} could represent other important factors for earned wage, like general ability, which is constant over time (therefore a_i and not a_{it})

      • But a_{i} is generally unobservable for the econometrician
    • u_{it} is the error term representing other unobservable components and is assumed to be uncorrelated with x (MLR.4)

The basic problem is that unobserved a_i is often correlated with x_{it} and we actually can only estimate

y_{it} = \beta_0 + \beta_1 x_{it} + \underbrace {(a_{i} + u_{it})}_{e_{it}} \tag{9.14}

The composite error e_{it} of this regression has exactly the structure of Equation 9.8 and due to its possible correlation with x_{it} via a_i, the estimate \hat \beta_1 will be biased and inconsistent in this case – omitted variable bias


9.4.1 The Difference (DIF) Estimator

So how can the panel data structure be helpful for this estimation problem?

  • Suppose we have not only one but several cross sections, for example for two years, year t-1 and year t, observed for the very same units i \rightarrow panel data. Then we have two relationships for the two periods

y_{i,t-1} = \beta_0 + \beta_1 x_{i,t-1} + a_{i} + u_{i,t-1}

y_{i,t} = \beta_0 + \beta_1 x_{i,t} + a_{i} + u_{i,t}

  • We can take the first difference of the variables (we subtract the equation for period t-1 from the equation for period t) and get, assuming constant \betas

(y_{i,t} - y_{i,t-1}) = (\beta_0 - \beta_0) + \beta_1 ( x_{i,t} - x_{i,t-1} ) + (a_{i}- a_{i}) + (u_{i,t} - u_{i,t-1}) \ \Rightarrow

\Delta y_{i,t} = \beta_1 \Delta x_{i,t} + \Delta u_{i,t} \tag{9.15}

The estimation Equation 9.15 is called Difference (DIF) Estimator for panel data

What was the trick?:

  • We assumed that the variable a_i is constant over time, and is only varying across the units i. Hence, taking the first difference in the data yields (a_{i}- a_{i})=0, i.e., the problematic unobservable variable a_i simply dropped out

    • Thus, Equation 9.15, i.e. \beta_1, can be consistently estimated with OLS; if we have more than two periods, by pooled OLS
  • General ability in a wage equation is a good example for such an interpretation of a_i – it varies over units (persons) but doesn’t change much over time

So, we learned that

if we have an error structure like Equation 9.8 such as in Equation 9.14, with a unit specific variable a_i representing unobserved unit specific heterogeneity

and this variable is further probably correlated with the explanatory variables (violating assumption MLR.4)

we can take the first difference in the data (unit by unit), thereby simply removing a_i and the problem of correlation with the other explanatory variables

Final remark: The error term in Equation 9.15, \Delta u_{it}, is probably heavily autocorrelated (even if u_{it} is iid, \Delta u_{it} follows an MA(1) process); therefore (pooled) OLS is not efficient and we will further need an HAC estimator of the covariance matrix.
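In R, the difference estimator is directly available in the plm package via model = "fd" (first differences). A minimal sketch with the wagepan data from above (an illustrative specification: only time-varying regressors are included, since time-constant variables like educ drop out after differencing; the robust option accounts for the autocorrelation in \Delta u_{it}):

Code
dif_wage <- plm(lwage ~ married + union + I(exper^2), data = wagepan,
                index = c("nr", "year"), model = "fd")
coeftest(dif_wage, vcov = vcovHC)  # robust se, as Delta u_it is autocorrelated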


9.4.2 The Dummy Variable Estimator – LSDV Methods

Another possibility to account for time-invariant individual components is to explicitly introduce them into the model specification in the form of individual intercepts

  • These individual intercepts capture the units’ heterogeneity. The time dimension of panel data in fact allows estimating the a_i as additional (fixed) parameters, together with the \betas, the other parameters of interest

Our starting point is Equation 9.13, which we rewrite here for convenience (but leaving out the general intercept \beta_0 and allowing for k explanatory variables, so \mathbf x_{it} is a row vector with k variables).

y_{it} = \mathbf x_{it} \boldsymbol \beta + a_{i} + u_{it} \tag{9.16}

As we did in Equation 9.2, we are stacking this equation over time. This yields an equation for every unit i with T observations: 2

\mathbf y_{i} = \mathbf x_{i} \boldsymbol \beta + \mathbf i \, a_{i} + \mathbf u_{i} \tag{9.17}

Compared to Equation 9.2, the new element is the vector \mathbf i \, a_i. Thereby, \mathbf i is a column vector of ones with length T, and

\underset{T \times 1} {\mathbf{i} \, a_{i}} \ = \ \left[\begin{array}{c} 1 \\ 1 \\ \vdots \\ 1 \end{array}\right] a_i \ = \ \left[\begin{array}{c} a_i \\ a_i \\ \vdots \\ a_i \end{array}\right]

  • The important point is that {a}_{i} is constant over time for each unit i. So, we can interpret {\mathbf{i} \, a_{i}} as a unit-specific intercept

  • Clearly, the elements of \mathbf{x}_{i} and \mathbf{u}_{i} are generally allowed to vary over time!

In a final step, we are stacking Equation 9.17 over the n units i to get the equivalent of Equation 9.4

  • By using the block-diagonal matrix \mathbf D, and not simply stacking the \mathbf i, we allow for different intercepts across the units

\left[\begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_n \end{array}\right] = \underbrace {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]}_{\underset {(T \times n) \times k} {\mathbf X}} \left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{array}\right] + \underbrace {\left[\begin{array}{cccc} \mathbf{i} & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \mathbf{i} & \cdots & \mathbf 0 \\ & & \vdots & \\ \mathbf 0 & \mathbf 0 & \cdots & \mathbf{i} \end{array}\right]}_{\underset {(T \times n) \times n} {\mathbf D}} \underbrace {\left[\begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_n \end{array}\right]}_{\underset {n \times 1}{ \boldsymbol \alpha}} + \left[\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_n \end{array}\right] \tag{9.18}

which is

\mathbf{y} = \mathbf{X} \boldsymbol{\beta}+ \mathbf{D} \boldsymbol{\alpha}+\boldsymbol{u} \tag{9.19}

Using Equation 9.19, we estimate the coefficient vectors \boldsymbol \beta and \boldsymbol \alpha by pooled OLS.

This estimator is referred to as Least Squares Dummy Variables, or LSDV, Estimator.

  • Note, the columns of matrix \mathbf D are simply unit-specific dummy variables, one dummy variable for each unit i

    • The elements of vector \boldsymbol \alpha are the coefficients of the corresponding dummy variables, representing the individual intercepts

    • Furthermore, the columns of \mathbf D, the unit-specific dummy variables, are mutually orthogonal 3

Similar to the DIF estimator, with the help of the LSDV Estimator, we get rid of the unobservable unit specific errors a_i; in this case, we turned them to coefficients of dummy variables and try to explicitly estimate them.

  • It should be noted that the degrees of freedom for the estimation are now reduced to (N - k - n) because of the extra n parameters of the n dummy variables

  • Moreover, while the vector \hat {\boldsymbol \beta} is N-consistent, the estimates of the individual intercepts \hat a_i are only T-consistent, as they rely only on the time dimension. So, in a panel with large n and small T (wide panel), the individual intercepts \hat a_i are not consistently estimated, although unbiased


Remark: If we have a full set of n dummy variables, the regression equation Equation 9.19 must not contain an intercept, because the n dummy variables sum to a vector of ones \rightarrow perfect collinearity in this case.
Alternatively, you can leave out one dummy variable for one unit which then becomes the “benchmark” unit; the coefficients of the other dummy variables are then measuring the difference to the benchmark unit. (But plm() offers also other options.)
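A minimal sketch of the LSDV estimator with lm() and the wagepan data from above (an illustrative specification with only a few time-varying regressors; factor(nr) generates one dummy variable per person, and the "- 1" removes the common intercept, so every unit gets its own intercept \hat a_i):

Code
lsdv <- lm(lwage ~ married + union + factor(nr) - 1, data = wagepan)
coef(lsdv)[1:2]                                       # the slope coefficients
head(coef(lsdv)[grep("factor", names(coef(lsdv)))])   # the first few estimated a_i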


Summarizing, the LSDV estimator is a very convenient way to handle unobserved unit specific heterogeneity by introducing dummy variables to control for them. However, comparing the LSDV estimator with the DIF estimator from above is quite a subtle task:

  • Both estimators need strictly exogenous explanatory variables to be n-consistent – relevant for wide panels 4

  • If the explanatory variables are only weakly exogenous, both estimators are not n-consistent. LSDV is at least T-consistent (if the idiosyncratic error u_{it} is not autocorrelated) – relevant for long panels.5

    • The reason for the n-inconsistency is that the short run bias, for instance due to an autoregressive term y_{i,t-1}, only diminishes with larger T for the LSDV estimator; a larger n doesn’t help because the short run bias is present in every unit i

      • Regarding the DIF estimator, the short run bias does not even diminish with larger T
  • If the explanatory variables are strictly exogenous and the idiosyncratic error u_{it} is not autocorrelated (which are our standard assumptions), then the LSDV estimator is more efficient than the DIF estimator6

    • That’s the reason why the LSDV estimator (and the equivalent fixed effects estimator) is generally used more often in applied work. There are at least two qualifications:

      • For T=2 (only two time periods) both estimators are identical (if we have a time-dummy in the LSDV equation)

      • It is sometimes the case that the idiosyncratic error u_{it} is autocorrelated as well (e.g., unobserved variables which are not constant over time are often autocorrelated). The more positively autocorrelated these are, the smaller the advantage of the LSDV estimator. At the extreme, if u_{it} is a random walk, the DIF estimator is best and robust against spurious regressions

  • And lastly, and this is very important in practice, both estimators can be very sensitive to measurement errors in the explanatory variables


9.4.3 The Fixed Effects (FE) Estimator

A big disadvantage of the LSDV estimator arises with very large n: For instance, suppose we have 50,000 units i. In this case, we would have to add 50,000 unit-specific dummy variables to the regression Equation 9.19, which could lead to serious numerical and practical problems.

However, first differencing and unit specific dummy variables are just two of several ways to eliminate the fixed heterogeneity a_i.

An alternative method, which typically works better, is called the fixed effects or within transformation. To see what this method does, reconsider our basic specification of interest, Equation 9.16:

y_{it} = \mathbf x_{it} \boldsymbol \beta + a_{i} + u_{it} \tag{9.20}

Now, for each unit i, we average this equation over time and get

\bar y_{i} = \mathbf {\bar x}_{i} \boldsymbol \beta + a_{i} + \bar u_{i} \tag{9.21}

where for instance \, \bar y_{i} = \frac{1}{T} \sum_{t=1}^T y_{it}, and so on. As a_i is fixed over time, the time average \, \bar a_{i} = \frac{1}{T} \sum_{t=1}^T a_{i} = a_i.

If we subtract Equation 9.21 from Equation 9.20 we arrive at

y_{it} - \bar y_{i} = (\mathbf x_{it} - \mathbf {\bar x}_{i} ) \boldsymbol \beta + 0 + u_{it} - \bar u_{i} \tag{9.22}

or, with a new notation

\ddot y_{it} := (y_{it} - \bar y_{i}), \quad \mathbf {\ddot x}_{it} := (\mathbf x_{it} - \mathbf {\bar x}_{i}) \tag{9.23}

(and likewise for \ddot u_{it}) we have

\ddot y_{it} = \mathbf {\ddot x}_{it} \boldsymbol \beta + \ddot u_{it} \tag{9.24}

Thus, a within transformed variable \ddot x_{it} is simply the deviation of the variable x_{it} from its unit specific mean \bar x_i. Usually, this procedure is called time-demeaning.

In a final step, we are stacking Equation 9.24 over the n units i and get Equation 9.25, the equivalent of Equation 9.4 and Equation 9.19

\mathbf{\ddot y} = \mathbf{\ddot X} \boldsymbol{ \beta} + \boldsymbol{\ddot u} \tag{9.25}

Applying pooled OLS to the within transformed Equation 9.25 results in the Fixed Effects (FE) estimator \boldsymbol {\hat \beta}_{FE}

\hat{\boldsymbol{\beta}}_{F E}=\left(\mathbf{\ddot X}^{\prime} \mathbf{\ddot X} \right)^{-1} \mathbf{\ddot X}^{\prime} \mathbf{\ddot y} \tag{9.26}

with the usual covariance matrix

\operatorname {Var} (\hat{\boldsymbol{\beta}}_{F E}) = \sigma_{\ddot u}^2 \left(\mathbf{\ddot X}^{\prime} \mathbf{\ddot X} \right)^{-1} \tag{9.27}

This whole time-demeaning procedure has become the standard way of estimating models with individual, time-invariant heterogeneity.

  • The name within transformation comes from the fact that estimating Equation 9.25 with pooled OLS uses the time variation in y and \mathbf x within each cross-sectional unit and \boldsymbol {\hat \beta}_{FE} is basically a weighted average of these within effects

The important thing about Equation 9.22 to Equation 9.25 is that the unobserved effect a_i (which was the essential problem because of a possible correlation with x_{it}) has dropped out. This suggests that we actually can estimate Equation 9.25 by pooled OLS, as we did in Equation 9.26.
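A minimal sketch of this time-demeaning "by hand", assuming the wagepan data from above declared as a pdata.frame (the Within() function of plm performs the transformation of Equation 9.23; the slope should match plm(lwage ~ union, model = "within")):

Code
pwage <- pdata.frame(wagepan, index = c("nr", "year"))
y_dd  <- as.numeric(Within(pwage$lwage))   # time-demeaned dependent variable
x_dd  <- as.numeric(Within(pwage$union))   # time-demeaned regressor
coef(lm(y_dd ~ x_dd - 1))                  # within (FE) estimate by pooled OLS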


  • Sometimes, the estimated fixed effects (intercepts) \hat a_i are of interest. This is the case if we want to study the distribution of the \hat a_i across i, or if we want to see whether a particular unit’s \hat a_i is above or below the average

    • These estimates would be directly available from a LSDV regression, but they are rarely reported by FE routines (however, the plm() package has a special function for that purpose)

    • After an FE estimation, the \hat a_i can be computed with the help of the time-averaged Equation 9.21 (note that the expectation of \bar u_{i} is zero in that equation)

\hat a_{i} = \bar y_{i} - \mathbf {\bar x}_{i} \boldsymbol {\hat \beta}_{FE} \tag{9.28}
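In the plm package, the function for this purpose is fixef(). A minimal sketch (assuming a within model fitted to the Fatalities data as prepared in the example of Section 9.4.4 below):

Code
fe_mod <- plm(frate ~ beertax, data = Fatalities,
              index = c("state", "year"), model = "within")
head(fixef(fe_mod))     # estimated unit intercepts a_i, as in Equation 9.28
summary(fixef(fe_mod))  # the same with standard errors and t-tests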


A byproduct of the time-demeaning procedure is the so-called between (BTW) estimator. This is obtained by OLS estimation of the time-averaged Equation 9.21 (with an additional intercept)

  • Thereby, the time averages \bar y_{i} and \mathbf {\bar x}_{i} of the units i are used – this transformation is called the between transformation – to run a regression on these data, which is now basically a cross-sectional regression

    • Note, there is no special treatment for a_i and thus a_i remains a part of the error term, similar to our base equation, Equation 9.14 (remember, a_i is unobservable). Hence, if a_i is correlated with \mathbf {\bar x}_{i} the BTW estimator is biased and inconsistent

    • On the other hand, if we think a_i is uncorrelated with \mathbf {\bar x}_{i}, it is better to use the random effects estimator, which we review in a later Section. The reason is that the BTW estimator uses only cross-section information - therefore the name BTW estimator - and ignores the information on how the variables change over time - this important information is averaged out by the between transformation

    • However, a possible bias through measurement errors is mitigated (averaged out) by the between transformation

Figure 9.1 shows the effects of the between and within transformations on the data; this example is about traffic fatality rates in different US states – our units i – and different states have different colors:

Code
library(AER)
data("Fatalities")
library(ggplot2)

Fatalities <- pdata.frame(Fatalities,  index= c("state", "year")) # pdata.frame
Fatalities1 <- Fatalities[1:77,] # only the first 77 elements = 11 states (units)
Fatalities1$frate <- with (Fatalities1, fatal / pop * 10000) # calculating fatalities rate

Bfrate = Between(Fatalities1$frate) # Between transformation
Wfrate = Within(Fatalities1$frate)  # Within transformation

p1 <- ggplot( Fatalities1 ) +
  geom_point( aes(y=1:length(frate), x=frate, color = state ), 
              size=2.2, show.legend = FALSE ) +
  labs(y="Time") + 
  scale_y_reverse() + 
  theme_bw()

p2 <- ggplot( Fatalities1 ) +
  geom_point( aes(y=1:length(frate), x=Bfrate, color = state ), 
              size=2.2, show.legend = FALSE ) +
  xlim(1.4, 3.1) + 
  labs(y="Time") + 
  scale_y_reverse() + 
  theme_bw()

p3 <- ggplot( Fatalities1 ) +
  geom_point( aes(y=1:length(frate), x=Wfrate, color = state ), 
              size=2.2, show.legend = FALSE ) +
  xlim(-1.75/2, 1.75/2) + 
  labs(y="Time") + 
  scale_y_reverse() + 
  theme_bw()

p1; p2; p3

(a) Original data

(b) Between transformation

(c) Within transformation

Figure 9.1: Between and Within transformations


The general properties of the FE estimator are identical to the LSDV estimator discussed above (in fact, they deliver identical estimates, as we will show below).

The main point of the FE estimator is that arbitrary correlations between a_i and the explanatory variables in any time periods are allowed, just as with first differencing and LSDV estimators

However, because of this, any explanatory variable that is constant over time for every i gets swept away by the within transformation: \ddot x_{it} = 0 if the variable x_{it} is constant over time! (See also Figure 9.1 above.)

Hence, the effects of variables like gender, black, place of birth, or a city's proximity to the sea cannot be estimated with such procedures. The effects of these variables are absorbed by the estimated fixed effects (unit specific intercepts with the LSDV estimator). This is a major drawback of FE, DIF and LSDV estimators
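A quick check with the wagepan data (assuming the pdata.frame pwage from the sketch above): the within transformation maps the time-constant variable educ to zero, so its effect cannot be identified by FE.

Code
all(abs(Within(pwage$educ)) < 1e-12)  # TRUE: educ does not vary over time within units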

As mentioned above, the FE estimator is identical to the LSDV estimator. In fact, we have the following

Proposition 1: Applying the within transformation to some variables is the very same as regressing these variables on all unit specific dummy variables and taking the residuals.

Let’s reconsider Equation 9.19, the regression equation of the LSDV estimator. With the help of the FWL-Theorem 7 we get an identical estimate of \boldsymbol {\hat \beta} if we apply the following procedure:

  1. We regress all explanatory variables in \mathbf X on all the dummy variables contained in the matrix \mathbf D and take the residuals from those regressions. Let us label these residuals as \mathbf {\tilde X}

  2. We regress the dependent variable \mathbf y on all the dummy variables contained in the matrix \mathbf D and take the residuals from that regression. Let us label these residuals as \mathbf {\tilde y}

  3. Finally, we regress \mathbf {\tilde y} on \mathbf {\tilde X} which yields

\mathbf{\tilde y} = \mathbf{\tilde X} \boldsymbol{\hat \beta} + \boldsymbol{\hat u} \tag{9.29}

According to the FWL-Theorem, we get the very same estimate of \boldsymbol \beta and the very same residuals as with the original Equation 9.19

To prove Proposition 1, we have to show that the residuals above, \mathbf{\tilde y}_i and \mathbf {\tilde X}_i, are identical to the within transformed data for every i:

\mathbf {\tilde y}_i \ \ \equiv \ \ \mathbf {\ddot y}_i \, = (\mathbf y_i - {\mathbf {\bar y}_i})

\mathbf{\tilde X}_i \ \ \equiv \ \ \mathbf {\ddot X}_i = (\mathbf X_i - {\mathbf {\bar X}_i}) \tag{9.30}


We showed in the chapter about “Multiple Regression” the following propositions:

  1. To get the predicted values of a regression of a variable \mathbf y on some explanatory variables \mathbf X, multiply \mathbf y from the left with the projection matrix \mathbf P_{_\mathbf X} = \mathbf X(\mathbf X' \mathbf X)^{-1} \mathbf X'

  2. To obtain the residuals of the regression, multiply \mathbf y from the left with the orthogonal projection matrix \mathbf M_{_\mathbf X} = (\mathbf I - \mathbf P_{_\mathbf X})

It is very important to note that both \mathbf M_{_\mathbf X} and \mathbf P_{_\mathbf X} are symmetric and idempotent, meaning \mathbf M_{_\mathbf X} \mathbf M_{_\mathbf X} = \mathbf M_{_\mathbf X} and \mathbf P_{_\mathbf X} \mathbf P_{_\mathbf X} = \mathbf P_{_\mathbf X} and furthermore, \mathbf M_{_\mathbf X} \mathbf P_{_\mathbf X} = \mathbf 0, \mathbf P_{_\mathbf X} \mathbf M_{_\mathbf X} = \mathbf 0, respectively \mathbf M_{_\mathbf X} + \mathbf P_{_\mathbf X} = \mathbf I.


Hence, for the problem above – regressing \mathbf y and \mathbf X on the dummy variables in \mathbf D to obtain the residuals – we apply the 2nd statement from above.
(As our explanatory variables are now \mathbf D, we have \mathbf{M}_{_\mathbf{D}} instead of \mathbf{M}_{_\mathbf{X}}).

\mathbf {\tilde y} = \mathbf{M}_{_\mathbf{D}}\mathbf {y}

\mathbf {\tilde X} = \mathbf{M}_{_\mathbf{D}}\mathbf {X} \tag{9.31}

with

\underset {(T \times n)\times (T \times n)} {\mathbf{M}_{_\mathbf{D}}} = (\mathbf{I}_{_N}- \mathbf P_{_\mathbf D}) = (\mathbf{I}_{_N} - \underbrace {\mathbf{D}\left(\mathbf{D}^{\prime} \mathbf{D}\right)^{-1} \mathbf{D}^{\prime}}_{\mathbf P_{_\mathbf D}} ) \tag{9.32}

where T \times n = N.


Let’s have a closer look at matrix \mathbf P_{_\mathbf D}. The inner part of \mathbf{P}_{_\mathbf{D}} is \mathbf D' \mathbf D, the product of the dummy-variable matrix \mathbf{D} from Equation 9.18. Because \mathbf D is block-diagonal this matrix product is block-diagonal as well – indeed it is a diagonal matrix (the columns of \mathbf D – the unit-specific dummy variables – are mutually orthogonal).

In particular, we have:

\underset {n \times n} {\mathbf D' \mathbf D} = \left[\begin{array}{cccc} \mathbf{i'i} & 0 & \cdots & 0 \\ 0 & \mathbf{i'i} & \cdots & 0 \\ & & \vdots & \\ 0 & 0 & \cdots & \mathbf{i'i} \end{array}\right]

Remember the definition of \mathbf i, a T \times 1 vector of ones. Thus, the diagonal elements \mathbf{i'i} are the scalar product of \mathbf i with itself, which equals T. Hence, the inverse (\mathbf D' \mathbf D)^{-1} = \frac{1}{T} \mathbf I, which is a diagonal matrix as well.

Therefore, \, \mathbf{P}_{_\mathbf{D}} = \mathbf{D}\left(\mathbf{D}^{\prime} \mathbf{D}\right)^{-1} \mathbf{D}^{\prime}\, reduces to \,\frac {1}{T}\mathbf{D} \mathbf{D}^{\prime}.

Lastly, we have to evaluate \mathbf{D} \mathbf{D}^{\prime}. This is the outer product of the block diagonal matrix \mathbf D, and therefore block diagonal as well

\underset {(T\times n) \times (T\times n)} {\mathbf{D} \mathbf{D}^{\prime}} = \left[\begin{array}{cccc} \mathbf{ii'} & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \mathbf{ii'} & \cdots & \mathbf 0 \\ & & \vdots & \\ \mathbf 0 & \mathbf 0 & \cdots & \mathbf{ii'} \end{array}\right]

with

\underset {T \times T}{\mathbf{ii'}} = \left[\begin{array}{ccc} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{array}\right]

which is a matrix of ones. Putting together all results, we have

\mathbf {\tilde y} \ = \ \mathbf{M}_{_\mathbf{D}} \mathbf{y} \ = \ (\mathbf I_{_N} - \underbrace {\dfrac {1}{T} \mathbf{D} \mathbf{D}^{\prime} }_{\mathbf P_{_\mathbf D}}) {\left[\begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_n \end{array}\right]}

and

\mathbf {\tilde X} \ = \ \mathbf{M}_{_\mathbf{D}} \mathbf{X} \ = \ (\mathbf I_{_N} - \underbrace {\dfrac {1}{T} \mathbf{D} \mathbf{D}^{\prime}}_{\mathbf P_{_\mathbf D}} ) {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]}


We proceed with the latter (the analysis of the former is identical):

\ \ \mathbf {\tilde X} \ = \ \mathbf{M}_{_\mathbf{D}} \mathbf{X} \ = \ \mathbf I_{_N} {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ \ - \ \ \dfrac {1}{T} \left[\begin{array}{cccc} \mathbf{ii'} & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \mathbf{ii'} & \cdots & \mathbf 0 \\ & & \vdots & \\ \mathbf 0 & \mathbf 0 & \cdots & \mathbf{ii'} \end{array}\right] {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ =

{\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ \ - \ \ {\left[\begin{array}{c} \frac{1}{T} \, \mathbf{ii' \,} \mathbf{X}_1 \\ \frac{1}{T} \, \mathbf{ii' \, } \mathbf{X}_2 \\ \vdots \\ \frac{1}{T} \, \mathbf{ii' \,} \mathbf{X}_n \end{array}\right]} \ =

\quad {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ \ - \ \ {\left[\begin{array}{c} \mathbf i \, (\frac{1}{T} \, \mathbf{i' \,} \mathbf{X}_1) \\ \mathbf i \, (\frac{1}{T} \, \mathbf{i' \, } \mathbf{X}_2) \\ \vdots \\ \mathbf i \, (\frac{1}{T} \, \mathbf{i' \,} \mathbf{X}_n) \end{array}\right]} \ =

{\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ \ - \ \ {\left[\begin{array}{c} \mathbf i \, \mathbf {\bar x}_1 \\ \mathbf i \, \mathbf {\bar x}_2 \\ \vdots \\ \mathbf i \, \mathbf {\bar x}_n \end{array}\right]}

\mathbf {\bar x}_i is the unit specific time average of the variable vector \, \mathbf x_{it} = [1, x_{i t}^1, x_{i t}^2 \cdots, x_{i t}^k].

So, we finally have (with \, \mathbf i \, \mathbf {\bar x}_i = \mathbf{\bar X}_{i}, being a matrix whose rows are all equal to \, \mathbf {\bar x}_i):

\mathbf {\tilde X} \ = \ {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ \ - \ \ \underbrace {\left[\begin{array}{c} \mathbf{\bar X}_{1} \\ \mathbf{\bar X}_{2} \\ \vdots \\ \mathbf{\bar X}_{n} \end{array}\right]}_{\mathbf P_{_\mathbf D} \mathbf X : \text{ between transf.}} \tag{9.33}

But this is equal to the definition of the within transformation, Equation 9.23.

\mathbf {\tilde X} \ = \ {\left[\begin{array}{c} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{array}\right]} \ \ - \ \ {\left[\begin{array}{c} \mathbf{\bar X}_1 \\ \mathbf{\bar X}_2 \\ \vdots \\ \mathbf{\bar X}_n \end{array}\right]} \ \ \equiv \ \ \mathbf {\ddot X} \tag{9.34}

(Clearly, we get the same results for \mathbf {\tilde y}_i = (\mathbf y_i - {\mathbf {\bar y}_i})= \mathbf {\ddot y}_i.)

Thus, we have shown that the residuals \mathbf {\tilde X} and \mathbf {\tilde y} from a regression on the unit specific dummy variables (contained in the matrix \mathbf D) are identical to the within transformed variables \mathbf {\ddot X} and \mathbf {\ddot y}. This implies the equivalence of the LSDV and the FE estimators. qed.

Remark: An alternative notation for the between and within transformations using the Kronecker product is:

\mathbf B = \mathbf{P}_{_\mathbf{D}} \ = \ (\mathbf I_{_n} \otimes \mathbf i \mathbf i'\frac{1}{T} ) \tag{9.35}

\mathbf W = \mathbf{M}_{_\mathbf{D}} \ = \ (\mathbf I_{_N} - \mathbf B) = (\mathbf I_{_N} - \mathbf I_{_n} \otimes \mathbf i \mathbf i'\frac{1}{T} ) \tag{9.36}
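A minimal numerical sketch of Equation 9.35 and Equation 9.36 (assumptions for illustration only: a balanced toy panel with n = 3 units and T = 4 periods, data stacked unit by unit):

Code
n <- 3; T <- 4; N <- n * T
x <- rnorm(N)                                 # an arbitrary stacked variable
B <- kronecker(diag(n), matrix(1, T, T) / T)  # between projector P_D, Equation 9.35
W <- diag(N) - B                              # within projector M_D, Equation 9.36

# W %*% x reproduces the time-demeaned data of Equation 9.23
unit <- rep(1:n, each = T)
all.equal(as.vector(W %*% x), as.vector(x - ave(x, unit)))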

Formulas for the FE estimator and covariance matrix:

We know from Proposition 1 that applying the within transformation to some variables is the very same as regressing these variables on all unit specific dummy variables, which are contained in the matrix \mathbf D, and taking the residuals. And this is done by multiplying the variables from the left by the orthogonal projection matrix

\underset {(T \times n)\times (T \times n)} {\mathbf{M}_{_\mathbf{D}}} = (\mathbf{I}_{_N}- \mathbf P_{_\mathbf D}) = (\mathbf{I}_{_N} - \underbrace {\mathbf{D}\left(\mathbf{D}^{\prime} \mathbf{D}\right)^{-1} \mathbf{D}^{\prime}}_{\mathbf P_{_\mathbf D}} ) \tag{9.37}

So, our basic Equation 9.19 becomes:

\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{D} \boldsymbol{\alpha} + \boldsymbol{u} \ \ \ \Rightarrow

\mathbf{M}_{_\mathbf{D}} \mathbf{y} = \mathbf{M}_{_\mathbf{D}} \mathbf{X} \boldsymbol{\beta} + \underbrace {\mathbf{M}_{_\mathbf{D}} \mathbf{D}}_{\mathbf 0} \boldsymbol{\alpha} + \mathbf{M}_{_\mathbf{D}} \boldsymbol{u}

The second term on the right hand side is zero:

\mathbf{M}_{_\mathbf{D}} \mathbf{D} = (\mathbf{I}_N-\mathbf{D}\left(\mathbf{D}^{\prime} \mathbf{D}\right)^{-1} \mathbf{D}^{\prime}) \mathbf{D} = \mathbf{D}-\mathbf{D}\left(\mathbf{D}^{\prime} \mathbf{D}\right)^{-1} \mathbf{D}^{\prime} \mathbf{D} = \mathbf{D} - \mathbf{D} = \mathbf{0}

So we have: \mathbf{M}_{_\mathbf{D}} \mathbf{y} = \mathbf{M}_{_\mathbf{D}} \mathbf{X} \boldsymbol{\beta} + \mathbf{M}_{_\mathbf{D}} \boldsymbol{u} \tag{9.38}

Which is equal to: \mathbf{\ddot y} = \mathbf{\ddot X} \boldsymbol{\beta} + \boldsymbol{\ddot u} \tag{9.39}


Thus, the fixed effect estimator Equation 9.26 is:

\hat{\boldsymbol{\beta}}_{F E}=\left(\mathbf{\ddot X}^{\prime} \mathbf{\ddot X} \right)^{-1} \mathbf{\ddot X}^{\prime} \mathbf{\ddot y} \ = \ \left(\mathbf{X}^{\prime} \mathbf{M}'_{_\mathbf{D}} \mathbf{M}_{_\mathbf{D}} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf{M}_{_\mathbf{D}}' \mathbf{M}_{_\mathbf{D}} \mathbf{y}

Remember, \mathbf{M}_{_\mathbf{D}} is symmetric and idempotent, hence \mathbf{M}_{_\mathbf{D}}' \mathbf{M}_{_\mathbf{D}} = \mathbf{M}_{_\mathbf{D}}. It follows:

\hat{\boldsymbol{\beta}}_{F E}=\left(\mathbf{X}^{\prime} \mathbf{M}_{_\mathbf{D}} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf{M}_{_\mathbf{D}} \mathbf{y} \tag{9.40}

With the covariance matrix (if u_{it} is iid, i.e., \operatorname {Var}(\mathbf u) = \sigma_u^2 \mathbf I):

\operatorname{Var}(\hat{\boldsymbol\beta}_{F E}) = \sigma_u^2(\mathbf X' \mathbf{M}_{_\mathbf{D}} \mathbf X)^{-1} (\mathbf X'\mathbf{M}_{_\mathbf{D}} \mathbf {I} \, \mathbf{M}_{_\mathbf{D}} \mathbf X) (\mathbf X' \mathbf{M}_{_\mathbf{D}} \mathbf X)^{-1} \ = \sigma_u^2(\mathbf X' \mathbf{M}_{_\mathbf{D}} \mathbf X)^{-1} \tag{9.41}

It is further remarkable that the errors in Equation 9.38 and Equation 9.39 are autocorrelated in finite samples even if u_{it} is iid:

\operatorname{Var}(\ddot {\boldsymbol u}) \ = \ \sigma_u^2 \, \mathbf{M}_{_\mathbf{D}} \mathbf I \ \mathbf{M'}_{_\mathbf{D}} \ = \ \sigma_u^2 \, \mathbf{M}_{_\mathbf{D}} \tag{9.42}

So, the error term of Equation 9.38 and Equation 9.39 is not iid. This implies that the fixed effects estimator may not be efficient in finite samples. In particular, the error term of Equation 9.39 is autocorrelated, which makes tests for autocorrelated errors u_{it} of the original model difficult.
Furthermore, the FE estimator only uses within information and doesn’t use any cross section information.

9.4.4 Example for a panel data set

The Fatalities dataset from Stock and Watson (2020) (here taken from the Kleiber and Zeileis (2008) AER package) is an often used example of the importance of individual heterogeneity and time effects in a panel setting.
The research question is whether taxing alcohol can reduce the traffic death toll. The basic specification relates the traffic fatality rate, frate, to the tax rate on beer, beertax, in a regression model:8

frate_{it} = \beta_0 + \beta_1 beertax_{it} + e_{it}

Data are from 1982 to 1988 for each of the continental US states. In the following, the model is specified in its simplest form as a bivariate relation between the death rate and the beer tax.

rm(list=ls()) # clear all
library(wooldridge)
library(AER)
data("Fatalities")
library(modelsummary)
library(ggplot2)
library(plm)

# Calculation of fatality rates
Fatalities$frate <- with (Fatalities, fatal / pop * 10000)

# Creating a pdata.frame 
Fatalities <- pdata.frame(Fatalities,  index= c("state", "year"))
class(Fatalities$frate)
       [1] "pseries" "numeric"
# Between and within transformations - needed for graphics with ggplot
Fatalities$Bfrate <- Between(Fatalities$frate)
Fatalities$Bbeertax <- Between(Fatalities$beertax)

Fatalities$Wfrate <- Within(Fatalities$frate)
Fatalities$Wbeertax <- Within(Fatalities$beertax)

class(Fatalities$Bfrate)
       [1] "pseries" "numeric"

Pooled OLS

In a first step we estimate the model with pooled OLS without considering any form of individual effect.

coeftest( Pooled <-  plm(frate ~ beertax, index= c("state", "year"), 
                        model="pooling",  
                        data = Fatalities)) 
       
       t test of coefficients:
       
                   Estimate Std. Error t value  Pr(>|t|)    
       (Intercept) 1.853308   0.043567 42.5391 < 2.2e-16 ***
       beertax     0.364605   0.062170  5.8647 1.082e-08 ***
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Taxing beer seems to increase the number of deaths from traffic accidents in a highly statistically significant way! Hence, one could even argue that free beer might lead to safer driving

  • Similar results, contradicting our most basic intuition, appear for any single year in the corresponding cross section samples


Between (BTW) Estimator

coeftest( Between <-  plm(frate ~ beertax, index= c("state", "year"), 
                        model="between",  
                        data = Fatalities)) 
       
       t test of coefficients:
       
                   Estimate Std. Error t value Pr(>|t|)    
       (Intercept)  1.84622    0.11080  16.663   <2e-16 ***
       beertax      0.37842    0.15860   2.386   0.0212 *  
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • We reach the same conclusion using the BTW estimator – we get a very significant positive coefficient of beertax. As with pooled OLS, individual heterogeneity is not handled here

Difference Estimator with first and last observation

  • The results above might be biased because we ignored individual heterogeneity which ended up in the error term

  • Panel data analysis will provide a solution for the unexpected results. In fact, we suspect

    • the presence of unobserved heterogeneity in the different states which is probably correlated with the way the tax rates are set in the different states
  • A more appropriate model would be:

frate_{it} = \beta_1 beertax_{it} + \textcolor {red} {a_i} + u_{it}

  • Here, a_i represents the unobserved heterogeneity as described in the text above

  • As outlined in the text, the simplest way to get rid of the individual intercepts a_i is to estimate the model in differenced variables. Here, we take the difference between the last (1988) and the first (1982) year

coeftest( DIF <-  plm(diff(frate, 6) ~ diff(beertax, 6) -1, index= c("state", "year"), 
                        model="pooling", 
                        data = Fatalities))
       
       t test of coefficients:
       
                        Estimate Std. Error t value Pr(>|t|)  
       diff(beertax, 6) -0.86892    0.39299 -2.2111  0.03193 *
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Estimation on the six-year differences eventually yields a sensible result: after controlling for state heterogeneity, higher taxation on beer is associated with a lower number of fatalities

LSDV estimator

As discussed, another way to control for time-invariant unobservables is to estimate them explicitly, here with state-specific dummy variables, applying pooled OLS. This can easily be done in R with the plm package by simply adding state as an additional variable (with lm() you would have to add factor(state)).

coeftest( LSDV <-  plm(frate ~ beertax + state, index= c("state", "year"), 
                        model="pooling",  
                        data = Fatalities))
       
       t test of coefficients:
       
                   Estimate Std. Error t value  Pr(>|t|)    
       (Intercept)  3.47763    0.31336 11.0980 < 2.2e-16 ***
       beertax     -0.65587    0.18785 -3.4915 0.0005560 ***
       stateaz     -0.56773    0.26667 -2.1290 0.0341073 *  
       statear     -0.65495    0.21902 -2.9904 0.0030280 ** 
       stateca     -1.50947    0.30435 -4.9596 1.211e-06 ***
       stateco     -1.48428    0.28735 -5.1654 4.497e-07 ***
       statect     -1.86226    0.28053 -6.6383 1.583e-10 ***
       statede     -1.30760    0.29395 -4.4484 1.239e-05 ***
       statefl     -0.26813    0.13933 -1.9245 0.0552840 .  
       statega      0.52460    0.18395  2.8519 0.0046613 ** 
       stateid     -0.66902    0.25797 -2.5934 0.0099895 ** 
       stateil     -1.96162    0.29150 -6.7295 9.234e-11 ***
       statein     -1.46154    0.27254 -5.3627 1.689e-07 ***
       stateia     -1.54393    0.25344 -6.0919 3.580e-09 ***
       stateks     -1.22322    0.24544 -4.9838 1.080e-06 ***
       stateky     -1.21752    0.28717 -4.2398 3.020e-05 ***
       statela     -0.84712    0.18867 -4.4900 1.033e-05 ***
       stateme     -1.10795    0.19112 -5.7971 1.776e-08 ***
       statemd     -1.70644    0.28322 -6.0252 5.169e-09 ***
       statema     -2.10975    0.27610 -7.6413 3.238e-13 ***
       statemi     -1.48453    0.23602 -6.2899 1.182e-09 ***
       statemn     -1.89721    0.26509 -7.1570 6.917e-12 ***
       statems     -0.02908    0.14845 -0.1959 0.8448392    
       statemo     -1.29626    0.26669 -4.8606 1.930e-06 ***
       statemt     -0.36039    0.26396 -1.3653 0.1732250    
       statene     -1.52218    0.24928 -6.1063 3.304e-09 ***
       statenv     -0.60077    0.28595 -2.1010 0.0365169 *  
       statenh     -1.25445    0.20969 -5.9826 6.526e-09 ***
       statenj     -2.10575    0.30720 -6.8547 4.370e-11 ***
       statenm      0.42638    0.25432  1.6765 0.0947242 .  
       stateny     -2.18667    0.29890 -7.3156 2.574e-12 ***
       statenc     -0.29046    0.11984 -2.4238 0.0159792 *  
       statend     -1.62344    0.25381 -6.3962 6.452e-10 ***
       stateoh     -1.67442    0.25381 -6.5970 2.016e-10 ***
       stateok     -0.54506    0.16912 -3.2229 0.0014154 ** 
       stateor     -1.16800    0.28571 -4.0880 5.654e-05 ***
       statepa     -1.76747    0.27610 -6.4016 6.255e-10 ***
       stateri     -2.26505    0.29376 -7.7106 2.068e-13 ***
       statesc      0.55717    0.11000  5.0654 7.305e-07 ***
       statesd     -1.00372    0.20962 -4.7883 2.698e-06 ***
       statetn     -0.87566    0.26802 -3.2672 0.0012182 ** 
       statetx     -0.91747    0.24556 -3.7363 0.0002253 ***
       stateut     -1.16395    0.19642 -5.9259 8.884e-09 ***
       statevt     -0.96604    0.21113 -4.5756 7.075e-06 ***
       stateva     -1.29018    0.20416 -6.3196 9.986e-10 ***
       statewa     -1.65952    0.28347 -5.8544 1.307e-08 ***
       statewv     -0.89675    0.24661 -3.6363 0.0003277 ***
       statewi     -1.75927    0.29395 -5.9850 6.442e-09 ***
       statewy     -0.22850    0.31290 -0.7303 0.4658106    
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The results are similar to those of the DIF estimator – but with a much lower estimated standard error of beertax. This should quite often be the case, as the error term of the DIF estimator is autocorrelated by construction (if u_{it} is actually iid) and therefore the LSDV and fixed effects estimators are more efficient

  • Note that this estimator delivers explicit estimates of the state-specific intercepts – the fixed effects. Further note that most of these are highly significant, which highlights the importance of individual heterogeneity


Within (FE) estimator

Now we use the FE (within) estimator. This should be the same as with the LSDV estimator.

coeftest( Within <-  plm(frate ~ beertax, index= c("state", "year"), 
                        model="within",  
                        data = Fatalities))
       
       t test of coefficients:
       
               Estimate Std. Error t value Pr(>|t|)    
       beertax -0.65587    0.18785 -3.4915 0.000556 ***
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculating fixed effects
a_i <- fixef(Within, type = 'dfirst')
head(a_i)
               az         ar         ca         co         ct         de 
       -0.5677269 -0.6549516 -1.5094688 -1.4842804 -1.8622573 -1.3076020
  • FE (within) estimation actually yields the very same result as the LSDV estimator (we know they are equivalent), but in a more compact and efficient way; a small by-hand sketch of the within transformation follows after this list

  • If we want the state-individual intercepts – the fixed effects – we can calculate these with the function fixef() (using Equation 9.28). These are identical to the above ones as well

  • The FE model requires only minimal assumptions on the nature of heterogeneity

    • It is one of the simplest and most robust specifications in panel data econometrics and often the benchmark against which more sophisticated, and possibly more efficient, models are compared. Therefore, it is the default model in the plm() procedure (Croissant and Millo (2018))
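To make the within logic concrete, here is a minimal by-hand sketch (an illustration only, assuming the Fatalities data from above; ave() computes the state-specific means, similar to the Wfrate/Wbeertax variables used in the plots further below):

# Demean frate and beertax by state and run OLS on the demeaned data
W_frate   <- with(Fatalities, frate   - ave(frate,   state))
W_beertax <- with(Fatalities, beertax - ave(beertax, state))
coef(lm(W_frate ~ W_beertax - 1))   # reproduces the beertax coefficient of the within estimator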

Within (FE) estimator with time-dummies

Now we check whether the inclusion of time-specific effects plays a role here. This is done by simply adding time-dummy variables – adding the variable year.

coeftest( Within_t <-  plm(frate ~ beertax + year, index= c("state", "year"), 
                        model="within",  
                        data = Fatalities))
       
       t test of coefficients:
       
                 Estimate Std. Error t value Pr(>|t|)   
       beertax  -0.639980   0.197377 -3.2424 0.001328 **
       year1983 -0.079903   0.038354 -2.0833 0.038126 * 
       year1984 -0.072421   0.038352 -1.8883 0.060012 . 
       year1985 -0.123976   0.038442 -3.2250 0.001408 **
       year1986 -0.037864   0.038588 -0.9813 0.327312   
       year1987 -0.050902   0.038974 -1.3061 0.192600   
       year1988 -0.051804   0.039623 -1.3074 0.192145   
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculating fixed effects
a_i <- fixef(Within, type = 'dfirst')
head(a_i)
               az         ar         ca         co         ct         de 
       -0.5677269 -0.6549516 -1.5094688 -1.4842804 -1.8622573 -1.3076020
  • In this case, the time-dummy variables don’t play an important role. But this is the exception rather than the rule, and therefore the inclusion of time-dummy variables should at least always be tried.

Within (FE) estimator, effect = twoways

Another way to consider time-specific heterogeneity is the option effect = "twoways". This should be identical to the inclusion of time-dummy variables, which fortunately is actually the case.

coeftest( Within_2w <-  plm(frate ~ beertax + year, index= c("state", "year"), 
                        model="within",
                        effect = "twoways",
                        data = Fatalities))
       
       t test of coefficients:
       
               Estimate Std. Error t value Pr(>|t|)   
       beertax -0.63998    0.19738 -3.2424 0.001328 **
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculating fixed effects
a_i <- fixef(Within, type = 'dfirst')
head(a_i)
               az         ar         ca         co         ct         de 
       -0.5677269 -0.6549516 -1.5094688 -1.4842804 -1.8622573 -1.3076020

Summarizing the results

Table 9.2 summarizes all results regarding the variable beertax.

  • Look at the very high R^2 of the LSDV estimator, especially compared to the within estimator, which yields the very same parameter estimates and t-values. Why is that the case?
Code
modelsummary(list("Pooled"=Pooled, "BTW"=Between, "DIF"=DIF, "LSDV"=LSDV, "FE"=Within, "FE_t"=Within_t, "FE_2w"=Within_2w),
            vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
            stars = TRUE, 
            fmt = 3,
            statistic = c('statistic'),
            gof_omit = "A|L|B|F|RM",
            coef_omit = "Inter|year|state",
            align = "lddddddd",
            output = "flextable"
            )
Table 9.2: Effects of a beer tax on traffic fatality rates: Results of several panel data models (t-statistics in brackets)

                   Pooled     BTW        DIF        LSDV       FE         FE_t       FE_2w
beertax             0.365**    0.378*               -0.656***  -0.656***  -0.640**   -0.640**
                   (3.083)    (2.386)              (-3.491)   (-3.491)   (-3.242)   (-3.242)
diff(beertax, 6)                          -0.869*
                                         (-2.211)
Num.Obs.           336        48          48        336        336        336        336
R2                 0.093      0.110       0.119     0.905      0.041      0.080      0.036
Std.Errors         Custom     IID         IID       IID        IID        IID        IID

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001


Figure 9.2 summarizes all results regarding the variable beertax graphically.

Code
models <-  list("Pooled"=Pooled, "BTW"=Between, "DIF"=DIF, "LSDV"=LSDV, "FE"=Within, "FE_t"=Within_t, "FE_2w"=Within_2w)
modelplot(models,
          vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
          coef_omit = "Inter|year|state"
          )

Figure 9.2: Effects of a beer tax on traffic fatality rates: Results of several panel data models

9.4.5 Some additional graphics

The following data graphics should help to explain the estimation results of the different estimators.

Code
ggplot(as.data.frame(Fatalities), aes(y=frate, x=beertax)) + 
  geom_point( aes(color=state), show.legend = FALSE ) + 
  geom_smooth(method=lm, se=F, aes(color=year), linewidth=0.4, show.legend = FALSE) +
  geom_smooth(method=lm, aes(y=frate, x=beertax), color="black", linewidth=0.5) +
  theme_bw()

Figure 9.3: Data with reg-lines per year
Code
ggplot(as.data.frame(Fatalities), aes(y=frate, x=beertax)) + 
  geom_point( aes(color=state), show.legend = FALSE ) + 
  geom_smooth(method=lm, se=F, aes(color=state), linewidth=0.4, show.legend = FALSE) +
  theme_bw()

Figure 9.4: Data with within reg-lines per state
Code
ggplot(as.data.frame(Fatalities), aes(y=Bfrate, x=Bbeertax)) + 
  geom_point( aes(color=state), show.legend = FALSE) + 
  geom_smooth(method=lm, aes(y=Bfrate, x=Bbeertax), color="black", linewidth=0.5) +
  geom_text(aes(y=Bfrate, x=Bbeertax, label=state), nudge_y = 0.09, size=2, color="grey") +
  theme_bw()

Figure 9.5: Between transformed data with reg-line
Code
ggplot(as.data.frame(Fatalities), aes(y=frate, x=beertax, color=state)) + 
  geom_point( show.legend = FALSE) + 
  geom_smooth(aes(y=frate, x=beertax), color="black", linewidth=0.4, method=lm) +
  facet_wrap(vars(state)) +
  theme_bw()

Figure 9.6: Original data per state
Code
ggplot(as.data.frame(Fatalities), aes(y=Wfrate, x=Wbeertax, color=state)) + 
  geom_point( show.legend = FALSE) + 
  geom_smooth(method=lm ,se=TRUE, show.legend = FALSE) +
  facet_wrap(vars(state)) +
  theme_bw()

Figure 9.7: Within transformed data per state
Code
ggplot(as.data.frame(Fatalities), aes(y=Wfrate, x=Wbeertax)) + 
  geom_point( aes(color=state), show.legend = FALSE) + 
  geom_smooth(method=lm, aes(y=Wfrate, x=Wbeertax), color="black", linewidth=0.5) +
  theme_bw()

Figure 9.8: Within transformed data with reg-line

9.5 Some more advanced panel data methods

9.5.1 The Random Effects (RE) Estimator

  • The within (FE) estimator is a regression on data that have been transformed so that the individual effects a_i vanish (they are “transformed out” – the same is true for the DIF estimator)

  • The equivalent LSDV estimator considers the individual effects a_i as parameters to be estimated (they are “estimated out”)

    • Both give identical estimates for \boldsymbol \beta

    • Both allow the individual effects a_i to be correlated with the explanatory variables

    • But neither is efficient, as they throw away all cross-section information, which is unnecessary if the individual effect a_i is not correlated with the explanatory variables

  • On the contrary, the random effects (RE) estimator treats the individual effects as random and, more importantly, as part of the error term and seeks to obtain efficient estimates for \boldsymbol \beta via a GLS estimator

    • However, because the individual effects a_i do not vanish with this approach and remain a part of the error term we have to assume that a_i is uncorrelated with the explanatory variables (as otherwise MLR.4 would be violated)

    • If a_i is actually not correlated with the explanatory variables (which can be tested under certain circumstances) the RE estimator justifiably uses both within and cross section information to estimate \boldsymbol \beta in an optimal way

    • Compared to pooled OLS and the BTW estimator, which also have to assume that a_i is uncorrelated with the explanatory variables, the RE estimator is more efficient, because it takes care of the autocorrelation in the error term due to the presence of a_i, i.e., it is a GLS estimator and thus BLUE

    • For the procedure to have good properties, n should be large relative to T (wide panels). The properties for the reverse case are largely unknown, Wooldridge (2019), Chapt. 14

Generally, GLS seeks for a transformation of the model so that the error term of the transformed model becomes iid and simple OLS can be applied to obtain efficient estimates

  • Weighted LS to handle heteroscedasticity and the Koyck transformation to handle autocorrelation have already been discussed as GLS procedures in earlier sections

Our starting point is Equation 9.4 again together with the error component model Equation 9.8. Under the assumption that u_{it} is iid and that a_i is not correlated across units the covariance matrix of \mathbf e has a block-diagonal, cluster-structure:

\operatorname {Var} (\mathbf e) = \boldsymbol{\Omega}=\left[\begin{array}{cccc} \boldsymbol\Sigma_1 & \mathbf 0 & \cdots & \mathbf 0 \\ \mathbf 0 & \boldsymbol\Sigma_i & \cdots & \mathbf 0 \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf 0 & \mathbf 0 & \cdots & \boldsymbol\Sigma_n \end{array}\right] \tag{9.43}

Assuming homoscedasticity of a_i and that the idiosyncratic errors u_{it} are uncorrelated with the unit-specific a_i, Equation 9.8 then implies

\operatorname {Var}(e_{it}) = \sigma_a^2 + \sigma_u^2, \ \text{ and } \ \operatorname{Cov}(e_{it},e_{is}) = \sigma_a^2

So we get for each unit i

\boldsymbol{\Sigma} = \left[\begin{array}{cccc} (\sigma_a^2+\sigma_{u}^2) & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & (\sigma_a^2+\sigma_{u}^2) & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & (\sigma_a^2+\sigma_{u}^2) \end{array}\right] \tag{9.44}

Hence, the covariance matrix of \mathbf e depends only on two parameters,

  • the variance of the individual effect \sigma_a^2

  • the variance of the idiosyncratic error \sigma_u^2

Given the covariance matrix of \mathbf e, Equation 9.43, the RE estimator, which is the GLS estimator of model Equation 9.4, is:

\hat{\boldsymbol{\beta}}_{R E} = \left(\mathbf{X}^{\prime} \mathbf \Omega^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \mathbf \Omega^{-1} \mathbf{y} \tag{9.45}


GLS estimators in general and the corresponding variable transformation

How can we justify Equation 9.45?

As a covariance matrix, \mathbf \Omega is positive definite and thus there exists a unique, symmetric and invertible square root matrix \, \mathbf \Omega^{\frac {1}{2}} of \, \mathbf \Omega \, with the following property:

\mathbf {\Omega} = {\mathbf \Omega^{\frac {1}{2}} \mathbf \Omega^{\frac {1}{2}}} \tag{9.46}

Because \mathbf \Omega^{\frac {1}{2}} is invertible we can multiply the equation above by \mathbf \Omega^{-\frac {1}{2}}, (which is [ \mathbf \Omega^{\frac {1}{2}}]^{-1}), from the left and from the right to get:

\mathbf \Omega^{-\frac {1}{2}} \mathbf \Omega \, \mathbf \Omega^{-\frac {1}{2}} = \mathbf I \tag{9.47}

And furthermore, applying the inverse operation on both sides of Equation 9.46 we have:

\mathbf {\Omega}^{-1} = {\mathbf \Omega^{-\frac {1}{2}} \mathbf \Omega^{-\frac {1}{2}}} \tag{9.48}
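A tiny numerical illustration of Equations 9.46 to 9.48 (a sketch with an arbitrary positive definite 2 x 2 matrix; the symmetric square root is obtained from the eigendecomposition):

Omega    <- matrix(c(2, 1, 1, 2), nrow = 2)          # arbitrary positive definite example
eig      <- eigen(Omega)
Om_mhalf <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)  # Omega^(-1/2)
round(Om_mhalf %*% Omega %*% Om_mhalf, 12)           # identity matrix, Equation 9.47
round(Om_mhalf %*% Om_mhalf - solve(Omega), 12)      # zero matrix, Equation 9.48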

Now, let’s transform our model Equation 9.4 and multiply by {\mathbf \Omega^{-\frac {1}{2}}}, the inverse of the square root of \mathbf \Omega, from the left:

\underbrace {{\mathbf \Omega^{-\frac {1}{2}}} \mathbf {y}}_{\mathbf {\breve y}} \ = \ \underbrace {{\mathbf \Omega^{-\frac {1}{2}}} \mathbf {X}}_\mathbf {\breve X} \, \boldsymbol \beta + \underbrace {{\mathbf \Omega^{-\frac {1}{2}}} \mathbf {e}}_\mathbf {\breve e} \tag{9.49}

Taking note of Equation 9.47, the covariance matrix of the error of the transformed model is

\operatorname {Var} ( \underbrace {\mathbf \Omega^{-\frac {1}{2}} \mathbf {e}}_\mathbf {\breve e} ) = \mathbf \Omega^{-\frac {1}{2}} \mathbf \Omega \, \mathbf \Omega^{-\frac {1}{2}} = \mathbf I

Hence, the error of the transformed model is now iid and therefore, the transformed model can be estimated by OLS efficiently.

So we have a transformed model with iid errors

\mathbf{\breve y} = \mathbf{\breve X} \boldsymbol{\beta} + \mathbf{\breve e} \tag{9.50}

with the transformed variables

\mathbf {\breve y} := \mathbf \Omega^{-\frac {1}{2}} \mathbf y , \ \ \mathbf {\breve X} := \mathbf \Omega^{-\frac {1}{2}} \mathbf X, \ \ \mathbf {\breve e} := \mathbf \Omega^{-\frac {1}{2}} \mathbf e \tag{9.51}

We finally arrive at the GLS estimator \boldsymbol {\hat \beta}_{GLS} which, for our particular panel setting, is the RE estimator, by simply applying pooled OLS to the transformed model, Equation 9.50:

\boldsymbol {\hat \beta}_{R E} \ = \ (\mathbf {\breve {X'}} \mathbf {\breve X})^{-1} \mathbf {\breve {X'}} \mathbf {\breve y} \tag{9.52}

We get an alternative representation by substituting back (with the variable transformations from Equation 9.51) and taking note of Equation 9.48:

\boldsymbol {\hat \beta}_{R E} \ = \ (\mathbf X' \,\underbrace {\mathbf \Omega^{-\frac {1}{2}} \mathbf \Omega^{-\frac {1}{2}} }_{\mathbf \Omega ^{-1}} \, \mathbf X)^{-1} \, \mathbf X' \, \underbrace { \mathbf \Omega^{-\frac {1}{2}} \mathbf \Omega^{-\frac {1}{2}} }_{\mathbf \Omega ^{-1}} \, \mathbf y \quad \Rightarrow

\boldsymbol {\hat \beta}_{R E} \ = \ (\mathbf X' \mathbf \Omega^{-1} \mathbf X)^{-1} \, \mathbf X' \mathbf \Omega^{-1} \mathbf y

This is the same as was claimed in Equation 9.45!

Similarly, we can easily show that the covariance matrix of \boldsymbol {\hat \beta}_{R E} is

\operatorname{Var}(\hat{\boldsymbol\beta}_{R E}) \ = \ \sigma^2_{\breve e} (\mathbf {\breve {X'}} \mathbf {\breve X} )^{-1} = (\mathbf X' \mathbf \Omega^{-\frac {1}{2}} \mathbf \Omega^{-\frac {1}{2}} \mathbf X)^{-1} = (\mathbf X' \mathbf \Omega ^{-1} \mathbf X)^{-1} \tag{9.53}

(note that \sigma^2_{\breve e} = 1, as the transformed errors have unit variance by construction)

  • The formula for the RE estimator of Equation 9.45 can be directly applied to the data if we have an estimate of the covariance matrix \mathbf \Omega of the error \mathbf e, which in our case only depends on \sigma^2_a and \sigma^2_u

    • However, the covariance matrix \mathbf \Omega is a (n \times T)\times (n \times T) matrix, which easily could become very large in a wide panel setting
  • Therefore, it could be much more convenient to search for a variable transformation along the lines shown above, like in Equation 9.50 and Equation 9.51. And this transformation is called quasi time-demeaning


Quasi time-demeaning

As shown in Equation 9.49, a GLS estimation can be carried out by pre-multiplying the variables with \, \mathbf \Omega^{-\frac{1}{2}} to get a transformed model with iid errors.

Because of the block-diagonal structure in Equation 9.43, with identical blocks \boldsymbol \Sigma, the covariance matrix can be written as a Kronecker product:

\mathbf \Omega = \mathbf I_{_n} \otimes \boldsymbol \Sigma

  • Thus, we search for:

\mathbf \Omega^{-\frac{1}{2}} = \ [\mathbf I_{_n} \otimes \mathbf \Sigma ] ^{-\frac{1}{2}} = \ \mathbf I_{_n} \otimes \mathbf \Sigma^{-\frac{1}{2}}

Hence, we actually only need the inverse square root matrix \,\mathbf \Sigma^{-\frac {1}{2}} \, from Equation 9.44, which is a tremendous facilitation.

One can show this as follows: 9

Generally, this is an eigenvalue/eigenvector problem. But the special structure of Equation 9.44 allows a more direct calculation. A first trick is to factor out \sigma_u^2 from Equation 9.44:

\underset{T \times T} {\boldsymbol{\Sigma}} \ = \ \left[\begin{array}{cccc} (\sigma_a^2+\sigma_{u}^2) & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & (\sigma_a^2+\sigma_{u}^2) & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & (\sigma_a^2+\sigma_{u}^2) \end{array}\right] \ =

\ \; \sigma_u^2 \left[\begin{array}{cccc} \left(1 + \frac {\sigma_a^2}{\sigma_u^2} \right) & \frac {\sigma_a^2}{\sigma_u^2} & \cdots & \frac {\sigma_a^2}{\sigma_u^2} \\ \frac {\sigma_a^2}{\sigma_u^2} & \left(1 + \frac {\sigma_a^2}{\sigma_u^2} \right) & \cdots & \frac {\sigma_a^2}{\sigma_u^2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac {\sigma_a^2}{\sigma_u^2} & \frac {\sigma_a^2}{\sigma_u^2} & \cdots & \left(1 + \frac {\sigma_a^2}{\sigma_u^2} \right) \end{array}\right]

And so we have:

\frac {1}{\sigma_u^2 } \boldsymbol{\Sigma} \ = \ \mathbf I_{_T} + \frac {\sigma_a^2}{\sigma_u^2}\mathbf i \, \mathbf i' \ = \ \mathbf I_{_T} + \frac {T \sigma_a^2}{\sigma_u^2} (\frac {1}{T} \mathbf i \, \mathbf i') \ = \ \mathbf I_{_T} + \frac {T \sigma_a^2}{\sigma_u^2} \mathbf P_{_\mathbf i} \tag{9.54}

Thereby \mathbf i is a T-vector of ones and the outer product \mathbf i \, \mathbf i' is a T \times T matrix of ones. Thus, the term \frac {1}{T} \mathbf i \, \mathbf i' is simply the mean function – multiplying a vector \mathbf x by this from the left produces \, \mathbf i \,{\bar x}, a vector, whose elements are all equal to the mean of \mathbf x.


Note that the mean function \frac {1}{T} \mathbf i \, \mathbf i' is equal to the projection matrix (hat-matrix) \, \mathbf P_{_\mathbf i} with respect to the ones-vector \mathbf i (intercept):

\mathbf P_{_\mathbf i} \ := \ \mathbf i (\mathbf i' \mathbf i)^{-1} \mathbf i' \ = \ \tfrac {1}{T} \mathbf i \, \mathbf i' \tag{9.55}

Remember, \mathbf P_{_\mathbf i} is symmetric and idempotent: \,\mathbf P_{_\mathbf i} \mathbf P_{_\mathbf i} = \mathbf P_{_\mathbf i}.

Furthermore, in a panel context, pre-multiplying a corresponding ordered vector or matrix with matrix (\mathbf I_{_n} \otimes \mathbf P_{_\mathbf i}) leads to the between transformation (we need the Kronecker product to get individual specific intercepts).

The within transformation was achieved by pre-multiplying a corresponding ordered vector or matrix with matrix (\mathbf I_{_n} \otimes \mathbf M_{_\mathbf i}), with the orthogonal projection matrix (residual maker matrix)

\mathbf M_{_\mathbf i} \ := (\mathbf I_{_T} - \mathbf P_{_\mathbf i})

Remember, \mathbf M_{_\mathbf i} is symmetric and idempotent: \, \mathbf M_{_\mathbf i} \mathbf M_{_\mathbf i} = \mathbf M_{_\mathbf i}.

Furthermore, and this is important for the proofs below, \, \mathbf M_{_\mathbf i} and \, \mathbf P_{_\mathbf i} are also orthogonal, meaning \, \mathbf M_{_\mathbf i} \mathbf P_{_\mathbf i} = \mathbf P_{_\mathbf i} \mathbf M_{_\mathbf i} = \mathbf 0, which is easy to prove.
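A quick numerical sanity check of these properties (a sketch for T = 4):

T_ <- 4
i_ <- rep(1, T_)
P  <- tcrossprod(i_) / T_      # mean function (1/T) i i', Equation 9.55
M  <- diag(T_) - P             # residual maker (within) matrix
all.equal(P %*% P, P)          # idempotency of P
all.equal(M %*% M, M)          # idempotency of M
max(abs(M %*% P))              # orthogonality: M P = 0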


We can now further manipulate the result of Equation 9.54:

\frac {1}{\sigma_u^2 } \boldsymbol{\Sigma} \ = \ \mathbf I_{_T} + \frac {T \sigma_a^2}{\sigma_u^2} \mathbf P_{_\mathbf i} \ = \

\mathbf I_{_T} - \mathbf P_{_\mathbf i}+ \frac {T \sigma_a^2}{\sigma_u^2} \mathbf P_{_\mathbf i} + \mathbf P_{_\mathbf i} \ = \ \mathbf M_{_\mathbf i} + (1+\frac {T \sigma_a^2}{\sigma_u^2})\mathbf P_{_\mathbf i}

We simplify the notation somewhat:

\frac {1}{\sigma_u^2 } \boldsymbol{\Sigma} \ = \ \mathbf M_{_\mathbf i} + (1+\frac {T \sigma_a^2}{\sigma_u^2})\mathbf P_{_\mathbf i} \ = \

\mathbf M_{_\mathbf i} + \underbrace {( {\frac {\sigma_u^2 + T \sigma_a^2}{\sigma_u^2}})}_{:= \rho^2} \,\mathbf P_{_\mathbf i} \ =

(\mathbf M_{_\mathbf i} + \rho^2 \mathbf P_{_\mathbf i}) \tag{9.56}

with

\rho^2 := \left ( {\frac {\sigma_u^2 + T \sigma_a^2}{\sigma_u^2}} \right ) \tag{9.57}

Hence, \, \frac {1}{\sigma_u^2} \boldsymbol{\Sigma} \, is a weighted sum of two idempotent and orthogonal matrices, which makes a direct calculation of the inverse possible.


The first task is to calculate the square root matrix of \, \frac {1}{\sigma_u^2} \boldsymbol{\Sigma} \, which is \, \frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}}.

Proposition 2:

\frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}} \ = \ (\mathbf M_{_\mathbf i} + \rho \,\mathbf P_{_\mathbf i}) \tag{9.58}

Proof of Proposition 2:

  • We have to show that actually \, \underbrace {( \mathbf M_{_\mathbf i} + \rho \, \mathbf P_{_\mathbf i} )}_{\frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}}} \, \underbrace{( \mathbf M_{_\mathbf i} + \rho \, \mathbf P_{_\mathbf i} )}_{\frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}}} \, = \, \frac {1}{\sigma_u^2 } \boldsymbol{\Sigma}, \, from Equation 9.56:

\left ( \mathbf M_{_\mathbf i} + \rho \, \mathbf P_{_\mathbf i} \right ) \left ( \mathbf M_{_\mathbf i} + \rho \, \mathbf P_{_\mathbf i} \right ) \ =

\mathbf M_{_\mathbf i} \mathbf M_{_\mathbf i} + \mathbf M_{_\mathbf i}\mathbf P_{_\mathbf i}\rho \, + \rho \, \mathbf P_{_\mathbf i} \mathbf M_{_\mathbf i} + \rho^2 \, \mathbf P_{_\mathbf i} \mathbf P_{_\mathbf i} =

(\mathbf M_{_\mathbf i} + \rho^2 \mathbf P_{_\mathbf i}) \ = \ \frac {1}{\sigma_u^2 } \boldsymbol{\Sigma}

  • given idempotency and orthogonality of \, \mathbf M_{_\mathbf i} and \, \mathbf P_{_\mathbf i} and the result in Equation 9.56. \quad q.e.d.

Thus, we showed that the square root matrix of \, \frac {1}{\sigma_u^2} \boldsymbol{\Sigma} \, which is \, \frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}} equals \, (\mathbf M_{_\mathbf i} + \rho \, \mathbf P_{_\mathbf i}).


The second task is to calculate the inverse of the square root matrix \, \frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}}

Proposition 3:

\left[ \frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}} \right]^{-1} \ = \ {\sigma_u} \boldsymbol{\Sigma}^{-\frac {1}{2}} \ = \ (\mathbf M_{_\mathbf i} + {\rho}^{-1} \,\mathbf P_{_\mathbf i}) \tag{9.59}

Proof of proposition 3:

  • We need to show that \, \underbrace {( \mathbf M_{_\mathbf i} + \rho \,\mathbf P_{_\mathbf i} )}_{\frac {1}{\sigma_u} \boldsymbol{\Sigma}^{\frac {1}{2}}} \, \underbrace {( \mathbf M_{_\mathbf i} + {\rho}^{-1} \,\mathbf P_{_\mathbf i} )}_{{\sigma_u} \boldsymbol{\Sigma}^{-\frac {1}{2}}} \ = \ \mathbf I_{_T}:

( \mathbf M_{_\mathbf i} + \rho \,\mathbf P_{_\mathbf i} ) ( \mathbf M_{_\mathbf i} + {\rho}^{-1} \,\mathbf P_{_\mathbf i} ) \ =

\mathbf M_{_\mathbf i} \mathbf M_{_\mathbf i} + {\rho}^{-1} \mathbf M_{_\mathbf i} \mathbf P_{_\mathbf i} + \rho \,\mathbf P_{_\mathbf i} \mathbf M_{_\mathbf i} + \mathbf P_{_\mathbf i}\mathbf P_{_\mathbf i} \ =

\mathbf M_{_\mathbf i} + \mathbf P_{_\mathbf i} \ =

(\mathbf I_{_T} - \mathbf P_{_\mathbf i}) + \mathbf P_{_\mathbf i} \ = \ \mathbf I_{_T}

  • q.e.d.

In a final step, we make a few manipulations of Equation 9.59 to obtain an identity matrix in the first position of the expression, as we want a handy variable transformation:

{\sigma_u} \boldsymbol{\Sigma}^{-\frac {1}{2}} \ = \ (\mathbf M_{_\mathbf i} + {\rho}^{-1} \,\mathbf P_{_\mathbf i}) \ =

(\mathbf I_{_T} - \mathbf P_{_\mathbf i} + {\rho}^{-1} \, \mathbf P_{_\mathbf i}) \ =

\left (\mathbf I_{_T} - (1-\rho^{-1})\mathbf P_{_\mathbf i} \right )

And substituting back for \rho from Equation 9.57

(1-\rho^{-1}) \ = \ 1 - \left [ {\frac {\sigma_u^2 + T \sigma_a^2}{\sigma_u^2}} \right ]^{- \frac {1}{2}} \ = \ 1 - \left [ {\frac {\sigma_u^2}{\sigma_u^2 + T \sigma_a^2}} \right ]^{\frac {1}{2}} \ := \ \theta

This gives the final result for the quasi time-demeaning variable transformation used to estimate RE models:

\boldsymbol{\Sigma}^{-\frac {1}{2}} \ = \ \frac {1}{\sigma_u} \left [ \mathbf I_{_T} - \theta \, \tfrac {1}{T} \mathbf i \, \mathbf i' \right ] \tag{9.60}

\mathbf \Sigma ^{-\frac{1}{2}} = \frac{1}{\sigma_u} \left [ \mathbf I_{_T} - \theta \tfrac {1}{T} \mathbf i \, \mathbf i' \right] \tag{9.61}

with \quad \theta = 1 - \left [ \frac {\sigma_u^2} { \sigma_u^2 + T\sigma_a^2 } \right ]^{\frac {1}{2}}, \quad \theta \in [0,1] \tag{9.62}

  • Thereby, \mathbf i is a T-vector of ones and the outer product \, \mathbf i \, \mathbf i' is a \,T \times T matrix of ones. Thus, the term \, \frac {1}{T} \mathbf i \, \mathbf i' of Equation 9.61 is simply the mean function

  • The GLS transformation of \, \mathbf y_i and likewise of \, \mathbf X_i therefore is:

\mathbf \Sigma ^{-\frac{1}{2}} \mathbf y_i \ = \ \frac{1}{\sigma_u} \left[\begin{array}{c} y_{i 1} - \theta \bar y_i \\ y_{i 2} - \theta \bar y_i\\ \vdots \\ y_{i T} - \theta \bar y_i \end{array}\right] \tag{9.63}

This transformation is called quasi time-demeaning. Thereby, only a fraction \theta of the time averages is subtracted from the variables. The fraction \theta depends only on the two variances \sigma_a^2 and \sigma_u^2.

If \theta=1, the quasi time-demeaning becomes time-demeaning, which is the within transformation.


Applying Equation 9.63 to the variables, we arrive at the following transformed model (the factor (1 / \sigma_u) cancels):

\underbrace {(y_{it} - \theta \bar y_i)}_{\breve y_{it}} = \underbrace {( \mathbf x_{it} - \theta \mathbf {\bar x}_i)}_{\mathbf {\breve {x}}_{it}} \boldsymbol \beta + \underbrace {(e_{it} - \theta \bar e_i)}_{\breve e_{it}} \tag{9.64}

or \breve y_{it} = \breve {\mathbf x}_{it} \boldsymbol \beta + \breve e_{it} \tag{9.65}

Applying pooled OLS to the stacked Equation 9.65 yields the RE estimator:

\boldsymbol {\hat \beta}_{R E} = (\mathbf {\breve {X'}} \mathbf {\breve X})^{-1} \mathbf {\breve {X'}} \mathbf {\breve y}

\operatorname {Var} (\boldsymbol {\hat \beta}_{R E}) = \sigma_{\breve e}^2 (\mathbf {\breve {X'}} \mathbf {\breve X})^{-1}


  • Depending on \theta, the RE estimator becomes more and more similar to pooled OLS if \, \theta is small – limiting case: \theta = 0

    • In this case, the variance of the individual heterogeneity, \sigma_a^2, is small relative to \sigma_u^2, the variance of the idiosyncratic error u_{it}
  • On the other hand, if \theta is near one, the RE estimator becomes more and more similar to the FE estimator – limiting case: \theta = 1

    • In this case, the variance of the individual heterogeneity, \sigma_a^2, strongly dominates the variance of the idiosyncratic error \sigma_u^2

    • If \theta \rightarrow 1, a possible bias caused by a correlation of the individual heterogeneity a_i with some explanatory variables, becomes smaller and smaller


Estimating \sigma_a^2 and \sigma_u^2

Up to now, we pretended to know the two variances \sigma_a^2 and \sigma_u^2 that determine \mathbf \Sigma, Equation 9.44, and therefore \theta of the quasi time-demeaning transformation.

However, these two variances have to be estimated. There are several possibilities to do this.

All of them are based on estimated residual variances (or covariances) of different panel regressions:

One can show that the 10

  • estimated residual variance of the pooled OLS estimator converges to

\operatorname {plim} \left[ \mathbf{e'}_{{pooled }} \ \mathbf{e}_{ {pooled}} / (n T) \right]=\sigma_a^2 + \sigma_{u}^2

  • estimated residual variance of the FE (LSDV) estimator converges to

\operatorname {plim} \left[ \mathbf{\ddot u'}_{{FE}} \ \mathbf{\ddot u}_{ {FE}} / (n(T-1-K)) \right]=\sigma_{u}^2

  • estimated residual variance of the DIF estimator converges to

\operatorname {plim}\left[ \Delta\mathbf{u'}_{{DIF}} \ \Delta \mathbf {u}_{{DIF}} /(n(T-1)) \right]=2 \sigma_{u}^2

  • estimated residual variance of the between (BTW) estimator converges to

\operatorname {plim}\left[ \mathbf{\bar e'}_{{BTW}} \ \mathbf{\bar e}_{{BTW}} /(n T) \right] = \sigma_a^2 + \sigma_{u}^2 / T

  • We could further obtain an estimate for \sigma_{a}^2 by calculating the variance of the estimated fixed effects from FE (LSDV) regressions, Nerlove (1971). However, if there are time-invariant explanatory variables in the model, like gender or race, the effects of these variables are totally absorbed by the fixed effects (individual intercepts), which could lead to an overestimation of \sigma_{a}^2

  • Baltagi (2005) suggests yet another method, which is based on a pooled OLS regression. Considering Equation 9.44, we note that within each group i all covariances are equal to \sigma_a^2: \operatorname {Cov}(e_{it}, e_{is}) = \sigma_a^2 for t \neq s. For a given i, there are T(T - 1)/2 pairs of residuals \hat e_{it} \hat e_{is} that can be used to estimate these covariances. We average over all these pairs and finally over all units i to obtain the following estimate for \sigma_a^2 (a small by-hand sketch follows the formula):

\operatorname{plim} \frac{1}{n} \sum_{i=1}^n \frac{\sum_{t=2}^T \sum_{s=1}^{t-1} \hat e_{i t} \hat e_{i s}}{T(T-1) / 2}=\sigma_a^2
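A rough by-hand sketch of this covariance-based estimate for the fatalities example (an illustration only, assuming the pooled model Pooled from above, a balanced panel with T = 7 and residuals ordered by state and then year, as plm returns them):

e_hat <- residuals(Pooled)
e_mat <- matrix(e_hat, nrow = 7)                  # one column of residuals per state
sigma2_a_hat <- mean(apply(e_mat, 2, function(e) {
  cp <- tcrossprod(e)                             # all products e_it * e_is
  sum(cp[lower.tri(cp)]) / (7 * 6 / 2)            # average over the T(T-1)/2 pairs with s < t
}))
sigma2_a_hat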

We recognize from above that we need at least two relationships (equations) to determine the two variances:

  • A popular variant is using the between estimator residuals to estimate \sigma_a^2 + \sigma_{u}^2 / T and the FE estimator residuals to get an estimate for \sigma_u^2. This method is the default of the plm() procedure (Swamy and Arora (1972), swar). But with the option method you can choose several other variants

As we have to estimate the two variances to be able to apply the GLS transformation from Equation 9.63, the RE estimator actually is a feasible GLS estimator.
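To illustrate, here is a minimal by-hand feasible GLS for the fatalities example (only a sketch, assuming the Within and Between models from above; plm applies slightly different small-sample corrections, so the numbers will not match exactly):

# Variance components in the spirit of Swamy-Arora (balanced panel, T = 7)
T_       <- 7
sigma2_u <- sum(residuals(Within)^2)  / df.residual(Within)    # sigma_u^2 from the FE residuals
sigma2_1 <- sum(residuals(Between)^2) / df.residual(Between)   # estimates sigma_a^2 + sigma_u^2/T
theta    <- 1 - sqrt(sigma2_u / (T_ * sigma2_1))               # Equation 9.62

# Quasi time-demeaning by hand and pooled OLS on the transformed data
q_frate   <- with(Fatalities, frate   - theta * ave(frate,   state))
q_beertax <- with(Fatalities, beertax - theta * ave(beertax, state))
coef(lm(q_frate ~ q_beertax))   # should be close to the RE estimate reported below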


9.5.2 Example with Fatalities dataset

We continue our example with the Fatalities dataset from Stock and Watson (2020)

RE estimator

summary( RE <-  plm(frate ~ beertax, index= c("state", "year"), 
                        model="random",  
                        data = Fatalities))
       Oneway (individual) effect Random Effect Model 
          (Swamy-Arora's transformation)
       
       Call:
       plm(formula = frate ~ beertax, data = Fatalities, model = "random", 
           index = c("state", "year"))
       
       Balanced Panel: n = 48, T = 7, N = 336
       
       Effects:
                         var std.dev share
       idiosyncratic 0.03605 0.18986 0.119
       individual    0.26604 0.51579 0.881
       theta: 0.8622
       
       Residuals:
            Min.   1st Qu.    Median   3rd Qu.      Max. 
       -0.471090 -0.120045 -0.021530  0.091011  0.964350 
       
       Coefficients:
                    Estimate Std. Error z-value Pr(>|z|)    
       (Intercept)  2.067141   0.099971 20.6773   <2e-16 ***
       beertax     -0.052016   0.124176 -0.4189   0.6753    
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
       
       Total Sum of Squares:    12.648
       Residual Sum of Squares: 12.642
       R-Squared:      0.00052508
       Adj. R-Squared: -0.0024674
       Chisq: 0.175467 on 1 DF, p-value: 0.6753

RE estimator – twoways

summary( RE_2w <-  plm(frate ~ beertax, index= c("state", "year"), 
                        model="random",
                        effect = "twoways",
                        data = Fatalities))
       Twoways effects Random Effect Model 
          (Swamy-Arora's transformation)
       
       Call:
       plm(formula = frate ~ beertax, data = Fatalities, effect = "twoways", 
           model = "random", index = c("state", "year"))
       
       Balanced Panel: n = 48, T = 7, N = 336
       
       Effects:
                          var  std.dev share
       idiosyncratic 0.035300 0.187883 0.117
       individual    0.266148 0.515895 0.880
       time          0.001031 0.032116 0.003
       theta: 0.8636 (id) 0.3548 (time) 0.3531 (total)
       
       Residuals:
            Min.   1st Qu.    Median   3rd Qu.      Max. 
       -0.479379 -0.114279 -0.017247  0.093782  0.946156 
       
       Coefficients:
                    Estimate Std. Error z-value Pr(>|z|)    
       (Intercept)  2.059236   0.100884 20.4120   <2e-16 ***
       beertax     -0.036614   0.125073 -0.2927   0.7697    
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
       
       Total Sum of Squares:    12.321
       Residual Sum of Squares: 12.318
       R-Squared:      0.00025652
       Adj. R-Squared: -0.0027367
       Chisq: 0.0856989 on 1 DF, p-value: 0.76972

Code
modelsummary(list("Pooled"=Pooled, "BTW"=Between, "DIF"=DIF, "FE"=Within, "FE_2w"=Within_2w, "RE"=RE, "RE_2w"=RE_2w),
            vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
            stars = TRUE, 
            fmt = 3,
            statistic = c('statistic'),
            gof_omit = "A|L|B|F|RM",
            coef_omit = "Inter|year|state",
            align = "lddddddd",
            output = "flextable"
            )
Table 9.3: Estimated effects of a beer tax on traffic fatality rates (t-statistics in brackets)

                   Pooled     BTW        DIF        FE         FE_2w      RE         RE_2w
beertax             0.365**    0.378*               -0.656***  -0.640**   -0.052     -0.037
                   (3.083)    (2.386)              (-3.491)   (-3.242)   (-0.419)   (-0.293)
diff(beertax, 6)                          -0.869*
                                         (-2.211)
Num.Obs.           336        48          48        336        336        336        336
R2                 0.093      0.110       0.119     0.041      0.036      0.001      0.000
Std.Errors         Custom     IID         IID       IID        IID        IID        IID

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001


9.5.3 The Correlated Random Effects (CRE) Estimator

So far, we learned about the two most used panel data methods, the FE and RE estimators. Both have pros and cons:

For the FE estimator we have:

  • We are not obliged to assume that the unobserved individual heterogeneity a_i is uncorrelated with the observed explanatory variables. This is the most important feature and advantage of the FE estimator

  • However, we cannot estimate effects of variables, which are constant over time, like gender, race, or even education in some data sets

  • Even if a variable is principally varying over time, the variation could be small, so that we get very imprecise estimates (variable is hardly identified)

For the RE estimator we have:

  • We can estimate effects of variables, which are constant over time, like gender, race or education

  • The RE estimator is efficient if our usual assumptions are complied, for instance that the idiosyncratic error u_{it} is iid

  • However, the whole procedure breaks down if the individual heterogeneity a_i is correlated with some explanatory variables. And such a correlation can rarely be credibly ruled out. This makes the results of RE estimates often implausible

This raises the question of whether there isn’t a method that combines the advantages of both the FE and RE estimators. And the answer is: yes, but with some reservations.

There is an alternative to the FE estimator that still allows for a correlation of the unobserved individual heterogeneity a_i with observed explanatory variables, thereby retaining some of the advantages of the RE estimator – the CRE estimator


The basic idea is due to Mundlak (1978) and was further elaborated in Wooldridge (2010), Chapt. 10, which we basically follow here.

Let’s start with our previous base model, assuming the error component model Equation 9.8:

y_{it} = \beta_0 + \mathbf x_{it} \boldsymbol \beta + \mathbf z_{i} \boldsymbol \gamma + \underbrace {(a_i + u_{it})}_{e_{it}} \tag{9.66}

  • Here, the new element is that we now distinguish between time varying explanatory variables \mathbf x_{it} and explanatory variables constant over time, contained in vector \mathbf z_{i}. Until now, \mathbf x_{it} included both types of variables

  • Instead of assuming a_i to be not correlated with \mathbf x_{it} and \mathbf z_{i}, or applying the within transformation to remove a_i (and all other constant variables \mathbf z_{i}), we explicitly model the correlation between a_i and \mathbf x_{it}

    • Thereby, the presumption is that a_i is only correlated with the average level of \mathbf x_{it}. As a_i is constant over time, the idea that it is only correlated with the constant time averages \bar {\mathbf x}_{i} has some appeal

Hence, suppose we have the following relationship between a_i and \bar {\mathbf x}_{i}:

a_i = \alpha + \bar{\mathbf x}_{i} \boldsymbol \delta + r_i \tag{9.67}

  • Thereby, r_i is the residual of the individual heterogeneity, which is neither correlated with \mathbf x_{it} nor with \bar{\mathbf x}_{i}

    • The parameter vector \boldsymbol \delta measures the part of the individual heterogeneity a_i which can be explained by \bar {\mathbf x}_{i}. If \boldsymbol \delta is zero, there is no correlation at all. This can be exploited for a test of whether RE is appropriate (we will deal with this later)

    • Note, we do not rule out a correlation between r_i and \mathbf z_{i}. This is clearly a weak point

Substituting Equation 9.67 into Equation 9.66 yields:

y_{it} = (\textcolor{red} \alpha + \beta_0) + \mathbf x_{it} \boldsymbol \beta + \mathbf z_{i} \boldsymbol \gamma + \textcolor{red} {\mathbf {\bar x}_{i} \boldsymbol \delta} + (\textcolor{red} {r_i} + u_{it}) \tag{9.68}

  • Equation 9.68 still has a composite error term (r_i + u_{it}), but this is now uncorrelated with \mathbf x_{it} by construction. Hence, the requirements for a consistent estimation of \boldsymbol \beta are met:

    • The correlation between a_i and \mathbf x_{it} is now fully captured by the new control variables \bar {\mathbf x}_{i}, the constant time averages of \mathbf x_{it}

Estimation of Equation 9.68 can be carried out as an RE model (or even with pooled OLS – but with incorrect se because of the autocorrelated composite error (r_i + u_{it})). The result is the so-called correlated random effects (CRE) estimator.

  • Note that we are now able to estimate the effects of the time-constant variables \mathbf z_{i}, contrary to the FE model

    • However, for consistent estimates of \boldsymbol \gamma we have to assume that \mathbf z_{i} is uncorrelated with the rest-heterogeneity r_i. Hopefully, most of the individual heterogeneity is correctly represented by the control variables \mathbf z_{i} and \bar {\mathbf x}_{i}

Empirical example – wage equation

In the following example we estimate a wage equation with various panel methods along with the new CRE procedure.

data(wagepan)

Pooled_ <- plm(lwage ~ educ + black + hisp + exper + 
                expersq + married + union, 
            data = wagepan,
            model = "pooling")

BTW_ <- plm(lwage ~ educ + black + hisp + exper + 
            expersq + married + union, 
          data = wagepan,
          model = "between")

FE_ <- plm(lwage ~ educ + black + hisp + exper + 
            expersq + married + union, 
          data = wagepan,
          model = "within")

RE_ <- plm(lwage ~ educ + black + hisp + exper + 
            expersq + married + union, 
          data = wagepan,
          model = "random") 

The package plm unfortunately does not directly support a CRE model. So, we have to do it by hand.

First, we transform the data.frame ‘wagepan’ into a pdata.frame. Afterwards, we apply the Between transformation to all time-varying variables to calculate their time averages {\mathbf {\bar x}_{i}}.

Finally, we estimate the CRE model and apply both RE and pooled OLS to Equation 9.68. As we will see, this gives exactly the same estimated coefficients (but different se and t-values).

pwagepan <- pdata.frame(wagepan, index = c("nr","year"))

# Between transformations
pwagepan$B_exper <-   Between(pwagepan$exper)
pwagepan$B_expersq <- Between(pwagepan$expersq)
pwagepan$B_married <- Between(pwagepan$married)
pwagepan$B_union <-   Between(pwagepan$union)


# Estimating CRE as RE
CRE <- plm(lwage ~ educ + black + hisp + 
               exper +   expersq +   married +   union + 
             B_exper + B_expersq + B_married + B_union, 
           data = pwagepan, 
           model = "random")


# Estimating CRE with pooled OLS
CRE1 <- plm(lwage ~ educ + black + hisp + 
               exper +   expersq +   married +   union +
             B_exper + B_expersq + B_married + B_union, 
           data = pwagepan, 
           model = "pooling") 

The following Table 9.4 summarizes the results.

  1. Both CRE variants yield exactly the same coefficients (but not the same se or t-values)

  2. The coefficients (and se and t-values) of the time-varying variables exper, expersq, married and union of the CRE model are exactly equal to the FE estimates. Hence, these coefficients are consistently estimated by CRE 11

  3. The coefficients (and se and t-values) of the constant variables educ, black and hisp from the CRE model are identical to the between estimates

  4. The coefficients of the time averages (between transformed) of the time varying variables B_exper, B_expersq, B_married and B_union are equal to the difference of between and FE estimates

Code
modelsummary(list("Pooled"=Pooled_, "RE"=RE_, "BTW"=BTW_, "FE"=FE_, "CRE"=CRE, "CRE(pOLS)"=CRE1),
            vcov = c(vcovHC, "classical", "classical", "classical", "classical", vcovHC),
            stars = TRUE, 
            fmt = 3,
            statistic = c('statistic'),
            gof_omit = "A|L|B|F|RM",
            coef_omit = "Inter|year",
            align = "cdddddd",
            output = "flextable"
            ) 
Table 9.4: Wage equation: Results of several panel data models, including CRE (t-statistics in brackets)

             Pooled      RE          BTW         FE          CRE         CRE(pOLS)
educ          0.099***    0.101***    0.095***                0.095***    0.095***
            (10.812)    (11.357)     (8.676)                 (8.676)     (8.428)
black        -0.144**    -0.144**    -0.139**                -0.139**    -0.139**
            (-2.875)    (-3.027)    (-2.840)                (-2.840)    (-2.757)
hisp          0.016       0.020       0.005                   0.005       0.005
             (0.401)     (0.473)     (0.112)                 (0.112)     (0.124)
exper         0.089***    0.112***   -0.050       0.117***    0.117***    0.117***
             (7.179)    (13.572)    (-1.002)    (13.878)    (13.878)    (10.922)
expersq      -0.003**    -0.004***    0.005      -0.004***   -0.004***   -0.004***
            (-3.278)    (-6.875)     (1.596)    (-7.106)    (-7.106)    (-6.277)
married       0.108***    0.063***    0.144***    0.045*      0.045*      0.045*
             (4.135)     (3.744)     (3.487)     (2.474)     (2.474)     (2.160)
union         0.180***    0.107***    0.271***    0.082***    0.082***    0.082***
             (6.540)     (6.022)     (5.813)     (4.255)     (4.255)     (3.601)
B_exper                                                      -0.167**    -0.167***
                                                            (-3.278)    (-3.576)
B_expersq                                                     0.009**     0.009***
                                                             (2.884)     (3.327)
B_married                                                     0.098*      0.098*
                                                             (2.182)     (2.199)
B_union                                                       0.189***    0.189***
                                                             (3.742)     (3.975)
Num.Obs.     4360        4360         545        4360        4360        4360
R2            0.187       0.178       0.219       0.178       0.183       0.200
Std.Errors   Custom      IID         IID         IID         IID         Custom

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001


9.5.4 The Hausman test and a CRE based specification test

We have learned much about FE and RE estimators. The latter has some advantages,

  • as it is efficient and we can estimate the effects of time-constant variables, which are often at the center of interest

  • However, we need to treat the individual effects as uncorrelated with the other regressors, which is often hard to justify

    • The RE estimator, therefore, may suffer from inconsistency due to this correlation
  • The FE estimator doesn’t have this problem

The question is, which one should we use?


The specification test proposed by Hausman (1978) can be used to test for such correlations of the individual heterogeneity a_i with the explanatory variables. 12

  • The test is based on the idea that under the hypothesis of no correlation, both FE and RE (FGLS) estimators are consistent, but RE is asymptotically efficient and FE is not

  • However, under the alternative, FE is consistent, but RE is not

  • Therefore, under the null hypothesis, the two estimates should not differ systematically, and a test can be based on the difference (\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE})

Hausman showed that the covariance matrix of this difference is (provided that RE is actually efficient, e.g., u_{it} is actually iid) 13

\operatorname {Var}(\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE}) \ = \ \operatorname {Var}(\boldsymbol {\hat \beta}_{FE}) - \operatorname {Var}(\boldsymbol {\hat \beta}_{RE}) = \mathbf \Psi \tag{9.69}

The resulting Wald test statistic is:

W = (\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE})' \mathbf \Psi^{-1}(\boldsymbol {\hat \beta}_{FE} - \boldsymbol {\hat \beta}_{RE}) \ \sim \ \chi ^2(k-1)


Let’s apply this test to our previous example of the relationship between traffic fatality rates and a beer tax; recall Table 9.3. The results of the FE and RE estimators are quite different, but is the difference statistically significant? The following Hausman test should answer this question.

# Hausman test with the phtest() function from the plm package 
phtest(Within, RE)

    Hausman Test

data:  frate ~ beertax
chisq = 18.353, df = 1, p-value = 1.835e-05
alternative hypothesis: one model is inconsistent

The H0 that both models are consistent is overwhelmingly rejected – the difference between these two estimators is too large to be accidental. Thus, the FE estimator has to be used in this case.
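As a cross-check, the statistic can be computed by hand from the stored models (a minimal sketch using the Within and RE objects from above; phtest() automates exactly this):

b_fe <- coef(Within)["beertax"]
b_re <- coef(RE)["beertax"]
Psi  <- vcov(Within)["beertax", "beertax"] - vcov(RE)["beertax", "beertax"]   # Equation 9.69
W    <- as.numeric((b_fe - b_re)^2 / Psi)
c(chisq = W, p_value = pchisq(W, df = 1, lower.tail = FALSE))   # matches the phtest() output above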


Now, consider our last example, which estimates a wage equation; recall Table 9.4.

Here, the difference between the FE and RE estimates is not so obvious. Note that we can only compare the coefficients of the time-varying variables.

Carrying out a Hausman test yields the following result:

phtest(FE_, RE_)

    Hausman Test

data:  lwage ~ educ + black + hisp + exper + expersq + married + union +  ...
chisq = 31.707, df = 10, p-value = 0.000448
alternative hypothesis: one model is inconsistent

Also in this case, the H0 that both models are consistent is overwhelmingly rejected. Hence, the FE estimator should be used.

One weakness of this test is that it cannot be made robust against serial correlation of the idiosyncratic error u_{it}. If u_{it} violates the iid assumption, the formula for the covariance matrix Equation 9.69 is incorrect, leading to biased test results.

However, there is a variant of the Hausman test, based on a CRE model, that can be made robust against autocorrelated errors and is also more intuitive than the original test.


The CRE approach provides a simple formal way of choosing between FE and RE approaches:

  • The CRE model differs from a pure RE model by the additional inclusion of the time-averaged variables \mathbf {\bar x}_i of Equation 9.68 into the model

    • If the parameter vector \boldsymbol \delta of \mathbf {\bar x}_i is \mathbf 0 in Equation 9.68, the CRE becomes a RE model

    • Furthermore, in this case, as Equation 9.67 shows, there is no correlation between \mathbf {\bar x}_i and hence \mathbf {x}_{it} on the one hand and the individual heterogeneity a_i on the other, which is the main assumption for applying RE

  • Thus, the H0 that RE is sufficient (that we do not need the CRE and therefore the FE approach) is \boldsymbol \delta = \mathbf 0 14

  • To carry out the test, we estimate a CRE model like Equation 9.68 and test whether all coefficients of the time-averaged variables \mathbf {\bar x}_i are zero

Let’s do that for our wage equation:

# We have stored the results of the CRE model in the list CRE 
# We test, whether all between transformed (time averaged) variables 
# have a zero coefficient

lht(CRE, matchCoefs(CRE,"B_"))
Linear hypothesis test

Hypothesis:
B_expersq = 0
B_married = 0
B_union = 0

Model 1: restricted model
Model 2: lwage ~ educ + black + hisp + exper + expersq + married + union + 
    year + B_exper + B_expersq + B_married + B_union

  Res.Df Df  Chisq Pr(>Chisq)    
1   4345                         
2   4342  3 26.361  8.013e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Once again, the H0 is clearly rejected, meaning that we need a FE or CRE model.


Now the same with a cluster-robust covariance matrix. We get an even stronger rejection of the H0.

lht(CRE, matchCoefs(CRE,"B_"), vcov=vcovHC)
Linear hypothesis test

Hypothesis:
B_expersq = 0
B_married = 0
B_union = 0

Model 1: restricted model
Model 2: lwage ~ educ + black + hisp + exper + expersq + married + union + 
    year + B_exper + B_expersq + B_married + B_union

Note: Coefficient covariance matrix supplied.

  Res.Df Df  Chisq Pr(>Chisq)    
1   4345                         
2   4342  3 30.038  1.355e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finally, we try the CRE model that was estimated by pooled OLS, again with a cluster-robust covariance matrix. Surprisingly, we get exactly the same result as above (without a cluster-robust covariance matrix we would get a clearly biased outcome).

lht(CRE1, matchCoefs(CRE1,"B_"), vcov=vcovHC)
Linear hypothesis test

Hypothesis:
B_expersq = 0
B_married = 0
B_union = 0

Model 1: restricted model
Model 2: lwage ~ educ + black + hisp + exper + expersq + married + union + 
    year + B_exper + B_expersq + B_married + B_union

Note: Coefficient covariance matrix supplied.

  Res.Df Df  Chisq Pr(>Chisq)    
1   4345                         
2   4342  3 30.038  1.355e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remark: The phtest() function has an option to carry out a CRE-based Hausman test, using the option method = "aux". However, this procedure doesn’t seem to work properly: it crashed when using time-dummies.

Code
# `phtest()` function with "aux" option
# You have to use a model formula as first argument

## phtest(lwage ~ educ + black + hisp + exper + expersq + married + union + year, 
## data = pwagepan, method = "aux", vcov = vcovHC)

# This would lead to an error message: 
# Error in vcov(auxmod)[range, range] : subscript out of bounds

9.5.5 The Hausman-Taylor IV Estimator 15

We considered the FE, LSDV and DIF estimators, which provide consistent estimates, but not for the coefficients of time-invariant regressors, because these are swept away by the respective data transformations (absorbed into the fixed effects)

The Model

  • The Hausman-Taylor estimator is an IV estimator that additionally enables the consistent estimation of the coefficients of time-invariant regressors. It does so by assuming that some specified regressors are uncorrelated with the unobserved individual heterogeneity a_i

  • In particular, we have the model

y_{it} = \mathbf x_{1it} \boldsymbol \beta_1 + \mathbf x_{2it} \boldsymbol \beta_2 + \mathbf z_{1i} \boldsymbol \gamma_{1} + \mathbf z_{2i} \boldsymbol \gamma_{2} + \underbrace {(a_i + u_{it})}_{e_{it}} \tag{9.70}

  • Or stacked to matrices

\mathbf{y}=\mathbf{X}_1 \boldsymbol\beta_1+\mathbf{X}_2 \boldsymbol\beta_2+\mathbf{Z}_1 \boldsymbol\gamma_1+\mathbf{Z}_2 \boldsymbol\gamma_2 + \underbrace {(\mathbf{a}+\mathbf{u})}_{\mathbf e} \tag{9.71}

  • Thereby, the time-varying regressors are denoted by \mathbf X_{1 \text { or } 2} and the time-invariant ones (the observed individual heterogeneity) are denoted by \mathbf Z_{1 \text { or } 2}

    • The regressors [\mathbf X_1, \mathbf Z_1] are strictly exogenous with respect to a_i, whereas the regressors [\mathbf X_2, \mathbf Z_2] are allowed to be correlated with a_i

    • All regressors have to be exogenous 16 with respect to the idiosyncratic error u_{it}. If this is not the case, you have to employ a usual IV estimator with external instruments which are exogenous with respect to u_{it} 17


The Instruments

  • As [\mathbf X_1, \mathbf Z_1] are exogenous, they need not be instrumented, i.e., they can serve as their own (internal) instruments

  • So, we need “external” instruments for [\mathbf X_2, \mathbf Z_2]

    • Hausman and Taylor (1981) suggest using the within-transformed \mathbf {\ddot X}_2 as instruments for \mathbf X_2 (because the \mathbf {\ddot X}_2 are uncorrelated with a_i by construction 18) and

    • the between-transformed \mathbf {\bar X}_1 as instruments for \mathbf Z_2, because \mathbf X_1 is strictly exogenous by assumption and therefore the same is true for their time averages \mathbf {\bar X}_1

      • \mathbf {\ddot X}_2 and \mathbf {\bar X}_1 are external instruments, because they are not part of the original model, Equation 9.71, i.e., they contain data information from other time periods
  • So, we have the instrument matrix: \mathbf W = [\mathbf X_1, \mathbf {\ddot X}_2, \mathbf Z_1, \mathbf {\bar X}_1]. Because \mathbf {\ddot X}_1 is a linear combination of \mathbf {X}_1, this is equivalent to

\mathbf W = [\mathbf {\ddot X}_1, \mathbf {\ddot X}_2, \mathbf Z_1, \mathbf {\bar X}_1] \tag{9.72}


The Estimation

  • One could now proceed and estimate Equation 9.71 by an IV procedure

  • However, Hausman and Taylor (1981) instead proceeded with the quasi time-demeaned RE model, Equation 9.73, for efficiency reasons 19

\mathbf{\breve y}=\mathbf{\breve X}_1 \boldsymbol\beta_1+\mathbf{\breve X}_2 \boldsymbol\beta_2 + \mathbf{\breve Z}_1 \boldsymbol\gamma_1 + \mathbf{\breve Z}_2 \boldsymbol\gamma_2 + \underbrace {(\mathbf{\breve a}+\mathbf{\breve u})}_{\mathbf {\breve e}} \tag{9.73}

  • The IV estimation procedure with instruments \mathbf W applied to Equation 9.73 should now deliver consistent and efficient estimates of \boldsymbol \beta and \boldsymbol \gamma. Yet, we have to note:

    1. The model is only identified if the number of time-varying exogenous variables in \mathbf X_1 is at least as large as the number of time-constant endogenous variables in \mathbf Z_2

    2. Before applying IV estimation to Equation 9.73, we nevertheless have to estimate the variances \sigma_u^2 and \sigma_a^2. These determine \theta from Equation 9.62, which is necessary to apply the quasi time-demeaning transformation from Equation 9.64 to Equation 9.73. To this end, Hausman and Taylor (1981) proceeded in the following way 20 (a code sketch of these steps follows the list)

      1. \sigma_u^2 is estimated, as usual, from the residuals of an FE estimation of Equation 9.71

      2. They applied an IV estimation of the group means of the FE residuals from step i. on [\mathbf {Z}_1, \mathbf Z_2], using the instruments [\mathbf {X}_1, \mathbf Z_1]. The residual variance of this regression delivers a consistent estimate of (\sigma_a^2 + \frac {1}{T}\sigma_u^2)

      3. Using the results from steps i. and ii., both variances can be determined and thus we have the necessary estimate \hat \theta
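
The following lines give a hedged R sketch of steps i. to iii.; it is for illustration only, since plm() with random.method = "ht" performs these computations internally. The object name FE_aux is made up, and the year dummies of the later wage equation are omitted for brevity:

# Step i: estimate sigma_u^2 from a within (FE) regression on the time-varying regressors
FE_aux   <- plm(lwage ~ exper + expersq + married + union,
                data = pwagepan, model = "within")
sigma2_u <- sum(residuals(FE_aux)^2) / df.residual(FE_aux)
# Step ii (sketched, not run): an IV regression of the group means of the FE residuals
# on [Z_1, Z_2] with instruments [X_1, Z_1]; its residual variance estimates
# (sigma_a^2 + sigma_u^2 / T), from which sigma_a^2 follows
# Step iii: with both variances and T = 8, theta follows as in Equation 9.62:
# theta_hat <- 1 - sqrt(sigma2_u / (sigma2_u + 8 * sigma2_a))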


The Results

\boldsymbol {\hat \beta}_{HT} = \left({ {\mathbf{\breve X}'}} \mathbf P_{_{\mathbf W}} {\mathbf{\breve X}}\right)^{-1} \left({\mathbf{\breve X}'} \mathbf P_{_{\mathbf W}} {\mathbf{\breve y}}\right) \tag{9.74}

  • This is the usual 2SLS formula applied to the quasi time-demeaned variables with

\mathbf X := [\mathbf X_1, \mathbf X_2, \mathbf Z_1, \mathbf Z_2]

\boldsymbol \beta := [\boldsymbol \beta_1, \boldsymbol \beta_2, \boldsymbol \gamma_1, \boldsymbol \gamma_2]'

\mathbf W := [\mathbf {\ddot X}_1, \mathbf {\ddot X}_2, \mathbf Z_1, \mathbf {\bar X}_1]

\ \mathbf P_{_{\mathbf W}} := {\mathbf{W}} \left({\mathbf{W}'} {\mathbf{W}}\right)^{-1} {\mathbf{W}'}

  • Note that \mathbf P_{_{\mathbf W}} is the idempotent projection matrix, or hat matrix, with respect to the instruments \mathbf W 22

  • We further get the usual 2SLS covariance matrix (a short matrix-algebra sketch of Equations 9.74 and 9.75 follows below):

\operatorname {Var} (\boldsymbol {\hat \beta}_{HT}) = \sigma_{\breve e}^2\left({ {\mathbf{\breve X}'}} \mathbf P_{_{\mathbf W}} {\mathbf{\breve X}}\right)^{-1} \tag{9.75}
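
For illustration, the matrix algebra of Equations 9.74 and 9.75 written out in R. This sketch assumes that the quasi-demeaned regressor matrix, outcome vector and instrument matrix have already been built as numeric objects; the names X_breve, y_breve and W are hypothetical:

# Projection (hat) matrix of the instruments
P_W      <- W %*% solve(crossprod(W)) %*% t(W)
# Hausman-Taylor (2SLS) coefficients, Equation 9.74
beta_HT  <- solve(t(X_breve) %*% P_W %*% X_breve, t(X_breve) %*% P_W %*% y_breve)
# Residual variance of the quasi-demeaned model and covariance matrix, Equation 9.75
sigma2_e <- sum((y_breve - X_breve %*% beta_HT)^2) / (nrow(X_breve) - ncol(X_breve))
vcov_HT  <- sigma2_e * solve(t(X_breve) %*% P_W %*% X_breve)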

  • With plm(), one can also estimate a variant of the Hausman-Taylor estimator with an augmented set of instruments, suggested by Amemiya and MaCurdy (1986), with the option inst.method = "am". This estimator should be a little more efficient, as it might mitigate a possible weak instrument problem

Empirical example – wage equation

# Hausman-Taylor IV estimator

# Only educ is endogenous 

# The formula now has three parts, separated by | :
#   1. part: the usual formula: y ~ explanatory variables.
#   2. part: the list of all instruments, internal + external.
#   3. part: the list of all endogenous variables.
# There is no need to discriminate between time-varying and time-constant variables,
# the procedure will do that for you.
# But you have to check whether the number of time-varying exogenous variables 
# is at least as large as the number of time-constant endogenous variables: 
# In our case, we have 4 time-varying exogenous vars: exper, expersq, married, union, 
# and one time-constant endogenous variable: educ.
# So, the model is identified

HT <- plm(lwage ~ educ + black + hisp + 
            exper + expersq + married + union | 
            black + hisp  + exper + expersq + married + union |
            educ,
          data = pwagepan, 
          model = "random",
          random.method = "ht",
          inst.method = "baltagi"
          )
summary(HT)
Oneway (individual) effect Random Effect Model 
   (Hausman-Taylor's transformation)
Instrumental variable estimation
   (Baltagi's transformation)

Call:
plm(formula = lwage ~ educ + black + hisp + exper + expersq + 
    married + union | black + hisp + exper + expersq + married + 
    union | educ, data = pwagepan, model = "random", random.method = "ht", 
    inst.method = "baltagi")

Balanced Panel: n = 545, T = 8, N = 4360

Effects:
                 var std.dev share
idiosyncratic 0.1233  0.3511 0.527
individual    0.1108  0.3329 0.473
theta: 0.6506

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-13.03022  -0.40651   0.06607   0.52849   4.39476 

Coefficients:
               Estimate  Std. Error z-value  Pr(>|z|)    
(Intercept) -0.26424642  0.18936474 -1.3954 0.1628843    
educ         0.11435287  0.01566574  7.2996 2.887e-13 ***
black       -0.13971986  0.04875583 -2.8657 0.0041608 ** 
hisp         0.03291060  0.04518652  0.7283 0.4664130    
exper        0.11159180  0.00827684 13.4824 < 2.2e-16 ***
expersq     -0.00398352  0.00059813 -6.6600 2.738e-11 ***
married      0.06086456  0.01683741  3.6148 0.0003005 ***
union        0.10649729  0.01784304  5.9686 2.394e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    653.17
Residual Sum of Squares: 4357.1
R-Squared:      0.17788
Adj. R-Squared: 0.17656
Chisq: 872.42 on 7 DF, p-value: < 2.22e-16
# educ, married and exper are endogenous 
HT1 <- plm(lwage ~ educ + black + hisp + 
            exper + expersq + married + union | 
            black + hisp + expersq + union |
            educ +  married + exper,
          data = pwagepan, 
          model = "random",
          random.method = "ht"
          )

# educ, married and exper are endogenous, Amemiya and MaCurdy (1986) instruments
HT1AM <- plm(lwage ~ educ + black + hisp + 
            exper + expersq + married + union | 
            black + hisp + expersq + union |
            educ +  married + exper,
          data = pwagepan, 
          model = "random",
          random.method = "ht",
          inst.method = "am"
          ) 
Code
modelsummary(list("Pooled"=Pooled_, "RE"=RE_,  "FE"=FE_,  "CRE"=CRE , "HT"=HT, "HT_1"=HT1, "HT_1_AM"=HT1AM),
            vcov = c(vcovHC, "classical", "classical", "classical", "classical", "classical", "classical"),
            stars = TRUE, 
            fmt = 3,
            statistic = c('statistic'),
            gof_omit = "A|L|B|F|RM",
            coef_omit = "Inter|year",
            align = "cddddddd",
            output = "flextable"
            ) 
Table 9.5: Wage equation: Results of several panel data models, including Hausman-Taylor IV estimates (t-statistics in brackets)

|            |   Pooled |        RE |        FE |       CRE |        HT |      HT_1 |   HT_1_AM |
|:-----------|---------:|----------:|----------:|----------:|----------:|----------:|----------:|
| educ       | 0.099*** |  0.101*** |           |  0.095*** |  0.114*** |  0.101*** |  0.111*** |
|            | (10.812) |  (11.357) |           |   (8.676) |   (7.300) |  (11.147) |   (7.102) |
| black      | -0.144** |  -0.144** |           |  -0.139** |  -0.140** |  -0.144** |  -0.144** |
|            | (-2.875) |  (-3.027) |           |  (-2.840) |  (-2.866) |  (-2.969) |  (-2.959) |
| hisp       |    0.016 |     0.020 |           |     0.005 |     0.033 |     0.020 |     0.029 |
|            |  (0.401) |   (0.473) |           |   (0.112) |   (0.728) |   (0.465) |   (0.645) |
| exper      | 0.089*** |  0.112*** |  0.117*** |  0.117*** |  0.112*** |  0.112*** |  0.113*** |
|            |  (7.179) |  (13.572) |  (13.878) |  (13.878) |  (13.482) |  (13.622) |  (13.621) |
| expersq    | -0.003** | -0.004*** | -0.004*** | -0.004*** | -0.004*** | -0.004*** | -0.004*** |
|            | (-3.278) |  (-6.875) |  (-7.106) |  (-7.106) |  (-6.660) |  (-6.904) |  (-6.786) |
| married    | 0.108*** |  0.063*** |    0.045* |    0.045* |  0.061*** |  0.062*** |    0.046* |
|            |  (4.135) |   (3.744) |   (2.474) |   (2.474) |   (3.615) |   (3.703) |   (2.518) |
| union      | 0.180*** |  0.107*** |  0.082*** |  0.082*** |  0.106*** |  0.106*** |  0.107*** |
|            |  (6.540) |   (6.022) |   (4.255) |   (4.255) |   (5.969) |   (5.966) |   (5.997) |
| B_exper    |          |           |           |  -0.167** |           |           |           |
|            |          |           |           |  (-3.278) |           |           |           |
| B_expersq  |          |           |           |   0.009** |           |           |           |
|            |          |           |           |   (2.884) |           |           |           |
| B_married  |          |           |           |    0.098* |           |           |           |
|            |          |           |           |   (2.182) |           |           |           |
| B_union    |          |           |           |  0.189*** |           |           |           |
|            |          |           |           |   (3.742) |           |           |           |
| Num.Obs.   |     4360 |      4360 |      4360 |      4360 |      4360 |      4360 |      4360 |
| R2         |    0.187 |     0.178 |     0.178 |     0.183 |     0.178 |     0.178 |     0.178 |
| Std.Errors |   Custom |       IID |       IID |       IID |       IID |       IID |       IID |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

The empty cells in the FE column correspond to the time-constant regressors, which are swept away by the within transformation; the rows starting with B_ are the group-mean (between) terms of the CRE specification.


  1. In this text we always assume that there are no error correlations across units i. Prominent examples for correlations across units are geographic vicinity or some other sorts of clusters. Typically, there is a natural ordering of the units or clusters in this case.↩︎

  2. In this text, we are always assuming a balanced panel, which means that every unit i has the same number of observations over time – no missing observations. Unbalanced panels are somewhat harder to treat, but do not lead to substantially different insights.↩︎

  3. The block-diagonal matrix \mathbf D equals (\mathbf I_n \otimes \mathbf i). Thereby the Kronecker product \mathbf A \otimes \mathbf B means that every element of \mathbf A is multiplied by the matrix \mathbf B.↩︎

  4. n-consistent means consistency if n \rightarrow \infty and T is constant.↩︎

  5. T-consistent means consistency if T \rightarrow \infty and n is constant.↩︎

  6. As the DIF estimator leads to autocorrelated errors \Delta u_{it} in this case, i.e. to an MA(1) process, see Equation 9.15.↩︎

  7. See the slides: Multiple Regression, or Wooldridge (2019) chapter 3.↩︎

  8. The presentation of this example follows very narrowly Croissant and Millo (2018), pp. 3-5.↩︎

  9. See below for a very detailed proof or Wooldridge (2010), Chapt. 10, p. 326, or Hansen (2022), Chapt. 17.15, p. 614 ff.↩︎

  10. See Green (2017), Chapt. 11.↩︎

  11. The variable exper could be a special case if we had estimated the equation with a full set of time-dummy variables (variable year). The variable is time-varying, but in a special manner: experience increases at a constant rate for every individual. Hence, with a full set of time dummies in the model, exper would effectively have become a constant variable (FWL theorem).↩︎

  12. See Green (2017), Chapt. 11.↩︎

  13. See Green (2017), Chapt. 11.↩︎

  14. See, Wooldridge (2019), Chapt. 14.↩︎

  15. See the short and concise treatment in Cameron and Trivedi (2010).↩︎

  16. Strict exogenous for the RE and FE transformations.↩︎

  17. These IV-procedures usually pose no special difficulties, see Green (2017), Chapt. 11.8.1. For a general treatment of IV estimators see Section 7.5.2.↩︎

  18. We have: E((x_{it}-\bar x_i)a_i) = E(x_{it} a_i-\bar x_i a_i) = E_a[E(x_{it} a_i-\bar x_i a_i|a_i)] = E_a[E (x_{it} a_i | a_i) - E( \bar x_i a_i|a_i) ] = E_a[a_i E (x_{it} | a_i) - a_i E( \bar x_i |a_i) ] = E_a[0] = 0, where the last step uses E(\bar x_i | a_i) = \frac{1}{T}\sum_t E(x_{it}|a_i) = E(x_{it}|a_i), assuming that E(x_{it}|a_i) does not vary over t.↩︎

  19. Note that quasi time-demeaning of time-invariant variables like \mathbf Z or \mathbf a leads to (1-\theta )\mathbf Z. But the factor (1-\theta) cancels in a balanced data set and therefore \mathbf {\breve Z} would be equal to \mathbf Z in this case.↩︎

  20. For details see Green (2017), Chapt. 11.8.2, or Baltagi (2005), p. 126 ff.↩︎

  21. For unbalanced panels, there are minor changes as \theta is different for the units i in this case, see Hansen (2022), p. 633.↩︎

  22. Projecting into the space spanned by the instruments \mathbf W.↩︎