Mainly based on Wooldridge (2019), Chapter 15

7.1 Main Causes of the Problem

Endogeneity problems are endemic in social sciences/economics.

Possible reasons are:

  • Omitted variables: In many cases, important variables (e.g., personal characteristics) cannot be observed and thus are part of the error term

    • These are often correlated with the observed explanatory variables, leading to a violation of MLR.4’ (E(u_i|\mathbf x_i)\neq0, see Section 2.7), which we denote as an endogeneity problem. This problem implies biased and inconsistent estimates of the parameters

    • Example: \log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + u_i. The important variable ability_i is omitted (as it is not easily observable) and hence part of u_i. But ability_i is probably correlated with educ_i, violating MLR.4. Therefore, \beta_1 does not only measure the partial effect of educ_i but indirectly also the effect of ability

  • Measurement error in explanatory variables may also lead to endogeneity

    • Example: \log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + u_i. Suppose we cannot measure experience accurately; we only observe exper_i^B = exper_i + e_i, with e_i being a measurement error. So we have: \log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i^B + (u_i - \beta_2 e_i), with the new error term (u_i - \beta_2 e_i) being correlated with the observed explanatory variable exper_i^B, violating MLR.4
  • Simultaneous equations, reversed causality and feedbacks are an additional source of endogeneity

    • Example: We want to estimate a macro consumption function, c_t = \beta_0 + \beta_1 y_t + u_t, in particular the marginal propensity to consume \beta_1. But we also have the national accounts identity y_t=c_t+i_t+g_t. Thus, whenever u_t is high, consumption c_t is high, and so is income y_t. Therefore, u_t and y_t are positively correlated, violating MLR.4 and leading to an overestimation of \beta_1; the parameter then also picks up the reverse effect c_t \rightarrow y_t
  • Lagged endogenous variables as explanatory variables in connection with autocorrelated errors. This is important for dynamic models, especially for panel data - dynamic panel bias (not to be confused with unobserved fixed effects in panel data, which is an omitted variable problem)

  • Non-random samples, self-selection

Solutions to the endogeneity problem:
  • Proxy variables method for omitted regressors

  • Model for selection process

  • Fixed effects methods if

    1. panel data are available,
    2. endogeneity is time-constant, and
    3. regressors are not time-constant
  • Instrumental variables methods (IV)

    • IV estimators are the most prominent method to address endogeneity problems

7.2 Main idea of IV estimation

The main causes for endogeneity of explanatory variables discussed above are so common that nearly every empirical work is more or less affected by this problem

  • Assume our model is the following, with \tilde x_i being endogenous, i.e., correlated with u_i, violating MLR.4’ (therefore the tilde over x_i). This is the so-called structural equation which describes the causal effect we want to estimate

y_i = \beta_0 + \beta_1 \tilde x_i + u_i \tag{7.1}

  • The method of instrumental variables is a remedy for the endogeneity problem

  • The main idea is that the variables \tilde {\mathbf x}_i which are correlated with u_i are “replaced” in some way with instruments

These instruments should contain additional information (outside of Equation 7.1) to help resolve the endogeneity problem, i.e., to disentangle the sought-after partial effect of \tilde {\mathbf x}_i from feedbacks or other sources of correlation which we discussed above

  • The external instruments (we denote them \mathbf z_i) have to satisfy the following three conditions: 1

    1. \mathbf z_i have to be (weak) exogenous; Cov (z_i, u_i) = 0, see Section 2.7

    2. \mathbf z_i must not be a part of the structural equation of interest – exclusion restrictions.
      We need external instruments with additional outside information!

    3. \mathbf z_i have to be relevant; Cov (z_i, \tilde x_i) \neq 0, indeed, the correlation between z_i and \tilde x_i should be as high as possible

  • From the first and third requirement, we can easily derive the IV estimator for one explanatory variable and one instrument


7.3 The IV estimator

  • Based on Cov (z_i, u_i) = 0 we can derive a method of moments estimator. From Equation 7.1, we have u_i = y_i - \beta_0 - \beta_1 \tilde x_i. Plug this into Cov (z_i, u_i)

Cov \left( z_i, (y_i - \beta_0 - \beta_1 \tilde x_i) \right) \ = \ Cov(z_i,y_i) - \beta_1 Cov(z_i,\tilde x_i) \, = \, 0 \ \ \Rightarrow

\beta_1 = \dfrac{Cov(z_i,y_i)}{Cov(z_i,\tilde x_i)} \tag{7.2}

  • The parameter is estimable (identified) because we can write \beta_1 in terms of population moments, which can be replaced with their empirical counterparts to arrive at the IV estimator for \beta_1

\hat\beta_{1,IV} = \dfrac{\frac {1}{n}\sum_i (z_i-\bar z)(y_i-\bar y)} {\frac {1}{n}\sum_i (z_i-\bar z)(\tilde x_i-\bar x)} \tag{7.3}

  • If every variable is well behaved, we can apply the LLN and it follows that Equation 7.3 converges to Equation 7.2 with an ever increasing sample size. Hence, \hat\beta_{1,IV} is a consistent estimator for \beta_1, whereas the OLS estimator

\hat\beta_{1} = \dfrac{\frac {1}{n}\sum_i (\tilde x_i-\bar x)(y_i-\bar y)} {\frac {1}{n}\sum_i (\tilde x_i-\bar x)^2} \ \tag{7.4}

is not, because \tilde x_i is correlated with u_i by assumption
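As a quick numerical illustration, Equation 7.3 and Equation 7.4 can be computed directly from sample covariances. A minimal sketch using the card data set (introduced in the examples below), with fatheduc as instrument for educ in a simple bivariate wage equation:

library(wooldridge)
data("card")
d <- subset(card, !is.na(fatheduc))

# IV estimate of the educ coefficient, Equation 7.3: Cov(z,y)/Cov(z,x)
cov(d$fatheduc, d$lwage) / cov(d$fatheduc, d$educ)

# OLS estimate for comparison, Equation 7.4: Cov(x,y)/Var(x)
cov(d$educ, d$lwage) / var(d$educ)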


How to find instruments?

  • The consistency of the IV estimators relies on the exogeneity of z_i. Unfortunately, this exogeneity cannot be tested directly (without additional information), hence we have to assume this – based on economic theory, common sense or introspection

    • If we have more external instruments than needed (more than one in this example), we can test whether the instruments are exogenous as a group; this will be discussed later – Sargan J test
  • In practice, the main difficulty with IV estimators is to find appropriate instruments. Let us consider our good old wage equation:

wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \underbrace {(ability + v_i)}_{u_i} \tag{7.5}

  • We are interested in the partial effect of education on the wage. But we probably have an omitted variable problem as ability of the people is clearly important for the received wage and is not directly observable and thus, ability is a part of u_i

    • However, ability and therefore u_i are probably correlated with educ – people with higher ability also tend to be more educated. But this violates MLR.4’ and thus, educ is endogenous
  • So, we need at least one external instrument for educ, which is

    • not part of Equation 7.5
    • is relevant
    • is exogenous (not correlated with the error term and thus correctly excluded from the main model)

Several instruments have been proposed for this purpose:
  • The education of the mother or father

    1. No direct wage determinant
    2. Correlated with education of the child because of social factors
    3. Probably (?) uncorrelated with innate ability (problem: ability may be inherited from parents)
  • The number of siblings

    1. No direct wage determinant
    2. Correlated with education because of resource constraints in household
    3. Probably uncorrelated with innate ability
  • College proximity when 16 years old

    1. No direct wage determinant
    2. Correlated with education because those who lived near a college tend to obtain more education
    3. Uncorrelated with error (?)
  • Month of birth

    1. No direct wage determinant
    2. Correlated with education because of compulsory school attendance laws (in German: Schulpflicht)
    3. Uncorrelated with error
  • In all these cases one could question the exogeneity of the proposed instrument or their relevance, or even both. However, at least the relevance can be tested

  • In the following, we estimate Equation 7.5 as an example with OLS and IV

    • Note the coefficients of the endogenous educ (these are actually expected to be over- rather than underestimated by OLS – maybe there is an additional errors-in-variables problem?) and the considerably larger standard errors of educ

Example: IV versus OLS

library(wooldridge); library(AER); library(texreg)
data("card")


# OLS
ols <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south, data=card)


# IV with father education as instrument 
# Note, in ivreg the instruments are after "|" and you have to include all (!)  
# exogenous variables of the model but not the endogenous educ
iv1 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                fatheduc + exper + I(exper^2) + black + smsa + south, 
              data=card)


# IV with mother education as instrument 
iv2 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                motheduc + exper + I(exper^2) + black + smsa + south, 
              data=card)


# IV with father and mother education as instruments 
iv12 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, 
               data=card)


# IV with nearc4 (proximity to a 4 year college) as instrument 
iv3 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                nearc4 + exper + I(exper^2) + black + smsa + south, 
              data=card)

library(modelsummary)
modelsummary( list("OLS"=ols,
                   "IV-fath"=iv1, "IV-moth"=iv2, "IV-fath_moth"=iv12, "IV-nearc4"=iv3), 
              gof_omit = "A|L|B|F",
              align = "lddddd", 
              stars = TRUE, 
              fmt = 3,
              output="gt")
Table 7.1: Comparison of OLS with IV estimates, using different instruments (standard errors in parentheses)

OLS IV-fath IV-moth IV-fath_moth IV-nearc4
(Intercept)    4.734***    4.467***    4.266***    4.264***    3.753***
  (0.068)     (0.238)     (0.234)     (0.219)     (0.829)  
educ    0.074***    0.089***    0.102***    0.100***    0.132** 
  (0.004)     (0.014)     (0.014)     (0.013)     (0.049)  
exper    0.084***    0.093***    0.095***    0.099***    0.107***
  (0.007)     (0.010)     (0.009)     (0.010)     (0.021)  
I(exper^2)   -0.002***   -0.002***   -0.002***   -0.002***   -0.002***
  (0.000)     (0.000)     (0.000)     (0.000)     (0.000)  
black   -0.190***   -0.160***   -0.168***   -0.151***   -0.131*  
  (0.018)     (0.026)     (0.024)     (0.026)     (0.053)  
smsa    0.161***    0.155***    0.146***    0.151***    0.131***
  (0.016)     (0.019)     (0.018)     (0.020)     (0.030)  
south   -0.125***   -0.113***   -0.116***   -0.107***   -0.105***
  (0.015)     (0.018)     (0.017)     (0.018)     (0.023)  
Num.Obs. 3010       2320       2657       2220       3010      
R2    0.291       0.264       0.274       0.253       0.225   
RMSE    0.37        0.38        0.38        0.38        0.39    
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

7.4 The relevance of relevance

  • Besides exogeneity, the relevance of the instrument, Cov(z_i,\tilde x_i), plays an extremely important role. To show this, we subtract from model Equation 7.1 the corresponding mean values and premultiply by (z_i-\bar z)

(z_i-\bar z) (y_i-\bar y) = \beta_1 (z_i-\bar z)(\tilde x_i-\bar x) + (z_i-\bar z)u_i

  • Taking the expectation we get

E[(z_i-\bar z) (y_i-\bar y)] = \beta_1 E[(z_i-\bar z)(\tilde x_i-\bar x)] + E[(z_i-\bar z)u_i] \ \ \Rightarrow

Cov(z_i,y_i) = \beta_1 Cov(z_i,\tilde x_i) + Cov(z_i,u_i)

  • And dividing by Cov(z_i,\tilde x_i)

\dfrac {Cov(z_i,y_i)}{Cov(z_i,\tilde x_i)} = \beta_1 + \dfrac {Cov(z_i,u_i)}{Cov(z_i,\tilde x_i)}

  • Replacing the theoretical moments by their empirical counterparts, we recognize that the left hand side is equal to the IV estimate of \beta_1, Equation 7.3. Taking probability limits (Equation A.9) we arrive at

\operatorname {plim} \hat\beta_{1,IV} \ = \ \beta_1 + \dfrac{\operatorname {plim} \frac {1}{n}\sum_i (z_i-\bar z)u_i} {\operatorname {plim} \frac {1}{n}\sum_i (z_i-\bar z)(\tilde x_i-\bar x)} \ = \ \beta_1 + \dfrac {\operatorname {Corr}(z_i,u_i)}{\operatorname {Corr}(z_i,\tilde x_i)} \dfrac {\sigma_u}{\sigma_{\tilde x}} \tag{7.6}


  • As Equation 7.6 shows, \hat \beta_{1,IV} is consistent if the correlation between z_i and u_i is zero, \operatorname {Corr}(z_i,u_i)=0, i.e., z is exogenous

  • Suppose this correlation is not exactly zero, but small. In this case we would expect only a small asymptotic bias in \hat \beta_{1,IV}

    • However, if additionally \operatorname {Corr}(z_i,\tilde x_i) is small as well – if z_i is not very relevant – the bias in the IV estimate can become considerably large. \rightarrow Weak instrument problem
  • We can derive a relation similar to Equation 7.6 for the OLS estimate; we subtract the corresponding mean values from model Equation 7.1 and premultiply by (\tilde x_i-\bar x). This yields

\operatorname {plim} \hat\beta_{1,OLS} \ = \ \beta_1 + \dfrac{\operatorname {plim} \frac {1}{n}\sum_i (\tilde x_i-\bar x)u_i} {\operatorname {plim} \frac {1}{n}\sum_i (\tilde x_i-\bar x)^2} \ = \ \beta_1 + \dfrac {\operatorname {Corr}(\tilde x_i,u_i) \, \sigma_u \sigma_{\tilde x}}{\operatorname {Var}(\tilde x_i)} \ = \ \beta_1 + \operatorname {Corr}(\tilde x_i,u_i) \, \frac {\sigma_u}{\sigma_{\tilde x}} \tag{7.7}

  • Thus, besides {\sigma_u}/{\sigma_{\tilde x}} (explain why) the asymptotic bias of the OLS estimates depends on \operatorname {Corr}(\tilde x_i,u_i)

  • However, if we have a weak instrument problem, it is easily possible that the asymptotic bias of the IV estimate is even larger than that of the OLS estimate, as the following numerical example and the small simulation sketch below illustrate;

\frac{\operatorname{Corr}(z_i, u_i)}{\operatorname{Corr}(z_i, \tilde x_i)}>\operatorname{Corr}(\tilde x_i, u_i) \text { e.g. } \frac{0.03}{0.2}>0.1
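A small simulation may illustrate this point; the parameter values are made up for this sketch (not taken from the text): the instrument is only weakly relevant and slightly invalid, while the regressor itself is only mildly endogenous, so the IV estimate ends up more biased than OLS.

set.seed(1)
n <- 100000
u <- rnorm(n)
z <- 0.05*u + rnorm(n)            # instrument slightly invalid: Corr(z,u) > 0
x <- 0.05*z + 0.1*u + rnorm(n)    # x mildly endogenous, z only weakly relevant
y <- 1 + 1*x + u                  # true beta_1 = 1

c(OLS = cov(x,y)/var(x),          # Equation 7.7: small asymptotic bias
  IV  = cov(z,y)/cov(z,x))        # Equation 7.6: large bias due to the weak instrument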


7.5 Two stage least squares (2SLS)

Suppose we want to estimate a more elaborate structural model equation

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{l} x_{l} + \beta_{j} \tilde x_j + u \tag{7.8}

with l exogenous variables, x_1, \ldots , x_l, and one endogenous, \tilde x_j.

In the introduction we argued that in the case of endogenous regressors we have to “replace” this variable with an exogenous and relevant instrument. But we were not specific about what “replace” really means

  • Here, replace is not meant literally in the sense that we actually replace \tilde x with z in Equation 7.8 and then apply OLS. This would be a proxy variable approach, which might sometimes be useful with an errors-in-variables or omitted variable problem 2

    • As an example, if we directly replaced educ in our wage equation Equation 7.5 by an exogenous proxy variable z (for instance fatheduc), the OLS coefficient of z would generally not estimate the effect of education on the earned wage, \beta_1, and would probably introduce an additional errors-in-variables problem
  • The IV approach is a different one: We do not replace \tilde x with z, but rather with the predicted values of a regression of \tilde x on all the exogenous variables of the model, including the external instrument z. We denote these predicted values with \hat x and call this the

    first step regression or reduced form regression

\tilde x_j \, = \, \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_{l} x_{l} \, + \, \gamma_{l+1}z_1 + \cdots + \gamma_{l+m}z_m + e \tag{7.9}

  • Here, x_1,\ldots,x_{l} are the l exogenous variables of the model (sometimes called internal instruments), for instance exper and exper^2 in our wage equation, and z_1,\ldots,z_m are the m external instruments

  • We estimate the coefficients \gamma of Equation 7.9 by OLS, leading to the predicted values \hat x_j

\tilde x_j \, = \, \underbrace{ \hat \gamma_0 + \hat \gamma_1 x_1 + \cdots + \hat \gamma_{l} x_{l} \, + \, \hat \gamma_{l+1}z_1 + \cdots + \hat \gamma_{l+m}z_m}_{\hat x_j} + \hat e \quad \Rightarrow \tag{7.10}

\tilde x_j \, = \, \hat x_j + \hat e \tag{7.11}


Remark: The reduced form Equation 7.9 is the functional relationship of an endogenous variable dependent only on exogenous variables (the exogenous variables and error terms drive the endogenous ones – data generating process) and is related to simultaneous equation models


  • In the second step regression we estimate the original model, but with \hat x_j in place of \tilde x_j, i.e., we insert \hat x_j + \hat e from Equation 7.11 for \tilde x_j in Equation 7.8

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{l} x_{l} + \beta_{j} \hat x_j + \underbrace { (u + \beta_j \hat e)}_v \tag{7.12}

  • The new error v is composed of two components:

    • The error from the structural model, u. This is uncorrelated with the exogenous variables x_1,\ldots,x_{l} and is now also uncorrelated with \hat x_j, because the latter is a linear combination of x_1,\ldots,x_{l} and the exogenous external instruments z_1,\ldots,z_{m} from the first stage regression, Equation 7.10

    • The residuals of the first stage regression, \hat e. But these pose no problems (besides the larger error variance), because the residuals are uncorrelated with all variables in Equation 7.12 by construction (orthogonality property) – hence, no errors in the variables problem and no violation of MLR.4’ 3

  • Hence, the parameters of Equation 7.12 can be consistently estimated by OLS

Remark: The 2SLS residuals, and subsequently the residual variance \hat \sigma^2, have to be computed by

\hat u \ = \ y - \underbrace {\hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_{l} x_{l} + \hat\beta_{j} \tilde x_j}_{\hat y} \tag{7.13}

  • Thereby, the 2SLS estimates of the \betas are used, but with the original variable \tilde x_j and not with \hat x_j. This procedure yields \hat u_i from Equation 7.8 and not \hat v_i from Equation 7.12

  • After estimating \sigma^2 by Equation 2.35 with residuals based on Equation 7.13, all tests can be carried out in the usual way


Example:

Suppose our wage equation is

wage_i \ = \ \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i \tag{7.14}

  • Because of the unobserved ability (which is therefore part of u_i) and the fact that ability is probably correlated with education the variable education is endogenous. Therefore, we need instruments for education, which are not part of the model, are exogenous and relevant

  • Suppose we have two external instruments for education: education of the mother and education of the father. We already discussed them before

  • Thus, in the first stage regression we regress educ on all exogenous variables of the model (internal instruments) and the two external instrumental variables

educ_i = \gamma_0 + \gamma_1 exper_i + \gamma_2 exper_i^2 + \gamma_3 fatheduc_i + \gamma_4 motheduc_i + e_i \tag{7.15}

  • This first stage regression yields the predicted values \widehat {educ}

  • In the second stage we estimate the original model, but with \widehat {educ} instead of educ

wage_i \ = \ \beta_0 + \beta_1 \widehat {educ_i} + \beta_2 exper_i + \beta_3 exper_i^2 + v_i \tag{7.16}

  • This two-step procedure yields consistent estimates for all \betas; hence the name two stage least squares

Remark: Usually, if we have exactly as many external instruments as right hand side (rhs) endogenous variables, we call the procedure (ordinary) IV, otherwise 2SLS. But this labeling doesn’t seem to be used unanimously

  • Why does this procedure work? (a hands-on code sketch of the two steps follows after this list)

    • All variables in the second stage regression are exogenous because educ was replaced by a prediction, only based on exogenous information

    • By using the prediction based on exogenous information, educ is purged of its endogenous part (the part that is related to the error term)

    • Thus, only that part of educ remains which is exogenous. And this is the reason why the coefficient of \widehat {educ} represents the causal effect of education on received wages (and not some mixtures of effects, see Section 1.3.2)
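A minimal hands-on sketch of the two steps with the card data from above (educ instrumented by fatheduc and motheduc, using lwage as in the code example above). Note that the standard errors of the manual second stage are not the correct 2SLS standard errors (see the remark around Equation 7.13); ivreg reports the correct ones.

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))

# First stage: regress the endogenous educ on all exogenous variables + external instruments
stage1 <- lm(educ ~ exper + I(exper^2) + fatheduc + motheduc, data=d)
d$educ_hat <- fitted(stage1)

# Second stage: original model with educ replaced by its first stage prediction
stage2 <- lm(lwage ~ educ_hat + exper + I(exper^2), data=d)

# Same coefficients in one step with ivreg (which also computes the correct standard errors)
iv <- ivreg(lwage ~ educ + exper + I(exper^2) |
              fatheduc + motheduc + exper + I(exper^2), data=d)
cbind(manual=coef(stage2), ivreg=coef(iv))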


7.5.1 2SLS – variance of estimates

  • The most important downside of IV/2SLS estimation is that the variance of the IV/2SLS estimates is generally considerably larger than that of the OLS estimates, i.e., they are less precise (look at Table 7.1)

  • Therefore, IV/2SLS need large samples to be useful

  • Below, the formula for the variance of OLS estimates and the formula for IV/2SLS estimates are shown

\operatorname {Var}(\hat \beta_{j,OLS}) \ = \ \dfrac{\sigma^2}{ \underbrace{SST_j}_{\sum_{i=1}^n (x_{ij} - \bar x_j)^2} (1-R_j^2) }

\operatorname {Var}(\hat \beta_{j,2SLS}) \ = \ \dfrac{\sigma^2}{ \underbrace{SST_j}_{\sum_{i=1}^n (\hat x_{ij} - \bar x_j)^2} (1 - R_{j}^2) } \tag{7.17}

  • These formulas only differ in that for calculation of the latter one, the explanatory variable x_j is replaced with its prediction from the first step regression, \hat x_j

The variance of the IV/2SLS estimate \beta_j
  • increases with the error variance \sigma^2 and decreases with sample size n

  • decreases with the total variation of the predicted values \hat x_j

  • increases with R_{j}^2, which is the R^2 of a regression of \hat x_j on all the other explanatory x-es

The last two points are always (considerably) worse for IV/2SLS than for OLS and worsen further with poor or weak instruments

  • The error variance \sigma_v^2 of the second stage regression is larger, because the error term additionally contains the first stage residuals. However, the residuals are purged from this effect if they are correctly computed by Equation 7.13

  • The variation of a predicted variable, SSE, is always less than the variation of the original variable, SST. The definition of the R^2 is based on this ratio, SSE/SST \leq 1; \rightarrow less variation of the corresponding explanatory variable \hat x_j

  • The R^2, i.e. the fit of a regression of the predicted variable \hat x_j on all the other x-es is always higher than the R^2 of a regression of the original variable x_j on all the other x-es

    • The reason is that in the first stage regression, x_j is regressed on all exogenous variables of the model (the other x-s) plus the external instruments. Hence, the predicted values of this regression, \hat x_j, are, besides the effects of the external instruments, a linear function of these other x-es. This implies: The correlation between \hat x_j and the other x-es is typically much higher than the correlation between the original x_j and the other x-es

    • In other words, IV/2SLS exhibits an inherent multicollinearity problem (see the small illustration below)
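A small illustration of the last two points with the card data (a sketch, not part of the original example): the R^2 of a regression of the first stage prediction \hat x_j on the other regressors is much higher than that of the original educ, which drives up the variance of the 2SLS estimate in Equation 7.17.

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))
d$educ_hat <- fitted(lm(educ ~ exper + I(exper^2) + black + smsa + south +
                          fatheduc + motheduc, data=d))

# R_j^2 of educ_hat versus the original educ, each regressed on the other explanatory variables
summary(lm(educ_hat ~ exper + I(exper^2) + black + smsa + south, data=d))$r.squared
summary(lm(educ     ~ exper + I(exper^2) + black + smsa + south, data=d))$r.squared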


7.5.2 Matrix notation for IV/2SLS

We have the structural model:
(Let’s assume that all variables are demeaned, so we can forget the intercept – otherwise, the intercept would be part of the exogenous model variables in \mathbf X and hence an internal instrument as well. The derived formulas are unaffected by this, and instead of k we would have k+1 respectively, instead of l we would have l+1.)

\mathbf y \, = \, \mathbf X \boldsymbol \beta + \mathbf u \tag{7.18}

Some of the variables in the n \times k matrix \mathbf X are endogenous, i.e., are correlated with \mathbf u. This leads to inconsistent OLS estimators \hat {\boldsymbol \beta}.

In the first step of 2SLS, we regress all k variables in \mathbf X on all l exogenous variables of the model from Equation 7.18 (internal instruments) plus the m external instruments. The data matrix (the n \times (l+m) instrument matrix) of regressors \mathbf Z contains both groups of variables. For identification we necessarily must have (l+m) \geq k.

  • For every ith row of \mathbf Z we assume E(u_i \mid \mathbf z_i)=0; hence, every column (variable) of \mathbf Z is (weakly) exogenous with regard to the error term \mathbf u

Thus, we have for the k first step regressions:

\mathbf X \, = \, \mathbf Z \, \boldsymbol\Gamma + \mathbf E \tag{7.19}

  • with the (l+m) \times k coefficient matrix \mathbf \Gamma (the columns of \mathbf \Gamma containing the coefficients of the k first step regressions) and

  • the first step n \times k error term matrix \mathbf E

We need the predicted values – the linear projections – of the first stage regression Equation 7.19:

\hat {\mathbf X} \, = \, \mathbf Z \, \hat {\boldsymbol \Gamma} \, = \, \mathbf Z \, \underbrace {(\mathbf {Z'Z})^{-1} \mathbf Z'\mathbf X}_{\hat{\boldsymbol \Gamma}} \, = \, \mathbf P_{\mathbf Z} \mathbf X \tag{7.20}

Thereby, we have used the projection matrix (hat matrix):

\mathbf P_{\mathbf Z}=\mathbf Z (\mathbf {Z'Z})^{-1} \mathbf Z'

This matrix was introduced in Section 2.5.1, see also Equation C.11.

  • With regard to our example from Equation 7.14, the matrix \hat {\mathbf X} of the linear projections contains:

    1. The exogenous variables of the model including the intercept and

    2. The predicted values of the endogenous variables, \widehat {educ} in our example;

    Hence: \hat {\mathbf X} = [1, exper, exper^2, \widehat {educ}]

  • Regarding 1., the exogenous variables of the model: this follows directly from the fact that we regress the exogenous model variables on themselves (they are part of the matrix \mathbf Z). Hence, we get a perfect fit for them and their predicted values are identical to the variables themselves. Thus, the exogenous variables of the model act as their own (internal) instruments

  • Applying OLS to the first step regressions Equation 7.19 implies:

\mathbf X \ = \underbrace {\hat {\mathbf X}}_ {\mathbf Z \hat {\mathbf \Gamma} } + \ \hat {\mathbf E} \tag{7.21}


In the second step of 2SLS we replace \mathbf X in Equation 7.18 with \hat {\mathbf X} + \hat{\mathbf E} from Equation 7.21 and arrive at

\mathbf y \, = \, \hat{\mathbf X} \boldsymbol \beta + \underbrace {(\mathbf u + \hat{\mathbf E} \boldsymbol \beta)}_{\mathbf v} \tag{7.22}

with some errors \mathbf v, which additionally to \mathbf u, contains the residuals of the first stage regressions \hat {\mathbf E}.

  • Note that \hat {\mathbf E} is uncorrelated with \hat{\mathbf X} as predicted values of OLS are always uncorrelated with the OLS residuals (orthogonality of the projection matrices \mathbf M and \mathbf P):

    \hat {\mathbf X'} \, \hat{\mathbf E} \ = \ \underbrace {\mathbf {X'P_{\mathbf Z}}}_{\hat {\mathbf X'}} \, \underbrace {\mathbf {M_{\mathbf Z}X}}_{\hat{\mathbf E}} = \mathbf 0

    • Leaving out one exogenous variable from Equation 7.18 in the first stage regression Equation 7.19 would not lead to a breakdown of the orthogonality of \hat {\mathbf E} and \hat{\mathbf X} and 2SLS would still be consistent but less efficient because the left out exogenous variable would be replaced by its linear projection on \mathbf Z 4

    • Also note that \hat {\mathbf X} is a linear combination of variables {\mathbf Z} which all are weak exogenous with respect to \mathbf u by assumption

      • Hence, the regressors of the second stage regression are weakly exogenous with respect to the error term \mathbf v = \mathbf u + \hat{\mathbf E} \boldsymbol \beta of the second stage regression Equation 7.22. Assumption MLR.4’ of Section 2.7 is therefore fulfilled
  • Applying OLS to Equation 7.22 yields the IV estimates, respectively the 2SLS estimates, which are consistent based on the arguments above:

\hat {\boldsymbol \beta}_{IV} = (\hat{\mathbf X}'\hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf y \tag{7.23}

\hat {\boldsymbol \beta}_{IV} = \boldsymbol \beta + (\hat{\mathbf X}'\hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf u \tag{7.24}

  • Therefore, assuming iid errors \mathbf u with variance \sigma^2 and applying formula Equation C.7 leads to the covariance matrix of \hat {\boldsymbol \beta}_{IV}

\operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) = \sigma^2 (\hat{\mathbf X}'\hat{\mathbf X})^{-1} \tag{7.25}

Finally we have to estimate \sigma^2, the variance of \mathbf u.

\hat {\mathbf u} \ = \ \mathbf y - \hat {\mathbf y} \ = \ \mathbf y - \mathbf X \hat {\boldsymbol \beta}_{IV} \tag{7.26}

  • And furthermore, the estimated variance of \mathbf u is

\hat \sigma^2 \ = \ \dfrac {1}{n-k} \hat {\mathbf u}' \hat {\mathbf u} \tag{7.27}


The two step procedure described above can be accomplished in only one step, which is much more convenient for actual computation.

  • For that, we plug in \mathbf P_{\mathbf Z} \mathbf X for \hat{\mathbf X} in Equation 7.23 and use the fact that \mathbf P_{\mathbf Z} is symmetric and idempotent,
    i.e., \mathbf P'_{\mathbf Z}=\mathbf P_{\mathbf Z} and \mathbf P_{\mathbf Z}\mathbf P_{\mathbf Z}=\mathbf P_{\mathbf Z}

\hat {\boldsymbol \beta}_{IV} \, = (\hat{\mathbf X}' \hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf y \, = \, (\mathbf X' \mathbf P'_{\mathbf Z} \mathbf P_{\mathbf Z} \mathbf X)^{-1}\mathbf X' \mathbf P'_{\mathbf Z} \mathbf y \, = \, (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1}\mathbf X' \mathbf P_{\mathbf Z} \mathbf y \tag{7.28}

and for the covariance matrix

\operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) \, = \, \sigma^2 (\hat{\mathbf X}'\hat{\mathbf X})^{-1} \, = \, \sigma^2(\mathbf X' \mathbf P'_{\mathbf Z} \mathbf P_{\mathbf Z} \mathbf X)^{-1} \, = \, \sigma^2 (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1} \tag{7.29}

  • Using the definition of \mathbf P_{\mathbf Z} and writing out, the resulting terms could obviously be calculated in one step

\hat {\boldsymbol \beta}_{IV} \, = \, \left(\mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \mathbf Z' \mathbf X \right)^{-1} \mathbf X'\mathbf Z \, (\mathbf {Z'Z})^{-1} \, \mathbf Z' \mathbf y \tag{7.30}

\operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) \, = \, \sigma^2 (\mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \mathbf Z' \mathbf X)^{-1} \tag{7.31}

An interesting special case arises if \mathbf X and \mathbf Z have the same number of columns, i.e., k = l+m. In this case the number of rhs endogenous variables equals the number of external instruments and the matrix \mathbf X'\mathbf Z is square, k \times k, and invertible. Then, using the rule (ABC)^{-1}=C^{-1}B^{-1}A^{-1}, \hat {\boldsymbol \beta}_{IV} simplifies to

\hat {\boldsymbol \beta}_{IV} \, = \, (\mathbf Z' \mathbf X)^{-1} \, (\mathbf {Z'Z}) (\mathbf X' \mathbf Z) ^{-1} \mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \, \mathbf Z' \mathbf y \, = \, (\mathbf Z' \mathbf X)^{-1} \, \mathbf Z' \mathbf y \tag{7.32}

  • This resembles Equation 7.3 and is sometimes called ordinary IV estimator. Note, in this case the model is just or exactly identified

To prove the consistency of \hat {\boldsymbol \beta}_{IV}, we take the solution Equation 7.30 and plug in the model Equation 7.18 for \mathbf y:

\hat {\boldsymbol \beta}_{IV} \, = \, {\boldsymbol \beta} + \left(\mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \mathbf Z' \mathbf X \right)^{-1} \mathbf X'\mathbf Z \, (\mathbf {Z'Z})^{-1} \, \mathbf Z' \mathbf u \tag{7.33}

Afterwards, we divide and multiply the cross product terms by n accordingly and take the probability limit.
Note, according to Slutsky’s Theorem (Theorem A.4), \operatorname {plim} g(x) = g(\operatorname {plim}x) for a continuous function g(.)

\operatorname{plim}\left[\hat{\boldsymbol{\beta}}_{IV}\right] = \boldsymbol{\beta} \, + \, \left[\operatorname{plim}\left[\frac{\mathbf{X}' \mathbf{Z}}{n}\right] \ \operatorname{plim}\left[\frac{\mathbf{Z}' \mathbf{Z}}{n}\right]^{-1} \operatorname{plim}\left[\frac{\mathbf Z' \mathbf {X}}{n}\right]\right]^{-1} \times \\ \operatorname{plim}\left[\frac{\mathbf X' \mathbf Z}{n}\right] \ \operatorname{plim}\left[\frac{\mathbf Z' \mathbf Z}{n}\right]^{-1} \operatorname{plim}\left[\frac{\mathbf Z' \mathbf u}{n}\right] \tag{7.34}

  • According to LLN, if \mathbf X and \mathbf Z are “well behaved”, the empirical moment matrices converge to their population (theoretical) moments matrices \mathbf m. Thus we have

\operatorname{plim}\left[\hat{\boldsymbol{\beta}}_{IV}\right] = \boldsymbol{\beta} \, + \, \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \cdot \operatorname{plim}\left[\frac{\mathbf Z' \mathbf u}{n}\right]

  • The last plim is

\operatorname{plim}\left[\frac{\mathbf Z' \mathbf u}{n}\right] \, = \, \operatorname{plim} \left[\frac{1}{n} \sum_{i=1}^n \mathbf z'_i u_i \right ] \tag{7.35}

  • According to the LLN (Theorem A.2), this average term converges in probability to the expectation of its summands. As we presuppose that \mathbf z_i is (weak) exogenous (MLR.4’) we have

E(\mathbf z'_i u_i) := \mathbf m_{\mathbf {Zu}} = E_z \left(E(\mathbf z'_i u_i \mid \mathbf z_i)\right) = E_z(\mathbf 0) = \mathbf 0 \tag{7.36}

  • Thus, we finally have

\operatorname{plim}\left[\hat{\boldsymbol{\beta}}_{IV}\right] = \boldsymbol{\beta} \, + \, \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \cdot \underbrace {\mathbf m_{\mathbf {Zu}}}_{\mathbf 0} \, = \, \boldsymbol{\beta} \tag{7.37}

  • This proves the consistency of the IV/2SLS estimator. However, note that under our assumptions, IV/2SLS estimators are generally biased in finite samples, because some elements of \mathbf x_i are not exogenous (for the expectation operator, there is nothing comparable to Slutsky’s theorem)

We furthermore state that \hat{\boldsymbol{\beta}}_{IV} is asymptotically normally distributed with asymptotic expectation \boldsymbol \beta and asymptotic covariance matrix

\hat{\boldsymbol{\beta}}_{IV} \ \, \stackrel{a}{\sim} \, \, N \left(\boldsymbol{\beta}, \ \sigma^2 \frac{1}{n}\left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \, \mathbf m_{\mathbf {ZX}} \right]^{-1}\right) \tag{7.38}

\sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta}) \ = \ \left[\frac{\mathbf{X}' \mathbf{Z}}{n} \left(\frac{\mathbf{Z}' \mathbf{Z}}{n}\right)^{-1} \frac{\mathbf Z' \mathbf {X}}{n}\right]^{-1} \frac{\mathbf X' \mathbf Z}{n} \left(\frac{\mathbf Z' \mathbf Z}{n}\right)^{-1} \left[\frac{\mathbf Z' \mathbf u}{ \textcolor {red}{\sqrt n} }\right] \ \ \ \ \Rightarrow

\sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta}) \ \stackrel{d} \longrightarrow \ \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \ \left[\frac{\mathbf Z' \mathbf u}{\sqrt n}\right] \tag{7.39}

Thus, once again, the last term is the important one. In Section 4.3.1, we discussed that such a term converges in distribution under quite similar conditions as we had with the LLN to a normally distributed random vector with expected value \mathbf 0 and covariance matrix \sigma^2 \mathbf m_{\mathbf {ZZ}}, provided that E(u_i \mid \mathbf z_i)=0 and \operatorname {Var} (\mathbf u) = \sigma^2 \mathbf I – CLT, see Theorem A.3.

  • So, after applying the covariance matrix formula from Equation C.7, we get the covariance matrix of \sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta})

\operatorname{Var}(\sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta})) \ =

\left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \, \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \left(\sigma^2 \mathbf m_{\mathbf {ZZ}}\right) \mathbf m_{\mathbf {ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \, \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ = \\

\sigma^2 \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \tag{7.40}

  • From the limiting distribution of \sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta}) we immediately get the asymptotic distribution of \hat{\boldsymbol{\beta}}_{IV}

\hat{\boldsymbol{\beta}}_{IV} \ \, \stackrel{a}{\sim} \, \, N \left(\boldsymbol{\beta}, \ \sigma^2 \frac{1}{n}\left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \, \mathbf m_{\mathbf {ZX}} \right]^{-1}\right) \tag{7.41}

  • And a consistent estimator of the asymptotic covariance matrix is

\widehat {\operatorname{Asy.Var}}(\hat{\boldsymbol{\beta}}_{IV}) \ = \ \hat\sigma^2 \left [ \mathbf {X'Z} \ (\mathbf {Z'Z})^{-1} \mathbf {Z'X} \right]^{-1} \ = \ \hat\sigma^2 (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1} \tag{7.42}

  • Note, \hat \sigma^2 has to be calculated with residuals according to Equation 7.13

  • If \mathbf {X'Z} is “small”, i.e., the correlation between \mathbf {X} and \mathbf {Z} is small, then the inverse in the covariance matrix formula becomes “large” and we get large standard errors for the estimated coefficients – weak instrument problem
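The matrix formulas of this subsection can be checked numerically. The following sketch computes Equation 7.30, Equation 7.26, Equation 7.27 and Equation 7.42 by hand for the wage example (educ instrumented by fatheduc and motheduc) and should reproduce the corresponding ivreg results, up to possible degrees-of-freedom conventions.

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))
X <- cbind(1, d$educ, d$exper, d$exper^2)                  # model regressors (incl. endogenous educ)
Z <- cbind(1, d$fatheduc, d$motheduc, d$exper, d$exper^2)  # internal + external instruments
y <- d$lwage

ZtZi <- solve(crossprod(Z))
A    <- crossprod(X, Z) %*% ZtZi %*% crossprod(Z, X)              # X'Z (Z'Z)^-1 Z'X
b_iv <- solve(A, crossprod(X, Z) %*% ZtZi %*% crossprod(Z, y))    # Equation 7.30

uhat <- y - X %*% b_iv                                            # residuals with the original X, Equation 7.26
s2   <- sum(uhat^2) / (nrow(X) - ncol(X))                         # Equation 7.27
V    <- s2 * solve(A)                                             # Equation 7.42
cbind(coef = drop(b_iv), se = sqrt(diag(V)))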


7.6 Identification

For a basic introduction to identification problems, see Section 2.3.1.1.

But let us now discuss this issue by means of the first stage Equation 7.15, in particular, why we need external instruments to estimate \beta_1 in Equation 7.14

  • Suppose we have no external instrument z_i (the corresponding \gamma_j in the first stage regression Equation 7.15 are 0) and regress educ only on the internal exogenous variables exper and exper^2 in the first step. Then, \widehat {educ} would be a perfect linear combination of exper and exper^2

    • However, these two variables are already present in the second stage Equation 7.16; this would generate perfect collinearity between exper, exper^2 and \widehat {educ} in Equation 7.16, making it impossible to disentangle the effects of exper and exper^2 on the one hand and \widehat {educ} on the other. The coefficients \beta_1, \beta_2 and \beta_3 are thus not identified (not estimable) in this case (see the illustration below, after this list)
  • As a general rule, identification of the model equation requires that for every rhs endogenous variable we must have at least one distinct external instrument which is also relevant, i.e., the corresponding \gamma_j in first stage equation \ne 0 (rank condition)

  • The variables, which act as external instruments, must not be part of the model, Equation 7.14. This is called exclusion restrictions

  • If we have more than one external instrument per endogenous variable, the model is overidentified – which is often a good thing as we will see later. Otherwise, the model is just or exactly identified
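A quick illustration of the identification problem (a sketch with the card data): if \widehat {educ} is predicted from the internal instruments only, it is perfectly collinear with exper and exper^2 in the second stage, and lm has to drop one coefficient (reported as NA).

card$educ_hat0 <- fitted(lm(educ ~ exper + I(exper^2), data=card))

# Second stage without any external instrument: perfect collinearity, coefficients not identified
coef(lm(lwage ~ educ_hat0 + exper + I(exper^2), data=card))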


7.6.1 Tests for weak instruments

Above, we explained why we need at least one external instrument per endogenous rhs variable

  • However, even if this condition is met (the corresponding \gamma_j in the first stage regression is \ne 0), it could be that the model is only barely identified. This is the case if the conditional correlation between the external instruments and the endogenous rhs variable is low. In this instance, we have a weak instrument problem; fortunately, we can test for this circumstance

Weak instrument test: The first stage regression could be used to test for the relevance of the instruments

  • In our example, with Equation 7.15, we can carry out an F-test to examine whether \gamma_3 and \gamma_4, the coefficients of fatheduc and motheduc, are jointly zero

    • Monte Carlo simulations by Staiger and Stock (1997) show that an F-statistic less than 10 indicates a weak instrument problem. With heteroskedasticity in either the first or second stage equation, the F-statistic should rather be in the range of 20 or above
  • If we have more than one rhs endogenous variable, things are more complicated; even if we have good F-statistics in every first stage regression, it is not guaranteed that we have at least one distinct and relevant external instrument for each endogenous variable. We have to use specialized tests like the Cragg and Donald (1993) or Anderson (1984) tests, which are based on the smallest canonical correlation of the rhs endogenous variables and the external instruments (conditional on the other \mathbf x)


Example – Testing for relevance of instruments

We are testing the relevance of the instruments using the first step regression, Equation 7.15, i.e., regressing educ on the corresponding external instrument(s) and all other exogenous variables

  • It is the partial effect of the external instrument(s) that matters!
rel1 <-   lm(educ ~ fatheduc  +  exper + I(exper^2) + black + smsa + south, data=card)
rel2 <-   lm(educ ~ motheduc  +  exper + I(exper^2) + black + smsa + south, data=card)
rel12 <-  lm(educ ~ fatheduc  + motheduc  +  exper + I(exper^2) + black + smsa + south, data=card)
rel3 <-   lm(educ ~ nearc4    +  exper + I(exper^2) + black + smsa + south, data=card)

modelsummary( list("fatheduc"=rel1, "motheduc"=rel2, 
                   "fatheduc+motheduc"=rel12, "nearc4"=rel3),
              output="gt", 
              statistic = "statistic",
              gof_omit = "A|B|L|F", 
              align = "ldddd", 
              stars = TRUE, 
              fmt = 4,
              coef_map = c("fatheduc", "motheduc", "nearc4",
                           "exper", "I(exper^2)", "black", "smsa", "south")
              )
Table 7.2: Tests for weak instruments, several external instruments (t-statistics in parentheses; note, the F-statistic for a single variable is the square of the t-statistic)

fatheduc motheduc fatheduc+motheduc nearc4
fatheduc     0.1728***     0.1128***
  (14.3035)      (7.7539)  
motheduc     0.1879***     0.1297***
  (14.5996)      (7.6118)  
nearc4     0.3373***
   (4.0887)  
exper    -0.3808***    -0.3754***    -0.3780***    -0.4100***
 (-10.0625)    (-10.7007)     (-9.8671)    (-12.1686)  
I(exper^2)     0.0019        0.0009        0.0024        0.0007   
   (1.0098)      (0.5376)      (1.2250)      (0.4438)  
black    -0.4789***    -0.6015***    -0.3543**     -1.0061***
  (-4.1516)     (-5.9888)     (-2.9719)    (-11.2235)  
smsa     0.3882***     0.3818***     0.3509***     0.4039***
   (4.3106)      (4.5528)      (3.8408)      (4.7578)  
south    -0.1426+      -0.2211**     -0.1211       -0.2915***
  (-1.6511)     (-2.7157)     (-1.3852)     (-3.6790)  
Num.Obs.  2320         2657         2220         3010       
R2     0.477         0.492         0.482         0.474    
RMSE     1.89          1.89          1.86          1.94     
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
  • The t-statistics of fatheduc and motheduc are > 14; hence, a weak instrument problem can be ruled out for these instruments

F-test, whether both fatheduc and motheduc together are 0
  • F-statistic should be at least 10 to rule out a weak instrument problem

  • In the case of heteroscedasticity, the F-statistic should be at least 20 to rule out a weak instrument problem

lht(rel12, c("fatheduc=0", "motheduc=0"))
      Linear hypothesis test
      
      Hypothesis:
      fatheduc = 0
      motheduc = 0
      
      Model 1: restricted model
      Model 2: educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + 
          south
      
        Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
      1   2214 8594.3                                  
      2   2212 7704.2  2    890.12 127.78 < 2.2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The F-test (F > 100) clearly rules out that fatheduc and motheduc together are weak instruments

The t-statistic for nearc4 is about 4, which is suspicious in the case of heteroscedasticity. We therefore additionally test for heteroscedasticity – applying the Breusch-Pagan test, see Section 6.3.

bptest(rel3)
      
        studentized Breusch-Pagan test
      
      data:  rel3
      BP = 92.185, df = 6, p-value < 2.2e-16
  • The Breusch-Pagan test overwhelmingly rejects homoscedasticity; hence nearc4 might be only a weak instrument

Another example

In this example we show what devastating effects a weak instrument can have.

  • We want to investigate the relationship between the weight of newborns (bwght) and smoking (packs)

  • However, we suppose some common relationships between unobserved genetic factors (in u) for bwght and smoking

  • Thus, besides OLS, we try an IV estimator

library(wooldridge); data("bwght")

# OLS estimation 
bwols <- lm(bwght ~ packs, data=bwght)
summary(bwols)
      
      Call:
      lm(formula = bwght ~ packs, data = bwght)
      
      Residuals:
          Min      1Q  Median      3Q     Max 
      -96.772 -11.772   0.297  13.228 151.228 
      
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
      (Intercept) 119.7719     0.5723 209.267  < 2e-16 ***
      packs       -10.2754     1.8098  -5.678 1.66e-08 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      
      Residual standard error: 20.13 on 1386 degrees of freedom
      Multiple R-squared:  0.02273, Adjusted R-squared:  0.02202 
      F-statistic: 32.24 on 1 and 1386 DF,  p-value: 1.662e-08
  • The variable packs has the expected negative sign

  • However, we conjecture that packs might be endogenous (it is a choice variable, and choice variables are always suspect of endogeneity)

  • So, we need an instrument for smoking (packs)

    • We take cigprice (for prices of cigarettes) as instrument

      1. cigprice should have no direct effect on bwght (conditional on packs) - exclusion restriction

      2. cigprice should be correlated with packs - some relevance

      3. Furthermore, cigprice is for sure unrelated to unobserved individual genetic factors contained in u - exogenous


  • The corresponding first stage regression is
# First stage regression
first <-  lm(packs ~ cigprice, data=bwght)
coeftest(first)
      
      t test of coefficients:
      
                    Estimate Std. Error t value Pr(>|t|)
      (Intercept) 0.06742568 0.10253837  0.6576   0.5109
      cigprice    0.00028288 0.00078297  0.3613   0.7179
  • As we see, the t-statistic is very low, indicating that we should not use cigprice as an instrument for packs ==> no relevance

  • But what happens if we do it nonetheless?
# IV estimation 
second <- ivreg(bwght ~ packs | cigprice, data=bwght)
summary(second)
      
      Call:
      ivreg(formula = bwght ~ packs | cigprice, data = bwght)
      
      Residuals:
          Min      1Q  Median      3Q     Max 
      -856.32   15.35   33.35   47.35  188.35 
      
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)    82.65     104.63   0.790     0.43
      packs         345.47    1002.19   0.345     0.73
      
      Residual standard error: 108.2 on 1386 degrees of freedom
      Multiple R-Squared: -27.22,   Adjusted R-squared: -27.24 
      Wald test: 0.1188 on 1 and 1386 DF,  p-value: 0.7304
  • As we see, we have an unexpected sign with an absurdly high estimate for packs and an extremely high standard error (very low t-value). (Note, the R2 has no natural interpretation for IV/2SLS estimations as we have no orthogonality property with these estimators)

  • Apparently, IV estimates with very weak instruments can yield much more unreliable results than OLS


7.7 Testing for endogeneity - Hausman-Wu test

  • Sometimes, it is not clear whether some regressors are correlated with the error term u_i, i.e., if they are actually endogenous, and whether we need an IV estimator. Thus, a test for the appropriateness of OLS is desirable.

  • A practical problem is that u_i is not observable and that the observed OLS residuals \hat u_i are always uncorrelated with all regressors; orthogonality property of OLS: \mathbf X' \hat {\mathbf u} = \mathbf 0

  • The Hausman test provides the following test idea for the problem at hand:

    • If there is no endogeneity problem in the model, OLS is consistent (and efficient), but the same is also true for IV/2SLS estimators with regard to consistency (but obviously not for efficiency). Therefore, in this case, the parameter estimates of both procedures should converge to the same true parameter values

    • If, on the contrary, there is an endogeneity problem, only IV/2SLS would be consistent

    • Thus, under the null hypothesis of no endogeneity problem, the OLS estimator \hat {\boldsymbol \beta}_{OLS} and the IV/2SLS estimator \hat {\boldsymbol \beta}_{IV} should not differ too much (only because of sampling errors)

    • If, on the other hand, the H_0 is false, we would expect that the two estimators differ more than sampling errors would suggest


  • A natural test, the Hausman test, would therefore check whether the difference between the two estimators is too large. This test can be cast in terms of a usual Wald statistic (compare Equation 3.13)

W = \mathbf {d}' \{ { \operatorname{Var}}(\mathbf{d}) \}^{-1} \mathbf{d} , \ \ \ \text{ with } \ \mathbf d = (\hat {\boldsymbol \beta}_{OLS} - \hat {\boldsymbol \beta}_{IV})

  • However, the difficulty with this test statistic is that the covariance matrix of \mathbf d, { \operatorname{Var}}(\mathbf{d}), which, asymptotically, can be shown to be \left[ \operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) - { \operatorname{Var}}(\hat {\boldsymbol \beta}_{OLS}) \right], is not of full rank and thus has no inverse (you would have to rely on a generalized inverse)

  • Fortunately, there is a much simpler and equivalent variant of this test, the Hausman-Wu test. This test is a two step procedure (but not 2SLS, of course):

    1. We estimate the usual first stage (reduced form) regression. The endogenous rhs variable is \tilde x_j

    \tilde {x}_j = \underbrace {\hat \gamma_0 + \hat \gamma_1 x_1 + \cdots + \hat \gamma_{l} x_{l} \, + \, \hat \gamma_{l+1}z_1 + \cdots + \hat \gamma_{l+m}z_m}_{\hat {x}_j} + \hat e \tag{7.43}

    2. In the second stage we estimate the original model, Equation 7.8, by OLS, but with the residuals \hat e of the first equation as an additional variable, and test whether the coefficient of \hat e, \, \hat \delta, is zero

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_j \tilde x_j + \textcolor {red} {\delta \hat e} + u \tag{7.44}


  • Why does the Hausman-Wu procedure work?

    • In the first stage regression, Equation 7.43, the endogenous variable \tilde x_j is regressed only on exogenous variables, so \hat x_j is uncorrelated with u by assumption. Thereby, \tilde x_j is decomposed in an exogenous part, \hat x_j, and in a possibly endogenous part, \hat e

      • Thus, if \tilde x_j is correlated with the error u of the original equation, this correlation must be due to the residuals of Equation 7.43, \hat e

      • In other words, \tilde x_j is endogenous (correlated with u of the original model) if and only if the residuals \hat e of the first stage regression are correlated with u

    • Therefore, we can test for endogeneity of \tilde x by adding \hat e to the original model. If we cannot reject the H_0: \delta=0, then there is no convincing evidence for an endogeneity problem and we should use OLS – as OLS is much more efficient. Otherwise we need IV/2SLS

  • This test also works if we have more than one rhs endogenous variable. In this case, we simply estimate a reduced form equation like Equation 7.43 for every endogenous variable and plug the residuals of all these equations into Equation 7.44 as additional variables. Then, we use an F-test to test whether all these added residuals are jointly insignificant

  • Note, if \delta = 0, the estimated coefficients of Equation 7.44 are exactly the same as the OLS estimates of the original model

  • If \delta \neq 0, the estimated coefficients of Equation 7.44 are exactly the same as the 2SLS estimates of the original model, which is not so obvious (a code sketch of the test follows below)
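A minimal sketch of the Hausman-Wu test for our wage example (card data, fatheduc and motheduc as external instruments, coeftest from the packages loaded above); the coefficient of the first stage residuals is the quantity of interest:

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))

# Step 1: first stage (reduced form) regression, keep the residuals
first <- lm(educ ~ exper + I(exper^2) + black + smsa + south + fatheduc + motheduc, data=d)
d$ehat <- resid(first)

# Step 2: original model plus the first stage residuals; test H0: coefficient of ehat = 0
hw <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south + ehat, data=d)
coeftest(hw)["ehat", ]

# The same test is reported as "Wu-Hausman" by summary(ivreg(...), diagnostics=TRUE)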


To show the logic of this test more formally, we state the original model, with \tilde {x} denoting the possibly endogenous rhs variable and \mathbf x the vector of exogenous explanatory variables

y = \tilde {x}\beta_1 + \mathbf x \boldsymbol \beta_2 + u \tag{7.45}

  • As \tilde {x} is possibly endogenous, we apply 2SLS

    • The first step regression with the vector of exogenous external instruments \mathbf z is

    \tilde {x} = \underbrace {\mathbf x \hat {\boldsymbol \gamma}_1 + \mathbf z \hat {\boldsymbol \gamma}_2}_{\hat x} + \hat e \tag{7.46}

    • Substituting Equation 7.46 into Equation 7.45, we arrive at the second step regression. Applying OLS to Equation 7.47 estimates \beta_1 and \boldsymbol \beta_2 consistently (2SLS)

    y = \hat { x} \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \hat v \tag{7.47}

    • Note that v = u + \beta_1 \hat e

  • Now the trick: we add the residuals of the first stage regression Equation 7.46, \hat e, to the second stage regression Equation 7.47 as an additional variable. But by construction (orthogonality property of OLS), \hat e is orthogonal (uncorrelated) to every explanatory variable in Equation 7.47. Thus, adding \hat e as an additional variable in Equation 7.47 does not alter the 2SLS estimates of \beta_1 and \boldsymbol \beta_2 in Equation 7.47 (adding orthogonal regressors doesn’t change the estimated parameters, compare Section 2.9.2)

    y = \hat {x} \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \hat \alpha \hat e + \hat v \tag{7.48}

    • Side note: As v = u + \beta_1 \hat e, \hat \alpha should converge to \beta_1, if \hat e and u are uncorrelated. Therefore, another variant of the Hausman-Wu test would be to test the equality of \hat \alpha and \hat \beta_1 in Equation 7.48

  • Finally, we exploit the identity \tilde x = \hat x + \hat e from the first stage regression, Equation 7.46, to reparameterize Equation 7.48: We replace \hat x by (\tilde x - \hat e) in Equation 7.48 and arrive at

y = (\tilde x - \hat e) \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \hat \alpha \hat e + \hat v \ \ \ \Rightarrow

y = \tilde x \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \underbrace {(\hat \alpha - \hat \beta_{1_{2SLS}})}_{\hat \delta} \hat e + \hat v \tag{7.49}

  • Note, as this is only a reparameterization (no additional or lost information), the OLS estimates of Equation 7.49 remain unaffected

Equation 7.49 is the same as Equation 7.44, the equation for the Hausman-Wu test.

  • Pay attention that the OLS estimates of this equation actually deliver the 2SLS estimates of \beta_1 and \boldsymbol \beta_2

  • Furthermore, Equation 7.49 is basically an OLS model, as the variable in question, \tilde x, is included in its (questionable) original form

    • If the estimated coefficient \hat \delta of \hat e is zero, i.e., \hat \alpha = \hat \beta_{1_{2SLS}}, we actually get the OLS estimates of \beta_1 and \boldsymbol \beta_2, i.e., 2SLS is not necessary. And this is the substance of the Hausman-Wu test!
  • The formulation in Equation 7.49 therefore shows the equivalence of the Hausman-Wu test with the original Hausman test:

    • The more important the term \hat e in Equation 7.49, the more the 2SLS estimates will differ from the OLS estimates

    • And this was the original test idea of the Hausman test

Final Remark: If we use \hat x instead of \hat e in Equation 7.44 and Equation 7.49, we will end up with the very same test results and estimates

y = \tilde x (\hat \beta_{1_{2SLS}} + \hat \delta) + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} - \underbrace {(\hat \alpha - \hat \beta_{1_{2SLS}})}_{\hat \delta} \hat x + \hat v \tag{7.50}


7.8 Testing overidentifying restrictions – Sargan test

At the beginning of this chapter we claimed that we have to assume the exogeneity of the instruments and that it is not possible to test whether they are valid, i.e., uncorrelated with u from the main equation. That is true for exactly (just) identified models

  • If we have more external instruments than needed to identify the model, the model is overidentified. This can improve efficiency but moreover can be exploited for a test of the validity of the instruments

  • Such a test for the validity of the instruments is the Sargan test or Hansen’s J -test

  • The main idea of the Sargan test is as following:

    • If we have more external instruments than needed for identification, we can calculate several different 2SLS estimates of \boldsymbol \beta using different sets of instruments

    • If all the instruments are valid, the different 2SLS estimates of \boldsymbol \beta would all be consistent and converge to the same true values of \boldsymbol \beta

    • Hence, for a specific sample, the differences between these estimates should not be larger than expected from sampling errors. If they are, there is something wrong with the instruments

  • It can be shown that this test can be carried out with a quite simple auxiliary equation approach described below


  • The starting point is the second step regression (2SLS) of a structural equation like the following

y = \mathbf x \hat {\boldsymbol \beta}_{_{2SLS}} + \hat {x}_j \hat {\beta}_{j_{2SLS}} + \underbrace {\widehat {(u + \beta_j \hat e)} }_{\hat v} \tag{7.51}

  • Thereby, \hat x_j was obtained by a first step regression of the endogenous \tilde x_j on the exogenous model variables \mathbf x (internal instruments) and on several external instruments \mathbf z

\tilde x_j = \mathbf x \hat{\boldsymbol \gamma}_1 + \mathbf z \hat{\boldsymbol \gamma}_2 + \hat e \tag{7.52}

  • We presuppose that \hat x_j was estimated in this first step with more than one external instrument. Hence, the model is overidentified

  • Subsequently, we estimate the following auxiliary equation, which is the Sargan test equation

    \hat u = \mathbf x \boldsymbol \delta_1 + \mathbf z \boldsymbol \delta_2 + \epsilon \tag{7.53}

  • Now we test whether the 2SLS residuals \hat u are actually uncorrelated with all exogenous variables, in particular with the external instruments \mathbf z

    • If \mathbf x and \mathbf z are actually exogenous, the fit of Equation 7.53, measured by n \cdot R^2, should be zero (up to sampling error)

    • The test statistic n \cdot R^2 of this equation is \chi^2(m) distributed, m being the number of overidentifying restrictions, i.e., the number of external instruments minus the number of rhs endogenous variables

  • If we reject the H_0 (all instruments are valid), then at least one external instrument is not valid, i.e., not exogenous and therefore erroneously excluded from the main equation


Why does this test only work for overidentified models?

This test only works if we have more external instruments than rhs endogenous variables

  • Suppose not, i.e., we have one rhs endogenous variable and only one external instrument z_1

    • Then \hat x from the first step equation is (besides \mathbf x) simply a multiple of the single external instrument z_1

    • In this case one can show that z_1 is orthogonal to \hat u, the 2SLS residuals (conditional on \mathbf x); for a proof, see footnote 5

      • As the exogenous variables in \mathbf x are also orthogonal to \hat u, it follows that all parameters \boldsymbol \delta of the test Equation 7.53 are zero and the R^2 is always zero as well. Hence, a test for correlation with \hat u is not possible in this case
  • If we have more external instruments than rhs endogenous variables, then \hat x is not a simple multiple of one instrument but equals a particular linear combination of several z_j. Thus, the individual z_j are not automatically orthogonal to \hat u and therefore, a test for correlation with \hat u is possible 6
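
The following minimal sketch (simulated data; variable names are purely illustrative) demonstrates this point: with exactly one external instrument, the auxiliary regression of the 2SLS residuals on all exogenous variables has an R^2 of (numerically) zero.

# Sketch: with exact identification (one external instrument), the Sargan
# auxiliary regression of the 2SLS residuals has an R^2 of (numerically) zero
set.seed(2)
n  <- 2000
z1 <- rnorm(n); x2 <- rnorm(n); u <- rnorm(n)
xt <- z1 + 0.5*x2 + 0.5*u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5*xt + 0.3*x2 + u

library(AER)
iv1 <- ivreg(y ~ xt + x2 | z1 + x2)      # just identified
aux <- lm(resid(iv1) ~ z1 + x2)          # Sargan auxiliary regression (Equation 7.53)
summary(aux)$r.squared                   # zero up to machine precision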


  • Rejecting the H_0 does not tell us which external instrument is invalid

    • However, if we have more than one overidentifying restriction, we can infer whether certain subgroups of instruments are valid. In this case, we estimate the model with only a subgroup of external instruments (in which we have more confidence) and calculate the Sargan test statistic for this subgroup

    • Afterwards, we estimate the model with all external instruments and calculate the Sargan test statistic, which is usually larger than the previous one. The difference of these two test statistics (which is \chi^2(m-m_1) distributed) should not be too large. If it is, the complement of the trustworthy subgroup is invalid – this is the J-difference test (a small simulated-data sketch follows after this list)

  • A problem with the Sargan test is that the power of the test (see Section 3.4) could be quite low, especially if the external instruments have a common source or are highly correlated

    • For instance, the 2SLS estimates obtained by two different instruments could be very similar, even if both instruments are invalid

    • Therefore, if we are not able to reject the H_0 of the Sargan test, we should not rely too much on this result, especially if the external instruments are highly correlated
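
A minimal sketch of this J-difference idea with simulated data (all names are purely illustrative; the Sargan statistics are computed as n \cdot R^2 as described above):

# Sketch of the J-difference idea: Sargan statistic of a trusted instrument
# subgroup vs. the full instrument set (simulated data)
set.seed(3)
n  <- 3000
z1 <- rnorm(n); z2 <- rnorm(n); z3 <- rnorm(n); u <- rnorm(n)
x  <- z1 + 0.8*z2 + 0.6*z3 + 0.5*u + rnorm(n)   # endogenous regressor
y  <- 1 + 0.5*x + u

library(AER)
# Sargan statistic: n * R^2 of a regression of the 2SLS residuals on the instruments
sargan_stat <- function(res, Z) length(res) * summary(lm(res ~ Z))$r.squared

iv_sub  <- ivreg(y ~ x | z1 + z2)          # trusted subgroup: 1 overid. restriction
iv_full <- ivreg(y ~ x | z1 + z2 + z3)     # full instrument set: 2 overid. restrictions

J_sub  <- sargan_stat(resid(iv_sub),  cbind(z1, z2))
J_full <- sargan_stat(resid(iv_full), cbind(z1, z2, z3))

J_diff <- J_full - J_sub                   # approximately chi^2(2 - 1) under H0
1 - pchisq(J_diff, df = 1)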

In the following more formal section we will show more clearly the connection of this test with overidentifying restrictions and introduce Generalized Method of Moments (GMM) estimators


7.8.1 GMM estimators

At the heart of IV/2SLS estimation is the assumption that the instruments (internal + external) \mathbf z are exogenous. This assumption can be cast in so called moment restrictions:

E(\mathbf z_i' u_i) = \mathbf 0 \ = \ \left[ \begin{array} {c} E(z_{1,i} u_i)=0 \\ E(z_{2,i}u_i)=0 \\ \vdots \\ E(z_{k,i}u_i)=0 \end{array} \right] \tag{7.54}

  • First, we presuppose that \mathbf z has k elements, i.e., we have as many instruments as variables in the structural model we want to estimate, so that the model is exactly (just) identified

  • Replacing these theoretical (population) moments by their sample counterparts we get

\frac{1}{n}\sum_{i=1}^n {\mathbf z}_i' \hat u_i = \mathbf 0 \ = \ \left[ \begin{array} {c} \frac{1}{n}\sum_{i=1}^n z_{1,i} \hat u_i =0 \\ \frac{1}{n}\sum_{i=1}^n z_{2,i} \hat u_i =0\\ \vdots \\ \frac{1}{n}\sum_{i=1}^n z_{k,i} \hat u_i = 0 \end{array} \right] \ = \ \frac{1}{n} \mathbf Z' \hat {\mathbf u} \ = \ \mathbf 0 \tag{7.55}


  • Substituting the structural model \mathbf y - \mathbf X \hat {\boldsymbol \beta} for \hat {\mathbf u} we get

\frac{1}{n}\sum_{i=1}^n \mathbf z_i' (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \ = \ \left[ \begin{array} {c} \frac{1}{n}\sum\nolimits _{i=1}^n z_{1,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{2,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \vdots \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{k,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \end{array} \right] \ = \ \frac{1}{n} \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta} ) \ = \ \mathbf 0 \tag{7.56}

  • So, this clearly shows that we have k equations to determine the k parameters \beta_1,\ldots,\beta_k (without loss of generality, we assume demeaned variables, so we need no intercept \beta_0)

  • Multiplying out Equation 7.56 and solving for \hat {\boldsymbol \beta} we finally arrive at the (ordinary) IV estimator from Equation 7.32 (Section 7.5.2) as a method of moments estimator

\frac{1}{n}\mathbf Z' \mathbf y = \frac{1}{n}\mathbf Z' \mathbf X \hat{\boldsymbol \beta} \ \ \Rightarrow \ \ \hat {\boldsymbol \beta}_{IV} = (\mathbf Z' \mathbf X )^{-1}\mathbf Z' \mathbf y \tag{7.57}

  • The analysis above describes the case for an exactly (just) identified model as we have k equations in k parameters (number of rhs endogenous variables equals the number of external instruments)
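
As a minimal numerical sketch (simulated data; names purely illustrative), the just-identified IV estimator from Equation 7.57 can be computed directly from the data matrices and coincides with the ivreg result:

# Sketch: the just-identified IV estimator computed directly as (Z'X)^(-1) Z'y
# (Equation 7.57); here the constant is included as an internal instrument
set.seed(4)
n <- 2000
z <- rnorm(n); u <- rnorm(n)
x <- z + 0.5*u + rnorm(n)            # endogenous regressor
y <- 1 + 0.5*x + u

X <- cbind(1, x)                     # regressors (incl. constant)
Z <- cbind(1, z)                     # instruments (incl. constant): just identified
beta_iv <- solve(t(Z) %*% X) %*% t(Z) %*% y
beta_iv

library(AER)
coef(ivreg(y ~ x | z))               # identical estimates via ivreg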

But what happens if the model is overidentified, meaning that we have more instruments than variables?


  • In this case \mathbf z_i has k+m elements, m being the number of overidentifying restrictions. Hence, we have more equations than parameters in \boldsymbol \beta – the model is overdetermined

\frac{1}{n}\sum_{i=1}^n \mathbf z_i' (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \ = \ \left[ \begin{array} {c} \frac{1}{n}\sum\nolimits _{i=1}^n z_{1,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{2,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \vdots \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{k,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \vdots \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{k+m,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \end{array} \right] \ = \ \frac{1}{n} \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta}) = \ \mathbf 0 \tag{7.58}

  • We have m excess equations

  • Multiplying out Equation 7.58 we once again get

\frac{1}{n}\mathbf Z' \mathbf y = \frac{1}{n}\mathbf Z' \mathbf X \hat {\boldsymbol \beta} \tag{7.59}

  • But this time, \mathbf Z' \mathbf X is a \left( (k+m), k \right) matrix and thus no longer square! Therefore, a usual inverse of this matrix does not exist and so we cannot solve this system for {\hat {\boldsymbol \beta}}

  • Because we have m excess equations, basic linear algebra tells us that there is generally no \hat {\boldsymbol \beta} with k elements such that k+m linearly independent equations can be jointly satisfied

  • So, how to solve for \hat {\boldsymbol \beta} in this case?

    • The key idea for this problem is to search for a \hat {\boldsymbol \beta} which satisfies the k+m linear equations as well as possible in an approximate sense, i.e., a weighted sum of the squared sample moments from Equation 7.58 should be as close as possible to zero, although not exactly zero. This procedure is called Generalized Method of Moments (GMM)

    • Hence, to estimate {\boldsymbol \beta}, we minimize a weighted sum of the squared sample moments, with weights given by a positive definite and symmetric weighting matrix \mathbf W. We call the resulting quadratic form J

      \underset{\hat{\beta}}{\operatorname {min}} \ J := \ n \cdot \frac {1}{n}(\mathbf y - \mathbf X \hat {\boldsymbol \beta})'\mathbf Z \, \mathbf W \, \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta})\frac {1}{n} \tag{7.60}

      Remark: We multiply the quadratic form by n for testing purposes as otherwise J would converge to zero (but this plays no role for optimization)
      (Remark: Multiplying Equation 7.59 with a left-sided pseudo-inverse of \mathbf Z' \mathbf X is equivalent to minimizing J with \mathbf W = \mathbf I – we would have an unweighted sum of the squared sample moments in this case)

  • The solution to this minimization problem is obtained by setting the first derivative of J with respect to \hat {\boldsymbol \beta} to zero and solving the resulting matrix equation for \hat {\boldsymbol \beta}:

\hat {\boldsymbol \beta}_{GMM} = (\mathbf X' \mathbf Z \mathbf W \mathbf Z' \mathbf X)^{-1} \mathbf X' \mathbf Z \mathbf W \mathbf Z' \mathbf y \tag{7.61}
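
For reference, a short sketch of the intermediate step: differentiating J from Equation 7.60 with respect to \hat {\boldsymbol \beta} (using the symmetry of \mathbf W) and setting the derivative to zero gives

\frac{\partial J}{\partial \hat {\boldsymbol \beta}} \ = \ -\frac{2}{n} \, \mathbf X' \mathbf Z \, \mathbf W \, \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta}) \ = \ \mathbf 0 \ \ \Rightarrow \ \ \mathbf X' \mathbf Z \, \mathbf W \, \mathbf Z' \mathbf X \, \hat {\boldsymbol \beta} \ = \ \mathbf X' \mathbf Z \, \mathbf W \, \mathbf Z' \mathbf y

Premultiplying with (\mathbf X' \mathbf Z \mathbf W \mathbf Z' \mathbf X)^{-1} yields Equation 7.61.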


  • Generalized Method of Moments estimators are generally consistent. In particular, this estimator \hat {\boldsymbol \beta}_{GMM} is consistent regardless of our choice of \mathbf W (provided the moment restrictions from Equation 7.58 are true, of course)

  • But we want an optimal weighting matrix to minimize the variance of the estimated parameters – we want an efficient estimator

  • It turns out that this optimal weighting matrix is proportional to an estimate of the inverse of the asymptotic covariance matrix of the moment restrictions. For homoskedastic errors this is

\mathbf W = [E(\mathbf z_i' u_i u_i' \mathbf z_i)]^{-1} = [E_z(E(u_i^2\mathbf z_i' \mathbf z_i \, | \, \mathbf z_i))]^{-1} = [{\sigma^2} E_z(\mathbf z_i' \mathbf z_i)]^{-1} \ \ \Rightarrow

\widehat{\mathbf W} = \left( {\hat\sigma^2} \frac{1}{n} \sum_{i=1}^n \mathbf z_i' \mathbf z_i\right)^{-1} = \frac {n}{\hat \sigma^2} (\mathbf Z' \mathbf Z)^{-1} \tag{7.62}

  • Plugging \widehat {\mathbf W} into Equation 7.61 we get the efficient GMM estimator for homoscedastic errors

\hat {\boldsymbol \beta}_{EGMM} = (\mathbf X' \mathbf Z (\mathbf Z' \mathbf Z)^{-1} \mathbf Z' \mathbf X)^{-1} \mathbf X' \mathbf Z (\mathbf Z' \mathbf Z)^{-1} \mathbf Z' \mathbf y \tag{7.63}
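
Note that Equation 7.63 is exactly the 2SLS estimator in matrix form. A minimal numerical check (simulated data; names purely illustrative):

# Sketch: the efficient GMM estimator of Equation 7.63 computed from the data
# matrices; with homoskedastic errors it coincides with 2SLS
set.seed(5)
n  <- 2000
z1 <- rnorm(n); z2 <- rnorm(n); u <- rnorm(n)
x  <- z1 + 0.7*z2 + 0.5*u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5*x + u

X <- cbind(1, x)                         # regressors (incl. constant)
Z <- cbind(1, z1, z2)                    # instruments: 2 external, model overidentified
A <- t(X) %*% Z %*% solve(t(Z) %*% Z)    # X'Z (Z'Z)^(-1)
beta_egmm <- solve(A %*% t(Z) %*% X) %*% A %*% t(Z) %*% y
beta_egmm

library(AER)
coef(ivreg(y ~ x | z1 + z2))             # identical 2SLS estimates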


  • All results so far rest on the truth of the assumption E(\mathbf z_i' u_i)=\mathbf 0

  • However, these assumptions can be tested if we have overidentifying restrictions, i.e., more equations than variables in the system of Equation 7.58

  • This test is based on J, with the optimal weighting matrix \hat {\mathbf W} for homoscedastic errors plugged in

\hat J = \dfrac {(\mathbf y - \mathbf X \hat{\boldsymbol \beta})' \mathbf Z \, (\mathbf Z' \mathbf Z)^{-1} \, \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta}) } {\hat \sigma^2}

  • If the model is just identified, \hat {\boldsymbol \beta} solves the system of equations in Equation 7.56 exactly and \hat J is always zero – no test is possible

  • If we have more equations than variables, the system of equations can only be approximately solved by \hat {\boldsymbol \beta}. This is the case even if all moment restrictions E(\mathbf z_i' u_i)=\mathbf 0 are true. However, the larger \hat J, the more likely it is that some or even all moment restrictions are violated, i.e., E(\mathbf z_i' u_i)\ne\mathbf 0

  • Hence, a test procedure for the validity of the overidentifying restrictions (and thus for the validity of the instruments) is to check whether \hat J is too large – larger than sampling error would suggest


  • Substituting \hat {\mathbf u} = \mathbf y - \mathbf X \hat {\boldsymbol \beta} into the expression for \hat J above we get

\hat J \, = \, \dfrac {\hat {\mathbf u}' \mathbf Z \, (\mathbf Z' \mathbf Z)^{-1} \, \mathbf Z' \hat {\mathbf u} } {\hat \sigma^2} \, = \, \dfrac {\hat {\mathbf u}' \mathbf P_{\mathbf Z} \, \hat {\mathbf u} } {\hat \sigma^2} \, = \, n \dfrac { ( \mathbf P_{\mathbf Z} \hat {\mathbf u})' (\mathbf P_{\mathbf Z} \, \hat {\mathbf u}) } {\hat {\mathbf u}' \hat {\mathbf u}} \tag{7.64}

  • Here, \mathbf P_{\mathbf Z} is the projection matrix (Hat matrix, see Equation C.11), which is idempotent and projects \hat {\mathbf u} into the linear subspace of \mathbb{R}^n spanned by the columns of \mathbf Z

  • Hence, \mathbf P_{\mathbf Z} \hat {\mathbf u} are the predicted values of a regression of the 2SLS residuals \hat {\mathbf u} on all instruments in \mathbf Z (compare the Sargan test, Equation 7.53). Therefore, the numerator is the sum of squares (the scalar product) of these predicted values, i.e., the SSE of this regression (Equation 7.53)

  • The denominator is the sum of the squared 2SLS-residuals, hence SST of \hat {\mathbf u}

Thus, \hat J is n times the R^2 of a regression of the 2SLS residuals on all instruments; remember, R^2 := \frac {SSE}{SST}.
If n \cdot R^2 is too large, at least one instrument (or even all of them) is not exogenous

  • This J-statistic can be shown to be asymptotically \chi^2(m) distributed, with m being the number of overidentifying restrictions and is identical to the Sargan test procedure described in the text above (see Equation 7.53)
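
As a final minimal sketch (simulated data; names purely illustrative), \hat J computed directly via the projection matrix \mathbf P_{\mathbf Z} coincides with n \cdot R^2 from the auxiliary regression:

# Sketch: J computed via the projection matrix P_Z (Equation 7.64) equals
# n * R^2 of the regression of the 2SLS residuals on all instruments
set.seed(6)
n  <- 500
z1 <- rnorm(n); z2 <- rnorm(n); u <- rnorm(n)
x  <- z1 + 0.7*z2 + 0.5*u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5*x + u

library(AER)
iv   <- ivreg(y ~ x | z1 + z2)           # overidentified 2SLS (m = 1)
uhat <- resid(iv)                        # 2SLS residuals

Z  <- cbind(1, z1, z2)                   # all instruments (incl. constant)
PZ <- Z %*% solve(t(Z) %*% Z) %*% t(Z)   # projection matrix P_Z
J_matrix <- n * sum((PZ %*% uhat)^2) / sum(uhat^2)

J_nR2 <- n * summary(lm(uhat ~ z1 + z2))$r.squared
c(J_matrix = J_matrix, J_nR2 = J_nR2)    # identical up to numerical precision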

7.9 Summary

  • With IV/2SLS estimation techniques we can handle the problem of endogenous rhs variables, which is a widespread phenomenon

  • The drawback of IV/2SLS estimation is that the standard errors of the estimated parameters are generally much larger

    • Therefore, one should always use OLS if this is justifiable. The Hausman-Wu test can give some indication on this matter
  • The drawbacks of IV/2SLS are particularly present if we have only weak instruments. Thus, a test for weak instruments as described above is mandatory for a credible analysis

  • Furthermore, if we have an overidentified model, a Sargan test (J-test) is mandatory as well; the whole IV/2SLS procedure is grounded on valid instruments. If only one instrument is not valid, the entire analysis breaks down

  • Corrections for heteroskedasticity / serial correlation work analogously to OLS, and IV/2SLS easily extends to time series and panel data settings

  • In the following example, we once again estimate our wage equation by 2SLS (as we did at the beginning of this chapter), but this time we additionally carry out the diagnostic tests described above. Fortunately, the R procedure ivreg does most of the work for us


7.10 Example – 2SLS with diagnostics

  • 2SLS estimation with father and mother education as instruments for education

  • We have two external instruments, thus the model is overidentified

library(wooldridge); library(AER); library(texreg)
data("card")

# Complication: `fatheduc` and `motheduc` contain missing values,
# which would cause problems when we carry out some of the tests by hand

# We generate a new data set `card1` with the missing values of `fatheduc` and `motheduc` excluded
card1 <- subset(card, fatheduc > -1 & motheduc > -1)


# 2SLS estimation
iv12 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, 
               data=card1)


# Saving 2SLS residuals
resid_iv <- iv12$residuals

# To get the three described diagnostic tests for 2SLS, we have to set  
# the option "diagnostics=TRUE" 
# summary(iv12, diagnostics = TRUE)

# Modifications for modelsummary to print diagnostic statistics for ivreg

library(broom)
glance_custom.ivreg <- function(x, ...) {
  # Extract the diagnostics table (weak instruments, Wu-Hausman, Sargan) once
  dia <- summary(x, diagnostics = TRUE)$diagnostics
  # Format each test as "statistic [p-value]"
  fmt <- function(row) paste0(sprintf('%4.2f', dia[row, 3]), " [",
                              sprintf('%4.3f', dia[row, 4]), "]")
  out <- data.frame("Diagnostics" = " ",
                    "Weak Instr"  = fmt(1),
                    "Hausman WU"  = fmt(2),
                    "Sargan"      = fmt(3))
  return(out)
}
#summary(iv12, diagnostics = TRUE)

library(modelsummary)
modelsummary(list( "2SLS" = iv12 ),
             shape =  term ~ statistic,
             statistic = c('std.error', 'statistic', 'p.value', 'conf.int'), 
             stars = TRUE, 
             gof_omit = "A|L|B|F",
             align = "ldddddd",
             fmt= 4,
             output = "gt")
Table 7.3:

IV estimates of a wage equation using father and mother education as instruments and showing important diagnostic statistics

2SLS
Est. S.E. t p 2.5 % 97.5 %
(Intercept)       4.2642***   0.2189  19.4792 <1e-04      3.8349   4.6934
educ       0.0999***   0.0128   7.8341 <1e-04      0.0749   0.1249
exper       0.0989***   0.0095  10.3954 <1e-04      0.0802   0.1175
I(exper^2)      -0.0024***   0.0004  -6.1028 <1e-04     -0.0032  -0.0017
black      -0.1506***   0.0260  -5.8009 <1e-04     -0.2015  -0.0997
smsa       0.1509***   0.0196   7.6933 <1e-04      0.1125   0.1894
south      -0.1073***   0.0181  -5.9364 <1e-04     -0.1427  -0.0718
Num.Obs.    2220       
R2       0.253    
RMSE       0.38     
Diagnostics               
Weak.Instr 127.78 [0.000]
Hausman.WU 3.97 [0.047]
Sargan 2.05 [0.152]
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Test for weak instruments by hand
# First stage regression
first <- lm(educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, data = card1)

# Testing whether coefficients of external instruments are jointly zero
lht(first, c("motheduc", "fatheduc"))
      Linear hypothesis test
      
      Hypothesis:
      motheduc = 0
      fatheduc = 0
      
      Model 1: restricted model
      Model 2: educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + 
          south
      
        Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
      1   2214 8594.3                                  
      2   2212 7704.2  2    890.12 127.78 < 2.2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Doing the Hausman-Wu test by hand
# We need the residuals of the first stage regression of educ on all exogenous variables 

resid1 <- first$residuals 

# Regressing the model of interest with residuals of the first stage regression 
# as additional variable

# Hausman-Wu test; Look at the p-value of resid1
# Further, compare estimated coefficients with the 2SLS estimates; they are identical 
 
Hausman_Wu <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south + 
                   resid1, data = card1) 

library(modelsummary)
modelsummary(list( "Hausman_Wu" = Hausman_Wu ),
             shape =  term ~ statistic,
             statistic = c('std.error', 'statistic', 'p.value', 'conf.int'), 
             stars = TRUE, 
             gof_omit = "A|L|B|F",
             align = "ldddddd",
             fmt= 4,
             output = "gt")
Table 7.4:

Hausman-Wu test for the wage equation above. resid1 are the residuals of the first stage regression. Note that the coefficients of this equation are identical to the 2SLS estimates

Hausman_Wu
Est. S.E. t p 2.5 % 97.5 %
(Intercept)     4.2642***   0.2171  19.6427 <1e-04       3.8384   4.6899
educ     0.0999***   0.0126   7.8998 <1e-04       0.0751   0.1247
exper     0.0989***   0.0094  10.4826 <1e-04       0.0804   0.1174
I(exper^2)    -0.0024***   0.0004  -6.1540 <1e-04      -0.0032  -0.0017
black    -0.1506***   0.0257  -5.8496 <1e-04      -0.2011  -0.1001
smsa     0.1509***   0.0195   7.7578 <1e-04       0.1128   0.1891
south    -0.1073***   0.0179  -5.9862 <1e-04      -0.1424  -0.0721
resid1    -0.0266*     0.0134  -1.9915      0.0465  -0.0528  -0.0004
Num.Obs.  2220       
R2     0.266    
RMSE     0.38     
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Sargan test by hand
# Regression of IV residuals on all exogenous variables
sargan <- lm(resid_iv ~ fatheduc + motheduc + 
               exper + I(exper^2) + black + smsa + south, 
             data = card1)

# Test statistic: J = n * R^2
J <- length( sargan$residuals ) * summary(sargan)$r.squared


print("Result of Sargan test")
print( paste( "J-stat =", sprintf( "%.3f",J ), "   p-value =", sprintf( "%.4f",1-pchisq(J,1) ) ) )
      [1] "Result of Sargan test"
      [1] "J-stat = 2.051    p-value = 0.1522"

  1. In principle, every variable not part of a correctly specified structural equation that is uncorrelated with the error term u (condition 1. is met), could serve as an external instrument, especially if condition 3. is met. So, strictly speaking, condition 2. is redundant but nonetheless helpful for the distinction of external and internal instruments and to understand the basic problem of finding external instruments.↩︎

  2. If the strong requirements for proxy variables are satisfied, see Equation 2.47 and the following analysis.↩︎

  3. In particular, OLS predicted values \hat y = \mathbf Py and OLS residuals \hat u = \mathbf My are always uncorrelated because \mathbf P and \mathbf M are mutually orthogonal projection matrices (\mathbf P \mathbf M = \mathbf 0), see Equation C.10 and Equation C.11.↩︎

  4. This is not the case if the 2SLS estimates are calculated by hand, like described in text following Equation 7.9. The reason is that in this case in the second stage regression the exogenous variables remain unchanged as regressors are not replaced by their linear projections. This can lead to a correlation of the exogenous variable, which was left out in the first stage regression, with the residuals of the first stage regression, violating MLR.4’ in Equation 7.12.↩︎

  5. Note that, in general, the 2SLS residuals do not retain the orthogonality property of their OLS counterparts, which makes the argument in the text considerably more complicated to prove.
    First of all, we have to distinguish the residuals of Equation 7.51, \hat v, from the 2SLS residuals \hat u:
    The former are defined as \hat v = y-\hat {\boldsymbol \beta} \mathbf x - \hat x_j \hat \beta_j and the latter are \hat u = y-\hat {\boldsymbol \beta} \mathbf x - \tilde x_j \hat \beta_j. Substituting \tilde x_j = \hat x_j +\hat e from the first stage regression we get: \hat u = y-\hat {\boldsymbol \beta} \mathbf x - ( \hat x_j +\hat e) \hat \beta_j \; = \; y-\hat {\boldsymbol \beta} \mathbf x - \hat x_j \hat \beta_j - \hat e \hat \beta_j \; \Rightarrow \; \hat u = (\hat v - \hat e \hat \beta_j), which is not obvious at first sight.
    Secondly, we prove that \mathbf x is uncorrelated with \hat u = \hat v - \hat \beta_j \hat e:
    Because of the orthogonality property of OLS it follows from the first stage regression that \mathbf x is uncorrelated with the first stage residuals \hat e. From the second stage regression Equation 7.51 it follows that \mathbf x is uncorrelated with \hat v as well. Thus, \mathbf x is uncorrelated with \hat u.
    Thirdly, we prove that z_1 is uncorrelated with \hat u = \hat v - \hat \beta_j \hat e:
    If we have only one instrument z_1, \hat x_j in Equation 7.51 is: \hat x_j = \mathbf x \hat {\boldsymbol \gamma_1} + z_1 \hat \gamma_{2,1}. As \hat x_j and \mathbf x are uncorrelated with \hat v (orthogonality in Equation 7.51), z_1 must be uncorrelated with \hat v as well.
    Furthermore, z_1 is uncorrelated with the first stage residuals \hat e as well, because of the orthogonality property of OLS. Hence, z_1 is uncorrelated with \hat u.
    Therefore, both \mathbf x and z_1 are uncorrelated with \hat u from Equation 7.53, leading to R^2=0 in this case.↩︎

  6. Regarding the third argument of the previous footnote we now have:
    \hat x_j = \mathbf x \hat {\boldsymbol \gamma}_1 + z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2}. As \hat x_j and \mathbf x are uncorrelated with \hat v (orthogonality in Equation 7.51), the linear combination z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2} must be uncorrelated with \hat v as well.
    Furthermore, z_1 and z_2 are uncorrelated with the first stage residuals \hat e because of the orthogonality property of OLS. We conclude that the linear combination z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2} is uncorrelated with \hat u, but not z_1 or z_2 by themselves.
    Hence, z_1 and z_2 are generally correlated with \hat u in Equation 7.53 and so, we generally have R^2 \neq 0.↩︎