Mainly based on Wooldridge (2019), Chapter 15

7.1 Main Causes of the Problem

Endogeneity problems are endemic in social sciences/economics.

Possible reasons are:

  • Omitted variables: In many cases, important variables (e.g., personal characteristics) cannot be observed and thus are part of the error term

    • These are often correlated with the observed explanatory variables, leading to a violation of MLR.4’ (E(u_i|\mathbf x_i)\neq0, see Section 2.7), which we denote as an endogeneity problem. This problem implies biased and inconsistent estimates of the parameters

    • Example: \log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + u_i. The important variable ability_i is omitted (as it is not easily observable) and hence part of u_i. But ability_i is probably correlated with educ_i, violating MLR.4. Therefore, \beta_1 does not only measure the partial effect of educ_i but indirectly also the effect of ability

  • Measurement error in explanatory variables may also lead to endogeneity

    • Example: \log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + u_i. Suppose we cannot measure experience accurately; we only observe exper_i^B = exper_i + e_i, with e_i being a measurement error. So we have: \log(wages_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i^B + (u_i - \beta_2 e_i), with the new error term (u_i - \beta_2 e_i) being correlated with the observed explanatory variable exper_i^B, violating MLR.4
  • Simultaneous equations, reversed causality and feedbacks are an additional source of endogeneity

    • Example: We want to estimate a macro consumption function, c_t = \beta_0 + \beta_1 y_t + u_t, in particular the marginal propensity to consume \beta_1. But we also have the national accounts identity y_t=c_t+i_t+g_t. Thus, whenever u_t is high, consumption c_t is high, and so is income y_t. Therefore, u_t and y_t are positively correlated, violating MLR.4 and leading to an overestimation of \beta_1; the parameter then also picks up the reverse effect c_t \rightarrow y_t
  • Lagged endogenous variables as explanatory variables in connection with autocorrelated errors. This is important for dynamic models, especially for panel data - dynamic panel bias (not to be confused with unobserved fixed effects in panel data, which is an omitted variable problem)

  • Non-random samples, self-selection

Solutions to the endogeneity problem:
  • Proxy variables method for omitted regressors

  • Model for selection process

  • Fixed effects methods if

    1. panel data are available,
    2. endogeneity is time-constant, and
    3. regressors are not time-constant
  • Instrumental variables methods (IV)

    • IV estimators are the most prominent method to address endogeneity problems

7.2 Main idea of IV estimation

The main causes for endogeneity of explanatory variables discussed above are so common that nearly every empirical work is more or less affected by this problem

  • Assume our model is the following, with \tilde x_i being endogenous, i.e., correlated with u_i, violating MLR.4’ (therefore the tilde over x_i). This is the so-called structural equation which describes the causal effect we want to estimate

y_i = \beta_0 + \beta_1 \tilde x_i + u_i \tag{7.1}

  • The method of instrumental variables is a remedy for the endogeneity problem

  • The main idea is that the variables \tilde {\mathbf x}_i which are correlated with u_i are “replaced” in some way with instruments

These instruments should contain additional information (outside of Equation 7.1) to help resolve the endogeneity problem, i.e., to disentangle the sought-after partial effect of \tilde {\mathbf x}_i from feedbacks or other sources of correlation which we discussed above

  • The external instruments (we denote them \mathbf z_i) have to satisfy the following three conditions: 1

    1. \mathbf z_i have to be (weak) exogenous; Cov (z_i, u_i) = 0, see Section 2.7

    2. \mathbf z_i must not be a part of the structural equation of interest – exclusion restrictions.
      We need external instruments with additional outside information!

    3. \mathbf z_i have to be relevant; Cov (z_i, \tilde x_i) \neq 0, indeed, the correlation between z_i and \tilde x_i should be as high as possible

  • From the first and third requirement, we can easily derive the IV estimator for one explanatory variable and one instrument


7.3 The IV estimator

  • Based on Cov (z_i, u_i) = 0 we can derive a method of moments estimator. From Equation 7.1, we have u_i = y_i - \beta_0 - \beta_1 \tilde x_i. Plug this into Cov (z_i, u_i)

Cov \left( z_i, (y_i - \beta_0 - \beta_1 \tilde x_i) \right) \ = \ Cov(z_i,y_i) - \beta_1 Cov(z_i,\tilde x_i) \, = \, 0 \ \ \Rightarrow

\beta_1 = \dfrac{Cov(z_i,y_i)}{Cov(z_i,\tilde x_i)} \tag{7.2}

  • The parameter is estimable (identified) because we can write \beta_1 in terms of population moments, which can be replaced with their empirical counterparts to arrive at the IV estimator for \beta_1

\hat\beta_{1,IV} = \dfrac{\frac {1}{n}\sum_i (z_i-\bar z)(y_i-\bar y)} {\frac {1}{n}\sum_i (z_i-\bar z)(\tilde x_i-\bar x)} \tag{7.3}

  • If every variable is well behaved, we can apply the LLN and it follows that Equation 7.3 converges to Equation 7.2 with an ever increasing sample size. Hence, \hat\beta_{1,IV} is a consistent estimator for \beta_1, whereas the OLS estimator

\hat\beta_{1} = \dfrac{\frac {1}{n}\sum_i (\tilde x_i-\bar x)(y_i-\bar y)} {\frac {1}{n}\sum_i (\tilde x_i-\bar x)^2} \ \tag{7.4}

is not, because \tilde x_i is correlated with u_i by assumption
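As a quick numerical illustration, Equation 7.3 and Equation 7.4 can be computed directly from sample covariances. A minimal sketch using the card data set (introduced in the examples below), with fatheduc as instrument for educ in a simple bivariate wage equation:

library(wooldridge)
data("card")
d <- subset(card, !is.na(fatheduc))

# IV estimate of the educ coefficient, Equation 7.3: Cov(z,y)/Cov(z,x)
cov(d$fatheduc, d$lwage) / cov(d$fatheduc, d$educ)

# OLS estimate for comparison, Equation 7.4: Cov(x,y)/Var(x)
cov(d$educ, d$lwage) / var(d$educ)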


How to find instruments?

  • The consistency of the IV estimators relies on the exogeneity of z_i. Unfortunately, this exogeneity cannot be tested directly (without additional information), hence we have to assume this – based on economic theory, common sense or introspection

    • If we have more external instruments than needed (more than one in this example), we can test whether the instruments are exogenous as a group; this will be discussed later – Sargan J test
  • In practice, the main difficulty with IV estimators is to find appropriate instruments. Let us consider our good old wage equation:

wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \underbrace {(ability + v_i)}_{u_i} \tag{7.5}

  • We are interested in the partial effect of education on the wage. But we probably have an omitted variable problem as ability of the people is clearly important for the received wage and is not directly observable and thus, ability is a part of u_i

    • However, ability and therefore u_i are probably correlated with educ – people with higher ability also tend to be more educated. But this violates MLR.4’ and thus, educ is endogenous
  • So, we need at least one external instrument for educ, which is

    • not part of Equation 7.5
    • is relevant
    • is exogenous (not correlated with the error term and thus correctly excluded from the main model)

Several instruments have been proposed for this purpose:
  • The education of the mother or father

    1. No direct wage determinant
    2. Correlated with education of the child because of social factors
    3. Probably (?) uncorrelated with innate ability (problem: ability may be inherited from parents)
  • The number of siblings

    1. No direct wage determinant
    2. Correlated with education because of resource constraints in household
    3. Probably uncorrelated with innate ability
  • College proximity when 16 years old

    1. No direct wage determinant
    2. Correlated with education because those who lived near a college tend to obtain more education
    3. Uncorrelated with error (?)
  • Month of birth

    1. No direct wage determinant
    2. Correlated with education because of compulsory school attendance laws (in German: Schulpflicht)
    3. Uncorrelated with error
  • In all these cases one could question the exogeneity of the proposed instrument or their relevance, or even both. However, at least the relevance can be tested

  • In the following, we estimate Equation 7.5 as an example with OLS and IV

    • Note the coefficients of the endogenous educ (these are actually expected to be over- rather than underestimated by OLS – maybe there is an additional errors-in-variables problem?) and the considerably larger standard errors of educ

Example: IV versus OLS

library(wooldridge); library(AER); library(texreg)
data("card")


# OLS
ols <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south, data=card)


# IV with father education as instrument 
# Note, in ivreg the instruments are after "|" and you have to include all (!)  
# exogenous variables of the model but not the endogenous educ
iv1 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                fatheduc + exper + I(exper^2) + black + smsa + south, 
              data=card)


# IV with mother education as instrument 
iv2 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                motheduc + exper + I(exper^2) + black + smsa + south, 
              data=card)


# IV with father and mother education as instruments 
iv12 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, 
               data=card)


# IV with nearc4 (proximity to a 4 year college) as instrument 
iv3 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                nearc4 + exper + I(exper^2) + black + smsa + south, 
              data=card)

library(modelsummary)
modelsummary( list("OLS"=ols,
                   "IV-fath"=iv1, "IV-moth"=iv2, "IV-fath_moth"=iv12, "IV-nearc4"=iv3), 
              gof_omit = "A|L|B|F",
              align = "lddddd", 
              stars = TRUE, 
              fmt = 3,
              output="gt")
Table 7.1: Comparison of OLS with IV estimates, using different instruments (standard errors in parentheses)

OLS IV-fath IV-moth IV-fath_moth IV-nearc4
(Intercept)    4.734***    4.467***    4.266***    4.264***    3.753***
  (0.068)     (0.238)     (0.234)     (0.219)     (0.829)  
educ    0.074***    0.089***    0.102***    0.100***    0.132** 
  (0.004)     (0.014)     (0.014)     (0.013)     (0.049)  
exper    0.084***    0.093***    0.095***    0.099***    0.107***
  (0.007)     (0.010)     (0.009)     (0.010)     (0.021)  
I(exper^2)   -0.002***   -0.002***   -0.002***   -0.002***   -0.002***
  (0.000)     (0.000)     (0.000)     (0.000)     (0.000)  
black   -0.190***   -0.160***   -0.168***   -0.151***   -0.131*  
  (0.018)     (0.026)     (0.024)     (0.026)     (0.053)  
smsa    0.161***    0.155***    0.146***    0.151***    0.131***
  (0.016)     (0.019)     (0.018)     (0.020)     (0.030)  
south   -0.125***   -0.113***   -0.116***   -0.107***   -0.105***
  (0.015)     (0.018)     (0.017)     (0.018)     (0.023)  
Num.Obs. 3010       2320       2657       2220       3010      
R2    0.291       0.264       0.274       0.253       0.225   
RMSE    0.37        0.38        0.38        0.38        0.39    
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

7.4 The relevance of relevance

  • Besides exogeneity, the relevance of the instrument, Cov(z_i,\tilde x_i), plays an extremely important role. To show this, we subtract from model Equation 7.1 the corresponding mean values and premultiply by (z_i-\bar z)

(z_i-\bar z) (y_i-\bar y) = \beta_1 (z_i-\bar z)(\tilde x_i-\bar x) + (z_i-\bar z)u_i

  • Taking the expectation we get

E[(z_i-\bar z) (y_i-\bar y)] = \beta_1 E[(z_i-\bar z)(\tilde x_i-\bar x)] + E[(z_i-\bar z)u_i] \ \ \Rightarrow

Cov(z_i,y_i) = \beta_1 Cov(z_i,\tilde x_i) + Cov(z_i,u_i)

  • And dividing by Cov(z_i,\tilde x_i)

\dfrac {Cov(z_i,y_i)}{Cov(z_i,\tilde x_i)} = \beta_1 + \dfrac {Cov(z_i,u_i)}{Cov(z_i,\tilde x_i)}

  • Replacing the theoretical moments by their empirical counterparts, we recognize that the left hand side is equal to the IV estimate of \beta_1, Equation 7.3. Taking probability limits (Equation A.9) we arrive at

\operatorname {plim} \hat\beta_{1,IV} \ = \ \beta_1 + \dfrac{\operatorname {plim} \frac {1}{n}\sum_i (z_i-\bar z)u_i} {\operatorname {plim} \frac {1}{n}\sum_i (z_i-\bar z)(\tilde x_i-\bar x)} \ = \ \beta_1 + \dfrac {\operatorname {Corr}(z_i,u_i)}{\operatorname {Corr}(z_i,\tilde x_i)} \dfrac {\sigma_u}{\sigma_{\tilde x}} \tag{7.6}


  • As Equation 7.6 shows, \hat \beta_{1,IV} is consistent if the correlation between z_i and u_i is zero, \operatorname {Corr}(z_i,u_i)=0, i.e., z is exogenous

  • Suppose this correlation is not exactly zero, but small. In this case we would expect only a small asymptotic bias in \hat \beta_{1,IV}

    • However, if additionally \operatorname {Corr}(z_i,\tilde x_i) is small as well – if z_i is not very relevant – the bias in the IV estimate can become considerably large. \rightarrow Weak instrument problem
  • We can derive a relation similar to Equation 7.6 for the OLS estimate; we subtract the corresponding mean values from model Equation 7.1 and premultiply by (\tilde x_i-\bar x). This yields

\operatorname {plim} \hat\beta_{1,OLS} \ = \ \beta_1 + \dfrac{\operatorname {plim} \frac {1}{n}\sum_i (\tilde x_i-\bar x)u_i} {\operatorname {plim} \frac {1}{n}\sum_i (\tilde x_i-\bar x)^2} \ = \ \beta_1 + \dfrac {\operatorname {Corr}(\tilde x_i,u_i) \, \sigma_u \sigma_{\tilde x}}{\operatorname {Var}(\tilde x_i)} \ = \ \beta_1 + \operatorname {Corr}(\tilde x_i,u_i) \, \frac {\sigma_u}{\sigma_{\tilde x}} \tag{7.7}

  • Thus, besides {\sigma_u}/{\sigma_{\tilde x}} (explain why) the asymptotic bias of the OLS estimates depends on \operatorname {Corr}(\tilde x_i,u_i)

  • However, if we have a weak instrument problem, it is easily possible that the asymptotic bias of the IV estimate is even larger than that of the OLS estimate, as the following numerical example and the small simulation sketch below illustrate;

\frac{\operatorname{Corr}(z_i, u_i)}{\operatorname{Corr}(z_i, \tilde x_i)}>\operatorname{Corr}(\tilde x_i, u_i) \text { e.g. } \frac{0.03}{0.2}>0.1
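A small simulation may illustrate this point; the parameter values are made up for this sketch (not taken from the text): the instrument is only weakly relevant and slightly invalid, while the regressor itself is only mildly endogenous, so the IV estimate ends up more biased than OLS.

set.seed(1)
n <- 100000
u <- rnorm(n)
z <- 0.05*u + rnorm(n)            # instrument slightly invalid: Corr(z,u) > 0
x <- 0.05*z + 0.1*u + rnorm(n)    # x mildly endogenous, z only weakly relevant
y <- 1 + 1*x + u                  # true beta_1 = 1

c(OLS = cov(x,y)/var(x),          # Equation 7.7: small asymptotic bias
  IV  = cov(z,y)/cov(z,x))        # Equation 7.6: large bias due to the weak instrument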


7.5 Two stage least squares (2SLS)

Suppose we want to estimate a more elaborate structural model equation

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{l} x_{l} + \beta_{j} \tilde x_j + u \tag{7.8}

with l exogenous variables, x_1, \ldots , x_l, and one endogenous, \tilde x_j.

In the introduction we argued that in the case of endogenous regressors we have to “replace” this variable with an exogenous and relevant instrument. But we were not specific about what “replace” really means

  • Here, replace is not meant literally in the sense that we actually replace \tilde x with z in Equation 7.8 and then apply OLS. This would be a proxy variable approach, which might sometimes be useful with an errors-in-variables or omitted variable problem 2

    • As an example, if we directly replaced educ in our wage equation Equation 7.5 by an exogenous proxy variable z (for instance fatheduc), the OLS coefficient of z would generally not estimate the effect of education on the earned wage, \beta_1, and would probably introduce an additional errors-in-variables problem
  • The IV approach is a different one: We do not replace \tilde x with z, but rather with the predicted values of a regression of \tilde x on all the exogenous variables of the model, including the external instrument z. We denote these predicted values with \hat x and call this the

    first step regression or reduced form regression

\tilde x_j \, = \, \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_{l} x_{l} \, + \, \gamma_{l+1}z_1 + \cdots + \gamma_{l+m}z_m + e \tag{7.9}

  • Here, x_1,\ldots,x_{l} are the l exogenous variables of the model (sometimes called internal instruments), for instance exper and exper^2 in our wage equation, and z_1,\ldots,z_m are the m external instruments

  • We estimate the coefficients \gamma of Equation 7.9 by OLS, leading to the predicted values \hat x_j

\tilde x_j \, = \, \underbrace{ \hat \gamma_0 + \hat \gamma_1 x_1 + \cdots + \hat \gamma_{l} x_{l} \, + \, \hat \gamma_{l+1}z_1 + \cdots + \hat \gamma_{l+m}z_m}_{\hat x_j} + \hat e \quad \Rightarrow \tag{7.10}

\tilde x_j \, = \, \hat x_j + \hat e \tag{7.11}


Remark: The reduced form Equation 7.9 is the functional relationship of an endogenous variable dependent only on exogenous variables (the exogenous variables and error terms drive the endogenous ones – data generating process) and is related to simultaneous equation models


  • In the second step regression we estimate the original model, but with \hat x_j in place of \tilde x_j, i.e., we insert \hat x_j + \hat e from Equation 7.11 for \tilde x_j in Equation 7.8

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{l} x_{l} + \beta_{j} \hat x_j + \underbrace { (u + \beta_j \hat e)}_v \tag{7.12}

  • The new error v is composed of two components:

    • The error from the structural model, u. This is uncorrelated with the exogenous variables x_1,\ldots,x_{l} and is now also uncorrelated with \hat x_j, because the latter is a linear combination of x_1,\ldots,x_{l} and the exogenous external instruments z_1,\ldots,z_{m} from the first stage regression, Equation 7.10

    • The residuals of the first stage regression, \hat e. But these pose no problems (besides the larger error variance), because the residuals are uncorrelated with all variables in Equation 7.12 by construction (orthogonality property) – hence, no errors in the variables problem and no violation of MLR.4’ 3

  • Hence, the parameters of Equation 7.12 can be consistently estimated by OLS

Remark: The 2SLS residuals, and subsequently the residual variance \hat \sigma^2, have to be computed by

\hat u \ = \ y - \underbrace {\hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_{l} x_{l} + \hat\beta_{j} \tilde x_j}_{\hat y} \tag{7.13}

  • Thereby, the 2SLS estimates of the \betas are used, but with the original variable \tilde x_j and not with \hat x_j. This procedure yields \hat u_i from Equation 7.8 and not \hat v_i from Equation 7.12

  • After estimating \sigma^2 by Equation 2.35 with residuals based on Equation 7.13, all tests can be carried out in the usual way


Example:

Suppose our wage equation is

wage_i \ = \ \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i \tag{7.14}

  • Because of the unobserved ability (which is therefore part of u_i) and the fact that ability is probably correlated with education the variable education is endogenous. Therefore, we need instruments for education, which are not part of the model, are exogenous and relevant

  • Suppose we have two external instruments for education: education of the mother and education of the father. We already discussed them before

  • Thus, in the first stage regression we regress educ on all exogenous variables of the model (internal instruments) and the two external instrumental variables

educ_i = \gamma_0 + \gamma_1 exper_i + \gamma_2 exper_i^2 + \gamma_3 fatheduc_i + \gamma_4 motheduc_i + e_i \tag{7.15}

  • This first stage regression yields the predicted values \widehat {educ}

  • In the second stage we estimate the original model, but with \widehat {educ} instead of educ

wage_i \ = \ \beta_0 + \beta_1 \widehat {educ_i} + \beta_2 exper_i + \beta_3 exper_i^2 + v_i \tag{7.16}

  • This two-step procedure yields consistent estimates for all \betas; hence the name two stage least squares

Remark: Usually, if we have exactly as many external instruments as right hand side (rhs) endogenous variables, we call the procedure (ordinary) IV, otherwise 2SLS. But this labeling doesn’t seem to be used unanimously

  • Why does this procedure work? (a hands-on code sketch of the two steps follows after this list)

    • All variables in the second stage regression are exogenous because educ was replaced by a prediction, only based on exogenous information

    • By using the prediction based on exogenous information, educ is purged of its endogenous part (the part that is related to the error term)

    • Thus, only that part of educ remains which is exogenous. And this is the reason why the coefficient of \widehat {educ} represents the causal effect of education on received wages (and not some mixtures of effects, see Section 1.3.2)
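A minimal hands-on sketch of the two steps with the card data from above (educ instrumented by fatheduc and motheduc, using lwage as in the code example above). Note that the standard errors of the manual second stage are not the correct 2SLS standard errors (see the remark around Equation 7.13); ivreg reports the correct ones.

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))

# First stage: regress the endogenous educ on all exogenous variables + external instruments
stage1 <- lm(educ ~ exper + I(exper^2) + fatheduc + motheduc, data=d)
d$educ_hat <- fitted(stage1)

# Second stage: original model with educ replaced by its first stage prediction
stage2 <- lm(lwage ~ educ_hat + exper + I(exper^2), data=d)

# Same coefficients in one step with ivreg (which also computes the correct standard errors)
iv <- ivreg(lwage ~ educ + exper + I(exper^2) |
              fatheduc + motheduc + exper + I(exper^2), data=d)
cbind(manual=coef(stage2), ivreg=coef(iv))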


7.5.1 2SLS – variance of estimates

  • The most important downside of IV/2SLS estimation is that the variance of the IV/2SLS estimates is generally considerably larger than that of the OLS estimates, i.e., they are less precise (look at Table 7.1)

  • Therefore, IV/2SLS need large samples to be useful

  • Below, the formula for the variance of OLS estimates and the formula for IV/2SLS estimates are shown

\operatorname {Var}(\hat \beta_{j,OLS}) \ = \ \dfrac{\sigma^2}{ \underbrace{SST_j}_{\sum_{i=1}^n (x_{ij} - \bar x_j)^2} (1-R_j^2) }

\operatorname {Var}(\hat \beta_{j,2SLS}) \ = \ \dfrac{\sigma^2}{ \underbrace{SST_j}_{\sum_{i=1}^n (\hat x_{ij} - \bar x_j)^2} (1 - R_{j}^2) } \tag{7.17}

  • These formulas only differ in that for calculation of the latter one, the explanatory variable x_j is replaced with its prediction from the first step regression, \hat x_j

The variance of the IV/2SLS estimate \beta_j
  • increases with the error variance \sigma^2 and decreases with sample size n

  • decreases with the total variation of the predicted values \hat x_j

  • increases with R_{j}^2, which is the R^2 of a regression of \hat x_j on all the other explanatory x-es

The last two points are always (considerably) worse for IV/2SLS than for OLS and worsen further with poor or weak instruments

  • The error variance \sigma_v^2 of the second stage regression is larger, because the error term additionally contains the first stage residuals. However, the residuals are purged from this effect if they are correctly computed by Equation 7.13

  • The variation of a predicted variable, SSE, is always less than the variation of the original variable, SST. The definition of the R^2 is based on this ratio, SSE/SST \leq 1; \rightarrow less variation of the corresponding explanatory variable \hat x_j

  • The R^2, i.e. the fit of a regression of the predicted variable \hat x_j on all the other x-es is always higher than the R^2 of a regression of the original variable x_j on all the other x-es

    • The reason is that in the first stage regression, x_j is regressed on all exogenous variables of the model (the other x-s) plus the external instruments. Hence, the predicted values of this regression, \hat x_j, are, besides the effects of the external instruments, a linear function of these other x-es. This implies: The correlation between \hat x_j and the other x-es is typically much higher than the correlation between the original x_j and the other x-es

    • In other words, IV/2SLS exhibits an inherent multicollinearity problem (see the small illustration below)
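A small illustration of the last two points with the card data (a sketch, not part of the original example): the R^2 of a regression of the first stage prediction \hat x_j on the other regressors is much higher than that of the original educ, which drives up the variance of the 2SLS estimate in Equation 7.17.

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))
d$educ_hat <- fitted(lm(educ ~ exper + I(exper^2) + black + smsa + south +
                          fatheduc + motheduc, data=d))

# R_j^2 of educ_hat versus the original educ, each regressed on the other explanatory variables
summary(lm(educ_hat ~ exper + I(exper^2) + black + smsa + south, data=d))$r.squared
summary(lm(educ     ~ exper + I(exper^2) + black + smsa + south, data=d))$r.squared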


7.5.2 Matrix notation for IV/2SLS

We have the structural model:
(Let’s assume that all variables are demeaned, so we can forget the intercept – otherwise, the intercept would be part of the exogenous model variables in \mathbf X and hence an internal instrument as well. The derived formulas are unaffected by this, and instead of k we would have k+1 respectively, instead of l we would have l+1.)

\mathbf y \, = \, \mathbf X \boldsymbol \beta + \mathbf u \tag{7.18}

Some of the variables in the n \times k matrix \mathbf X are endogenous, i.e., are correlated with \mathbf u. This leads to inconsistent OLS estimators \hat {\boldsymbol \beta}.

In the first step of 2SLS, we regress all k variables in \mathbf X on all l exogenous variables of the model from Equation 7.18 (internal instruments) plus the m external instruments. The data matrix (the n \times (l+m) instrument matrix) of regressors \mathbf Z contains both groups of variables. For identification we necessarily must have (l+m) \geq k.

  • For every ith row of \mathbf Z we assume E(u_i \mid \mathbf z_i)=0; hence, every column (variable) of \mathbf Z is (weakly) exogenous with regard to the error term \mathbf u

Thus, we have for the k first step regressions:

\mathbf X \, = \, \mathbf Z \, \boldsymbol\Gamma + \mathbf E \tag{7.19}

  • with the (l+m) \times k coefficient matrix \mathbf \Gamma (the columns of \mathbf \Gamma containing the coefficients of the k first step regressions) and

  • the first step n \times k error term matrix \mathbf E

We need the predicted values – the linear projections – of the first stage regression Equation 7.19:

\hat {\mathbf X} \, = \, \mathbf Z \, \hat {\boldsymbol \Gamma} \, = \, \mathbf Z \, \underbrace {(\mathbf {Z'Z})^{-1} \mathbf Z'\mathbf X}_{\hat{\boldsymbol \Gamma}} \, = \, \mathbf P_{\mathbf Z} \mathbf X \tag{7.20}

Thereby, we have used the projection matrix (hat matrix):

\mathbf P_{\mathbf Z}=\mathbf Z (\mathbf {Z'Z})^{-1} \mathbf Z'

This matrix was introduced in Section 2.5.1, see also Equation C.11.

  • With regard to our example from Equation 7.14, the matrix \hat {\mathbf X} of the linear projections contains:

    1. The exogenous variables of the model including the intercept and

    2. The predicted values of the endogenous variables, \widehat {educ} in our example;

    Hence: \hat {\mathbf X} = [1, exper, exper^2, \widehat {educ}]

  • Regarding 1., the exogenous variables of the model: this follows directly from the fact that we regress the exogenous model variables on themselves (they are part of the matrix \mathbf Z). Hence, we get a perfect fit for them and their predicted values are identical to the variables themselves. Thus, the exogenous variables of the model act as their own (internal) instruments

  • Applying OLS to the first step regressions Equation 7.19 implies:

\mathbf X \ = \underbrace {\hat {\mathbf X}}_ {\mathbf Z \hat {\mathbf \Gamma} } + \ \hat {\mathbf E} \tag{7.21}


In the second step of 2SLS we replace \mathbf X in Equation 7.18 with \hat {\mathbf X} + \hat{\mathbf E} from Equation 7.21 and arrive at

\mathbf y \, = \, \hat{\mathbf X} \boldsymbol \beta + \underbrace {(\mathbf u + \hat{\mathbf E} \boldsymbol \beta)}_{\mathbf v} \tag{7.22}

with some errors \mathbf v, which additionally to \mathbf u, contains the residuals of the first stage regressions \hat {\mathbf E}.

  • Note that \hat {\mathbf E} is uncorrelated with \hat{\mathbf X} as predicted values of OLS are always uncorrelated with the OLS residuals (orthogonality of the projection matrices \mathbf M and \mathbf P):

    \hat {\mathbf X'} \, \hat{\mathbf E} \ = \ \underbrace {\mathbf {X'P_{\mathbf Z}}}_{\hat {\mathbf X'}} \, \underbrace {\mathbf {M_{\mathbf Z}X}}_{\hat{\mathbf E}} = \mathbf 0

    • Leaving out one exogenous variable from Equation 7.18 in the first stage regression Equation 7.19 would not lead to a breakdown of the orthogonality of \hat {\mathbf E} and \hat{\mathbf X} and 2SLS would still be consistent but less efficient because the left out exogenous variable would be replaced by its linear projection on \mathbf Z 4

    • Also note that \hat {\mathbf X} is a linear combination of variables {\mathbf Z} which all are weak exogenous with respect to \mathbf u by assumption

      • Hence, the regressors of the second stage regression are weakly exogenous with respect to the error term \mathbf v = \mathbf u + \hat{\mathbf E} \boldsymbol \beta of the second stage regression Equation 7.22. Assumption MLR.4’ of Section 2.7 is therefore fulfilled
  • Applying OLS to Equation 7.22 yields the IV estimates, respectively the 2SLS estimates, which are consistent based on the arguments above:

\hat {\boldsymbol \beta}_{IV} = (\hat{\mathbf X}'\hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf y \tag{7.23}

\hat {\boldsymbol \beta}_{IV} = \boldsymbol \beta + (\hat{\mathbf X}'\hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf u \tag{7.24}

  • Therefore, assuming iid errors \mathbf u with variance \sigma^2 and applying formula Equation C.7 leads to the covariance matrix of \hat {\boldsymbol \beta}_{IV}

\operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) = \sigma^2 (\hat{\mathbf X}'\hat{\mathbf X})^{-1} \tag{7.25}

Finally we have to estimate \sigma^2, the variance of \mathbf u.

\hat {\mathbf u} \ = \ \mathbf y - \hat {\mathbf y} \ = \ \mathbf y - \mathbf X \hat {\boldsymbol \beta}_{IV} \tag{7.26}

  • And furthermore, the estimated variance of \mathbf u is

\hat \sigma^2 \ = \ \dfrac {1}{n-k} \hat {\mathbf u}' \hat {\mathbf u} \tag{7.27}


The two step procedure described above can be accomplished in only one step, which is much more convenient for actual computation.

  • For that, we plug in \mathbf P_{\mathbf Z} \mathbf X for \hat{\mathbf X} in Equation 7.23 and use the fact that \mathbf P_{\mathbf Z} is symmetric and idempotent,
    i.e., \mathbf P'_{\mathbf Z}=\mathbf P_{\mathbf Z} and \mathbf P_{\mathbf Z}\mathbf P_{\mathbf Z}=\mathbf P_{\mathbf Z}

\hat {\boldsymbol \beta}_{IV} \, = (\hat{\mathbf X}' \hat{\mathbf X})^{-1}\hat{\mathbf X}'\mathbf y \, = \, (\mathbf X' \mathbf P'_{\mathbf Z} \mathbf P_{\mathbf Z} \mathbf X)^{-1}\mathbf X' \mathbf P'_{\mathbf Z} \mathbf y \, = \, (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1}\mathbf X' \mathbf P_{\mathbf Z} \mathbf y \tag{7.28}

and for the covariance matrix

\operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) \, = \, \sigma^2 (\hat{\mathbf X}'\hat{\mathbf X})^{-1} \, = \, \sigma^2(\mathbf X' \mathbf P'_{\mathbf Z} \mathbf P_{\mathbf Z} \mathbf X)^{-1} \, = \, \sigma^2 (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1} \tag{7.29}

  • Using the definition of \mathbf P_{\mathbf Z} and writing out, the resulting terms could obviously be calculated in one step

\hat {\boldsymbol \beta}_{IV} \, = \, \left(\mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \mathbf Z' \mathbf X \right)^{-1} \mathbf X'\mathbf Z \, (\mathbf {Z'Z})^{-1} \, \mathbf Z' \mathbf y \tag{7.30}

\operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) \, = \, \sigma^2 (\mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \mathbf Z' \mathbf X)^{-1} \tag{7.31}

An interesting special case arises if \mathbf X and \mathbf Z have the same number of columns, i.e., k = l+m. In this case the number of rhs endogenous variables equals the number of external instruments and the matrix \mathbf X'\mathbf Z is square, k \times k, and invertible. Then, using the rule (ABC)^{-1}=C^{-1}B^{-1}A^{-1}, \hat {\boldsymbol \beta}_{IV} simplifies to

\hat {\boldsymbol \beta}_{IV} \, = \, (\mathbf Z' \mathbf X)^{-1} \, (\mathbf {Z'Z}) (\mathbf X' \mathbf Z) ^{-1} \mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \, \mathbf Z' \mathbf y \, = \, (\mathbf Z' \mathbf X)^{-1} \, \mathbf Z' \mathbf y \tag{7.32}

  • This resembles Equation 7.3 and is sometimes called ordinary IV estimator. Note, in this case the model is just or exactly identified

To prove the consistency of \hat {\boldsymbol \beta}_{IV}, we take the solution Equation 7.30 and plug in the model Equation 7.18 for \mathbf y:

\hat {\boldsymbol \beta}_{IV} \, = \, {\boldsymbol \beta} + \left(\mathbf X' \mathbf Z \, (\mathbf {Z'Z})^{-1} \mathbf Z' \mathbf X \right)^{-1} \mathbf X'\mathbf Z \, (\mathbf {Z'Z})^{-1} \, \mathbf Z' \mathbf u \tag{7.33}

Afterwards, we divide and multiply the cross product terms by n accordingly and take the probability limit.
Note, according to Slutsky’s Theorem (Theorem A.4), \operatorname {plim} g(x) = g(\operatorname {plim}x) for a continuous function g(.)

\operatorname{plim}\left[\hat{\boldsymbol{\beta}}_{IV}\right] = \boldsymbol{\beta} \, + \, \left[\operatorname{plim}\left[\frac{\mathbf{X}' \mathbf{Z}}{n}\right] \ \operatorname{plim}\left[\frac{\mathbf{Z}' \mathbf{Z}}{n}\right]^{-1} \operatorname{plim}\left[\frac{\mathbf Z' \mathbf {X}}{n}\right]\right]^{-1} \times \\ \operatorname{plim}\left[\frac{\mathbf X' \mathbf Z}{n}\right] \ \operatorname{plim}\left[\frac{\mathbf Z' \mathbf Z}{n}\right]^{-1} \operatorname{plim}\left[\frac{\mathbf Z' \mathbf u}{n}\right] \tag{7.34}

  • According to LLN, if \mathbf X and \mathbf Z are “well behaved”, the empirical moment matrices converge to their population (theoretical) moments matrices \mathbf m. Thus we have

\operatorname{plim}\left[\hat{\boldsymbol{\beta}}_{IV}\right] = \boldsymbol{\beta} \, + \, \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \cdot \operatorname{plim}\left[\frac{\mathbf Z' \mathbf u}{n}\right]

  • The last plim is

\operatorname{plim}\left[\frac{\mathbf Z' \mathbf u}{n}\right] \, = \, \operatorname{plim} \left[\frac{1}{n} \sum_{i=1}^n \mathbf z'_i u_i \right ] \tag{7.35}

  • According to the LLN (Theorem A.2), this average term converges in probability to the expectation of its summands. As we presuppose that \mathbf z_i is (weak) exogenous (MLR.4’) we have

E(\mathbf z'_i u_i) := \mathbf m_{\mathbf {Zu}} = E_z \left(E(\mathbf z'_i u_i \mid \mathbf z_i)\right) = E_z(\mathbf 0) = \mathbf 0 \tag{7.36}

  • Thus, we finally have

\operatorname{plim}\left[\hat{\boldsymbol{\beta}}_{IV}\right] = \boldsymbol{\beta} \, + \, \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \cdot \underbrace {\mathbf m_{\mathbf {Zu}}}_{\mathbf 0} \, = \, \boldsymbol{\beta} \tag{7.37}

  • This proves the consistency of the IV/2SLS estimator. However, note that under our assumptions, IV/2SLS estimators are generally biased in finite samples, because some elements of \mathbf x_i are not exogenous (for the expectation operator, there is nothing comparable to Slutsky’s theorem)

We furthermore state that \hat{\boldsymbol{\beta}}_{IV} is asymptotically normally distributed with asymptotic expectation \boldsymbol \beta and asymptotic covariance matrix

\hat{\boldsymbol{\beta}}_{IV} \ \, \stackrel{a}{\sim} \, \, N \left(\boldsymbol{\beta}, \ \sigma^2 \frac{1}{n}\left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \, \mathbf m_{\mathbf {ZX}} \right]^{-1}\right) \tag{7.38}

\sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta}) \ = \ \left[\frac{\mathbf{X}' \mathbf{Z}}{n} \left(\frac{\mathbf{Z}' \mathbf{Z}}{n}\right)^{-1} \frac{\mathbf Z' \mathbf {X}}{n}\right]^{-1} \frac{\mathbf X' \mathbf Z}{n} \left(\frac{\mathbf Z' \mathbf Z}{n}\right)^{-1} \left[\frac{\mathbf Z' \mathbf u}{ \textcolor {red}{\sqrt n} }\right] \ \ \ \ \Rightarrow

\sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta}) \ \stackrel{d} \longrightarrow \ \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \ \left[\frac{\mathbf Z' \mathbf u}{\sqrt n}\right] \tag{7.39}

Thus, once again, the last term is the important one. In Section 4.3.1, we discussed that such a term converges in distribution under quite similar conditions as we had with the LLN to a normally distributed random vector with expected value \mathbf 0 and covariance matrix \sigma^2 \mathbf m_{\mathbf {ZZ}}, provided that E(u_i \mid \mathbf z_i)=0 and \operatorname {Var} (\mathbf u) = \sigma^2 \mathbf I – CLT, see Theorem A.3.

  • So, after applying the covariance matrix formula from Equation C.7, we get the covariance matrix of \sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta})

\operatorname{Var}(\sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta})) \ =

\left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \, \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf {ZZ}}^{-1} \left(\sigma^2 \mathbf m_{\mathbf {ZZ}}\right) \mathbf m_{\mathbf {ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \, \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \ = \\

\sigma^2 \left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \ \mathbf m_{\mathbf {ZX}} \right]^{-1} \tag{7.40}

  • From the limiting distribution of \sqrt n \, (\hat{\boldsymbol{\beta}}_{IV} - \boldsymbol{\beta}) we immediately get the asymptotic distribution of \hat{\boldsymbol{\beta}}_{IV}

\hat{\boldsymbol{\beta}}_{IV} \ \, \stackrel{a}{\sim} \, \, N \left(\boldsymbol{\beta}, \ \sigma^2 \frac{1}{n}\left[ \mathbf m_{\mathbf {XZ}} \ \mathbf m_{\mathbf{ZZ}}^{-1} \, \mathbf m_{\mathbf {ZX}} \right]^{-1}\right) \tag{7.41}

  • And a consistent estimator of the asymptotic covariance matrix is

\widehat {\operatorname{Asy.Var}}(\hat{\boldsymbol{\beta}}_{IV}) \ = \ \hat\sigma^2 \left [ \mathbf {X'Z} \ (\mathbf {Z'Z})^{-1} \mathbf {Z'X} \right]^{-1} \ = \ \hat\sigma^2 (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1} \tag{7.42}

  • Note, \hat \sigma^2 has to be calculated with residuals according to Equation 7.13

  • If \mathbf {X'Z} is “small”, i.e., the correlation between \mathbf {X} and \mathbf {Z} is small, then the inverse in the covariance matrix formula becomes “large” and we get large standard errors for the estimated coefficients – weak instrument problem
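The matrix formulas of this subsection can be checked numerically. The following sketch computes Equation 7.30, Equation 7.26, Equation 7.27 and Equation 7.42 by hand for the wage example (educ instrumented by fatheduc and motheduc) and should reproduce the corresponding ivreg results, up to possible degrees-of-freedom conventions.

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))
X <- cbind(1, d$educ, d$exper, d$exper^2)                  # model regressors (incl. endogenous educ)
Z <- cbind(1, d$fatheduc, d$motheduc, d$exper, d$exper^2)  # internal + external instruments
y <- d$lwage

ZtZi <- solve(crossprod(Z))
A    <- crossprod(X, Z) %*% ZtZi %*% crossprod(Z, X)              # X'Z (Z'Z)^-1 Z'X
b_iv <- solve(A, crossprod(X, Z) %*% ZtZi %*% crossprod(Z, y))    # Equation 7.30

uhat <- y - X %*% b_iv                                            # residuals with the original X, Equation 7.26
s2   <- sum(uhat^2) / (nrow(X) - ncol(X))                         # Equation 7.27
V    <- s2 * solve(A)                                             # Equation 7.42
cbind(coef = drop(b_iv), se = sqrt(diag(V)))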


7.6 Identification

For a basic introduction to identification problems, see Section 2.3.1.1.

But let us now discuss this issue by means of the first stage Equation 7.15, in particular, why we need external instruments to estimate \beta_1 in Equation 7.14

  • Suppose we have no external instrument z_i (the corresponding \gamma_j in the first stage regression Equation 7.15 are 0) and regress educ only on the internal exogenous variables exper and exper^2 in the first step. Then, \widehat {educ} would be a perfect linear combination of exper and exper^2

    • However, these two variables are already present in the second stage Equation 7.16; this would generate perfect collinearity between exper, exper^2 and \widehat {educ} in Equation 7.16, making it impossible to disentangle the effects of exper and exper^2 on the one hand and \widehat {educ} on the other. The coefficients \beta_1, \beta_2 and \beta_3 are thus not identified (not estimable) in this case (see the illustration below, after this list)
  • As a general rule, identification of the model equation requires that for every rhs endogenous variable we must have at least one distinct external instrument which is also relevant, i.e., the corresponding \gamma_j in first stage equation \ne 0 (rank condition)

  • The variables, which act as external instruments, must not be part of the model, Equation 7.14. This is called exclusion restrictions

  • If we have more than one external instrument per endogenous variable, the model is overidentified – which is often a good thing as we will see later. Otherwise, the model is just or exactly identified
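A quick illustration of the identification problem (a sketch with the card data): if \widehat {educ} is predicted from the internal instruments only, it is perfectly collinear with exper and exper^2 in the second stage, and lm has to drop one coefficient (reported as NA).

card$educ_hat0 <- fitted(lm(educ ~ exper + I(exper^2), data=card))

# Second stage without any external instrument: perfect collinearity, coefficients not identified
coef(lm(lwage ~ educ_hat0 + exper + I(exper^2), data=card))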


7.6.1 Tests for weak instruments

Above, we explained why we need at least one external instrument per endogenous rhs variable

  • However, even if this condition is met (the corresponding \gamma_j in the first stage regression is \ne 0), it could be that the model is only barely identified. This is the case if the conditional correlation between the external instruments and the endogenous rhs variable is low. In this instance, we have a weak instrument problem; fortunately, we can test for this circumstance

Weak instrument test: The first stage regression could be used to test for the relevance of the instruments

  • In our example, with Equation 7.15, we can carry out an F-test to examine whether \gamma_3 and \gamma_4, the coefficients of fatheduc and motheduc, are jointly zero

    • Monte Carlo simulations by Staiger and Stock (1997) show that an F-statistic less than 10 indicates a weak instrument problem. With heteroskedasticity in either the first or second stage equation, the F-statistic should rather be in the range of 20 or above
  • If we have more than one rhs endogenous variable, things are more complicated; even if we have good F-statistics in every first stage regression, it is not guaranteed that we have at least one distinct and relevant external instrument for each endogenous variable. We have to use specialized tests like the Cragg and Donald (1993) or Anderson (1984) tests, which are based on the smallest canonical correlation of the rhs endogenous variables and the external instruments (conditional on the other \mathbf x)


Example – Testing for relevance of instruments

We are testing the relevance of the instruments using the first step regression, Equation 7.15, i.e., regressing educ on the corresponding external instrument(s) and all other exogenous variables

  • It is the partial effect of the external instrument(s) that matters!
rel1 <-   lm(educ ~ fatheduc  +  exper + I(exper^2) + black + smsa + south, data=card)
rel2 <-   lm(educ ~ motheduc  +  exper + I(exper^2) + black + smsa + south, data=card)
rel12 <-  lm(educ ~ fatheduc  + motheduc  +  exper + I(exper^2) + black + smsa + south, data=card)
rel3 <-   lm(educ ~ nearc4    +  exper + I(exper^2) + black + smsa + south, data=card)

modelsummary( list("fatheduc"=rel1, "motheduc"=rel2, 
                   "fatheduc+motheduc"=rel12, "nearc4"=rel3),
              output="gt", 
              statistic = "statistic",
              gof_omit = "A|B|L|F", 
              align = "ldddd", 
              stars = TRUE, 
              fmt = 4,
              coef_map = c("fatheduc", "motheduc", "nearc4",
                           "exper", "I(exper^2)", "black", "smsa", "south")
              )
Table 7.2: Tests for weak instruments, several external instruments (t-statistics in parentheses; note, the F-statistic for a single variable is the square of the t-statistic)

fatheduc motheduc fatheduc+motheduc nearc4
fatheduc     0.1728***     0.1128***
  (14.3035)      (7.7539)  
motheduc     0.1879***     0.1297***
  (14.5996)      (7.6118)  
nearc4     0.3373***
   (4.0887)  
exper    -0.3808***    -0.3754***    -0.3780***    -0.4100***
 (-10.0625)    (-10.7007)     (-9.8671)    (-12.1686)  
I(exper^2)     0.0019        0.0009        0.0024        0.0007   
   (1.0098)      (0.5376)      (1.2250)      (0.4438)  
black    -0.4789***    -0.6015***    -0.3543**     -1.0061***
  (-4.1516)     (-5.9888)     (-2.9719)    (-11.2235)  
smsa     0.3882***     0.3818***     0.3509***     0.4039***
   (4.3106)      (4.5528)      (3.8408)      (4.7578)  
south    -0.1426+      -0.2211**     -0.1211       -0.2915***
  (-1.6511)     (-2.7157)     (-1.3852)     (-3.6790)  
Num.Obs.  2320         2657         2220         3010       
R2     0.477         0.492         0.482         0.474    
RMSE     1.89          1.89          1.86          1.94     
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
  • The t-statistics of fatheduc and motheduc are > 14; hence, a weak instrument problem can be ruled out for these instruments

F-test, whether both fatheduc and motheduc together are 0
  • F-statistic should be at least 10 to rule out a weak instrument problem

  • In the case of heteroscedasticity, the F-statistic should be at least 20 to rule out a weak instrument problem

lht(rel12, c("fatheduc=0", "motheduc=0"))
      Linear hypothesis test
      
      Hypothesis:
      fatheduc = 0
      motheduc = 0
      
      Model 1: restricted model
      Model 2: educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + 
          south
      
        Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
      1   2214 8594.3                                  
      2   2212 7704.2  2    890.12 127.78 < 2.2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The F-test (F > 100) clearly rules out that fatheduc and motheduc together are weak instruments

The t-statistic for nearc4 is about 4, which is suspicious in the case of heteroscedasticity. We therefore additionally test for heteroscedasticity – applying the Breusch-Pagan test, see Section 6.3.

bptest(rel3)
      
        studentized Breusch-Pagan test
      
      data:  rel3
      BP = 92.185, df = 6, p-value < 2.2e-16
  • The Breusch-Pagan test overwhelmingly rejects homoscedasticity; hence nearc4 might be only a weak instrument

Another example

In this example we show what devastating effects a weak instrument can have.

  • We want to investigate the relationship between the weight of newborns (bwght) and smoking (packs)

  • However, we suppose some common relationships between unobserved genetic factors (in u) for bwght and smoking

  • Thus, besides OLS, we try an IV estimator

library(wooldridge); data("bwght")

# OLS estimation 
bwols <- lm(bwght ~ packs, data=bwght)
summary(bwols)
      
      Call:
      lm(formula = bwght ~ packs, data = bwght)
      
      Residuals:
          Min      1Q  Median      3Q     Max 
      -96.772 -11.772   0.297  13.228 151.228 
      
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
      (Intercept) 119.7719     0.5723 209.267  < 2e-16 ***
      packs       -10.2754     1.8098  -5.678 1.66e-08 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      
      Residual standard error: 20.13 on 1386 degrees of freedom
      Multiple R-squared:  0.02273, Adjusted R-squared:  0.02202 
      F-statistic: 32.24 on 1 and 1386 DF,  p-value: 1.662e-08
  • The variable packs has the expected negative sign

  • However, we conjecture that packs might be endogenous (it is a choice variable, and choice variables are always suspect of endogeneity)

  • So, we need an instrument for smoking (packs)

    • We take cigprice (for prices of cigarettes) as instrument

      1. cigprice should have no direct effect on bwght (conditional on packs) - exclusion restriction

      2. cigprice should be correlated with packs - some relevance

      3. Furthermore, cigprice is for sure unrelated to unobserved individual genetic factors contained in u - exogenous


  • The corresponding first stage regression is
# First stage regression
first <-  lm(packs ~ cigprice, data=bwght)
coeftest(first)
      
      t test of coefficients:
      
                    Estimate Std. Error t value Pr(>|t|)
      (Intercept) 0.06742568 0.10253837  0.6576   0.5109
      cigprice    0.00028288 0.00078297  0.3613   0.7179
  • As we see, the t-statistic is very low, indicating that we should not use cigprice as an instrument for packs ==> no relevance

  • But what happens if we do it nonetheless?
# IV estimation 
second <- ivreg(bwght ~ packs | cigprice, data=bwght)
summary(second)
      
      Call:
      ivreg(formula = bwght ~ packs | cigprice, data = bwght)
      
      Residuals:
          Min      1Q  Median      3Q     Max 
      -856.32   15.35   33.35   47.35  188.35 
      
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)    82.65     104.63   0.790     0.43
      packs         345.47    1002.19   0.345     0.73
      
      Residual standard error: 108.2 on 1386 degrees of freedom
      Multiple R-Squared: -27.22,   Adjusted R-squared: -27.24 
      Wald test: 0.1188 on 1 and 1386 DF,  p-value: 0.7304
  • As we see, we have an unexpected sign with an absurdly high estimate for packs and an extremely high standard error (very low t-value). (Note, the R2 has no natural interpretation for IV/2SLS estimations as we have no orthogonality property with these estimators)

  • Apparently, IV estimates with very weak instruments can yield much more unreliable results than OLS


7.7 Testing for endogeneity - Hausman-Wu test

  • Sometimes, it is not clear whether some regressors are correlated with the error term u_i, i.e., if they are actually endogenous, and whether we need an IV estimator. Thus, a test for the appropriateness of OLS is desirable.

  • A practical problem is that u_i is not observable and that the observed OLS residuals \hat u_i are always uncorrelated with all regressors; orthogonality property of OLS: \mathbf X' \hat {\mathbf u} = \mathbf 0

  • The Hausman test provides the following test idea for the problem at hand:

    • If there is no endogeneity problem in the model, OLS is consistent (and efficient), but the same is also true for IV/2SLS estimators with regard to consistency (but obviously not for efficiency). Therefore, in this case, the parameter estimates of both procedures should converge to the same true parameter values

    • If, on the contrary, there is an endogeneity problem, only IV/2SLS would be consistent

    • Thus, under the null hypothesis of no endogeneity problem, the OLS estimator \hat {\boldsymbol \beta}_{OLS} and the IV/2SLS estimator \hat {\boldsymbol \beta}_{IV} should not differ too much (only because of sampling errors)

    • If, on the other hand, the H_0 is false, we would expect that the two estimators differ more than sampling errors would suggest


  • A natural test, the Hausman test, would therefore check whether the difference between the two estimators is too large. This test can be cast in terms of a usual Wald statistic (compare Equation 3.13)

W = \mathbf {d}' \{ { \operatorname{Var}}(\mathbf{d}) \}^{-1} \mathbf{d} , \ \ \ \text{ with } \ \mathbf d = (\hat {\boldsymbol \beta}_{OLS} - \hat {\boldsymbol \beta}_{IV})

  • However, the difficulty with this test statistic is that the covariance matrix of \mathbf d, { \operatorname{Var}}(\mathbf{d}), which, asymptotically, can be shown to be \left[ \operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) - { \operatorname{Var}}(\hat {\boldsymbol \beta}_{OLS}) \right], is not of full rank and thus has no inverse (you would have to rely on a generalized inverse)

  • Fortunately, there is a much simpler and equivalent variant of this test, the Hausman-Wu test. This test is a two step procedure (but not 2SLS, of course):

    1. We estimate the usual first stage (reduced form) regression. The endogenous rhs variable is \tilde x_j

    \tilde {x}_j = \underbrace {\hat \gamma_0 + \hat \gamma_1 x_1 + \cdots + \hat \gamma_{l} x_{l} \, + \, \hat \gamma_{l+1}z_1 + \cdots + \hat \gamma_{l+m}z_m}_{\hat {x}_j} + \hat e \tag{7.43}

    2. In the second stage we estimate the original model, Equation 7.8, by OLS, but with the residuals \hat e of the first equation as an additional variable, and test whether the coefficient of \hat e, \, \hat \delta, is zero

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_j \tilde x_j + \textcolor {red} {\delta \hat e} + u \tag{7.44}


  • Why does the Hausman-Wu procedure work?

    • In the first stage regression, Equation 7.43, the endogenous variable \tilde x_j is regressed only on exogenous variables, so \hat x_j is uncorrelated with u by assumption. Thereby, \tilde x_j is decomposed in an exogenous part, \hat x_j, and in a possibly endogenous part, \hat e

      • Thus, if \tilde x_j is correlated with the error u of the original equation, this correlation must be due to the residuals of Equation 7.43, \hat e

      • In other words, \tilde x_j is endogenous (correlated with u of the original model) if and only if the residuals \hat e of the first stage regression are correlated with u

    • Therefore, we can test for endogeneity of \tilde x by adding \hat e to the original model. If we cannot reject the H_0: \delta=0, then there is no convincing evidence for an endogeneity problem and we should use OLS – as OLS is much more efficient. Otherwise we need IV/2SLS

  • This test also works if we have more than one rhs endogenous variable. In this case, we simply estimate a reduced form equation like Equation 7.43 for every endogenous variable and plug the residuals of all these equations into Equation 7.44 as additional variables. Then, we use an F-test to test whether all these added residuals are jointly insignificant

  • Note, if \delta = 0, the estimated coefficients of Equation 7.44 are exactly the same as the OLS estimates of the original model

  • If \delta \neq 0, the estimated coefficients of Equation 7.44 are exactly the same as the 2SLS estimates of the original model, which is not so obvious (a code sketch of the test follows below)
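A minimal sketch of the Hausman-Wu test for our wage example (card data, fatheduc and motheduc as external instruments, coeftest from the packages loaded above); the coefficient of the first stage residuals is the quantity of interest:

d <- subset(card, !is.na(fatheduc) & !is.na(motheduc))

# Step 1: first stage (reduced form) regression, keep the residuals
first <- lm(educ ~ exper + I(exper^2) + black + smsa + south + fatheduc + motheduc, data=d)
d$ehat <- resid(first)

# Step 2: original model plus the first stage residuals; test H0: coefficient of ehat = 0
hw <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south + ehat, data=d)
coeftest(hw)["ehat", ]

# The same test is reported as "Wu-Hausman" by summary(ivreg(...), diagnostics=TRUE)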


To show the logic of this test more formally, we state the original model, with \tilde {x} denoting the possibly endogenous rhs variable and \mathbf x the vector of exogenous explanatory variables

y = \tilde {x}\beta_1 + \mathbf x \boldsymbol \beta_2 + u \tag{7.45}

  • As \tilde {x} is possibly endogenous, we apply 2SLS

    • The first step regression with the vector of exogenous external instruments \mathbf z is

    \tilde {x} = \underbrace {\mathbf x \hat {\boldsymbol \gamma}_1 + \mathbf z \hat {\boldsymbol \gamma}_2}_{\hat x} + \hat e \tag{7.46}

    • Substituting Equation 7.46 into Equation 7.45, we arrive at the second step regression. Applying OLS to Equation 7.47 estimates \beta_1 and \boldsymbol \beta_2 consistently (2SLS)

    y = \hat { x} \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \hat v \tag{7.47}

    • Note that v = u + \beta_1 \hat e

  • Now the trick: we add the residuals of the first stage regression Equation 7.46, \hat e, to the second stage regression Equation 7.47 as an additional variable. But by construction (orthogonality property of OLS), \hat e is orthogonal (uncorrelated) to every explanatory variable in Equation 7.47. Thus, adding \hat e as an additional variable in Equation 7.47 does not alter the 2SLS estimates of \beta_1 and \boldsymbol \beta_2 in Equation 7.47 (adding orthogonal regressors doesn’t change the estimated parameters, compare Section 2.9.2)

    y = \hat {x} \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \hat \alpha \hat e + \hat v \tag{7.48}

    • Side note: As v = u + \beta_1 \hat e, \hat \alpha should converge to \beta_1, if \hat e and u are uncorrelated. Therefore, another variant of the Hausman-Wu test would be to test the equality of \hat \alpha and \hat \beta_1 in Equation 7.48

  • Finally, we exploit the identity \tilde x = \hat x + \hat e from the first stage regression, Equation 7.46, to reparameterize Equation 7.48: We replace \hat x by (\tilde x - \hat e) in Equation 7.48 and arrive at

y = (\tilde x - \hat e) \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \hat \alpha \hat e + \hat v \ \ \ \Rightarrow

y = \tilde x \hat {\beta}_{1_{2SLS}} + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} + \underbrace {(\hat \alpha - \hat \beta_{1_{2SLS}})}_{\hat \delta} \hat e + \hat v \tag{7.49}

  • Note, as this is only a reparameterization (no additional or lost information), the OLS estimates of Equation 7.49 remain unaffected

Equation 7.49 is the same as Equation 7.44, the equation for the Hausman-Wu test.

  • Pay attention that the OLS estimates of this equation actually deliver the 2SLS estimates of \beta_1 and \boldsymbol \beta_2

  • Furthermore, Equation 7.49 is basically an OLS model, as the variable in question, \tilde x, is included in its (questionable) original form

    • If the estimated coefficient \hat \delta of \hat e is zero, i.e., \hat \alpha = \hat \beta_{1_{2SLS}}, we actually get the OLS estimates of \beta_1 and \boldsymbol \beta_2, i.e., 2SLS is not necessary. And this is the substance of the Hausman-Wu test!
  • The formulation in Equation 7.49 therefore shows the equivalence of the Hausman-Wu test with the original Hausman test:

    • The more important the term \hat e in Equation 7.49, the more the 2SLS estimates will differ from the OLS estimates

    • And this was the original test idea of the Hausman test

Final Remark: If we use \hat x instead of \hat e in Equation 7.44 and Equation 7.49, we will end up with the very same test results and estimates

y = \tilde x (\hat \beta_{1_{2SLS}} + \hat \delta) + \mathbf x \hat {\boldsymbol \beta}_{2_{2SLS}} - \underbrace {(\hat \alpha - \hat \beta_{1_{2SLS}})}_{\hat \delta} \hat x + \hat v \tag{7.50}


7.8 Testing overidentifying restrictions – Sargan test

At the beginning of this chapter we claimed that we have to assume the exogeneity of the instruments and that it is not possible to test whether they are valid, i.e., uncorrelated with u from the main equation. That is true for exactly (just) identified models

  • If we have more external instruments than needed to identify the model, the model is overidentified. This can improve efficiency but moreover can be exploited for a test of the validity of the instruments

  • Such a test for the validity of the instruments is the Sargan test or Hansen’s J -test

  • The main idea of the Sargan test is as following:

    • If we have more external instruments than needed for identification, we can calculate several different 2SLS estimates of \boldsymbol \beta using different sets of instruments

    • If all the instruments are valid, the different 2SLS estimates of \boldsymbol \beta would all be consistent and converge to the same true values of \boldsymbol \beta

    • Hence, for a specific sample, the differences between these estimates should not be larger than expected from sampling errors. If they are, there is something wrong with the instruments

  • It can be shown that this test can be carried out with a quite simple auxiliary equation approach described below


  • The starting point is the second step regression (2SLS) of a structural equation like the following

y = \mathbf x \hat {\boldsymbol \beta}_{_{2SLS}} + \hat {x}_j \hat {\beta}_{j_{2SLS}} + \underbrace {\widehat {(u + \beta_j \hat e)} }_{\hat v} \tag{7.51}

  • Thereby, \hat x_j was obtained by a first step regression of the endogenous \tilde x_j on the exogenous model variables \mathbf x (internal instruments) and on several external instruments \mathbf z

\tilde x_j = \mathbf x \hat{\boldsymbol \gamma}_1 + \mathbf z \hat{\boldsymbol \gamma}_2 + \hat e \tag{7.52}

  • We presuppose that \hat x_j was estimated in this first step with more than one external instrument. Hence, the model is overidentified

  • Subsequently, we estimate the following auxiliary equation, which is the Sargan test equation

    \hat u = \mathbf x \boldsymbol \delta_1 + \mathbf z \boldsymbol \delta_2 + \epsilon \tag{7.53}

  • Now we test whether the 2SLS residuals \hat u are actually uncorrelated with all exogenous variables, in particular with the external instruments \mathbf z

    • If \mathbf x and \mathbf z are actually exogenous, the fit of Equation 7.53, measured by n \cdot R^2, should be zero (up to sampling error)

    • The test statistic n \cdot R^2 of this equation is \chi^2(m) distributed, m being the number of overidentifying restrictions, i.e., the number of external instruments minus the number of rhs endogenous variables

  • If we reject the H_0 (all instruments are valid), then at least one external instrument is not valid, i.e., not exogenous and therefore erroneously excluded from the main equation


Why does this test only work for overidentified models?

This test only works if we have more external instruments than rhs endogenous variables

  • Suppose not, i.e., we have one rhs endogenous variable and only one external instrument z_1

    • Then \hat x from the first step equation is (besides \mathbf x) simply a multiple of the single external instrument z_1

    • In this case one can show that z_1 is orthogonal to \hat u, the 2SLS residuals (conditional on \mathbf x); for a proof, see footnote 5

      • As the exogenous variables in \mathbf x are also orthogonal to \hat u, it follows that all parameters \boldsymbol \delta of the test Equation 7.53 are zero and the R^2 is always zero as well. Hence, a test for correlation with \hat u is not possible in this case
  • If we have more external instruments than rhs endogenous variables, then \hat x is not a simple multiple of one instrument but equals a particular linear combination of several z_j. Thus, the individual z_j are not automatically orthogonal to \hat u and therefore, a test for correlation with \hat u is possible 6
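
The following minimal sketch (simulated data; variable names are purely illustrative) demonstrates this point: with exactly one external instrument, the auxiliary regression of the 2SLS residuals on all exogenous variables has an R^2 of (numerically) zero.

# Sketch: with exact identification (one external instrument), the Sargan
# auxiliary regression of the 2SLS residuals has an R^2 of (numerically) zero
set.seed(2)
n  <- 2000
z1 <- rnorm(n); x2 <- rnorm(n); u <- rnorm(n)
xt <- z1 + 0.5*x2 + 0.5*u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5*xt + 0.3*x2 + u

library(AER)
iv1 <- ivreg(y ~ xt + x2 | z1 + x2)      # just identified
aux <- lm(resid(iv1) ~ z1 + x2)          # Sargan auxiliary regression (Equation 7.53)
summary(aux)$r.squared                   # zero up to machine precision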


  • Rejecting the H_0 does not tell us which external instrument is invalid

    • However, if we have more than one overidentifying restriction, we can infer whether certain subgroups of instruments are valid. In this case, we estimate the model with only a subgroup of external instruments (in which we have more confidence) and calculate the Sargan test statistic for this subgroup

    • Afterwards, we estimate the model with all external instruments and calculate the Sargan test statistic, which is usually larger than the previous one. The difference of these two test statistics (which is \chi^2(m-m_1) distributed) should not be too large. If it is, the complement of the trustworthy subgroup is invalid – this is the J-difference test (a small simulated-data sketch follows after this list)

  • A problem with the Sargan test is that the power of the test (see Section 3.4) could be quite low, especially if the external instruments have a common source or are highly correlated

    • For instance, the 2SLS estimates obtained by two different instruments could be very similar, even if both instruments are invalid

    • Therefore, if we are not able to reject the H_0 of the Sargan test, we should not rely too much on this result, especially if the external instruments are highly correlated
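
A minimal sketch of this J-difference idea with simulated data (all names are purely illustrative; the Sargan statistics are computed as n \cdot R^2 as described above):

# Sketch of the J-difference idea: Sargan statistic of a trusted instrument
# subgroup vs. the full instrument set (simulated data)
set.seed(3)
n  <- 3000
z1 <- rnorm(n); z2 <- rnorm(n); z3 <- rnorm(n); u <- rnorm(n)
x  <- z1 + 0.8*z2 + 0.6*z3 + 0.5*u + rnorm(n)   # endogenous regressor
y  <- 1 + 0.5*x + u

library(AER)
# Sargan statistic: n * R^2 of a regression of the 2SLS residuals on the instruments
sargan_stat <- function(res, Z) length(res) * summary(lm(res ~ Z))$r.squared

iv_sub  <- ivreg(y ~ x | z1 + z2)          # trusted subgroup: 1 overid. restriction
iv_full <- ivreg(y ~ x | z1 + z2 + z3)     # full instrument set: 2 overid. restrictions

J_sub  <- sargan_stat(resid(iv_sub),  cbind(z1, z2))
J_full <- sargan_stat(resid(iv_full), cbind(z1, z2, z3))

J_diff <- J_full - J_sub                   # approximately chi^2(2 - 1) under H0
1 - pchisq(J_diff, df = 1)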

In the following more formal section we will show more clearly the connection of this test with overidentifying restrictions and introduce Generalized Method of Moments (GMM) estimators


7.8.1 GMM estimators

At the heart of IV/2SLS estimation is the assumption that the instruments (internal + external) \mathbf z are exogenous. This assumption can be cast in so called moment restrictions:

E(\mathbf z_i' u_i) = \mathbf 0 \ = \ \left[ \begin{array} {c} E(z_{1,i} u_i)=0 \\ E(z_{2,i}u_i)=0 \\ \vdots \\ E(z_{k,i}u_i)=0 \end{array} \right] \tag{7.54}

  • First, we presuppose that \mathbf z has k elements, i.e., we have as many instruments as variables in the structural model we want to estimate, so that the model is exactly (just) identified

  • Replacing these theoretical (population) moments by their sample counterparts we get

\frac{1}{n}\sum_{i=1}^n {\mathbf z}_i' \hat u_i = \mathbf 0 \ = \ \left[ \begin{array} {c} \frac{1}{n}\sum_{i=1}^n z_{1,i} \hat u_i =0 \\ \frac{1}{n}\sum_{i=1}^n z_{2,i} \hat u_i =0\\ \vdots \\ \frac{1}{n}\sum_{i=1}^n z_{k,i} \hat u_i = 0 \end{array} \right] \ = \ \frac{1}{n} \mathbf Z' \hat {\mathbf u} \ = \ \mathbf 0 \tag{7.55}


  • Substituting the structural model \mathbf y - \mathbf X \hat {\boldsymbol \beta} for \hat {\mathbf u} we get

\frac{1}{n}\sum_{i=1}^n \mathbf z_i' (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \ = \ \left[ \begin{array} {c} \frac{1}{n}\sum\nolimits _{i=1}^n z_{1,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{2,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \vdots \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{k,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \end{array} \right] \ = \ \frac{1}{n} \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta} ) \ = \ \mathbf 0 \tag{7.56}

  • So, this clearly shows that we have k equations to determine the k parameters \beta_1,\ldots,\beta_k (without loss of generality, we assume demeaned variables, so we need no intercept \beta_0)

  • Multiplying out Equation 7.56 and solving for \hat {\boldsymbol \beta} we finally arrive at the (ordinary) IV estimator from Equation 7.32 (Section 7.5.2) as a method of moments estimator

\frac{1}{n}\mathbf Z' \mathbf y = \frac{1}{n}\mathbf Z' \mathbf X \hat{\boldsymbol \beta} \ \ \Rightarrow \ \ \hat {\boldsymbol \beta}_{IV} = (\mathbf Z' \mathbf X )^{-1}\mathbf Z' \mathbf y \tag{7.57}

  • The analysis above describes the case for an exactly (just) identified model as we have k equations in k parameters (number of rhs endogenous variables equals the number of external instruments)
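
As a minimal numerical sketch (simulated data; names purely illustrative), the just-identified IV estimator from Equation 7.57 can be computed directly from the data matrices and coincides with the ivreg result:

# Sketch: the just-identified IV estimator computed directly as (Z'X)^(-1) Z'y
# (Equation 7.57); here the constant is included as an internal instrument
set.seed(4)
n <- 2000
z <- rnorm(n); u <- rnorm(n)
x <- z + 0.5*u + rnorm(n)            # endogenous regressor
y <- 1 + 0.5*x + u

X <- cbind(1, x)                     # regressors (incl. constant)
Z <- cbind(1, z)                     # instruments (incl. constant): just identified
beta_iv <- solve(t(Z) %*% X) %*% t(Z) %*% y
beta_iv

library(AER)
coef(ivreg(y ~ x | z))               # identical estimates via ivreg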

But what happens if the model is overidentified, meaning that we have more instruments than variables?


  • In this case \mathbf z_i has k+m elements, m being the number of overidentifying restrictions. Hence, we have more equations than parameters in \boldsymbol \beta – the model is overdetermined

\frac{1}{n}\sum_{i=1}^n \mathbf z_i' (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \ = \ \left[ \begin{array} {c} \frac{1}{n}\sum\nolimits _{i=1}^n z_{1,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{2,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \vdots \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{k,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \\ \vdots \\ \frac{1}{n}\sum\nolimits _{i=1}^n z_{k+m,i} (y_i - \mathbf x_i \hat {\boldsymbol \beta}) \end{array} \right] \ = \ \frac{1}{n} \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta}) = \ \mathbf 0 \tag{7.58}

  • We have m excess equations

  • Multiplying out Equation 7.58 we once again get

\frac{1}{n}\mathbf Z' \mathbf y = \frac{1}{n}\mathbf Z' \mathbf X \hat {\boldsymbol \beta} \tag{7.59}

  • But this time, \mathbf Z' \mathbf X is a \left( (k+m), k \right) matrix and thus no longer square! Therefore, a usual inverse of this matrix does not exist and so we cannot solve this system for {\hat {\boldsymbol \beta}}

  • Because we have m excess equations, basic linear algebra tells us that there is generally no \hat {\boldsymbol \beta} with k elements such that k+m linearly independent equations can be jointly satisfied

  • So, how to solve for \hat {\boldsymbol \beta} in this case?

    • The key idea for this problem is to search for a \hat {\boldsymbol \beta} which satisfies the k+m linear equations as well as possible in an approximate sense, i.e., a weighted sum of the squared sample moments from Equation 7.58 should be as close as possible to zero, although not exactly zero. This procedure is called Generalized Method of Moments (GMM)

    • Hence, to estimate {\boldsymbol \beta}, we minimize a weighted sum of the squared sample moments, with weights given by a positive definite and symmetric weighting matrix \mathbf W. We call the resulting quadratic form J

      \underset{\hat{\beta}}{\operatorname {min}} \ J := \ n \cdot \frac {1}{n}(\mathbf y - \mathbf X \hat {\boldsymbol \beta})'\mathbf Z \, \mathbf W \, \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta})\frac {1}{n} \tag{7.60}

      Remark: We multiply the quadratic form by n for testing purposes as otherwise J would converge to zero (but this plays no role for optimization)
      (Remark: Multiplying Equation 7.59 with a left-sided pseudo-inverse of \mathbf Z' \mathbf X is equivalent to minimizing J with \mathbf W = \mathbf I – we would have an unweighted sum of the squared sample moments in this case)

  • The solution to this minimization problem is obtained by setting the first derivative of J with respect to \hat {\boldsymbol \beta} to zero and solving the resulting matrix equation for \hat {\boldsymbol \beta}:

\hat {\boldsymbol \beta}_{GMM} = (\mathbf X' \mathbf Z \mathbf W \mathbf Z' \mathbf X)^{-1} \mathbf X' \mathbf Z \mathbf W \mathbf Z' \mathbf y \tag{7.61}
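
For reference, a short sketch of the intermediate step: differentiating J from Equation 7.60 with respect to \hat {\boldsymbol \beta} (using the symmetry of \mathbf W) and setting the derivative to zero gives

\frac{\partial J}{\partial \hat {\boldsymbol \beta}} \ = \ -\frac{2}{n} \, \mathbf X' \mathbf Z \, \mathbf W \, \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta}) \ = \ \mathbf 0 \ \ \Rightarrow \ \ \mathbf X' \mathbf Z \, \mathbf W \, \mathbf Z' \mathbf X \, \hat {\boldsymbol \beta} \ = \ \mathbf X' \mathbf Z \, \mathbf W \, \mathbf Z' \mathbf y

Premultiplying with (\mathbf X' \mathbf Z \mathbf W \mathbf Z' \mathbf X)^{-1} yields Equation 7.61.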


  • Generalized Method of Moments estimators are generally consistent. In particular, this estimator \hat {\boldsymbol \beta}_{GMM} is consistent regardless of our choice of \mathbf W (provided the moment restrictions from Equation 7.58 are true, of course)

  • But we want an optimal weighting matrix to minimize the variance of the estimated parameters – we want an efficient estimator

  • It turns out that this optimal weighting matrix is proportional to an estimate of the inverse of the asymptotic covariance matrix of the moment restrictions. For homoskedastic errors this is

\mathbf W = [E(\mathbf z_i' u_i u_i' \mathbf z_i)]^{-1} = [E_z(E(u_i^2\mathbf z_i' \mathbf z_i \, | \, \mathbf z_i))]^{-1} = [{\sigma^2} E_z(\mathbf z_i' \mathbf z_i)]^{-1} \ \ \Rightarrow

\widehat{\mathbf W} = \left( {\hat\sigma^2} \frac{1}{n} \sum_{i=1}^n \mathbf z_i' \mathbf z_i\right)^{-1} = \frac {n}{\hat \sigma^2} (\mathbf Z' \mathbf Z)^{-1} \tag{7.62}

  • Plugging \widehat {\mathbf W} into Equation 7.61 we get the efficient GMM estimator for homoscedastic errors

\hat {\boldsymbol \beta}_{EGMM} = (\mathbf X' \mathbf Z (\mathbf Z' \mathbf Z)^{-1} \mathbf Z' \mathbf X)^{-1} \mathbf X' \mathbf Z (\mathbf Z' \mathbf Z)^{-1} \mathbf Z' \mathbf y \tag{7.63}
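
Note that Equation 7.63 is exactly the 2SLS estimator in matrix form. A minimal numerical check (simulated data; names purely illustrative):

# Sketch: the efficient GMM estimator of Equation 7.63 computed from the data
# matrices; with homoskedastic errors it coincides with 2SLS
set.seed(5)
n  <- 2000
z1 <- rnorm(n); z2 <- rnorm(n); u <- rnorm(n)
x  <- z1 + 0.7*z2 + 0.5*u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5*x + u

X <- cbind(1, x)                         # regressors (incl. constant)
Z <- cbind(1, z1, z2)                    # instruments: 2 external, model overidentified
A <- t(X) %*% Z %*% solve(t(Z) %*% Z)    # X'Z (Z'Z)^(-1)
beta_egmm <- solve(A %*% t(Z) %*% X) %*% A %*% t(Z) %*% y
beta_egmm

library(AER)
coef(ivreg(y ~ x | z1 + z2))             # identical 2SLS estimates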


  • All results so far rest on the truth of the assumption E(\mathbf z_i' u_i)=\mathbf 0

  • However, these assumptions can be tested if we have overidentifying restrictions, i.e., more equations than variables in the system of Equation 7.58

  • This test is based on J, with the optimal weighting matrix \hat {\mathbf W} for homoscedastic errors plugged in

\hat J = \dfrac {(\mathbf y - \mathbf X \hat{\boldsymbol \beta})' \mathbf Z \, (\mathbf Z' \mathbf Z)^{-1} \, \mathbf Z' (\mathbf y - \mathbf X \hat {\boldsymbol \beta}) } {\hat \sigma^2}

  • If the model is just identified, \hat {\boldsymbol \beta} solves the system of equations in Equation 7.56 exactly and \hat J is always zero – no test is possible

  • If we have more equations than variables, the system of equations can only be approximately solved by \hat {\boldsymbol \beta}. This is the case even if all moment restrictions E(\mathbf z_i' u_i)=\mathbf 0 are true. However, the larger \hat J, the more likely it is that some or even all moment restrictions are violated, i.e., E(\mathbf z_i' u_i)\ne\mathbf 0

  • Hence, a test procedure for the validity of the overidentifying restrictions (and thus for the validity of the instruments) is to check whether \hat J is too large – larger than sampling error would suggest


  • Substituting \hat {\mathbf u} = \mathbf y - \mathbf X \hat {\boldsymbol \beta} into the expression for \hat J above we get

\hat J \, = \, \dfrac {\hat {\mathbf u}' \mathbf Z \, (\mathbf Z' \mathbf Z)^{-1} \, \mathbf Z' \hat {\mathbf u} } {\hat \sigma^2} \, = \, \dfrac {\hat {\mathbf u}' \mathbf P_{\mathbf Z} \, \hat {\mathbf u} } {\hat \sigma^2} \, = \, n \dfrac { ( \mathbf P_{\mathbf Z} \hat {\mathbf u})' (\mathbf P_{\mathbf Z} \, \hat {\mathbf u}) } {\hat {\mathbf u}' \hat {\mathbf u}} \tag{7.64}

  • Here, \mathbf P_{\mathbf Z} is the projection matrix (Hat matrix, see Equation C.11), which is idempotent and projects \hat {\mathbf u} into the linear subspace of \mathbb{R}^n spanned by the columns of \mathbf Z

  • Hence, \mathbf P_{\mathbf Z} \hat {\mathbf u} are the predicted values of a regression of the 2SLS residuals \hat {\mathbf u} on all instruments in \mathbf Z (compare the Sargan test, Equation 7.53). Therefore, the numerator is the sum of squares (the scalar product) of these predicted values, i.e., the SSE of this regression (Equation 7.53)

  • The denominator is the sum of the squared 2SLS-residuals, hence SST of \hat {\mathbf u}

Thus, \hat J is n times the R^2 of a regression of the 2SLS residuals on all instruments; remember, R^2 := \frac {SSE}{SST}.
If n \cdot R^2 is too large, at least one instrument (or even all of them) is not exogenous

  • This J-statistic can be shown to be asymptotically \chi^2(m) distributed, with m being the number of overidentifying restrictions and is identical to the Sargan test procedure described in the text above (see Equation 7.53)
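
As a final minimal sketch (simulated data; names purely illustrative), \hat J computed directly via the projection matrix \mathbf P_{\mathbf Z} coincides with n \cdot R^2 from the auxiliary regression:

# Sketch: J computed via the projection matrix P_Z (Equation 7.64) equals
# n * R^2 of the regression of the 2SLS residuals on all instruments
set.seed(6)
n  <- 500
z1 <- rnorm(n); z2 <- rnorm(n); u <- rnorm(n)
x  <- z1 + 0.7*z2 + 0.5*u + rnorm(n)     # endogenous regressor
y  <- 1 + 0.5*x + u

library(AER)
iv   <- ivreg(y ~ x | z1 + z2)           # overidentified 2SLS (m = 1)
uhat <- resid(iv)                        # 2SLS residuals

Z  <- cbind(1, z1, z2)                   # all instruments (incl. constant)
PZ <- Z %*% solve(t(Z) %*% Z) %*% t(Z)   # projection matrix P_Z
J_matrix <- n * sum((PZ %*% uhat)^2) / sum(uhat^2)

J_nR2 <- n * summary(lm(uhat ~ z1 + z2))$r.squared
c(J_matrix = J_matrix, J_nR2 = J_nR2)    # identical up to numerical precision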

7.9 Summary

  • With IV/2SLS estimation techniques we can handle the problem of endogenous rhs variables, which is a widespread phenomenon

  • The drawback of IV/2SLS estimation is that the standard errors of the estimated parameters are generally much larger

    • Therefore, one should always use OLS if this is justifiable. The Hausman-Wu test can give some indication on this matter
  • The drawbacks of IV/2SLS are particularly present if we have only weak instruments. Thus, a test for weak instruments as described above is mandatory for a credible analysis

  • Furthermore, if we have an overidentified model, a Sargan test (J-test) is mandatory as well; the whole IV/2SLS procedure is grounded on valid instruments. If only one instrument is not valid, the entire analysis breaks down

  • Corrections for heteroskedasticity / serial correlation work analogously to OLS, and IV/2SLS easily extends to time series and panel data settings

  • In the following example, we once again estimate our wage equation by 2SLS (as we did at the beginning of this chapter), but this time we additionally carry out the diagnostic tests described above. Fortunately, the R procedure ivreg does most of the work for us


7.10 Example – 2SLS with diagnostics

  • 2SLS estimation with father and mother education as instruments for education

  • We have two external instruments, thus the model is overidentified

library(wooldridge); library(AER); library(texreg)
data("card")

# Complication: `fatheduc` and `motheduc` contain missing values,
# which would cause problems when we carry out some of the tests by hand

# We generate a new data set `card1` with the missing values of `fatheduc` and `motheduc` excluded
card1 <- subset(card, fatheduc > -1 & motheduc > -1)


# 2SLS estimation
iv12 <-  ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south | 
                fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, 
               data=card1)


# Saving 2SLS residuals
resid_iv <- iv12$residuals

# To get the three described diagnostic tests for 2SLS, we have to set  
# the option "diagnostics=TRUE" 
# summary(iv12, diagnostics = TRUE)

# Modifications for modelsummary to print diagnostic statistics for ivreg

library(broom)
glance_custom.ivreg <- function(x, ...) {
  # Extract the diagnostics table (weak instruments, Wu-Hausman, Sargan) once
  dia <- summary(x, diagnostics = TRUE)$diagnostics
  # Format each test as "statistic [p-value]"
  fmt <- function(row) paste0(sprintf('%4.2f', dia[row, 3]), " [",
                              sprintf('%4.3f', dia[row, 4]), "]")
  out <- data.frame("Diagnostics" = " ",
                    "Weak Instr"  = fmt(1),
                    "Hausman WU"  = fmt(2),
                    "Sargan"      = fmt(3))
  return(out)
}
#summary(iv12, diagnostics = TRUE)

library(modelsummary)
modelsummary(list( "2SLS" = iv12 ),
             shape =  term ~ statistic,
             statistic = c('std.error', 'statistic', 'p.value', 'conf.int'), 
             stars = TRUE, 
             gof_omit = "A|L|B|F",
             align = "ldddddd",
             fmt= 4,
             output = "gt")
Table 7.3:

IV estimates of a wage equation using father and mother education as instruments and showing important diagnostic statistics

2SLS
Est. S.E. t p 2.5 % 97.5 %
(Intercept)       4.2642***   0.2189  19.4792 <1e-04      3.8349   4.6934
educ       0.0999***   0.0128   7.8341 <1e-04      0.0749   0.1249
exper       0.0989***   0.0095  10.3954 <1e-04      0.0802   0.1175
I(exper^2)      -0.0024***   0.0004  -6.1028 <1e-04     -0.0032  -0.0017
black      -0.1506***   0.0260  -5.8009 <1e-04     -0.2015  -0.0997
smsa       0.1509***   0.0196   7.6933 <1e-04      0.1125   0.1894
south      -0.1073***   0.0181  -5.9364 <1e-04     -0.1427  -0.0718
Num.Obs.    2220       
R2       0.253    
RMSE       0.38     
Diagnostics               
Weak.Instr 127.78 [0.000]
Hausman.WU 3.97 [0.047]
Sargan 2.05 [0.152]
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Test for weak instruments by hand
# First stage regression
first <- lm(educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, data = card1)

# Testing whether coefficients of external instruments are jointly zero
lht(first, c("motheduc", "fatheduc"))
      Linear hypothesis test
      
      Hypothesis:
      motheduc = 0
      fatheduc = 0
      
      Model 1: restricted model
      Model 2: educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + 
          south
      
        Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
      1   2214 8594.3                                  
      2   2212 7704.2  2    890.12 127.78 < 2.2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Doing the Hausman-Wu test by hand
# We need the residuals of the first stage regression of educ on all exogenous variables 

resid1 <- first$residuals 

# Regressing the model of interest with residuals of the first stage regression 
# as additional variable

# Hausman-Wu test; Look at the p-value of resid1
# Further, compare estimated coefficients with the 2SLS estimates; they are identical 
 
Hausman_Wu <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south + 
                   resid1, data = card1) 

library(modelsummary)
modelsummary(list( "Hausman_Wu" = Hausman_Wu ),
             shape =  term ~ statistic,
             statistic = c('std.error', 'statistic', 'p.value', 'conf.int'), 
             stars = TRUE, 
             gof_omit = "A|L|B|F",
             align = "ldddddd",
             fmt= 4,
             output = "gt")
Table 7.4:

Hausman-Wu test for the wage equation above. resid1 are the residuals of the first stage regression. Note that the coefficients of this equation are identical to the 2SLS estimates

Hausman_Wu
Est. S.E. t p 2.5 % 97.5 %
(Intercept)     4.2642***   0.2171  19.6427 <1e-04       3.8384   4.6899
educ     0.0999***   0.0126   7.8998 <1e-04       0.0751   0.1247
exper     0.0989***   0.0094  10.4826 <1e-04       0.0804   0.1174
I(exper^2)    -0.0024***   0.0004  -6.1540 <1e-04      -0.0032  -0.0017
black    -0.1506***   0.0257  -5.8496 <1e-04      -0.2011  -0.1001
smsa     0.1509***   0.0195   7.7578 <1e-04       0.1128   0.1891
south    -0.1073***   0.0179  -5.9862 <1e-04      -0.1424  -0.0721
resid1    -0.0266*     0.0134  -1.9915      0.0465  -0.0528  -0.0004
Num.Obs.  2220       
R2     0.266    
RMSE     0.38     
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Sargan test by hand
# Regression of IV residuals on all exogenous variables
sargan <- lm(resid_iv ~ fatheduc + motheduc + 
               exper + I(exper^2) + black + smsa + south, 
             data = card1)

# Test statistic: J = n * R^2
J <- length( sargan$residuals ) * summary(sargan)$r.squared


print("Result of Sargan test")
print( paste( "J-stat =", sprintf( "%.3f",J ), "   p-value =", sprintf( "%.4f",1-pchisq(J,1) ) ) )
      [1] "Result of Sargan test"
      [1] "J-stat = 2.051    p-value = 0.1522"

  1. In principle, every variable not part of a correctly specified structural equation that is uncorrelated with the error term u (condition 1. is met), could serve as an external instrument, especially if condition 3. is met. So, strictly speaking, condition 2. is redundant but nonetheless helpful for the distinction of external and internal instruments and to understand the basic problem of finding external instruments.↩︎

  2. If the strong requirements for proxy variables are satisfied, see Equation 2.47 and the following analysis.↩︎

  3. In particular, OLS predicted values \hat y = \mathbf Py and OLS residuals \hat u = \mathbf My are always uncorrelated because \mathbf P and \mathbf M are mutually orthogonal projection matrices (\mathbf P \mathbf M = \mathbf 0), see Equation C.10 and Equation C.11.↩︎

  4. This is not the case if the 2SLS estimates are calculated by hand, like described in text following Equation 7.9. The reason is that in this case in the second stage regression the exogenous variables remain unchanged as regressors are not replaced by their linear projections. This can lead to a correlation of the exogenous variable, which was left out in the first stage regression, with the residuals of the first stage regression, violating MLR.4’ in Equation 7.12.↩︎

  5. Note that, in general, the 2SLS residuals do not retain the orthogonality property of their OLS counterparts, which makes the argument in the text considerably more complicated to prove.
    First of all, we have to distinguish the residuals of Equation 7.51, \hat v, from the 2SLS residuals \hat u:
    The former are defined as \hat v = y-\hat {\boldsymbol \beta} \mathbf x - \hat x_j \hat \beta_j and the latter are \hat u = y-\hat {\boldsymbol \beta} \mathbf x - \tilde x_j \hat \beta_j. Substituting \tilde x_j = \hat x_j +\hat e from the first stage regression we get: \hat u = y-\hat {\boldsymbol \beta} \mathbf x - ( \hat x_j +\hat e) \hat \beta_j \; = \; y-\hat {\boldsymbol \beta} \mathbf x - \hat x_j \hat \beta_j - \hat e \hat \beta_j \; \Rightarrow \; \hat u = (\hat v - \hat e \hat \beta_j), which is not obvious at first sight.
    Secondly, we prove that \mathbf x is uncorrelated with \hat u = \hat v - \hat \beta_j \hat e:
    Because of the orthogonality property of OLS it follows from the first stage regression that \mathbf x is uncorrelated with the first stage residuals \hat e. From the second stage regression Equation 7.51 it follows that \mathbf x is uncorrelated with \hat v as well. Thus, \mathbf x is uncorrelated with \hat u.
    Thirdly, we prove that z_1 is uncorrelated with \hat u = \hat v - \hat \beta_j \hat e:
    If we have only one instrument z_1, \hat x_j in Equation 7.51 is: \hat x_j = \mathbf x \hat {\boldsymbol \gamma_1} + z_1 \hat \gamma_{2,1}. As \hat x_j and \mathbf x are uncorrelated with \hat v (orthogonality in Equation 7.51), z_1 must be uncorrelated with \hat v as well.
    Furthermore, z_1 is uncorrelated with the first stage residuals \hat e as well, because of the orthogonality property of OLS. Hence, z_1 is uncorrelated with \hat u.
    Therefore, both \mathbf x and z_1 are uncorrelated with \hat u from Equation 7.53, leading to R^2=0 in this case.↩︎

  6. Regarding the third argument of the previous footnote we now have:
    \hat x_j = \mathbf x \hat {\boldsymbol \gamma}_1 + z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2}. As \hat x_j and \mathbf x are uncorrelated with \hat v (orthogonality in Equation 7.51), the linear combination z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2} must be uncorrelated with \hat v as well.
    Furthermore, z_1 and z_2 are uncorrelated with the first stage residuals \hat e because of the orthogonality property of OLS. We conclude that the linear combination z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2} is uncorrelated with \hat u, but not z_1 or z_2 by themselves.
    Hence, z_1 and z_2 are generally correlated with \hat u in Equation 7.53 and so, we generally have R^2 \neq 0.↩︎