library(wooldridge); library(AER); library(texreg)
data("card")
# OLS
ols <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south, data=card)
# IV with father education as instrument
# Note, in ivreg the instruments are after "|" and you have to include all (!)
# exogenous variables of the model but not the endogenous educ
iv1 <- ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south |
fatheduc + exper + I(exper^2) + black + smsa + south,
data=card)
# IV with mother education as instrument
iv2 <- ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south |
motheduc + exper + I(exper^2) + black + smsa + south,
data=card)
# IV with father and mother education as instruments
iv12 <- ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south |
fatheduc + motheduc + exper + I(exper^2) + black + smsa + south,
data=card)
# IV with nearc4 (proximity to a 4 year college) as instrument
iv3 <- ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south |
nearc4 + exper + I(exper^2) + black + smsa + south,
data=card)
7.1 Main Causes of the Problem
Solutions to the endogeneity problem:

- Proxy variables method for omitted regressors
- Model for the selection process
- Fixed effects methods if
  - panel data are available,
  - endogeneity is time-constant, and
  - regressors are not time-constant
- Instrumental variables methods (IV)
  - IV estimators are the most prominent method to address endogeneity problems
7.2 Main idea of IV estimation
The main causes for endogeneity of explanatory variables discussed above are so common that nearly every empirical work is more or less affected by this problem
- Assume our model is the following, with \tilde x_i being endogenous, i.e., correlated with u_i, violating MLR.4’ (hence the tilde over x_i). This is the so-called structural equation, which describes the causal effect we want to estimate
y_i = \beta_0 + \beta_1 \tilde x_i + u_i \tag{7.1}
- The method of instrumental variables is a remedy for the endogeneity problem
- The main idea is that the variables \tilde {\mathbf x}_i which are correlated with u_i are “replaced” in some way with instruments
- These instruments should contain additional information (outside of Equation 7.1) that helps to resolve the endogeneity problem, i.e., to disentangle the looked-for partial effect of \tilde {\mathbf x}_i from feedbacks or other sources of correlation which we discussed above
- The external instruments (we denote them \mathbf z_i) have to satisfy the following three conditions: 1
  1. \mathbf z_i have to be (weakly) exogenous; Cov (z_i, u_i) = 0, see Section 2.7
  2. \mathbf z_i must not be a part of the structural equation of interest – exclusion restrictions. We need external instruments with additional outside information!
  3. \mathbf z_i have to be relevant; Cov (z_i, \tilde x_i) \neq 0; indeed, the correlation between z_i and \tilde x_i should be as high as possible
- From the first and third requirement, we can easily derive the IV estimator for one explanatory variable and one instrument
7.3 The IV estimator
- Based on Cov (z_i, u_i) = 0 we can derive a method of moments estimator. From Equation 7.1, we have u_i = y_i - \beta_0 - \beta_1 \tilde x_i. Plug this into Cov (z_i, u_i)
Cov \left( z_i, (y_i - \beta_0 - \beta_1 \tilde x_i) \right) \ = \ Cov(z_i,y_i) - \beta_1 Cov(z_i,\tilde x_i) \, = \, 0 \ \ \Rightarrow
\beta_1 = \dfrac{Cov(z_i,y_i)}{Cov(z_i,\tilde x_i)} \tag{7.2}
- The parameter is estimable (identified) because we can write down \beta_1 in terms of population moments, which can be replaced with their empirical counterparts to arrive at the IV estimator for \beta_1
\hat\beta_{1,IV} = \dfrac{\frac {1}{n}\sum_i (z_i-\bar z)(y_i-\bar y)} {\frac {1}{n}\sum_i (z_i-\bar z)(\tilde x_i-\bar x)} \tag{7.3}
- If every variable is well behaved, we can apply the LLN, and it follows that Equation 7.3 converges to Equation 7.2 as the sample size increases. Hence, \hat\beta_{1,IV} is a consistent estimator for \beta_1, whereas the OLS estimator
\hat\beta_{1} = \dfrac{\frac {1}{n}\sum_i (\tilde x_i-\bar x)(y_i-\bar y)} {\frac {1}{n}\sum_i (\tilde x_i-\bar x)^2} \ \tag{7.4}
is not, because \tilde x_i is correlated with u_i by assumption
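For illustration, Equation 7.3 can be computed directly from sample covariances. A minimal sketch with the card data loaded above, using `nearc4` as instrument z for `educ` in a bivariate version of the wage equation (this particular pairing is chosen here just for demonstration):

```r
# IV estimator of Equation 7.3 by hand: beta1 = Cov(z,y)/Cov(z,x)
b1_iv <- with(card, cov(nearc4, lwage) / cov(nearc4, educ))
b0_iv <- with(card, mean(lwage) - b1_iv * mean(educ))
c(b0_iv, b1_iv)
# The same numbers via ivreg() in the exactly identified case
coef(ivreg(lwage ~ educ | nearc4, data = card))
```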
How to find instruments?

- The consistency of the IV estimators relies on the exogeneity of z_i. Unfortunately, this exogeneity cannot be tested directly (without additional information); hence we have to assume it – based on economic theory, common sense or introspection
- If we have more external instruments than needed (more than one in this example), we can test whether the instruments are exogenous as a group; this will be discussed later – Sargan J test
In practice, the main difficulty with IV estimators is to find appropriate instruments. Let us consider our good old wage equation:
wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \underbrace {(ability + v_i)}_{u_i} \tag{7.5}
- We are interested in the partial effect of education on the wage. But we probably have an omitted variable problem, as the ability of the people is clearly important for the received wage and is not directly observable; thus, ability is a part of u_i
- However, ability and therefore u_i are probably correlated with `educ` – people with higher ability also tend to be more educated. But this violates MLR.4’ and thus, `educ` is endogenous
- So, we need at least one external instrument for `educ`, which
  - is not part of Equation 7.5
  - is relevant
  - is exogenous (not correlated with the error term and thus correctly excluded from the main model)
Several instruments have been proposed for this matter
- The education of the mother or father
  - No direct wage determinant
  - Correlated with education of the child because of social factors
  - Probably (?) uncorrelated with innate ability (problem: ability may be inherited from parents)
- The number of siblings
  - No direct wage determinant
  - Correlated with education because of resource constraints in the household
  - Probably uncorrelated with innate ability
- College proximity when 16 years old
  - No direct wage determinant
  - Correlated with education because more education if lived near a college
  - Uncorrelated with error (?)
- Month of birth
  - No direct wage determinant
  - Correlated with education because of compulsory school attendance laws (in German: Schulpflicht)
  - Uncorrelated with error
In all these cases, one could question the exogeneity of the proposed instruments or their relevance, or even both. However, at least the relevance can be tested
- In the following, we estimate Equation 7.5 as an example with OLS and IV
- Note the coefficients of the endogenous `educ` (this actually is expected to be over- and not underestimated by OLS – maybe an additional errors in variables problem?) and the considerably larger standard errors of `educ`
Example: IV versus OLS
Code
library(modelsummary)
modelsummary( list("OLS"=ols,
"IV-fath"=iv1, "IV-moth"=iv2, "IV-fath_moth"=iv12, "IV-nearc4"=iv3),
gof_omit = "A|L|B|F",
align = "lddddd",
stars = TRUE,
fmt = 3,
output="gt")
Table 7.1: OLS and IV estimates of the wage equation

|             | OLS       | IV-fath   | IV-moth   | IV-fath_moth | IV-nearc4 |
|-------------|-----------|-----------|-----------|--------------|-----------|
| (Intercept) | 4.734***  | 4.467***  | 4.266***  | 4.264***     | 3.753***  |
|             | (0.068)   | (0.238)   | (0.234)   | (0.219)      | (0.829)   |
| educ        | 0.074***  | 0.089***  | 0.102***  | 0.100***     | 0.132**   |
|             | (0.004)   | (0.014)   | (0.014)   | (0.013)      | (0.049)   |
| exper       | 0.084***  | 0.093***  | 0.095***  | 0.099***     | 0.107***  |
|             | (0.007)   | (0.010)   | (0.009)   | (0.010)      | (0.021)   |
| I(exper^2)  | -0.002*** | -0.002*** | -0.002*** | -0.002***    | -0.002*** |
|             | (0.000)   | (0.000)   | (0.000)   | (0.000)      | (0.000)   |
| black       | -0.190*** | -0.160*** | -0.168*** | -0.151***    | -0.131*   |
|             | (0.018)   | (0.026)   | (0.024)   | (0.026)      | (0.053)   |
| smsa        | 0.161***  | 0.155***  | 0.146***  | 0.151***     | 0.131***  |
|             | (0.016)   | (0.019)   | (0.018)   | (0.020)      | (0.030)   |
| south       | -0.125*** | -0.113*** | -0.116*** | -0.107***    | -0.105*** |
|             | (0.015)   | (0.018)   | (0.017)   | (0.018)      | (0.023)   |
| Num.Obs.    | 3010      | 2320      | 2657      | 2220         | 3010      |
| R2          | 0.291     | 0.264     | 0.274     | 0.253        | 0.225     |
| RMSE        | 0.37      | 0.38      | 0.38      | 0.38         | 0.39      |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
7.4 The relevance of relevance
- Besides the exogeneity, the relevance of the instrument, Cov(z_i,\tilde x_i), plays an extremely important role. To show this, we subtract the corresponding mean values from model Equation 7.1 and premultiply by (z_i-\bar z)
(z_i-\bar z) (y_i-\bar y) = \beta_1 (z_i-\bar z)(\tilde x_i-\bar x) + (z_i-\bar z)u_i
- Taking the expectation we get
E[(z_i-\bar z) (y_i-\bar y)] = \beta_1 E[(z_i-\bar z)(\tilde x_i-\bar x)] + E[(z_i-\bar z)u_i] \ \ \Rightarrow
Cov(z_i,y_i) = \beta_1 Cov(z_i,\tilde x_i) + Cov(z_i,u_i)
- And dividing by Cov(z_i,\tilde x_i)
\dfrac {Cov(z_i,y_i)}{Cov(z_i,\tilde x_i)} = \beta_1 + \dfrac {Cov(z_i,u_i)}{Cov(z_i,\tilde x_i)}
- Replacing the theoretical moments by their empirical counterparts, we recognize that the left hand side is equal to the IV estimate of \beta_1, Equation 7.3. Taking probability limits (Equation A.9), we arrive at
\operatorname {plim} \hat\beta_{1,IV} \ = \ \beta_1 + \dfrac{\operatorname {plim} \frac {1}{n}\sum_i (z_i-\bar z)u_i} {\operatorname {plim} \frac {1}{n}\sum_i (z_i-\bar z)(\tilde x_i-\bar x)} \ = \ \beta_1 + \dfrac {\operatorname {Corr}(z_i,u_i)}{\operatorname {Corr}(z_i,\tilde x_i)} \dfrac {\sigma_u}{\sigma_{\tilde x}} \tag{7.6}
As Equation 7.6 shows, \hat \beta_{1,IV} is consistent if the correlation between z_i and u_i is zero, \operatorname {Corr}(z_i,u_i)=0, i.e., z is exogenous
- Suppose this correlation is not exactly zero, but small. In this case, we would expect only a small asymptotic bias in \hat \beta_{1,IV}
- However, if additionally \operatorname {Corr}(z_i,\tilde x_i) is small as well – if z_i is not relevant – the bias in the IV estimate could become considerably large \rightarrow weak instrument problem
We can derive a relation similar to Equation 7.6 for the OLS estimate; we subtract the corresponding mean values from model Equation 7.1 and premultiply by (\tilde x_i-\bar x). This yields
\operatorname {plim} \hat\beta_{1,OLS} \ = \ \beta_1 + \dfrac{\operatorname {plim} \frac {1}{n}\sum_i (\tilde x_i-\bar x)u_i} {\operatorname {plim} \frac {1}{n}\sum_i (\tilde x_i-\bar x)^2} \ = \ \beta_1 + \dfrac {\operatorname {Corr}(\tilde x_i,u_i) \, \sigma_u \sigma_{\tilde x}}{\operatorname {Var}(\tilde x_i)} \ = \ \beta_1 + \operatorname {Corr}(\tilde x_i,u_i) \frac {\sigma_u}{\sigma_{\tilde x}} \tag{7.7}
Thus, besides {\sigma_u}/{\sigma_{\tilde x}} (explain why) the asymptotic bias of the OLS estimates depends on \operatorname {Corr}(\tilde x_i,u_i)
However, if we have a weak instrument problem, it is easily possible that the asymptotic bias of the IV estimate is even larger than that of the OLS estimate:
\frac{\operatorname{Corr}(z_i, u_i)}{\operatorname{Corr}(z_i, \tilde x_i)}>\operatorname{Corr}(\tilde x_i, u_i), \ \text { e.g., } \ \frac{0.03}{0.2} = 0.15 > 0.1
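To see this numerically, here is a small simulation; the data generating process and all its numbers are made up purely for illustration (z is slightly endogenous and only weakly relevant):

```r
# Toy DGP (assumed, for illustration only)
set.seed(42)
n <- 1e5
u <- rnorm(n)
z <- 0.03*u + rnorm(n)           # instrument slightly endogenous: Corr(z,u) > 0
x <- 0.05*z + 0.3*u + rnorm(n)   # ... and weak: Corr(z,x) is small
y <- 1 + 0.5*x + u               # true beta_1 = 0.5
coef(lm(y ~ x))["x"]     # OLS: biased upwards (plim approx. 0.78)
cov(z, y) / cov(z, x)    # simple IV: here even more biased (plim approx. 1.01)
```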
7.5 Two stage least squares (2SLS)
Suppose we want to estimate a more elaborate structural model equation
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{l} x_{l} + \beta_{j} \tilde x_j + u \tag{7.8}
with l exogenous variables, x_1, \ldots , x_l, and one endogenous, \tilde x_j.
In the introduction we argued that in the case of endogenous regressors we have to “replace” this variable by an exogenous and relevant instrument. But we were not specific about what replace really means
- Here, replace is not meant literally in the sense that we actually replace \tilde x with z in Equation 7.8 and then apply OLS. This would be a proxy variable approach, which might sometimes be useful with an errors in the variables or omitted variable problem 2
  - As an example, if we directly replaced `educ` in our wage equation Equation 7.5 by an exogenous proxy variable z (for instance with `fatheduc`), the OLS coefficient of z would generally not estimate the effect of education on earned wage, \beta_1, and would probably introduce an additional errors in the variables problem
- As an example, if we directly replace
- The IV approach is a different one: We do not replace \tilde x with z, but rather with the predicted values of a regression of \tilde x on all the exogenous variables of the model, including the external instrument z. We denote these predicted values with \hat x and call this the first stage regression or reduced form regression
\tilde x_j \, = \, \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_{l} x_{l} \, + \, \gamma_{l+1}z_1 + \cdots + \gamma_{l+m}z_m + e \tag{7.9}
Here, x_1,\ldots,x_{l} are the l exogenous variables of the model (sometimes called internal instruments), for instance `exper` and `exper^2` in our wage equation, and z_1,\ldots,z_m are the m external instruments

We estimate the coefficients \gamma of Equation 7.9 by OLS, leading to the predicted values \hat x_j
\tilde x_j \, = \, \underbrace{ \hat \gamma_0 + \hat \gamma_1 x_1 + \cdots + \hat \gamma_{l} x_{l} \, + \, \hat \gamma_{l+1}z_1 + \cdots + \hat \gamma_{l+m}z_m}_{\hat x_j} + \hat e \quad \Rightarrow \tag{7.10}
\tilde x_j \, = \, \hat x_j + \hat e \tag{7.11}
Remark: The reduced form Equation 7.9 is the functional relationship of an endogenous variable dependent only on exogenous variables (the exogenous variables and error terms drive the endogenous ones – data generating process) and is related to simultaneous equation models
- In the second step regression we estimate the original model, but with \hat x_j in place of \tilde x_j, i.e., we insert \hat x_j + \hat e from Equation 7.11 for \tilde x_j in Equation 7.8
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{l} x_{l} + \beta_{j} \hat x_j + \underbrace { (u + \beta_j \hat e)}_v \tag{7.12}
- The new error v is composed of two components:
  - The error from the structural model, u. This is uncorrelated with the exogenous variables x_1,\ldots,x_{l} and is now also uncorrelated with \hat x_j, because the latter is a linear combination of x_1,\ldots,x_{l} and the exogenous external instruments z_1,\ldots,z_{m} from the first stage regression, Equation 7.10
  - The residuals of the first stage regression, \hat e. These pose no problems (besides the larger error variance), because the residuals are uncorrelated with all variables in Equation 7.12 by construction (orthogonality property) – hence, no errors in the variables problem and no violation of MLR.4’ 3
Hence, the parameters of Equation 7.12 can be consistently estimated by OLS
Remark: The 2SLS residuals, and subsequently the residual variance \hat \sigma^2, have to be computed by
\hat u \ = \ y - \underbrace {\hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_{l} x_{l} + \hat\beta_{j} \tilde x_j}_{\hat y} \tag{7.13}
Thereby, the 2SLS estimates of the \betas are used, but with the original variable \tilde x_j and not with \hat x_j. This procedure yields \hat u_i from Equation 7.8 and not \hat v_i from Equation 7.12
After estimating \sigma^2 by Equation 2.35 with residuals based on Equation 7.13, all tests can be carried out in the usual way
Example:
Suppose our wage equation is
wage_i \ = \ \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i \tag{7.14}
Because of the unobserved ability (which is therefore part of u_i) and the fact that ability is probably correlated with education the variable education is endogenous. Therefore, we need instruments for education, which are not part of the model, are exogenous and relevant
Suppose we have two external instruments for education: education of the mother and education of the father. We already discussed them before
Thus, in the first stage regression we regress `educ` on all exogenous variables of the model (internal instruments) and the two external instrumental variables
educ_i = \gamma_0 + \gamma_1 exper_i + \gamma_2 exper_i^2 + \gamma_3 fatheduc_i + \gamma_4 motheduc_i + e_i \tag{7.15}
- This first stage regression yields the predicted values \widehat {educ}
- In the second stage we estimate the original model, but with \widehat {educ} instead of educ
wage_i \ = \ \beta_0 + \beta_1 \widehat {educ_i} + \beta_2 exper_i + \beta_3 exper_i^2 + v_i \tag{7.16}
- This two-step procedure yields consistent estimates for all \betas; therefore the name two stage least squares
Remark: Usually, if we have exactly as many external instruments as right hand side (rhs) endogenous variables, we call the procedure (ordinary) IV, otherwise 2SLS. But this labeling doesn’t seem to be used unanimously
- Why does this procedure work?
  - All variables in the second stage regression are exogenous, because `educ` was replaced by a prediction based only on exogenous information
  - By using the prediction based on exogenous information, `educ` is purged of its endogenous part (the part that is related to the error term)
  - Thus, only that part of `educ` remains which is exogenous. And this is the reason why the coefficient of \widehat {educ} represents the causal effect of education on received wages (and not some mixture of effects, see Section 1.3.2)
7.5.1 2SLS – variance of estimates
The most important downside of IV/2SLS estimation is that the variance of the IV/2SLS estimates is generally considerably larger than that of the OLS estimates, i.e., they are less precise (look at Table 7.1)
Therefore, IV/2SLS need large samples to be useful
Below, the formula for the variance of OLS estimates and the formula for IV/2SLS estimates are shown
\operatorname {Var}(\hat \beta_{j,OLS}) \ = \ \dfrac{\sigma^2}{ \underbrace{SST_j}_{\sum_{i=1}^n (x_{ij} - \bar x_j)^2} (1-R_j^2) }
\operatorname {Var}(\hat \beta_{j,2SLS}) \ = \ \dfrac{\sigma^2}{ \underbrace{SST_j}_{\sum_{i=1}^n (\hat x_{ij} - \bar x_j)^2} (1 - R_{j}^2) } \tag{7.17}
- These formulas differ only in that, for the latter, the explanatory variable x_j is replaced with its prediction from the first stage regression, \hat x_j
- The variance of the IV/2SLS estimate \hat\beta_j
  - increases with the error variance \sigma^2 and decreases with sample size n
  - decreases with the total variation of the predicted values \hat x_j
  - increases with R_{j}^2, which is the R^2 of a regression of \hat x_j on all the other explanatory x-es
- The last two points are always (considerably) worse for IV/2SLS than for OLS, and worsen even more with poor or weak instruments
- The error variance \sigma_v^2 of the second stage regression is larger, because the error term additionally contains the first stage residuals. However, the residuals are purged from this effect if they are correctly computed by Equation 7.13
- The variation of a predicted variable, SSE, is always less than the variation of the original variable, SST. The definition of the R^2 is based on this ratio SSE/SST \leq 1 \rightarrow less variation of the corresponding explanatory variable \hat x_j
- The R^2, i.e., the fit of a regression of the predicted variable \hat x_j on all the other x-es, is always higher than the R^2 of a regression of the original variable x_j on all the other x-es
  - The reason is that in the first stage regression, x_j is regressed on all exogenous variables of the model (the other x-es) plus the external instruments. Hence, the predicted values of this regression, \hat x_j, are, besides the effects of the external instruments, a linear function of these other x-es. This implies: The correlation between \hat x_j and the other x-es is typically much higher than the correlation between the original x_j and the other x-es
- In other words, IV/2SLS exhibits an inherent multicollinearity problem
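Using the by-hand first stage from the sketch above (`card_cc`, `educ_hat`), this can be checked directly: the R^2_j of the prediction regressed on the other regressors exceeds that of the original variable.

```r
# R2_j from Equation 7.17: regress x_j (or its prediction) on the other x-es
r2 <- function(f) summary(lm(f, data = card_cc))$r.squared
r2(educ ~ exper + I(exper^2))       # R2_j for the original educ
r2(educ_hat ~ exper + I(exper^2))   # R2_j for the prediction: higher
```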
7.5.2 Matrix notation for IV/2SLS
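As a sketch of the standard textbook notation (not taken verbatim from this text): collect all regressors of Equation 7.8, including the endogenous ones, in \mathbf X and all instruments (internal and external) in \mathbf Z. With the projection matrix \mathbf P_{\mathbf Z} = \mathbf Z (\mathbf Z'\mathbf Z)^{-1}\mathbf Z', the first stage is \hat{\mathbf X} = \mathbf P_{\mathbf Z} \mathbf X, and the 2SLS estimator can be written compactly as

\hat{\boldsymbol\beta}_{2SLS} \ = \ (\hat{\mathbf X}'\hat{\mathbf X})^{-1} \hat{\mathbf X}' \mathbf y \ = \ (\mathbf X' \mathbf P_{\mathbf Z} \mathbf X)^{-1} \mathbf X' \mathbf P_{\mathbf Z} \, \mathbf y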
7.6 Identification
For a basic introduction to identification problems, see Section 2.3.1.1.
But let us now discuss this issue by means of the first stage Equation 7.15, in particular, why we need external instruments to estimate \beta_1 in Equation 7.14
- Suppose we have no external instrument z_i (the corresponding \gamma_j in the first stage regression Equation 7.15 are 0) and regress `educ` only on the internal exogenous variables `exper` and `exper^2` in the first step. Then, \widehat {educ} would be a perfect linear combination of `exper` and `exper^2`
  - However, these two variables are already present in the second stage Equation 7.16; this would generate a perfect collinearity between `exper`, `exper^2` and \widehat {educ} in Equation 7.16, rendering it impossible to disentangle the effects of `exper` and `exper^2` on the one hand and \widehat {educ} on the other hand. The coefficients \beta_1, \beta_2 and \beta_3 are thus not identified (not estimable) in this case
As a general rule, identification of the model equation requires that for every rhs endogenous variable we must have at least one distinct external instrument which is also relevant, i.e., the corresponding \gamma_j in the first stage equation \ne 0 (rank condition)
The variables, which act as external instruments, must not be part of the model, Equation 7.14. This is called exclusion restrictions
If we have more than one external instrument per endogenous variable, the model is overidentified – which is often a good thing as we will see later. Otherwise, the model is just or exactly identified
7.6.1 Tests for weak instruments
Above, we explained why we need at least one external instrument per endogenous rhs variable
- However, even if this condition is met (the corresponding \gamma_j in the first stage regression is \ne 0), it could be that the model is only barely identified. This is the case if the conditional correlation between the external instrument and the endogenous rhs variable is low. In this instance, we have a weak instrument problem; fortunately, we can test for this circumstance
Weak instrument test: The first stage regression could be used to test for the relevance of the instruments
- In our example, with Equation 7.15, we can carry out an F-test to examine whether \gamma_3 and \gamma_4, the coefficients of `fatheduc` and `motheduc`, are jointly zero
- Monte Carlo simulations by Staiger and Stock (1997) show that an F-statistic less than 10 indicates a weak instrument problem. With heteroskedasticity in either the first or second stage equations, the F-statistic should be more in the range of 20 and above
If we have more than one rhs endogenous variable, things are more complicated; even if we have good F-statistics in every first stage regression, it is not guaranteed that we have at least one distinct and relevant external instrument for each endogenous variable. We have to use specialized tests like the Cragg and Donald (1993) or Anderson (1984) tests, which are based on the smallest canonical correlation of the rhs endogenous variables and the external instruments (conditioned on the other \mathbf x)
Example – Testing for relevance of instruments
We are testing the relevance of the instruments using the first step regression, Equation 7.15, i.e., regressing educ on the corresponding external instrument(s) and all other exogenous variables
- It is the partial effect of the external instrument(s) that matters!
rel1 <- lm(educ ~ fatheduc + exper + I(exper^2) + black + smsa + south, data=card)
rel2 <- lm(educ ~ motheduc + exper + I(exper^2) + black + smsa + south, data=card)
rel12 <- lm(educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, data=card)
rel3 <- lm(educ ~ nearc4 + exper + I(exper^2) + black + smsa + south, data=card)
Code
modelsummary( list("fatheduc"=rel1, "motheduc"=rel2,
"fatheduc+motheduc"=rel12, "nearc4"=rel3),
output="gt",
statistic = "statistic",
gof_omit = "A|B|L|F",
align = "ldddd",
stars = TRUE,
fmt = 4,
coef_map = c("fatheduc", "motheduc", "nearc4",
"exper", "I(exper^2)", "black", "smsa", "south")
)
|            | fatheduc   | motheduc   | fatheduc+motheduc | nearc4     |
|------------|------------|------------|-------------------|------------|
| fatheduc   | 0.1728***  |            | 0.1128***         |            |
|            | (14.3035)  |            | (7.7539)          |            |
| motheduc   |            | 0.1879***  | 0.1297***         |            |
|            |            | (14.5996)  | (7.6118)          |            |
| nearc4     |            |            |                   | 0.3373***  |
|            |            |            |                   | (4.0887)   |
| exper      | -0.3808*** | -0.3754*** | -0.3780***        | -0.4100*** |
|            | (-10.0625) | (-10.7007) | (-9.8671)         | (-12.1686) |
| I(exper^2) | 0.0019     | 0.0009     | 0.0024            | 0.0007     |
|            | (1.0098)   | (0.5376)   | (1.2250)          | (0.4438)   |
| black      | -0.4789*** | -0.6015*** | -0.3543**         | -1.0061*** |
|            | (-4.1516)  | (-5.9888)  | (-2.9719)         | (-11.2235) |
| smsa       | 0.3882***  | 0.3818***  | 0.3509***         | 0.4039***  |
|            | (4.3106)   | (4.5528)   | (3.8408)          | (4.7578)   |
| south      | -0.1426+   | -0.2211**  | -0.1211           | -0.2915*** |
|            | (-1.6511)  | (-2.7157)  | (-1.3852)         | (-3.6790)  |
| Num.Obs.   | 2320       | 2657       | 2220              | 3010       |
| R2         | 0.477      | 0.492      | 0.482             | 0.474      |
| RMSE       | 1.89       | 1.89       | 1.86              | 1.94       |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
- The t-statistics of `fatheduc` and `motheduc` are > 14; hence, a weak instrument problem can be ruled out for these instruments

F-test, whether both `fatheduc` and `motheduc` together are 0

- The F-statistic should be at least 10 to rule out a weak instrument problem
- In the case of heteroskedasticity, the F-statistic should be at least 20 to rule out a weak instrument problem
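The call producing the output below is not shown; presumably it was along these lines (`lht()` comes from the car package, which AER loads; the same function is used in Section 7.10):

```r
# Joint test: coefficients of both external instruments equal to zero
lht(rel12, c("fatheduc = 0", "motheduc = 0"))
```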
Linear hypothesis test
Hypothesis:
fatheduc = 0
motheduc = 0
Model 1: restricted model
Model 2: educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa +
south
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2214 8594.3
2 2212 7704.2 2 890.12 127.78 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- The F-test (F > 100) clearly rules out that `fatheduc` and `motheduc` together are weak instruments
The t-statistic for `nearc4` is about 4, which is suspicious in the case of heteroskedasticity. We therefore additionally test for heteroskedasticity, applying the Breusch-Pagan test; see Section 6.3.
bptest(rel3)
studentized Breusch-Pagan test
data: rel3
BP = 92.185, df = 6, p-value < 2.2e-16
- The Breusch-Pagan test overwhelmingly rejects homoskedasticity; hence, `nearc4` might be only a weak instrument
Another example

- In this example we show what devastating effects a weak instrument can have
- We want to investigate the relationship between the weight of newborns (`bwght`) and smoking (`packs`)
- However, we suppose some common relationships between unobserved genetic factors (in u) for `bwght` and smoking
- Thus, besides OLS, we try an IV estimator
library(wooldridge); data("bwght")
# OLS estimation
bwols <- lm(bwght ~ packs, data=bwght)
summary(bwols)
Call:
lm(formula = bwght ~ packs, data = bwght)
Residuals:
Min 1Q Median 3Q Max
-96.772 -11.772 0.297 13.228 151.228
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 119.7719 0.5723 209.267 < 2e-16 ***
packs -10.2754 1.8098 -5.678 1.66e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.13 on 1386 degrees of freedom
Multiple R-squared: 0.02273, Adjusted R-squared: 0.02202
F-statistic: 32.24 on 1 and 1386 DF, p-value: 1.662e-08
- The variable `packs` has the expected negative sign
- However, we conjecture that `packs` might be endogenous (this is a choice variable, which is always suspicious for endogeneity)
- So, we need an instrument for smoking (`packs`)
- We take `cigprice` (the price of cigarettes) as instrument
  - `cigprice` should have no direct effect on `bwght` (conditional on `packs`) – exclusion restriction
  - `cigprice` should be correlated with `packs` – some relevance
  - Furthermore, `cigprice` is for sure unrelated to unobserved individual genetic factors contained in u – exogenous
- The corresponding first stage regression is
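The code for this regression is not shown in the output; presumably it was something like the following sketch (`coeftest()` from the lmtest package prints the "t test of coefficients" header seen below; the object name `first_bw` is ours):

```r
library(lmtest)
# First stage: regress the endogenous packs on the external instrument cigprice
first_bw <- lm(packs ~ cigprice, data = bwght)
coeftest(first_bw)
```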
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.06742568 0.10253837 0.6576 0.5109
cigprice 0.00028288 0.00078297 0.3613 0.7179
- As we see, the t-statistic is very low, indicating that we should not use `cigprice` as an instrument for `packs` ==> no relevance
- But what happens if we do it nonetheless?
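The Call line in the output below shows the estimation command; reproduced as a sketch (the object name `bwiv` is ours):

```r
# IV estimation with the irrelevant instrument cigprice
bwiv <- ivreg(bwght ~ packs | cigprice, data = bwght)
summary(bwiv)
```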
Call:
ivreg(formula = bwght ~ packs | cigprice, data = bwght)
Residuals:
Min 1Q Median 3Q Max
-856.32 15.35 33.35 47.35 188.35
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.65 104.63 0.790 0.43
packs 345.47 1002.19 0.345 0.73
Residual standard error: 108.2 on 1386 degrees of freedom
Multiple R-Squared: -27.22, Adjusted R-squared: -27.24
Wald test: 0.1188 on 1 and 1386 DF, p-value: 0.7304
- As we see, we have an unexpected sign with an absurdly high estimate for `packs` and an extremely high standard error for `packs` (very low t-value). (Note, the R2 has no natural interpretation for IV/2SLS estimations, as we have no orthogonality property with these estimators)
- Apparently, IV estimates with very weak instruments can yield much more unreliable results than OLS
7.7 Testing for endogeneity - Hausman-Wu test
Sometimes, it is not clear whether some regressors are correlated with the error term u_i, i.e., if they are actually endogenous, and whether we need an IV estimator. Thus, a test for the appropriateness of OLS is desirable.
A practical problem is that u_i is not observable and that the observed OLS residuals \hat u_i are always uncorrelated with all regressors; orthogonality property of OLS: \mathbf X' \hat {\mathbf u} = \mathbf 0
- The Hausman test provides the following test idea for the problem at hand:
  - If there is no endogeneity problem in the model, OLS is consistent (and efficient), but the same is also true for IV/2SLS estimators with regard to consistency (but obviously not for efficiency). Therefore, in this case, the parameter estimates of both procedures should converge to the same true parameter values
  - If, on the contrary, there is an endogeneity problem, only IV/2SLS would be consistent
  - Thus, under the null hypothesis of no endogeneity problem, the OLS estimator \boldsymbol \beta_{OLS} and the IV/2SLS estimator \boldsymbol \beta_{IV} should not differ too much (only because of sampling errors)
  - If, on the other hand, the H_0 is false, we would expect that the two estimators differ more than sampling errors would suggest
- A natural test, the Hausman test, would therefore check whether the difference between the two estimators is too large. This test can be cast in terms of a usual Wald statistic (compare Equation 3.13)
W = \mathbf {d}' \{ { \operatorname{Var}}(\mathbf{d}) \}^{-1} \mathbf{d} , \ \ \ \text{ with } \ \mathbf d = (\hat {\boldsymbol \beta}_{OLS} - \hat {\boldsymbol \beta}_{IV})
However, the difficulty with this test statistic is that the covariance matrix of \mathbf d, { \operatorname{Var}}(\mathbf{d}), which, asymptotically, can be shown to be \left[ \operatorname{Var}(\hat {\boldsymbol \beta}_{IV}) - { \operatorname{Var}}(\hat {\boldsymbol \beta}_{OLS}) \right], is not of full rank and thus has no inverse (you would have to rely on a generalized inverse)
- Fortunately, there is a much simpler and equivalent variant of this test, the Hausman-Wu test. This test is a two step procedure (but not 2SLS, of course):
- We estimate the usual first stage (reduced form) regression. The endogenous rhs variable is \tilde x_j
\tilde {x}_j = \underbrace {\hat \gamma_0 + \hat \gamma_1 x_1 + \cdots + \hat \gamma_{l} x_{l} \, + \, \hat \gamma_{l+1}z_1 + \cdots + \hat \gamma_{l+m}z_m}_{\hat {x}_j} + \hat e \tag{7.43}
- In the second stage we estimate the original model, Equation 7.8, by OLS, but with the errors \hat e of the first equation as an additional variable, and test whether the coefficient of \hat e, \, \hat \delta, is zero
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_j \tilde x_j + \textcolor {red} {\delta \hat e} + u \tag{7.44}
- Why does the Hausman-Wu procedure work?
  - In the first stage regression, Equation 7.43, the endogenous variable \tilde x_j is regressed only on exogenous variables, so \hat x_j is uncorrelated with u by assumption. Thereby, \tilde x_j is decomposed into an exogenous part, \hat x_j, and a possibly endogenous part, \hat e
  - Thus, if \tilde x_j is correlated with the error u of the original equation, this correlation must be due to the residuals of Equation 7.43, \hat e
  - In other words, \tilde x_j is endogenous (correlated with u of the original model) if and only if the residuals \hat e of the first stage regression are correlated with u
  - Therefore, we can test for endogeneity of \tilde x_j by adding \hat e to the original model. If we cannot reject H_0: \delta=0, then there is no convincing evidence for an endogeneity problem and we should use OLS – as OLS is much more efficient. Otherwise, we need IV/2SLS
- This test also works if we have more than one rhs endogenous variable. In this case, we simply estimate a reduced form equation like Equation 7.43 for every endogenous variable and plug the residuals of all these equations into Equation 7.44 as additional variables. Then, we use an F-test to test whether all these added residuals are jointly insignificant
- Note, if \hat\delta = 0, the estimated coefficients of Equation 7.44 are exactly the same as the OLS estimates of the original model
- If \hat\delta \neq 0, the estimated coefficients of Equation 7.44 are exactly the same as the 2SLS estimates of the original model, which is not so obvious
7.8 Testing overidentifying restrictions – Sargan test
At the beginning of this chapter we claimed that we have to assume the exogeneity of the instruments and that it is not possible to test whether they are valid, i.e., uncorrelated with u from the main equation. That is true for exactly (just) identified models
If we have more external instruments than needed to identify the model, the model is overidentified. This can improve efficiency but moreover can be exploited for a test of the validity of the instruments
Such a test for the validity of the instruments is the Sargan test or Hansen’s J -test
- The main idea of the Sargan test is as follows:
  - If we have more external instruments than needed for identification, we can calculate several different 2SLS estimates of \boldsymbol \beta using different sets of instruments
  - If all the instruments are valid, the different 2SLS estimates of \boldsymbol \beta would all be consistent and converge to the same true values of \boldsymbol \beta
  - Hence, for a specific sample, the differences between the estimates should not be larger than expected from sampling errors. If they are, there is something wrong with the instruments
  - It can be shown that this test can be carried out with a quite simple auxiliary equation approach, described below
- The starting point is the second step regression (2SLS) of a structural equation like the following
y = \mathbf x \hat {\boldsymbol \beta}_{_{2SLS}} + \hat {x}_j \hat {\beta}_{j_{2SLS}} + \underbrace {\widehat {(u + \beta_j \hat e)} }_{\hat v} \tag{7.51}
- Thereby, \hat x_j was obtained by a first step regression of the endogenous \tilde x_j on the exogenous model variables \mathbf x (internal instruments) and on several external instruments \mathbf z
\tilde x_j = \mathbf x \hat{\boldsymbol \gamma}_1 + \mathbf z \hat{\boldsymbol \gamma}_2 + \hat e \tag{7.52}
We presuppose that \hat x_j was estimated in this first step with more than one external instrument. Hence, the model is overidentified
- Subsequently, we estimate the following auxiliary equation, which is the Sargan test equation
\hat u = \mathbf x \boldsymbol \delta_1 + \mathbf z \boldsymbol \delta_2 + \epsilon \tag{7.53}
- Note that we use the 2SLS residuals calculated by Equation 7.13 and not \hat v from Equation 7.51
- Now we test whether the 2SLS residuals \hat u are actually uncorrelated with all exogenous variables, in particular with the external instruments \mathbf z
  - If \mathbf x and \mathbf z are actually exogenous, the fit of Equation 7.53, measured by n \cdot R^2, should be zero (besides sampling errors)
  - The test statistic n \cdot R^2 of this equation is \chi^2(m) distributed, m being the number of overidentifying restrictions, i.e., the number of external instruments minus the number of rhs endogenous variables
  - If we reject the H_0 (all instruments are valid), then at least one external instrument is not valid, i.e., not exogenous and therefore erroneously excluded from the main equation
Why does this test only work for overidentified models?

- This test only works if we have more external instruments than rhs endogenous variables
- Suppose not, and we have one rhs endogenous variable and only one external instrument z_1
  - Then \hat x from the first step equation is (besides \mathbf x) simply a multiple of the one external instrument z_1
  - In this case one can show that z_1 is orthogonal to \hat u, the 2SLS residuals (conditional on \mathbf x); for a proof, see footnote 5
  - As the exogenous variables in \mathbf x are also orthogonal to \hat u, it follows that all parameters \boldsymbol \delta of the test Equation 7.53 are zero, and the R^2 is always zero as well. Hence, a test for correlation with \hat u is not possible in this case
- If we have more external instruments than rhs endogenous variables, then \hat x is not a simple multiple of one instrument but equals a particular linear combination of several z_j. Thus, the single z_j are not automatically orthogonal to \hat u and therefore, a test for correlation with \hat u is possible 6
- Rejecting the H_0 does not tell us which external instrument is invalid
  - However, if we have more than one overidentifying restriction, we can infer whether certain subgroups of instruments are valid. In this case, we estimate the model with only a subgroup of external instruments (in which we have more confidence) and calculate the Sargan test statistic for this subgroup
  - Afterwards, we estimate the model with all external instruments and calculate the Sargan test statistic, which is usually larger than the previous one. The difference of these two test statistics, which is \chi^2(m-m_1) distributed, should not be too large. If it is, the complement set of the trustworthy subgroup is invalid – J-Diff test
- A problem with the Sargan test is that the power of the test (see Section 3.4) could be quite low, especially if the external instruments have a common source or are highly correlated
  - For instance, the 2SLS estimates obtained by two different instruments could be very similar, even if both instruments are invalid
  - Therefore, if we are not able to reject the H_0 of the Sargan test, we should not rely too much on this result, especially if the external instruments are highly correlated
In the following more formal section we will show more clearly the connection of this test with overidentifying restrictions and introduce Generalized Method of Moments (GMM) estimators
7.8.1 GMM estimators
7.9 Summary
- With IV/2SLS estimation techniques we can handle the problem of endogenous rhs variables, which is a widespread phenomenon
- The drawback of IV/2SLS estimates is the generally much larger standard errors of the estimated parameters
  - Therefore, one should always use OLS if this is justifiable. The Hausman-Wu test can give some indication for this matter
- The drawbacks of IV/2SLS are particularly present if we have only weak instruments. Thus, a test for weak instruments as described above is mandatory for a credible analysis
- Furthermore, if we have an overidentified model, a Sargan test (J-test) is mandatory as well; the whole IV/2SLS procedure is grounded on valid instruments. If only one instrument is not valid, the entire analysis breaks down
- Corrections for heteroskedasticity / serial correlation work analogously to OLS, and IV/2SLS extends easily to time series and panel data situations
In the following example, we once again estimate our wage equation by 2SLS (as we did at the beginning of this chapter), but this time we additionally carry out the diagnostic tests described above. Fortunately, the R procedure `ivreg` does most of the work for us
7.10 Example – 2SLS with diagnostics
- 2SLS estimation with father and mother education as instruments for education
- We have two external instruments; thus the model is overidentified
library(wooldridge); library(AER); library(texreg)
data("card")
# Complication: Because of missing values in `fatheduc` and `motheduc`,
# which would cause some problems when we carry out some tests by hand,
# we generate a new data set `card1` with missings in `fatheduc`/`motheduc` excluded
card1 <- subset(card, card$fatheduc>-1 & card$motheduc>-1)
# 2SLS estimation
iv12 <- ivreg(lwage ~ educ + exper + I(exper^2) + black + smsa + south |
fatheduc + motheduc + exper + I(exper^2) + black + smsa + south,
data=card1)
# Saving 2SLS residuals
resid_iv <- iv12$residuals
# To get the three described diagnostic tests for 2SLS, we have to set
# the option "diagnostics=TRUE"
# summary(iv12, diagnostics = TRUE)
Code
# Modifications for modelsummary to print diagnostic statistics for ivreg
library(broom)
glance_custom.ivreg <- function(x, ...) {
  # Diagnostics table: rows are weak instruments, Wu-Hausman, Sargan;
  # column 3 holds the test statistic, column 4 the p-value
  dg <- summary(x, diagnostics = TRUE)$diagnostics
  stat_p <- function(i) paste0(sprintf('%4.2f', dg[i, 3]),
                               " [", sprintf('%4.3f', dg[i, 4]), "]")
  data.frame("Diagnostics" = " ",
             "Weak Instr"  = stat_p(1),
             "Hausman WU"  = stat_p(2),
             "Sargan"      = stat_p(3))
}
Code
#summary(iv12, diagnostics = TRUE)
library(modelsummary)
modelsummary(list( "2SLS" = iv12 ),
shape = term ~ statistic,
statistic = c('std.error', 'statistic', 'p.value', 'conf.int'),
stars = TRUE,
gof_omit = "A|L|B|F",
align = "ldddddd",
fmt= 4,
output = "gt")
2SLS

|             | Est.       | S.E.   | t       | p      | 2.5 %   | 97.5 %  |
|-------------|------------|--------|---------|--------|---------|---------|
| (Intercept) | 4.2642***  | 0.2189 | 19.4792 | <1e-04 | 3.8349  | 4.6934  |
| educ        | 0.0999***  | 0.0128 | 7.8341  | <1e-04 | 0.0749  | 0.1249  |
| exper       | 0.0989***  | 0.0095 | 10.3954 | <1e-04 | 0.0802  | 0.1175  |
| I(exper^2)  | -0.0024*** | 0.0004 | -6.1028 | <1e-04 | -0.0032 | -0.0017 |
| black       | -0.1506*** | 0.0260 | -5.8009 | <1e-04 | -0.2015 | -0.0997 |
| smsa        | 0.1509***  | 0.0196 | 7.6933  | <1e-04 | 0.1125  | 0.1894  |
| south       | -0.1073*** | 0.0181 | -5.9364 | <1e-04 | -0.1427 | -0.0718 |
| Num.Obs.    | 2220       |        |         |        |         |         |
| R2          | 0.253      |        |         |        |         |         |
| RMSE        | 0.38       |        |         |        |         |         |
| Diagnostics |            |        |         |        |         |         |
| Weak.Instr  | 127.78 [0.000] | |  |        |         |         |
| Hausman.WU  | 3.97 [0.047]   | |  |        |         |         |
| Sargan      | 2.05 [0.152]   | |  |        |         |         |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Test for weak instruments by hand
# First stage regression
first <- lm(educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa + south, data = card1)
# Testing whether coefficients of external instruments are jointly zero
lht(first, c("motheduc", "fatheduc"))
Linear hypothesis test
Hypothesis:
motheduc = 0
fatheduc = 0
Model 1: restricted model
Model 2: educ ~ fatheduc + motheduc + exper + I(exper^2) + black + smsa +
south
Res.Df RSS Df Sum of Sq F Pr(>F)
1 2214 8594.3
2 2212 7704.2 2 890.12 127.78 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Doing the Hausman-Wu test by hand
# We need the residuals of the first stage regression of educ on all exogenous variables
resid1 <- first$residuals
# Regressing the model of interest with residuals of the first stage regression
# as additional variable
# Hausman-Wu test; Look at the p-value of resid1
# Further, compare estimated coefficients with the 2SLS estimates; they are identical
Hausman_Wu <- lm(lwage ~ educ + exper + I(exper^2) + black + smsa + south +
resid1, data = card1)
Code
library(modelsummary)
modelsummary(list( "Hausman_Wu" = Hausman_Wu ),
shape = term ~ statistic,
statistic = c('std.error', 'statistic', 'p.value', 'conf.int'),
stars = TRUE,
gof_omit = "A|L|B|F",
align = "ldddddd",
fmt= 4,
output = "gt")
Hausman_Wu

|             | Est.       | S.E.   | t       | p      | 2.5 %   | 97.5 %  |
|-------------|------------|--------|---------|--------|---------|---------|
| (Intercept) | 4.2642***  | 0.2171 | 19.6427 | <1e-04 | 3.8384  | 4.6899  |
| educ        | 0.0999***  | 0.0126 | 7.8998  | <1e-04 | 0.0751  | 0.1247  |
| exper       | 0.0989***  | 0.0094 | 10.4826 | <1e-04 | 0.0804  | 0.1174  |
| I(exper^2)  | -0.0024*** | 0.0004 | -6.1540 | <1e-04 | -0.0032 | -0.0017 |
| black       | -0.1506*** | 0.0257 | -5.8496 | <1e-04 | -0.2011 | -0.1001 |
| smsa        | 0.1509***  | 0.0195 | 7.7578  | <1e-04 | 0.1128  | 0.1891  |
| south       | -0.1073*** | 0.0179 | -5.9862 | <1e-04 | -0.1424 | -0.0721 |
| resid1      | -0.0266*   | 0.0134 | -1.9915 | 0.0465 | -0.0528 | -0.0004 |
| Num.Obs.    | 2220       |        |         |        |         |         |
| R2          | 0.266      |        |         |        |         |         |
| RMSE        | 0.38       |        |         |        |         |         |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Note that the squared t-statistic of `resid1`, (-1.9915)^2 \approx 3.97, matches the Hausman-Wu diagnostic statistic reported by `ivreg` above
Sargan test by hand
```{r}
#| comment: " "
# Regression of IV residuals on all exogenous variables
sargan <- lm(resid_iv ~ fatheduc + motheduc + exper + I(exper^2) +
             black + smsa + south, data = card1)
# Test statistic: J = n * R^2
J <- length(sargan$residuals) * summary(sargan)$r.squared
print("Result of Sargan test")
print( paste("J-stat =", sprintf("%.3f", J), " p-value =", sprintf("%.4f", 1 - pchisq(J, 1))) )
```
[1] "Result of Sargan test"
[1] "J-stat = 2.051 p-value = 0.1522"
In principle, every variable that is not part of a correctly specified structural equation and that is uncorrelated with the error term u (condition 1 is met) could serve as an external instrument, especially if condition 3 is met. So, strictly speaking, condition 2 is redundant, but it is nonetheless helpful for the distinction between external and internal instruments and to understand the basic problem of finding external instruments.↩︎
If the strong requirements for proxy variables are satisfied, see Equation 2.47 and the following analysis.↩︎
In particular, OLS predicted values \hat y = \mathbf Py and OLS residuals \hat u = \mathbf My are always uncorrelated because \mathbf P and \mathbf M are orthogonal matrices, see Equation C.10 and Equation C.11.↩︎
This is not the case if the 2SLS estimates are calculated by hand, as described in the text following Equation 7.9. The reason is that in this case, in the second stage regression, the exogenous variables remain unchanged as regressors and are not replaced by their linear projections. This can lead to a correlation of an exogenous variable which was left out in the first stage regression with the residuals of the first stage regression, violating MLR.4’ in Equation 7.12.↩︎
Note that generally, the 2SLS residuals do not retain the orthogonality property of their OLS counterparts, which makes the argument in the text considerably more complicated to prove.
First of all, we have to distinguish the residuals of Equation 7.51, \hat v from the 2SLS residuals:
The former are defined as \hat v = y-\hat {\boldsymbol \beta} \mathbf x - \hat x_j \hat \beta_j and the latter are \hat u = y-\hat {\boldsymbol \beta} \mathbf x - \tilde x_j \hat \beta_j. Substituting \tilde x_j = \hat x_j +\hat e from the first stage regression we get: \hat u = y-\hat {\boldsymbol \beta} \mathbf x - ( \hat x_j +\hat e) \hat \beta_j \; = \; y-\hat {\boldsymbol \beta} \mathbf x - \hat x_j \hat\beta_j - \hat e \hat \beta_j \; \Rightarrow \; \hat u = (\hat v - \hat e \hat \beta_j), which is not obvious.
Secondly, we prove that \mathbf x is uncorrelated with \hat u = \hat v - \hat\beta_j \hat e:
Because of the orthogonality property of OLS it follows from the first stage regression that \mathbf x is uncorrelated with the first stage residuals \hat e. From the second stage regression Equation 7.51 it follows that \mathbf x is uncorrelated with \hat v as well. Thus, \mathbf x is uncorrelated with \hat u.
Thirdly, we prove that z_1 is uncorrelated with \hat u = \hat v - \hat\beta_j \hat e:
If we have only one instrument z_1, \hat x_j in Equation 7.51 is: \hat x_j = \mathbf x \hat {\boldsymbol \gamma_1} + z_1 \hat \gamma_{2,1}. As \hat x_j and \mathbf x are uncorrelated with \hat v (orthogonality in Equation 7.51), z_1 must be uncorrelated with \hat v as well.
Furthermore, z_1 is uncorrelated with the first stage residuals \hat e as well, because of the orthogonality property of OLS. Hence, z_1 is uncorrelated with \hat u.
Therefore, both \mathbf x and z_1 are uncorrelated with \hat u from Equation 7.53, leading to R^2=0 in this case.↩︎

Regarding the third argument of the previous footnote, we now have:
\hat x_j = \mathbf x \hat {\boldsymbol \gamma}_1 + z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2}. As \hat x_j and \mathbf x are uncorrelated with \hat v (orthogonality in Equation 7.51), the linear combination z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2} must be uncorrelated with \hat v as well.
Furthermore, z_1 and z_2 are uncorrelated with the first stage residuals \hat e, because of the orthogonality property of OLS. We conclude that the linear combination z_1 \hat \gamma_{2,1} + z_2 \hat \gamma_{2,2} is uncorrelated with \hat u, but not z_1 or z_2 by themselves.
Hence, z_1 and z_2 are generally correlated with \hat u in Equation 7.53 and so, we generally have R^2 \neq 0.↩︎