## 6.1 Consequences of heteroscedasticity for OLS

Heteroscedasticity means that the conditional variance of the error term is **not constant** but depends on some explanatory variables:

$$\operatorname{Var}(u_i \mid \mathbf x) = \sigma_i^2 = \sigma^2 h(\mathbf x)$$

**Consequences**:

1. OLS is still *unbiased* and *consistent* under heteroscedasticity
2. The interpretation of R-squared is also unchanged
3. Heteroscedasticity invalidates the **variance** formulas for the OLS estimators
   - Hence, the usual F-test and t-test formulas are **not valid** under heteroscedasticity
4. Under heteroscedasticity, OLS is *no longer the best linear unbiased estimator* (BLUE); there may be more efficient linear estimators
## 6.2 Heteroscedasticity-robust inference
Formulas for OLS standard errors and related statistics have been developed that are robust to heteroscedasticity of unknown form.
- These **heteroscedasticity-robust** (corrected) **standard errors** are called HC standard errors, HCCM, or White (Huber) standard errors. For the single-variable case one can show that

$$\operatorname{Var}(\hat\beta_1) = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)^2\, \sigma_i^2}{SST_x^2} \tag{6.1}$$

- White (1980) showed that one can replace $\sigma_i^2$ by the squared OLS residuals $\hat u_i^2$ to get a consistent estimate of $\operatorname{Var}(\hat\beta_1)$:

$$\widehat{\operatorname{Var}(\hat\beta_1)} = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)^2\, \hat u_i^2}{SST_x^2} \tag{6.2}$$
- Note: if $\sigma_i^2 = \sigma^2$, i.e. constant, then the formula in Equation 6.1 reduces to the usual one, $\operatorname{Var}(\hat\beta_1) = \sigma^2 / SST_x$ (Equation 1.19)
- In the **multivariate** case, $(x_i - \bar x)^2$ is replaced by $\hat r_{ij}^2$, where $\hat r_{ij}$ are the residuals of a regression of $x_j$ on all other explanatory variables, and $SST_x$ is replaced by $SST_j$ of that regression (see FWL theorem, Section 2.5)
The formula in Equation 6.2 for the **heteroscedasticity-robust estimate of the variance is only valid in large samples**

- Using Equation 6.2, the usual *t*-test is asymptotically valid
- Heteroscedasticity-robust versions of the *F*-statistic are also available in most software
- In most software packages, these heteroscedasticity-corrected formulas for the standard errors (in general, for the covariance matrix) are invoked by simply setting an option in the call to the estimation or test procedure
#### Matrix notation (a digression)
Under homoscedasticity we assume the covariance matrix:

$$\operatorname{Var}(\mathbf u \mid \mathbf X) = \sigma^2 \mathbf I$$

Heteroscedasticity means $\operatorname{Var}(u_i \mid \mathbf x_i) = \sigma_i^2$, $i = 1, \dots, n$, with the covariance matrix:

$$\operatorname{Var}(\mathbf u \mid \mathbf X) \ = \ \sigma^2 \mathbf\Omega \ = \ \sigma^2 \underbrace{\begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix}}_{\mathbf\Omega} \ = \ \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} \tag{6.3}$$

with $\sigma_i^2 := \sigma^2 w_i$.
We use the normalization $\sum_{i=1}^{n} w_i = n$ and $w_i > 0$. This normalization has the advantage that with $w_i = 1 \ \forall i$, we immediately obtain the homoscedastic case with $\operatorname{Var}(\mathbf u \mid \mathbf X) = \sigma^2 \mathbf I$. Further, we can interpret $\sigma^2$ as the average variance $\bar\sigma^2$, and the $w_i$'s are simply scaling factors.
The covariance matrix for $\boldsymbol{\hat\beta}$ is (using Equation C.4):

$$\operatorname{Var}(\boldsymbol{\hat\beta} \mid \mathbf X) \ = \ E\left[(\boldsymbol{\hat\beta}-\boldsymbol\beta)(\boldsymbol{\hat\beta}-\boldsymbol\beta)' \mid \mathbf X\right] \ = \ (\mathbf X'\mathbf X)^{-1}\mathbf X' \underbrace{E(\mathbf{uu}' \mid \mathbf X)}_{\sigma^2\mathbf\Omega}\, \mathbf X(\mathbf X'\mathbf X)^{-1}$$

$$= \ \sigma^2(\mathbf X'\mathbf X)^{-1}\,\mathbf X'\mathbf\Omega\mathbf X\,(\mathbf X'\mathbf X)^{-1} \tag{6.4}$$

with $\mathbf\Omega$ being the diagonal matrix of weights defined above. If $\mathbf\Omega = \mathbf I$, the formula collapses to the standard (homoscedastic) covariance matrix for $\boldsymbol{\hat\beta}$:

$$\sigma^2(\mathbf X'\mathbf X)^{-1} \tag{6.5}$$
##### Estimating the general covariance matrix for $\boldsymbol{\hat\beta}$
- As we assume that the form of the heteroscedasticity is unknown, we would have to estimate the $n$ weights $w_i$ from Equation 6.3; but these $n$ parameters cannot be estimated consistently with only $n-k-1$ degrees of freedom left
- Fortunately, it suffices to estimate only the middle part of the *sandwich matrix* in Equation 6.4, which is a $(k+1) \times (k+1)$ matrix
- We divide by $n^2$ so that a plim exists (the typical element of this matrix is a weighted average of expectations):

$$\dfrac{\sigma^2 \mathbf X' \mathbf\Omega \mathbf X}{n^2} = \dfrac{1}{n^2} \sum_{i=1}^{n} w_i\, \sigma^2\, \mathbf x_i' \mathbf x_i = \dfrac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2\, \mathbf x_i' \mathbf x_i$$
- The basic idea is that, because of the consistency of $\boldsymbol{\hat\beta}$, the residuals $\hat{\mathbf u}$ become more and more similar to the error terms $\mathbf u$ as the sample grows
- Therefore, as $\sigma_i^2 = E(u_i^2 \mid \mathbf x_i)$, White (1980) showed that replacing $\sigma_i^2$ with $\hat u_i^2$ (as $u_i^2$ is not observable) leads to a consistent estimate:

$$\dfrac{\widehat{\sigma^2 \mathbf X' \mathbf\Omega \mathbf X}}{n^2} = \dfrac{1}{n^2} \sum_{i=1}^{n} \hat u_i^2\, \mathbf x_i' \mathbf x_i$$

Hence, a consistent estimate of the heteroscedasticity-robust covariance matrix is:

$$\widehat{\operatorname{Var}(\boldsymbol{\hat\beta} \mid \mathbf X)}_{HC} \ = \ (\mathbf X'\mathbf X)^{-1} \left( \sum_{i=1}^{n} \hat u_i^2\, \mathbf x_i' \mathbf x_i \right) (\mathbf X'\mathbf X)^{-1}$$
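#### Example: Hourly wage equation

The table below compares ordinary and HC standard errors for a log-wage equation. The following R code (using the `wage1` data from the `wooldridge` package) reproduces the two sets of estimates; the "by hand" part at the end is an illustrative sketch of the sandwich formula above, using the HC0 (White) variant rather than the HC3 default of `vcovHC()`:

```r
library(AER)        # loads lmtest (coeftest) and sandwich (vcovHC)
library(wooldridge) # wage1 data
data(wage1)

lmout <- lm(lwage ~ educ + exper + expersq, data = wage1)

coeftest(lmout)                  # ordinary (IID) standard errors
coeftest(lmout, vcov. = vcovHC)  # HC standard errors (default type "HC3")

# Illustration: White's HC0 estimator "by hand" via the sandwich formula
X  <- model.matrix(lmout)
u2 <- resid(lmout)^2
XtXinv <- solve(crossprod(X))             # (X'X)^{-1}
V_HC0  <- XtXinv %*% t(X) %*% (u2 * X) %*% XtXinv
sqrt(diag(V_HC0))   # equals sqrt(diag(vcovHC(lmout, type = "HC0")))
```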
**Wage equation with ordinary and HC se (se and t-statistics in brackets)**

|             | Ord. se    | HC se      |
|-------------|------------|------------|
| (Intercept) | 0.1280     | 0.1280     |
|             | (0.1059)   | (0.1084)   |
|             | (1.2083)   | (1.1804)   |
| educ        | 0.0904***  | 0.0904***  |
|             | (0.0075)   | (0.0079)   |
|             | (12.1004)  | (11.4550)  |
| exper       | 0.0410***  | 0.0410***  |
|             | (0.0052)   | (0.0051)   |
|             | (7.8916)   | (8.1162)   |
| expersq     | -0.0007*** | -0.0007*** |
|             | (0.0001)   | (0.0001)   |
|             | (-6.1639)  | (-6.4471)  |
| Num.Obs.    | 526        | 526        |
| R2          | 0.300      | 0.300      |
| R2 Adj.     | 0.296      | 0.296      |
| RMSE        | 0.44       | 0.44       |
| Std.Errors  | IID        | Custom     |

\+ p < 0.1, \* p < 0.05, \*\* p < 0.01, \*\*\* p < 0.001
##### F-test

```r
# Testing linear hypothesis
lht(lmout, c("exper", "expersq"))
```

```
Linear hypothesis test

Hypothesis:
exper = 0
expersq = 0

Model 1: restricted model
Model 2: lwage ~ educ + exper + expersq

  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    524 120.77                                  
2    522 103.79  2    16.979 42.696 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
##### F-test with HC test statistic

```r
# Testing linear hypothesis with hccm by setting the option "vcov. = hccm"
lht(lmout, c("exper", "expersq"), vcov. = hccm)
```

```
Linear hypothesis test

Hypothesis:
exper = 0
expersq = 0

Model 1: restricted model
Model 2: lwage ~ educ + exper + expersq

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F    Pr(>F)    
1    524                        
2    522  2 42.905 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
#### Summary

- Using heteroscedasticity-corrected standard errors (or an HC covariance matrix) leads to asymptotically valid test statistics
- But it may still be interesting to know whether we have a heteroscedasticity problem at all, as OLS is not the most efficient estimator in that case
- Furthermore, the small-sample properties of HC standard errors are uncertain. So, if there is no heteroscedasticity problem, it is better to use the standard formulas
## 6.3 Breusch-Pagan test

The most widely used procedure for testing for **heteroscedasticity** is the Breusch-Pagan test (Breusch and Pagan 1979).

We want to test for a constant variance:

$$H_0: \ \operatorname{Var}(u \mid \mathbf x) = \sigma^2$$

This implies (see Appendix A):

$$\operatorname{Var}(u \mid \mathbf x) = E(u^2 \mid \mathbf x) - [\underbrace{E(u \mid \mathbf x)}_{\text{MLR.4:} \ = \ 0}]^2 = E(u^2 \mid \mathbf x)$$

Hence, the $H_0$ becomes:

$$\Rightarrow \ H_0: \ E(u^2 \mid \mathbf x) = E(u^2) = \sigma^2$$

> Therefore, to detect dependencies of $\operatorname{Var}(u)$ on the explanatory variables, estimate an **auxiliary equation**: regress $\hat u_i^2$ on all explanatory variables of the original model

$$\hat u^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \epsilon \tag{6.10}$$

with the $H_0$:

$$H_0: \ \delta_1 = 0, \ \delta_2 = 0, \ \dots, \ \delta_k = 0$$

There are two test statistics to carry out the test: the usual *F*-statistic of the auxiliary regression, which tests whether all slope coefficients are zero, and the Lagrange multiplier (LM) variant, the actual Breusch-Pagan statistic:

$$LM = n R_{\hat u^2}^2 \ \sim \ \chi^2(k)$$
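#### Example: House prices

The output below comes from the house-price example (`hprice1` data from the `wooldridge` package). A sketch of the steps behind it: re-estimate the model, run the auxiliary regression of the squared residuals, and compute the LM statistic $n R_{\hat u^2}^2$ by hand:

```r
# Example: house prices (re-using the name lmout for the new model)
data("hprice1")
lmout <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1)

# Auxiliary equation: squared residuals on all explanatory variables
bp  <- lm(lmout$residuals^2 ~ lotsize + sqrft + bdrms, data = hprice1)
sbp <- summary(bp)

# LM variant: n * R^2 is chi^2(3) distributed; n = df + k + 1 with k = 3
n  <- sbp$df[2] + 3 + 1
LM <- n * sbp$r.squared
cat("Chi2 =", LM, " p-value =", 1 - pchisq(LM, 3))
```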
```r
# F test
# With summary(bp) we directly get the F-statistic of the
# auxiliary equation (last line of the output)
summary(bp)
```

```
Call:
lm(formula = lmout$residuals^2 ~ lotsize + sqrft + bdrms, data = hprice1)

Residuals:
   Min     1Q Median     3Q    Max 
 -9044  -2212  -1256    -97  42582 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -5.523e+03  3.259e+03  -1.694  0.09390 . 
lotsize      2.015e-01  7.101e-02   2.838  0.00569 **
sqrft        1.691e+00  1.464e+00   1.155  0.25128   
bdrms        1.042e+03  9.964e+02   1.046  0.29877   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6617 on 84 degrees of freedom
Multiple R-squared:  0.1601,    Adjusted R-squared:  0.1301 
F-statistic: 5.339 on 3 and 84 DF,  p-value: 0.002048
```
- Hence, both test variants strongly reject the $H_0$ of homoscedasticity
#### Breusch-Pagan test with `bptest()`

Much more conveniently, we can use the `bptest()` function to directly perform a Breusch-Pagan test:
```r
# Default: all explanatory variables are used
bptest(lmout)
```

```
    studentized Breusch-Pagan test

data:  lmout
BP = 14.092, df = 3, p-value = 0.002782
```
- The test result is identical to the LM test computed above
```r
# The same model in logs
lmout2 <- lm(log(price) ~ log(lotsize) + log(sqrft) + bdrms, data = hprice1)
bptest(lmout2)
```

```
    studentized Breusch-Pagan test

data:  lmout2
BP = 4.2232, df = 3, p-value = 0.2383
```
Taking logs often reduces the heteroscedasticity problem
#### White test
With the Breusch-Pagan test, $\hat u^2$ is regressed on all explanatory variables of the model to detect dependencies of $\sigma_i^2$ on these variables
- But these dependencies could have a more complicated, non-linear character
- This aspect is tackled by the **White test**
- Here, $\hat u^2$ is regressed not only on all explanatory variables but **additionally** on the **squares of the variables and all cross-products** (interactions)

To carry out the test, we can use the `bptest()` function and directly specify all the variables we want to have in the auxiliary equation.

- So, the White test differs from the Breusch-Pagan test in that the former also includes non-linear transformations of the explanatory variables
```r
# Model in levels
# The "~" in the function call means that the following is a formula
bptest(lmout, ~ (lotsize + sqrft + bdrms)^2 + I(lotsize^2) + I(sqrft^2) + I(bdrms^2),
       data = hprice1)
```

```
    studentized Breusch-Pagan test

data:  lmout
BP = 33.732, df = 9, p-value = 9.953e-05
```
A disadvantage of this form of the White test is that including all squares and interactions leads to a large number of estimated parameters (e.g., $k=6$ leads to 27 parameters to be estimated)
#### A variant of the White test

Regress $\hat u^2$ on the fitted values, $\hat y_i$, and the squared fitted values, $\hat y_i^2$. This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares, and interactions, because the predicted value of $y_i$ and its square implicitly contain all of these terms.

$$\hat u^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat y^2 + \epsilon$$

$$H_0: \ \delta_1 = \delta_2 = 0, \qquad LM = n R_{\hat u^2}^2 \ \sim \ \chi^2(2)$$
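As before, `bptest()` carries out this variant if we hand it the corresponding auxiliary formula; a sketch for the house-price model in levels:

```r
# White test variant: squared residuals on fitted values and their squares
bptest(lmout, ~ lmout$fitted.values + I(lmout$fitted.values^2))
```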
## 6.4 Weighted least squares (WLS)

### 6.4.1 Known heteroscedasticity function

Suppose the original model

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$$

exhibits heteroscedasticity of a **known** functional form:

$$\operatorname{Var}(u_i \mid \mathbf x_i) = \sigma_i^2 = \sigma^2 h_i, \quad h_i = h(\mathbf x_i) > 0 \tag{6.11}$$

Dividing the whole model by $\sqrt{h_i}$ yields a transformed model with **homoscedastic errors**:

$$\operatorname{Var}\left(\frac{u_i}{\sqrt{h_i}} \,\Big|\, \mathbf x_i\right) = \frac{1}{h_i}\operatorname{Var}(u_i \mid \mathbf x_i) = \frac{1}{h_i}\sigma^2 h_i = \sigma^2$$

If the other Gauss-Markov assumptions hold as well (MLR.1 - MLR.5, see Section 2.7), OLS applied to the *transformed* model is the best linear unbiased estimator. This procedure is called **weighted least squares** (WLS), a special case of **generalized least squares** (GLS).[^1]
#### Important special case of heteroscedasticity

- If the observations are reported as **averages** at the city / county / state / country or firm level, *they should be weighted by the size of the unit*
- For instance, when regressing the *average* contribution to pensions of firm $i$ on the *average* earnings of firm $i$ and the *average* age of employees in firm $i$, one should multiply the variables by the square root of the number of employees
- The reason is that the larger the firm, the less variation there is in the data and in $u_i$ (because of the LLN, Theorem A.2)
- In particular, if the errors are homoscedastic at the individual level, WLS with weights equal to the group size should be used; the variables then have to be multiplied by $\sqrt{n}$, $n$ being the size of the respective group (a minimal sketch follows this list)
- If the assumption of homoscedasticity at the individual level is not exactly right, one can calculate robust standard errors after WLS (i.e., for the transformed model)
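A minimal sketch of this special case, with a hypothetical data frame `firmdata` and assumed variable names (`avgpen`, `avgearn`, `avgage` for firm-level averages, `employ` for firm size):

```r
# Hypothetical firm-level data: avgpen, avgearn, avgage are firm averages,
# employ is the number of employees (the group size).
# weights = employ corresponds to multiplying every variable, including
# the intercept, by sqrt(employ).
wls <- lm(avgpen ~ avgearn + avgage, data = firmdata, weights = employ)
```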
### 6.4.2 Unknown heteroscedasticity function (FWLS)

If the particular form of the heteroscedasticity is unknown, i.e., we don't know $h_i = h(\mathbf x_i)$ from Equation 6.11, we have to *estimate* $h_i$ in advance. Afterwards, we proceed as with WLS. This procedure is called **feasible weighted least squares** (FWLS).
- Simply regressing $\hat u_i^2$ on the explanatory variables, as we did for the Breusch-Pagan test with Equation 6.10, does not work: the estimated $h_i$ (the fitted values of such an equation) have to be strictly positive (variances cannot be negative), but fitted values from a linear regression can be negative
- Therefore, we assume as in Equation 6.11 that $\operatorname{Var}(u_i \mid \mathbf x_i) = \sigma_i^2 = \sigma^2 h(\mathbf x_i)$, but furthermore presume the special form $h(\mathbf x_i) = \exp(\mathbf x_i \boldsymbol\delta)$, which always generates positive values
- Hence, as $\operatorname{Var}(u_i \mid \mathbf x_i) = E(u_i^2 \mid \mathbf x_i)$, we have

$$E(u_i^2 \mid \mathbf x_i) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$$

Replacing $E(u_i^2 \mid \mathbf x_i)$ with the observed $\hat u^2$ (and a multiplicative error $v$) and taking logs, we arrive at a log-linear **auxiliary equation** that can be estimated by OLS:

$$\log(\hat u^2) = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \log v$$
- Use the inverse square root of the estimated heteroscedasticity function, $1/\sqrt{\hat h_i}$, to transform the data, as we did in the case of a known heteroscedasticity function
  - Software packages often expect a special input form for the weights in their WLS options, e.g. $w_i = 1/\hat h_i$ in R
  - For other packages, consult the manual to see which input form of the weights is expected
Feasible WLS (a special case of feasible GLS) is consistent and asymptotically more efficient than OLS
#### Recipe for applying FWLS

1. Estimate the model of interest with OLS and calculate the residuals $\hat u_i$
2. Calculate the log of the squared residuals, $\log \hat u_i^2$, and regress these on the independent variables to get the fitted values $\widehat{\log \hat u_i^2}$
3. The variance factors we are looking for are the $\exp(\cdot)$ (anti-logarithms) of the fitted values: $\hat h_i = \exp(\widehat{\log \hat u_i^2})$, which are strictly positive. FWLS is then WLS with the estimated $\hat h_i$ instead of the known $h_i$
4. There are two possible ways to proceed further:
   - *Either* transform all the model variables (including the intercept!) by multiplying them with $1/\sqrt{\hat h_i}$ and estimate the *transformed* model with OLS to get FWLS
   - *Or* use the option `weights` = $1/h$ of the `lm()` function in R, where `h` is the estimated factor $\hat h_i$

The latter procedure is much more convenient than transforming the variables "by hand", but notice that in R the FWLS residuals are not stored in the output list of `lm()`. If you want the FWLS residuals, you have to look for them in the `summary()` output list.
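The recipe can also be wrapped into a small helper function; a minimal sketch (a hypothetical helper, not from the text, assuming no missing values and no exactly-zero residuals):

```r
# Minimal FWLS helper (sketch): steps 1-3 estimate h_i, step 4 reruns lm()
fwls_fit <- function(formula, data) {
  ols <- lm(formula, data = data)          # step 1: OLS, get residuals
  X   <- model.matrix(ols)                 # regressors incl. intercept
  aux <- lm(log(resid(ols)^2) ~ X - 1)     # step 2: log squared residuals on x's
  data$.w <- 1 / exp(fitted(aux))          # step 3: weights 1/h_hat
  lm(formula, data = data, weights = .w)   # step 4: WLS with estimated h_i
}
```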
#### Example: FWLS

As an example, we estimate the demand for cigarettes as a function of income, prices, education, age, age², and a dummy variable measuring restrictions on smoking in restaurants. Of special interest are the effects of income, prices, and restaurant smoking restrictions.

In R, we load the data set `smoke` from the `wooldridge` package. Further, we load the package `AER`, which contains many useful econometric tools, for instance HC standard errors, and the package `modelsummary` to make fancy output tables.
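We estimate the model by OLS, test for heteroscedasticity, and plot the squared residuals against the fitted values; the following code sketches these steps:

```r
library(wooldridge)   # smoke data
library(AER)          # coeftest(), vcovHC(), bptest()
library(modelsummary) # output tables
data(smoke)

# Estimate the demand for cigarettes -> out
out <- lm(cigs ~ log(income) + log(cigpric) + educ + age + I(age^2) + restaurn,
          data = smoke)
coeftest(out)

# Breusch-Pagan test -> clear evidence for strong heteroscedasticity
bptest(out)

# Plot squared residuals versus fitted values
sqres <- out$residuals^2
yhat  <- out$fitted.values
plot(sqres ~ yhat)
```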
**First line of defense** against the strong heteroscedasticity: We calculate **HC standard errors**, so that at least the t-tests (and all other tests) are asymptotically valid. To do this, we use the `vcov. = vcovHC` option in `coeftest()`:

```r
# Calculate HC standard errors with option vcovHC
coeftest(out, vcov. = vcovHC)
```
- Surprisingly, the calculated HC t-statistics are not very different from the uncorrected ones
**The second-line strategy** is to apply **FWLS** (FGLS). To do this, we first estimate the **auxiliary equation** with the log (!) of the squared residuals depending on all explanatory variables, Equation 6.10:

```r
aux <- lm(log(sqres) ~ log(income) + log(cigpric) + educ + age + I(age^2) + restaurn,
          data = smoke)
```
Applying exp (the inverse function of log) to the fitted values of the auxiliary equation yields the estimated variance factors $\hat h_i$:

```r
h_hat <- exp(aux$fitted.values)
```

We calculate the weights $1/\hat h_i$, which is the form R expects:

```r
# The lm() function needs the weights in the form 1/h_i
w <- 1/h_hat
```
We once more call `lm()` for the original model, but now with the option `weights = w`, which provides the **FWLS estimates**, stored in the list `fwls`:

```r
# FWLS (feasible GLS), weights = 1/h_hat
fwls <- lm(cigs ~ log(income) + log(cigpric) + educ + age + I(age^2) + restaurn,
           data = smoke, weights = w)
# Coefficients and test statistics of FWLS
coeftest(fwls)
```
**Comparing the results of OLS, OLS with HC t-statistics, and FWLS (se and t-statistics in brackets)**

|              | OLS        | OLS HC     | FWLS       |
|--------------|------------|------------|------------|
| log(income)  | 0.8803     | 0.8803     | 1.2952**   |
|              | (0.7278)   | (0.6014)   | (0.4370)   |
|              | (1.2095)   | (1.4637)   | (2.9639)   |
| log(cigpric) | -0.7509    | -0.7509    | -2.9403    |
|              | (5.7733)   | (6.0899)   | (4.4601)   |
|              | (-0.1301)  | (-0.1233)  | (-0.6592)  |
| educ         | -0.5015**  | -0.5015**  | -0.4634*** |
|              | (0.1671)   | (0.1631)   | (0.1202)   |
|              | (-3.0016)  | (-3.0743)  | (-3.8570)  |
| age          | 0.7707***  | 0.7707***  | 0.4819***  |
|              | (0.1601)   | (0.1395)   | (0.0968)   |
|              | (4.8132)   | (5.5251)   | (4.9784)   |
| I(age^2)     | -0.0090*** | -0.0090*** | -0.0056*** |
|              | (0.0017)   | (0.0015)   | (0.0009)   |
|              | (-5.1765)  | (-6.1091)  | (-5.9897)  |
| restaurn     | -2.8251*   | -2.8251**  | -3.4611*** |
|              | (1.1118)   | (1.0114)   | (0.7955)   |
|              | (-2.5410)  | (-2.7932)  | (-4.3508)  |
| Num.Obs.     | 807        | 807        | 807        |
| R2           | 0.053      | 0.053      | 0.113      |
| RMSE         | 13.35      | 13.35      | 13.40      |
| Std.Errors   | IID        | Custom     | IID        |

\+ p < 0.1, \* p < 0.05, \*\* p < 0.01, \*\*\* p < 0.001
- The differences between OLS and OLS with HC se and t-statistics are not large but noticeable
- The differences between the OLS and FWLS estimates are quite large: income is now statistically significant, and the estimated effect of cigarette prices on cigarette demand is substantially larger as well (more than three times as large)
- Furthermore, the effect of smoking restrictions is more substantial and more significant
- These much more significant results might demonstrate the expected efficiency gains of WLS
- But a word of caution: under our standard assumptions, both OLS and FWLS are consistent
  - Hence, in a large sample they should converge to the *same* true coefficients
  - In our example, the differences are quite large, although we have more than 800 observations. This might be a hint that one of our standard assumptions is not valid, especially $E(u_i \mid \mathbf x_i) = 0$
  - However, whether these differences in the estimated coefficients are overall significant is unclear and has to be tested -- Hausman test
  - Actually, it is quite plausible that cigarette prices and smoking restrictions by governments vary in response to cigarette demand. But this would violate $E(u_i \mid \mathbf x_i) = 0$, leading to biased and inconsistent estimates even for FWLS and FGLS
  - IV estimators are a possible remedy
## References

Breusch, Trevor S., and Adrian R. Pagan. 1979. "A Simple Test for Heteroscedasticity and Random Coefficient Variation." *Econometrica: Journal of the Econometric Society*, 1287–94.

White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica: Journal of the Econometric Society*, 817–38.

Wooldridge, Jeffrey M. 2019. *Introductory Econometrics: A Modern Approach*. 7th ed. Boston: Cengage.
[^1]: For a general treatment of generalized least squares (GLS) see the discussion following Equation 9.45. For our particular problem at hand, the inverse square root matrix $\Omega^{-\frac{1}{2}}$ of the covariance matrix of heteroscedastic errors, $\sigma^2\Omega$, is a diagonal matrix with elements $\frac{1}{\sqrt{h_i}}$.