6  Heteroscedasticity

Mainly based on Wooldridge (2019), Chapter 6

6.1 Consequences of heteroscedasticity for OLS

Heteroscedasticity means that the conditional variance of the error term is not constant but depends on some explanatory variables, i.e.

{\operatorname {Var}}(u_i \mid \textbf{x}) = \sigma_i^2 = \sigma^2 h(\textbf{x})

Consequences:

  1. OLS is still unbiased and consistent under heteroscedasticity

  2. Also, the interpretation of R-squared is unchanged

  3. Heteroscedasticity invalidates variance formulas for OLS estimators

    • Hence, the usual F-test and t-test formulas are not valid under heteroscedasticity
  4. Under heteroscedasticity, OLS is no longer the best linear unbiased estimator (BLUE). There may be more efficient linear estimators


6.2 Heteroscedasticity-robust inference

Formulas for OLS standard errors and related statistics have been developed that are robust to heteroscedasticity of unknown form.

  • These heteroscedasticity-robust (corrected) standard errors are called HC standard errors, or HCCM, or White (Huber) standard errors. For the single variable case one can show that

\operatorname {Var}(\hat \beta_1) =\dfrac{\sum_{i=1}^{n} (x_i - \bar x)^2 \sigma_i^2 }{SST_x^2} \tag{6.1}

  • White (1980) showed that one can replace \sigma_i^2 by the squared OLS residuals \hat u_i^2 to get a consistent estimate for \operatorname {Var}(\hat \beta_1)

\widehat{\operatorname {Var}(\hat \beta_1)} = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)^2 \hat u_i^2 }{SST_x^2} \tag{6.2}

  • Note, if \sigma_i^2=\sigma^2, i.e., constant, then the above formula in Equation 6.1 reduces to the usual one: \operatorname {Var}(\hat \beta_1)=\sigma^2 / SST_x (Equation 1.19)

    • In the multivariate case, (x_i - \bar x)^2 is replaced by \hat r_{i,j}^2, the squared residuals of a regression of x_j on all the other explanatory variables, and SST_x is replaced by SSR_j, the sum of squared residuals of that regression (see FWL-Theorem, Section 2.5)

The formula from Equation 6.2 for the heteroscedasticity-robust estimate of the variance is only valid in large samples (a short hands-on check in R follows the list below)

  • Using Equation 6.2, the usual t-test is asymptotically valid

  • Heteroscedasticity robust versions of the F-statistic are also available in most software

  • In most software packages, these heteroscedasticity-corrected formulas for the standard errors (in general, for the covariance matrix) are invoked by simply setting an option in the call of the estimation or test procedure
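To make Equation 6.2 concrete, here is a minimal R sketch (using the wage1 data and the AER / wooldridge packages introduced in the example below) that computes the robust variance of the slope in a simple regression by hand and compares it with vcovHC(..., type = "HC0"):

# Minimal sketch: HC0-robust standard error of the slope in a simple regression,
# computed by hand via Equation 6.2 and compared with vcovHC(..., type = "HC0")
library(AER)          # provides vcovHC (via the sandwich package)
library(wooldridge)
data(wage1)

m    <- lm(lwage ~ educ, data = wage1)
x    <- wage1$educ
u2   <- residuals(m)^2
SSTx <- sum( (x - mean(x))^2 )

var_b1 <- sum( (x - mean(x))^2 * u2 ) / SSTx^2   # Equation 6.2
c(manual = sqrt(var_b1),
  vcovHC = sqrt( vcovHC(m, type = "HC0")["educ", "educ"] ))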


Matrix notation

Under homoscedasticity we assume the covariance matrix:

\operatorname {Var}(\mathbf{u} \mid \mathbf{X})=\sigma^{2} \mathbf{I}

Heteroscedasticity means:

\operatorname {Var}\left(u_{i} \mid \mathbf{x}_{i}\right)=\sigma_{i}^{2}, \quad i=1, \ldots, n,

with the covariance matrix:

\operatorname {Var}( \mathbf{u} \mid \mathbf{X}) \ = \ \sigma^{2} \mathbf{\Omega} \ =

\sigma^{2} \, \underbrace{ \left[ \begin{array}{cccc} w_{1} & 0 & \cdots & 0 \\ 0 & w_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{n} \end{array} \right] }_{\mathbf \Omega} \ = \ \left[\begin{array}{cccc} \sigma_{1}^{2} & 0 & \cdots & 0 \\ 0 & \sigma_{2}^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{n}^{2} \end{array}\right] \tag{6.3}

with \sigma_{i}^{2} := \sigma^{2} w_{i}.

We use the normalization: \sum_{i=1}^{n} w_{i} = n and w_i > 0. This normalization has the advantage that with w_i = 1, \forall i, we immediately recover the homoscedasticity case with \operatorname {Var}(\mathbf{u} \mid \mathbf{X})=\sigma^{2} \mathbf{I}

Further, we can interpret \sigma^2 as the average variance \bar \sigma^2, and the w_i's are simply scaling factors.


The covariance matrix for \hat \beta is (using Equation C.4):

\operatorname {Var}(\mathbf{\hat \beta} \mid \mathbf{X}) \ := \ E\left[ (\hat{\boldsymbol{\beta}}-E(\hat{\boldsymbol{\beta}}))(\hat{\boldsymbol{\beta}}-E(\hat{\boldsymbol{\beta}}))^{\prime} \mid \mathbf{X}\right]\ =

E\left[(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta})(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta})^{\prime} \mid \mathbf{X}\right] \ =

E\left[ ( \ \underbrace{(\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}^{\prime} \mathbf{u}}_{(\hat{\boldsymbol\beta}-\boldsymbol\beta)} \ )( \ \underbrace{(\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}' \mathbf{u}}_{(\hat{\boldsymbol\beta}-\boldsymbol\beta)} \ )^{\prime} \mid \mathbf{X}\right] \ =

E \left[ (\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}' \mathbf{u} \mathbf{u}^{\prime} \mathbf{X}(\mathbf{X}' \mathbf{X})^{-1} \mid \mathbf{X} \right] \ =

\left( \mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime} \underbrace{ \ E\left(\mathbf{u u}^{\prime} \mid \mathbf{X}\right)}_{\sigma^{2} \Omega} \ \mathbf{X}(\mathbf{X}' \mathbf{X})^{-1} \ =

\sigma^{2}(\mathbf{X}' \mathbf{X})^{-1} \, \mathbf{X}^{\prime} \mathbf{\Omega} \mathbf{X} \, (\mathbf{X}' \mathbf{X})^{-1} \tag{6.4}

With \mathbf{\Omega} being the diagonal matrix of weights defined above. If \mathbf{\Omega} = \mathbf{I}, the formula collapses to the standard (homoscedastic) covariance matrix for \hat \beta

\sigma^{2}(\mathbf{X}' \mathbf{X})^{-1} \tag{6.5}


Estimating the general covariance matrix for \boldsymbol{\hat \beta}
  • As we assume that the form of the heteroscedasticity is unknown, we would have to estimate the n weights w_i from Equation 6.3, but these n parameters cannot be estimated consistently with only n-k-1 degrees of freedom left

    • Fortunately, it suffices to estimate only the middle part of the sandwich matrix in Equation 6.4, which is a (k+1) x (k+1) matrix

    • We divide by n^2, so that a plim exists (the typical element of this matrix is a weighted average of expectations)

\dfrac{\sigma^{2} \mathbf{X}' \mathbf{\Omega} \mathbf{X}}{n^2} = \dfrac{1}{n^2}\sigma^{2} \sum_{i=1}^{n} w_{i} \mathbf{x}_{i}^{\prime} \mathbf{x}_{i} = \dfrac{1}{n^2}\sum_{i=1}^{n} \sigma_{i}^{2} \mathbf{x}_{i}^{\prime} \mathbf{x}_{i} \tag{6.6}

  • The basic idea is that, because of the consistency of \boldsymbol{\hat \beta}, the residuals \hat{\mathbf{u}} become more and more similar to the error term \mathbf{u} with a growing sample

    • Therefore, as \sigma_i^2 = E(u_i^2 \mid \mathbf x_i), White (1980) showed that replacing \sigma_i^2 with \hat u_i^2 (as u_i^2 is not observable) leads to a consistent estimate:

\dfrac{\widehat{ \sigma^2 \mathbf{X}' \mathbf{\Omega} \mathbf{X}}}{n^2} = \dfrac{1}{n^2}\sum_{i=1}^{n} \hat{u}_{i}^{2} \mathbf{x}_{i}' \mathbf{x}_{i} \tag{6.7}


Hence, a consistent estimate of the heteroscedasticity robust covariance matrix is:

\widehat{\operatorname {Var}(\mathbf{\boldsymbol{\hat \beta}} \mid \mathbf{X})}_{HC} \ = \left(\dfrac{ \mathbf{X}' \mathbf{X} }{n}\right)^{-1} \left( \dfrac{1}{n^2}\sum_{i=1}^{n} \hat{u}_{i}^{2} \mathbf{x}_{i}' \mathbf{x}_{i} \right) \left( \dfrac{\mathbf{X}' \mathbf{X}}{n} \right)^{-1} \tag{6.8}

This reduces to

(\mathbf{X}' \mathbf{X})^{-1} \, \underbrace{\left( \sum_{i=1}^{n} \hat{u}_{i}^{2} \mathbf{x}_{i}' \mathbf{x}_{i} \right)}_{(\widehat{\sigma^2 \mathbf X' \boldsymbol\Omega \mathbf X})} \, (\mathbf{X}' \mathbf{X})^{-1} \tag{6.9}

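As an illustration, the following minimal sketch (assuming the wage1 data and the model used in the example below) computes the sandwich matrix of Equation 6.9 with matrix algebra and checks it against vcovHC(..., type = "HC0"):

# Minimal sketch: HC0 sandwich of Equation 6.9 computed with matrix algebra
library(AER)
library(wooldridge)
data(wage1)

m  <- lm(lwage ~ educ + exper + expersq, data = wage1)
X  <- model.matrix(m)        # n x (k+1) regressor matrix
u2 <- residuals(m)^2

XtX_inv <- solve( t(X) %*% X )
meat    <- t(X) %*% (u2 * X)               # = sum_i u_i^2 x_i' x_i
V_hc0   <- XtX_inv %*% meat %*% XtX_inv    # Equation 6.9

all.equal(V_hc0, vcovHC(m, type = "HC0"), check.attributes = FALSE)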

Example: Hourly wage equation

library(AER)
library(wooldridge)
data(wage1)

lmout <- lm(lwage ~ educ + exper + expersq, data = wage1)
coeftest( lmout )
     
     t test of coefficients:
     
                    Estimate  Std. Error t value  Pr(>|t|)    
     (Intercept)  0.12799750  0.10593228  1.2083    0.2275    
     educ         0.09036582  0.00746799 12.1004 < 2.2e-16 ***
     exper        0.04100888  0.00519652  7.8916 1.765e-14 ***
     expersq     -0.00071356  0.00011576 -6.1639 1.421e-09 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Same with heteroscedasticity-robust se by setting the option vcov. = hccm or vcov. = vcovHC
  • Both are very similar functions which calculate the heteroscedasticity-robust covariance matrix, e.g. hc <- vcovHC(model, type="HC3")

    • type HC0 is the original White correction; HC3, the default, additionally adjusts for influential observations (outliers)
  • Both are available with the package AER, relying on the packages car and sandwich, respectively

coeftest( lmout, vcov = vcovHC )
     
     t test of coefficients:
     
                    Estimate  Std. Error t value  Pr(>|t|)    
     (Intercept)  0.12799750  0.10843407  1.1804    0.2384    
     educ         0.09036582  0.00788877 11.4550 < 2.2e-16 ***
     exper        0.04100888  0.00505273  8.1162 3.475e-15 ***
     expersq     -0.00071356  0.00011068 -6.4471 2.599e-10 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code
library(modelsummary)

modelsummary(list("Ord. se"=lmout, "HC se"=lmout),
            vcov = c("classical", vcovHC),
            stars = TRUE, 
            fmt = 4,
            statistic = c('std.error', 'statistic'),
            gof_omit = "AI|L|BIC|F",
            align = "ldd",
            output = "gt"
            )
Table 6.1:

Wage equation with ordinary and HC se (se and t-statistics in brackets)

Ord. se HC se
(Intercept)     0.1280         0.1280   
   (0.1059)       (0.1084)  
   (1.2083)       (1.1804)  
educ     0.0904***      0.0904***
   (0.0075)       (0.0079)  
  (12.1004)      (11.4550)  
exper     0.0410***      0.0410***
   (0.0052)       (0.0051)  
   (7.8916)       (8.1162)  
expersq    -0.0007***     -0.0007***
   (0.0001)       (0.0001)  
  (-6.1639)      (-6.4471)  
Num.Obs.   526           526       
R2     0.300          0.300    
R2 Adj.     0.296          0.296    
RMSE     0.44           0.44     
Std.Errors   IID        Custom       
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

F-test
# Testing linear hypothesis
lht( lmout, c("exper", "expersq") )
     Linear hypothesis test
     
     Hypothesis:
     exper = 0
     expersq = 0
     
     Model 1: restricted model
     Model 2: lwage ~ educ + exper + expersq
     
       Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
     1    524 120.77                                  
     2    522 103.79  2    16.979 42.696 < 2.2e-16 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

F-test with HC test statistic
# Testing linear hypothesis with hccm by setting the option "vcov = hccm"
lht( lmout, c("exper", "expersq"), vcov. = hccm )
     Linear hypothesis test
     
     Hypothesis:
     exper = 0
     expersq = 0
     
     Model 1: restricted model
     Model 2: lwage ~ educ + exper + expersq
     
     Note: Coefficient covariance matrix supplied.
     
       Res.Df Df      F    Pr(>F)    
     1    524                        
     2    522  2 42.905 < 2.2e-16 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summary

  • Using heteroscedasticity-corrected standard errors (or an HC covariance matrix) leads to asymptotically valid test statistics

  • But it may still be interesting to know whether we have a heteroscedasticity problem at all, as OLS is not the most efficient estimator in this case

  • Furthermore, the small-sample properties of HC standard errors are uncertain. So, if there is no heteroscedasticity problem, it is better to use the standard formulas


6.3 Breusch-Pagan test

The most widely used procedure for testing for heteroscedasticity is the Breusch-Pagan test, Breusch and Pagan (1979).

We want to test for a constant variance:

H_0: \ \operatorname {Var}(u \mid \mathbf {x}) = \sigma^2

This implies (see Appendix A):

\operatorname {Var}(u \mid \mathbf {x}) = E(u^2 \mid \mathbf {x}) - [ \underbrace{E(u \mid \mathbf {x})}_{\text{MLR.4:} \ = \ 0} ]^2 = E(u^2 \mid \textbf{x})

Hence, the H_0 becomes:

\Rightarrow \ H_0: \ E(u^2 \mid \mathbf x) = E(u^2) = \sigma^2

Therefore, to detect dependencies of \operatorname {Var}(u) on the explanatory variables, estimate an auxiliary equation by regressing \hat u_i^2 on all explanatory variables of the original model

\hat{u}^2 = \delta_0 + \delta_1 x_1 + \ldots + \delta_k x_k + \epsilon \tag{6.10}

With the H_0:

H_0: \ \delta_1 = 0, \ \delta_2 = 0, \ldots ,\ \delta_k = 0

  • There are two test statistics to carry out the test:

    • The first one is “the” F-statistic of a linear regression which tests whether all coefficients but the intercept are zero, see Section 3.8.3

F \ = \ \dfrac{(SSR_r - SSR_{ur})/k}{SSR_{ur}/(n-k-1)} \ \sim \ F(k, n-k-1)

  • The second one is the Lagrange multiplier (LM) variant and the actual Breusch-Pagan test statistic

LM \ = \ n R_{\hat{u}^2}^2 \ \sim \ \chi^2(k)


6.3.1 Example: House prices

data("hprice1")
lmout <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1 )
coeftest(lmout)
     
     t test of coefficients:
     
                    Estimate  Std. Error t value  Pr(>|t|)    
     (Intercept) -2.1770e+01  2.9475e+01 -0.7386  0.462208    
     lotsize      2.0677e-03  6.4213e-04  3.2201  0.001823 ** 
     sqrft        1.2278e-01  1.3237e-02  9.2751 1.658e-14 ***
     bdrms        1.3853e+01  9.0101e+00  1.5374  0.127945    
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Auxiliary equation

Estimate the auxiliary equation: \ \hat u^2 = f(x_1, ... , x_k)

bp <- lm( lmout$residuals^2 ~ lotsize + sqrft + bdrms, 
                    data = hprice1)
coeftest(bp)
     
     t test of coefficients:
     
                    Estimate  Std. Error t value Pr(>|t|)   
     (Intercept) -5.5228e+03  3.2595e+03 -1.6944 0.093898 . 
     lotsize      2.0152e-01  7.1009e-02  2.8380 0.005691 **
     sqrft        1.6910e+00  1.4639e+00  1.1552 0.251285   
     bdrms        1.0418e+03  9.9638e+02  1.0455 0.298771   
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# We need the R^2 of the auxiliary equation bp
# This is stored in the summary(bp) list

sbp <- summary(bp)
R2 <- sbp$r.squared
cat("R2 =",R2)
     R2 = 0.1601407
# LM test: n*R2 is chisq(3) distributed. So we need n = df + k + 1, 
# df is stored in sbp$df[2]

# Test statistic:
cat("Chi2 =", (sbp$df[2]+3+1) * R2)
     Chi2 = 14.09239
# p-value:
cat("pvalue =", 1 - pchisq( (sbp$df[2]+3+1) * R2, 3) )
     pvalue = 0.00278206

# F Test
# With summary(bp) we directly get the F-statistic from the output 
# of the auxiliary equation (last line)

summary(bp)
     
     Call:
     lm(formula = lmout$residuals^2 ~ lotsize + sqrft + bdrms, data = hprice1)
     
     Residuals:
        Min     1Q Median     3Q    Max 
      -9044  -2212  -1256    -97  42582 
     
     Coefficients:
                   Estimate Std. Error t value Pr(>|t|)   
     (Intercept) -5.523e+03  3.259e+03  -1.694  0.09390 . 
     lotsize      2.015e-01  7.101e-02   2.838  0.00569 **
     sqrft        1.691e+00  1.464e+00   1.155  0.25128   
     bdrms        1.042e+03  9.964e+02   1.046  0.29877   
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     
     Residual standard error: 6617 on 84 degrees of freedom
     Multiple R-squared:  0.1601,   Adjusted R-squared:  0.1301 
     F-statistic: 5.339 on 3 and 84 DF,  p-value: 0.002048
  • Hence, both test variants strongly reject the H_0 of homoscedasticity

Breusch-Pagan test with bptest()

Much more conveniently, we can use the bptest() function to directly perform a Breusch-Pagan test:
# Default: all explanatory variables are used

bptest(lmout)
     
        studentized Breusch-Pagan test
     
     data:  lmout
     BP = 14.092, df = 3, p-value = 0.002782
  • The test result is identical to the above one

# The same model in logs

lmout2 <- lm( log(price) ~ log(lotsize) + log(sqrft) + bdrms, 
              data = hprice1 )
bptest(lmout2)
     
        studentized Breusch-Pagan test
     
     data:  lmout2
     BP = 4.2232, df = 3, p-value = 0.2383
  • Taking logs often reduces the heteroscedasticity problem

White test

With the Breusch-Pagan test, \hat u^2 is regressed on all explanatory variables of the model to detect dependencies of \sigma_i^2 on these variables

  • But these dependencies could have a more complicated, non-linear character

  • This aspect is tackled by the White test

    • Thereby, \hat u^2 is not only regressed on all explanatory variables but additionally on the squares of the variables and all cross-products (interactions)

To carry out the test, we can use the bptest() function and directly specify all the variables we want to have in the auxiliary equation.

  • So, the White test differs from the Breusch-Pagan test in that the former also includes non-linear transformations of the explanatory variables
# Model in levels
# The "~"  in the function call means that the following is a formula 

bptest( lmout, ~ (lotsize + sqrft + bdrms)^2 + I(lotsize^2) 
       + I(sqrft^2) + I(bdrms^2), data = hprice1 )
     
        studentized Breusch-Pagan test
     
     data:  lmout
     BP = 33.732, df = 9, p-value = 9.953e-05

A disadvantage of this form of the White test is that including all squares and interactions leads to a large number of estimated parameters (e.g. k=6 leads to 27 parameters to be estimated)


A variant of the White test

Regress \hat u^2 on fitted, \hat y_i, and squared fitted values, \hat y_i^2. This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares, and interactions, because the predicted value of y_i and its square implicitly contain all of these terms.

\hat{u}^2 = \delta_0 + \delta_1 \hat y + \delta_2 \hat {y}^2 + \epsilon

H_0: \delta_1 = \delta_2 = 0

LM = n R_{\hat u^2 }^2 \sim \chi^2(2)

To carry out the test, we once more use bptest()

# Model in levels
bptest( lmout, ~ lmout$fitted.values + I(lmout$fitted.values^2) )
     
        studentized Breusch-Pagan test
     
     data:  lmout
     BP = 16.268, df = 2, p-value = 0.0002933

# Model in logs
bptest( lmout2, ~ lmout2$fitted.values + I(lmout2$fitted.values^2) )

    studentized Breusch-Pagan test

data:  lmout2
BP = 3.4473, df = 2, p-value = 0.1784

The results are not identical, but similar to the former variant of the White test.
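As a cross-check, the LM statistic of this White test variant can also be computed by hand from the auxiliary regression, analogous to the manual Breusch-Pagan calculation above. A minimal sketch for the levels model lmout, which should reproduce the BP = 16.268 reported above:

# Manual LM version of the special White test for the levels model
aux_w <- lm( lmout$residuals^2 ~ lmout$fitted.values + I(lmout$fitted.values^2) )
LM    <- length(lmout$residuals) * summary(aux_w)$r.squared    # n * R^2
c(LM = LM, pvalue = 1 - pchisq(LM, df = 2))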


A graphical version of the White test
plot( lmout$residuals^2  ~ lmout$fitted.values, 
      xlab="y hat", ylab="Squared Residuals")

Figure 6.1: Residuals^2 versus fitted values

6.4 Weighted least squares (WLS)

6.4.1 Functional form of heteroscedasticity is known

We have the original model:

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i

with

\operatorname {Var}(u_i \mid \textbf{x}_i) = \sigma_i^2 = \sigma^2 h_i, \ \ \ h_i = h(\mathbf{x}_i) > 0 \tag{6.11}

Thereby, we assume that h_i represents a known functional form of heteroscedasticity.

We transform the original model by dividing through \sqrt h_i which leads to homoscedastic errors:

\dfrac{y_i}{\sqrt h_i} \ = \ \dfrac{\beta_0}{\sqrt h_i} + \beta_1 \dfrac{x_{i1}}{\sqrt h_i} + \beta_2 \dfrac{x_{i2}}{\sqrt h_i} + \dots + \beta_k \dfrac{x_{ik}}{\sqrt h_i} + \dfrac{u_i}{\sqrt h_i}

This way, we get a model in transformed variables denoted with a “*”:

y_i^* \ = \ \beta_0^* + \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + \dots + \beta_k x_{ik}^* + u_i^* \tag{6.12}

And the variance of the transformed model is (considering Equation 6.11):

\operatorname {Var}(u_i^* \mid \textbf{x}_i) = \operatorname {Var}\left(\dfrac{u_i}{\sqrt h_i} \mid \mathbf{x}_i \right) = \frac{1}{h_i} \operatorname {Var}(u_i \mid \mathbf{x}_i) = \frac{1}{h_i} \sigma^2 h_i = \sigma^2

Because the variance of the error is now homoscedastic, the transformed model, Equation 6.12, can be estimated efficiently by OLS.

This procedure is called Weighted Least Squares (WLS), which is a form of Generalized Least Squares (GLS). 1


Generally, we get WLS by minimizing weighted squared residuals (with some weights w_i) which is equivalent to applying OLS on transformed data:

\min_{\hat \beta_i} \sum_i \hat u_i^2 \, w_i \ = \ \min_{\hat \beta_i} \sum_i \hat u_i^2 \dfrac {1}{h_i} \ = \ \min_{\hat \beta_i} \sum_i \left( \hat u_i \dfrac {1}{\sqrt h_i} \right)^2\ = \tag{6.13}

\min_{\hat \beta_i} \sum_i \left( (y_i - \textbf{x}_i \hat {\boldsymbol \beta}) \dfrac {1}{\sqrt h_i} \right)^2 \ = \ \min_{\hat \beta_i} \sum_i (y_i^* - \textbf{x}_i^* \hat {\boldsymbol \beta} )^2 \ \Rightarrow \ \hat {\boldsymbol \beta}_{WLS}


Why is WLS more efficient than OLS?

  • Observations with a large variance are less informative than observations with small variance and therefore should get less weight:

    • if the variance of \hat u_i is large, much of the variation in \hat u_i is accidental and therefore conveys little information

    • on the other hand, if the variance of \hat u_i is small, a large variation in \hat u_i conveys much information

    • therefore, optimal weighting of the observations leads to a more efficient use of information in the data. Hence, WLS is BLUE

  • WLS is a special case of generalized least squares (GLS)


Example: Savings and income

sav_i = \beta_0 + \beta_1 inc_i + u_i, \quad Var(u_i \mid inc_i)=\sigma^2 inc_i

Transformed model:

\dfrac{sav_i}{\sqrt{inc_i}} = \dfrac{\beta_0}{\sqrt{inc_i}} + \beta_1 \dfrac{inc_i}{\sqrt{inc_i}} + \dfrac{u_i}{\sqrt{inc_i}}

This transformed model is homoscedastic:

\operatorname {Var} \left( \dfrac{u_i}{\sqrt{inc_i}} \mid inc_i \right) = \dfrac {1}{inc_i} \operatorname {Var} ( u_i \mid inc_i ) = \frac{1}{inc_i}\sigma^2 inc_i = \sigma^2

If the other Gauss-Markov assumptions hold as well (MLR.1 - MLR.5, see Section 2.7), OLS applied to the transformed model is the best linear unbiased estimator.
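A minimal R sketch of this example (assuming the saving data set of the wooldridge package with variables sav and inc) shows that WLS via the weights option of lm() and OLS on the transformed variables give identical coefficients:

# WLS for the savings equation with h_i = inc_i, done two equivalent ways
library(wooldridge)
data(saving)

# (1) lm() with weights w_i = 1/h_i = 1/inc_i
wls1 <- lm(sav ~ inc, data = saving, weights = 1/inc)

# (2) OLS on the transformed variables; note that the intercept is transformed
#     as well, so the automatic intercept is suppressed with "0 +"
trans <- transform(saving,
                   sav_s   = sav / sqrt(inc),
                   const_s = 1 / sqrt(inc),
                   inc_s   = inc / sqrt(inc))
wls2 <- lm(sav_s ~ 0 + const_s + inc_s, data = trans)

cbind(WLS_weights = coef(wls1), WLS_transformed = coef(wls2))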


Important special case of heteroscedasticity

  • If the observations are reported as averages at the city / county / state / country or firm level, they should be weighted by the size of the unit

  • For instance,

    • regressing the average contribution to pensions of firm i on

      • the average earnings in firm i and the average age of employees in firm i,
    • one should multiply the variables by the square root of the number of employees

  • The reason is that the larger the firm, the less variation in the data and in u_i (because of the LLN, Theorem A.2)

  • In particular, if the errors are homoscedastic at the individual level, WLS with weights equal to the group size should be used. So, the variables have to be multiplied by \sqrt{m_i}, m_i being the size of group i (a small simulated sketch follows this list)

  • If the assumption of homoscedasticity at the individual-level is not exactly right, one can calculate robust standard errors after WLS (i.e., for the transformed model)
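The following small simulated sketch (all data and variable names are hypothetical) illustrates the group-size weighting: with homoscedastic individual-level errors, the variance of a firm average is \sigma^2 / m_i, so h_i = 1/m_i and the WLS weights are w_i = m_i.

# Hypothetical simulated firm-level averages: WLS with weights = group size
set.seed(1)
n_firms     <- 200
m           <- sample(5:500, n_firms, replace = TRUE)   # employees per firm
avg_earn    <- rnorm(n_firms, mean = 50, sd = 10)
# individual-level errors are homoscedastic, so Var(u_bar_i) = sigma^2 / m_i
avg_contrib <- 2 + 0.1 * avg_earn + rnorm(n_firms, sd = 3 / sqrt(m))

# WLS with w_i = m_i (equivalently, multiplying all variables by sqrt(m_i))
wls_firm <- lm(avg_contrib ~ avg_earn, weights = m)
summary(wls_firm)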


6.4.2 Unknown heteroscedasticity function (FWLS)

If the particular form of the heteroscedasticity is unknown, i.e., we don’t know h_i=h(\mathbf x_i) from Equation 6.11, we have to estimate h_i in advance. Afterwards, we proceed as with WLS. This procedure is called feasible weighted least squares (FWLS).

  • Simply regressing \hat u_i^2 on the explanatory variables, as we did for the Breusch-Pagan test with Equation 6.10, does not work because the estimated h_i (the fitted values of such an equation) have to be strictly positive (variances cannot be negative)

  • Therefore, we assume as in Equation 6.11 that \operatorname {Var}(u_i \mid \textbf{x}_i) = \sigma_i^2 = \sigma^2 h(\mathbf x_i), but furthermore presume the special form h(\mathbf x_i) = \exp(\mathbf x_i \boldsymbol \delta), which always generates positive values

  • Hence, as \operatorname {Var}(u_i \mid \textbf{x}_i) = E(u_i^2 \mid \textbf{x}_i), we have

E(u_i^2 \mid \textbf{x}_i) \ = \ \sigma^2 h(\mathbf{x}_i) \ = \ \sigma^2 \exp(\mathbf x_i \boldsymbol \delta) \ = \ \sigma^2 \exp (\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k )

  • Replacing E(u_i^2 \mid \textbf{x}_i) with the observed \hat u^2 we write (with an error v):

\hat u^2 \ = \ \exp (\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + v)

  • Taking logs, we arrive at a log-linear regression equation, which is our auxiliary equation and can be estimated by OLS

\log(\hat u^2) = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + v \tag{6.14}

  • After regressing the \log of squared residuals on \mathbf{x} we get the predicted values

\widehat {\log(\hat u^2)} \ = \ \hat \delta_0 + \hat\delta_1 x_1 + \dots + \hat\delta_k x_k \

  • And finally, applying the exponential function (the anti-logarithm) to these predicted values, we get the estimates \hat h_i we are looking for

\exp \left(\widehat {\log(\hat u^2)}\right) \ = \ \widehat {\hat u^2} \ = \ \exp( \hat \delta_0 + \hat\delta_1 x_1 + \dots + \hat\delta_k x_k ) \ = \ \hat h_i

  • Use the inverse square root of the estimated heteroscedasticity function 1 / \sqrt {\hat h_i} to transform the data, as we did in the case of a known heteroscedasticity function

    • Software packages often expect a special input form for weights using WLS options, e.g.: w_i = 1/\hat h_i in R

    • For other packages, consult the manual which input form of weights is expected

  • Feasible WLS (a special case of feasible GLS) is consistent and asymptotically more efficient than OLS


Recipe for applying FWLS

  1. Estimate the model of interest with OLS and calculate the residuals \hat u_i

  2. Calculate the log of the squared residuals, \log \hat u_i^2, and regress these on the independent variables to get the fitted values \widehat{\log \hat u_i^2 }

  3. The variance factors we are looking for are the \exp(\cdot) (anti-logarithms) of the fitted values: \hat h_i = \exp(\widehat{\log \hat u_i^2 }), which are strictly positive. Then, FWLS is WLS with the estimated \hat h_i instead of the known h_i

  4. There are two possible ways to further proceed:

    • Either, transform all the model variables (including the intercept!) by multiplying with 1 / \sqrt {\hat h_i} and estimate the transformed model with OLS to get FWLS

    • Or, use the option weights = 1/h of the lm() function in R, where h is the estimated factor \hat h_i

  • The latter procedure is much more convenient than transforming the variables “by hand”, but notice that in R the FWLS residuals are not stored in the output list of lm(). If you want the FWLS residuals, you have to look for them in the summary() output list (illustrated after the FWLS example output below)

Example - FWLS

As an example, we estimate the demand for cigarettes as a function of income, prices, education, age and age squared, and a dummy variable measuring restrictions on smoking in restaurants.

Of special interest are the effects of income, prices and restrictions of smoking in restaurants.

In R, we load the data set smoke of the wooldridge package. Further, we load the package AER, which contains many useful econometric tools, for instance HC standard errors, and the package modelsummary to make fancy output tables.

We estimate the model for cigarettes demand and store the results in the list “out”.

# Estimation of model -> out
out <- lm(cigs ~ log(income) + log(cigpric) + educ + age + I(age^2) + restaurn, 
          data = smoke) 

We calculate and print the coefficients of the model and get exactly the same results as in the textbook of Wooldridge (2019).

     
     t test of coefficients:
     
                    Estimate Std. Error t value  Pr(>|t|)    
     (Intercept)  -3.6398259 24.0786589 -0.1512  0.879884    
     log(income)   0.8802678  0.7277830  1.2095  0.226821    
     log(cigpric) -0.7508616  5.7733424 -0.1301  0.896554    
     educ         -0.5014982  0.1670772 -3.0016  0.002769 ** 
     age           0.7706936  0.1601223  4.8132 1.776e-06 ***
     I(age^2)     -0.0090228  0.0017430 -5.1765 2.862e-07 ***
     restaurn     -2.8250847  1.1117936 -2.5410  0.011241 *  
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Note that income and cigarette prices are not significant, and smoking restrictions only at the 1.1% level.

Breusch-Pagan test

Now, we test for heteroscedasticity with the Breusch-Pagan test:

# Breusch-Pagan test -> clear evidence for strong heteroscedasticity
bptest(out)
     
        studentized Breusch-Pagan test
     
     data:  out
     BP = 32.258, df = 6, p-value = 1.456e-05
  • The Breusch-Pagan test shows strong evidence for heteroscedasticity

Next, we calculate the fitted values and residuals and plot the squared residuals versus fitted:

# squared residuals
sqres <- out$residuals^2

# predicted values
yhat <- out$fitted.values

Plotting the squared residuals versus \hat y, we get a clear picture of the problem:
plot(sqres ~ yhat)

  • Strong heteroscedasticity is clearly visible

First line of defense against the strong heteroscedasticity: We calculate HC standard errors, so that at least the t-tests (and all other tests) are asymptotically valid. To do this, we use the vcov. = vcovHC option in coeftest().

# calculate HC standard errors with option vcovHC  
coeftest(out, vcov. = vcovHC)
     
     t test of coefficients:
     
                    Estimate Std. Error t value  Pr(>|t|)    
     (Intercept)  -3.6398259 25.8565263 -0.1408  0.888087    
     log(income)   0.8802678  0.6014119  1.4637  0.143677    
     log(cigpric) -0.7508616  6.0898855 -0.1233  0.901903    
     educ         -0.5014982  0.1631261 -3.0743  0.002182 ** 
     age           0.7706936  0.1394893  5.5251 4.456e-08 ***
     I(age^2)     -0.0090228  0.0014769 -6.1091 1.563e-09 ***
     restaurn     -2.8250847  1.0114249 -2.7932  0.005344 ** 
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Surprisingly, the calculated HC t-statistics are not very different from the uncorrected ones

The second strategy is to apply FWLS (FGLS). To do this, we first estimate the auxiliary equation with the log (!) of the squared residuals as dependent variable and all explanatory variables as regressors, Equation 6.14:

aux <- lm( log(sqres) ~ log(income) + log(cigpric) + educ + age + I(age^2) + 
             restaurn, data = smoke)

Applying exp (the inverse function of log) to the fitted values of the auxiliary equation yields the estimated variance factors \hat h_i:

h_hat <-  exp( aux$fitted.values )

We calculate the weights 1/\hat h_i, which is the form R expects:

# lm() function needs the form 1/hi as option variable
w <- 1/h_hat

We once more call lm() for the original model, but now with the option weights = w, which provides the FWLS estimates, stored in the list fwls:

# FGLS, weights = 1/h_hat
fwls <- lm(cigs ~ log(income) + log(cigpric) + educ + age + I(age^2) + restaurn, 
                data = smoke, weights = w) 

# coefs and statistics of FWLS
coeftest(fwls)
     
     t test of coefficients:
     
                     Estimate  Std. Error t value  Pr(>|t|)    
     (Intercept)   5.63546270 17.80313936  0.3165  0.751673    
     log(income)   1.29523934  0.43701172  2.9639  0.003128 ** 
     log(cigpric) -2.94031167  4.46014462 -0.6592  0.509931    
     educ         -0.46344636  0.12015869 -3.8570  0.000124 ***
     age           0.48194797  0.09680824  4.9784 7.856e-07 ***
     I(age^2)     -0.00562721  0.00093948 -5.9897 3.175e-09 ***
     restaurn     -3.46106399  0.79550504 -4.3508 1.532e-05 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
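As noted in the recipe above, the residuals stored directly in the lm() output are on the original (untransformed) scale; the weighted FWLS residuals \sqrt{w_i} \, \hat u_i are found in the summary() output list:

# Residuals of the weighted fit: original scale vs. FWLS (weighted) scale
head( fwls$residuals )            # untransformed residuals
head( summary(fwls)$residuals )   # weighted residuals sqrt(w_i) * u_hat_i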

modelsummary( list("OLS "=out, "OLS HC "=out, "FWLS "=fwls), 
              output="gt", 
              statistic = c('std.error', 'statistic'),
              vcov = c("classical", vcovHC, "classical"),
              gof_omit = "A|BI|L|F",
              coef_omit = "Inter",
              align = "lddd", 
              stars = TRUE, 
              fmt = 4) 
Table 6.2:

Comparing the results of OLS, OLS with HC t-statistics and FWLS, (se and t-statistics in brackets)

OLS OLS HC FWLS
log(income)     0.8803         0.8803        1.2952** 
   (0.7278)       (0.6014)      (0.4370)  
   (1.2095)       (1.4637)      (2.9639)  
log(cigpric)    -0.7509        -0.7509       -2.9403   
   (5.7733)       (6.0899)      (4.4601)  
  (-0.1301)      (-0.1233)     (-0.6592)  
educ    -0.5015**      -0.5015**     -0.4634***
   (0.1671)       (0.1631)      (0.1202)  
  (-3.0016)      (-3.0743)     (-3.8570)  
age     0.7707***      0.7707***     0.4819***
   (0.1601)       (0.1395)      (0.0968)  
   (4.8132)       (5.5251)      (4.9784)  
I(age^2)    -0.0090***     -0.0090***    -0.0056***
   (0.0017)       (0.0015)      (0.0009)  
  (-5.1765)      (-6.1091)     (-5.9897)  
restaurn    -2.8251*       -2.8251**     -3.4611***
   (1.1118)       (1.0114)      (0.7955)  
  (-2.5410)      (-2.7932)     (-4.3508)  
Num.Obs.   807           807          807       
R2     0.053          0.053         0.113    
RMSE    13.35          13.35         13.40     
Std.Errors   IID        Custom          IID       
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

  • The differences between OLS and OLS with HC se and t-statistics are not large but noticeable

  • The differences between the OLS and FWLS estimates are quite large. In particular, income is now statistically significant, and the estimated effect of cig-prices on cig demand is substantially larger as well (more than three times as large)

  • Furthermore, the effect of smoking restrictions is much more substantial and more significant

  • These much more significant results might demonstrate the expected efficiency gains of WLS

  • But we have to be cautious: under our standard assumptions, both OLS and FWLS are consistent

    • Hence, in a large sample they should converge to the same true coefficients

    • In our example, the differences are quite large, although we have more than 800 observations. This might be a hint that one of our standard assumptions is not valid, especially E(u_i \mid \textbf{x}_i) = 0

      • However, whether these differences in the estimated coefficients are jointly significant is unclear and has to be tested (Hausman test)

      • Actually, it is quite plausible that cig-prices and smoking restrictions imposed by governments vary in response to cig demand. But this would violate E(u_i \mid \textbf{x}_i) = 0, leading to biased and inconsistent estimates even for FWLS and FGLS

        • IV-estimators are a possible remedy

  1. For a general treatment of generalized least squares (GLS) see the discussion following Equation 9.45. For our particular problem at hand, the inverse square root matrix \Omega^{-\frac{1}{2}} of the covariance matrix of heteroscedastic errors, \sigma^2\Omega, is a diagonal matrix with elements \frac {1}{\sqrt h_i}.↩︎