A.1 Sample or empirical moments
Mean: \hat{\mu}_{x} \equiv \bar m_{x} \equiv \bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}
Variance: \hat{\sigma}_{x}^{2} \equiv s_{x}^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}^{2}-2 x_{i} \bar{x}+\bar{x}^{2}\right)=\frac{1}{n} \sum_{i=1}^{n} x_{i}^{2}-(\bar{x})^{2}
Standard deviation: \hat{\sigma}_{x} \equiv s_{x} = \sqrt{s_x^2}
Covariance: \hat \sigma_{x,y} \equiv s_{x,y} \equiv \frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right) \left(y_{i}-\bar{y}\right) = \frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right) y_{i} = \frac{1}{n} \sum_{i=1}^{n}x_{i}y_i - \bar x \bar y
Correlation coefficient: \hat{\rho}_{x, y} \equiv r_{x,y} = \dfrac{\hat{\sigma}_{x,y}} {\hat{\sigma}_{x} \hat{\sigma}_{y}} = \frac{\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}} \sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}}
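These sample moments can be checked numerically. A minimal sketch in Python/NumPy (the simulated data and the seed are arbitrary illustrations; note that the formulas above use the divisor n, which matches NumPy's default ddof=0):

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative data, arbitrary seed
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

x_bar = x.mean()                                  # sample mean
s2_x  = np.mean((x - x_bar) ** 2)                 # sample variance (divisor n)
s_xy  = np.mean((x - x_bar) * (y - y.mean()))     # sample covariance
r_xy  = s_xy / (np.sqrt(s2_x) * y.std())          # correlation coefficient

# the shortcut formulas and NumPy's built-ins give the same numbers
assert np.isclose(s2_x, np.mean(x ** 2) - x_bar ** 2)
assert np.isclose(r_xy, np.corrcoef(x, y)[0, 1])
```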
A.2 Population moments, theoretical or true moments
X, Y and Z are random variables (RV)
Expectation: \mu_x \equiv E(X) = \sum_i p_i x_i \ \ \text{ or } \ \ \int_{-\infty}^{\infty} x \, f(x) \, dx; \ \ \ \ E(X|Z) = \int_{-\infty}^{\infty} x \, f(x|z)\,dx
with the density function f(x).
Variance: \operatorname{Var}(X) \equiv {\sigma}_{x}^{2} = E \left[ \left( X-E(X) \right)^{2} \right] = E \left[ X^2 - 2XE(X) +E(X)^2 \right] = E(X^2) - E(X)^2
Standard deviation: {\sigma}_{x} = \sqrt{\sigma_x^2}
Covariance: \operatorname{Cov}(X,Y) \equiv \sigma_{x,y} = E\left[(X-E(X)) \, (Y-E(Y)) \right] = E(XY) - E(X)E(Y)
Correlation: {\rho}_{x, y} = \dfrac{{\sigma}_{x,y}}{{\sigma}_{x} {\sigma}_{y}}
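As a small worked example (a fair die, chosen purely for illustration), the population moments of a discrete random variable and the shortcut formula \operatorname{Var}(X) = E(X^2) - E(X)^2 can be computed directly:

```python
import numpy as np

x = np.arange(1, 7)            # outcomes of a fair die
p = np.full(6, 1 / 6)          # probabilities p_i

EX   = (p * x).sum()                            # E(X)   = 3.5
VarX = (p * (x - EX) ** 2).sum()                # Var(X) = 35/12
print(EX, VarX, (p * x ** 2).sum() - EX ** 2)   # the shortcut formula agrees
```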
A.3 Some important rules for expectations and variances
E(aX + bY + c) = aE(X) + bE(Y) + c
E(XY) = E(X)E(Y) + \operatorname{Cov}(X,Y)
\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)
\operatorname{Var}(aX + bY + c) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab \operatorname{Cov}(X,Y)
E \left[a(X) Y | X \right] = a(X) E(Y | X)
E(Y) = E_x [E(Y|X)];
- \text{ Proof: } E(Y)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y \, f(x,y)\, dx\,dy = \int_{-\infty}^{\infty} \! \left[ \int_{-\infty}^{\infty} y \, f(y|x)\, dy \right ] f(x) \, dx = \int_{-\infty}^{\infty} E(Y|X=x) \, f(x) \, dx = E_x[E(Y|X)]
\operatorname{Var}(Y) = E_x[\operatorname{Var}(Y|X)]+\operatorname{Var}[E(Y|X)]
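These rules can be checked by Monte Carlo simulation. A sketch in Python/NumPy (the constants a, b, c and the data-generating process are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(2.0, 1.0, n)
y = 0.7 * x + rng.normal(0.0, 2.0, n)      # X and Y are correlated
a, b, c = 3.0, -2.0, 5.0

cov_xy = np.cov(x, y, ddof=0)[0, 1]
# E(XY) = E(X)E(Y) + Cov(X,Y)
print(np.mean(x * y), x.mean() * y.mean() + cov_xy)
# Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X,Y)
print(np.var(a * x + b * y + c),
      a ** 2 * np.var(x) + b ** 2 * np.var(y) + 2 * a * b * cov_xy)
```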
A.4 Probability distributions
Cumulative distribution function (cdf):
P(X\leq x) = F(x) = \int_{-\infty}^{x}f(\xi)\,d\xi
with the density function f(x).
An important property of a density function:
\int_{-\infty}^{\infty}f(x)\,dx = 1
Joint density function:
f(x,y) = f(y|x) \, f(x) \tag{A.1}
with f(y|x) being the conditional density of y, given a specific x.
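A brief numerical check of these properties, using the exponential density f(x) = e^{-x} for x \geq 0 as an arbitrary example (a simple Riemann sum stands in for the integrals):

```python
import numpy as np

xs = np.linspace(0.0, 50.0, 500_001)   # grid over the support of f
f  = np.exp(-xs)                       # density f(x) = exp(-x), x >= 0
dx = xs[1] - xs[0]

print((f * dx).sum())              # integral over the support: approximately 1
print((f[xs <= 2.0] * dx).sum())   # cdf F(2): approximately 1 - exp(-2) = 0.8647
```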
A.5 Normal distribution
A random variable x with expectation \mu and variance \sigma^2 is normally distributed, x \sim N(\mu, \sigma^2), if it has the following density function:
f(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{\left[-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2\right]} \tag{A.2}
It follows that a standard normally distributed random variable z \sim N(0, 1) has the density function:
f(z)=\frac{1}{\sqrt{2 \pi}} e^{\left[-\frac{z^2}{2}\right]} \, := \ \phi(z) \tag{A.3}
with the cumulative distribution function (cdf)
F(z) = \int_{-\infty}^z \frac{1}{\sqrt{2 \pi}} e^{\left[-\frac{\zeta^2}{2}\right]} d\zeta \, := \ \Phi(z) \tag{A.4}
- Symmetry of the normal distribution implies: \phi(x)=\phi(-x) and \Phi(x)=1-\Phi(-x)
If the k-dimensional random vector \mathbf x \sim N(\boldsymbol \mu, \mathbf \Sigma) is jointly normally distributed with E(\mathbf x) = \boldsymbol \mu and covariance matrix \mathbf \Sigma, then its density function is given by:
f(\boldsymbol{x})=\frac{1}{(2 \pi)^{k / 2}|\mathbf \Sigma|^{1 / 2}} e^{\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{\prime} \mathbf \Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right]} \tag{A.5}
This is the multivariate normal distribution.
Reproductive property of the normal distribution: The sum of several independent normal random variables is itself normally distributed
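The reproductive property can be illustrated by simulation: the sum of two independent normals should again follow the density (A.2) with the summed means and variances (the parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(1.0, 2.0, 500_000)       # N(1, 4)
x2 = rng.normal(-3.0, 1.5, 500_000)      # N(-3, 2.25), independent of x1
s  = x1 + x2                             # should be N(-2, 6.25)

mu, sigma = 1.0 + (-3.0), np.sqrt(4.0 + 2.25)
hist, edges = np.histogram(s, bins=60, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-0.5 * ((mid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
print(np.max(np.abs(hist - pdf)))        # small deviation: the sum is again normal
```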
A.6 Chi-squared distribution
If the x_i \sim N(0,1) are independently distributed, then
Z_n \ = \ \sum_{i=1}^n x_i^2 \ \ \sim \ \ \chi^2(n)
We further have:
E\left(Z_n\right) \ = \ \sum_{i=1}^n E\left(x_i^2\right) \ =\ \sum_{i=1}^n \underbrace{\sigma^2}_1 \ = \ n
V\left(Z_n\right) \ = \ \sum_{i=1}^n V\left(x_i^2\right) \ = \ \sum_{i=1}^n E\left(x_i^2-1\right)^2 \ = \ n \, E\left(x_i^4-2 x_i^2+1\right) \ = \ n(3-2+1) \ = \ 2n
(The last step utilizes the fact that the fourth moment of a standard normally distributed random variable is equal to 3.)
If Z is \chi^2(m)-distributed with m degrees of freedom and Z' is \chi^2(n)-distributed with n degrees of freedom, then the sum (Z + Z') is \chi^2(m+n)-distributed with (m + n) degrees of freedom
The \chi^2(n)-distribution is right-skewed, with the skewness decreasing as n increases. The \chi^2(n)-distribution converges to a normal distribution N(n, 2n) as n becomes large (Central Limit Theorem)
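A quick simulation of these moments (the degrees of freedom n and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 8, 200_000
Z = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)   # chi^2(n) draws
print(Z.mean(), Z.var())   # approximately n = 8 and 2n = 16
```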
A.7 F-distribution
If Z_1 \sim \chi^2(m) and Z_2 \sim \chi^2(n) are independently distributed, then
X \ = \ \frac {(Z_1/m)} {(Z_2/n)} \ \ \sim \ \ F(m,n)
The F-distribution is right-skewed and thus resembles the \chi^2-distribution
m \cdot F(m,n) converges to \chi^2(m) with n \rightarrow \infty. Therefore, F \sim F(m, n) behaves like a \chi^2(m)-distributed random variable divided by m when n is large
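A sketch that builds F(m, n) draws from two independent chi-squared variables and checks the limiting behaviour m \cdot F \rightarrow \chi^2(m) for large n (the chosen m, n and number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, reps = 3, 500, 20_000
Z1 = (rng.standard_normal((reps, m)) ** 2).sum(axis=1)   # chi^2(m)
Z2 = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)   # chi^2(n), independent
F  = (Z1 / m) / (Z2 / n)                                 # F(m, n) draws
print((m * F).mean(), (m * F).var())   # close to m = 3 and 2m = 6 for large n
```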
A.8 t-distribution
If y \sim N(0, 1) and Z_n \sim \chi^2(n) are independently distributed, then
t \ = \ \dfrac{y}{\sqrt{Z_n / n}} \ \sim \ t(n)
Special case: t(1) is the Cauchy distribution, which does not even have an expectation (the LLN is not applicable in this case)
A t(n)-distribution with n > 60 is practically identical to the standard normal distribution N(0, 1)
If a variable x is F(1,n)-distributed, then \sqrt x \sim |t(n)|; equivalently, if t \sim t(n), then t^2 \sim F(1,n). The t-distribution is therefore closely linked to the F-distribution
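A small simulation of the construction t = y / \sqrt{Z_n/n} above (n = 5 is an arbitrary choice); it also illustrates the link to the F-distribution just mentioned:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 5, 200_000
y = rng.standard_normal(reps)                            # N(0, 1)
Z = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)    # chi^2(n), independent of y
t = y / np.sqrt(Z / n)                                   # t(n) draws

print(t.var())              # close to n / (n - 2) = 5/3, the variance of t(5)
print((t ** 2).mean())      # t^2 is F(1, n)-distributed; its mean is also n / (n - 2)
```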
A.9 Distribution of quadratic forms
Theorem A.1 (Distribution of quadratic forms) Let \mathbf z \sim N(\mathbf 0, \mathbf V) be a normally distributed random vector with h elements. Then the quadratic form
\mathbf z' \mathbf V^{-1} \mathbf z
is distributed as
\mathbf z' \mathbf V^{-1} \mathbf z \ \sim \ \chi^2(h) \tag{A.6}
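Theorem A.1 can be verified by simulation: drawing \mathbf z \sim N(\mathbf 0, \mathbf V) via a Cholesky factor of \mathbf V and computing the quadratic form should reproduce the \chi^2(h) moments h and 2h (the matrix V below is an arbitrary positive definite example):

```python
import numpy as np

rng = np.random.default_rng(6)
h, reps = 3, 200_000
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])            # arbitrary positive definite covariance
L = np.linalg.cholesky(V)                  # V = L L'
z = rng.standard_normal((reps, h)) @ L.T   # each row is a N(0, V) draw

q = np.einsum('ij,jk,ik->i', z, np.linalg.inv(V), z)   # z' V^{-1} z per row
print(q.mean(), q.var())                   # approximately h = 3 and 2h = 6
```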
A.10 Law of large numbers (LLN)
Theorem A.2 (Law of Large Numbers) If the x_i \in \mathbb {R} are independently and identically distributed (i.i.d.), E(x_i) exists, and furthermore E(x_i), V(x_i) < \infty, then as n \rightarrow \infty,
\hat \mu_x := \ \dfrac{1}{n}\sum_{i=1}^n x_i \ \ \underset {p} \longrightarrow \ \ E(x_i) := \ \mu_x
Remark: This theorem can be extended to any transformation h(\mathbf x_i) of a random vector with a finite mean:
\hat \mu_{_{h(x)}} := \ \dfrac{1}{n}\sum_{i=1}^n h(\mathbf x_i) \ \ \underset {p} \longrightarrow \ \ E[ h (\mathbf x) ]
The theorem shows that under certain conditions, the sample mean of a sequence of random numbers (vectors) x_i converges to their expectations if the sample size n increases.
In practice this means that the theoretical (population) moments can be estimated using the empirical (sample) moments. The larger the sample, the more accurate the estimation
The Law of Large Numbers (LLN) is of fundamental importance in statistics and probability theory because many estimators can be expressed as functions of averages of random variables and can thus be estimated more precisely as the sample size n increases. In the limit, as n approaches infinity, these estimators converge to their true (population) values and are no longer random variables but can be treated as ordinary numbers (vectors)
The requirement for i.i.d. can be substantially relaxed in several ways; essentially it is often required that the variances of x_i are bounded both from above and below (not becoming zero or infinity), so that no single x_i has too much influence in the sum, and that there is not too strong a dependence between the x_i’s. Thus, \operatorname {Cov}(x_i, x_j) must go to zero when x_i and x_j are “sufficiently” far apart. This ensures that the random components of the x_i’s can cancel each other out
These assumptions are generally fulfilled for random samples from a population (e.g., cross-sectional data) or for time series which are stationary and weakly dependent (well behaved)
Proof of the LLN for an i.i.d. random variable:
First:
E(\bar{x})=\frac{1}{n} \sum_i E\left(x_i\right)=\frac{1}{n} \sum_i \mu_x=\frac{n}{n} \mu_x=\mu_x
And second:
V(\bar{x})=V\left(\frac{1}{n} \sum_i x_i\right)=\left(\frac{1}{n}\right)^2 V\left(\sum_i x_i\right) \ \overset{\text{i.i.d.}}{=} \ \left(\frac{1}{n}\right)^2 \sum_i V\left(x_i\right)=\left(\frac{1}{n}\right)^2 n \sigma^2=\frac{\sigma^2}{n}
\Rightarrow \lim _{n \rightarrow \infty} V(\bar{x})=0
q.e.d.
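A short simulation of the LLN (the distribution of the x_i and the sample sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = 0.3
for n in (10, 1_000, 100_000):
    x = rng.exponential(1.0, n) - 1.0 + mu   # i.i.d. draws with mean mu = 0.3
    print(n, x.mean())                       # the sample mean approaches mu
```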
A.11 Central Limit Theorem (CLT)
Theorem A.3 (Central Limit Theorem) If the x_i \in \mathbb {R} are independent but not necessarily identically distributed and V(x_i) < \infty, then as n \rightarrow \infty,
z=\sqrt n \left ( \frac{1}{n} \sum_{i=1}^n\left(x_i-\mu_i\right) \right) \ \ \xrightarrow [n \rightarrow \infty]{d} \ \ N \left( \, 0, \ \lim_{n \rightarrow \infty} \underbrace{\frac{1}{n} \sum_{i=1}^n V\left(x_i\right)}_{\overline {V\left(x_i\right)}} \, \right)
Closely related to the LLN is the Central Limit Theorem (CLT). This theorem also concerns the average of a sequence of random variables. It states that, under certain conditions, the sample mean of n arbitrarily distributed, uncorrelated, and mean-adjusted random variables x_i, with E(x_i) = \mu_i, multiplied by \sqrt n, approximately follows a normal distribution when n is sufficiently large
The conditions for the CLT are of a similar nature to those of the LLN and can be relaxed. However, the variances of the x_i’s should not be too heterogeneous, and any dependencies between the x_i’s should not be too strong or must decrease quickly enough with the distance between the x_i’s
It should be noted that the sample mean of the x_i’s here is multiplied by the square root of n, as otherwise the variance of the sample mean of the x_i’s would converge to zero, leading to a degenerate distribution, as seen in the LLN above
The CLT is applicable for most sample functions even with dependent observations. However, it often does not apply to non-stationary time series.
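A sketch of the CLT for independent but not identically distributed x_i, here x_i \sim U(0, a_i) with a_i = i/n (an arbitrary choice), which checks that the mean and variance of z match the theorem:

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 500, 20_000
a = np.arange(1, n + 1) / n                        # upper bounds of the uniforms
x = rng.uniform(0.0, 1.0, (reps, n)) * a           # x_i ~ U(0, a_i), independent
z = np.sqrt(n) * (x - a / 2).mean(axis=1)          # sqrt(n) * mean of (x_i - mu_i)

print(z.mean(), z.var())       # approximately 0 and the average of the Var(x_i)
print(np.mean(a ** 2 / 12))    # the average variance appearing in the CLT
```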
A.12 Slutsky’s Theorem for probability limits
Theorem A.4 (Slutsky’s Theorem for probability limits) If \operatorname {plim} \ x_n = c and g(·) is a continuous function, then for n \rightarrow \infty
\operatorname {plim} \ ( g(x_n)) = g(\operatorname {plim} \ x_n) \tag{A.7}
Hence, \operatorname {plim} and g(·) can switch places.
- Important applications are:
\operatorname {plim} (\mathbf X + \mathbf V) = \operatorname {plim} \mathbf X + \operatorname {plim} \mathbf V \tag{A.8}
\operatorname {plim} (\mathbf X \mathbf V) = \operatorname {plim} \mathbf X \times \operatorname {plim} \mathbf V \tag{A.9}
\operatorname {plim} \mathbf X^{-1} = {( \operatorname {plim} \mathbf X )}^{-1} \tag{A.10}
- All rules presuppose, of course, that the \operatorname{plim}s involved exist. Especially the last two rules stand in stark contrast to the rules for expectations
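A numerical illustration of Slutsky’s theorem with g(x) = 1/x (the distribution and sample sizes are arbitrary): \operatorname{plim}(1/\bar x) = 1/\mu_x, even though E(1/\bar x) \neq 1/E(\bar x) in finite samples:

```python
import numpy as np

rng = np.random.default_rng(9)
mu = 2.0
for n in (10, 1_000, 100_000):
    x = rng.normal(mu, 1.0, n)
    print(n, 1.0 / x.mean())   # g(x_bar) = 1 / x_bar approaches g(mu) = 0.5
```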