Econometrics Notes

Ch1 Introduction

What is econometrics

  • Combine statistical techniques with economic theory.
  • Estimating economic relationships.
  • Testing economic theories.
  • Evaluating and implementing government and business policy.

Basic types

Descriptive

  • challenges
    • Sampling
      • draw conclusions about the population based on a sample.
    • Summary statistics
      • a concise way to summarize complicated data.
  • If we had data on the whole population, we would know the answer.
  • Conditional expectations: if I condition X to be some value, what is the expected value of Y? A variable that can take on a very large number of values is often treated as continuous for convenience.
  • Example
    • Mothers who smoke one more cigarette during pregnancy are expected to give birth to children with 15g lower birth weight.

Forecasting

  • challenges
    • Underfitting
    • Overfitting
  • If we know the data and wait long enough, we will know the answer.

Causal (or structural)

  • Correlation: how two random variables move together
  • The difference between causation and correlation is a key concept in econometrics. We would like to identify causal effects and estimate their magnitude.
  • It is generally agreed that this is very difficult to do; having an economic model is often essential in establishing the causal interpretation.
  • Unless we run a perfect experiment, we will never know the answer.
  • requires $E(u|x) = 0$
  • Econometrics focuses on the causal problems inherent in collecting and analyzing observational economic data.
  • Several possible causal structures:
    • x -> y
    • z -> x, z -> y
    • y -> x
  • Example
    • Mothers smoking one more cigarette during pregnancy causes their children to have 15g lower birth weight.

Structure of Economic data

Cross-sectional data

  • A cross-sectional data set consists of a sample of units taken at a given point in time.
  • assume:
    • sample is drawn from the underlying population randomly
    • Violation of random sampling: We want to obtain a random sample of family income. However, wealthier families are more likely to refuse to report.

Time-Series data

  • Each observation is uniquely determined by time
  • A time series data set consists of observations on a variable or several variables over time.
    • it can be a time series for several variables
  • Time is an important dimension in a time series data set.

Pooled Cross Sections and Panel or Longitudinal Data

  • Pooled cross sections consist of cross-sectional samples collected in multiple years.
  • A panel data set consists of a time series for each cross-sectional member in the data set.
  • Panel data:
    • the same units over time
  • pooled cross sections:
    • different units, at different times.
  • Each observation is uniquely determined by the unit and the time.
  • Panel data vary in both the time dimension and the unit dimension

Ch2 The Simple Regression Model

Interpretation and Estimation

Descriptive analysis

  • Define conditional expectation E(y|x)
    • if I condition X to be some value, what is the expected value of Y?

Simple Linear model

  • $$E(y|x)=\beta_0+\beta_1x$$

  • $$\beta_0=E(y|x=0)$$

  • $$\beta_1=\frac{\partial E(y|x)}{\partial x}$$

  • let $u=y-E(y|x)$, thus $E(u|x)=0$

  • $$\hat{u_i}=y_i-\hat{\beta_0}-\hat{\beta_1}x_i$$

  • using the law of iterated expectations, we get

    • $E(u)=0$, and $E(ux)=0$
  • For the population, write $y_i=\beta_0+\beta_1x_i+u_i$

  • For the sample, write $y_i=\hat{\beta_0}+\hat{\beta_1}x_i+\hat{u_i}$

    • hatted symbols are sample quantities, used to estimate the true values
Method of Moments
  • Method of moments: use the sample average to estimate the population expectation

  • Use $\frac{1}{N}\sum$ to replace $E[·]$

    | Population expectation | Sample analogue |
    | --- | --- |
    | $E(u)=0$ | $\frac{1}{N}\sum \hat{u_i}=0$ |
    | $E(ux)=0$ | $\frac{1}{N}\sum x_i\hat{u_i}=0$ |
  • substituting $\hat{u_i}=y_i-\hat{\beta_0}-\hat{\beta_1}x_i$ for u and solving, the result is:

    • $$\hat{\beta_1}=\frac{\frac{1}{N}\sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{N}\sum_{i=1}^N(x_i-\bar{x})^2}$$
    • $$\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$$
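
A minimal numpy sketch of these two formulas on simulated data (the sample size, seed, and true coefficients 1 and 2 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)  # simulated: true beta0 = 1, beta1 = 2

# Sample analogues of E(u) = 0 and E(ux) = 0, solved for the two coefficients
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)  # should be close to 1 and 2
```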

Causal Estimation

  • $$y=\beta_0+\beta_1 x+u$$
    • $\beta_{0}$ and $\beta_1$ are unknown numbers in nature that we want to uncover
    • You choose x
    • Nature chooses u in a way that is unrelated to your choice of x
  • u represents factors other than x that affect y
    • we think of u as something real; we just cannot observe it
    • u is a variable that actually exists
    • To estimate the model, we need to know how u is determined.
    • The simplest case is that u is assigned at random
    • We can write this as $E(u|x)=0$.
      • u does not vary with x
        • Important: the covariance between u and x is 0
      • its mean is 0
  • Take expectations of both sides of the equation and differentiate with respect to x:
    • $$\frac{\partial E[y|x]}{\partial x}=\beta_{1}+\frac{\partial E[u|x]}{\partial x}$$
    • Only when $E[u|x]$ is constant does $\beta_1$ equal the average change in y when x changes by one unit.
    • This condition gives the model a causal interpretation:
      • if $E(u|x)$ does not vary when x changes, then any change in y can be attributed to x
    • Therefore $\beta_{1}$ reflects the causal effect of x on y

Forecasting

  • We want to obtain:
    • $$\hat{y}^*=\hat{\beta_0}+\hat{\beta_1}x^*$$
  • Unlike the causal setting, we do not choose $x^*$ ourselves
  • Use least squares: pick the coefficients that minimize the sum of squared differences between the fitted and the actual y.

Two approaches: method of moments and OLS (they give the same estimates)

Properties of Simple Regression Model

Properties of OLS on Any Sample of Data

  • $$\sum_{i=1}^{N}\hat{u}_{i}=0.$$

  • $$\sum_{i=1}^{N}x_{i}{\hat{u}}_{i}=0.$$

  • $$\sum_{i=1}^N\hat{y}_i\hat{u}_i=0$$

  • $$\hat{\beta_0}+\hat{\beta_1}\bar{x}=\bar{y}$$

Goodness of Fit

  • measure how well our model fits the data
  • decompose $y_i$ into two parts: the fitted value and the residual.
    • $$y_i=\hat{y_i}+\hat{u_i}$$
    • the first part is explained by the model; the second part is not
  • Define the following terms:
    • Total sum of squares (SST): $$SST=\sum_{i=1}^{N}\bigl(y_{i}-\bar{y}\bigr)^{2}.$$

    • Explained sum of squares (SSE): $$SSE=\sum_{i=1}^{N}({\hat{y}}_{i}-{\bar{y}})^{2}.$$

    • Residual sum of squares (SSR): $$SSR=\sum_{i=1}^{N}\hat{u}_{i}^{2}.$$

    • $$SST=SSE+SSR$$

  • Define the goodness of fit, $R^2$
    • $$R^2=\frac{SSE}{SST}$$
    • i.e., the fraction of the variation in y that is explained by the model
    • always between 0 and 1
    • it only describes the correlation between x and y; it does not establish causality

Functional form

  • $$log(y)=\beta_0+\beta_1 x+u$$

    • $$\beta_1=\frac{\partial \log(y)}{\partial x}\approx\frac{\Delta y/y}{\Delta x},$$ so $100\beta_1$ is the approximate percent change in y for a one-unit change in x
  • Functional forms involving log:

    | Model | Equation | Interpretation of $\beta_1$ |
    | --- | --- | --- |
    | level-level | $y=\beta_0+\beta_1x+u$ | $\Delta y=\beta_1\Delta x$ |
    | level-log | $y=\beta_0+\beta_1\log(x)+u$ | $\Delta y=(\beta_1/100)\,\%\Delta x$ |
    | log-level | $\log(y)=\beta_0+\beta_1x+u$ | $\%\Delta y=100\beta_1\,\Delta x$ |
    | log-log | $\log(y)=\beta_0+\beta_1\log(x)+u$ | $\%\Delta y=\beta_1\,\%\Delta x$ |

Expected Values and Variances of the OLS Estimators

Unbiasedness of OLS
  • Unbiasedness means the expectation of the estimator equals the true value.
  • $E(\hat{\beta})=\beta$
Assumptions: SLR (simple linear regression)
  1. Linear in Parameters
  2. Random Sampling
    • $Cov(u_i,u_j)=0$
  3. Sample Variation in the Explanatory Variable
    • i.e., the explanatory variable x must take on more than one value in the sample
  4. Zero Conditional Mean: $E(u|x) = 0$

Under the four assumptions above, $\hat{\beta}_0$ and $\hat{\beta}_1$ are both unbiased.

  • Though the OLS estimator is unbiased, the estimate calculated from a particular sample can still be very different from the population β.
  5. Homoskedasticity
    • The error u has the same variance given any value of the explanatory variable.
    • In other words, $Var(u|x)=\sigma^2$
    • [Figure: left, homoskedasticity; right, heteroskedasticity]

Under the five assumptions above, the variance can be derived:

  • $$Var(\hat{\beta_1}|x)=\frac{\sigma^2}{\sum_{i=1}^N(x_i-\bar{x})^2}=\frac{\sigma^2}{SST_x}$$
    • where $\sigma$ is the standard deviation of u
    • when $\sigma$ is large, $Var(\hat{\beta}_{1}|x)$ is large
    • when the variance of x is large, $Var(\hat{\beta}_{1}|x)$ is small
  • $$Var(\hat{\beta_0}|x)=\frac{\sigma^2\sum_{i=1}^Nx_i^2}{N\sum_{i=1}^N(x_i-\bar{x})^2}$$
Estimating $\sigma$
  • We need to use the sample to estimate $\sigma^2$
  • $$s^2\equiv\hat{\sigma}^2=\frac{1}{N-2}\sum_{i=1}^N\hat{u}_i^2$$
  • This is an unbiased estimator; the $-2$ arises because the two OLS first-order conditions use up two degrees of freedom.

Ch3 Multiple Regression Analysis: Estimation

Why we need multiple regression model?

  • Descriptive analysis: sometimes we want to estimate the conditional mean of y on multiple variables
  • Causal estimation: we know that something other than x may affect y, so we explicitly control for it.
  • Forecasting: we want to use more variables to better predict y

Estimation and Interpretation

  • Population regression model:$$y=\beta_0+\beta_1 x_1+\cdots+\beta_kx_k+u$$
    • zero conditional mean:
      • $$E(u|x_1,\cdots,x_k)=0$$
    • besides, using the law of iterated expectations
      • $$E(x_ju)=0\quad E(u)=0$$
  • Fitted value: $$\hat{y_i}=\hat{\beta_0}+\hat{\beta_1}x_{i1}+\cdots+\hat{\beta_k}x_{i k}$$
  • residual: $$\hat{u_i}= y_{i}-\hat{y_i}$$

Sample analog

| Population expectation | Sample analogue |
| --- | --- |
| $E(u)=0$ | $\frac{1}{N}\sum \hat{u_i}=0$ |
| $E(x_1u)=0$ | $\frac{1}{N}\sum x_{i1}\hat{u_i}=0$ |
| $\vdots$ | $\vdots$ |
| $E(x_ku)=0$ | $\frac{1}{N}\sum x_{ik}\hat{u_i}=0$ |

OLS

  • $$H(b_0,\ldots,b_k)\equiv\sum_{i=1}^N\hat{u_i}^2=\sum(y_i-b_0-b_1x_{i1}-\cdots-b_kx_{i k})^2$$
  • Minimizing gives the first-order conditions:
    • $$\frac{\partial H}{\partial b_0}=-\sum_{i=1}^n2(y_i-\hat{\beta_0}-\hat{\beta_1}x_{i1}-\cdots-\hat{\beta_k}x_{ik})=0$$
    • $${\frac{\partial H}{\partial b_{j}}}=-\sum_{i=1}^{n}2x_{i j}(y_{i}-{\hat{\beta_0}}-{\hat{\beta_1}}x_{i1}-\cdots-{\hat{\beta_k}}x_{i k})=0,\forall j=1,2,…,k.$$
  • OLS and sample analogue give the same answer.

Interpretation

  • The coefficient on $x_j$ represents the change in y when $x_j$ increases by one unit, holding the other factors fixed.

  • $$\Delta\hat{y}=\hat{\beta}_1 \Delta x_1+\hat{\beta}_2 \Delta x_2+\cdots+\hat{\beta}_k \Delta x_k$$

  • Obtaining $\hat{\beta_j}$: the Frisch-Waugh-Lovell theorem

    • Regress $x_j$ on other independent variables (including the constant), obtain the residual $\hat{r_{ij}}$.
    • Regress y on other independent variables (including the constant), obtain the residual $\hat{r_{iy}}$.
    • Regress $\hat{r_{iy}}$ on $\hat{r_{ij}}$; the resulting slope coefficient is $\hat{\beta_j}$
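
A small numpy sketch of the FWL steps on simulated data, checking that the partialled-out slope matches the full multiple regression (ols is a local helper, not a library function; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)         # x1 and x2 are correlated
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=N)

def ols(X, y):
    # OLS coefficients via least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(N)
beta_full = ols(np.column_stack([ones, x1, x2]), y)   # direct multiple regression

# FWL: partial out x2 (and the constant) from both x1 and y
X_other = np.column_stack([ones, x2])
r_x1 = x1 - X_other @ ols(X_other, x1)     # residual of x1 on the other regressors
r_y = y - X_other @ ols(X_other, y)        # residual of y on the other regressors
beta1_fwl = ols(r_x1.reshape(-1, 1), r_y)[0]

print(beta_full[1], beta1_fwl)             # identical up to rounding
```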

Goodness of fit

  • Same as in the SLR case
  • But $R^2$ almost always increases as more variables are added
  • Introduce the adjusted $R^2$
    • $$\bar{R}^2=1-\frac{SSR/(N-k-1)}{SST/(N-1)}=1-(1-R^2)\frac{N-1}{N-k-1}$$
    • N represents the size of the sample, k represents the number of independent variables (excluding the constant)

Expected Values and Variances of the OLS Estimators

Assumptions: MLR (multiple linear regression)

  1. Linear in Parameters
  2. Random Sampling
    • $Cov(u_i,u_j)=0$
  3. No perfect collinearity
    • the explanatory variables cannot be perfectly linearly related
    • otherwise the separate effects of the collinear components cannot be distinguished
  4. Zero Conditional Mean:
    • $E(u|x_1,\cdots,x_k) = 0$
  • Under the four assumptions above, all the OLS estimators $\hat{\beta}_0,\ldots,\hat{\beta}_k$ are unbiased
  5. Homoskedasticity
    • $Var(u|x_1,\cdots,x_k)=\sigma^2$
  • MLR.1-MLR.5 together are called the Gauss-Markov assumptions

  • Under the Gauss-Markov assumptions, the variance satisfies:

    • $$Var(\hat{\beta_j})=\frac{\sigma^{2}}{SST_j(1-R_j^2)}$$
    • $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables and including an intercept
  • The unbiased estimator of $\sigma^2$ is:

    • $$\hat{\sigma}^{2}=\frac{1}{N-k-1}\sum_{i=1}^{N}\hat{u}_{i}^{2}.$$
    • degrees of freedom: N-k-1
    • this is an unbiased estimator

BLUE

| Standard deviation | Standard error |
| --- | --- |
| $sd(\hat{\beta_j})$ | $se(\hat{\beta_j})$ |
| $\frac{\sigma}{[SST_j(1-R_j^2)]^{1/2}}$ | $\frac{\hat{\sigma}}{[SST_j(1-R_j^2)]^{1/2}}$ |
| $\sigma^2=\operatorname{Var}(u)$ | $\hat{\sigma}^{2}=\frac{1}{N-k-1}\sum \hat{u}^2$ |
| unknown | estimated using the sample |
  • OLS is the best linear unbiased estimator (BLUE)
    • it is linear and unbiased, and has the smallest variance among all linear unbiased estimators.

Practical issues

Omitted variable bias

  • Suppose there are two explanatory variables but we regress y on only one of them; the resulting estimate $\tilde{\beta}_1$ deviates from the true value.

  • $E(\tilde{\beta}_1)=\beta_1+\beta_2\tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$

  • thus, $Bias(\tilde{\beta_1})=\beta_{2}\tilde{\delta}_{1}$

    | | $Corr(x_1,x_2)>0$ | $Corr(x_1,x_2)<0$ |
    | --- | --- | --- |
    | $\beta_2>0$ | positive bias | negative bias |
    | $\beta_2<0$ | negative bias | positive bias |
  • Omitting a relevant variable destroys unbiasedness

Including irrelevant variables

  • does not affect unbiasedness, but the variance increases

Multicollinearity

  • high (but not perfect) correlation between two or more independent variables
  • does not affect unbiasedness, but the variance increases

Ch4 Multiple Regression Analysis: Inference

Classical Linear Regression Model

The Distribution of $\hat{β_j}$

  • To derive the distribution, we add one more assumption on top of the Gauss-Markov assumptions:

  • MLR6: Normality

    • $$u\sim N(0,\sigma^2)$$
  • Assumptions MLR.1 through MLR.6 are called the classical linear model (CLM) assumptions.

  • A succinct summary of the population assumptions of the CLM is

    • $$y|\mathbf{x} \sim \text{Normal}(\beta_{0}+\beta_{1}x_{1}+\ldots+\beta_{k}x_{k},\ \sigma^{2})$$
  • Under the CLM assumptions, $\hat{\beta}_j$ follows a normal distribution

    • $$\frac{\hat{\beta}_j-\beta_j}{sd(\hat{\beta}_j)}\sim Normal(0,1)$$
  • In practice the standard deviation is unknown, so we use the standard error instead:

    • $$\frac{\hat{\beta_j}-\beta_j}{s e(\hat{\beta_j})}\sim t_{N-k-1}= t_{d f},$$
    • this follows a t distribution with N-k-1 degrees of freedom, where N is the number of observations and k is the number of regressors

The t test

  • Test whether a particular coefficient is zero

Null hypothesis

  • Let $H_0$ be the null hypothesis that we want to test. Let $H_1$ be the alternative hypothesis.
  • We reject the null hypothesis when the test statistic falls in the rejection region.

rejection region

  • type 1 error:
    • significance level = α = $Pr(\text{rejecting }H_0|H_0\text{ is true})$
  • type 2 error:
    • $Pr(\text{not rejecting }H_0|H_1\text{ is true})$
  • The idea: first fix a significance level to pin down our tolerance for type 1 error, and then minimize the type 2 error

Testing Against One-Sided Alternatives

  • Suppose we are interested in testing
    $$
    \begin{array}{ll}
    H_0: & \beta_j=0 . \\
    H_1: & \beta_j>0 .
    \end{array}
    $$
  • Consider a test statistic:$$\frac{\hat{\beta}_j-\beta_j}{\operatorname{se}(\hat{\beta}_j)}$$
  • When $H_0$ is true, the test statistic is: $$\frac{\hat{\beta_j}}{s e(\hat{\beta_j})} \sim t_{N-k-1}=t_{df}$$
  1. It depends on the data.
  2. We know its distribution under $H_0$.
  • Define $$t_{\hat{\beta}_j} \equiv \frac{\hat{\beta}_j}{s e(\hat{\beta}_j)}$$
  • We often call $t_{\hat{\beta}_j}$ t-statistic or t-ratio of $\hat{\beta}_j$.
  • $t_{\hat{\beta}_j}$ has the same sign as $\hat{\beta}_j$, because $\operatorname{se}(\hat{\beta}_j)>0$.
  • Intuitively, we reject $H_0$ when $t_{\hat{\beta}_j}$ is large enough: the larger $t_{\hat{\beta}_j}$ is, the less likely $H_0$ is true and the more likely $H_1$ is true.
  • How large is large enough?
    • Fix a significance level of 5%. The critical value $c$ is the 95th percentile of the statistic's distribution when $H_0$ is true. It means that when $H_0$ is true, the probability of getting a value at least as large as $c$ is 5%.
    • Rejection rule:$t_{\hat{\beta}_j}>c$
  • Rejecting $H_0$ when $t_{\hat{\beta}_j}>c$ means the probability of making a type I error, that is, the probability of rejecting $H_0$ when $H_0$ is true, is 5%.
  • One-sided test at the 5% level
The idea of testing
  1. Fix a significance level $\alpha$. That is, decide our level of “tolerance” for the type I error.
  2. Find the critical value associated with $\alpha$. For $H_1: \beta_j>0$, this means finding the $(1-\alpha)$-th percentile of the $\mathrm{t}$ distribution with $d f=N-k-1$.
  3. Reject $H_0$ if $t_{\hat{\beta}_j}>c$
  • In general, the type I and type II errors cannot both be reduced at the same time; there is a trade-off.

Two-sided Alternatives

  • We want to test:$$\begin{array}{ll}H_0: & \beta_j=0 . \\ H_1: & \beta_j \neq 0 .
    \end{array}$$

  • This is the relevant alternative when the sign of $\beta_j$ is not well determined by theory.

  • Even when we know whether $\beta_j$ is positive or negative under the alternative, a two-sided test is often prudent.

  • Procedure:

    1. Fix a significance level $\alpha$. That is, decide our level of “tolerance” for the type I error.
    2. Find the critical value associated with $\alpha$. For $H_1: \beta_j \neq 0$, this means finding the $(1-\alpha / 2)$-th percentile of the $\mathrm{t}$ distribution with $d f=N-k-1$.
    3. Reject $H_0$ if$$|t_{\hat{\beta}_j}|>c .$$
  • Two-sided test at the 5% level

  • Unless stated otherwise, tests are usually two-sided

  • If $H_0$ is rejected in favor of $H_1: \quad \beta_j \neq 0$ at the 5% level, we usually say that “$x_j$ is statistically significant, or statistically different from zero, at the 5% level.”

  • If $H_0$ is not rejected, we say that “$x_j$ is statistically insignificant at the 5% level.”

Other Hypothesis

  • If the null is stated as:
    $$
    H_0: \beta_j=a_j
    $$
    Then the t-statistic is$$\frac{\hat{\beta_j}-a_j}{\operatorname{se}(\hat{\beta_j})} \sim t_{N-k-1}$$
    We can use the general t statistic to test against one-sided or two-sided alternatives.

p-Values for t Tests

  • Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected?
  • We call this “smallest significance level” the p-value.
  • p-value represents the probability of observing a value as extreme as $t_{\hat{\beta}_j}$ under the $H_0$
  • $$
    \begin{array}{ll}
    H_0: & \beta_j=0 . \\
    H_1: & \beta_j \neq 0 .
    \end{array}
    $$
    • The p-value in this case is$$P(|T|>|t|),$$
    • where we let $T$ denote a $\mathrm{t}$ distributed random variable with $N-k-1$ degrees of freedom and let $t$ denote the numerical value of the test statistic.
  • The p-value nicely summarizes the strength or weakness of the empirical evidence against the null hypothesis.
  • The p-value is the probability of observing a $t$ statistic as extreme as we did if the null hypothesis is true.
  • Significance levels and critical values are in one-to-one correspondence
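
A sketch of these calculations with scipy.stats (the coefficient 0.42, standard error 0.19, N=100, and k=3 are made-up numbers for illustration):

```python
from scipy import stats

# Hypothetical regression output: beta_hat = 0.42 with se = 0.19
t_stat = 0.42 / 0.19
df = 100 - 3 - 1                                # N - k - 1 degrees of freedom

p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # P(|T| > |t|)
p_one_sided = stats.t.sf(t_stat, df)            # P(T > t), for H1: beta_j > 0
c = stats.t.ppf(0.975, df)                      # two-sided 5% critical value
print(t_stat, p_two_sided, p_one_sided, c)
```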

Economic versus Statistical Significance

  • The statistical significance of a variable $x_j$ is determined entirely by the size of $t_{\hat{\beta}_j}$, whereas the economic significance or practical significance of a variable is related to the size (and sign) of $\hat{\beta}_j$.
  • We often care about both statistical significance and economic significance.

Confidence interval

  • We can construct a confidence interval depending on $\alpha$. We call it a $(1-\alpha)$ confidence interval:$$[\hat{\beta}_j-c \cdot \operatorname{se}(\hat{\beta}_j), \hat{\beta}_j+c \cdot \operatorname{se}(\hat{\beta}_j)]$$
  • The critical value $c$ is the $(1-\alpha / 2)$ percentile in a $\mathrm{t}$ distribution with $d f=N-k-1$.
  • The meaning of a 95% confidence interval: if we sample repeatedly many times, then the true $\beta_j$ will appear in 95% of the confidence intervals.
  • This is a statement about repeated sampling; for any one particular sample, we cannot say whether the true value lies in the computed interval.

Three equivalent procedures:

  1. Fix a significance level $\alpha$, calculate the critical value $c$, and then reject $H_0$ if $|t_{\hat{\beta}_j}|>c$.
  2. Fix a significance level $\alpha$, calculate the $\mathrm{p}$-value, reject $H_0$ if $p<\alpha$.
  3. Reject $H_0$ if 0 is not in the confidence interval.

Testing Multiple Linear Restrictions: The F Test

  • $$y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+u$$We want to test$$H_0: \quad \beta_1=0 \text { and } \beta_2=0$$$$H_1: H_0\text{ is not true.}$$
  • Method:
    • Consider the restricted model when $H_0$ is true$$y=\gamma_0+\gamma_3 x_3+u$$
    • If $H_0$ is true, the two models are the same. That means when we include $x_1$ and $x_2$ into the model, the sum of squared residuals should not change much.
    • However, if $H_0$ is false that means that at least one of $\beta_1, \beta_2$ is nonzero and the sum of squared residuals should fall when we include these new variables
    • compare the SSRs of the restricted and unrestricted models
  • F-test
    • $$F \equiv \frac{(S S R_r-S S R_{u r}) / q}{S S R_{u r} /(N-k-1)}= \frac{(R_{u r}^2-R_r^2) / q}{(1-R_{u r}^2) /(N-k-1)}$$
    • $q$ is the number of linear restrictions, which is the difference in degrees of freedom in the restricted model versus the unrestricted model.
    • $S S R_r$ is the sum of squared residuals from the restricted model and $S S R_{u r}$ is the sum of squared residuals from the unrestricted model.
    • Since $S S R_r$ can be no smaller than $S S R_{u r}$, the $\mathrm{F}$ statistic is always nonnegative.
    • We can show that the sampling distribution of the F-stat: $F \sim F_{q, N-k-1}$. We call this an $\mathrm{F}$ distribution with $q$ degrees of freedom in the numerator and $N-k-1$ degrees of freedom in the denominator.
    • F test at the 5% level
  • The square of a t-distributed statistic is F-distributed, so a single restriction can also be tested with the F distribution.
  • In the $\mathrm{F}$ testing context, the $\mathrm{p}$-value is defined as$$P(\mathcal{F}>F) $$where $\mathcal{F}$ denote an $\mathrm{F}$ random variable with $(q, N-k-1)$ degrees of freedom, and $\mathrm{F}$ is the actual value of the test statistic.
  • p-value is the probability of observing a value of $\mathrm{F}$ at least as large as we did, given that the null hypothesis is true.
    • $\rightarrow$ Reject $H_0$ if $p<\alpha$
  • In practice the SSR form of the F test is usually preferred: when the restricted model's dependent variable differs from the unrestricted one's, the R-squared form cannot be used directly.
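
A numpy/scipy sketch of the F test above, with simulated data in which $H_0$ is true (all names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N = 200
x1, x2, x3 = rng.normal(size=(3, N))
y = 1 + 0.5 * x3 + rng.normal(size=N)      # H0 true: beta1 = beta2 = 0

def ssr(X, y):
    # sum of squared residuals from an OLS fit
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(N)
ssr_ur = ssr(np.column_stack([ones, x1, x2, x3]), y)   # unrestricted
ssr_r = ssr(np.column_stack([ones, x3]), y)            # restricted
q, k = 2, 3
F = (ssr_r - ssr_ur) / q / (ssr_ur / (N - k - 1))
p = stats.f.sf(F, q, N - k - 1)
print(F, p)   # p should typically be large, since H0 is true here
```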

Testing Multiple Linear Restrictions: The LM statistic

  • Lagrange multiplier (LM) statistic
  • The LM statistic can be used in testing multiple exclusion restrictions (as in an $F$ test) under large sample.
  • For the model$$y=\beta_0+\beta_1 x_1+\ldots+\beta_k x_k+u$$We want to test whether the last $q$ of these variables all have zero population parameters:$$H_0: \beta_{k-q+1}=\beta_{k-q+2}=\ldots=\beta_k=0$$
  • $L M$ statistic
  • First estimate the restricted model:$$y=\tilde{\beta_0}+\tilde{\beta_1} x_1+\cdots+\tilde{\beta_{k-q}} x_{k-q}+\tilde{u}$$If the coefficients of the excluded independent variables $x_{k-q+1}$ to $x_k$ are truly zero in the population model, then they should be uncorrelated to $\tilde{u}$.
  • So regress $\tilde{u}$ on all $x$$$\tilde{u} \sim x_1, x_2, \ldots, x_k$$Let $R_u^2$ denote the R-squared of this regression. The smaller the $R_u^2$, the more likely $H_0$ is true. So a large $R_u^2$ provides evidence against $H_0$.
  • $LM=N \cdot R_u^2$. We can show that $LM$ follows a chi-square distribution with $q$ degrees of freedom: $\chi_q^2$.
    • Reject $H_0$ if $LM>$ critical value ($p<$ significance level)
  • In large samples, the LM and F tests give very similar results
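
A numpy/scipy sketch of the LM procedure on simulated data with $q=2$ truly irrelevant regressors (all names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N = 500
x1, x2, x3 = rng.normal(size=(3, N))
y = 1 + 0.8 * x1 + rng.normal(size=N)      # x2, x3 truly irrelevant (q = 2)

def resid(X, y):
    # OLS residuals
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta

ones = np.ones(N)
u_tilde = resid(np.column_stack([ones, x1]), y)        # restricted residuals

# Regress the restricted residuals on ALL regressors and take the R-squared
X_all = np.column_stack([ones, x1, x2, x3])
e = resid(X_all, u_tilde)
sst = (u_tilde - u_tilde.mean()) @ (u_tilde - u_tilde.mean())
R2_u = 1 - (e @ e) / sst

LM = N * R2_u
p = stats.chi2.sf(LM, df=2)                # chi-square with q = 2 dof
print(LM, p)
```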

Ch5 Multiple Regression Analysis: OLS Asymptotics

Asymptotic Properties

  • Finite sample properties: properties hold for any sample of data.
  • Examples
    • Unbiasedness of OLS
    • OLS is BLUE
    • Sampling distribution of the OLS estimators
  • Asymptotic properties or large sample properties: not defined for a particular sample size; rather, they are defined as the sample size grows without bound.

Consistency

  • Let $W_N$ be an estimator of $\theta$ based on a sample $Y_1, Y_2, \ldots, Y_N$ of size $N$. Then, $W_N$ is a consistent estimator of $\theta$ if for every $\epsilon>0$$$P(|W_N-\theta|>\epsilon) \rightarrow 0 \text { as } N \rightarrow \infty .$$
  • We also say consistency means:$$\operatorname{plim}(W_N)=\theta$$
    • Intuitively, consistency means when the sample size becomes larger, the estimator gets closer and closer to the true value.
  • Consistency and unbiasedness do not imply each other

Consistency of OLS

  • Under Assumptions MLR.1 through MLR.4, the OLS estimator $\hat{\beta}_j$ is consistent for $\beta_j$, for all $j=0,1, \ldots, k$.
  • When the sample size gets larger, the distribution of the OLS estimator becomes more and more tightly centered around the true parameter.

Central Limit Theorem

  • Use the notation$$\hat{\theta}_N \stackrel{a}{\sim} N(0, \sigma^2)$$to mean that as the sample size $N$ gets larger, $\hat{\theta}_N$ is approximately normally distributed with mean 0 and variance $\sigma^2$.
  • Central Limit Theorem
    • Let ${Y_1, Y_2, \ldots, Y_N}$ be a random sample with mean $\mu$ and variance $\sigma^2$. Then,$$Z_N=\frac{\bar{Y}_N-\mu}{\sigma / \sqrt{N}} \stackrel{a}{\sim} N(0,1)$$
    • Intuitively, it means when the sample size gets larger, the distribution of the sample average is closer to a normal distribution.
    • Regardless of the distribution of Y, the distribution of the sample average approaches a normal distribution as the sample size grows

Asymptotic Normality of OLS

  • Under the Gauss-Markov Assumptions MLR.1 through MLR.5, for each $j=0,1, \ldots, k$
  • $$\begin{aligned}
    & \frac{\hat{\beta}_j-\beta_j}{s d(\hat{\beta}_j)} \stackrel{a}{\sim} \operatorname{Normal}(0,1) . \\
    & \frac{\hat{\beta}_j-\beta_j}{\operatorname{se}(\hat{\beta}_j)} \stackrel{a}{\sim} \operatorname{Normal}(0,1) .
    \end{aligned}$$
  • OLS estimators are approximately normally distributed in large enough sample sizes.
  • This theorem shows that when the sample is large enough, the normality assumption on u is not needed.

Summary

  • Under MLR.1-MLR.4, OLS estimators are consistent.
  • Under MLR.1-MLR.5, OLS estimators have an asymptotic normal distribution.

Ch6 Multiple Regression Analysis: Further Issues

Effects of Data Scaling on OLS Statistics

Changing units of measurement

  • Consider the simple regression model:$$y=\beta_0+\beta_1 x+u$$
  • Now suppose $y^*=w_1 y$ and $x^*=w_2 x$. Then the model becomes:$$y^*=\beta_0^*+\beta_1^* x^*+u^*$$
  • How are $\hat{\beta}_0^*$ and $\hat{\beta}_0$, and $\hat{\beta}_1^*$ and $\hat{\beta}_1$, related?
    • $$\hat{\beta}_0^*=w_1 \hat{\beta}_0, \quad \hat{\beta}_1^*=\frac{w_1}{w_2} \hat{\beta}_1$$
    • $$\operatorname{se}(\hat{\beta}_0^*) =w_1 \operatorname{se}(\hat{\beta}_0), \quad\operatorname{se}(\hat{\beta}_1^*) =\frac{w_1}{w_2}\operatorname{se}(\hat{\beta}_1)$$
    • $$t_{\hat{\beta}_0^*}=t_{\hat{\beta}_0} \quad t_{\hat{\beta}_1^*} =t_{\hat{\beta}_1}$$
    • $$R^{2*}=R^2$$
    • the statistical significance does not change.

Unit Change in Logarithmic Form

  • Only the intercept is affected, not the slope coefficients

Beta Coefficients

  • Sometimes, it’s useful to obtain regression results when all variables are standardized: subtracting off its mean and dividing by its standard deviation.
  • $$
    \begin{aligned}
    y_i & =\hat{\beta_0}+\hat{\beta_1} x_{i 1}+\cdots+\hat{\beta_k} x_{i k}+\hat{u_i} . \\
    (y_i-\bar{y}) / \hat{\sigma_y} & =(\hat{\sigma_1} / \hat{\sigma_y}) \hat{\beta_1}[(x_{i 1}-\bar{x_1}) / \hat{\sigma_1}]+\cdots \\
    & +(\hat{\sigma_k} / \hat{\sigma_y}) \hat{\beta_k}[(x_{i k}-\bar{x_k}) / \hat{\sigma_k}]+(\hat{u_i} / \hat{\sigma_y}) \\
    z_y & =\hat{b_1} z_1+\cdots+\hat{b_k} z_k+\text { error }
    \end{aligned}
    $$
  • where $z_y$ denotes the z-score of $y$. The new coefficients are$$\hat{b}_j=(\hat{\sigma}_j / \hat{\sigma}_y) \hat{\beta}_j$$
  • If $x_1$ increases by one standard deviation, then $\hat{y}$ changes by $\hat{b}_1$ standard deviation.
  • i.e., all variables are standardized

More on Functional Form

  • For the log form $$\log(y)=\beta_0+\beta_1 x+u$$
    • when x changes by $\Delta$, $$\%\Delta E(y|x)=100[\exp(\beta_1\Delta)-1]$$
    • as $\Delta$ approaches 0, $$\%\Delta E(y|x)\approx 100\beta_1\Delta$$
  • For variables that can be zero or negative (where the log is undefined), one can use the inverse hyperbolic sine: $$IHS(x)=\operatorname{arcsinh}(x)=\log(x+\sqrt{x^2+1})$$
  • When the effect of x on y is nonlinear, consider adding a quadratic term.

More on Goodness of Fit

  • In Ch4 we used the F test to decide whether the model can be restricted. What about non-nested models?
  • 例如: $$\begin{aligned}
    & y=\beta_0+\beta_1 x_1+\beta_2 x_2+u \\
    & y=\gamma_0+\gamma_1 x_4+e
    \end{aligned}$$
  • In that case, use the adjusted R-squared
    • choose the model with the highest $\bar{R}^2$

Prediction Analysis

confidence interval for E(y|x)

  • Suppose we have estimated the equation$$\hat{y}=\hat{\beta}_0+\hat{\beta}_1 x_1+\hat{\beta}_2 x_2+\ldots+\hat{\beta}_k x_k$$Let $c_1, c_2, \ldots, c_k$ denote particular values for each of the $k$ independent variables.
  • The parameter we would like to estimate is $$\theta=E(y \mid x_1=c_1, \ldots, x_k=c_k)=\beta_0+\beta_1 c_1+\beta_2 c_2+\ldots+\beta_k c_k$$
  • The estimator of $\theta$ is$$\hat{\theta}=\hat{\beta}_0+\hat{\beta}_1 c_1+\hat{\beta}_2 c_2+\ldots+\hat{\beta}_k c_k$$
  • $$y=\theta+\beta_1(x_1-c_1)+\beta_2(x_2-c_2)+\ldots+\beta_k(x_k-c_k)+u$$
  • So we can regress $y_i$ on $(x_{i 1}-c_1), \ldots,(x_{i k}-c_k)$. The standard error and confidence interval of the intercept of this new regression is what we need.

Prediction Interval

  • $$y=E(y \mid x_1, \ldots, x_k)+u$$
  • The previous method form a confidence interval for $E(y \mid x_1, \ldots, x_k)$.
  • Sometimes we are interested in forming the confidence interval for an unknown outcome on $y$.
  • We need to account for the variation in $u$.
  • Let $x_1^0, \ldots, x_k^0$ be the new values of the independent variables, which we assume we observe. Let $u^0$ be the unobserved error.
    $$
    y^0=\beta_0+\beta_1 x_1^0+\ldots+\beta_k x_k^0+u^0 .
    $$
  • Our best prediction of $y^0$ is estimated from the OLS regression line
    $$
    \hat{y}^0=\hat{\beta}_0+\hat{\beta}_1 x_1^0+\ldots+\hat{\beta}_k x_k^0
    $$
  • The prediction error in using $\hat{y}^0$ to predict $y^0$ is $\hat{e}^0=y^0-\hat{y}^0$.
  • Note $E(\hat{y}^0)=\beta_0+\beta_1 x_1^0+\ldots+\beta_k x_k^0$, because the $\hat{\beta}_j$ are unbiased. Because $u^0$ has zero mean, $E\left(\hat{e}^0\right)=0$.
  • Note that $u^0$ is uncorrelated with each $\hat{\beta}_j$, because $u^0$ is uncorrelated with the errors in the sample used to obtain the $\hat{\beta}_j$.
  • Therefore, the variance of the prediction error (conditional on all in-sample values of the independent variables) is:
    $$
    \operatorname{Var}(\hat{e}^0)=\operatorname{Var}(\hat{y}^0)+\operatorname{Var}(u^0)=\operatorname{Var}(\hat{y}^0)+\sigma^2 .
    $$
  • The standard error of $\hat{e}^0$ is:$$se(\hat{e}^0)=\left\{[se(\hat{y}^0)]^2+\hat{\sigma}^2\right\}^{1 / 2}$$
  • The prediction interval for $y^0$ is
    $$
    \hat{y}^0 \pm c \cdot s e\left(\hat{e}^0\right)
    $$
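
A sketch using statsmodels, which reports both intervals directly on simulated data (in summary_frame, mean_ci_* is the confidence interval for $E(y|x)$ and obs_ci_* is the prediction interval for $y^0$; all numbers are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
N = 100
x = rng.normal(size=(N, 2))
y = 1 + x @ np.array([0.5, -0.3]) + rng.normal(size=N)

res = sm.OLS(y, sm.add_constant(x)).fit()
x_new = np.array([[1.0, 0.2, -0.1]])       # constant plus the new x values

pred = res.get_prediction(x_new)
frame = pred.summary_frame(alpha=0.05)     # 95% intervals
print(frame[["mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])
```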

Ch7 Multiple Regression Analysis with Qualitative Information

A Single Dummy Independent Variable

  • We often capture binary information by defining a binary variable or a zero-one variable.$$\text { female }= \begin{cases}0, & \text { if the individual is a man } \\ 1, & \text { if the individual is a woman }\end{cases}$$

  • zero-one leads to natural interpretations of the regression parameters

  • Suppose $$\text { wage }=\beta_0+\delta_0 \text { female }+u$$

  • If we estimate the model using OLS$$\begin{aligned}
    & \hat{\beta_0}=\overline{wage}_{men} \\
    & \hat{\delta_0}=\overline{wage}_{women}-\overline{wage}_{men}\end{aligned}$$

  • If we regress $y$ on a dummy variable $x$, then the OLS estimate of the intercept represents the sample average of $y$ when $x=0$, the OLS estimate of the slope coefficient represents the difference between the sample average of $y$ when $x=1$ and $x=0$.

  • When we add more variables, for example $$w a g e=\beta_0+\delta_0 \text { female }+\beta_1 e d u c+u$$

  • $\delta_0$ is the difference in hourly wage between women and men, given the same amount of education.

Linear dependence

  • $wage=\beta_0 + \beta_1female+u$ works
    • male is chosen as the baseline group
  • $wage=\beta_0 male + \beta_1female+u$ works
  • $wage=\beta_0+\beta_1 male + \beta_2female+u$ does not work (male + female = 1, perfectly collinear with the intercept: the dummy variable trap)

Without intercept

  • $wage=\beta_0 male + \beta_1female+u$ has no intercept, which is inconvenient for interpretation and testing, and its $R^2$ can be negative.
  • In general, if there is no intercept in the regression model, the $R^2$ could be negative.
  • To address the issue, some researchers use the uncentered R-squared when there is no intercept in the model$$R_0^2=1-\frac{S S R}{S S T_0},$$where $S S T_0=\sum_{i=1}^N y_i^2$.

Using Dummy Variables for Multiple Categories

  • With multiple categories, there are two options

  • Option 1: k-1 separate dummy variables:

    • $$\text { wage }=\beta_0+\beta_1 \text { marrmale }+\beta_2 \text { marrfemale }+\beta_3 \text { singfem }+u \text {. }$$
  • Option 2: interaction terms:

    • $$\text { wage }=\beta_0+\beta_1 \text { female }+\beta_2 \text { married }+\beta_3 \text { female } \cdot \text { married }+u$$
  • Note: if a variable can take only a few discrete values, it is usually best to make each value its own dummy; otherwise we implicitly impose that the effect is linear in the value.

    • for example, CR can take the values 0, 1, 2, 3, 4
    • $$MBR=\beta_0+\beta_1 C R+\text { other factors }+u \text {. }$$
    • $$MBR=\beta_0+\delta_1 C R 1+\delta_2 C R 2+\delta_3 C R 3+\delta_4 C R 4+\text{other factors}+u$$
    • The second model is better.

Interactions Involving Dummy Variables

  • In $wage=\beta_0+\beta_1female+\beta_2educ+u$ we assume that the effect of educ is the same for men and women.
  • To allow the effects to differ, add an interaction term:$$E(\text { wage } \mid \text { female }, \text { educ })=\beta_0+\delta_0 \text { female }+\beta_1 \text { educ }+\delta_1 \text { female } \cdot \text { educ. }$$

Testing for Differences in Regression Functions across Groups

  • $$\text { wage }=\beta_0+\beta_1 \text { educ }+\beta_2 \text { exper }+\beta_3 \text { tenure }+u$$
  • We want to test whether all the coefficients are the same for men and women.
  • We can include interactive terms for all variables:$$
    \begin{aligned}
    \text { wage }= & \beta_0+\delta_0 \text { female }+\beta_1 \text { educ }+\delta_1 \text { educ } \cdot \text { female }+ \\
    & \beta_2 \text { exper }+\delta_2 \text { exper } \cdot \text { female }+ \\
    & \beta_3 \text { tenure }+\delta_3 \text { tenure } \cdot \text { female }+u .
    \end{aligned}$$
  • The null hypothesis:$$H_0: \delta_0=0, \delta_1=0, \delta_2=0, \delta_3=0$$We can use an F test to test the hypothesis: estimate the unrestricted and restricted models, and then calculate the F-stat.

Chow statistic

  • For a regression with one binary group variable and many other continuous variables, we want to test whether all the other coefficients are identical across the two groups
  • We can show that the sum of squared residuals from the unrestricted model can be obtained from two separate regressions, one for each group: $S S R_{u r}=S S R_1+S S R_2$
  • The F-statstic:$$F=\frac{S S R_p-(S S R_1+S S R_2)}{S S R_1+S S R_2} \cdot \frac{N-2(k+1)}{k+1}$$$S S R_p$ : SSR from pooling the groups and estimating a single equation.
  • This is also called a Chow statistic.
  • Note: use the Chow test if
    • the model satisfies homoskedasticity
    • we want to test no differences at all between the groups
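
A numpy/scipy sketch of the Chow statistic on simulated data where the two groups truly share the same coefficients (all names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

def ssr(X, y):
    # sum of squared residuals from an OLS fit
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(5)
N, k = 300, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, k))])
g = rng.integers(0, 2, size=N)             # binary group indicator
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=N)  # same coefs in both groups

ssr_p = ssr(X, y)                          # pooled regression
ssr_1 = ssr(X[g == 0], y[g == 0])          # group 0 alone
ssr_2 = ssr(X[g == 1], y[g == 1])          # group 1 alone

F = (ssr_p - (ssr_1 + ssr_2)) / (ssr_1 + ssr_2) * (N - 2 * (k + 1)) / (k + 1)
p = stats.f.sf(F, k + 1, N - 2 * (k + 1))
print(F, p)   # p should typically be large here
```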

Program Evaluation

  • Comparing treatment and control groups in social experiments

Ch8 Heteroskedasticity

Consequence of Heteroskedasticity for OLS

  • Heteroskedasticity does not cause bias or inconsistency in the OLS estimators
    • consistency and unbiasedness are unaffected
  • The interpretation of our goodness-of-fit measures is also unaffected by the presence of heteroskedasticity.
    • goodness of fit is unaffected
    • $R^2$ and adj- $R^2$ are different ways of estimating the population R-squared, $1-\sigma_u^2 / \sigma_y^2$.
    • both variances in the population $R^2$ are unconditional variances
    • $SSR/N$ consistently estimates $\sigma_u^2$, and $SST/N$ consistently estimates $\sigma_y^2$, whether or not $\operatorname{Var}(u \mid x)$ is constant
  • With heteroskedasticity, $\operatorname{Var}\left(\hat{\beta}_j\right)$ is biased.
    • the variances are affected
    • standard errors, t statistics, and confidence intervals are no longer reliable
    • a large sample does not fix this
    • OLS is no longer BLUE.

Heteroskedasticity-Robust Inference after OLS Estimation

  • Consider the simple linear regression model:$$y_i=\beta_0+\beta_1 x_i+u_i$$Assume SLR.1-SLR.4 are satisfied, and there exists heteroskedasticity:$$\operatorname{Var}(u \mid x_i)=\sigma_i^2$$$\operatorname{Var}(u)$ takes on different values when $x$ varies
  • We don’t know the exact functional form of $\sigma_i^2$, it can be any function of $x$

Estimating $\operatorname{Var}\left(\hat{\beta}_j\right)$ under Heteroskedasticity

  • One valid estimator (White,1980):$$\widehat{\operatorname{Var}}(\hat{\beta_1})=\frac{\sum_{i=1}^N(x_i-\bar{x})^2 \hat{u_i}^2}{[\sum_{i=1}^N(x_i-\bar{x})^2]^2} \equiv \frac{\sum_{i=1}^N(x_i-\bar{x})^2 \hat{u_i}^2}{SST_x^2}$$where $$SST_x=\sum_{i=1}^N(x_i-\bar{x})^2$$.
  • For multiple regression model:$$\begin{gathered}y_i=\beta_0+\beta_1 x_{i 1}+\beta_2 x_{i 2}+\ldots+\beta_k x_{i k}+u_i . \\ \operatorname{Var}(\hat{\beta_j})=\frac{\sum_{i=1}^N \hat{r_{ij}}^2 \sigma_i^2}{[\sum_{i=1}^N \hat{r_{ij}}^2]^2} \equiv \frac{\sum_{i=1}^N \hat{r_{ij}}^2 \sigma_i^2}{SSR_j^2}\end{gathered}$$The estimator:$$\widehat{\operatorname{Var}}(\hat{\beta_j})=\frac{\sum_{i=1}^N \hat{r_{ij}}^2 \hat{u_i}^2}{[\sum_{i=1}^N \hat{r_{ij}}^2]^2}=\frac{\sum_{i=1}^N \hat{r_{ij}}^2 \hat{u_i}^2}{SSR_j^2}$$
    • $\hat{r_{ij}}$ is the residual from regressing $x_j$ on all other independent variables
    • $S S R_j$ is the sum of residual squared of this regression
    • The square root of $\widehat{\operatorname{Var}}\left(\hat{\beta}_j\right)$ is called the heteroskedasticity-robust standard error, or simply, robust standard errors.
  • Robust standard errors are consistent
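
A short statsmodels sketch comparing conventional and White (1980) robust standard errors on simulated heteroskedastic data (cov_type="HC0" is the basic White estimator; the variance function below is made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
N = 500
x = rng.normal(size=N)
u = rng.normal(size=N) * np.exp(0.5 * x)   # heteroskedastic errors
y = 1 + 2 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                   # conventional standard errors
rob = sm.OLS(y, X).fit(cov_type="HC0")     # White (1980) robust standard errors
print(ols.bse)                             # conventional se
print(rob.bse)                             # robust se
```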

Compare the variance formula

  • Under homoskedasticity ($\sigma_i^2=\sigma^2$), $\operatorname{Var}(\hat{\beta_j})$ simplifies to$$\operatorname{Var}(\hat{\beta_j})=\frac{\sum_{i=1}^N \hat{r_{ij}}^2 \sigma^2}{SSR_j^2}=\frac{\sigma^2}{SSR_j}$$
  • Under heteroskedasticity,$$\operatorname{Var}(\hat{\beta_j})=\frac{\sum_{i=1}^N \hat{r}_{ij}^2 \sigma_i^2}{SSR_j^2}=\frac{1}{SSR_j} \sum_{i=1}^N \frac{\hat{r}_{ij}^2}{SSR_j} \sigma_i^2=\frac{1}{SSR_j} \sum_{i=1}^N w_{ij} \sigma_i^2$$where $w_{ij}=\frac{\hat{r}_{ij}^2}{SSR_j}$. We know that $w_{ij}>0$ and $\sum_{i=1}^N w_{ij}=1$.
  • That is, a weighted average of the individual error variances.
  • Robust standard errors can be either larger or smaller than the usual standard errors.

More on RSE

  • In some cases, especially when heteroskedasticity is mild, robust standard errors perform worse than the conventional standard errors
    • in small samples, robust SEs can be biased
    • robust SEs have larger sampling variance
  • In practice, robust SEs are usually reported with large samples; with small samples, report both.

Weighted Least Squares Estimation

Generalized Least Squares (GLS)

  • Assume MLR.1-MLR.4 are satisfied:$$y_i=\beta_0+\beta_1 x_{i 1}+\ldots+\beta_k x_{i k}+u_i$$
  • Assume that the variance of $u$ takes the following form:$$\operatorname{Var}(u \mid x_1, \ldots, x_k)=\sigma^2 h(x_1, \ldots, x_k)$$
  • We write $\sigma_i^2=\sigma^2 h\left(x_{i 1}, \ldots, x_{i k}\right)=\sigma^2 h_i$.
  • Consider an alternative regression model:$$\frac{y_i}{\sqrt{h_i}}=\beta_0 \frac{1}{\sqrt{h_i}}+\beta_1 \frac{x_{i 1}}{\sqrt{h_i}}+\ldots+\beta_k \frac{x_{i k}}{\sqrt{h_i}}+\frac{u_i}{\sqrt{h_i}}$$
  • Let $\mathbf{x}$ denote all the explanatory variables. Conditional on $\mathbf{x}, E\left(u_i / \sqrt{h_i} \mid \mathbf{x}\right)=E\left(u_i \mid \mathbf{x}\right) / \sqrt{h_i}=0$.
  • $\operatorname{Var}\left(u_i / \sqrt{h_i} \mid \mathbf{x}\right)=\sigma^2$, satisfying homoskedasticity.
  • Denote the OLS estimator after the transformation as ${\beta_j^*}$
  • We can prove that ${\beta_j^*}$ minimizes$$\sum_{i=1}^N(y_i-b_0-b_1 x_{i 1}-\cdots-b_k x_{i k})^2 / h_i$$
  • Weighted least squares estimator(WLS):
    • the weight for each $\hat{u}_i$ is $1 / h_i$. We give less weight for observations with higher variance. Intuitively, they provide less information.
  • ${\beta_j^*}$ is still one estimator for the original model, and have the same interpretation
  • Because ${\beta_j^*}$ satisfies MLR.1-MLR.5, it is BLUE under heteroskedasticity of the form $\sigma_i^2=\sigma^2 h_i$
  • ${\beta_j^*}$ is also called generalized least squares estimators (GLS)

Feasible Generalized Least Squares (FGLS)

  • In practice we need to estimate $h_i$
  • Assume $h_i$ takes the following form:$$\begin{aligned}\operatorname{Var}(u \mid x) & =\sigma^2 \exp \left(\delta_0+\delta_1 x_1+\ldots+\delta_k x_k\right) \\ u^2 & =\sigma^2 \exp \left(\delta_0+\delta_1 x_1+\ldots+\delta_k x_k\right) v,\end{aligned}$$where $v$ has a mean of one.
  • We take $\exp (\cdot)$ to guarantee that $\operatorname{Var}(u)>0$
  • Equivalently,
    $$
    \log \left(u^2\right)=\alpha+\delta_1 x_1+\ldots+\delta_k x_k+e .
    $$
  • As usual, we replace the unobserved $u$ with the OLS residuals $\hat{u}$, and estimate $\log \left(\hat{u}^2\right) \sim 1, x_1, \ldots x_k$, calculate the fitted value $\hat{g}_i$. Then $\hat{h}_i=\exp \left(\hat{g}_i\right)$.
Procedure
  1. Run the regression of $y$ on $1, x_1, \ldots, x_k$, get the residual $\hat{u}_i$
  2. Calculate $\log \left(\hat{u}_i^2\right)$
  3. Estimate $\log \left(\hat{u}_i^2\right) \sim 1, x_1, \ldots x_k$, get the fitted value $\hat{g}_i$
  4. Compute $\hat{h}_i=\exp \left(\hat{g}_i\right)$
  5. Use $1 / \hat{h}_i$ as weights, estimate $y \sim 1, x_1, \ldots, x_k$ using WLS.

FGLS is consistent, and has smaller asymptotic variance than OLS.
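
A statsmodels sketch of the five-step FGLS procedure above (the data and the true variance function $h$ are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
N = 1000
x = rng.normal(size=(N, 2))
h = np.exp(0.3 + 0.8 * x[:, 0])                  # true variance function (simulated)
y = 1 + x @ np.array([2.0, -1.0]) + rng.normal(size=N) * np.sqrt(h)

X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid                 # step 1: OLS residuals
g_hat = sm.OLS(np.log(u_hat**2), X).fit().fittedvalues  # steps 2-3
h_hat = np.exp(g_hat)                            # step 4
fgls = sm.WLS(y, X, weights=1 / h_hat).fit()     # step 5: WLS with weights 1/h_hat
print(fgls.params)
```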

WLS or RSE

  • There is no guarantee that WLS is more efficient than OLS.
  • It is always advised to report robust standard errors with WLS.
  • two solutions for heteroskedasticity:
    • Use OLS to estimate the model, calculate the robust standard errors (or use the max of the conventional s.e. and robust s.e.)
    • Use FGLS to estimate the model, report conventional s.e. or robust s.e.
  • In practice, the first method is preferred in most cases

Testing for Heteroskedasticity

Breusch-Pagan Test for Heteroskedasticity

  • We want to know in model $y=\beta_0+\beta_1 x_1+. .+\beta_k x_k+u$, whether $u^2$ is correlated with $x$
  • Estimate $y=\beta_0+\beta_1 x_1+. .+\beta_k x_k+u$, get the residual $\hat{u}$
  • Estimate the following model and get $R_{\hat{u}^2}^2$ :$$\hat{u}_i^2=\delta_0+\delta_1 x_1+\ldots+\delta_k x_k+v$$
  • We test $H_0: \delta_1=\ldots=\delta_k=0$
  • Calculate the $LM$ statistic: $N \cdot R_{\hat{u}^2}^2$; or calculate the $F$ statistic $\left[R_{\hat{u}^2}^2 / k\right] /\left[\left(1-R_{\hat{u}^2}^2\right) /(N-k-1)\right]$.
  • Reject homoskedasticity if
    • test statistic $>$ critical value
    • $p<$ significance level
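
A sketch of the Breusch-Pagan test using statsmodels' het_breuschpagan, which returns both the LM and F versions with their p-values (data simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(8)
N = 400
x = rng.normal(size=(N, 2))
u = rng.normal(size=N) * (1 + 0.8 * np.abs(x[:, 0]))  # heteroskedastic errors
y = 1 + x @ np.array([0.5, -0.5]) + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm, lm_p, f_stat, f_p = het_breuschpagan(res.resid, X)
print(lm, lm_p, f_stat, f_p)   # small p-values reject homoskedasticity
```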

The White Test for Heteroskedasticity

  • OLS standard errors are asymptotically valid if MLR.1-MLR.5 holds.
  • It turns out that the homoskedasticity assumption can be replaced with the weaker assumption that the squared error, $u^2$, is uncorrelated with all the independent variables $\left(x_j\right)$, the squares of the independent variables $\left(x_j^2\right)$, and all the cross products $\left(x_j x_h, \forall j \neq h\right)$.
  • When the model contain $k=2$ independent variables, the White test is based on an estimation of$$\hat{u}^2=\delta_0+\delta_1 x_1+\delta_2 x_2+\delta_3 x_1^2+\delta_4 x_2^2+\delta_5 x_1 x_2+v$$The White test for heteroskedasticity is the LM statistic for testing that all of the $\delta_j$ are zero, except for the intercept.
  • Problem: with many independent variables, this uses many degrees of freedom. Solution: use $\hat{y}$ and $\hat{y}^2$:$$\hat{u}^2=\delta_0+\delta_1 \hat{y}+\delta_2 \hat{y}^2+v$$We then use the $\mathrm{F}$ or LM statistic for the null hypothesis $H_0: \delta_1=\delta_2=0$.

Ch12 Serial Correlation

Serial Correlation

Time series data

  • Time series data: observations on variables over time.
  • random sampling is often violated

Classical Assumptions about Time Series Data

  1. The stochastic process $\{(x_{t 1}, \ldots, x_{t k}, y_t): t=1,2, \ldots, T\}$ follows the linear model:$$y_t=\beta_0+\beta_1 x_{t 1}+\ldots+\beta_k x_{t k}+u_t$$
  2. No perfect collinearity.
  3. Zero conditional mean.$$E(u_t \mid \mathbf{X})=0, t=1,2, \ldots, T$$
    • where $\mathbf{X}$ is the explanatory variables for all time periods.
    • $E\left(u_t \mid \mathbf{X}\right)=0$ means both $E\left(u_t \mid x_t\right)=0$ and also $E\left(u_t \mid x_s\right)=0, \forall t \neq s$.
  • Unbiasedness of $\mathrm{OLS}$
    • Under assumptions TS.1, TS.2 and TS.3, the OLS estimators are unbiased and consistent.

Serial Correlation

  • No serial correlation assumption:$$\operatorname{Cov}((x_t-\bar{x}) u_t,(x_s-\bar{x}) u_s \mid X)=0, \forall t \neq s$$Or$$E(u_s u_t \mid X)=0, \forall t \neq s$$
  • For time-series data, this is often not true.

Autoregression (AR)

  • Think about a simple regression model:$$y_t=\beta_0+\beta_1 x_t+u_t$$
  • Assume that$$u_t=\rho u_{t-1}+e_t, t=1,2, \ldots, T$$where $|\rho|<1$, and $e_t$ are i.i.d with $E\left(e_t\right)=0$. This is called an autoregressive process of order one $(\operatorname{AR}(1))$.
Properties of AR
  • Because $e_t$ is i.i.d, $u_t$ will be correlated with current and past $e_t$, but not future values. If the time series has been going on forever$$u_t =\rho u_{t-1}+e_t=\rho^k u_{t-k}+\rho^{k-1} e_{t-(k-1)}+\ldots+e_t =\sum_{j=0}^{\infty} \rho^j e_{t-j}$$
  • $$E(u_t) =E(\sum_{j=0}^{\infty} \rho^j e_{t-j})=\sum_{j=0}^{\infty} \rho^j E(e_{t-j})=0$$
  • We can show that$$\begin{aligned}
    \operatorname{Var}(u_t) & =\operatorname{Var}(\sum_{j=0}^{\infty} \rho^j e_{t-j})=\sum_{j=0}^{\infty} \rho^{2 j} \operatorname{Var}(e_{t-j}) \\
    & =\operatorname{Var}(e_t) \sum_{j=0}^{\infty} \rho^{2 j}=\frac{\operatorname{Var}(e_t)}{1-\rho^2}\end{aligned}$$
  • Also$$\begin{aligned}\operatorname{Cov}(u_t, u_{t+1}) & =\operatorname{Cov}(u_t, \rho u_t+e_t)=\rho \operatorname{Var}(u_t) \\
    \operatorname{Cov}(u_t, u_{t+j}) & =\rho^j \operatorname{Var}(u_t)
    \end{aligned}$$
  • Assume further that $\bar{x}=0$ and homoskedasticity, that is $\operatorname{Var}\left(u_t \mid X\right)=\operatorname{Var}\left(u_t\right)=\sigma^2$. Then
  • $$
    \begin{aligned}
    \operatorname{Var}(\hat{\beta} \mid \mathbf{X}) & =\frac{\operatorname{Var}(\sum_{t=1}^T x_t u_t \mid \mathbf{X})}{S S T_x^2} \\
    & =\frac{\sum_{t=1}^T x_t^2 \operatorname{Var}(u_t)+2 \sum_{t=1}^{T-1} \sum_{j=1}^{T-t} x_t x_{t+j} E(u_t u_{t+j})}{S S T_x^2} \\
    & =\frac{\sigma^2}{S S T_x}+\frac{2 \sigma^2}{S S T_x^2} \sum_{t=1}^{T-1} \sum_{j=1}^{T-t} \rho^j x_t x_{t+j}
    \end{aligned}$$

Consequence of ignore serial correlation

  • OLS remains unbiased and consistent
  • but the conventional variance formula is no longer valid
  • it typically understates the variance
  • Remedies:
    1. use FGLS
    2. use OLS and correct the standard errors
FGLS
  • Assume TS.1-TS.3. Further, assume $\operatorname{Var}\left(u_t \mid X\right)=\sigma^2$.$$
    \begin{aligned}
    & y_t=\beta_0+\beta_1 x_t+u_t . \\
    & u_t=\rho u_{t-1}+e_t, t=1,2, \ldots, T .
    \end{aligned}$$
    • where $e_t$ is i.i.d and $E\left(e_t\right)=0$.
  • Transform the regression:$$\begin{aligned}
    y_t-\rho y_{t-1} & =(1-\rho) \beta_0+\beta_1(x_t-\rho x_{t-1})+e_t, t \geq 2 . \\
    \tilde{y}_t & =(1-\rho) \beta_0+\beta_1 \tilde{x}_t+e_t, t \geq 2\end{aligned}$$
    We can use FGLS to estimate:
  1. Estimate the model using OLS and obtain the OLS residuals $\hat{u}_t$
  2. Use OLS to estimate $\hat{u_t} \sim \hat{u_{t-1}}$ and obtain $\hat{\rho}$.
  3. Calculate $\tilde{y_t}=y_t-\hat{\rho} y_{t-1}$ and $\tilde{x_t}=x_t-\hat{\rho} x_{t-1}$, then use OLS to regress $\tilde{y_t}$ on $\tilde{x_t}$.
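
A statsmodels sketch of this quasi-differencing (Cochrane-Orcutt style) FGLS on a simulated AR(1) process (the true $\rho=0.6$ and all other numbers are illustrative; note that the transformed intercept estimates $(1-\rho)\beta_0$):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T, rho = 300, 0.6
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):                      # build the AR(1) error process
    u[t] = rho * u[t - 1] + e[t]
x = rng.normal(size=T)
y = 1 + 2 * x + u

# Step 1: OLS residuals
u_hat = sm.OLS(y, sm.add_constant(x)).fit().resid
# Step 2: estimate rho by regressing u_hat on its lag (no constant)
rho_hat = sm.OLS(u_hat[1:], u_hat[:-1]).fit().params[0]
# Step 3: quasi-difference and re-run OLS
y_t = y[1:] - rho_hat * y[:-1]
x_t = x[1:] - rho_hat * x[:-1]
res = sm.OLS(y_t, sm.add_constant(x_t)).fit()
print(rho_hat, res.params)                 # slope should be close to 2
```
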
Serial Correlation-Robust Inference after OLS
  • Awareness is enough here; remember HAC (heteroskedasticity and autocorrelation consistent)
  • We can show that$$AVar(\hat{\beta_1})=(\sum_{t=1}^T E(r_t^2))^{-2} Var(\sum_{t=1}^T r_t u_t)$$where $r_t$ is the error term in $x_{t 1}=\delta_0+\delta_2 x_{t 2}+\ldots+\delta_k x_{t k}+r_t$. We want to find an estimator for $A \operatorname{Var}(\hat{\beta}_1)$.
  • Let $\hat{r}_t$ denote the residuals from regressing $x_1$ on all other independent variables, and $\hat{u}_t$ as the OLS residual from regressing $y$ on all $x$.
  • Define$$\hat{\nu}=\sum_{t=1}^T \hat{a_t}^2+2 \sum_{h=1}^g[1-h /(g+1)](\sum_{t=h+1}^T \hat{a_t} \hat{a_{t-h}}),$$where $\hat{a_t}=\hat{r_t} \hat{u_t}$.
  • Then$$s e(\hat{\beta_1})=[se_c(\hat{\beta_1}) / \hat{\sigma}]^2 \sqrt{\hat{\nu}}$$where $se_c(\hat{\beta_1})$ is the conventional standard error of $\hat{\beta}_1$, and $\hat{\sigma}$ is the standard error of the regression.
  • We use $g$ to capture how much serial correlation we are allowing in computing the standard error.
  • For annual data, choose $g=1$ or $g=2$
  • Use a larger $g$ for larger sample size.
  • When $g=1$,$$\hat{\nu}=\sum_{t=1}^T \hat{a_t}^2+\sum_{t=2}^T(\hat{a_t} \hat{a_{t-1}})$$
  • This formula is robust to arbitrary serial correlation and arbitrary heteroskedasticity. So people sometimes call this heteroskedasticity and auto-correlation consistent, or HAC, standard errors.
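
In practice HAC standard errors can be obtained from statsmodels; a sketch on simulated data (maxlags plays the role of $g$; the MA-style error process is made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
T = 200
x = rng.normal(size=T)
y = 1 + 2 * x + np.convolve(rng.normal(size=T), [1, 0.5], mode="same")  # serially correlated errors

X = sm.add_constant(x)
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 2})  # g = 2
print(hac.bse)   # HAC (Newey-West style) standard errors
```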

Spatial Correlation

Data with group structure

  • Group structure: e.g., students in different classes; observations within the same class are correlated
  • Example: class size and test score$$y_{i g}=\beta_0+\beta_1 x_g+u_{i g}$$
  • Use $i$ to denote a student, randomly assigned to class $g$. $y_{i g}$ is the test score of student $i$ (who is in class $g$), $x_g$ is the class size (which has the same value for students in the same class).
  • Assume that $E(u \mid X)=0$
  • However, observations within the same $g$ is not independent (students in the same class are exposed to the same teacher and classroom…)$$E(u_{i g} u_{j g})=\rho_u \sigma_u^2 \neq 0$$
  • We call $\rho_u$ the intraclass correlation coefficient.
  • This kind of correlation is called spatial correlation
  • When it is present, unbiasedness and consistency still hold, but the variances and standard errors change.

Fix spatial correlation

OLS and Cluster Standard Errors
  • The general idea is to model correlation of error terms within a group, and assume no correlation across groups.
  • consistent as the number of groups grows large
  • a rule of thumb: more than 42 groups counts as “many”
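
A statsmodels sketch of cluster standard errors with a simulated class-level shock (50 classes of 20 students; all numbers are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
G, n_g = 50, 20                            # 50 classes, 20 students each
g = np.repeat(np.arange(G), n_g)
x_g = rng.normal(size=G)[g]                # class-level regressor
u = rng.normal(size=G)[g] + rng.normal(size=G * n_g)  # common class shock + noise
y = 1 + 0.5 * x_g + u

X = sm.add_constant(x_g)
res = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": g})
print(res.bse)   # standard errors clustered at the class level
```
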
Use group mean
  • Estimate$$\bar{y}_g=\beta_0+\beta_1 x_g+\bar{u}_g$$by WLS using the group size as weights.
  • We can generalize the method to models with microcovariates$$y_{i g}=\beta_0+\beta_1 x_g+\beta_2 w_{i g}+u_{i g}$$
    1. Estimate$$y_{i g}=\mu_g+\beta_2 w_{i g}+\eta_{i g}$$The group effects, $\mu_g$, are coefficients on a full set of group dummies.
    2. Regress the estimated group effects on group-level variables$$\hat{\mu}_g=\beta_0+\beta_1 x_g+e_g$$In this step, we could either weight by the group size, or use no weights.

Ch9 Proxy Variable and Measurement Error

Endogeneity and Exogeneity

  • Zero conditional mean condition:$$E(u \mid x)=0$$
    • $x_j$ is endogenous if it is correlated with $u$.
    • $x_j$ is exogenous if it is not correlated with $u$.
    • Violating the zero conditional mean condition will cause the OLS estimator to be biased and inconsistent.

Proxy Variable


Omitted Variable Bias

  • $$\log (\text { wage })=\beta_0+\beta_1 e d u c+\beta_2 a b i l+u$$
  • In this model, assume that $E(u \mid e d u c,abil)=0$
  • Suppose the primary goal is to estimate $\beta_1$ consistently; we are not interested in $\beta_2$.
  • But we have no data on abil, so we regress $\log(wage)$ on educ only
  • There is an omitted variable bias if $\operatorname{cov}(abil,educ) \neq 0$ and $\beta_2 \neq 0$.
  • One solution: use proxy variable for the omitted variable
  • Proxy variable: related to the unobserved variable that we would like to control for in our analysis
    • the proxy variable only needs to be correlated with abil; it does not have to equal it

Proxy

  • Formally, we have a model$$y=\beta_0+\beta_1 x_1+\beta_2 x_2^*+u$$

  • Assume that $E\left(u \mid x_1, x_2^*\right)=0$

  • $x_1$ is observed and $x_2^*$ is unobserved

  • We have a proxy variable for $x_2^*$, which is $x_2$$$x_2^*=\delta_0+\delta_2 x_2+v_2$$

  • where $v_2$ is the error to allow the possibility that $x_2$ and $x_2^*$ are not exactly related. $E\left(v_2 \mid x_2\right)=0$.

  • Replace the omitted variable by the proxy variable:$$\color{red}{y=(\beta_0+\beta_2 \delta_0)+\beta_1 x_1+\beta_2 \delta_2 x_2+(u+\beta_2 v_2)}$$To get an unbiased and consistent estimator for $\beta_1$, we require$$E(u+\beta_2 v_2 \mid x_1, x_2)=0$$

  • Break this down into two assumptions:

    1. $E\left(u \mid x_1, x_2\right)=0$ : the proxy variable should be exogenous (intuitively, since $x_2^*$ is exogenous, the proxy variable is only good if it is also exogenous)
      • the proxy variable needs to be exogenous
    2. $E\left(v_2 \mid x_1, x_2\right)=0$ : this is equivalent as$$E(x_2^* \mid x_1, x_2)=E(x_2^* \mid x_2)=\delta_0+\delta_2 x_2$$Once $x_2$ is controlled for, the expected value of $x_2^*$ does not depend on $x_1$
  • In the example above, the regression becomes:$$\log (wage)=\alpha_0+\alpha_1 educ+\alpha_2 IQ+e$$

  • In the wage equation example, the two assumptions are:

    1. $E(u \mid e d u c, I Q)=0$
    2. $E(abil \mid educ, IQ)=E(a b i l \mid I Q)=\delta_0+\delta_3 I Q$
      • The average level of ability only changes with IQ, not with education (once IQ is fixed).
  • Under this substitution, the estimator of $\beta_1$ is unbiased

    • violating the assumptions causes bias

Using Lagged Dependent Variables as Proxy Variables

  • Lagged dependent variable
  • $$\text { crime }=\beta_0+\beta_1 \text { unem }+\beta_2 \text { expend }+\beta_3 \text { crime }_{-1}+u$$
  • By including crime $_{-1}$ in the equation, $\beta_2$ captures the effect of law-enforcement expenditure on crime, for cities with the same previous crime rate and current unemployment rate.

Measurement Error

Measurement Error in the Dependent Variable

  • Let $y^*$ denote the variable that we would like to explain.$$y^*=\beta_0+\beta_1 x_1+\ldots+\beta_k x_k+u,$$and we assume it satisfies the Gauss-Markov assumptions.
  • Let $y$ to denote the observed measure of $y^*$
  • Measurement error is defined as$$e_0=y-y^*$$
  • Plug in and rearrange$$y=\beta_0+\beta_1 x_1+\ldots+\beta_k x_k+u+e_0$$
  • When $e_0$ is uncorrelated with the explanatory variables, the estimators remain consistent and unbiased, but the error variance is larger
  • OLS still applies

Measurement Error in the Independent Variable

  • Consider a simple regression model:$$y=\beta_0+\beta_1 x_1^*+u$$We assume it satisfies the Gauss-Markov assumptions.
  • We do not observe $x_1^*$. Instead, we have a measure of $x_1^*$; call it $x_1$
  • The measurement error$$e_1=x_1-x_1^*$$Assume $E\left(e_1\right)=0$.
  • Plug in $x_1^*=x_1-e_1$$$y=\beta_0+\beta_1 x_1+(u-\beta_1 e_1)$$
  • To derive the properties of the OLS estimators, we need assumptions.
  • First, assume that:$$E(u \mid x_1^*, x_1)=0$$This implies $E(y \mid x_1^*, x_1)=E(y \mid x_1^*)$: $x_1$ does not affect $y$ after $x_1^*$ has been controlled for.
  • Next, we consider two (mutually exclusive) cases about how the measurement error is correlated with $x$
  1. $\operatorname{Cov}(x_1, e_1)=0$
  2. $\operatorname{Cov}(x_1^*, e_1)=0$
Case 1:$\operatorname{Cov}(x_1, e_1)=0$
  • Plug in $x_1^*=x_1-e_1$$$y=\beta_0+\beta_1 x_1+(u-\beta_1 e_1)$$
  • Then $E\left(u-\beta_1 e_1 \mid x_1\right)=0$, so the OLS estimator of the slope coefficient of $x_1$ in the above model gives us unbiased and consistent estimator of $\beta_1$.
  • If $u$ is uncorrelated with $e_1$, then $\operatorname{Var}\left(u-\beta_1 e_1\right)=\sigma_u^2+\beta_1^2 \sigma_{e_1}^2$.
  • consistency and unbiasedness still hold
Case 2:$\operatorname{Cov}(x_1^*, e_1)=0$
  • The classical errors-in-variables (CEV) assumption is that $e_1$ is uncorrelated with the unobserved true variable $x_1^*$.
  • Idea: the two components of $x_1$ are uncorrelated$$x_1=x_1^*+e_1$$
  • Plug in $x_1^*=x_1-e_1$$$y=\beta_0+\beta_1 x_1+(u-\beta_1 e_1)$$
  • Then$$\operatorname{Cov}(u-\beta_1 e_1, x_1)=-\beta_1 \operatorname{Cov}(x_1, e_1)=-\beta_1 \sigma_{e_1}^2 \neq 0$$
  • Now both unbiasedness and consistency fail
  • The probability limit of $\hat{\beta}_1$
    • $$\operatorname{plim}(\hat{\beta_1}) =\beta_1+\frac{\operatorname{Cov}(x_1, u-\beta_1 e_1)}{\operatorname{Var}(x_1)}=\beta_1-\frac{\beta_1 \sigma_{e_1}^2}{\sigma_{x_1^*}^{2}+\sigma_{e_1}^2}=\beta_1\left(\frac{\sigma_{x_1^*}^{2}}{\sigma_{x_1^*}^{2}+\sigma_{e_1}^2}\right) $$
    • $\operatorname{plim}(\hat{\beta}_1)$ is closer to zero than $\beta_1$.
  • This is called the attenuation bias in OLS due to CEV
  • If the variance of $x_1^*$ is large relative to the variance in the measurement error, then the inconsistency in OLS will be small.
Case 3: $\operatorname{Cov}(x_1, e_1) \neq 0 \text{ and } \operatorname{Cov}(x_1^*, e_1) \neq 0$
  • In this case, OLS is almost certainly biased and inconsistent.
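
A small simulation of the Case 2 (CEV) attenuation bias (true $\beta_1=2$ and $\operatorname{Var}(x_1^*)=\operatorname{Var}(e_1)=1$, so the plim is $2\cdot\tfrac{1}{2}=1$; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(12)
N = 100_000
x_star = rng.normal(size=N)                # true regressor, variance 1
e1 = rng.normal(size=N)                    # CEV: e1 uncorrelated with x_star
x1 = x_star + e1                           # mismeasured regressor
y = 1 + 2 * x_star + rng.normal(size=N)

b1 = np.cov(x1, y)[0, 1] / np.var(x1)
# Theory: plim(b1) = 2 * var(x*) / (var(x*) + var(e1)) = 1
print(b1)
```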

Ch15 Instrumental Variable

IV Estimator

Omitted Variable Bias

  • $$\log (\text { wage })=\beta_0+\beta_1 e d u c+\beta_2 a b i l+e$$
  • In this model, assume that $E(e \mid e d u c, a b i l)=0$
  • We only want to estimate $\beta_1$ consistently; we do not care about $\beta_2$.
  • Suppose we have no data on abil and run only the regression$$y=\beta_0+\beta_1 educ+u$$where $u=\beta_2\, abil+e$.
  • Note that $E(u \mid educ)=E\left(\beta_2\, abil+e \mid educ\right)=\beta_2 E(abil \mid educ)$. If $E(abil \mid educ)$ changes when educ changes, then the zero conditional mean assumption is not satisfied.
  • $$\begin{aligned}
    \hat{\beta_{OLS}} & =\frac{\sum_{i=1}^N(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^N(x_i-\bar{x})^2} =\beta+\frac{\sum_{i=1}^N(x_i-\bar{x}) u_i}{\sum_{i=1}^N(x_i-\bar{x})^2} \\
    \hat{\beta_{OLS}} & \stackrel{plim}{\longrightarrow} \beta+\frac{\operatorname{cov}(x, u)}{\operatorname{var}(x)} .
    \end{aligned}$$
  • Since $E(u \mid x) \neq 0, E\left(\hat{\beta}_{O L S}\right) \neq \beta$, OLS is not unbiased
  • Since $\operatorname{cov}(x, u) \neq 0$, OLS is not consistent.
  • So neither unbiasedness nor consistency holds

Instrumental Variable (IV)

  • Replace x with another variable z in the moment conditions
  • $$y=\beta_0+\beta_1 x+u$$
  • We can still guarantee $E(u)=0$, because the intercept can absorb any nonzero mean
  • With $E(u \mid x) \neq 0$, we no longer have $E(x u)=0$.
  • Estimation idea: find another variable $z$, where$$\operatorname{Cov}(x, z) \neq 0 ; \quad \operatorname{Cov}(z, u)=0$$
    • together with $E(u)=0$, $\operatorname{Cov}(z, u)=0$ implies $E(u z)=\operatorname{Cov}(z, u)+E(z) E(u)=0$
  • Use $E(u z)=0$ and $E(u)=0$ to find the sample analogue, $$E(u z)=0 \quad \frac{1}{N} \sum_{i=1}^N z_i \hat{u_i}=0 $$$$E(u)=0 \quad \frac{1}{N} \sum_{i=1}^N \hat{u_i}=0$$and solve the equations.

  • $$\begin{aligned}
    \hat{\beta_1}^{I V} & =\frac{\sum_{i=1}^N\left(y_i-\bar{y}\right)\left(z_i-\bar{z}\right)}{\sum_{i=1}^N\left(x_i-\bar{x}\right)\left(z_i-\bar{z}\right)} \\
    \hat{\beta_0}^{I V} & =\bar{y}-\hat{\beta_1}^{I V} \bar{x}
    \end{aligned}$$

  • This is called the IV estimator

  • The OLS estimator is a special case of the IV estimator, with z = x.
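
A numpy sketch of the IV estimator on simulated data where x is endogenous and z satisfies both IV conditions (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(13)
N = 10_000
z = rng.normal(size=N)
v = rng.normal(size=N)
u = 0.7 * v + rng.normal(size=N)           # u correlated with x through v
x = 1 + 0.8 * z + v                        # z is relevant and exogenous
y = 1 + 2 * x + u

beta1_iv = np.sum((y - y.mean()) * (z - z.mean())) / np.sum((x - x.mean()) * (z - z.mean()))
beta0_iv = y.mean() - beta1_iv * x.mean()
beta1_ols = np.cov(x, y)[0, 1] / np.var(x)
print(beta1_iv, beta1_ols)   # IV close to 2; OLS biased upward
```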

Assumptions on IV

  • Instrument relevance:
    • $\operatorname{Cov}(x, z) \neq 0: z$ is relevant for explaining variation in $x$.
    • estimate $x=\pi_0+\pi_1 z+v$ and test $H_0: \pi_1=0$.
  • Instrument exogeneity:
    • $\operatorname{Cov}(u, z)=0$: this guarantees consistency.
    • cannot be tested directly from the data; it must be argued from economic theory.

Properties and Inference with the IV Estimator

  • Consistency: satisfied
  • Unbiasedness: not satisfied
    • consider the expectation of $\hat{\beta_1}^{IV}$ conditional on $z$$$E(\hat{\beta_1}^{I V})=\beta+E(\frac{\sum_{i=1}^N(z_i-\bar{z}) u_i}{\sum_{i=1}^N(x_i-\bar{x})(z_i-\bar{z})})=\beta+E(E[\frac{\sum_{i=1}^n(z_i-\bar{z}) u_i}{\sum_{i=1}^N(x_i-\bar{x})(z_i-\bar{z})} \mid z])$$
    • because x is random rather than constant, the expectation cannot be simplified further
  • Variance:
    • add the assumption$$E[u^2 \mid z]=\sigma^2$$
    • under this assumption:$$AVar(\hat{\beta_1}^{IV})=\frac{\sigma^2}{N \sigma_x^2 \rho_{x, z}^2}$$
    • where $\sigma_x^2$ is the population variance of x, $\sigma^2$ is the population variance of u, and $\rho_{x,z}^2$ is the squared population correlation between x and z
    • The estimated asymptotic variance of $\hat{\beta}_1^{I V}$ is
    • $$\widehat{AVar}(\hat{\beta_1}^{IV})=\frac{\hat{\sigma}^2}{SST_xR_{x,z}^2}$$
      • where $S S T_x=\sum_{i=1}^n(x_i-\bar{x})^2$, and $R_{x, z}^2$ is the R-squared of $x_i$ on $z_i$.
    • Note that the variance of the OLS estimator is
    • $$\widehat{\operatorname{Var}}(\hat{\beta}_1^{O L S})=\frac{\hat{\sigma}^2}{S S T_x}$$
    • So the IV estimator has a larger variance.
    • If $x$ an $z$ are only slightly correlated, then $R_{x, z}^2$ can be small, and this translate into a large sampling variance of the IV estimator.

Two Stage Least Squares

Multiple Instrumental Variables

  • There may be more than one IV
  • Consider the model$$y=\beta_0+\beta_1 x_1+\beta_2 x_2+u$$
  • Assume $x_1$ is endogenous and has two IVs: $z_1$ and $z_2$. Assume $x_2$ is exogenous.
  • Two-stage least squares (2SLS): use a linear combination of the two IVs to construct a single new IV

2SLS

  • Continuing the example above

  • The steps of 2SLS

    1. Stage 1: estimate (using OLS)$$x_1=\alpha_0+\alpha_1 z_1+\alpha_2 z_2+\alpha_3 x_2+v$$and calculate $\hat{x}_1$.
      • this first-stage regression must include all exogenous variables
    2. Stage 2: use $\hat{x}_1$ as an IV for $x_1$. Or directly estimate$$y=\beta_0+\beta_1 \hat{x}_1+\beta_2 x_2+u$$(a numerical sketch of both stages follows at the end of this section)
  • With multiple endogenous variables:

    • Consider a model with two endogenous variables:$$y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+u$$where $x_1$ and $x_2$ are endogenous (whose IVs are $z_1$ and $z_2$), $x_3$ is exogenous.

    • In the first stage, we need to include all instruments and exogenous variables on the right hand side

    • $$x_1=\alpha_0+\alpha_1 z_1+\alpha_2 z_2+\alpha_3 x_3+v_1$$$$x_2=\gamma_0+\gamma_1 z_1+\gamma_2 z_2+\gamma_3 x_3+v_2$$

  • The number of instruments must be at least the number of endogenous variables. A minimal 2SLS sketch follows.
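
A minimal 2SLS sketch for the setup above (one endogenous regressor, two instruments), implemented with plain numpy least squares; all names and values are illustrative:

```python
# 2SLS for y = b0 + b1*x1 + b2*x2 + u, with x1 endogenous and IVs z1, z2.
import numpy as np

rng = np.random.default_rng(1)
N = 5000
z1, z2 = rng.normal(size=N), rng.normal(size=N)
x2 = rng.normal(size=N)                       # exogenous regressor
u = rng.normal(size=N)
x1 = 0.6 * z1 + 0.6 * z2 + 0.3 * x2 + 0.5 * u + rng.normal(size=N)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u             # true (b0, b1, b2) = (1, 2, -1)

# Stage 1: regress x1 on ALL exogenous variables (z1, z2, and x2)
Z = np.column_stack([np.ones(N), z1, z2, x2])
alpha = np.linalg.lstsq(Z, x1, rcond=None)[0]
x1_hat = Z @ alpha

# Stage 2: regress y on x1_hat and x2
X2 = np.column_stack([np.ones(N), x1_hat, x2])
beta_2sls = np.linalg.lstsq(X2, y, rcond=None)[0]
print(beta_2sls)   # should be close to (1, 2, -1)
```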

Issues with IV

Sample size

  • A large sample is needed. In the first stage of 2SLS, $x=\alpha_0+\alpha_1 z+e$,
  • where $\alpha_0+\alpha_1 z$ is uncorrelated with $u$ while $e$ is correlated with $u$.
  • In the second stage, we would like to replace $x$ with $\alpha_0+\alpha_1 z$, but in practice we use $\hat{\alpha}_0+\hat{\alpha}_1 z$, which in finite samples can still contain information about $u$.
  • Hence a large sample is required for the estimated first stage to be a good substitute.

Weak instruments

  • Weak instruments: low correlation between $x$ and $z$
  • Suppose there is some small correlation between $u$ and $z$:$$\begin{aligned}\operatorname{plim}(\hat{\beta}_1^{I V}) & =\beta_1+\frac{\operatorname{Cov}(z, u)}{\operatorname{Cov}(z, x)} \\ & =\beta_1+\frac{\operatorname{Corr}(z, u)}{\operatorname{Corr}(z, x)} \cdot \frac{\sigma_u}{\sigma_x},\end{aligned}$$where $\sigma_u$ and $\sigma_x$ are the standard deviations of $u$ and $x$ in the population respectively.
  • We can show that$$\operatorname{plim}(\hat{\beta}_1^{O L S})=\beta_1+\operatorname{Corr}(x, u) \cdot \frac{\sigma_u}{\sigma_x}$$
  • If $\operatorname{Corr}(z, x)$ is small enough, then even if $\operatorname{Corr}(z, u)$ is small, the IV estimator could have a larger asymptotic bias than the OLS estimator.
  • Weak instruments also aggravate the finite-sample bias of the IV estimator.

Testing for weak instruments

  • $$y=\beta_0+\beta_1 x_1+\beta_2 x_2+u$$
  • Assume $x_1$ is endogenous with two instrumental variables $z_1$ and $z_2$, and $x_2$ is exogenous.
  • Estimate$$x_1=\alpha_0+\alpha_1 z_1+\alpha_2 z_2+\alpha_3 x_2+e$$
  • Test $H_0: \alpha_1=\alpha_2=0$
  • Rule of thumb: if the first-stage F-statistic exceeds 10, the instruments are not considered weak. See the sketch below.
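
A sketch of the first-stage F test, reusing the simulated arrays from the 2SLS sketch; the statsmodels formula API is assumed:

```python
# First-stage F test for weak instruments: H0: alpha_1 = alpha_2 = 0.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"y": y, "x1": x1, "x2": x2, "z1": z1, "z2": z2})

first_stage = smf.ols("x1 ~ z1 + z2 + x2", data=df).fit()
f_res = first_stage.f_test("z1 = 0, z2 = 0")
print(f_res.fvalue, f_res.pvalue)
# Rule of thumb from the notes: F > 10 suggests the instruments are not weak.
```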

Testing for endogeneity

  • Test whether $x$ is correlated with $u$.
  • Suppose $x_1$ is endogenous, and the IV is $z$$$y=\beta_0+\beta_1 x_1+\beta_2 x_2+u$$
  • First stage: $x_1=\alpha_0+\alpha_1 z+\alpha_2 x_2+v$. If $x_1$ is correlated with $u$, then it must be that $v$ is correlated with $u$. Estimate the equation to get $\hat{v}$.
  • Estimate $y=\delta_0+\delta_1 x_1+\delta_2 x_2+\delta_3 \hat{v}+e$. Test $H_0: \delta_3=0$; rejecting it is evidence of endogeneity. A sketch follows.
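
A sketch of this regression-based endogeneity test, continuing the same simulated example (here both instruments z1 and z2 enter the first stage):

```python
# Add the first-stage residual v_hat to the structural equation and test it.
df["v_hat"] = smf.ols("x1 ~ z1 + z2 + x2", data=df).fit().resid

second = smf.ols("y ~ x1 + x2 + v_hat", data=df).fit()
print(second.tvalues["v_hat"], second.pvalues["v_hat"])
# Rejecting H0: delta_3 = 0 is evidence that x1 is endogenous.
```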

Testing overidentifying restrictions

  • When there are more instruments than endogenous regressors, the restriction $\operatorname{Cov}(z,u)=0$ becomes partially testable.

  • Suppose $x_1$ is endogenous, and the IVs are $z_1$ and $z_2$.

  • Using each instrument separately, we calculate two estimates, $\hat{\beta}_1^{I V 1}$ and $\hat{\beta}_1^{I V 2}$.

  • If $\hat{\beta}_1^{I V 1}$ is very different from $\hat{\beta}_1^{I V 2}$, then at least one of them does not satisfy $\operatorname{Cov}(z, u)=0$.

  • If they are close to each other, then either both satisfy $\operatorname{Cov}(z, u)=0$ or neither does.

  • With many instruments:

    • Testing overidentifying restrictions:
      1. Estimate the equation by 2SLS and obtain the 2SLS residuals $\hat{u}_1$.
      2. Regress $\hat{u}_1$ on all exogenous variables. Obtain the $R$-squared, say $R^2$.
      3. If all IVs are uncorrelated with $u_1$, then $N \cdot R^2 \sim \chi_q^2$, where $q$ is the number of instruments from outside the model minus the number of endogenous explanatory variables.
      4. Reject $H_0$ if $N \cdot R^2$ exceeds the critical value. See the sketch below.
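
A sketch of the $N \cdot R^2$ (Sargan-style) test, continuing the same simulated example; with two instruments and one endogenous regressor, $q = 2 - 1 = 1$:

```python
# Overidentification test: regress the 2SLS residuals on all exogenous variables.
from scipy import stats

# 2SLS residuals use the ORIGINAL x1, not the first-stage fitted values.
df["u_hat"] = y - (beta_2sls[0] + beta_2sls[1] * x1 + beta_2sls[2] * x2)
aux = smf.ols("u_hat ~ z1 + z2 + x2", data=df).fit()

NR2 = len(df) * aux.rsquared
print(NR2, stats.chi2.sf(NR2, df=1))   # reject H0 if NR2 exceeds the chi2(1) critical value
```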

Ch17 Limited Dependent Variable Models

Linear Probability model

Limited Dependent Variable

  • The dependent variable can take on only a limited set of values.
  • In the population, $y$ takes on two values: 0 and 1. We are interested in how $x$ affects $y$.
  • Suppose $x$ and $y$ have this linear relation:$$y=\beta_0+\beta_1 x+u$$
  • Suppose $E(u \mid x)=0$. Then$$E(y \mid x)=P(y=1 \mid x)=\beta_0+\beta_1 x$$$\beta_1$ measures the effect of a one-unit increase in $x$ on the probability that $y=1$; that is, the marginal effect of $x$ on $P(y=1)$.
  • The interpretation parallels the standard linear model, whether or not $y$ is binary:
    • Descriptive: $\beta_1$ is the expected difference in the probability that $y=1$ if $x$ changes by one unit.
    • Causal: a one-unit increase in $x$ causes the probability of $y=1$ to change by $\beta_1$ on average.
  • This model violates homoskedasticity: $\operatorname{Var}(y \mid x)=p(x)(1-p(x))$ is a function of $x$. FGLS (or heteroskedasticity-robust standard errors) can be used; see the sketch below.
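
A minimal linear probability model sketch, assuming statsmodels; heteroskedasticity-robust standard errors are used here as a common alternative to the FGLS route mentioned above, and the binary data are simulated purely for illustration:

```python
# LPM: OLS of a binary outcome on x, with robust (HC1) standard errors.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
p = np.clip(0.5 + 0.2 * x, 0.01, 0.99)        # true P(y=1|x), kept inside (0, 1)
yb = rng.binomial(1, p)

lpm = smf.ols("yb ~ x", data=pd.DataFrame({"yb": yb, "x": x})).fit(cov_type="HC1")
print(lpm.params, lpm.bse)   # robust SEs, since Var(y|x) = p(x)(1 - p(x)) varies with x
```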

Non-linear Model and Maximum Likelihood

  • Consider the following non-linear model:$$E(y \mid x)=P(y=1 \mid x)=G(\beta_0+\beta_1 x)$$where $G$ maps any real value into the interval $(0,1)$, which ensures $E(y \mid x)$ lies between 0 and 1.
  • $G$ can have different functional forms. We consider two common ones:
    • logistic function (logit)
      • $$G(z)=\frac{\exp (z)}{1+\exp (z)}$$
    • standard normal CDF (probit)
      • $$G(z)=\Phi(z)$$
    • (figure: the logistic function and the standard normal CDF, both S-shaped maps from the real line to $(0,1)$)

Properties of Logit and Probit

  • Both models arise from the distribution of a latent error $e$.
  • Suppose random variable $e$ has a CDF:$$\operatorname{Pr}(e \leq z)=G(z)$$Here $G(z)$ can be either the logit or the probit link.
  • Let $y^*=\beta_0+\beta_1 x+e$, where $e$ is independent of $x$, and define $y=1$ if $y^*>0$ and $y=0$ otherwise.
  • $$\begin{aligned} P(y=1 \mid x) & =P(y^*>0 \mid x) =P(\beta_0+\beta_1 x+e>0 \mid x) \\ & =P(e>-\beta_0-\beta_1 x) =1-\operatorname{Pr}(e \leq-\beta_0-\beta_1 x) \\ & =1-G\left(-\beta_0-\beta_1 x\right)=G\left(\beta_0+\beta_1 x\right) .\end{aligned}$$
    • The last step uses the symmetry of both the logistic and standard normal distributions: $1-G(-z)=G(z)$.

Partial effect of x on y

  • the marginal effect of $x$ on the probability that $y=1$$$\frac{\partial p(x)}{\partial x_1}=g(\beta_0+\beta_1 x) \beta_1$$where $p(x)=P(y=1 \mid x), g(z) \equiv \frac{d G}{d z}(z)$.
    • (i.e., differentiate the conditional probability above with respect to $x$)
  • When we have more than one independent variable:$$\frac{\partial p(x)}{\partial x_j}=g(\beta_0+\boldsymbol{x} \boldsymbol{\beta}) \beta_j$$where $\boldsymbol{x} \boldsymbol{\beta}=\beta_1 x_1+\ldots+\beta_k x_k$.
  • So the ratio of the partial effect of $x_j$ and $x_k$ is $\frac{\beta_j}{\beta_k}$.
  • Compared with OLS, the drawback is that the scaling factor $g(\beta_0+\boldsymbol{x}\boldsymbol{\beta})$ depends on $x$.
  • Two common solutions (see the sketch after this list):
    1. Evaluate at a particular point, e.g., the sample mean.
      • Partial effect at the average:$$g(\hat{\beta}_0+\overline{\boldsymbol{x}} \hat{\boldsymbol{\beta}})=g(\hat{\beta}_0+\hat{\beta}_1 \bar{x}_1+\hat{\beta}_2 \bar{x}_2+\cdots+\hat{\beta}_k \bar{x}_k)$$
    2. Average the partial effects over the sample.
      • Average marginal effect:$$[N^{-1} \sum_{i=1}^N g(\hat{\beta}_0+\boldsymbol{x}_i \hat{\boldsymbol{\beta}})] \hat{\beta}_j$$
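
A sketch of both summaries for a logit model, assuming statsmodels; the data-generating process and coefficients are illustrative:

```python
# Partial effect at the average (PEA) vs. average marginal effect (AME) for logit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
N = 2000
x1, x2 = rng.normal(size=N), rng.normal(size=N)
X = sm.add_constant(np.column_stack([x1, x2]))
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1 - 0.5 * x2)))
yb = rng.binomial(1, p)

res = sm.Logit(yb, X).fit(disp=0)
b = res.params
g = lambda z: np.exp(z) / (1 + np.exp(z)) ** 2   # logistic density g = G'(z)

pea = g(X.mean(axis=0) @ b) * b[1]   # evaluate g at the mean of x
ame = np.mean(g(X @ b)) * b[1]       # average g over all observations
print(pea, ame)
```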

Maximum Likelihood Estimation

  • Setup: in the sample, we observe $y=1$ for some observations and $y=0$ for others.

  • Maximum likelihood estimation finds the $\beta$ that maximizes the probability of observing exactly this pattern.

  • What follows is a standard MLE derivation (a probability-and-statistics refresher):

  • Suppose we have a random sample of size $N$. Given $x$ and $\beta$, the probability that $y=1$ is:$$E(y \mid x)=P(y=1 \mid x)=G(\beta_0+\beta_1 x) \equiv G(\beta x)$$

  • Then for any observation $y=0$ or $y=1$, its probability density function is:$$f(y \mid \beta x)=G(\beta x)^y[1-G(\beta x)]^{(1-y)}$$

  • For a random sample, all observations are independent of each other. Then the probability that we observe the sample is ($i$ indexes observations):$$f(y_1, \ldots, y_N \mid \boldsymbol{x} ; \beta)=\prod_{i=1}^N[G(\beta x_i)]^{y_i}[1-G(\beta x_i)]^{(1-y_i)}$$

  • Maximum likelihood estimation (MLE): maximize the probability that we observe the data:$$\max_{\beta} f(y_1, \ldots, y_N \mid \boldsymbol{x} ; \beta)=\max_{\beta} \prod_{i=1}^N[G(\beta x_i)]^{y_i}[1-G(\beta x_i)]^{(1-y_i)}$$
  • Take the natural logarithm and define:$$\ell_i(\beta)=\log ([G(\beta x_i)]^{y_i}[1-G(\beta x_i)]^{(1-y_i)})=y_i \log [G(\beta x_i)]+(1-y_i) \log [1-G(\beta x_i)]$$
  • Then we can equivalently write:$$\max_\beta \sum_{i=1}^N \ell_i(\beta)=\max_{\boldsymbol{\beta}} \sum_{y_i=1} \log [G(\beta x_{\boldsymbol{i}})]+\sum_{y_i=0} \log [1-G(\beta x_{\boldsymbol{i}})]$$
  • MLE is consistent and asymptotically efficient. A numerical sketch follows.
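
A sketch that maximizes this log-likelihood directly with scipy, reusing yb and X from the partial-effects sketch above; it should closely reproduce the statsmodels Logit estimates:

```python
# Direct numerical MLE of the logit log-likelihood.
from scipy.optimize import minimize
from scipy.special import expit   # numerically stable logistic CDF G(z)

def neg_loglik(beta):
    G = np.clip(expit(X @ beta), 1e-10, 1 - 1e-10)  # clip to avoid log(0)
    # negative of sum_i [ y_i*log(G) + (1 - y_i)*log(1 - G) ], for minimization
    return -np.sum(yb * np.log(G) + (1 - yb) * np.log(1 - G))

mle = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
print(mle.x)   # should match the statsmodels Logit estimates above
```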

MLE and OLS

  • OLS is used to estimate linear models.
  • MLE can estimate both linear and non-linear models.
  • When $u$ is normally distributed, MLE of the linear model gives the same estimates as OLS.

Appendix

  • $\color{red}{\text{Law of Iterated Expectation:}}$

    • $$\color{red}{E(y)=E[E(y|x)]}$$
  • Summation operation

    • $$\sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})=\sum_{i=1}^N(x_i-\bar{x})y_i=\sum_{i=1}^N(x_iy_i-\bar{x}\bar{y})$$
  • Variance

    • $$\begin{aligned}
      \operatorname{Var}(X+a) & =\operatorname{Var}(X) \\
      \operatorname{Var}(a X) & =a^2 \operatorname{Var}(X) \\
      \operatorname{Var}(X) & =\operatorname{Cov}(X, X) \\
      \operatorname{Var}(a X+b Y) & =a^2 \operatorname{Var}(X)+b^2 \operatorname{Var}(Y)+2 a b \operatorname{Cov}(X, Y) \\
      \operatorname{Var}\left(\sum_{i=1}^N a_i X_i\right) & =\sum_{i, j=1}^N a_i a_j \operatorname{Cov}(X_i, X_j) \\
      & =\sum_{i=1}^N a_i^2 \operatorname{Var}(X_i)+\sum_{i \neq j} a_i a_j \operatorname{Cov}(X_i, X_j) \\
      & =\sum_{i=1}^N a_i^2 \operatorname{Var}(X_i)+2 \sum_{i=1}^N \sum_{j=i+1}^N a_i a_j \operatorname{Cov}(X_i, X_j) .
      \end{aligned}$$
