Ch3 III The Simple Linear Regression Model

Master's-level Econometrics 1-02: SimpleRegressionModel

The population regression line: $TestScore_i = \beta_0 + \beta_1 STR_i + u_i$
$\beta_1$ = slope of the population regression line = $\Delta TestScore / \Delta STR$ = change in test score for a unit change in STR.
Why are $\beta_0$ and $\beta_1$ "population" parameters? We would like to know the population value of $\beta_1$. We don't know $\beta_1$, so we must estimate it using data.
$u_i$ incorporates all other factors that might affect $TestScore_i$.
$\min_{\hat\beta_0,\,\hat\beta_1} \sum_{i=1}^{n} \left[ Y_i - (\hat\beta_0 + \hat\beta_1 X_i) \right]^2$

This minimization problem can be solved using calculus. The result is the OLS estimators of $\beta_0$ and $\beta_1$.
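The calculus step can be sketched as follows. The symbols $b_0$, $b_1$ (candidate values) and $S$ (the sum of squares) are notation assumed here for illustration; setting both partial derivatives to zero gives the two normal equations, whose solution is the usual closed form.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right) = 0 \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i \left( Y_i - b_0 - b_1 X_i \right) = 0 \\
\Rightarrow \quad \hat\beta_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2},
\qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X
\end{aligned}
```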
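$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \dfrac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$ (sample covariance over sample variance),
provided that $\sum_{i=1}^{n} (x_i - \bar x)^2 > 0$.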
The OLS Estimator, Predicted Values, and Residuals

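The OLS estimators of the slope $\beta_1$ and the intercept $\beta_0$ are:
$\hat\beta_1 = \dfrac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
The denominator can also be written as $\sum_i (X_i - \bar X)^2 = \sum_i (X_i - \bar X) X_i - \bar X \sum_i (X_i - \bar X) = \sum_i (X_i - \bar X) X_i$, because $\sum_i (X_i - \bar X) = 0$.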
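A minimal numerical sketch of the OLS slope and intercept, assuming a small made-up data set (the six STR / test-score pairs below are hypothetical and not from the slides). It also checks that $\sum (X_i - \bar X)^2$ equals $\sum (X_i - \bar X) X_i$ and that the residuals sum to zero.

```python
# Hypothetical data: student-teacher ratio (X) and test score (Y) for six districts.
X = [15.0, 17.0, 19.0, 20.0, 22.0, 25.0]
Y = [680.0, 672.0, 660.0, 655.0, 650.0, 640.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of cross-products and squares in deviation form.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)

# Alternative form of the denominator: sum of (X_i - X_bar) * X_i.
s_xx_alt = sum((x - x_bar) * x for x in X)

# OLS slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals from the fitted line; with an intercept they sum to zero.
residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(X, Y)]

print("slope:", beta1_hat)
print("intercept:", beta0_hat)
print("sum of residuals:", sum(residuals))
```

Dividing both `s_xy` and `s_xx` by `n - 1` would give the sample covariance and variance; the factor cancels in the ratio, which is why the slope can be read as cov(x, y) / var(x).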

Ch3 Two-Variable Linear Regression Model (Mathematics)-1

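Basic principle of the maximum likelihood method: choose the sample regression function under which the probability of generating the observed sample is largest.
Under the basic assumptions, consider the simple linear regression model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
Randomly draw n sample observations $(X_i, Y_i)$, $i = 1, 2, \dots, n$.
Suppose the parameter estimates of the model have already been obtained as $\hat\beta_0$ and $\hat\beta_1$; then $Y_i$ follows the normal distribution: $Y_i \sim N(\hat\beta_0 + \hat\beta_1 X_i, \sigma^2)$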
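As a sketch of where this leads, assuming independent normal errors with common variance $\sigma^2$ (the symbol $L$ for the likelihood is notation introduced here), the log-likelihood of the sample can be written out. Maximizing it over the coefficients is the same as minimizing the sum of squared residuals, so the maximum likelihood coefficient estimates coincide with OLS.

```latex
\begin{aligned}
L(\hat\beta_0, \hat\beta_1, \sigma^2)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2}{2\sigma^2} \right) \\
\ln L
  &= -\frac{n}{2} \ln(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 \\
\max_{\hat\beta_0, \hat\beta_1} \ln L
  &\iff \min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 ,
\qquad
\hat\sigma^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} \hat u_i^2
\end{aligned}
```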
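3. Population regression function (PRF): Regression analysis is concerned with the population mean of the explained variable given known or fixed values of the explanatory variable, that is, the average of all possible values of the explained variable that are statistically associated with a particular value of the explanatory variable.
$E(Y \mid X_i) = f(X_i)$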
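Example 1: A hypothetical community consists of 100 households. We want to study the relationship between monthly household consumption expenditure Y and monthly household disposable income X, that is, whether knowing a household's monthly income lets us predict the average monthly consumption expenditure of households in the community. To this end, the 100 households are divided into 10 groups with similar income within each group, and household consumption expenditure is analyzed for each income group.
Notes: (1) The absence of linear correlation does not imply the absence of any relationship. (2) Correlation does not necessarily imply causation. (3) Correlation analysis treats any two variables symmetrically, and both are regarded as random. Regression analysis treats the variables asymmetrically: it distinguishes the dependent (explained) variable from the independent (explanatory) variable; the former is a random variable, the latter is not.
Regression analysis forms the methodological foundation of econometrics. Its main tasks are: (1) estimating the parameters of the econometric model from sample observations to obtain the regression equation; (2) testing the significance of the regression equation and of the parameter estimates; (3) using the regression equation for analysis, evaluation, and prediction.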
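Chapter 3: The Classical Single-Equation Econometric Model: Two-Variable Linear Regression Model
I. Overview of regression analysis
II. Parameter estimation in the two-variable linear regression model
III. Properties of the least squares estimators
IV. Measuring goodness of fit
V. Interval estimation and hypothesis testing in two-variable regression
VI. Application of the two-variable linear regression model: prediction
Summary: knowledge structure diagram for this chapter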

Simple Linear Regression Model

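$\bar y$ = mean value of the dependent variable
n = number of observations
Chapter 14: Simple Linear Regression, p. 506
Least Squares Method: An Example
• Armand's Pizza Parlors is used to illustrate the least squares method. Suppose data were collected from 10 restaurants located near college campuses. For the i-th observation (the i-th restaurant) in the sample, $x_i$ is the student population (in thousands) and $y_i$ is quarterly sales (in thousands of dollars). The values of $x_i$ and $y_i$ for the 10 restaurants are summarized in Table 14.1.
where
• Let $y_i$ denote the observed (actual) quarterly sales for restaurant i, and let $\hat y_i$ in equation (14.4) denote the predicted sales for restaurant i. Every restaurant in the sample has an observed sales value $y_i$ and an estimated value $\hat y_i$. For the estimated regression line to fit the data well, we want the differences between the observed and predicted sales values to be small.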
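Chapter 14: Simple Linear Regression, pp. 504-505
• For example:
∗ In analyzing the effect of advertising expenditures on sales, a marketing manager wants to predict sales, so sales is the dependent variable; advertising expenditure is the independent variable used to predict sales. In statistical notation, y denotes the dependent variable and x denotes the independent variable.
Chapter 14: Simple Linear Regression, p. 501
Simple Linear Regression Model
• Simple linear regression: regression analysis involving one independent variable and one dependent variable, in which the relationship between the two variables is approximated by a straight line.
• Multiple regression analysis: regression analysis involving two or more independent variables.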
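Least Squares Method
• Slope and y-intercept of the estimated regression equation:
$b_1 = \dfrac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}$
$b_0 = \bar y - b_1 \bar x$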
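The two formulas can be applied directly. Table 14.1 is not reproduced in this excerpt, so the ten (student population, quarterly sales) pairs below are an assumption: they are entered from the commonly cited textbook version of the Armand's Pizza Parlors data and should be checked against the table before being relied on.

```python
# Assumed values for Table 14.1 (not shown in this excerpt):
# x = student population in thousands, y = quarterly sales in thousands of dollars.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Numerator and denominator of the slope formula.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # y-intercept

# Predicted sales for each restaurant and the sum of squared differences
# that the least squares method makes as small as possible.
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

print(f"b1 = {b1}, b0 = {b0}, SSE = {sse}")
```

With these assumed values the slope works out to 5 and the intercept to 60, so each additional thousand students is associated with roughly $5,000 more in quarterly sales.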

Master's-level Econometrics 1-03: SimpleRegressionModel2

Unbiasedness Summary
The OLS estimates of $\beta_1$ and $\beta_0$ are unbiased.
Proof of unbiasedness depends on our 4 assumptions; if any assumption fails, then OLS is not necessarily unbiased.
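And
$\bar{\hat u} = n^{-1} \sum_i \hat u_i = n^{-1} \sum_i \left[ u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) x_i \right] = \bar u - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) \bar x = 0$
So $\hat u_i = \hat u_i - \bar{\hat u} = (u_i - \bar u) - (\hat\beta_1 - \beta_1)(x_i - \bar x)$, and
$\hat u_i^2 = (u_i - \bar u)^2 - 2 (\hat\beta_1 - \beta_1)(x_i - \bar x)(u_i - \bar u) + (\hat\beta_1 - \beta_1)^2 (x_i - \bar x)^2$
Taking the expectation of the first term, summed over i:
$E\left[ \sum_i (u_i - \bar u)^2 \right] = \sum_i E(u_i^2) - n E\left[ \dfrac{(u_1 + \dots + u_n)^2}{n^2} \right] = n\sigma^2 - n \cdot \dfrac{n\sigma^2}{n^2} = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$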
Proof that $E(\hat\sigma^2) = \sigma^2$ (Cont.)
2nd, we need to find
$E\left[ (\hat\beta_1 - \beta_1)^2 \sum_i (x_i - \bar x)^2 \right]$
Since we have shown earlier
Estimating the Error Variance
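We don't know what the error variance, $\sigma^2$, is.
If the errors were observed, we could estimate it with $n^{-1} \sum_i u_i^2$. But we don't really observe the errors, $u_i$.
What we do observe are the residuals, $\hat u_i$.
We can use the residuals to form an estimate of the error variance.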
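A small Monte Carlo sketch of this idea, under assumed settings (true coefficients 1.0 and 0.5, $\sigma^2 = 4$, n = 20, fixed regressor values, normal errors; all of these are choices made for illustration). It compares the residual-based estimator that divides by n - 2 with the naive one that divides by n.

```python
import random


def ols_residuals(x, y):
    """Fit y = b0 + b1 * x by OLS and return the residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]


random.seed(0)                    # reproducible draws
true_sigma2 = 4.0                 # assumed true error variance
n = 20
x = [float(i) for i in range(n)]  # regressor held fixed across replications
reps = 2000

unbiased, naive = [], []
for _ in range(reps):
    u = [random.gauss(0.0, true_sigma2 ** 0.5) for _ in range(n)]
    y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]
    ssr = sum(r * r for r in ols_residuals(x, y))
    unbiased.append(ssr / (n - 2))  # degrees-of-freedom corrected estimator
    naive.append(ssr / n)           # divides by n, biased downward

mean_unbiased = sum(unbiased) / reps
mean_naive = sum(naive) / reps
print("average of SSR/(n-2):", mean_unbiased)
print("average of SSR/n:    ", mean_naive)
```

Two degrees of freedom are lost because the residuals satisfy two restrictions (they sum to zero and are uncorrelated with x in the sample), which is why the corrected estimator averages close to the true variance while the naive one falls short.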

Regression Analysis

Regression Analysis
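Relationships between variables (functional relationships)
Examples of functional relationships
The sales revenue (y) of a good and the quantity sold (x) are related by y = p x (p is the unit price).
The area of a circle (S) and its radius (r) are related by $S = \pi r^2$.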
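The sample correlation coefficient is defined as:
$r = \dfrac{\sum (X_t - \bar X)(Y_t - \bar Y)}{\sqrt{\sum (X_t - \bar X)^2 \sum (Y_t - \bar Y)^2}}$
In this formula, $\bar X$ and $\bar Y$ are the sample means of X and Y. The sample correlation coefficient is computed from sample observations, so its value varies from sample to sample. It is easy to show that the sample correlation coefficient is a consistent estimator of the population correlation coefficient.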
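Value of r              Degree of correlation
|r| < 0.3               no linear correlation
0.3 ≤ |r| < 0.5         low linear correlation
0.5 ≤ |r| < 0.8         moderate linear correlation
|r| ≥ 0.8               high linear correlation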
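The definition and the rule-of-thumb thresholds can be sketched in a few lines. The function names and the five data points are hypothetical, introduced only for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r from the deviation-form definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def describe(r):
    """Map |r| to the verbal categories used in the table."""
    a = abs(r)
    if a < 0.3:
        return "no linear correlation"
    if a < 0.5:
        return "low linear correlation"
    if a < 0.8:
        return "moderate linear correlation"
    return "high linear correlation"


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(xs, ys)
print(round(r, 4), "->", describe(r))
```

Because r only measures linear association, a value in the lowest band does not rule out a strong nonlinear relationship between the two variables.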
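3. If |r| = 1, X and Y are perfectly linearly correlated: r = 1 is called perfect positive correlation, and r = -1 is called perfect negative correlation.
Correlation analysis is used to measure the degree of association between two numerical variables.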
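I. Functional relationships and correlation relationships
1. Functional relationships
When one or several variables take given values and another variable has a uniquely determined value corresponding to them, the relationship is called a deterministic functional relationship.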
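(1) It is a one-to-one deterministic relationship. (2) Given two variables x and y, if y varies with x and depends entirely on x, so that when x takes a given value y takes the corresponding value according to a definite rule, then y is called a function of x, written y = f(x), where x is the independent variable and y is the dependent variable. (3) All observation points lie on a single line.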

Econometrics (English edition).

Chapter 4 Statistical Properties of the OLS Estimators
Xi’An Institute of Post & Telecommunication Dept of Economic & Management Prof. Long
Simple Linear Regression Model: $y_t = \beta_1 + \beta_2 x_t + e_t$
Assumptions of the Simple Linear Regression Model
1. $y_t = \beta_1 + \beta_2 x_t + e_t$
2. $E(e_t) = 0 \iff E(y_t) = \beta_1 + \beta_2 x_t$
3. $\mathrm{var}(e_t) = \sigma^2 = \mathrm{var}(y_t)$
4. $\mathrm{cov}(e_i, e_j) = \mathrm{cov}(y_i, y_j) = 0$
5. $x_t \neq c$ for every observation
6. $e_t \sim N(0, \sigma^2) \iff y_t \sim N(\beta_1 + \beta_2 x_t, \sigma^2)$
The population parameters $\beta_1$ and $\beta_2$ are unknown population constants.
$y_t$ = household weekly food expenditures
$x_t$ = household weekly income
For a given level of $x_t$, the expected level of food expenditures will be: $E(y_t \mid x_t) = \beta_1 + \beta_2 x_t$

regression

Slope = 0.10977
Intercept = 98.248
house price = 98.24833 + 0.10977 (square feet)
Interpretation of the Intercept, b0
house price = 98.24833 + 0.10977 (square feet)
ANOVA (partial)
              df    SS
Total         9     32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374
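The t statistics and the lower confidence limits in this output can be re-derived from the coefficients and standard errors. The sketch below assumes n = 10 observations (inferred from the Total df of 9), so the residual degrees of freedom are 8, and it uses 2.3060 as the two-sided 5% critical value of Student's t with 8 df taken from a standard t-table.

```python
# Values copied from the regression output.
coef = {"Intercept": 98.24833, "Square Feet": 0.10977}
std_err = {"Intercept": 58.03348, "Square Feet": 0.03297}

# Assumption: residual df = n - 2 = 8, with n = 10 inferred from Total df = 9.
# 2.3060 is the t-table critical value t(0.025, 8).
T_CRIT_8DF = 2.3060

# t statistic for H0: coefficient = 0 is estimate / standard error.
t_stat = {name: coef[name] / std_err[name] for name in coef}

# 95% confidence limits: estimate -/+ critical value * standard error.
lower_95 = {name: coef[name] - T_CRIT_8DF * std_err[name] for name in coef}
upper_95 = {name: coef[name] + T_CRIT_8DF * std_err[name] for name in coef}

for name in coef:
    print(f"{name}: t = {t_stat[name]:.5f}, "
          f"95% CI = ({lower_95[name]:.5f}, {upper_95[name]:.5f})")
```

The interval for the slope lies entirely above zero, which agrees with its P-value being below 0.05, while the interval for the intercept includes zero.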
Regression
Correlation vs. Regression
A scatter diagram can be used to show the relationship between two variables.
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
Least Squares Method
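$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared differences between $Y_i$ and $\hat Y_i$:
$\min \sum (Y_i - \hat Y_i)^2 = \min \sum \left( Y_i - (b_0 + b_1 X_i) \right)^2$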

Chapter 5: The Classical Linear Regression Model (II) (Advanced Econometrics, Tsinghua University, Pan Wenqing)

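Normal equations:
X1'X1 b1 + X1'X2 b2 = X1'Y (*)
X2'X1 b1 + X2'X2 b2 = X2'Y (**)
From (**): b2 = (X2'X2)^{-1} X2'Y - (X2'X2)^{-1} X2'X1 b1
Substituting into (*) and rearranging: X1'M2X1 b1 = X1'M2Y, so
b1 = (X1'M2X1)^{-1} X1'M2Y = b*
where M2 = I - X2(X2'X2)^{-1}X2'.
Moreover, M2Y = M2X1 b1 + M2X2 b2 + M2e1. Since M2X2 = 0 and M2e1 = e1 - X2(X2'X2)^{-1}X2'e1 = e1 (because X2'e1 = 0 for the OLS residuals e1), we get M2Y = M2X1 b1 + e1, or e1 = M2Y - M2X1 b1 = e*.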
X2 = X1Q1 + (I - P1)X2 = X1Q1 + M1X2 = explained part + residuals
where Q1 = (X1'X1)^{-1}X1'X2, P1 = X1(X1'X1)^{-1}X1' and M1 = I - P1.
M1X2 is X2 with the influence of X1 removed, i.e. the "net" effect on X2 of factors other than X1.
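The regression of X2 on X1 is called the auxiliary regression.
Question: how can the "net" effect of X1 on Y be measured? By partial regression:
Step 1: Remove the influence of X2.
Regress Y on X2 to obtain the "residuals" M2Y = [I - X2(X2'X2)^{-1}X2']Y.
Regress X1 on X2 to obtain the "residuals" M2X1 = [I - X2(X2'X2)^{-1}X2']X1.
M2Y is Y net of X2, and M2X1 is X1 net of X2.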
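Step 2: Estimate the "net" effect of X1 on Y.
Regress M2Y on M2X1 to obtain the "net" effect of X1 on Y:
M2Y = M2X1 b* + e*
Here, b* = [(M2X1)'(M2X1)]^{-1}(M2X1)'M2Y = (X1'M2X1)^{-1}X1'M2Y (since M2 is symmetric and idempotent), and e* = M2Y - M2X1 b*.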
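The two-step partial regression can be checked numerically. The sketch below assumes NumPy is available and uses simulated data (sample size, coefficients, and the seed are arbitrary choices for illustration): X2 holds a constant and one regressor, X1 is a single regressor correlated with it. The coefficient and residuals from the partial regression should match those from the full regression of Y on [X1, X2].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated design: X2 = [constant, z], X1 correlated with z (all hypothetical).
z = rng.normal(size=n)
X2 = np.column_stack([np.ones(n), z])
X1 = (0.5 * z + rng.normal(size=n)).reshape(-1, 1)
Y = 1.0 + 2.0 * X1[:, 0] - 1.5 * z + rng.normal(size=n)

# Full regression of Y on [X1, X2].
X = np.column_stack([X1, X2])
b_full, *_ = np.linalg.lstsq(X, Y, rcond=None)
b1_full = b_full[0]
e_full = Y - X @ b_full

# Step 1: residual-maker matrix M2 = I - X2 (X2'X2)^{-1} X2'.
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
M2Y = M2 @ Y      # Y net of X2
M2X1 = M2 @ X1    # X1 net of X2

# Step 2: regress M2Y on M2X1.
b_star, *_ = np.linalg.lstsq(M2X1, M2Y, rcond=None)
e_star = M2Y - M2X1 @ b_star

print("b1 from full regression:   ", b1_full)
print("b* from partial regression:", b_star[0])
print("max residual difference:   ", np.max(np.abs(e_full - e_star)))
```

Forming the n-by-n matrix M2 explicitly is fine for a small illustration; for large samples one would normally obtain M2Y and M2X1 as residuals from two regressions on X2 instead.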

3 simple regression

Chapter 3: Two-Variable Simple Regression Analysis
· Regression
e.g. consumption function
Population regression line (PRL)
Population regression curve
Regression of Y on X

· Population regression function (PRF; also called the conditional expectation function, CEF, or population regression, PR)
1. Functional form: linear function (assumption). Regression coefficients: intercept and slope coefficient
2. Linearity in parameters; e.g. the C-D production function and non-linearity
3. Stochastic specification of the PRF: the deviation is known as the stochastic disturbance, stochastic error term, or nonsystematic component; the remainder is the systematic or deterministic component
4. The significance of the stochastic disturbance term
5. Terminology
Y: dependent variable, explained variable, predictand, regressand, response, endogenous variable, outcome, controlled variable
X: explanatory variable, independent variable, predictor, regressor, stimulus, exogenous variable, covariate, control variable
· The sample regression function (SRF)
1. Sample regression lines (SRL)

2. Simple linear regression examples

The interval does not contain 0, hence the director of admissions has a statistically significant result indicating that there is a positive linear relationship between GPA and ACT.
Example 2 (Cont’d)
b) Test whether or not a linear association exists between student's ACT score (X) and GPA at the end of the freshman year (Y). Use a level of significance of .01.
• Hypotheses: null hypothesis H0: β1 = 0; alternative hypothesis H1: β1 ≠ 0
SAS parameter estimates:
Variable    DF   Estimate   Std Error   t Value   Pr > |t|   Standardized Estimate   99% Confidence Limits
Intercept   1    2.11405    0.32089     6.59      <.0001     0                       1.27390   2.95420
ACT         1    0.03883    0.01277     3.04      0.0029     0.26948                 0.00539   0.07227
• From the SAS output, the 99% confidence interval for β1 is (0.00539, 0.07227).
• Alternative method: t(.005, 118) = 2.61814; therefore, the confidence interval for β1 is (0.03883 - 2.61814(0.01277), 0.03883 + 2.61814(0.01277)) = (0.0054, 0.07226).
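The alternative method is simple arithmetic and can be reproduced directly from the reported slope estimate, standard error, and critical value. The variable names below are illustrative.

```python
# Values reported in the SAS output and the text.
b1 = 0.03883          # slope estimate for ACT
se_b1 = 0.01277       # standard error of the slope
t_crit = 2.61814      # t(.005, 118), as given in the text

# 99% confidence interval: estimate -/+ critical value * standard error.
margin = t_crit * se_b1
lower = b1 - margin
upper = b1 + margin

# Equivalent two-sided t test of H0: beta1 = 0 at the .01 level.
t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > t_crit

print(f"99% CI for beta1: ({lower:.5f}, {upper:.5f})")
print(f"t statistic: {t_stat:.2f}, reject H0 at .01: {reject_h0}")
```

The test and the interval lead to the same conclusion: the interval excludes 0 exactly when the absolute t statistic exceeds the critical value.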