Introductory Econometrics for Finance - Solutions to Exercises 1
An Introduction to Econometrics - Wuhan Institute of Technology (武汉工程大学) Featured Course

The Econometric Model
Introducing the disturbance term u, the demand function is written as:
Q = α + βP + u
(2)
This is an econometric model; a model of this type is also called a linear regression model. In such a model, the disturbance term u stands for all the factors that affect Q but are not explicitly included in the model, together with purely random factors. A key difference between an economist and an econometrician is that the latter cares about the disturbance term. A relationship without a disturbance term is called an exact or deterministic relationship, while one with a disturbance term is called a stochastic relationship. When we use a stochastic relationship to predict the exact value of the explained variable, the result generally contains an error, and the disturbance term is used to gauge the size of these "errors".
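To make the distinction between an exact and a stochastic relationship concrete, here is a minimal simulation sketch in Python (not part of the original notes; the parameter values and the normally distributed disturbance are purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 80.0, -4.0           # hypothetical "true" parameters of the demand function
P = np.array([0, 1, 2, 3, 4, 5], dtype=float)

Q_exact = alpha + beta * P         # deterministic (exact) relationship, model (1)
u = rng.normal(0, 2, size=P.size)  # disturbance: omitted factors plus pure randomness
Q_obs = Q_exact + u                # stochastic relationship, model (2)

print(Q_exact)   # points lying exactly on the line
print(Q_obs)     # points scattered around the line because of u
```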
Data Table and Scatter Plot
Table 1.1

P:   0    1    2    3    4    5
Q:  78   70   69   63   60   58

Figure 1.3 (scatter plot of the six (P, Q) observations, with Q on the vertical axis, roughly 60 to 80, and P on the horizontal axis)
Figure 1.3 shows a relationship that is approximately, rather than exactly, linear. Why don't all six points lie on the straight line specified by the mathematical model (1)? Because in deriving the demand curve we assumed that all other variables affecting Q were held constant, whereas in reality they usually change, and those changes have some effect on Q. As a result, the observed relationship between Q and P is not exact.
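As an illustration of how the line in Figure 1.3 can be recovered from the data (an added sketch, not part of the original notes), ordinary least squares applied to the six observations in Table 1.1 gives estimates of α and β:

```python
import numpy as np

P = np.array([0, 1, 2, 3, 4, 5], dtype=float)
Q = np.array([78, 70, 69, 63, 60, 58], dtype=float)

X = np.column_stack([np.ones_like(P), P])      # regressors: constant and P
coef, *_ = np.linalg.lstsq(X, Q, rcond=None)   # OLS estimates of (alpha, beta)
alpha_hat, beta_hat = coef

residuals = Q - X @ coef                       # estimates of the disturbances u
print(alpha_hat, beta_hat)                     # about 76.0 and -3.9: a downward-sloping demand line
print(residuals)                               # the deviations visible in Figure 1.3
```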
II. Steps in an Econometric Analysis
Generally speaking, an econometric analysis proceeds in the following steps:
(1) State the theory (or hypothesis)
(2) Specify the econometric model
(3) Collect the data
(4) Estimate the parameters
(5) Test the hypotheses
(6) Forecast and carry out policy analysis
Let us illustrate these steps with an example. Suppose an air-conditioner manufacturer asks an econometrician to study how a price increase would affect the demand for air conditioners. The econometrician proceeds through the steps above.
5. The State of Econometric Research in China
For reasons of historical circumstance, the broad study and application of econometrics in China started relatively late, in the late 1970s. Considerable progress has been made since then: many government agencies and academic institutions now build econometric models for economic forecasting and policy analysis, and the gap with the leading countries in this field has narrowed substantially. Econometrics can be expected to play an ever larger role in the development of China's national economy.
Chapter 1 - INTERMEDIATE ECONOMETRICS (original-language textbook)

Steps in Empirical Econometric Analysis
Specify the hypotheses of interest in terms of the unknown parameters. Use econometric methods to estimate the parameters and to formally test the hypotheses of interest.
What can econometrics do for us?
Overall, we use econometrics to explain phenomena of an economic nature, to make policy recommendations, and to make forecasts about the future.
Steps in Empirical Econometric Analysis
Summary: Econometrics is used in all applied economic fields to test economic theories, to inform government and private policy makers, and to predict economic time series. Sometimes an econometric model is derived from a formal economic model, but in other cases econometric models are based on informal economic reasoning and intuition. The goal of any econometric analysis is to estimate the parameters in the model and to test hypotheses about these parameters; the values and signs of the parameters determine the validity of an economic theory and the effects of certain policies.
Introductory Econometrics

Types of Data – Time Series
Time series data has a separate observation for each time period, e.g. stock prices. Since it is not a random sample, there are different problems to consider. Trends and seasonality will be important.
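Because trends and seasonality matter for time series data, regressions often include a linear time trend and seasonal dummies. A minimal sketch of building such a design matrix for hypothetical monthly data (an added illustration, not part of the slides):

```python
import numpy as np

T = 48                                   # four years of hypothetical monthly observations
t = np.arange(1, T + 1, dtype=float)     # linear time trend 1, 2, ..., T
month = (np.arange(T) % 12) + 1          # calendar month 1..12

# 11 monthly dummies (January omitted as the base month), plus a constant and the trend
monthly_dummies = np.column_stack([(month == m).astype(float) for m in range(2, 13)])
X = np.column_stack([np.ones(T), t, monthly_dummies])

print(X.shape)   # (48, 13): constant, trend, and 11 seasonal dummies
```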
Types of Data – Panel
We can pool random cross sections and treat them much like a normal cross section; we just need to account for time differences. We can also follow the same random individual observations over time - known as panel data or longitudinal data.
Types of Data – Cross Sectional
Cross-sectional data is a random sample. Each observation is a new individual, firm, etc. with information at a point in time. If the data is not a random sample, we have a sample-selection problem.
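One rough way to picture the three data structures just described (an added illustration with made-up dimensions): a cross section is N units observed once, a time series is one unit observed over T periods, and a panel is N units each observed over T periods.

```python
import numpy as np

rng = np.random.default_rng(1)

N, T = 100, 20
cross_section = rng.normal(size=(N, 1))   # N firms, one point in time
time_series   = rng.normal(size=(T, 1))   # one series, T time periods
panel         = rng.normal(size=(N, T))   # N firms followed over T periods

print(cross_section.shape, time_series.shape, panel.shape)
```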
Chapter 1 - Introductory Econometrics for Finance (金融计量经济学导论, Dongbei University of Finance and Economics, Chen Lei)

1.2 The Special Characteristics of Financial Data
• Data problems in macroeconometric analysis:
• Small samples; measurement error and data revisions
• Financial data are observed at high frequency, so the volume of data is large
• Financial data are of high quality
These features mean that more powerful analytical techniques can be employed and that the results are more reliable.
• Financial data are noisy, which makes it harder to separate trends and regularities from random and irrelevant movements
• Financial data are usually not normally distributed
• High-frequency data often contain additional patterns that reflect the way the market operates but are not of interest in themselves; these need to be allowed for when building a model
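As an added illustration of the non-normality point, a quick first check on a return series is to compare its sample skewness and kurtosis with the normal benchmarks of 0 and 3. The sketch below uses simulated fat-tailed returns rather than real data:

```python
import numpy as np

rng = np.random.default_rng(2)
returns = rng.standard_t(df=5, size=2000) * 0.01   # fat-tailed stand-in for daily returns

z = (returns - returns.mean()) / returns.std()
skewness = np.mean(z**3)          # 0 for a normal distribution
kurtosis = np.mean(z**4)          # 3 for a normal distribution; fat tails give values above 3

print(round(skewness, 2), round(kurtosis, 2))
```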
Types of Data
• Problems that could be tackled using a time series regression:
- How the value of a country's stock index has varied with that country's macroeconomic fundamentals
- How the value of a company's stock price has varied when it announced the value of its dividend payment
- The effect on a country's currency of an increase in its interest rate
• Cross-sectional data are data on one or more variables collected at a single point in time, e.g.:
- A cross-section of stock returns on the New York Stock Exchange
- A sample of bond credit ratings for UK banks
Introductory Econometrics: A Modern Approach, 4E, Jeffrey Wooldridge - Solution Manual, Chapter 18

CHAPTER 18 - SOLUTIONS TO PROBLEMS

18.1 With z_t1 and z_t2 now in the model, we should use one lag each as instrumental variables, z_{t-1,1} and z_{t-1,2}. This gives one overidentifying restriction that can be tested.

18.3 For δ ≠ β, y_t − δz_t = (y_t − βz_t) + (β − δ)z_t, which is an I(0) sequence (y_t − βz_t) plus an I(1) sequence. Since an I(1) sequence has a growing variance, it dominates the I(0) part, and the resulting sum is an I(1) sequence.

18.5 Following the hint, we have

y_t − y_{t-1} = βx_t − βx_{t-1} + βx_{t-1} − y_{t-1} + u_t

or

Δy_t = βΔx_t − (y_{t-1} − βx_{t-1}) + u_t.

Next, we plug in Δx_t = γΔx_{t-1} + v_t to get

Δy_t = β(γΔx_{t-1} + v_t) − (y_{t-1} − βx_{t-1}) + u_t
     = βγΔx_{t-1} − (y_{t-1} − βx_{t-1}) + u_t + βv_t
     ≡ γ_1 Δx_{t-1} + δ(y_{t-1} − βx_{t-1}) + e_t,

where γ_1 = βγ, δ = −1, and e_t = u_t + βv_t.

18.7 If unem_t follows a stable AR(1) process, then this is the null model used to test for Granger causality: under the null that gM_t does not Granger cause unem_t, we can write

unem_t = β_0 + β_1 unem_{t-1} + u_t,  E(u_t | unem_{t-1}, gM_{t-1}, unem_{t-2}, gM_{t-2}, …) = 0,

with |β_1| < 1. Now, it is up to us to choose how many lags of gM to add to this equation. The simplest approach is to add gM_{t-1} and to do a t test. But we could add a second or third lag (and probably not beyond this with annual data), and compute an F test for joint significance of all lags of gM_t.

18.9 Let ê_{n+1} be the forecast error for forecasting y_{n+1}, and let â_{n+1} be the forecast error for forecasting Δy_{n+1}. By definition, ê_{n+1} = y_{n+1} − f̂_n = y_{n+1} − (ĝ_n + y_n) = (y_{n+1} − y_n) − ĝ_n = Δy_{n+1} − ĝ_n = â_{n+1}, where the last equality follows by the definition of the forecast error for Δy_{n+1}.

SOLUTIONS TO COMPUTER EXERCISES

C18.1 (i) The estimated GDL model is

gprice_t = .0013 + .081 gwage_t + .640 gprice_{t-1}
          (.0003)  (.031)        (.045)
n = 284, R² = .454.

The estimated impact propensity is .081, while the estimated LRP is .081/(1 − .640) = .225. The estimated lag distribution is graphed below.

(ii) … above the estimated IP for the GDL model. Further, the estimated LRP from the GDL model is much lower than that for the FDL model, which we estimated as 1.172. Clearly we cannot think of the GDL model as a good approximation to the FDL model. One reason these are so different can be seen by comparing the estimated lag distributions (see below for the GDL model). With the FDL, the largest lag coefficient is at the ninth lag, which is impossible with the GDL model (where the largest impact is always at lag zero). It could also be that {u_t} in equation (18.8) does not follow an AR(1) process with parameter ρ, which would cause the dynamic regression to produce inconsistent estimators of the lag coefficients.

(iii) When we estimate the RDL from equation (18.16) we obtain

gprice_t = .0011 + .090 gwage_t + .619 gprice_{t-1} + .055 gwage_{t-1}
          (.0003)  (.031)        (.046)              (.032)
n = 284, R² = .460.

The coefficient on gwage_{t-1} is not especially significant, but we include it in obtaining the estimated LRP. The estimated IP is .090 while the LRP is (.090 + .055)/(1 − .619) ≈ .381. These are both slightly higher than what we obtained for the GDL, but the LRP is still well below what we obtained for the FDL in Problem 11.5. While this RDL model is more flexible than the GDL model, it imposes a maximum lag coefficient (in absolute value) at lag zero or one. For the estimates given above, the maximum effect is at the first lag. (See the estimated lag distribution.)
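As a quick numerical check of the impact and long-run propensities reported for the GDL model above (the formula LRP = IP/(1 − ρ) for a geometric lag is the standard one; the sketch simply re-does the arithmetic):

```python
# Impact propensity and long-run propensity implied by the estimated GDL model
# gprice_t = .0013 + .081*gwage_t + .640*gprice_{t-1}
ip = 0.081                 # coefficient on gwage_t (impact propensity)
rho = 0.640                # coefficient on lagged gprice
lrp = ip / (1 - rho)       # geometric distributed lag: LRP = IP / (1 - rho)
print(round(lrp, 3))       # 0.225, as reported in the solution
```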
C18.3 (i) The estimated equation is

pcip_t = 1.80 + .349 pcip_{t-1} + .071 pcip_{t-2} + .067 pcip_{t-3}
        (0.55)  (.043)           (.045)            (.043)
n = 554, R² = .166, σ̂ = 12.15.

When pcip_{t-4} is added, its coefficient is .0043 with a t statistic of about .10.

(ii) In the model

pcip_t = δ_0 + α_1 pcip_{t-1} + α_2 pcip_{t-2} + α_3 pcip_{t-3} + γ_1 pcsp_{t-1} + γ_2 pcsp_{t-2} + γ_3 pcsp_{t-3} + u_t,

the null hypothesis is that pcsp does not Granger cause pcip. This is stated as H_0: γ_1 = γ_2 = γ_3 = 0. The F statistic for joint significance of the three lags of pcsp_t, with 3 and 547 df, is F = 5.37 with p-value = .0012. Therefore, we strongly reject H_0 and conclude that pcsp does Granger cause pcip.

(iii) When we add Δi3_{t-1}, Δi3_{t-2}, and Δi3_{t-3} to the regression from part (ii), and now test the joint significance of pcsp_{t-1}, pcsp_{t-2}, and pcsp_{t-3}, the F statistic is 5.08. With 3 and 544 df in the F distribution, this gives p-value = .0018, and so pcsp Granger causes pcip even conditional on past Δi3.

C18.5 (i) The estimated equation is

hy6_t = .078 + 1.027 hy3_{t-1} − 1.021 Δhy3_t − .085 Δhy3_{t-1} − .104 Δhy3_{t-2}
       (.028) (0.016)           (0.038)        (.037)            (.037)
n = 121, R² = .982, σ̂ = .123.

The t statistic for H_0: β = 1 is (1.027 − 1)/.016 ≈ 1.69. We do not reject H_0: β = 1 at the 5% level against a two-sided alternative, although we would reject at the 10% level.

(ii) The estimated error correction model is

Δhy6_t = .070 + 1.259 Δhy3_{t-1} − .816 (hy6_{t-1} − hy3_{t-2}) + .283 Δhy3_{t-2} + .127 (hy6_{t-2} − hy3_{t-3})
        (.049)  (.278)            (.256)                        (.272)            (.256)
n = 121, R² = .795.

Neither of the added terms is individually significant. The F test for their joint significance gives F = 1.35, p-value = .264. Therefore, we would omit these terms and stick with the error correction model estimated in (18.39).

C18.7 (i) The estimated linear trend equation using the first 119 observations and excluding the last 12 months is

chnimp_t = 248.58 + 5.15 t
          (53.20)  (0.77)
n = 119, R² = .277, σ̂ = 288.33.

The standard error of the regression is 288.33.

(ii) The estimated AR(1) model excluding the last 12 months is

chnimp_t = 329.18 + .416 chnimp_{t-1}
          (54.71)  (.084)
n = 118, R² = .174, σ̂ = 308.17.

Because σ̂ is lower for the linear trend model, it provides the better in-sample fit.

(iii) Using the last 12 observations for one-step-ahead out-of-sample forecasting gives an RMSE and MAE for the linear trend equation of about 315.5 and 201.9, respectively. For the AR(1) model, the RMSE and MAE are about 388.6 and 246.1, respectively. In this case, the linear trend is the better forecasting model.

(iv) Using again the first 119 observations, the F statistic for joint significance of feb_t, mar_t, …, dec_t when added to the linear trend model is about 1.15 with p-value ≈ .328. (The df are 11 and 107.) So there is no evidence that seasonality needs to be accounted for in forecasting chnimp.

C18.9 (i) Using the data up through 1989 gives

ŷ_t = 3,186.04 + 116.24 t + .630 y_{t-1}
     (1,163.09) (46.31)    (.148)
n = 30, R² = .994, σ̂ = 223.95.

(Notice how high the R-squared is. However, it is meaningless as a goodness-of-fit measure because {y_t} has a trend, and possibly a unit root.)

(ii) The forecast for 1990 (t = 32) is 3,186.04 + 116.24(32) + .630(17,804.09) ≈ 18,122.30, because y is $17,804.09 in 1989. The actual value for real per capita disposable income was $17,944.64, and so the forecast error is −$177.66.

(iii) The MAE for the 1990s, using the model estimated in part (i), is about 371.76.

(iv) Without y_{t-1} in the equation, we obtain

ŷ_t = 8,143.11 + 311.26 t
     (103.38)   (5.64)
n = 31, R² = .991, σ̂ = 280.87.

The MAE for the forecasts in the 1990s is about 718.26. This is much higher than for the model with y_{t-1}, so we should use the AR(1) model with a linear time trend.

C18.11 (i) For lsp500, the ADF statistic without a trend is t = −.79; with a trend, the t statistic is −2.20. These are both well above their respective 10% critical values. In addition, the estimated roots are quite close to one. For lip, the ADF statistic is −1.37 without a trend and −2.52 with a trend. Again, these are not close to rejecting even at the 10% level, and the estimated roots are very close to one.

(ii) The simple regression of lsp500 on lip gives

lsp500_t = −2.402 + 1.694 lip_t
          (.095)   (.024)
n = 558, R² = .903.

The t statistic for lip is over 70, and the R-squared is over .90. These are hallmarks of a spurious regression.

(iii) Using the residuals û_t obtained in part (ii), the ADF statistic (with two lagged changes) is −1.57, and the estimated root is over .99. There is no evidence of cointegration. (The 10% critical value is −3.04.)

(iv) After adding a linear time trend to the regression from part (ii), the ADF statistic applied to the residuals is −1.88, and the estimated root is again about .99. Even with a time trend there is no evidence of cointegration.

(v) It appears that lsp500 and lip do not move together in the sense of cointegration, even if we allow them to have unrestricted linear time trends. The analysis does not point to a long-run equilibrium relationship.

C18.13 (i) The DF statistic is about −3.31, which is to the left of the 2.5% critical value (−3.12), and so, using this test, we can reject a unit root at the 2.5% level. (The estimated root is about .81.)

(ii) When two lagged changes are added to the regression in part (i), the t statistic becomes −1.50, and the root is larger (about .915). Now there is little evidence against a unit root.

(iii) If we add a time trend to the regression in part (ii), the ADF statistic becomes −3.67, and the estimated root is about .57. The 2.5% critical value is −3.66, and so we are back to fairly convincingly rejecting a unit root.

(iv) The best characterization seems to be an I(0) process about a linear trend. In fact, a stable AR(3) about a linear trend is suggested by the regression in part (iii).

(v) For prcfat_t, the ADF statistic without a trend is −4.74 (estimated root = .62) and with a time trend the statistic is −5.29 (estimated root = .54). Here, the evidence is strongly in favor of an I(0) process whether or not we include a trend.
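The "hallmarks of a spurious regression" noted in C18.11 are easy to reproduce with simulated data: regressing one independent random walk on another typically yields a large t statistic and a sizable R² even though the series are unrelated. A minimal numpy sketch (illustration only; this is not the lsp500/lip data used above):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 558
x = np.cumsum(rng.normal(size=T))     # independent random walk, I(1)
y = np.cumsum(rng.normal(size=T))     # another independent random walk

X = np.column_stack([np.ones(T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

s2 = resid @ resid / (T - 2)                          # residual variance estimate
se_slope = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))  # conventional OLS standard error
t_slope = beta[1] / se_slope
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(round(t_slope, 1), round(r2, 2))   # typically a huge |t| and a sizable R^2 despite no true relationship
```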
Chapter 1 Introductory Economic Principles

Table 1.1: Searching for solutions to social problems

Problem 1: Domestic steel producers are hurt by imported steel.
Solution: Levy a tariff on imported steel.
Possible adverse consequences: (a) The price of steel rises, raising costs for steel-using industries, which then have to raise their own prices. (b) Because foreigners sell us less steel, they buy fewer of our exports.

Problem: …
Solution: …
Possible adverse consequences: (a) Landlords skimp on maintenance and repairs. (b) Over the longer run, fewer dwellings are built for rental.

Problem: …
Solution: …
Possible adverse consequences: (a) Employers become less willing to hire women. (b) Wage determination involves additional bureaucratic and judicial costs.
• If it did not yield practical results, what would be the use of economics (or aerodynamics, or any other discipline)? It is therefore reasonable to expect that a better understanding of economics and markets can help one acquire more wealth.
Field of study: Engineering, Mathematics, Pharmacy, Physics, Economics, Accounting, Nursing, Business (other), Political Science and Government, Psychology, Biology/Life Sciences, Sociology, History, English Language and Literature, Education, Visual and Performing Arts, Social Work, Philosophy/Religion/Theology
Economics as a Social Science
• Economics is a science. Like other sciences, it consists of explanations (theories) and empirical facts. The theories help us understand the real world and make correct conjectures about it, while the empirical facts can confirm a theory or refute it.
• Economics studies questions such as: Would a cut in the capital gains tax make the stock market rise? Would raising tariffs increase consumer welfare? Do longer prison sentences reduce crime? Does making divorce easier raise the status of women? Would a high-price (and therefore lower-volume) strategy bring a firm more profit than a low-price strategy?
Introductory Econometrics - Table of Contents of the Chinese Edition

Part 1: Regression Analysis with Cross-Sectional Data
Chapter 1: The Nature of Econometrics and Economic Data
1.1 What Is Econometrics?
1.2 The Steps in Empirical Economic Analysis
1.3 The Structure of Economic Data
1.4 Causality and the Notion of Ceteris Paribus in Econometric Analysis
Summary
Key Terms
Problems
Computer Exercises
Chapter 2: The Simple Regression Model
Chapter 3: Multiple Regression Analysis: Estimation
Chapter 4: Multiple Regression Analysis: Inference
Chapter 5: Multiple Regression Analysis: OLS Asymptotics
Chapter 6: Multiple Regression Analysis: Further Issues
Chapter 7: Multiple Regression Analysis with Qualitative Information: Binary (or Dummy) Variables
Chapter 8: Heteroskedasticity
Chapter 9: More on Specification and Data Issues
Part 2: Regression Analysis with Time Series Data
Chapter 10: Basic Regression Analysis with Time Series Data
Chapter 11: Further Issues in Using OLS with Time Series Data
Chapter 12: Serial Correlation and Heteroskedasticity in Time Series Regressions
Part 3: Advanced Topics
Chapter 13: Pooling Cross Sections Across Time: Simple Panel Data Methods
Chapter 14: Advanced Panel Data Methods
Chapter 15: Instrumental Variables Estimation and Two Stage Least Squares
Chapter 16: Simultaneous Equations Models
Chapter 17: Limited Dependent Variable Models and Sample Selection Corrections
Chapter 18: Advanced Time Series Topics
Chapter 19: Carrying Out an Empirical Project
Appendix A: Basic Mathematical Tools
Appendix B: Fundamentals of Probability
Appendix C: Fundamentals of Mathematical Statistics
Appendix D: Summary of Matrix Algebra
Appendix E: The Linear Regression Model in Matrix Form
Appendix F: Answers to Chapter Questions
Appendix G: Statistical Tables
Introductory Econometrics for Finance - Chapter 4 Solutions

Solutions to the Review Questions at the End of Chapter 4

1. In the same way as we make assumptions about the true value of beta and not the estimated values, we make assumptions about the true unobservable disturbance terms rather than their estimated counterparts, the residuals. We know the exact value of the residuals, since they are defined by û_t = y_t − ŷ_t. So we do not need to make any assumptions about the residuals since we already know their value. We make assumptions about the unobservable error terms since it is always the true value of the population disturbances that we are really interested in, although we never actually know what these are.

2. We would like to see no pattern in the residual plot! If there is a pattern in the residual plot, this is an indication that there is still some "action" or variability left in y_t that has not been explained by our model. This indicates that potentially it may be possible to form a better model, perhaps using additional or completely different explanatory variables, or by using lags of either the dependent variable or of one or more of the explanatory variables. Recall that the two plots shown on pages 157 and 159, where the residuals followed a cyclical pattern and an alternating pattern respectively, are used as indications that the residuals are positively and negatively autocorrelated.

Another problem if there is a "pattern" in the residuals is that, if it does indicate the presence of autocorrelation, then this may suggest that our standard error estimates for the coefficients could be wrong and hence any inferences we make about the coefficients could be misleading.

3. The t-ratios for the coefficients in this model are given in the third row, after the standard errors. They are calculated by dividing the individual coefficients by their standard errors.

ŷ_t = 0.638 + 0.402 x_2t − 0.891 x_3t,   R² = 0.96, R̄² = 0.89
     (0.436)  (0.291)      (0.763)
t-ratios: 1.46   1.38      −1.17

The problem appears to be that the regression parameters are all individually insignificant (i.e. not significantly different from zero), although the value of R² and its adjusted version are both very high, so that the regression taken as a whole seems to indicate a good fit. This looks like a classic example of what we term near multicollinearity. This is where the individual regressors are very closely related, so that it becomes difficult to disentangle the effect of each individual variable upon the dependent variable.

The solution to near multicollinearity that is usually suggested is that, since the problem is really one of insufficient information in the sample to determine each of the coefficients, one should go out and get more data. In other words, we should switch to a higher frequency of data for analysis (e.g. weekly instead of monthly, monthly instead of quarterly, etc.). An alternative is to get more data by using a longer sample period (i.e. one going further back in time), or to combine the two independent variables in a ratio (e.g. x_2t / x_3t).

Other, more ad hoc methods for dealing with the possible existence of near multicollinearity were discussed in Chapter 4:

- Ignore it: if the model is otherwise adequate, i.e. statistically and in terms of each coefficient being of a plausible magnitude and having an appropriate sign. Sometimes, the existence of multicollinearity does not reduce the t-ratios on variables that would have been significant without the multicollinearity sufficiently to make them insignificant.
It is worth stating that the presence of near multicollinearity does not affect the BLUE properties of the OLS estimator - i.e. it will still be consistent, unbiased and efficient, since the presence of near multicollinearity does not violate any of the CLRM assumptions 1-4. However, in the presence of near multicollinearity it will be hard to obtain small standard errors. This will not matter if the aim of the model-building exercise is to produce forecasts from the estimated model, since the forecasts will be unaffected by the presence of near multicollinearity so long as the relationship between the explanatory variables continues to hold over the forecast sample.

- Drop one of the collinear variables, so that the problem disappears. However, this may be unacceptable to the researcher if there were strong a priori theoretical reasons for including both variables in the model. Also, if the removed variable was relevant in the data generating process for y, an omitted variable bias would result.

- Transform the highly correlated variables into a ratio and include only the ratio, and not the individual variables, in the regression. Again, this may be unacceptable if financial theory suggests that changes in the dependent variable should occur following changes in the individual explanatory variables, and not a ratio of them.

4. (a) The assumption of homoscedasticity is that the variance of the errors is constant and finite over time. Technically, we write Var(u_t) = σ_u².

(b) The coefficient estimates would still be the "correct" ones (assuming that the other assumptions required to demonstrate OLS optimality are satisfied), but the problem would be that the standard errors could be wrong. Hence if we were trying to test hypotheses about the true parameter values, we could end up drawing the wrong conclusions. In fact, for all of the variables except the constant, the standard errors would typically be too small, so that we would end up rejecting the null hypothesis too many times.

(c) There are a number of ways to proceed in practice, including:

- Using heteroscedasticity-robust standard errors, which correct for the problem by enlarging the standard errors relative to what they would have been for the situation where the error variance is positively related to one of the explanatory variables.

- Transforming the data into logs, which has the effect of reducing the effect of large errors relative to small ones.

5. (a) This is where there is a relationship between the i-th and j-th residuals. Recall that one of the assumptions of the CLRM was that such a relationship did not exist. We want our residuals to be random, and if there is evidence of autocorrelation in the residuals, then it implies that we could predict the sign of the next residual and get the right answer more than half the time on average!

(b) The Durbin-Watson test is a test for first order autocorrelation. The test is calculated as follows. You would run whatever regression you were interested in, and obtain the residuals. Then calculate the statistic

DW = Σ_{t=2}^{T} (û_t − û_{t-1})² / Σ_{t=2}^{T} û_t²

You would then need to look up the two critical values from the Durbin-Watson tables, and these would depend on how many observations and how many regressors (excluding the constant this time) you had in the model. The rejection / non-rejection rule is given by selecting the appropriate region of the Durbin-Watson decision diagram: reject the null of no autocorrelation in favour of positive autocorrelation if DW is below the lower critical value d_L, treat the test as inconclusive if DW lies between d_L and the upper critical value d_U, do not reject the null if DW lies between d_U and 4 − d_U, and reject in favour of negative autocorrelation if DW is above 4 − d_L (with a further inconclusive region between 4 − d_U and 4 − d_L).
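A minimal numerical sketch of the DW calculation just described, using simulated residuals (illustration only, not data from the question):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate positively autocorrelated residuals to illustrate the calculation
u = np.zeros(60)
for t in range(1, 60):
    u[t] = 0.7 * u[t - 1] + rng.normal()

dw = np.sum(np.diff(u) ** 2) / np.sum(u[1:] ** 2)
print(round(dw, 2))   # well below 2, pointing towards positive first-order autocorrelation
```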
(c) We have 60 observations, and the number of regressors excluding the constant term is 3. The appropriate lower and upper limits are 1.48 and 1.69 respectively, so the Durbin-Watson statistic is lower than the lower limit. It is thus clear that we reject the null hypothesis of no autocorrelation. So it looks like the residuals are positively autocorrelated.

(d) Δy_t = β_1 + β_2 Δx_{2t} + β_3 Δx_{3t} + β_4 Δx_{4t} + u_t

The problem with a model entirely in first differences is that once we calculate the long run solution, all the first difference terms drop out (as in the long run we assume that the values of all variables have converged on their own long run values, so that y_t = y_{t-1} etc.). Thus when we try to calculate the long run solution to this model, we cannot do it because there isn't a long run solution to this model!

(e) Δy_t = β_1 + β_2 Δx_{2t} + β_3 Δx_{3t} + β_4 Δx_{4t} + β_5 x_{2t-1} + β_6 x_{3t-1} + β_7 x_{4t-1} + v_t

The answer is yes, there is no reason why we cannot use Durbin-Watson in this case. You may have said no here because there are lagged values of the regressors (the x variables) in the regression. In fact this would be wrong since there are no lags of the DEPENDENT (y) variable and hence DW can still be used.

6. Δy_t = β_1 + β_2 Δx_{2t} + β_3 Δx_{3t} + β_4 y_{t-1} + β_5 x_{2t-1} + β_6 x_{3t-1} + β_7 x_{4t-1} + u_t

The major steps involved in calculating the long run solution are to:

- set the disturbance term equal to its expected value of zero
- drop the time subscripts
- remove all difference terms altogether, since these will all be zero by the definition of the long run in this context.

Following these steps, we obtain

0 = β_1 + β_4 y + β_5 x_2 + β_6 x_3 + β_7 x_4

We now want to rearrange this so that y is the subject of the formula:

−β_4 y = β_1 + β_5 x_2 + β_6 x_3 + β_7 x_4
y = −(β_1/β_4) − (β_5/β_4) x_2 − (β_6/β_4) x_3 − (β_7/β_4) x_4

The last equation above is the long run solution.

7. Ramsey's RESET test is a test of whether the functional form of the regression is appropriate. In other words, we test whether the relationship between the dependent variable and the independent variables really should be linear or whether a non-linear form would be more appropriate. The test works by adding powers of the fitted values from the regression into a second regression. If the appropriate model was a linear one, then the powers of the fitted values would not be significant in this second regression.

If we fail Ramsey's RESET test, then the easiest "solution" is probably to transform all of the variables into logarithms. This has the effect of turning a multiplicative model into an additive one.

If this still fails, then we really have to admit that the relationship between the dependent variable and the independent variables was probably not linear after all, so that we have to either estimate a non-linear model for the data (which is beyond the scope of this course) or go back to the drawing board and run a different regression containing different variables.

8. (a) It is important to note that we did not need to assume normality in order to derive the sample estimates of α and β or in calculating their standard errors. We needed the normality assumption at the later stage, when we came to test hypotheses about the regression coefficients, either singly or jointly, so that the test statistics we calculate would indeed have the distribution (t or F) that we said they would.

(b) One solution would be to use a technique for estimation and inference which did not require normality.
But these techniques are often highly complex and their properties are not so well understood, so we do not know with such certainty how well the methods will perform in different circumstances.

One pragmatic approach to failing the normality test is to plot the estimated residuals of the model and look for one or more very extreme outliers. These would be residuals that are much "bigger" (either very big and positive, or very big and negative) than the rest. It is, fortunately for us, often the case that one or two very extreme outliers will cause a violation of the normality assumption. The reason is that such outliers lead the (absolute value of the) skewness and/or kurtosis estimates to be very large.

Once we spot a few extreme residuals, we should look at the dates when these outliers occurred. If we have a good theoretical reason for doing so, we can add in separate dummy variables for big outliers caused by, for example, wars, changes of government, stock market crashes, or changes in market microstructure (e.g. the "Big Bang" of 1986). The effect of the dummy variable is exactly the same as if we had removed the observation from the sample altogether and estimated the regression on the remainder. If we only remove observations in this way, then we make sure that we do not lose any useful pieces of information represented by sample points.

9. (a) Parameter structural stability refers to whether the coefficient estimates for a regression equation are stable over time. If the regression is not structurally stable, it implies that the coefficient estimates would be different for some sub-samples of the data compared to others. This is clearly not what we want to find, since when we estimate a regression we are implicitly assuming that the regression parameters are constant over the entire sample period under consideration.

(b) 1981M1-1995M12: r_t = 0.0215 + 1.491 r_mt, RSS = 0.189, T = 180
1981M1-1987M10: r_t = 0.0163 + 1.308 r_mt, RSS = 0.079, T = 82
1987M11-1995M12: r_t = 0.0360 + 1.613 r_mt, RSS = 0.082, T = 98

(c) If we define the coefficient estimates for the first and second halves of the sample as α_1 and β_1, and α_2 and β_2 respectively, then the null and alternative hypotheses are

H_0: α_1 = α_2 and β_1 = β_2
H_1: α_1 ≠ α_2 or β_1 ≠ β_2

(d) The test statistic is calculated as

Test stat. = [(RSS − (RSS_1 + RSS_2)) / (RSS_1 + RSS_2)] × (T − 2k)/k
           = [(0.189 − (0.079 + 0.082)) / (0.079 + 0.082)] × (180 − 4)/2 = 15.304

This follows an F distribution with (k, T − 2k) degrees of freedom. F(2, 176) = 3.05 at the 5% level. Clearly we reject the null hypothesis that the coefficients are equal in the two sub-periods.

10. The data we have are:

1981M1-1995M12: r_t = 0.0215 + 1.491 R_mt, RSS = 0.189, T = 180
1981M1-1994M12: r_t = 0.0212 + 1.478 R_mt, RSS = 0.148, T = 168
1982M1-1995M12: r_t = 0.0217 + 1.523 R_mt, RSS = 0.182, T = 168

First, the forward predictive failure test - i.e. we are trying to see if the model for 1981M1-1994M12 can predict 1995M1-1995M12. The test statistic is given by

[(RSS − RSS_1)/RSS_1] × (T_1 − k)/T_2 = [(0.189 − 0.148)/0.148] × (168 − 2)/12 = 3.832

where T_1 is the number of observations in the first period (i.e. the period over which we actually estimate the model), and T_2 is the number of observations we are trying to "predict". The test statistic follows an F-distribution with (T_2, T_1 − k) degrees of freedom. F(12, 166) = 1.81 at the 5% level. So we reject the null hypothesis that the model can predict the observations for 1995.
We would conclude that our model is of no use for predicting this period, and from a practical point of view we would have to consider whether this failure is a result of atypical behaviour of the series out-of-sample (i.e. during 1995), or whether it results from a genuine deficiency in the model.

The backward predictive failure test is a little more difficult to understand, although no more difficult to implement. The test statistic is given by

[(RSS − RSS_1)/RSS_1] × (T_1 − k)/T_2 = [(0.189 − 0.182)/0.182] × (168 − 2)/12 = 0.532

Now we need to be a little careful in our interpretation of what exactly are the "first" and "second" sample periods. It would be possible to define T_1 as always being the first sample period. But I think it easier to say that T_1 is always the sample over which we estimate the model (even though it now comes after the hold-out sample). Thus T_2 is still the sample that we are trying to predict, even though it comes first. You can use either notation, but you need to be clear and consistent. If you wanted to choose the other way to the one I suggest, then you would need to change the subscript 1 everywhere in the formula above so that it was 2, and change every 2 so that it was a 1.

Either way, we conclude that there is little evidence against the null hypothesis. Thus our model is able to adequately back-cast the first 12 observations of the sample.

11. By definition, variables having associated parameters that are not significantly different from zero are not, from a statistical perspective, helping to explain variations in the dependent variable about its mean value. One could therefore argue that, empirically, they serve no purpose in the fitted regression model. But leaving such variables in the model will use up valuable degrees of freedom, implying that the standard errors on all of the other parameters in the regression model will be unnecessarily higher as a result. If the number of degrees of freedom is relatively small, then saving a couple by deleting two variables with insignificant parameters could be useful. On the other hand, if the number of degrees of freedom is already very large, the impact of these additional irrelevant variables on the others is likely to be inconsequential.

12. An outlier dummy variable will take the value one for one observation in the sample and zero for all others. The Chow test involves splitting the sample into two parts. If we then try to run the regression on both of the sub-parts but the model contains such an outlier dummy, then the observations on that dummy will be zero everywhere for one of the regressions. For that sub-sample, the outlier dummy would show perfect multicollinearity with the intercept and therefore the model could not be estimated.
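The Chow and predictive failure statistics in questions 9 and 10 are simple functions of the reported residual sums of squares; the following sketch just reproduces the arithmetic above:

```python
# Chow breakpoint test (question 9): RSS from the whole sample and the two sub-samples
rss, rss1, rss2, T, k = 0.189, 0.079, 0.082, 180, 2
chow = ((rss - (rss1 + rss2)) / (rss1 + rss2)) * (T - 2 * k) / k
print(round(chow, 3))          # 15.304, compared with F(2, 176) = 3.05 at the 5% level

# Forward and backward predictive failure tests (question 10)
def predictive_failure(rss_whole, rss_est, T1, T2, k=2):
    """((RSS - RSS_1)/RSS_1) * (T_1 - k)/T_2, where T_1 is the estimation sample size."""
    return ((rss_whole - rss_est) / rss_est) * (T1 - k) / T2

print(round(predictive_failure(0.189, 0.148, 168, 12), 3))   # 3.832 (forward test)
print(round(predictive_failure(0.189, 0.182, 168, 12), 3))   # 0.532 (backward test)
```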
Solutions to the Review Questions at the End of Chapter 3

1. A list of the assumptions of the classical linear regression model's disturbance terms is given in Box 3.3 on p. 56 of the book. We need to make the first four assumptions in order to prove that the ordinary least squares estimators of α and β are "best", that is, to prove that they have minimum variance among the class of linear unbiased estimators. The theorem that proves that OLS estimators are BLUE (provided the assumptions are fulfilled) is known as the Gauss-Markov theorem. If these assumptions are violated (which is dealt with in Chapter 4), then it may be that OLS estimators are no longer unbiased or "efficient". That is, they may be inaccurate or subject to fluctuations between samples.

We needed to make the fifth assumption, that the disturbances are normally distributed, in order to make statistical inferences about the population parameters from the sample data, i.e. to test hypotheses about the coefficients. Making this assumption implies that test statistics will follow a t-distribution (provided that the other assumptions also hold).

2. If the models are linear in the parameters, we can use OLS.

(1) Yes, we can use OLS, since the model is the usual linear model we have been dealing with.

(2) Yes. The model can be linearised by taking logarithms of both sides and rearranging. Although this is a very specific case, it has sound theoretical foundations (e.g. the Cobb-Douglas production function in economics), and it is the case that many relationships can be "approximately" linearised by taking logs of the variables. The effect of taking logs is to reduce the effect of extreme values on the regression function, and it may be possible to turn multiplicative models into additive ones which we can easily estimate.

(3) Yes. We can estimate this model using OLS, but we would not be able to obtain the values of both β and γ; we would only obtain the value of these two coefficients multiplied together.

(4) Yes, we can use OLS, since this model is linear in the logarithms. For those who have done some economics, models of this kind which are linear in the logarithms have the interesting property that the coefficients (α and β) can be interpreted as elasticities.

(5) Yes, in fact we can still use OLS, since it is linear in the parameters. If we make a substitution, say Q_t = X_t Z_t, then we can run the regression y_t = α + βQ_t + u_t as usual. So, in fact, we can estimate a fairly wide range of model types using these simple tools.

3. The null hypothesis is that the true (but unknown) value of beta is equal to one, against a one-sided alternative that it is greater than one:

H_0: β = 1
H_1: β > 1

The test statistic is given by

test stat. = (β̂ − β*)/SE(β̂) = (1.147 − 1)/0.0548 = 2.682

We want to compare this with a value from the t-table with T − 2 degrees of freedom, where T is the sample size, and here T − 2 = 60. We want a value with 5% all in one tail since we are doing a one-sided test. The critical t-value from the t-table is 1.671. The value of the test statistic is in the rejection region and hence we can reject the null hypothesis. We have statistically significant evidence that this security has a beta greater than one, i.e. it is significantly more risky than the market as a whole.

4. We want to use a two-sided test to test the null hypothesis that shares in Chris Mining are completely unrelated to movements in the market as a whole.
In other words, the value of beta in the regression model would be zero, so that whatever happens to the value of the market proxy, Chris Mining would be completely unaffected by it. The null and alternative hypotheses are therefore:

H_0: β = 0
H_1: β ≠ 0

The test statistic has the same format as before, and is given by:

test stat. = (β̂ − β*)/SE(β̂) = (0.214 − 0)/0.186 = 1.150

We want to find a value from the t-tables for a variable with 38 − 2 = 36 degrees of freedom, and we want to look up the value that puts 2.5% of the distribution in each tail, since we are doing a two-sided test and we want to have a 5% size of test overall.

5. Confidence intervals are almost invariably two-sided, unless we are told otherwise (which we are not here), so we want to look up the values which put 2.5% in the upper tail and 0.5% in the upper tail for the 95% and 99% confidence intervals respectively. For a t-distribution with T − 2 = 38 − 2 = 36 degrees of freedom, these critical values are 2.03 and 2.72. The confidence interval in each case is thus given by

(0.214 ± 0.186 × 2.03) for a 95% confidence interval, which solves to (−0.164, 0.592), and
(0.214 ± 0.186 × 2.72) for a 99% confidence interval, which solves to (−0.292, 0.720).

There are a couple of points worth noting. First, one intuitive interpretation of an X% confidence interval is that we are X% sure that the true value of the population parameter lies within the interval. So we are 95% sure that the true value of beta lies within the interval (−0.164, 0.592) and we are 99% sure that the true population value of beta lies within (−0.292, 0.720). Thus, in order to be more sure that we have the true value of beta contained in the interval, i.e. as we move from 95% to 99% confidence, the interval must become wider.

The second point to note is that we can test an infinite number of hypotheses about beta once we have formed the interval. For example, we would not reject the null hypothesis contained in the last question (i.e. that beta = 0), since that value of beta lies within the 95% and 99% confidence intervals. Would we reject or not reject a null hypothesis that the true value of beta was 0.6? At the 5% level, we should have enough evidence against the null hypothesis to reject it, since 0.6 is not contained within the 95% confidence interval. But at the 1% level, we would no longer have sufficient evidence to reject the null hypothesis, since 0.6 is now contained within the interval. Therefore we should always, if possible, conduct some sort of sensitivity analysis to see if our conclusions are altered by (sensible) changes in the level of significance used.

6. It can be proved that a t-distribution is just a special case of the more general F-distribution. The square of a t-distribution with T − k degrees of freedom will be identical to an F-distribution with (1, T − k) degrees of freedom. But remember that if we use a 5% size of test, we will look up a 5% value for the F-distribution, because the test is two-sided even though we only look in one tail of the distribution, whereas we look up a 2.5% value for the t-distribution since the t-test is two-tailed.

Examples at the 5% level from the tables:

T − k    F critical value    t critical value
20       4.35                2.09
40       4.08                2.02
60       4.00                2.00
120      3.92                1.98

7. We test hypotheses about the actual coefficients, not the estimated values. We want to make inferences about the likely values of the population parameters (i.e. to test hypotheses about them). We do not need to test hypotheses about the estimated values since we know exactly what our estimates are because we calculated them!
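A quick check of the t-F relationship described in question 6 (an added sketch using scipy): squaring the 5% two-sided t critical value with T − k degrees of freedom reproduces the 5% critical value of F(1, T − k).

```python
from scipy import stats

for df in (20, 40, 60, 120):
    t_crit = stats.t.ppf(0.975, df)      # 5% two-sided critical value of t(df)
    f_crit = stats.f.ppf(0.95, 1, df)    # 5% critical value of F(1, df)
    print(df, round(t_crit, 2), round(f_crit, 2), round(t_crit**2, 2))  # t_crit^2 matches f_crit
```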
8. (i) H_0: β_3 = 2. We could use an F- or a t-test for this one since it is a single hypothesis involving only one coefficient. We would probably in practice use a t-test since it is computationally simpler and we only have to estimate one regression. There is one restriction.

(ii) H_0: β_3 + β_4 = 1. Since this involves more than one coefficient, we should use an F-test. There is one restriction.

(iii) H_0: β_3 + β_4 = 1 and β_5 = 1. Since we are testing more than one hypothesis simultaneously, we would use an F-test. There are two restrictions.

(iv) H_0: β_2 = 0 and β_3 = 0 and β_4 = 0 and β_5 = 0. As for (iii), we are testing multiple hypotheses so we cannot use a t-test. We have four restrictions.

(v) H_0: β_2 β_3 = 1. Although there is only one restriction, it is a multiplicative restriction. We therefore cannot use a t-test or an F-test to test it. In fact we cannot test it at all using the methodology that has been examined in this chapter.

9. The regression F-statistic would be given by the test statistic associated with hypothesis (iv) above. We are always interested in testing this hypothesis since it tests whether all of the coefficients in the regression (except the constant) are jointly insignificant. If they are, then we have a completely useless regression, where none of the variables that we have said influence y actually do. So we would need to go back to the drawing board!

The alternative hypothesis is:

H_1: β_2 ≠ 0 or β_3 ≠ 0 or β_4 ≠ 0 or β_5 ≠ 0

Note the form of the alternative hypothesis: "or" indicates that only one of the components of the null hypothesis would have to be rejected for us to reject the null hypothesis as a whole.

10. The restricted residual sum of squares will always be at least as big as the unrestricted residual sum of squares, i.e.

RRSS ≥ URSS

To see this, think about what we were doing when we determined what the regression parameters should be: we chose the values that minimised the residual sum of squares. We said that OLS would provide the "best" parameter values given the actual sample data. Now when we impose some restrictions on the model, so that they cannot all be freely determined, the model should not fit as well as it did before. Hence the residual sum of squares must be higher once we have imposed the restrictions; otherwise, the parameter values that OLS chose originally without the restrictions could not be the best.

In the extreme case (very unlikely in practice), the two residual sums of squares could be identical if the restrictions were already present in the data, so that imposing them on the model would yield no penalty in terms of loss of fit.

11. The null hypothesis is: H_0: β_3 + β_4 = 1 and β_5 = 1.

The first step is to impose this on the regression model:

y_t = β_1 + β_2 x_{2t} + β_3 x_{3t} + β_4 x_{4t} + β_5 x_{5t} + u_t subject to β_3 + β_4 = 1 and β_5 = 1.

We can rewrite the first part of the restriction as β_4 = 1 − β_3. Then rewrite the regression with the restriction imposed:

y_t = β_1 + β_2 x_{2t} + β_3 x_{3t} + (1 − β_3) x_{4t} + x_{5t} + u_t

which can be rewritten as

y_t = β_1 + β_2 x_{2t} + β_3 x_{3t} + x_{4t} − β_3 x_{4t} + x_{5t} + u_t

and, rearranging,

(y_t − x_{4t} − x_{5t}) = β_1 + β_2 x_{2t} + β_3 x_{3t} − β_3 x_{4t} + u_t
(y_t − x_{4t} − x_{5t}) = β_1 + β_2 x_{2t} + β_3 (x_{3t} − x_{4t}) + u_t

Now create two new variables, call them P_t and Q_t:

P_t = y_t − x_{4t} − x_{5t}
Q_t = x_{3t} − x_{4t}

We can then run the linear regression

P_t = β_1 + β_2 x_{2t} + β_3 Q_t + u_t,

which constitutes the restricted regression model.

The test statistic is calculated as ((RRSS − URSS)/URSS) × (T − k)/m. In this case, m = 2, T = 96, k = 5, so the test statistic = 5.704.
Compare this to an F-distribution with (2, 91) degrees of freedom, whose 5% critical value is approximately 3.10. Hence we reject the null hypothesis that the restrictions are valid. We cannot impose these restrictions on the data without a substantial increase in the residual sum of squares.

12. r_i = 0.080 + 0.801 S_i + 0.321 MB_i + 0.164 PE_i − 0.084 BETA_i
        (0.064)  (0.147)    (0.136)     (0.420)     (0.120)
t-ratios: 1.25   5.45       2.36        0.390       −0.700

The t-ratios are given in the final row above. They are calculated by dividing the coefficient estimate by its standard error. The relevant value from the t-tables is for a two-sided test with 5% rejection overall: T − k = 195, t_crit = 1.97. The null hypothesis is rejected at the 5% level if the absolute value of the test statistic is greater than the critical value. We would conclude, based on this evidence, that only firm size and market-to-book value have a significant effect on stock returns.

If a stock's beta increases from 1 to 1.2, then we would expect the return on the stock to FALL by (1.2 − 1) × 0.084 = 0.0168 = 1.68%. This is not the sign we would have expected on beta, since beta would be expected to be positively related to return, as investors require higher returns as compensation for bearing higher market risk.

We would thus consider deleting the price/earnings and beta variables from the regression, since these are not significant in the regression - i.e. they are not helping much to explain variations in y. We would not delete the constant term from the regression even though it is insignificant, since there are good statistical reasons for its inclusion.
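The restriction test in question 11 reduces to the statistic ((RRSS − URSS)/URSS) × (T − k)/m. A small added sketch of the computation and of the critical value it is compared against (the RRSS and URSS themselves are not reported in the solution, so only the general function and the critical value are shown):

```python
from scipy import stats

def restriction_F(rrss, urss, T, k, m):
    """F statistic for m linear restrictions: ((RRSS - URSS)/URSS) * (T - k)/m."""
    return ((rrss - urss) / urss) * (T - k) / m

# Question 11: m = 2 restrictions, T = 96 observations, k = 5 parameters, reported statistic 5.704
T, k, m = 96, 5, 2
f_crit = stats.f.ppf(0.95, m, T - k)
print(round(f_crit, 2))     # about 3.10, so a statistic of 5.704 rejects the restrictions
```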