The Regression-Discontinuity Design


A Curated Collection of 100 Articles on (Fuzzy) Regression Discontinuity Design

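All code, macro and micro databases, and software from the Econometrics Circle (计量经济圈) methodology series are kept in the community, and you are welcome to visit and join the discussion there. The figure above shows the worldwide Google search trend for "coronavirus". A few days ago our circle recommended (1) "200 articles used in empirical research: a standing toolkit for social scientists", (2) "50 posts of advice from well-known scholars on writing empirical papers: a must-read series", (3) "A collection of articles on China-related topics published in the AER over the past 10 years", and (4) "The ten most-followed research topics of 2017-19 announced by the AEA: directions for choosing your topic". These were warmly received and widely discussed, and many doctoral supervisors recommended them to their students.

Following the reasonably friendly exchange between a Tencent department and the causal inference research group ("BATJ giants urgently need large numbers of economics PhDs; please spread the word"), staff from an Alibaba department have recently also asked to visit and exchange views with the group ("The causal inference research group has caught Alibaba's attention!").

What use is an economics PhD at a BATJ company? Could they really be more capable than IT programmers? As the post above argues, causal inference will remain a mainstream method for technology companies and social scientists for a long time to come.

We will continue to discuss mainstream causal inference methods in the group and the community, and we welcome closer interaction between large technology companies and our scholars.

Previously, our group recommended:
1. RDD: types and worked examples
2. RDD: an encyclopedic guide to Stata programs
3. The research frontier of regression discontinuity design (RDD)
4. What on earth is regression discontinuity design? A Harvard scholar explains
5. Regression discontinuity and answers to readers' questions
6. A comprehensive explanation of RDD, widely used in education research
7. Causal attribution without instrumental variables, discontinuities, or random shocks
8. What to do when no IV, RD, or DID is available? An alternative method
9. A two-volume RDD handbook, with Stata and R workflows
10. DID, synthetic control, matching, and RDD: a comparison of the four methods, their scope and characteristics
11. RDD papers by Angrist ("安神") and a Clark Medal winner
12. Are Islamic governments actually friendly to women? A classic RDD paper
13. Programs for PSM, RDD, Heckman, and panel models
14. Classic RDD literature: validity and robustness checks for RDD models
15. Interesting articles published in JDE in 2019: the latest trends in econometric methods

Like the synthetic control method ("A curated collection of 33 articles on the synthetic control method SCM!") and difference-in-differences ("A curated collection of 32 articles on DID!"), regression discontinuity design (RDD) is a very popular causal inference method today and appears frequently in top English- and Chinese-language journals.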

Quasi-Experimental Research (English)

Introduction

Experimental research is a type of research design that involves manipulating one or more variables to observe the effect on another variable. However, in some situations, experimental research may not be feasible or ethical. In such cases, researchers may opt for quasi-experimental research, which is a type of research design that lacks the random assignment of participants to groups. This article explores quasi-experimental research, its types, advantages, and disadvantages.

Types of Quasi-Experimental Research

1. Pre-Experimental Designs

Pre-experimental designs are the simplest type of quasi-experimental designs, and they involve measuring the dependent variable before and after an intervention. There are three types of pre-experimental designs: one-shot design, one-group pretest-posttest design, and static group comparison design.

a. One-shot design: In this design, the researcher measures the dependent variable after the intervention. However, there is no control group, and hence, it is difficult to determine whether the observed change is due to the intervention or other factors.

b. One-group pretest-posttest design: In this design, the researcher measures the dependent variable before and after the intervention. However, there is no control group, and hence, it is difficult to determine whether the observed change is due to the intervention or other factors.

c. Static group comparison design: In this design, the researcher compares the dependent variable in a group that received the intervention and a group that did not receive the intervention. However, the groups are not randomly assigned, and hence, there may be differences between the groups that may affect the results.

2. Quasi-Experimental Designs

Quasi-experimental designs involve the manipulation of an independent variable, but the participants are not randomly assigned to groups. There are four types of quasi-experimental designs: nonequivalent control group design, time-series design, interrupted time-series design, and regression-discontinuity design.

a. Nonequivalent control group design: In this design, the researcher compares the dependent variable in a group that received the intervention and a group that did not receive the intervention. However, the groups are not randomly assigned, and hence, there may be differences between the groups that may affect the results.

b. Time-series design: In this design, the researcher measures the dependent variable at multiple time points before and after the intervention. However, there is no control group, and hence, it is difficult to determine whether the observed change is due to the intervention or other factors.

c. Interrupted time-series design: In this design, the researcher measures the dependent variable at multiple time points before and after the intervention. However, there is a control group, which allows the researcher to determine whether the observed change is due to the intervention or other factors.

d. Regression-discontinuity design: In this design, the researcher selects participants based on a cutoff score on a continuous variable. Participants who score above the cutoff score receive the intervention, while those who score below the cutoff score do not receive the intervention. This design allows the researcher to determine whether the observed change is due to the intervention or other factors.

Advantages of Quasi-Experimental Research

1. Ethical Considerations

In some situations, experimental research may not be ethical. For example, it may not be ethical to manipulate an independent variable that may harm participants. Quasi-experimental research provides an alternative to experimental research, which allows researchers to study the effect of an intervention without compromising the ethical considerations.

2. Real-World Settings

Quasi-experimental research is often conducted in real-world settings, which enhances the ecological validity of the research findings. This means that the research findings are more likely to be applicable to real-world situations.

3. Cost-Effective

Quasi-experimental research is often less costly than experimental research. This is because it does not involve random assignment of participants to groups, which can be time-consuming and costly.

Disadvantages of Quasi-Experimental Research

1. Lack of Control

Quasi-experimental research lacks the control associated with experimental research. This means that there may be other factors that may affect the results, which may make it difficult to determine whether the observed change is due to the intervention or other factors.

2. Selection Bias

Quasi-experimental research may suffer from selection bias. This is because participants are not randomly assigned to groups, which may result in differences between the groups that may affect the results.

3. Internal Validity

Quasi-experimental research may suffer from internal validity issues. This is because there may be other factors that may affect the results, which may make it difficult to determine whether the observed change is due to the intervention or other factors.

Conclusion

Quasi-experimental research is a type of research design that lacks the random assignment of participants to groups. It is often used in situations where experimental research may not be feasible or ethical. Quasi-experimental research has advantages such as being ethical, conducted in real-world settings, and cost-effective. However, it also has disadvantages such as lack of control, selection bias, and internal validity issues. Researchers should carefully consider the advantages and disadvantages of quasi-experimental research before deciding on the research design to use.

Steps of a Regression Discontinuity Design

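I have recently been working on a study that requires a regression discontinuity design.

To keep the practice rigorous and to pre-empt questions that might come up in review, I spent a few days working through the standard procedure for a regression discontinuity design and wrote it up here for later readers' reference.

This post draws on three references, listed here first; I recommend reading the originals:

First: Lee and Lemieux (2010). "Regression Discontinuity Designs in Economics." Journal of Economic Literature, Vol. 48: 281–355.
Second: Pinotti, Paolo. "Clicking on Heaven's Door: The Effect of Immigrant Legalization on Crime." American Economic Review 107.1 (2017): 138-68.
Third: Thoemmes, Felix, Wang Liao, and Ze Jin. "The Analysis of the Regression-Discontinuity Design in R." Journal of Educational and Behavioral Statistics 42.3 (2017): 341-360.

1. The standard regression discontinuity workflow

Step 1: Check whether the assignment variable (also called the running variable or forcing variable) has been manipulated.

The assignment variable here is simply the score that determines entry into treatment in an RD. Asking whether it has been manipulated means asking whether its distribution shows some kind of abrupt jump.

In practice there are two ways to check. The first is to plot the distribution of the assignment variable.

The most direct approach is to draw a histogram of the assignment variable using a given number of bins.

To see the overall shape of the distribution, the bins should be as narrow as possible.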
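The histogram check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the function name `cutoff_aligned_histogram` is hypothetical, the cutoff of 60 and bin width of 2 are arbitrary, and the bins are aligned so that the cutoff is a bin edge (a common convention, so that no bin mixes observations from both sides).

```python
import numpy as np

def cutoff_aligned_histogram(score, cutoff, bin_width):
    """Count observations in equal-width bins, with the cutoff as a bin edge."""
    score = np.asarray(score, dtype=float)
    # One extra bin on each side guarantees that the extreme values are covered.
    n_left = max(int(np.floor((cutoff - score.min()) / bin_width)) + 1, 1)
    n_right = max(int(np.floor((score.max() - cutoff) / bin_width)) + 1, 1)
    edges = cutoff + bin_width * np.arange(-n_left, n_right + 1)
    counts, _ = np.histogram(score, bins=edges)
    return edges, counts, n_left

# Simulated scores (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
score = rng.uniform(0.0, 100.0, size=5000)
cutoff = 60.0

# Hypothetical manipulation: everyone within 2 points below the cutoff
# is pushed just above it.
nudged = (score >= cutoff - 2.0) & (score < cutoff)
score = np.where(nudged, score + 2.0, score)

edges, counts, n_left = cutoff_aligned_histogram(score, cutoff, bin_width=2.0)
just_below = counts[n_left - 1]   # bin [cutoff - 2, cutoff)
just_above = counts[n_left]       # bin [cutoff, cutoff + 2)
print("just below:", just_below, "just above:", just_above,
      "typical bin:", int(np.median(counts)))
```

An empty bin just below the cutoff next to an unusually full bin just above it is the kind of jump the histogram is meant to reveal. With real data the pattern is usually far less stark, so the plot is a first look rather than a formal test.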

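Evaluating the Effect of the Haze Control Policy in the "2+26" Cities

Authors: Zhang Zhongxiang, Cao Huan
Source: China Population, Resources and Environment (《中国人口·资源与环境》), 2022, No. 02

Abstract: This paper treats the release of the 2017 Air Pollution Prevention and Control Work Plan for the Beijing-Tianjin-Hebei Region and Surrounding Areas (《京津冀及周边地区2017年大气污染防治工作方案》) and its follow-up "action plans for tackling key problems" as a quasi-natural experiment, and uses a difference-in-differences (DID) model to evaluate the effect of the air pollution control policy.

The regression results show the following. (1) The release of the Plan significantly improved air quality in the "2+26" cities, and this result passes robustness checks. The main constituents of haze, PM2.5 and PM10, together with the AQI, improved the most, followed by SO2, CO, and NO2. O3 concentrations, however, rose rather than fell during the treatment period, indicating that O3 pollution has worsened in recent years and urgently needs attention.

(2) From a long-run perspective, the control effects for SO2 and NO2 are larger than in the short run, indicating that some air pollutants still have room for further improvement and confirming that air pollution control is a protracted "tough battle".

(3) Introducing a spatial DID analysis, the paper nests a spatial Durbin model within the DID model, relaxing the assumption that units are mutually independent, and examines the Plan's effect along the spatial dimension. Comparing the direct and indirect effects from the spatial perspective shows that regional joint prevention and control achieves far more than air quality policies implemented in a single area.

(4) Using a mediation-effect model, the paper examines two mechanisms through which the Plan improves air quality: reducing industrial output as a share of GDP, and reducing total energy consumption.

Finally, the paper offers policy recommendations for further effective control of air pollution.

Keywords: "2+26" cities; difference-in-differences model; spatial DID; mechanism analysis
CLC number: X51; F061.5. Document code: A. Article ID: 1002-2104(2022)02-0026-11. DOI: 10.12062/cpre20211126

Air pollution is a negative externality (a public bad) of China's sustained rapid economic growth, and within China the region most troubled by air pollution is Beijing-Tianjin-Hebei and its surrounding areas [1-4].

The 13th Five-Year Plan for Ecological and Environmental Protection issued by the State Council explicitly stresses the need to "deepen regional joint prevention and control of air pollution and markedly reduce particulate concentrations in Beijing-Tianjin-Hebei and surrounding areas", which makes this region the key target area for air pollution control.

On 17 February 2017, the former Ministry of Environmental Protection issued the 2017 Air Pollution Prevention and Control Work Plan for the Beijing-Tianjin-Hebei Region and Surrounding Areas (hereafter the "2017 Plan"), forming an air pollution prevention and control coordination group led by Beijing-Tianjin-Hebei and surrounding areas.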

Regression Discontinuity

Basic Idea
• Sometimes whether something happens to you or not depends on your 'score' on a particular variable, e.g.
– you get a scholarship if you get above a certain mark in an exam,
– you get given remedial education if you get below a certain level,
– a policy is implemented if it gets more than 50% of the vote in a ballot,
– your sentence for a criminal offence is higher if you are above a certain age (an 'adult')
• All these are potential applications of the 'regression discontinuity' design

More formally
• assignment to treatment depends in a discontinuous way on some observable variable W
• simplest form has assignment to treatment being based on W being above some critical value $w_0$, the discontinuity
• method of assignment to treatment is the very opposite to that in random assignment: it is a deterministic function of some observable variable
• But assignment to treatment is 'as good as random' in the neighbourhood of the discontinuity; this is hard to grasp but I hope to explain it

Basics of RDD Estimator
• Suppose average outcome in absence of treatment conditional on W is:
$E(y \mid W, X = 0) = g_0(W)$
• Suppose average outcome with treatment conditional on W is:
$E(y \mid W, X = 1) = g_1(W)$
• This is 'full outcomes' approach.
• Treatment effect conditional on W is $g_1(W) - g_0(W)$

How can we estimate this?
• Basic idea is to compare outcomes just to the left and right of discontinuity, i.e. to compare:
$E(y \mid w_0 + \delta \ge W > w_0) - E(y \mid w_0 \ge W \ge w_0 - \delta)$
• As $\delta \to 0$ this comes to:
$g_1(w_0) - g_0(w_0)$
• i.e. treatment effect at $W = w_0$

Comments
• the RDD estimator compares the outcome of people who are just on both sides of the discontinuity; the difference in means between these two groups is an estimate of the treatment effect at the discontinuity
• says nothing about the treatment effect away from the discontinuity; this is a limitation of the RDD effect
• An important assumption is that underlying effect of W on outcomes is continuous, so the only reason for a discontinuity is the treatment effect

Some pictures: underlying relationship between y and W is linear
[Figure: E(y|W) plotted against W, with the cutoff w0 marked]

Now introduce treatment
[Figure: E(y|W) plotted against W, with a jump of β at w0]

The procedure in practice
• If take process described above literally should choose a value of δ that is very small
• This will result in a small number of observations
• Estimate may be consistent but precision will be low
• desire to increase the sample size leads one to choose a larger value of δ

Dangers
• If δ is not very small then may not estimate just treatment effect (look at picture)
• As one increases δ the measure of the treatment effect will get larger. This is spurious, so what should one do about it?
• The basic idea is that one should control for the underlying outcome functions.

If underlying relationship linear
• If the linear relationship is the correct specification then one could estimate the ATE simply by estimating the regression:
$y = \gamma_0 + \beta X + \gamma_1 W + \varepsilon$
• But no good reason to assume relationship is linear and this may cause problems

Suppose true relationship is:
[Figure: E(y|W) plotted against W with cutoff w0, showing the outcome functions g0(W) and g1(W)]

Observed relationship between E(y) and W
[Figure: E(y|W) plotted against W with cutoff w0, showing g0(W) and g1(W)]
• one would want to control for a different relationship between y and W for the treatment and control groups
• Another problem is that the outcome functions might not be linear in W; it could be quadratic or something else.
• The researcher then typically faces a trade-off:
– a large value of δ to get more precision from a larger sample size but run the risk of a misspecification of the underlying outcome function.
– Choose a flexible underlying functional form at the cost of some precision (intuitively a flexible functional form can get closer to approximating a discontinuity in the outcomes).

In practice
• it is usual for the researcher to summarize all the data in the graph of the outcome against W to get some idea of the appropriate functional forms and how wide a window should be chosen.
• But it is always a good idea to investigate the sensitivity of estimates to alternative specifications.

An example
• Lemieux and Milligan, "Incentive Effects of Social Assistance: A regression discontinuity approach", Journal of Econometrics, 2008
• In Quebec before 1989 childless benefit recipients received higher benefits when they reached their 30th birthday

The Picture
[Figure omitted]

The Estimates
[Table omitted]

Note
• Note that the more flexible is the underlying relationship between employment rate and age, the less precise is the estimate
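The "Dangers" point can be illustrated with a small sketch. It rests on assumptions that are not in the slides: the data are simulated and noise-free, the true relationship is linear with a treatment effect of 3 at w0 = 5, and the function names `diff_in_means` and `linear_rd` are hypothetical. It compares the raw difference in means within a window δ against the linear regression of y on X and W.

```python
import numpy as np

def diff_in_means(y, w, w0, delta):
    """Mean outcome just right of w0 minus mean outcome just left of it."""
    right = (w > w0) & (w <= w0 + delta)
    left = (w <= w0) & (w >= w0 - delta)
    return y[right].mean() - y[left].mean()

def linear_rd(y, w, w0, delta):
    """OLS of y on a constant, the treatment dummy X and W, within the window."""
    keep = np.abs(w - w0) <= delta
    x = (w[keep] > w0).astype(float)
    design = np.column_stack([np.ones(keep.sum()), x, w[keep]])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef[1]  # coefficient on X, i.e. the estimate of beta

# Noise-free simulated data: y = 1 + 2 W + 3 X, with treatment when W > w0.
w = np.linspace(0.0, 10.0, 2001)
w0 = 5.0
x_all = (w > w0).astype(float)
y = 1.0 + 2.0 * w + 3.0 * x_all

deltas = (0.1, 1.0, 4.0)
naive = [diff_in_means(y, w, w0, d) for d in deltas]
controlled = [linear_rd(y, w, w0, d) for d in deltas]
for d, a, b in zip(deltas, naive, controlled):
    print(f"delta={d}: difference in means={a:.3f}, regression beta={b:.3f}")
```

In this setup the difference in means drifts upward as δ grows because it picks up the slope in W, while the regression coefficient stays at the true effect. That holds only because the linear specification is correct here; with a nonlinear outcome function the linear regression would be misspecified too, which is the trade-off the slides describe.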

Regression Discontinuity Design - University of Chicago

Introduction
Validity
Simple idea: the assignment mechanism is completely known.
We know that the probability of treatment jumps to 1 if test score > c.
Assumption is that individuals cannot manipulate with precision their assignment variable (think about the SAT).
Key word: precision. Consequence: comparable individuals near the cutoff point.
If treated and untreated individuals are similar near the cutoff point then data can be analyzed as if it were a (conditionally) randomized experiment.
If this is true, then background characteristics should be similar near c (can be checked empirically).
The estimated treatment effect applies to those near the cutoff point (external validity).
Aside: validity doesn't depend on the assignment rule being "arbitrary".
It hinges on the assignment mechanism being known and free of manipulation with precision.
Manipulation example 1: Test with few questions and plenty of time.
Manipulation example 2: DMV test to get a driving license.
Again: some manipulation is fine (you can always study harder, for example). Precision is the key.

Quasi-Experimental Research Methods (Economics and Finance Network, China Economics Education and Research Network)

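[Figure 3: Second outcome pattern of the untreated control group design with pretest and posttest (Outcome 2; treatment and control groups at pretest and posttest)]
[Figure 4: Third outcome pattern of the untreated control group design with pretest and posttest (Outcome 3; treatment and control groups at pretest and posttest)]
[Figure 5: Fourth outcome pattern of the untreated control group design with pretest and posttest (Outcome 4; control and treatment groups at pretest and posttest)]
[Figure 6: Fifth outcome pattern of the untreated control group design with pretest and posttest (Outcome 5; treatment and control groups at pretest and posttest)]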
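Untreated control group design with multiple pretests
O1 O2 X O3
O1 O2   O3
Advantages: Selection-maturation threats can be handled relatively easily. If O2 is an atypical observation within each group and there is no O1, statistical regression would lead us to infer a spurious treatment effect. The O1-O2 correlation in the treatment group can be used to estimate the treatment group's O2-O3 correlation, and that estimate will be much more accurate.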
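An experimental design can be viewed as a trade-off among comparison (control groups), randomization, and control. (Spector, 1981:21)
2 Validity analysis in quasi-experimental methods
Internal validity, external validity, construct validity, and statistical conclusion validity
Factorial validity, job-analytic validity, synthetic validity, rational validity, and so on. (Cooper & Schindler, 1998:180, note 15)
2.2 Internal validity and the factors that affect it
History, maturation, testing, instrumentation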

Regression Discontinuity Design v2

2. How to use RDD?
Problem 1: We want to study whether being admitted in the first batch of undergraduate admissions raises a student's salary. Suppose we directly compare the salaries of two groups of students, where Group 1 was admitted in the first batch and Group 2 was not. This comparison may lead to the wrong conclusion, because salary depends not only on academic qualifications but also on ability, family background, appearance, and other factors, many of which are unobserved and cannot be controlled for.
"Evidence on the impact of sustained exposure to air pollution on life expectancy"
[Figure: map showing the Huai River, with north (N) and south (S) marked]
The estimated change in life expectancy just north of the Huai River is 5.04 years.


Waldinger (Warwick)
Key Identifying Assumption
Key identifying assumption: $E[Y_{0i} \mid X_i]$ and $E[Y_{1i} \mid X_i]$ are continuous in $X_i$ at $X_0$. This means that all other unobserved determinants of Y are continuously related to the running variable X. This allows us to use average outcomes of units just below the cutoff as a valid counterfactual for units right above the cutoff. This assumption cannot be directly tested. But there are some tests which give suggestive evidence whether the assumption is satisfied (see below).
1. Sharp RD: treatment is a deterministic function of a covariate X.
2. Fuzzy RD: exploits discontinuities in the probability of treatment conditional on a covariate X (the discontinuity is then used as an IV).
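The fuzzy case can be written as a ratio of two jumps at the cutoff. The following is a sketch in standard notation rather than a formula taken from the slides. It assumes the one-sided limits exist at $X_0$ and that the probability of treatment really does jump there; the label $\rho_{\text{fuzzy}}$ and the limit notation are introduced here for illustration.

```latex
\rho_{\text{fuzzy}} =
\frac{\lim_{x \downarrow X_0} E[Y_i \mid X_i = x] - \lim_{x \uparrow X_0} E[Y_i \mid X_i = x]}
     {\lim_{x \downarrow X_0} E[D_i \mid X_i = x] - \lim_{x \uparrow X_0} E[D_i \mid X_i = x]}
```

In the sharp case the probability of treatment jumps from 0 to 1, so the denominator equals 1 and the expression reduces to the jump in the outcome.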
Different Polynomials on the 2 Sides of the Discontinuity
To derive a regression model that can be used to estimate the causal effect we use the fact that $D_i$ is a deterministic function of $X_i$:
$E[Y_i \mid X_i] = E[Y_{0i} \mid X_i] + (E[Y_{1i} \mid X_i] - E[Y_{0i} \mid X_i]) D_i$
Lecture 4: Regression Discontinuity Design
Fabian Waldinger
Topics Covered in Lecture
1. Sharp RD.
2. Fuzzy RD.
3. Practical Tips for Running RD Models.
4. Example Fuzzy RD: Angrist & Lavy (1999) - Maimonides Rule.
5. Regression Kink Design (very briefly).
Sharp Regression Discontinuity - Nonlinear Case
Suppose the nonlinear relationship is $E[Y_{0i} \mid x_i] = f(x_i)$ for some reasonably smooth function $f(x_i)$. In that case we can construct RD estimates by fitting:
$Y_i = f(x_i) + \rho D_i + \eta_i$
There are 2 ways of approximating $f(x_i)$:
1. Use a nonparametric kernel method (see more below).
2. Use a pth order polynomial, i.e. estimate:
$Y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + ... + \beta_p x_i^p + \rho D_i + \eta_i$ (1)
Sometimes the trend relation $E[Y_{0i} \mid x_i]$ is nonlinear.
During the following introduction we will focus on the pth order polynomial approach but will discuss the other approach below.
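Equation (1) above is a special case of (2), the regression model with interaction terms, in which the interaction coefficients satisfy $\beta^*_1 = \beta^*_2 = ... = \beta^*_p = 0$. The treatment effect at $X_0$ is $\rho$. The treatment effect at $X_i - X_0 = c > 0$ is: $\rho + \beta^*_1 c + \beta^*_2 c^2 + ... + \beta^*_p c^p$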
Example Sharp RD: Lee (2008) Incumbency Effects
Lee (2008) uses a sharp RD design to estimate the probability that the incumbent wins an election. A large political science literature suggests that incumbents may use privileges and resources of office to gain an advantage over potential challengers. An OLS regression of election success on incumbency status is likely to be biased because of unobserved differences. Incumbents have already won an election so they may just be better. Lee analyzes the incumbency effect using Democratic incumbents for US congressional elections. He analyzes the probability of winning the election in year t+1 by comparing candidates who just won with candidates who just lost the election in year t.
Different Polynomials on the 2 Sides of the Discontinuity
We can generalize the function $f(x_i)$ by allowing the $x_i$ terms to differ on both sides of the threshold by including them both individually and interacting them with $D_i$. As Lee and Lemieux (2010) note, allowing different functions on both sides of the discontinuity should be the main results in an RD paper (as otherwise we use values from both sides of the cutoff to estimate the function on each side). In that case we have:
$E[Y_{0i} \mid X_i] = \alpha + \beta_{01} \tilde{X}_i + \beta_{02} \tilde{X}_i^2 + ... + \beta_{0p} \tilde{X}_i^p$
$E[Y_{1i} \mid X_i] = \alpha + \rho + \beta_{11} \tilde{X}_i + \beta_{12} \tilde{X}_i^2 + ... + \beta_{1p} \tilde{X}_i^p$
RD captures the causal effect by distinguishing the nonlinear and discontinuous function, $1(X_i \ge X_0)$, from the smooth function $f(X_i)$.
The regression model which you estimate is then:
$Y_i = \alpha + \beta_{01} \tilde{X}_i + \beta_{02} \tilde{X}_i^2 + ... + \beta_{0p} \tilde{X}_i^p + \rho D_i + \beta^*_1 D_i \tilde{X}_i + \beta^*_2 D_i \tilde{X}_i^2 + ... + \beta^*_p D_i \tilde{X}_i^p + \eta_i$ (2)
where $\beta^*_1 = \beta_{11} - \beta_{01}$, $\beta^*_2 = \beta_{12} - \beta_{02}$ and $\beta^*_p = \beta_{1p} - \beta_{0p}$, and where $\tilde{X}_i = X_i - X_0$.
Centering at $X_0$ ensures that the treatment effect at $X_i = X_0$ is the coefficient on $D_i$ in a regression model with interaction terms (because you do not have to add values of the $D_i$ interacted with X to get the treatment effect at $X_0$).
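The interacted polynomial regression can be sketched with plain least squares. This is a minimal illustration and not code from the lecture: the data are simulated with a known jump of 2 at a cutoff of 0.5, treatment is assumed to switch on at or above the cutoff, and the function names `fit_interacted_polynomial` and `effect_at` are hypothetical.

```python
import numpy as np

def fit_interacted_polynomial(y, x, cutoff, order=2):
    """OLS of y on centered polynomial terms, D, and D times those terms."""
    x_c = np.asarray(x, dtype=float) - cutoff           # centered running variable
    d = (x_c >= 0).astype(float)                        # treatment dummy D
    powers = np.column_stack([x_c ** k for k in range(1, order + 1)])
    design = np.column_stack([np.ones_like(x_c), powers, d, d[:, None] * powers])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    rho = coef[order + 1]            # coefficient on D: effect at the cutoff
    beta_star = coef[order + 2:]     # coefficients on the interaction terms
    return rho, beta_star

def effect_at(rho, beta_star, c):
    """Treatment effect at distance c above the cutoff."""
    return rho + sum(b * c ** (k + 1) for k, b in enumerate(beta_star))

# Simulated data (an assumption for illustration): different quadratics on
# each side of the cutoff, with a jump of 2 at the cutoff.
rng = np.random.default_rng(1)
n, cutoff = 4000, 0.5
x = rng.uniform(0.0, 1.0, size=n)
x_c = x - cutoff
treated = x_c >= 0
y0 = 1.0 + 0.5 * x_c - 0.8 * x_c ** 2
y1 = 3.0 + 1.5 * x_c + 0.4 * x_c ** 2
y = np.where(treated, y1, y0) + rng.normal(0.0, 0.1, size=n)

rho_hat, beta_star_hat = fit_interacted_polynomial(y, x, cutoff, order=2)
effect_02 = effect_at(rho_hat, beta_star_hat, 0.2)
print("estimated effect at the cutoff:", round(rho_hat, 3))
print("estimated effect at c = 0.2:", round(effect_02, 3))
```

Because the running variable is centered before the powers and interactions are built, the coefficient on D can be read directly as the effect at the cutoff. The effect away from the cutoff needs the interaction coefficients as well, as in the expression ρ + β*1 c + β*2 c² + ... + β*p c^p.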