Placebo Tests: Introduction, Procedure, and Examples


Placebo Tests for Difference-in-Differences (DID): Randomly Drawing 500 Times?

The term "placebo" comes from randomized experiments in medicine, for example a trial that tests the efficacy of a new drug.

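In such a trial, participants are randomly divided into two groups. The treatment group takes the real drug, while the control group takes a placebo (for example, an inert sugar pill). Participants are not told whether they received the real drug or the placebo, so that subjective psychological effects do not contaminate the results.

The core idea of a DID placebo test is to re-estimate the model with a fictitious treatment group or a fictitious policy date. If the coefficient on the "pseudo policy dummy" is still significant under the fictitious setup, the original estimate is probably biased: the change in the outcome variable y was likely driven by other policies or by random factors.

A fictitious setup can be drawn at random, or it can be specified by the author.

Of the two, I recommend drawing the treatment group or the policy date at random.

Because the data we use are mostly short "large N, small T" panels, randomizing the policy date has little value. The usual practice in the literature is instead to move the policy year forward uniformly by 2 or 3 years, rerun the regression, and check whether the coefficient on the policy dummy is still significant.

More often we randomize the treatment group: randomly select units as the treatment group, repeat this 500 or 1,000 times, and check whether the coefficient on the "pseudo policy dummy" is significant.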

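Data source

Shi Daqian (石大千) et al. (2018), in the paper 《智慧城市建设能否降低环境污染》 (whether smart city construction can reduce environmental pollution) published in 《中国工业经济》 (China Industrial Economics), use DID to evaluate the effect of smart city construction on urban environmental pollution. The journal's official website has published the data and code used in the paper.

Below I use the data from this paper to demonstrate, in Stata, the version of the DID placebo test that randomly constructs the treatment group.

Original paper

石大千, 丁海, 卫平, 刘建江. 智慧城市建设能否降低环境污染[J]. 中国工业经济, 2018(06): 117-135.

Stata procedure for randomly constructing the treatment group

As described above, the usual DID placebo test randomly selects units as the treatment group, repeats this 500 or 1,000 times, and checks whether the coefficient on the "pseudo policy dummy" is significant.

In Shi et al. (2018) the treatment group has 32 cities and the control group has 165. We therefore randomly select 32 of the 197 cities as the "pseudo treatment group", assume these 32 cities are the smart city pilots and all other cities are controls, and then generate the "pseudo policy dummy" (the interaction term) and run the regression.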
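The procedure just described can be sketched in R on simulated data. This is only an illustration under stated assumptions, not the paper's code: the panel is simulated with no true policy effect, the sample window (2005-2015), the policy year (2012) and all variable names are hypothetical, and the two-way fixed-effects coefficient is computed by double demeaning, which is exact only for a balanced panel. Only the counts (197 cities, 32 pseudo-treated, 500 draws) come from the text.

```r
# Placebo test with a randomly drawn pseudo-treatment group (simulated data).
set.seed(123)

n_city      <- 197        # total number of cities (from the text)
n_fake      <- 32         # size of the pseudo-treatment group (from the text)
years       <- 2005:2015  # hypothetical sample window
policy_year <- 2012       # hypothetical policy year
n_rep       <- 500        # number of random draws

# Balanced city-year panel with city and year effects but NO true treatment effect
panel      <- expand.grid(city = 1:n_city, year = years)
panel$post <- as.numeric(panel$year >= policy_year)
city_fe    <- rnorm(n_city)
year_fe    <- rnorm(length(years))
panel$y    <- city_fe[panel$city] + year_fe[match(panel$year, years)] +
              rnorm(nrow(panel))

# Two-way within transformation (removes city and year fixed effects)
demean2  <- function(x, i, t) x - ave(x, i) - ave(x, t) + mean(x)
y_dd     <- demean2(panel$y, panel$city, panel$year)
df_resid <- nrow(panel) - n_city - length(years)  # N minus number of parameters

placebo_once <- function() {
  fake_treated <- sample(n_city, n_fake)           # draw 32 pseudo-treated cities
  fake_did <- as.numeric(panel$city %in% fake_treated) * panel$post
  d_dd  <- demean2(fake_did, panel$city, panel$year)
  b     <- sum(d_dd * y_dd) / sum(d_dd^2)          # pseudo-DID coefficient
  resid <- y_dd - b * d_dd
  se    <- sqrt(sum(resid^2) / df_resid / sum(d_dd^2))
  c(b = b, p = 2 * pt(-abs(b / se), df_resid))     # coefficient and two-sided p-value
}

res       <- t(replicate(n_rep, placebo_once()))   # 500 x 2 matrix: columns b and p
share_sig <- mean(res[, "p"] < 0.05)               # share of "significant" placebo runs
summary(res[, "b"])
```

If the original estimate is credible, the placebo coefficients in `res[, "b"]` should cluster around zero, the share of runs with p < 0.05 should be small, and the true estimate should lie in the tail of this distribution. With real data the simulated `panel` would be replaced by the actual city-year panel and any control variables would be added to the regression.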

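Placebo Tests: Introduction, Procedure, and Examples

Introduction to the placebo test

A placebo test is an approach to supplementary empirical checks rather than one specific, fixed procedure.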

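There are generally two ways to find a placebo variable.

The first: suppose the existing empirical analysis finds that the variable Xi affects the relationship between the independent variable Zi and the dependent variable Yi.

In a follow-up test, take the corresponding variable Xj of a different entity (country, province, firm) as the placebo variable and check whether Xj affects the relationship between Zi and Yi.

If Xj shows no effect similar to that of Xi, a placebo effect of Xi can be ruled out, which makes the result more robust.

The second way to find a placebo variable is as follows.

Suppose Xi is a dummy variable with Xi = 1 if t > T and Xi = 0 if t < T, and the effect of Zi on Yi differs significantly before and after T (DID).

In a follow-up test, define Xi' with Xi' = 1 if t > T+n and Xi' = 0 if t < T+n, where n is chosen according to the setting and may be positive or negative.

Then test whether Xi' affects the relationship between Zi and Yi.

If Xi' shows no effect similar to that of Xi, a placebo effect of Xi can be ruled out, which makes the result more robust.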
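One possible way to write the second approach as a pair of regressions, assuming a standard two-group DID specification. The coefficient names and the time-indexed dummy X_t are notation introduced here for illustration and are not taken from the source:

```latex
\begin{aligned}
Y_{it} &= \alpha + \beta Z_i + \gamma X_t + \delta\,(Z_i \times X_t) + \varepsilon_{it},
  & X_t &= \mathbf{1}[t > T], \\
Y_{it} &= \alpha' + \beta' Z_i + \gamma' X'_t + \delta'\,(Z_i \times X'_t) + \varepsilon'_{it},
  & X'_t &= \mathbf{1}[t > T + n].
\end{aligned}
```

Under this reading, the placebo test checks that the estimate of \delta' is statistically indistinguishable from zero while the estimate of \delta is not. When n < 0, the second equation is commonly estimated on pre-T observations only, so that the real policy does not contaminate the fictitious post period.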

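Examples: in a study that identifies a causal relationship from a policy shock in the US market, the final part takes the same dependent variable for the UK over the same period and checks whether a similar pattern appears. That is a placebo test.

Another example uses China's 2007 income tax reform as a tax-cut policy shock to test the effect of tax cuts on firm innovation.

Here one can also move the policy date a few years earlier or later to create a fictitious policy date. If the test finds no similar causal effect, the paper's main conclusion is more credible.

A detailed worked example follows; the placebo test comes at the end.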

Surviving Graduate Econometrics with R: Difference-in-Differences Estimation (2 of 8)

The following replication exercise closely follows homework assignment #2 in ECNS 562. The data concern the expansion of the Earned Income Tax Credit (EITC), legislation aimed at providing a tax break for low-income individuals. For some background on the subject, see

Eissa, Nada, and Jeffrey B. Liebman. 1996. Labor Supply Responses to the Earned Income Tax Credit. Quarterly Journal of Economics. 111(2): 605-637.

The homework questions (abbreviated):

1. Describe and summarize data.
2. Calculate the sample means of all variables for (a) single women with no children, (b) single women with 1 child, and (c) single women with 2+ children.
3. Create a new variable with earnings conditional on working (missing for non-employed) and calculate the means of this by group as well.
4. Construct a variable for the "treatment" called ANYKIDS and a variable for after the expansion (called POST93; it should be 1 for 1994 and later).
5. Create a graph which plots mean annual employment rates by year (1991-1996) for single women with children (treatment) and without children (control).
6. Calculate the unconditional difference-in-difference estimates of the effect of the 1993 EITC expansion on employment of single women.
7. Now run a regression to estimate the conditional difference-in-difference estimate of the effect of the EITC. Use all women with children as the treatment group.
8. Reestimate this model including demographic characteristics.
9. Add the state unemployment rate and allow its effect to vary by the presence of children.
10. Allow the treatment effect to vary by those with 1 or 2+ children.
11. Estimate a "placebo" treatment model. Take data from only the pre-reform period. Use the same treatment and control groups. Introduce a placebo policy that begins in 1992 (so 1992 and 1993 both have this fake policy).

A review: Loading your data

Recall the code for importing your data:

STATA:

    * Last modified 1/11/2011
    * The following block of commands goes at the start of nearly all do files
    * Bracket comments with /* */ or just use an asterisk at line beginning
    clear                                   /* Clears memory */
    set mem 50m                             /* Adjust this for your particular dataset */
    cd "C:\DATA\Econ 562\homework"          /* Change this for your file structure */
    log using stata_assign2.log, replace    /* Log file records all commands & results */
    display "$S_DATE $S_TIME"
    set more off
    use eitc.dta, clear                     /* .dta files are loaded with use, not insheet */

R:

    # Kevin Goulding
    # ECNS 562 - Assignment 2

    # Load the foreign package
    require(foreign)

    # Import data from web site
    # update: first download the file eitc.dta from this link:
    # https:///open?id=0B0iAUHM7ljQ1cUZvRWxjUmpfVXM
    # Then import from your hard drive:
    eitc = read.dta("C:/link/to/my/download/folder/eitc.dta")

Note that comments can be embedded into R code simply by putting a # in front of them. You can download the data file and import it from your hard drive (use forward slashes in R paths):

    eitc = read.dta("C:/DATA/Courses/Econ 562/homework/eitc.dta")

Describe and summarize your data

Recall from part 1 of this series the following code to describe and summarize your data:

STATA:

    des
    sum

R:

In R, each column of your data is assigned a class, which determines how your data is treated in various functions. To see what class R has interpreted for all your variables, run the following code:

    sapply(eitc, class)
    summary(eitc)
    source('sumstats.r')
    sumstats(eitc)

To output the summary statistics table to LaTeX, use the following code:

    require(xtable)    # xtable package helps create LaTeX code
    xtable(sumstats(eitc))

Note: You will need to re-run the code for sumstats(), which you can find in an earlier post.

Calculate Conditional Sample Means

STATA:

    summarize if children == 0
    summarize if children == 1
    summarize if children >= 1
    summarize if children >= 1 & year == 1994
    mean work if post93 == 0 & anykids == 1

R:

    # The following code utilizes the sumstats function (you will need to re-run this code)
    sumstats(eitc[eitc$children == 0, ])
    sumstats(eitc[eitc$children == 1, ])
    sumstats(eitc[eitc$children >= 1, ])
    sumstats(eitc[eitc$children >= 1 & eitc$year == 1994, ])

    # Alternately, you can use the built-in summary function
    summary(eitc[eitc$children == 0, ])
    summary(eitc[eitc$children == 1, ])
    summary(eitc[eitc$children >= 1, ])
    summary(eitc[eitc$children >= 1 & eitc$year == 1994, ])

    # Another example: Summarize variable 'work' for women with one child from 1993 onwards.
    summary(subset(eitc, year >= 1993 & children == 1, select = work))

The code above includes all summary statistics, but say you are only interested in the mean. You could then be more specific in your coding, like this:

    mean(eitc[eitc$children == 0, 'work'])
    mean(eitc[eitc$children == 1, 'work'])
    mean(eitc[eitc$children >= 1, 'work'])

The functions behind the other entries of the summary output work the same way: min() for the minimum value, max() for the maximum value, sd() for the standard deviation, and others.

Create a New Variable

To create a new variable called "c.earn" equal to earnings conditional on working (if "work" = 1) and "NA" otherwise ("work" = 0), use the following code:

STATA:

    gen cearn = earn if work == 1

R:

    # earnings for those who work, NA for those who do not
    eitc$c.earn = ifelse(eitc$work == 1, eitc$earn, NA)

Construct a Treatment Variable

Construct a variable for the treatment called "anykids" = 1 for treated individuals (has at least one child), and a variable for after the expansion called "post93" = 1 for 1994 and later.

STATA:

    gen anykids = (children >= 1)
    gen post93 = (year >= 1994)

R:

    eitc$post93 = as.numeric(eitc$year >= 1994)
    eitc$anykids = as.numeric(eitc$children > 0)

Create a plot

Create a graph which plots mean annual employment rates by year (1991-1996) for single women with children (treatment) and without children (control).

STATA:

    preserve
    collapse work, by(year anykids)
    gen work0 = work if anykids == 0
    label var work0 "Single women, no children"
    gen work1 = work if anykids == 1
    label var work1 "Single women, children"
    twoway (line work0 year, sort) (line work1 year, sort), ytitle(Labor Force Participation Rates)
    graph save Graph "homework\eitc1.gph", replace

R:

    # Take average value of 'work' by year, conditional on anykids
    minfo = aggregate(eitc$work, list(eitc$year, eitc$anykids == 1), mean)

    # rename column headings (variables)
    names(minfo) = c("YR", "Treatment", "LFPR")

    # Attach a new column with labels
    minfo$Group[1:6] = "Single women, no children"
    minfo$Group[7:12] = "Single women, children"
    minfo

    require(ggplot2)    # package for creating nice plots
    qplot(YR, LFPR, data = minfo, geom = c("point", "line"), colour = Group,
          xlab = "Year", ylab = "Labor Force Participation Rate")

The ggplot2 package produces some nice looking charts.

Calculate the D-I-D Estimate of the Treatment Effect

Calculate the unconditional difference-in-difference estimates of the effect of the 1993 EITC expansion on employment of single women.

STATA:

    mean work if post93 == 0 & anykids == 0
    mean work if post93 == 0 & anykids == 1
    mean work if post93 == 1 & anykids == 0
    mean work if post93 == 1 & anykids == 1

R:

    a = colMeans(subset(eitc, post93 == 0 & anykids == 0, select = work))
    b = colMeans(subset(eitc, post93 == 0 & anykids == 1, select = work))
    c = colMeans(subset(eitc, post93 == 1 & anykids == 0, select = work))
    d = colMeans(subset(eitc, post93 == 1 & anykids == 1, select = work))
    (d - c) - (b - a)

Run a simple D-I-D Regression

Now we will run a regression to estimate the conditional difference-in-difference estimate of the effect of the Earned Income Tax Credit on "work", using all women with children as the treatment group. The regression equation is as follows:

$$ work = \beta_0 + \delta_0\, post93 + \beta_1\, anykids + \delta_1\, (anykids \times post93) + \varepsilon $$

where $\varepsilon$ is the white noise error term.

STATA:

    gen interaction = post93*anykids
    reg work post93 anykids interaction

R:

    reg1 = lm(work ~ post93 + anykids + post93*anykids, data = eitc)
    summary(reg1)

Include Relevant Demographics in Regression

Adding additional variables is a matter of including them in your coded regression equation, as follows:

STATA:

    gen age2 = age^2                /* Create age-squared variable */
    gen nonlaborinc = finc - earn   /* Non-labor income */
    reg work post93 anykids interaction nonwhite age age2 ed finc nonlaborinc

R:

    reg2 = lm(work ~ anykids + post93 + post93*anykids + nonwhite
              + age + I(age^2) + ed + finc + I(finc - earn), data = eitc)
    summary(reg2)

Create some new variables

We will create two new interaction variables:

1. The state unemployment rate interacted with the presence of children.
2. The treatment term interacted with individuals with one child, or more than one child.

STATA:

    gen interu = urate*anykids
    gen onekid = (children == 1)
    gen twokid = (children >= 2)
    gen postXone = post93*onekid
    gen postXtwo = post93*twokid

R:

    # The state unemployment rate interacted with the presence of children
    eitc$urate.int = eitc$urate*eitc$anykids

    ### Creating a new treatment term:
    # First, we'll create a new dummy variable to distinguish between one child and 2+.
    eitc$manykids = as.numeric(eitc$children >= 2)

    # The original interaction term must exist before it can be reused
    eitc$p93kids.interaction = eitc$post93*eitc$anykids

    # Next, we'll create a new variable by interacting the new dummy
    # variable with the original interaction term.
    eitc$tr2 = eitc$p93kids.interaction*eitc$manykids

Estimate a Placebo Model

Testing a placebo model means arbitrarily choosing a treatment time before your actual treatment time and testing whether you get a significant treatment effect.

STATA:

    gen placebo = (year >= 1992)
    gen placeboXany = anykids*placebo
    reg work anykids placebo placeboXany if year < 1994

In R, first we'll subset the data to exclude the time period after the real treatment (1994 and later). Next, we'll create a new treatment dummy variable and run a regression as before on our data subset.

R:

    # subset the data, including only years before 1994.
    eitc.sub = eitc[eitc$year <= 1993, ]

    # Create a new "after treatment" dummy variable
    # and interaction term
    eitc.sub$post91 = as.numeric(eitc.sub$year >= 1992)

    # Run a placebo regression where placebo treatment = post91*anykids
    reg3 <- lm(work ~ anykids + post91 + post91*anykids, data = eitc.sub)
    summary(reg3)

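Placebo Test Results Not Symmetric About Zero

Outline:
1. Definition and purpose of the placebo test
2. The relationship between the placebo test and symmetry about zero
3. How to run a placebo test
4. The role of symmetry about zero in the placebo test
5. Conclusion

I. Definition and purpose of the placebo test

The placebo test is a commonly used statistical check of whether the treatment effect in an experimental or observational study is significant.

In experiments it is typically used to assess the effectiveness of an intervention. It helps the researcher judge whether the results were driven by other factors and thereby provides evidence on the reliability of the findings.

II. The relationship between the placebo test and symmetry about zero

The placebo test and symmetry about zero are closely related ideas.

In a placebo test, the difference between the treatment group and the control group is assessed by comparing the two group means.

Symmetry about zero means that the placebo estimates of this difference are centred on zero, that is, the (fictitious) intervention has no effect on the outcome.

III. How to run a placebo test

The general steps are:

1. Randomly draw a given share of units from the sample and define them as the treatment group.

2. Use greedy matching to match a control group to this treatment group.

3. Compute the difference in means between the treatment group and the control group and test it statistically.

4. Judge from the test result whether the intervention is significant.

IV. The role of symmetry about zero in the placebo test

Symmetry about zero plays two main roles:

1. It is the benchmark for the placebo test.

The researcher assesses, against the benchmark of zero, whether the difference in means between the treatment group and the control group is significant.

If the difference is not significant, the intervention may have no effect.

2. It is used to estimate confidence intervals.

Using the benchmark of zero, the researcher can estimate a confidence interval for the difference in means between the treatment group and the control group.

This allows a more accurate assessment of the intervention's effect.

V. Conclusion

In short, the placebo test is an important statistical check of whether the treatment effect in an experimental or observational study is significant.

Symmetry about zero serves as both the benchmark of the placebo test and the basis for its confidence intervals, and therefore plays a key role in it.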
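Whether a set of placebo estimates is centred on zero can be checked directly. The following R sketch is only an illustration: `placebo_b` is a simulated stand-in for the coefficients from 500 placebo regressions (deliberately shifted slightly away from zero), and `true_b` is a hypothetical baseline estimate; neither comes from the source.

```r
# Checking whether placebo coefficients are centred on zero (simulated stand-in data).
set.seed(42)

placebo_b <- rnorm(500, mean = 0.02, sd = 0.05)  # hypothetical placebo coefficients
true_b    <- 0.15                                # hypothetical baseline estimate

centre         <- c(mean = mean(placebo_b), median = median(placebo_b))
share_positive <- mean(placebo_b > 0)            # about 0.5 if symmetric about zero
zero_test      <- t.test(placebo_b, mu = 0)      # H0: placebo coefficients have mean zero
emp_p          <- mean(abs(placebo_b) >= abs(true_b))  # two-sided empirical p-value

centre
share_positive
zero_test$p.value
emp_p
```

A mean clearly different from zero, or a share of positive coefficients far from one half, suggests that the placebo distribution is not symmetric about zero. One possible reading is that the model is picking up something other than the policy, such as an unmodelled trend.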

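Principles of the permute Placebo Test

The placebo test is a method commonly used in clinical trials to evaluate the efficacy of a treatment.

In a placebo-controlled trial, participants are randomly assigned to either the treatment group or the placebo group and receive different interventions.

The principles behind it are explained below from several angles.

First, the placebo test rests on random assignment.

Participants are randomly assigned to groups in order to eliminate potential bias and confounding.

Differences between the two groups can then be attributed more accurately to the treatment itself rather than to other factors.

Second, it rests on the placebo effect.

A placebo is a substance that looks like a treatment but has no pharmacological action; it can produce a psychological effect that makes people feel their symptoms have improved.

Comparing against the placebo group shows whether the effect in the treatment group exceeds the placebo effect, and hence whether the treatment works.

Third, it rests on double-blind design.

In a double-blind design, neither the participants nor the researchers know whether a participant is receiving the treatment or the placebo.

This avoids subjective bias and keeps the results objective and reliable.

Fourth, it involves determining the sample size.

The sample size must be chosen sensibly to give the test adequate statistical power.

The required sample size generally depends on the effect size under study, the significance level, and the desired power.

A sufficiently large sample improves the reliability and generalizability of the study.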
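The link between effect size, significance level, power and sample size can be made concrete with base R's `power.t.test()`. The numbers below (a difference of 0.5 standard deviations, a 5% two-sided test, 80% power) are illustrative assumptions and do not come from the source.

```r
# Sample size per arm for a two-arm, placebo-controlled comparison of means.
res <- power.t.test(delta = 0.5,        # assumed true difference in means
                    sd = 1,             # assumed common standard deviation
                    sig.level = 0.05,   # two-sided significance level
                    power = 0.80,       # desired statistical power
                    type = "two.sample",
                    alternative = "two.sided")

n_per_group <- ceiling(res$n)           # round up to whole participants
n_per_group
```

Halving the assumed effect size (`delta = 0.25`) roughly quadruples the required sample, which is why the expected effect has to be stated before the trial is designed.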

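Finally, it involves data collection and analysis.

The data collected may be quantitative (physiological measures, disease progression, and so on) or qualitative (such as subjective ratings of symptom improvement).

Statistical methods are then used to compare the two groups and determine the effect of the treatment.

In summary, the placebo test draws on random assignment, the placebo effect, double-blind design, sample size determination, and data analysis.

Applied together, these principles improve the reliability and validity of a study and thus of its assessment of the treatment's efficacy.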

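Placebo Medication in Clinical Trials

A placebo, a sham treatment given to the control group, plays an important role in clinical trials.

This article discusses the definition of a placebo, the principles of medication in clinical trials, and the uses of placebos in clinical trials.

I. Definition of a placebo

A placebo is a sham treatment used in clinical trials, usually a preparation that resembles the real drug in appearance and taste but contains no active ingredient.

The placebo is given to the control group and compared with the test drug in order to evaluate the test drug's efficacy.

II. Principles of medication in clinical trials

Medication in a clinical trial should follow certain principles to ensure that the trial is reliable and valid.

1. Randomization: subjects should be assigned to the trial group and the control group according to a random rule, to minimize bias in the results.

2. Blinding: participants, investigators, and assessors should be kept unaware of who receives which medication, to prevent subjective factors from influencing the results.

3. Double-blind control: randomization and blinding should be combined, so that subjects do not know whether they are receiving the test drug or the placebo, and investigators and assessors do not know which one each subject is receiving.

III. Uses of placebos in clinical trials

Placebos serve several purposes in clinical trials:

1. Evaluating efficacy: using a placebo as the control helps evaluate the efficacy of the test drug.

Comparison with the placebo group shows whether the true efficacy of the test drug exceeds the effect of the placebo.

2. Evaluating safety: the placebo group can serve as a safety control for assessing the safety of the test drug.

Comparison with the placebo group shows whether the test drug causes adverse reactions.

3. Improving reliability: in the treatment of some diseases, patients' subjective experience may be influenced by psychological factors. Using a placebo removes these influences and makes the results more reliable.

4. Establishing a baseline: giving the control group a placebo from the start of the trial establishes a baseline against which later results can be compared, allowing a better assessment of the test drug's effect.

Summary: as a widely used sham treatment, the placebo plays an important role in clinical trials.

It can be used to evaluate the efficacy and safety of a test drug, to improve the reliability of a trial, and to establish a baseline for later comparison.

When a placebo is used, the principles of medication in clinical trials should be followed, including randomization and blinding, to ensure that the results are accurate and reliable.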

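Use of Placebos in Clinical Research

Clinical research is one of the main ways of evaluating and verifying the efficacy and safety of medical interventions.

Placebos are an indispensable part of this process and are widely used in clinical trials.

This article discusses the use of placebos in clinical research, along with their role and necessity.

I. Definition of a placebo in clinical research

A placebo is a drug or treatment with no therapeutic effect.

In clinical research, a placebo is given to one group of subjects so that it can be compared with the experimental group, in order to assess the true effect of a new drug or treatment.

II. Role and purpose of placebos

1. Serving as the comparison group: in a clinical trial, the effect of a new drug or therapy has to be compared with a placebo in order to assess its true therapeutic effect.

Comparison with the placebo group reduces interference from other factors and allows a more accurate assessment of the drug's or treatment's efficacy.

2. Evaluating safety: placebos can be used to assess the safety of a new drug or treatment.

Comparison with the placebo group makes it possible to determine whether there are serious adverse reactions and to further assess the balance of risks and benefits.

3. Addressing psychological factors: the use of a placebo can help meet patients' psychological needs.

In some cases of disease or pain, patients may more readily feel psychologically satisfied and reassured when taking a placebo, which improves the treatment outcome.

III. Ethical considerations in the use of placebos

When using placebos, researchers need to consider the following ethical issues:

1. Informed consent: researchers must explain the study and its possible risks to patients in detail and obtain their informed consent.

Patients should clearly understand that they may be assigned to the placebo group, so that they can make an informed decision.

2. Safety: researchers must ensure that the use of a placebo does not pose a serious threat to patients' health.

In choosing a placebo, substances that might cause adverse reactions should be avoided as far as possible, to protect patients' safety.

3. Ethical review: clinical studies involving placebos should undergo ethical review.

The ethics committee assesses the study's scientific merit, necessity, safety, and protection of patients' rights, and decides whether to approve it.

IV. Controversies over the use of placebos

The use of placebos in clinical research has also given rise to controversy.

The main points of dispute include:

1. Ethics: some argue that giving patients in the placebo group a placebo is inhumane and violates the ethical principle of acting in the patient's best interest.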

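Multi-Period DID: Placebo Test and Parallel Trend Test

This post introduces the steps for running a placebo test in multi-period DID (the accompanying data were distributed by the original account in response to the keyword 20200628).

In the traditional DID model, all units share the same policy date, so the placebo test only needs to randomly draw a fixed number of units from all units to serve as the treatment group.

In multi-period DID, however, each unit has its own policy date, and that approach no longer applies.

The solution is to randomly draw, for each unit, a period from the sample window to serve as its policy date.

For example, this post uses data on 30 Chinese provinces for 2000-2018. In multi-period DID, each of the 30 provinces needs a randomly drawn year from 2000-2018 as its policy date.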

First, here is the multi-period DID estimate under the original policy dates.

Multi-period DID estimation

    cd ×××××××××
    use 数据0.dta, clear
    xtset id year

    * Generate the unit-by-time treatment variable
    gen DT = ((id == 1 & year >= 2005) | (id == 2 & year >= 2005) | (id == 3 & year >= 2006) | ///
              (id == 4 & year >= 2006) | (id == 5 & year >= 2006) | (id == 6 & year >= 2006) | ///
              (id == 7 & year >= 2006) | (id == 8 & year >= 2006) | (id == 9 & year >= 2005) | ///
              (id == 10 & year >= 2003) | (id == 11 & year >= 2004) | (id == 12 & year >= 2006) | ///
              (id == 13 & year >= 2006) | (id == 14 & year >= 2006) | (id == 15 & year >= 2005) | ///
              (id == 16 & year >= 2006) | (id == 17 & year >= 2006) | (id == 18 & year >= 2006) | ///
              (id == 19 & year >= 2002) | (id == 20 & year >= 2006) | (id == 21 & year >= 2003) | ///
              (id == 22 & year >= 2006) | (id == 23 & year >= 2006) | (id == 24 & year >= 2006) | ///
              (id == 25 & year >= 2006) | (id == 27 & year >= 2005) | (id == 28 & year >= 2006) | ///
              (id == 29 & year >= 2006) | (id == 30 & year >= 2006))

Multi-period DID estimate:

    xtreg y DT x1-x6, fe

Results:

    Fixed-effects (within) regression               Number of obs     =        570
    Group variable: id                              Number of groups  =         30

    R-sq:                                           Obs per group:
         within  = 0.5624                                         min =         19
         between = 0.0578                                         avg =       19.0
         overall = 0.2567                                         max =         19

                                                    F(7,533)          =      97.87
    corr(u_i, Xb)  = -0.3584                        Prob > F          =     0.0000

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              DT |   .0792811   .0104967     7.55   0.000     .0586611    .0999011
              x1 |   .7772515   .4184583     1.86   0.064    -.0447783    1.599281
              x2 |   .0796133   .0644417     1.24   0.217    -.0469776    .2062043
              x3 |   .2698871   .0823515     3.28   0.001     .1081137    .4316605
              x4 |   .1934262   .1931832     1.00   0.317    -.1860676    .5729201
              x5 |  -.7536294   .2024724    -3.72   0.000    -1.151371   -.3558877
              x6 |  -1.154312   .2885104    -4.00   0.000    -1.721069   -.5875551
           _cons |   .8809552   .0650585    13.54   0.000     .7531526    1.008758
    -------------+----------------------------------------------------------------
         sigma_u |  .10451545
         sigma_e |  .06805517
             rho |  .70224943   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(29, 533) = 26.25                    Prob > F = 0.0000

Parallel trend test

    * Parallel trend test: current marks each unit's own policy year
    gen current = ((id == 1 & year == 2005) | (id == 2 & year == 2005) | (id == 3 & year == 2006) | ///
                   (id == 4 & year == 2006) | (id == 5 & year == 2006) | (id == 6 & year == 2006) | ///
                   (id == 7 & year == 2006) | (id == 8 & year == 2006) | (id == 9 & year == 2005) | ///
                   (id == 10 & year == 2003) | (id == 11 & year == 2004) | (id == 12 & year == 2006) | ///
                   (id == 13 & year == 2006) | (id == 14 & year == 2006) | (id == 15 & year == 2005) | ///
                   (id == 16 & year == 2006) | (id == 17 & year == 2006) | (id == 18 & year == 2006) | ///
                   (id == 19 & year == 2002) | (id == 20 & year == 2006) | (id == 21 & year == 2003) | ///
                   (id == 22 & year == 2006) | (id == 23 & year == 2006) | (id == 24 & year == 2006) | ///
                   (id == 25 & year == 2006) | (id == 27 & year == 2005) | (id == 28 & year == 2006) | ///
                   (id == 29 & year == 2006) | (id == 30 & year == 2006))

    * Leads (pre-policy periods) and lags (post-policy periods) of the policy-year dummy
    gen pre6 = f6.current
    gen pre5 = f5.current
    gen pre4 = f4.current
    gen pre3 = f3.current
    gen pre2 = f2.current
    gen pre1 = f.current
    gen post1 = l.current
    gen post2 = l2.current
    gen post3 = l3.current
    gen post4 = l4.current
    gen post5 = l5.current
    gen post6 = l6.current
    gen post7 = l7.current

    * Leads and lags are missing at the edges of the sample; set them to 0
    foreach v in pre6 pre5 pre4 pre3 pre2 pre1 post1 post2 post3 post4 post5 post6 post7 {
        replace `v' = 0 if `v' == .
    }

    xtreg y pre3 pre2 pre1 current post1 post2 post3 post4 post5 post6 post7 x1-x6, fe
    coefplot, keep(pre3 pre2 pre1 current post1 post2 post3 post4 post5 post6 post7) ///
        vertical addplot(line @b @at) yline(0) levels(95)

[Figure: parallel trend test]

Placebo test

Create matrices to store the results:

    mat b = J(500,1,0)
    mat se = J(500,1,0)
    mat p = J(500,1,0)

Sampling procedure, scheme 1: randomly draw 30 values from the variable year and use them, in order, as the policy dates of the 30 provinces.

    forvalues i = 1/500 {
        use 数据0.dta, clear
        xtset id year
        sample 30, count
        keep year
        mkmat year, matrix(sampleyear)   // convert the variable to a matrix for later lookup
        use 数据.dta, clear
        xtset id year
        gen treat = 0
        * Generate the unit-by-time treatment variable
        foreach j of numlist 1/30 {
            replace treat = 1 if (id == `j' & year >= sampleyear[`j',1])
        }
        qui xtreg y treat x1-x6, fe
        * Store the regression results and compute the p-value
        mat b[`i',1] = _b[treat]
        mat se[`i',1] = _se[treat]
        mat p[`i',1] = 2*ttail(e(df_r), abs(_b[treat]/_se[treat]))
    }

Sampling procedure, scheme 2: unlike scheme 1, first group the data by province, then randomly draw one year from the year variable within each province as its policy date.
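Scheme 2 can be sketched in R as follows. This is not the original author's Stata code, which is cut off in the source. It assumes a simulated balanced panel of 30 provinces over 2000-2018 with no true policy effect, omits the controls x1-x6, and uses unit fixed effects only (mirroring `xtreg, fe`); all names are hypothetical.

```r
# Multi-period DID placebo, scheme 2: draw one pseudo policy year per province.
set.seed(2020)

ids   <- 1:30        # provinces (from the text)
years <- 2000:2018   # sample window (from the text)
n_rep <- 500         # number of placebo runs

# Simulated panel: province effects plus noise, no true policy effect
panel   <- expand.grid(id = ids, year = years)
unit_fe <- rnorm(length(ids))
panel$y <- unit_fe[panel$id] + rnorm(nrow(panel), sd = 0.1)

# Within transformation by province (unit fixed effects only)
demean_id <- function(x) x - ave(x, panel$id)
y_w <- demean_id(panel$y)

placebo_once <- function() {
  # independent draw of one year per province, so years may repeat across provinces
  fake_year <- sample(years, length(ids), replace = TRUE)
  treat <- as.numeric(panel$year >= fake_year[panel$id])  # pseudo treatment dummy
  d_w <- demean_id(treat)
  sum(d_w * y_w) / sum(d_w^2)                             # placebo coefficient
}

b <- replicate(n_rep, placebo_once())
summary(b)
```

The 500 coefficients in `b` play the same role as the Stata matrix `b` in scheme 1: their distribution (for example a kernel density plot) is compared with the baseline estimate of 0.0793. With real data, the simulated outcome would be replaced by the actual panel and the controls would be added back.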

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

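Introduction to the placebo test

A placebo test is a general approach to supplementary empirical checks rather than one specific procedure.

There are generally two ways to find a placebo variable.

The first: suppose the existing empirical analysis finds that a variable Xi affects the relationship between the independent variable Zi and the dependent variable Yi.

In a follow-up test, the corresponding variable Xj of a different entity (country, province, firm) is used as the placebo variable, and one tests whether Xj affects the relationship between Zi and Yi.

If Xj shows no effect similar to that of Xi, a placebo effect of Xi can be ruled out, which makes the result more robust.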

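The second way to find a placebo variable is as follows.

Suppose Xi is a dummy variable with Xi = 1 if t > T and Xi = 0 if t < T, and the effect of Zi on Yi differs significantly before and after T (the DID setting).

In a follow-up test, define Xi' = 1 if t > T+n and Xi' = 0 if t < T+n, where n is chosen according to the application and may be positive or negative.

Then test whether Xi' affects the relationship between Zi and Yi.

If Xi' shows no effect similar to that of Xi, a placebo effect of Xi can be ruled out, which makes the result more robust.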
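The shifted policy dummy used in this second approach can be built in a few lines of R. The sketch below is only an illustration: the helper name make_policy_dummy(), the policy year T0 = 2007 and the shift n = -2 are assumptions, and observations exactly at the cutoff are counted as "post", a case the definition above leaves open.

```r
# Build a policy dummy that switches on at time T0 + n.
# n = 0 gives the real policy dummy; n != 0 gives a placebo dummy.
make_policy_dummy <- function(t, T0, n = 0) {
  as.numeric(t >= T0 + n)   # assumption: t equal to the cutoff counts as "post"
}

t  <- 2003:2010
T0 <- 2007                   # hypothetical true policy year

real    <- make_policy_dummy(t, T0)          # switches on in 2007
placebo <- make_policy_dummy(t, T0, n = -2)  # switches on in 2005

data.frame(t, real, placebo)
```

When the placebo date precedes the real one, it is common to restrict the sample to periods before T0 before running the placebo regression, so that the real policy cannot contaminate the fake "post" period.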

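Examples: if a study uses a policy shock in the US market to identify a causal relationship, then taking the same dependent variable for the UK over the same period in the final part of the analysis, and checking whether it shows a similar pattern, is a placebo test.

Another example is a study that uses China's 2007 income tax reform as a tax-cut policy shock to test the effect of tax cuts on firm innovation.

Here one can also move the hypothetical policy date a few years forward or backward and treat it as a fake policy date; if the test finds no similar causal effect, the paper's main conclusion becomes more credible.

A detailed worked example follows; the placebo test comes at the end.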

Surviving Graduate Econometrics with R: Difference-in-Differences Estimation - 2 of 8

The following replication exercise closely follows the homework assignment #2 in ECNS 562. The data for this exercise can be found here.

The data is about the expansion of the Earned Income Tax Credit. This is a legislation aimed at providing a tax break for low income individuals. For some background on the subject, see

Eissa, Nada, and Jeffrey B. Liebman. 1996. Labor Supply Responses to the Earned Income Tax Credit. Quarterly Journal of Economics. 111(2): 605-637.

The homework questions (abbreviated):
1. Describe and summarize data.
2. Calculate the sample means of all variables for (a) single women with no children, (b) single women with 1 child, and (c) single women with 2+ children.
3. Create a new variable with earnings conditional on working (missing for non-employed) and calculate the means of this by group as well.
4. Construct a variable for the "treatment" called ANYKIDS and a variable for after the expansion (called POST93; it should be 1 for 1994 and later).
5. Create a graph which plots mean annual employment rates by year (1991-1996) for single women with children (treatment) and without children (control).
6. Calculate the unconditional difference-in-difference estimates of the effect of the 1993 EITC expansion on employment of single women.
7. Now run a regression to estimate the conditional difference-in-difference estimate of the effect of the EITC. Use all women with children as the treatment group.
8. Reestimate this model including demographic characteristics.
9. Add the state unemployment rate and allow its effect to vary by the presence of children.
10. Allow the treatment effect to vary by those with 1 or 2+ children.
11. Estimate a "placebo" treatment model. Take data from only the pre-reform period. Use the same treatment and control groups.
Introduce a placebo policy that begins in 1992 (so 1992 and 1993 both have this fake policy).

A review: Loading your data

Recall the code for importing your data:

STATA:
/*Last modified 1/11/2011 */
*************************************************************************
*The following block of commands go at the start of nearly all do files
*Bracket comments with /* */ or just use an asterisk at line beginning
clear                                  /*Clears memory*/
set mem 50m                            /*Adjust this for your particular dataset*/
cd "C:\DATA\Econ 562\homework"         /*Change this for your */
log using stata_assign2.log, replace   /*Log all commands & results*/
display "$S_DATE $S_TIME"
set more off
use eitc.dta, clear                    /*eitc.dta is a Stata dataset, so load it with use*/
*************************************************************************

R:
# Kevin Goulding
# ECNS 562 - Assignment 2
##########################################################################
# Load the foreign package
require(foreign)
# Import data from web site
# update: first download the file from this link:
#
# Then import from your hard drive:
eitc = read.dta("C:/link/to/my/download/folder/eitc.dta")

Note that comments can be embedded into R code simply by putting a # to the left of the comment. You can download the data file and import it from your hard drive (use forward slashes in the path, since a backslash starts an escape sequence in an R string):

eitc = read.dta("C:/DATA/Courses/Econ 562/homework/eitc.dta")

Describe and summarize your data

Recall from part 1 of this series, the following code to describe and summarize your data:

STATA:
des
sum

R:
In R, each column of your data is assigned a class which will determine how your data is treated in various functions.
To see what class R has interpreted for all your variables, run the following code:

sapply(eitc,class)
summary(eitc)
source('sumstats.r')
sumstats(eitc)

To output the summary statistics table to LaTeX, use the following code:

require(xtable) # xtable package helps create LaTeX code
xtable(sumstats(eitc))

Note: You will need to re-run the code for sumstats() which you can find in an earlier post.

Calculate Conditional Sample Means

STATA:
summarize if children==0
summarize if children == 1
summarize if children >=1
summarize if children >=1 & year == 1994
mean work if post93 == 0 & anykids == 1

R:
# The following code utilizes the sumstats function (you will need to re-run this code)
sumstats(eitc[eitc$children == 0, ])
sumstats(eitc[eitc$children == 1, ])
sumstats(eitc[eitc$children >= 1, ])
sumstats(eitc[eitc$children >= 1 & eitc$year == 1994, ])
# Alternately, you can use the built-in summary function
summary(eitc[eitc$children == 0, ])
summary(eitc[eitc$children == 1, ])
summary(eitc[eitc$children >= 1, ])
summary(eitc[eitc$children >= 1 & eitc$year == 1994, ])
# Another example: Summarize variable 'work' for women with one child from 1993 onwards.
summary(subset(eitc, year >= 1993 & children == 1, select=work))

The code above includes all summary statistics, but say you are only interested in the mean.
You could then be more specific in your coding, like this:

mean(eitc[eitc$children == 0, 'work'])
mean(eitc[eitc$children == 1, 'work'])
mean(eitc[eitc$children >= 1, 'work'])

Try out any of the other headings within the summary output, they should also work: min() for minimum value, max() for maximum value, sd() for standard deviation, and others.

Create a New Variable

To create a new variable called "c.earn" equal to earnings conditional on working (if "work" = 1), "NA" otherwise ("work" = 0), use the following code:

STATA:
gen cearn = earn if work == 1

R:
# Earnings for those who work, NA for those who do not
eitc$c.earn = ifelse(eitc$work == 1, eitc$earn, NA)

Construct a Treatment Variable

Construct a variable for the treatment called "anykids" = 1 for treated individual (has at least one child); and a variable for after the expansion called "post93" = 1 for 1994 and later.

STATA:
gen anykids = (children >= 1)
gen post93 = (year >= 1994)

R:
eitc$post93 = as.numeric(eitc$year >= 1994)
eitc$anykids = as.numeric(eitc$children > 0)

Create a plot

Create a graph which plots mean annual employment rates by year (1991-1996) for single women with children (treatment) and without children (control).

STATA:
preserve
collapse work, by(year anykids)
gen work0 = work if anykids==0
label var work0 "Single women, no children"
gen work1 = work if anykids==1
label var work1 "Single women, children"
twoway (line work0 year, sort) (line work1 year, sort), ytitle(Labor Force Participation Rates)
graph save Graph "homework\eitc1.gph", replace

R:
# Take average value of 'work' by year, conditional on anykids
minfo = aggregate(eitc$work, list(eitc$year,eitc$anykids == 1), mean)
# rename column headings (variables)
names(minfo) = c("YR","Treatment","LFPR")
# Attach a new column with labels (assigned by group rather than by row position)
minfo$Group = ifelse(minfo$Treatment, "Single women, children", "Single women, no children")
minfo
require(ggplot2) #package for creating nice plots
qplot(YR, LFPR, data=minfo, geom=c("point","line"), colour=Group,
      xlab="Year", ylab="Labor Force Participation Rate")

The ggplot2 package produces some nice looking charts.

Calculate the D-I-D Estimate of the Treatment Effect

Calculate the unconditional difference-in-difference estimates of the effect of the 1993 EITC expansion on employment of single women.

STATA:
mean work if post93==0 & anykids==0
mean work if post93==0 & anykids==1
mean work if post93==1 & anykids==0
mean work if post93==1 & anykids==1

R:
a = colMeans(subset(eitc, post93 == 0 & anykids == 0, select=work))
b = colMeans(subset(eitc, post93 == 0 & anykids == 1, select=work))
c = colMeans(subset(eitc, post93 == 1 & anykids == 0, select=work))
d = colMeans(subset(eitc, post93 == 1 & anykids == 1, select=work))
(d-c)-(b-a)

Run a simple D-I-D Regression

Now we will run a regression to estimate the conditional difference-in-difference estimate of the effect of the Earned Income Tax Credit on "work", using all women with children as the treatment group.
The regression equation is as follows:

$$\text{work} = \beta_0 + \beta_1\,\text{post93} + \beta_2\,\text{anykids} + \beta_3\,(\text{post93} \times \text{anykids}) + \varepsilon$$

where $\varepsilon$ is the white noise error term and $\beta_3$ is the difference-in-differences estimate.

STATA:
gen interaction = post93*anykids
reg work post93 anykids interaction

R:
reg1 = lm(work ~ post93 + anykids + post93*anykids, data = eitc)
summary(reg1)

Include Relevant Demographics in Regression

Adding additional variables is a matter of including them in your coded regression equation, as follows:

STATA:
gen age2 = age^2 /*Create age-squared variable*/
gen nonlaborinc = finc - earn /*Non-labor income*/
reg work post93 anykids interaction nonwhite age age2 ed finc nonlaborinc

R:
reg2 = lm(work ~ anykids + post93 + post93*anykids + nonwhite
          + age + I(age^2) + ed + finc + I(finc-earn), data = eitc)
summary(reg2)

Create some new variables

We will create two new interaction variables:
1. The state unemployment rate interacted with the presence of children.
2. The treatment term interacted with individuals with one child, or more than one child.

STATA:
gen interu = urate*anykids
gen onekid = (children==1)
gen twokid = (children>=2)
gen postXone = post93*onekid
gen postXtwo = post93*twokid

R:
# The state unemployment rate interacted with the presence of children
eitc$urate.int = eitc$urate*eitc$anykids
### Creating a new treatment term:
# First, we'll create a new dummy variable to distinguish between one child and 2+.
eitc$manykids = as.numeric(eitc$children >= 2)
# The original interaction term (post93 x anykids) must exist before it is used below
eitc$p93kids.interaction = eitc$post93*eitc$anykids
# Next, we'll create a new variable by interacting the new dummy
# variable with the original interaction term.
eitc$tr2 = eitc$p93kids.interaction*eitc$manykids

Estimate a Placebo Model

Testing a placebo model is when you arbitrarily choose a treatment time before your actual treatment time, and test to see if you get a significant treatment effect.

STATA:
gen placebo = (year >= 1992)
gen placeboXany = anykids*placebo
reg work anykids placebo placeboXany if year<1994

In R, first we'll subset the data to exclude the time period after the real treatment (1994 and later).
Next, we'll create a new treatment dummy variable, and run a regression as before on our data subset.

R:
# Subset the data, including only years before 1994.
eitc.sub = eitc[eitc$year <= 1993,]
# Create a new "after treatment" dummy variable
# and interaction term
eitc.sub$post91 = as.numeric(eitc.sub$year >= 1992)
# Run a placebo regression where placebo treatment = post91*anykids
reg3 <- lm(work ~ anykids + post91 + post91*anykids, data = eitc.sub)
summary(reg3)

The entire code for this post is available here (File -> Save As). If you have any questions or find problems with my code, you can e-mail me directly at kevingoulding {at} gmail [dot] com.

To continue on to Part 3 of our series, Fixed Effects estimation, click here.
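As a cross-check on the two estimation routes used in this exercise: in a regression that contains only post93, anykids and their interaction, the interaction coefficient is numerically identical to the unconditional (d-c)-(b-a) estimate from the four cell means. The R sketch below verifies this on simulated data; the data frame sim, its sample size and the assumed effect sizes are illustrative assumptions, not the EITC data.

```r
# Simulated stand-in for the EITC sample (all values are assumptions).
set.seed(42)
n   <- 2000
sim <- data.frame(
  post93  = rbinom(n, 1, 0.5),
  anykids = rbinom(n, 1, 0.5)
)
# Employment probability with an assumed true treatment effect of 0.05.
p        <- 0.55 - 0.10 * sim$anykids + 0.02 * sim$post93 +
            0.05 * sim$post93 * sim$anykids
sim$work <- rbinom(n, 1, p)

# Route 1: difference-in-differences of the four cell means.
cell <- function(post, kids) {
  mean(sim$work[sim$post93 == post & sim$anykids == kids])
}
did_means <- (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Route 2: interaction coefficient from the regression with no other covariates.
fit     <- lm(work ~ post93 * anykids, data = sim)
did_reg <- unname(coef(fit)["post93:anykids"])

# The two routes agree up to floating-point error.
all.equal(did_means, did_reg)
```

The equality holds only while the regression has no additional covariates. Once demographics or the unemployment rate are added, the interaction coefficient becomes a conditional estimate and will generally differ from the difference of raw means.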
