面板数据分位数回归的一个简单方法

面板数据分位数回归的一个简单方法
面板数据分位数回归的一个简单方法

Econometrics Journal (2011),volume 14,pp.368–386.

doi:10.1111/j.1368-423X.2011.00349.x

A simple approach to quantile regression for panel data

I VAN A.C ANAY ?

?

Department of Economics,Northwestern University,2001Sheridan Rd,Evanston,

IL 60208,USA.

E-mail:iacanay@https://www.360docs.net/doc/7d9869692.html,

First version received:May 2010;?nal version accepted:April 2011

Summary This paper provides a set of suf?cient conditions that point identify a quantile regression model with ?xed effects.It also proposes a simple transformation of the data that gets rid of the ?xed effects under the assumption that these effects are location shifters.The new estimator is consistent and asymptotically normal as both n and T grow.

Keywords:Deconvolution ,Panel data models ,Quantile regression ,Two-step estimator .

1.INTRODUCTION

Panel data models and quantile regression models are both widely used in applied econometrics and popular topics of research in theoretical papers.Quantile regression models allow the researcher to account for unobserved heterogeneity and heterogeneous covariates effects,while the availability of panel data potentially allows the researcher to include ?xed effects to control for some unobserved covariates.There has been little but growing work at the intersection of these two methodologies (e.g.Koenker,2004,Geraci and Bottai,2007,Abrevaya and Dahl,2008,Galvao,2008,Rosen,2009,and Lamarche,2010).This initial lack of attention is possibly due to a fundamental issue associated with conditional quantiles.This is,as it is the case with non-linear panel data models,standard demeaning (or differencing)techniques do not result in feasible approaches.These techniques rely on the fact that expectations are linear operators,which is not the case for conditional quantiles.This paper provides suf?cient conditions under which the parameter of interest is identi?ed for ?xed T and shows that there is a simple transformation of the data that eliminates the ?xed effects as T →∞,when the ?xed effects are viewed as location shift variables (i.e.variables that affect all quantiles in the same way).The resulting two-step estimator is consistent and asymptotically normal when both n and T go to in?nity.Also,the new estimator is extremely simple to compute and can be implemented in standard econometrics packages.The paper is organized as follows.Section 2presents the model.Section 3provides an identi?cation result based on deconvolution arguments.Section 4introduces a two-step estimator for panel data quantile regression models.Asymptotic properties of the new estimator are presented in the same section.Section 5includes a small Monte Carlo experiment to study the ?nite sample properties of the two-step estimator.Finally,Section 6concludes.Appendix A provides proofs of results.An estimator of the covariance kernel and the bootstrap method are given in Appendix B.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.Published by Blackwell Publishing Ltd,9600

Garsington Road,Oxford OX42DQ,UK and 350Main Street,Malden,MA,02148,USA.

Journal

The

Econometrics

A simple approach to quantile regression for panel data369

2.THE MODEL

Consider the following model

Y it=X itθ(U it)+αi,t=1,...,T,i=1,...,n,(2.1) where(Y it,X it)∈R×R k are observable variables and(U it,αi)∈R×R are unobservable.

Throughout the paper the vector X it is assumed to include a constant term,i.e.X

it =(1,X s it)

with X s

it ∈R k?1.The functionτ→X θ(τ)is assumed to be strictly increasing inτ∈(0,1)and

the parameter of interest is assumed to beθ(τ).Ifαi were observable it would follow that

P

Y it≤X itθ(τ)+αi|X i,αi

=τ,(2.2)

under the assumption that U it~U[0,1]conditional on X i=(X i1,...,X iT) andαi.This type of representation has been extensively used in the literature(e.g.Chernozhukov and Hansen, 2006,2008).The difference with the model in equation(2.1)and the standard quantile regression model introduced by Koenker and Bassett(1978)lies in the presence of the unobservedαi.This random variable could be arbitrarily related to the rest of the random variables in equation(2.1) (i.e.αi=αi(U it,X i,ηi)for some i.i.d.sequenceηi)rendering condition(2.2)as not particularly useful in terms of identi?cation.The question is under what additional conditions on(U it,αi)the parameterθ(τ)can be identi?ed and consistently estimated from the data.

Rosen(2009)recently showed that conditional on covariates quantile restriction alone does not identifyθ(τ).That is,let Q Z(τ|A)denote theτ-quantile of a random variable Z

conditional on another random variable A,let e it(τ)≡X

it [θ(U it)?θ(τ)],and write the model

in equation(2.1)as

Y it=X itθ(τ)+αi+e it(τ),Q e it(τ)(τ|X i)=0.(2.3)

Then,the conditional quantile restriction Q e

it (τ)

(τ|X i)=0does not have suf?cient identi?cation

power.1Rosen(2009)then provides different assumptions,i.e.support conditions and some form of conditional independence of e it(τ)across time,that(point and partially)identifyθ(τ).

Abrevaya and Dahl(2008)use the correlated random-effects model of Chamberlain(1982, 1984)as a way to get an estimator ofθ(τ).This model views the unobservableαi as a linear projection onto the observables plus a disturbance,i.e.

αi(τ,X i,ηi)=X i T(τ)+ηi.

The authors view the model as an approximation to the true conditional quantile and proceed to get estimates ofθ(τ)and T(τ)by running a quantile regression of Y it on X it and X i.In cases where there is no disturbanceηi,such a regression identi?esθ(τ).However,it is immediate to see that a quantile restriction alone does not identifyθ(τ)wheneverηi is present non-trivially since

the conditional behaviour of X

it θ(U it)+X

i

T(τ)+ηi depends on the joint distribution of the

unobservables U it andηi.This is problematic since not even a correctly speci?ed function for αi(τ,X i,ηi)helps in identifyingθ(τ),meaning that the correlated random-effects model might work poorly in many contexts.The simulations of Section5illustrate this point.

1Note that the distribution of e it(τ)need not be identical across t even when U it is i.i.d.,but that e it(τ)has the same τ-quantile for all t.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

370I.A.Canay

Koenker(2004)takes a different approach and treats{αi}n i=1as parameters to be jointly estimated withθ(τ)for q different quantiles.He proposes the penalized estimator

?θ,{?α

i}n i=1

≡argmin

(θ,{αi}n i=1)

q

k=1

n

i=1

T

t=1

ρτ

k

Y it?X itθ(τk)?αi

n

i=1

|αi|,(2.4)

whereρτ(u)=u[τ?I(u<0)],I(·)denotes the indicator function,andλ≥0is a penalization parameter that shrinks the?αs towards a common value.Solving equation(2.4) can be computationally demanding when n is large(even forλ=0)and has the additional complication involved in the choice ofλ.2

Finally,there is a related literature on non-separable panel data models.These type of models are?exible enough to provide quantile treatment effects(see,e.g.Chernozhukov et al.,2010, and Graham and Powell,2010).For example,Chernozhukov et al.(2010)show that the quantile treatment effect of interest is partially identi?ed(for?xed T)and provide bounds for those effects in the model

Y it=g0(X it,αi,U it),U it|X i,αi=d U it |X i,αi,(2.5) where X it is assumed discrete.They also derive rates of shrinkage of the identi?ed set to a point as T goes to in?nity.The model in Chernozhukov et al.(2010)is more general than the one in equation(2.1)as it is non-separable inαi and it involves weaker assumptions on the unobservable U it.However,it leads to less powerful identi?cation results and more complicated estimators.

In this context this paper contributes to the literature in two ways.The next section shows that when the model in equation(2.1)is viewed as a deconvolution model,a result from Neumann(2007)can be applied to show thatθ(τ)is identi?ed when there are at least two time periods available andαi has a pure location shift effect.3This identi?cation result could be potentially used to construct estimators ofθ(τ)based on non-parametric estimators of conditional distribution functions.Such non-parametric estimators would rarely satisfy the end-goal of this paper,that is,to provide an easy-to-use estimator that can be implemented in standard econometric packages,would suffer from the common curse of dimensionality,and would typically involve a delicate choice of tuning parameters for their implementation.4Thus,when moving from identi?cation to estimation,this paper takes a different approach and shows that there exists a simple transformation of the data that eliminates the?xed effectsαi as T→∞. The transformation leads to an extremely simple asymptotically normal estimator forθ(τ)that can be easily computed even for very large values of n.Standard errors for this new estimator can be computed from the asymptotically normal representation.

3.IDENTIFICATION

In this section,I prove that the parameter of interestθ(τ)is identi?ed for T≥2under independence restrictions and existence of moments.The intuition behind the result is quite 2Lamarche(2010)proposes a method to choseλunder the additional assumption thatαi and X i are independent. Galvao(2008)further extends this idea to dynamic panels.

3This means that ifαi captures unobserved covariates Z

i β(τ)that enter the model and are constant over time,such

variables must have coef?cients that are constant acrossτ,that is,β=β(τ)for allτ.

4The advantage of such non-parametric estimators would be their consistency for asymptotics in which n→∞and T remains?xed.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data 371

simple.Letting S t ≡X t θ(U t )(the dependence on i is omitted for convenience here),it follows from equation (2.1)that Y t =S t +αis a convolution of S t and αconditional on X ,provided αand U t are independent conditional on X .It then follows that the conditional distributions of S t and αcan be identi?ed from the conditional distribution of Y t by using a deconvolution argument similar to that in Neumann (2007).This in turn results in identi?cation of θ(τ)after exploiting the fact that U t is conditionally U [0,1]together with some regularity conditions.

For ease of exposition let T =2and consider the following assumption where the lower case x =(x 1,x 2)denotes a realization of the random variable X =(X 1,X 2).

A SSUMPTION 3.1.Denote by φS t |x and φα|x the conditional on X =x characteristic functions of the distributions P S t |x and P α|x ,respectively.Then (a)conditional on X =x the random

variables S 1,S 2and αare independent for all x ∈ˉX

,where ˉX denotes the support of X =(X 1,X 2);(b)the set ≡{ω:φS t |x (ω2?k )=0for t ∈{1,2}and φα|x (ω2?k )=0,k =0,1,...}

is dense in R for all x ∈ˉX

.Assumption 3.1follows Neumann (2007),but it does not impose P S 1|x =P S 2|x for all x ∈

ˉX

,which does not hold in equation (2.1).Assumption 3.1(a)implies that αi does not change across quantiles as αi is independent of (U 1,U 2).Assumption 3.1(b)excludes characteristic functions that vanish on non-empty open subsets of R but allows the characteristic function to have countably many zeros.This includes cases ruled out in the deconvolution analysis of Li and Vuong (1998)and Evdokimov (2010),among others.More generally,any deconvolution analysis based on Kotlarski’s Lemma (which has been widely applied to identify and estimate a variety of models in economics)will fail to include such cases.5For example,if α~U [?1,1],or if αis discrete uniform,φαhas countably many zeros and so Kotlarski’s Lemma would not apply while Assumption 3.1(b)would be satis?ed.Having this extra dimension of generality is important here as the distribution of αi is left unspeci?ed in this paper.

Assumption 3.1implies that Y t is a convolution of S t and αconditional on X =x and so

φ(Y 1,Y 2)|x (ω1,ω2)=E [exp(iω Y )]=E [exp(iω1S 1+iω2S 2+i (ω1+ω2)α)|X =x ]

(3.1)=φS 1|x (ω1)φS 2|x (ω2)φα|x (ω1+ω2).

(3.2)

L EMMA 3.1.Suppose that P S t |x and P α|x are distributions with characteristic functions φS t |x

and φα|x satisfying Assumption 3.1.Let ?P

α|x and ?P S t |x be further distributions with respective characteristic functions ?φ

α|x and ?φS t |x .If now φS 1|x (ω1)φS 2|x (ω2)φα|x (ω1+ω2)=?φ

S 1|x (ω1)?φS 2|x (ω2)?φα|x (ω1+ω2),?ω1,ω2∈R ,

(3.3)

and all x ∈ˉX

,then there exist constants c x ∈R such that ?P

S t |x =P (S t ?c x )|x and

?P

α|x =P (α+c x )|x .(3.4)

5

An extension of the result by Kotlarski for the case of characteristic functions with zeros was recently proposed by Evdokimov and White (2010)by using conditions on the derivatives of the characteristic functions.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

372I.A.Canay

That is,?P

S t |x and P S t |x for t ∈{1,2}as well as ?P α|x and P α|x are equal up to a location shift.

R EMARK 3.1.Lemma 3.1extends immediately to models that are slightly more general than the one in equation (2.1).For example,consider the case where Y it =q (U it ,X it )+αi and q (τ,

x )is strictly increasing in τfor all x ∈ˉX

.This model reduces to equation (2.1)if q (U it ,X it )=X

it

θ(U it ).Lemma 3.1holds in this case by letting S t =q (U t ,X t ).Another example would be a random coef?cients model where Y it =X

it

θi (U it )and θi (U it )=θ(U it )+?αi .This model can be written as Y it =X

it θ(U it )+αit and so the additional generality relative to equation (2.1)comes

from αit ≡X it

?αi varying across i and t .Lemma 3.1also applies to this model conditional on the event X 1=X 2.

R EMARK 3.2.It is worth noting that the identi?cation result in Lemma 3.1also applies to the correlated random effects model in Abrevaya and Dahl (2008)where

Y t =θ(U t )X t +α(U t ,X,η),

α(U t ,X,η)= 1(U t )X 1+ 2(U t )X 2+η.

(3.5)

Here Assumption 3.1must hold for φS t |x and φη|x where S t =(θ(U t )+ t (U t ))X t + ?t (U t )X ?t .Lemma 3.1identi?es the distributions up to location.Typically,to be able to identify the entire distribution one would need to add a location assumption.In the case of quantile regression the standard cross-section speci?cation assumes that U ~U [0,1]independent of X .The extension of this assumption to the panel case together with some additional regularity conditions allows identi?cation of the location of S t conditional on X =x as well as the parameter θ(τ).This is the role of Assumption 3.2.

A SSUMPTION 3.2.(a)U it ⊥(X i ,αi )and U it ~U [0,1];(b) UU ≡E [(θ(U it )?θμ)(θ(U it )?θμ) ],where θμ≡E [θ(U it )],is non-singular with ?nite norm;(c)letting X t =(1,X s t )for t =1,

2,there exists no A ?R k ?1such that A has probability 1under the distribution of X s 2

?X s 1and A is a proper linear subspace of R k ?1

;(d)(Y t ,X t )have ?nite ?rst moments for t ={1,2}.Assumption 3.2(a)is standard in quantile regression models except that here U it is also assumed independent of αi .Assumption 3.2(b)implies that θμ∈R k exists and this implies that the location of S t is well de?ned.The restriction on UU is not used in Lemma 3.2but it is important for the derivation of the asymptotic variance of the two-step estimator of the next section.Assumption 3.2(c)is a standard rank-type condition on the subvector of regressors that excludes the constant term.Assumption 3.2(d)is implied by Assumption 4.1in the next section.

It is immediate to see that Assumption 3.2(a)(b)implies E [Y 2?Y 1|X ]=(X s 2

?X s 1) θs μand θ0μ=E [Y 1]?E [X s 1] θs μ,where θ μ=(θ0μ,θs

μ).Assumption 3.2(c)(d)then implies that θμis

identi?ed.6Since E (S t |X )=X

t θμ,the location of S t conditional on X =x is identi?ed.Also,note that Assumption 3.2implies

Q S t |X =x (t |x )=θ[Q U t |X =x (t |x )]x =θ(τ)x.

(3.6)

The following Lemma then follows immediately.

6

This follows from A ={x ∈R k ?1:x (θ?μ?θ??μ)=0}being a proper linear subspace of R k ?1if θ?μ=θ??μ

.In addition,if X t does not include a constant then Assumption 3.2(c)must hold for X 2?X 1.C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data 373

L EMMA 3.2.Under Assumptions 3.1and 3.2the location θμand the function θ(τ)for τ∈(0,1)are identi?ed.

R EMARK 3.3.The extension to the case T >2is straightforward under the same assumptions.The exception is Assumption 3.1(ii),which should be replaced by ≡{ω:φS t |x (ω(T ?1)?k )=

0,for t =1,...,T ,k =0,1,...}is dense in R for all x ∈ˉX

;see Neumann (2007).The results of this section show that the parameter of interest θ(τ)is point identi?ed from the

distribution of the observed data and Assumptions 3.1and 3.2.The next step then is to derive a consistent estimator for this parameter and study its asymptotic properties.Section 4presents a two-step estimator that follows a different intuition relative to the one behind the identi?cation result,but has the virtue of being extremely simple to compute and has an asymptotically normal distribution as both n and T go to in?nity.

4.TWO-STEP ESTIMATOR

The two-step estimator that I introduce in this section exploits two direct consequences of Assumption 3.2and the fact that αi is a location shift (Assumption 3.1(a)).The ?rst implication is in equation (2.3),where only θ(τ)and e it (τ)depend on τ.The second implication arises by

letting u it ≡X

it

[θ(U it )?θμ]and writing a conditional mean equation for Y it as follows 7Y it =X

it θμ+αi +u it ,

E (u it |X i ,αi )=0.

(4.1)

Equation (4.1)implies that αi is also present in the conditional mean of Y it .Therefore,from

equation (4.1)I can compute a √T -consistent estimator of αi given a √nT -consistent estimator of θμ.This includes for example the standard within estimator of θμgiven in equation (A.17)in Appendix A.Then,using equation (2.3)I estimate θ(τ)by a quantile regression of the random

variable ?Y

it ≡Y it ??αi on X it .To be precise,the two steps are described below where I use the notation E T (·)≡T ?1

T t =1(·)and E nT (·)≡(nT )?1 T t =1 n i =1(·).

Step 1.Let ?θμbe a √nT -consistent estimator of θμ.De?ne ?αi ≡E T [Y it ?X it ?θμ].Step 2.Let ?Y it ≡Y it ??α

i and de?ne the two-step estimator ?θ(τ)as:?θ(τ)≡argmin θ∈

E nT ρτ ?Y it ?X it θ .(4.2)

Intuitively,the two-step estimator in equation (4.2)works because ?Y it Y ?it

≡Y it ?αi as T →∞,where denotes weak convergence.This is so because ?Y it ≡Y ?it +?r

i ,where ?r i ≡(αi ??αi )=E T (X it ) (?θμ?θμ)?E T [Y ?it ?X it θμ]→p 0,as T →∞.

(4.3)

Then,the random variable ?Y it converges in probability,as T →∞,to the variable Y ?it

which implies weak convergence,?Y it Y ?it .The next proposition shows that the two-step

estimator de?ned in equation (4.2)is consistent and asymptotically normal under the following assumptions.

7

Note that E (e it (τ)|X i )=X it [θμ?θ(τ)]=0unless θμ=θ(τ).Also,Pr(u it ≤0|X i )=Pr(X it θ(U it )≤

X it θμ|X i ) τ,depending on whether θ(U it ) θμ.C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

374I.A.Canay

A SSUMPTION 4.1.Let ?τ(u )=τ?I (u <0),I (·)denote the indicator function,W =

(Y ?,X ),and g τ(W ,θ,r )≡?τ(Y ??X θ+r )X .(a)(Y ?

it

,X it ,αi )are i.i.d.de?ned on the probability space (W ,F ,P ),take values in a compact set Y ×X ×A ,and E (αi )=0;(b)for all τ∈T ,θ(τ)∈int ,where is compact and convex and T is a closed subinterval of (0,

1);(c)Y ?∈Y has bounded conditional on X density a.s.,sup ?y ∈Y f (?y

)

?θ (θ,τ,r )that is continuous and has full

rank uniformly over ×T ×R ,and J 2(θ,τ,r )=?

?r (θ,τ,r )is uniformly continuous over ×T ×R .

Assumption 4.1imposes similar conditions to those in Chernozhukov and Hansen (2006).

Note that 4.1(a)imposes i.i.d.on the unobserved variable Y ?

it and not on Y it .Condition 4.1(c)is used for asymptotic normality.Finally,in order to derive the expression for the covariance kernel of the limiting Gaussian process provided in equation (4.7),I use the following assumption on

the preliminary estimator ?θ

μ.A SSUMPTION 4.2.The preliminary estimator ?θμadmits the expansion √nT (?θμ?θμ)=√

nT E nT (ψit )+o p (1),

(4.4)

where ψit is an i.i.d.sequence of random variables with E [ψit ]=0and ?nite ψψ=E [ψit ψ

it ].

T HEOREM 4.1.Let n /T s →0for some s ∈(1,∞).Under Assumptions 3.2,4.1and 4.2

sup τ∈T

(τ)?θ(τ) →p 0,and

nT (?θ

(·)?θ(·))=[?J 1(·)]?1√nT E nT {?τ(εit (τ))X it +J 2(·)ξit }+o p (1),(4.5) G (·)

in ∞(T ),

(4.6)

where εit (τ)≡Y ?it ?X it θ(τ),ξit ≡μ X ψit ?u it ,u it ≡Y ?it ?X

it θμ,μX ≡E [X it ],J 1(τ)≡J 1(θ(τ),τ,0),J 2(τ)≡J 2(θ(τ),τ,0),G (·)is a mean-zero Gaussian process with covariance function E G (τ)G (τ ) =J 1(τ)?1 (τ,τ )[J 1(τ )?1] , (τ,τ )is de?ned in equation (4.7)below,and ∞(T )in the set of all uniformly bounded functions on T .The matrix (τ,τ )is given by

(τ,τ )=S (τ,τ )+J 2(τ) ξg (τ )+ gξ(τ)J 2(τ )+J 2(τ) ξξJ 2(τ ) ,

(4.7)

where S (τ,τ )≡(min {τ,τ }?ττ )E (XX ), g ξ(τ)≡E [g τ(W ,θ(τ))ξ],and ξξ≡E [ξ2].

The asymptotic expansion for √

nT (?θ

(·)?θ(·))presented in Theorem 4.1has two terms.The ?rst term is the standard term in quantile regressions while the second term captures the

fact αi is being estimated by ?α

i .Assumption 4.2is not necessary for convergence to a limit Gaussian process but it is used to derive the expression of (τ,τ )in equation (4.7).This is a mild assumption as,for example,the usual within estimator of θμsatis?es Assumption 4.2under Assumptions 3.2and 4.1(see Lemma A.4in Appendix A).

Appendix B contains expressions for estimating each element entering the covariance

function J 1(τ)?1 (τ,τ )[J 1(τ )?1] that can be used to make inference on ?θ

(τ).Alternatively,I conjecture that the standard i.i.d.bootstrap on (Y i ,X i )would work in this context and therefore provide a different approach to inference (Appendix B).The proof of this conjecture is however

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data375 beyond the scope of this paper and the supporting evidence is limited to the simulations of the Section5.

5.SIMULATIONS

To illustrate the performance of the two-step estimator I conduct a small simulation study. Tables1and2summarize the results by reporting percentage bias(%Bias)and mean squared error(MSE)for each estimator considered.The simulated model is

Y it=( it?1)+ it X it+αi,

(5.1)

αi=γ(X i1+···+X iT+ηi)?E(αi),

Table1.Bias and MSE for Model5.1:γ=2and0.25.

n=100n=5000

θ(·)~N(2,1),θ[1](τ)=1.3255θ(·)~N(2,1),θ[1](τ)=1.3255 Estimator QR CRE INFE2-STEP QR CRE INFE2-STEP T=5%Bias1.75690.21570.00680.14941.74440.20740.00040.1496 MSE5.82200.27940.10320.14735.35430.07980.00200.0414 T=10%Bias1.76600.21270.00250.07931.76870.2063?0.00200.0757 MSE5.75570.16390.04940.06055.50210.07650.00100.0111 T=20%Bias1.77500.19030.00320.03771.80840.2070?0.00010.0397 MSE5.76780.10440.02280.02645.75060.07620.00050.0032θ(·)~exp(1)+2,θ[1](τ)=2.2877θ(·)~exp(1)+2,θ[1](τ)=2.2877 Estimator QR CRE INFE2-STEP QR CRE INFE2-STEP T=5%Bias1.02020.1291?0.00030.09761.02850.1367?0.00000.0961 MSE5.81060.20660.01740.10035.54340.10020.00030.0493 T=10%Bias1.03100.1184?0.00020.04871.03930.1368?0.00010.0489 MSE5.83060.11980.00870.02865.65930.09900.00020.0128 T=20%Bias1.05240.09190.00080.02151.05930.1361?0.00010.0209 MSE6.04320.06410.00450.00875.87770.09750.00010.0024

θ(·)~Mixture,θ[1](τ)=1.3097θ(·)~Mixture,θ[1](τ)=1.3097 Estimator QR CRE INFE2-STEP QR CRE INFE2-STEP T=5%Bias2.05650.53690.01900.25612.10610.53700.00120.1897 MSE7.68370.78550.51150.49637.56250.49650.00360.0682 T=10%Bias2.12210.56830.00260.04242.14330.54130.00230.0280 MSE7.97630.66790.14500.20887.82950.50160.00180.0046 T=20%Bias2.17700.6010?0.0039?0.00082.17500.5412?0.0013?0.0032 MSE8.32000.67440.05360.07638.06160.50010.00090.0013 Note:1000MC replications.QR:standard quantile regression estimator;CRE:correlated random effect estimator;INFE: infeasible estimator,2-STEP:two-step estimator from4.2.All regressions include an intercept term.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

376I.A.Canay

Table2.Bias and MSE for Model5.1:γ=2and0.90.

n=100n=5000

θ(·)~N(2,1),θ[1](τ)=3.2815θ(·)~N(2,1),θ[1](τ)=3.2815 Estimator QR CRE INFE2-STEP QR CRE INFE2-STEP T=5%Bias0.4069?0.1697?0.0065?0.12230.4321?0.1575?0.0002?0.1144 MSE2.40480.64550.15090.31622.02200.27430.00320.1439 T=10%Bias0.4039?0.1547?0.0069?0.06450.4079?0.1594?0.0006?0.0603 MSE2.18940.39950.07630.12281.80010.27680.00150.0406 T=20%Bias0.3777?0.13070.0013?0.02800.3833?0.15790.0001?0.0301 MSE1.84100.24660.03930.04791.58840.27000.00080.0105θ(·)~exp(1)+2,θ[1](τ)=4.3026θ(·)~exp(1)+2,θ[1](τ)=4.3026 Estimator QR CRE INFE2-STEP QR CRE INFE2-STEP T=5%Bias0.3120?0.1517?0.0015?0.08140.3134?0.1366?0.0002?0.0804 MSE2.57810.95980.47300.45831.83330.35660.00950.1264 T=10%Bias0.2911?0.14420.0012?0.04070.2984?0.1361?0.0005?0.0417 MSE2.04930.61000.23370.24441.65850.34790.00490.0363 T=20%Bias0.2748?0.12610.0008?0.02240.2823?0.1371?0.0003?0.0216 MSE1.81360.40230.11660.11811.48340.35010.00230.0107

θ(·)~Mixture,θ[1](τ)=3.335θ(·)~Mixture,θ[1](τ)=3.335 Estimator QR CRE INFE2-STEP QR CRE INFE2-STEP T=5%Bias0.4921?0.0843?0.0040?0.09840.4922?0.09480.0004?0.0958 MSE3.25730.24990.01960.17512.70500.10470.00040.1034 T=10%Bias0.4779?0.0808?0.0005?0.05840.4784?0.09490.0009?0.0568 MSE2.96310.12750.01010.05902.55420.10210.00020.0363 T=20%Bias0.4556?0.0732?0.0002?0.03190.4585?0.09530.0006?0.0325 MSE2.65370.07840.00480.01912.34480.10180.00010.0119 Note:1000MC replications.QR:standard quantile regression estimator;CRE:correlated random effect estimator;INFE: infeasible estimator,2-STEP:two-step estimator from4.2.All regressions include an intercept term.

where X it~Beta(1,1),ηi~N(0,1)and the distribution of it changes with the model speci?cation.In Model1, it~N(2,1);in Model2, it~exp(1)+2;while in Model3, it~B it N(1,.1)+(1?B it)N(3,.1)with B it~Bernoulli(p=0.3).The conditionalτth quantiles are given byθ[0](τ)+θ[1](τ)X+α.In Model1,for example,θ[0](τ)= ?1(τ)+1and θ[1](τ)= ?1(τ)+2for ?1(τ)the inverse of the normal CDF evaluated atτ.For all models I set n={100,5000},T={5,10,20},τ={0.25,0.90},γ=2and consider four estimators. These are a standard quantile regression estimator of Y it on X it(QR),a correlated random effect

estimator(CRE)from Abrevaya and Dahl(2008),an infeasible estimator of Y?

it ≡Y it?αi on

X it(INFE)and the two-step estimator(2-STEP)from Section4.8Note that QR and CRE are inconsistent estimators.Finally,I set the number of Monte Carlo simulations to1000.

8I use the within estimator of equation(A.17)as?rst step estimator.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data377

Table3.Bias and MSE for Model5.1:γ=2and n=100.

Model1Model2Model3

τ=0.25Estimator%Bias MSE Time%Bias MSE Time%Bias MSE Time

T=52-STEP0.14940.14690.0090.09760.10030.0090.25610.49630.009 KOEN0.10840.14730.133?0.00140.02190.1360.50380.55650.135 T=102-STEP0.07220.05790.0170.04870.02860.0180.04240.20880.019 KOEN0.05040.05450.258?0.00610.00860.2540.34880.28410.284 T=202-STEP0.03770.02640.0160.02150.00870.020?0.00080.07630.015 KOEN0.02230.02680.240?0.00240.00480.2410.19240.11290.233

τ=0.90Estimator%Bias MSE Time%Bias MSE Time%Bias MSE Time

T=52-STEP?0.12230.31620.009?0.08140.45830.013?0.09840.17510.011 KOEN?0.18600.53770.137?0.23821.25740.187?0.06170.06910.158 T=102-STEP?0.06450.12280.019?0.04070.24440.018?0.05840.05900.022 KOEN?0.05550.13710.262?0.08500.37080.239?0.01770.01800.270 T=202-STEP?0.02800.04790.021?0.02240.11810.024?0.03190.01910.026 KOEN?0.02800.05430.241?0.04840.15940.253?0.00810.00680.275 Note:1000MC replications.KOEN:Koenker’s estimator for panel data quantile regression,2-STEP:two-step estimator from4.2.All regressions include an intercept term.Time is reported in seconds.

Tables1and2show equivalent patterns.CRE has a larger bias than2-STEP and its bias does not improve as T grows.92-STEP does show a bias that decreases as T grows.This is consistent with the analysis presented in Sections2and4.Also,it is worth noticing that the bias of 2-STEP is not affected as N increases and T remains?https://www.360docs.net/doc/7d9869692.html,parisons based on MSE arrive to similar conclusions.Finally,additional simulations not reported show that these results also hold for other quantiles,and that the two-step estimator performs similarly whenαi is speci?ed as a non-linear function of X i andηi.

Table3reports bias,MSE and computational time for2-STEP and KOEN,the estimator proposed by Koenker(2004),see equation(2.4).Only the case n=100is shown since KOEN had problems handling the big matrices for the case n=5000.The results are quite mixed.2-STEP performs better than KOEN in about half of the cases.However,the worst performance of 2-STEP(bias:25%,MSE:0.50)is better than that of KOEN(bias:50%,MSE:1.26).Note also that2-STEP is about15times faster than KOEN.

Finally,Table4reports standard errors and95%con?dence intervals computed using the formulas provided in Appendix B.The coverage is very close to the nominal level when T= 20and below the nominal level for T=5.It is worth noting that the coverage is expected to deteriorate in two circumstances.Given a value of n,a smaller T implies a larger?nite sample bias and so a?nite sample distribution centred further away from the truth.In addition,given a value of T,a larger value of n keeps the?nite sample bias unaffected but implies a?nite sample distribution that is more concentrated about the wrong place.However,even for a case with12% of bias(model1,T=5),the actual coverage levels are about85%,which look very decent for such small values of T and bias above10%.

9The performance of CRE depends on the parameterγ,so asγgrows its performance deteriorates.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

378I.A.Canay

Table4.Standard Errors and95%Con?dence Intervals for Model5.1:γ=2and

n=100.

τ=0.25τ=0.90

T=5T=10T=20T=5T=10T=20

Estimator 1.5329 1.4194 1.3719Estimator 2.9057 3.0995 3.1822

Asy SE0.29340.21620.1555Asy SE0.35030.26200.1901 Model1Boot SE0.28680.21310.1537Boot SE0.36890.27130.1944 Asy Cov0.85400.92300.9420Asy Cov0.76000.89200.9240

Boot Cov0.83200.90300.9340Boot Cov0.79300.88700.9240

Estimator 2.5028 2.4046 2.3369Estimator 3.9575 4.1257 4.2009

Asy SE0.19610.12310.0773Asy SE0.53360.42900.3254 Model2Boot SE0.19040.11810.0738Boot SE0.56970.44470.3335 Asy Cov0.77600.82400.8910Asy Cov0.86200.90900.9220

Boot Cov0.72600.79800.8910Boot Cov0.87200.90900.9220

Estimator 1.6305 1.3778 1.3110Estimator 3.0058 3.1488 3.2298

Asy SE0.47510.39430.2630Asy SE0.22180.13200.0822 Model3Boot SE0.53640.44940.2918Boot SE0.23330.13680.0842 Asy Cov0.86300.92900.9460Asy Cov0.65200.70500.7400

Boot Cov0.87200.93100.9460Boot Cov0.68000.70500.7400 Note:1000MC replications.Asy SE:asymptotic standard errors.Boot SE:bootstrap standard errors.Asy Cov:Coverage of the asymptotic con?dence interval.Boot Cov:Coverage of the Bootstrap percentile interval.

6.DISCUSSION

This paper provided an identi?cation result for quantile regression in panel data models and introduced a two-step estimator that is attractive for its computational simplicity.There are many issues that remain to be investigated.First,several panels available have a short time span and therefore approximations taking T to in?nity might result in poor approximations for those cases. However,a computationally simple estimator that works for?xed T and large N is extremely challenging since,even under the assumption thatαi is independent of the rest of the variables of the model,we would still have to face similar problems to those discussed in Sections2and3. Second,the assumption thatαi does not depend on the quantiles restricts the type of unobserved heterogeneity that the model can handle.Improvements in any of these directions are important for future research.

ACKNOWLEDGMENTS

This paper was previously circulated under the title‘A Note on Quantile Regression for Panel Data Models.’I am deeply grateful to Jack Porter and Bruce Hansen for thoughtful discussions.I would also like to thank the Editor and two anonymous referees whose comments have led to an improved version of this paper.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data379

REFERENCES

Abrevaya,J.and C.M.Dahl(2008).The effects of birth inputs on birthweight:Evidence from quantile estimation on panel data.Journal of Business and Economic Statistics26,379–

97.

Chamberlain,G.(1982).Multivariate regression models for panel data.Journal of Econometrics18, 5–46.

Chamberlain,G.(1984).Panel data.In Z.Griliches and M. D.Intriligator(Eds.),Handbook of Econometrics,Volume2,1247–318.New York:Academic Press.

Chernozhukov,V.,I.Fern′a ndez-Val,J.Hahn,and W.Newey(2010).Average and quantile effects in nonseparable panel models.Working paper,M.I.T.,Cambridge.

Chernozhukov,V.and C.Hansen(2006).Instrumental quantile regression inference for structural and treatment effect models.Journal of Econometrics123,491–525.

Chernozhukov,V.and C.Hansen(2008).Instrumental variable quantile regression:a robust inference approach.Journal of Econometrics142,379–98.

Evdokimov,K.(2010).Identi?cation and estimation of a nonparametric panel data model with unobserved heterogeneity.Working paper,Princeton University.

Evdokimov,K.and H.White(2010).An extension of a lemma of kotlarski.Manuscript,Princeton University,New Jersey.

Galvao,A.(2008).Quantile regression for dynamic panel data with?xed effects.Working paper,University of Illinois.Urbana-Champaign.

Geraci,M.and M.Bottai(2007).Quantile regression for longitudinal data using the asymmetric laplace distribution.Biostatistics8,140–54.

Graham,B.S.and J.Powell(2010).Identi?cation and estimation of average partial effects in“irregular”

correlated random coef?cient panel data models.Working paper,New York University.

Koenker,R.(2004).Quantile regression for longitudinal data.Journal of Multivariate Analysis91, 74–89.

Koenker,R.(2005).Quantile Regression.Econometric Society Monograph,ESM38,Cambridge: Cambridge University Press.

Koenker,R.and G.Bassett(1978).Regression quantiles.Econometrica46,33–50.

Lamarche, C.(2010).Robust penalized quantile regression estimation for panel data.Journal of Econometrics157,396–408.

Li,T.and Q.Vuong(1998).Nonparametric estimation of the measurement error model using multiple indicators.Journal of Multivariate Analysis65,139–65.

Neumann,M.(2007).Deconvolution from panel data with unknown error distribution.Journal of Multivariate Analysis98,1955–68.

Powell,J.(1991).Estimation of monotonic regression models under quantile restrictions.In W.Barnett, J.Powell,and G.Tauchen(Eds.),Nonparametric and Semiparametric Methods in Econometrics, 357–84.Cambridge:Cambridge University Press.

Rosen,A.(2009).Set identi?cation via quantile restrictions in short panels.Working paper,University College,London.

van der Vaart,A.W.(1998).Asymptotic Statistics.Cambridge:Cambridge University Press.

van der Vaart,A.W.and J.A.Wellner(1996).Weak Convergence and Empirical Processes.New York: Springer-Verlag.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

380I.A.Canay

APPENDIX A:PROOF OF THE LEMMAS AND THEOREMS Throughout the Appendix I use the following notation.For W=(Y?,X),

g→E nT g(W)≡

1

nT

T

t=1

n

i=1

g(W it),g→G nT g(W)≡

1

nT

T

t=1

n

i=1

[g(W it)?Eg(W it)].

In addition, denotes weak convergence,CMT refers to the Continuous Mapping Theorem and LLN refers to the Law of Large Numbers.The symbols o,O,o p and O p denote the usual order of magnitudes for non-random and random sequences.

Proof of Lemma3.1:The proof is a simple extension of the result Lemma2.1from Neumann(2007).I

write it here for completeness.SinceφS

t|x andφα|x are characteristic functions there exists anω0>0such

thatφS

t|x (ω)=0for t∈{1,2}andφα|x(2ω)=0if|ω|≤ω0(In this case,ω0might depend on x but we

omit this dependence for simplicity).For|ω|≤2ω0de?ne,

gα|x(ω)=?φα|x(ω)/φα|x(ω),

a continuous complex function which equals1at0.It follows from equation(3.3),forω1,ω2∈[?ω0,ω0], that

φα|x(ω1+ω2)φα|x(ω1)φα|x(ω2)=φα|x(ω1

+ω2)φS

1|x

(ω1)φS

2|x

(ω2)

φα|x(ω1+0)φS1|x(ω1)φS2|x(0)φα|x(0+ω2)φS1|x(0)φS2|x(ω2)

=

α|x

(ω1+ω2)?φS1|x(ω1)?φS2|x(ω2)

α|x

(ω1?φS

1|x

(ω1)?φS

2|x

(0)?φα|x(0+ω2)?φS

1|x

(0)?φS

2|x

(ω2)

=

α|x

(ω1+ω2)

α|x

(ω1)?φα|x(ω2)

,

(A.1)

which implies

gα|x(ω1+ω2)=gα|x(ω1)gα|x(ω2)?ω1,ω2∈[?ω0,ω0].

The unique solution to this equation satisfying gα|x(0)=1and gα|x(?ω)=gα|x(ω)(a Hermitian function) is gα|x(ω)=e icω,from some real c.Therefore we conclude that,

α|x

(ω)=e icωφα|x(ω)?ω∈[?2ω0,2ω0].(A.2) Furthermore,equation(3.3)yields that forω2=0,

φS

1|x (ω1)φα|x(ω1)=e icω1φα|x(ω1)?φS

1|x

(ω1),?ω1∈[?2ω0,2ω0],

so that,

S1|x (ω)=e?icωφS

1|x

(ω)?ω∈[?2ω0,2ω0].(A.3)

Settingω1=0it follows that equation(A.3)holds for S2as well.Now it remains to extend these results to the whole real line.Letω∈ be arbitrary.We obtain,analogously to equation(A.1),that

φα|x(ω) (φα|x(ω/2))2=

α|x

(ω)

(?φα|x(ω/2))2

and

φα|x(ω)

(φα|x(ω2?k))2k

=

α|x

(ω)

(?φα|x(ω2?k))2k

,

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data 381

after https://www.360docs.net/doc/7d9869692.html,ing this equation with a k large enough such that |ω2?k |≤2ω0we conclude from equation (A.2)that

α|x (ω)=φα|x (ω) ?φα|x (ω2?k

)φα|x (ω2?k )

2k =e icωφα|x (ω).Since is dense in R and {ω:?φ

α|x (ω)=e icωφα|x (ω)}is a closed set,we conclude that ?φα|x (ω)=e icωφα|x (ω),?ω∈R ,this is,?P

α|x =P (α+c )|x .This implies,again by equation (3.3)that ?φS t |x (ω)=e

?icωφS t |x (ω),?ω∈R ,which yields ?P S t |x =P (S t ?c )|x for t ∈{1,2}. L EMMA A.1.Under Assumptions 3.2and 4.1,the following statements are true.(a)G nT g ·(W ,θ(·))

G ?1(·)in ∞(T ),where G ?1is a Gaussian process with covariance function E G ?1(τ)G ?1(τ ) =(min {τ,τ

}?

ττ )E (XX ).(b)If sup τ∈T

θ(τ)?θ(τ) =o p (1)and max i ≤n | r i |=o p (1),then sup τ∈T

G nT g τ(W , θ(τ), r i )?G nT g τ(W ,θ(τ),0) =o p (1).

Proof:The proof follows by similar arguments to those in Lemma B.2of Chernozhukov and Hansen

(2006)after noticing that the class of functions,H ={h =(θ,τ,r )→?τ(Y ??X θ+r )X,θ∈ ,τ∈T ,r ∈R }is Donsker (it is formed by taking products and sums of bounded Donsker classes)by Theorem 2.10.6in van der Vaart and Wellner (1996). L EMMA A.2.Under Assumptions 3.2and 4.1,max i ≤n |?r i |≡max i ≤n |αi ??αi |→0provided n

T s

→0

for some s ∈(1,∞).

Proof:Let u it ≡Y ?it

?X it θμand note that by the triangle inequality,|αi ??αi |≤|E T (X it )| |?θ

μ?θμ|+|E T (u it )|.X it and Y it have compact support so that max i ≤n |X it |≤C x <∞and max i ≤n |u it |≤C z <∞.It is immediate

then that

max i ≤n

|E T (X it )| |?θ

μ?θμ|=o p (1).Since E (u it )=0and E (|u it |2s )≤C 2s

z

<∞for all s ∈(1,∞),it follows from the Markov inequality that for any η>0,Pr(|E T (u it )|>η)=O (T ?s )and then

Pr

max i ≤n

|E T (u it )|>η ≤n Pr(|E T (u it )|>η)=O (n/T s )=o (1).

Proof of Theorem 4.1:

Consistency .De?ne the following two criterion functions,

Q nT (θ,τ)=E nT ρτ ?Y it ?X it θ and Q (θ,τ)=E ρτ Y ?it ?X it θ .

The ?rst step shows that Q nT (θ,τ)converge uniformly to Q (θ,τ).To this end,note that since ?Y

it Y ?it ,it follows from van der Vaart (1998,Lemma 2.2)that

E ρτ ?Y it ?X it θ ?E ρτ Y ?it ?X it θ →0,as T →∞,

since ρτ(·)is a bounded Lipschitz function (it is bounded due to 4.1(a)and 4.1(b)).Due to the compactness

of ×T and the continuity of E [ρτ(?Y

it ?X it θ)]and E [ρτ(Y ?it ?X it θ)]implied by 4.1(c),the above convergence is also uniform,

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

382

I.A.Canay

sup

(θ,τ)∈ ×T

|E [ρτ(?Y

?X θ)]?E [ρτ(Y ??X θ)]|→0,as T →∞.(A.4)

Next note that functions in the class F =[(θ,τ)?→ρτ(Y ?X θ)]are bounded,uniformly Lipschitz

over ×T and form a Donsker class.This also means that F is Glivenko-Cantelli so that,

sup

(θ,τ)∈ ×T

sup |E nT [ρτ(?Y

?X θ)]?E [ρτ(?Y ?X θ)]|→p 0,as n,T →∞.(A.5)

It follows from equations (A.4)and (A.5)that Q nT (θ,τ)converges uniformly to Q (θ,τ)as both n

and T go to in?nity.Under Assumption 4.1(c)θ(τ)uniquely solves θ(τ)≡arg inf θ∈ E [ρτ(Y ??X θ)]and Q (θ,τ)is continuous over ×T so that

sup τ∈T

(τ)?θ(τ) →p 0,after invoking Chernozhukov and Hansen (2006,Lemma B.1).

Asymptotic Normality .From the properties of standard quantile regression it follows that

√nT E nT g ·(W ,?θ

(·),?r i )is o p (1),and then the following expansion is valid,o p (1)=G nT g ·(W ,?θ

(·),?r i )+1√nT T t =1n i =1

Eg ·(W ,?θ(·),?r i )(1)

=G nT g ·(W ,θ(·))+o p (1)+1√nT T

t =1n i =1Eg ·(W ,?θ(·),?r i )in ∞(T ).

Here =(1)follows from Lemma A.1.Now,expand Eg ·(W ,?θ

(·),?r i ).1√

nT T t =1n i =1Eg ·(W ,?θ(·),?r i )=1√nT T t =1n i =1

J 1 θ?(·),·,r ?i (?θ(·)?θ(·))+J 2 θ?(·),·,r ?i ?r i =J 1(·)√

(·)?θ(·))+J 2(·)1√nT T t =1n i =1

?r i +o p (1),where θ?is on the line connecting ?θ

(τ)and θ(τ)for each τand r ?i is on the line connecting 0and ?r i .The second equality follows from

sup i ≤n,τ∈T

J k θ?(·),τ,r ?i

?J k (θ(τ),τ,0) =o p (1),for k =1,2,

(A.6)which in turn follows from the uniform continuity assumption,and the fact that max i ≤n |?r

i |=o p (1)by Lemma A.2.Solving for √nT (?θ(·)?θ(·)),

nT (?θ

(·)?θ(·))=[?J 1(·)]?1

G nT g τ(W ,θ(·))+[?J 1(·)]

?1

J 2(·)1√n n

i =1

T ?r

i +o p (1) G (·),

in ∞(T ),

where G (·)is a gaussian process with covariance kernel J 1(τ)?1 (τ,τ )[J 1(τ )?1] ,where (τ,τ )is de?ned in equation (A.10).This follows from the ?rst term converging to a Gaussian process by Lemma A.1,and the second term

1√

n n i =1√?r i =1√n n

i =1

√ E T (X it ) (?θμ?θμ)?E T Y ?it ?X it θμ ,=E nT (X it ) √nT (?θ

μ?θμ)?√nT E nT Y ?it ?X it θμ ,C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data 383

being asymptotically normal due to √nT (?θμ?θμ),and 1√nT

T t =1 n

i =1(Y ?it ?X it θμ)both being asymptotically normal.

Covariance Kernel .We ?rst need to derive the expression for (τ,τ

).Under Assumption 4.2we can

write the expansion √nT (?θ

(·)?θ(·))as √nT (?θ(·)?θ(·))=[?J 1(·)]?1 G nT g τ(W ,θ(·))+J 2(·)1√n n i =1

√T ?r

i +o p (1),=[?J 1(·)]?1√nT {g ·(W it ,θ(·))+J 2(·)ξit }+o p (1),

(A.7)where ξit ≡μ X ψit ?u it ,μX ≡E (X ),and u it ≡Y ?it ?X

it θμ.In addition,

J 1(τ)=E [f ε(τ)(0|X )XX ],

where f ε(τ)(0|X )denotes the conditional on X density of ε(τ)≡Y ??X θ(τ)at 0and

J 2(τ)=E [f ε(τ)(0|X )X ].

Under Assumptions 3.2and 4.1it follows that

√nT E nT g ·(W it ,θ(·))ξit

G ?(·),

in ∞(T ),

(A.8)

where G ?(·)is a zero-mean gaussian process with covariance kernel

(τ,τ )≡ S (τ,τ

) gξ

(τ) ξg (τ ) ξξ

,

(A.9)

and S (τ,τ )≡(min {τ,τ }?ττ )E (XX ), g ξ(τ)≡E [g τ(W ,θ(τ))ξ],and ξξ≡E [ξ2].The above result

implies that

(τ,τ )≡E [(g τ(W ,θ(τ))+J 2(τ)ξit )(g τ (W ,θ(τ ))+J 2(τ )ξit ) ]

=S (τ,τ )+J 2(τ) ξg (τ )+ gξ(τ)J 2(τ )+J 2(τ) ξξJ 2(τ ) .

(A.10)

We can conclude from equation (A.7)that

nT (?θ(·)?θ(·)) G (·),in ∞(T ),

(A.11)

where G (·)is a gaussian process with covariance kernel

E G (τ)G (τ ) =J 1(τ)?1 (τ,τ )[J 1(τ )?1] .

(A.12)

L EMMA A.3.If ||v i ,T ?v ||→0a.s.uniformly in i as T →∞,and there exists a function q i ≥0such

that ||v i ,T ||≤q i for all i and T with E [sup i q i ]<∞,then sup i E v i,T ?v →0,as T →∞.Proof:Let h i ,T =||v i ,T ?v ||and note that 0≤sup i h i,T ≤2sup i q i .By Fatou’s Lemma

2E

sup i

q i =E lim inf T →∞

2sup i

q i ?sup i

h i,T

≤lim inf T →∞

E 2sup i

q i ?sup i

h i,T ,(A.13)

≤2E sup i

q i ?lim sup T →∞

E

sup i

h i,T ,

(A.14)

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

384I.A.Canay

meaning that lim sup T→∞E[sup i h i,T]≤0.Then,the result directly follows from

0≤lim sup

T→∞sup

i

E[h i,T]≤lim sup

T→∞

E

sup

i

h i,T

≤0.(A.15)

L EMMA A.4.Assume XX≡E[(X s it?μs X)(X s it?μs X) ]is non-singular with?nite norm,n T a→0for some a∈(0,∞)and let Assumptions3.2and4.1hold.The within estimator ofθμsatis?es Assumption4.2

with the in?uence function

ψit=

ψ0

it

ψs

it

Y it?μY?μs X ?1XX

X s

it

?μs X

u it

?1

XX

X s

it

?μs X

u it

,(A.16)

where X

it =(1,X s it),μs X≡E(X s it),μY≡E(Y it),u it is i.i.d.with E[u it|X i]=0and E[u2it|X i]=

X

it

UU X it,and UU non-singular with?nite norm.

Proof:Use the partitionθ

μ

=(θ0μ,θs μ),whereθ0μ∈R andθsμ∈R k?1.Then

?θs μ≡

E nT

?X s

it

?X s

it

?1

E nT

?X s

it

Y it

,and?θ0

μ

≡E nT(Y it)?E nT

X s

it

?θs

μ

,(A.17)

where?X s

it ≡X s it?ˉX s i andˉX s i≡E T(X s it).By equations(4.1)and(A.17)it follows that

?θs

μ

?θsμ

=

E nT

?X s

it

?X s

it

?1√

E nT

?X s

it

u it

,(A.18)

where u it≡X it(θ(U it)?θμ).By Assumption3.2,E[u it]=0,E[u2it|X it]=X it UU X it,and UU is non-singular and has?nite norm.By Assumptions3.2and4.1

1 nT

n

i=1

T

t=1

?X s

it

?X s

it

= XX+o p(1),as n,T→∞,(A.19)

for XX non-singular and therefore

√nT

?θs

μ

?θsμ

= ?1XX1

nT

n

i=1

T

t=1

?X s

it

u it+o p(1).(A.20)

Next note that

√nT

?θs

μ

?θsμ

= ?1XX

1

nT

n

i=1

T

t=1

X s

it

?μs X

u it?

1

nT

n

i=1

T

t=1

ˉX s

i

?μs X

u it

+o p(1),≡1

nT

n

i=1

T

t=1

ψs

it

? ?1XX1

nT

n

i=1

T

t=1

ˉX s

i

?μs X

u it+o p(1),(A.21)

meaning that the result for the slope coef?cients would follow provided the second term in equation(A.21) is o p(1).To show this,write this term as

1√nT

n

i=1

T

t=1

ˉX s

i

?μs X

u it=

1

n

n

i=1

ˉX s

i

?μs X

1

T

T

t=1

u it=

1

n

n

i=1

?i,T,(A.22)

where?i,T is i.i.d.across i for all T and satis?es E[?i,T]=0and

E ?i,T 2=E

ˉX s

i

?μs X

1

T

T

t=1

u it

2

≤E

ˉ

X s

i

?μs X

E

1

T

T

t=1

u it

(A.23)

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

A simple approach to quantile regression for panel data385

≤sup

i E

ˉ

X s

i

?μs X

×O(1)→0,as T→∞,(A.24)

since sup i E ˉX s

i ?μs X →0as T→∞by Lemma A.4(note that by Assumption4.1,the function q i is

just the upper bound of the support X)and the fact thatˉX s

i →pμs X uniformly over i as T→∞by similar

arguments to those in Lemma A.2provided n

T a →0for some a∈(0,∞).

Finally,from equation(A.17),θ0

μ=μY?μs Xθsμ,the expansion in equation(A.21),and a few algebraic

manipulations,it follows that

nT(?θ0

μ?θ0μ)=

nT

E nT(Y it)?μY+μs Xθsμ?E nT

X s

it

?θs

μ

=

nT E nT[Y it?μY]?μs X

nT

?θs

μ

?θsμ

+o p(1)

=

nT E nT

Y it?μY?μs Xψs it

+o p(1).(A.25)

Lettingψ0

it

≡Y it?μY?μs Xψs it=Y it?μY?μs X ?1XX(X s it?μs X)u it,the result follows.

APPENDIX B:ESTIMATOR OF THE COV ARIANCE KERNEL AND THE

BOOTSTRAP

The components of the asymptotic variance in equation(A.10)can be estimated using sample analogs. The expressions below correspond to the case where?θμis the within estimator and soψit is given by equation(A.16).They can be naturally extended to cover any other preliminary estimator satisfying Assumption4.2.

The matrix S(τ,τ )can be estimated by its sample counterpart

?S(τ,τ )≡(min{τ,τ }?ττ )1

nT

n

i=1

T

t=1

X it X

it

.(B.1)

For the matrices J1(τ)and J2(τ),I follow Powell(1991)and propose

?J 1(τ)≡

1

2nT h n

n

i=1

T

t=1

I(|?εit(τ)|≤h n)X it X

it

,

?J 2(τ)≡

1

2nT h n

n

i=1

T

t=1

I(|?εit(τ)|≤h n)X it,

where?εit(τ)≡?Y it?X it?θ(τ),?Y it≡Y it??αi,and h n is an appropriately chosen bandwidth such that h n→

0and nT h2

n

→∞.Following Koenker(2005,pp.81and140),one possible choice is

h n=κ( ?1(τ+b n)? ?1(τ?b n)),b n=(nT)?1/3z2/3α

1.5φ2( ?1(τ))

2 ?1(τ)2+1

1/3

,(B.2)

whereκis a robust estimate of scale,zα= ?1(1?α/2)andαdenotes the desired size of the test.

For the terms involvingξit and gτ(W,θ(τ))de?ne

it≡

Y it??μY??μs X? ?1XX?X s it?u it

? ?1

XX

?X s

it

?u it

,(B.3)

where?X s

it ≡X s it?ˉX s i,?u it≡?Y it?X it?θμ,?μY≡E nt(Y it),?μs X≡E nT(X s it)and? XX≡E nT[?X s it?X s it].This

way,we can de?ne?ξit≡?μ X?ψit??u it,where?μX≡E nT(X it).Finally,letting?gτ,it≡?τ(?εit(τ))X it we have C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

386I.A.Canay the following sample counterparts for the remaining terms

?

gξ≡

1

nT

n

i=1

T

t=1

?gτ,it?ξit,and? ξξ≡

1

nT

n

i=1

T

t=1

?ξ2

it

.(B.4)

The estimator of the covariance matrix would then be?J1(τ)?1? (τ,τ )[?J1(τ )?1] ,where? (τ,τ )is the matrix in equation(A.10)where all matrices have been replaced by their respective sample analogs.

In the simulations of Section5I use the following bootstrap algorithm to compute standard errors and

con?dence intervals for?θ(τ).Let{Y?

i ,X?

i

}n i=1,j=1,...,B,denote the j th i.i.d.sample of size n distributed

according to?P n,the empirical measure of{Y i,X i}n i=1,where Y i=(Y i1,...,Y iT)and X i=(X i1,...,X iT). For each j=1,...,B compute the two step estimator as described in Section4and denote this estimator by

?θ?j (τ).This involves computing preliminary estimators?θ?

μ,j

and?xed effects?α?

i,j

for each bootstrap sample

j=1,...,B.

The bootstrap estimate of the variance covariance matrix for?θ(τ)is given by

1 B

B

j=1

?θ?

j

(τ)?ˉθ?(τ)

?θ?

j

(τ)?ˉθ?(τ)

,(B.5)

whereˉθ?(τ)≡1 B

j=1

?θ?

j

(τ).In the simulations of Section5I also report the coverage of the1?α

percentile interval

C?n (τ)=

q?

n

(α/2,τ),q?

n

(1?α/2,τ)

,(B.6)

where q?

n (α,τ)is theα-quantile of the empirical distribution of{?θ?

j

(τ)}B

j=1

.This con?dence interval is

translation invariant,which is a good property when working with quantile regressions.Symmetric and equally-tailed intervals can be alternatively computed using the same algorithm.

C 2011The Author(s).The Econometrics Journal C 2011Royal Economic Society.

面板数据模型

第十讲经典面板数据模型 一、面板数据(panel data) 一维数据: 时间序列数据(cross section data):变量在时间维度上的数据截面数据(time series data):变量在截面空间维度上的数据)。 二维数据: 面板数据(同时在时间和截面空间上取得的,也称时间序列截面数据(time series and cross section data)或混合数据(pool data)。 面板数据=截面数据+时间序列数据。

面板数据用双下标变量表示。例如 y i t, i = 1, 2, …, N; t = 1, 2, …, T N表示面板数据中含有N个个体。T表示时间序列的最大长度。若固定t不变,y i ., ( i = 1, 2, …, N)是随机变量在横截面上的N个数据;若固定i不变,y. t, (t = 1, 2, …, T)是纵剖面上的一个时间序列(个体)。 平衡面板数据(balanced panel data)。 非平衡面板数据(unbalanced panel data)。 例1998-2002年中国东北、华北、华东15个省级地区的居民家庭人均消费(不变价格)和人均收入数据见表1。人均消费和收入两个面板数据都是平衡面板数据,各有15个个体。

表1.中国部分省级地区的居民数据(不变价格,元)

二、面板数据模型及其作用 1.经典面板数据模型 建立在古典假定基础上的线性面板数据模型. 2.非经典面板数据模型 (1)非平稳时间序列问题的面板数据模型(面板数据协整模型) (2)非线性面板数据模型(如面板数据logit模型, 面板数据计数模型模型) (3)其他模型(如面板数据分位数回归模型) 3.面板数据模型作用 (1)描述个体行为差异。

面板数据分析简要步骤与注意事项(面板单位根—面板协整—回归分析)

面板数据分析简要步骤与注意事项(面板单位根检验—面板协整—回归分析) 面板数据分析方法: 面板单位根检验—若为同阶—面板协整—回归分析 —若为不同阶—序列变化—同阶建模随机效应模型与固定效应模型的区别不体现为R2的大小,固定效应模型为误差项和解释变量是相关,而随机效应模型表现为误差项和解释变量不相关。先用hausman检验是fixed 还是random,面板数据R-squared值对于一般标准而言,超过0.3为非常优秀的模型。不是时间序列那种接近0.8为优秀。另外,建议回归前先做stationary。很想知道随机效应应该看哪个R方?很多资料说固定看within,随机看overall,我得出的overall非常小0.03,然后within是53%。fe和re输出差不多,不过hausman检验不能拒绝,所以只能是re。该如何选择呢? 步骤一:分析数据的平稳性(单位根检验) 按照正规程序,面板数据模型在回归前需检验数据的平稳性。李子奈曾指出,一些非平稳的经济时间序列往往表现出共同的变化趋势,而这些序列间本身不一定有直接的关联,此时,对这些数据进行回归,尽管有较高的R平方,但其结果是没有任何实际意义的。这种情况称为称为虚假回归或伪回归(spurious regression)。他认为平稳的真正含义是:一个时间序列剔除了不变的均值(可视为截距)和时间趋势以后,剩余的序列为零均值,同方差,即白噪声。因此单位根检验时有三种检验模式:既有趋势又有截距、只有截距、以上都无。 因此为了避免伪回归,确保估计结果的有效性,我们必须对各面板序列的平稳性进行检验。而检验数据平稳性最常用的办法就是单位根检验。首先,我们可以先对面板序列绘制时序图,以粗略观测时序图中由各个观测值描出代表变量的折线是否含有趋势项和(或)截距项,从而为进一步的单位根检验的检验模式做准备。单位根检验方法的文献综述:在非平稳的面板数据渐进过程中,Levin andLin(1993)很早就发现这些估计量的极限分布是高斯分布,这些结果也被应用在有异方差的面板数据中,并建立了对面板单位根进行检验的早期版本。后来经过Levin et al.(2002)的改进,提出了检验面板单位根的LLC法。Levin et al.(2002)指出,该方法允许不同截距和时间趋势,异方差和高阶序列相关,适合于中等维度(时间序列介于25~250之间,截面数介于10~250之间)的面板单位根检验。Im et al.(1997)还提出了检验面板单位根的IPS法,但Breitung(2000)发现IPS法对限定性趋势的设定极为敏感,并提出了面板单位根检验的Breitung法。Maddala and Wu(1999)又提出了ADF-Fisher和PP-Fisher面板单位根检验方法。 由上述综述可知,可以使用LLC、IPS、Breintung、ADF-Fisher和PP-Fisher5种方法进行面板单位根检验。 其中LLC-T、BR-T、IPS-W、ADF-FCS、PP-FCS、H-Z分别指Levin,Lin&Chu t*

最新张晓峒分位数回归讲义

第15章分位数回归模型 15.1 总体分位数和总体中位数 15.2 总体中位数的估计 15.3 分位数回归 15.4 分位数回归模型的估计 15.5 分位数回归模型的检验 15.6 分位数的计算与分位数回归的EViews操作 15.7 分位数回归的案例分析 以往介绍的回归模型实际上是研究被解释变量的条件期望。人们当然也关心解释变量与被解释变量分布的中位数,分位数呈何种关系。这就是分位数回归,它最早由Koenker和Bassett(1978)提出,是估计一组回归变量X与被解释变量Y的分位数之间线性关系的建模方法。 正如普通最小二乘OLS回归估计量的计算是基于最小化残差平方和一样,分位数回归估计量的计算也是基于一种非对称形式的绝对值残差最小化,其中,中位数回归运用的是最小绝对值离差估计(LAD,least absolute deviations estimator)。它和OLS主要区别在于回归系数的估计方法和其渐近分布的估计。在残差检验、回归系数检验、模型设定、预测等方面则基本相同。 分位数回归的优点是,(1)能够更加全面的描述被解释变量条件分布的全貌,而不是仅仅分析被解释变量的条件期望(均值),也可以分析解释变量如何影响被解释变量的中位数、分位数等。不同分位数下的回归系数估计量常常不同,即解释变量对不同水平被解释变量的影响不同。 另外,中位数回归的估计方法与最小二乘法相比,估计结果对离群值则表现的更加稳健,而且,分位数回归对误差项并不要求很强的假设条件,因此对于非正态分布而言,分位数回归系数估计量则更加稳健。 15.1 总体分位数和总体中位数 在介绍分位数回归之前先介绍分位数和中位数概念。 对于一个连续随机变量y,其总体第τ分位数是y(τ)的定义是:y小于等于y(τ)的概率是τ,即τ = P( y≤y(τ)) = F(y(τ)) 其中P(?)表示概率,F(y(τ)) 表示y的累积(概率)分布函数(cdf)。 比如y(0.25) = 3,则意味着y≤ 3的概率是0.25。且有 y(τ) = F-1(y(τ)) 即F(y(τ))的反函数是y(τ)。当τ=0.5时,y(τ)是y的中位数。τ= 0.75时,y(τ)是y的第3/4分位数,τ= 0.25时,y(τ) 是y的第1/4分位数。若y服从标准正态分布,y(0.5) = 0,y(0.95) =1.645,y(0.975) =1.960。 另外,如果随机变量y的分布是对称的,那么其均值与中位数是相同的。当其中位数小于均值时,分布是右偏的。反之,分布是左偏的。 对于回归模型,被解释变量y t对以X为条件的第τ分位数用函数y(τ)t|X表示,其含义是:以X为条件的y t小于等于y(τ)t|X的概率是τ。这里的概率是用y t对X的条件分布计算的。且有 y(τ)t|X= F-1(y(τ)t|X) 其中F(y(τ)t|X) 是y t在给定X条件下的累积概率分布函数(cdf)。则y(τ)t|X称作被解释变量y t对X 的条件分位数函数。而F '(y(τ)t|X)= f (y(τ)t|X)则称作分位数概率密度函数。其中F'(y(τ)t|X)表示F(y(τ)t|X)

面板数据分析简要步骤与注意事项 面板单位根—面板协整—回归分析

面板数据分析简要步骤与注意事项 (面板单位根—面板协整—回归分析)步骤一:分析数据的平稳性(单位根检验) 按照正规程序,面板数据模型在回归前需检验数据的平稳性。李子奈曾指出,一些非平稳的经济时间序列往往表现出共同的变化趋势,而这些序列间本身不一定有直接的关联,此时,对这些数据进行回归,尽管有较高的R平方,但其结果是没有任何实际意义的。这种情况称为称为虚假回归或伪回归(spurious regression)。他认为平稳的真正含义是:一个时间序列剔除了不变的均值(可视为截距)和时间趋势以后,剩余的序列为零均值,同方差,即白噪声。因此单位根检验时有三种检验模式:既有趋势又有截距、只有截距、以上都无。 因此为了避免伪回归,确保估计结果的有效性,我们必须对各面板序列的平稳性进行检验。而检验数据平稳性最常用的办法就是单位根检验。首先,我们可以先对面板序列绘制时序图,以粗略观测时序图中由各个观测值描出代表变量的折线是否含有趋势项和(或)截距项,从而为进一步的单位根检验的检验模式做准备。 单位根检验方法的文献综述:在非平稳的面板数据渐进过程中,Levin andLin(1993) 很早就发现这些估计量的极限分布是高斯分布,这些结果也被应用在有异方差的面板数据中,并建立了对面板单位根进行检验的早期版本。后来经过Levin et al. (2002)的改进,提出了检验面板单位根的LLC 法。Levin et al. (2002) 指出,该方法允许不同截距和时间趋势,异方差和高阶序列相关,适合于中等维度(时间序列介于25~250 之间,截面数介于10~250 之间) 的面板单位根检验。Im et al. (1997) 还提出了检验面板单位根的IPS 法,但Breitung(2000) 发现IPS 法对限定性趋势的设定极为敏感,并提出了面板单位根检验的Breitung 法。Maddala and Wu(1999)又提出了ADF-Fisher和PP-Fisher面板单位根检验方法。 由上述综述可知,可以使用LLC、IPS、Breintung、ADF-Fisher 和PP-Fisher5种方法进行面板单位根检验。其中LLC-T 、BR-T、IPS-W 、ADF-FCS、PP-FCS 、H-Z 分别指Levin, Lin & Chu t* 统计量、Breitung t 统计量、lm Pesaran & Shin W 统计量、ADF- Fisher Chi-square统计量、PP-Fisher Chi-square统计量、Hadri Z统计量,并且Levin, Lin & Chu t* 统计量、Breitung t统计量的原假设为存在普通的单位根过程,lm Pesaran & Shin W 统计量、ADF- Fisher Chi-square统计量、PP-Fisher Chi-square统计量的原假设为存在有效的单位根过程,Hadri Z 统计量的检验原假设为不存在普通的单位根过程。 有时,为了方便,只采用两种面板数据单位根检验方法,即相同根单位根检验LLC(Levin-Lin-Chu)检验和不同根单位根检验Fisher-ADF检验(注:对普通序列(非面板序列)的单位根检验方法则常用ADF检验),如果在两种检验中均拒绝存在单位根的原假设则我们说此序列是平稳的,反之则不平稳。如果我们以T(trend)代表序列含趋势项,以I(intercept)代表序列含截距项,T&I代表两项都含,N(none)代表两项都不含,那么我们可以基于前面时序图得出的结论,在单位根检验中选择相应检验模式。 但基于时序图得出的结论毕竟是粗略的,严格来说,那些检验结构均

面板数据的分析步骤

面板数据的分析步骤 面板数据的分析方法或许我们已经了解许多了,但是到底有没有一个基本的步骤呢?那些步骤是必须的?这些都是我们在研究的过程中需要考虑的,而且又是很实在的问题。面板单位根检验如何进行?协整检验呢?什么情况下要进行模型的修正?面板模型回归形式的选择?如何更有效的进行回归?诸如此类的问题我们应该如何去分析并一一解决?以下是我近期对面板数据研究后做出的一个简要总结,和大家分享一下,也希望大家都进来讨论讨论。 步骤一:分析数据的平稳性(单位根检验) 按照正规程序,面板数据模型在回归前需检验数据的平稳性。李子奈曾指出,一些非平稳的经济时间序列往往表现出共同的变化趋势,而这些序列间本身不一定有直接的关联,此时,对这些数据进行回归,尽管有较高的R平方,但其结果是没有任何实际意义的。这种情况称为称为虚假回归或伪回归(spurious regression)。他认为平稳的真正含义是:一个时间序列剔除了不变的均值(可视为截距)和时间趋势以后,剩余的序列为零均值,同方差,即白噪声。因此单位根检验时有三种检验模式:既有趋势又有截距、只有截距、以上都无。 因此为了避免伪回归,确保估计结果的有效性,我们必须对各面板序列的平稳性进行检验。而检验数据平稳性最常用的办法就是单位根检验。首先,我们可以先对面板序列绘制时序图,以粗略观测时序图中由各个观测值描出代表变量的折线是否含有趋势项和(或)截距项,从而为进一步的单位根检验的检验模式做准备。 单位根检验方法的文献综述:在非平稳的面板数据渐进过程中,Levin andLin(1993) 很早就发现这些估计量的极限分布是高斯分布,这些结果也被应用在有异方差的面板数据中,并建立了对面板单位根进行检验的早期版本。后来经过Levin et al. (2002)的改进,提出了检验面板单位根的LLC 法。Levin et al. (2002) 指出,该方法允许不同截距和时间趋势,异方差和高阶序列相关,适合于中等维度(时间序列介于25~250 之间,截面数介于10~250 之间) 的面板单位根检验。Im et al. (1997) 还提出了检验面板单位根的IPS 法,但Breitung(2000) 发现IPS 法对限定性趋势的设定极为敏感,并提出了面板单位根检验的Breitung 法。Maddala and Wu(1999)又提出了ADF-Fisher和PP-Fisher面板单位根检验方法。 由上述综述可知,可以使用LLC、IPS、Breintung、ADF-Fisher 和PP-Fisher5种方法进行面板单位根检验。 其中LLC-T 、BR-T、IPS-W 、ADF-FCS、PP-FCS 、H-Z 分别指Levin, Lin & Chu t* 统计量、Breitung t 统计量、lm Pesaran & Shin W 统计量、ADF- Fisher Chi-square统计量、PP-Fisher Chi-square 统计量、Hadri Z统计量,并且Levin, Lin & Chu t* 统计量、Breitung t统计量的原假设为存在普通的单位根过程,lm Pesaran & Shin W 统计量、ADF- Fisher Chi-square统计量、PP-Fisher Chi-square统计量的原假设为存在有效的单位根过程,Hadri Z统计量的检验原假设为不存在普通的单位根过程。 有时,为了方便,只采用两种面板数据单位根检验方法,即相同根单位根检验LLC (Levin-Lin-Chu)检验和不同根单位根检验Fisher-ADF检验(注:对普通序列(非面板序列)的单位根检验方法则常用ADF检验),如果在两种检验中均拒绝存在单位根的原假设则我们

面板数据分析简要步骤与注意事项面板单位根面板协整回归分析

面板数据分析简要步骤与注意事项 面板单位根—面板协整—回归分析) 步骤一:分析数据的平稳性(单位根检验) 按照正规程序,面板数据模型在回归前需检验数据的平稳性。李子奈曾指出,一些非平稳的经济时间序列往往表现出共同的变化趋势,而这些序列间本身不一定有直接的关联,此时,对这些数据进行回归,尽管有较高的R平方,但其结果是没有任何实 际意义的。这种情况称为称为虚假回归或伪回归( spurious regression )。他认为平稳的真正含义是:一个时间序列剔除了不变的均值(可视为截距)和时间趋势以后,剩余的序列为零均值,同方差,即白噪声。因此单位根检验时有三种检验模式:既有趋势又有截距、只有截距、以上都无。 因此为了避免伪回归,确保估计结果的有效性,我们必须对各面板序列的平稳性进行检验。而检验数据平稳性最常用的办法就是单位根检验。首先,我们可以先对面板序列绘制时序图,以粗略观测时序图中由各个观测值描出代表变量的折线是否含有趋势项和(或)截距项,从而为进一步的单位根检验的检验模式做准备。单位根检验方法的文献综述:在非平稳的面板数据渐进过程中 ,Levin andLin(1993) 很早就发现这些估计量的极限分布是高斯分布 , 这些结果也被应用在有异方差的面板数据中,并建立了对面板单位根进行检验的早期版本。后来经过Levin et al. (2002) 的改进, 提出了检验面板单位根的LLC法。Levin et al. (2002)指出,该方法允许不同截距和时间趋势,异方差和高阶序列相关,适合于中等维度(时间序列介于25?250之间,截面数介于10?250之间)的面板单位根检验。Im et al. (1997) 还提出了检验面板单位根的 IPS 法, 但 Breitung(2000) 发现 IPS 法对限定性趋势的设定极为敏感 , 并提出了面板单位根检验的 Breitung 法。Maddala and Wu(1999)又提出了 ADF-Fisher 和 PP-Fisher 面板单位根检验方法。 由上述综述可知,可以使用 LLC、IPS、Breintung 、ADF-Fisher 和 PP-Fisher5 种方法进行面板单位根检验。其中LLC-T 、BR-T、IPS-W 、ADF-FCS、PP-FCS、H-Z 分 别指 Levin, Lin & Chu t* 统计量、 Breitung t 统计量、 lm Pesaran & Shin W 统 量、计 ADF- Fisher Chi-square 统计量、PP-Fisher Chi-square 统计量、Hadri Z 统计 量,并且 Levin, Lin & Chu t* 统计量、 Breitung t 统计量的原假设为存在普通的单位根过程, lm Pesaran & Shin W 统计量、 ADF- Fisher Chi-square 统计量、 PP-Fisher Chi-square 统计量的原假设为存在有效的单位根过程, Hadri Z 统计量的检验原假设为不存在普通的单位根过程。 有时,为了方便,只采用两种面板数据单位根检验方法,即相同根单位根检验 LLC(Levin-Lin-Chu )检验和不同根单位根检验 Fisher-ADF 检验(注:对普通序列(非面板序列)的单位根检验方法则常用 ADF检验),如果在两种检验中均拒绝存在单位根的原假设则我 们说此序列是平稳的,反之则不平稳。 如果我们以 T(trend )代表序列含趋势项,以 I (intercept )代表序列含截距项, T&I 代表两项都含,N (none)代表两项都不含,那么我们可以基于前面时序图得出的结论,在单位根检验中选择相应检验模式。 但基于时序图得出的结论毕竟是粗略的,严格来说,那些检验结构均需一一检验。具体操作可以参照李子奈的说法:ADF检验是通过三个模型来完成,首先从含有截距和趋势项的模型开始,再检验只含截距项的模型,最后检验二者都不含的模型。并且认

基于面板分位数回归的辽宁省国有工业企业经营绩效影响研究

基于面板分位数回归的辽宁省国有工业企业经营绩效影响 研究 [摘要]深入研究区域国有工业企业资本结构、规模扩张特征对其经营绩效的影响方向和程度,有利于制定区域产业结构调整和化解过剩产能的政策。利用辽宁省2001―2016年国有工业企业相关数据,通过面板分位数回归测算资本结构、行业集中度及总资产增长率对经营绩效的影响。结果发现:资本结构对经营绩效的影响呈现显著倒“U”型关系,行业集中度对经营绩效的影响也是微弱的倒“U”关系。提出降低辽宁省国有工业企业的资产负债率、降低行业外延式扩张速度等对策建议。 [关键词]工业企业;资本结构;行业集中度;经营绩效;面板分位数回归 [中图分类号]F425 [文献标识码]A [文章编号]2095-3283(2018)04-0082-05 Abstract:Further study of regional state-owned industrial enterprise capital structure and scale expansion characteristics on its business performance direction and degree,is beneficial to develop regional industrial structure adjustment and the

policy of excess capacity. Based on the data of state-owned industrial enterprises from 2001 to 2016 in liaoning province,this paper calculates the influence of capital structure,industry concentration and total asset growth rate on operating performance through the regression of panel quantile regression. The results show that the influence of capital structure on business performance is significantly inverted “U”,and the influence of industry concentration on operating performance is also weak “U”. The paper puts forward some Suggestions to reduce the asset-liability ratio of state-owned industrial enterprises in liaoning province and reduce the expansion speed of the industry. Keywords:Industrial Enterprise;Capital Structure;Industry Concentration;Operating Performance;Panel Quantile Regression 一、引言 ??有工业企业在国家和地区经济发展中起到非常关键的作用,深入研究国有工业企业资本结构、规模扩张及行业集中度对其经营绩效的影响,对于制定产业政策和地区经济发展政策具有重要意义。 国内外众多学者对资本结构、行业集中度与企业绩效关系进行了广泛和深入的研究。一是资本结构与公司绩效关系

面板数据模型的稳健分析方法研究

面板数据模型的稳健分析方法研究 在计量经济学领域,面板数据是极其重要的一类数据类型。在宏观经济的研究中,面板数据模型被广泛地应用于汇率决定理论、跨国经济增长收敛理论的检验、产业结构的分析、技术创新的研究等领域;在微观经济的研究中,面板数据模型被大量地应用于企业成本分析、就业、家庭消费等领域。 随着面板数据模型在经济领域的广泛应用,传统面板数据分析方法的某些局限性也逐渐凸显出来。首先,面板数据模型通常假定误差项服从正态分布,而实际数据很难满足这种假定,利用传统方法得到的估计可能是有偏的甚至是无效的。 其次,在数据的收集过程中,常常会由于人为因素或其他因素导致数据受到污染,即出现不合理的异常值,这样利用传统方法得到的估计与真实值可能存在较大的偏差,用这种有偏的估计结果分析经济问题会得出不合理的结论。针对这些局限性,中外学者们做了大量的工作,如构造面板数据模型的稳健估计以及研究面板数据的分位数回归模型,然而,这些方法仍存在一些不足。 首先,针对面板数据模型的稳健估计通常是利用Huber损失函数降低异常值影响,这样有两个缺点:一是稳健性不高,二是有效性较低,即估计的方差较大;其次,若面板数据的分位数回归模型中存在内生性,现有的工具变量方法计算复杂且需要估计大量的冗余参数。论文基于面板数据均值回归模型提出了一种更加稳健有效的估计方法(ELS-EL),并将此方法推广到复杂的面板数据模型如广义线性模型、部分线性模型中;此外,本文基于面板数据的分位数回归模型提出了一种两阶段的工具变量方法(2S-IVFEQR),降低了计算复杂度,并将新方法推广到动态面板数据的分位数回归模型中。 论文的主体框架分为七个章节。第一章,介绍了论文的研究背景、研究意义,

面板数据的计量方法

1.什么是面板数据? 面板数据(panel data)也称时间序列截面数据(time series and cross section data)或混合数据(pool data)。面板数据是截面数据与时间序列综合起来的一种数据资源,是同时在时间和截面空间上取得的二维数据。 如:城市名:北京、上海、重庆、天津的GDP分别为10、11、9、8(单位亿元)。这就是截面数据,在一个时间点处切开,看各个城市的不同就是截面数据。如:2000、2001、2002、2003、2004各年的北京市GDP分别为8、9、10、11、12(单位亿元)。这就是时间序列,选一个城市,看各个样本时间点的不同就是时间序列。 如:2000、2001、2002、2003、2004各年中国所有直辖市的GDP分别为: 北京市分别为8、9、10、11、12; 上海市分别为9、10、11、12、13; 天津市分别为5、6、7、8、9; 重庆市分别为7、8、9、10、11(单位亿元)。 这就是面板数据。 2.面板数据的计量方法 利用面板数据建立模型的好处是:(1)由于观测值的增多,可以增加估计量的抽样精度。(2)对于固定效应模型能得到参数的一致估计量,甚至有效估计量。(3)面板数据建模比单截面数据建模可以获得更多的动态信息。例如1990-2000 年30 个省份的农业总产值数据。固定在某一年份上,它是由30 个农业总产值数字组成的截面数据;固定在某一省份上,它是由11 年农业总产值数据组成的一个时间序列。面板数据由30 个个体组成。共有330 个观测值。 面板数据模型的选择通常有三种形式:混合估计模型、固定效应模型和随机效应模型 第一种是混合估计模型(Pooled Regression Model)。如果从时间上看,不同个体之间不存在显著性差异;从截面上看,不同截面之间也不存在显著性差异,那么就可以直接把面板数据混合在一起用普通最小二乘法(OLS)估计参数。 第二种是固定效应模型(Fixed Effects Regression Model)。在面板数据散点图中,如果对于不同的截面或不同的时间序列,模型的截距是不同的,则可以采用在模型中加虚拟变量的方法估计回归参数,称此种模型为固定效应模型(fixed effects regression model)。 固定效应模型分为3种类型,即个体固定效应模型(entity fixed effects regression model)、时刻固定效应模型(time fixed effects regression model)和时刻个体固定效应模型(time and entity fixed effects regression model)。(1)个体固定效应模型。 个体固定效应模型就是对于不同的个体有不同截距的模型。如果对于不同的时间序列(个体)截距是不同的,但是对于不同的横截面,模型的截距没有显著性变化,那么就应该建立个体固定效应模型。注意:个体固定效应模型的EViwes输出结果中没有公共截距项。 (2)时刻固定效应模型。 时刻固定效应模型就是对于不同的截面(时刻点)有不同截距的模型。如果确知

面板数据分析步骤

转载:面板数据分析的思路和Eviews操作: 面板数据一般有三种:混合估计模型;随机效应模型和固定效应模型。首先,第一步是作固定效应和随机效应模型的选择,一般是用Hausman检验。 如果你选用的是所有的企业,反映的是总体的效应,则选择固定效应模型,如果你选用的是抽样估计,则要作Hausman检验。这个可以在Eviews 5.1里头做。 H0:应该建立随机效应模型。 H1:应该建立固定效应模型。 先使用随机效应回归,然后做Hausman检验,如果是小概率事件,拒绝原假设则应建立固定效应模型,反之,则应该采用随机效应模型进行估计。 第二步,固定效应模型分为三种:个体固定效应模型、时刻固定效应模型和个体时刻固定效应模型(这三个模型的含义我就不讲了,大家可以参考我列的参考书)。如果我们是对个体固定,则应选择个体固定效用模型。但是,我们还需作个体固定效应模型和混合估计模型的选择。所以,就要作F值检验。相对于混合估计模型来说,是否有必要建立个体固定效应模型可以通过F检验来完成。 H0:对于不同横截面模型截距项相同(建立混合估计模型)。SSEr H1:对于不同横截面模型的截距项不同(建立时刻固定效应模型)。SSEu

F统计量定义为:F=[( SSEr - SSEu)/(T+k-2)]/[ SSEu/(NT-T-k)] 其中,SSEr,SSEu分别表示约束模型(混合估计模型的)和非约束模型(个体固定效应模型的)的残差平方和(Sum squared resid)。非约束模型比约束模型多了T–1个被估参数。需要指出的是:当模型中含有k 个解释变量时,F统计量的分母自由度是NT-T- k。通过对F统计量我们将可选择准确、最佳的估计模型。 在作回归是也是四步:第一步,先作混合效应模型:在cross-section 一栏选择None ,Period也是None;Weights是cross-section Weights,然后把回归结果的Sum squared resid值复制出来,就是SSEr 第二步:作个体固定效用模型:在cross-section 一栏选择Fixed ,Period也是None;Weights是cross-section Weights,然后把回归结果的Sum squared resid值复制出来,就是SSEu 第三步:根据公式F=[( SSEr - SSEu)/(T+k-2)]/[ SSEu/(NT-T-k)]。计算出结果。其中,T为年数,不管我们的数据是unbalance还是balance 看observations就行了,也即Total pool (balanced) observations:的值,但是如果是balance我们也可以计算,也即是每一年的企业数的总和。比如说我们研究10年,每一年又500加企业,则NT=10×500=5000。K为解释变量,不含被解释变量。 第四步,根据计算出来的结果查F值分布表。看是否通过检验。检验准则:当F> Fα(T-1, NT-T-k) , α=0.01,0.05或0.1时,拒绝原假设,则结论是应该建立个体固定效应模型,反之,接受原假设,则不能建立个体固定效应模型。

最新 面板数据的自适应Lasso分位回归方法的统计分析-精品

面板数据的自适应Lasso分位回归方法 的统计分析 一、引言 面板数据模型是当前学术界讨论最多的模型之一。传统的面板数据模型实际上是一种条件均值模型,即讨论在给定解释变量的条件下响应变量均值变化规律。这种模型的一个固有缺陷是只描述了响应变量的均值信息,其他信息则都忽略了。然而,数据的信息应该是全方位的,这种只对均值建模的方法有待改进。Koenker等提出的分位回归模型是对均值回归模型的一种有效改进,该模型可以在给定解释变量后对响应变量的任意分位点处进行建模,从而可以从多个层次刻画数据的分布信息[1]。同时,分位回归的参数估计是通过极小化加权残差绝对值之和得到,比传统均值回归模型下二次损失函数获得的最小二乘估计更为稳健[2]。 对于简单的线性模型,与分位回归方法相对应的参数点估计、区间估计、模型检验及预测已经有很多成熟的研究结果,但有关面板数据模型的分位回归方法研究文献还不多见。Koenker对固定效应的面板数据模型采用带Lasso惩罚的分位回归方法,通过对个体固定效应实施L1范数惩罚,该方法能够在各种偏态及厚尾分布下得到明显优于均值回归的估计,然而惩罚参数如何确定是该方法的一个难点[3];罗幼喜等也提出了3种新的固定效应面板数据分位回归方法,模拟显示,这些新方法在误差非正态分布情况下所得估计优于传统的最小二乘估计和极大似然估计,但新方法对解释变量在时间上进行了差分运算,当解释变量中包含有不随时间变化的协变量时,这些方法则无法使用[4];Tian等对含随机效应的面板数据模型提出了一种分层分位回归法,并利用EQ算法给出模型未知参数的估计,但该算法只针对误差呈正态分布而设计,限制了其应用范围[5]。以上文献均是直接从损失函数的角度考虑分位回归模型的建立及求解;Liu等利用非对称拉普拉斯分布与分位回归检验损失函数之间的关系,从分布的角度建立了含随机效应面板数据的条件分位回归模型,通过蒙特卡罗EM算法解决似然函数高维积分问题[6];Luo等则在似然函数的基础上考虑加入参数先验信息,从贝叶斯的角度解决面板数据的分位回归问题,模拟显示,贝叶斯分位回归法能有效地处理模型中随机效应参数[7];朱慧明等也考虑过将贝叶斯分位回归法应用于自回归模型,模拟和实证显示该方法能有效地揭示滞后变量对响应变量的位置、尺度和形状的影响[8]。 然而,上述方法均不能对模型中自变量进行选择,但在实际的经济问题中,人们在建立模型之前经常会面临较多解释变量,且对哪个解释变量最终应该留在模型中没有太多信息。如果将一些不重要的噪声变量包含在模型之中,不仅会影响其他重要解释变量估计的准确性,也会使模型可解释性和预测准确性降低。Park等在研究完全贝叶斯分层模型时提出了一种新的贝叶斯Lasso方法,通过假定回归系数有条件Laplace先验信息给出了参数估计的Gibbs抽样算法,这一工作使得一些正则化的惩罚方法都能够纳入到贝叶斯的框架中来,通过特殊的先验信息对回归系数进行压缩,该方法能够在估计参数的同时对模型中自变量进行选择[9-10]。Alhamzawi等将贝叶斯Lasso方法引入到面板数据分位回归模型中来,使得在估计分位回归系数的同时能够对模型中重要解释变

面板数据的计量方法

面板数据的计量方法 1.什么是面板数据? 面板数据(panel data)也称时间序列截面数据(time series and cross section data)或混合数据(pool data)。面板数据是截面数据与时间序列综合起来的一种数据资源,是同时在时间和截面空间上取得的二维数据。 如:城市名:北京、上海、重庆、天津的GDP分别为10、11、9、8(单位亿元)。这就是截面数据,在一个时间点处切开,看各个城市的不同就是截面数据。如:2000、2001、2002、2003、2004各年的北京市GDP分别为8、9、10、11、12(单位亿元)。这就是时间序列,选一个城市,看各个样本时间点的不同就是时间序列。 如:2000、2001、2002、2003、2004各年中国所有直辖市的GDP分别为: 北京市分别为8、9、10、11、12; 上海市分别为9、10、11、12、13; 天津市分别为5、6、7、8、9; 重庆市分别为7、8、9、10、11(单位亿元)。 这就是面板数据。 2.面板数据的计量方法 利用面板数据建立模型的好处是:(1)由于观测值的增多,可以增加估计量的抽样精度。(2)对于固定效应模型能得到参数的一致估计量,甚至有效估计量。(3)面板数据建模比单截面数据建模可以获得更多的动态信息。例如1990-2000 年30 个省份的农业总产值数据。固定在某一年份上,它是由30 个农业总产值数字组成的截面数据;固定在某一省份上,它是由11 年农业总产值数据组成的一个时间序列。面板数据由30 个个体组成。共有330 个观测值。 面板数据模型的选择通常有三种形式:混合估计模型、固定效应模型和随机效应模型 第一种是混合估计模型(Pooled Regression Model)。如果从时间上看,不同个体之间不存在显著性差异;从截面上看,不同截面之间也不存在显著性差异,那么就可以直接把面板数据混合在一起用普通最小二乘法(OLS)估计参数。 第二种是固定效应模型(Fixed Effects Regression Model)。在面板数据散点图中,如果对于不同的截面或不同的时间序列,模型的截距是不同的,则可以采用在模型中加虚拟变量的方法估计回归参数,称此种模型为固定效应模型(fixed effects regression model)。 固定效应模型分为3种类型,即个体固定效应模型(entity fixed effects regression model)、时刻固定效应模型(time fixed effects regression model)和时刻个体固定效应模型(time and entity fixed effects regression model)。(1)个体固定效应模型。 个体固定效应模型就是对于不同的个体有不同截距的模型。如果对于不同的时间序列(个体)截距是不同的,但是对于不同的横截面,模型的截距没有显著性变化,那么就应该建立个体固定效应模型。注意:个体固定效应模型的EViwes输

面板数据分析方法步骤

1.面板数据分析方法步骤 面板数据的分析方法或许我们已经了解许多了,但是到底有没有一个基本的步骤呢?那些步骤是必须的?这些都是我们在研究的过程中需要考虑的,而且又是很实在的问题。面板单位根检验如何进行?协整检验呢?什么情况下要进行模型的修正?面板模型回归形式的选择?如何更有效的进行回归?诸如此类的问题我们应该如何去分析并一一解决?以下是我近期对面板数据研究后做出的一个简要总结,和大家分享一下,也希望大家都进来讨论讨论。 步骤一:分析数据的平稳性(单位根检验) 按照正规程序,面板数据模型在回归前需检验数据的平稳性。李子奈曾指出,一些非平稳的经济时间序列往往表现出共同的变化趋势,而这些序列间本身不一定有直接的关联,此时,对这些数据进行回归,尽管有较高的R平方,但其结果是没有任何实际意义的。这种情况称为虚假回归或伪回归(spurious regression)。他认为平稳的真正含义是:一个时间序列剔除了不变的均值(可视为截距)和时间趋势以后,剩余的序列为零均值,同方差,即白噪声。因此单位根检验时有三种检验模式:既有趋势又有截距、只有截距、以上都无。 因此为了避免伪回归,确保估计结果的有效性,我们必须对各面板序列的平稳性进行检验。而检验数据平稳性最常用的办法就是单位根检验。首先,我们可以先对面板序列绘制时序图,以粗略观测时序图中由各个观测值描出代表变量的折线是否含有趋势项和(或)截距项,从而为进一步的单位根检验的检验模式做准备。 单位根检验方法的文献综述:在非平稳的面板数据渐进过程中,Levin andLin(1993) 很早就发现这些估计量的极限分布是高斯分布,这些结果也被应用在有异方差的面板数据中,并建立了对面板单位根进行检验的早期版本。后来经过Levin et al. (2002)的改进,提出了检验面板单位根的LLC 法。Levin et al. (2002) 指出,该方法允许不同截距和时间趋势,异方差和高阶序列相关,适合于中等维度(时间序列介于25~250 之间,截面数介于10~250 之间) 的面板单位根检验。Im et al. (1997) 还提出了检验面板单位根的IPS 法,但Breitung(2000) 发现IPS 法对限定性趋势的设定极为敏感,并提出了面板单位根检验的Breitung 法。Maddala and Wu(1999)又提出了ADF-Fisher和PP-Fisher面板单位根检验方法。 由上述综述可知,可以使用LLC、IPS、Breintung、ADF-Fisher 和PP-Fisher5种方法进行面板单位根检验。 其中LLC-T 、BR-T、IPS-W 、ADF-FCS、PP-FCS 、H-Z 分别指Levin, Lin & Chu t* 统计量、Breitung t 统计量、lm Pesaran & Shin W 统计量、

分位数回归及其实例

分位数回归及其实例 一、分位数回归的概念 分位数回归(Quantile Regression):是计量经济学的研究前沿方向之一,它利用解释变量的多个分位数(例如四分位、十分位、百分位等)来得到被解释变量的条件分布的相应的分位数方程。与传统的OLS 只得到均值方程相比,它可以更详细地描述变量的统计分布。 传统的线性回归模型描述了因变量的条件分布受到自变量X 的影响过程。普通最dx--乘法是估计回归系数的最基本的方法,它描述了自变量X 对于因变量y 的均值影响。如果模型中的随机扰动项来自均值为零而且同方差的分布,那么回归系数的最dx--乘估计为最佳线性无偏估计(BLUE);如果近一步随机扰动项服从正态分布,那么回归系数的最dx--乘法或极大似然估计为最小方差无偏估计(M Ⅵ甩)。但是在实际的经济生活中,这种假设常常不被满足,饲如数据出现尖峰或厚尾的分布、存在显著的异方差等情况,这时的最小二乘法估计将不再具有上述优良性且稳健性非常差。最小二乘回归假定自变量X 只能影响因变量的条件分布的位置,但不能影响其分布的刻度或形状的任何其他方面。 为了弥补普通最dx--乘法(0Ls)在回归分析中的缺陷,Koenkel"和Pxassett 于1978年提出了分位数回归(Quantile Regression)的思想。它依据因变量的条件分位数对自变量X 进行回归,这样得到了所有分位数下的回归模型。因此分位数回归相比普通最小二乘回归只能描述自变量X 对于因变量y 局部变化的影响而言,更能精确地描述自变量X 对于因变量y 的变化范围以及条件分布形状的影响。 分位数回归是对以古典条件均值模型为基础的最小二乘法的延伸,用多个分位函数来估计整体模型。中位数回归是分位数回归的特殊情况,用对称权重解决残差最小化问题,而其他的条件分位数回归则用非对称权重解决残差最小化。 一般线性回归模型可设定如下: ()((0)),(0,1).x t t I t ρττ=-<∈ 在满足高斯-马尔可夫假设前提下,可表示如下: 01122(|)...k k E y x x x x αααα=++++ 其中u 为随机扰动项k αααα,...,,,210为待估解释变量系数。这是均值回归(OLS )模型表达式,类似于均值回归模型,也可以定义分位数回归模型如下: 01122(|)...()y k k u Q x x x x Q ταααατ=+++++ 对于分位数回归模型,则可采取线性规划法(LP )估计其最小加权绝对偏差,从而得到解释变量的回归系数,可表示如下: 01122min (...)x k k E y x x x ραααα----- 求解得:01122?????(|)y k k Q x a a x a x a x τ=++++

相关文档
最新文档