LINEAR MIXED MODELS IN STATA
Stata Beginner's Manual: Overview of Stata Operations

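Combining statistical analysis and econometric analysis
Univariate statistics: descriptive statistics, hypothesis tests (parametric and nonparametric), ANOVA, quality control, statistical graphics.
Multivariate statistics: MANOVA, principal components, factor analysis, canonical correlation, cluster analysis, discriminant analysis, correspondence analysis, multidimensional scaling.
Linear regression, nonlinear regression, instrumental-variables regression, generalized linear regression, quantile regression (robust regression), systems of equations (SUR, simultaneous equations), discrete choice models (binary choice, ordered choice, multinomial choice, conditional logit, nested logit, bivariate choice models, and so on), count models (Poisson regression, negative binomial regression), truncated and censored models, the Heckman selection model, stepwise regression, and so on.
Time-series analysis: smoothing, correlograms, ARIMAX, GARCH, unit-root tests, the Johansen cointegration test, VAR, VEC, rolling regression, and so on.
Panel data: linear models, instrumental-variables regression, dynamic panels, hierarchical mixed effects, generalized estimating equations (GEE), stochastic frontier models, and so on.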
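Syntax structure (varlist)
Existing variables
A varlist denotes a list of variables. For variables that already exist in the data, the permitted shorthand forms are *, ?, and -. Here * matches any sequence of characters, ? matches exactly one character, and - denotes all variables stored between two named variables (according to their order in the dataset). For example, if the data file contains 20 variables named, in order, var1, var2, ..., var20, then var* denotes all of var1 through var20, var? denotes var1, var2, ..., var9, and var1-var6 denotes var1, var2, ..., var6.
New variables
When new variables are created, their names cannot be abbreviated. If the variables share a prefix and end in consecutive numbers, the list can be written with -. For example, to create the new variables v1, v2, v3, and v4, type
    . input v1 v2 v3 v4
or
    . input v1-v4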
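The varlist shorthand described above can be tried on a small made-up dataset. This is a minimal sketch: the data values and the 20 generated variables are purely illustrative and are not part of the course material.

```stata
* Create four new variables at once with the v1-v4 shorthand
clear
input v1-v4
1 2 3 4
5 6 7 8
end

* Build 20 illustrative variables var1, ..., var20
clear
set obs 5
forvalues i = 1/20 {
    generate var`i' = runiform()
}

describe var*        // every variable whose name starts with var
summarize var?       // var followed by exactly one character: var1-var9
summarize var1-var6  // variables stored from var1 through var6
```

Because - follows the storage order of the dataset, var1-var6 selects six variables here only because they were created consecutively.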
Advanced Training Course in Stata Applications, Institute of Quantitative Economics, Nankai University, Wang Qunyong
Stata Example 42g: Multilevel linear mediation models (example documentation)

Example 42g: One- and two-level mediation models (multilevel)

Description

To demonstrate linear mediation models, we use the following data:

    . use https:///data/r18/gsem_multmed
    (Fictional job-performance data)

    . summarize

        Variable |    Obs        Mean    Std. dev.        Min        Max
    -------------+-------------------------------------------------------
          branch |  1,500          38    21.65593           1         75
         support |  1,500    .0084667    .5058316        -1.6        1.8
           satis |  1,500       .0212    .6087235        -1.6          2
         perform |  1,500    5.005317    .8949845     2.35022   8.084294

    . notes

    _dta:
      1. Fictional data on job performance, job satisfaction, and perceived
         support from managers for 1,500 sales employees of a large department
         store in 75 locations.
      2. Variable support is average of Likert-scale questions, each question
         scored from -2 to 2.
      3. Variable satis is average of Likert-scale questions, each question
         scored from -2 to 2.
      4. Variable perform is job performance measured on continuous scale.

See Structural models 1: Linear regression and Multilevel mixed-effects models in [SEM] Intro 5 for background.

Remarks and examples

Remarks are presented under the following headings:

    One-level model with sem
    One-level model with gsem
    Two-level model with gsem
    Fitting the models with the Builder

One-level model with sem

You can fit single-level mediation models with sem or gsem. You will be better off using sem because then you can use estat teffects afterward to compute indirect and total effects.

The model we wish to fit is the simplest form of a mediation model, namely,

    [Path diagram: support -> perform, support -> satis, satis -> perform]

We are interested in the effect of managerial support on job performance, but we suspect a portion of the effect might be mediated through job satisfaction. In traditional mediation analysis, the model would be fit by a series of linear regression models as described in Baron and Kenny (1986). That approach is sufficient because the errors are not correlated. The advantage of using structural equation modeling is that you can fit a single model and estimate the indirect and total effects, and you can embed the simple mediation model in a larger model and even use latent variables to measure any piece of the mediation model.

To fit this model with the command syntax, we type

    . sem (perform <- satis support) (satis <- support)

    Endogenous variables
      Observed: perform satis
    Exogenous variables
      Observed: support

    Fitting target model:
    Iteration 0:  Log likelihood = -3779.9224
    Iteration 1:  Log likelihood = -3779.9224

    Structural equation model                      Number of obs = 1,500
    Estimation method: ml
    Log likelihood = -3779.9224

                   |                 OIM
                   | Coefficient  std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
    Structural     |
      perform      |
             satis |   .8984401   .0251903    35.67   0.000     .849068    .9478123
           support |   .6161077   .0303143    20.32   0.000    .5566927    .6755227
             _cons |   4.981054   .0150589   330.77   0.000    4.951539    5.010569
      satis        |
           support |   .2288945   .0305047     7.50   0.000    .1691064    .2886826
             _cons |    .019262   .0154273     1.25   0.212   -.0109749    .0494989
    ---------------+-----------------------------------------------------------------
     var(e.perf~m) |   .3397087   .0124044                     .3162461     .364912
      var(e.satis) |   .3569007   .0130322                     .3322507    .3833795

    LR test of model vs. saturated: chi2(0) = 0.00        Prob > chi2 = .

Notes:

1. The direct effect of managerial support on job performance is measured by perform <- support and is estimated to be 0.6161. The effect is small albeit highly statistically significant. The standard deviations of performance and support are 0.89 and 0.51. A one standard deviation increase in support improves performance by a third of a standard deviation.

2. The direct effect of job satisfaction on job performance is measured by perform <- satis and is estimated to be 0.8984. That also is a moderate effect, practically speaking, and is highly statistically significant.

3. The effect of managerial support on job satisfaction is measured by satis <- support and is practically small but statistically significant.

4. What is the total effect of managerial support on performance? It is the direct effect (0.6161) plus the indirect effect of support on satisfaction on performance (0.2289 × 0.8984 = 0.2056), meaning the total effect is 0.8217. It would be desirable to put a standard error on that, but that's more work.

We can use estat teffects after estimation to obtain the total effect and its standard error:

    . estat teffects

    Direct effects
                   |                 OIM
                   | Coefficient  std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
    Structural     |
      perform      |
             satis |   .8984401   .0251903    35.67   0.000     .849068    .9478123
           support |   .6161077   .0303143    20.32   0.000    .5566927    .6755227
      satis        |
           support |   .2288945   .0305047     7.50   0.000    .1691064    .2886826

    Indirect effects
                   |                 OIM
                   | Coefficient  std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
    Structural     |
      perform      |
             satis |          0  (no path)
           support |    .205648   .0280066     7.34   0.000     .150756      .26054
      satis        |
           support |          0  (no path)

    Total effects
                   |                 OIM
                   | Coefficient  std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
    Structural     |
      perform      |
             satis |   .8984401   .0251903    35.67   0.000     .849068    .9478123
           support |   .8217557   .0404579    20.31   0.000    .7424597    .9010516
      satis        |
           support |   .2288945   .0305047     7.50   0.000    .1691064    .2886826

One-level model with gsem

We can fit the same model with gsem. The command is the same except that we substitute gsem for sem, and the coefficient estimates are identical:

    . gsem (perform <- satis support) (satis <- support)

    Iteration 0:  Log likelihood = -2674.3421
    Iteration 1:  Log likelihood = -2674.3421  (backed up)

    Generalized structural equation model          Number of obs = 1,500

    Response: perform
    Family:   Gaussian
    Link:     Identity

    Response: satis
    Family:   Gaussian
    Link:     Identity

    Log likelihood = -2674.3421

                   | Coefficient  Std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
    perform        |
             satis |   .8984401   .0251903    35.67   0.000     .849068    .9478123
           support |   .6161077   .0303143    20.32   0.000    .5566927    .6755227
             _cons |   4.981054   .0150589   330.77   0.000    4.951539    5.010569
    satis          |
           support |   .2288945   .0305047     7.50   0.000    .1691064    .2886826
             _cons |    .019262   .0154273     1.25   0.212   -.0109749    .0494989
    ---------------+-----------------------------------------------------------------
     var(e.perf~m) |   .3397087   .0124044                     .3162461     .364912
      var(e.satis) |   .3569007   .0130322                     .3322507    .3833795

After gsem, however, we cannot use estat teffects:

    . estat teffects
    estat teffects not valid
    r(321);

We can, however, calculate the indirect and total effects for ourselves and obtain the standard error by using nlcom. Referring back to note 4 of the previous section, the formulas for the indirect and total effects are

    indirect effect = β1 β4
    total effect    = β2 + β1 β4

where

    β1 = path coefficient for perform <- satis
    β4 = path coefficient for satis <- support
    β2 = path coefficient for perform <- support

It turns out that we can access the coefficients by typing

    β1 = _b[perform:satis]
    β4 = _b[satis:support]
    β2 = _b[perform:support]

which is most easily revealed by typing

    . gsem, coeflegend
    (output omitted)

Thus we can obtain the indirect effect by typing

    . nlcom _b[perform:satis]*_b[satis:support]

           _nl_1: _b[perform:satis]*_b[satis:support]

                   | Coefficient  Std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
             _nl_1 |    .205648   .0280066     7.34   0.000     .150756      .26054

and we can obtain the total effect by typing

    . nlcom _b[perform:support]+_b[perform:satis]*_b[satis:support]

           _nl_1: _b[perform:support]+_b[perform:satis]*_b[satis:support]

                   | Coefficient  Std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
             _nl_1 |   .8217557   .0404579    20.31   0.000    .7424597    .9010516

Two-level model with gsem

It may be easier to use sem rather than gsem for fitting single-level models, but if you want to fit multilevel models, you must use gsem. A variation on the model we just fit is

    [Path diagram: the model above, with a branch-level random intercept M1[branch] -> perform and M2[branch] -> satis]

In this model, we include a random intercept in each equation at the branch (individual store) level. The model above is one of many variations on two-level mediation models; see Krull and MacKinnon (2001) for an introduction to multilevel mediation models, and see Preacher, Zyphur, and Zhang (2010) for a discussion of fitting these models with structural equation modeling.

To fit this model with the command syntax, we type

    . gsem (perform <- satis support M1[branch]) (satis <- support M2[branch]),
    >      cov(M1[branch]*M2[branch]@0)

    Fitting fixed-effects model:
    Iteration 0:  Log likelihood = -2674.3421
    Iteration 1:  Log likelihood = -2674.3421

    Refining starting values:
    Grid node 0:  Log likelihood = -2132.1613

    Fitting full model:
    Iteration 0:  Log likelihood = -2132.1613  (not concave)
    Iteration 1:  Log likelihood = -1801.3155
    Iteration 2:  Log likelihood = -1769.6421
    Iteration 3:  Log likelihood = -1705.1282
    Iteration 4:  Log likelihood =  -1703.746
    Iteration 5:  Log likelihood = -1703.7141
    Iteration 6:  Log likelihood =  -1703.714

    Generalized structural equation model          Number of obs = 1,500

    Response: perform
    Family:   Gaussian
    Link:     Identity

    Response: satis
    Family:   Gaussian
    Link:     Identity

    Log likelihood = -1703.714

     ( 1)  [perform]M1[branch] = 1
     ( 2)  [satis]M2[branch] = 1

                    | Coefficient  Std. err.       z    P>|z|    [95% conf. interval]
    ----------------+-----------------------------------------------------------------
    perform         |
              satis |    .604264   .0336398    17.96   0.000    .5383313    .6701968
            support |   .6981525   .0250432    27.88   0.000    .6490687    .7472364
         M1[branch] |          1  (constrained)
              _cons |   4.986596   .0489465   101.88   0.000    4.890663    5.082529
    satis           |
            support |   .2692633   .0179649    14.99   0.000    .2340528    .3044739
         M2[branch] |          1  (constrained)
              _cons |   .0189202   .0570868     0.33   0.740   -.0929678    .1308083
    ----------------+-----------------------------------------------------------------
    var(M1[branch]) |   .1695962   .0302866                      .119511    .2406713
    var(M2[branch]) |   .2384738   .0399154                     .1717781    .3310652
      var(e.perf~m) |    .201053   .0075451                     .1867957    .2163985
       var(e.satis) |   .1188436   .0044523                     .1104299    .1278983

Notes:

1. In One-level model with sem above, we measured the direct effects on job performance of job satisfaction and managerial support as 0.8984 and 0.6161. Now the direct effects are 0.6043 and 0.6982.

2. We can calculate the indirect and total effects just as we did in the previous section, which we will do below. We mentioned earlier that there are other variations of two-level mediation models, and how you calculate total effects depends on the model chosen.

In this case, the indirect effect is

    . nlcom _b[perform:satis]*_b[satis:support]

           _nl_1: _b[perform:satis]*_b[satis:support]

                   | Coefficient  Std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
             _nl_1 |   .1627062   .0141382    11.51   0.000    .1349958    .1904165

and the total effect is

    . nlcom _b[perform:support]+_b[perform:satis]*_b[satis:support]

           _nl_1: _b[perform:support]+_b[perform:satis]*_b[satis:support]

                   | Coefficient  Std. err.       z    P>|z|    [95% conf. interval]
    ---------------+-----------------------------------------------------------------
             _nl_1 |   .8608587   .0257501    33.43   0.000    .8103894     .911328

Fitting the models with the Builder

Use the diagram in One-level model with sem above for reference.

1. Open the dataset.
   In the Command window, type
       . use https:///data/r18/gsem_multmed
2. Open a new Builder diagram.
   Select menu item Statistics > SEM (structural equation modeling) > Model building and estimation.
3. Create a regression component for the perform outcome.
   Select the Add regression component tool, and then click in the center of the diagram. In the resulting dialog box,
   a. select perform in the Dependent variable control;
   b. select support with the Independent variables control;
   c. select Left in the Independent variables' direction control;
   d. click on OK.
   e. Use the Select tool to select only the perform rectangle, and drag it to the right to increase the distance between the rectangles. (You can hold the Shift key while dragging to ensure that the movement is directly to the right.)
4. Create the mediating variable.
   a. Select the Add observed variable tool, and then click in the diagram above the path from support to perform.
   b. In the Contextual Toolbar, select satis with the Variable control.
5. Create the paths to and from the mediating variable.
   a. Select the Add path tool.
   b. Click in the upper right of the support rectangle (it will highlight when you hover over it), and drag a path to the lower left of the satis rectangle (it will highlight when you can release to connect the path).
   c. Continuing with the tool, draw a path from the lower right of the satis rectangle to the upper left of the perform rectangle.
6. Clean up the direction of the error term.
   We want the error for each of the endogenous variables to be to the right of the rectangle. The error for satis may have been created in another direction. If so,
   a. choose the Select tool;
   b. click in the satis rectangle;
   c. click on one of the Error rotation buttons in the Contextual Toolbar until the error is to the right of the rectangle.
7. Clean up the location of the paths.
   If you do not like where the paths have been connected to the rectangles, use the Select tool to click on the path, and then simply click on where it connects to a rectangle and drag the endpoint.
8. Estimate.
   Click on the Estimate button in the Standard Toolbar, and then click on OK in the resulting SEM estimation options dialog box.
9. To fit the model in Two-level model with gsem, continue with the previous diagram, and put the Builder in gsem mode by clicking on the button.
10. Create the multilevel latent variable corresponding to the random intercept for satis.
   a. Select the Add multilevel latent variable tool, and click above the rectangle for satis.
   b. In the Contextual Toolbar, click on the button.
   c. Select the nesting level and nesting variable by selecting 2 from the Nesting depth control and selecting branch > Observations in the next control.
   d. Specify M1 as the Base name.
   e. Click on OK.
11. Create the multilevel latent variable corresponding to the random intercept for perform.
   a. Select the Add multilevel latent variable tool, and click above the rectangle for satis and to the right of the branch1 double oval.
   b. In the Contextual Toolbar, click on the button.
   c. Select the nesting level and nesting variable by selecting 2 from the Nesting depth control and selecting branch > Observations in the next control.
   d. Specify M2 as the Base name.
   e. Click on OK.
12. Draw paths from the multilevel latent variables to their corresponding endogenous variables.
   a. Select the Add path tool.
   b. Click in the bottom of the branch1 double oval, and drag a path to the top of the satis rectangle.
   c. Continuing with the tool, click in the bottom of the branch2 double oval, and drag a path to the top of the perform rectangle.
13. Estimate again.
   Click on the Estimate button in the Standard Toolbar, and then click on OK in the resulting GSEM estimation options dialog box.

You can open a completed diagram for the first model in the Builder by typing

    . webgetsem sem_med

You can open a completed diagram for the second model in the Builder by typing

    . webgetsem gsem_mlmed

References

Baron, R. M., and D. A. Kenny. 1986. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51: 1173–1182. https:///10.1037//0022-3514.51.6.1173.
Krull, J. L., and D. P. MacKinnon. 2001. Multilevel modeling of individual and group level mediated effects. Multivariate Behavioral Research 36: 249–277. https:///10.1207/S1*******MBR360206.
Preacher, K. J., M. J. Zyphur, and Z. Zhang. 2010. A general multilevel SEM framework for assessing multilevel mediation. Psychological Methods 15: 209–233. https:///10.1037/a0020141.

Also see

[SEM] Example 38g: Random-intercept and random-slope models (multilevel)
[SEM] Intro 5: Tour of models
[SEM] gsem: Generalized structural equation model estimation command
[CAUSAL] mediate: Causal mediation analysis

Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations. Other brand and product names are registered trademarks or trademarks of their respective companies. Copyright 1985–2023 StataCorp LLC, College Station, TX, USA. All rights reserved.
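The two nlcom calls used after gsem in Example 42g can be combined into one call with named expressions, so that both effects appear in a single table. This is a minimal sketch, assuming the gsem_multmed data from the example are already in memory; the labels indirect and total are illustrative names, not part of the manual entry.

```stata
* Assumes the gsem_multmed dataset used in the example is already loaded
gsem (perform <- satis support) (satis <- support)

* Indirect and total effects of support on perform in one table;
* nlcom obtains the standard errors by the delta method
nlcom (indirect: _b[perform:satis]*_b[satis:support]) ///
      (total:    _b[perform:support] + _b[perform:satis]*_b[satis:support])
```

Naming the expressions replaces the generic _nl_1 row label, which helps when several derived quantities are reported together.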
Stata 8 time-series feature extensions (documentation)

As with a VAR or structural VAR (SVAR), the first step in testing for cointegration and fitting VECMs is to determine the appropriate lag order of the model, for which Stata's existing varsoc command can be used. The new vecrank command uses Johansen's method to determine the number of cointegrating relationships, and the new vec command estimates the parameters of the VECM. After fitting the model, the new vecstable command can be used to check whether it is stable, and the new veclmar and vecnorm commands test whether the residuals are serially correlated and normally distributed, respectively.

Stata's varirf and varfcast commands have been updated and renamed irf and fcast because they now work after vec as well as after var and svar. Existing VARIRF result files will work flawlessly with the new irf commands. Various commands make creating publication-quality graphics and tables a snap.

As with all of Stata's estimation commands, the new and revised time-series commands can be accessed from the command line or the graphical user interface.

Accompanying the new commands for cointegration and VECMs is a revised version of the Stata 8 Time-Series Reference Manual. Nearly seventy pages of new material on VECMs and associated commands have been added, and many other commands' entries have been revamped and improved. Anyone interested in using Stata for VECMs and time-series analysis in general will find the updated manual indispensable.

The table of contents and online ordering information can be found at /bookstore/ts.html. You can also order the manual using the enclosed bookstore order form.

Time-Series Reference Manual, 2d ed.
Publisher: Stata Press
Copyright: 2004
Pages: 390; paperback
ISBN: 1-881228-86-X
Price: $45.00

Date: August 23–24, 2004
Venue: Boston, Massachusetts
       Longwood Galleria Conference Center
       342 Longwood Avenue

The North American Stata Users Group meeting is less than a month away, so do not let an opportunity to visit Boston in the summer pass you by.
This year's program has an exceptionally wide variety of topics. From data management to data integration, from sample-size calculations in the health sciences to sensitivity analysis in transportation research, from sunflower plots to metagraphiti, there are sure to be topics that interest you. The training sessions cover two hot topics: mixed models and graphics. This is also your opportunity to talk directly to the developers of Stata, to tell us what kind of work you do and how we can make Stata a better tool for you.

3RD NORTH AMERICAN STATA USERS GROUP MEETING

Program Committee
    Elizabeth Allred, Harvard School of Public Health
    Kit Baum, Dept. of Economics, Boston College and RePEc
    Nicholas J. Cox, University of Durham
    Rich Goldstein, consultant
    Peter A. Lachenbruch, OBE/CBER of the FDA
    Marcello Pagano, Harvard School of Public Health

Program

Presentations (Day 1)

Session 1: 8:30–9:45 Statistical Methods
    Use of Gaussian integration in Stata
        Alan Feiveson, NASA - Johnson Space Center
    Generating random variables from the N/I distributions
        Peter A. Lachenbruch, U.S. FDA
    Econometric techniques for estimating treatment effects
        Zhehui Luo, Dept. of Epidemiology, Michigan State University
    Sample-size calculation for longitudinal studies
        Phil Schumm, Dept. of Health Studies, University of Chicago

Break: 9:45–10:15

Session 2: 10:15–11:45 Data Management Using Stata
    Using Stata for questionnaire development
        Theodore Pollari & Phil Schumm, Dept. of Health Studies, University of Chicago
    Translating data between MySQL and Stata
        Michael Johnson & Phil Schumm, Dept. of Health Studies, University of Chicago
    Working with ODBC data sources in Stata: tips and techniques
        Joseph Coveney, Cobridge Co., Ltd., Tokyo
    Using Stata with large datasets in corporate America: lessons learned
        Ed Bassin, ProfSoft, Inc.

Lunch: 11:45–1:00 (Buffet lunch at conference center is included with registration.)

Session 3: 1:00–2:15 Stata Graphics
    Graphics for categories and compositions
        Nicholas J. Cox, Dept. of Geography, University of Durham, UK
    Metagraphiti by Stata: Visuographical exploration and presentation of meta-analytic data using Stata
        Ben Dwamena, University of Michigan Medical School
    Density-distribution sunflower plots in Stata 8
        William D. Dupont, Dept. of Biostatistics, Vanderbilt University School of Medicine

Break: 2:15–2:40

Session 4: 2:40–4:00 Data Analysis Using Stata
    Replication methods for complex survey analysis in Stata
        Nicholas Winter, Dept. of Government, Cornell University
    Rolling regressions with Stata
        Kit Baum, Dept. of Economics, Boston College and RePEc
    Implementation of quasi-least squares using xtgee in Stata
        Justine Shults, Dept. of Biostatistics, University of Pennsylvania
    To help others in teaching statistics using the Stata software
        Susan Hailpern, Albert Einstein College of Medicine
    Sensitivity analysis on traffic crash prediction models by using Stata
        Deo Chimba, Dept. of Civil Engineering, Florida State University

Break: 4:00–4:15

Session 5: 4:15–5:30 StataCorp on Stata
    Report to users / Wishes and grumbles
        William Gould, President, StataCorp

Featured training courses (Day 2)

Training course 1: 8:30–12:00 (includes 30-minute break)
    Generalized linear latent and mixed models (GLLAMMs)
        Sophia Rabe-Hesketh, University of California, Berkeley; co-author of "Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models" (2004; see featured book on page 3)

Lunch: 12:00–1:30 (Buffet lunch at conference center is included with registration.)

Training course 2: 1:30–5:00 (includes 30-minute break)
    Stata graphics
        Vince Wiggins, Vice President of Scientific Development, StataCorp
    This course will cover in detail the basic commands and concepts for building high-quality Stata graphs from scratch.
You will learn new approaches to creating graphs, including organizing and managing your data and creating custom schemes.

Registration and information
    Web: /boston04
    Email: stata@
    Tel: 979-696-4600 or 800-782-8272
    Fax: 979-696-4601
    Cost: $85.00 ($45.00 students); includes lunch and refreshments (optional dinner extra)

Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models

The book, co-authored by Sophia Rabe-Hesketh, unifies the principles behind latent variable modeling, which includes multilevel, longitudinal, and structural-equation models, as well as generalized mixed models, random-coefficient models, item-response models, factor models, panel models, repeated-measures models, latent-class models, and frailty models. Since latent variable models are utilized by researchers from various disciplines with little or no cross-referencing from other disciplines, unifying these models ...

HOW TO REACH US
    StataCorp
    4905 Lakeway Drive
    College Station TX 77845 USA
    Phone: 979-696-4600
    Fax: 979-696-4601
Regression analysis with Stata

III. A worked Stata example of simple regression analysis
2. Give a brief description of the data. Use the describe command (abbreviated des), which produces the following output:

    Contains data from D:\...\stata10\工资方程1.dta
      obs:         1,225
     vars:            11                          25 Aug 2009 08:38
     size:        58,800 (99.4% of memory free)

    variable name   variable label
    --------------------------------------------------------------------
    age             age in years
    female          1:female; 0:male
    married         1:married; 0:unmarried
    edulevel        1:primary; 2:junior; 3:senior; 4:college
    edu             years of education
    (not shown)     years of work experience: age-edu-6
    (not shown)     exp^2
    (not shown)     1:bad; 2:good; 3:very good
    (not shown)     1:migrant worker; 0:local worker
    wage            hourly wage
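The command
    graph twoway lfit wage edu || scatter wage edu
produces the following output; save this result:

    [Figure: scatterplot of wage against edu with the fitted regression line; x-axis: years of education, 0 to 20; y-axis: 0 to 40]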
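The describe and graphing steps of this example can be collected into a short do-file together with the regression itself. This is a minimal sketch, assuming the wage dataset described above is already in memory with variables wage and edu; the output file name wage_edu is a hypothetical choice.

```stata
* Assumes the wage dataset from the example is already in memory
describe

* Simple regression of hourly wage on years of education
regress wage edu

* Scatterplot with the fitted line overlaid, then save the graph
twoway (scatter wage edu) (lfit wage edu), ///
    xtitle("years of education") ytitle("hourly wage")
graph save wage_edu, replace   // hypothetical file name
```

The lfit line drawn by twoway is the same fitted line whose intercept and slope regress reports.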
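Introduction to Stata

– When new data are entered, Stata automatically names the variables var1, var2, and so on. Double-clicking var1 opens another dialog in which the variable can be renamed and defined
– The leftmost part of the content area shows the row markers; the content area lists the variable values for each case
• Data Browser: the data browsing window. Data can only be viewed, not changed
• Note: this window must be closed while a program is running. Otherwise Stata will not work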
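Stata F-test commands for mixed regression and fixed-effects regression

Stata is a widely used statistical package for data analysis and statistical modeling.
In Stata, mixed regression and fixed-effects regression are two common modeling approaches for panel data and longitudinal data.
When fitting these models, we often need F tests to assess the overall fit of the model and the significance of its explanatory variables.
I. The concepts of mixed regression and fixed-effects regression
1. Mixed regression model
A mixed regression model is a modeling approach for panel data that brings individual effects and time effects into the regression in order to control for the influence of individual and time characteristics on the dependent variable.
It usually takes one of two forms, fixed-effects regression or random-effects regression, and controls for individual heterogeneity and time trends in the panel by introducing individual effects and time effects.
2. Fixed-effects regression model
A fixed-effects regression model is a modeling approach for longitudinal data that brings individual effects into the regression in order to control for the influence of individual characteristics on the dependent variable.
It is typically used to analyze panel data in which individuals differ in unobserved ways: introducing individual fixed effects removes the influence of this heterogeneity on the regression estimates.
II. F-test commands for mixed regression and fixed-effects regression
In Stata, these models can be fit with the xtreg command, and F tests can be carried out with the test command.
1. The xtreg command
In Stata, xtreg fits fixed-effects and random-effects regression models.
Its basic syntax is
    xtreg depvar indepvars, fe
or
    xtreg depvar indepvars, re
where depvar is the dependent variable, indepvars are the explanatory variables, fe requests fixed-effects regression, and re requests random-effects regression.
xtreg reports the regression coefficients and their significance tests.
2. The test command
After the coefficients have been estimated, the test command can be used to carry out F tests.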
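The xtreg and test steps above can be sketched on one of the example datasets distributed with the Stata documentation. This is an illustrative sketch only: the choice of the nlswork data and of the regressors age and tenure is an assumption made for the example, not something the text specifies.

```stata
* Example panel dataset from the Stata documentation
webuse nlswork, clear
xtset idcode year

* Fixed-effects (within) regression; the footer of the output reports
* "F test that all u_i=0", which compares the model with pooled OLS
xtreg ln_wage age tenure, fe

* Joint F test that both slope coefficients are zero
test age tenure

* F test on a single coefficient
test tenure = 0
```

After xtreg, fe the test command reports F statistics; after xtreg, re the same command reports chi-squared (Wald) statistics instead.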
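Stata commands for mixed-effects models

I. What is a mixed-effects model
The mixed-effects model is widely used in statistics and is sometimes loosely called a random-effects model.
It is a statistical model that accounts for fixed effects and random effects at the same time and can be used to analyze multilevel data.
In a mixed-effects model, differences between individuals are split into two parts: one explained by fixed factors and one explained by random factors.
II. Advantages of mixed-effects models
1. They make full use of the information in the multilevel data structure and avoid the bias that comes from ignoring it.
2. They account for the influence of fixed and random factors on the outcome at the same time.
3. They can reduce the number of parameters to be estimated and improve the precision of the estimates.
4. They handle missing data well.
III. The mixed-effects model command in Stata
In Stata, mixed-effects models are fit with the mixed command.
mixed supports many kinds of random and fixed factors and can estimate several types of covariance structure.
The syntax and settings of mixed are introduced step by step below.
1. Syntax of mixed
    mixed depvar [indepvars] [if] [in] || groupvar: [varlist] [, options]
where depvar is the dependent variable, indepvars are the explanatory variables, and groupvar is the grouping variable.
Multiple explanatory variables are separated by spaces.
Multiple grouping variables (levels) are separated by ||.
options are optional.
2. Settings of mixed
(1) Fixed effects: list the fixed-effects variables directly after the dependent variable, before the first ||.
(2) Random effects: list the variables with random coefficients after groupvar: in the random-effects equation; a random intercept is included by default.
(3) Covariance structure: use the covariance() option to choose the covariance structure of the random effects.
The available structures include independent, exchangeable (compound symmetry), identity, and unstructured; residual structures such as ar 1 or ar 2 are requested with the residuals() option.
(4) Estimation method: maximum likelihood is the default (option mle); the reml option requests restricted maximum likelihood.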
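The settings listed above can be illustrated with a short run on an example dataset distributed with the Stata documentation. This is a minimal sketch: the pig data (weight measured weekly on each animal) are an assumption chosen for illustration and are not mentioned in the text.

```stata
* Example dataset from the Stata documentation: pig weights over weeks
webuse pig, clear

* Random intercept for each pig, fit by maximum likelihood (the default)
mixed weight week || id:

* Random intercept and random slope on week with an unstructured
* covariance between them
mixed weight week || id: week, covariance(unstructured)

* The random-intercept model again, fit by restricted maximum likelihood
mixed weight week || id:, reml
```

The fixed part (weight week) comes before ||, and everything after id: describes the random part, which matches the syntax line given above.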
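An introduction to linear mixed-effects models, part 1

When to use it
The linear mixed-effects model (LMM) is used frequently in biomedical and social research.
It suits data with an internal hierarchical or clustered structure, which arises broadly in two situations. (1) Clustered data: suppose we want to study the effect of two teaching methods, A and B, on students' exam scores, with 1,000 students drawn from 4 schools as subjects.
Because schools differ, students from one school may score better overall than students from another; in other words, exam scores are clustered along the school dimension.
(2) Repeated-measures data: suppose we want to study the effect of two antihypertensive drugs, A and B, on the blood pressure of hypertensive patients, measuring each patient's blood pressure before treatment and 1, 3, and 6 months after treatment.
Because the measurements on the same patient are clearly correlated, traditional analysis of variance is not appropriate.
Random effects and fixed effects
The model is called a "linear mixed-effects model" because it combines fixed effects and random effects.
Fixed effect: a factor is a fixed effect when all of its levels have been enumerated, so the results cannot or need not be generalized beyond them.
In the antihypertensive drug study above, many drugs exist, but the researcher cares only about the effect of drugs A and B, so drug can be treated as a fixed effect.
A fixed effect acts on the mean of the response (dependent) variable, such as blood pressure.
Random effect: a factor is a random effect when its levels are a sample drawn from a larger population and the results are to be generalized to that whole population.
In the same drug study, the participating patients are only a small sample, so patient is treated as a random effect.
A random effect acts on the variability, that is, the variance, of the response variable (blood pressure).
Figure (a) illustrates a fixed-effect factor: every time the experiment is repeated, the factor has the same three levels A1, A2, and A3, and the mean effect of each level is fixed.
Figure (b) illustrates a random-effect factor: every time the experiment is repeated, the factor levels differ, for example B1, B2, B3 the first time and B4, B5, B6 the second time, and so on.
The effect of each level on the mean is therefore random rather than fixed.
Of course, the distinction between the two kinds of effect is not always absolute; it depends mainly on the purpose of the study.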
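The teaching-method example can be written as a random-intercept model. The notation below is an assumption introduced for illustration (it does not appear in the original text): $y_{ij}$ is the exam score of student $i$ in school $j$, and $x_{ij}$ equals 1 for method B and 0 for method A.

```latex
\begin{align*}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + \epsilon_{ij},
  \qquad j = 1, \dots, 4, \\
u_j &\sim N(0, \sigma_u^2), \qquad \epsilon_{ij} \sim N(0, \sigma^2), \\
\operatorname{Var}(y_{ij}) &= \sigma_u^2 + \sigma^2, \qquad
\operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}
  \quad (i \neq i').
\end{align*}
```

Here $\beta_1$ is the fixed effect of teaching method on the mean score, while the school effect $u_j$ contributes only to the variance; the last expression shows how a nonzero $\sigma_u^2$ produces the within-school correlation that ordinary regression ignores.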
LINEAR MIXED MODELS IN STATA
Roberto G. Gutierrez
StataCorp LP

OUTLINE
I. THE LINEAR MIXED MODEL
   A. Definition
   B. Panel representation
II. ONE-LEVEL MODELS
   A. Data on math scores
   B. Adding a random slope
   C. Predict
   D. Covariance structures
   E. ML or REML?
III. TWO-LEVEL MODELS
   A. Productivity data
   B. Constraints on variance components
IV. FACTOR NOTATION
   A. Motivation
   B. Fitting the model
   C. Alternate ways to fit models
V. A GLIMPSE AT THE FUTURE

THE LINEAR MIXED MODEL

Definition

$$ y = X\beta + Zu + \epsilon $$

where
    $y$ is the $n \times 1$ vector of responses
    $X$ is the $n \times p$ fixed-effects design matrix
    $\beta$ are the fixed effects
    $Z$ is the $n \times q$ random-effects design matrix
    $u$ are the random effects
    $\epsilon$ is the $n \times 1$ vector of errors such that

$$ \begin{bmatrix} u \\ \epsilon \end{bmatrix} \sim N\left( 0, \begin{bmatrix} G & 0 \\ 0 & \sigma^2 I_n \end{bmatrix} \right) $$

Random effects are not directly estimated, but instead characterized by the elements of $G$, known as variance components.
As such, you fit a mixed model by estimating $\beta$, $\sigma^2$, and the variance components.

Panel representation

$$ y_i = X_i\beta + Z_i u_i + \epsilon_i, \qquad i = 1, \dots, M $$

where $u_i \sim N(0, S)$, for $q \times q$ variance $S$, and

$$ Z = \begin{bmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_M \end{bmatrix}; \qquad u = \begin{bmatrix} u_1 \\ \vdots \\ u_M \end{bmatrix}; \qquad G = I_M \otimes S $$

For example, take a random intercept model. In the classical framework, the random intercepts are random coefficients on indicator variables identifying each panel.
It is better to just think at the panel level and consider $M$ realizations of a random intercept.
This generalizes to more than one level of nested panels.
Issue of terminology for multi-level models.

ONE-LEVEL MODELS

Data on math scores

Consider the Junior School Project data, which compares math scores of various schools in the third and fifth years.
Data on n = 887 pupils in M = 48 schools.
Let's fit the model

$$ \text{math5}_{ij} = \beta_0 + \beta_1 \text{math3}_{ij} + u_i + \epsilon_{ij} $$

for $i = 1, \dots, 48$ schools and $j = 1, \dots, n_i$ pupils. $u_i$ is a random effect (intercept) at the school level.

    . xtmixed math5 math3 || school:

    Performing EM optimization:
    Performing gradient-based optimization:
    Iteration 0:   log restricted-likelihood = -2770.5233
    Iteration 1:   log restricted-likelihood = -2770.5233
    Computing standard errors:

    Mixed-effects REML regression            Number of obs      =       887
    Group variable: school                   Number of groups   =        48
                                             Obs per group: min =         5
                                                            avg =      18.5
                                                            max =        62
                                             Wald chi2(1)       =    347.21
    Log restricted-likelihood = -2770.5233   Prob > chi2        =    0.0000

           math5 |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           math3 |  .6088557   .0326751    18.63   0.000    .5448137    .6728978
           _cons |  30.36506   .3531615    85.98   0.000    29.67287    31.05724

      Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
    ----------------------------+-------------------------------------------------
    school: Identity            |
                      sd(_cons) |  2.038896   .3017985     1.525456     2.72515
    ----------------------------+-------------------------------------------------
                   sd(Residual) |  5.306476   .1295751     5.058495    5.566614

    LR test vs. linear regression: chibar2(01) = 57.59   Prob >= chibar2 = 0.0000

For the most part, this is the same as xtreg.

Adding a random slope

Consider instead the model

$$ \text{math5}_{ij} = \beta_0 + \beta_1 \text{math3}_{ij} + u_{0i} + u_{1i}\,\text{math3}_{ij} + \epsilon_{ij} $$

In essence, each school has its own random regression line such that the intercept is $N(\beta_0, \sigma_0^2)$ and the slope on math3 is $N(\beta_1, \sigma_1^2)$.

    . xtmixed math5 math3 || school: math3

    Performing EM optimization:
    Performing gradient-based optimization:
    Iteration 0:   log restricted-likelihood = -2766.6463
    Iteration 1:   log restricted-likelihood = -2766.6442
    Iteration 2:   log restricted-likelihood = -2766.6442
    Computing standard errors:

    Mixed-effects REML regression            Number of obs      =       887
    Group variable: school                   Number of groups   =        48
                                             Obs per group: min =         5
                                                            avg =      18.5
                                                            max =        62
                                             Wald chi2(1)       =    192.62
                                             Prob > chi2        =    0.0000

           math5 |       z    P>|z|
    -------------+-------------------
           math3 |   13.88   0.000
           _cons |   84.42   0.000

      Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
    ----------------------------+-------------------------------------------------
    school: Independent         |
                      sd(math3) |  .1911842   .0509905      .113352    .3224593
                      sd(_cons) |  2.073863   .3078237     1.550372    2.774112
    ----------------------------+-------------------------------------------------
                   sd(Residual) |  5.203947   .1309477     4.953521    5.467034

    LR test vs. linear regression:     chi2(2) = 65.35   Prob > chi2 = 0.0000
    Note: LR test is conservative and provided only for reference

LR test is conservative. What does that mean?
lrtest can compare this model to the previous one.

Predict

Random effects are not estimated, but they can be predicted (BLUPs).

    . predict r1 r0, reffects
    . describe r*

                  storage  display     value
    variable name   type   format      label      variable label
    ------------------------------------------------------------------------
    r1              float  %9.0g                  BLUP r.e. for school: math3
    r0              float  %9.0g                  BLUP r.e. for school: _cons

    . gen b0 = _b[_cons] + r0
    . gen b1 = _b[math3] + r1
    . bysort school: gen tolist = _n==1
    . list school b0 b1 if school<=10 & tolist

          | school         b0         b1
          |------------------------------
       1. |      1   27.52259   .5527437
      26. |      2   30.35573   .5036528
      36. |      3   31.49648   .5962557
      44. |      4   28.08686   .7505417
      68. |      5   30.29471   .5983001
      93. |      6   31.04652   .5532793
     106. |      7   31.93729   .6756551
     116. |      8   30.83009   .6885387
     142. |      9   27.90685   .6950143
     163. |     10   31.31212   .7024184

We could use these intercepts and slopes to plot the estimated lines for each school. Equivalently, we could just plot the "fitted" values.
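The xtmixed command shown in these slides was renamed mixed in Stata 13, and mixed fits by maximum likelihood unless reml is specified. The following is a minimal sketch of the two school models in the newer syntax, assuming the Junior School Project data are already in memory with variables math5, math3, and school; the stored-estimate names ri and rs are illustrative.

```stata
* Random-intercept model; reml requests the REML fit used in the slides
mixed math5 math3 || school:, reml
estimates store ri

* Random intercept plus a random slope on math3
mixed math5 math3 || school: math3, reml
estimates store rs

* Likelihood-ratio comparison of the two fits; both have the same
* fixed-effects specification, so their REML likelihoods are comparable
lrtest ri rs
```

Because the null hypothesis places the slope variance on the boundary of the parameter space, the reported p-value is conservative, which is the point the slides raise about the LR test.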