田毅-Spark in GrowingIO

合集下载

Fireworks Algorithm for Optimization 烟花算法

Fireworks Algorithm for Optimization   烟花算法

Fireworks Algorithm for OptimizationYing Tan and Yuanchun ZhuKey Laboratory of Machine Perception(MOE),Peking UniversityDepartment of Machine Intelligence,School of Electronics Engineering and Computer Science,Peking University,Beijing,100871,China{ytan,ychzhu}@Abstract.Inspired by observingfireworks explosion,a novel swarm in-telligence algorithm,called Fireworks Algorithm(F A),is proposed forglobal optimization of complex functions.In the proposed F A,two typesof explosion(search)processes are employed,and the mechanisms forkeeping diversity of sparks are also well designed.In order to demon-strate the validation of the F A,a number of experiments were conductedon nine benchmark test functions to compare the F A with two vari-ants of particle swarm optimization(PSO)algorithms,namely StandardPSO and Clonal PSO.It turns out from the results that the proposed F Aclearly outperforms the two variants of the PSOs in both convergencespeed and global solution accuracy.Keywords:natural computing,swarm intelligence,fireworks algorithm,particle swarm optimization,function optimization.1IntroductionIn recent years,Swarm Intelligence(SI)has become popular among researchers working on optimization problems all over the world[1,2].SI algorithms,e.g.Par-ticle Swarm Optimization(PSO)[3],Ant System[4],Clonal Selection Algorithm [5],and Swarm Robots[6],etc.,have advantages in solving many optimization problems.Among all the SI algorithms,PSO is one of the most popular algorithm for searching optimal locations in a D-dimensional space.In1995,Kennedy and Eberhart proposed PSO as a powerful global optimization algorithm inspired by the behavior of bird blocks[3].Since then,the PSO has attracted the attentions of researchers around the globe,and a number of variants of PSO have been con-tinually proposed[7,8].Like PSO,most of swarm intelligence algorithms are inspired by some intelli-gent colony behaviors in nature.In this paper,inspired by the emergent swarm behavior offireworks,a novel swarm intelligence algorithm called Fireworks Al-gorithm(FA)is proposed for function optimization.The FA is presented and implemented by simulating the explosion process offireworks.In the FA,two explosion(search)processes are employed and mechanisms for keeping diversity of sparks are also well designed.To validate the performance of the proposed FA,comparison experiments were conducted on nine benchmark test functions Y.Tan,Y.Shi,and K.C.Tan(Eds.):ICSI2010,Part I,LNCS6145,pp.355–364,2010.c Springer-Verlag Berlin Heidelberg2010356Y.Tan and Y.C.Zhuamong the FA,the Standard PSO(SPSO),and the Clonal PSO(CPSO)[8].It is shown that the FA clearly outperforms the SPSO and the CPSO in bothoptimization accuracy and convergence speed.The remainder of this paper is organized as follows.Section2describes theframework of the FA and introduces two types of search processes and mecha-nisms for keeping diversity.In Section3,experimental results are presented to validate the performance of the FA.Section4concludes the paper.2Fireworks Algorithm2.1F A FrameworkWhen afirework is set off,a shower of sparks willfill the local space around the firework.In our opinion,the explosion process of afirework can be viewed asa search in the local space around a specific point where thefirework is set offthrough the sparks generated in the explosion.When we are asked tofind a point x j satisfying f(x j)=y,we can continually set off‘fireworks’in potential space until one‘spark’targets or is fairly near the point x j.Mimicking the process ofsetting offfireworks,a rough framework of the FA is depicted in Fig.1.In the FA,for each generation of explosion,wefirst select n locations,wherenfireworks are set off.Then after explosion,the locations of sparks are obtained and evaluated.When the optimal location is found,the algorithm stops.Oth-erwise,n other locations are selected from the current sparks andfireworks for the next generation of explosion.From Fig.1,it can be seen that the success of the FA lies in a good designof the explosion process and a proper method for selecting locations,which arerespectively elaborated in subsection2.2and subsection2.3.Fig.1.Framework offireworks algorithmFireworks Algorithm for Optimization357 2.2Design of Fireworks ExplosionThrough observingfireworks display,we have found two specific behavior of fireworks explosion.Whenfireworks are well manufactured,numerous sparks are generated,and the sparks centralize the explosion center.In this case,we enjoy the spectacular display of thefireworks.However,for a badfirework explosion, quite few sparks are generated,and the sparks scatter in the space.The two manners are depicted in Fig.2.From the standpoint of a search algorithm,a goodfirework denotes that thefirework locates in a promising area which may be close to the optimal location.Thus,it is proper to utilize more sparks to search the local area around thefirework.In the contrast,a badfirework means the optimal location may be far from where thefirework locates.Then, the search radius should be larger.In the FA,more sparks are generated and the explosion amplitude is smaller for a goodfirework,compared to a bad one.(a)Good explosion(b)Bad explosionFig.2.Two types offireworks explosionNumber of Sparks.Suppose the FA is designed for the general optimization problem:Minimize f(x)∈R,x min x x max,(1) where x=x1,x2,...,x d denotes a location in the potential space,f(x)is anobjective function,and x min and x max denote the bounds of the potential space.Then the number of sparks generated by eachfirework x i is defined as follows.s i=m·y max−f(x i)+ξni=1(y max−f(x i))+ξ,(2)where m is a parameter controlling the total number of sparks generated by the nfireworks,y max=max(f(x i))(i=1,2,...,n)is the maximum(worst) value of the objective function among the nfireworks,andξ,which denotes the smallest constant in the computer,is utilized to avoid zero-division-error.To avoid overwhelming effects of splendidfireworks,bounds are defined for s i,which is shown in Eq.3.358Y.Tan and Y.C.Zhuˆs i=⎧⎪⎨⎪⎩round(a·m)if s i<amround(b·m)if s i>bmround(s i)otherwise,a<b<1,(3)where a and b are const parameters.Amplitude of Explosion.In contrast to the design of sparks number,the am-plitude of a goodfirework explosion is smaller than that of a bad one.Amplitude of explosion for eachfirework is defined as follows.A i=ˆA·f(x i)−y min+ξni=1(f(x i)−y min)+ξ,(4)whereˆA denotes the maximum explosion amplitude,and y min=min(f(x i))(i= 1,2,...,n)is the minimum(best)value of the objective function among the n fireworks.Generating Sparks.In explosion,sparks may undergo the effects of explosion from random z directions(dimensions).In the FA,we obtain the number of the affected directions randomly as follows.z=round(d·rand(0,1)),(5) where d is the dimensionality of the location x,and rand(0,1)is an uniform distribution over[0,1].The location of a spark of thefirework x i is obtained using Algorithm1. Mimicking the explosion process,a spark’s location˜x j isfirst generated.Then if the obtained location is found to fall out of the potential space,it is mapped to the potential space according to the algorithm.Algorithm1.Obtain the location of a sparkInitialize the location of the spark:˜x j=x i;z=round(d·rand(0,1));Randomly select z dimensions of˜x j;Calculate the displacement:h=A i·rand(−1,1);for each dimension˜x j k∈{pre-selected z dimensions of˜x j}do˜x j k=˜x j k+h;if˜x j k<x mink or˜x jk>x maxk thenmap˜x j k to the potential space:˜x j k=x mink+|˜x jk|%(x maxk−x mink);end ifend forTo keep the diversity of sparks,we design another way of generating sparks—Gaussian explosion,which is show in Algorithm2.A function Gaussian(1,1), which denotes a Gaussian distribution with mean1and standard deviation1,is utilized to define the coefficient of the explosion.In our experiments,ˆm sparks of this type are generated in each explosion generation.Fireworks Algorithm for Optimization359Algorithm2.Obtain the location of a specific sparkInitialize the location of the spark:ˆx j=x i;z=round(d·rand(0,1));Randomly select z dimensions ofˆx j;Calculate the coefficient of Gaussian explosion:g=Gaussian(1,1);for each dimensionˆx j k∈{pre-selected z dimensions ofˆx j}doˆx j k=ˆx j k·g;ifˆx j k<x mink orˆx jk>x maxk thenmapˆx j k to the potential space:ˆx j k=x mink +|ˆx j k|%(x maxk−x mink);end ifend for2.3Selection of LocationsAt the beginning of each explosion generation,n locations should be selected for thefireworks explosion.In the FA,the current best location x∗,upon which the objective function f(x∗)is optimal among current locations,is always kept for the next explosion generation.After that,n−1locations are selected based on their distance to other locations so as to keep diversity of sparks.The general distance between a location x i and other locations is defined as follows.R(x i)=j∈K d(x i,x j)=j∈Kx i−x j ,(6)where K is the set of all current locations of bothfireworks and sparks.Then the selection probability of a location x i is defined as follows.p(x i)=R(x i)j∈KR(x j).(7)When calculating the distance,any distance measure can be utilized including Manhattan distance,Euclidean distance,Angle-based distance,and so on[9]. When d(x i,x j)is defined as|f(x i)−f(x j)|,the probability is equivalent to the definition of the immune density based probability in Ref.[10].2.4SummaryAlgorithm3summarizes the framework of the FA.During each explosion genera-tion,two types of sparks are generated respectively according to Algorithm1and Algorithm2.For thefirst type,the number of sparks and explosion amplitude depend on the quality of the correspondingfirework(f(x i)).In the contrast,the second type is generated using a Gaussian explosion process,which conducts search in a local Gaussian space around afirework.After obtaining the locations of the two types of sparks,n locations are selected for the next explosion gener-ation.In the FA,approximate n+m+ˆm function evaluations are done in each generation.Suppose the optimum of a function can be found in T generations, then we can deduce that the complexity of the FA is O(T∗(n+m+ˆm)).360Y.Tan and Y.C.ZhuAlgorithm 3.Framework of the FARandomly select n locations for fireworks;while stop criteria=false doSet offn fireworks respectively at the n locations:for each firework x i doCalculate the number of sparks that the firework yields:ˆs i ,according to Eq.3;Obtain locations of ˆs i sparks of the firework x i using Algorithm 1;end for for k=1:ˆm doRandomly select a firework x j ;Generate a specific spark for the firework using Algorithm 2;end forSelect the best location and keep it for next explosion generation;Randomly select n −1locations from the two types of sparks and the current fireworks according to the probability given in Eq.7;end while3Experiments3.1Benchmark FunctionsTo investigate the performance of the proposed FA,we conducted experiments on nine benchmark functions.The feasible bounds for all functions are set as [−100,100]D .The expression of the functions,initialization intervals and di-mensionalities are listed in Table 1.Table 1.Nine benchmark functions utilized in our experiments Function ExpressionInitializationD SphereF 1= D i =1x 2i [30,50]D 30Rosenbrock F 2= D −1i =1(100(x i +1−x 2i )2+(x i −1)2)[30,50]D 30Rastrigrin F 3= D i =1(x 2i −10cos(2πx i )+10)[30,50]D 30Griewank F 4=1+ D i =1x 2i 4000− D i =1cos(x i √i )[30,50]D 30Ellipse F 5= D i =1104i −1D−1x 2i [15,30]D 30Cigar F 6=x 21+ D i =2104x 2i [15,30]D 30Tablet F 7=104x 21+ D i =2x 2i [15,30]D 30Schwefel F 8= Di =1((x 1−x 2i )2+(x i −1)2)[15,30]D 30AckleyF 9=20+e −20exp −0.2 1DD i =1x 2i[15,30]D30−exp 1D D i =1cos(2πx 2i )3.2Comparison Experiments among the F A,the CPSO and the SPSOIn this section,we compare the performance of the FA with the CPSO and the SPSO in terms of both convergence speed and optimization accuracy.Fireworks Algorithm for Optimization361 Table2.Statistical mean and standard deviation of solutions found by the F A,the CPSO and the SPSO on nine benchmark functions over20independent runsFunctionFunction F A’s mean CPSO’s mean SPSO’s mean evluations(StD)(StD)(StD)Sphere5000000.0000000.0000001.909960 (0.000000)(0.000000)(2.594634)Rosenbrock6000009.56949333.403191410.522552 (12.12829)(42.513450)(529.389139)Rastrigrin5000000.0000000.053042167.256119 (0.000000)(0.370687)(42.912873)Griewank2000000.0000000.6324032.177754 (0.000000)(0.327648)(0.294225)Ellipse5000000.0000000.00000053.718807 (0.000000)(0.000000)(68.480173)Cigar6000000.0000000.0000000.002492 (0.000000)(0.000000)(0.005194)Tablet5000000.0000000.0000001.462832 (0.000000)(0.000000)(1.157021)Schwefel6000000.0000000.0950990.335996 (0.000000)(0.376619)(0.775270)Ackley2000000.0000001.68364912.365417 (0.000000)(1.317866)(1.265322)The parameters of both the CPSO and the SPSO are set as those in Ref.[8]. For the FA,the parameters were selected by some preliminary experiments.We found that the FA worked quite well at the setting:n=5,m=50,a=0.04, b=0.8,ˆA=40,andˆm=5,which is applied in all the comparison experiments.Table2depicts the optimization accuracy of the three algorithms on nine benchmark functions,which are averaged over20independent runs.It can be seen that the proposed FA clearly outperforms both the CPSO and SPSO on all the functions.In addition,the FA canfind optimal solutions on most benchmark functions in less than10000function evaluations,as shown in Table3.However, the optimization accuracy of the CPSO and the SPSO is unacceptable within 10000function evaluations.Besides optimization accuracy,convergence speed is quite essential to an opti-mizer.To validate the convergence speed of the FA,we conducted more thorough experiments.Fig.3depicts the convergence curves of the FA,the CPSO and the SPSO on eight benchmark functions averaged over20independent runs.From these results,we can arrive at a conclusion that the proposed FA has a much faster speed than the CPSO and the SPSO.From Table3,we canfind that the FA canfind excellent solutions with only10000times of function evaluations. This also reflects the fast convergence speed of the proposed FA.362Y.Tan and Y.C.Zhu(a)Sphere(b)Rosenbrock(c)Rastrigin(d)Griewank(e)Ellipse(f)Cigar(g)Tablet(h)SchwefelFig.3.Convergence curves of the F A,the CPSO and the SPSO on eight benchmark functions.The function fitness are averaged over 20independent runs.Fireworks Algorithm for Optimization363 Table3.Statistical mean and standard deviation of solutions found by the F A,the CPSO and the SPSO on nine benchmark functions over20independent runs of10000 function evaluationsFunction F A’s mean CPSO’s mean SPSO’s mean(StD)(StD)(StD)Sphere0.00000011857.42578124919.099609 (0.000000)(3305.973067)(3383.241523)Rosenbrock19.383302750997504.0000005571942400.000000 (11.94373)(1741747548.420642)(960421617.568024)Rastrigrin0.00000010940.14843824013.001953 (0.000000)(3663.484331)(4246.961530)Griewank0.0000003.4572737.125976 (0.000000)(0.911027)(0.965788)Ellipse0.0000002493945.5000005305106.500000 (0.000000)(1199024.648305)(1117954.409340)Cigar0.000000122527168.000000149600864.000000 (0.000000)(28596381.089661)(13093322.778560)Tablet0.00000015595.10742242547.488281 (0.000000)(8086.792234)(8232.221882)Schwefel4.3537338775860.0000006743699.000000 (1.479332)(1217609.288290)(597770.084232)Ackley0.00000015.90766518.423347 (0.000000)(1.196082)(0.503372)3.3DiscussionAs shown in the experiments,the FA has a faster convergence speed and a better optimization accuracy,compared to the PSO.We consider the success of the FA lies in the following two aspects.–In the FA,sparks suffer the power of explosion from z dimensions simul-taneously,and the z dimensions are randomly selected for each spark˜x i.Thus,there is a probability that the differences between thefirework and the target location happen to lie in these z dimensions.In this scenario,the sparks of thefirework can move towards the target location from z directions simultaneously,which endues the FA with a fast convergence speed.–Two types of sparks are generated to keep the diversity of sparks,and the specific selection process for locations is a mechanism for keeping diversity.Therefore,the FA has the capability of avoiding premature convergence.4ConclusionsMimicking the explosion process offireworks,the so-called FA is proposed and implemented for function optimization.The experiments among the FA,the CPSO and the SPSO have shown that the proposed FA has a promising per-formance.It clearly outperforms the CPSO and the SPSO on nine benchmark364Y.Tan and Y.C.Zhufunctions in terms of both optimization accuracy and convergence speed,which endues the FA with a promising prospect of application and extension.In future work,we will seek a deep theoretical analysis on the FA and try to apply the FA to some practical engineering applications.Finally,we intend to discuss the relationship between the FA and other general-purpose optimization algorithms. Acknowlegements.This work was supported by National Natural Science Foundation of China(NSFC),under Grant No.60875080and60673020,and partially supported by the National High Technology Research and Develop-ment Program of China(863Program),with Grant No.2007AA01Z453.Authors appreciate Mr.Chao Rui for his facility with the early phase of the experiments. References1.Garnier,S.,Gautrais,J.,Theraulaz,G.:The biological principles of swarm intelli-gence.Swarm Intelligence1(1),3–31(2007)2.Das,S.,Abraham,A.,Konar,A.:Swarm intelligence algorithms in bioinformatics.Studies in Computational Intelligence94,113–147(2008)3.Kennedy,J.,Eberhart,R.C.:Particle swarm optimization.In:Proceedings of IEEEInternational Conference on Neural Networks,vol.4,pp.1942–1948(1995)4.Dorigo,M.,Maniezzo,V.,Colorni,A.:Ant system:optimization by a colony ofcooperating agents.IEEE Transactions on Systems,Man,and Cybernetics,Part B:Cybernetics26(1),29–41(1996)5.De Castro,L.N.,Von Zuben, F.J.:Learning and optimization using theclonal selection principle.IEEE Transactions on Evolutionary Computation6(3), 239–251(2002)6.Beni,G.,Wang,J.:Swarm intelligence in cellular robotic systems.In:Proceedingsof NATO Advanced Workshop on Robots and Biological Systems(1989)7.Bratton,D.,Kennedy,J.:Defining a standard for particle swarm optimization.In:Proceedings of IEEE Swarm Intelligence Symposium,pp.120–127(2007)8.Tan,Y.,Xiao,Z.M.:Clonal particle swarm optimization and its applications.In:Proceedings of IEEE Congress on Evolutionary Computation,pp.2303–2309 (2007)9.Perlibakas,V.:Distance measures for PCA-based face recognition.Pattern Recog-nition Letters25(6),711–724(2004)10.Lu,G.,Tan,D.,Zhao,H.:Improvement on regulating definition of antibody densityof immune algorithm.In:Proceedings of the9th International Conference on Neural Information Processing,vol.5,pp.2669–2672(2002)。

土地市场化的经济增长效应及传导机制——理论机制与实证检验

土地市场化的经济增长效应及传导机制——理论机制与实证检验

第 40 卷 ,第 4 期 2023 年8 月15 日国土资源科技管理Vol. 40,No.4Aug. 15,2023 Scientific and Technological Management of Land and Resourcesdoi:10.3969/j.issn.1009-4210.2023.04.009土地市场化的经济增长效应及传导机制——理论机制与实证检验邓郴宜,万 勇(上海社会科学院 应用经济研究所,上海 200235)摘 要:土地要素在中国式现代化过程中起到了重要作用。

在市场化改革背景下,运用数理模型演绎法、面板数据固定效应模型、系统GMM模型、中介效应模型以及bootstrap检验考察土地出让市场化影响城市经济增长的效果及影响渠道。

研究结果表明:(1)土地出让市场化每提高一个单位,城市经济增长提高0.132个百分点;(2)二者之间通过提高资源配置效率、提高城市创新水平以及提高城市土地融资水平三条路径产生作用。

(3)资源配置、城市创新、土地融资中介效应大小分别为0.081、0.052、0.314。

因此,中国应继续深化土地要素市场化改革,加强法规建设,充分考虑地区差异性,将政府与市场相结合。

关键词:土地市场化;城市经济增长;统一大市场中图分类号:F290 文献标志码:A 文章编号:1009-4210-(2023)04-091-15Land allocation and urban economic growth in the market oriented reform: theoretical mechanism and empirical testDENG Chen-yi,WAN Yong(Institute of Applied Economics,Shanghai Academy of Social Sciences,Shanghai 200235,China)Abstract: Land plays an important role in Chinese path to modernization. In the context of market-oriented reform,panel data fixed effect model,system GMM model,mesomeric effect model and bootstrap test are applied to investigate the effect and influence of land marketization on urban economic growth. The research results are as follows:(1) For each unit of land marketization increase,urban economic growth increases by 0.132 percentage points;(2) The two play a role by improving the efficiency of resource allocation,the level of urban innovation and improving the level of urban land financing.(3) The intermediary effects of resource allocation,urban innovation and land financing are 0.081,0.052 and 0.314 respectively. It is concluded that China should continue to deepen the market-based allocation of factors of land,strengthen the construction of laws and regulations,give full consideration to regional differences,and combine the government with the market.Key words: Land marketization,urban economic growth,unified market收稿日期:2023-04-07;改回日期:2023-05-16基金项目:国家社科基金重点项目(19AZS018)作者简介:邓郴宜(1993—),女,博士研究生,从事城市规划与城市经济学研究。

C.parvum全基因组序列

C.parvum全基因组序列

DOI: 10.1126/science.1094786, 441 (2004);304Science et al.Mitchell S. Abrahamsen,Cryptosporidium parvum Complete Genome Sequence of the Apicomplexan, (this information is current as of October 7, 2009 ):The following resources related to this article are available online at/cgi/content/full/304/5669/441version of this article at:including high-resolution figures, can be found in the online Updated information and services,/cgi/content/full/1094786/DC1 can be found at:Supporting Online Material/cgi/content/full/304/5669/441#otherarticles , 9 of which can be accessed for free: cites 25 articles This article 239 article(s) on the ISI Web of Science. cited by This article has been /cgi/content/full/304/5669/441#otherarticles 53 articles hosted by HighWire Press; see: cited by This article has been/cgi/collection/genetics Genetics: subject collections This article appears in the following/about/permissions.dtl in whole or in part can be found at: this article permission to reproduce of this article or about obtaining reprints Information about obtaining registered trademark of AAAS.is a Science 2004 by the American Association for the Advancement of Science; all rights reserved. The title Copyright American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the Science o n O c t o b e r 7, 2009w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o m3.R.Jackendoff,Foundations of Language:Brain,Gram-mar,Evolution(Oxford Univ.Press,Oxford,2003).4.Although for Frege(1),reference was established rela-tive to objects in the world,here we follow Jackendoff’s suggestion(3)that this is done relative to objects and the state of affairs as mentally represented.5.S.Zola-Morgan,L.R.Squire,in The Development andNeural Bases of Higher Cognitive Functions(New York Academy of Sciences,New York,1990),pp.434–456.6.N.Chomsky,Reflections on Language(Pantheon,New York,1975).7.J.Katz,Semantic Theory(Harper&Row,New York,1972).8.D.Sperber,D.Wilson,Relevance(Harvard Univ.Press,Cambridge,MA,1986).9.K.I.Forster,in Sentence Processing,W.E.Cooper,C.T.Walker,Eds.(Erlbaum,Hillsdale,NJ,1989),pp.27–85.10.H.H.Clark,Using Language(Cambridge Univ.Press,Cambridge,1996).11.Often word meanings can only be fully determined byinvokingworld knowledg e.For instance,the meaningof “flat”in a“flat road”implies the absence of holes.However,in the expression“aflat tire,”it indicates the presence of a hole.The meaningof“finish”in the phrase “Billfinished the book”implies that Bill completed readingthe book.However,the phrase“the g oatfin-ished the book”can only be interpreted as the goat eatingor destroyingthe book.The examples illustrate that word meaningis often underdetermined and nec-essarily intertwined with general world knowledge.In such cases,it is hard to see how the integration of lexical meaning and general world knowledge could be strictly separated(3,31).12.W.Marslen-Wilson,C.M.Brown,L.K.Tyler,Lang.Cognit.Process.3,1(1988).13.ERPs for30subjects were averaged time-locked to theonset of the critical words,with40items per condition.Sentences were presented word by word on the centerof a computer screen,with a stimulus onset asynchronyof600ms.While subjects were readingthe sentences,their EEG was recorded and amplified with a high-cut-off frequency of70Hz,a time constant of8s,and asamplingfrequency of200Hz.14.Materials and methods are available as supportingmaterial on Science Online.15.M.Kutas,S.A.Hillyard,Science207,203(1980).16.C.Brown,P.Hagoort,J.Cognit.Neurosci.5,34(1993).17.C.M.Brown,P.Hagoort,in Architectures and Mech-anisms for Language Processing,M.W.Crocker,M.Pickering,C.Clifton Jr.,Eds.(Cambridge Univ.Press,Cambridge,1999),pp.213–237.18.F.Varela et al.,Nature Rev.Neurosci.2,229(2001).19.We obtained TFRs of the single-trial EEG data by con-volvingcomplex Morlet wavelets with the EEG data andcomputingthe squared norm for the result of theconvolution.We used wavelets with a7-cycle width,with frequencies ranging from1to70Hz,in1-Hz steps.Power values thus obtained were expressed as a per-centage change relative to the power in a baselineinterval,which was taken from150to0ms before theonset of the critical word.This was done in order tonormalize for individual differences in EEG power anddifferences in baseline power between different fre-quency bands.Two relevant time-frequency compo-nents were identified:(i)a theta component,rangingfrom4to7Hz and from300to800ms after wordonset,and(ii)a gamma component,ranging from35to45Hz and from400to600ms after word onset.20.C.Tallon-Baudry,O.Bertrand,Trends Cognit.Sci.3,151(1999).tner et al.,Nature397,434(1999).22.M.Bastiaansen,P.Hagoort,Cortex39(2003).23.O.Jensen,C.D.Tesche,Eur.J.Neurosci.15,1395(2002).24.Whole brain T2*-weighted echo planar imaging bloodoxygen level–dependent(EPI-BOLD)fMRI data wereacquired with a Siemens Sonata1.5-T magnetic reso-nance scanner with interleaved slice ordering,a volumerepetition time of2.48s,an echo time of40ms,a90°flip angle,31horizontal slices,a64ϫ64slice matrix,and isotropic voxel size of3.5ϫ3.5ϫ3.5mm.For thestructural magnetic resonance image,we used a high-resolution(isotropic voxels of1mm3)T1-weightedmagnetization-prepared rapid gradient-echo pulse se-quence.The fMRI data were preprocessed and analyzedby statistical parametric mappingwith SPM99software(http://www.fi/spm99).25.S.E.Petersen et al.,Nature331,585(1988).26.B.T.Gold,R.L.Buckner,Neuron35,803(2002).27.E.Halgren et al.,J.Psychophysiol.88,1(1994).28.E.Halgren et al.,Neuroimage17,1101(2002).29.M.K.Tanenhaus et al.,Science268,1632(1995).30.J.J.A.van Berkum et al.,J.Cognit.Neurosci.11,657(1999).31.P.A.M.Seuren,Discourse Semantics(Basil Blackwell,Oxford,1985).32.We thank P.Indefrey,P.Fries,P.A.M.Seuren,and M.van Turennout for helpful discussions.Supported bythe Netherlands Organization for Scientific Research,grant no.400-56-384(P.H.).Supporting Online Material/cgi/content/full/1095455/DC1Materials and MethodsFig.S1References and Notes8January2004;accepted9March2004Published online18March2004;10.1126/science.1095455Include this information when citingthis paper.Complete Genome Sequence ofthe Apicomplexan,Cryptosporidium parvumMitchell S.Abrahamsen,1,2*†Thomas J.Templeton,3†Shinichiro Enomoto,1Juan E.Abrahante,1Guan Zhu,4 Cheryl ncto,1Mingqi Deng,1Chang Liu,1‡Giovanni Widmer,5Saul Tzipori,5GregoryA.Buck,6Ping Xu,6 Alan T.Bankier,7Paul H.Dear,7Bernard A.Konfortov,7 Helen F.Spriggs,7Lakshminarayan Iyer,8Vivek Anantharaman,8L.Aravind,8Vivek Kapur2,9The apicomplexan Cryptosporidium parvum is an intestinal parasite that affects healthy humans and animals,and causes an unrelenting infection in immuno-compromised individuals such as AIDS patients.We report the complete ge-nome sequence of C.parvum,type II isolate.Genome analysis identifies ex-tremely streamlined metabolic pathways and a reliance on the host for nu-trients.In contrast to Plasmodium and Toxoplasma,the parasite lacks an api-coplast and its genome,and possesses a degenerate mitochondrion that has lost its genome.Several novel classes of cell-surface and secreted proteins with a potential role in host interactions and pathogenesis were also detected.Elu-cidation of the core metabolism,including enzymes with high similarities to bacterial and plant counterparts,opens new avenues for drug development.Cryptosporidium parvum is a globally impor-tant intracellular pathogen of humans and animals.The duration of infection and patho-genesis of cryptosporidiosis depends on host immune status,ranging from a severe but self-limiting diarrhea in immunocompetent individuals to a life-threatening,prolonged infection in immunocompromised patients.Asubstantial degree of morbidity and mortalityis associated with infections in AIDS pa-tients.Despite intensive efforts over the past20years,there is currently no effective ther-apy for treating or preventing C.parvuminfection in humans.Cryptosporidium belongs to the phylumApicomplexa,whose members share a com-mon apical secretory apparatus mediating lo-comotion and tissue or cellular invasion.Many apicomplexans are of medical or vet-erinary importance,including Plasmodium,Babesia,Toxoplasma,Neosprora,Sarcocys-tis,Cyclospora,and Eimeria.The life cycle ofC.parvum is similar to that of other cyst-forming apicomplexans(e.g.,Eimeria and Tox-oplasma),resulting in the formation of oocysts1Department of Veterinary and Biomedical Science,College of Veterinary Medicine,2Biomedical Genom-ics Center,University of Minnesota,St.Paul,MN55108,USA.3Department of Microbiology and Immu-nology,Weill Medical College and Program in Immu-nology,Weill Graduate School of Medical Sciences ofCornell University,New York,NY10021,USA.4De-partment of Veterinary Pathobiology,College of Vet-erinary Medicine,Texas A&M University,College Sta-tion,TX77843,USA.5Division of Infectious Diseases,Tufts University School of Veterinary Medicine,NorthGrafton,MA01536,USA.6Center for the Study ofBiological Complexity and Department of Microbiol-ogy and Immunology,Virginia Commonwealth Uni-versity,Richmond,VA23198,USA.7MRC Laboratoryof Molecular Biology,Hills Road,Cambridge CB22QH,UK.8National Center for Biotechnology Infor-mation,National Library of Medicine,National Insti-tutes of Health,Bethesda,MD20894,USA.9Depart-ment of Microbiology,University of Minnesota,Min-neapolis,MN55455,USA.*To whom correspondence should be addressed.E-mail:abe@†These authors contributed equally to this work.‡Present address:Bioinformatics Division,Genetic Re-search,GlaxoSmithKline Pharmaceuticals,5MooreDrive,Research Triangle Park,NC27009,USA.R E P O R T S SCIENCE VOL30416APRIL2004441o n O c t o b e r 7 , 2 0 0 9 w w w . s c i e n c e m a g . o r g D o w n l o a d e d f r o mthat are shed in the feces of infected hosts.C.parvum oocysts are highly resistant to environ-mental stresses,including chlorine treatment of community water supplies;hence,the parasite is an important water-and food-borne pathogen (1).The obligate intracellular nature of the par-asite ’s life cycle and the inability to culture the parasite continuously in vitro greatly impair researchers ’ability to obtain purified samples of the different developmental stages.The par-asite cannot be genetically manipulated,and transformation methodologies are currently un-available.To begin to address these limitations,we have obtained the complete C.parvum ge-nome sequence and its predicted protein com-plement.(This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accession AAEE00000000.The version described in this paper is the first version,AAEE01000000.)The random shotgun approach was used to obtain the complete DNA sequence (2)of the Iowa “type II ”isolate of C.parvum .This isolate readily transmits disease among numerous mammals,including humans.The resulting ge-nome sequence has roughly 13ϫgenome cov-erage containing five gaps and 9.1Mb of totalDNA sequence within eight chromosomes.The C.parvum genome is thus quite compact rela-tive to the 23-Mb,14-chromosome genome of Plasmodium falciparum (3);this size difference is predominantly the result of shorter intergenic regions,fewer introns,and a smaller number of genes (Table 1).Comparison of the assembled sequence of chromosome VI to that of the recently published sequence of chromosome VI (4)revealed that our assembly contains an ad-ditional 160kb of sequence and a single gap versus two,with the common sequences dis-playing a 99.993%sequence identity (2).The relative paucity of introns greatly simplified gene predictions and facilitated an-notation (2)of predicted open reading frames (ORFs).These analyses provided an estimate of 3807protein-encoding genes for the C.parvum genome,far fewer than the estimated 5300genes predicted for the Plasmodium genome (3).This difference is primarily due to the absence of an apicoplast and mitochondrial genome,as well as the pres-ence of fewer genes encoding metabolic functions and variant surface proteins,such as the P.falciparum var and rifin molecules (Table 2).An analysis of the encoded pro-tein sequences with the program SEG (5)shows that these protein-encoding genes are not enriched in low-complexity se-quences (34%)to the extent observed in the proteins from Plasmodium (70%).Our sequence analysis indicates that Cryptosporidium ,unlike Plasmodium and Toxoplasma ,lacks both mitochondrion and apicoplast genomes.The overall complete-ness of the genome sequence,together with the fact that similar DNA extraction proce-dures used to isolate total genomic DNA from C.parvum efficiently yielded mito-chondrion and apicoplast genomes from Ei-meria sp.and Toxoplasma (6,7),indicates that the absence of organellar genomes was unlikely to have been the result of method-ological error.These conclusions are con-sistent with the absence of nuclear genes for the DNA replication and translation machinery characteristic of mitochondria and apicoplasts,and with the lack of mito-chondrial or apicoplast targeting signals for tRNA synthetases.A number of putative mitochondrial pro-teins were identified,including components of a mitochondrial protein import apparatus,chaperones,uncoupling proteins,and solute translocators (table S1).However,the ge-nome does not encode any Krebs cycle en-zymes,nor the components constituting the mitochondrial complexes I to IV;this finding indicates that the parasite does not rely on complete oxidation and respiratory chains for synthesizing adenosine triphosphate (ATP).Similar to Plasmodium ,no orthologs for the ␥,␦,or εsubunits or the c subunit of the F 0proton channel were detected (whereas all subunits were found for a V-type ATPase).Cryptosporidium ,like Eimeria (8)and Plas-modium ,possesses a pyridine nucleotide tran-shydrogenase integral membrane protein that may couple reduced nicotinamide adenine dinucleotide (NADH)and reduced nico-tinamide adenine dinucleotide phosphate (NADPH)redox to proton translocation across the inner mitochondrial membrane.Unlike Plasmodium ,the parasite has two copies of the pyridine nucleotide transhydrogenase gene.Also present is a likely mitochondrial membrane –associated,cyanide-resistant alter-native oxidase (AOX )that catalyzes the reduction of molecular oxygen by ubiquinol to produce H 2O,but not superoxide or H 2O 2.Several genes were identified as involved in biogenesis of iron-sulfur [Fe-S]complexes with potential mitochondrial targeting signals (e.g.,nifS,nifU,frataxin,and ferredoxin),supporting the presence of a limited electron flux in the mitochondrial remnant (table S2).Our sequence analysis confirms the absence of a plastid genome (7)and,additionally,the loss of plastid-associated metabolic pathways including the type II fatty acid synthases (FASs)and isoprenoid synthetic enzymes thatTable 1.General features of the C.parvum genome and comparison with other single-celled eukaryotes.Values are derived from respective genome project summaries (3,26–28).ND,not determined.FeatureC.parvum P.falciparum S.pombe S.cerevisiae E.cuniculiSize (Mbp)9.122.912.512.5 2.5(G ϩC)content (%)3019.43638.347No.of genes 38075268492957701997Mean gene length (bp)excluding introns 1795228314261424ND Gene density (bp per gene)23824338252820881256Percent coding75.352.657.570.590Genes with introns (%)553.9435ND Intergenic regions (G ϩC)content %23.913.632.435.145Mean length (bp)5661694952515129RNAsNo.of tRNA genes 454317429944No.of 5S rRNA genes 6330100–2003No.of 5.8S ,18S ,and 28S rRNA units 57200–400100–20022Table parison between predicted C.parvum and P.falciparum proteins.FeatureC.parvum P.falciparum *Common †Total predicted proteins380752681883Mitochondrial targeted/encoded 17(0.45%)246(4.7%)15Apicoplast targeted/encoded 0581(11.0%)0var/rif/stevor ‡0236(4.5%)0Annotated as protease §50(1.3%)31(0.59%)27Annotated as transporter ࿣69(1.8%)34(0.65%)34Assigned EC function ¶167(4.4%)389(7.4%)113Hypothetical proteins925(24.3%)3208(60.9%)126*Values indicated for P.falciparum are as reported (3)with the exception of those for proteins annotated as protease or transporter.†TBLASTN hits (e Ͻ–5)between C.parvum and P.falciparum .‡As reported in (3).§Pre-dicted proteins annotated as “protease or peptidase”for C.parvum (CryptoGenome database,)and P.falciparum (PlasmoDB database,).࿣Predicted proteins annotated as “trans-porter,permease of P-type ATPase”for C.parvum (CryptoGenome)and P.falciparum (PlasmoDB).¶Bidirectional BLAST hit (e Ͻ–15)to orthologs with assigned Enzyme Commission (EC)numbers.Does not include EC assignment numbers for protein kinases or protein phosphatases (due to inconsistent annotation across genomes),or DNA polymerases or RNA polymerases,as a result of issues related to subunit inclusion.(For consistency,46proteins were excluded from the reported P.falciparum values.)R E P O R T S16APRIL 2004VOL 304SCIENCE 442 o n O c t o b e r 7, 2009w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o mare otherwise localized to the plastid in other apicomplexans.C.parvum fatty acid biosynthe-sis appears to be cytoplasmic,conducted by a large(8252amino acids)modular type I FAS (9)and possibly by another large enzyme that is related to the multidomain bacterial polyketide synthase(10).Comprehensive screening of the C.parvum genome sequence also did not detect orthologs of Plasmodium nuclear-encoded genes that contain apicoplast-targeting and transit sequences(11).C.parvum metabolism is greatly stream-lined relative to that of Plasmodium,and in certain ways it is reminiscent of that of another obligate eukaryotic parasite,the microsporidian Encephalitozoon.The degeneration of the mi-tochondrion and associated metabolic capabili-ties suggests that the parasite largely relies on glycolysis for energy production.The parasite is capable of uptake and catabolism of mono-sugars(e.g.,glucose and fructose)as well as synthesis,storage,and catabolism of polysac-charides such as trehalose and amylopectin. Like many anaerobic organisms,it economizes ATP through the use of pyrophosphate-dependent phosphofructokinases.The conver-sion of pyruvate to acetyl–coenzyme A(CoA) is catalyzed by an atypical pyruvate-NADPH oxidoreductase(Cp PNO)that contains an N-terminal pyruvate–ferredoxin oxidoreductase (PFO)domain fused with a C-terminal NADPH–cytochrome P450reductase domain (CPR).Such a PFO-CPR fusion has previously been observed only in the euglenozoan protist Euglena gracilis(12).Acetyl-CoA can be con-verted to malonyl-CoA,an important precursor for fatty acid and polyketide biosynthesis.Gly-colysis leads to several possible organic end products,including lactate,acetate,and ethanol. The production of acetate from acetyl-CoA may be economically beneficial to the parasite via coupling with ATP production.Ethanol is potentially produced via two in-dependent pathways:(i)from the combination of pyruvate decarboxylase and alcohol dehy-drogenase,or(ii)from acetyl-CoA by means of a bifunctional dehydrogenase(adhE)with ac-etaldehyde and alcohol dehydrogenase activi-ties;adhE first converts acetyl-CoA to acetal-dehyde and then reduces the latter to ethanol. AdhE predominantly occurs in bacteria but has recently been identified in several protozoans, including vertebrate gut parasites such as Enta-moeba and Giardia(13,14).Adjacent to the adhE gene resides a second gene encoding only the AdhE C-terminal Fe-dependent alcohol de-hydrogenase domain.This gene product may form a multisubunit complex with AdhE,or it may function as an alternative alcohol dehydro-genase that is specific to certain growth condi-tions.C.parvum has a glycerol3-phosphate dehydrogenase similar to those of plants,fungi, and the kinetoplastid Trypanosoma,but(unlike trypanosomes)the parasite lacks an ortholog of glycerol kinase and thus this pathway does not yield glycerol production.In addition to themodular fatty acid synthase(Cp FAS1)andpolyketide synthase homolog(Cp PKS1), C.parvum possesses several fatty acyl–CoA syn-thases and a fatty acyl elongase that may partici-pate in fatty acid metabolism.Further,enzymesfor the metabolism of complex lipids(e.g.,glyc-erolipid and inositol phosphate)were identified inthe genome.Fatty acids are apparently not anenergy source,because enzymes of the fatty acidoxidative pathway are absent,with the exceptionof a3-hydroxyacyl-CoA dehydrogenase.C.parvum purine metabolism is greatlysimplified,retaining only an adenosine ki-nase and enzymes catalyzing conversionsof adenosine5Ј-monophosphate(AMP)toinosine,xanthosine,and guanosine5Ј-monophosphates(IMP,XMP,and GMP).Among these enzymes,IMP dehydrogenase(IMPDH)is phylogenetically related toε-proteobacterial IMPDH and is strikinglydifferent from its counterparts in both thehost and other apicomplexans(15).In con-trast to other apicomplexans such as Toxo-plasma gondii and P.falciparum,no geneencoding hypoxanthine-xanthineguaninephosphoribosyltransferase(HXGPRT)is de-tected,in contrast to a previous report on theactivity of this enzyme in C.parvum sporo-zoites(16).The absence of HXGPRT sug-gests that the parasite may rely solely on asingle enzyme system including IMPDH toproduce GMP from AMP.In contrast to otherapicomplexans,the parasite appears to relyon adenosine for purine salvage,a modelsupported by the identification of an adeno-sine transporter.Unlike other apicomplexansand many parasitic protists that can synthe-size pyrimidines de novo,C.parvum relies onpyrimidine salvage and retains the ability forinterconversions among uridine and cytidine5Ј-monophosphates(UMP and CMP),theirdeoxy forms(dUMP and dCMP),and dAMP,as well as their corresponding di-and triphos-phonucleotides.The parasite has also largelyshed the ability to synthesize amino acids denovo,although it retains the ability to convertselect amino acids,and instead appears torely on amino acid uptake from the host bymeans of a set of at least11amino acidtransporters(table S2).Most of the Cryptosporidium core pro-cesses involved in DNA replication,repair,transcription,and translation conform to thebasic eukaryotic blueprint(2).The transcrip-tional apparatus resembles Plasmodium interms of basal transcription machinery.How-ever,a striking numerical difference is seenin the complements of two RNA bindingdomains,Sm and RRM,between P.falcipa-rum(17and71domains,respectively)and C.parvum(9and51domains).This reductionresults in part from the loss of conservedproteins belonging to the spliceosomal ma-chinery,including all genes encoding Smdomain proteins belonging to the U6spliceo-somal particle,which suggests that this par-ticle activity is degenerate or entirely lost.This reduction in spliceosomal machinery isconsistent with the reduced number of pre-dicted introns in Cryptosporidium(5%)rela-tive to Plasmodium(Ͼ50%).In addition,keycomponents of the small RNA–mediatedposttranscriptional gene silencing system aremissing,such as the RNA-dependent RNApolymerase,Argonaute,and Dicer orthologs;hence,RNA interference–related technolo-gies are unlikely to be of much value intargeted disruption of genes in C.parvum.Cryptosporidium invasion of columnarbrush border epithelial cells has been de-scribed as“intracellular,but extracytoplas-mic,”as the parasite resides on the surface ofthe intestinal epithelium but lies underneaththe host cell membrane.This niche may al-low the parasite to evade immune surveil-lance but take advantage of solute transportacross the host microvillus membrane or theextensively convoluted parasitophorous vac-uole.Indeed,Cryptosporidium has numerousgenes(table S2)encoding families of putativesugar transporters(up to9genes)and aminoacid transporters(11genes).This is in starkcontrast to Plasmodium,which has fewersugar transporters and only one putative ami-no acid transporter(GenBank identificationnumber23612372).As a first step toward identification ofmulti–drug-resistant pumps,the genome se-quence was analyzed for all occurrences ofgenes encoding multitransmembrane proteins.Notable are a set of four paralogous proteinsthat belong to the sbmA family(table S2)thatare involved in the transport of peptide antibi-otics in bacteria.A putative ortholog of thePlasmodium chloroquine resistance–linkedgene Pf CRT(17)was also identified,althoughthe parasite does not possess a food vacuole likethe one seen in Plasmodium.Unlike Plasmodium,C.parvum does notpossess extensive subtelomeric clusters of anti-genically variant proteins(exemplified by thelarge families of var and rif/stevor genes)thatare involved in immune evasion.In contrast,more than20genes were identified that encodemucin-like proteins(18,19)having hallmarksof extensive Thr or Ser stretches suggestive ofglycosylation and signal peptide sequences sug-gesting secretion(table S2).One notable exam-ple is an11,700–amino acid protein with anuninterrupted stretch of308Thr residues(cgd3_720).Although large families of secretedproteins analogous to the Plasmodium multi-gene families were not found,several smallermultigene clusters were observed that encodepredicted secreted proteins,with no detectablesimilarity to proteins from other organisms(Fig.1,A and B).Within this group,at leastfour distinct families appear to have emergedthrough gene expansions specific to the Cryp-R E P O R T S SCIENCE VOL30416APRIL2004443o n O c t o b e r 7 , 2 0 0 9 w w w . s c i e n c e m a g . o r g D o w n l o a d e d f r o mtosporidium clade.These families —SKSR,MEDLE,WYLE,FGLN,and GGC —were named after well-conserved sequence motifs (table S2).Reverse transcription polymerase chain reaction (RT-PCR)expression analysis (20)of one cluster,a locus of seven adjacent CpLSP genes (Fig.1B),shows coexpression during the course of in vitro development (Fig.1C).An additional eight genes were identified that encode proteins having a periodic cysteine structure similar to the Cryptosporidium oocyst wall protein;these eight genes are similarly expressed during the onset of oocyst formation and likely participate in the formation of the coccidian rigid oocyst wall in both Cryptospo-ridium and Toxoplasma (21).Whereas the extracellular proteins described above are of apparent apicomplexan or lineage-specific in-vention,Cryptosporidium possesses many genesencodingsecretedproteinshavinglineage-specific multidomain architectures composed of animal-and bacterial-like extracellular adhe-sive domains (fig.S1).Lineage-specific expansions were ob-served for several proteases (table S2),in-cluding an aspartyl protease (six genes),a subtilisin-like protease,a cryptopain-like cys-teine protease (five genes),and a Plas-modium falcilysin-like (insulin degrading enzyme –like)protease (19genes).Nine of the Cryptosporidium falcilysin genes lack the Zn-chelating “HXXEH ”active site motif and are likely to be catalytically inactive copies that may have been reused for specific protein-protein interactions on the cell sur-face.In contrast to the Plasmodium falcilysin,the Cryptosporidium genes possess signal peptide sequences and are likely trafficked to a secretory pathway.The expansion of this family suggests either that the proteins have distinct cleavage specificities or that their diversity may be related to evasion of a host immune response.Completion of the C.parvum genome se-quence has highlighted the lack of conven-tional drug targets currently pursued for the control and treatment of other parasitic protists.On the basis of molecular and bio-chemical studies and drug screening of other apicomplexans,several putative Cryptospo-ridium metabolic pathways or enzymes have been erroneously proposed to be potential drug targets (22),including the apicoplast and its associated metabolic pathways,the shikimate pathway,the mannitol cycle,the electron transport chain,and HXGPRT.Nonetheless,complete genome sequence analysis identifies a number of classic and novel molecular candidates for drug explora-tion,including numerous plant-like and bacterial-like enzymes (tables S3and S4).Although the C.parvum genome lacks HXGPRT,a potent drug target in other api-complexans,it has only the single pathway dependent on IMPDH to convert AMP to GMP.The bacterial-type IMPDH may be a promising target because it differs substan-tially from that of eukaryotic enzymes (15).Because of the lack of de novo biosynthetic capacity for purines,pyrimidines,and amino acids,C.parvum relies solely on scavenge from the host via a series of transporters,which may be exploited for chemotherapy.C.parvum possesses a bacterial-type thymidine kinase,and the role of this enzyme in pyrim-idine metabolism and its drug target candida-cy should be pursued.The presence of an alternative oxidase,likely targeted to the remnant mitochondrion,gives promise to the study of salicylhydroxamic acid (SHAM),as-cofuranone,and their analogs as inhibitors of energy metabolism in the parasite (23).Cryptosporidium possesses at least 15“plant-like ”enzymes that are either absent in or highly divergent from those typically found in mammals (table S3).Within the glycolytic pathway,the plant-like PPi-PFK has been shown to be a potential target in other parasites including T.gondii ,and PEPCL and PGI ap-pear to be plant-type enzymes in C.parvum .Another example is a trehalose-6-phosphate synthase/phosphatase catalyzing trehalose bio-synthesis from glucose-6-phosphate and uridine diphosphate –glucose.Trehalose may serve as a sugar storage source or may function as an antidesiccant,antioxidant,or protein stability agent in oocysts,playing a role similar to that of mannitol in Eimeria oocysts (24).Orthologs of putative Eimeria mannitol synthesis enzymes were not found.However,two oxidoreductases (table S2)were identified in C.parvum ,one of which belongs to the same families as the plant mannose dehydrogenases (25)and the other to the plant cinnamyl alcohol dehydrogenases.In principle,these enzymes could synthesize protective polyol compounds,and the former enzyme could use host-derived mannose to syn-thesize mannitol.References and Notes1.D.G.Korich et al .,Appl.Environ.Microbiol.56,1423(1990).2.See supportingdata on Science Online.3.M.J.Gardner et al .,Nature 419,498(2002).4.A.T.Bankier et al .,Genome Res.13,1787(2003).5.J.C.Wootton,Comput.Chem.18,269(1994).Fig.1.(A )Schematic showing the chromosomal locations of clusters of potentially secreted proteins.Numbers of adjacent genes are indicated in paren-theses.Arrows indicate direc-tion of clusters containinguni-directional genes (encoded on the same strand);squares indi-cate clusters containingg enes encoded on both strands.Non-paralogous genes are indicated by solid gray squares or direc-tional triangles;SKSR (green triangles),FGLN (red trian-gles),and MEDLE (blue trian-gles)indicate three C.parvum –specific families of paralogous genes predominantly located at telomeres.Insl (yellow tri-angles)indicates an insulinase/falcilysin-like paralogous gene family.Cp LSP (white square)indicates the location of a clus-ter of adjacent large secreted proteins (table S2)that are cotranscriptionally regulated.Identified anchored telomeric repeat sequences are indicated by circles.(B )Schematic show-inga select locus containinga cluster of coexpressed large secreted proteins (Cp LSP).Genes and intergenic regions (regions between identified genes)are drawn to scale at the nucleotide level.The length of the intergenic re-gions is indicated above or be-low the locus.(C )Relative ex-pression levels of CpLSP (red lines)and,as a control,C.parvum Hedgehog-type HINT domain gene (blue line)duringin vitro development,as determined by semiquantitative RT-PCR usingg ene-specific primers correspondingto the seven adjacent g enes within the CpLSP locus as shown in (B).Expression levels from three independent time-course experiments are represented as the ratio of the expression of each gene to that of C.parvum 18S rRNA present in each of the infected samples (20).R E P O R T S16APRIL 2004VOL 304SCIENCE 444 o n O c t o b e r 7, 2009w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o m。

基于代价敏感XGBoost的企业信用评估模型

基于代价敏感XGBoost的企业信用评估模型

文章编号: 1009 − 444X(2022)02 − 0218 − 06基于代价敏感XGBoost的企业信用评估模型

张田华 ,张 怡 ,谢晓金(上海工程技术大学 数理与统计学院,上海 201620)摘要:我国信用不良的企业数量远小于信用良好的企业数量,样本类别的极端不平衡导致传统的信用评估模型在训练时无法充分学习信用不良企业的特征. 为提高极端梯度提升算法(Extreme Gradient Boosting, XGBoost)在企业信用评估这种不平衡分类问题中的准确率,提出一种基于代价敏感XGBoost的企业信用评估模型. 在XGBoost算法拟合过程中,加入代价敏感损失函数迫使模型更加关注少数类的特征,并引入贝叶斯优化调整模型的重要超参数. 以我国A股市场中小板块企业2016—2020年数据为样本,实证结果表明,基于代价敏感XGBoost的企业信用评估模型能够在保证总体识别精度的情况下提高对信用不良企业的识别准确率.关键词:损失函数;极端梯度提升算法;代价敏感;企业信用评估中图分类号: TP391 文献标志码: A

Enterprise credit evaluation model based oncost sensitive XGBoost

ZHANG Tianhua ,ZHANG Yi ,XIE Xiaojin( School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China)Abstract:In China, the number of enterprises with bad credit is much smaller than that of enterprises withgood credit. The extreme imbalance of sample categories results in the traditional credit evaluation modelunable to fully learn the characteristics of bad credit enterprises during training. In order to improve theaccuracy of extreme gradient boosting (XGBoost) in unbalanced classification problems such as enterprisecredit evaluation, an enterprise credit evaluation model based on cost sensitive XGBoost was proposed. In theprocess of XGBoost algorithm fitting, the cost sensitive loss function was added to force the model to pay moreattention to the characteristics of minority classes, and the bayesian optimization was introduced to adjust thehyperparameters of the model. Taking the datas of small and medium-sized enterprises in China's A-sharemarket from 2016 to 2020 as the sample, the experimental results show that the enterprise credit evaluationmodel based on cost sensitive XGBoost can improve the identification accuracy of bad credit enterprises whileensuring the overall identification accuracy.Key words:loss function;extreme gradient boosting (XGBoost);cost sensitivity;enterprise credit evaluation

基于TLS_辅助的长白落叶松一级枝条生物量模型构建

基于TLS_辅助的长白落叶松一级枝条生物量模型构建

第47卷㊀第2期2023年3月南京林业大学学报(自然科学版)JournalofNanjingForestryUniversity(NaturalSciencesEdition)Vol.47,No.2Mar.,2023㊀收稿日期Received:2022⁃04⁃15㊀㊀㊀㊀修回日期Accepted:2022⁃06⁃13㊀基金项目:国家自然科学基金区域联合基金项目(U21A20244)㊂㊀第一作者:唐依人(2959017551@qq.com)㊂∗通信作者:贾炜玮(jiaww2002@163.com),教授㊂㊀引文格式:唐依人,贾炜玮,王帆,等.基于TLS辅助的长白落叶松一级枝条生物量模型构建[J].南京林业大学学报(自然科学版),2023,47(2):130-140.TANGYR,JIAWW,WANGFetal.ConstructingabiomassmodelofLarixolgensisprimarybranchesbasedonTLS[J].JournalofNanjingForestryUniversity(NaturalSciencesEdition),2023,47(2):130-140.DOI:10.12302/j.issn.1000-2006.202204037.基于TLS辅助的长白落叶松一级枝条生物量模型构建唐依人1,贾炜玮1,2∗,王㊀帆1,孙毓蔓1,张㊀颖1(1.东北林业大学林学院,黑龙江㊀哈尔滨㊀150040;2.东北林业大学森林生态系统可持续经营教育部重点实验室,黑龙江㊀哈尔滨㊀150040)摘要:ʌ目的ɔ探究利用地基激光雷达(terrestriallaserscanning,TLS)点云数据估测枝条生物量的可行性,构建预测长白落叶松(黄花落叶松)枝条生物量的最优模型㊂ʌ方法ɔ以利用孟家岗林场26株长白落叶松点云数据提取出的733个一级枝条的特征因子[枝长(LBL)㊁弦长(LBCL)㊁基径(dB)㊁着枝角度(AB)㊁弓高(HBAH)㊁枝条基部断面积(SBAB)㊁相对着枝深度(dRDINC)]和对应的实测数据为数据源,分别建立枝条水平上的一级枝条生物量基础模型,通过对比基础模型之间的差异来分析利用TLS数据建立枝条生物量模型的可行性㊂最后利用TLS数据分别对比基础模型㊁混合效应模型和随机森林模型的预测效果㊂ʌ结果ɔ基础模型中最终选定的自变量为SBAB和LBCL㊂利用TLS数据建立的枝条生物量基础模型具有更好的预测精度㊂对比3种模型预测能力结果显示,随机森林模型无论在训练集还是测试集上都表现出最好的效果,具体顺序为:随机森林模型>混合效应模型>基础模型㊂其中随机森林模型的决定系数(R2)相较于混合模型和基础模型分别提高了1.32%和4.89%,均方根误差(RMSE)分别降低了11.23%和13.60%㊂ʌ结论ɔ基于TLS利用随机森林算法能够准确对枝条生物量进行估测,不仅为随机森林算法在林分生长模型上的应用奠定了一定的实践基础,也为TLS在树冠结构研究中的应用提供了重要的参考价值㊂关键词:长白落叶松(黄花落叶松);点云数据;枝条特征因子;枝条生物量;混合模型;随机森林中图分类号:S758㊀㊀㊀㊀㊀㊀㊀文献标志码:A开放科学(资源服务)标识码(OSID):文章编号:1000-2006(2023)02-0130-11ConstructingabiomassmodelofLarixolgensisprimarybranchesbasedonTLSTANGYiren1,JIAWeiwei1,2∗,WANGFan1,SUNYuman1,ZHANGYing1(1.SchoolofForestry,NortheastForestryUniversity,Harbin150040,China;2.KeyLaboratoryofSustainableForestEcosystemManagement,MinistryofEducation,NortheastForestryUniversity,Harbin150040,China)Abstract:ʌObjectiveɔToexplorethefeasibilityofusingterrestriallaserscanning(TLS)pointclouddatatoestimatebranchbiomassandfindtheoptimummodeltopredictthebranchbiomassofLarixolgensis.ʌMethodɔThecharacteristicfactors(branchlength(LBL),branchchordlength(LBCL),branchdiameter(dB),branchangle(AB),brancharchheight(HBAH),basalareaofbranches(SBAB),andrelativedepthwithinthecrown(dRDINC))of733first⁃orderbranchesextractedusingpointclouddataof26artificiallarchplantsandthecorrespondingmeasureddatafromMengjiagangForestryFarmwereusedasdatasourcestoestablishfirst⁃levelbranchbiomassbasemodelsatthebranchlevel.Additionally,thefeasibilityofusingTLSdatatoestablishbranchbiomassmodelsbycomparingthedifferencesbetweenbasemodelswereanalyzed.Finally,thepredictioneffectsofthebasemodel,mixed⁃effectsmodel,andrandomforestmodelwerecomparedseparatelyusingTLSdata.ʌResultɔTheindependentvariablesselectedinthebasemodelwereSBABandLBCL.UsingTLSdatatobuildthebranchbiomassbasemodelhadabetterpredictionaccuracy.Comparingtheresultsofthethreemodelsshowedthattherandomforestmodelshowedthebestresultsbothonthetrainingandtest㊀第2期唐依人,等:基于TLS辅助的长白落叶松一级枝条生物量模型构建sets,inthefollowingorder:randomforestmodel>mixed⁃effectsmodel>basemodel.TheR2oftherandomforestmodelimprovedby1.32%and4.89%,andtheRMSEdecreasedby11.23%and13.60%,respectively,comparedtothemixedmodelandthebasemodel.ʌConclusionɔTheresultsofthispaperprovedthatthebranchbiomasscanbeestimatedaccuratelybasedonTLSusingtherandomforestalgorithm,whichnotonlylaidapracticalfoundationfortheapplicationoftherandomforestalgorithmonstandgrowthmodels,butalsoprovidedanimportantreferencevaluefortheapplicationofTLSinthestudyofcrownstructure.Keywords:Larixolgensis;pointclouddata;branchcharacterizationfactor;branchbiomass;mixed⁃effectsmodel;randomforest㊀㊀树木生物量的估计对于模拟林分的总初级生产和理解森林在全球碳循环中的作用至关重要,能否对森林生物量进行准确估计是森林经营管理与评价过程中的关键一环[1]㊂树枝生物量是树木地上生物量的重要组成部分,可以帮助反映气候变化如何影响碳分配模式[2]㊂另一方面,单个树枝可以反映树冠生物量的垂直分布,或通过节子对木材质量产生影响[3-4],这对于确定采伐作业和优化可持续森林管理的疏伐战略至关重要㊂传统上,建立和模拟树木结构㊁生物量和生长需要进行破坏性测量㊂由于野外工作费时费力,地面激光扫描(ter⁃restriallaserscanning,TLS)为快速㊁无损地测量树木结构提供了一种解决方案㊂已有研究表明,利用TLS可以成功地获取各种树木结构变量,在弥补森林研究中破坏性测量的缺点方面显示出巨大的潜力[5-10]㊂以往的研究都是采用模型估计法,即通过异速生长或基于理论方程估计森林生物量,这些方程的变量可以从容易测量的因子(如胸径㊁树高㊁冠长)中推导得出[11-14],能有效估测森林生态系统内的生物量和生产力㊂有研究者采用模型估计法进行了枝条生物量的预测[15-16],这些学者主要采用胸径和树高因子为自变量研究枝条生物量,模型预测精度通常较低㊂因此,开始关注构建枝条水平上的高精度枝条生物量模型研究[1,17],并进行混合模型技术的应用[18-20]㊂在枝条水平上建立的枝条生物量模型预测精度明显提高,但是往往需要通过破坏性手段来获取所需要的枝条因子,而TLS的出现完美弥补了这一缺点,使得无损高效测量树木生物量成为可能㊂一些研究者基于异速生长模型与TLS衍生出的参数来进行枝条生物量的无损估计[21-22],或者使用从树木定量结构模型(TreeQSMs)得到的枝条体积乘以枝条木材密度来获取枝条生物量[23-24]㊂这两种估测方法只能在单木水平上估计总体枝条的生物量,而对于单个分枝的生物量估计,目前还鲜有报道㊂线性㊁非线性㊁混合模型[1,17,25]等传统的参数模型在森林生物量的研究中已被广泛应用㊂但是,参数模型在使用时往往需满足一些基本的统计假设条件,例如数据需符合正态分布㊁等方差等;由于林木生长数据需要进行连续观测,通常难以满足以上条件[26],因此必须寻找新的方法㊂随机森林是基于集成学习的决策树模型[27],这种决策树模型能有效处理各种类型(如分类㊁连续和偏态)的预测变量,不需要对数据进行任何形式的分布假设,还能很好地克服数据中可能存在的噪音㊁缺失㊁异方差和共线性等问题[28],已成为一种主流的非参数模型㊂目前,随机森林法已用于分析林木生长的适应性[29]㊁林木生物量转换和扩展因子的影响因素[30],并在预测单木胸径生长[31]㊁森林蓄积量[32]方面有了很好的应用㊂综合以上分析可知,目前基于TLS数据建立的枝条生物量模型主要是单木水平上的总体枝条生物量;利用随机森林模型预测枝条生物量的研究目前还未见报道㊂鉴于此,本研究以利用TLS数据提取的733个一级枝条的特征因子及其对应的实测数据为数据源,分析对比两种数据类型下基础模型之间的差异,证明利用TLS数据构建枝条生物量模型的可行性㊂随后采用混合模型技术建立预测单个分枝生物量的传统数学模型,采用随机森林算法建立估测单个分枝生物量的新型机器学习模型㊂对比分析传统数学模型与新型机器学习模型的优劣性,找出精确估测单个分枝生物量的最优模型㊂1㊀材料与方法1.1㊀研究区概况研究区位于黑龙江省佳木斯市孟家岗镇林场境内(130ʎ32ᶄ 130ʎ52ᶄE,46ʎ20ᶄ 46ʎ30ᶄN)㊂孟家岗林场属完达山西麓余脉,主要地貌为低山丘陵;坡度10ʎ 20ʎ,平均海拔250m,主要属于东亚大陆性季风气候,春季降雨量少;秋季易产生霜冻;夏季短暂,雨量充足;冬季漫长,寒冷干燥㊂最高温与最低温分别约35.6ħ和-34.7ħ,年均温度131南京林业大学学报(自然科学版)第47卷2.7ħ左右;无霜期120d,年均降水量550 670mm㊂林场内主要土壤类型是典型暗棕壤,还分布着少量草甸暗棕壤㊁潜育暗棕壤和原始暗棕壤等;此外还存在着草甸土㊁沼泽土,以及泥炭土㊂林场内人工林面积约占林分总面积的2/3,天然次生林面积约为1/3,森林覆盖率为81.7%㊂1.2㊀数据来源本研究所使用的数据于2019年11月在孟家岗林场进行采集,主要包括地基激光雷达扫描数据和伐倒解析木的地面实测数据㊂枝条生物量采集过程:在孟家岗林场的6块黄花落叶松(Larixolgensis,俗名长白落叶松)人工林标准样地中按等断面积法将树木分成5个等级,每个等级分别选出1株树木用来进行枝条解析,共选出30株解析木㊂解析木伐倒后对各轮的一级枝条编号,在每轮中再选取1个长势均匀的枝条作为标准枝条,测量该枝条的鲜质量,然后对所有标准枝条进行取样,用烘箱在105ħ条件下对样品进行烘干㊂最后利用各标准枝条的鲜质量与干质量推导计算出解析木各枝条的生物量(Wb)㊂有4株解析木由于树冠遮挡严重导致点云质量较差,将该数据剔除,最终在26株解析木上进行了枝条因子的提取,包括枝长(LBL)㊁基径(dB)㊁弦长(LBCL)㊁着枝角度(AB)㊁弓高(HBAH)和着枝高度(HBH),通过与实测数据进行对比,结果显示LBL㊁LBCL㊁dB㊁AB㊁HBAH㊁HBH平均提取精度分别为:90%㊁90 6%㊁71 5%㊁82 5%㊁70 7%㊁98 6%㊂枝条因子提取示意图如图1,具体提取方法参见文献[9]㊂本研究根据上述因子额外计算了相对着枝深度(dRDINC)和枝条基部断面积(SBAB)㊂dRDINC=(H-HBH)/l;(1)SBAB=π㊃(dB/2)2㊂(2)式中:H为树高,l为冠长㊂对数据进行整理,剔除异常值之后最终获得26株单木的733个枝条的特征因子㊂按照随机抽样的方式以4ʒ1的比例分为训练集和测试集,具体数据见表1㊂表1㊀单木因子与枝条因子统计表Table1㊀Thestatisticaltableoftreefactorandbranchfactors数据data变量variable训练集train测试集test平均值mean最大值max最小值min标准差SD平均值mean最大值max最小值min标准差SDTLS数据TLSdata单木因子treefactorn=20n=6树高/m(H)18.6925.8213.123.8519.7423.0514.053.56胸径/cm(DBH)19.2231.109.065.8118.3126.3512.325.53枝条因子branchfactorn=601n=132枝长/cm(LBL)217.40525.0015.8091.28198.71530.5917.8793.54弦长/cm(LBCL)201.20493.0014.8084.21186.19500.3015.0385.15弓高/cm(HBAH)26.0080.000.0012.9419.9778.431.0014.80基径/mm(dB)26.6069.4911.1013.1528.1061.5612.0013.02枝条基部断面积/cm2(SBAB)6.8137.931.048.057.5229.761.147.14着枝角度/(ʎ)(AB)60.00150.0015.0016.0253.00104.0019.0015.00相对着枝深度(dRDINC)0.611.000.100.200.621.000.100.20实测数据realdata单木因子treefactorn=20n=6树高/m(H)18.7526.0013.173.9219.6023.2213.983.26胸径/cm(DBH)19.4031.009.005.4918.5826.1212.085.13枝条因子branchfactorn=601n=132枝长/cm(LBL)210.00532.0010.0099.51182.00515.0013.0092.18弦长/cm(LBCL)200.00518.00882.00180.00496.0010.0083.00弓高/cm(HBAH)23.0070.000.0012.0019.0081.001.0015.00基径/mm(dB)27.3270.126.2814.5529.4865.748.5113.74枝条基部断面积/cm2(SBAB)7.0338.090.858.887.7331.290.927.85着枝角度/(ʎ)(AB)56.00135.0010.0018.0059.00115.0016.0017.00相对着枝深度(dRDINC)0.621.000.140.210.621.000.110.20枝条生物量/g(Wb)416.934303.323.26539.97460.343075.594.71572.88231㊀第2期唐依人,等:基于TLS辅助的长白落叶松一级枝条生物量模型构建A.枝长branchlength;B.弦长branchchordlength;C.弓高brancharchheight;D.着枝角度branchangle;E.基径branchdiameter㊂图1㊀提取枝条因子示意图Fig.1㊀Thediagramofextractingbranchfactors1.3㊀基础模型根据以往的研究,枝条生物量与枝条因子之间存在着密切的联系[1,17],因此本研究决定通过分析枝条因子与枝条生物量之间的相关性,并考虑各因子之间的共线性问题,选择相关性最高且无共线性的枝条因子作为自变量,进行枝条生物量基础模型的构建㊂模型形式如下:Wb=a1xa21xa32 xan+1n㊂(3)式中:Wb代表枝条生物量;x1,x2,...,xn代表枝条因子;a1,a2,...,an+1代表方程参数㊂1.4㊀混合模型混合效应模型是一种新的统计方法,模型中分为固定效应和随机效应两部分,其中固定效应用来描述样本总体的趋势,而随机效应用来描述总体样本中的个体变化,因此混合效应模型拥有十分灵活的优点㊂为了研究单木对枝条生物量的随机干扰,本研究在方程(3)的基础上加入随机效应建立样木水平的枝条生物量非线性混合模型,具体的模型形式如下所示:㊀Wij=f(δij,vij)+εij,i=1, ,M,j=1, ,ni;δij=Aijχ+Bijβi;βi N(0,D),εij N(0,σ2Ri);Ri=σ2G0.5iΓiG0.5i㊂ìîíïïïïïï(4)式中:Wij表示第i株样木中第j个枝条的生物量观测值;vij为函数f的自变量;εij是模型的误差项;M代表划分的组别即样木数量;ni代表各组内的观测值数量;Aij㊁Bij为已知的设计矩阵;βi为(qˑ1)维随机效应向量;χ为(pˑ1)维固定效应向量;D为随机效应的方差⁃协方差矩阵;σ2是方差;Gi为描述方差异质性的对角矩阵;Ri为分组内方差⁃协方差矩阵;Γi是样木内误差的相关性结构㊂构建混合效应模型首先需要确定参数效应,本研究利用赤池信息准则(AIC)㊁对数似然值(LogLikelilood)㊁贝叶斯准则(BIC)作为评价指标,将所有可能的参数组合作为随机效应对模型进行拟合㊂AIC与BIC越小,对数似然值越大,模型的拟合效果越好㊂同时对参数个数不同的模型进行似然比(LRT)检验,以避免模型过参数化,最后选择参数少而显著性好的模型作为最优模型㊂随机效应方差⁃协方差矩阵能反映个体随机效应间的差异,例如引入样木层次的随机效应,体现的是样木间的差异㊂从复合对称矩阵(CS)㊁对角矩阵[UN(1)]和广义正定矩阵(UN)中进行选择,将AIC㊁BIC值最小和对数似然值最大时的矩阵作为最优矩阵形式㊂从一阶自回归结构[AR(1)]㊁一阶自回归与移动平均结构[ARMA(1,1)]和复合对称矩阵(CS)中来选择最佳误差项方差⁃协方差矩阵㊂为了解决生物量模型中普遍存在的异方差问题,本研究采用指数函数对其进行校正[19]㊂混合模型的建立是基于R软件的nlme包完成的㊂1.5㊀随机森林回归模型随机森林(RandomForest,RF)是以决策树为基学习器的集成式算法㊂利用有放回的随机重采样方式构造出一系列的基学习器(如n个),这一系列的决策树即构成了 森林 ,最后将这些基学习器预测的结果组合起来进行输出㊂在构建基学习器的过程中,RF通过节点分裂从全部Q个变量中随机选择m个(1ɤmɤQ),再从m中找出最优的划分变量作为分裂节点㊂对于回归问题,RF模型将所有决策树预测得到的结果进行加权平均来输出最终的结果[31]㊂在每次抽样过程中约有1/3的样本未被选中,这部分样本称为袋外数据(out⁃of⁃bag,OOB),利用这些样本计算可得袋外错误率(OOBerrorrate)㊂为避免训练出的模型出现过拟合的现象,可以利用袋外错误率验证模型的泛化能力㊂RF模型可以通过提供自变量的相对重要性以及因变量受自变量影响的偏依赖图来提高RF模型的可解释性㊂通过偏依赖图有助于将因变量对自变量的依赖关系进行可视化㊂在此需要强调,计算依赖关系时不能忽略其他变量对因变量的影响,应该考虑所有变量对因变量影响的平均效果㊂本研究以7个枝条因子(LBL㊁LBCL㊁HBAH㊁dB㊁SBAB㊁AB㊁dRDINC)和2个单木因子(DBH㊁H)为自变量,采用随机森林回归算法构建人工长白落叶松枝条生物量预测模型㊂在RF模型的构建过程中,超参数的设置十分重要,本研究采用Python软件331南京林业大学学报(自然科学版)第47卷sklearn库中的RandomizedSearchCV和GridSearchCV函数对RF算法中比较重要的4个超参数进行自动寻优:决策树个数(n_estimators)㊁最小分离样本数(min_samples_split)㊁最小叶子节点样本数(min_samples_leaf)㊁最大分离特征数(max_features),超参数的自动寻优过程基于全部的训练集㊂寻优结束后利用最优模型做出自变量对因变量影响的相对重要性和偏依赖关系图㊂1.6㊀模型检验与评价本研究在独立的测试集上进行模型拟合和预测能力的评价,评价指标为决定系数(R2)㊁平均相对偏差绝对值(RMAE)㊁均方根误差(RMSE)㊁预估精度(Fp)和平均绝对偏差(MAE)等㊂各指标公式如下所示:决定系数(R2)R2=1-ðni=1(yi-y^i)2ðni=1(yi-y-i)2éëêêùûúú;(5)均方根误差[RMSE,式中记为σ(RMSE)]σ(RMSE)=ðni=1(yi-y^i)2n-p;(6)平均绝对偏差[MAE,式中记为σ(MAE)]σ(MAE)=ðni=1yi-y^in;(7)平均相对偏差绝对值[RMAE,式中记为σ(RMAE)]σ(RMAE)=1nðni=1yi-y^iyiˑ100%;(8)预估精度(Fp)Fp=1-t0.05yi㊃ð(yi-y^)2n(n-p)æèççöø÷÷ˑ100%㊂(9)式中:yi为观测值;y^为预测值;n为样本个数; yi为所有观测值的平均值;p为模型参数个数㊂非线性混合模型中的固定效应部分可以采用传统方式进行检验,而随机效应参数的计算需利用下述公式[1]㊂β^k=D^Z^Tk(Z^kD^Z^Tk+R^k)-1ε^k㊂(10)式中:β^k为随机效应参数;D^为随机效应参数的方差⁃协方差矩阵;Z^k为设计矩阵;R^k为组内方差⁃协方差矩阵;ε^k是观测值与固定效应参数计算的预测值的差值㊂2㊀结果与分析2.1㊀基础模型的建立枝条生物量与枝条特征因子之间的相关性结果如图2所示㊂图2㊀枝条生物量与枝条因子相关性热图Fig.2㊀Theheatmapofcorrelationbetweenbranchbiomassandbranchfactor㊀㊀从图2中可以看出两种数据类型下都显示SBAB㊁dB㊁LBCL和LBL与枝条生物量的相关性最高,但同时引入这4个因子会造成模型难以收敛的问题,并且SBAB和dB㊁LBCL和LBL之间分别存在明显的共线性,因此本研究最终选择了枝条基部断面积(SBAB)和弦长(LBCL)两个因子作为枝条生物量建模的自变量㊂模型的拟合结果见表2,模型的最终表达式如式(11)㊁(12)所示㊂从表2中可以看出两种数据类型下模型参数的预测值十分接近,利用TLS数据构建的基础模型具有更好的拟合优度与预测精度㊂因此可以利用TLS数据对枝条生物量模型进行更加深入的研究㊂Wb实=0.06977㊃S0.72014BAB㊃L1.35026BCL;(11)WbTLS=0.04956㊃S0.78871BAB㊃L1.34201BCL㊂(12)式中:Wb实和WbTLS分别代表枝条的实测生物量和基于TLS计算的生物量;SBAB代表枝条基部断面积;LBCL代表弦长㊂431㊀第2期唐依人,等:基于TLS辅助的长白落叶松一级枝条生物量模型构建表2㊀基础模型的拟合与检验结果Table2㊀Thefittingandtestresultsofbasemodel数据data参数parameter预测值estimate标准误SEtP拟合优度goodness⁃of⁃fitstatistic检验结果validationresultR2RMSE/gMAE/gRMAE/%Fp/%实测数据realdataa00.069770.025779.951<0.0001a10.720140.0347415.184<0.00010.8496171.7997.6639.5695.71a21.350260.0840612.975<0.0001TLS数据TLSdataa00.049560.0170112.914<0.0001a10.788710.0332723.708<0.00010.8808156.9390.0735.6696.84a21.342010.0693119.363<0.00012.2㊀枝条生物量非线性混合模型的建立2.2.1㊀随机参数的确定以公式(12)为基础模型,利用R软件的nlme包对公式(12)不同随机效应参数组合形式进行了拟合㊂利用AIC㊁BIC㊁LogLikelihood㊁R2对各模型的拟合优度进行对比,选出最佳的随机参数组合形式㊂同时,对模型(12)㊁模型(12.2)和模型(12.6)进行似然比(LRT)检验,避免模型出现过参数化的问题,当P<0.05时表示模型间差异显著㊂具体拟合结果见表3㊂表3㊀基于不同随机效应参数组合的枝条AGB模型拟合结果Table3㊀FittingresultsofAGBmodelforbranchesbasedondifferentcombinationsofrandomeffectparameters模型model随机效应参数randomeffectparameter参数个数numberofparameterR2赤池信息准则AIC贝叶斯准则BIC对数似然值LogLikelihood似然比LRTP12无30.8798455.5628509.378-4217.5612.1a050.8878433.5888455.825-4211.7912.2a150.9048364.2908386.527-4177.1566.044<0.000112.3a250.9048367.5448389.780-4178.7712.4a0㊁a170.9058437.5888468.719-4211.7912.5a0㊁a270.9048371.5448402.675-4178.7712.6a1㊁a270.9138338.7418369.872-4162.3729.549<0.000112.7a0㊁a1㊁a210不收敛㊀㊀从表3中可知,加入混合效应之后,模型的AIC和BIC相较于基础模型(12)均有所降低,R2有所提高,说明混合模型的拟合效果更佳㊂当随机效应参数个数为1时,模型(12.2)的拟合效果明显优于模型(12.3)和模型(12.1);当随机效应参数个数为2时,模型(12 6)的拟合效果明显优于模型(12.4)和模型(12.5);当随机效应参数个数为3时,模型(12 7)不收敛㊂模型的LRT检验结果表明:基础模型(12)和模型(12.2)之间差异极显著;模型(12.2)和模型(12.6)之间差异极显著,并且模型(12.6)的R2和LogLikelihood最高,AIC和BIC最低,因此将模型(12.6)作为枝条生物量的最优非线性混合模型㊂2.2.2㊀方差⁃协方差矩阵的确定在模型(12.6)的基础上分析对角矩阵UN(1)㊁复合对称矩阵CS和广义正定矩阵对混合模型拟合效果的差异,见表4㊂根据表4可知,以复合对称结构CS作为随机效应方差⁃协方差矩阵时模型不收敛,而广义正定矩阵显示了最好的拟合效果,并且与对角矩阵UN(1)显示出显著的差异,因此将广义正定矩阵作为模型(12.6)的最佳随机效应方差⁃协方差矩阵㊂将林业上常用的3种自相关结构矩阵引入混合模型(12.6),结果见表4㊂从表中可以看出:复合对称矩阵CS依然不收敛;一阶自回归矩阵AR(1)和一阶自回归与移动平均矩阵ARMA(1,1)虽然收敛,但是两者之间的差异并不显著,且模型的拟合优度并无明显提升,因此可以不考虑模型(12 6)误差项的方差⁃协方差结构㊂531南京林业大学学报(自然科学版)第47卷表4㊀不同随机效应方差⁃协方差矩阵和自相关结构矩阵混合模型拟合结果比较Table4㊀Comparisonsoffittingresultsofdifferentrandomeffectsvariance⁃covariancematrixandautocorrelationstructuremixed⁃effectsmodels类型type矩阵matrix参数个数parametersnumber赤池信息准则AIC贝叶斯准则BIC对数似然值LogLikelihood似然比LRTP方差⁃协方差variance⁃covariance复合对称矩阵CS6不收敛对角矩阵UN(1)68364.8328391.516-4176.42广义正定矩阵78338.7418369.872-4162.3728.0907<0.0001自相关autocorrelation复合对称矩阵CS8不收敛一阶自回归矩阵AR(1)88340.1948375.773-4162.10一阶自回归与移动平均矩阵ARMA(1,1)98342.1338382.159-4162.070.06070.80542.3㊀随机森林模型的建立2.3.1㊀最优超参数的确定本研究采用RandomizedSearchCV和Grid⁃SearchCV组合的方式来自动寻找最优超参数组合㊂RandomizedSearchCV由随机搜索和CV交叉验证组成,通过随机搜索超参数不同的组合形式来对模型进行训练,将交叉验证结果最好的超参数组合作为最佳的超参数组合㊂GridSearchCV由网格搜索和CV交叉验证组成,通过遍历所有的超参数组合对模型进行训练,然后选择交叉验证结果最好的超参数组合作为最佳超参数㊂本研究先给定每个超参数一个较大的范围,利用RandomizedSearchCV选出一个最优超参数组合㊂再以此最优参数组合为中心,建立小范围超参数组合,随后利用GridSearchCV遍历所有的超参数组合,以此来确定最终的最优超参数㊂为了验证最优超参数组合结果的准确性,采用 控制变量法 的方式,对每一个超参数的拟合结果进行了可视化,如图3㊂图3㊀不同参数下随机森林的OOB误差曲线图Fig.3㊀PlotsofOOBerrorsforrandomforestwithdifferentparameters㊀㊀图3A表示的是n_estimators对袋外误差的影响,从图中可以看出当n_estimators数量为340时,模型的袋外误差最小;图3B㊁3C㊁3D分别表示min_samples_split㊁min_samples_leaf㊁max_features对袋外误差的影响,当min_samples_split为4,min_sam⁃ples_leaf为1,max_features为3时模型的袋外误差分别达到最小,并且最小值均出现在n_estimators为340时,因此最终选择随机森林模型的最优超参数分别是:n_estimators为340,min_samples_split为4,min_samples_leaf为1,max_features为3㊂2.3.2㊀相对重要性与偏依赖关系各个自变量因子对枝条生物量影响的相对重631㊀第2期唐依人,等:基于TLS辅助的长白落叶松一级枝条生物量模型构建要性见图4㊂图4㊀各自变量对枝条生物量影响的相对重要性得分Fig.4㊀Relativeimportancescoresoftheeffectsofeachindependentvariableonbranchbiomass㊀㊀从图4可以看出,枝条因子是影响枝条生物量的主要因素,单木因子对枝条生物量的影响较小㊂7个枝条因子对枝条生物量影响的相对重要性共计93.4%,其中重要性排在前4位的分别是枝条基部断面积(SBAB)25.7%㊁弦长(LBCL)24 3%㊁基径(dB)22 6%和枝长(LBL)16 6%㊂相对着枝深度(dRDINC)㊁弓高(HBAH)和角度(AB)对枝条生物量的影响最小,分别为1 5%㊁1 4%和1 3%㊂两个单木因子胸径(DBH)和树高(H)对枝条生物量影响的相对重要性分别为3 9%和2 8%㊂枝条生物量受各自变量影响的偏依赖关系见图5㊂图5㊀枝条生物量受各自变量影响的偏依赖关系图Fig.5㊀Plotsofbiasdependenceofbranchbiomassaffectedbyrespectivevariables㊀㊀从总体上看枝条生物量对LBCL㊁LBL㊁dB和SBAB的偏依赖性很强,而对H㊁DBH㊁dRDINC㊁AB㊁HBAH的偏依赖性较小㊂枝条生物量随着枝长的增加逐渐增大到一定程度后开始趋于稳定;随着基径和枝条基部断面积的增大先缓慢增加至一定程度,随后开始快速增加最后趋于平缓;而枝条生物量随弦长的增大呈幂函数形式的状态增长㊂2.4㊀模型检验结果基础模型和混合模型的标准化残差图见图6,从图6可以看出,基础模型存在较为明显的异方差现象,在混合模型中用指数函数对异方差进行矫正后模型的异方差现象被明显消除㊂分别采用各最优模型在训练集上进行训练,在独立的测试集上验证模型的预测能力,结果见表5㊂从表中可以看出各模型在训练集上的拟合效果表现良好,模型拟合效果排序为随机森林模型>混合模型>基础模型㊂利用训练好的模型在测试集上检验模型的预测能力,各模型的R2均高于0.88㊂其中随机森林模型的预测能力最佳,相较于基础模型和混合模型,R2分别提高了4.89%和1.32%;RMSE分别降低了13.60%和11.23%;MAE分别降低了15.25%和12.12%;RMAE分别降低了20.02%731南京林业大学学报(自然科学版)第47卷和12.49%㊂各模型的预测能力排序为:随机森林模型>混合模型>基础模型㊂综合上述结果可以看出,随机森林模型可以作为预测一级枝条生物量的最优模型㊂图6㊀基础模型与混合模型的标准化残差图Fig.6㊀Normalizedresidualplotsofthebasemodelandthemixed⁃effectsmodel表5㊀最优模型拟合与预测能力评价表Table5㊀Theevaluationtableofoptimalmodelfittingandpredictioncapability模型model数据集dataR2RMSE/gMAE/gRMAE/%Fp/%基础模型basemodel训练集train0.879162.1493.5138.1396.51测试集test0.880156.9390.0735.6696.84混合模型mixedmodel训练集train0.913150.4983.6531.5897.02测试集test0.911152.7586.8632.5996.99随机森林模型randomforestmodel训练集train0.930130.2367.2324.3097.83测试集test0.923135.5976.3328.5297.653㊀讨㊀论通过对枝条生物量与枝条因子进行相关性分析,发现与枝条生物量相关性较高的因子分别为枝条基部断面积(SBAB)㊁枝条基径(dB)㊁弦长(LBCL)和枝长(LBL),由于这4个变量之间存在共线性,最终选择将相关性更高的SBAB和LBCL作为基础模型的自变量㊂利用TLS数据建立的枝条生物量模型相较于实测数据模型参数差异不大,但TLS数据模型的预测效果更好,证明可以利用TLS数据进行枝条生物量模型的研究㊂基于样木效应的混合模型拟合效果明显优于基础模型,广义正定矩阵可以作为随机效应的最优方差⁃协方差矩阵㊂当随机效应分别加在SBAB和LBCL上时,模型有最佳的拟合优度㊂指数函数为校正模型异方差的最佳权函数形式,可明显校正模型的异方差现象㊂随机森林模型作为一种非参数模型可以很好地对枝条生物量进行预测㊂通过分析各自变量对枝条生物量影响的相对重要性,发现重要性排在前4位的分别为SBAB㊁LBCL㊁dB和LBL,这与传统参数模型中分析相关性得出的结论几乎一致,证明了非参数模型也具有一定的解释性㊂3个模型的拟合效果排序为:随机森林模型>混合模型>基础模型㊂以往关于枝条生物量模型的研究大都是采用参数方程的方法,这些方程中采用的变量一般为单木的易测因子如胸径㊁树高[15-16]等㊂这些方程虽然用起来十分简便,但是对于枝条生物量的预测能力较低,难以满足在树冠水平上对枝条属性的研究㊂本研究利用点云数据提取出的枝条特征因子在枝条水平上建立了人工长白落叶松枝条生物量非线性混合模型,采用的自变量为SBAB和LBCL,模型R2较高,大大提高了枝条生物量的预测精度㊂董利虎等[1]㊁许昊等[17]分别利用枝条特征因子在枝条水平进行了人工林红松和人工林杉木枝条生物量的研究,通过对非线性模型进行对数转换来消除生物量模型中普遍存在的异方差现象,建立了枝条生物量线性混合效应模型,最后通过校正系数对所得结果进行校正以获得生物量的无偏估计㊂而本研究直接以枝条基部断面积和弦长为自变量建立了枝条生物量非线性混合模型,并利用指数函数对模型的异方差进行校正,取得了良好的效果㊂传统上基于枝条特征因子建立枝条生物量模型时往往需要伐倒树木进行枝条解析[1,17-20,25]以获得枝条特征因子数据㊂随着激光雷达技术的发展,利用点云数据重现树木三维结构[6,8-9,24]已成为现实㊂本研究利用点云数据提取出的枝条特征因子及其实测数据分别建立了长白落叶松枝条生物量基础模型,证明了利用TLS数据进行枝条生物量研究的可行性,为利用无损化测量得到的枝条因子的实际应用奠定了一定的实践基础㊂831。

2012第四届全国大学生数学竞赛四川赛区获奖名单

2012第四届全国大学生数学竞赛四川赛区获奖名单

2012第四届全国大学生数学竞赛四川赛区获奖名单更新时间:2012-11-6 16:55:252012第四届全国大学生数学竞赛四川赛区评奖工作已经结束,经过全国大学生数学竞赛四川赛区组委会专家评审,现将四川赛区本次获奖名单公布如下:第四届全国大学生数学竞赛四川赛区数学类奖名单赛区一等奖编号姓名性别学校名称学院名称参赛类型准考证1周超男四川大学应用数学数学专业201202王宇男四川大学应用数学数学专业201203陈攀男电子科技大学数学数学专业201204周世奇男电子科技大学数学数学专业201205戴烜中男四川大学吴玉章学院数学专业201206邓田娟女四川大学数学基地班数学专业201207杨爽女乐山师范学院数学与应用数学数学专业201208任偲骐男四川大学应用数学数学专业201209李昊天男四川大学数学大类数学专业2012010李波男乐山师范学院数学与应用数学数学专业2012011傅费思男四川大学应用数学数学专业20120赛区二等奖编号姓名性别学校名称学院名称参赛类型准考证1李姗蔓女宜宾学院数学与应用数学数学专业201202李佑铭男四川大学数学与应用数学(基础) 数学专业201203孔祥飞男电子科技大学数学数学专业201204石云鹏男四川大学数学大类数学专业201205佘睿男四川大学基地班数学专业201206郭汝驰男四川大学应用数学数学专业201207常建龙男电子科技大学数学数学专业2012010402 16 8鲁东男西南交通大学数学数学专业2012010803 16 9雷文超男四川大学信息与计算科学数学专业2012010807 20 10岳阳男西南交通大学计算数学专业2012010703 21 11罗开宝男内江师范学院数学与应用数学专业数学专业2012010905 22 12胡方涛男电子科技大学数学数学专业2012010730 23 13余东洲男四川大学数学大类数学专业2012010401 24 14涂代源男四川大学应用数学数学专业2012010422 24 15何俊材男四川大学应用数学数学专业2012010427 24 16孙硕男电子科技大学数学数学专业2012010829 24 17钟友林男四川大学应用数学数学专业2012011027 24 赛区三等奖编号姓名性别学校名称学院名称参赛类型准考证号名次1唐浩耘男四川大学应用数学数学专业2012010127 29 2谭大超男乐山师范学院数学与应用数学数学专业2012010214 29 3纪圣塨男西南交通大学数学数学专业2012010710 29 4赵威男四川大学数学大类数学专业2012011003 29 5胡留春男西南交通大学统计数学专业2012011119 29 6张瑞梅女内江师范学院数学与应用数学专业数学专业2012011224 29 7周义满男电子科技大学数学数学专业2012010129 35 8李艳晶女四川师范大学数学与应用数学数学专业2012010228 35 9夏王洋男四川大学应用数学数学专业2012010706 35 10王玖玮男四川大学应用数学数学专业2012010801 35 11杨红明男电子科技大学数学数学专业2012010808 35 12王琦男四川大学数学基地班数学专业2012010310 40 13周玲女四川大学数学大类数学专业2012010330 40 14李福团男西南交通大学计算数学专业2012010906 40 15唐志骥男四川大学数学与应用数学(金融)数学专业2012011129 40 16刘春燕女西华师范大学数学与应用数学数学专业2012010221 44 17王世莉女乐山师范学院数学与应用数学数学专业2012010413 4418胡又壬女四川大学基地班数学专业2012010816 4419阳密女四川师范大学数学与应用数学数学专业2012011107 4420刘连林男西南交通大学数学数学专业2012010105 4821侯振铎男四川大学数学基地班数学专业2012010202 4822张鹏成男四川大学吴玉章学院数学专业2012010227 4823王凯男四川大学数学大类数学专业2012011202 4824李洋洋男中国民航飞行学院信息0902 数学专业2012011221 4825郭凯迪女四川大学数学与经济创新班数学专业2012010120 4826王豪男四川民族学院数学与应用数学专业数学专业2012010404 48第四届全国大学生数学竞赛四川赛区非数学类获名单赛区一等奖编号姓名性别学校名称学院名称参赛类型准考证号名1 姚青松男西南交通大学经济学非数学专业2012023420 12 刘同科男电子科技大学计算机非数学专业2012023715 23 郑飞洋男电子科技大学物电非数学专业2012023214 34 孙健男电子科技大学通信非数学专业2012022301 45 胡欢男电子科技大学微固非数学专业2012023004 56 李娇女电子科技大学生命非数学专业2012024823 57 马骥男电子科技大学自动化非数学专业2012022613 78 汪威男电子科技大学数理基科班非数学专业2012022428 89 赵旭男西南交通大学材料成型非数学专业2012023218 810 陈宣奇男电子科技大学微固非数学专业2012024110 811 李辉男电子科技大学物电非数学专业2012023128 1112 孟帆男电子科技大学电工非数学专业2012021612 1213 王冰川男西华大学自动化非数学专业2012022720 1314 何加智男电子科技大学物电非数学专业2012022724 1415 许亿志男电子科技大学通信非数学专业2012021528 1516 吴昊男西南交通大学茅以升学院非数学专业2012022919 1517 温勇兵男四川大学制造学院非数学专业2012024524 1518 张亦弛男电子科技大学微固非数学专业2012024822 1519 余锦斌男电子科技大学通信非数学专业2012022014 1920 王洋男中国民航飞行学院计科0901 非数学专业2012022427 1921 朱启勋男电子科技大学通信非数学专业2012023601 1922 解为良男西南交通大学测绘工程非数学专业2012023606 2223 颜在满男西南交通大学生命学院非数学专业2012023814 2224 邹艳波男成都理工大学电气工程及其自动化非数学专业2012024014 2225 李斯文男四川大学高分子科学与工程学院非数学专业2012023714 2526 娄舜男电子科技大学光电非数学专业2012023903 2527 胡勇男电子科技大学光电非数学专业2012024028 2528 屈伸男电子科技大学数理基科班非数学专业2012024322 2529 何涛男电子科技大学机电非数学专业2012024806 2530 李珂男西南交通大学热能与动力工程非数学专业2012021630 3031 卢享男电子科技大学计算机非数学专业2012023827 3032 陈勇男西南石油大学电气工程非数学专业2012024122 30赛区二等奖编号姓名性别学校名称学院名称参赛类型准考证号名1 舒海男电子科技大学物电非数学专业2012021713 332 赵海林男电子科技大学通信非数学专业2012024001 333 张永伟男电子科技大学数理基科班非数学专业2012024225 334 米从威男电子科技大学物电非数学专业2012021807 365 余坤益男四川大学化学学院非数学专业2012022905 366 张齐艳男四川大学高分子科学与工程学院非数学专业2012024226 367 郭甜甜女电子科技大学自动化非数学专业2012024628 368 沙永祥男四川大学建筑与环境学院非数学专业2012023226 409 陈军伟男四川大学水利水电学院非数学专业2012023405 4010 徐霈强男成都信息工程学院大气科学非数学专业2012024019 4011 时北极男西南交通大学力学非数学专业2012024723 4012 徐晓燕女成都理工大学旅游管理非数学专业2012024724 4013 李裕宇男西南交通大学软件工程非数学专业2012022629 4514 王刚男四川大学物理学院非数学专业2012023915 4515 黄睿捷男四川大学化学学院非数学专业2012024114 4516 钱华明男电子科技大学机电非数学专业2012024427 4517 肖电坤男西南交通大学土木非数学专业2012021523 4918 邵俊波男四川大学建筑与环境学院非数学专业2012022403 4919 刘岩男西南交通大学力学非数学专业2012022801 4920 陈廷君男西南交通大学土木非数学专业2012022906 4921 王林男成都理工大学土木工程非数学专业2012023006 4922 张志业女成都理工大学资源勘查工程非数学专业2012023125 4923 何永成男西南交通大学土木非数学专业2012024516 4924 张峰男四川大学电气信息学院非数学专业2012024616 4925 王迪松男电子科技大学数理基科班非数学专业2012024824 4926 李孟林男电子科技大学数理基科班非数学专业2012024826 4927 郭文博男电子科技大学自动化非数学专业2012022729 5928 薄文斐男电子科技大学物电非数学专业2012022808 5929 叶雷男四川大学制造学院非数学专业2012023205 5930 王玉东男四川大学物理学院非数学专业2012023222 5931 游军杰男四川大学电气信息学院非数学专业2012023527 5932 谭启贵男西南石油大学油气储运非数学专业2012024617 5933 郭焱林男四川大学电气信息学院非数学专业2012021801 6534 陶源男电子科技大学通信非数学专业2012022030 6535 孟楠男电子科技大学英才非数学专业2012022120 6536 张志伟男成都理工大学公共事业管理非数学专业2012022518 6537 张一幸男电子科技大学数理基科班非数学专业2012022704 6538 张三锋男电子科技大学微固非数学专业2012022810 6539 李坤男电子科技大学通信非数学专业2012023020 6540 赵晓凤女四川大学水利水电学院非数学专业2012023914 6541 李行健男电子科技大学数理基科班非数学专业2012024128 6542 凡航男四川大学电气信息学院非数学专业2012024415 6543 黎燕文男西南交通大学土木学院非数学专业2012021404 7544 宋昌意男电子科技大学通信非数学专业2012021510 7545 康文武男四川大学电子信息学院非数学专业2012022203 7546 王欣男西南科技大学土木非数学专业2012023101 7547 周大鸥男成都理工大学核工程与核技术非数学专业2012023207 7548 张路男西南石油大学石油工程非数学专业2012023502 7549 刘鹏男西华大学热动非数学专业2012023809 75赛区三等奖编号姓名性别学校名称学院名称参赛类型准考证号名1 付雪冬女电子科技大学自动化非数学专业2012021816 822 郭达鸿男西南交通大学力学与工程非数学专业2012022016 823 李飞跃男四川大学工商管理学院非数学专业2012022214 824 徐文男西南交通大学测绘非数学专业2012022621 825 应宏辉男电子科技大学物电非数学专业2012023906 826 颜峰男电子科技大学自动化非数学专业2012024113 827 邵春雨女电子科技大学光电非数学专业2012024417 828 刘忠智男成都理工大学物探非数学专业2012024604 829 刘丰男四川大学物理学院非数学专业2012022401 9010 张磊男西南交通大学电气非数学专业2012022425 9011 郭尚岐男电子科技大学数理基科班非数学专业2012022524 9012 万文杰男成都理工大学核工程与核技术非数学专业2012022830 9013 齐佳宏男四川大学建筑与环境学院非数学专业2012022913 9014 罗文刚男西南科技大学机械非数学专业2012022915 9015 尤成宇男西南财经大学保险非数学专业2012023306 9016 许加宝男中国民航飞行学院飞构1102 非数学专业2012023705 9017 赵亚丽女四川大学电子信息学院非数学专业2012023802 9018 韩小宁男西南交通大学茅机非数学专业2012024606 9019 林碧玉女西南交通大学地学学院非数学专业2012021508 1020 汪冬兵男西南交通大学土木非数学专业2012021602 1021 王雪微女电子科技大学物电非数学专业2012021608 1022 莫一奉男四川大学电气信息学院非数学专业2012022103 1023 许度男西南交通大学土木工程非数学专业2012022213 1024 池启康男四川大学电子信息学院非数学专业2012022811 1025 王智坚男成都学院自动化非数学专业2012022816 1026 张恩来男四川大学建筑与环境学院非数学专业2012023301 1027 孙继发男电子科技大学自动化非数学专业2012023625 1028 潘誉男电子科技大学能源非数学专业2012023822 1029 何彪男电子科技大学微固非数学专业2012023916 1030 马世琪男电子科技大学通信非数学专业2012024021 1031 何伟男四川大学锦江学院土木工程非数学专业2012021611 1132 张熙男电子科技大学数理基科班非数学专业2012021814 1133 丁祎晓男电子科技大学微固非数学专业2012022318 1134 郑炯卫男电子科技大学微固非数学专业2012022429 1135 汤华男四川大学电气信息学院非数学专业2012022505 1136 吕彦璆女电子科技大学电工非数学专业2012022512 1137 孙建东男宜宾学院电子信息非数学专业2012022520 1138 宋泽良男成都学院机械设计非数学专业2012022714 1139 冯俊羲男四川大学电子信息学院非数学专业2012023126 1140 牛玉宁女成都理工大学核工程与核技术非数学专业2012023401 1141 黎渺军男四川大学电气信息学院非数学专业2012023414 1142 张邦健男内江师范学院物理学非数学专业2012023707 1143 尹鑫鑫男电子科技大学生命非数学专业2012024018 1144 郝勇男电子科技大学物电非数学专业2012024302 1145 杨云帆男西南交通大学茅以升学院非数学专业2012024608 1146 郭征明男四川大学建筑与环境学院非数学专业2012024623 1147 蒋勇男西南交通大学机械非数学专业2012021422 1248 阮艳女成都理工大学测控技术与仪器非数学专业2012021505 1249 刘海男四川大学锦江学院土木工程非数学专业2012021725 1250 李凯男电子科技大学英才非数学专业2012021823 1251 陈程男四川大学物理学院非数学专业2012021928 1252 肖龙男电子科技大学机电非数学专业2012022101 1253 陈晋荣男成都理工大学勘查技术与工程非数学专业2012022606 1254 夏建雄男四川大学物理学院非数学专业2012022712 1255 孙立志男四川大学电气信息学院非数学专业2012022902 1256 陈宇皓男西华大学热动非数学专业2012023127 1257 冯琳女西南交通大学交控非数学专业2012023217 1258 汪迁男四川大学建筑与环境学院非数学专业2012023323 1259 张庆伟男电子科技大学成都学院通信工程非数学专业2012024313 1260 熊林云男四川大学吴玉章学院非数学专业2012024811 1261 罗明有男宜宾学院生物工程非数学专业2012021719 1462 徐明男西南科技大学土木非数学专业2012021829 1463 魏于量男西南交通大学土木非数学专业2012022114 1464 王握男西南交通大学交通运输非数学专业2012022123 1465 张海波男电子科技大学光电非数学专业2012022210 1466 郑力铭男西南石油大学化学工程非数学专业2012023201 1467 徐斌男电子科技大学物电非数学专业2012023209 1468 李广西男电子科技大学计算机非数学专业2012023329 1469 黄凯文男西南交通大学土木非数学专业2012023522 1470 陈国钱男西南科技大学核工非数学专业2012023719 1471 安盼盼女西南交通大学信息学院非数学专业2012024009 1472 舒秉亮男电子科技大学通信非数学专业2012024330 1473 涂洋男成都理工大学电气工程及其自动化非数学专业2012024404 1474 唐静女四川大学工商管理学院非数学专业2012024505 1475 谭孝元男成都理工大学电气工程及其自动化非数学专业2012024511 1476 鲁宇航男成都理工大学空间信息与数字技术非数学专业2012021730 1577 朱贵平男西南石油大学石油工程非数学专业2012022007 1578 石茂林男四川大学物理学院非数学专业2012022514 1579 丁占岭男四川大学吴玉章学院非数学专业2012023619 1580 李宜筱女西南科技大学核工非数学专业2012023702 1581 郑康立男成都学院通信工程非数学专业2012024813 15。

融合Albert_模型的珍稀濒危植物知识图谱的构建

湖南农业大学学报(自然科学版) 2023,49(5):616–623.DOI:10.13331/j.cnki.jhau.2023.05.017 Journal of Hunan Agricultural University(Natural Sciences)

引用格式: 田梦晖,陈明,席晓桃.融合Albert模型的珍稀濒危植物知识图谱的构建[J].湖南农业大学学报(自然科学版),2023,49(5):616–623. TIAN M H,CHEN M, XI X T.Construction of the knowledge graph for the rare and endangered plants based on Albert model[J].Journal of Hunan Agricultural University(Natural Sciences),2023,49(5):616–623. 投稿网址:http://xb.hunau.edu.cn

融合Albert模型的珍稀濒危植物知识图谱的构建 田梦晖1,2,陈明1,2*,席晓桃1,2

(1.上海海洋大学信息学院,上海 201306;2.农业农村部渔业信息重点实验室,上海 201306) 摘 要:针对珍稀濒危植物形态特征、分类等级、濒危系数、保护措施等知识不明确的问题,设计了文本融合轻量

级双向转换编码表示模型(Albert)的知识抽取模型框架,实现批量抽取珍稀濒危植物知识,从而构建珍稀濒危植物知识图谱:1) 在现存一般性植物本体的基础上,采用自顶向下的方式构建珍稀濒危植物本体,得到5个体系,即物种分类体系、生长形态特征体系、命名体系、保护现状体系和生态习性体系;2) 采取Albert预训练模型来增强下游任务模型输入向量的珍稀濒危植物属性描述文本语义的表征能力;3) 利用BiLSTM–CRF模型和BiGRU–Attention模型分别实现命名实体识别和关系抽取。在珍稀濒危植物数据测试集上对模型的有效性进行验证,结果表明,命名实体识别模型和关系抽取模型的召回率和准确率的调和平均值(F1)值分别达到98.07%和93.76%,将得到的大量的实体和关系所形成的三元组存储在图数据库Neo4j中,完成珍稀濒危植物知识图谱的可视化展示。 关 键 词:珍稀濒危植物;Albert模型;知识图谱;本体;命名实体识别;关系抽取

2015年浙江省科技进步奖会评公示


J15050024 高端功能座椅橡胶弹簧复合底盘关键支撑技术及行业示范应用 浙江永艺家具股份有限公司,浙江大学 J15050025 直连式铸焊电动车用密封铅酸蓄电池 J15050308 基于机械化学的珍珠复合美白因子的提取与分离技术 J15060001 岛屿型截面的新型热熔纤维的研制及应用 超威电源有限公司 浙江欧诗漫集团有限公司 绍兴文理学院
浙江中医药大学
浙江科技学院,浙江省围海建设集团股份有限公司,杭州陶钩科技有限公 浙江科技学院,浙江新东港药业股份有限公司 浙江大学医学院附属第一医院 浙江大学医学院附属第二医院 浙江大学
J15140127 畜禽抗生素减量和养分减排的新型微生态制剂技术研究与产业化 浙江大学,浙江农林大学,浙江惠嘉生物科技有限公司,杭州加大生物科 J15140129 突发事件下密集人群疏散方法研究及应用 J15140130 生产性服务业创新能力评价体系研究 J15450002 新型门座起重机关键技术的研发与应用 J15540001 高流变性软土地层地铁车站基坑修建关键技术 J15550001 拱形钢塔斜拉桥建设和养护关键技术研究与工程示范 J15570003 瓯飞围垦工程关键水力学技术研究与应用 J15590001 毛竹秋冬季出笋高效培育关键技术研究 浙江大学,杭州市地铁集团有限责任公司 浙江大学 水利部杭州机械设计研究所,杭州江河机电装备工程有限公司
J15050016 典型养殖排放污水(物)减污控制和净化处理技术研发与应用 湖州师范学院,浙江省淡水水产研究所,浙江省农业科学院,长兴县虹星 J15130007 高清视频物联的感知层和应用层关键技术及产业化 J15140001 阿托伐他汀钙化学-酶法合成关键技术及产业化 J15140005 面向海绵城市建设的生态铺装关键技术及其应用 浙江大华技术股份有限公司,杭州电子科技大学 浙江工业大学,浙江新东港药业股份有限公司

交通银行2010年大学生招聘笔试名单

李颖琦

贵州财经学院
本科
财务管理
第一场
JYZ1001201073
李勇

南开大学
硕士研究生
国际金融
第一场
JYZ1001201074
李曌

贵州大学
本科
财务管理
第一场
JYZ1001201075
梁璐

贵州大学
本科
国际经济与贸易
第一场
JYZ1001201076
林莉

贵州大学
本科
财务管理
第一场
JYZ1001201077
戴宜娥

贵州大学
本科
金融学
第一场
JYZ1001201028
邓琴

贵州财经学院
本科
财务管理
第一场
JYZ1001201029
邓莎

贵州财经学院
本科
会计学
第一场
JYZ1001201030
邓文青

贵州大学
本科
国际经济与贸易
第一场
JYZ1001201031
邓益

南昌航空大学
本科
经济学
第一场
JYZ1001201032
杨振宇

贵州财经学院
本科
金融学
第一场
JYZ1001201158
叶洪光

贵州财经学院
交通银行2010年大学生招聘笔试名单
第一场:13:30-15:30第二场:15:45-17:45
场次
考号
姓名
性别
毕业学校
学历
专业
第一场
JYZ1001201001

2011年国家自然基金获得者名录——国防科学技术大学


F020808 F020809 F020705 F020601 F010202 F010907 F020208 F010304 A040403 E020401 A040204 E050402 E050401 F030301 F010408 F020302 F020701 F040407 A011701 F020503 F050804 E051002 F020502 F010304 F020301 A040204 E051202 F010602 F050304 F010408
ห้องสมุดไป่ตู้
金额 64 70 24 56 20 25 25 23 20 20 22 22 28 17 58 25 20 23 56 24 22 26 56 28 23 25 23 24 48
61170287 11172326 61170048 61170286 61120106005 11102229 71102004 51105369 11172322 61171019 61171016 61170049 11172329 51175500 61171017 51105365 61175015 61170083 71101150 31171266 61174002 61174206 11172327 11104350 11172324 11173068 11174369 61170045 51173202 61108089
F020805 A020403 F020203 F020805 F0203 A020207 G0214 E050601 A020205 F010502 F010602 F020305 A020602 E050201 F010505 E050202 F030403 F0203 G0104 C060702 F030301 F030301 A020413 A040302 A020401 A030802 A040406 F0203 E030301 F0513
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
相关文档
最新文档