Feature-Based Tagger of Approximations of Functional Arabic Morphology


Summary of Chinese-English Technical Terms in Artificial Intelligence

名词解释中英文对比<using_information_sources> social networks 社会网络abductive reasoning 溯因推理action recognition(行为识别)active learning(主动学习)adaptive systems 自适应系统adverse drugs reactions(药物不良反应)algorithm design and analysis(算法设计与分析) algorithm(算法)artificial intelligence 人工智能association rule(关联规则)attribute value taxonomy 属性分类规范automomous agent 自动代理automomous systems 自动系统background knowledge 背景知识bayes methods(贝叶斯方法)bayesian inference(贝叶斯推断)bayesian methods(bayes 方法)belief propagation(置信传播)better understanding 内涵理解big data 大数据big data(大数据)biological network(生物网络)biological sciences(生物科学)biomedical domain 生物医学领域biomedical research(生物医学研究)biomedical text(生物医学文本)boltzmann machine(玻尔兹曼机)bootstrapping method 拔靴法case based reasoning 实例推理causual models 因果模型citation matching (引文匹配)classification (分类)classification algorithms(分类算法)clistering algorithms 聚类算法cloud computing(云计算)cluster-based retrieval (聚类检索)clustering (聚类)clustering algorithms(聚类算法)clustering 聚类cognitive science 认知科学collaborative filtering (协同过滤)collaborative filtering(协同过滤)collabrative ontology development 联合本体开发collabrative ontology engineering 联合本体工程commonsense knowledge 常识communication networks(通讯网络)community detection(社区发现)complex data(复杂数据)complex dynamical networks(复杂动态网络)complex network(复杂网络)complex network(复杂网络)computational biology 计算生物学computational biology(计算生物学)computational complexity(计算复杂性) computational intelligence 智能计算computational modeling(计算模型)computer animation(计算机动画)computer networks(计算机网络)computer science 计算机科学concept clustering 概念聚类concept formation 概念形成concept learning 概念学习concept map 概念图concept model 概念模型concept modelling 概念模型conceptual model 概念模型conditional random field(条件随机场模型) conjunctive quries 合取查询constrained least squares (约束最小二乘) convex programming(凸规划)convolutional neural networks(卷积神经网络) customer relationship management(客户关系管理) data analysis(数据分析)data analysis(数据分析)data center(数据中心)data clustering (数据聚类)data compression(数据压缩)data envelopment analysis (数据包络分析)data fusion 数据融合data 
generation(数据生成)data handling(数据处理)data hierarchy (数据层次)data integration(数据整合)data integrity 数据完整性data intensive computing(数据密集型计算)data management 数据管理data management(数据管理)data management(数据管理)data miningdata mining 数据挖掘data model 数据模型data models(数据模型)data partitioning 数据划分data point(数据点)data privacy(数据隐私)data security(数据安全)data stream(数据流)data streams(数据流)data structure( 数据结构)data structure(数据结构)data visualisation(数据可视化)data visualization 数据可视化data visualization(数据可视化)data warehouse(数据仓库)data warehouses(数据仓库)data warehousing(数据仓库)database management systems(数据库管理系统)database management(数据库管理)date interlinking 日期互联date linking 日期链接Decision analysis(决策分析)decision maker 决策者decision making (决策)decision models 决策模型decision models 决策模型decision rule 决策规则decision support system 决策支持系统decision support systems (决策支持系统) decision tree(决策树)decission tree 决策树deep belief network(深度信念网络)deep learning(深度学习)defult reasoning 默认推理density estimation(密度估计)design methodology 设计方法论dimension reduction(降维) dimensionality reduction(降维)directed graph(有向图)disaster management 灾害管理disastrous event(灾难性事件)discovery(知识发现)dissimilarity (相异性)distributed databases 分布式数据库distributed databases(分布式数据库) distributed query 分布式查询document clustering (文档聚类)domain experts 领域专家domain knowledge 领域知识domain specific language 领域专用语言dynamic databases(动态数据库)dynamic logic 动态逻辑dynamic network(动态网络)dynamic system(动态系统)earth mover's distance(EMD 距离) education 教育efficient algorithm(有效算法)electric commerce 电子商务electronic health records(电子健康档案) entity disambiguation 实体消歧entity recognition 实体识别entity recognition(实体识别)entity resolution 实体解析event detection 事件检测event detection(事件检测)event extraction 事件抽取event identificaton 事件识别exhaustive indexing 完整索引expert system 专家系统expert systems(专家系统)explanation based learning 解释学习factor graph(因子图)feature extraction 特征提取feature extraction(特征提取)feature extraction(特征提取)feature selection (特征选择)feature selection 特征选择feature selection(特征选择)feature space 特征空间first order logic 一阶逻辑formal logic 
形式逻辑formal meaning prepresentation 形式意义表示formal semantics 形式语义formal specification 形式描述frame based system 框为本的系统frequent itemsets(频繁项目集)frequent pattern(频繁模式)fuzzy clustering (模糊聚类)fuzzy clustering (模糊聚类)fuzzy clustering (模糊聚类)fuzzy data mining(模糊数据挖掘)fuzzy logic 模糊逻辑fuzzy set theory(模糊集合论)fuzzy set(模糊集)fuzzy sets 模糊集合fuzzy systems 模糊系统gaussian processes(高斯过程)gene expression data 基因表达数据gene expression(基因表达)generative model(生成模型)generative model(生成模型)genetic algorithm 遗传算法genome wide association study(全基因组关联分析) graph classification(图分类)graph classification(图分类)graph clustering(图聚类)graph data(图数据)graph data(图形数据)graph database 图数据库graph database(图数据库)graph mining(图挖掘)graph mining(图挖掘)graph partitioning 图划分graph query 图查询graph structure(图结构)graph theory(图论)graph theory(图论)graph theory(图论)graph theroy 图论graph visualization(图形可视化)graphical user interface 图形用户界面graphical user interfaces(图形用户界面)health care 卫生保健health care(卫生保健)heterogeneous data source 异构数据源heterogeneous data(异构数据)heterogeneous database 异构数据库heterogeneous information network(异构信息网络) heterogeneous network(异构网络)heterogenous ontology 异构本体heuristic rule 启发式规则hidden markov model(隐马尔可夫模型)hidden markov model(隐马尔可夫模型)hidden markov models(隐马尔可夫模型) hierarchical clustering (层次聚类) homogeneous network(同构网络)human centered computing 人机交互技术human computer interaction 人机交互human interaction 人机交互human robot interaction 人机交互image classification(图像分类)image clustering (图像聚类)image mining( 图像挖掘)image reconstruction(图像重建)image retrieval (图像检索)image segmentation(图像分割)inconsistent ontology 本体不一致incremental learning(增量学习)inductive learning (归纳学习)inference mechanisms 推理机制inference mechanisms(推理机制)inference rule 推理规则information cascades(信息追随)information diffusion(信息扩散)information extraction 信息提取information filtering(信息过滤)information filtering(信息过滤)information integration(信息集成)information network analysis(信息网络分析) information network mining(信息网络挖掘) information network(信息网络)information processing 信息处理information processing 信息处理information 
resource management (信息资源管理) information retrieval models(信息检索模型) information retrieval 信息检索information retrieval(信息检索)information retrieval(信息检索)information science 情报科学information sources 信息源information system( 信息系统)information system(信息系统)information technology(信息技术)information visualization(信息可视化)instance matching 实例匹配intelligent assistant 智能辅助intelligent systems 智能系统interaction network(交互网络)interactive visualization(交互式可视化)kernel function(核函数)kernel operator (核算子)keyword search(关键字检索)knowledege reuse 知识再利用knowledgeknowledgeknowledge acquisitionknowledge base 知识库knowledge based system 知识系统knowledge building 知识建构knowledge capture 知识获取knowledge construction 知识建构knowledge discovery(知识发现)knowledge extraction 知识提取knowledge fusion 知识融合knowledge integrationknowledge management systems 知识管理系统knowledge management 知识管理knowledge management(知识管理)knowledge model 知识模型knowledge reasoningknowledge representationknowledge representation(知识表达) knowledge sharing 知识共享knowledge storageknowledge technology 知识技术knowledge verification 知识验证language model(语言模型)language modeling approach(语言模型方法) large graph(大图)large graph(大图)learning(无监督学习)life science 生命科学linear programming(线性规划)link analysis (链接分析)link prediction(链接预测)link prediction(链接预测)link prediction(链接预测)linked data(关联数据)location based service(基于位置的服务) loclation based services(基于位置的服务) logic programming 逻辑编程logical implication 逻辑蕴涵logistic regression(logistic 回归)machine learning 机器学习machine translation(机器翻译)management system(管理系统)management( 知识管理)manifold learning(流形学习)markov chains 马尔可夫链markov processes(马尔可夫过程)matching function 匹配函数matrix decomposition(矩阵分解)matrix decomposition(矩阵分解)maximum likelihood estimation(最大似然估计)medical research(医学研究)mixture of gaussians(混合高斯模型)mobile computing(移动计算)multi agnet systems 多智能体系统multiagent systems 多智能体系统multimedia 多媒体natural language processing 自然语言处理natural language processing(自然语言处理) nearest neighbor (近邻)network analysis( 网络分析)network analysis(网络分析)network analysis(网络分析)network 
formation(组网)network structure(网络结构)network theory(网络理论)network topology(网络拓扑)network visualization(网络可视化)neural network(神经网络)neural networks (神经网络)neural networks(神经网络)nonlinear dynamics(非线性动力学)nonmonotonic reasoning 非单调推理nonnegative matrix factorization (非负矩阵分解) nonnegative matrix factorization(非负矩阵分解) object detection(目标检测)object oriented 面向对象object recognition(目标识别)object recognition(目标识别)online community(网络社区)online social network(在线社交网络)online social networks(在线社交网络)ontology alignment 本体映射ontology development 本体开发ontology engineering 本体工程ontology evolution 本体演化ontology extraction 本体抽取ontology interoperablity 互用性本体ontology language 本体语言ontology mapping 本体映射ontology matching 本体匹配ontology versioning 本体版本ontology 本体论open government data 政府公开数据opinion analysis(舆情分析)opinion mining(意见挖掘)opinion mining(意见挖掘)outlier detection(孤立点检测)parallel processing(并行处理)patient care(病人医疗护理)pattern classification(模式分类)pattern matching(模式匹配)pattern mining(模式挖掘)pattern recognition 模式识别pattern recognition(模式识别)pattern recognition(模式识别)personal data(个人数据)prediction algorithms(预测算法)predictive model 预测模型predictive models(预测模型)privacy preservation(隐私保护)probabilistic logic(概率逻辑)probabilistic logic(概率逻辑)probabilistic model(概率模型)probabilistic model(概率模型)probability distribution(概率分布)probability distribution(概率分布)project management(项目管理)pruning technique(修剪技术)quality management 质量管理query expansion(查询扩展)query language 查询语言query language(查询语言)query processing(查询处理)query rewrite 查询重写question answering system 问答系统random forest(随机森林)random graph(随机图)random processes(随机过程)random walk(随机游走)range query(范围查询)RDF database 资源描述框架数据库RDF query 资源描述框架查询RDF repository 资源描述框架存储库RDF storge 资源描述框架存储real time(实时)recommender system(推荐系统)recommender system(推荐系统)recommender systems 推荐系统recommender systems(推荐系统)record linkage 记录链接recurrent neural network(递归神经网络) regression(回归)reinforcement learning 强化学习reinforcement learning(强化学习)relation extraction 关系抽取relational database 关系数据库relational learning 关系学习relevance 
feedback (相关反馈)resource description framework 资源描述框架restricted boltzmann machines(受限玻尔兹曼机) retrieval models(检索模型)rough set theroy 粗糙集理论rough set 粗糙集rule based system 基于规则系统rule based 基于规则rule induction (规则归纳)rule learning (规则学习)rule learning 规则学习schema mapping 模式映射schema matching 模式匹配scientific domain 科学域search problems(搜索问题)semantic (web) technology 语义技术semantic analysis 语义分析semantic annotation 语义标注semantic computing 语义计算semantic integration 语义集成semantic interpretation 语义解释semantic model 语义模型semantic network 语义网络semantic relatedness 语义相关性semantic relation learning 语义关系学习semantic search 语义检索semantic similarity 语义相似度semantic similarity(语义相似度)semantic web rule language 语义网规则语言semantic web 语义网semantic web(语义网)semantic workflow 语义工作流semi supervised learning(半监督学习)sensor data(传感器数据)sensor networks(传感器网络)sentiment analysis(情感分析)sentiment analysis(情感分析)sequential pattern(序列模式)service oriented architecture 面向服务的体系结构shortest path(最短路径)similar kernel function(相似核函数)similarity measure(相似性度量)similarity relationship (相似关系)similarity search(相似搜索)similarity(相似性)situation aware 情境感知social behavior(社交行为)social influence(社会影响)social interaction(社交互动)social interaction(社交互动)social learning(社会学习)social life networks(社交生活网络)social machine 社交机器social media(社交媒体)social media(社交媒体)social media(社交媒体)social network analysis 社会网络分析social network analysis(社交网络分析)social network(社交网络)social network(社交网络)social science(社会科学)social tagging system(社交标签系统)social tagging(社交标签)social web(社交网页)sparse coding(稀疏编码)sparse matrices(稀疏矩阵)sparse representation(稀疏表示)spatial database(空间数据库)spatial reasoning 空间推理statistical analysis(统计分析)statistical model 统计模型string matching(串匹配)structural risk minimization (结构风险最小化) structured data 结构化数据subgraph matching 子图匹配subspace clustering(子空间聚类)supervised learning( 有support vector machine 支持向量机support vector machines(支持向量机)system dynamics(系统动力学)tag recommendation(标签推荐)taxonmy induction 感应规范temporal logic 时态逻辑temporal reasoning 时序推理text analysis(文本分析)text anaylsis 
文本分析text classification (文本分类)text data(文本数据)text mining technique(文本挖掘技术)text mining 文本挖掘text mining(文本挖掘)text summarization(文本摘要)thesaurus alignment 同义对齐time frequency analysis(时频分析)time series analysis( 时time series data(时间序列数据)time series data(时间序列数据)time series(时间序列)topic model(主题模型)topic modeling(主题模型)transfer learning 迁移学习triple store 三元组存储uncertainty reasoning 不精确推理undirected graph(无向图)unified modeling language 统一建模语言unsupervisedupper bound(上界)user behavior(用户行为)user generated content(用户生成内容)utility mining(效用挖掘)visual analytics(可视化分析)visual content(视觉内容)visual representation(视觉表征)visualisation(可视化)visualization technique(可视化技术) visualization tool(可视化工具)web 2.0(网络2.0)web forum(web 论坛)web mining(网络挖掘)web of data 数据网web ontology lanuage 网络本体语言web pages(web 页面)web resource 网络资源web science 万维科学web search (网络检索)web usage mining(web 使用挖掘)wireless networks 无线网络world knowledge 世界知识world wide web 万维网world wide web(万维网)xml database 可扩展标志语言数据库附录 2 Data Mining 知识图谱（共包含二级节点15 个，三级节点93 个）间序列分析)监督学习)领域二级分类三级分类。

A Chinese Coreference Resolution Method Based on a Feature Selection Strategy

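1 Overview
Coreference is pervasive in natural-language expression. It denotes a semantic link between a linguistic unit in a discourse and a linguistic unit that appeared earlier (this paper does not discuss anaphora or zero anaphora). The linguistic unit that does the referring is called the anaphor,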

Bursty and Hierarchical Structure in Streams_ACM_2002

Bursty and Hierarchical Structure in Streams

Jon Kleinberg

Abstract

A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a “burst of activity,” with certain features rising sharply in frequency as the topic emerges.

The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.

1 Introduction

Documents can be naturally organized by topic, but in many settings we also experience their arrival over time. E-mail and news articles provide two clear examples of such document streams: in both cases, the strong temporal ordering of the content is necessary for making sense of it, as particular topics appear, grow in intensity, and then fade away again.
Over a much longer time scale, the published literature in a particular research field can be meaningfully understood in this way as well, with particular research themes growing and diminishing in visibility across a period of years. Work in the areas of topic detection and tracking [2, 3, 6, 67, 68], text mining [39, 62, 63, 64], and visualization [29, 47, 66] has explored techniques for identifying topics in document streams comprised of news stories, using a combination of content analysis and time-series modeling. Underlying a number of these techniques is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a “burst of activity,” with certain features rising sharply in frequency as the topic emerges.

The goal of the present work is to develop a formal approach for modeling such “bursts,” in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach presented here can be viewed as drawing an analogy with models from queueing theory for bursty network traffic (see e.g. [4, 18, 35]). In addition, however, the analysis of the underlying burst patterns reveals a latent hierarchical structure that often has a natural meaning in terms of the content of the stream.

My initial aim in studying this issue was a very concrete one: I wanted a better organizing principle for the enormous archives of personal e-mail that I was accumulating. Abundant anecdotal evidence, as well as academic research [7, 46, 65], suggested that my own experience with “e-mail overload” corresponded to a near-universal phenomenon, a consequence of both the rate at which e-mail arrives and the demands of managing volumes of saved personal correspondence that can easily grow into tens and hundreds of megabytes of pure text content. And at a still larger scale, e-mail has become the raw material for legal proceedings [37] and historical investigation [9, 41, 48], with the National Archives, for example, agreeing
to accept tens of millions of e-mail messages from the Clinton White House [50]. In sum, there are several settings where it is a crucial problem to find structures that can help in making sense of large volumes of e-mail.

An active line of research has applied text indexing and classification to develop e-mail interfaces that organize incoming messages into folders on specific topics, sometimes recommending further actions on the part of a user [5, 10, 14, 32, 33, 42, 51, 52, 54, 55, 56, 59, 60]; in effect, this framework seeks to automate a kind of filing system that many users implement manually. There has also been work on developing query interfaces to fully-indexed collections of e-mail [8].

My interest here is in exploring organizing structures based more explicitly on the role of time in e-mail and other document streams. Indeed, even the flow of a single focused topic is modulated by the rate at which relevant messages or documents arrive, dividing naturally into more localized episodes that correspond to bursts of activity of the type suggested above. For example, my saved e-mail contains over a thousand messages relevant to the topic “grant proposals”: announcements of new funding programs, planning of proposals, and correspondence with co-authors. While one could divide this collection into sub-topics based on message content (certain people, programs, or funding agencies form the topics of some messages but not others), an equally natural and substantially orthogonal organization for this topic would take into account the sequence of episodes reflected in the set of messages, namely the bursts that surround the planning and writing of certain proposals. Indeed, certain sub-topics (e.g. “the process of gathering people together for our large NSF ITR proposal”) may be much more easily characterized by a sudden confluence of message-sending over a particular period of time than by textual features of the messages themselves. One can easily argue that many of the large topics represented in a document stream are
naturally punctuated by bursts in this way, with the flow of relevant items intensifying in certain key periods. A general technique for highlighting these bursts thus has the potential to expose a great deal of fine-grained structure.

Before moving to a more technical overview of the methodology, let me suggest one further perspective on this issue, quite distant from computational concerns. If one were to view a particular folder of e-mail not simply as a document stream but also as something akin to a narrative that unfolds over time, then one immediately brings into play a body of work that deals explicitly with the bursty nature of time in narratives, and the way in which particular events are signaled by a compression of the time-sense. In an early concrete reference to this idea, E. M. Forster, lecturing on the structure of the novel in the 1920's, asserted that

    ...there seems something else in life besides time, something which may conveniently be called “value,” something which is measured not by minutes or hours but by intensity, so that when we look at our past it does not stretch back evenly but piles up into a few notable pinnacles, and when we look at the future it seems sometimes a wall, sometimes a cloud, sometimes a sun, but never a chronological chart [20].

This role of time in narratives is developed more explicitly in work of Genette [22, 23], Chatman [12], and others on anisochronies, the non-uniform relationships between the amount of time spanned by a story's events and the amount of time devoted to these events in the actual telling of the story.

Modeling Bursty Streams. Suppose we were presented with a document stream; for concreteness, consider a large folder of e-mail on a single broad topic. How should we go about identifying the main bursts of activity, and how do they help impose additional structure on the stream? The basic point emerging from the discussion above is that such bursts correspond roughly to points at which the intensity of message arrivals increases
sharply, perhaps from once every few weeks or days to once every few hours or minutes. But the rate of arrivals is in general very “rugged”: it does not typically rise smoothly to a crescendo and then fall away, but rather exhibits frequent alternations of rapid flurries and longer pauses in close proximity. Thus, methods that analyze gaps between consecutive message arrivals in too simplistic a way can easily be pulled into identifying large numbers of short spurious bursts, as well as fragmenting long bursts into many smaller ones. Moreover, a simple enumeration of close-together sets of messages is only a first step toward more intricate structure. The broader goal is thus to extract global structure from a robust kind of data reduction: identifying bursts only when they have sufficient intensity, and in a way that allows a burst to persist smoothly across a fairly non-uniform pattern of message arrivals.

My approach here is to model the stream using an infinite-state automaton $A$, which at any point in time can be in one of an underlying set of states, and emits messages at different rates depending on its state. Specifically, the automaton $A$ has a set of states that correspond to increasingly rapid rates of emission, and the onset of a burst is signaled by a state transition from a lower state to a higher state. By assigning costs to state transitions, one can control the frequency of such transitions, preventing very short bursts and making it easier to identify long bursts despite transient changes in the rate of the stream. The overall framework is developed in Section 2. It draws on the formalism of Markov sources used in modeling bursty network traffic [4, 18, 35], as well as the formalism of hidden Markov models [53].

Using an automaton with states that correspond to higher and higher intensities provides an additional source of analytical leverage: the bursts associated with state transitions form a naturally nested structure, with a long burst of low intensity potentially containing several
bursts of higher intensity inside it (and so on, recursively). For a folder of related e-mail messages, we will see in Sections 2 and 3 that this can provide a hierarchical decomposition of the temporal order, with long-running episodes intensifying into briefer ones according to a natural tree structure. This tree can thus be viewed as imposing a fine-grained organization on the sub-episodes within the message stream.

Following this development, Section 4 focuses on the problem of enumerating all significant bursts in a document stream, ranked by a measure of “weight.” Applied to a case in which the stream is comprised not of e-mail messages but of research paper titles over the past several decades, the set of bursts corresponds roughly to the appearance and disappearance of certain terms of interest in the underlying research area. The approach makes sense for many other datasets of an analogous flavor; in Section 4, I also discuss an example based on U.S. Presidential State of the Union Addresses from 1790 to 2002. Section 5 discusses the connections to related work in a range of areas, particularly the striking recent work of Swan, Allan, and Jensen [62, 63, 64] on overview timelines, which forms the body of research closest to the approach here. Finally, Section 6 discusses some further applications of the methodology: how burstiness in arrivals can help to identify certain messages as “landmarks” in a large corpus of e-mail, and how the overall framework can be applied to logs of Web usage.

2 A Weighted Automaton Model

Perhaps the simplest randomized model for generating a sequence of message arrival times is based on an exponential distribution: messages are emitted in a probabilistic manner, so that the gap $x$ in time between messages $i$ and $i+1$ is distributed according to the “memoryless” exponential density function $f(x) = \alpha e^{-\alpha x}$, for a parameter $\alpha > 0$. (In other words, the probability that the gap exceeds $x$ is equal to $e^{-\alpha x}$.) The expected value of the gap in this model is $\alpha^{-1}$, and hence one can refer to $\alpha$ as the rate of
message arrivals.

Intuitively, a “bursty” model should extend this simple formulation by exhibiting periods of lower rate interleaved with periods of higher rate. A natural way to do this is to construct a model with multiple states, where the rate depends on the current state. Let us start with a basic model that incorporates this idea, and then extend it to the models that will primarily be used in what follows.

A two-state model. Arguably the most basic bursty model of this type would be constructed from a probabilistic automaton $A$ with two states $q_0$ and $q_1$, which we can think of as corresponding to “low” and “high.” When $A$ is in state $q_0$, messages are emitted at a slow rate, with gaps $x$ between consecutive messages distributed independently according to a density function $f_0(x) = \alpha_0 e^{-\alpha_0 x}$. When $A$ is in state $q_1$, messages are emitted at a faster rate, with gaps distributed independently according to $f_1(x) = \alpha_1 e^{-\alpha_1 x}$, where $\alpha_1 > \alpha_0$. Finally, between messages, $A$ changes state with probability $p \in (0, 1)$, remaining in its current state with probability $1 - p$, independently of previous emissions and state changes.

Such a model could be used to generate a sequence of messages in the natural way. $A$ begins in state $q_0$. Before each message (including the first) is emitted, $A$ changes state with probability $p$. A message is then emitted, and the gap in time until the next message is determined by the distribution associated with $A$'s current state.

One can apply this generative model to find a likely state sequence, given a set of messages. Suppose there is a given set of $n+1$ messages, with specified arrival times; this determines a sequence of $n$ inter-arrival gaps $x = (x_1, x_2, \ldots, x_n)$. The development here will use the basic assumption that all gaps $x_i$ are strictly positive. We can use the Bayes procedure (as in e.g. [15]) to determine the conditional probability of a state sequence $q = (q_{i_1}, \ldots, q_{i_n})$; note that this must be done in terms of the underlying density functions, since the gaps are not drawn from discrete distributions. Each
state sequence $q$ induces a density function $f_q$ over sequences of gaps, which has the form $f_q(x_1, \ldots, x_n) = \prod_{t=1}^{n} f_{i_t}(x_t)$. If $b$ denotes the number of state transitions in the sequence $q$ (that is, the number of indices $i_t$ so that $q_{i_t} \neq q_{i_{t+1}}$, with $i_0 = 0$ because $A$ begins in state $q_0$), then the (prior) probability of $q$ is equal to

$$\Pr[q] = \Big(\prod_{i_t \neq i_{t+1}} p\Big)\Big(\prod_{i_t = i_{t+1}} (1-p)\Big) = p^b (1-p)^{n-b} = \Big(\frac{p}{1-p}\Big)^{b} (1-p)^n.$$

By Bayes' rule,

$$\Pr[q \mid x] = \frac{\Pr[q]\, f_q(x)}{\sum_{q'} \Pr[q']\, f_{q'}(x)} = \frac{1}{Z} \Big(\frac{p}{1-p}\Big)^{b} (1-p)^n \prod_{t=1}^{n} f_{i_t}(x_t),$$

where $Z$ is the normalizing constant $\sum_{q'} \Pr[q']\, f_{q'}(x)$. Finding a state sequence $q$ maximizing this probability is equivalent to finding one that minimizes

$$-\ln \Pr[q \mid x] = b \ln\Big(\frac{1-p}{p}\Big) + \Big(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\Big) - n \ln(1-p) + \ln Z.$$

The last two terms do not depend on $q$, so this is in turn equivalent to minimizing the cost function

$$c(q \mid x) = b \ln\Big(\frac{1-p}{p}\Big) + \Big(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\Big).$$

Finding a state sequence to minimize this cost function is a problem that can be motivated intuitively on its own terms, without recourse to the underlying probabilistic model. The first of the two terms in the expression for $c(q \mid x)$ favors sequences with a small number of state transitions, while the second term favors state sequences that conform well to the sequence $x$ of gap values. Thus, one expects the optimum to track the global structure of bursts in the gap sequence, while holding to a single state through local periods of non-uniformity. Varying the coefficient on $b$ controls the amount of “inertia” fixing the automaton in its current state.

The next step is to extend this simple “high-low” model to one with a richer state set, using a cost model; this will lead to a method that also extracts hierarchical structure from the pattern of bursts.

An infinite-state model. Consider a sequence of $n+1$ messages that arrive over a period of time of length $T$. If the messages were spaced completely evenly over this time interval, then they would arrive with gaps of size $\hat{g} = T/n$. Bursts of greater and greater intensity would be associated with gaps smaller and smaller than $\hat{g}$. This suggests focusing on an infinite-state automaton whose states correspond to gap sizes that may be arbitrarily small, so as to capture the full range of possible bursts. The development here will use a cost model as in the two-state case, where the underlying goal is to find a state sequence of minimum cost.

Figure 1: An infinite-state model for bursty sequences. (a) The infinite-state automaton $A^*_{s,\gamma}$; in state $q_i$, messages are emitted at a spacing in time that is distributed according to $f_i(x) = \alpha_i e^{-\alpha_i x}$, where $\alpha_i = \hat{g}^{-1} s^i$. There is a cost to move to states of higher index, but not to states of lower index. (b) Given a sequence of gaps between message arrivals, an optimal state sequence in $A^*_{s,\gamma}$ is computed. This gives rise to a set of nested bursts: intervals of time in which the optimal state has at least a certain index. The inclusions among the set of bursts can be naturally represented by a tree structure.

Thus, consider an automaton with a “base state” $q_0$ that has an associated exponential density function $f_0$ with rate $\alpha_0 = \hat{g}^{-1} = n/T$, consistent with completely uniform message arrivals. For each $i > 0$, there is a state $q_i$ with associated exponential density $f_i$ having rate $\alpha_i = \hat{g}^{-1} s^i$, where $s > 1$ is a scaling parameter. ($i$ will be referred to as the index of the state $q_i$.) In other words, the infinite sequence of states $q_0, q_1, \ldots$ models inter-arrival gaps that decrease geometrically from $\hat{g}$; there is an expected rate of message arrivals that intensifies for larger and larger values of $i$. Finally, for every $i$ and $j$, there is a cost $\tau(i, j)$ associated with a state transition from $q_i$ to $q_j$. The framework allows considerable flexibility in formulating the cost function; for the work described here, $\tau(\cdot, \cdot)$ is defined so that the cost of moving from a lower-intensity burst state to a higher-intensity one is proportional to the number of intervening states, but there is no cost for the automaton to end a higher-intensity burst and drop down to a lower-intensity one. Specifically, when $j > i$, moving from $q_i$ to $q_j$ incurs a cost of $(j - i)\gamma \ln n$, where $\gamma > 0$ is a parameter; and when
$j < i$, the cost is 0. See Figure 1(a) for a schematic picture.

This automaton, with its associated parameters $s$ and $\gamma$, will be denoted $A^*_{s,\gamma}$. Given a sequence of positive gaps $x = (x_1, x_2, \ldots, x_n)$ between message arrivals, the goal, by analogy with the two-state model above, is to find a state sequence $q = (q_{i_1}, \ldots, q_{i_n})$ that minimizes the cost function

$$c(q \mid x) = \Big(\sum_{t=0}^{n-1} \tau(i_t, i_{t+1})\Big) + \Big(\sum_{t=1}^{n} -\ln f_{i_t}(x_t)\Big).$$

(Let $i_0 = 0$ in this expression, so that $A^*_{s,\gamma}$ starts in state $q_0$.) Since the set of possible $q$ is infinite, one cannot automatically assert that the minimum is even well-defined; but this will be established in Theorem 2.1 below. As before, minimizing the first term is consistent with having few state transitions (and transitions that span only a few distinct states), while minimizing the second term is consistent with passing through states whose rates agree closely with the inter-arrival gaps. Thus, the combined goal is to track the sequence of gaps as well as possible without changing state too much.

Observe that the scaling parameter $s$ controls the “resolution” with which the discrete rate values of the states are able to track the real-valued gaps; the parameter $\gamma$ controls the ease with which the automaton can change states. In what follows, $\gamma$ will often be set to a default value of 1; we can use $A^*_s$ to denote $A^*_{s,1}$.

Computing a minimum-cost state sequence. Given a sequence of positive gaps $x = (x_1, x_2, \ldots, x_n)$ between message arrivals, consider the algorithmic problem of finding a state sequence $q = (q_{i_1}, \ldots, q_{i_n})$ in $A^*_{s,\gamma}$ that minimizes the cost $c(q \mid x)$; such a sequence will be called optimal. To establish that the minimum is well-defined, and to provide a means of computing it, it is useful to first define a natural finite restriction of the automaton: for a natural number $k$, one simply deletes all states but $q_0, q_1, \ldots, q_{k-1}$ from $A^*_{s,\gamma}$, and denotes the resulting $k$-state automaton by $A^k_{s,\gamma}$. Note that the two-state automaton $A^2_{s,\gamma}$ is essentially equivalent (by an amortization argument) to the probabilistic two-state model described
earlier. It is not hard to show that computing an optimal state sequence in $A^*_{s,\gamma}$ is equivalent to doing so in one of its finite restrictions.

Theorem 2.1 Let $\delta(x) = \min_{i=1}^{n} x_i$ and $k = \lceil 1 + \log_s T + \log_s \delta(x)^{-1} \rceil$. (Note that $\delta(x) > 0$, since all gaps are positive.) If $q^*$ is an optimal state sequence in $A^k_{s,\gamma}$, then it is also an optimal state sequence in $A^*_{s,\gamma}$.

Proof. Let $q^* = (q_{\ell_1}, \ldots, q_{\ell_n})$ be an optimal state sequence in $A^k_{s,\gamma}$, and let $q = (q_{i_1}, \ldots, q_{i_n})$ be an arbitrary state sequence in $A^*_{s,\gamma}$. As before, set $\ell_0 = i_0 = 0$, since both sequences start in state $q_0$; for notational purposes, it is useful to define $\ell_{n+1} = i_{n+1} = 0$ as well. The goal is to show that $c(q^* \mid x) \le c(q \mid x)$. If $q$ does not contain any states of index greater than $k - 1$, this inequality follows from the fact that $q^*$ is an optimal state sequence in $A^k_{s,\gamma}$. Otherwise, consider the state sequence $q' = (q_{i'_1}, \ldots, q_{i'_n})$ where $i'_t = \min(i_t, k - 1)$. It is straightforward to verify that

$$\sum_{t=0}^{n-1} \tau(i'_t, i'_{t+1}) \le \sum_{t=0}^{n-1} \tau(i_t, i_{t+1}).$$

Now, for a particular choice of $t$ between 1 and $n$, consider the expression $-\ln f_j(x_t) = \alpha_j x_t - \ln \alpha_j$; what is the value of $j$ for which it is minimized? The function $h(\alpha) = \alpha x_t - \ln \alpha$ is concave upwards over the interval $(0, \infty)$, with a global minimum at $\alpha = x_t^{-1}$. Thus, if $j^*$ is such that $\alpha_{j^*} \le x_t^{-1} \le \alpha_{j^*+1}$, then the minimum of $-\ln f_j(x_t)$ is achieved at one of $j^*$ or $j^* + 1$; moreover, if $j'' \ge j' \ge j^* + 1$, then $-\ln f_{j''}(x_t) \ge -\ln f_{j'}(x_t)$. Since $k = \lceil 1 + \log_s T + \log_s \delta(x)^{-1} \rceil$, one has

$$\alpha_{k-1} = \hat{g}^{-1} s^{k-1} = \frac{n}{T} \cdot s^{\lceil \log_s T + \log_s \delta(x)^{-1} \rceil} \ge \frac{1}{\delta(x)} \ge \frac{1}{x_t}$$

for every $t$. Hence, whenever $i_t > k - 1$, replacing $q_{i_t}$ by $q_{k-1}$ does not increase $-\ln f_{i_t}(x_t)$, so $c(q' \mid x) \le c(q \mid x)$; and since $q'$ is a state sequence in $A^k_{s,\gamma}$, $c(q^* \mid x) \le c(q' \mid x)$.

In view of the theorem, it is enough to give an algorithm that computes an optimal state sequence in an automaton of the form $A^k_{s,\gamma}$. This can be done by adapting the standard forward dynamic programming algorithm used for hidden Markov models [53] to the model and cost function defined here: one defines $C_j(t)$ to be the minimum cost of a state sequence for the input $x_1, x_2, \ldots, x_t$ that must end with state $q_j$, and then iteratively builds up the values of $C_j(t)$ in order of increasing $t$ using the recurrence relation

$$C_j(t) = -\ln f_j(x_t) + \min_{\ell} \left( C_\ell(t-1) + \tau(\ell, j) \right)$$

with initial conditions
$C_0(0) = 0$ and $C_j(0) = \infty$ for $j > 0$. In all the experiments here, an optimal state sequence in $A^*_{s,\gamma}$ can be found by restricting to a number of states $k$ that is a very small constant, always at most 25.

Note that although the final computation of an optimal state sequence is carried out by recourse to a finite-state model, working with the infinite model has the advantage that a number of states $k$ is not fixed a priori; rather, it emerges in the course of the computation, and in this way the automaton $A^*_{s,\gamma}$ essentially "conforms" to the particular input instance.

3 Hierarchical Structure and E-mail Streams

Extracting hierarchical structure. From an algorithm to compute an optimal state sequence, one can then define the basic representation of a set of bursts, according to a hierarchical structure. For a set of messages generating a sequence of positive inter-arrival gaps $x = (x_1, x_2, \ldots, x_n)$, suppose that an optimal state sequence $q = (q_{i_1}, q_{i_2}, \ldots, q_{i_n})$ in $A^*_{s,\gamma}$ has been determined. Following the discussion of the previous section, we can formally define a burst of intensity $j$ to be a maximal interval over which $q$ is in a state of index $j$ or higher. More precisely, it is an interval $[t, t']$ so that $i_t, \ldots, i_{t'} \ge j$ but $i_{t-1}$ and $i_{t'+1}$ are less than $j$ (or undefined if $t - 1 < 0$ or $t' + 1 > n$).

It follows that bursts exhibit a natural nested structure: a burst of intensity $j$ may contain one or more sub-intervals that are bursts of intensity $j + 1$; these in turn may contain sub-intervals that are bursts of intensity $j + 2$; and so forth. This relationship can be represented by a rooted tree $\Gamma$, as follows. There is a node corresponding to each burst; and node $v$ is a child of node $u$ if node $u$ represents a burst $B_u$ of intensity $j$ (for some value of $j$), and node $v$ represents a burst $B_v$ of intensity $j + 1$ such that $B_v \subseteq B_u$. Note that the root of $\Gamma$ corresponds to the single burst of intensity 0, which is equal to the whole interval $[0, n]$. Thus, the tree $\Gamma$ captures hierarchical structure that is implicit in the underlying stream.
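The dynamic program and the burst definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names `optimal_state_sequence` and `bursts` are hypothetical, and it assumes that $T$ is the sum of the gaps, that ties are broken toward the lowest state index, and that the root burst of intensity 0 is left implicit.

```python
import math


def optimal_state_sequence(gaps, s=2.0, gamma=1.0):
    """Minimum-cost state indices i_1..i_n for positive gaps x_1..x_n.

    Assumption: T is taken to be the sum of the gaps.
    """
    n = len(gaps)
    T = sum(gaps)
    delta = min(gaps)
    # Number of states from Theorem 2.1: k = ceil(1 + log_s T + log_s (1/delta)).
    k = math.ceil(1 + math.log(T, s) + math.log(1 / delta, s))
    alpha = [(n / T) * s ** j for j in range(k)]  # rate of state q_j

    def tau(i, j):
        # Moving up costs (j - i) * gamma * ln n; moving down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    cost = [0.0] + [math.inf] * (k - 1)  # C_j(0)
    back = []                            # back[t-1][j]: best predecessor of j at step t
    for x in gaps:
        new_cost, pointers = [], []
        for j in range(k):
            best = min(range(k), key=lambda l: cost[l] + tau(l, j))
            pointers.append(best)
            # -ln f_j(x) = alpha_j * x - ln(alpha_j)
            new_cost.append(cost[best] + tau(best, j) + alpha[j] * x - math.log(alpha[j]))
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest final state back to the first gap.
    j = min(range(k), key=lambda l: cost[l])
    states = [j]
    for pointers in reversed(back[1:]):
        j = pointers[j]
        states.append(j)
    states.reverse()
    return states


def bursts(states):
    """List (intensity, first, last) for each maximal run with index >= intensity.

    Positions are 1-based to match x_1..x_n; intensity 0 (the root) is omitted.
    """
    found = []
    for level in range(1, max(states) + 1):
        start = None
        for t, i in enumerate(states + [-1], start=1):  # sentinel closes the last run
            if i >= level and start is None:
                start = t
            elif i < level and start is not None:
                found.append((level, start, t - 1))
                start = None
    return found


if __name__ == "__main__":
    gaps = [10.0] * 5 + [0.1] * 10 + [10.0] * 5
    states = optimal_state_sequence(gaps)
    print(states)
    print(bursts(states))
```

A burst of intensity $j + 1$ returned by `bursts` is always contained in some burst of intensity $j$, so the tree $\Gamma$ can be built by attaching each interval to the enclosing interval one level down.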
Figure 1(b) shows the transformation from an optimal state sequence, to a set of nested bursts, to a tree.

Hierarchy in an e-mail stream. Let us now return to one of the initial motivations for this model, and consider a stream of e-mail messages. What does the hierarchical structure of bursts look like in this setting? I applied the algorithm to my own collection of saved e-mail, consisting of messages sent and received between June 9, 1997 and August 23, 2001. (The cut-off dates are chosen here so as to roughly cover four academic years.) First, here is a brief summary of this collection. Every piece of mail I sent or received during this period of time, using my e-mail address, can be viewed as belonging to one of two categories: first, messages consisting of one or more large files, such as drafts of papers mailed between co-authors (essentially, e-mail as file transfer); and second, all other messages. The collection I am considering here consists simply of all messages belonging to the second, much larger category; thus, to a rough approximation, it is all the mail I sent and received during this period, unfiltered by content but excluding long files. It contains 34344 messages in UNIX mailbox format, totaling 41.7 megabytes of ASCII text, excluding message headers.

Subsets of the collection can be chosen by selecting all messages that contain a particular string or set of strings; this can be viewed as an analogue of a "folder" of related messages, although messages in the present case are related not because they were manually filed together but because they are the response set to a particular query. Studying the stream induced by such a response set raises two distinct but related questions. First, is it in fact the case that the appearance of messages containing particular words exhibits a "spike," in some informal sense, in the (temporal) vicinity of significant times such as deadlines, scheduled events, or unexpected developments? And second, do the algorithms developed here provide a means for identifying this
phenomenon?

In fact such spikes appear to be quite prevalent, and also rich enough that the algorithms of the previous section can extract hierarchical structure that in many cases is quite deep. Moreover, the algorithms are efficient enough that computing a representation for the bursts on a query to the full e-mail collection can be done in real-time, using a simple implementation on a standard PC.

To give a qualitative sense for the kind of structure one obtains, Figures 2 and 3 show the results of computing bursts for two different queries using the automaton $A^*_2$. Figure 2 shows an analysis of the stream of all messages containing the word "ITR," which is prominent in my e-mail because it is the name of a large National Science Foundation program for which my colleagues and I wrote two proposals in 1999-2000. There are many possible ways to organize this stream of messages, but one general backdrop against which to view the stream is the set of deadlines imposed by the NSF for the first run of the program. Large proposals were submitted in a three-phase process, with deadlines of 11/15/99, 1/5/00, and 4/17/00 for letters of intent, pre-proposals, and full proposals respectively. Small proposals were submitted in a two-phase process, with deadlines of 1/5/00 and 2/14/00 for letters of intent and full proposals respectively. I participated in a group writing a proposal of each kind.

Turning to the figure, part (a) is a plot of the raw input to the automaton $A^*_2$, showing the arrival time of each message in the response set. Part (b) shows a nested interval representation of the set of bursts for the optimal state sequence in $A^*_2$; the intervals are annotated with the first and last dates of the messages they contain, and the dates of the NSF deadlines are lined up with the intervals that contain them. Note that this is a schematic representation, designed to show the inclusions that give rise to the tree $\Gamma$; the lengths and centering of the intervals in the drawing are not significant. Part (c) shows a drawing of the resulting
tree $\Gamma$. The root corresponds to the single burst of intensity 0 that is present in any state sequence. One sees that the two children of the root span intervals surrounding the

An Improved PCA Algorithm Based on Pixel Structure for Hyperspectral Images

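Information & Communications, 2017, No. 8 (Sum. No. 176)

Ren Jie, Jiao Yameng (College of Electronics and Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710048)

Abstract: Principal component analysis (PCA) is a commonly used dimensionality-reduction algorithm and is widely applied in tasks that must process large volumes of data, such as hyperspectral image processing.

The main purpose of PCA is to use an orthogonal transformation to convert the correlated components of high-dimensional data into new, linearly uncorrelated component variables; however, when the matrix dimension exceeds one million, the computation becomes severely difficult.

To address the memory scheduling problem that arises when the covariance matrix is computed in PCA, this paper proposes an improved covariance matrix computation based on pixel structure, which effectively reduces the memory required for the computation while preserving the same performance as conventional PCA.

In the experiments, features were extracted from hyperspectral image data with both the traditional PCA algorithm and the improved algorithm and then classified with a support vector machine (SVM); the comparison confirms the effectiveness and reliability of the improved algorithm.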

Keywords: structural principal component analysis; hyperspectral image; feature extraction; dimensionality reduction
CLC number: TP391  Document code: A  Article ID: 1673-1131(2017)08-0273-04

An Improved PCA Algorithm using Pixel-Based Structure in Hyperspectral Imaging
Ren Jie1, Jiao Yameng2
(1. College of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China; 2. School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China)

Abstract: Principal component analysis (PCA), which converts correlated high-dimensional data to linearly uncorrelated principal components by using an orthogonal transformation, is widely used in many applications that require large-scale data processing, such as hyperspectral imaging. Because of the computational problem that PCA faces when one dimension of the matrix exceeds 100k, an improved PCA algorithm using pixel-based structure is proposed in this paper. Compared with traditional PCA, the proposed method can reduce the memory requirement while maintaining the same performance. Using support vector machines (SVM) to classify the new components produced by the two PCA algorithms, the classification results show that the proposed algorithm is reliable and effective.
Keywords: structural principal component analysis (SPCA), hyperspectral imaging, feature extraction, dimension reduction.

0 Introduction

A hyperspectral image (HSI) is a collection of data acquired over a series of spectral bands. These bands cover not only the visible region observable by conventional devices (400 nm-760 nm) but may also include the ultraviolet (200 nm-400 nm), the near-infrared (760 nm-2560 nm), and the region with wavelengths above 2560 nm.

A Gene Expression Profile Classification Method Based on Neighborhood Rough Sets and a Probabilistic Neural Network Ensemble

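Application Research of Computers, Vol. 28, No. 12, Dec. 2011

Ming Lite, Jiang Yun, Wang Yong, Wang Mingfang
(1. College of Mathematics and Information Science, Northwest Normal University, Lanzhou 730070; 2. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072)

Abstract: Starting from the analysis of cancer gene expression profiles, and in view of their high dimensionality and small sample size, this paper proposes a classification method for cancer based on neighborhood rough sets and a probabilistic neural network ensemble. First, the Relief algorithm is used to rank the genes; then neighborhood rough sets are used to select the characteristic genes for classification; finally, a probabilistic neural network ensemble classification model is used to classify the cancers. Experimental results show that the method can quickly and effectively select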
ｄｉ１．９９ｊｉｎ１０ —６５２１．２００ｏ：０３６／．ｓ．０１３９．０１．１ｓ１

Progress on the Detection of Plant Growth Regulators by Electrochemical Sensors

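DOI: 10.13822/ki.hxsj.2021007896
Chemical Reagents (Huaxue Shiji), 2021, 43(4), 458-465

Zhang Yan, Du Haijun*, Du Kezhi, Zhang Xinyue, Ai Jixing, Hu Huali (School of Chemical Engineering, Guizhou Minzu University, Guiyang, Guizhou 550025)

Abstract: In recent years, methods for detecting plant growth regulators (PGRs) have attracted wide attention from researchers.

Because electrochemical sensors offer high sensitivity and selectivity, short response time, low cost, and portability, they are favored by researchers for rapid on-site detection.

This review summarizes four aspects of PGR detection by electrochemical sensors: 1) direct electrochemical behavior sensing, which mainly sensitizes the electrode with different modifying materials so that electrochemically active PGRs can be detected directly; 2) electrochemical biosensing, which mainly recognizes PGRs specifically through enzymes and antigen-antibody pairs; 3) photoelectrochemical sensing, which mainly uses materials with photocatalytic properties to detect PGRs of poor electrochemical activity through photoelectric conversion; 4) molecularly imprinted electrochemical sensing, which achieves exclusive detection by constructing polymer films that specifically recognize PGRs.

The current status of electrochemical sensors is also described, and future development trends are discussed.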

Keywords: plant growth regulators; electrochemical sensor; electrochemical analysis; modified electrode
CLC number: O657.1  Document code: A  Article ID: 0258-3283(2021)04-0458-08

Progress on the Detection of Plant Growth Regulators by Electrochemical Sensors
ZHANG Yan, DU Hai-jun*, DU Ke-zhi, ZHANG Xin-yue, AI Ji-xing, HU Hua-li (School of Chemical Engineering, Guizhou Minzu University, Guiyang 550025, China), Huaxue Shiji, 2021, 43(4), 458-465

Abstract: In recent years, the detection methods of plant growth regulators (PGRs) have attracted extensive attention. Electrochemical sensors are favored in the field of rapid detection because of their high sensitivity and selectivity, short response time, low cost, and portability. This work summarizes four aspects of PGR detection by electrochemical sensors. (1) Direct electrochemical behavior sensing, which mainly uses different modifying materials to sensitize the electrode, so as to realize the direct detection of electrochemically active PGRs. (2) Electrochemical biosensing, in which PGRs are specifically recognized by enzymes and by antigens and antibodies. (3) Photoelectrochemical sensing, which mainly uses materials with photocatalytic properties to achieve photoelectric conversion detection of PGRs with poor electrochemical activity. (4) Molecular imprinting electrochemical sensing, which constructs a polymer film that can specifically recognize PGRs for exclusive detection. The current status of electrochemical sensors is also described, and future development trends are discussed.
Key words: plant growth regulators; electrochemical sensor; electrochemical analysis; modified electrode

Plant growth regulators (PGRs) include endogenous plant hormones and a class of synthetic substances whose structure and physiological properties resemble those of plant hormones [1].

QTL Mapping of the Sensitivity of Japonica Rice Agronomic Traits to Exogenous Gibberellin (in English)

Agricultural Science & Technology, 2010, 11(2): 52-56, 136. Plant Physiology and Biochemistry.
Copyright © 2010, Information Institute of HAAS. All rights reserved.

QTL Mapping for the Sensitivity of the Traits Related to Outcrossing of Japonica Rice to Exogenous GA3
FU Shu-huan, GUO Yuan, LIU Jian, XU Qi, HONG De-lin*
State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095

Abstract [Objective] The research aimed to map QTLs for the sensitivity of the traits related to outcrossing of Japonica rice to exogenous GA3 and to provide a theoretical basis for breeding and improving high-sensitivity sterile lines. [Method] Taking Japonica rice Xiushui 79 and C bao and their recombinant inbred line population of 260 lines as test materials, the sensitivity of 4 traits related to outcrossing to exogenous GA3 was studied and the corresponding QTLs were mapped by composite interval mapping. [Result] Three QTLs controlling the sensitivity of flag leaf angle to GA3 were detected on chromosomes 3, 9 and 9; they explained 5.6%, 13% and 11.8% of phenotypic variance, respectively. Positive alleles came from Xiushui 79, C bao and C bao, respectively. Two QTLs controlling the sensitivity of plant height to GA3 were detected on chromosomes 1 and 8; they explained 8.46% and 11% of phenotypic variance, respectively. Positive alleles came from C bao and Xiushui 79, respectively. One QTL controlling the sensitivity of the first internode length to GA3 was detected on chromosome 3; it explained 0.05% of phenotypic variance, and the positive allele came from C bao. One QTL controlling the sensitivity of the second internode length to GA3 was detected on chromosome 1; it explained 7.34% of phenotypic variance, and the positive allele came from C bao. [Conclusion] The research results are important for lowering seed production cost and reducing pollution of the environment.
Key words Japonica rice; Traits related to outcrossing; Exogenous GA3

Received: December 4, 2009  Accepted: March 9, 2010
Supported by the Universities Discipline Innovation and Intellectual Introduction Program, the Basic Science and Technology Platform Projects of the Ministry of Education, and the Ministry of Agriculture "948" Project.
* Corresponding author.

Gibberellin (GA3) is a hormone that controls plant growth and acts on the plant throughout its life cycle; it plays a very important role in seed germination, stem elongation, flower development and so on [1]. The GA3 level in plants of rice male sterile lines at heading stage is significantly lower than in normal male-fertile lines, so the panicle neck cannot elongate normally; about 1/4 of the panicle cannot be exserted, and neck blocking appears [2]. Too small a flag leaf angle is not conducive to pollination and decreases seed production [3]. Spraying exogenous GA3 before heading raises the GA3 level in the plant, which promotes elongation of the panicle neck to relieve neck blocking and increases the flag leaf angle; the spike is then exposed normally and outcrossing improves [4].

Application of GA3 has become an important and indispensable part of hybrid rice seed production and plays a decisive role in improving seed yield. However, 40-225 g of GA3 is needed per 1 hm2 of field; this costs the country up to 3000-5000 million yuan per year and increases the incidence of pre-harvest sprouting and rice kernel smut, which will not only reduce seed quality [5-6] but also pollute the environment. Different varieties differ in their sensitivity to GA3 [7-8]. Kamijima [9] divided rice dwarf mutants into two kinds (sensitive and non-sensitive) according to their response to GA3. WU Han-lin et al. [8] showed that the nuclear male sterile line Ks-14 was more sensitive to GA3 than Zhenshan 97A. TIAN Da-cheng [10] confirmed this pattern using V20A, Zhenshan 97A, D Shan A, Xieqingzao A and others: sterile lines with slight or no neck enclosure were more sensitive to GA3.

Ten QTLs controlling the response of plant height to GA3 have been located using RFLP markers [11], and three QTLs controlling the sensitivity of flag leaf angle to GA3 and three QTLs controlling the sensitivity of top internode length to GA3 have been reported [12]. These studies used Indica and Japonica groups; some synergistic alleles came from Indica varieties and some from Japonica varieties, and they were dispersed in distribution. However, QTLs controlling the sensitivity of plant height and of the second internode length to GA3 have not yet been reported. In this study, QTL analysis of the sensitivity of plant height, flag leaf angle, first internode length and second internode length to GA3 was carried out with recombinant inbred lines from a Japonica × Japonica cross and an SSR molecular marker linkage map, in order to find more QTLs controlling GA3 sensitivity in Japonica rice varieties and to provide a theoretical basis for breeding and improving high-sensitivity sterile lines, so as to reduce the usage of exogenous GA3 in hybrid rice seed production, save costs and reduce environmental pollution.

Materials and Methods

Experimental materials
The materials were the Japonica rice varieties Xiushui 79 and C bao and a population of 260 recombinant inbred lines (RIL) derived from their hybrid by single seed descent. After selfing for many years, the lines had reached the F10:11 generation at the time of this study; characteristics were stable within lines and varied greatly between lines. Xiushui 79 is a conventional Japonica rice variety bred by the Jiaxing Institute of Agricultural Sciences, Zhejiang Province (approved by Jiangsu Province in 1996); it was sown in mid-May in Nanjing and headed in mid-August. C bao is a Japonica rice restorer line bred by the Anhui Academy of Agricultural Sciences; its reproductive period was close to that of Xiushui 79, and its plant height was 100 cm.

Field trials and trait investigation
The 260 lines of the RIL population and their parents were planted at the Jiangpu experimental station of Nanjing Agricultural University in 2008. Each parent and each RIL line was planted in two rows of eight plants, with two replicates (control group and treatment group). Single-plant cultivation and conventional cultivation management were used. GA3 was sprayed on the treatment group at the initial heading stage, and an equal amount of water was sprayed on the control group. The GA3 was a 4% gibberellin preparation from Shanghai Tongrui Bio-Technology Co., Ltd. The dosage followed the principle used in hybrid rice seed production (a small dose first, a larger dose in the middle and a slightly smaller dose afterwards) and a total GA3 usage of 240 g per 1 hm2 [13]. Spraying continued for 3 d: the GA3 concentration was 0.04‰ on the first day, 0.08‰ on the second day and 0.04‰ on the third day, with 100 ml sprayed on each plant each day. Seven days after initial heading, the angle between the main stem and the flag leaf was measured with a protractor on 10 plants per line. Twenty days after initial heading, the plant height, the first internode length and the second internode length were measured on 10 plants per line. Sensitivity was judged by the response index: the greater the response index, the higher the sensitivity.
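The response index used to judge sensitivity is the percent change of a trait under GA3 treatment relative to the untreated control. The following is a minimal sketch of that calculation; the function name `response_index` is hypothetical, and the inputs are the Xiushui 79 parent means that the paper reports in Table 3. Indices computed from means in this way may differ slightly from the per-plant averages the paper reports in Table 4.

```python
def response_index(treated, untreated):
    """Percent change of a trait value under GA3 treatment relative to the control."""
    if untreated == 0:
        raise ValueError("untreated trait value must be non-zero")
    return (treated - untreated) / untreated * 100.0


# Xiushui 79 parent means from Table 3: (GA3 treatment, control).
xiushui79 = {
    "FLA": (17.40, 6.10),    # flag leaf angle, degrees
    "PH": (120.40, 93.30),   # plant height, cm
    "FIL": (41.92, 32.34),   # first internode length, cm
    "SIL": (28.54, 14.10),   # second internode length, cm
}

indices = {
    trait: round(response_index(treated, control), 1)
    for trait, (treated, control) in xiushui79.items()
}

if __name__ == "__main__":
    for trait, value in indices.items():
        print(f"{trait}: {value}%")
```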
According to the survey results, the GA3 response index of the four traits was calculated:

Response index = [(trait value under GA3 treatment - trait value untreated) / trait value untreated] × 100%.

Data processing and QTL analysis
The average trait value of the 10 plants of each line was calculated. Composite interval mapping (CIM) in WinQTL Cartographer 2.5 [14] was used to calculate the LOD value at intervals of 2 cM on the 17 linkage groups [15] of the Japonica SSR molecular marker linkage map constructed by the State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University. A permutation test with 1000 permutations [16] was used to determine the LOD threshold, so that the genome-wide type I error probability for QTL detection was less than 5%. When the observed LOD value was greater than the threshold, a QTL was considered to exist in that region, and its confidence interval was taken as the interval one LOD unit below the LOD peak. QTLs were named in accordance with the rules of McCouch et al. [17], and the contribution rate and additive effect of each QTL were estimated.

Results and Analysis

The differences of the traits related to outcrossing of Japonica rice between parents and their variation in the RIL groups
Table 1 showed that flag leaf angle, plant height, first internode length and second internode length differed between the parents, and t tests showed the differences to be significant. Each trait showed continuous variation in the RIL population, with clear segregation beyond the parental values, and was approximately normally distributed (Fig. 1).

Table 1 Difference of the traits related to outcrossing of Japonica rice between two parents and variations among the RILs
Character | Xiushui 79 | C bao | RIL average | RIL range | CV (%)
Flag leaf angle (FLA), ° | 6.1±2.6 | 20.6±5.7 | 15.1±6.6 | 5.5-67 | 43.71
Plant height (PH), cm | 93.0±4.9 | 90.8±3.5 | 101.7±20.5 | 49.2-142.5 | 20.16
First internode length (FIL), cm | 32.3±2.5 | 24.5±1.3 | 31.5±6.8 | 16.4-46.3 | 21.59
Second internode length (SIL), cm | 14.1±1.9 | 16.7±0.8 | 18.1±4.0 | 8.6-30.8 | 22.10

Fig. 1 Frequency distribution of the agronomic characters of Japonica rice in the RIL population. P1: Xiushui 79; P2: C bao. The same below.

QTL location analysis of the traits related to outcrossing of Japonica rice
Table 2 showed that one QTL was detected for flag leaf angle, located on chromosome 8 (qFLA-8). It explained 9.06% of phenotypic variance, and the positive allele came from Xiushui 79. Four QTLs were detected for plant height, located on chromosomes 8, 8, 9 and 9 (qPH-8-1, qPH-8-2, qPH-9-1, qPH-9-2), respectively. qPH-8-1 explained 7.06% of phenotypic variance and qPH-8-2 explained 4.49%, with positive alleles from Xiushui 79; qPH-9-1 explained 16.63% and qPH-9-2 explained 10.65%, with positive alleles from C bao. Two QTLs were detected for the first internode length, both located on chromosome 9 (qFIL-9-1, qFIL-9-2); qFIL-9-1 explained 29.81% and qFIL-9-2 explained 40.02% of phenotypic variance, and the positive alleles of both came from C bao. Four QTLs were detected for the second internode length, located on chromosomes 8, 8, 9 and 9 (qSIL-8-1, qSIL-8-2, qSIL-9-1, qSIL-9-2), respectively. qSIL-8-1 explained 4.53% and qSIL-8-2 explained 4.29% of phenotypic variance, with positive alleles from Xiushui 79; qSIL-9-1 explained 19.96% and qSIL-9-2 explained 13.23%, with positive alleles from C bao.

Table 2 QTLs of the traits related to outcrossing of Japonica rice detected by the composite interval mapping method
Character | Locus | Chromosome | Marker interval | LOD score | Contribution rate (%) | Additive effect
Flag leaf angle (FLA), ° | qFLA-8 | 8 | RM264-RM6948 | 5.77 | 9.06 | 2.18
Plant height (PH), cm | qPH-8-1 | 8 | RM80-RM281 | 3.77 | 7.06 | 6.93
 | qPH-8-2 | 8 | RM6948-RM433 | 3.52 | 4.49 | 5.62
 | qPH-9-1 | 9 | RM5652-RM410 | 7.67 | 16.63 | -10.21
 | qPH-9-2 | 9 | RM257-OSR28 | 6.74 | 10.65 | -8.21
First internode length (FIL), cm | qFIL-9-1 | 9 | RM6570-RM5652 | 7.06 | 29.81 | -6.53
 | qFIL-9-2 | 9 | RM5652-RM410 | 17.48 | 40.02 | -5.15
Second internode length (SIL), cm | qSIL-8-1 | 8 | RM80-RM281 | 2.65 | 4.53 | 1.08
 | qSIL-8-2 | 8 | RM264-RM6948 | 3.29 | 4.29 | 1.08
 | qSIL-9-1 | 9 | RM5652-RM410 | 8.9 | 19.96 | -2.17
 | qSIL-9-2 | 9 | RM257-OSR28 | 7.7 | 13.23 | -1.77

The differences of the sensitivity of the traits related to outcrossing of Japonica rice to GA3 between the parents and variations among the RILs
Table 3 showed that, for all four traits (flag leaf angle, plant height, first internode length and second internode length), the average value of Xiushui 79 sprayed with GA3 was higher than that of the control; t tests showed that the differences between GA3 treatment and control reached a very significant level, indicating that Xiushui 79 was sensitive to exogenous GA3. The average value of C bao sprayed with GA3 was also higher than that of the control; however, t tests showed no significant difference between the treated material and the control, indicating that C bao was less sensitive to exogenous GA3.

Table 3 The means of the traits related to outcrossing under GA3 treatment and control in two parents
Character | Xiushui 79, GA3 treatment | Xiushui 79, control | C bao, GA3 treatment | C bao, control
Flag leaf angle (FLA), ° | 17.40±3.40 | 6.10±2.60 | 21.25±2.60 | 20.56±5.66
Plant height (PH), cm | 120.40±3.50 | 93.30±4.80 | 92.30±8.20 | 90.75±3.50
First internode length (FIL), cm | 41.92±3.70 | 32.34±2.50 | 25.80±3.10 | 24.45±1.30
Second internode length (SIL), cm | 28.54±2.90 | 14.10±1.90 | 22.88±4.10 | 16.69±0.80

Table 4 showed that Xiushui 79 was more sensitive than C bao; t tests showed significant differences between the two parents in the sensitivity of every trait to GA3. The sensitivity of the traits in the RIL population was close to that of the parent Xiushui 79 and showed continuous variation, with clear segregation beyond the parental values and a skewed normal distribution (Fig. 2).

Table 4 Difference of the response index of the traits related to outcrossing of Japonica rice to GA3 between two parents and variations among the RILs
Character | Xiushui 79 | C bao | RIL average | RIL range | CV (%)
Index of flag leaf angle (IFLA), % | 185.0±49.8 | 10.0±10.8 | 157.4±130.7 | 9.0-815.1 | 83.04
Index of plant height (IPH), % | 29.2±3.9 | 3.6±5.1 | 22.1±11.8 | 0-65.7 | 53.40
Index of the first internode length (IFIL), % | 28.2±3.5 | 9.5±5.9 | 19.8±12.4 | 0-72.7 | 62.63
Index of the second internode length (ISIL), % | 102.0±9.0 | 34.0±18.0 | 68.8±30.1 | 16.8-178.8 | 43.75

QTL analysis on the sensitivity of the traits related to outcrossing of Japonica rice to GA3
Three QTLs were detected for the sensitivity of flag leaf angle to GA3, located on chromosomes 3, 9 and 9 (qIFLA-3, qIFLA-9-1, qIFLA-9-2), respectively. qIFLA-3 explained 5.59% of phenotypic variance, with the positive allele from Xiushui 79; qIFLA-9-1 explained 13.00% and qIFLA-9-2 explained 11.80%, with positive alleles from C bao. Two QTLs were detected for the sensitivity of plant height to GA3, located on chromosomes 1 and 8 (qIPH-1 and qIPH-8), respectively. qIPH-1 explained 8.46% of phenotypic variance, with the positive allele from C bao; qIPH-8 explained 10.97%, with the positive allele from Xiushui 79. One QTL was detected for the sensitivity of the first internode length to GA3, located on chromosome 3 (qIFIL-3); it explained 0.05% of phenotypic variance, and the positive allele came from C bao. One QTL was detected for the sensitivity of the second internode length to GA3, located on chromosome 1 (qISIL-1); it explained 7.34% of phenotypic variance, and the positive allele came from C bao.

Table 5 QTLs for the sensitivity of the traits related to outcrossing of Japonica rice to GA3 detected by the composite interval mapping method
Character | Locus | Chromosome | Marker interval | LOD score | Contribution rate (%) | Additive effect
Index of flag leaf angle (IFLA), % | qIFLA-3 | 3 | RM545▲-RM3766 | 1.53 | 5.59 | 41.81
 | qIFLA-9-1 | 9 | RM6570-RM5652▲ | 6.12 | 13.00 | -48.10
 | qIFLA-9-2 | 9 | RM5652▲-RM410 | 5.89 | 11.80 | -45.89
Index of plant height (IPH), % | qIPH-1 | 1 | RM486-RM265▲ | 4.77 | 8.46 | -3.97
 | qIPH-8 | 8 | RM1235-RM331▲ | 2.64 | 10.97 | 4.11
Index of the first internode length (IFIL), % | qIFIL-3 | 3 | RM7097-RM448▲ | 2.54 | 0.05 | -2.80
Index of the second internode length (ISIL), % | qISIL-1 | 1 | RM486-RM265▲ | 4.25 | 7.34 | -9.80
▲ indicates the nearest marker to the putative QTL.

Discussions
Dong et al. [11] soaked seeds in GA3 solution and sprayed GA3 onto seedlings to study the sensitivity of seedling height to exogenous GA3, and five GA3-sensitive QTLs were located, on chromosomes 1, 3, 4, 6 and 12, respectively. WANG Ying-ying et al. used the same method to study the sensitivity of rice. One QTL controlling the sensitivity of the first leaf sheath length to GA3 was mapped on chromosome 1, and two QTLs controlling the sensitivity of the second leaf sheath length to GA3 were mapped on chromosomes 3 and 12, respectively. This study concluded that there were significant differences in the sensitivity to exogenous GA3 between parents Xiushui
79a nd C bao ,a nd the se nsitivi ty t o G A 3o f e ach li ne of R I L popu l a ti on a lso showe d s i gn i fi ca nt diffe re nc e s,whi c h wa s cons i s t e nt w ith the pre vi ou s stud i e s .Se ve n GA 32sen siti ve Q T Ls we re de t e c ted in this s t udy,the Q T Ls tha t controlled the se ns i ti vity o f fl a g lea f angl e t o GA 3we re de te c ted on chrom o 2som e 3,9and 9,a nd which wa s di ffe ren t w it h the repo r 2t e d [13];t he Q T L tha t controll e d the se nsitivi ty of p l a nt he i gh t t o GA 3wa s d i ffe re nt w i th tha t de tec te d by Dong e t a l ;the de te c 2t e d Q T Ls tha t co ntro ll e d the sen sitivity of the first i nte rnode l e ngth to G A 3wa s l oca t e d on the d i ffe re nt site s w i th the repo r 2t e d t hre e GA 32se ns i ti ve Q T Ls tha t contr o ll e d the t op i nte rnodel e ngth [13],wh i ch p r ovide d theo re ti ca l ba sis fo r b re eding a nd im p rovi ng of ste rile li ne tha t w it h t he hi gh sen siti vity o f the fl a g l e af a ng l e.The con tri buti o n ra te of QT L de te cte d i n t h i s study wa s l o w .The rea sons m ight be a s foll ow s:①t he de nsity of gene ti c li nka ge m ap wa s l o w ,a nd the m a rk wa s no t ve ry e 2ve nly distri bu t e d i n a few l inkage group s,a nu m be r of Q T LS tha t of slight e ffe cts m igh t no t be de tec te d;②re spon se i nde x m e tho ds used fo r ca l cula ti on i n this s tudy wa s di ffe re nt w ith othe rs,the re ac ti on i nde x va l ue w a s sm a lle r,whi c h m i ght a l 2so l e ad t o the re duc ti on of the nu m be r of de te c ted Q T L a nd the contribu ti on ra te.The com pa rison o f Q T L of the tra its re la te d o utc r o ss i ng de tec ti o n re sults show ed tha t the QT L whi c h contr o lled the cha rac te r itse l f d i d no t con tro l the se nsiti vity of this tra it t o GA 3,o nl y the QTL tha t controlli ng p l a nt he igh twa s l o c a ted o n the sam e chrom o som e se c ti on (RM 65702RM5652a nd R M 56522RM410)w it h t 
he Q T L tha t co ntro ll ing t he sen siti vity of flag l ea f angl e t o G A 3(q I FL A 2921a nd q I FLA 2922),but the ir dis t a nce t o the m a rk we re diffe re nt w ith e a ch o the r,the dis 2t a nce be t w ee n the qF I L 2921tha t con tro l led the first i nte rnode l e ngth a nd R M 6570wa s 10cM ,while the dista nc e be t w e e n q I FLA 2921t ha t contr o ll e d the fl a g l e a f a ngl e a nd RM6570wa s 14cM ,t he Q T L tha t co ntro lo f p l a nt he i ght,t he firs t i nte rnode l e ngth and the second i nte rnode l e ngth we re l oc a ted i n the sam e s it e ,a nd the dista nc e from the RM 5652we re 12c M ,but the d i s tance be t we en q I FLA 2922tha t co ntro lli ng the fl a g l e af a ng l e a nd RM5652wa s 8cM.The se re sults w e re diffe r 2e nt w i th t he s t udie s of Q I A O B ao 2ji a n,which m i ght be due t o the com bi na ti o n a nd g roup s fo r l oca ti on w e re diffe re nt .I n ad 2diti o n,Q T L l o ca ti o n re sults sugge ste d tha t the QTL tha t con 2tro l led the p lant he ight a nd the se ns i ti vity of the se cond i n t e r 2node length to G A 3we re l oca ted o n the sam e l o cus RM486-R M 265,a nd the contri buti on ra te wa s hi ghe r,bu t it d i d not contr o l the sen siti vity of the firs t i n te rnode length t o G A 3.The refo re ,this re sult ha d p r o vide d a strong ge ne ti c ba s is t o se l e ct the spe c i e s t ha t wa s h i gh sen siti vity of the firs t i n t e r 2node length t o GA 3whil e a voide d t he high se ns i ti vity of thesecond int e rnode length to GA 3.R e fe re n c e s[1]WANG Y H (王月华),H AN LB (韩烈保),ZENG H M (曾会明),etal .The deve l o pm ent o f dwarf m u t an ts re l ated gi b bere l li n (植物赤霉素矮化突变体研究进展)[J ].C hi na B i o techno l og y(植物生物工程杂志),2006,26(8):22-27.[2]Y UAN LP (袁隆平),WANG S L (王三良),MA GH (马国辉),et a l.Hybri d ri ce (杂交水稻学)[M ].B eij i n g:C hi na Ag ri cu l tura l Pre s s (北京:中国农业出版社),2002:246-279.[3]DONG GJ (董国军),FUJ I MO T O K (藤本宽),TENG S (滕胜),etal .Q T L ana l ysis of fl 
a g l eaf ang l e i n ri ce (水稻剑叶角度的Q T L 分析)[J ].Ch i ne se J o urna l o f R i ce Sci ence (中国水稻科学),2003,17(3):219-222.[4]Y UAN LP (袁隆平),CHE N HX (陈洪新).Hybri d ri ce b reed i ng andculti va ti on (杂交水稻育种栽培学)[M ].C hang sha:Hunan Sci ence and Te chno l o gy Pres s (长沙:湖南科学技术出版社),1998:200.[5]ZHU BC (朱斌成).Effect of g i bbere ll i c ac i d (G A 3)on pa rental ag 2ronom i c cha ract e rs i n hybri d ri ce seed p rod ucti o n (施用“九二○”对杂交水稻制种父母本农艺性状的影响)[J ].J i ang xi Ag ri cu l tura l Sci 2ence &Techno l ogy (江西农业科技),1988(6):4-6.[6]L IAX (李安详),L ICH (李慈厚),D I N G KX(丁克信),e t a l .Me as 2ure s fo r p reven ti ng and con tr o ll i ng ke rnel s m u t i n hyb ri d ri ce seed p rodu cti o n (杂交制种稻粒黑粉病的综合防治)[J ].J i ang su Agri cul 2tura l Sci ences (江苏农业科学),1995(4):34-36.[7]TI A N DC (田大成),ZHANG SY (张素英),Q I N CL (秦春林).Yi el d 2i n creas i ng m echanis m and m ea su rem ent i n dex o f app l y i ng g i bber 2elli c aci d (GA 3)in hybri d ri ce seed p roduct i o n (杂交稻制种喷施“九二○”增产机理及其衡量指标的探讨)[J ].Hyb rid R i ce (杂交水稻),1990(6):20-23.[8]WU HL (吴汉林),SONG ZP (宋智萍).Techn i que o f t wo 2l i ne hy 2bri d ri ce com b i nati on seed p r o ducti on (两系杂交稻制种技术初步研究)[J ].Hyb ri d R i ce (杂交水稻),1989(S1):32-35.[9]K A M I J I M A O.Som e co ns i de rati o ns o n the m echan i sm o f exp res 2s i o n o f dwarf gene s i n ri ce p l ants (I).Re spon se t o g i bbe rell i c aci d and the p re sence o f endogenou s g i bbe rrlli n 2li ke sub stances i n ri ce p lants [J ].Sci R ep t Fac Ag ri K o be Un i v,1972,10:177-182.[10]TI A N DC (田大成).I nfl uential fact o r and con tr o l techno l o gy o f ou t 2cr o s s i ng se ed 2se tti ng i n hyb ri d ri ce seed p rod ucti o n (杂交水稻制种中异交结实影响因素及控制技术的研究)[D ].Nan j i ng:Nan ji ng Ag 2ri cu lt u ral Un i ve rsit y (南京:南京农业大学),1993:67-70.[11]DONG YJ ,K A M I U TEN H,Y ANG Z N ,et a l .M app i ng o f qua ntit a 2ti ve tra i t l oc i fo r gi bb ere l li c aci d re s po ns e a t ri c e (O ryza sa ti va L.)s eedli ng s tage[J ].P l an t Sci 
ence,2006,170:12-17.[12]WA N G YY(王盈盈).Q T L ana l ys i s of fl ag l eaf ang l e in ri c e (水稻剑叶角度的Q T L 定位分析)[D ].Nan j i ng:Nan j i ng Ag ri cultura lU ni 2ve rs it y (南京:南京农业大学),2009:26-28.[13]HE MY (何梅玉),CHEN MX (陈梅香).Ap p l i cat i o n of g i bbe rell i caci d (GA 3)on hybri d ri ce se ed p rodu cti o n (杂交水稻制种使用赤霉素技术)[J ].Seed Sci ence &Techno l ogy (种子科技),1997(4):45.[14]WANG S,BASTEN CJ ,ZEN G Z B.W i ndows Q T L ca rt o grap hervers i on 2.5[EB /OL ].Ra l ei gh ,NC :Dep a rt m ent o f S t a ti s ti cs ,No rth C aro l i na S t a t e U ni versit y,2006.[15]G UO Y (郭媛).Co n struc ti o n o f SSR l i nkage m ap an d anal ys is ofQ T L f o r r o ll ed l eaf of res t o rer l i ne i n j apon i ca ri ce (O ryza sa ti va )(粳稻SSR 连锁图谱的构建及恢复系卷叶性状Q TL 分析)[J ].C h i nese J ou rnal of R i ce Sci ence (中国水稻科学),2009,23(3):245-251.[16]CHURC HI LL GA,DO ER GE R W.E m p i ri cal th resho l d va l ue s f o rquan ti tati ve tra i t m ap p i ng [J ].Gene ti cs ,1994,138:963-971.[17]MCCOUCH SR ,CHO YG,Y ANO M,e t a l .R epo rt on QTL no 2m encl a t u re[J ].R i ce Genet New sl e tter,1997,14:11-13.R es p o n s i b le e d it o r:C H EN Xiu 2ch e n R e s p o n s ib l e t ran s la to r:L I Tin g 2t in g R e s p o n s i b le p ro o f re ad e r:W U Xia o 2ya n(下转第36页)65Ag ri cu l tu ral Sc i ence &Tech no l o gy Vo l .11,No.2,20101。
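The rice study above reports a response index to GA3 for each trait (Table 4), but this excerpt does not state how the index was calculated. The following is a minimal sketch, assuming the index is the percentage increase of the GA3-treated mean over the control mean; the function name `response_index` and that formula are assumptions for illustration, not the authors' stated method. Applied to the Xiushui 79 means in Table 3, it gives values close to, but not identical with, the parental indices in Table 4, which would be expected if the published indices were averaged over individual plants rather than computed from the means.

```python
def response_index(treated: float, control: float) -> float:
    """Percentage change of a trait under GA3 relative to the untreated control.

    Assumed definition: 100 * (treated - control) / control.
    """
    if control <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * (treated - control) / control


# Parental means for Xiushui 79 (GA3 treatment, control), copied from Table 3.
xiushui79 = {
    "flag leaf angle": (17.40, 6.10),
    "plant height": (120.40, 93.30),
    "first internode length": (41.92, 32.34),
    "second internode length": (28.54, 14.10),
}

for trait, (ga3_mean, control_mean) in xiushui79.items():
    print(f"{trait}: {response_index(ga3_mean, control_mean):.1f}%")
```

Under this assumed definition an index near zero means the trait barely responds to GA3, which matches the small values reported for C bao.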

Contributions of roots and rootstocks to sustainable, intensified crop production

© The Author [2013]. Published by Oxford University Press [on behalf of the Society for Experimental Biology]. All rights reserved. For permissions, please email: journals.permissions@

Review paper

Peter J. Gregory 1,3,*, Christopher J. Atkinson 1, A. Glyn Bengough 4,5, Mark A. Else 1, Felicidad Fernández-Fernández 1, Richard J. Harrison 1 and Sonja Schmidt 4,6

1 East Malling Research, New Road, East Malling, Kent ME19 6BJ, UK
2 Natural Resources Institute, University of Greenwich, Central Avenue, Chatham Maritime, ME4 4TB, UK
3 Centre for Food Security, School of Agriculture, Policy and Development, University of Reading, Reading RG6 6AR, UK
4 James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
5 Division of Civil Engineering, University of Dundee, Dundee DD1 4HN, UK
6 The SIMBIOS Centre, University of Abertay, Bell Street, Dundee DD1 1HG, UK
* To whom correspondence should be addressed. E-mail: peter.gregory@

Received 10 August 2012; Revised 11 December 2012; Accepted 18 December 2012

Abstract

Sustainable intensification is seen as the main route for meeting the world's increasing demands for food and fibre. As demands mount for greater efficiency in the use of resources to achieve this goal, so the focus on roots and rootstocks and their role in acquiring water and nutrients, and overcoming pests and pathogens, is increasing. The purpose of this review is to explore some of the ways in which understanding root systems and their interactions with soils could contribute to the development of more sustainable systems of intensive production. Physical interactions with soil particles limit root growth if soils are dense, but root–soil contact is essential for optimal growth and uptake of water and nutrients.
X-ray microtomography demonstrated that maize roots elongated more rapidly with increasing root–soil contact, as long as mechanical impedance was not limiting root elongation, while lupin was less sensitive to changes in root–soil contact. In addition to selecting for root architecture and rhizosphere properties, the growth of many plants in cultivated systems is profoundly affected by selection of an appropriate rootstock. Several mechanisms for scion control by rootstocks have been suggested, but the causal signals are still uncertain and may differ between crop species. Linkage map locations for quantitative trait loci for disease resistance and other traits of interest in rootstock breeding are becoming available. Designing root systems and rootstocks for specific environments is becoming a feasible target.

Key words: Biopores, QTL, resource use, root distribution, rootstock, root–shoot communication, root–soil contact, root systems.

Journal of Experimental Botany, Vol. 64, No. 5, pp. 1209–1222, 2013. doi:10.1093/jxb/ers385. Advance Access publication 1 February, 2013.

Introduction

The increasing demands for food, fibre, and fuel, coupled with global environmental changes, are placing increasing strains on the ability of ecosystems to deliver all of the goods and services that are required (UK Foresight, 2011). Sustainable intensification will require new ways of thinking about plant husbandry and the development of practices that integrate biological and ecological processes into food, forage, and fibre production (Pretty, 2008; Powlson et al., 2011; UK Foresight, 2011).

As demands mount for greater efficiency in the use of water, nutrients, and other resources as major contributors to achieving this sustainable intensification (Pretty, 2008; Powlson et al., 2011), so the focus on roots and their role in acquiring resources is increasing (Gregory, 2006a; Lynch, 2007; Gewin, 2010). There are clearly differences in patterns of growth, architecture, and responses to soil properties between species and within genotypes (O'Toole and Bland, 1987; Gregory, 2006b), and some progress has been achieved in utilizing these differences to practical effect in cropping systems. For example, genotypes of common bean (Phaseolus vulgaris) with shallow root architecture have been shown to grow and yield better in soils of low P status than genotypes with deep architecture (Rubio et al., 2001; Ho et al., 2004; Henry et al., 2010). In soybean, too, the most P-efficient genotypes had longer and larger root systems with a greater proportion of the root system in the topsoil (Ao et al., 2010).

There are also opportunities to make greater use of the modifications that roots make to their immediate environment to aid the acquisition of water and nutrients and fend off pathogens (Ryan et al., 2009; Richardson et al., 2011). The rhizosphere is a complex zone of soil both influenced by and influencing roots, and there is increasing evidence of the changed properties of this zone including modification of rhizosphere pH, and the release of compounds that encourage the proliferation of beneficial microorganisms, improve nutrient availability, and protect against some pathogens (Hinsinger et al., 2009; Ryan et al., 2009; Hiltpold et al., 2010; Hawes et al., 2012). Ryan et al. (2009) detail some current and future targets for rhizosphere engineering including release of nitrification inhibitors to reduce emissions of N2O (Subbarao et al., 2009), exudation of organic anions such as malate and citrate to confer some tolerance to aluminium toxicity (Delhaize et al., 2004; Magalhaes et al., 2007), and release of enzymes such as phosphatases to enhance the availability of soil phosphorus (George et al., 2007; Richardson et al., 2011).
Many plants exude phosphatase enzymes from their roots naturally and this can be associated with depletion of soil organic phosphorus (e.g. George et al., 2002). Achieving greater hydrolysis of such organic P by plants could be beneficial on many soils (Richardson et al., 2011).

Plant roots also have substantial effects on soil physical properties, ranging from localized increases in bulk density resulting from root expansion (Greacen et al., 1968; Braunack and Freebairn, 1988; Young, 1998) to structure formation as a consequence of mucilage production, root hair formation, and localized wetting and drying (McCully, 1999; Hinsinger et al., 2009; Bengough, 2012a). There is substantial potential for traits of the root tip region to be exploited to overcome soil mechanical impedance, soil water stress, and cell wall constraints to expansion (Acuna et al., 2007; Bengough et al., 2011; Leach et al., 2011). Root tip traits beneficial to root penetration include those that decrease cavity expansion pressure (e.g. narrowly pointed root tips favour cylindrical deformation; Greacen et al., 1968), frictional resistance (e.g. the lubrication action of mucilage and border cells; Vollsnes et al., 2010), and axial cell wall tension (e.g. by softening of cell walls in the axial direction). Anchorage of the root tip so that the root can extend into new soil may also be a useful trait and an important physical function of root hairs facilitating the re-entry of a root from a macropore to the bulk soil, or into a compacted layer from a loose seedbed (Bengough et al., 2011).
Managing the physical properties of the rhizosphere to stabilize soils, improve soil structure, and enable plants to access deep soil water are all attainable and desirable possibilities (Whalley et al., 2006; Acuna et al., 2007; Hinsinger et al., 2009).

In addition to selecting for root architecture and rhizosphere properties, the growth of many plants in cultivated systems is profoundly affected by selection of an appropriate rootstock. Many fruit trees, grapevines, and fruits such as peppers, tomatoes, and aubergines are grown with scions grafted onto rootstocks that confer resistance to various pathogens and tolerance to salinity, regulate the size of the scion, and contribute to fruit quality. For example, the Malling rootstocks (M9, M27, etc.) confer resistance to woolly aphid on the scion and produce a range of tree sizes (Hatton, 1935; Preston, 1966). Rootstock selection offers a powerful tool for the sustainable intensification of fruit production because while the scion genotype can be used to select fruit properties, adaptation to water deficit and high salinity, tolerance of alkaline soils, and susceptibility to pathogens [e.g. fireblight (FB) in apple] can all be influenced by the choice of rootstock (Jensen et al., 2012; Marguerit et al., 2012; Tamura, 2012).

The purpose of this review is to explore some of the ways in which understanding root systems and rootstocks and their interactions with soils could contribute to the development of more sustainable systems of intensive production. The three topics examined are: (i) physical contact between the root and soil; (ii) the use of rootstocks and root–shoot communication; and (iii) 'designer' root systems for sustainable intensified production.

Root–soil contact and root elongation

Importance and methods of assessment

Soil physical conditions have large effects on both the ease with which roots can extend through soils and the transfer of water, gases, and nutrients to and from the root.
The mechanisms underlying such root responses are complex, but have been deduced in a series of controlled experiments and field studies (e.g. van Noordwijk et al., 1992). Studies on the effects of root–soil contact using thin sections showed that water and nutrient uptake per unit root length decreased with decreasing root–soil contact (Kooistra et al., 1992; Veen et al., 1992). Kooistra et al. (1992) compacted sieved soil to bulk densities of 1.50, 1.43, and 1.08 Mg m–3 and used photographic prints of thin sections of soil to determine root–soil contact of maize roots. Root–soil contact increased from 60% to 87% with increasing bulk density. Similarly, Veen et al. (1992) grew maize in a sandy loam soil compacted to five bulk densities (1.54, 1.50, 1.43, 1.32, and 1.08 Mg m–3), corresponding to a range of soil porosity from 42.3% to 59.6%, at soil matric potentials between –10 kPa and –20 kPa. While root length decreased as bulk density increased, they found that water and nitrate uptake per unit root length after a growth period of 29 d decreased by 20–60% with decreasing bulk density and decreasing root–soil contact.

However, while porosity per se is important, the size of pores constituting the porosity also affects root growth and activity. Large pores are not good for root growth, with roots preferring a network of narrow pores (e.g. White and Kirkegaard, 2010). For example, Stirzaker et al. (1996) found that barley plants grew better in compacted soil (bulk density 1.78 Mg m–3) with narrow biopores made by lucerne or ryegrass roots than in compacted soil with wider pores made by canola or clover roots, or artificially with a wire of 3.2 mm in diameter. The dry weight of barley shoots grown in soils with narrow biopores was up to 96% of that of plants grown under optimal soil conditions (bulk density 1.37 Mg m–3).
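The bulk densities and porosities quoted above for Veen et al. (1992) are linked by the standard relation: porosity = 1 − (bulk density / particle density). A minimal sketch of that conversion follows. The particle density of 2.67 Mg m–3 is an assumption back-calculated from the two quoted end points (42.3% and 59.6%), not a value reported in the text, and the function name `total_porosity` is illustrative only.

```python
def total_porosity(bulk_density: float, particle_density: float = 2.67) -> float:
    """Total porosity as a fraction, from dry bulk density (both in Mg m^-3).

    The default particle density is an assumed value chosen to reproduce the
    porosity range quoted for Veen et al. (1992); it is not given in the text.
    """
    if not 0 < bulk_density <= particle_density:
        raise ValueError("bulk density must be positive and not exceed particle density")
    return 1.0 - bulk_density / particle_density


# The five bulk densities used by Veen et al. (1992), in Mg m^-3.
for rho_b in (1.54, 1.50, 1.43, 1.32, 1.08):
    print(f"{rho_b:.2f} Mg m^-3 -> {100 * total_porosity(rho_b):.1f}% porosity")
```

With the assumed particle density, the densest and loosest treatments map back to the 42.3% and 59.6% porosities quoted in the text; a different mineral or organic matter content would shift these values.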
Root responses to soil pore size and geometry depend on the way that forces are applied to the individual root tip, with recent evidence suggesting that roots are more sensitive to axial than to radial pressures (Bengough, 2012b; Kolb et al., 2012).

A penetrometer resistance of 2 MPa is typically adopted as an indicator of soil in which mechanical impedance is likely to be a major impediment to root elongation (Taylor and Ratliff, 1969; Bengough et al., 2011). However, a recent study of UK topsoils cultivated for crops has indicated that strength in many soils exceeds 2 MPa even when water is readily available for uptake (Bengough et al., 2011). In a wider range of 59 soils, penetrometer resistance was typically between 1 MPa and 3 MPa despite their moist condition, with root elongation of barley seedlings typically <50% of that in repacked soils (Valentine et al., 2012). In field soils, seedling root elongation rate was most closely related to the volume of pores in the size range 60–300 µm (as estimated from water-release characteristics), which accounted for almost two-thirds of the variation in elongation rates. Two possible explanations were offered for this result: (i) that roots take advantage of the low resistance in larger pores; or (ii) that root elongation is limited by hypoxia (and associated higher CO2 partial pressure), as smaller pores may have been water filled. These findings agree with those of Stirzaker et al. (1996) who found that pores formed by plants can improve growth conditions in hard soils, but large pores are less advantageous than intermediate pores.

The determination of root–soil contact is very difficult because of the opaque nature of soils and the wide range of pore and particle sizes. Thin sections and 3D microtomographs allow visualization of the rhizosphere, but poor contrast between roots and soil makes it difficult to determine root–soil contact (van Noordwijk et al., 1992). Schmidt et al. (2012) developed a non-invasive method to determine root–soil contact from 3D volumetric images with an accuracy of ±3%. Root–soil contact was determined for young maize and lupin seedlings grown in loosely packed soil (<1 Mg m–3) sieved to different aggregate fractions (4–2, 2–1, 1–0.5, and <0.5 mm) and wetted to a matric potential of –0.03 MPa. Root–soil contact decreased with increasing aggregate size (Fig. 1). Such contact appears to be beneficial as long as soil strength or matric potential do not limit root elongation. In maize grown for 4 d after germination in these soil conditions, roots elongated faster with increasing root–soil contact, as long as mechanical impedance was not limiting root elongation (Fig. 2), while lupin was less sensitive to changes in root–soil contact. Closer root–soil contact probably allowed faster uptake of both water and nutrients (Veen et al., 1992). However, under dry conditions (matric potential –1.6 MPa), preliminary experiments showed no significant

Fig. 1. A 3D segmented image of a maize seedling grown in soil aggregates of <0.5 mm diameter (a) and 4–2 mm diameter (b), and the corresponding contact segmented out in 3D for <0.5 mm diameter aggregates (c) and 4–2 mm diameter aggregates (d). e and f are close-up views (2D) of maize roots in contact with soil sieved to <0.5 mm and 4–2 mm, respectively.

long, as is often the case in temperate tree fruit crops. In the case of apple, rootstocks suffer from a number of specific soil-borne diseases such as collar/crown rot caused by Phytophthora cactorum and replant disease, as well as others that affect the scion such as FB (Erwinia amylovora). Woolly apple aphid (WAA) is a pest of the scion and the rootstock but it is most damaging to the latter, in particular in the southern hemisphere.
Thus, some pests and diseases have long been the focus of breeding programmes (see, for example, Crane et al., 1936) and are still the focus of intensive study. Two case studies for apple follow, for which some level of molecular detail is available.

Fireblight resistance (Erwinia amylovora)

Resistance to FB is also desirable in a rootstock, as infection can occur in both scion and rootstock, and the tree can be killed by girdling of the rootstock by the pathogen (Norelli et al., 2003). The most common source of FB resistance has been a cultivar Malus × robusta cv Robusta 5 (henceforth R5), a hybrid of Malus baccata and Malus prunifolia (Norelli et al., 1986). R5 was identified as highly resistant to the predominant FB strain and has been used as a parent in most rootstock breeding programmes including EMR and Geneva (NY). The resistance is of a quantitative nature, and a major associated quantitative trait locus (QTL) has been mapped to linkage group (LG) 3 (Peil et al., 2007) explaining >65% of the variance associated with FB resistance from R5. Inoculation with strains known to differ in their pathogenicity on R5 revealed that there were in fact two QTLs present on LG3, and a further QTL on LG7 (Gardiner et al., 2012). Candidate genes underlying LG3 include a resistance gene of the LRR (leucine-rich repeat) family of receptor-like proteins (RLPs), implicated in resistance in many other species (Gardiner et al., 2012), and a peroxidase gene (MxdPrx8) that is differentially regulated between the susceptible rootstock 'M.26' and the resistant 'G.41'. In the resistant rootstock, this gene is rapidly down-regulated in response to FB infection, while it is oppositely regulated in the susceptible genotype. Class three peroxidases, such as MxdPrx8, are implicated in defence responses in model systems, though it is still unclear exactly what role these genes have in resistance to FB in Malus sp. (Triplett et al., 2009).
As noted previously, R5 is susceptible to minor strains of FB (Norelli et al., 1986) which could become more prevalent as cultivars carrying R5-derived resistance are increasingly abundant. Therefore, breeders have aimed to introduce FB resistance from other sources including the ornamental apple cultivar 'Evereste'. A major QTL, Fb_E, explaining 50–70% of the phenotypic variation in a progeny from a cross between 'M.M.106' and 'Evereste' was mapped to LG12 by Durel et al. (2009). Subsequently, Parravicini et al. (2011) identified nucleotide-binding site (NBS)-LRR and serine/threonine kinase genes in this area as candidate genes for the trait.

Durel et al. (2009) also identified a separate QTL explaining ~40% of the variation of FB resistance derived from 'M. floribunda 821' in the distal part of LG12.

Woolly apple aphid (Eriosoma lanigerum)

The WAA is a major pest of apples, forming galls on roots and branches, generally reducing tree vigour, shoot extension, and yield, and increasing susceptibility to disease (Klimstra and Rock, 1985; Brown et al., 1995). Rootstocks such as 'M.793' (John Innes) and the Malling-Merton series (e.g. M.M.106) were developed to incorporate WAA resistance from 'Northern Spy', while in later rootstocks Malus baccata and Malus sieboldii have proved useful donors of major gene resistance (Crane et al., 1936; Bus et al., 2008). The resistance genes denoted as Er1–Er4 have been mapped to LG7 (Er4; Bus et al., 2010 from 'Mildew Immune Selection'), LG8 (Er1 from 'Northern Spy' and Er3 from M. sieboldii), and LG17 (Er2 derived from M. robusta 5; Bus et al., 2008).
As in the case of FB, WAA resistance is known to have broken down in some areas to all three major gene resistance types; however, pyramiding of markers, coupled with the identification of new resistance from wild Malus species, as well as the pyramiding of minor race resistance genes should prove effective for future resistance breeding (Bus et al., 2008).

Linkage map locations for these and other traits of interest in rootstock breeding are presented schematically in Fig. 4 using the simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) map of Antanaviciute et al. (2012) to estimate the position of genes and QTLs published in various apple populations.

'Designer' root systems for sustainable intensified production

With the global demands for food and fibre increasing, and the realization that this increase will largely be achieved by increasing yields (Godfray et al., 2010; UK Foresight, 2011), the role of roots and rootstocks in accessing resources efficiently and contributing to yield has received increasing prominence (Lynch, 2007; Gewin, 2010). There are many potential targets for such approaches utilizing a wide variety of root traits including basal-root gravitropism (Ho et al., 2004; Lynch, 2007; Ao et al., 2010), the presence of root hairs (Gahoonia et al., 2001; Brown et al., 2012), cortical aerenchyma (Lynch, 2007), and greater branching at depth (Wasson et al., 2012). The choice of rootstock is also achieving greater prominence as horticultural production intensifies and the demand for fruits and vegetables increases.

For rootstocks of fruit trees, current breeding objectives include effective vigour control (most desirable are dwarfing to semi-vigorous, depending on the orchard management system and environmental stresses), optimal fruit size and yield efficiency, good anchorage, resistance to pests and diseases [especially WAA (E. lanigerum), FB (E. amylovora), crown rot (P. cactorum)], and replant disease.
The effects of different rootstocks on marketable yields in a range of fruit (e.g. apple, apricot, peach, grape, tomato, cucumber, and melon) crops are well documented, but it has only recently been recognized that rootstock genotype can alter specific aspects of post-harvest fruit quality of a scion (Goncalves et al., 2006). The matching of rootstocks to scions to deliver fruit of a specified nutritional quality is a likely productive area of future research. Rootstock control of other quality traits (e.g. flavour volatile production, susceptibility to pathogens during storage) has not yet been investigated but could contribute to food security by improving nutrition and reducing waste.

More effective utilization of the mechanisms underlying root–shoot and shoot–root communication also offers opportunities to increase yields and fruit quality. In wheat, the Rht-B1b and Rht-D1b alleles used widely in semi-dwarf genotypes reduce the response to GAs via dominant gain-of-function mutations in DELLA genes. Wojciechowski et al. (2009) demonstrated a direct effect of these dwarfing alleles on root growth during seedling establishment, rather than a secondary partitioning effect. Shortening of internodes, rather than a reduction in the number of nodes per shoot, has been well characterized in cereals (Peng et al., 1999), but the genes regulating precocity and scion growth in dwarfing apple and other crops are not yet known, although identifying them must be a priority if intensified production systems are to be developed. Pilcher et al. (2008) identified the Dw1 locus as a major component of dwarfing in apple, and the emerging linkage maps should allow rapid progress (Antanaviciute et al., 2012).

One aspect of root systems that has been relatively ignored is what happens as crops approach maturity.
Because roots in soil are difficult to study, most screens and experiments are undertaken with seedlings, but the functioning of systems during the filling of reproductive organs is crucial in realizing yield potential, especially as 'terminal drought' is a common feature of many arable regions. In cereal crops, the downward descent of the root system typically ceases at around the time of flowering and start of grain growth (Gregory et al., 1978). However, whether the root system continues to grow in mass and length during grain filling is less certain. In a study with six modern cultivars of wheat grown on a sandy loam, Ford et al. (2006) found that while root mass remained constant between anthesis and maturity, root length increased in both of the two seasons of study (but significantly in only one), suggesting that proliferation of fine roots occurred concurrent with death of thicker, mature roots; overall, they found no evidence for a decline in root mass or length during grain filling. There were significant differences between cultivars in the distribution of roots within the soil profile, with one cultivar, Shamrock, having a significantly larger root system below 40 cm in both seasons. Late-season performance of roots is important for both water and nitrogen uptake because of their contributions to grain yield and grain quality. On deep soils, many studies have indicated the desirability of increasing root length at depth to better capture and use water available in the subsoil (e.g. Richards, 2008; Wasson et al., 2012), and root lengths of ~1 cm root cm–3 soil have been shown by models and experiments to ensure uptake of all the available water at moderate rates of evaporation (van Noordwijk and de Willigen, 1987; Gregory and Brown, 1989; Tardieu et al., 1992).

Fig. 4. Schematic linkage map of apple using SNP and SSR marker data as in Antanaviciute et al. (2012) with the scale in centiMorgans (cM) given on the left. Genes and QTL positions were estimated from linkage information provided by Moriya et al. (2010) for crown gall resistance (Cg), by Rusholme-Pilcher et al. (2008) for dwarfing (Dw1), by Bai et al. (2012) for columnar growth habit (Co), by Bus et al. (2008, 2010) for woolly apple aphid resistance (Er1-4) and, for fireblight resistance, by Peil et al. (2007; Fb_R5), Khan et al. (2007; Fb_F), and Durel et al. (2009; Fb_E and Fb_Mf).

Past study of roots has been bedevilled by a lack of techniques (Gregory, 2006a). However, recent technological improvements in non-invasive techniques, such as X-ray microtomography, have permitted the response of different plant species, genotypes, and individual roots to soil properties to be more readily examined, providing details of root angles and root system spread (Hargreaves et al., 2009), root diameters (Tracy et al., 2012), and root–soil contact (Schmidt et al., 2012). Field and laboratory phenotyping of roots and rootstocks to complement genomic studies are emerging as techniques to speed up the selection of ideotypes that can be a part of intensified production systems (Gregory et al., 2009; Wasson et al., 2012).

Acknowledgements

East Malling Research is supported financially by the East Malling Trust, and the James Hutton Institute receives funding from the Scottish Government.

References

Acuna TLB, Pasuquin E, Wade LJ. 2007. Genotypic differences in root penetration ability of wheat through thin wax layers in contrasting water regimes and in the field. Plant and Soil 301, 135–149.
Aloni B, Cohen R, Karni K, Aktas H, Edelstein M. 2010. Hormonal signalling in rootstock–scion interactions. Scientia Horticulturae 127, 119–126.
Antanaviciute L, Fernández-Fernández F, Banchi E, Evans KM, Velasco R, Dunwell JM, Troggio M, Sargent DJ. 2012. An evaluation of the Malus Infinium whole genome genotyping array in an apple rootstock mapping progeny. BMC Genomics 13, 303.
Ao J, Fu J, Tian J, Yan X, Liao H. 2010. Genetic variability for root morph-architecture traits and root growth dynamics as related to phosphorus efficiency in soybean. Functional Plant Biology 37, 304–312.
Atkinson BS, Sparkes DL, Mooney SJ. 2009. Effect of seedbed cultivation and soil macrostructure on the establishment of winter wheat (Triticum aestivum). Soil and Tillage Research 103, 291–301.
Atkinson CJ, Else MA. 2001. Understanding how rootstocks dwarf fruit trees. Compact Fruit Tree 34, 46–49.
Atkinson CJ, Else MA, Taylor L, Dover CJ. 2003. Root and stem hydraulic conductivity as determinants of growth potential in grafted trees of apple (Malus pumila Mill.). Journal of Experimental Botany 54, 1221–1229.
Bai T, Zhu Y, Fernández-Fernández F, Keulemans J, Brown S, Xu K. 2012. Fine genetic mapping of the Co locus controlling columnar growth habit in apple. Molecular Genetics and Genomics 287, 437–450.
Beakbane AB. 1956. Possible mechanism of rootstock effect. Annals of Applied Biology 44, 517–521.
Beakbane AB, Thompson EC. 1947. Anatomical studies of stem and roots of hardy fruit trees. IV. The root structure of some new clonal apple rootstocks budded with Cox's Orange Pippin. Journal of Pomology and Horticultural Science 23, 203–226.
Bengough AG. 2012a. Water dynamics of the root zone: rhizosphere biophysics and its control on soil hydrology. Vadose Zone Journal 11, vzj2011.0111.
Bengough AG. 2012b. Root elongation is restricted by axial but not by radial pressures: so what happens in field soil? Plant and Soil 360, 15–18.
Bengough AG, McKenzie BM, Hallett PD, Valentine TA. 2011. Root elongation, water stress, and mechanical impedance: a review of limiting stresses and beneficial root tip traits. Journal of Experimental Botany 62, 59–68.
Benschop JJ, Jackson MB, Guhl K, Vreeburg RAM, Croker SJ, Peeters AJM, Voesenek LACJ. 2005. Contrasting interactions between ethylene and abscisic acid in Rumex species differing in submergence tolerance.
The Plant Journal44, 756–768.Braunack MV, Freebairn DM. 1988. The effect of bulk density on root growth. Proceedings of the 11th International Conference of the International Soil Tillage Research Organisation. Edinburgh, 25–30. Brown JK, George TS, Thompson JA, Wright G, Lyon J. Dupuy L. Hubbard SF, White PJ. 2012. What are the implications of variation in root hair length on tolerance to phosphorus deficiency in combination with water stress in barley (Hordeum vulgare)? Annals of Botany110, 319–328.Brown MW, Schmitt JJ, Ranger S, Hogmire HW. 1995. Yield reduction in apple by edaphic woolly apple aphid (Homoptera: Aphididae) populations. Journal of Economic Entomology88, 127–133. Bukovac MJ, Wittwer SH, Tukey HB. 1958. Effect of stock–scion interrelationships on the transport of P32 and Ca45 in the apple. Journal of Horticultural Science33, 145–152.Bulley SM, Wilson FM, Hedden P, Phillips AL, Croker SJ, James DJ. 2005. Modification of gibberellin biosynthesis in the grafted apple scion allows control of tree height independent of the rootstock. Plant Biotechnology Journal3, 215–223.Bus VGM, Bassett HCM, Bowatte D, Chagné D, Ranatunga CA, Ulluwishewa D, Wiedow C, Gardiner SE. 2010. Genome mapping of an apple scab, a powdery mildew and a woolly apple aphid resistance gene from open-pollinated Mildew Immune Selection. Tree Genetics and Genomes6, 477–487.Bus VGM, Chagné D, Bassett HCM, et al.2008. Genome mapping of three major resistance genes to woolly apple aphid (Eriosoma lanigerum Hausm.). Tree Genetics and Genomes4, 223–236. Carminati A, Vetterlein D, Weller, U, Vogel HJ, Oswald SE. 2009. When roots lose contact. Vadose Zone Journal8, 805–809.Cohen S, Naor A. 2002. The effect of three rootstocks on water use, canopy conductance and hydraulic parameters of apple tees and predicting canopy from hydraulic conductance. 
Plant, Cell and Environment 25, 17–28.
Claverie M, Dirlewanger E, Cosson P, et al. 2004. High-resolution mapping and chromosome landing at the root-knot nematode resistance locus Ma from Myrobalan plum using a large-insert BAC DNA library. Theoretical and Applied Genetics 109, 1318–1327.

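Establishment of an Optimized RAPD Molecular Marker System for Maize Germplasm Resources

Abstract: Maize is one of China's important crops and occupies an important position in the country's agricultural production.

Jilin Province has favorable conditions for maize production, a large planted area, and rich varietal diversity. To make variety management efficient and scientific, this experiment used maize as the material for a RAPD study.

An efficient and stable RAPD experimental system for maize was established and optimized, providing a theoretical basis and experimental support for subsequent maize variety management.

Keywords: maize; RAPD; system
CLC number: S513. Document code: A. Article ID: 1674-0432(2012)-10-0050-1

RAPD (random amplified polymorphic DNA), also called AP-PCR (arbitrary primer PCR), is a molecular marker technique invented by Williams, Welsh, and colleagues. It is a molecular biology technique based on PCR [1, 2].

It uses genomic DNA as the template and usually a single random 10-mer oligonucleotide as the primer. PCR amplification yields many discrete DNA products, which are used to detect polymorphism in the DNA sequence.

It can be used to study the genomic fingerprint of a species whose genetic background has not been studied at the molecular level, and also to detect somatic hybrids and to type microorganisms [3].

Maize is one of the oldest cultivated crops and currently ranks second among the crops cultivated in China.

Jilin Province is a major maize-producing province: its maize planting area ranks among the top five nationwide, and the number of new varieties it approves each year is the highest in the country.

Because methods for identifying maize varieties are still imperfect, variety management is somewhat difficult.

This paper therefore studies a RAPD molecular marker system for maize.

1 Materials and Methods

1.1 Test materials

Seeds of commercially available, widely promoted maize varieties were purchased.

1.2 Reagents and instruments

Taq DNA polymerase, 10× PCR buffer (containing Mg2+), 10-base random primers, dNTPs, agarose, and a DNA molecular weight standard (D2000 marker) were purchased from Sangon (Shanghai).

1.3 RAPD method

1.3.1 DNA extraction

Young shoots germinated from maize seeds were used as the material, and DNA was extracted with a plant genomic DNA kit.

The DNA was redissolved in sterile water and stored at -20 °C until use. Its quality was checked by 1% agarose gel electrophoresis.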

Characteristic Extraction of Partial Discharge Signals

Abstract

In practical measurement of partial discharge magnitude, accuracy is often affected by external interference. Correctly distinguishing partial discharge pulses from interference pulses is therefore an important step. How to capture complete information on partial discharge inside equipment for insulation diagnosis has also long been a research direction for many scholars and field test personnel.

This paper introduces a graphic analysis method that correctly distinguishes partial discharge pulses from interference pulses, measures the partial discharge magnitude accurately, and can analyze the various information recorded while partial discharge occurs.

Chapter 1 outlines basic knowledge: the origin of partial discharge, its hazards, general test methods, and new developments in test technology.

Chapters 2, 3, and 4 describe the new partial discharge test method in terms of the principle of graphic analysis, its implementation, and its field application.

Finally, the paper is summarized and future work is outlined.

Keywords: partial discharge; graphic analysis; application

Contents

Abstract (Chinese)
Abstract (English)
1 Introduction
1.1 Background
1.1.1 Definition and causes of partial discharge
1.1.2 Hazards of partial discharge
1.2 Partial discharge test methods
1.2.1 Non-electrical methods
1.2.2 Electrical methods
1.3 New developments in partial discharge test technology
1.3.1 Fourier transform
1.3.2 Adaptive filtering
1.3.3 Dedicated filters
1.3.4 Wavelet transform
2 Measurement of partial discharge
2.1 Partial discharge under power-frequency voltage
2.2 Partial discharge parameters
3 The graphic analysis method and its implementation
3.1 Graphic analysis method for partial discharge testing
3.2 Hardware implementation of the graphic analysis method
3.3 Software implementation of the graphic analysis method
4 Applications of the graphic analysis method
4.1 Graphic analysis of partial discharge pulses
4.2 Graphic analysis of interference in partial discharge measurement
4.3 Graphic analysis in on-site partial discharge measurement
4.3.1 Corona patterns in partial discharge measurement
4.3.2 Analysis of partial discharge patterns
4.3.3 Extended application of graphic analysis to insulation assessment
4.3.4 Open problems in applying graphic analysis
Conclusion
References
Acknowledgements

1 Introduction

1.1 Background

On-line monitoring of power equipment is a frontier topic of great practical significance and broad application prospects, and it plays a major role in improving the safety and operating level of power systems.


Feature-Based Tagger of Approximations of Functional Arabic Morphology

Jan Hajič & Otakar Smrž
Inst. of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
{hajic,smrz}@ufal.mff.cuni.cz

Tim Buckwalter & Hubert Jin
Linguistic Data Consortium, University of Pennsylvania
timbuck2@ hubertj@

1 Introduction

The field of morphological disambiguation of Arabic has recently witnessed significant achievements (Habash and Rambow [15], Smith et al. [28]). Through them, the Penn Arabic Treebank (PATB, Maamouri et al. [24]) is being confirmed as a standard for development and evaluation of systems for automatic morphological processing of Arabic, and the Buckwalter Arabic Morphological Analyzer (Buckwalter [6,7]) is becoming the most respected lexical resource of its kind.

The context for understanding the current paper has evolved since our work on it started, yet the motivation for it is unchanged and the conclusions are valid and up-to-date. We would like to open some issues concerning the very description of Arabic morphology and point out that in this domain, one should carefully distinguish individual problems, theories, resources, and solutions for their frequent idiosyncrasies and incompatibilities.

In this contribution, we reference Functional Arabic Morphology (Smrž [29]) and take the Buckwalter Morphology as the departure point for approximating this novel model by (a) restoring the true syntactic units (b) seeking their functional, rather than structural, morphological categories. We then present five versions of a feature-based morphological tagger depending on that approximation, which were built on all the currently available Parts of PATB, as well as on the MorphoTrees annotations of the Prague Arabic Dependency Treebank (PADT, Hajič et al. [18]).

1.1 The Disambiguation Problem

Arabic is a language of rich morphology, both derivational and inflectional (Holes [20]). Due to the fact that the Arabic script usually does not encode short vowels and omits some other important
phonological distinctions,the degree of morpho-logical ambiguity is very high.In addition,Arabic orthography prescribes to concatenate certain word forms with the preceding or the following ones,which makes the boundaries of syntactic units,i.e.tokens as we denote them,obscure.Unlike in Chinese or German,how-ever,in Arabic there are clear limits to the number and the kind of tokens that can combine in this manner.What appears to be one orthographical string in a Modern Standard Arabic text,can actually constitute from up to four syntactic tokens.1 In Latin script-based languages,one usually assumes that words,i.e.the input strings,can be processed into tokens easily and uniquely by an independent tok-enizer module that runs before a morphological tagger.For Arabic,this is not pos-sible—one input string can be analyzed in such ways that not only the morphemes, but even the syntactic tokens may vary for individual readings of the string.Thus,the problem of disambiguation of this language encompasses subprob-lems like tokenization,full morphological tagging or its simpliﬁed‘part-of-speech’versions,lemmatization,diacritization(discussed in Nelken and Shieber[25])or restoration of the structural components of words.These subproblems,of course, can come in many variants and combinations.1.2Existing Morphological SystemsThe long evolution of computational modeling of Arabic morphology is nowa-days mirrored in the excellent works of(Kiraz[23])and(Beesley and Karttunen [4]),and although many morphological systems are in development(Ramsay and Mansur[27]or Soudi et al.[32],inter alia),only(Beesley[3])and(Buckwalter [6,7])are actually accessible to the interested public,meeting the prerequisite to their wider application and evaluation.It appears from the literature and implementations(many summarized in Al-Sughaiyer and Al-Kharashi[1])that Arabic computational morphology has under-stood its purpose in the sense of operations with morphs rather than morphemes (cf.El-Sadany 
and Hashish [12]; see also Sproat [33] or Stump [34]), and has not concerned itself systematically and to the necessary extent with its role for syntax.

The outline of formal grammar in (Ditters [11]), for instance, works with grammatical categories like number, gender, humanness, definiteness, but one cannot see which of the existing systems could provide for this information correctly, as they misinterpret some morphs for bearing a category, and underdetermine lexical morphemes in general. Certain syntactic parsers, like (Othman et al. [26]), may resort to their own morphological analyzers, but still, they do not get rid of the form of an expression and only incidentally introduce truly functional categories (cf. Hajič et al. [19]). In syntactic considerations they often call for discriminative semantic features. Commercial systems, esp. (Chalabi [9]), do not seem to overcome this interference either.

The missing common rationale as to what higher linguistic framework the morphology should serve for crystalizes in the number of individual, ad hoc tagsets and a very rare discussion of their motivation, completeness, relevance and actual expressive power. This situation brought us to designing Functional Arabic Morphology (footnote 2). In (Hajič et al. [19], Smrž and Hajič [30]), we discuss its principles and show how e.g. agreement can naturally be controlled and restored in the functional system, which is impossible if recognizing morphs only and not their functions.

Footnote 1: In theory (Fischer [13]), an additional personal suffix might increase this number to five, which is extremely unlikely to occur in the standard language, and is unattested in the resources we study.

Footnote 2: Functional Arabic Morphology (Smrž [29]) is being implemented in the Functional Morphology (Forsberg and Ranta [14]), which is a methodology as well as a domain-specific programming language embedded in Haskell building on the computational toolkit Zen for Sanskrit (Huet [21,22]).

Token Tag    Buckwalter's Morph Tags              Token Form       Token Gloss
F---------   FUT                                  sa-              will
VIIA-3MS--   IV3MS+IV+IVSUFF_MOOD:I               yu-ḫbir-u        he-notify
S----3MP4-   IVSUFF_DO:3MS                        -hum             them
P---------   PREP                                 bi-              about/by
SD----MS--   DEM_PRON_MS                          ḏālika           that
P---------   PREP                                 ʿan              by/about
N-------2R   NOUN+CASE_DEF_GEN                    ṭarīq-i          way-of
N-------2D   DET+NOUN+CASE_DEF_GEN                ar-rasāʾil-i     the-messages
A-----FS2D   DET+ADJ+NSUFF_FEM_SG+CASE_DEF_GEN    al-qaṣīr-at-i    the-short
C---------   CONJ                                 wa-              and
Z-------2D   DET+NOUN_PROP+CASE_DEF_GEN           al-ʾinternet-i   the-internet
C---------   CONJ                                 wa-              and
FN------2R   NEG_PART+CASE_DEF_GEN                ġayr-i           other/not-of
S----3FS2-   POSS_PRON_3FS                        -hā              them

Figure 1: Tokenization of input strings into tokens in the sentence glossed "he will notify them about that through SMS messages, the Internet, and other means", and the disambiguated morphological analyses with Buckwalter's tags and the quasi-functional token tags.

Morphs           Form       Token Tag    Lemma    Glosses per Morph
|laY+(null)      ālā        VP-A-3MS--   ālā      promise/take an oath + he/it
|liy~            ālīy       A---------   ālīy     mechanical/automatic
|liy~+u          ālīy-u     A-------1R   ālīy     mechanical... + [def.nom.]
|liy~+i          ālīy-i     A-------2R   ālīy     mechanical... + [def.gen.]
|liy~+a          ālīy-a     A-------4R   ālīy     mechanical... + [def.acc.]
|liy~+N          ālīy-un    A-------1I   ālīy     mechanical... + [indef.nom.]
|liy~+K          ālīy-in    A-------2I   ālīy     mechanical... + [indef.gen.]
|l +             āl         N--------R   āl       family/clan +
  +iy            -ī         S----1-S2-   ī        + my
IilaY            ilā        P---------   ilā      to/towards
Iilay +          ilay       P---------   ilā      to/towards +
  +ya            -ya        S----1-S2-   ya       + me
Oa+liy+(null)    a-lī       VIIA-1-S--   waliya   I + follow/come after + [ind.]
Oa+liy+a         a-liy-a    VISA-1-S--   waliya   I + follow/come after + [sub.]

Figure 2: Analyses of the string AlY turned into the MorphoTrees hierarchy (tree diagram not reproduced).

1.3 Approximating the Functional Model

The underlying morphological engine for both the Penn Arabic Treebank and the Prague Arabic Dependency Treebank is the Buckwalter Arabic Morphological Analyzer (Buckwalter [6,7]). While PATB adopts the analyses in their original format (Maamouri et al. [24]), the PADT
annotations take place on quasi-functional approximations organized into MorphoTrees (Smrž and Pajas [31]).

With respect to the linguistic view and the architecture of the tagger that we will develop, we unify the format of the morphological data by converting all the Parts of PATB into the approximation, which is done in two steps: (a) the morphs of the original input strings are re-grouped to form tokens (b) the corresponding sequences of tags are mapped into the fixed-width positional notation of PADT. Let us illustrate the transformations through Figure 1 and Figure 2, and refer to (Smrž and Pajas [31], Smrž and Hajič [30]) for any other details.

2 The Feature-Based Tagger

The refined morphological information that we have sought in our studies requires full morphological tagging to be established. The Arabic tagger that we present is an adaptation of the feature-based, exponential-model tagger described in (Hajič and Hladká [17]), taking the advantage of the positional tag system by predicting the individual columns/categories separately (but not irrespective of the other ones in context). It has been used on several inflectional and agglutinative languages.

Recall the problem of tokenization being part of morphological analysis in Arabic. In order to keep the tagger's functionality unmodified, we extend the tagset in the way that all input strings will be considered 4-tuples of tokens, with the resulting string tag being a concatenation of the token tags. By disambiguating such aggregate tags, the tagger decides the tokenization as well, since the tokens can be deterministically derived from the tag and the list of associated lemmas (footnote 3).

2.1 Feature-Based Tagging

Instead of employing the source–channel paradigm for tagging, we are using here a conditional approach to modeling, for which we have chosen an exponential probabilistic model. Such model (when predicting an event (footnote 4) y ∈ Y in a context x) has the general form

\[ p_{AC,e}(y \mid x) = \frac{\exp\left(\sum_{i=1}^{n} \lambda_i f_i(y,x)\right)}{Z(x)} \tag{1} \]

where f_i(y,x) is the set (of size n) of binary-valued (yes/no) features of the event value being predicted and its context, λ_i is a "weight" (in the exponential sense) of the feature f_i, and the normalization factor Z(x) is defined naturally as

\[ Z(x) = \sum_{y \in Y} \exp\left(\sum_{i=1}^{n} \lambda_i f_i(y,x)\right) \tag{2} \]

We use a separate model for each ambiguity class AC (that actually appeared in the training data) of each of the 4×10 morphological categories. The final distribution p_AC(y|x) is further smoothed using unigram distributions on subtags (again, separately for each category):

\[ p_{AC}(y \mid x) = \sigma\, p_{AC,e}(y \mid x) + (1-\sigma)\, p_{AC,1}(y) \tag{3} \]

Such smoothing takes care of any unseen context; for ambiguity classes not seen in the training data, for which there is no model, we use unigram probabilities of subtags, one distribution per category.

Footnote 3: Lemma disambiguation is a separate process following tagging, and is not covered here.

Footnote 4: In our case, a subtag, i.e. a unique value of a morphological category.

In the general case, features can operate on any imaginable context. In practice, we view the context as a set of attribute–value pairs with a discrete range of values. Every feature can thus be represented by a set of contexts in which it is positive.
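As an illustration only, equations (1) to (3) can be sketched in Python. This is a minimal sketch and not the authors' implementation: the function names, the single toy feature, its weight, the uniform unigram distribution, and the value of σ are all assumptions made for the example. The subtag values "1", "2", "4" are taken from the case position of the token tags shown in the figures.

```python
import math


def exponential_model(candidates, context, features, weights):
    """Equations (1) and (2): p_e(y | x) over the candidate subtags."""
    scores = {
        y: math.exp(sum(w * f(y, context) for f, w in zip(features, weights)))
        for y in candidates
    }
    z = sum(scores.values())  # normalization factor Z(x)
    return {y: s / z for y, s in scores.items()}


def smoothed_model(candidates, context, features, weights, unigram, sigma):
    """Equation (3): interpolate p_e with a unigram distribution on subtags."""
    p_e = exponential_model(candidates, context, features, weights)
    return {y: sigma * p_e[y] + (1.0 - sigma) * unigram[y] for y in candidates}


# Hypothetical binary feature: case subtag "2" right after a preposition.
def case2_after_preposition(y, context):
    return 1 if y == "2" and context.get("prev_pos") == "P" else 0


CASES = ["1", "2", "4"]                    # candidate subtags (one ambiguity class)
FEATURES = [case2_after_preposition]       # assumed feature set
WEIGHTS = [math.log(4.0)]                  # assumed weight lambda_1
UNIGRAM = {y: 1.0 / 3.0 for y in CASES}    # assumed unigram distribution

if __name__ == "__main__":
    print(smoothed_model(CASES, {"prev_pos": "P"}, FEATURES, WEIGHTS, UNIGRAM, sigma=0.9))
```

With the assumed weight, the unsmoothed model assigns 4/6 to subtag "2" after a preposition; smoothing pulls every value slightly toward the unigram distribution, so a subtag never gets zero probability in an unseen context.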
There is,of course,also a distinguished attribute for the value of the variable being predicted(y);the rest of the attributes is denoted by x as expected.Values of attributes are denoted by an overstrike(y,x).The pool of contexts of prospective features is for the purpose of morphological tagging deﬁned as a full cross-product of the category being predicted(y)and of the x speciﬁed as a combination of[A]an ambiguity class of a single category,which may be differentfrom the category being predicted,or[B]a word form(the inputstring),or[C]a single position value membership in an ambiguityclass,or[D]a full tag(to the left of the current position only),and[E]the current position,or[F]immediately preceding/following po-sition in text,or[G]position±2strings apart,or[H]closest pre-ceding/following position(up to four positions away)having a certainambiguity class in the POS category.The full cross-product of these contexts is prohibitively large,but there are means to limit the size of the pool of features toﬁt to available memory.For Arabic,we have used a limit of7million feature contexts in the pool.Feature weights can be computed only iteratively,but it is impossible to do so in reasonable time while selecting the features at the same time,even when using certain shortcuts(Berger et al.[5]).Therefore,the initial feature weight of a feature which is true in context x for a tag y is estimated as the log of the conditional probability p(y|x),estimated by MLE from the training data.This makes the model essentially a form of a Naive Bayes one.The learner is allowed to vary the weights(in several discrete steps)during fea-ture selection,a(somewhat crude)attempt to depart from the original Naive Bayes simpliﬁcation to the approximation of the“correct”Maximum Entropy estimation.2.2Training IterationsGiven the huge number of possible features,the training proceeds in four“itera-tions”,each adding more complex features,but only from those training events that are in error after 
the previous iteration.

The training "iterations" are not really part of the algorithm; they only allow to try and possibly keep more detailed features in later iterations when most simple features are not generated any more since there are much fewer errors remaining. Experimentally, we found that adding more than four such iterations does not improve the results. The type and nature of the features allowed for the i-th iteration has been tried experimentally and heuristically. Typically, only the simplest features (such as those having an input string as the only "context" and nothing more) are used in the first iteration.

A single training iteration first generates a feature pool of a predetermined size and of the requested type (and complexity), and then proceeds in selecting features (and estimating their weights) in a greedy way (minimizing the training data error rate directly as the objective function) as described in (Hajič [16]).

                        #Train    #Test Data         #String Tags     #Token Tags
Experiment              Strings   Strings   Tokens   Total    Anno.   Total   Anno.
PATB Part 2 Prototype   122556    19683     23074    2031     852     317     242
PATB Part 3             320998    19283     22690    2864     1251    391     314
PADT MorphoTrees        106887    19253     22547    3164     927     378     265
PATB Part 1             120045    19339     22131    884      534     165     143
PATB Part 1 Revised     125392    19363     22104    2226     785     401     271

Table 1: Characteristics of the sets of data (note all tags vs. annotated only).

3 Experiments and Evaluation

Table 1 overviews the five experiments described below, the parameters of which severely differ. The level of detail of morphological annotation (cf. sizes of tagsets) and the ambiguity within analyses considerably increase with the progress in time.

The resulting models were tested using the simplest tagging mode (Viterbi beam width 1, effectively canceling the Viterbi search (footnote 5), and the independence assumption about the categories, i.e., simply multiplying the probabilities of the 40 category values and normalizing by the available tags listed as analyses).

Table 2 delivers the taggers' performance in terms of accuracy for full morphological tagging (40 positions per string, 10 positions per token), part-of-speech tagging (assigning only the first position, one of 15 values, in each token tag), and lemmatization (choosing one lemma for every token in the string). Two variants of tokenization are evaluated using F_{β=1} (see further the Discussion).

Footnote 5: Please note that this is not a decisive factor for the resulting accuracy, as opposed to e.g. HMM-based tagging, since the tagger looks right for morphological ambiguity classes; see above the description of the feature pool available to the tagger at training time.

                        Per String   Per Token
Experiment              Full(40)     Full(10)   POS     Lemma   Tknz++   Tknz
PATB Part 2 Prototype   87.88        89.31      96.46   92.33   99.31    99.51
PATB Part 3             86.82        88.17      95.25   89.91   97.52    98.60
PADT MorphoTrees        87.73        89.24      96.02   90.64   97.71    99.25
PATB Part 1             96.85        96.99      97.37   92.75   97.47    99.37
PATB Part 1 Revised     88.13        89.16      95.57   90.27   97.13    98.86

Table 2: Performance evaluation for the individual experiments (in percents).

Penn Arabic Treebank Part 2, Version 1. The pre-release of this dataset (identified as LDC2003E17) served for developing the prototypes of both the tagger and the mapping between Buckwalter's sequences of tags and the quasi-functional positional token tags. The analyses do not seem to overgenerate for orthographical variation (Buckwalter [8]) too much yet (note the similar Tknz++ and Tknz).

Habash and Rambow [15] report 96.5% accuracy in POS tagging for the comparable dataset and tagset, counting, unlike us, only the well-tokenized data.

Penn Arabic Treebank Part 3, Version 1. This dataset (LDC2004T11) brings the advanced features of Buckwalter's morphology, among which are complete vocalization (with case and mood endings), extended lexicon, and finer tags for verbs and particles. Therefore, the mapping into the approximation also improved, and the complexity of the tagset largely increased compared to that of the prototype.
Prague Arabic Dependency Treebank1.0Due to the nature of MorphoTrees (LDC2004T23),where long-dependency relations between tokens may be weak-ened and some values in tags expanded for the sake of more precise annotations, certain token combinations may be listed in the format for the tagger that the ana-lyzer would not produce(note the highest number of non-annotated string tags).MorphoTrees are the‘purest’available approximation of the Functional Arabic Morphology.Given the detail and the complexity of the data,the tagger’s perfor-mance is remarkable.Just like with PATB Part3,no other computational results relevant to this dataset are known to us.Penn Arabic Treebank Part1,Version2The annotations in this dataset(LDC-2003T06)are morphologically most‘impoverished’,but have been used by other researchers(Diab et al.[10],Habash and Rambow[15])to train very successful taggers based on support vector machines.Habash and Rambow[15]reach98.1% of POS accuracy and96.2%of accuracy in full token tagging,which are results well comparable to ours.|lY|ly IlY Oly Al Y|l yAlYεεIly yFigure3:Discussion of partitioning and tokenization of input strings. 
Penn Arabic Treebank Part 1, Version 3. This is the revised version (LDC2005T02) of the previously mentioned corpus, with a complete coverage of the lexicon and including the advanced features of the morphology.

It is actually this dataset and not its former version (Smith et al., p.c.) that Smith et al. [28] used for the development of their log-linear source–channel tagging model. It works with strings and morphs only, but achieves overwhelming results (accuracy): 96.1% in full string tag disambiguation, 95.4% in the restoration of morphs, and 94.6% in assigning one representative lemma per input string.

4 Discussion and Conclusions

We have introduced two measures for tokenization. Tknz is close to the evaluations in (Habash and Rambow [15], Diab et al. [10]) which only check the partitioning determined by finding token boundaries between the characters of the original string, and do not, unlike Tknz++, require the tokenization to faithfully reconstruct the canonical non-vocalized forms of tokens, as is the standard in MorphoTrees (Smrž and Pajas [31], Smrž and Hajič [30]).

The disparity of these tokenizations is illustrated in Figure 3. The graph on the left depicts the three 'sensible' ways of partitioning the input string AlY in the approach of (Diab et al. [10]), where characters are classified to be token-initial or not. Two tokenizations are obtained by linking the boundaries from 0 to 3 following the solid edges. The third partitioning AlY εε implies there is another fictitious boundary and some 'empty word' εε at the end of the string, which corresponds to taking the dashed edge in the graph.

Even though conceptually sound, this kind of partitioning cannot undo the effects of orthographical variation (Buckwalter [8]), nor express other useful distinctions. The hierarchy in Figure 3 relates this tokenization to that of Figure 2.
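The tokenization scores in this section are defined through the longest common subsequence (LCS) of a proposed and a referential token sequence: the LCS length divided by the length of one sequence gives recall, by the other precision, and their harmonic mean is F_{β=1}. The following Python sketch shows one way to compute it. The function names and the sample token lists are hypothetical, and which token forms are compared (surface partitions or canonical forms) is left to the caller.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def tokenization_f1(predicted, reference):
    """F-measure (beta = 1) of a predicted tokenization against a reference."""
    if not predicted or not reference:
        return 0.0
    common = lcs_length(predicted, reference)
    if common == 0:
        return 0.0
    precision = common / len(predicted)   # share of predicted tokens that match
    recall = common / len(reference)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = ["|l", "y"]               # illustrative canonical token forms
    print(tokenization_f1(["|l", "y"], reference))
    print(tokenization_f1(["Al", "Y"], reference))
```

Comparing canonical forms, as in the second call, gives no credit to a partition whose boundaries are right but whose token forms differ from the reference, which is the kind of distinction that separates a Tknz++-style score from a boundary-only one.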
Habash and Rambow[15,section7]correctly point out that“[t]here is not a single possible or obvious tokenization scheme:a tokenization scheme is an analytical tool devised by the researcher.”Nonetheless,different tokenizations capture dif-ferent information,and some may be linguistically not as appropriate as others (cf.Bar-Haim et al.[2]for the inﬂuence of tokenization on tagging in Hebrew).In any case,we evaluate tokenizations in terms of the Longest Common Sub-sequence(LCS)problem.The tokens that are the members of the LCS with some referential tokenization,are considered correctly recognized.Dividing the length of the LCS by the length of one of the sequences,we get recall,doing it for the other of the sequences,we get precision.The harmonic mean of both is Fβ=1.We have presentedﬁve versions of the feature-based tagger of Arabic,devel-oped gradually on all the data of the Penn Arabic Treebank and the Prague Arabic Dependency ing the experience with other inﬂectional languages,we prefer the functional treatment of the morphology of Arabic,which we now only approximate.The pure description with respect to syntactic tokens and their rele-vant,functional grammatical categories is being further pursued and implemented.The results of our tagger rank competitively high in theﬁeld(cf.Habash and Rambow[15]).Full morphological tagging is expected to improve with the in-creasing‘functionality’of the data.Note that applying the conditionally-estimated context-based models set forth in(Smith et al.[28])to such data is certainly possi-ble and promising,too.Lemmatization and the issue of unknown words have only received little attention in our tagger,and can be well improved.This research was supported by the Ministry of Education of the Czech Re-public,project MSM0021620838,by the Grant Agency of Charles University in Prague,project207-10/203333,and through the Fulbright-Masaryk Fellowship of the Fulbright Commission in the Czech Republic.References[1]Imad A.Al-Sughaiyer and 
Ibrahim A. Al-Kharashi. Arabic Morphological Analysis Techniques: A Comprehensive Survey. Journal of the American Society for Information Science and Technology, 55(3):189–213, 2004.
[2] Roy Bar-Haim, Khalil Sima’an, and Yoad Winter. Choosing an Optimal Architecture for Segmentation and POS-Tagging of Modern Hebrew. In Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages, pages 39–46, Ann Arbor, 2005.
[3] Kenneth R. Beesley. Finite-State Morphological Analysis and Generation of Arabic at Xerox Research: Status and Plans in 2001. In EACL 2001 Workshop Proceedings on Arabic Language Processing: Status and Prospects, pages 1–8, Toulouse, France, 2001.
[4] Kenneth R. Beesley and Lauri Karttunen. Finite State Morphology. CSLI Studies in Computational Linguistics. CSLI Publications, Stanford, California, 2003.
[5] Adam L. Berger, Stephen Della Pietra, and Vincent J. Della Pietra. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1):39–71, 1996.
[6] Tim Buckwalter. Buckwalter Arabic Morphological Analyzer Version 1.0. LDC catalog number LDC2002L49, ISBN 1-58563-257-0, 2002.
[7] Tim Buckwalter. Buckwalter Arabic Morphological Analyzer Version 2.0. LDC catalog number LDC2004L02, ISBN 1-58563-324-0, 2004.
[8] Tim Buckwalter. Issues in Arabic Orthography and Morphology Analysis. In Proceedings of the COLING 2004 Workshop on Computational Approaches to Arabic Script-based Languages, pages 31–34, 2004.
[9] Achraf Chalabi. Sakhr Arabic Lexicon. In NEMLAR International Conference on Arabic Language Resources and Tools, pages 21–24. ELDA, 2004.
[10] Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks. In HLT-NAACL 2004: Short Papers, pages 149–152, 2004.
[11] Everhard Ditters. A Formal Grammar for the Description of Sentence Structure in Modern Standard Arabic. In EACL 2001 Workshop Proceedings on Arabic Language Processing: Status and Prospects, pages 31–37, Toulouse, France, 2001.
[12] Tarek A. El-Sadany and Mohamed A. Hashish. An Arabic morphological system. IBM Systems Journal, 28(4):600–612, 1989.
[13] Wolfdietrich Fischer. A Grammar of Classical Arabic. Yale Language Series. Yale University Press, third revised edition, 2001. Translated by Jonathan Rodgers.
[14] Markus Forsberg and Aarne Ranta. Functional Morphology. In Proceedings of ICFP 2004, pages 213–223. ACM Press, 2004.
[15] Nizar Habash and Owen Rambow. Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics ACL 2005, pages 573–580, Ann Arbor, 2005.
[16] Jan Hajič. Morphological Tagging: Data vs. Dictionaries. In Proceedings of NAACL-ANLP 2000, pages 94–101, Seattle, 2000. ACL.
[17] Jan Hajič and Barbora Hladká. Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset. In Proceedings of COLING-ACL 1998, pages 483–490, Montreal, Canada, 1998. ACL.
[18] Jan Hajič, Otakar Smrž, Petr Zemánek, Petr Pajas, Jan Šnaidauf, Emanuel Beška, Jakub Kráčmar, and Kamila Hassanová. Prague Arabic Dependency Treebank 1.0. LDC catalog number LDC2004T23, ISBN 1-58563-319-4, 2004.
[19] Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, and Emanuel Beška. Prague Arabic Dependency Treebank: Development in Data and Tools. In NEMLAR International Conference on Arabic Language Resources and Tools, pages 110–117. ELDA, 2004.
[20] Clive Holes. Modern Arabic: Structures, Functions, and Varieties. Georgetown Classics in Arabic Language and Linguistics. Georgetown University Press, 2004.
[21] Gérard Huet. The Zen Computational Linguistics Toolkit. ESSLLI Course Notes, FoLLI, the Association of Logic, Language and Information, 2002.
[22] Gérard Huet. A Functional Toolkit for Morphological and Phonological Processing, Application to a Sanskrit Tagger. Journal of Functional Programming, 2004.
[23] George Anton Kiraz. Computational Nonlinear Morphology with Emphasis on Semitic Languages. Studies in Natural Language Processing. Cambridge University Press, 2001.
[24] Mohamed Maamouri, Ann Bies, Tim Buckwalter, and Wigdan Mekki. The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus. In NEMLAR International Conference on Arabic Language Resources and Tools, pages 102–109. ELDA, 2004.
[25] Rani Nelken and Stuart M. Shieber. Arabic Diacritization Using Finite-State Transducers. In Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages, pages 79–86, Ann Arbor, 2005.
[26] Eman Othman, Khaled Shaalan, and Ahmed Rafea. A Chart Parser for Analyzing Modern Standard Arabic Sentence. In Proceedings of the MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches, pages 37–44, 2003.
[27] Allan Ramsay and Hanady Mansur. Arabic morphology: a categorial approach. In EACL 2001 Workshop Proceedings on Arabic Language Processing: Status and Prospects, pages 17–22, Toulouse, France, 2001.
[28] Noah A. Smith, David A. Smith, and Roy W. Tromble. Context-Based Morphological Disambiguation with Random Fields. In Proceedings of HLT/EMNLP 2005, Vancouver, 2005.
[29] Otakar Smrž. Functional Arabic Morphology. Formal System and Implementation. PhD thesis, Charles University in Prague, in prep.
[30] Otakar Smrž and Jan Hajič. The Other Arabic Treebank: Prague Dependencies and Functions. In Arabic Computational Linguistics: Current Implementations. CSLI Publications, to appear.
[31] Otakar Smrž and Petr Pajas. MorphoTrees of Arabic and Their Annotation in the TrEd Environment. In NEMLAR International Conference on Arabic Language Resources and Tools, pages 38–41. ELDA, 2004.
[32] Abdelhadi Soudi, Violetta Cavalli-Sforza, and Abderrahim Jamari. A Computational Lexeme-Based Treatment of Arabic Morphology. In EACL 2001 Workshop Proceedings on Arabic Language Processing: Status and Prospects, pages 155–162, Toulouse, 2001.
[33] Richard Sproat. Morphology and Computation. ACL–MIT Press Series in Natural Language Processing. MIT Press, 1992.
[34] Gregory T. Stump. Inflectional Morphology. A Theory of Paradigm Structure. Cambridge Studies in Linguistics. Cambridge University Press, 2001.
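
As a supplementary illustration, the LCS-based evaluation of tokenizations described before the concluding remarks can be sketched in a few lines of Python. This is only a minimal sketch and not the evaluation code used for the tagger. It assumes that recall divides the LCS length by the length of the reference tokenization and precision by the length of the evaluated tokenization; the function names and the example tokens are hypothetical.

```python
from typing import Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    # prev[j] holds the LCS length of the tokens of `a` processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def tokenization_scores(
    evaluated: Sequence[str], reference: Sequence[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F) with beta = 1 for an evaluated tokenization.

    Assumption: tokens in the LCS count as correctly recognized, precision is
    normalized by the evaluated sequence and recall by the reference sequence.
    """
    correct = lcs_length(evaluated, reference)
    precision = correct / len(evaluated) if evaluated else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: the evaluated output fails to split off a clitic.
reference_tokens = ["wa", "kitAb", "hu"]
evaluated_tokens = ["wa", "kitAbhu"]
precision, recall, f_score = tokenization_scores(evaluated_tokens, reference_tokens)
print(precision, recall, f_score)
```

In this example only the first token belongs to the LCS, so one of two evaluated tokens and one of three reference tokens count as correct. Comparing whole sequences through the LCS, rather than position by position, keeps a single missed split from misaligning every later token.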