Constructive Machine Translation Evaluation

Introduction to Artificial Intelligence: Zhihuishu (智慧树知到) Chapter Quiz Answers, 2023, Harbin Engineering University

Chapter 1 Test
1. The following statement about intelligence is wrong ( ). A: All life has intelligence B: Bacteria do not have intelligence C: At present, human intelligence is the highest level in nature D: From the perspective of life, intelligence is the basic ability of life to adapt to the natural world
Answer: Bacteria do not have intelligence
2. Which of the following techniques is unsupervised learning in artificial intelligence? ( ) A: Neural network B: Support vector machine C: Decision tree D: Clustering
Answer: Clustering
3. To which period can the history of the development of artificial intelligence be traced back? ( ) A: 1970s B: Late 19th century C: Early 21st century D: 1950s
Answer: Late 19th century
4. Which of the following fields does not belong to the scope of artificial intelligence application? ( ) A: Aviation B: Medical C: Agriculture D: Finance
Answer: Aviation
5. The first artificial neuron model in human history was the MP model, proposed by Hebb. ( ) A: True B: False
Answer: False
6. Big data will bring considerable value in government public services, medical services, retail, manufacturing, and personal location services. ( ) A: False B: True
Answer: True

Chapter 2 Test
1. Which of the following options is not a form of human reason? ( ) A: Value rationality B: Intellectual rationality C: Methodological rationality D: Cognitive rationality
Answer: Intellectual rationality
2. When did life begin? ( ) A: Between 10 billion and 4.5 billion years ago B: Between 13.8 billion and 10 billion years ago C: Between 4.5 billion and 3.5 billion years ago D: Before 13.8 billion years ago
Answer: Between 4.5 billion and 3.5 billion years ago
3. Which of the following statements is true regarding philosophical thinking about artificial intelligence? ( ) A: Philosophical thinking has hindered the progress of artificial intelligence. B: Philosophical thinking has contributed to the development of artificial intelligence. C: Philosophical thinking is only concerned with the ethical implications of artificial intelligence. D: Philosophical thinking has no impact on the development of artificial intelligence.
Answer: Philosophical thinking has contributed to the development of artificial intelligence.
4. What is the rational nature of artificial intelligence? ( ) A: The ability to communicate effectively with humans. B: The ability to feel emotions and express creativity. C: The ability to reason and make logical deductions. D: The ability to learn from experience and adapt to new situations.
Answer: The ability to reason and make logical deductions.
5. Which of the following statements is true regarding the rational nature of artificial intelligence? ( ) A: The rational nature of artificial intelligence includes emotional intelligence. B: The rational nature of artificial intelligence is limited to logical reasoning. C: The rational nature of artificial intelligence is not important for its development. D: The rational nature of artificial intelligence is only concerned with mathematical calculations.
Answer: The rational nature of artificial intelligence is limited to logical reasoning.
6. Connectionism believes that the basic element of human thinking is the symbol, not the neuron, and that human cognition is a self-organizing process of symbol manipulation rather than of weights. ( ) A: True B: False
Answer: False

Chapter 3 Test
1. The brain of all organisms can be divided into three primitive parts: forebrain, midbrain, and hindbrain. Specifically, the human brain is composed of the brainstem, the cerebellum, and the cerebrum (forebrain). ( ) A: False B: True
Answer: True
2. The neural connections in the brain are chaotic. ( ) A: True B: False
Answer: False
3. The following statement about the left and right hemispheres of the brain and their functions is wrong ( ). A: When dictating questions, the left brain is responsible for logical thinking and the right brain for language description. B: The left brain is like a scientist, good at abstract thinking and complex calculation, but lacking rich emotion. C: The right brain is like an artist, creative in music, art, and other artistic activities, and rich in emotion. D: The left and right hemispheres of the brain have the same shape, but their functions are quite different; they are generally called the left brain and the right brain respectively.
Answer: When dictating questions, the left brain is responsible for logical thinking and the right brain for language description.
4. What is the basic unit of the nervous system? ( ) A: Neuron B: Gene C: Atom D: Molecule
Answer: Neuron
5. What is the role of the prefrontal cortex in cognitive functions? ( ) A: It is responsible for sensory processing. B: It is involved in emotional processing. C: It is responsible for higher-level cognitive functions. D: It is involved in motor control.
Answer: It is responsible for higher-level cognitive functions.
6. What is the definition of intelligence? ( ) A: The ability to communicate effectively. B: The ability to perform physical tasks. C: The ability to acquire and apply knowledge and skills. D: The ability to regulate emotions.
Answer: The ability to acquire and apply knowledge and skills.

Chapter 4 Test
1. The feedforward neural network is based on the mathematical model of the neuron and is composed of neurons connected together in specific ways. Different artificial neural networks generally have different structures, but the basis is still the mathematical model of the neuron. ( ) A: True B: False
Answer: True
2. In the perceptron, the weights are adjusted by learning so that the network can produce the desired output for any input. ( ) A: True B: False
Answer: True
3. The convolutional neural network is a feedforward neural network with many advantages and excellent performance on large-scale image processing. Among the following options, the advantages of the convolutional neural network are ( ). A: Implicit learning avoids explicit feature extraction B: Weight sharing C: Translation invariance D: Strong robustness
Answer: Implicit learning avoids explicit feature extraction; weight sharing; strong robustness
4. In a feedforward neural network, information travels in which direction? ( ) A: Forward B: Both A and B C: None of the above D: Backward
Answer: Forward
5. What is the main feature of a convolutional neural network? ( ) A: They are used for speech recognition. B: They are used for natural language processing. C: They are used for reinforcement learning. D: They are used for image recognition.
Answer: They are used for image recognition.
6. Which of the following is a characteristic of deep neural networks? ( ) A: They require less training data than shallow neural networks. B: They have fewer hidden layers than shallow neural networks. C: They have lower accuracy than shallow neural networks. D: They are more computationally expensive than shallow neural networks.
Answer: They are more computationally expensive than shallow neural networks.

Chapter 5 Test
1. Machine learning refers to how a computer simulates or realizes human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. ( ) A: True B: False
Answer: True
2. The best decision sequence of a Markov decision process is solved with the Bellman equation, and the value of each state is determined not only by the current state but also by later states. ( ) A: True B: False
Answer: True
3. AlexNet's contributions to this work include: ( ). A: Using GPUs (NVIDIA GTX 580) to reduce training time B: Using the rectified linear unit (ReLU) as the nonlinear activation function C: Using overlapping pooling to avoid the averaging effect of average pooling D: Using the dropout technique to selectively ignore individual neurons during training to avoid overfitting the model
Answer: Using GPUs (NVIDIA GTX 580) to reduce training time; using ReLU as the nonlinear activation function; using overlapping pooling to avoid the averaging effect of average pooling; using dropout to selectively ignore individual neurons during training to avoid overfitting the model
4. In supervised learning, what is the role of the labeled data? ( ) A: To evaluate the model B: To train the model C: None of the above D: To test the model
Answer: To train the model
5. In reinforcement learning, what is the goal of the agent? ( ) A: To identify patterns in input data B: To minimize the error between the predicted and actual output C: To maximize the reward obtained from the environment D: To classify input data into different categories
Answer: To maximize the reward obtained from the environment
6. Which of the following is a characteristic of transfer learning? ( ) A: It can only be used for supervised learning tasks B: It requires a large amount of labeled data C: It involves transferring knowledge from one domain to another D: It is only applicable to small-scale problems
Answer: It involves transferring knowledge from one domain to another

Chapter 6 Test
1. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and extracting objects of interest. Among the following statements about image segmentation algorithms, the wrong one is ( ). A: The region-growing method completes segmentation by calculating the mean shift vector. B: The watershed algorithm, MeanShift segmentation, region growing, and Otsu threshold segmentation can all perform image segmentation. C: The watershed algorithm is often used to segment objects that are connected in the image. D: Otsu threshold segmentation, also known as the maximum between-class variance method, automatically selects a global threshold T from the histogram statistics of the entire image.
Answer: The region-growing method completes segmentation by calculating the mean shift vector.
2. Camera calibration is a key step when using machine vision to measure objects, and its accuracy directly affects measurement accuracy. Camera calibration generally involves converting object-point coordinates between several coordinate systems. Which coordinate systems are meant here? ( ) A: Image coordinate system B: Image plane coordinate system C: Camera coordinate system D: World coordinate system
Answer: Image coordinate system; image plane coordinate system; camera coordinate system; world coordinate system
3. Commonly used digital image filtering methods: ( ). A: Bilateral filtering B: Median filtering C: Mean filtering D: Gaussian filtering
Answer: Bilateral filtering; median filtering; mean filtering; Gaussian filtering
4. Application areas of digital image processing include: ( ) A: Industrial inspection B: Biomedical science C: Scenario simulation D: Remote sensing
Answer: Industrial inspection; biomedical science
5. Image segmentation is the technology and process of dividing an image into several specific regions with unique properties and extracting objects of interest. Among the following statements about image segmentation algorithms, the wrong one is ( ). A: Otsu threshold segmentation, also known as the maximum between-class variance method, automatically selects a global threshold T from the histogram statistics of the entire image. B: The watershed algorithm is often used to segment objects that are connected in the image. C: The region-growing method completes segmentation by calculating the mean shift vector. D: The watershed algorithm, MeanShift segmentation, region growing, and Otsu threshold segmentation can all perform image segmentation.
Answer: The region-growing method completes segmentation by calculating the mean shift vector.

Chapter 7 Test
1. Blind search can be applied to many different search problems, but it has not been widely used because of its low efficiency. ( ) A: False B: True
Answer: True
2. Which of the following search methods uses a FIFO queue? ( ) A: Breadth-first search B: Random search C: Depth-first search D: Generate-and-test
Answer: Breadth-first search
3. What causes the complexity of semantic networks? ( ) A: There is no recognized formal representation system B: The quantifier network is inadequate C: The means of knowledge representation are diverse D: The relationships between nodes can be linear, nonlinear, or even recursive
Answer: The means of knowledge representation are diverse; the relationships between nodes can be linear, nonlinear, or even recursive
4. In a knowledge graph taking Leonardo da Vinci as an example, the entity of a character is represented as a node, and the relationship between the artist and the character is represented as an edge. Search is the process of finding the action sequence of an intelligent system. ( ) A: True B: False
Answer: True
5. Which of the following statements about common path-search methods is wrong? ( ) A: With the artificial potential field method, when there are obstacles at any distance around the target point, the path easily becomes unreachable B: The A* algorithm occupies too much memory during the search, its search efficiency is reduced, and the optimal result cannot be guaranteed C: The artificial potential field method can quickly search for a collision-free path with great flexibility D: The A* algorithm can solve the shortest path in state-space search
Answer: With the artificial potential field method, when there are obstacles at any distance around the target point, the path easily becomes unreachable

Chapter 8 Test
1. Language, spoken language, written language, sign language, and the Python language used in human communication are all natural languages. ( ) A: True B: False
Answer: False
2. The following statement about machine translation is wrong ( ). A: The analysis stage of machine translation is mainly lexical analysis and pragmatic analysis B: The essence of machine translation is the discovery and application of bilingual translation laws. C: The four stages of machine translation are retrieval, analysis, conversion, and generation. D: At present, natural-language machine translation generally takes the sentence as the translation unit.
Answer: The analysis stage of machine translation is mainly lexical analysis and pragmatic analysis
3. Which of the following fields does machine translation belong to? ( ) A: Expert systems B: Machine learning C: Human sensory simulation D: Natural language systems
Answer: Natural language systems
4. The following statements about language are wrong: ( )

Reading Notes: Research on Multimodal Translation Theory and Practice

Contents
I. Overview
1. Research background and significance
2. Development trends of the translation discipline
II. Multimodal translation theory
1. The definition of multimodal translation
2. Theoretical frameworks of multimodal translation
a. The structuralist perspective
b. The functionalist perspective
c. The conceptual translation-studies perspective
3. Theoretical models of multimodal translation
a. Translation-centered models
b. Information-centered models
c. Integrated models
III. Multimodal translation practice
1. Application areas of multimodal translation
a. Cultural exchange and communication
b. Language teaching
c. AI-assisted translation
2. Case studies of multimodal translation
a. Multimodal translation in cross-cultural communication
b. Multimodal translation practice in language teaching
c. Design and application of AI-assisted translation systems
IV. Challenges of multimodal translation and countermeasures
1. Technical challenges
a. Natural language processing technology
b. Computer-assisted translation tools
2. Theoretical challenges
a. Innovation and development of translation theory
b. Philosophical reflections on multimodal translation
3. Practical challenges
a. Needs analysis for education and training
b. Cross-cultural corporate exchange and cooperation
V. Conclusion
1. Summary of research results
2. Outlook and recommendations

I. Overview
Research on Multimodal Translation Theory and Practice is an academic monograph that explores the theory and practice of multimodal translation in depth.

Through a comprehensive analysis of multimodal translation, the book reveals its important role in cross-cultural communication and its broad application prospects.

In the overview, the author first reviews the basic concepts and theoretical framework of multimodal translation, including its definition, characteristics, classification, and theoretical foundations.

The author then describes in detail the application of multimodal translation in different fields, such as translating academic literature, advertising copy, and film and television works, and analyzes the challenges it faces, such as cross-cultural communication barriers and differences in language structure.

The author also examines the relationship between multimodal translation and translation theories such as semantic translation and communicative translation, emphasizing the important role of multimodal translation in promoting cross-cultural exchange and enhancing understanding and respect between cultures.

Machine Translation Methods

Machine translation (MT) refers to the use of computer technology to translate between natural languages.

With the rapid development of artificial intelligence, machine translation has become an effective tool for overcoming language barriers.

This article introduces several common machine translation methods and analyzes their strengths and weaknesses.

1. Rule-based methods. Rule-based machine translation was one of the main approaches of early machine translation technology.

It works by constructing a set of translation rules in advance and then converting source-language text into target-language text according to those rules.

This approach requires extensive manual work, mainly: 1. Building a lexicon: mapping source-language words to target-language words.

2. Writing rules: based on grammatical rules and the lexicon, writing a set of translation rules.

3. Designing a rule-matching algorithm: matching source-language text against the rules and generating target-language text.

Advantages: rule-based machine translation can produce precise translations, and it works particularly well between language pairs with complex grammatical rules.

Disadvantages: building the rules and the lexicon consumes a great deal of time and labor, the approach places high demands on linguistic flexibility, and it cannot handle polysemous words and ambiguity.
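As a toy illustration of the three steps above (lexicon, rules, rule matching), here is a minimal sketch that translates a few English sentences into French with a hand-built dictionary and a single reordering rule. The vocabulary and the rule are invented for illustration; a real rule-based system would need far larger resources.

```python
# A toy rule-based translator: a hand-built lexicon plus one reordering rule.
LEXICON = {  # step 1: lexicon mapping source words to target words
    "the": "le", "cat": "chat", "black": "noir",
    "eats": "mange", "fish": "poisson",
}
ADJECTIVES = {"black"}  # step 2: the rule needs some grammatical knowledge

def reorder_adj_noun(tokens):
    """French places most adjectives after the noun: swap ADJ NOUN pairs."""
    out, i = list(tokens), 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out

def translate(sentence):
    """Step 3: rule matching, then word-by-word substitution."""
    tokens = reorder_adj_noun(sentence.lower().split())
    return " ".join(LEXICON.get(t, t) for t in tokens)  # unknown words pass through

print(translate("the black cat eats the fish"))  # -> le chat noir mange le poisson
```

Even this tiny example shows the weakness named above: every new word and every new reordering pattern has to be added by hand.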

2. Statistical methods. Statistical machine translation learns the statistical regularities between a source language and a target language by analyzing large bilingual corpora, and thereby translates automatically.

The main steps include: 1. Building a bilingual corpus: collecting large-scale parallel source- and target-language texts, such as news reports and books.

2. Segmentation and alignment: tokenizing the source- and target-language texts and aligning them at the sentence level.

3. Training the model: using statistical algorithms to learn a translation model between the source and target languages from the aligned bilingual corpus.

4. Decoding: translating source-language text into target-language text according to the learned translation model.

Advantages: statistical machine translation automatically learns the translation regularities between the source and target languages, with no need to build rules and lexicons by hand.

Disadvantages: for rare words, long sentences, and other complex cases, its results fall short of rule-based machine translation.
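To make step 3 (training) concrete, here is a minimal sketch of IBM Model 1, the classic word-level translation model trained with expectation-maximization on sentence-aligned data. The three-sentence corpus is invented, and the NULL source word and higher IBM models are omitted; real systems train on millions of sentence pairs and add alignment, phrase, and language models on top.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (invented): French-English pairs.
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]

# t[(e, f)] ~ P(e | f); any constant initialization works because the
# E-step normalizes over the source words of each sentence.
t = defaultdict(lambda: 1.0)

for _ in range(20):                       # EM iterations
    count = defaultdict(float)            # expected counts c(e, f)
    total = defaultdict(float)            # expected counts c(f)
    for f_sent, e_sent in corpus:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)
            for f in f_sent:
                c = t[(e, f)] / z         # E-step: soft alignment probability
                count[(e, f)] += c
                total[f] += c
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]          # M-step: re-estimate P(e | f)

print(round(t[("the", "la")], 3))         # should approach 1.0 with iterations
print(round(t[("house", "maison")], 3))   # should approach 1.0 with iterations
```

Because "la" co-occurs with "the" in two different sentence pairs, EM gradually concentrates the probability mass on the correct word translations without any hand-written rules, which is exactly the advantage claimed above.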

3. Neural methods. In recent years, with the wide adoption of deep learning, machine translation methods based on neural networks have risen rapidly.

This approach builds a deep neural network model that maps source-language text directly to target-language text, achieving end-to-end translation.

The main steps include: 1. Building an encoder-decoder model: the encoder maps the source-language text into a semantic space, and the decoder converts the information in that semantic space into target-language text. A minimal sketch of such a model follows.
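The skeleton below, assuming PyTorch is available and using toy vocabulary sizes and random token IDs, shows the shape of an encoder-decoder model: the encoder compresses the source sentence into a hidden vector, which conditions the decoder. It is a sketch of the architecture only, with no attention mechanism, training loop, or real data.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: GRU encoder -> GRU decoder (no attention)."""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))            # h: the source "semantic" vector
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)   # decoder conditioned on h
        return self.out(dec_out)                              # logits over target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 1200, (2, 5))   # shifted target tokens (teacher forcing)
print(model(src, tgt).shape)           # torch.Size([2, 5, 1200])
```

Training such a model end to end on parallel text, with cross-entropy loss over the target logits, replaces the hand-built rules and count-based tables of the two earlier approaches.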

Natural Language Processing (NLP)

Two research areas close to NLP:
Natural Language Understanding (NLU): emphasizes deep interpretation of the meaning and intent of language
Computational Linguistics (CL): emphasizes computable theories of language
Typical applications of NLP technology
Machine translation, automatic summarization, text classification and information filtering, information retrieval, question answering, information extraction and text mining, sentiment analysis, ...
An information extraction example: a meeting report (People's Daily, 1998-03-09)
Xinhua News Agency, Beijing, March 8 (reporter Li Shufeng): The first meeting of the 12th Central Standing Committee of the China Peasants' and Workers' Democratic Party opened today in Beijing.
The meeting studied and adopted decisions on implementing the spirit of the "Two Sessions", reviewed and approved the draft of the party's main work plan for 1998, and appointed a deputy secretary-general of the Central Committee.
Jiang Zhenghua, chairman of the party's Central Committee, presided over the meeting. He said that more than 100 party members had taken part in this year's "Two Sessions" as deputies and committee members, that every member should earnestly fulfill those duties and hold the meetings well, and that in the work of 1998 the party should implement the spirit of the "Two Sessions", strengthen its own organization, push its cause further forward, and make new contributions to building socialism with Chinese characteristics.
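To show the kind of output information extraction produces from a report like this, the snippet below fills a simple meeting template from the translated text with hand-written regular expressions. The patterns are tailored to this single report and are purely illustrative; they are nothing like a robust extractor.

```python
import re

report = ("Xinhua News Agency, Beijing, March 8 (reporter Li Shufeng): "
          "The first meeting of the 12th Central Standing Committee of the "
          "China Peasants' and Workers' Democratic Party opened today in Beijing. "
          "Jiang Zhenghua, chairman of the party's Central Committee, "
          "presided over the meeting.")

# Hand-written patterns for this one report -- purely illustrative.
patterns = {
    "date":     r"(March \d+)",
    "location": r"opened today in ([A-Z]\w+)",
    "presider": r"([A-Z]\w+ [A-Z]\w+), chairman",
}
for slot, pat in patterns.items():
    m = re.search(pat, report)
    print(slot, "->", m.group(1) if m else None)
# date -> March 8, location -> Beijing, presider -> Jiang Zhenghua
```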
Main content (3)
Corpus-based methods for natural language processing (empirical methods)
Language models (N-gram grammars), word segmentation and part-of-speech tagging (sequence labeling models), parsing (probabilistic context-free grammars), text classification (naive Bayes and maximum entropy models), machine translation (the IBM models, etc.), ... (and neural-network-based deep learning methods); a bigram language model is sketched below.
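Of the methods listed above, the N-gram language model is the simplest to sketch: a bigram model with add-one (Laplace) smoothing, trained on an invented three-sentence corpus. Real language models are trained on large corpora and use better smoothing.

```python
import math
from collections import Counter

# Toy corpus (invented); <s> and </s> mark sentence boundaries.
corpus = ["<s> the cat sat </s>", "<s> the cat ran </s>", "<s> a dog sat </s>"]
sents = [s.split() for s in corpus]

unigrams = Counter(w for sent in sents for w in sent)
bigrams = Counter(b for sent in sents for b in zip(sent, sent[1:]))
V = len(unigrams)                  # vocabulary size, for add-one smoothing

def p(w, prev):
    """P(w | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def logprob(sentence):
    """Log probability of a sentence under the bigram model."""
    words = sentence.split()
    return sum(math.log(p(w, prev)) for prev, w in zip(words, words[1:]))

print(p("cat", "the"))             # seen bigram:   (2+1)/(2+8) = 0.3
print(p("dog", "the"))             # unseen bigram: (0+1)/(2+8) = 0.1
print(logprob("<s> the cat sat </s>"))
```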
Prerequisites
Compiler techniques; probability and statistics
References
Zong Chengqing, Statistical Natural Language Processing, Tsinghua University Press, 2008; Liu Qun et al. (trans.), Natural Language Understanding (2nd edition), Publishing House of Electronics Industry, 2005; Yuan Chunfa et al. (trans.), Foundations of Statistical Natural Language Processing, Publishing House of Electronics Industry, 2005; Feng Zhiwei et al. (trans.), Speech and Language Processing, Publishing House of Electronics Industry, 2005; Huang Changning et al., Corpus Linguistics, The Commercial Press, 2002; Feng Zhiwei, Foundations of Computational Linguistics, The Commercial Press, 2001; Yu Shiwen, Introduction to Computational Linguistics, The Commercial Press, 2003; Yao Tianshun, Natural Language Understanding: A Study of Making Machines Understand Human Language (

Application of the Simplex Algorithm to Re-ranking in Statistical Machine Translation

Fu Lei, Liu Qun. Application of the Simplex Algorithm to Re-ranking in Statistical Machine Translation. Journal of Chinese Information Processing, 2007, 21(3): 28-33.
Abstract: In recent years, discriminative re-ranking has been applied to many branches of natural language processing, such as syntactic parsing, part-of-speech tagging, and machine translation, and has achieved good results, with improvements under each task's evaluation metric. Taking statistical machine translation as an example, this paper explains in detail the principle and procedure of re-ranking translation outputs with the simplex algorithm, the implementation and use of the algorithm, and the method of feature selection in the re-ranking experiments. It reports results on the NIST-2002 (development) and NIST-2005 (test) Chinese-English machine translation sets: the BLEU score improved by 1.26% on the development set and 1.16% on the test set.
Keywords: artificial intelligence; machine translation; discriminative re-ranking; simplex algorithm; statistical machine translation
Authors: Fu Lei (Graduate University of the Chinese Academy of Sciences, Beijing 100049; Laboratory of Multilingual Interaction Technology and Evaluation, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080); Liu Qun (Laboratory of Multilingual Interaction Technology and Evaluation, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080). Original language: Chinese. Classification: TP391.
Discriminative re-ranking means re-selecting among the multiple candidate outputs of a machine translation system with respect to a given evaluation metric, aiming to pick the translation that optimizes that metric.

Taking statistical machine translation as an example, the basic procedure of discriminative re-ranking is as follows: first, the machine translation system generates N candidate translations (an "N-best list") for every sentence in the development and test sets; then suitable translation features are chosen and feature scores are produced for all candidates; next, a re-ranking algorithm is used on the development set to train, for the relevant evaluation metric such as BLEU or NIST, the feature weights that optimize that metric; finally, the weights trained on the development set are used to select the best translation for each sentence directly from the N-best lists of the test set.
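A minimal sketch of that training loop, assuming SciPy's implementation of the Nelder-Mead simplex method and an invented development set of two sentences with three candidates each; the per-candidate quality scores stand in for sentence-level BLEU, and the feature values are made up.

```python
import numpy as np
from scipy.optimize import minimize

# Invented development data: for each source sentence, an N-best list where
# every candidate has a feature vector (e.g. LM score, TM score, length)
# and a precomputed quality score standing in for BLEU.
features = [                      # sentences x candidates x features
    np.array([[-2.1, -3.0, 0.9], [-1.8, -3.5, 1.0], [-2.5, -2.8, 1.1]]),
    np.array([[-1.2, -4.1, 1.0], [-1.5, -3.9, 0.8], [-1.1, -4.4, 1.2]]),
]
quality = [np.array([0.30, 0.45, 0.25]), np.array([0.50, 0.35, 0.40])]

def neg_metric(weights):
    """Pick the top candidate per sentence under `weights` and return the
    negated total quality (the optimizer minimizes)."""
    total = 0.0
    for feats, q in zip(features, quality):
        best = np.argmax(feats @ weights)   # re-rank the N-best list
        total += q[best]
    return -total

result = minimize(neg_metric, x0=np.ones(3), method="Nelder-Mead")
print("weights:", result.x, "dev metric:", -result.fun)
```

A derivative-free method like the simplex algorithm fits this problem because taking an argmax per sentence makes the metric piecewise constant in the weights, so gradient-based optimizers do not apply directly.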

An HMM-based Method for Extracting Phrase Translation Pairs

Zuo Yuncun, Zong Chengqing. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080. E-mail: {yczuo, cqzong}@
Abstract: Among corpus-based statistical translation methods, phrase-based statistical translation handles the relationships between the words of a sentence better than word-based statistical translation, and thus effectively improves the performance of a machine translation system.

In phrase-based statistical translation, one important strategy is to add phrase translation pairs to the translation system as a source of knowledge, so the performance of the whole system depends heavily on the quality of the phrase translation pairs it uses.

Building on HMM-based word alignment, this paper proposes a method for automatically extracting phrase translation pairs from a bilingual corpus. The method handles the different situations that arise during word alignment in different ways, which improves the quality of the extracted phrase translation pairs.

Keywords: HMM; word alignment; phrase translation pairs; machine translation

Phrase Translation Extraction Based on HMM
Zuo Yuncun, Zong Chengqing
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080
E-mail: {yczuo, cqzong}@
Abstract: In corpus-based statistical machine translation methods, phrase-based models are effective in improving translation quality, as they can deal with the relationships between words in sentences better than word-based models. One approach to phrase-based translation incorporates phrase translations as knowledge sources into systems, and the systems' performance greatly depends on the quality of that phrase knowledge. In this paper, we describe a new approach to phrase translation extraction based on an HMM word alignment method. The experimental results show that this approach is effective in extracting phrase translations from a bilingual corpus.
Keywords: HMM, word alignment, phrase translation extraction, machine translation

1 Introduction
The task of machine translation is to translate a source-language sentence s_1^I = s_1 ... s_I into a target-language sentence t_1^J = t_1 ... t_J.
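The paper's specific heuristics for hard alignment cases are not reproduced here, but the standard consistency-based phrase-pair extraction that runs on top of HMM (or IBM-model) word alignments can be sketched as follows; the sentence pair and alignment links are invented, and the usual extension over unaligned boundary words is omitted for brevity.

```python
# Consistent phrase-pair extraction from a word alignment: the standard
# step that follows HMM/IBM word alignment (toy data invented here).

src = "我 想 喝 茶".split()
tgt = "I want to drink tea".split()
align = {(0, 0), (1, 1), (2, 2), (2, 3), (3, 4)}   # (src index, tgt index)

def extract_phrases(src, tgt, align, max_len=3):
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tps = [t for (s, t) in align if i1 <= s <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            # Consistency: no alignment link may cross the phrase box.
            if any(j1 <= t <= j2 and not (i1 <= s <= i2) for (s, t) in align):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

for sp, tp in extract_phrases(src, tgt, align):
    print(sp, "|||", tp)   # e.g. 喝 ||| to drink, 喝 茶 ||| to drink tea
```

Because extraction only keeps spans whose alignment links stay inside the phrase box, the quality of the word alignment directly bounds the quality of the extracted phrase pairs, which is the dependency the abstract emphasizes.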

Common English Vocabulary in Machine Learning and Artificial Intelligence

机器学习与人工智能领域中常用的英语词汇1.General Concepts (基础概念)•Artificial Intelligence (AI) - 人工智能1)Artificial Intelligence (AI) - 人工智能2)Machine Learning (ML) - 机器学习3)Deep Learning (DL) - 深度学习4)Neural Network - 神经网络5)Natural Language Processing (NLP) - 自然语言处理6)Computer Vision - 计算机视觉7)Robotics - 机器人技术8)Speech Recognition - 语音识别9)Expert Systems - 专家系统10)Knowledge Representation - 知识表示11)Pattern Recognition - 模式识别12)Cognitive Computing - 认知计算13)Autonomous Systems - 自主系统14)Human-Machine Interaction - 人机交互15)Intelligent Agents - 智能代理16)Machine Translation - 机器翻译17)Swarm Intelligence - 群体智能18)Genetic Algorithms - 遗传算法19)Fuzzy Logic - 模糊逻辑20)Reinforcement Learning - 强化学习•Machine Learning (ML) - 机器学习1)Machine Learning (ML) - 机器学习2)Artificial Neural Network - 人工神经网络3)Deep Learning - 深度学习4)Supervised Learning - 有监督学习5)Unsupervised Learning - 无监督学习6)Reinforcement Learning - 强化学习7)Semi-Supervised Learning - 半监督学习8)Training Data - 训练数据9)Test Data - 测试数据10)Validation Data - 验证数据11)Feature - 特征12)Label - 标签13)Model - 模型14)Algorithm - 算法15)Regression - 回归16)Classification - 分类17)Clustering - 聚类18)Dimensionality Reduction - 降维19)Overfitting - 过拟合20)Underfitting - 欠拟合•Deep Learning (DL) - 深度学习1)Deep Learning - 深度学习2)Neural Network - 神经网络3)Artificial Neural Network (ANN) - 人工神经网络4)Convolutional Neural Network (CNN) - 卷积神经网络5)Recurrent Neural Network (RNN) - 循环神经网络6)Long Short-Term Memory (LSTM) - 长短期记忆网络7)Gated Recurrent Unit (GRU) - 门控循环单元8)Autoencoder - 自编码器9)Generative Adversarial Network (GAN) - 生成对抗网络10)Transfer Learning - 迁移学习11)Pre-trained Model - 预训练模型12)Fine-tuning - 微调13)Feature Extraction - 特征提取14)Activation Function - 激活函数15)Loss Function - 损失函数16)Gradient Descent - 梯度下降17)Backpropagation - 反向传播18)Epoch - 训练周期19)Batch Size - 批量大小20)Dropout - 丢弃法•Neural Network - 神经网络1)Neural Network - 神经网络2)Artificial Neural Network (ANN) - 人工神经网络3)Deep Neural Network (DNN) - 深度神经网络4)Convolutional Neural Network (CNN) - 卷积神经网络5)Recurrent Neural Network (RNN) - 循环神经网络6)Long Short-Term Memory (LSTM) - 长短期记忆网络7)Gated Recurrent Unit (GRU) - 门控循环单元8)Feedforward Neural Network - 前馈神经网络9)Multi-layer Perceptron (MLP) - 多层感知器10)Radial Basis Function Network (RBFN) - 径向基函数网络11)Hopfield Network - 霍普菲尔德网络12)Boltzmann Machine - 玻尔兹曼机13)Autoencoder - 自编码器14)Spiking Neural Network (SNN) - 脉冲神经网络15)Self-organizing Map (SOM) - 自组织映射16)Restricted Boltzmann Machine (RBM) - 受限玻尔兹曼机17)Hebbian Learning - 海比安学习18)Competitive Learning - 竞争学习19)Neuroevolutionary - 神经进化20)Neuron - 神经元•Algorithm - 算法1)Algorithm - 算法2)Supervised Learning Algorithm - 有监督学习算法3)Unsupervised Learning Algorithm - 无监督学习算法4)Reinforcement Learning Algorithm - 强化学习算法5)Classification Algorithm - 分类算法6)Regression Algorithm - 回归算法7)Clustering Algorithm - 聚类算法8)Dimensionality Reduction Algorithm - 降维算法9)Decision Tree Algorithm - 决策树算法10)Random Forest Algorithm - 随机森林算法11)Support Vector Machine (SVM) Algorithm - 支持向量机算法12)K-Nearest Neighbors (KNN) Algorithm - K近邻算法13)Naive Bayes Algorithm - 朴素贝叶斯算法14)Gradient Descent Algorithm - 梯度下降算法15)Genetic Algorithm - 遗传算法16)Neural Network Algorithm - 神经网络算法17)Deep Learning Algorithm - 深度学习算法18)Ensemble Learning Algorithm - 集成学习算法19)Reinforcement Learning Algorithm - 强化学习算法20)Metaheuristic Algorithm - 元启发式算法•Model - 模型1)Model - 模型2)Machine Learning Model - 机器学习模型3)Artificial Intelligence Model - 人工智能模型4)Predictive Model - 预测模型5)Classification Model - 分类模型6)Regression Model - 回归模型7)Generative Model - 生成模型8)Discriminative Model - 判别模型9)Probabilistic Model - 概率模型10)Statistical Model - 统计模型11)Neural Network Model - 神经网络模型12)Deep Learning Model - 
深度学习模型13)Ensemble Model - 集成模型14)Reinforcement Learning Model - 强化学习模型15)Support Vector Machine (SVM) Model - 支持向量机模型16)Decision Tree Model - 决策树模型17)Random Forest Model - 随机森林模型18)Naive Bayes Model - 朴素贝叶斯模型19)Autoencoder Model - 自编码器模型20)Convolutional Neural Network (CNN) Model - 卷积神经网络模型•Dataset - 数据集1)Dataset - 数据集2)Training Dataset - 训练数据集3)Test Dataset - 测试数据集4)Validation Dataset - 验证数据集5)Balanced Dataset - 平衡数据集6)Imbalanced Dataset - 不平衡数据集7)Synthetic Dataset - 合成数据集8)Benchmark Dataset - 基准数据集9)Open Dataset - 开放数据集10)Labeled Dataset - 标记数据集11)Unlabeled Dataset - 未标记数据集12)Semi-Supervised Dataset - 半监督数据集13)Multiclass Dataset - 多分类数据集14)Feature Set - 特征集15)Data Augmentation - 数据增强16)Data Preprocessing - 数据预处理17)Missing Data - 缺失数据18)Outlier Detection - 异常值检测19)Data Imputation - 数据插补20)Metadata - 元数据•Training - 训练1)Training - 训练2)Training Data - 训练数据3)Training Phase - 训练阶段4)Training Set - 训练集5)Training Examples - 训练样本6)Training Instance - 训练实例7)Training Algorithm - 训练算法8)Training Model - 训练模型9)Training Process - 训练过程10)Training Loss - 训练损失11)Training Epoch - 训练周期12)Training Batch - 训练批次13)Online Training - 在线训练14)Offline Training - 离线训练15)Continuous Training - 连续训练16)Transfer Learning - 迁移学习17)Fine-Tuning - 微调18)Curriculum Learning - 课程学习19)Self-Supervised Learning - 自监督学习20)Active Learning - 主动学习•Testing - 测试1)Testing - 测试2)Test Data - 测试数据3)Test Set - 测试集4)Test Examples - 测试样本5)Test Instance - 测试实例6)Test Phase - 测试阶段7)Test Accuracy - 测试准确率8)Test Loss - 测试损失9)Test Error - 测试错误10)Test Metrics - 测试指标11)Test Suite - 测试套件12)Test Case - 测试用例13)Test Coverage - 测试覆盖率14)Cross-Validation - 交叉验证15)Holdout Validation - 留出验证16)K-Fold Cross-Validation - K折交叉验证17)Stratified Cross-Validation - 分层交叉验证18)Test Driven Development (TDD) - 测试驱动开发19)A/B Testing - A/B 测试20)Model Evaluation - 模型评估•Validation - 验证1)Validation - 验证2)Validation Data - 验证数据3)Validation Set - 验证集4)Validation Examples - 验证样本5)Validation Instance - 验证实例6)Validation Phase - 验证阶段7)Validation Accuracy - 验证准确率8)Validation Loss - 验证损失9)Validation Error - 验证错误10)Validation Metrics - 验证指标11)Cross-Validation - 交叉验证12)Holdout Validation - 留出验证13)K-Fold Cross-Validation - K折交叉验证14)Stratified Cross-Validation - 分层交叉验证15)Leave-One-Out Cross-Validation - 留一法交叉验证16)Validation Curve - 验证曲线17)Hyperparameter Validation - 超参数验证18)Model Validation - 模型验证19)Early Stopping - 提前停止20)Validation Strategy - 验证策略•Supervised Learning - 有监督学习1)Supervised Learning - 有监督学习2)Label - 标签3)Feature - 特征4)Target - 目标5)Training Labels - 训练标签6)Training Features - 训练特征7)Training Targets - 训练目标8)Training Examples - 训练样本9)Training Instance - 训练实例10)Regression - 回归11)Classification - 分类12)Predictor - 预测器13)Regression Model - 回归模型14)Classifier - 分类器15)Decision Tree - 决策树16)Support Vector Machine (SVM) - 支持向量机17)Neural Network - 神经网络18)Feature Engineering - 特征工程19)Model Evaluation - 模型评估20)Overfitting - 过拟合21)Underfitting - 欠拟合22)Bias-Variance Tradeoff - 偏差-方差权衡•Unsupervised Learning - 无监督学习1)Unsupervised Learning - 无监督学习2)Clustering - 聚类3)Dimensionality Reduction - 降维4)Anomaly Detection - 异常检测5)Association Rule Learning - 关联规则学习6)Feature Extraction - 特征提取7)Feature Selection - 特征选择8)K-Means - K均值9)Hierarchical Clustering - 层次聚类10)Density-Based Clustering - 基于密度的聚类11)Principal Component Analysis (PCA) - 主成分分析12)Independent Component Analysis (ICA) - 独立成分分析13)T-distributed Stochastic Neighbor Embedding (t-SNE) - t分布随机邻居嵌入14)Gaussian Mixture Model (GMM) - 高斯混合模型15)Self-Organizing Maps (SOM) - 自组织映射16)Autoencoder - 自动编码器17)Latent Variable - 潜变量18)Data Preprocessing - 
数据预处理19)Outlier Detection - 异常值检测20)Clustering Algorithm - 聚类算法•Reinforcement Learning - 强化学习1)Reinforcement Learning - 强化学习2)Agent - 代理3)Environment - 环境4)State - 状态5)Action - 动作6)Reward - 奖励7)Policy - 策略8)Value Function - 值函数9)Q-Learning - Q学习10)Deep Q-Network (DQN) - 深度Q网络11)Policy Gradient - 策略梯度12)Actor-Critic - 演员-评论家13)Exploration - 探索14)Exploitation - 开发15)Temporal Difference (TD) - 时间差分16)Markov Decision Process (MDP) - 马尔可夫决策过程17)State-Action-Reward-State-Action (SARSA) - 状态-动作-奖励-状态-动作18)Policy Iteration - 策略迭代19)Value Iteration - 值迭代20)Monte Carlo Methods - 蒙特卡洛方法•Semi-Supervised Learning - 半监督学习1)Semi-Supervised Learning - 半监督学习2)Labeled Data - 有标签数据3)Unlabeled Data - 无标签数据4)Label Propagation - 标签传播5)Self-Training - 自训练6)Co-Training - 协同训练7)Transudative Learning - 传导学习8)Inductive Learning - 归纳学习9)Manifold Regularization - 流形正则化10)Graph-based Methods - 基于图的方法11)Cluster Assumption - 聚类假设12)Low-Density Separation - 低密度分离13)Semi-Supervised Support Vector Machines (S3VM) - 半监督支持向量机14)Expectation-Maximization (EM) - 期望最大化15)Co-EM - 协同期望最大化16)Entropy-Regularized EM - 熵正则化EM17)Mean Teacher - 平均教师18)Virtual Adversarial Training - 虚拟对抗训练19)Tri-training - 三重训练20)Mix Match - 混合匹配•Feature - 特征1)Feature - 特征2)Feature Engineering - 特征工程3)Feature Extraction - 特征提取4)Feature Selection - 特征选择5)Input Features - 输入特征6)Output Features - 输出特征7)Feature Vector - 特征向量8)Feature Space - 特征空间9)Feature Representation - 特征表示10)Feature Transformation - 特征转换11)Feature Importance - 特征重要性12)Feature Scaling - 特征缩放13)Feature Normalization - 特征归一化14)Feature Encoding - 特征编码15)Feature Fusion - 特征融合16)Feature Dimensionality Reduction - 特征维度减少17)Continuous Feature - 连续特征18)Categorical Feature - 分类特征19)Nominal Feature - 名义特征20)Ordinal Feature - 有序特征•Label - 标签1)Label - 标签2)Labeling - 标注3)Ground Truth - 地面真值4)Class Label - 类别标签5)Target Variable - 目标变量6)Labeling Scheme - 标注方案7)Multi-class Labeling - 多类别标注8)Binary Labeling - 二分类标注9)Label Noise - 标签噪声10)Labeling Error - 标注错误11)Label Propagation - 标签传播12)Unlabeled Data - 无标签数据13)Labeled Data - 有标签数据14)Semi-supervised Learning - 半监督学习15)Active Learning - 主动学习16)Weakly Supervised Learning - 弱监督学习17)Noisy Label Learning - 噪声标签学习18)Self-training - 自训练19)Crowdsourcing Labeling - 众包标注20)Label Smoothing - 标签平滑化•Prediction - 预测1)Prediction - 预测2)Forecasting - 预测3)Regression - 回归4)Classification - 分类5)Time Series Prediction - 时间序列预测6)Forecast Accuracy - 预测准确性7)Predictive Modeling - 预测建模8)Predictive Analytics - 预测分析9)Forecasting Method - 预测方法10)Predictive Performance - 预测性能11)Predictive Power - 预测能力12)Prediction Error - 预测误差13)Prediction Interval - 预测区间14)Prediction Model - 预测模型15)Predictive Uncertainty - 预测不确定性16)Forecast Horizon - 预测时间跨度17)Predictive Maintenance - 预测性维护18)Predictive Policing - 预测式警务19)Predictive Healthcare - 预测性医疗20)Predictive Maintenance - 预测性维护•Classification - 分类1)Classification - 分类2)Classifier - 分类器3)Class - 类别4)Classify - 对数据进行分类5)Class Label - 类别标签6)Binary Classification - 二元分类7)Multiclass Classification - 多类分类8)Class Probability - 类别概率9)Decision Boundary - 决策边界10)Decision Tree - 决策树11)Support Vector Machine (SVM) - 支持向量机12)K-Nearest Neighbors (KNN) - K最近邻算法13)Naive Bayes - 朴素贝叶斯14)Logistic Regression - 逻辑回归15)Random Forest - 随机森林16)Neural Network - 神经网络17)SoftMax Function - SoftMax函数18)One-vs-All (One-vs-Rest) - 一对多(一对剩余)19)Ensemble Learning - 集成学习20)Confusion Matrix - 混淆矩阵•Regression - 回归1)Regression Analysis - 回归分析2)Linear Regression - 线性回归3)Multiple Regression - 多元回归4)Polynomial Regression - 多项式回归5)Logistic Regression - 逻辑回归6)Ridge Regression - 
岭回归7)Lasso Regression - Lasso回归8)Elastic Net Regression - 弹性网络回归9)Regression Coefficients - 回归系数10)Residuals - 残差11)Ordinary Least Squares (OLS) - 普通最小二乘法12)Ridge Regression Coefficient - 岭回归系数13)Lasso Regression Coefficient - Lasso回归系数14)Elastic Net Regression Coefficient - 弹性网络回归系数15)Regression Line - 回归线16)Prediction Error - 预测误差17)Regression Model - 回归模型18)Nonlinear Regression - 非线性回归19)Generalized Linear Models (GLM) - 广义线性模型20)Coefficient of Determination (R-squared) - 决定系数21)F-test - F检验22)Homoscedasticity - 同方差性23)Heteroscedasticity - 异方差性24)Autocorrelation - 自相关25)Multicollinearity - 多重共线性26)Outliers - 异常值27)Cross-validation - 交叉验证28)Feature Selection - 特征选择29)Feature Engineering - 特征工程30)Regularization - 正则化2.Neural Networks and Deep Learning (神经网络与深度学习)•Convolutional Neural Network (CNN) - 卷积神经网络1)Convolutional Neural Network (CNN) - 卷积神经网络2)Convolution Layer - 卷积层3)Feature Map - 特征图4)Convolution Operation - 卷积操作5)Stride - 步幅6)Padding - 填充7)Pooling Layer - 池化层8)Max Pooling - 最大池化9)Average Pooling - 平均池化10)Fully Connected Layer - 全连接层11)Activation Function - 激活函数12)Rectified Linear Unit (ReLU) - 线性修正单元13)Dropout - 随机失活14)Batch Normalization - 批量归一化15)Transfer Learning - 迁移学习16)Fine-Tuning - 微调17)Image Classification - 图像分类18)Object Detection - 物体检测19)Semantic Segmentation - 语义分割20)Instance Segmentation - 实例分割21)Generative Adversarial Network (GAN) - 生成对抗网络22)Image Generation - 图像生成23)Style Transfer - 风格迁移24)Convolutional Autoencoder - 卷积自编码器25)Recurrent Neural Network (RNN) - 循环神经网络•Recurrent Neural Network (RNN) - 循环神经网络1)Recurrent Neural Network (RNN) - 循环神经网络2)Long Short-Term Memory (LSTM) - 长短期记忆网络3)Gated Recurrent Unit (GRU) - 门控循环单元4)Sequence Modeling - 序列建模5)Time Series Prediction - 时间序列预测6)Natural Language Processing (NLP) - 自然语言处理7)Text Generation - 文本生成8)Sentiment Analysis - 情感分析9)Named Entity Recognition (NER) - 命名实体识别10)Part-of-Speech Tagging (POS Tagging) - 词性标注11)Sequence-to-Sequence (Seq2Seq) - 序列到序列12)Attention Mechanism - 注意力机制13)Encoder-Decoder Architecture - 编码器-解码器架构14)Bidirectional RNN - 双向循环神经网络15)Teacher Forcing - 强制教师法16)Backpropagation Through Time (BPTT) - 通过时间的反向传播17)Vanishing Gradient Problem - 梯度消失问题18)Exploding Gradient Problem - 梯度爆炸问题19)Language Modeling - 语言建模20)Speech Recognition - 语音识别•Long Short-Term Memory (LSTM) - 长短期记忆网络1)Long Short-Term Memory (LSTM) - 长短期记忆网络2)Cell State - 细胞状态3)Hidden State - 隐藏状态4)Forget Gate - 遗忘门5)Input Gate - 输入门6)Output Gate - 输出门7)Peephole Connections - 窥视孔连接8)Gated Recurrent Unit (GRU) - 门控循环单元9)Vanishing Gradient Problem - 梯度消失问题10)Exploding Gradient Problem - 梯度爆炸问题11)Sequence Modeling - 序列建模12)Time Series Prediction - 时间序列预测13)Natural Language Processing (NLP) - 自然语言处理14)Text Generation - 文本生成15)Sentiment Analysis - 情感分析16)Named Entity Recognition (NER) - 命名实体识别17)Part-of-Speech Tagging (POS Tagging) - 词性标注18)Attention Mechanism - 注意力机制19)Encoder-Decoder Architecture - 编码器-解码器架构20)Bidirectional LSTM - 双向长短期记忆网络•Attention Mechanism - 注意力机制1)Attention Mechanism - 注意力机制2)Self-Attention - 自注意力3)Multi-Head Attention - 多头注意力4)Transformer - 变换器5)Query - 查询6)Key - 键7)Value - 值8)Query-Value Attention - 查询-值注意力9)Dot-Product Attention - 点积注意力10)Scaled Dot-Product Attention - 缩放点积注意力11)Additive Attention - 加性注意力12)Context Vector - 上下文向量13)Attention Score - 注意力分数14)SoftMax Function - SoftMax函数15)Attention Weight - 注意力权重16)Global Attention - 全局注意力17)Local Attention - 局部注意力18)Positional Encoding - 位置编码19)Encoder-Decoder Attention - 编码器-解码器注意力20)Cross-Modal Attention - 跨模态注意力•Generative Adversarial Network (GAN) - 
生成对抗网络1)Generative Adversarial Network (GAN) - 生成对抗网络2)Generator - 生成器3)Discriminator - 判别器4)Adversarial Training - 对抗训练5)Minimax Game - 极小极大博弈6)Nash Equilibrium - 纳什均衡7)Mode Collapse - 模式崩溃8)Training Stability - 训练稳定性9)Loss Function - 损失函数10)Discriminative Loss - 判别损失11)Generative Loss - 生成损失12)Wasserstein GAN (WGAN) - Wasserstein GAN(WGAN)13)Deep Convolutional GAN (DCGAN) - 深度卷积生成对抗网络(DCGAN)14)Conditional GAN (c GAN) - 条件生成对抗网络(c GAN)15)Style GAN - 风格生成对抗网络16)Cycle GAN - 循环生成对抗网络17)Progressive Growing GAN (PGGAN) - 渐进式增长生成对抗网络(PGGAN)18)Self-Attention GAN (SAGAN) - 自注意力生成对抗网络(SAGAN)19)Big GAN - 大规模生成对抗网络20)Adversarial Examples - 对抗样本•Encoder-Decoder - 编码器-解码器1)Encoder-Decoder Architecture - 编码器-解码器架构2)Encoder - 编码器3)Decoder - 解码器4)Sequence-to-Sequence Model (Seq2Seq) - 序列到序列模型5)State Vector - 状态向量6)Context Vector - 上下文向量7)Hidden State - 隐藏状态8)Attention Mechanism - 注意力机制9)Teacher Forcing - 强制教师法10)Beam Search - 束搜索11)Recurrent Neural Network (RNN) - 循环神经网络12)Long Short-Term Memory (LSTM) - 长短期记忆网络13)Gated Recurrent Unit (GRU) - 门控循环单元14)Bidirectional Encoder - 双向编码器15)Greedy Decoding - 贪婪解码16)Masking - 遮盖17)Dropout - 随机失活18)Embedding Layer - 嵌入层19)Cross-Entropy Loss - 交叉熵损失20)Tokenization - 令牌化•Transfer Learning - 迁移学习1)Transfer Learning - 迁移学习2)Source Domain - 源领域3)Target Domain - 目标领域4)Fine-Tuning - 微调5)Domain Adaptation - 领域自适应6)Pre-Trained Model - 预训练模型7)Feature Extraction - 特征提取8)Knowledge Transfer - 知识迁移9)Unsupervised Domain Adaptation - 无监督领域自适应10)Semi-Supervised Domain Adaptation - 半监督领域自适应11)Multi-Task Learning - 多任务学习12)Data Augmentation - 数据增强13)Task Transfer - 任务迁移14)Model Agnostic Meta-Learning (MAML) - 与模型无关的元学习(MAML)15)One-Shot Learning - 单样本学习16)Zero-Shot Learning - 零样本学习17)Few-Shot Learning - 少样本学习18)Knowledge Distillation - 知识蒸馏19)Representation Learning - 表征学习20)Adversarial Transfer Learning - 对抗迁移学习•Pre-trained Models - 预训练模型1)Pre-trained Model - 预训练模型2)Transfer Learning - 迁移学习3)Fine-Tuning - 微调4)Knowledge Transfer - 知识迁移5)Domain Adaptation - 领域自适应6)Feature Extraction - 特征提取7)Representation Learning - 表征学习8)Language Model - 语言模型9)Bidirectional Encoder Representations from Transformers (BERT) - 双向编码器结构转换器10)Generative Pre-trained Transformer (GPT) - 生成式预训练转换器11)Transformer-based Models - 基于转换器的模型12)Masked Language Model (MLM) - 掩蔽语言模型13)Cloze Task - 填空任务14)Tokenization - 令牌化15)Word Embeddings - 词嵌入16)Sentence Embeddings - 句子嵌入17)Contextual Embeddings - 上下文嵌入18)Self-Supervised Learning - 自监督学习19)Large-Scale Pre-trained Models - 大规模预训练模型•Loss Function - 损失函数1)Loss Function - 损失函数2)Mean Squared Error (MSE) - 均方误差3)Mean Absolute Error (MAE) - 平均绝对误差4)Cross-Entropy Loss - 交叉熵损失5)Binary Cross-Entropy Loss - 二元交叉熵损失6)Categorical Cross-Entropy Loss - 分类交叉熵损失7)Hinge Loss - 合页损失8)Huber Loss - Huber损失9)Wasserstein Distance - Wasserstein距离10)Triplet Loss - 三元组损失11)Contrastive Loss - 对比损失12)Dice Loss - Dice损失13)Focal Loss - 焦点损失14)GAN Loss - GAN损失15)Adversarial Loss - 对抗损失16)L1 Loss - L1损失17)L2 Loss - L2损失18)Huber Loss - Huber损失19)Quantile Loss - 分位数损失•Activation Function - 激活函数1)Activation Function - 激活函数2)Sigmoid Function - Sigmoid函数3)Hyperbolic Tangent Function (Tanh) - 双曲正切函数4)Rectified Linear Unit (Re LU) - 矩形线性单元5)Parametric Re LU (P Re LU) - 参数化Re LU6)Exponential Linear Unit (ELU) - 指数线性单元7)Swish Function - Swish函数8)Softplus Function - Soft plus函数9)Softmax Function - SoftMax函数10)Hard Tanh Function - 硬双曲正切函数11)Softsign Function - Softsign函数12)GELU (Gaussian Error Linear Unit) - GELU(高斯误差线性单元)13)Mish Function - Mish函数14)CELU (Continuous Exponential Linear Unit) - 
CELU(连续指数线性单元)15)Bent Identity Function - 弯曲恒等函数16)Gaussian Error Linear Units (GELUs) - 高斯误差线性单元17)Adaptive Piecewise Linear (APL) - 自适应分段线性函数18)Radial Basis Function (RBF) - 径向基函数•Backpropagation - 反向传播1)Backpropagation - 反向传播2)Gradient Descent - 梯度下降3)Partial Derivative - 偏导数4)Chain Rule - 链式法则5)Forward Pass - 前向传播6)Backward Pass - 反向传播7)Computational Graph - 计算图8)Neural Network - 神经网络9)Loss Function - 损失函数10)Gradient Calculation - 梯度计算11)Weight Update - 权重更新12)Activation Function - 激活函数13)Optimizer - 优化器14)Learning Rate - 学习率15)Mini-Batch Gradient Descent - 小批量梯度下降16)Stochastic Gradient Descent (SGD) - 随机梯度下降17)Batch Gradient Descent - 批量梯度下降18)Momentum - 动量19)Adam Optimizer - Adam优化器20)Learning Rate Decay - 学习率衰减•Gradient Descent - 梯度下降1)Gradient Descent - 梯度下降2)Stochastic Gradient Descent (SGD) - 随机梯度下降3)Mini-Batch Gradient Descent - 小批量梯度下降4)Batch Gradient Descent - 批量梯度下降5)Learning Rate - 学习率6)Momentum - 动量7)Adaptive Moment Estimation (Adam) - 自适应矩估计8)RMSprop - 均方根传播9)Learning Rate Schedule - 学习率调度10)Convergence - 收敛11)Divergence - 发散12)Adagrad - 自适应学习速率方法13)Adadelta - 自适应增量学习率方法14)Adamax - 自适应矩估计的扩展版本15)Nadam - Nesterov Accelerated Adaptive Moment Estimation16)Learning Rate Decay - 学习率衰减17)Step Size - 步长18)Conjugate Gradient Descent - 共轭梯度下降19)Line Search - 线搜索20)Newton's Method - 牛顿法•Learning Rate - 学习率1)Learning Rate - 学习率2)Adaptive Learning Rate - 自适应学习率3)Learning Rate Decay - 学习率衰减4)Initial Learning Rate - 初始学习率5)Step Size - 步长6)Momentum - 动量7)Exponential Decay - 指数衰减8)Annealing - 退火9)Cyclical Learning Rate - 循环学习率10)Learning Rate Schedule - 学习率调度11)Warm-up - 预热12)Learning Rate Policy - 学习率策略13)Learning Rate Annealing - 学习率退火14)Cosine Annealing - 余弦退火15)Gradient Clipping - 梯度裁剪16)Adapting Learning Rate - 适应学习率17)Learning Rate Multiplier - 学习率倍增器18)Learning Rate Reduction - 学习率降低19)Learning Rate Update - 学习率更新20)Scheduled Learning Rate - 定期学习率•Batch Size - 批量大小1)Batch Size - 批量大小2)Mini-Batch - 小批量3)Batch Gradient Descent - 批量梯度下降4)Stochastic Gradient Descent (SGD) - 随机梯度下降5)Mini-Batch Gradient Descent - 小批量梯度下降6)Online Learning - 在线学习7)Full-Batch - 全批量8)Data Batch - 数据批次9)Training Batch - 训练批次10)Batch Normalization - 批量归一化11)Batch-wise Optimization - 批量优化12)Batch Processing - 批量处理13)Batch Sampling - 批量采样14)Adaptive Batch Size - 自适应批量大小15)Batch Splitting - 批量分割16)Dynamic Batch Size - 动态批量大小17)Fixed Batch Size - 固定批量大小18)Batch-wise Inference - 批量推理19)Batch-wise Training - 批量训练20)Batch Shuffling - 批量洗牌•Epoch - 训练周期1)Training Epoch - 训练周期2)Epoch Size - 周期大小3)Early Stopping - 提前停止4)Validation Set - 验证集5)Training Set - 训练集6)Test Set - 测试集7)Overfitting - 过拟合8)Underfitting - 欠拟合9)Model Evaluation - 模型评估10)Model Selection - 模型选择11)Hyperparameter Tuning - 超参数调优12)Cross-Validation - 交叉验证13)K-fold Cross-Validation - K折交叉验证14)Stratified Cross-Validation - 分层交叉验证15)Leave-One-Out Cross-Validation (LOOCV) - 留一法交叉验证16)Grid Search - 网格搜索17)Random Search - 随机搜索18)Model Complexity - 模型复杂度19)Learning Curve - 学习曲线20)Convergence - 收敛3.Machine Learning Techniques and Algorithms (机器学习技术与算法)•Decision Tree - 决策树1)Decision Tree - 决策树2)Node - 节点3)Root Node - 根节点4)Leaf Node - 叶节点5)Internal Node - 内部节点6)Splitting Criterion - 分裂准则7)Gini Impurity - 基尼不纯度8)Entropy - 熵9)Information Gain - 信息增益10)Gain Ratio - 增益率11)Pruning - 剪枝12)Recursive Partitioning - 递归分割13)CART (Classification and Regression Trees) - 分类回归树14)ID3 (Iterative Dichotomiser 3) - 迭代二叉树315)C4.5 (successor of ID3) - C4.5(ID3的后继者)16)C5.0 (successor of C4.5) - C5.0(C4.5的后继者)17)Split Point - 分裂点18)Decision Boundary - 决策边界19)Pruned Tree - 
剪枝后的树20)Decision Tree Ensemble - 决策树集成•Random Forest - 随机森林1)Random Forest - 随机森林2)Ensemble Learning - 集成学习3)Bootstrap Sampling - 自助采样4)Bagging (Bootstrap Aggregating) - 装袋法5)Out-of-Bag (OOB) Error - 袋外误差6)Feature Subset - 特征子集7)Decision Tree - 决策树8)Base Estimator - 基础估计器9)Tree Depth - 树深度10)Randomization - 随机化11)Majority Voting - 多数投票12)Feature Importance - 特征重要性13)OOB Score - 袋外得分14)Forest Size - 森林大小15)Max Features - 最大特征数16)Min Samples Split - 最小分裂样本数17)Min Samples Leaf - 最小叶节点样本数18)Gini Impurity - 基尼不纯度19)Entropy - 熵20)Variable Importance - 变量重要性•Support Vector Machine (SVM) - 支持向量机1)Support Vector Machine (SVM) - 支持向量机2)Hyperplane - 超平面3)Kernel Trick - 核技巧4)Kernel Function - 核函数5)Margin - 间隔6)Support Vectors - 支持向量7)Decision Boundary - 决策边界8)Maximum Margin Classifier - 最大间隔分类器9)Soft Margin Classifier - 软间隔分类器10) C Parameter - C参数11)Radial Basis Function (RBF) Kernel - 径向基函数核12)Polynomial Kernel - 多项式核13)Linear Kernel - 线性核14)Quadratic Kernel - 二次核15)Gaussian Kernel - 高斯核16)Regularization - 正则化17)Dual Problem - 对偶问题18)Primal Problem - 原始问题19)Kernelized SVM - 核化支持向量机20)Multiclass SVM - 多类支持向量机•K-Nearest Neighbors (KNN) - K-最近邻1)K-Nearest Neighbors (KNN) - K-最近邻2)Nearest Neighbor - 最近邻3)Distance Metric - 距离度量4)Euclidean Distance - 欧氏距离5)Manhattan Distance - 曼哈顿距离6)Minkowski Distance - 闵可夫斯基距离7)Cosine Similarity - 余弦相似度8)K Value - K值9)Majority Voting - 多数投票10)Weighted KNN - 加权KNN11)Radius Neighbors - 半径邻居12)Ball Tree - 球树13)KD Tree - KD树14)Locality-Sensitive Hashing (LSH) - 局部敏感哈希15)Curse of Dimensionality - 维度灾难16)Class Label - 类标签17)Training Set - 训练集18)Test Set - 测试集19)Validation Set - 验证集20)Cross-Validation - 交叉验证•Naive Bayes - 朴素贝叶斯1)Naive Bayes - 朴素贝叶斯2)Bayes' Theorem - 贝叶斯定理3)Prior Probability - 先验概率4)Posterior Probability - 后验概率5)Likelihood - 似然6)Class Conditional Probability - 类条件概率7)Feature Independence Assumption - 特征独立假设8)Multinomial Naive Bayes - 多项式朴素贝叶斯9)Gaussian Naive Bayes - 高斯朴素贝叶斯10)Bernoulli Naive Bayes - 伯努利朴素贝叶斯11)Laplace Smoothing - 拉普拉斯平滑12)Add-One Smoothing - 加一平滑13)Maximum A Posteriori (MAP) - 最大后验概率14)Maximum Likelihood Estimation (MLE) - 最大似然估计15)Classification - 分类16)Feature Vectors - 特征向量17)Training Set - 训练集18)Test Set - 测试集19)Class Label - 类标签20)Confusion Matrix - 混淆矩阵•Clustering - 聚类1)Clustering - 聚类2)Centroid - 质心3)Cluster Analysis - 聚类分析4)Partitioning Clustering - 划分式聚类5)Hierarchical Clustering - 层次聚类6)Density-Based Clustering - 基于密度的聚类7)K-Means Clustering - K均值聚类8)K-Medoids Clustering - K中心点聚类9)DBSCAN (Density-Based Spatial Clustering of Applications with Noise) - 基于密度的空间聚类算法10)Agglomerative Clustering - 聚合式聚类11)Dendrogram - 系统树图12)Silhouette Score - 轮廓系数13)Elbow Method - 肘部法则14)Clustering Validation - 聚类验证15)Intra-cluster Distance - 类内距离16)Inter-cluster Distance - 类间距离17)Cluster Cohesion - 类内连贯性18)Cluster Separation - 类间分离度19)Cluster Assignment - 聚类分配20)Cluster Label - 聚类标签•K-Means - K-均值1)K-Means - K-均值2)Centroid - 质心3)Cluster - 聚类4)Cluster Center - 聚类中心5)Cluster Assignment - 聚类分配6)Cluster Analysis - 聚类分析7)K Value - K值8)Elbow Method - 肘部法则9)Inertia - 惯性10)Silhouette Score - 轮廓系数11)Convergence - 收敛12)Initialization - 初始化13)Euclidean Distance - 欧氏距离14)Manhattan Distance - 曼哈顿距离15)Distance Metric - 距离度量16)Cluster Radius - 聚类半径17)Within-Cluster Variation - 类内变异18)Cluster Quality - 聚类质量19)Clustering Algorithm - 聚类算法20)Clustering Validation - 聚类验证•Dimensionality Reduction - 降维1)Dimensionality Reduction - 降维2)Feature Extraction - 特征提取3)Feature Selection - 特征选择4)Principal Component Analysis (PCA) - 主成分分析5)Singular Value Decomposition (SVD) - 奇异值分解6)Linear 
Discriminant Analysis (LDA) - 线性判别分析7)t-Distributed Stochastic Neighbor Embedding (t-SNE) - t-分布随机邻域嵌入8)Autoencoder - 自编码器9)Manifold Learning - 流形学习10)Locally Linear Embedding (LLE) - 局部线性嵌入11)Isomap - 等度量映射12)Uniform Manifold Approximation and Projection (UMAP) - 均匀流形逼近与投影13)Kernel PCA - 核主成分分析14)Non-negative Matrix Factorization (NMF) - 非负矩阵分解15)Independent Component Analysis (ICA) - 独立成分分析16)Variational Autoencoder (VAE) - 变分自编码器17)Sparse Coding - 稀疏编码18)Random Projection - 随机投影19)Neighborhood Preserving Embedding (NPE) - 保持邻域结构的嵌入20)Curvilinear Component Analysis (CCA) - 曲线成分分析•Principal Component Analysis (PCA) - 主成分分析1)Principal Component Analysis (PCA) - 主成分分析2)Eigenvector - 特征向量3)Eigenvalue - 特征值4)Covariance Matrix - 协方差矩阵。

A Chinese-English Glossary of Terms in Natural Language Processing and Computational Linguistics (M-Z)

machine dictionary 机器词典machine language 机器语⾔machine learning 机器学习machine translation 机器翻译machine-readable dictionary (MRD) 机读辞典Macrolinguistics 宏观语⾔学Markov chart 马可夫图Mathematical Linguistics 数理语⾔学maximum entropy 熵M-D (modifier-head) construction 偏正结构mean length of utterance (MLU) 语句平均长度measure of information 讯习测度 [信息测度] memory based 根据记忆的mental lexicon ⼼理词汇库mental model ⼼理模型mental process ⼼理过程 [智⼒过程;智⼒处理] metalanguage 超语⾔metaphor 隐喻metaphorical extension 隐喻扩展metarule 律上律 [元规则]metathesis 语⾳易位Microlinguistics 微观语⾔学middle structure 中间式结构minimal pair 最⼩对Minimalist Program 微⾔主义MLU (mean length of utterance) 语句平均长度modal 情态词modal auxiliary 情态助动词modal logic 情态逻辑modifier 修饰语Modular Logic Grammar 模组化逻辑语法modular parsing system 模组化句法剖析系统modularity 模组性(理论)module 模组monophthong 单元⾳monotonic 单调monotonicity 单调性Montague Grammar 蒙泰究语法 [蒙塔格语法] mood 语⽓morpheme 词素morphological affix 构词词缀morphological decomposition 语素分解morphological pattern 词型morphological processing 词素处理morphological rule 构词律 [词法规则] morphological segmentation 语素切分Morphology 构词学Morphophonemics 词⾳学 [形态⾳位学;语素⾳位学] morphophonological rule 形态⾳位规则Morphosyntax 词句法Motor Theory 肌动理论movement 移位MRD (machine-readable dictionary) 机读辞典MT (machine translation) 机器翻译multilingual processing system 多语讯息处理系统multilingual translation 多语翻译multimedia 多媒体multi-media communication 多媒体通讯multiple inheritance 多重继承multistate logic 多态逻辑mutation 语⾳转换mutual exclusion 互斥mutual information 相互讯息nativist position 语法天⽣假说natural language ⾃然语⾔natural language processing (NLP) ⾃然语⾔处理natural language understanding ⾃然语⾔理解negation 否定negative sentence 否定句neologism 新词语nested structure 套结构network 路neural network 类神经路Neurolinguistics 神经语⾔学neutralization 中⽴化n-gram n-连词n-gram modeling n-连词模型NLP (natural language processing) ⾃然语⾔处理node 节点nominalization 名物化nonce 暂⽤的non-finite ⾮限定non-finite clause ⾮限定式⼦句non-monotonic reasoning ⾮单调推理normal distribution 常态分布noun 名词noun phrase 名词组NP (noun phrase) completeness 名词组完全性object 宾语{语⾔学}/物件{资讯科学}object oriented programming 物件导向程式设计 [⾯向对向的程序设计] official language 官⽅语⾔one-place predicate ⼀元述语on-line dictionary 线上查询词典 [联机词点]onomatopoeia 拟声词onset 节⾸⾳ontogeny 个体发⽣Ontology 本体论open set 开放集operand 运算元 [操作对象]optimization 化 [化]overgeneralization 过度概化overgeneration 过度衍⽣paradigmatic relation 聚合关系paralanguage 附语⾔parallel construction 并列结构Parallel Corpus 平⾏语料库parallel distributed processing (PDP) 平⾏分布处理paraphrase 转述 [释意;意译;同意互训]parole ⾔语parser 剖析器 [句法剖析程序]parsing 剖析part of speech (POS) 词类particle 语助词PART-OF relation PART-OF 关系part-of-speech tagging 词类标注pattern recognition 型样识别P-C (predicate-complement) insertion 述补中插PDP (parallel distributed processing) 平⾏分布处理perception 知觉perceptron 感觉器 [感知器]perceptual strategy 感知策略performative ⾏为句periphrasis ⽤独⽴词表达perlocutionary 语效性的permutation 移位Petri Net Grammar Petri 语法philology 语⽂学phone 语⾳phoneme ⾳素phonemic analysis 因素分析phonemic stratum ⾳素层Phonetics 语⾳学phonogram ⾳标Phonology 声韵学 [⾳位学;⼴义语⾳学] Phonotactics ⾳位排列理论phrasal verb 词组动词 [短语动词]phrase 词组 [短语]phrase marker 词组标记 [短语标记]pitch ⾳调pitch contour 调形变化Pivot Grammar 枢轴语法pivotal construction 承轴结构plausibility function 可能性函数PM (phrase marker) 词组标记 [短语标记] polysemy 多义性POS-tagging 词类标记postposition ⽅位词PP (preposition phrase) attachment 介词依附Pragmatics 语⽤学Precedence Grammar 优先顺序语法precision 精确度predicate 述词predicate calculus 述词计算predicate logic 述词逻辑 [谓词逻辑]predicate-argument structure 述词论元结构prefix 前缀premodification 前置修饰preposition 介词Prescriptive Linguistics 规定语⾔学 [规范语⾔学] presentative sentence 引介句presupposition 前提Principle of Compositionality 语意合成性原理privative ⼆元对⽴的probabilistic parser 概率句法剖析程式problem 
solving 解决问题program 程式programming language 程式设计语⾔ [程序设计语⾔] proofreading system 校对系统proper name 专有名词prosody 节律prototype 原型pseudo-cleft sentence 准分裂句Psycholinguistics ⼼理语⾔学punctuation 标点符号pushdown automata 下推⾃动机pushdown transducer 下推转换器qualification 后置修饰quantification 量化quantifier 范域词Quantitative Linguistics 计量语⾔学question answering system 问答系统queue 伫列radical 字根 [词⼲;词根;部⾸;偏旁]radix of tuple 元组数基random access 随机存取rationalism 理性论rationalist (position) 理性论⽴场 [唯理论观点]reading laboratory 阅读实验室real time 即时real time control 即时控制 [实时控制]recursive transition network 递回转移路reduplication 重迭词 [重复]reference 指涉referent 指称对象referential indices 指标referring expression 指涉词 [指⽰短语]register 暂存器 [寄存器]{资讯科学}/调⾼{语⾳学}/语⾔的场合层级{社会语⾔学} regular language 正规语⾔ [正则语⾔]relational database 关联式资料库 [关系数据库]relative clause 关系⼦句relaxation method 松弛法relevance 相关性Restricted Logic Grammar 受限逻辑语法resumptive pronouns 复指代词retroactive inhibition 逆抑制rewriting rule 重写规则rheme 述位rhetorical structure 修辞结构rhetorics 修辞学robust 强健性robust processing 强健性处理robustness 强健性schema 基朴school grammar 教学语法scope 范域 [作⽤域;范围]script 脚本search mechanism 检索机制search space 检索空间searching route 检索路径 [搜索路径]second order predicate ⼆阶述词segmentation 分词segmentation marker 分段标志selectional restriction 选择限制semantic field 语意场semantic frame 语意架构semantic network 语意路semantic representation 语意表征 [语义表⽰]semantic representation language 语意表征语⾔semantic restriction 语意限制semantic structure 语意结构Semantics 语意学sememe 意素Semiotics 符号学sender 发送者sensorimotor stage 感觉运动期sensory information 感官讯息 [感觉信息]sentence 句⼦sentence generator 句⼦产⽣器 [句⼦⽣成程序]sentence pattern 句型separation of homonyms 同⾳词区分sequence 序列serial order learning 顺序学习serial verb construction 连动结构set oriented semantic network 集合导向型语意路 [⾯向集合型语意路] SGML (Standard Generalized Markup Language) 结构化通⽤标记语⾔shift-reduce parsing 替换简化式剖析short term memory 短程记忆sign 信号signal processing technology 信号处理技术simple word 单纯词situation 情境Situation Semantics 情境语意学situational type 情境类型social context 社会环境sociolinguistics 社会语⾔学software engineering 软体⼯程 [软件⼯程]sort 排序speaker-independent speech recognition ⾮特定语者语⾳识别spectrum 频谱speech ⼝语speech act assignment ⾔语⾏为指定speech continuum ⾔语连续体speech disorder 语⾔失序 [⾔语缺失]speech recognition 语⾳辨识speech retrieval 语⾳检索speech situation ⾔谈情境 [⾔语情境]speech synthesis 语⾳合成speech translation system 语⾳翻译系统speech understanding system 语⾳理解系统spreading activation model 扩散激发模型standard deviation 标准差Standard Generalized Markup Language 标准通⽤标⽰语⾔start-bound complement 接头词state of affairs algebra 事态代数state transition diagram 状态转移图statement kernel 句核static attribute list 静态属性表statistical analysis 统计分析Statistical Linguistics 统计语⾔学statistical significance 统计意义stem 词⼲stimulus-response theory 刺激反应理论stochastic approach to parsing 概率式句法剖析 [句法剖析的随机⽅法] stop 爆破⾳Stratificational Grammar 阶层语法 [层级语法]string 字串[串;字符串]string manipulation language 字串操作语⾔string matching 字串匹配 [字符串]structural ambiguity 结构歧义Structural Linguistics 结构语⾔学structural relation 结构关系structural transfer 结构转换structuralism 结构主义structure 结构structure sharing representation 结构共享表征subcategorization 次类划分 [下位范畴化]subjunctive 假设的sublanguage ⼦语⾔subordinate 从属关系subordinate clause 从属⼦句 [从句;⼦句]subordination 从属substitution rule 代换规则 [置换规则]substrate 底层语⾔suffix 后缀superordinate 上位的superstratum 上层语⾔suppletion 异型[不规则词型变化]suprasegmental 超⾳段的syllabification ⾳节划分syllable ⾳节syllable structure constraint ⾳节结构限制symbolization and verbalization 符号化与字句化synchronic 同步的synonym 同义词syntactic category 句法类别syntactic constituent 句法成分syntactic rule 语法规律 [句法规则] Syntactic Semantics 句法语意学syntagm 句段syntagmatic 组合关系 [结构段的;组合的] Syntax 句法Systemic 
Grammar 系统语法tag 标记target language ⽬的语⾔ [⽬标语⾔]task sharing 课题分享 [任务共享] tautology 套套逻辑 [恒真式;重⾔式;同义反复] taxonomical hierarchy 分类阶层 [分类层次] telescopic compound 套装合并template 模板temporal inference 循序推理 [时序推理] temporal logic 时间逻辑 [时序逻辑] temporal marker 时貌标记tense 时态terminology 术语text ⽂本text analyzing ⽂本分析text coherence ⽂本⼀致性text generation ⽂本⽣成 [篇章⽣成]Text Linguistics ⽂本语⾔学text planning ⽂本规划text proofreading ⽂本校对text retrieval ⽂本检索text structure ⽂本结构 [篇章结构]text summarization ⽂本⾃动摘要 [篇章摘要] text understanding ⽂本理解text-to-speech ⽂本转语⾳thematic role 题旨⾓⾊thematic structure 题旨结构theorem 定理thesaurus 同义词辞典theta role 题旨⾓⾊theta-grid 题旨格token 实类 [标记项]tone ⾳调tone language ⾳调语⾔tone sandhi 连调变换top-down 由上⽽下 [⾃顶向下]topic 主题topicalization 主题化 [话题化]trace 痕迹Trace Theory 痕迹理论training 训练transaction 异动 [处理单位]transcription 转写 [抄写;速记翻译]transducer 转换器transfer 转移transfer approach 转换⽅法transfer framework 转换框架transformation 变形 [转换]Transformational Grammar 变形语法 [转换语法] transitional state term set 转移状态项集合transitivity 及物性translation 翻译translation equivalence 翻译等值性translation memory 翻译记忆transparency 透明性tree 树状结构 [树]Tree Adjoining Grammar 树形加接语法 [树连接语法] treebank 树图资料库[语法关系树库]trigram 三连词t-score t-数turing machine 杜林机 [图灵机]turing test 杜林测试 [图灵试验]type 类型type/token node 标记类型/实类节点type-feature structure 类型特征结构typology 类型学ultimate constituent 终端成分unbounded dependency ⽆界限依存underlying form 基底型式underlying structure 基底结构unification 连并 [合⼀]Unification-based Grammar 连并为本的语法 [基于合⼀的语法] Universal Grammar 普遍性语法universal instantiation 普遍例式universal quantifier 全称范域词unknown word 未知词 [未定义词]unrestricted grammar ⾮限制型语法usage flag 使⽤旗标user interface 使⽤者界⾯ [⽤户界⾯]Valence Grammar 结合价语法Valence Theory 结合价理论valency 结合价variance 变异数 [⽅差]verb 动词verb phrase 动词组 [动词短语]verb resultative compound 动补复合词verbal association 词语联想verbal phrase 动词组verbal production ⾔语⽣成vernacular 本地话V-O construction (verb-object) 动宾结构vocabulary 字汇vocabulary entry 词条vocal track 声道vocative 呼格voice recognition 声⾳辨识 [语⾳识别]vowel 母⾳vowel harmony 母⾳和谐 [元⾳和谐]waveform 波形weak verb 弱化动词Whorfian hypothesis Whorfian 假说word 词word frequency 词频word frequency distribution 词频分布word order 词序word segmentation 分词word segmentation standard for Chinese 中⽂分词规范word segmentation unit 分词单位 [切词单位]word set 词集working memory ⼯作记忆 [⼯作存储区]world knowledge 世界知识writing system 书写系统X-Bar Theory X标杠理论 ["x"阶理论]Zipf's Law 利夫规律 [齐普夫定律]。

Artificial Intelligence Vocabulary


Multi-Granularity Metamorphic Testing for Neural Machine Translation Systems

Journal of Software (Ruan Jian Xue Bao), 2021, 32(4): 1051–1066 [doi: 10.13328/j.cnki.jos.006221]
ISSN 1000-9825, CODEN RUXUEW. © Institute of Software, Chinese Academy of Sciences. All rights reserved.

Multi-Granularity Metamorphic Testing for Neural Machine Translation Systems

ZHONG Wen-Kang¹, GE Ji-Dong¹, CHEN Xiang², LI Chuan-Yi¹, TANG Ze¹, LUO Bin¹
¹ State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China
² School of Information Science and Technology, Nantong University, Nantong 226019, China
Corresponding author: LI Chuan-Yi, E-mail: ***********.cn

Foundation item: National Natural Science Foundation of China (61802167, 61972197, 61802095); Natural Science Foundation of Jiangsu Province of China (BK20201250). Recommended by the guest editors of the special issue on domain-oriented software system construction and quality assurance: PAN Min-Xue, WEI Jun, and CUI Zhan-Qi. Received 2020-09-12; revised 2020-10-26; accepted 2020-12-19; published online 2021-01-22.

Citation: Zhong WK, Ge JD, Chen X, Li CY, Tang Z, Luo B. Multi-granularity metamorphic testing for neural machine translation system. Ruan Jian Xue Bao/Journal of Software, 2021, 32(4): 1051–1066 (in Chinese). /1000-9825/6221.htm

Abstract: The machine translation task focuses on converting one natural language into another. In recent years, neural machine translation models based on sequence-to-sequence models have achieved better performance than traditional statistical machine translation models on multiple language pairs, and have been adopted by many translation service providers. Although the practical application of commercial translation systems shows that neural machine translation models have improved greatly, how to systematically evaluate their translation quality is still a challenging task. On the one hand, if the translation effect is evaluated against reference texts, the acquisition cost of high-quality references is very high. On the other hand, compared with statistical machine translation models, neural machine translation models have more significant robustness problems, yet there are no related studies on their robustness. This study proposes a multi-granularity test framework, MGMT, based on metamorphic testing, which can evaluate the robustness of neural machine translation systems without reference translations.
The testing framework first replaces the source sentence at sentence granularity, phrase granularity, and word granularity respectively, then compares the translation results of the source sentence and the replaced sentences based on the constituency parse tree, and finally judges whether the results satisfy the metamorphic relations. Experiments are conducted on multi-domain Chinese-English translation datasets, six industrial neural machine translation systems are evaluated, and the framework is compared with metamorphic testing methods of the same type and with methods based on reference translations. The experimental results show that the proposed method MGMT is 80% and 20% higher than similar methods in terms of Pearson's correlation coefficient and Spearman's correlation coefficient respectively. This indicates that the proposed non-reference evaluation method has a higher positive correlation with reference-based evaluation, which verifies that MGMT's evaluation accuracy is significantly better than that of other methods of the same type.

Key words: neural network; machine translation; quality estimation; metamorphic test; multi-granularity

1 Introduction

Machine translation studies how to automatically translate text written in one natural language into text written in another natural language, and is an important research problem in natural language processing. Traditional machine translation systems mainly adopt statistical machine translation models [1]. In recent years, with the development and application of deep learning, neural machine translation models based on sequence-to-sequence models [2] have surpassed statistical machine translation models on machine translation tasks for many language pairs. Neural machine translation models not only have high research value but also strong industrial applicability [3]; the mainstream translation service providers (e.g., Google Translate, Bing Translator, Baidu Translate, Tencent Translate) all offer online neural machine translation services.

Although neural machine translation has brought great performance improvements to the machine translation task, some problems remain. For example, the translation of long sentences and low-frequency words is poor, and translation results can be inconsistent with word-alignment models [4]; moreover, the patterns and causes of these errors are often hard to discover. Compared with statistical machine translation models, neural machine translation systems also have more significant robustness problems [4]. Cheng et al. [5] pointed out that a tiny change to the input sentence may cause a drastic change in the translation result, like a "butterfly effect". In addition, there are many commercial neural machine translation systems, but owing to differences in network architectures and training data, their stability varies. Figures 1 and 2 show the different results of Google's and Baidu's neural machine translation systems when translating three similar sentences. The three English sentences are identical in structure and differ only in the meaning of the final word, yet their translations by the two systems diverge considerably. Google Translate mistranslates the second and third sentences, while Baidu Translate mistranslates the first.

Fig. 1 Translation errors of Google's neural machine translation system

Fig. 2 Translation errors of Baidu's neural machine translation system

Clearly, evaluating the translation robustness of neural machine translation systems has important practical and research significance, yet related research is still lacking. Traditional machine translation quality evaluation usually adopts reference-based methods and focuses on translation correctness, whereas the neural network models used by neural machine translation systems cause more significant robustness problems than traditional statistical models, urgently requiring reasonable testing means and evaluation metrics. Testing and evaluating neural machine translation systems is challenging in two main respects.

The difficulty of testing neural network models. Neural machine translation systems employ neural network models that lack interpretability and understandability [6]. During translation, the sentence to be translated is converted into multi-dimensional vectors inside the network; this conversion involves many intricate steps and parameters, and it is hard to understand the actual meaning of each step. Moreover, neural network models depend strongly on training data: the same architecture trained on different datasets yields quite different parameter values, so the stability of the output is low.

The difficulty of evaluating the machine translation task. Translation quality is usually evaluated against references: given a human translation, the system output is compared with it and quantified by similarity metrics. This approach depends entirely on the quality of the references, and high-quality references are difficult and costly to obtain.

The testing difficulty of neural network models means that white-box testing is of low feasibility, while black-box testing of machine translation models is usually reference-based and costly. To address these challenges and enable effective robustness evaluation of neural machine translation systems without reference translations, this study proposes a multi-granularity metamorphic testing framework, MGMT (multi-granularity metamorphic test), based on the idea of metamorphic testing. MGMT is the first to adopt multi-granularity metamorphic testing for quality evaluation: it defines metamorphic relations and similarity computation methods at the sentence, phrase, and word granularities, performs metamorphic testing of every sentence at the three granularities based on these relations, and finally uses the satisfaction rate of the metamorphic relations as a quantitative robustness metric for the system (a computational sketch of this metric is given at the end of this section). We also conduct an empirical study based on MGMT, using the English sentences of five domains (education, microblog, news, spoken language, and subtitles) of the public Chinese-English translation dataset UM-Corpus [7] as the source data, and evaluating widely used large-scale neural machine translation systems (Google Translate [8], Bing Translator [9], Baidu Translate [10], Alibaba Translate [11], Tencent Translate [12], and Sogou Translate [13]). Finally, taking the Chinese sentences in the dataset as references and the reference-based method as the baseline, we compare MGMT with metamorphic testing methods of the same type to show that MGMT is significantly more accurate.

Section 2 of this paper summarizes existing work on quality evaluation and testing of neural machine translation systems. Section 3 introduces the proposed multi-granularity metamorphic testing framework, describing the testing process, the definitions of the metamorphic relations, and the similarity computation methods. Section 4 reports experiments on six mainstream commercial neural machine translation systems over a multi-domain translation dataset, comparing metamorphic testing methods of the same type against the reference-based method to demonstrate the effectiveness of our approach. Section 5 concludes and outlines possible future work.
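Before turning to related work, the quantitative robustness metric mentioned above, the satisfaction rate of the metamorphic relations over a test set, can be illustrated with a minimal Python sketch. This is not from the paper; the per-relation check functions are hypothetical stand-ins for the relations defined in Section 3.2.

    from typing import Callable, Dict, Iterable

    def mr_satisfaction_rates(sentences: Iterable[str],
                              checks: Dict[str, Callable[[str], bool]]) -> Dict[str, float]:
        """Fraction of test sentences that satisfy each metamorphic relation."""
        sents = list(sentences)
        return {name: sum(check(s) for s in sents) / len(sents)
                for name, check in checks.items()}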
2 Related Work

Traditional machine translation quality evaluation does not distinguish correctness from robustness, and usually measures system quality by translation quality. Eirini [14] summarized two classes of translation quality evaluation methods. One is manual evaluation, where professional translators judge the quality of the translation; its results are closest to reality, but the time and labour costs are high. The other is reference-based evaluation: given a reference translation, the machine output is compared with the reference using similarity metrics, the most common being BLEU [15], METEOR [16], and WER [17]. Reference-based methods cost less than manual evaluation, but high-quality references are hard to obtain, so the cost is still considerable.

Evaluating a neural machine translation system without reference translations is a difficult task. The neural network models used by such systems have large parameter scales and weak understandability, and the test oracle problem is pervasive. The test oracle problem [18] refers to the need, for a given input, to specify the expected output against which the correctness of the system's actual output is judged. Wang et al. [19] summarized the common approaches to the oracle problem for deep neural network systems and divided them into two classes. The first is based on differential testing [20]: the same input is fed to several implementations of the same specification, and a fault is reported when their outputs differ. The second is based on metamorphic testing [21]: metamorphic relations are defined to describe how changes in the input of a system relate to changes in its output.

In prior work on quality evaluation of neural machine translation systems, metamorphic testing approaches are common, and their key lies in the definition of the metamorphic relations. Milam et al. [22] showed the effectiveness of round-trip translation (RTT) for testing machine translation systems without references. The metamorphic relation built on RTT is: a source sentence translated into the target language and then back into the source language should be the same as the original sentence. Daniel et al. [23] proposed MCMT (Monte Carlo combined metamorphic test), combining a Monte Carlo randomized algorithm with metamorphic testing to measure system quality. It defines a relation similar to RTT [22]: translating the source language directly into the target language should give the same result as first translating it, via a randomly chosen intermediate language, into the target language. Zhou et al. [24], building on the work of Daniel et al. [23], proposed MT4MT, which uses a word-substitution-based relation: replacing one word in the source sentence should not affect the structure of the translated sentence. MT4MT also designed some simple, targeted substitution rules.

In addition, some work studies how to locate translation errors without reference translations. He et al. [25] proposed structure-invariant testing (SIT) to discover translation errors; structure invariance means that sentences with similar contextual meaning should have the same structure. Concretely, a word in the source sentence is replaced via the BERT masked language model [26] to generate contextually similar sentences, and the structural similarity of the two sentences' translations is then compared. Zheng et al. [27] proposed an automatic testing method that discovers over-translation and under-translation errors of neural machine translation systems through phrase recognition and relation learning. Shashij et al. [28] proposed an automatic detection method for translation errors that uses the constituency parse tree to isolate phrases in a sentence and discovers errors by comparing the translation of a phrase within the sentence with its translation in isolation. Sun et al. [29] proposed TransRepair, a framework combining testing and repair, whose testing stage also generates contextually similar sentences by word substitution.

However, the above metamorphic testing work still has shortcomings. MCMT [23] chooses the intermediate language randomly, but translation quality differs considerably across languages, which perturbs the experiment. The substitution rules of MT4MT [24] are too subjective, and the range of replaceable words is small. Moreover, existing metamorphic testing work employs only a single metamorphic relation, so the experimental results are not convincing.

To address these shortcomings, this paper proposes the multi-granularity metamorphic testing framework MGMT, which can evaluate the robustness of neural machine translation systems without reference translations. MGMT differs considerably from existing methods. First, its purpose is different: existing work uses metamorphic testing to evaluate translation performance (e.g., RTT [22], MCMT [23]) or to locate mistranslated samples (e.g., SIT [25], TransRepair [29]), focusing mainly on translation correctness, whereas MGMT uses metamorphic testing to evaluate the overall robustness of a system, focusing on translation stability. Specifically, like RTT [22], MCMT [23], and MT4MT [24], MGMT evaluates systems holistically without references, but while those methods rely on a single metamorphic relation to evaluate translation quality, MGMT aims to evaluate robustness in a reasonable way and therefore uses three metamorphic relations that match translation intuition (see the definitions in Section 3.2). The generation of phrase- and word-granularity test samples in MGMT is inspired by SIT [25] and TransRepair [29] and uses substitution, but SIT [25] aims to discover as many translation errors as possible and therefore adopts as many similarity measures as possible, independent of the substitution method, whereas MGMT, to keep the robustness evaluation sound, pairs each substitution method with a corresponding similarity computation method.

3 Multi-Granularity Metamorphic Testing Framework

This section first introduces the overall architecture and testing process of the framework (Section 3.1), then the design of its main modules, including the definitions of the metamorphic relations at the sentence, phrase, and word granularities (Section 3.2), how MGMT selects the constituents to be replaced and performs the replacement (Section 3.3), and how similarity is computed at each granularity (Section 3.4).

3.1 Overall Architecture

The proposed multi-granularity metamorphic testing framework MGMT consists of three parts. Figure 3 shows the main flow from the input of a source sentence to the output of the metamorphic-relation judgment.

(1) Select the constituents of the source sentence to be replaced. According to the metamorphic relations defined in MGMT (Section 3.2), the source sentence is replaced at the sentence, phrase, and word granularities, so the constituents to be replaced must first be selected at the three granularities. In MGMT's design, the constituent to be replaced at sentence granularity is the whole source sentence. Then the word and phrase to be replaced are selected: we perform constituency parsing on the source sentence to obtain its constituency parse tree, and then select on the tree with the DeepSelect algorithm (Section 3.3.1). In the example of Figure 3, based on the parse tree we select a word tagged NNP (proper noun, singular) as the word to replace and a phrase tagged ADJP (adjective phrase) as the phrase to replace.

(2) Replace the constituents of the source sentence. Sentence-granularity replacement is based on the idea of RTT [22]. RTT comprises forward translation (FT) and backward translation (BT): forward translation renders the text from the source language into the target language, and backward translation renders the forward-translation result back into the source language. We first forward-translate the source sentence into the target language and then back-translate it into the source language to obtain the sentence-granularity replacement. Phrase- and word-granularity replacement is based on the BERT masked language model [26]: the word and phrase selected in (1) are replaced by the mask token and fed into BERT, which predicts the masked positions according to the sentence context; the predictions then replace the word and phrase at the same positions in the source sentence, yielding the word- and phrase-granularity replacements.

(3) Translate and compute the similarity of the translation results. The source sentence and the three replaced sentences from (2) are fed into the neural machine translation system to obtain four target-language translations, and the similarity between the translation of each replaced sentence and that of the source sentence is computed. At the sentence granularity, the similarity of the source-language sentence pair and of the target-language sentence pair is computed from the edit distance, as sketched below. At the phrase and word granularities, since the constituents to be replaced were selected on the constituency parse tree, the similarity of the target-language sentence pair is also computed on the constituency parse trees. Finally, whether the metamorphic relations defined by MGMT (Section 3.2) are satisfied is judged from the similarity results.

Fig. 3 Process of the multi-granularity metamorphic testing framework
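Step (3) relies on an edit-distance-based similarity for sentence pairs. The paper does not spell out the normalisation here, so the sketch below assumes the common choice of Levenshtein distance normalised by the length of the longer string; this is an assumption for illustration, not the paper's exact definition.

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance over characters.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def similarity(a: str, b: str) -> float:
        """1.0 for identical strings, approaching 0.0 as they diverge."""
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))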
3.2 Definitions of the Metamorphic Relations

To evaluate the translation robustness of neural machine translation systems reasonably with metamorphic testing, this section defines metamorphic relations at the sentence, phrase, and word granularities. The three definitions rest on common-sense inferences about the translation task. The sentence-granularity relation rests on: the direct translation of a source sentence and the translation obtained after several rounds of translation should be close. The phrase- and word-granularity relations rest on: if a small part of the source sentence is changed, the translation of its other parts should not change. Sections 3.2.1, 3.2.2, and 3.2.3 detail the definitions and judgment of the relations at the sentence, phrase, and word granularities respectively.

3.2.1 Sentence-Granularity Metamorphic Relation

RTT [22] is a common means of testing machine translation systems when no references are available: the source text is forward-translated into the target language, the target-language result is back-translated into the source language, and system quality is assessed by comparing the two source-language sentences. On top of RTT's forward-backward flow we add one more forward translation, and thereby define the sentence-granularity metamorphic relation MR_sl.

Definition 1 (sentence-granularity metamorphic relation MR_sl). Let S be the source-language sentence. Forward-translating S with the neural machine translation system yields the target-language result S_t; back-translating S_t into the source language yields S_1; forward-translating S_1 once more into the target language yields S_t1. Then S, S_1, S_t, and S_t1 should satisfy

    Similarity(S_t, S_t1) / Similarity(S, S_1) >= 1    (1)

Formula (1) evaluates sentence-granularity translation robustness by the ratio of the similarity of the target-language sentence pair to that of the source-language sentence pair, so as to remove the influence of back-translation. MGMT actually evaluates the robustness of the system on source-to-target (forward) translation, but the sentence-granularity replacement (Figure 3) involves one back-translation. The model used for back-translation is independent of the one used for forward translation, so errors introduced during back-translation affect the second forward translation. For example, if in some test the forward translation is of very high quality while the back-translation is poor, evaluating quality directly by Similarity(S, S_1) or Similarity(S_t, S_t1) would yield a value far from the true one. We therefore use Similarity(S_t, S_t1)/Similarity(S, S_1) in formula (1) as the evaluation value, which introduces a compensation factor for low-quality back-translation: if the back-translation is poor (Similarity(S, S_1) is small), the forward-translation quality score is partially compensated (the ratio grows); if the back-translation is good (Similarity(S, S_1) is close to 1), the true robustness is close to the target-language pair similarity Similarity(S_t, S_t1), and the evaluation value of formula (1) is likewise close to Similarity(S_t, S_t1).

3.2.2 Phrase-Granularity Metamorphic Relation

A sentence is composed of words, and different words form different phrase structures. Taking English as an example, phrase structures include the noun phrase (NP), verb phrase (VP), prepositional phrase (PP), adverb phrase, and so on. After a phrase structure in the source sentence is replaced with a similar phrase structure, the translations of the source sentence and of the replaced sentence produced by the neural machine translation system should have the same structure. We thus define the phrase-granularity metamorphic relation MR_pl.

Definition 2 (phrase-granularity metamorphic relation MR_pl). Let S be the source sentence, and let S_p be the structurally similar sentence obtained by replacing some phrase of S. Translating S and S_p into the target language yields S_t and S_pt. Then S_t and S_pt should satisfy

    StructureSimilarity(S_t, S_pt) = 1    (2)

Formula (2) means that the translations S_t and S_pt of the source sentence S and the phrase-replaced sentence S_p should be identical in structure. We compute the structural similarity of S_t and S_pt with the similarity method based on the constituency parse tree (Section 3.3.3). Structural similarity ranges over [0, 1]: 0 means the parse-tree structures of the two sentences are completely different, and 1 means they are completely the same.

3.2.3 Word-Granularity Metamorphic Relation

A sentence is composed of words; different words have different parts of speech and sit in different structural blocks of the sentence. If a word in the source sentence is replaced by a similar word fitting the same context, the translations of the source sentence and of the replaced sentence should be identical in structure. We thus define the word-granularity metamorphic relation MR_wl.

Definition 3 (word-granularity metamorphic relation MR_wl). Let S be the source sentence, and let S_w be the structurally similar sentence obtained by replacing some word of S. Translating S and S_w into the target language yields S_t and S_wt. Then S_t and S_wt should satisfy

    StructureSimilarity(S_t, S_wt) = 1    (3)

Formula (3) means that the translations S_t and S_wt of the source sentence S and the word-replaced sentence S_w should be identical in structure. Again, the structural similarity is computed with the parse-tree-based method (Section 3.3.3) and ranges over [0, 1], with 0 for completely different tree structures and 1 for identical ones.
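For concreteness, a minimal sketch of checking Definition 1 follows. This is not the paper's implementation: the `translate` callable is a hypothetical wrapper around the system under test, `similarity` is the normalised edit distance sketched after Section 3.1, and the small guard against a zero denominator is our addition, not part of formula (1).

    from typing import Callable

    # (text, source_language, target_language) -> translated text
    Translate = Callable[[str, str, str], str]

    def check_mr_sl(s: str, translate: Translate,
                    src: str = "en", tgt: str = "zh") -> bool:
        s_t = translate(s, src, tgt)       # forward translation of S
        s_1 = translate(s_t, tgt, src)     # back-translation to the source language
        s_t1 = translate(s_1, src, tgt)    # second forward translation
        back = max(similarity(s, s_1), 1e-9)   # guard against a zero denominator
        # Formula (1): target-pair similarity, compensated by back-translation quality
        return similarity(s_t, s_t1) / back >= 1.0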
3.3 Replacement

3.3.1 Selecting the Constituents to Replace

According to the metamorphic relations defined by MGMT (Section 3.2), each test sample undergoes replacement-based metamorphic testing at three granularities, so the constituents of the source sentence to be replaced must first be selected at the sentence, phrase, and word granularities. At sentence granularity, the constituent to replace is the whole source sentence; at phrase and word granularity it is some phrase or word in the sentence. In MGMT, the phrase- and word-granularity constituents are selected on the constituency parse tree of the source sentence. Figure 4 shows the constituency parse tree of an English sentence produced by BerkeleyParser [30]: the sentence is organized into a tree structure according to word and phrase part-of-speech tags. The nodes of the tree are built from the words of the sentence, and every subtree is a combination of some words, so choosing a word or a phrase in the sentence is equivalent to choosing the path to some subtree in the parse tree. Based on these observations, we design DeepSelect, a selection algorithm over the constituency parse tree, to choose the constituents to replace at the phrase and word granularities.

Algorithm 1. DeepSelect.
Input: source sentence Sentence, candidate-set size Candidatenum, phrase part-of-speech set PhrasePOS;
Output: path to the constituent to replace, Finalpath.
1.  create the candidate set Candidates of replaceable paths, initialized to the empty set
2.  Tree = BerkeleyParser(Sentence)
3.  traverse Tree, add every path of Tree (including sub-paths) to the path set PathSet, and sort the elements of PathSet by path length in descending order
4.  if the current granularity is word granularity then
5.      for each path in PathSet do
6.          if length(Candidates) < Candidatenum then
7.              Candidates.append(path)
8.  elif the current granularity is phrase granularity then
9.      for each path in PathSet do
10.         if length(Candidates) < Candidatenum and Tree[path] in PhrasePOS then
11.             Candidates.append(path)
12. Finalpath = random.choose(Candidates)
13. return Finalpath

DeepSelect selects the constituents to replace at the phrase and word granularities. First, the BerkeleyParser [30] parser generates the constituency parse tree of the sentence. Since nodes on deeper paths of the tree generally have smaller granularity, and selecting them better matches MGMT's metamorphic relation definitions (Section 3.2), the paths of the tree are collected into the path set PathSet, sorted from longest to shortest. Then, at word granularity, Candidatenum paths are added directly to the candidate set; at phrase granularity an extra check is required: the part of speech of the path's node must be a phrase-structure tag. Finally, to keep path selection fair, the algorithm randomly chooses one path from the candidate set as the final path to replace.

3.3.2 Replacing the Constituents

Sentence-granularity replacement uses round-trip translation: the source sentence is fed into the neural machine translation system to obtain the direct target-language translation, which is then translated back into the source language, yielding the sentence-granularity replacement sentence.

Phrase- and word-granularity replacement uses the BERT masked language model [26]. BERT is a highly successful natural language understanding model that reaches state-of-the-art (SOTA) results on many natural language processing tasks after fine-tuning. In BERT the vector of a word is not unique but depends on the word's context, so word vectors that fit the sentence semantics can be obtained. The model is trained mainly on two tasks, masked word prediction and next sentence prediction; masked word prediction masks 15% of the words in a sentence, and the model is trained to predict these masked words.

Fig. 4 Example of a constituency parse tree
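As an illustration of the replacement in Section 3.3.2, the sketch below uses the Hugging Face fill-mask pipeline as a stand-in for the paper's BERT setup; the model name, the single-word (single-token) restriction, and the top-k value are assumptions for illustration, not the paper's exact configuration.

    from transformers import pipeline

    # Model choice is an assumption; the paper's exact BERT setup may differ.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    def word_replacements(sentence: str, target_word: str, k: int = 5) -> list:
        """Mask one occurrence of `target_word` and return MLM-predicted variants."""
        masked = sentence.replace(target_word, fill_mask.tokenizer.mask_token, 1)
        candidates = fill_mask(masked, top_k=k)
        # Keep predicted sentences whose filled token differs from the original word.
        return [c["sequence"] for c in candidates
                if c["token_str"].strip().lower() != target_word.lower()]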


[Proceedings of the Evaluators' Forum, April 21st – 24th, 1991, Les Rasses, Vaud, Switzerland; ed. Kirsten Falkedal (Geneva: ISSCO).]

Constructive Machine Translation Evaluation

Stephen Minnis
Mitsubishi Electric Corporation

Abstract

It is now acknowledged that the evaluation of the quality of MT output is inextricably linked to the purpose to which the translation output will be put. It is also true to say that the value of the evaluation is inseparably linked to the purpose to which the evaluation results will be put.

For the developer of an MT system, the evaluation of the quality of the MT output must be approached from the viewpoint of increasing the knowledge about the MT system. Resultant measures must be analysed, so that practical feedback to improve the system is feasible. However, for the manager, evaluation of the MT output is often viewed in terms of comparison. Measures are compared against previous measures, or against those obtained by other systems, in order to gauge progress, or to assess the system's ability.

Measurement can be viewed as a tool for increasing the knowledge of some object or entity. From both viewpoints described above, the measurement is required as a means to increase the knowledge about the system, whether it is knowledge about the system's errors, or performance.

However, there is a more fundamental level at which measurement should be applied as a tool for increasing knowledge; that is, to increase the knowledge of the properties we are trying to measure (in this case intelligibility and fidelity). Such measurement is a precursory requirement for more general uses of evaluation measures, as described above.

When surveying the many methods currently employed in MT evaluation, it is not immediately obvious that the methods used serve to increase the knowledge of the properties being measured. This report describes a constructive machine translation evaluation method aimed at addressing this issue.

Introduction

The move towards the use of measurement is sometimes justified by quoting the physicist, Lord Kelvin [Cook 1982] (see figure 1). The premise is that measurement serves as a tool for increasing our understanding or knowledge. There is, however, a danger in attempting to measure an entity when that entity is not fully understood, as it is possible to measure incorrectly (see figure 1, second quote [Hamer 1985]).

Figure 1: Measurement as a tool for increasing knowledge

Lord Kelvin: "When you can measure what you are speaking about, and express it in numbers you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind."

George Miller: "In truth, a good case could be made that if your knowledge is meagre and unsatisfactory, the last thing in the world you should do is make measurements. The chance is negligible that you will measure the right things accidentally."

Current MT evaluation methods do not immediately lend themselves to increasing the knowledge of what we are measuring. As an example, do the methods for measuring intelligibility serve to increase the knowledge of what makes a sentence intelligible? In fact this statement begs the question 'what is intelligibility?'. If this issue is not addressed, then evaluation techniques centering on measuring intelligibility and fidelity can only be steeped in subjectivity. These vague notions will always mean different things to different people.
Furthermore, taking 'measures' by rating sentences on a subjective scale (cf ALPAC and other evaluation methods) only adds to this subjectivity. In order for MT evaluation to progress from this predicament a more constructive machine translation evaluation method is required.

A constructive evaluation method is one in which the evaluation method is applied in a manner which increases the knowledge about the translation system (from a variety of viewpoints), in addition to enhancing our knowledge of the properties being assessed (intelligibility and fidelity).

To develop this evaluation method, the approach taken in this document is to (briefly) review basic measurement theory, and then attempt to apply the important concepts to MT evaluation. The end result is an evaluation method which seems to offer some advantages over current evaluation methods.

Basic Measurement Theory

Measurement is one mechanism we use to describe particular properties of entities around us. Statements such as 'I am taller than you', 'My pen is redder than yours', 'I am more knowledgeable than him' etc. are all based on some underlying theory of measurement. An informal definition of measurement (given in [Finkelstein and Leaning 1984]) is shown in figure 2.

Figure 2: Basic measurement theory. Definition: "Measurement is the process of empirical, objective assignment of numbers to the properties of objects and events in the real world in such a way as to describe them."

To illustrate by way of example, if 'people' are taken as the object of interest, we observe that some people are taller than others. Therefore a property (or attribute) that people possess is height. When we assign numbers to people's height, the relationship 'taller than' is preserved in the measurement chosen. The assignment of numbers requires standard procedures of measurement, and standard units representing the measurement.

For example, if we measure a person standing upright against a wall, using the metre scale, we can state that a person who is 1.6m tall is taller than someone who is 1.4m tall. Therefore the observed property 'height' is captured by the measurement (as 1.6 > 1.4). Note that the procedure of measurement should be more strictly defined, for example, to take into account whether shoes are worn, whether hair height is counted and so on.

This simple description of the application of measurement must be revised when the object¹ to be measured is very complex. In this case the usual approach is to simplify the object of interest, or at least some particular feature of the object, in the form of a model.

As an example, suppose we want to measure the 'complexity' attribute of a software program. It is obvious that many factors will affect this complexity, including the control structure, the data structure, and the length of the program. To measure the control structure, one approach could be to develop a graphical model of the control flow, and measure the structure of that graph² (see figure 3). The assumption is that the structure of the graph has an intimate correlation with the 'complexity'.

¹ All references to objects in the remainder of this document should also be read as applying to events.
² This approach has been taken in the field of Software Measurement [McCabe 1976, Fenton-Whitty 1987].

Figure 3: Using models to abstract properties. The control structure of [A] is more 'complex' than that of [B].
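To make the graph-measurement idea concrete: one classic instance, from the McCabe work cited in the footnote above, scores a control-flow graph by its cyclomatic complexity V(G) = E - N + 2P, for a graph with E edges, N nodes, and P connected components. A minimal sketch, not from this paper:

    def cyclomatic_complexity(edges: list, nodes: set, components: int = 1) -> int:
        # V(G) = E - N + 2P for a control-flow graph
        return len(edges) - len(nodes) + 2 * components

    # e.g. a straight-line program of three statements has complexity 1:
    # cyclomatic_complexity([("a", "b"), ("b", "c")], {"a", "b", "c"}) == 1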
There are essentially two approaches to modelling. The first one, demonstrated above, is the structural approach, where the problem is decomposed into manageable components, eg the control structure is one component of the overall complexity. The second approach is statistical in nature, where many aspects of the object (eg the code) are measured (eg its length, number of goto statements etc.), and these are utilized in a statistical model. Typically, statistical models are used when the object/property being measured is not well understood. In practice, often an amalgamated model is developed, employing features of both these approaches.

Both types of models are useful, but all such models must be validated. Validation is when something can be confirmed as relating to reality, usually shown by objective and repeatable experimentation. Some of the statements given at the start of this section can be validated, eg the statements relating to height and colour; however, knowledgeability is more difficult to validate. It might be possible to validate some aspects of knowledgeability, by using particular models which describe knowledgeability in measurable terms. As an example, the winner of a mastermind quiz about 'Chinese sauces used in Mongolia between the years 1912-1918' might be considered more knowledgeable than the losers (assuming they get the same questions etc.), but strictly speaking the measurement only shows that the winner was more knowledgeable on that topic, over that particular set of questions. It does not constitute a validation of the wider notion that the winner is more knowledgeable than the losers³.

³ If the set of questions in the quiz was always the same, then eventually everyone would become familiar with the answers. The initial quiz then becomes useless (cf MT benchmark corpus evaluation?).

Clearly, interpretation of the results is important, and must be done carefully with full analysis of the implications of the measurement method employed. This is particularly important when the property being measured is complex and abstract. It should also be clear that there is an advantage in defining vague attributes in measurable terms, in order that we can draw some conclusions about them.

Usually once a model is developed, some aspects or features of the model are used to hypothesise or explain certain relationships observed for attributes of the object. For example, in the program complexity example it has been hypothesised that the nesting of control loops increases the complexity of the control structure. The measures used attempt to capture this increased 'structural complexity' by assigning appropriate weighting to nesting in the measurement employed.

To summarize, the tasks in the application of measurement are shown in figure 4. It is important to remember that each attribute of interest will possess its own specific assumptions about the object or model, and will therefore require its own specific metric, or measurement procedure.

Figure 4: Applying measurement theory
* to define the object or event of interest (this may be a model)
* to establish some intuitively observed property of that object or event
* to define the standard measurement procedures, rules and units, in order to capture the relationship that the attribute imposes on the object or event

Measurement of MT Output

The three steps in the application of measurement are approached in the following manner. First, the attributes of interest are introduced, as they have more or less already been decided by work in MT evaluation over the past two decades. After this, the object of interest, and how it can be modelled, is discussed. This step will necessarily include a description of how the attributes can be defined in measurable terms, on the particular model chosen. Note that in order to restrict the scope and length of the paper, there has been little attempt to pursue, in detail, the third (and most difficult) task, that of defining the standard measurement procedures, rules and units. The tentative approach presented aims to outline only the main features of the proposed measurement procedure.

Identification of Intuitively Recognised Properties

The choice of the two attributes of interest, intelligibility and fidelity, is dictated by the work in MT evaluation over the past few decades.

The attribute of intelligibility naturally comes from the intuitive observation that some machine translated texts are more intelligible than others. We want our resultant measures to represent the relationship 'more intelligible than': a text with a higher intelligibility rating will be more intelligible than one with a lower rating⁴.

⁴ Note that a measurement will have appropriate units and scales. We can say that a person of height 2m is twice as tall as someone who is 1m tall. For intelligibility and accuracy, at least initially, the scales will be arbitrary, and we will not consider such aspects.

Similarly, the fidelity of a translation arises from the observation that some translations are more 'accurate' than others. The notion of fidelity may be more difficult than intelligibility. This is because intelligibility is restricted to one cultural and linguistic domain, ie to one language. Fidelity must be assessed by a bilingual, and hence is more open to interpretation, as the knowledge of a second language is not often accompanied by a full understanding of that language's inherent culture and usage. The correspondence a bilingual makes between the two languages depends on this essential knowledge, and increases with experience. Also, often a literal translation does not convey the full 'meaning' of the source language. By restricting the domain of the language under translation, for example to a particular technical domain, the problems involved with bilingual assessment may be reduced in scope⁵.

⁵ Actually, decreasing the scope may increase the problems due to the more specific nature and knowledge required of that domain.

Definition of the Object of Interest

For MT evaluation, is the object of interest the translated text, produced from the input text, or is it the translation process we are really interested in evaluating? This distinction is discussed within the framework of the model shown in figure 5.

In the case of fidelity, what we want to measure is actually the relationship of some measure on the target text, TM_f, to some measure on the source text, SM_f. This means that the fidelity measure is actually capturing some aspect of the translation process. The measures TM_i and TM_f are not equivalent: fidelity is not a case of relating the intelligibility of the target text to the intelligibility of the source text. To validate these measures, it is necessary to show that the measures on the text correctly capture the relationship being measured, ie that the numbers assigned reflect the relationship observed. For accuracy, in addition, the relationship between the source and target text measures must be validated.

What exactly is the text 'object' of interest? The domain of interest for MT is written text, therefore possible text 'objects' could be paragraphs or sentences, for example. It would be possible to discuss the intelligibility or fidelity of paragraphs, or even larger 'objects', but the next assumption is that our object of interest is the sentence in isolation. This is because most current machine translation systems translate on a sentential basis, and therefore the measurements should be trying to capture the effectiveness of this intra-sentential translation. This is a practical assumption that reduces the complexity of the evaluation task, as it should be easier to assess the sentence in isolation.

However, it is apparent that even for a sentence considered in isolation, it is not immediately obvious what the intelligibility or the fidelity will be, as we do not know exactly what these properties are. This is probably a terminology problem; if we continue to talk about intelligibility defined as 'the ease at which the meaning of a sentence can be understood' and then proceed to develop scales to measure this (eg ALPAC), we get nowhere, as we never increase our knowledge of the attribute, and furthermore the measurements and results are always steeped in subjectivity.

The sensible approach seems to be to define the attributes in measurable terms, so that thereafter we can reason about them objectively. Furthermore, we then have the means to validate any models we may propose. This provides an opportunity to progress in increasing our knowledge of what affects/constitutes the attributes, and only then can we propose scales, such as those that have been suggested in other methods. Two approaches are suggested, both aiming towards this goal. They are both directed at implementing objective and repeatable measures.

The first is to define the intelligibility/fidelity attributes in terms of comprehension time. In figure 6 the dotted line represents the threshold value at which the subject has grasped the meaning of the translated text⁶. Tests would need to be carried out periodically to show that the correct meaning was read. Without going into such an experiment in detail, it is obvious that it would take some time to organise, but with careful design it is possible that the results could provide useful measures. For example, the rate of understanding could be studied (the gradient), the area under the curve might have some significance, as well as the time to reach the threshold. Such aspects could provide information as to the capabilities of the subjects (if more than one was used), for example for normalization purposes.

⁶ This threshold value being assigned as a standard measurement procedure.

Figure 6: Defining attributes in measurable terms, approach 1. A comprehension-versus-time curve; measurable features:
* rate of comprehension (gradient of curve)
* area under curve
* time to reach cut-off point
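For illustration, the measurable features listed for approach 1 could be extracted from hypothetical (time, fraction-understood) observations as in the following sketch; the 0.9 threshold is an assumed stand-in for the standardised cut-off discussed above, not a value from this paper.

    import numpy as np

    def curve_features(times: np.ndarray, comprehension: np.ndarray,
                       threshold: float = 0.9) -> dict:
        rate = np.gradient(comprehension, times)   # rate of comprehension
        # area under the curve by the trapezoidal rule
        area = float(np.sum(np.diff(times) *
                            (comprehension[:-1] + comprehension[1:]) / 2.0))
        above = np.nonzero(comprehension >= threshold)[0]
        t_cut = float(times[above[0]]) if above.size else None  # time to cut-off
        return {"mean_rate": float(rate.mean()), "area": area,
                "time_to_threshold": t_cut}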
For accuracy, the same method could be used, except with reference to the original text. The method is similar to the informativeness measures proposed in [King], except it tries to be more objective in the measurement (time is better than simply assigning an arbitrary number). This approach does suffer from the resources it requires, as well as practical difficulties, for example in showing text comprehension, especially for machine translated sentences.

The following approach is more practical, but is an indirect approach. It is based on the method of 'forward translation', or for practical purposes, post-editing. In the case of intelligibility, the time taken to make the sentence intelligible (by post-editing) could be used. For fidelity, the time taken to make the sentence accurate (with the original text for reference) could similarly be used⁷. In this case we are in fact measuring the difference in the x-ability of the pre-(post-edited) text and the final post-edited version (see figure 7)⁸.

⁷ This raises the question as to whether it is possible to post-edit only for accuracy. In fact, 'real' post-editing is geared to producing accurate and intelligible output. The difference between the post-editing time for intelligibility and the post-editing time for intelligibility/accuracy could provide an acceptable accuracy measure.
⁸ Example translation and post-edited text taken from [Nikkei 1990].

Figure 7: Example translation and post-edited text

This approach could be verified by using a variety of post-editors to confirm the final translations are understandable/accurate⁹. Although different translators might produce a slightly different end-translation, agreement could be made as to whether the final translations are acceptable (perhaps by using guidelines etc.). The time taken could be averaged to give a final value, if more than one post-edit is done.

⁹ Note that verification is not validation. Verification is showing that something is correct. In this case we want to show that the final post-edited text is correct.

In both approaches described, there is a need to standardise the measurement procedures used. Post-editing time might be affected by a variety of factors, including tool use etc., not to mention the post-editor's ability. These topics are not discussed any further in this report.
As explained before, showing that the final result is acceptable in terms of the attribute of interest, through agreement reached by a group of evaluators provides further verification of the measures.12 The presumption is that the sentences are of a sufficient quality to be evaluated.110The classification shown in figure 8 has been deliberately left relatively high level, so that the method is practical. The idea is that the evaluator can decide the levels, and the detail required, as necessary. For example, a more detailed classification would be to divided the phrase classification into noun phrases and verb phrases.At this point it is assumed that the time for post-editing has been recorded, and that the above error classifications have been verified as described.Since we do not know how each of the above entities affects the intelligi- bility or fidelity, it is proposed that weights are assigned to each category.These weights will be assigned values using our own intuition at the onset of the evaluation, but the aim is to tune them as to actual importance, using a database of collected measures. If we consider the indirect measure of temperature again, a theory relating the expansion to properties of the iron bar might have been developed in such a way. It is noted that the indirect measurement of temperature has far fewer variables, and is more amenable to measurement being a ‘hard’ physical quantity, than does a property such as intelligibility. Although the attributes of intelligibility and fidelity are complex, with experience it should be possible to identify the key factors affecting them.Figure 9111Different weights will probably have to be assigned for each attribute, as each attribute will be affected differently by different factors.Note we could go further and sub-assign weights, for example a badly transferred verb could be assigned a weighting of 8, and an adjective of 4, as a verb is generally perceived as being more important for accuracy. However, it might be better to err on the side of caution, and not be too ambitious with regard to details, until key factors have been identified.The weights do not have to be known to the post-editor. This will con- tribute to the objectivity of the classification and the post-editing process.There is one final aspect of the measurement which we need to consider. This is the requirement to normalise sentences according to their complexity, so that the measures taken are put in perspective. This is necessary, because generally speaking, longer sentences will be more difficult to understand or be accurate, than shorter sentences. Normalization typically requires some quantification of frequency of occurrence. One possibility is to normalise within each error category. A second possibility is to normalise according to the number of verbs in the total sentence, as the number of verbs is generally seen as being related to the sentence complexity13. A more practical idea is to simply count the number of words in the sentence. A target language parser could be used to parse the post-edited sentence, and extract various information. The number of ‘levels’ in the parse tree could also be used as an indication of complexity (or even the parse tree itself – cf the earlier example on control flow program complexity)14.Assigning the final attribute rating is initially assumed to be a simple additive procedure. A more sophisticated model may be developed once suitable data is gathered. 
Alternatively, we could try a more detailed model initially using intuition based on empirical observations.An example of tailoring of the weights is now given (normalization ig- nored), to show how the method might work in practice. Suppose we have the following four sentences, assumed all of the same ‘complexity’ (see Figure 10).From this it is possible to see that some correlation exists between the resulting attribute score (X) and the actual time. In this case, one attribute rating is approximately 5.9 seconds. Therefore, we can say that if we get another attribute rating of 100, the time to post edit for that attribute will be approximately 590 seconds.Although this example is too simple, it is hoped that weights can be tailored and roughly accurate models developed after a large database is 13Although, again, the question of standard definitions must be considered. For example ‘the car is red’ contains a verb ‘is’, whereas ‘a red car’ does not. Both could be valid translations. Also contrast ‘the men destroyed the house’ and ‘the men’s destruction of the house’.14Again, would need to standardise the use of the parser employed for this purpose.112。
