Ensemble Learning Algorithm by Consecutively Removing Training Samples (递减样本集成学习算法)

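2016, 52(12)

1 Introduction

Ensemble learning originated from a conjecture posed by Kearns [1]: can a strong classifier be constructed from multiple weak classifiers? Schapire soon answered this question in the affirmative and gave a proof in the statistical sense [2]. Today, ensemble learning has become one of the research hotspots in statistics, artificial intelligence, and machine learning. More formally, ensemble learning uses a series of base classifiers to learn from the input and constructs a rule that combines the results of the base classifiers, thereby achieving better learning performance than a single classifier.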

ZHOU Yi (周羿), CHEN Ke (陈科), ZHU Bo (朱波), LIU Hao (刘浩), WANG Yufan (王宇凡), WU Jigang (武继刚), SUN Xuemei (孙学梅)

School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin 300387, China

ZHOU Yi, CHEN Ke, ZHU Bo, et al. Ensemble learning algorithm by consecutively removing training samples. Computer Engineering and Applications, 2016, 52(12): 69-74.

Abstract: Ensemble learning, which integrates multiple weak learners to produce a stronger learner, is one of the key research areas in machine learning. Although a number of algorithms have been proposed for generating base learners, these algorithms usually have low robustness. This study proposes a novel ensemble learning algorithm, namely the Ensemble Learning Algorithm by Consecutively Removing Training Samples (ELACRTS), which combines the merits of both boosting and bagging methods. By removing the samples with high confidence from the training set, the training space is gradually reduced, which allows sufficient learning on the underrepresented samples. The ELACRTS method generates a series of decreasing training subspaces and therefore produces a number of diverse base classifiers. Similar to boosting and bagging, voting is employed to integrate the predictions of multiple base classifiers. Ten-fold cross-validation is used to assess the performance of the proposed ELACRTS method. Extensive experiments on 8 datasets and 7 base classifiers demonstrate that the ELACRTS algorithm outperforms the boosting and bagging algorithms.

Key words: ensemble learning; base classifier; training subspace; decreasing; confidence level

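Chinese abstract: Ensemble learning, which reconstructs a strong classifier from multiple weak classifiers, is one of the important research directions in machine learning. Although many methods for generating diverse base classifiers have been proposed, their robustness still needs improvement. The ensemble learning algorithm by consecutively removing training samples combines the learning ideas of boosting and bagging, currently the most popular algorithms. By repeatedly removing high-confidence samples from the training set, it shrinks the training set space step by step, so that some underestimated samples are fully trained in the subsequent classifiers. This strategy forms a series of decreasing training subsets and therefore also generates a series of diverse base classifiers. Similar to boosting and bagging, the method uses a voting strategy to combine the base classifiers. Strict ten-fold cross-validation on 8 UCI datasets with 7 kinds of base classifiers shows that the algorithm overall outperforms boosting and bagging.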

Chinese key words: ensemble learning; base classifier; training subspace; decreasing; confidence level
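The procedure outlined in the abstract (train a base classifier, remove the training samples it already classifies with high confidence, retrain on what remains, and combine all classifiers by voting) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy nearest-centroid base learner, the softmax-style confidence score, the `threshold` and `rounds` parameters, the stopping rule, and all function names.

```python
import math
from collections import Counter


def train_centroids(X, y):
    """Toy base learner (assumed for illustration): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, mu)) for c, mu in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def fit_decreasing_ensemble(X, y, rounds=3, threshold=0.8):
    """Train base models on successively smaller training sets.

    After each round, samples that the newest model classifies correctly
    with confidence >= threshold are removed (hypothetical removal rule).
    """
    models = []
    data = list(zip(X, y))
    while data and len(models) < rounds:
        xs = [xi for xi, _ in data]
        ys = [yi for _, yi in data]
        model = train_centroids(xs, ys)
        models.append(model)
        kept = []
        for xi, yi in data:
            label, conf = predict_with_confidence(model, xi)
            if not (label == yi and conf >= threshold):
                kept.append((xi, yi))
        # Stop when nothing was removed or fewer than two classes remain.
        if len(kept) == len(data) or len({yi for _, yi in kept}) < 2:
            break
        data = kept
    return models


def predict_vote(models, x):
    """Combine the base models by unweighted majority voting."""
    votes = Counter(predict_with_confidence(m, x)[0] for m in models)
    return votes.most_common(1)[0][0]


# Tiny one-dimensional example with two classes.
X = [[0.0], [0.2], [1.0], [3.0], [3.2], [2.2]]
y = ["a", "a", "a", "b", "b", "b"]
models = fit_decreasing_ensemble(X, y)
print(len(models), predict_vote(models, [0.1]), predict_vote(models, [3.0]))
```

In this sketch the second model is trained only on the two samples nearest the class boundary, which is the sense in which later classifiers concentrate on samples the earlier ones were unsure about.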

Document code: A    CLC number: TP39    doi: 10.3778/j.issn.1002-8331.1407-0563

Foundation items: National Natural Science Foundation of China (No.11201134); General Program of the Natural Science Foundation of Tianjin (No.12JCYBJC31900).

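Author biographies: ZHOU Yi (b. 1992), male, research interest: machine learning, E-mail: zhouyi920521@https://www.360docs.net/doc/581279684.html; CHEN Ke (b. 1982), male, Ph.D., associate professor, research interests: computational biology, machine learning, computer vision; ZHU Bo (b. 1993), male, master's degree, research interests: machine learning, data mining; LIU Hao (b. 1991), male, research interest: computer vision; WANG Yufan (b. 1993), male, research interest: machine learning; WU Jigang (b. 1963), male, Ph.D., professor, research interests: high-performance computing, hardware/software co-design, VLSI fault-tolerant design, algorithms and data structures; SUN Xuemei (b. 1971), female, Ph.D., associate professor, research interests: data mining, wireless network technology.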

Received: 2014-08-07    Revised: 2014-09-22    Article ID: 1002-8331(2016)12-0069-06

CNKI online-first publication: 2015-04-15, https://www.360docs.net/doc/581279684.html,/kcms/detail/11.2127.TP.20150415.0924.002.html

Computer Engineering and Applications (计算机工程与应用)