Optimising multiple kernels for SVM by Genetic Programming
Tips and Strategies for Optimizing SVM Algorithms and Selecting Parameters

The Support Vector Machine (SVM) is a widely used machine learning algorithm with strong classification and regression capabilities. In practice, however, optimizing the SVM algorithm and choosing suitable parameters remains a challenge. This article discusses several tips and strategies for optimizing support vector machines.

1. Choosing the kernel function

The kernel function is the core of the SVM algorithm: it maps the data from the original space into a high-dimensional feature space, making linearly inseparable data linearly separable. Commonly used kernels include the linear, polynomial, and Gaussian kernels. When choosing a kernel, consider the characteristics of the data and the complexity of the problem. If the data are linearly separable, a linear kernel is appropriate; if not, a polynomial or Gaussian kernel can be used. Several kernels can also be combined to improve classification accuracy, as the sketch below illustrates.
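As an illustration (not part of the original text), the following sketch combines an RBF kernel and a polynomial kernel by a weighted sum and passes the result to scikit-learn's SVC as a custom kernel; the data set, weights, and kernel parameters are arbitrary assumptions.

```python
# A minimal sketch: a weighted sum of two valid kernels is itself a valid
# (positive semi-definite) kernel and can be used directly by SVC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def combined_kernel(X, Z):
    # Illustrative weights and parameters; tune them for real data.
    return 0.7 * rbf_kernel(X, Z, gamma=0.1) + 0.3 * polynomial_kernel(X, Z, degree=2)

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel=combined_kernel, C=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```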
2. Choosing the regularization parameter

The regularization parameter C is an important parameter of the SVM algorithm: it controls the trade-off between model complexity and tolerance of errors. A smaller C yields a larger margin and tolerates more misclassified points, but may underfit; a larger C yields a smaller margin and fewer misclassified points, but may overfit. The value of C therefore has to be tuned to the problem at hand to avoid both underfitting and overfitting; a simple validation scan is sketched below.
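A rough way to see this trade-off is to scan a few values of C and compare training and validation accuracy; the values and data below are illustrative assumptions.

```python
# A minimal sketch: scanning C on a validation split to observe the
# under-fitting / over-fitting trade-off.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
for C in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X_tr, y_tr)
    print(f"C={C:<6} train={clf.score(X_tr, y_tr):.3f} val={clf.score(X_val, y_val):.3f}")
```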
3. Handling class imbalance

Class imbalance, where one class has far more samples than the others, is a common problem in practice and causes the trained model to classify the minority class poorly. It can be addressed with one of the following strategies: undersampling, oversampling, ensemble learning, or class-weight adjustment. Undersampling balances the data by reducing the number of majority-class samples; oversampling does so by increasing the number of minority-class samples; ensemble learning improves classification by combining several models; and class-weight adjustment balances the data by giving different weights to different classes, as sketched below.
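The class-weight strategy is the easiest to try with an SVM; the sketch below uses scikit-learn's built-in "balanced" mode on a synthetic imbalanced data set (all names and numbers are illustrative).

```python
# A minimal sketch: class_weight="balanced" reweights classes inversely
# to their frequencies, so the minority class is not ignored.
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```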
4. Feature selection and dimensionality reduction

Real-world data often contain a large number of features, not all of which are useful for the classification or regression task. Feature selection and dimensionality reduction are therefore important steps in optimizing an SVM. Feature selection picks the most relevant features using statistical, information-theoretic, or model-based methods. Dimensionality reduction, for example by principal component analysis or linear discriminant analysis, transforms high-dimensional data into a lower-dimensional representation, reducing the computational cost and speeding up training and prediction.
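For example, a PCA step can be chained before the SVM in a pipeline; the data set and the number of components below are illustrative assumptions.

```python
# A minimal sketch: PCA-based dimensionality reduction inside a Pipeline,
# so the projection is fit only on the training folds during CV.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
pipe = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```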
Performance Optimization and Improvement of Support Vector Machines

The Support Vector Machine (SVM) is a widely used supervised learning algorithm applied in pattern recognition, text classification, image processing, and other fields. In practice, however, SVMs face several performance bottlenecks. To further improve their performance and efficiency, and to address their shortcomings on large-scale data sets, researchers have proposed a variety of optimization and improvement methods. This article reviews several of them.

1. Hard-margin SVM

The hard-margin SVM is the most basic form of SVM; its goal is to find an optimal hyperplane that separates the samples of two classes. However, it places a strict requirement on the data: they must be linearly separable. Linearly inseparable data cannot be classified by a hard-margin SVM, which is why researchers introduced the soft-margin SVM.
2. Soft-margin SVM

The soft-margin SVM allows a certain amount of data to fall on the wrong side of the separating hyperplane by introducing slack variables that control the margin violations. This adapts better to linearly inseparable data and gives some tolerance to noisy samples. In practice, however, the performance of the soft-margin SVM is still affected by many factors and calls for further improvement and optimization.
3. Kernel functions and nonlinear SVM

Many real data sets are not linearly separable, and a linear SVM cannot classify them well. To solve this problem, researchers introduced kernel SVMs. A kernel function maps the data from the original space into a high-dimensional feature space where they become more easily linearly separable. Common kernels include the linear, polynomial, and Gaussian kernels. With kernels, the SVM can handle much more complex classification problems and achieves better accuracy.
4. Multi-class SVM

The SVM was originally designed for binary classification, i.e., separating the data into two classes. Many practical problems, however, involve more than two classes. To handle them, researchers proposed multi-class SVMs. The two common schemes are one-vs-one and one-vs-rest. The one-vs-one scheme decomposes the multi-class problem into many binary problems, training a classifier for each pair of classes. The one-vs-rest scheme trains one classifier per class, treating that class as positive and all remaining classes as negative. Both schemes are sketched below.
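A hedged sketch of both decompositions with scikit-learn's meta-estimators (the data set is an arbitrary choice):

```python
# A minimal sketch: wrapping a binary SVM in one-vs-one and one-vs-rest
# multi-class strategies and comparing their cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for name, clf in [("one-vs-one", OneVsOneClassifier(SVC(kernel="rbf"))),
                  ("one-vs-rest", OneVsRestClassifier(SVC(kernel="rbf")))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```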
Hyperparameter Tuning Techniques for SVM Models

The Support Vector Machine (SVM) is a widely used machine learning algorithm for classification and regression. The performance of an SVM model, however, depends heavily on the choice of its hyperparameters. This article presents some common techniques for tuning them.

First, we need to understand the SVM hyperparameters. An SVM model has two important ones: the penalty parameter C and the kernel parameter gamma. The penalty parameter C controls how strongly misclassified samples influence the model: a larger C makes the model focus more on misclassified samples and may lead to overfitting, while a smaller C emphasizes generalization and may lead to underfitting. The kernel parameter gamma controls how closely the model fits the training data: a larger gamma makes the model focus on local structure and may overfit, while a smaller gamma makes it focus on global structure and may underfit.

To tune these hyperparameters we can use cross-validation: the training data are split into several subsets, each of which in turn serves as the validation set while the remaining subsets are used for training. By cross-validating different hyperparameter combinations, we select the combination that gives the best performance.
A common tuning method is grid search. Grid search specifies a set of candidate values for each hyperparameter, enumerates every possible combination, and computes the cross-validation accuracy for each one. The combination with the highest accuracy is chosen as the final model's hyperparameters; a sketch follows.
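A minimal grid-search sketch with scikit-learn (candidate ranges are illustrative assumptions):

```python
# A minimal sketch: exhaustive grid search over C and gamma with 5-fold CV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```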
Grid search has one drawback: its computational cost. When the candidate ranges are large, it must try a huge number of hyperparameter combinations, which takes a long time. Random search addresses this: given the candidate ranges and a number of trials, it samples hyperparameter combinations at random and computes the cross-validation accuracy for each. The best combination found is kept as the final model's hyperparameters. Compared with grid search, random search can usually find a good combination in much less time, as sketched below.
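A corresponding random-search sketch; the continuous log-uniform ranges and the number of iterations are illustrative assumptions (loguniform requires a reasonably recent SciPy).

```python
# A minimal sketch: random search samples C and gamma from continuous
# log-uniform distributions instead of enumerating a full grid.
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_dist = {"C": loguniform(1e-2, 1e3), "gamma": loguniform(1e-5, 1e0)}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_dist, n_iter=30, cv=5,
                            random_state=0).fit(X, y)
print(search.best_params_, search.best_score_)
```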
Besides grid search and random search, there are other hyperparameter tuning techniques, for example Bayesian optimization.
Research on Methods for Optimizing Kernel Function Parameters in Support Vector Machines

The Support Vector Machine (SVM) is a widely used machine learning algorithm that performs well on classification and regression problems. Its strength is the ability to handle high-dimensional data with good generalization. In an SVM, the kernel function is a key component: it maps low-dimensional data into a higher-dimensional space where the data are easier to separate. This article examines methods for optimizing kernel parameters.

First, consider the role of the kernel in an SVM. Several kernels are available, most commonly the linear, polynomial, and Gaussian kernels. The kernel maps the original data into a higher-dimensional space where they become more easily linearly separable, so that a hyperplane can be found there that separates the classes. The choice of kernel is therefore critical for SVM performance. In practice, the kernel and its parameters must be chosen for the specific problem: different kernels suit different data characteristics, so experiments and tuning are needed to determine the best kernel and parameter values. Several common parameter-optimization methods are introduced below.
One common method is grid search. Grid search traverses a given parameter space, computes the model's performance for each parameter combination, and selects the best one. Concretely, we specify candidate values for each parameter and evaluate every combination by cross-validation, keeping the combination with the best performance as the final model parameters. Grid search is simple and easy to understand, but its computational cost becomes high when the parameter space is large.
Another common method is random search. Unlike grid search, random search samples parameter combinations at random from the given space and evaluates them. Its advantage is that it can find a good combination faster, especially when the parameter space is large; on the other hand, because it only samples the space randomly, it may miss the global optimum. Beyond grid and random search, more advanced optimization methods are available. Bayesian optimization, for example, is based on Bayesian inference: it fits a Gaussian-process model of how the parameters affect performance and selects the combination most likely to improve it for the next evaluation, as sketched below.
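A hedged sketch of this idea, assuming the third-party scikit-optimize package is available; its gp_minimize routine fits a Gaussian-process surrogate of the cross-validation error and proposes the next (C, gamma) pair to evaluate.

```python
# A minimal sketch (assumes scikit-optimize is installed): Bayesian
# optimization of C and gamma by minimizing the negative CV accuracy.
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(params):
    C, gamma = params
    return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

space = [Real(1e-2, 1e3, prior="log-uniform", name="C"),
         Real(1e-5, 1e0, prior="log-uniform", name="gamma")]
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best (C, gamma):", result.x, "CV accuracy:", -result.fun)
```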
Multi-Population Chaotic-Map Sparrow Search Optimization Algorithm

The multi-population chaotic-map sparrow search optimization algorithm is an intelligent optimization algorithm based on chaotic maps and sparrow foraging behaviour; it can be applied effectively to complex optimization problems.
Multiple Support Vector Machines Optimized by Particle Swarm Optimization (PSO-SVM) (IJMECS-V2-N2-05)

The study is organized in the following way: after a short introduction, the second section presents the evolutionary method PSO (Particle Swarm Optimization). The third section is devoted to the formulation of a kernel-based classification method, namely Support Vector Machines (SVM). The fourth section describes our PSO-SVM contribution in detail. In the fifth section we report the results of our experiments. The paper ends with a conclusion that puts forward the advantages and weaknesses related to the use of our system.
Copyright © 2010 MECS. I. J. Modern Education and Computer Science, 2010, 2, 32-38. Published online December 2010 in MECS.
The Gaussian Kernel Function in SVM

The SVM (Support Vector Machine) is a machine learning algorithm for classification and regression. In practice, the SVM is often used with the Gaussian kernel function. The Gaussian kernel is a distance-based similarity measure that maps data points into a high-dimensional feature space where they are easier to separate, and SVMs with a Gaussian kernel perform well on complex classification tasks. When implementing such an SVM, the choice of the kernel parameter and of the regularization parameter has a major effect on performance, so cross-validation and parameter tuning are needed to optimize the model. Practical use of the Gaussian kernel also raises issues such as data set size and feature selection, so the properties of the data set should be considered carefully and the parameters chosen with care. The kernel itself is sketched below.
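For concreteness, the Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2) can be computed directly and checked against scikit-learn's implementation (a small illustrative sketch):

```python
# A minimal sketch: computing the Gaussian (RBF) kernel matrix by hand
# and verifying it against scikit-learn's rbf_kernel.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X, Z, gamma = rng.normal(size=(5, 3)), rng.normal(size=(4, 3)), 0.5

sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)  # ||x - z||^2
K_manual = np.exp(-gamma * sq_dists)
assert np.allclose(K_manual, rbf_kernel(X, Z, gamma=gamma))
```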
Performance Optimization and Improvement of Support Vector Machines

The Support Vector Machine (SVM) is a widely used machine learning algorithm applied in pattern recognition, data classification, regression analysis, and other fields. However, SVMs face performance bottlenecks on large data sets and in high-dimensional feature spaces. To overcome these problems, researchers have proposed many optimization and improvement methods. This article examines these methods and analyses their strengths and limitations.

1. Feature selection and dimensionality reduction

Feature selection is an important part of SVM performance optimization. Large data sets often have a very large number of features, which increases computational cost and invites overfitting. Selecting the most relevant subset of features, so as to reduce the cost and improve classification accuracy, is therefore essential. A common strategy combines filter and wrapper methods: filter methods score each feature by its correlation with the target variable and keep those above a chosen threshold, while wrapper methods feed candidate subsets to the classifier and pick the subset with the best classification accuracy.
Dimensionality reduction is another important means of improving SVM performance. Lowering the dimensionality of the feature space reduces the computational cost and improves training speed and classification accuracy. Common methods include principal component analysis (PCA) and linear discriminant analysis (LDA), which use linear transformations to project high-dimensional data into a low-dimensional space while preserving the most important information. A sketch of the filter-style selection described above follows.
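A sketch of a filter-based feature-selection step, using an ANOVA F-score filter before the SVM (the data set and k are illustrative assumptions):

```python
# A minimal sketch: filter-style feature selection (SelectKBest with an
# ANOVA F-score) chained before the SVM in a Pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```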
2. Kernel selection and parameter tuning

The kernel function is a key component of the SVM: it maps the data into a high-dimensional feature space, turning a nonlinear problem into a linearly separable one. Common kernels include the linear kernel, the polynomial kernel, and the Gaussian radial basis function (RBF) kernel. The kernel should be chosen according to the characteristics of the data set and the task: for nonlinear problems the RBF kernel usually works well, while for text classification a polynomial kernel can capture interactions between features better. Parameter tuning is equally important when training an SVM. The two main parameters are the regularization parameter C, which controls the tolerance of misclassified samples, and the RBF width γ, which controls the model's generalization behaviour.
Optimising multiple kernels for SVM by Genetic Programming

Laura Dioşan¹,², Alexandrina Rogozan¹ and Jean-Pierre Pecuchet¹
¹ LITIS, EA 4108, INSA, Rouen, France
² Babeş-Bolyai University, Cluj-Napoca, Romania
lauras@cs.ubbcluj.ro, {arogozan,pecuchet}@insa-rouen.fr

Abstract. Kernel-based methods have shown significant performances in solving supervised classification problems. However, there is no rigorous methodology capable to learn or to evolve the kernel function together with its parameters. In fact, most of the classic kernel-based classifiers use only a single kernel, whereas real-world applications have emphasized the need to consider a combination of kernels - also known as a multiple kernel (MK) - in order to boost the classification accuracy by adapting better to the characteristics of the data. Our aim is to propose an approach capable to automatically design a complex multiple kernel (CMK) and to optimise its parameters by evolutionary means. In order to achieve this purpose we propose a hybrid model that combines a Genetic Programming (GP) algorithm and a kernel-based Support Vector Machine (SVM) classifier. Each GP chromosome is a tree that encodes the mathematical expression of an MK function. Numerical experiments show that the SVM involving our evolved complex multiple kernel (eCMK) performs better than the classical simple kernels. Moreover, on the considered data sets, our eCMKs outperform both a state-of-the-art convex linear MK (cLMK) and an evolutionary linear MK (eLMK). These results emphasize the fact that the SVM algorithm requires a combination of kernels more complex than a linear one.

1 Introduction

Various classification techniques have been used in order to detect correctly the labels associated to some items. Kernel-based techniques (such as the Support Vector Machine (SVM) [1]) are an example of such intensively explored classifiers. These methods represent the data by means of a kernel function, which defines similarities between pairs of data [2]. One reason for the success of kernel-based methods is that the kernel function takes relationships that are implicit in the data and makes them explicit, the result being that the detection of patterns takes place more easily. The selection of an appropriate kernel K is the most important design decision in SVM since it implicitly defines the feature space F and the map φ. An SVM will work correctly even if we do not know the exact form of the features that are used in F.
The performance of an SVM algorithm also depends on several parameters. One of them, denoted C, controls the trade-off between maximizing the margin and classifying without error. The other parameters regard the kernel function. For simplicity, Chapelle in [3] has proposed to denote all these parameters as hyper-parameters. All these hyper-parameters have to be tuned. This is a difficult problem, since the estimate of the error on a validation set is not an explicit function of these parameters. The selection of an optimal kernel function and of the values of the hyper-parameters is known in the literature as model selection [3]. This task is usually performed by training the classifier with different functions picked from a range of kernels and with several parameter values from a discrete set, which is fixed a priori. The optimal model corresponds to the configuration that generates the best classification performance by using a cross-validation technique [3].

Nevertheless, a simple kernel may not always be suitable, especially for very complex classification problems like those related to multi-modal heterogeneous data. Real-world applications have emphasized the need to consider a combination of kernels, denoted by multiple kernel (MK) [4,5]. Recent research works have already shown that MKs improve the performance of SVM classifiers due to their flexibility, allowing for a better learning of complex and heterogeneous data. In addition, the optimisation of the hyper-parameters plays a very important part.

Automatic MK design is more than a simple kernel selection: in the MK framework the best expression of the kernel function is learnt as a more or less complex combination of simple kernels. At the same time, the optimal values of the hyper-parameters are found. One has to answer several important questions concerning the design of an MK: Is it possible to learn the MK function by using some training annotated data? And, in the case of a positive answer, What kernels have to be used within an MK for a given classification problem? How to find the optimal parameters of the simple kernels involved in this combination? and What allows for better classification performances: a linear MK or a complex MK? If the answer to the first question can be found in the literature [4,5,6,7,8,9], the answers to the last ones will be given in this paper. Therefore we choose to use the evolutionary framework in order to discover the optimal expression of an MK function and its parameters for several given problems.
We combine the Genetic Programming (GP) [10] and the SVM [1] algorithms within a two-level hybrid model. The aim of the model we propose is to find the best MK function and to optimise its parameters, but also to adapt the regularisation parameter C. These three objectives are achieved simultaneously because each GP chromosome encodes the expression of a complex multiple kernel (CMK) and its parameters. The GP-kernel is involved into a standard SVM algorithm to be trained in order to solve a particular classification problem. After an iterative process which runs over several generations, an optimal evolved complex multiple kernel (eCMK*) is provided. The proposed combination of kernels could be learnt from thousands of examples while combining hundreds of kernels within reasonable time. The eCMK we introduce is compared not only to several well-known simple kernels, but also to a convex linear MK (cLMK) [4] and to an evolved linear MK (eLMK) [7]. We will show that our model is able to find more efficient complex MKs on the considered data sets.

The paper is organized as follows: the related work is presented in Section 2. Section 3 outlines the theory behind SVM classifiers, giving a particular emphasis to the kernel functions. Section 4 describes our technique for evolving CMKs. This is followed by Section 5, where the results of the experiments are presented and discussed. Finally, Section 6 concludes our paper.

2 Related work

An MK is in fact a combination of several simple kernels. This combination could be either linear or complex. Regarding the linear combination, each simple kernel is involved with a weight that represents its relative influence/importance in the LMK. The optimal weights of the simple kernels included in an LMK have been found by convex [4,5,6,3,11] or evolutionary methods [7,12]. Regarding the shape of complex combinations of kernels, to the best of our knowledge, genetic algorithms (GAs) [8,9] have only been used in order to learn the expression of an MK function.

As regards the optimisation of the hyper-parameters, extensive exploration such as performing line search for one hyper-parameter or grid search for two hyper-parameters is frequently applied. However, these search processes usually require training the model several times with different hyper-parameter values and hence are computationally prohibitive, especially when the number of candidate values is large. Because of the computational complexity, grid search is only suitable for the adjustment of very few parameters. More elaborate techniques for optimising hyper-parameters are the gradient-based approaches [3]. Different optimisation criteria have been used: minimize the leave-one-out error [1], minimax (maximize the radius margin bound and minimize the validation or the leave-one-out errors) [3] or minimize the CV error [1]. Several promising recent approaches [13,14,15] are based on regularisation path algorithms that can trace the entire solution path as a function of the hyper-parameter without having to train the model multiple times. Evolutionary algorithms have also been used in order to optimise the hyper-parameters of an SVM classifier [16,17].

There are very few approaches that deal with both the problem of hyper-parameter optimisation and of MK function learning. Ong et al. [11] have shown that the MK function is a linear combination of a finite number of pre-specified hyper-kernel evaluations. The semi-definite programming (SDP) approach [4,5,13,18] is applied for learning an MK seen as a linear combination of positive semi-definite matrices. Similar to this idea, Bousquet and Herrmann [19] further restrict the class of kernels to the convex hull of the kernel matrices normalized by their trace. Genetic algorithms have also been used in order to optimise both the MK shape and its hyper-parameters [8,9]. In fact, besides combining kernels through different operations, these GA-based approaches could optimise the hyper-parameters of the SVM algorithm.

3 Support Vector Machines

Initially, the SVM algorithm was proposed for solving binary classification problems [1,20]. Later, these algorithms were generalized for multi-class problems. Consequently, we will explain the theory behind SVM only on binary-labelled data. Suppose the training data has the following form: $D = \{(x_i, y_i) \mid i = 1, \ldots, m\}$, where $x_i \in \mathbb{R}^d$ represents an input vector and each $y_i$, $y_i \in \{-1, 1\}$, is the output label associated to the item $x_i$. The SVM algorithm maps the input vectors to a higher dimensional space where a maximal separating hyper-plane is constructed [21]. Learning the SVM means to minimize the norm of the weight vector w under the constraint that the training items of different classes belong to opposite sides of the separating hyper-plane. Since $y_i \in \{-1, +1\}$ we can formulate this constraint as: $y_i(w^T x_i + b) \geq 1$, $i = 1, \ldots, m$.

The items that satisfy this equation in case of equality are called support vectors since they define the resulting maximum-margin hyper-planes. To account for misclassification, the soft margin formulation of SVM has introduced some slack variables $\xi_i \in \mathbb{R}$ - see Eq. (1). Moreover, the separation surface has to be nonlinear in many classification problems. The SVM algorithm can be extended to handle nonlinear separation surfaces by using a feature function $\phi(x)$. The SVM extension to nonlinear data sets is based on mapping the input variables into a feature space F of a higher dimension and then performing a linear classification in that higher dimensional space. The important property of this new space is that the data set mapped by $\phi$ becomes linearly separable if an appropriate feature function is used, even when that data set is not linearly separable in the original space. Hence, to construct a maximal margin classifier one has to solve the convex quadratic programming problem encoded by Eq. (1), which is its primal formulation:

$$\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i \quad \text{subject to: } y_i(w^T \phi(x_i) + b) \geq 1 - \xi_i, \ \xi_i \geq 0, \ \forall i \in \{1, 2, \ldots, m\}. \quad (1)$$

The coefficient C is a tuning parameter that controls the trade-off between maximizing the margin and classifying without error. The primal decision variables w and b define the separating hyper-plane. Instead of solving Eq. (1) directly, it is common practice to solve its dual formulation described by Eq. (2), where $a_i$ denotes the Lagrange variable for the i-th constraint of Eq. (1):

$$\max_{a \in \mathbb{R}^m} \ \sum_{i=1}^{m} a_i - \frac{1}{2} \sum_{i,j=1}^{m} a_i a_j y_i y_j \phi(x_i)\phi(x_j) \quad \text{subject to } \sum_{i=1}^{m} a_i y_i = 0, \ 0 \leq a_i \leq C, \ \forall i \in \{1, 2, \ldots, m\}. \quad (2)$$

The optimal separating hyper-plane $f(x) = w^T \phi(x) + b$, where w and b are determined by Eqs. (1) or (2), could be used in order to classify the un-labelled input data:

$$y_k = \mathrm{sign}\Big(\sum_{x_i \in S} a_i y_i \phi(x_i)\phi(x_k) + b\Big) \quad (3)$$

where S represents the set of support vector items $x_i$. Because not all input data points are linearly separable, it is suitable to use a kernel function. Cf. [2], a kernel is a function K such that $K(x, z) = \langle \Phi(x), \Phi(z) \rangle$ for all $x, z \in \mathbb{R}^d$. Note that all we require are the results of such an inner product. Therefore we do not even need to have an explicit representation of the mapping $\phi$, nor to know the nature of the feature space. The only requirement is to be able to evaluate the kernel function on all the pairs of data items, which is much easier than computing the coordinates of those items in the feature space. Evaluating the kernel yields a symmetric, positive semi-definite matrix known as the kernel or Gram matrix [22]. In order to obtain an SVM classifier with kernels, one has to solve the following optimization problem:

$$\max_{a \in \mathbb{R}^m} \ \sum_{i=1}^{m} a_i - \frac{1}{2} \sum_{i,j=1}^{m} a_i a_j y_i y_j K(x_i, x_j) \quad \text{subject to } \sum_{i=1}^{m} a_i y_i = 0, \ 0 \leq a_i \leq C, \ \forall i \in \{1, 2, \ldots, m\}. \quad (4)$$

In this case, Eq. (3) becomes: $y_k = \mathrm{sign}\big(\sum_{x_i \in S} a_i y_i K(x_i, x_k) + b\big)$.

Table 1. The expression of several classical kernels.

Kernel name            | Kernel expression
Polynomial             | $K_{Pol}(x, z) = (x^T \cdot z + coef)^d$
Radial basis function  | $K_{RBF}(x, z) = \exp(-\sigma |x - z|^2)$
Sigmoid                | $K_{Sig}(x, z) = \tanh(\sigma x^T \cdot z + r)$

4 The model for evolving complex MKs

The model we propose can be used in order to discover the optimal expression of an eCMK. This model involves a hybrid approach, which combines a GP algorithm and an SVM classifier. Each GP chromosome is a tree that encodes the mathematical expression of an eCMK to be used by the SVM algorithm. The quality of a GP individual is given by the classification accuracy computed by running the SVM on the validation set in order to solve a particular classification problem. The hybrid approach is structured on two levels: a macro level and a micro level (see Figure 1(a)).

The macro level algorithm is a standard GP [10], which is used in order to evolve the mathematical expression of a CMK. We use the steady-state evolutionary model [23] as an underlying mechanism for our GP implementation. The evolutionary algorithm starts with an initialisation step for creating a random population of individuals. The following steps are repeated until a given number of generations/iterations is reached: two parents are selected by using a binary selection procedure; the parents are recombined in order to obtain an offspring O; the offspring is then considered for mutation; the new individual O* (obtained after mutation) replaces the worst individual W in the current population if O* is better than W.

The micro level algorithm is an SVM classifier. The original implementation of the SVM algorithm proposed in libsvm [24] allows the use of several well-known kernels (the linear, polynomial, RBF and sigmoid kernels, respectively) - see Table 1. In the numerical experiments, we also use a modified version of this algorithm, which is based on our evolutionary CMK. The quality of each GP individual is computed by running the SVM algorithm embedding the eCMK encoded in the current chromosome. The accuracy rate computed by the classifier (on the validation set) represents the fitness of the GP tree.

4.1 The GP representation of an MK

In our model, the GP chromosome is a tree encoding the mathematical expression of an eCMK and its parameters. The tree-based representation of an MK allows for a larger search space of kernel combinations than an array-based representation. However, we constrained the GP individual representation to satisfy the kernel algebra [2] (regarding the positiveness and the symmetry of the Gram matrix required by valid kernels).

Fig. 1. a) Sketch of our hybrid approach; b) A GP chromosome that encodes the CMK expression: $(K_{\theta_2}(x,z) + o_1) \times s_1 \times K_{\theta_2}(x,z) \times (K_{\theta_3}(x,z) \times K_{\theta_2}(x,z) + o_2)$.

The approach we propose is based on a particular type of GP tree: its leaves contain either a simple kernel or a constant, where a kernel is a function with two arguments (x and z) that represent the input vectors. Note that a GP tree must contain at least one kernel in its leaves, otherwise the obtained expression will not perform any dot product between the input vectors x and z. The leaves of the tree contain elements from the terminal set (TS), while the internal nodes contain elements from the function set (FS). For a better adaptation to the classification problem, our terminal set contains not only the classic simple kernels, but also some ephemeral random constants [10]: TS = KTS ∪ {o_i, s_j}, where KTS represents the terminal set that corresponds to the simple kernels (such as the linear kernel, polynomial kernel, radial basis function (RBF) kernel and sigmoid kernel - see Table 1), the o_i are offset (shifting) coefficients that control the threshold of the mapping from the original space into the feature space F, and the s_j are scaling coefficients that control the relative influence of the simple kernels in the eCMK expression. Both types of coefficients must be represented by positive real values.

Each simple kernel has an associated set of parameters θ that can affect the performance of the SVM algorithm. Therefore, we will consider several kernels for the TS, but with different parameters. The RBF kernel has only one parameter - the bandwidth σ (in this case θ = {σ}). The sigmoid kernel has two parameters: the bandwidth σ and the shifting coefficient r that controls the threshold of the mapping (θ = {σ, r}). The polynomial kernel has only one parameter: the degree d (θ = {d}) - see Table 1. In order to deal with shape and parameter optimisation, our solution is to consider in the TS different simple kernels with different parameters θ. We will denote these kernels as parametrised kernels K_θ. Thus, the GP algorithm will be able to discover the best eCMK expression by combining the best parametrised simple kernels. The function set contains three operations (FS = {+, ×, exp}) that preserve the key properties of a kernel function. The theory of kernel algebra also specifies the power function, but this operation (with a natural exponent) can be easily obtained as a repeated multiplication. An example of a GP chromosome is depicted in Figure 1(b).
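As an illustration only (not code from the paper), the parametrised simple kernels of Table 1 and a small terminal set KTS built from them over discrete parameter ranges could look as follows; the ranges are deliberately smaller than those reported in Section 5.1.

```python
# A minimal sketch of the parametrised kernels of Table 1 and a terminal
# set of fixed-parameter kernels K_theta (parameter ranges are illustrative).
import numpy as np
from functools import partial

def k_pol(x, z, d=2, coef=1.0):
    return (np.dot(x, z) + coef) ** d

def k_rbf(x, z, sigma=0.1):
    return np.exp(-sigma * np.sum((x - z) ** 2))

def k_sig(x, z, sigma=0.01, r=1.0):
    return np.tanh(sigma * np.dot(x, z) + r)

# Each terminal is one kernel with one fixed parameter setting.
KTS = ([partial(k_pol, d=d) for d in (1, 2, 3)]
       + [partial(k_rbf, sigma=s) for s in (1e-3, 1e-2, 1e-1)]
       + [partial(k_sig, sigma=s, r=r) for s in (1e-3, 1e-2) for r in (0.1, 1.0)])
```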
Although we have used FS = {+, ×, exp} and TS = {K_θ1, K_θ2, K_θ3, o, s} for our GP chromosome representation, only two functions (+ and ×), two kernels (K_θ2 and K_θ3) and three constants are actually involved in the current chromosome that represents the expression of the eCMK_{iter,ind}(x, z).

4.2 Genetic operations

Initialization. We have used the grow method, which is a recursive procedure, in order to initialize a GP individual. We have chosen this initialisation method, which is well known in the literature, for its robustness. The root of each GP tree must be a function from the FS. If a node contains a function, then its children are initialized either with another function or with a terminal (a kernel or a constant). The initialization process is stopped at the maximal depth of the GP tree. The leaves of the GP tree are initialised with terminals taken from the TS. At least one leaf of the GP tree has to contain a kernel in order to obtain a valid expression of the MK. The maximal kernel depth has to be large enough in order to ensure an important search space for the optimal expression of a CMK.

Crossover. We use the crossover operator in order to assure an important diversity of the eCMKs. The crossover is performed in a tree-structure preserving way in order to guarantee the syntactical validity of the offspring. Our model uses a one-cutting-point crossover with the particularity that the offspring has to contain at least one kernel in its leaves.

Mutation. The purpose of the mutation operator is to produce a small local perturbation of the current chromosome. A cutting point is randomly chosen: the sub-tree belonging to that point is deleted and a new sub-tree is grown there by applying the same random growth process that was used in order to generate the initial population. Note that the maximal depth allowed for the GP trees limits the growth process of the sub-tree. The mutation operator may generate new constants at any point in a run, like in Koza's implementation [10]. In our model, these ephemeral random constants are represented by the scaling and offset coefficients. Note that in our model the initialization, recombination and mutation operators always generate valid eCMKs.

4.3 Fitness assignment

The evaluation of the chromosome quality is based on a validation process. We must therefore provide some information about the data set partitioning before describing the fitness assignment process. Each data sample was randomly divided into two sets: a training set (80%) - for model building - and a testing set (20%) - for performance assignment. The training set was then randomly partitioned into learning (2/3) and validation (1/3) parts.
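The following sketch is not the authors' implementation; it only illustrates, under assumed parameter values and a synthetic data set, how a tree over {+, ×, exp} with kernel and constant leaves can be evaluated into a Gram matrix and then scored as described in Section 4.3 (SVM trained on the learning part, accuracy on the validation part as fitness).

```python
# A minimal sketch: evaluate a kernel-expression tree into a Gram matrix
# and use an SVM with a precomputed kernel to compute the fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

class Node:
    def __init__(self, op=None, children=(), value=None, kernel=None):
        self.op, self.children, self.value, self.kernel = op, children, value, kernel

    def eval(self, x, z):
        if self.kernel is not None:            # kernel leaf
            return self.kernel(x, z)
        if self.value is not None:             # constant leaf (scaling/offset)
            return self.value
        args = [child.eval(x, z) for child in self.children]
        return {"+": sum, "*": np.prod, "exp": lambda a: np.exp(a[0])}[self.op](args)

# Terminals: an RBF kernel, a polynomial kernel and positive constants
# (parameter values are assumptions, not taken from the paper).
k2 = Node(kernel=lambda x, z: np.exp(-0.1 * np.sum((x - z) ** 2)))
k3 = Node(kernel=lambda x, z: (np.dot(x, z) + 1.0) ** 2)
# The chromosome of Fig. 1(b): (K2 + o1) * s1 * K2 * (K3 * K2 + o2)
tree = Node("*", (Node("+", (k2, Node(value=0.3))), Node(value=0.5), k2,
                  Node("+", (Node("*", (k3, k2)), Node(value=0.1)))))

def gram(t, X, Z):
    return np.array([[t.eval(x, z) for z in Z] for x in X])

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_learn, X_val, y_learn, y_val = train_test_split(X, y, test_size=1 / 3, random_state=0)
clf = SVC(kernel="precomputed", C=1.0).fit(gram(tree, X_learn, X_learn), y_learn)
fitness = clf.score(gram(tree, X_val, X_learn), y_val)   # chromosome quality
print("fitness (validation accuracy):", fitness)
```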
not new,but the representation that we proposed is novel.Our approach is more general than the previous ones based on GP used in order to evolve the expression of a kernel function[25,26,27].This time our purpose is to discover a new complex MK function and not only a simple one.The GP trees that encode the MK s are more elaborated since they could contain,in their leaves,other standard kernels whose good performance has been already proven.Several important remarks can be also made regarding our previous eLMK model presented in[7]and the eCMK model we propose in this paper.The eLMK is a com-bination of kernels and it can be only linear,while the eCMK is a more complex one. The objective function is also different in these models:the GP algorithm used in the eCMK model optimises the shape and the parameters of an MK function,while the GA has optimised only the weights of a linear combination of kernels[7].Moreover,the expression of the CMK obtained by a GA-based model in[8,9]could be actually less complex than that we are evolving now.Our approach is able tofind a more adapted MK expression due to:a larger set of operations involved in the expres-sion of the eCMK(+,×,exp).The power function with an integer exponent involved in the GA-based eCMK model[8,9]appears implicitly in our GP-based approach due to the tree-based representation of the MK;this representation is able to generate it by itself as a repeated multiplication.A more complex form of the MK expression is due to both the GP tree-based representation and the coefficients in the expression of a CMK.A better adaptability of the CMK to the data is the third difference-the previous GA-based eCMK model forces at least the polynomial and the ANOV A kernels to appear in the expression of the MK.Our GP-based approach for eCMK s allows,according to the data and their characteristics,to chose those simple kernels to be involved in the combination(either all the kernels,or just a few of them).5Experiments and discussionWe have evaluated our eCMK learning approach on several data sets taken from Ma-chine Learning Repository UCI and Statlog data sets.These data sets were chosen in order to allow comparisons with a state-of-the-art LMK s proposed in[4]and with an evolutionary LMK proposed in[7].These data sets are also widely used in the classifica-tion community.All the data sets relate to binary-classification problems,but they have different sizes(the number of items and the number of characteristics)and they belong to differentfields:medical,economical,and geographic:classification of radar returns from the ionosphere(P1),breast cancer classification(P2),heart disease diagnosis(P3), classifications of personal income(P4)and(P5).A population of50individuals is evolved during50generations,which is a reason-able limit in order to assure the diversity of our evolutionary complex multiple kernels(eCMK s).We have also limited the maximal depth of a GP tree to10levels,which al-lows us to consider complex combinations of maximum210kernels in reasonable time. 
We have worked with the binary tournament mechanism for the chromosome selec-tion.The crossover and mutation operations are performed by0.8and0.3,respectively, probabilities.Research works have shown that the optimisation of the kernel parameters together with the optimisation of the kernel expression(shape)are the most important steps to be considered when building an SVM classifier.Therefore,we use a method proposed in [3]in order to initialise the regularisation coefficient.A good value for the C parameter could be the inverse of the empirical variance s2of the data in the feature space[3]. We use this value also in the numerical experiments performed in order to evolve the expression of an eCMK function.5.1Evolving the complex multiple kernel functionIn thisfirst experiment,our aim is to evolve a complex multiple kernel eCMK.Two different terminal sets are used in order to evolve different combinations:a terminal set that contains only several simple kernels KTS and a mixed terminal set that contains not only simple kernels,but also some constants MT S=KT S∪{c1,c2,...c n}.Note that in our experiments,these constants could be either scaling or shifting coefficients. Therefore,the TS s used in our numerical experiments are:1.a T S composed by several simple kernels with different parameters KT S={KθP ol ,KθRBF ,KθSig}where the parametersθof each simple kernel have been consideredin some discrete ranges:for the degree d of the Polynomial kernel15values(from 1to15)are considered,for the bandwidthσof the RBF kernel the following val-ues:σqt=q·10t,q={1,2,...,9},t={−5,−4,...,−1}are considered and for the sigmoid kernel all the combination betweenσand r,where theσqt and the r=10u,u∈{−1,0,1}are taken into account.2.a T S with different standard kernels and constants MT S=KT S∪{c1,c2,...,c n}.We have to note several things about the constant values c i.Mercer conditions[22] impose these constants to be positive.The[0,1]range was suggested in[4,7]for all the constants as the authors have represented the relative weights of the simple kernels (SK s)involved into a linear MK(LMK).In our case,we have to deal with some scal-ing and shifting coefficients that can appear or not in the expression of the eCMK s. 
Therefore,several positive intervals have been considered for these coefficients in our experiments[0,1],[0,10]and[0,100],the best seems to be the[0,1]range.During dif-ferent runs,various expressions of the eCMK function are obtained,all of them with about the same complexity.The performances of the eCMKs based on various TS s are presented in Table2: thefirst two rows contain the accuracy rates(for each problem)computed by the SVM algorithm involving our eCMK on the test set(unseen data).This eCMK is the best MK(eCMK∗)obtained at the end of the evolutionary process on the validation set (the best GP chromosome from the last generation).In Table2we also present the performances of three classic kernels for all the test problems(the last three rows).Weemphasize the fact that the value of the penalty error C is adapted to the data and other parameters involved in each simple kernel were optimised in order to achieve the best classification performances.This allows us to verify if our eCMK s outperform these optimised simple kernels and then to measure the improvements.In addition,Table2 displays the corresponding confidence intervals(on the test set of each problem).Table2.The accuracy rate of various kernels.Thefirst two rows present the accuracy rates computed by the SVM algorithm embedding the eMK.The last three rows con-tain the performances of the simple kernels for each test problem.P1P2P3P4P5Multiple kernels KTS86.11±1.1397.81±0.1386.98±0.5184.27±0.1486.93±0.19 MTS91.67±0.9098.03±0.1386.98±0.5184.38±0.1488.99±0.18Simple kernels Kpol77.77±1.3697.58±0.1485.79±0.5384.26±0.1486.24±0.20 Krbf80.55±1.2997.81±0.1385.21±0.5483.65±0.1483.49±0.21 Ksig66.67±1.5497.81±0.1377.91±0.6382.73±0.1484.52±0.21The values from Table2indicate that the eCMK s always perform better than the optimised simple kernels.This is a very important result,if we take into account the fact that GP algorithm has the possibility to choose among the simple kernels and their parameters.In addition,by taking into account different T S compositions,we can re-mark that the eCMK s based on a complex expression that contains simple kernels and coefficients seem to perform slightly better than the eCMK s based only on simple ker-nels(MTS KTS).Therefore,it seems to be efficient to combine the kernels by the coefficients.Thus,we are tempted to promote the eCMK based on a mixed TS to the detriment of the eCMK based only on kernels.5.2Comparison between the complex evolved MKs and the linear MKsWe aim to compare the improvements obtained by the SVM classifier which involves our most promising eCMK∗s based on MTS with both the state of the art convex LMK [4]and the evolutionary LMK already proposed in[7].In order to emphasize the improvements obtained by involving an MK in the SVM algorithm,an average performance improvement(∆)is computed for each MK as the mean of the improvementsδi for all the problems.Note thatδi is the difference between the accuracy rate computed by the SVM algorithm with an MK(Acc MK)and the ac-curacy rate computed by the same SVM algorithm,but with a simple human-designedkernel(SK)for the i th problem:δi=Acc i MK−Acc i SKAcc iSK ,i=1,5,and∆=5i=1δi5,where SK could be one of the considered simple kernels:K P ol,K RBF and K Sig and MK could be one of the MKs:eCMK-the evolutionary complex multiple kernel pro-posed in this paper based on MTS,eLMK-the evolutionary linear multiple kernel[7] and cLMK-the convex linear multiple kernel[4].。