S. Kernel Basis Pursuit
Compressed Sensing Denoising Code

Compressed sensing is a signal-processing technique that allows a signal to be measured at a rate far below the Nyquist sampling rate, significantly reducing data storage and transmission requirements.
Denoising is an important step in signal processing: it improves signal quality by removing or reducing noise.
Below is a simple compressed-sensing denoising example written in Python with NumPy.
The example uses an L1-minimization approach, a common method for compressed-sensing denoising.
```python
import numpy as np

# Sparse signal: n samples, only k non-zero entries
np.random.seed(0)
n, m, k = 100, 40, 5
x = np.zeros(n)
x[np.random.choice(n, k, replace=False)] = np.random.randn(k)

# Compressed, noisy measurements y = A x + noise
A = np.random.randn(m, n) / np.sqrt(m)
y = A @ x + 0.01 * np.random.randn(m)

# Recover the signal by L1 minimization, here via iterative soft-thresholding (ISTA).
# In practice you may prefer a dedicated algorithm such as OMP or Basis Pursuit;
# this loop is only meant to illustrate the idea.
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2
x_hat = np.zeros(n)
for _ in range(500):
    grad = A.T @ (A @ x_hat - y)                                   # gradient of the squared error
    z = x_hat - step * grad
    x_hat = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)   # soft threshold

print("Original signal:", x)
print("Noisy measurements:", y)
print("Recovered (denoised) signal:", x_hat)
```
This code example is only meant to illustrate the basic ideas of compressed sensing and denoising.
In real applications you would use more sophisticated algorithms and tools, such as the Lasso, LassoLars or OrthogonalMatchingPursuit estimators in scikit-learn, or deep-learning-based denoising methods.
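If scikit-learn is available, the recovery step can be delegated to a ready-made ℓ1 solver. The sketch below reuses the A and y arrays from the example above and is only a minimal illustration, not a recommended production setup:

```python
from sklearn.linear_model import Lasso

# Lasso solves min_x 0.5/m * ||y - A x||^2 + alpha * ||x||_1,
# i.e. the basis-pursuit-denoising form of the recovery problem above.
lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(A, y)
x_hat_sklearn = lasso.coef_
print("Non-zero coefficients:", (x_hat_sklearn != 0).sum())
```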
Matching Pursuit and Basis Pursuit

Matching Pursuit (MP) Algorithm and Basis Pursuit (BP).
Matching pursuit (MP) and basis pursuit (BP) are two closely related approaches used for signal reconstruction and decomposition. Both aim to find the best representation of a signal as a linear combination of basis elements, or atoms, drawn from a dictionary.

Matching Pursuit algorithm. MP is an iterative greedy algorithm that starts with an initial guess of the signal representation and then iteratively adds basis elements until a stopping criterion is met. At each iteration, MP selects the basis element that best matches the residual of the signal (the difference between the current representation and the original signal). The selected basis element is then added to the representation, and the residual is updated.

Basis Pursuit algorithm. BP is a global alternative to MP: instead of greedy selection, it solves a convex optimization problem that constrains or penalizes the ℓ1 norm of the representation coefficients. This regularization helps to prevent overfitting and produces a more stable, sparse representation of the signal. In its denoising form (basis pursuit denoising), BP solves a constrained optimization problem that trades off the squared error between the representation and the original signal against the ℓ1 regularization term.

Applications. MP and BP are widely used in signal processing and machine learning, including signal denoising, image compression, feature extraction, anomaly detection and speech recognition.

Advantages and disadvantages. Advantages: simple and computationally efficient; can produce sparse representations; can be used with a variety of basis sets. Disadvantages: can be sensitive to noise; does not always converge to the optimal solution; can be slow for large signals.
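A minimal sketch of the matching-pursuit loop described above, in plain NumPy (the dictionary, data and stopping rule are toy choices, not tied to any particular implementation):

```python
import numpy as np

def matching_pursuit(D, y, n_iter=10):
    """Greedy MP: at each step add the atom of D most correlated with the residual."""
    residual = y.copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual              # correlation of every atom with the residual
        j = int(np.argmax(np.abs(corr)))   # best-matching atom
        step = corr[j] / np.dot(D[:, j], D[:, j])
        coef[j] += step                    # update its weight (atoms may be re-selected)
        residual -= step * D[:, j]         # update the residual
    return coef

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 200))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
y = 2.0 * D[:, 3] - 1.0 * D[:, 120] + 0.01 * rng.standard_normal(50)
# The two largest recovered coefficients should point to atoms 3 and 120.
print("Largest coefficients at atoms:", np.argsort(np.abs(matching_pursuit(D, y)))[-2:])
```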
DOA Estimation for Large-Spacing Non-Uniform Arrays Based on the SBL Algorithm

Acoustics and Electronic Engineering, 2020, No. 3 (Issue 139). DOA Estimation for Large-Spacing Non-Uniform Arrays Based on the SBL Algorithm. Luo Guangcheng 1, Du Maqianli 1, Mao Lin 2, Li Zhizhong 3 (1. Unit 91001, Beijing 100036; 2. Unit 92196, Qingdao 262100; 3. Navy Submarine Academy, Qingdao 262199). Abstract: To address the grating-lobe ambiguity and false-alarm problem in DOA estimation for large-spacing non-uniform arrays, a DOA estimation technique based on the Sparse Bayesian Learning (SBL) algorithm is proposed, building on the principle of spatially sparse signal reconstruction.
The sparse Bayesian learning algorithm is applied to DOA estimation for large-spacing arrays in order to improve the grating-lobe suppression capability of the spatial spectrum estimator.
Results on both simulated and sea-trial data show that, compared with conventional methods, the SBL algorithm provides better DOA estimation performance.
Keywords: large-spacing non-uniform array; grating-lobe suppression; sparse reconstruction; sparse Bayesian learning. DOA estimation for large-spacing non-uniform arrays applies to scenarios such as detection with an array formed from several sonar-buoy nodes, or fixed underwater sonar that must cover a given area with as few elements as possible to reduce cost, or operate with some elements damaged.
When the element spacing and the wavelength of the signal of interest do not satisfy the spatial Nyquist sampling theorem, i.e. when the element spacing exceeds half a wavelength (defined here as large spacing), classical quadratic beam-search algorithms such as CBF and MVDR are prone to grating-lobe false alarms, so DOA estimation for large-spacing non-uniform arrays must provide grating-lobe suppression.
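To illustrate why grating lobes appear when the spacing exceeds half a wavelength, here is a toy conventional-beamforming (CBF) sketch in NumPy; the array geometry, frequency and sound speed are invented for illustration and are unrelated to the arrays used in the paper:

```python
import numpy as np

# CBF spatial spectrum for a line array whose element spacing d exceeds lambda/2.
c, f = 1500.0, 1000.0          # sound speed (m/s), frequency (Hz) -- illustrative values
lam = c / f
d = 1.5 * lam                  # spacing > lambda/2  ->  grating lobes
positions = d * np.arange(8)   # 8-element uniform line array

theta_true = 20.0              # true bearing (degrees)
k = 2 * np.pi / lam
a_true = np.exp(1j * k * positions * np.sin(np.radians(theta_true)))

scan = np.radians(np.arange(-90, 90.5, 0.5))
A = np.exp(1j * k * np.outer(positions, np.sin(scan)))      # steering matrix
spectrum = np.abs(A.conj().T @ a_true) ** 2 / len(positions) ** 2

# Several peaks appear: the true DOA plus grating lobes at other bearings.
peaks = scan[spectrum > 0.99 * spectrum.max()]
print("CBF peak bearings (deg):", np.degrees(peaks))
```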
As a new sampling theory, compressed sensing (or compressive sampling) exploits the sparsity of a signal to acquire discrete samples by random sampling at a rate far below the Nyquist rate, and then reconstructs the signal perfectly with a nonlinear reconstruction algorithm [1].
Based on this idea, this paper investigates DOA estimation for large-spacing non-uniform arrays using sparse reconstruction of the spatial signal.
Sparse signal reconstruction has developed rapidly in recent years and has been widely applied in many fields, including image reconstruction and restoration, wavelet denoising, radar imaging and sonar target localization [2-5]; new algorithms have also appeared in the context of spectrum estimation and array processing, including ℓ1-norm minimization [6], matching pursuit [7] and convex optimization methods [8].
A classical example is the FOCUSS method of Gorodnitsky et al., which performs DOA estimation with iteratively reweighted minimum-norm solutions [9]; later, Chen proposed the pursuit-based sparse signal estimation method Basis Pursuit Denoising (BPDN) [10], which enforces sparsity by minimizing the ℓ1 norm of the spatial signal, combines this with a data-fitting constraint on the array data bounded by the noise power, and obtains a sparse DOA estimate with convex optimization tools.
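A minimal sketch of the BPDN problem described above, assuming the cvxpy package is available; the dictionary, noise level and detection threshold are toy values, and a real DOA application would use a steering-vector dictionary rather than a random matrix:

```python
import numpy as np
import cvxpy as cp

# Toy overcomplete dictionary A (columns = candidate directions) and noisy data y
rng = np.random.default_rng(0)
m, n = 20, 90
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[10, 47]] = [1.0, 0.7]            # two active sources
y = A @ x_true + 0.05 * rng.standard_normal(m)

# BPDN: minimize ||x||_1  subject to  ||y - A x||_2 <= eps (eps set from the noise power)
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(x, 1)), [cp.norm(y - A @ x, 2) <= 0.5])
prob.solve()
print("Active indices:", np.nonzero(np.abs(x.value) > 0.05)[0])
```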
Reconstruction Algorithms for Compressed Sensing

Reconstruction is an essential step in compressed sensing and is the key to the whole approach.
Because the reconstruction algorithm determines whether the signal can be recovered accurately, researchers at home and abroad have devoted great effort to signal reconstruction for compressed sensing and have made considerable progress, proposing many reconstruction algorithms. Each has its own strengths and weaknesses, so users can choose the one that suits their situation, which greatly increases flexibility and provides a convenient basis for further research.
Reconstruction algorithms for compressed sensing fall into three main categories: 1. combinatorial algorithms; 2. greedy algorithms; 3. convex relaxation algorithms. Each category contains several algorithms, listed below.
Combinatorial algorithms: the signal is first sampled in a structured way, the sampled data are then processed by group testing, and the signal is finally reconstructed.
(1) Fourier sampling (Fourier Representation)
(2) Chaining Pursuit
(3) HHS Pursuit (Heavy Hitters on Steroids)
Greedy algorithms: approximate the signal step by step through greedy iterations.
(1) Matching Pursuit (MP)
(2) Orthogonal Matching Pursuit (OMP)
(3) Stagewise Orthogonal Matching Pursuit (StOMP)
(4) Regularized Orthogonal Matching Pursuit (ROMP)
(5) Sparsity Adaptive Matching Pursuit (SAMP)
Convex relaxation algorithms:
(1) Basis Pursuit (BP)
(2) Total Variation (TV) minimization
(3) Interior-point methods
(4) Gradient projection
(5) Projections Onto Convex Sets (POCS)
Although there are many algorithms, not every one is widely used, and the three categories have complementary strengths: combinatorial algorithms need a relatively large number of observations but are the most computationally efficient; convex relaxation algorithms are computationally expensive but need few observations and reconstruct with high accuracy; greedy iterative algorithms sit in between in both cost and accuracy, and are the most widely used of the three.
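As an illustration of the greedy family, here is a minimal Orthogonal Matching Pursuit sketch in plain NumPy (toy dictionary and sparsity level; not a production implementation):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms of A to explain y."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-fit all selected coefficients by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 80))
A /= np.linalg.norm(A, axis=0)                       # normalize atoms
x_true = np.zeros(80)
x_true[[3, 40, 60]] = [1.5, -1.0, 0.8]
y = A @ x_true + 0.01 * rng.standard_normal(30)
print("Recovered support:", np.nonzero(omp(A, y, 3))[0])
```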
Markov Decision Processes and Value Functions in Reinforcement Learning

Reinforcement learning is a machine learning approach whose goal is to let an agent learn an optimal action policy through interaction with its environment.
Markov decision processes (MDPs) and value functions are two core concepts in reinforcement learning.
This article introduces Markov decision processes and value functions in detail and discusses their role in reinforcement learning.
1. The Markov decision process (MDP). A Markov decision process is the mathematical framework used in reinforcement learning to model decision problems.
It is a sequential decision problem in which an agent makes decisions based on the state of the environment and receives reward signals to adjust its behaviour.
An MDP consists of the following components:
1. State space: the set of all states the environment can be in. Each state represents a particular configuration of the environment. In an MDP, states may be discrete (e.g. positions on a chessboard) or continuous (e.g. the position and velocity of a robot).
2. Action space: the set of all actions the agent can choose from. Each action causes the environment to transition from one state to another.
3. Transition probability: the probability that the environment moves to the next state given the current state and action. It can be written as a transition function P(s'|s, a), where s' is the next state, s the current state and a the current action.
4. Reward function: the immediate reward the agent receives in each state and for each action taken. Rewards can be positive, negative or zero and are used to evaluate the agent's behaviour.
5. Discount factor: the discount factor (usually denoted γ) expresses how strongly future rewards are attenuated relative to immediate rewards. If the discount factor is close to 0 the agent focuses on immediate rewards; if it is close to 1 the agent cares more about future rewards.
Based on these components, a Markov decision process can be represented by a state-action-reward sequence {<s_0, a_0, r_0>, <s_1, a_1, r_1>, <s_2, a_2, r_2>, ...}.
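To make these components concrete, here is a minimal value-iteration sketch over a tiny, made-up MDP; the transition probabilities and rewards below are invented for illustration, and value iteration itself is a standard algorithm rather than something prescribed by the text above:

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# P[a, s, s'] = transition probability P(s'|s, a); R[s, a] = immediate reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],   # action 0
    [[0.0, 0.9, 0.1], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]],   # action 1
])
R = np.array([[0.0, 1.0], [0.0, 2.0], [5.0, 0.0]])

# Value iteration: V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
V = np.zeros(n_states)
for _ in range(200):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)   # Q(s, a)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("Optimal state values:", V)
print("Greedy policy (best action per state):", Q.argmax(axis=1))
```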
Notes on "Kernel Sparse Representation"

Kernel Sparse Representation-Based Classifier (KSRC). 1. Problems: (1) For general classification problems in which samples from different classes are intermingled, or cannot be separated effectively by linear methods, SRC loses its classification ability; an effective method must be found to classify such data. (2) Although KSR is a nonlinear extension of SRC, it cannot reuse the methods designed for sparse signal reconstruction, and in experiments its test time is long; the test time needs to be reduced. (3) In KSRC a kernel must be chosen, for example an RBF kernel, so an effective way of determining the corresponding parameters is needed so that the classification outperforms SRC.
2. Approach: to address these problems, the paper introduces a kernel function on top of SRC.
The kernel function is defined as k(x_1, x_2) = φ(x_1)^T φ(x_2).
The most commonly used kernels include the Gaussian radial basis function k(x_1, x_2) = exp(−γ‖x_1 − x_2‖²), with γ > 0, and the linear kernel k(x_1, x_2) = x_1^T x_2.
The paper defines a training set {(x_i, y_i) | x_i ∈ X ⊆ R^m, y_i ∈ {1, 2, ..., c}, i = 1, 2, ..., n}, where c is the number of classes, m is the dimension of the input space X, and y_i is the class label associated with x_i.
Given a test sample x ∈ X, the goal is to predict its actual class label y from the c classes of training samples.
The training samples of the j-th class are arranged as the columns of a matrix X_j = [x_{j,1}, ..., x_{j,n_j}] ∈ R^{m×n_j}, j = 1, ..., c, where x_{j,i} denotes the i-th sample of class j and n_j is the number of training samples in class j.
A new training matrix containing all the samples is then defined as X = [X_1, X_2, ..., X_c] ∈ R^{m×n}, where n = Σ_{j=1}^{c} n_j. Using the mapping φ, the data in the (low-dimensional) input space X are mapped into the (high-dimensional) kernel feature space F: φ: x ∈ X → φ(x) = [φ_1(x), φ_2(x), ..., φ_D(x)]^T ∈ F, where D ≫ m is the dimension of the kernel feature space F.
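As a small illustration of the kernel computations above (plain NumPy; the function name, data shapes and γ value are illustrative, not taken from the paper):

```python
import numpy as np

def rbf_kernel_matrix(X, Z, gamma=0.5):
    """K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2), i.e. k(x, z) = phi(x)^T phi(z)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
X_train = rng.standard_normal((6, 3))              # n training samples in R^m
x_test = rng.standard_normal((1, 3))
K_train = rbf_kernel_matrix(X_train, X_train)      # n x n Gram matrix over the training set
k_test = rbf_kernel_matrix(x_test, X_train)        # similarities used at test time
print(K_train.shape, k_test.shape)
```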
Signal DOA Estimation Based on Sparse Reconstruction

Ren Xiaoli; Wang Ji; Wan Qun. Abstract: An improved direction-of-arrival (DOA) estimation method is proposed from the perspective of sparse signal reconstruction.
Because a minimum redundancy linear array (MRLA) can achieve a large array aperture with a small number of elements, the MRLA is combined with the ℓ1-SVD method to estimate signal DOAs.
Simulation results, verified over many experiments, show that the proposed method is effective: compared with the ℓ1-SVD method it can estimate the DOAs of more sources, and it can estimate more source DOAs with fewer array elements, i.e. it has a source-overload capability.
This paper proposes a modified Direction of Arrival (DOA) estimation method based on the Minimum Redundancy Linear Array (MRLA) from the sparse signal reconstruction perspective. Exploiting the structural feature of the MRLA, namely that it obtains a larger antenna aperture with a smaller number of array sensors, the MRLA is combined with the ℓ1-SVD method to estimate signal DOAs. Simulations demonstrate that the proposed method is effective; compared with the ℓ1-SVD method it can estimate more signal-source DOAs, and it is capable of estimating more DOAs with fewer antenna elements. Journal: Computer Engineering and Applications. Year (volume), issue: 2015 (000) 001. Pages: 6 (P195-199, 217). Keywords: direction of arrival (DOA); sparse signal reconstruction; minimum redundancy linear array (MRLA); ℓ1-SVD. Authors: Ren Xiaoli; Wang Ji; Wan Qun. Affiliations: School of Information, Guangdong Ocean University, Zhanjiang 524088; School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731. Language: Chinese. CLC number: TN911.7. 1 Introduction. Source localization is one of the main goals of signal processing; with a sensor array, it can be converted into a DOA estimation problem.
On the Training-Set Accuracy of Gaussian Naive Bayes

Gaussian Naive Bayes (GNB) is a popular machine learning algorithm used for classification tasks. It is particularly well suited for text classification, spam filtering, and recommendation systems. However, like any other machine learning algorithm, GNB's performance relies heavily on the quality of the training data. In this essay, we examine the factors that affect the training-set accuracy of Gaussian Naive Bayes and explore ways to improve its performance.

One of the key factors influencing the training-set accuracy of GNB is the quality and quantity of the training data. For the algorithm to make accurate predictions, it must be trained on a diverse and representative dataset. If the training set is too small or biased, the model may not generalize well to new, unseen data, resulting in low training-set accuracy and poor performance in real-world applications. It is therefore crucial to ensure that the training data are comprehensive and well balanced across the different classes.

Another factor that can affect the training-set accuracy of GNB is the presence of irrelevant or noisy features in the dataset. When the input features contain irrelevant information or noise, the algorithm struggles to identify meaningful patterns and make accurate predictions. To address this, feature selection and feature engineering can be used to filter out irrelevant features and enhance the discriminative power of the model. By selecting the most informative features and transforming them appropriately, we can improve the training-set accuracy of GNB.

Furthermore, the feature-independence assumption of Gaussian Naive Bayes can also affect its training-set accuracy. Although the "naive" independence assumption simplifies the model and makes it computationally efficient, it may not hold in real-world datasets where features are often correlated. When features are not independent, the probability estimates become biased and performance suboptimal. To mitigate this, techniques such as feature extraction and dimensionality reduction can be used to decorrelate the input features.

In addition, the choice of hyperparameters and model tuning can affect the training-set accuracy of GNB. Hyperparameters such as the variance-smoothing parameter of the per-class Gaussian distributions can significantly influence the model's performance, so they should be tuned carefully through cross-validation and grid search. With appropriate hyperparameters, the model is well calibrated and achieves high accuracy on the training set.

Despite the challenges and limitations associated with GNB, several strategies can improve its training-set accuracy: curating a high-quality training dataset, performing feature selection and engineering, addressing the feature-independence assumption, and tuning the model's hyperparameters. It is also important to continuously evaluate and validate the model on unseen data to ensure that it generalizes well and performs robustly in real-world scenarios.
By addressing these factors and adopting best practices in model training and evaluation, we can maximize the training-set accuracy of Gaussian Naive Bayes and unleash its full potential in various applications.
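As a minimal sketch of the tuning step described above (assuming scikit-learn, whose GaussianNB estimator exposes the variance-smoothing hyperparameter as var_smoothing; the dataset and grid values are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the variance-smoothing hyperparameter by cross-validated grid search
grid = GridSearchCV(GaussianNB(),
                    param_grid={"var_smoothing": [1e-11, 1e-9, 1e-7, 1e-5]},
                    cv=5)
grid.fit(X_train, y_train)
print("Best var_smoothing:", grid.best_params_)
print("Training-set accuracy:", grid.score(X_train, y_train))
print("Test-set accuracy:", grid.score(X_test, y_test))
```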
Kernel Basis Pursuit

Vincent Guigue, Alain Rakotomamonjy, Stéphane Canu
Lab. Perception, Systèmes, Information — CNRS FRE 2645
Avenue de l'Université, 76801 Saint-Étienne-du-Rouvray
{Vincent.Guigue, Alain.Rakoto, Stephane.Canu}@insa-rouen.fr

Résumé: Kernel methods are widely used for regression. However, this kind of problem raises two recurring questions: how to optimize the kernel, and how to set the bias-variance compromise? Using multiple kernels and computing the complete regularization path make it possible to handle these two tasks simply and efficiently. Introducing multiple kernels is also a way to fuse heterogeneous sources of information. Our approach is inspired by the Basis Pursuit algorithm (Chen et al., 1998). We follow Vincent and Bengio's method for the non-linearization of Basis Pursuit (Vincent & Bengio, 2002). This article presents a simple and sparse solution to the multiple-kernel regression problem. We use the LASSO (Least Absolute Shrinkage and Selection Operator) formulation (Tibshirani, 1996), based on L1 regularization, and the Stepwise Least Angle Regression (LARS) algorithm (Efron et al., 2004) as the solver. L1 regularization guarantees sparsity, while computing the complete regularization path via the LARS allows new criteria to be defined for finding the optimal bias-variance compromise. We also present a heuristic for setting the kernel parameters, so as to make the method fully non-parametric.

Mots-clés: Regression, Multiple Kernels, LASSO, Non-Parametric Method

Abstract: Kernel methods have been widely used in the context of regression.
But every problem leads to two major tasks: optimizing the kernel and setting the fitness-regularization compromise. Using multiple kernels and Basis Pursuit is a way to face these two tasks easily and efficiently. On top of that, it enables us to deal with multiple and heterogeneous sources of information. Our approach is inspired by the Basis Pursuit algorithm (Chen et al., 1998). We use Vincent and Bengio's method (Vincent & Bengio, 2002) to kernelize the Basis Pursuit and introduce the ability of mixing heterogeneous sources of information. This article aims at presenting an easy, efficient and sparse solution to the multiple Kernel Basis Pursuit problem. We will use the Least Absolute Shrinkage and Selection Operator (LASSO) formulation (Tibshirani, 1996) (L1 regularization), and the Stepwise Least Angle Regression (LARS) algorithm (Efron et al., 2004) as the solver. The LARS provides a fast and sparse solution to the LASSO. The fact that it computes the optimal regularization path enables us to propose new auto-adaptive hyper-parameters for the fitness-regularization compromise. We will also propose some heuristics to choose the kernel parameters. Finally, we aim at proposing a parameter-free, sparse and fast regression method.

Key words: Regression, Multiple Kernels, LASSO, Parameter Free.

1 Introduction

The context of our work is the following: we wish to estimate the functional dependency between an input x and an output y of a system, given a set of examples {(x_i, y_i), x_i ∈ X, y_i ∈ Y, i = 1 ... n} which have been drawn i.i.d. from an unknown probability law P(X, Y). Thus, our aim is to recover the function f which minimizes the following risk:

R[f] = E{(f(X) − Y)²}    (1)

but as P(X, Y) is unknown, we have to look for the function f which minimizes the empirical risk:

R_emp[f] = Σ_{i=1}^{n} (f(x_i) − y_i)²    (2)

This problem is ill-posed, and a classical way to turn it into a well-posed one is to use regularization theory (Tikhonov & Arsénin, 1977; Girosi et al., 1995). In this context, the solution of the problem is the function f ∈ H that minimizes the regularized empirical risk:

R_reg[f] = (1/n) Σ_{i=1}^{n} (f(x_i) − y_i)² + λ Ω[f]    (3)
Associated with this learning problem, there are two major tasks to build a good regression function with kernels: optimizing the kernel and choosing a good compromise between fitness and regularization. The use of multiple kernels is a way to make the first task easier. We will use the optimal regularization-path properties of the LARS to propose new heuristics, in order to set the fitness-regularization compromise dynamically. In section 2, we compare two approaches to the question of sparsity: Matching Pursuit and Basis Pursuit. We explain the building and the use of multiple kernels, combined with the LARS, in section 3. Our results on synthetic and real data are presented in section 4. Section 5 gives our conclusions and perspectives on this work.

2 Basis vs Matching Pursuit

Two common strategies are available to face the problem of building a sparse regression function f. The first one relies on an iterative building of f: at each step k, the comparison between the target y and the function f_k leads to adding a new source of information to build f_{k+1}. This approach is fast, but it is greedy and thus sub-optimal. The second solution consists in solving a learning problem, by minimizing the regularized empirical risk of equation (3).

Mallat and Zhang introduced the Matching Pursuit algorithm (Mallat & Zhang, 1993): they proposed to construct a regression function f as a linear combination of elementary functions g picked from a finite redundant dictionary D. This algorithm is iterative and one new function g is introduced at each step, associated with a weight β. At step k, we get the following approximation of f:

f_k = Σ_{i=1}^{k} β_i g_i    (5)

Given R_k, the residue generated by f_k, the function g_{k+1} and its associated weight β_{k+1} are selected according to:

(g_{k+1}, β_{k+1}) = argmin_{g ∈ D, β ∈ R} ‖R_k − β g‖²    (6)

The improvements described by Pati et al. (the Orthogonal Matching Pursuit algorithm) (Pati et al., 1993) keep the same framework, but optimize all the weights β_i at each step. A third algorithm, called pre-fitting (Vincent & Bengio, 2002), enables us to choose (g_{k+1}, β_{k+1}) according to R_{k+1}. All those methods are iterative and greedy. The different variations improve the weights or the choice of the function g, but the main characteristic remains unchanged.
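The second, global strategy can be sketched with off-the-shelf tools. The toy example below, which assumes scikit-learn and uses invented data rather than anything from the paper, computes the full LASSO regularization path over a redundant dictionary with a LARS solver, anticipating the machinery detailed in the next sections:

```python
import numpy as np
from sklearn.linear_model import lars_path

# Toy redundant dictionary (columns = atoms) and noisy target
rng = np.random.default_rng(0)
n, p = 100, 300
D = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[5, 42, 77]] = [2.0, -1.5, 1.0]
y = D @ beta_true + 0.1 * rng.standard_normal(n)

# Full LASSO path computed by the LARS: one breakpoint per atom added or removed
alphas, active, coefs = lars_path(D, y, method="lasso")

# coefs has shape (p, n_alphas): one coefficient vector per path breakpoint.
n_nonzero = (coefs != 0).sum(axis=0)
step = int(np.argmax(n_nonzero >= 3))          # earliest model with 3 active atoms
print("alpha at that step:", alphas[step])
print("support:", np.nonzero(coefs[:, step])[0])
```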
Matching Pursuit does not allow getting rid of a previously selected source of information, which means that its solution is sub-optimal. The approach of Chen et al. (Chen et al., 1998) is really different: they consider the whole dictionary of functions and look for the best linear solution (equation (5)) to estimate y, namely, the solution which minimizes the regularized empirical risk. Using Ω = ‖β‖_ℓ1 leads to the LASSO formulation. Such a formulation requires costly and complex linear programming (Chen, 1995) or a modified EM implementation (Grandvalet, 1998) to be solved. Finally, it enables them to find an exact solution to the regularized learning problem.

The Stepwise Least Angle Regression (LARS) (Efron et al., 2004) offers new opportunities, by combining an iterative and efficient approach with the exact solution of the LASSO. The fact that the LARS begins with an empty set of variables, combined with the sparsity of the solution, explains the efficiency of the method. The ability of dynamically deleting useless variables enables the method to converge to the exact solution of the LASSO problem.

3 Learning with multiple kernels

3.1 Building a multiple kernel regression function

Vincent and Bengio (Vincent & Bengio, 2002) propose to treat the kernel K exactly in the same way as the matrix X. Each column of K is then a source of information that can be added to the linear regression model f. Given an input vector x and a parametric mapping function Φ_θ defined by

Φ_θ: R^d → F, x → Φ_θ(x) = K_θ(x, ·)    (7)

where F is the spanned feature space, we consider K_θ(x, ·) as a source of information. It becomes easy to deal with multiple mapping functions Φ_i. The multiple resulting kernels K_i are placed side by side in a big matrix K:

K = [K_1 ... K_i ... K_N]    (8)

N is the number of kernels. In this situation, each source of information K_i(x_j, ·) is characterized by a point x_j of the learning set and a kernel parameter i. The number of information sources is then s = nN and K ∈ R^{n×s}. The learning problem becomes a variable selection problem, where the β_i coefficients can be seen as the weights of the sources of information. We simplify the notations:

f = Σ_{i=1}^{N} Σ_{j=1}^{n} β_{ij} K_i(x_j, ·) = Σ_{i=1}^{s} β_i K(i, ·) = Kβ    (9)

It is important to note that no assumption is made on the kernel K_θ, which can be non-positive. K can associate kernels of the same type (e.g. Gaussian) with different parameter values, as well as different types of kernels (e.g. Gaussian and polynomial). The resulting matrix K is neither positive definite nor square.

3.2 LARS

The LARS (Efron et al., 2004) is a stepwise iterative algorithm which provides an exact solution to the minimization of the regularized empirical risk (equation (3)) with Ω = ‖β‖_ℓ1. We use the following formulation, which is equivalent to the LASSO:

min_β ‖y − Kβ‖²  subject to  ‖β‖_ℓ1 ≤ t    (10)

We denote by β_i the regression coefficient associated to the i-th source of information and by ŷ^(j) = Kβ^(j) the regression function at step j. More generally, we will use the exponent to characterize the iteration step. The LARS is made of the following main steps:

1. Initialization: the active set of information sources A is empty, all β coefficients are set to zero.
2. Computation of the correlation between the sources of information and the residue. The residue R is defined by R = y − ŷ.
3. The most correlated source is added to the active set:

A = A ∪ {argmax_{i,θ} |K_θ(x_i, ·)^T R|}    (11)

4. Definition of the best direction in the active set, u_A. This is the most expensive part of the algorithm in computation time, since it requires the inversion of the matrix K_A^T K_A.
5. The original part of the algorithm resides in the computation of the step γ. The idea is to
compute γ such that two functions are equi-correlated with the residue (cf. Fig. 1), whereas the Ordinary Least Squares (OLS) algorithm defines γ such that u_A and the direction from ŷ^(j+1) to y become orthogonal.

6. The regression function is updated:

ŷ^(j+1) = ŷ^(j) + γ u_A    (12)

It is necessary to introduce the ability of suppressing a function from the active set to fit the LASSO solution, namely to turn the forward algorithm into a stepwise method. When the sign of a β_i changes during the update (equation (12)), the step γ is reduced so that this β_i becomes zero. Then, the corresponding source is removed from the active set and an optimization is performed over the new active set.

Solving the LASSO is really fast with this method, due to the fact that it is both forward and sparse. The first steps are not expensive, because of the small size of the active set; then it becomes more and more time consuming with iterations. But the sparsity of ℓ1 regularization limits the number of required steps. The LARS begins with an empty active set, whereas linear programming and other backward methods begin with all functions and require solving a high-dimensional linear system to put irrelevant coefficients to zero. Given the fact that only one point is added (or removed) during an iteration, it is possible to update the inverted matrix of step 4 instead of fully computing it. This leads to a simple-LARS algorithm, similar to the simple-SVM formulation (Loosli et al., 2004), which also increases the speed of the method.

3.3 Optimization of the regularization parameter

One of the most interesting properties of the LARS is the fact that it computes the whole regularization path. The regularization parameter λ of equation (3) is equivalent to the bound t of equation (10). At each step, the introduction of a new source of information leads to an optimal solution, corresponding to a given value of t. In the other classical algorithms, λ is set a priori and optimized by cross-validation. The LARS enables us to compute a set of optimal solutions corresponding to different values of t, with only one learning stage. It also enables us to optimize the value of t dynamically, during the learning stage. Finding a good setting for t is very important: when t becomes too large, the resulting regression function is the same as the Ordinary Least Squares (OLS) regression function; hence, it requires the resolution of a linear system of size s × s. Early stopping should enable us to decrease the computation time (which is linked to the sparsity of the solution) as well as to improve the generalization of the learning (by regularizing).

3.3.1 Different compromise parameters

The computation of the complete regularization path offers the opportunity to set the compromise parameter dynamically (Bach et al., 2004). The first step is to look for different expressions of the regularization parameter t of equation (10). The aim is to find the most meaningful one, namely the easiest way to set this parameter.

- The original formulation of the LARS relies on the compromise parameter t, which is a bound on the sum of the absolute values of the β coefficients. t is difficult to set because it is somewhat meaningless.
- It is possible to apply the Ljung criterion (Ljung, 1987) to the autocorrelation of the residue. The parameter is then a threshold which decides when the residue can be considered as white noise.
- Another solution consists in studying the evolution of the loss function ℓ(y_i, f_θ^j(x_i)) with regard to the step j. The criterion is a bound on the variation of this cost.
- ν-LARS. It is possible to define a criterion on the
number of support vectors, or on the rate of support vectors among the learning set. It is important to note that the ν threshold is then a hard threshold, whereas in the ν-SVM method ν can be seen as an upper bound on the rate of support vectors (Schölkopf & Smola, 2002).

However, all these methods require the setting of a parameter a priori. The value of this parameter is estimated by cross-validation.

3.3.2 Trap source

We propose another method based on a trap parameter. The idea is to introduce one or several sources of information that we do not want to use. When the source most correlated with the residue belongs to the trap set, the learning procedure is stopped. The trap sources of information can be built on different heuristics:

- according to the original signal noise, when there exists prior knowledge on the data;
- with regard to the distribution of the learning points, to prevent overfitting (cf. Fig. 2): in this case, a Gaussian kernel K = K_{σ_of} is added to the information sources, with σ_of very small;
- by adding random variables among the sources of information (with Gaussian or uniform distribution). This kind of heuristic has already been used in variable selection (Bi et al., 2003).

The use of a trap scale is closely linked to the way the LARS selects the sources of information. As seen in section 3.2, the source of information selected at a given iteration is the most correlated with the residue. Those heuristics are based on the meaning of the trap scale: the learning stage should be stopped when the residue is most correlated respectively with the noise, with only one source of information, or with an independent random variable generated according to the uniform distribution. This means that no more relevant information is present in the sources that are not in the active set.

Figure 2: Illustration of a trap scale based on the overfitting heuristic (Gaussian kernel). When K_{σ_2}(x, ·) is the source of information most correlated with the residue, it means that the error is caused by only one point; it is a way to detect the beginning of overfitting.

3.4 Optimizing kernel parameters

Instead of searching for an optimal parameter (or parameter vector) for the problem, we propose to find a key scale to fit the regions with the highest point density in the input space. We aim at finding a reference Gaussian parameter such that the two nearest points in the input space have little influence on each other. This reference parameter represents the smallest bandwidth which can be interesting for a given problem. Then, we propose to build a series of bigger Gaussian parameters from this scale, to fit the different densities of points that can appear in the whole input space. A one-nearest-neighbor search is performed on the training data. To describe high-density regions, we focus on the shortest distances between neighbors. The key distance D_k is defined as the distance between x_i and x_j, the two nearest points in the input space. The corresponding key Gaussian parameter σ_k is defined so that:

K(x_i, x_j) = (1 / (2π σ_k)) exp(−D_k² / (2σ_k²))    (13)

4 Experiments

We illustrate the efficiency of the methodology on synthetic and real data. Tables 1 and 2 present the results with two different algorithms: the SVM and the LARS. We use four strategies to stop the learning stage of the LARS.

- LARS-Σ_i|β_i| is the classical method where a bound is defined on the sum of the regression coefficients. This bound is estimated by cross-validation.
- ν-LARS is based on the fraction of support vectors. ν is also estimated by cross-validation.
- LARS-RV relies on the introduction of random
variables as sources of information. The learning stage is stopped when one of these sources is picked up as the most correlated with the residue.
- LARS-σs also relies on a trap scale, but this scale is built according to the distribution of the learning set. Selecting a source in this trap scale can be seen as overfitting. We use σs = σk of equation (13).

To validate this approach, we compare the results with classical Gaussian ε-SVM regression; the parameters ε, C and σ are optimized by cross-validation. In order to distinguish the benefit of the LARS from the benefits of multiple kernel learning, we also give the results of the LARS algorithm combined with a single kernel.

4.1 Synthetic data

The learning of the cos(exp(ωt)) regression function, with random sampling, shows the interest of multiple kernels. We try to identify:

f(t) = cos(exp(ωt)) + b(t)    (15)

where b(t) is a Gaussian white noise of variance σb² = 0.4, t ∈ [0, 2] is drawn according to a uniform distribution, and ω = 2.4. We also tested the method on the classical synthetic data described by Donoho and Johnstone (Donoho & Johnstone, 1994); for those signals, we took t ∈ [0, 1], drawn according to a uniform distribution. We use 200 points for the learning set and 1000 points for the testing set. The noise is added only on the learning set. The parameters (ν, Σ_i|β_i|, ...) are computed by cross-validation on the learning set.

Table 1 presents the results over 30 runs for each database. These results point out the sparsity and the efficiency of the LARS solutions. Figure 3 illustrates how multiple kernel learning enables the regression function to fit the local frequency of the model. It also shows that the selected points belong to higher and higher scales with iterations. Indeed, the correlation with the residue can be seen as an energetic criterion: when the amplitude of the signal remains constant, there is more energy in the low-frequency part of the signal. That is why the first selected sources of information describe those parts of the signal. The results on Donoho's different synthetic signals enable us to distinguish the benefits of the LARS method from the benefits of the multiple kernels.

[Table 1: Results of SVM and LARS for the estimation of the cos(exp(t)) and Donoho's classical functions — mean and standard deviation of the MSE on the test set (30 runs), number of support vectors used for each solution and number of best performances, for ε-SVM, ν-LARS, LARS-Σ_i|β_i| and LARS-RV on the cos(exp(t)), Blocks and HeaviSine signals.]
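To make the multiple-kernel construction of section 3.1 and the key-scale heuristic of section 3.4 concrete, here is a rough sketch in plain NumPy plus scikit-learn's LassoLars on the cos(exp(ωt)) task; it only approximates the approach described above, and the bandwidth grid, penalty value and solver settings are assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)
n = 200
t = np.sort(rng.uniform(0.0, 2.0, n))
y = np.cos(np.exp(2.4 * t)) + np.sqrt(0.4) * rng.standard_normal(n)

# Key scale from the two nearest training points, then a geometric series of
# larger bandwidths up to the data range (illustrative choice of grid).
d_min = np.min(np.diff(t))
sigmas = np.geomspace(d_min, t.max() - t.min(), num=6)

# Stack one Gaussian kernel block per bandwidth: K is n x (n * len(sigmas)).
diff2 = (t[:, None] - t[None, :]) ** 2
K = np.hstack([np.exp(-diff2 / (2.0 * s ** 2)) for s in sigmas])

# L1-regularized fit solved with a LARS: a sparse set of (point, scale) atoms.
model = LassoLars(alpha=1e-3, fit_intercept=True)
model.fit(K, y)
print("Selected atoms:", np.count_nonzero(model.coef_), "out of", K.shape[1])
```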
efficient on this kind of problem.It is important to note that LARS-RV and LARS-σs are parameter free methods when combined with the heuristic described in section3.4.Best results are achieved with LARS- i|βi|,however,LARS-RV results are almost equivalent without any parame-ters.4.2Real dataExperiments are carried out over regression data bases pyrim and triazines available in the UCI repository(Blake&Merz,1998).We compare our results with(Chang&Lin, 2005).The experimental procedure for real data is the following one:Thirty training/testing set are produced randomly,table2presents mean and standard deviation of MSE(mean square error)on the test set.80%of the points are used for training and the remaining paramters(ν, i|βi|...)are computed by cross validation on the learning set.The results obtained with LARS algorithm are either equivalent to Chang and Lin’s ones or better.ǫ-SVM solution is not really competitive but it gives an interesting information on the number of support vectors required for each RS-RV and LARS-σs results are very interesting:they are parameter free using the heuristic describe in section3.4,moreover the LARS-RV achieves the best results for pyrim.CAp2005Algo SVMǫ-SVM RV0.009±0.01637.1100.011±0.00931.20triazines0.021±0.005−−0.022±0.00637.00Algo LARSPi|βi|σs0.007±0.00638.08170.020±0.00552.038Table2:Results of SVM and LARS for the different regression database.Mean and standard deviation of MSE on the test set(30runs),number of support vectors used for each solution,number of best performances.5ConclusionThis paper enables us to meet two objectives:proposing a sparse kernel-based solution for the regression problem and introducing new solutions for the bias-variance compro-mise problem.The LARS offers opportunities for both problems.It gives an exact solution to the LASSO problem,which is sparse due toℓ1regularization.The ability of dealing with multiple kernels allows rough setting for the kernel parameters.Then,LARS algo-rithm optimizes the parameters at each iteration,selecting a new point in the optimal scale.The fact that the LARS computes the regularization path offers efficient and non parametric settings for the compromise parameter.This methodology gives good results on synthetic and real data.In the meantime,the required time computation is reduced compared with SVM,due to the sparsity of the obtained solutions.The perspectives of this work are threefold.We have to test LARS-methods on more databases to evaluate all properties.We also want to improve the multiple kernel build-ing.Indeed,the use of the currentσk often leads to a slight overfitting and to less sparse solutions.Finally,we will analyze the LARS-RV results deeper,to explain the good results and possibly improve them.ReferencesB ACH F.,T HIBAUX R.&J ORDAN M.(2004).Computing regularization paths for learningKernel Basis Pursuitmultiple kernels.In Advances in Neural Information Processing Systems,volume17.B I J.,B ENNETT K.,E MBRECHTS M.,B RENEMAN C.&S ONG M.(2003).Dimensionality reduction via sparse support vector machines.Journal of Machine Learning Research,3,1229–1243.B LAKE C.&M ERZ C.(1998).UCI rep.of machine learning databases.C HANG M.&L IN C.(2005).Leave-one-out bounds for support vector regression model se-lection.Neural Computation.C HEN S.(1995).Basis Pursuit.PhD thesis,Department of Statistics,Stanford University.C HEN S.,D ONOHO D.&S AUNDERS M.(1998).Atomic decomposition by basis pursuit. 
SIAM Journal on Scientific Computing,20(1),33–61.D ONOHO D.&J OHNSTONE I.(1994).Ideal spatial adaptation by wavelet shrinkage. Biometrika,81,425–455.E FRON B.,H ASTIE T.,J OHNSTONE I.&T IBSHIRANI R.(2004).Least angle regression. Annals of statistics,32(2),407–499.G IROSI F.,J ONES M.&P OGGIO T.(1995).Regularization theory and neural networks archi-tectures.Neural Computation,7(2),219–269.G RANDVALET Y.(1998).Least absolute shrinkage is equivalent to quadratic penalization.In ICANN,p.201–206.K IMELDORF G.&W AHBA G.(1971).Some results on Tchebycheffian spline functions.J. Math.Anal.Applic.,33,82–95.L JUNG L.(1987).System Identification-Theory for the User.L OOSLI G.,C ANU S.,V ISHWANATHAN S.,S MOLA A.J.&C HATTOPADHYAY M.(2004). Une boˆıte`a outils rapide et simple pour les svm.In CAp.M ALLAT S.&Z HANG Z.(1993).Matching pursuits with time-frequency dictionaries.IEEE Transactions on Signal Processing,41(12),3397–3415.P ATI Y.C.,R EZAIIFAR R.&K RISHNAPRASAD P.S.(1993).Orthogonal matching pursuits: recursive function approximation with applications to wavelet decomposition.In Proceedings of the27th Asilomar Conference in Signals,Systems,and Computers.S CH¨OLKOPF B.&S MOLA A.(2002).Learning with kernels.T IBSHIRANI R.(1996).Regression shrinkage and selection via the lasso.J.Royal.Statist., 58(1),267–288.T IKHONOV A.&A RS´E NIN V.(1977).Solutions of ill-posed problems.W.H.Winston.V INCENT P.&B ENGIO Y.(2002).Kernel matching pursuit.Machine Learning Journal,48(1), 165–187.W AHBA G.(1990).Spline Models for Observational Data.Series in Applied Mathematics, V ol.59,SIAM.。