A Model-Agnostic Local Explanation Method Based on Active Learning and a Quadratic Rational Kernel

Zhou Shenghao; Yuan Weiwei; Guan Donghai
[Journal Name] Computer Science
[Year (Volume), Issue] 2024, 51(2)
[Abstract] The widespread use of deep learning models has made it increasingly clear that explaining model decisions is an urgent problem: complex, hard-to-interpret black-box models hinder the deployment of these algorithms in real-world settings. LIME, the most popular local explanation method, generates perturbed data that is unstable, which biases the final explanations. To address this, a model-agnostic local explanation method based on active learning and a quadratic rational kernel, ActiveLIME, is proposed to make the local explanation model more faithful to the original classifier. After generating perturbed data, ActiveLIME samples the perturbations with an active-learning query strategy, selects the high-uncertainty perturbation set for training, and uses the local model with the highest accuracy over the iterations to generate explanations for the instance of interest. In addition, for high-dimensional sparse samples that are prone to local overfitting, a quadratic rational kernel is introduced into the model's loss function to reduce overfitting. Experimental results show that the proposed ActiveLIME method achieves higher local fidelity and explanation quality than traditional local explanation methods.
[Pages] 7 pages (pp. 245-251)
[Authors] Zhou Shenghao; Yuan Weiwei; Guan Donghai
[Author Affiliation] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
[Language] Chinese
[CLC Classification] TP391
[Related Literature]
1. An active learning method for local causal structure based on causal strength
2. A weighted improvement of the local gray-level model based on the active shape model algorithm
3. Research on locating rivet holes in aircraft panels based on a local active contour model
4. An anomaly detection method based on fuzzy kernel clustering and active learning
5. A fast active learning method based on the kernel extreme learning machine and its soft-sensor application
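As a rough, hypothetical illustration of the procedure summarized in the abstract above, the Python sketch below runs one ActiveLIME-style step: perturbations of the instance of interest are generated, an uncertainty-based query strategy keeps only the perturbations the black-box model is least certain about, and a weighted linear surrogate is refit on that subset. The uncertainty measure, the rational-quadratic locality kernel, and every function name here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def active_lime_step(black_box, x, n_samples=500, n_query=100, scale=0.5, rng=None):
    """One hypothetical ActiveLIME-style iteration (illustrative only)."""
    rng = np.random.default_rng(rng)
    # 1. Generate perturbations around the instance of interest.
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    y = black_box(Z)                     # black-box class probabilities for the perturbations
    # 2. Uncertainty-based query strategy: keep the perturbations whose
    #    prediction is closest to 0.5 (least certain), an assumed criterion.
    uncertainty = -np.abs(y - 0.5)
    idx = np.argsort(uncertainty)[-n_query:]
    Zq, yq = Z[idx], y[idx]
    # 3. Locality weights; this rational-quadratic form stands in for the
    #    quadratic rational kernel named in the abstract.
    d2 = np.sum((Zq - x) ** 2, axis=1)
    w = 1.0 - d2 / (d2 + 1.0)
    # 4. Fit the local surrogate on the queried, weighted perturbations.
    surrogate = Ridge(alpha=1.0).fit(Zq, yq, sample_weight=w)
    return surrogate.coef_               # feature attributions for x
```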
GSPBOX - A toolbox for signal processing on graphs

GSPBOX_-Atoolboxforsignalprocessingongraphs_GSPBOX:A toolbox for signal processing on graphsNathanael Perraudin,Johan Paratte,David Shuman,Lionel Martin Vassilis Kalofolias,Pierre Vandergheynst and David K.HammondMarch 16,2016AbstractThis document introduces the Graph Signal Processing Toolbox (GSPBox)a framework that can be used to tackle graph related problems with a signal processing approach.It explains the structure and the organization of this software.It also contains a general description of the important modules.1Toolbox organizationIn this document,we brie?y describe the different modules available in the toolbox.For each of them,the main functions are brie?y described.This chapter should help making the connection between the theoretical concepts introduced in [7,9,6]and the technical documentation provided with the toolbox.We highly recommend to read this document and the tutorial before using the toolbox.The documentation,the tutorials and other resources are available on-line 1.The toolbox has ?rst been implemented in MATLAB but a port to Python,called the PyGSP,has been made recently.As of the time of writing of this document,not all the functionalities have been ported to Python,but the main modules are already available.In the following,functions pre?xed by [M]:refer to the MATLAB implementation and the ones pre?xed with [P]:refer to the Python implementation. 1.1General structure of the toolbox (MATLAB)The general design of the GSPBox focuses around the graph object [7],a MATLAB structure containing the necessary infor-mations to use most of the algorithms.By default,only a few attributes are available (see section 2),allowing only the use of a subset of functions.In order to enable the use of more algorithms,additional ?elds can be added to the graph structure.For example,the following line will compute the graph Fourier basis enabling exact ?ltering operations.1G =gsp_compute_fourier_basis(G);Ideally,this operation should be done on the ?y when exact ?ltering is required.Unfortunately,the lack of well de?ned class paradigm in MATLAB makes it too complicated to be implemented.Luckily,the above formulation prevents any unnecessary data copy of the data contained in the structure G .In order to avoid name con?icts,all functions in the GSPBox start with [M]:gsp_.A second important convention is that all functions applying a graph algorithm on a graph signal takes the graph as ?rst argument.For example,the graph Fourier transform of the vector f is computed by1fhat =gsp_gft(G,f);1Seehttps://lts2.epfl.ch/gsp/doc/for MATLAB and https://lts2.epfl.ch/pygsp for Python.The full documentation is also avail-able in a single document:https://lts2.epfl.ch/gsp/gspbox.pdf1a r X i v :1408.5781v 2 [c s .I T ] 15 M a r 2016The graph operators are described in section4.Filtering a signal on a graph is also a linear operation.However,since the design of special?lters(kernels)is important,they are regrouped in a dedicated module(see section5).The toolbox contains two additional important modules.The optimization module contains proximal operators,projections and solvers compatible with the UNLocBoX[5](see section6).These functions facilitate the de?nition of convex optimization problems using graphs.Finally,section??is composed of well known graph machine learning algorithms.1.2General structure of the toolbox(Python)The structure of the Python toolbox follows closely the MATLAB one.The major difference comes from the fact that the Python implementation is object-oriented and thus allows for 
a natural use of instances of the graph object.For example the equivalent of the MATLAB call:1G=gsp_estimate_lmax(G);can be achieved using a simple method call on the graph object:1G.estimate_lmax()Moreover,the use of class for the"graph object"allows to compute additional graph attributes on the?y,making the code clearer as its MATLAB equivalent.Note though that functionalities are grouped into different modules(one per section below) and that several functions that work on graphs have to be called directly from the modules.For example,one should write:1layers=pygsp.operators.kron_pyramid(G,levels)This is the case as soon as the graph is the structure on which the action has to be performed and not our principal focus.In a similar way to the MATLAB implementation using the UNLocBoX for the convex optimization routines,the Python implementation uses the PyUNLocBoX,which is the Python port of the UNLocBoX. 2GraphsThe GSPBox is constructed around one main object:the graph.It is implemented as a structure in Matlab and as a class in Python.It stores the nodes,the edges and other attributes related to the graph.In the implementation,a graph is fully de?ned by the weight matrix W,which is the main and only required attribute.Since most graph structures are far from fully connected, W is implemented as a sparse matrix.From the weight matrix a Laplacian matrix is computed and stored as an attribute of the graph object.Different other attributes are available such as plotting attributes,vertex coordinates,the degree matrix,the number of vertices and edges.The list of all attributes is given in table1.2Attribute Format Data type DescriptionMandatory?eldsW N x N sparse matrix double Weight matrix WL N x N sparse matrix double Laplacian matrixd N x1vector double The diagonal of the degree matrixN scalar integer Number of verticesNe scalar integer Number of edgesplotting[M]:structure[P]:dict none Plotting parameterstype text string Name,type or short descriptiondirected scalar[M]:logical[P]:boolean State if the graph is directed or notlap_type text string Laplacian typeOptional?eldsA N x N sparse matrix[M]:logical[P]:boolean Adjacency matrixcoords N x2or N x3matrix double Vectors of coordinates in2D or3D.lmax scalar double Exact or estimated maximum eigenvalue U N x N matrix double Matrix of eigenvectorse N x1vector double Vector of eigenvaluesmu scalar double Graph coherenceTable1:Attributes of the graph objectThe easiest way to create a graph is the[M]:gsp_graph[P]:pygsp.graphs.Graph function which takes the weight matrix as input.This function initializes a graph structure by creating the graph Laplacian and other useful attributes.Note that by default the toolbox uses the combinatorial de?nition of the Laplacian operator.Other Laplacians can be computed using the[M]:gsp_create_laplacian[P]:pygsp.gutils.create_laplacian function.Please note that almost all functions are dependent of the Laplacian de?nition.As a result,it is important to select the correct de?nition at? rst.Many particular graphs are also available using helper functions such as:ring,path,comet,swiss roll,airfoil or two moons. 
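To make the Python conventions described above concrete, here is a brief, hedged sketch of the workflow the text outlines: building a graph object from its weight matrix, computing attributes on the fly, and taking a graph Fourier transform. It follows the names given in this document; current PyGSP releases may expose a slightly different API, so treat the calls as indicative rather than authoritative.

```python
import numpy as np
import pygsp

# Build a small weighted graph from its weight matrix W (a 4-cycle here).
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
G = pygsp.graphs.Graph(W)          # graph object; the Laplacian is derived from W

G.estimate_lmax()                  # method call on the object, as described above
G.compute_fourier_basis()          # adds U (eigenvectors) and e (eigenvalues) on the fly

# Graph Fourier transform of a signal living on the vertices.
f = np.array([0., 1., 2., 3.])
fhat = G.U.T @ f                   # same role as gsp_gft(G, f) in the MATLAB toolbox
```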
In addition,functions are provided for usual non-deterministic graphs suchas:Erdos-Renyi,community,Stochastic Block Model or sensor networks graphs.Nearest Neighbors(NN)graphs form a class which is used in many applications and can be constructed from a set of points (or point cloud)using the[M]:gsp_nn_graph[P]:pygsp.graphs.NNGraph function.The function is highly tunable and can handle very large sets of points using FLANN[3].Two particular cases of NN graphs have their dedicated helper functions:3D point clouds and image patch-graphs.An example of the former can be seen in thefunction[M]:gsp_bunny[P]:pygsp.graphs.Bunny.As for the second,a graph can be created from an image by connecting similar patches of pixels together.The function[M]:gsp_patch_graph creates this graph.Parameters allow the resulting graph to vary between local and non-local and to use different distance functions [12,4].A few examples of the graphs are displayed in Figure1.3PlottingAs in many other domains,visualization is very important in graph signal processing.The most basic operation is to visualize graphs.This can be achieved using a call to thefunction[M]:gsp_plot_graph[P]:pygsp.plotting.plot_graph. In order to be displayable,a graph needs to have2D(or3D)coordinates(which is a?eld of the graph object).Some graphs do not possess default coordinates(e.g.Erdos-Renyi).The toolbox also contains routines to plot signals living on graphs.The function dedicated to this task is[M]:gsp_plot_ signal[P]:pygsp.plotting.plot_signal.For now,only1D signals are supported.By default,the value of the signal is displayed using a color coding,but bars can be displayed by passing parameters.3Figure 1:Examples of classical graphs :two moons (top left),community (top right),airfoil (bottom left)and sensor network (bottom right).The third visualization helper is a function to plot ?lters (in the spectral domain)which is called [M]:gsp_plot_filter [P]:pygsp.plotting.plot_filter .It also supports ?lter-banks and allows to automatically inspect the related frames.The results obtained using these three plotting functions are visible in Fig.2.4OperatorsThe module operators contains basics spectral graph functions such as Fourier transform,localization,gradient,divergence or pyramid decomposition.Since all operator are based on the Laplacian de? nition,the necessary underlying objects (attributes)are all stored into a single object:the graph.As a ?rst example,the graph Fourier transform [M]:gsp_gft [P]:pygsp.operators.gft requires the Fourier basis.This attribute can be computed with the function [M]:gsp_compute_fourier_basis[P]:/doc/c09ff3e90342a8956bec0975f46527d3240ca692.html pute_fourier_basis [9]that adds the ?elds U ,e and lmax to the graph structure.As a second example,since the gradient and divergence operate on the edges of the graph,a search on the edge matrix is needed to enable the use of these operators.It can be done with the routines [M]:gsp_adj2vec[P]:pygsp.operators.adj2vec .These operations take time and should4Figure 2:Visualization of graph and signals using plotting functions.NameEdge derivativefe (i,j )Laplacian matrix (operator)Available Undirected graph Combinatorial LaplacianW (i,j )(f (j )?f (i ))D ?WV Normalized Laplacian W (i,j ) f (j )√d (j )f (i )√d (i )D ?12(D ?W )D ?12V Directed graph Combinatorial LaplacianW (i,j )(f (j )?f (i ))12(D ++D ??W ?W ?)V Degree normalized Laplacian W (i,j ) f (j )√d ?(j )?f (i )√d +(i )I ?12D ?12+[W +W ?]D ?12V Distribution normalized Laplacianπ(i ) p (i,j )π(j )f (j )? 
p (i,j )π(i )f (i )12 Π12PΠ12+Π?12P ?Π12 VTable 2:Different de?nitions for graph Laplacian operator and their associated edge derivative.(For directed graph,d +,D +and d ?,D ?de?ne the out degree and in-degree of a node.π,Πis the stationary distribution of the graph and P is a normalized weight matrix W .For sake of clarity,exact de?nition of those quantities are not given here,but can be found in [14].)be performed only once.In MATLAB,these functions are called explicitly by the user beforehand.However,in Python they are automatically called when needed and the result stored as an attribute. The module operator also includes a Multi-scale Pyramid Transform for graph signals [6].Again,it works in two steps.Firstthe pyramid is precomputed with [M]:gsp_graph_multiresolution [P]:pygsp.operators.graph_multiresolution .Second the decomposition of a signal is performed with [M]:gsp_pyramid_analysis [P]:pygsp.operators.pyramid_analysis .The reconstruction uses [M]:gsp_pyramid_synthesis [P]:pygsp.operators.pyramid_synthesis .The Laplacian is a special operator stored as a sparse matrix in the ?eld L of the graph.Table 2summarizes the available de?nitions.We are planning to implement additional ones.5FiltersFilters are a special kind of linear operators that are so prominent in the toolbox that they deserve their own module [9,7,2,8,2].A ?lter is simply an anonymous function (in MATLAB)or a lambda function (in Python)acting element-by-element on the input.In MATLAB,a ?lter-bank is created simply by gathering these functions together into a cell array.For example,you would write:51%g(x)=x^2+sin(x)2g=@(x)x.^2+sin(x);3%h(x)=exp(-x)4h=@(x)exp(-x);5%Filterbank composed of g and h6fb={g,h};The toolbox contains many prede?ned design of?lter.They all start with[M]:gsp_design_in MATLAB and are in the module[P]:pygsp.filters in Python.Once a?lter(or a?lter-bank)is created,it can be applied to a signal with[M]: gsp_filter_analysis in MATLAB and a call to the method[P]:analysis of the?lter object in Python.Note that the toolbox uses accelerated algorithms to scale almost linearly with the number of sample[11].The available type of?lter design of the GSPBox can be classi?ed as:Wavelets(Filters are scaled version of a mother window)Gabor(Filters are shifted version of a mother window)Low passlter(Filters to de-noise a signal)High pass/Low pass separationlterbank(tight frame of2lters to separate the high frequencies from the low ones.No energy is lost in the process)Additionally,to adapt the?lter to the graph eigen-distribution,the warping function[M]:gsp_design_warped_translates [P]:pygsp.filters.WarpedTranslates can be used[10].6UNLocBoX BindingThis module contains special wrappers for the UNLocBoX[5].It allows to solve convex problems containing graph terms very easily[13,15,14,1].For example,the proximal operator of the graph TV norm is given by[M]:gsp_prox_tv.The optimization module contains also some prede?ned problems such as graph basis pursuit in[M]:gsp_solve_l1or wavelet de-noising in[M]:gsp_wavelet_dn.There is still active work on this module so it is expected to grow rapidly in the future releases of the toolbox.7Toolbox conventions7.1General conventionsAs much as possible,all small letters are used for vectors(or vector stacked into a matrix)and capital are reserved for matrices.A notable exception is the creation of nearest neighbors graphs.A variable should never have the same name as an already existing function in MATLAB or Python respectively.This makes the code easier to read and less prone to 
errors.This is a best coding practice in general,but since both languages allow the override of built-in functions,a special care is needed.All function names should be lowercase.This avoids a lot of confusion because some computer architectures respect upper/lower casing and others do not.As much as possible,functions are named after the action they perform,rather than the algorithm they use,or the person who invented it.No global variables.Global variables makes it harder to debug and the code is harder to parallelize.67.2MATLABAll function start by gsp_.The graph structure is always therst argument in the function call.Filters are always second.Finally,optional parameter are last.In the toolbox,we do use any argument helper functions.As a result,optional argument are generally stacked into a graph structure named param.If a transform works on a matrix,it will per default work along the columns.This is a standard in Matlab(fft does this, among many other functions).Function names are traditionally written in uppercase in MATLAB documentation.7.3PythonAll functions should be part of a module,there should be no call directly from pygsp([P]:pygsp.my_function).Inside a given module,functionalities can be further split in differentles regrouping those that are used in the same context.MATLAB’s matrix operations are sometimes ported in a different way that preserves the efciency of the code.When matrix operations are necessary,they are all performed through the numpy and scipy libraries.Since Python does not come with a plotting library,we support both matplotlib and pyqtgraph.One should install the required libraries on his own.If both are correctly installed,then pyqtgraph is favoured unless speci?cally speci?ed. AcknowledgementsWe would like to thanks all coding authors of the GSPBOX.The toolbox was ported in Python by Basile Chatillon,Alexandre Lafaye and Nicolas Rod.The toolbox was also improved by Nauman Shahid and Yann Sch?nenberger.References[1]M.Belkin,P.Niyogi,and V.Sindhwani.Manifold regularization:A geometric framework for learning from labeled and unlabeledexamples.The Journal of Machine Learning Research,7:2399–2434,2006.[2] D.K.Hammond,P.Vandergheynst,and R.Gribonval.Wavelets on graphs via spectral graph theory.Applied and ComputationalHarmonic Analysis,30(2):129–150,2011.[3]M.Muja and D.G.Lowe.Scalable nearest neighbor algorithms for high dimensional data.Pattern Analysis and Machine Intelligence,IEEE Transactions on,36,2014.[4]S.K.Narang,Y.H.Chao,and A.Ortega.Graph-wavelet?lterbanks for edge-aware image processing.In Statistical Signal ProcessingWorkshop(SSP),2012IEEE,pages141–144.IEEE,2012.[5]N.Perraudin,D.Shuman,G.Puy,and P.Vandergheynst.UNLocBoX A matlab convex optimization toolbox using proximal splittingmethods.ArXiv e-prints,Feb.2014.[6] D.I.Shuman,M.J.Faraji,and P.Vandergheynst.A multiscale pyramid transform for graph signals.arXiv preprint arXiv:1308.4942,2013.[7] D.I.Shuman,S.K.Narang,P.Frossard,A.Ortega,and P.Vandergheynst.The emerging?eld of signal processing on graphs:Extendinghigh-dimensional data analysis to networks and other irregular domains.Signal Processing Magazine,IEEE,30(3):83–98,2013.7[8] D.I.Shuman,B.Ricaud,and P.Vandergheynst.A windowed graph Fourier transform.Statistical Signal Processing Workshop(SSP),2012IEEE,pages133–136,2012.[9] D.I.Shuman,B.Ricaud,and P.Vandergheynst.Vertex-frequency analysis on graphs.arXiv preprint arXiv:1307.5708,2013.[10] D.I.Shuman,C.Wiesmeyr,N.Holighaus,and P.Vandergheynst.Spectrum-adapted tight graph wavelet and 
vertex-frequency frames.arXiv preprint arXiv:1311.0897,2013.[11] A.Susnjara,N.Perraudin,D.Kressner,and P.Vandergheynst.Accelerated?ltering on graphs using lanczos method.arXiv preprintarXiv:1509.04537,2015.[12] F.Zhang and E.R.Hancock.Graph spectral image smoothing using the heat kernel.Pattern Recognition,41(11):3328–3342,2008.[13] D.Zhou,O.Bousquet,/doc/c09ff3e90342a8956bec0975f46527d3240ca692.html l,J.Weston,and B.Sch?lkopf.Learning with local and global consistency.Advances in neural informationprocessing systems,16(16):321–328,2004.[14] D.Zhou,J.Huang,and B.Sch?lkopf.Learning from labeled and unlabeled data on a directed graph.In the22nd international conference,pages1036–1043,New York,New York,USA,2005.ACM Press.[15] D.Zhou and B.Sch?lkopf.A regularization framework for learning from graph data.2004.8。
A Variable Step-Size LMS Filtering Algorithm Based on Multi-Scale Wavelet Transform

Gou Xinke; Wang Nana
[Abstract] Building on studies of the LMS adaptive algorithm, the variable step-size LMS algorithm based on a sampling function, and the adaptive filtering algorithm based on the multi-scale wavelet transform, this paper combines the variable step-size LMS algorithm with the multi-scale wavelet transform to produce a new algorithm. On one hand, the new algorithm overcomes the fixed-step-size LMS algorithm's conflict between convergence speed, convergence accuracy, and the step-size factor; on the other hand, introducing the wavelet transform reduces the condition number of the autocorrelation matrix of the input vector, improving convergence speed, tracking performance, and stability. Computer simulations comparing the algorithms show that the variable step-size LMS filtering algorithm based on the multi-scale wavelet transform converges faster and suppresses noise more strongly.
[Journal Name] Industrial Instrumentation & Automation
[Year (Volume), Issue] 2012(000)003
[Pages] 6 pages (pp. 5-9, 13)
[Keywords] adaptive filtering; LMS algorithm; variable step size; wavelet transform; MATLAB simulation
[Authors] Gou Xinke; Wang Nana
[Author Affiliation] College of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730050
[Language] Chinese
[CLC Classification] O231
0 Introduction
In 1967, Widrow et al. proposed the adaptive LMS filtering algorithm, in which the parameters of the adaptive filter are adjusted automatically toward the optimum; its design requires little or no prior statistical knowledge of the signal and noise.
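As a hedged illustration of the variable step-size idea summarized in this record, the sketch below lets the LMS step size grow with the instantaneous error and shrink near convergence. The particular step-size law and its constants are assumptions chosen for illustration, not the sampling-function formula of the cited work, and the wavelet-domain transform of the input is omitted.

```python
import numpy as np

def vss_lms(x, d, n_taps=8, beta=0.5, mu_max=0.05):
    """Variable step-size LMS: larger steps for large errors, smaller near convergence."""
    w = np.zeros(n_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps:n][::-1]                       # current regressor vector
        y[n] = w @ u                                    # filter output
        e[n] = d[n] - y[n]                              # instantaneous error
        mu = mu_max * (1 - np.exp(-beta * e[n] ** 2))   # assumed step-size law
        w = w + mu * e[n] * u                           # LMS weight update with variable step
    return w, y, e
```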
A Variable Step-Size LMS Algorithm for Fetal ECG Extraction

Yuan Li; Wu Shuicai; Yuan Yanchao; Gao Xiaofeng
[Journal Name] China Medical Devices
[Year (Volume), Issue] 2018, 33(3)
[Abstract] Objective: To resolve the conflict between convergence speed and steady-state error in the LMS algorithm, this paper proposes a variable step-size LMS algorithm for extracting the fetal ECG. Methods: The maternal abdominal signal is taken as the primary input and the maternal chest signal as the reference input; both channels are preprocessed to remove baseline wander, and the proposed algorithm is then used to extract the fetal ECG. Results: Experiments show that the proposed algorithm outperforms the traditional LMS algorithm in convergence speed and tracking ability and can extract a clear fetal ECG. Conclusion: The improved LMS algorithm can extract a relatively clear fetal ECG.
[Pages] 4 pages (pp. 15-17, 21)
[Authors] Yuan Li; Wu Shuicai; Yuan Yanchao; Gao Xiaofeng
[Author Affiliation] College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124 (first three authors); Beijing Maidikesi Technology Co., Ltd., Beijing 100095
[Language] Chinese
[CLC Classification] R540.4+1
[Related Literature]
1. Fetal ECG extraction using a variable step-size ICA algorithm in the wavelet domain [J], Zhan Hailong; Zhao Zhidong
2. A variable step-size LMS algorithm with variable tap length [J], Lei Yilong; Yu Tao
3. Application of an iteration-count-based variable step-size LMS algorithm to ECG signal extraction [J], Sun Junxiang; Zhao Jiyin
4. An improved variable step-size LMS algorithm for vehicle interior noise control [J], Qian Mengnan; Lu Jianwei; Yan Guixi; Guo Jiahao
5. Heart-rate signal extraction based on a piecewise variable step-size truncated-error LMS algorithm [J], Zhao Jiyin; Tian Baofeng; Li Jianpo; Sui Junling
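The method summarized in the abstract is, at its core, a two-channel adaptive noise canceller: the maternal abdominal lead (fetal plus maternal ECG) is the primary input, the maternal chest lead (maternal ECG only) is the reference, and the adaptive filter's error output is the fetal ECG estimate. The sketch below shows that structure with a plain fixed-step LMS update after a crude baseline-wander removal; the filter length, step size, and moving-average detrending are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def remove_baseline(sig, win=201):
    """Crude baseline-wander removal: subtract a moving-average trend (assumed preprocessing)."""
    kernel = np.ones(win) / win
    trend = np.convolve(sig, kernel, mode="same")
    return sig - trend

def fetal_ecg_lms(abdominal, chest, n_taps=16, mu=0.01):
    """Adaptive noise cancellation: the error signal approximates the fetal ECG."""
    d = remove_baseline(abdominal)          # primary input: fetal + maternal ECG
    x = remove_baseline(chest)              # reference input: maternal ECG only
    w = np.zeros(n_taps)
    fetal = np.zeros(len(d))
    for n in range(n_taps, len(d)):
        u = x[n - n_taps:n][::-1]
        y = w @ u                           # estimate of the maternal component in the abdominal lead
        e = d[n] - y                        # residual = fetal ECG estimate
        w = w + mu * e * u                  # LMS update (fixed step here; the paper uses a variable step)
        fetal[n] = e
    return fetal
```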
The Threshold in the MI Registration Algorithm

1. What is the MI registration algorithm?
The MI registration algorithm is a mutual-information-based image registration method used to align two images so that they coincide, exactly or approximately, in space. Image registration is widely used in medical image processing, computer vision, remote sensing, and related fields.

2. Principle of the MI registration algorithm
The MI registration algorithm measures the similarity between images using mutual information. Mutual information is a statistical quantity that measures the dependence between two random variables. In image registration, it can be used to quantify how similar two images are.
The main steps of the MI registration algorithm are as follows.

Step 1: Image preprocessing. Preprocess the two images to be registered, including grayscale conversion, filtering, and denoising, to improve the accuracy and stability of the registration.

Step 2: Choose the reference image and the images to be registered. From the image set, select one image as the reference; the remaining images are the ones to be registered. The reference image serves as the fixed baseline against which the registration is performed.

Step 3: Compute the mutual information. For each pair consisting of the reference image and an image to be registered, compute the mutual information between them. Mutual information is defined as

    I(X;Y) = Σ_{x∈X} Σ_{y∈Y} p(x,y) log( p(x,y) / (p(x) p(y)) )

where X and Y are the sets of pixel values of the reference image and the image to be registered, p(x,y) is the probability of the pair (x,y) occurring, and p(x) and p(y) are the marginal probabilities of x and y.
Step 4: Find the maximum mutual information. By searching over all candidate registration transform parameters, find the parameters that maximize the mutual information.

Step 5: Apply the registration transform. Apply the best transform parameters found to the image being registered. The transform can include translation, rotation, scaling, and similar operations.

Step 6: Evaluate the registration result. Assess the accuracy and stability of the result using metrics such as mean squared error or mutual information.
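A compact way to realize Steps 3 and 4 is to estimate the joint histogram of the two images, evaluate I(X;Y) from it, and search over transform parameters for the maximum. The sketch below does this for integer translations only; the bin count, the brute-force search, and the function names are illustrative assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """I(X;Y) estimated from the joint histogram of two equally sized images."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)             # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)             # marginal p(y)
    nz = pxy > 0                                    # avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def register_translation(reference, moving, max_shift=10):
    """Step 4 for pure translations: brute-force search for the shift that maximizes MI."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            mi = mutual_information(reference, shifted)
            if mi > best:
                best, best_shift = mi, (dy, dx)
    return best_shift, best
```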
3. Role of the threshold in the MI registration algorithm
The threshold in the MI registration algorithm controls the sensitivity of the similarity measure during registration. With a higher threshold, only image regions with high similarity are taken into account, which improves the accuracy and stability of the registration. With a lower threshold, regions of lower similarity are also considered, which increases the flexibility and robustness of the registration. The threshold should be chosen according to the specific application scenario and requirements.
Kernels and regularization on graphs

Kernels and Regularization on GraphsAlexander J.Smola1and Risi Kondor21Machine Learning Group,RSISEAustralian National UniversityCanberra,ACT0200,AustraliaAlex.Smola@.au2Department of Computer ScienceColumbia University1214Amsterdam Avenue,M.C.0401New York,NY10027,USArisi@Abstract.We introduce a family of kernels on graphs based on thenotion of regularization operators.This generalizes in a natural way thenotion of regularization and Greens functions,as commonly used forreal valued functions,to graphs.It turns out that diffusion kernels canbe found as a special case of our reasoning.We show that the class ofpositive,monotonically decreasing functions on the unit interval leads tokernels and corresponding regularization operators.1IntroductionThere has recently been a surge of interest in learning algorithms that operate on input spaces X other than R n,specifically,discrete input spaces,such as strings, graphs,trees,automata etc..Since kernel-based algorithms,such as Support Vector Machines,Gaussian Processes,Kernel PCA,etc.capture the structure of X via the kernel K:X×X→R,as long as we can define an appropriate kernel on our discrete input space,these algorithms can be imported wholesale, together with their error analysis,theoretical guarantees and empirical success.One of the most general representations of discrete metric spaces are graphs. Even if all we know about our input space are local pairwise similarities between points x i,x j∈X,distances(e.g shortest path length)on the graph induced by these similarities can give a useful,more global,sense of similarity between objects.In their work on Diffusion Kernels,Kondor and Lafferty[2002]gave a specific construction for a kernel capturing this structure.Belkin and Niyogi [2002]proposed an essentially equivalent construction in the context of approx-imating data lying on surfaces in a high dimensional embedding space,and in the context of leveraging information from unlabeled data.In this paper we put these earlier results into the more principled framework of Regularization Theory.We propose a family of regularization operators(equiv-alently,kernels)on graphs that include Diffusion Kernels as a special case,and show that this family encompasses all possible regularization operators invariant under permutations of the vertices in a particular sense.2Alexander Smola and Risi KondorOutline of the Paper:Section2introduces the concept of the graph Laplacian and relates it to the Laplace operator on real valued functions.Next we define an extended class of regularization operators and show why they have to be es-sentially a function of the Laplacian.An analogy to real valued Greens functions is established in Section3.3,and efficient methods for computing such functions are presented in Section4.We conclude with a discussion.2Laplace OperatorsAn undirected unweighted graph G consists of a set of vertices V numbered1to n,and a set of edges E(i.e.,pairs(i,j)where i,j∈V and(i,j)∈E⇔(j,i)∈E). We will sometimes write i∼j to denote that i and j are neighbors,i.e.(i,j)∈E. The adjacency matrix of G is an n×n real matrix W,with W ij=1if i∼j,and 0otherwise(by construction,W is symmetric and its diagonal entries are zero). 
These definitions and most of the following theory can trivially be extended toweighted graphs by allowing W ij∈[0,∞).Let D be an n×n diagonal matrix with D ii=jW ij.The Laplacian of Gis defined as L:=D−W and the Normalized Laplacian is˜L:=D−12LD−12= I−D−12W D−12.The following two theorems are well known results from spectral graph theory[Chung-Graham,1997]:Theorem1(Spectrum of˜L).˜L is a symmetric,positive semidefinite matrix, and its eigenvaluesλ1,λ2,...,λn satisfy0≤λi≤2.Furthermore,the number of eigenvalues equal to zero equals to the number of disjoint components in G.The bound on the spectrum follows directly from Gerschgorin’s Theorem.Theorem2(L and˜L for Regular Graphs).Now let G be a regular graph of degree d,that is,a graph in which every vertex has exactly d neighbors.ThenL=d I−W and˜L=I−1d W=1dL.Finally,W,L,˜L share the same eigenvectors{v i},where v i=λ−1iW v i=(d−λi)−1L v i=(1−d−1λi)−1˜L v i for all i.L and˜L can be regarded as linear operators on functions f:V→R,or,equiv-alently,on vectors f=(f1,f2,...,f n) .We could equally well have defined Lbyf,L f =f L f=−12i∼j(f i−f j)2for all f∈R n,(1)which readily generalizes to graphs with a countably infinite number of vertices.The Laplacian derives its name from its analogy with the familiar Laplacianoperator∆=∂2∂x21+∂2∂x22+...+∂2∂x2mon continuous spaces.Regarding(1)asinducing a semi-norm f L= f,L f on R n,the analogous expression for∆defined on a compact spaceΩisf ∆= f,∆f =Ωf(∆f)dω=Ω(∇f)·(∇f)dω.(2)Both(1)and(2)quantify how much f and f vary locally,or how“smooth”they are over their respective domains.Kernels and Regularization on Graphs3 More explicitly,whenΩ=R m,up to a constant,−L is exactly thefinite difference discretization of∆on a regular lattice:∆f(x)=mi=1∂2∂x2if≈mi=1∂∂x if(x+12e i)−∂∂x if(x−12e i)δ≈mi=1f(x+e i)+f(x−e i)−2f(x)δ2=1δ2mi=1(f x1,...,x i+1,...,x m+f x1,...,x i−1,...,x m−2f x1,...,x m)=−1δ2[L f]x1,...,x m,where e1,e2,...,e m is an orthogonal basis for R m normalized to e i =δ, the vertices of the lattice are at x=x1e1+...+x m e m with integer valuedcoordinates x i∈N,and f x1,x2,...,x m=f(x).Moreover,both the continuous and the dis-crete Laplacians are canonical operators on their respective domains,in the sense that they are invariant under certain natural transformations of the underlying space,and in this they are essentially unique.Regular grid in two dimensionsThe Laplace operator∆is the unique self-adjoint linear second order differ-ential operator invariant under transformations of the coordinate system under the action of the special orthogonal group SO m,i.e.invariant under rotations. This well known result can be seen by using Schur’s lemma and the fact that SO m is irreducible on R m.We now show a similar result for L.Here the permutation group plays a similar role to SO m.We need some additional definitions:denote by S n the group of permutations on{1,2,...,n}withπ∈S n being a specific permutation taking i∈{1,2,...n}toπ(i).The so-called defining representation of S n consists of n×n matricesΠπ,such that[Ππ]i,π(i)=1and all other entries ofΠπare zero. 
Theorem3(Permutation Invariant Linear Functions on Graphs).Let L be an n×n symmetric real matrix,linearly related to the n×n adjacency matrix W,i.e.L=T[W]for some linear operator L in a way invariant to permutations of vertices in the sense thatΠ πT[W]Ππ=TΠ πWΠπ(3)for anyπ∈S n.Then L is related to W by a linear combination of the follow-ing three operations:identity;row/column sums;overall sum;row/column sum restricted to the diagonal of L;overall sum restricted to the diagonal of W. Proof LetL i1i2=T[W]i1i2:=ni3=1ni4=1T i1i2i3i4W i3i4(4)with T∈R n4.Eq.(3)then implies Tπ(i1)π(i2)π(i3)π(i4)=T i1i2i3i4for anyπ∈S n.4Alexander Smola and Risi KondorThe indices of T can be partitioned by the equality relation on their values,e.g.(2,5,2,7)is of the partition type [13|2|4],since i 1=i 3,but i 2=i 1,i 4=i 1and i 2=i 4.The key observation is that under the action of the permutation group,elements of T with a given index partition structure are taken to elements with the same index partition structure,e.g.if i 1=i 3then π(i 1)=π(i 3)and if i 1=i 3,then π(i 1)=π(i 3).Furthermore,an element with a given index index partition structure can be mapped to any other element of T with the same index partition structure by a suitable choice of π.Hence,a necessary and sufficient condition for (4)is that all elements of T of a given index partition structure be equal.Therefore,T must be a linear combination of the following tensors (i.e.multilinear forms):A i 1i 2i 3i 4=1B [1,2]i 1i 2i 3i 4=δi 1i 2B [1,3]i 1i 2i 3i 4=δi 1i 3B [1,4]i 1i 2i 3i 4=δi 1i 4B [2,3]i 1i 2i 3i 4=δi 2i 3B [2,4]i 1i 2i 3i 4=δi 2i 4B [3,4]i 1i 2i 3i 4=δi 3i 4C [1,2,3]i 1i 2i 3i 4=δi 1i 2δi 2i 3C [2,3,4]i 1i 2i 3i 4=δi 2i 3δi 3i 4C [3,4,1]i 1i 2i 3i 4=δi 3i 4δi 4i 1C [4,1,2]i 1i 2i 3i 4=δi 4i 1δi 1i 2D [1,2][3,4]i 1i 2i 3i 4=δi 1i 2δi 3i 4D [1,3][2,4]i 1i 2i 3i 4=δi 1i 3δi 2i 4D [1,4][2,3]i 1i 2i 3i 4=δi 1i 4δi 2i 3E [1,2,3,4]i 1i 2i 3i 4=δi 1i 2δi 1i 3δi 1i 4.The tensor A puts the overall sum in each element of L ,while B [1,2]returns the the same restricted to the diagonal of L .Since W has vanishing diagonal,B [3,4],C [2,3,4],C [3,4,1],D [1,2][3,4]and E [1,2,3,4]produce zero.Without loss of generality we can therefore ignore them.By symmetry of W ,the pairs (B [1,3],B [1,4]),(B [2,3],B [2,4]),(C [1,2,3],C [4,1,2])have the same effect on W ,hence we can set the coefficient of the second member of each to zero.Furthermore,to enforce symmetry on L ,the coefficient of B [1,3]and B [2,3]must be the same (without loss of generality 1)and this will give the row/column sum matrix ( k W ik )+( k W kl ).Similarly,C [1,2,3]and C [4,1,2]must have the same coefficient and this will give the row/column sum restricted to the diagonal:δij [( k W ik )+( k W kl )].Finally,by symmetry of W ,D [1,3][2,4]and D [1,4][2,3]are both equivalent to the identity map.The various row/column sum and overall sum operations are uninteresting from a graph theory point of view,since they do not heed to the topology of the graph.Imposing the conditions that each row and column in L must sum to zero,we recover the graph Laplacian.Hence,up to a constant factor and trivial additive components,the graph Laplacian (or the normalized graph Laplacian if we wish to rescale by the number of edges per vertex)is the only “invariant”differential operator for given W (or its normalized counterpart ˜W ).Unless stated otherwise,all results below hold for both L and ˜L (albeit with a different spectrum)and we will,in the following,focus on ˜Ldue to the fact that its spectrum is contained 
in [0,2].Kernels and Regularization on Graphs5 3RegularizationThe fact that L induces a semi-norm on f which penalizes the changes between adjacent vertices,as described in(1),indicates that it may serve as a tool to design regularization operators.3.1Regularization via the Laplace OperatorWe begin with a brief overview of translation invariant regularization operators on continuous spaces and show how they can be interpreted as powers of∆.This will allow us to repeat the development almost verbatim with˜L(or L)instead.Some of the most successful regularization functionals on R n,leading to kernels such as the Gaussian RBF,can be written as[Smola et al.,1998]f,P f :=|˜f(ω)|2r( ω 2)dω= f,r(∆)f .(5)Here f∈L2(R n),˜f(ω)denotes the Fourier transform of f,r( ω 2)is a function penalizing frequency components|˜f(ω)|of f,typically increasing in ω 2,and finally,r(∆)is the extension of r to operators simply by applying r to the spectrum of∆[Dunford and Schwartz,1958]f,r(∆)f =if,ψi r(λi) ψi,fwhere{(ψi,λi)}is the eigensystem of∆.The last equality in(5)holds because applications of∆become multiplications by ω 2in Fourier space.Kernels are obtained by solving the self-consistency condition[Smola et al.,1998]k(x,·),P k(x ,·) =k(x,x ).(6) One can show that k(x,x )=κ(x−x ),whereκis equal to the inverse Fourier transform of r−1( ω 2).Several r functions have been known to yield good results.The two most popular are given below:r( ω 2)k(x,x )r(∆)Gaussian RBF expσ22ω 2exp−12σ2x−x 2∞i=0σ2ii!∆iLaplacian RBF1+σ2 ω 2exp−1σx−x1+σ2∆In summary,regularization according to(5)is carried out by penalizing˜f(ω) by a function of the Laplace operator.For many results in regularization theory one requires r( ω 2)→∞for ω 2→∞.3.2Regularization via the Graph LaplacianIn complete analogy to(5),we define a class of regularization functionals on graphs asf,P f := f,r(˜L)f .(7)6Alexander Smola and Risi KondorFig.1.Regularization function r (λ).From left to right:regularized Laplacian (σ2=1),diffusion process (σ2=1),one-step random walk (a =2),4-step random walk (a =2),inverse cosine.Here r (˜L )is understood as applying the scalar valued function r (λ)to the eigen-values of ˜L ,that is,r (˜L ):=m i =1r (λi )v i v i ,(8)where {(λi ,v i )}constitute the eigensystem of ˜L .The normalized graph Lapla-cian ˜Lis preferable to L ,since ˜L ’s spectrum is contained in [0,2].The obvious goal is to gain insight into what functions are appropriate choices for r .–From (1)we infer that v i with large λi correspond to rather uneven functions on the graph G .Consequently,they should be penalized more strongly than v i with small λi .Hence r (λ)should be monotonically increasing in λ.–Requiring that r (˜L) 0imposes the constraint r (λ)≥0for all λ∈[0,2].–Finally,we can limit ourselves to r (λ)expressible as power series,since the latter are dense in the space of C 0functions on bounded domains.In Section 3.5we will present additional motivation for the choice of r (λ)in the context of spectral graph theory and segmentation.As we shall see,the following functions are of particular interest:r (λ)=1+σ2λ(Regularized Laplacian)(9)r (λ)=exp σ2/2λ(Diffusion Process)(10)r (λ)=(aI −λ)−1with a ≥2(One-Step Random Walk)(11)r (λ)=(aI −λ)−p with a ≥2(p -Step Random Walk)(12)r (λ)=(cos λπ/4)−1(Inverse Cosine)(13)Figure 1shows the regularization behavior for the functions (9)-(13).3.3KernelsThe introduction of a regularization matrix P =r (˜L)allows us to define a Hilbert space H on R m via f,f H := f ,P f .We now show that H is a reproducing kernel Hilbert 
space.Kernels and Regularization on Graphs 7Theorem 4.Denote by P ∈R m ×m a (positive semidefinite)regularization ma-trix and denote by H the image of R m under P .Then H with dot product f,f H := f ,P f is a Reproducing Kernel Hilbert Space and its kernel is k (i,j )= P −1ij ,where P −1denotes the pseudo-inverse if P is not invertible.Proof Since P is a positive semidefinite matrix,we clearly have a Hilbert space on P R m .To show the reproducing property we need to prove thatf (i )= f,k (i,·) H .(14)Note that k (i,j )can take on at most m 2different values (since i,j ∈[1:m ]).In matrix notation (14)means that for all f ∈Hf (i )=f P K i,:for all i ⇐⇒f =f P K.(15)The latter holds if K =P −1and f ∈P R m ,which proves the claim.In other words,K is the Greens function of P ,just as in the continuous case.The notion of Greens functions on graphs was only recently introduced by Chung-Graham and Yau [2000]for L .The above theorem extended this idea to arbitrary regularization operators ˆr (˜L).Corollary 1.Denote by P =r (˜L )a regularization matrix,then the correspond-ing kernel is given by K =r −1(˜L ),where we take the pseudo-inverse wherever necessary.More specifically,if {(v i ,λi )}constitute the eigensystem of ˜L,we have K =mi =1r −1(λi )v i v i where we define 0−1≡0.(16)3.4Examples of KernelsBy virtue of Corollary 1we only need to take (9)-(13)and plug the definition of r (λ)into (16)to obtain formulae for computing K .This yields the following kernel matrices:K =(I +σ2˜L)−1(Regularized Laplacian)(17)K =exp(−σ2/2˜L)(Diffusion Process)(18)K =(aI −˜L)p with a ≥2(p -Step Random Walk)(19)K =cos ˜Lπ/4(Inverse Cosine)(20)Equation (18)corresponds to the diffusion kernel proposed by Kondor and Laf-ferty [2002],for which K (x,x )can be visualized as the quantity of some sub-stance that would accumulate at vertex x after a given amount of time if we injected the substance at vertex x and let it diffuse through the graph along the edges.Note that this involves matrix exponentiation defined via the limit K =exp(B )=lim n →∞(I +B/n )n as opposed to component-wise exponentiation K i,j =exp(B i,j ).8Alexander Smola and Risi KondorFig.2.Thefirst8eigenvectors of the normalized graph Laplacian corresponding to the graph drawn above.Each line attached to a vertex is proportional to the value of the corresponding eigenvector at the vertex.Positive values(red)point up and negative values(blue)point down.Note that the assignment of values becomes less and less uniform with increasing eigenvalue(i.e.from left to right).For(17)it is typically more efficient to deal with the inverse of K,as it avoids the costly inversion of the sparse matrix˜L.Such situations arise,e.g.,in Gaussian Process estimation,where K is the covariance matrix of a stochastic process[Williams,1999].Regarding(19),recall that(aI−˜L)p=((a−1)I+˜W)p is up to scaling terms equiv-alent to a p-step random walk on the graphwith random restarts(see Section A for de-tails).In this sense it is similar to the dif-fusion kernel.However,the fact that K in-volves only afinite number of products ofmatrices makes it much more attractive forpractical purposes.In particular,entries inK ij can be computed cheaply using the factthat˜L is a sparse matrix.A nearest neighbor graph.Finally,the inverse cosine kernel treats lower complexity functions almost equally,with a significant reduction in the upper end of the spectrum.Figure2 shows the leading eigenvectors of the graph drawn above and Figure3provide examples of some of the kernels discussed above.3.5Clustering 
and Spectral Graph TheoryWe could also have derived r(˜L)directly from spectral graph theory:the eigen-vectors of the graph Laplacian correspond to functions partitioning the graph into clusters,see e.g.,[Chung-Graham,1997,Shi and Malik,1997]and the ref-erences therein.In general,small eigenvalues have associated eigenvectors which vary little between adjacent vertices.Finding the smallest eigenvectors of˜L can be seen as a real-valued relaxation of the min-cut problem.3For instance,the smallest eigenvalue of˜L is0,its corresponding eigenvector is D121n with1n:=(1,...,1)∈R n.The second smallest eigenvalue/eigenvector pair,also often referred to as the Fiedler-vector,can be used to split the graph 3Only recently,algorithms based on the celebrated semidefinite relaxation of the min-cut problem by Goemans and Williamson[1995]have seen wider use[Torr,2003]in segmentation and clustering by use of spectral bundle methods.Kernels and Regularization on Graphs9Fig.3.Top:regularized graph Laplacian;Middle:diffusion kernel with σ=5,Bottom:4-step random walk kernel.Each figure displays K ij for fixed i .The value K ij at vertex i is denoted by a bold line.Note that only adjacent vertices to i bear significant value.into two distinct parts [Weiss,1999,Shi and Malik,1997],and further eigenvec-tors with larger eigenvalues have been used for more finely-grained partitions of the graph.See Figure 2for an example.Such a decomposition into functions of increasing complexity has very de-sirable properties:if we want to perform estimation on the graph,we will wish to bias the estimate towards functions which vary little over large homogeneous portions 4.Consequently,we have the following interpretation of f,f H .As-sume that f = i βi v i ,where {(v i ,λi )}is the eigensystem of ˜L.Then we can rewrite f,f H to yield f ,r (˜L )f = i βi v i , j r (λj )v j v j l βl v l = iβ2i r (λi ).(21)This means that the components of f which vary a lot over coherent clusters in the graph are penalized more strongly,whereas the portions of f ,which are essentially constant over clusters,are preferred.This is exactly what we want.3.6Approximate ComputationOften it is not necessary to know all values of the kernel (e.g.,if we only observe instances from a subset of all positions on the graph).There it would be wasteful to compute the full matrix r (L )−1explicitly,since such operations typically scale with O (n 3).Furthermore,for large n it is not desirable to compute K via (16),that is,by computing the eigensystem of ˜Land assembling K directly.4If we cannot assume a connection between the structure of the graph and the values of the function to be estimated on it,the entire concept of designing kernels on graphs obviously becomes meaningless.10Alexander Smola and Risi KondorInstead,we would like to take advantage of the fact that ˜L is sparse,and con-sequently any operation ˜Lαhas cost at most linear in the number of nonzero ele-ments of ˜L ,hence the cost is bounded by O (|E |+n ).Moreover,if d is the largest degree of the graph,then computing L p e i costs at most |E | p −1i =1(min(d +1,n ))ioperations:at each step the number of non-zeros in the rhs decreases by at most a factor of d +1.This means that as long as we can approximate K =r −1(˜L )by a low order polynomial,say ρ(˜L ):= N i =0βi ˜L i ,significant savings are possible.Note that we need not necessarily require a uniformly good approximation and put the main emphasis on the approximation for small λ.However,we need to ensure that ρ(˜L)is positive 
semidefinite.Diffusion Kernel:The fact that the series r −1(x )=exp(−βx )= ∞m =0(−β)m x m m !has alternating signs shows that the approximation error at r −1(x )is boundedby (2β)N +1(N +1)!,if we use N terms in the expansion (from Theorem 1we know that ˜L≤2).For instance,for β=1,10terms are sufficient to obtain an error of the order of 10−4.Variational Approximation:In general,if we want to approximate r −1(λ)on[0,2],we need to solve the L ∞([0,2])approximation problemminimize β, subject to N i =0βi λi −r −1(λ) ≤ ∀λ∈[0,2](22)Clearly,(22)is equivalent to minimizing sup ˜L ρ(˜L )−r−1(˜L ) ,since the matrix norm is determined by the largest eigenvalues,and we can find ˜Lsuch that the discrepancy between ρ(λ)and r −1(λ)is attained.Variational problems of this form have been studied in the literature,and their solution may provide much better approximations to r −1(λ)than a truncated power series expansion.4Products of GraphsAs we have already pointed out,it is very expensive to compute K for arbitrary ˆr and ˜L.For special types of graphs and regularization,however,significant computational savings can be made.4.1Factor GraphsThe work of this section is a direct extension of results by Ellis [2002]and Chung-Graham and Yau [2000],who study factor graphs to compute inverses of the graph Laplacian.Definition 1(Factor Graphs).Denote by (V,E )and (V ,E )the vertices V and edges E of two graphs,then the factor graph (V f ,E f ):=(V,E )⊗(V ,E )is defined as the graph where (i,i )∈V f if i ∈V and i ∈V ;and ((i,i ),(j,j ))∈E f if and only if either (i,j )∈E and i =j or (i ,j )∈E and i =j .Kernels and Regularization on Graphs 11For instance,the factor graph of two rings is a torus.The nice property of factor graphs is that we can compute the eigenvalues of the Laplacian on products very easily (see e.g.,Chung-Graham and Yau [2000]):Theorem 5(Eigenvalues of Factor Graphs).The eigenvalues and eigen-vectors of the normalized Laplacian for the factor graph between a regular graph of degree d with eigenvalues {λj }and a regular graph of degree d with eigenvalues {λ l }are of the form:λfact j,l =d d +d λj +d d +d λ l(23)and the eigenvectors satisfy e j,l(i,i )=e j i e l i ,where e j is an eigenvector of ˜L and e l is an eigenvector of ˜L.This allows us to apply Corollary 1to obtain an expansion of K asK =(r (L ))−1=j,l r −1(λjl )e j,l e j,l .(24)While providing an explicit recipe for the computation of K ij without the need to compute the full matrix K ,this still requires O (n 2)operations per entry,which may be more costly than what we want (here n is the number of vertices of the factor graph).Two methods for computing (24)become evident at this point:if r has a special structure,we may exploit this to decompose K into the products and sums of terms depending on one of the two graphs alone and pre-compute these expressions beforehand.Secondly,if one of the two terms in the expansion can be computed for a rather general class of values of r (x ),we can pre-compute this expansion and only carry out the remainder corresponding to (24)explicitly.4.2Product Decomposition of r (x )Central to our reasoning is the observation that for certain r (x ),the term 1r (a +b )can be expressed in terms of a product and sum of terms depending on a and b only.We assume that 1r (a +b )=M m =1ρn (a )˜ρn (b ).(25)In the following we will show that in such situations the kernels on factor graphs can be computed as an analogous combination of products and sums of kernel functions on the terms constituting the ingredients of the 
factor graph.Before we do so,we briefly check that many r (x )indeed satisfy this property.exp(−β(a +b ))=exp(−βa )exp(−βb )(26)(A −(a +b ))= A 2−a + A 2−b (27)(A −(a +b ))p =p n =0p n A 2−a n A 2−b p −n (28)cos (a +b )π4=cos aπ4cos bπ4−sin aπ4sin bπ4(29)12Alexander Smola and Risi KondorIn a nutshell,we will exploit the fact that for products of graphs the eigenvalues of the joint graph Laplacian can be written as the sum of the eigenvalues of the Laplacians of the constituent graphs.This way we can perform computations on ρn and˜ρn separately without the need to take the other part of the the product of graphs into account.Definek m(i,j):=l ρldλld+de l i e l j and˜k m(i ,j ):=l˜ρldλld+d˜e l i ˜e l j .(30)Then we have the following composition theorem:Theorem6.Denote by(V,E)and(V ,E )connected regular graphs of degrees d with m vertices(and d ,m respectively)and normalized graph Laplacians ˜L,˜L .Furthermore denote by r(x)a rational function with matrix-valued exten-sionˆr(X).In this case the kernel K corresponding to the regularization operator ˆr(L)on the product graph of(V,E)and(V ,E )is given byk((i,i ),(j,j ))=Mm=1k m(i,j)˜k m(i ,j )(31)Proof Plug the expansion of1r(a+b)as given by(25)into(24)and collect terms.From(26)we immediately obtain the corollary(see Kondor and Lafferty[2002]) that for diffusion processes on factor graphs the kernel on the factor graph is given by the product of kernels on the constituents,that is k((i,i ),(j,j ))= k(i,j)k (i ,j ).The kernels k m and˜k m can be computed either by using an analytic solution of the underlying factors of the graph or alternatively they can be computed numerically.If the total number of kernels k n is small in comparison to the number of possible coordinates this is still computationally beneficial.4.3Composition TheoremsIf no expansion as in(31)can be found,we may still be able to compute ker-nels by extending a reasoning from[Ellis,2002].More specifically,the following composition theorem allows us to accelerate the computation in many cases, whenever we can parameterize(ˆr(L+αI))−1in an efficient way.For this pur-pose we introduce two auxiliary functionsKα(i,j):=ˆrdd+dL+αdd+dI−1=lrdλl+αdd+d−1e l(i)e l(j)G α(i,j):=(L +αI)−1=l1λl+αe l(i)e l(j).(32)In some cases Kα(i,j)may be computed in closed form,thus obviating the need to perform expensive matrix inversion,e.g.,in the case where the underlying graph is a chain[Ellis,2002]and Kα=Gα.Kernels and Regularization on Graphs 13Theorem 7.Under the assumptions of Theorem 6we haveK ((j,j ),(l,l ))=12πi C K α(j,l )G −α(j ,l )dα= v K λv (j,l )e v j e v l (33)where C ⊂C is a contour of the C containing the poles of (V ,E )including 0.For practical purposes,the third term of (33)is more amenable to computation.Proof From (24)we haveK ((j,j ),(l,l ))= u,v r dλu +d λv d +d −1e u j e u l e v j e v l (34)=12πi C u r dλu +d αd +d −1e u j e u l v 1λv −αe v j e v l dαHere the second equalityfollows from the fact that the contour integral over a pole p yields C f (α)p −αdα=2πif (p ),and the claim is verified by checking thedefinitions of K αand G α.The last equality can be seen from (34)by splitting up the summation over u and v .5ConclusionsWe have shown that the canonical family of kernels on graphs are of the form of power series in the graph Laplacian.Equivalently,such kernels can be char-acterized by a real valued function of the eigenvalues of the Laplacian.Special cases include diffusion kernels,the regularized Laplacian kernel and p -step ran-dom walk kernels.We have developed the 
regularization theory of learning on graphs using such kernels and explored methods for efficiently computing and approximating the kernel matrix.Acknowledgments This work was supported by a grant of the ARC.The authors thank Eleazar Eskin,Patrick Haffner,Andrew Ng,Bob Williamson and S.V.N.Vishwanathan for helpful comments and suggestions.A Link AnalysisRather surprisingly,our approach to regularizing functions on graphs bears re-semblance to algorithms for scoring web pages such as PageRank [Page et al.,1998],HITS [Kleinberg,1999],and randomized HITS [Zheng et al.,2001].More specifically,the random walks on graphs used in all three algorithms and the stationary distributions arising from them are closely connected with the eigen-system of L and ˜Lrespectively.We begin with an analysis of PageRank.Given a set of web pages and links between them we construct a directed graph in such a way that pages correspond。
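Corollary 1 above reduces all of these kernels to a scalar function applied to the eigenvalues of the normalized graph Laplacian, K = Σ_i r^{-1}(λ_i) v_i v_iᵀ. The sketch below computes the regularized-Laplacian, diffusion, and p-step random-walk kernels of equations (17)-(19) for a small graph by dense eigendecomposition; it is meant as a numerical illustration only and ignores the sparse and approximate computations discussed in Section 3.6.

```python
import numpy as np

def normalized_laplacian(W):
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

def graph_kernel(W, r_inv):
    """K = r^{-1}(L~): apply a scalar function to the eigenvalues of the normalized Laplacian."""
    L = normalized_laplacian(W)
    lam, V = np.linalg.eigh(L)              # eigensystem {(lambda_i, v_i)}
    return (V * r_inv(lam)) @ V.T           # sum_i r^{-1}(lambda_i) v_i v_i^T

# A small example graph (weight matrix) and the three kernels.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
sigma, a, p = 1.0, 2.0, 3
K_reg  = graph_kernel(W, lambda lam: 1.0 / (1.0 + sigma**2 * lam))   # regularized Laplacian, eq. (17)
K_diff = graph_kernel(W, lambda lam: np.exp(-sigma**2 / 2 * lam))    # diffusion kernel, eq. (18)
K_walk = graph_kernel(W, lambda lam: (a - lam) ** p)                 # p-step random walk, eq. (19)
```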
Artificial Intelligence-Based Quantitative Phase Imaging Methods for the Life Sciences

Artificial intelligence (AI) has revolutionized the field of quantitative phase imaging (QPI) in the life sciences by enabling high-throughput and accurate analysis of complex biological samples. AI-based QPI methods integrate advanced machine learning algorithms with computational imaging techniques to extract quantitative information from phase images with unprecedented precision and speed. These methods leverage deep learning models to perform tasks such as cell segmentation, classification, and tracking, enabling researchers to study dynamic cellular processes and disease progression with minimal human intervention. One example of AI-enabled QPI is label-free cell classification, where AI algorithms can accurately differentiate between cell types based on their phase images, providing valuable insights into cellular morphology and function. Furthermore, AI-based QPI approaches can also improve the sensitivity and specificity of quantitative phase measurements, allowing for more accurate quantification of cellular and subcellular features. Overall, the integration of AI with QPI holds great potential for advancing our understanding of complex biological systems and accelerating the development of novel diagnostic and therapeutic strategies in the life sciences.
German Industry 4.0 (original version)

• Intense research activities in universities and other research institutions
• Drastically increasing number of publications in recent years
• Large amount of funding by the German government
Model predictive control (MPC)
• Modern, optimization-based control technique
• Successful applications in many industrial fields
• Can handle hard constraints on states and inputs
• Optimization of some performance criterion
• Applicable to nonlinear, MIMO systems
A system is strictly dissipative on a set W ⊆ Z with respect to the supply rate s if there exists a storage function λ such that for all (x, u) ∈ W it holds that

    λ(f(x, u)) − λ(x) ≤ s(x, u) − ρ(x),   with ρ > 0.
[Figure: Basic MPC scheme, showing the predicted state trajectory x(k|t+1) and the predicted input trajectory u(k|t+1) over the prediction horizon k = 0, …, N, starting from the measured state x(t+1) at time t+1.]
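The bullet points above say what MPC does; the basic scheme itself is a receding-horizon loop: at each sampling instant, measure the state, solve a finite-horizon optimal control problem subject to the state and input constraints, apply only the first input of the optimal sequence, and repeat at the next instant. The sketch below shows such a loop for an assumed linear model with a quadratic cost, solved with a generic optimizer; the model, horizon, weights, and input bound are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed discrete-time model x+ = A x + B u
B = np.array([[0.0], [0.1]])
N, u_max = 10, 1.0                        # prediction horizon and input bound

def cost(u_seq, x0):
    x, J = x0.copy(), 0.0
    for u in u_seq:
        J += x @ x + 0.1 * u * u          # stage cost l(x, u)
        x = A @ x + B.flatten() * u       # predicted dynamics
    return J + 10.0 * (x @ x)             # terminal cost

def mpc_step(x0):
    """Solve the horizon-N problem and return only the first input (receding horizon)."""
    res = minimize(cost, np.zeros(N), args=(x0,), bounds=[(-u_max, u_max)] * N)
    return res.x[0]

x = np.array([1.0, 0.0])
for t in range(20):                       # closed loop: measure, optimize, apply first input
    u0 = mpc_step(x)
    x = A @ x + B.flatten() * u0
```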
x(t) = s(t) a(θo) + Σ_{i=1}^{Nu} u_i(t) a(θi)        (1)

where,
• x(t) - the received signal (output of the antenna array).
• a(θo) - the steering vector for the desired signal.
• a(θi) - the steering vector for the i-th interfering signal.

The generalized spatial filtering problem is to estimate the signal s(t) from the received signal x(t) such that some measure of error between the estimate ŝ(t) and the original signal s(t) is minimized, subject to certain (optional) constraints such as constant modulus, exact position of nulls, etc. The mean squared error (MSE) criterion is normally used for optimization. In this case, the filtering problem can be set up as a classical Wiener filtering problem whose solution can be reached iteratively using the LMS and SMI algorithms. The goal is to develop a general framework for the simulation of the LMS and SMI algorithms and to simulate their performance for various configurations of the antenna array, the number of interferers (Nu), the statistical properties of the signal and the interferers, and the statistical relations between the signal and the interferers (i.e., correlated or uncorrelated, etc.).
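To make the simulation framework concrete, the sketch below forms the two estimates of the Wiener weight vector discussed in this report: an iterative LMS update driven by a reference (desired) signal, and the block SMI solution obtained from the sample covariance matrix and the sample cross-correlation vector. The array size, step size, and block length are illustrative assumptions; the report's own MATLAB routines are the ones listed in the appendix of the contents.

```python
import numpy as np

def lms_weights(X, d, mu=0.01):
    """Iterative LMS: X is (n_snapshots, n_elements) of array outputs, d the desired signal."""
    w = np.zeros(X.shape[1], dtype=complex)
    for n in range(X.shape[0]):
        x = X[n]
        e = d[n] - np.vdot(w, x)            # error between reference and beamformer output w^H x
        w = w + mu * np.conj(e) * x         # LMS weight update
    return w

def smi_weights(X, d):
    """Block SMI: form the sample covariance R and cross-correlation p, then solve R w = p."""
    N = X.shape[0]
    R = X.T @ X.conj() / N                  # sample covariance matrix, estimate of E[x x^H]
    p = X.T @ d.conj() / N                  # sample cross-correlation vector, estimate of E[x d*]
    return np.linalg.solve(R, p)            # Wiener/SMI weight vector
```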
Contents
List of Figures
1. The Spatial Filtering Problem
1.1. Statement
1.2. Problem Setup
1.3. The Adaptive Processor
1.4. Assumptions
2. The Wiener MMSE Solution
3. Case I: The LMS Algorithm
3.1. The Method of Steepest Descent
3.2. The LMS Simplification
3.3. The Algorithm
3.4. Stability
3.5. Convergence
4. Case II: Sample Matrix Inversion
4.1. Motivation
4.2. Formulation
4.3. Adaptation in the SMI Algorithm
4.4. Stability
4.5. Convergence
5. Results - LMS Algorithm
5.1. Demonstration
5.2. Effect of µ
5.3. Dynamic Channel Variations
5.4. Convergence of the Weight Vector
5.5. Effect of Changing the SIR
6. Results - SMI Algorithm
6.1. Effect of Varying the Block Size
6.2. Dynamic Variations in the Channel
7. Appendix I: MATLAB Code
7.1. LMS Algorithm
7.2. Sample Matrix Inversion
7.3. Generate Signal
7.4. SteeringVector
7.5. CalcCovarianceMatrix
7.6. CrossCorrelationVector
References
LMS AND SMI ALGORITHMS FOR SPATIAL ADAPTIVE INTERFERENCE REJECTION
VIJAYA CHANDRAN RAMASAMI
March 16, 2001
Abstract. Adaptive antenna arrays promise an enormous bandwidth increase for wireless communications through spatial-domain processing, which is considered the last frontier that can provide a substantial capacity increase. The aim of this project is to set up and simulate an adaptive spatial filtering problem, using antenna arrays, where an estimate of the desired signal is constructed from a received signal consisting of the desired signal, additional interfering signals, and background noise. The filtering is done in an iterative fashion using the LMS and SMI algorithms.
List of Figures
1. A Uniform Linear Array (ULA)
2. The Adaptive Processor
3. Example MSE Surface for N = 2
4. Array Factor Plot (Rectangular) for the Example Configuration
5. Array Factor Plot (Polar) for the Example Configuration
6. LMS Error for the Example Configuration
7. LMS Error for µ = 0.0001
8. LMS Error for µ = 0.001
9. LMS Error for µ = 0.003
10. LMS Error for µ = 0.03
11. LMS Error for Dynamic Changes in the Spatial Channel
12. Example Weight Vector Convergence
13. LMS Convergence for SIR = 0 dB (all interferers)
14. LMS Convergence for SIR = -20 dB (all interferers)
15. Array Factor Plot (Rectangular) for the Example Configuration; the array factors obtained for each block are superimposed to show the similarity of processing
16. Absolute Error
17. SMI Error for a block size of 5
18. SMI Error for a block size of 100
19. Response to dynamic variations in the channel
1. The Spatial Filtering Problem

1.1. Statement. Consider an antenna array that receives various signals from several directions in space. The spatial filtering problem is to optimize the antenna array pattern such that the maximum possible gain is placed in the direction of the desired signal and nulls are placed in the directions of the interferers.

1.2. Problem Setup. Let s(t) denote the desired communications signal arriving at the array at an angle θs and ui(t) denote the Nu interfering signals arriving at the array at angles of incidence θi, respectively. It is assumed here that the adaptive processor has no a-priori knowledge of the directions of arrival (DOAs) of either the signal or the interferers. The output of the antenna array is given by equation (1) above.
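For the uniform linear array referred to in the list of figures, the steering vector a(θ) collects the inter-element phase shifts seen for a plane wave arriving from angle θ, and a received snapshot is assembled as in equation (1), plus background noise. The sketch below assumes half-wavelength element spacing and broadside-referenced angles; it mirrors the SteeringVector and Generate Signal helpers named in the appendix in spirit only.

```python
import numpy as np

def steering_vector(theta_deg, n_elements=8, spacing=0.5):
    """ULA steering vector for a plane wave from theta (degrees); spacing in wavelengths."""
    k = np.arange(n_elements)
    return np.exp(1j * 2 * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def received_snapshots(s, theta_s, interferers, noise_std=0.1, n_elements=8, rng=None):
    """x(t) = s(t) a(theta_o) + sum_i u_i(t) a(theta_i), as in equation (1), plus noise."""
    rng = np.random.default_rng(rng)
    N = len(s)
    X = np.outer(s, steering_vector(theta_s, n_elements))
    for u, theta_i in interferers:          # interferers is a list of (u_i(t), theta_i) pairs
        X += np.outer(u, steering_vector(theta_i, n_elements))
    X += noise_std * (rng.standard_normal((N, n_elements))
                      + 1j * rng.standard_normal((N, n_elements)))
    return X
```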