AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION


JOEL A. TROPP

Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional non-negative data. Applications include dimensionality reduction, feature extraction and statistical factor analysis. This paper discusses a general algorithm for solving these problems with respect to a large class of error measures. The primary goal of this article is to develop a rigorous convergence theory for the algorithm. It also touches on some experimental results of S. Sra [1] which indicate the superiority of this method to some previously proposed techniques.

Date: 8 May 2003. J. Tropp is with The University of Texas, Austin, TX 78712 USA, jtropp@.

1. Introduction

Suppose that A is a d × N matrix whose columns are vectors of non-negative data. The problem of extracting r linear features from A can be stated as solving the approximation problem A ≈ VH, where V is a d × r matrix of feature vectors and H is an r × N matrix of coefficients. We quantify the approximation error using a measure that reflects the provenance of the data. For example, if we are working with probability distributions, it makes sense to use the variational distance or the Kullback-Leibler divergence.

A common way to find features is to compute a partial singular value decomposition (SVD) of the data matrix [2]. Although the partial SVD provides the best approximation to A in the sense of total least squares, it yields feature vectors that have both positive and negative components. Such features have no interpretation if the data are non-negative. If the data consist of pixel intensities from images, what does a negative intensity signify? If the data consist of word counts from documents, what does a negative word count mean?

To address this metaphysical complaint, several groups of researchers have proposed adding non-negativity constraints to the approximation problem. Paatero and Tapper advocated solving min_{V,H ≥ 0} ‖VH − A‖_F for statistical factor analysis [3]. In their first paper on the subject [3], they suggest using an alternating least-squares algorithm [3] to solve the problem. Later Paatero proposes instead a modified gradient descent [4] and a conjugate gradient method [5]. Independently, Lee and Seung considered the same optimization problem for some applications in machine learning. They originally solved it using an alternating projected gradient technique [6]. More recently, they have provided two optimally scaled gradient descent algorithms which respectively minimize the Frobenius norm and the Kullback-Leibler divergence of the approximation error [7]. None of these algorithms have been compared in the literature, and proper convergence proofs are scarce.

We shall call the problem of approximating A ≈ VH subject to the constraints V, H ≥ 0 non-negative matrix approximation (NNMA). The present work discusses an alternating minimization algorithm for solving NNMA problems where the approximation error is measured

2. Mathematical Background

2.1. Point-to-Set Maps. To understand the convergence of the algorithm, we need some elementary concepts from the theory of point-to-set maps. The power set P(Y) of a set Y is the collection of all subsets of Y. A point-to-set map Ω from the set X to the set Y is a function Ω : X → P(Y). The composition of two point-to-set maps Ω1 : X → P(Y) and Ω2 : Y → P(Z) is defined by (Ω2 ∘ Ω1)(x) = ∪_{y ∈ Ω1(x)} Ω2(y) [8].

Suppose that X and Y are endowed with topologies so that we may speak of convergence. A point-to-set map Ω is closed at x ∈ X if {x_k} ⊂ X, x_k → x, y_k ∈ Ω(x_k) and y_k → y together imply that y ∈ Ω(x). In words, every convergent sequence whose elements lie in the sets {Ω(x_k)} must have its limit in the set Ω(x). A map is closed on X if it is closed at each point of X. The composition of two closed maps is always closed [8].

We also define two different types of stationary points. A fixed point of the map Ω : X → P(X) is a point x for which Ω(x) = {x}. A generalized fixed point of Ω is a point x for which x ∈ Ω(x) [9].

2.2. Iterative Algorithms. Many procedures in mathematical programming can be described using the language of point-to-set maps. An algorithm is a point-to-set map Ω : X → P(X). Given an initial point x_0, an algorithm generates a sequence of points via the rule x_{k+1} ∈ Ω(x_k). Now, suppose that J : X → R_+ is a continuous, non-negative function. An algorithm Ω is monotonic with respect to J whenever y ∈ Ω(x) implies that J(y) ≤ J(x). If, in addition, y ∈ Ω(x) and J(y) = J(x) imply that y = x, then we say that the algorithm is strictly monotonic.
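To make the definitions of Section 2 concrete, the following toy sketch (not taken from the paper; the map Ω, the function J, and all names are illustrative assumptions) treats an algorithm as a point-to-set map Ω : X → P(X), generates a sequence x_{k+1} ∈ Ω(x_k), checks monotonicity with respect to J along the way, and stops at an approximate generalized fixed point, i.e. a point with x ∈ Ω(x).

```python
def omega(x):
    """A toy algorithm map Omega : R -> P(R); each Omega(x) here is a singleton set.

    Omega(x) = {x/2 + 1}, whose unique fixed point (Omega(x) = {x}) is x = 2.
    """
    return {x / 2.0 + 1.0}


def J(x):
    """A continuous, non-negative function: J(x) = (x - 2)^2."""
    return (x - 2.0) ** 2


def run(x0, steps=50, tol=1e-10):
    """Iterate x_{k+1} in Omega(x_k), verifying that Omega is monotonic with respect to J."""
    x = x0
    for _ in range(steps):
        y = next(iter(omega(x)))           # pick an element of Omega(x)
        assert J(y) <= J(x) + 1e-12        # monotonicity: J never increases along the sequence
        if abs(y - x) < tol:               # approximately x in Omega(x): a generalized fixed point
            return y
        x = y
    return x


print(run(10.0))  # converges to the fixed point 2.0
```

In this toy case J(y) = J(x)/4, so the map is in fact strictly monotonic, and the limit of the generated sequence is the fixed point of Ω; statements of this flavor are what a convergence theory for monotonic algorithms aims to establish in general.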
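As a concrete instance of a monotonic alternating scheme for NNMA, the sketch below uses the well-known multiplicative updates of Lee and Seung for the Frobenius-norm error mentioned in the introduction [7]; it is not the general algorithm analyzed in this paper, and the function names, the small epsilon guard, and the random test matrix are assumptions made for illustration. Each sweep updates H with V fixed and then V with H fixed; both updates preserve non-negativity and never increase ‖VH − A‖_F, so the loop can assert monotonicity in the sense of Section 2.2.

```python
import numpy as np


def frobenius_error(A, V, H):
    """The error measure for the Frobenius-norm variant: J(V, H) = ||V H - A||_F."""
    return np.linalg.norm(V @ H - A, ord="fro")


def nnma_multiplicative(A, r, iters=200, eps=1e-12, seed=0):
    """Alternating multiplicative updates for A ~ V H with V, H >= 0 (Lee-Seung, Frobenius norm)."""
    rng = np.random.default_rng(seed)
    d, N = A.shape
    V = rng.random((d, r))                      # non-negative initial factors
    H = rng.random((r, N))
    J_prev = frobenius_error(A, V, H)
    for _ in range(iters):
        H *= (V.T @ A) / (V.T @ V @ H + eps)    # update coefficients with features fixed
        V *= (A @ H.T) / (V @ H @ H.T + eps)    # update features with coefficients fixed
        J = frobenius_error(A, V, H)
        assert J <= J_prev + 1e-8, "monotonicity violated"
        J_prev = J
    return V, H


if __name__ == "__main__":
    A = np.random.default_rng(1).random((50, 80))   # a small non-negative data matrix
    V, H = nnma_multiplicative(A, r=5)
    print(frobenius_error(A, V, H))
```

With d = 50, N = 80 and r = 5 the reconstruction error decreases steadily over the sweeps; the Kullback-Leibler update rules of [7] follow the same alternating pattern with a different error measure.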