Alternating Sign Matricesand Symmetry

合集下载

识别序列为非回纹对称结构限制核酸酶的正确切割位点

识别序列为非回纹对称结构限制核酸酶的正确切割位点
颜 炳 学 , 李 宁 , 常 信 。 吴
(. 国农 业 大学农 业 生物 技术 国家 实验 室 , 京 10 9 ;. 国农 业大 学动 物科 学技 术学 院 , 京 10 9) 1中 北 00 42 中 北 0 0 4

要 : 制 性 内切 核 酸 酶 的 切 割 识 别 序 列 分 为 回 纹 对 称 和 非 回 纹 对 称 结 构 两 类 , 于 DNA 是 互 补 双 链 ,所 以 对 限 由
t o s r ndsOfD N A w e e oti ntc 、T her f e t e r r o w ta r n de i al e or h t ue ec gnii s que e ar zng e nc s e not onl y one n hi .I t s exp i er—
中 图 分 类 号 : 5 . Q5 6 4 文献 标 识码 : A 文 章 编 号 : 2 3 9 7 ( 0 2) 4 0 2 0 5 — 7 2 2 0 0 — 4 0—0 3
The Tr e R e o ni i e e e o h e t i to u c g z ng S qu nc f t e R s r c i n Enz m e W h s y o e R e o ni i e c g z ng S que e i o nc s N npa i dr m i ln o c
he i c lx om p e e ar t a l m nt y s r nds .So t e o he r c gni i e zng s que e lndr nc s ofpa i om e e ym e i w o standsofD N A e e i nt— nz n t r w r de i

ABSTRACT Progressive Simplicial Complexes

ABSTRACT Progressive Simplicial Complexes

Progressive Simplicial Complexes Jovan Popovi´c Hugues HoppeCarnegie Mellon University Microsoft ResearchABSTRACTIn this paper,we introduce the progressive simplicial complex(PSC) representation,a new format for storing and transmitting triangu-lated geometric models.Like the earlier progressive mesh(PM) representation,it captures a given model as a coarse base model together with a sequence of refinement transformations that pro-gressively recover detail.The PSC representation makes use of a more general refinement transformation,allowing the given model to be an arbitrary triangulation(e.g.any dimension,non-orientable, non-manifold,non-regular),and the base model to always consist of a single vertex.Indeed,the sequence of refinement transforma-tions encodes both the geometry and the topology of the model in a unified multiresolution framework.The PSC representation retains the advantages of PM’s.It defines a continuous sequence of approx-imating models for runtime level-of-detail control,allows smooth transitions between any pair of models in the sequence,supports progressive transmission,and offers a space-efficient representa-tion.Moreover,by allowing changes to topology,the PSC sequence of approximations achieves betterfidelity than the corresponding PM sequence.We develop an optimization algorithm for constructing PSC representations for graphics surface models,and demonstrate the framework on models that are both geometrically and topologically complex.CR Categories:I.3.5[Computer Graphics]:Computational Geometry and Object Modeling-surfaces and object representations.Additional Keywords:model simplification,level-of-detail representa-tions,multiresolution,progressive transmission,geometry compression.1INTRODUCTIONModeling and3D scanning systems commonly give rise to triangle meshes of high complexity.Such meshes are notoriously difficult to render,store,and transmit.One approach to speed up rendering is to replace a complex mesh by a set of level-of-detail(LOD) approximations;a detailed mesh is used when the object is close to the viewer,and coarser approximations are substituted as the object recedes[6,8].These LOD approximations can be precomputed Work performed while at Microsoft Research.Email:jovan@,hhoppe@Web:/jovan/Web:/hoppe/automatically using mesh simplification methods(e.g.[2,10,14,20,21,22,24,27]).For efficient storage and transmission,meshcompression schemes[7,26]have also been developed.The recently introduced progressive mesh(PM)representa-tion[13]provides a unified solution to these problems.In PM form,an arbitrary mesh M is stored as a coarse base mesh M0together witha sequence of n detail records that indicate how to incrementally re-fine M0into M n=M(see Figure7).Each detail record encodes theinformation associated with a vertex split,an elementary transfor-mation that adds one vertex to the mesh.In addition to defininga continuous sequence of approximations M0M n,the PM rep-resentation supports smooth visual transitions(geomorphs),allowsprogressive transmission,and makes an effective mesh compressionscheme.The PM representation has two restrictions,however.First,it canonly represent meshes:triangulations that correspond to orientable12-dimensional manifolds.Triangulated2models that cannot be rep-resented include1-d manifolds(open and closed curves),higherdimensional polyhedra(e.g.triangulated volumes),non-orientablesurfaces(e.g.M¨o bius strips),non-manifolds(e.g.two cubes joinedalong an edge),and non-regular models(i.e.models of mixed di-mensionality).Second,the expressiveness of the PM vertex splittransformations constrains all meshes M0M n to have the same topological type.Therefore,when M is topologically complex,the simplified base mesh M0may still have numerous triangles(Fig-ure7).In contrast,a number of existing simplification methods allowtopological changes as the model is simplified(Section6).Ourwork is inspired by vertex unification schemes[21,22],whichmerge vertices of the model based on geometric proximity,therebyallowing genus modification and component merging.In this paper,we introduce the progressive simplicial complex(PSC)representation,a generalization of the PM representation thatpermits topological changes.The key element of our approach isthe introduction of a more general refinement transformation,thegeneralized vertex split,that encodes changes to both the geometryand topology of the model.The PSC representation expresses anarbitrary triangulated model M(e.g.any dimension,non-orientable,non-manifold,non-regular)as the result of successive refinementsapplied to a base model M1that always consists of a single vertex (Figure8).Thus both geometric and topological complexity are recovered progressively.Moreover,the PSC representation retains the advantages of PM’s,including continuous LOD,geomorphs, progressive transmission,and model compression.In addition,we develop an optimization algorithm for construct-ing a PSC representation from a given model,as described in Sec-tion4.1The particular parametrization of vertex splits in[13]assumes that mesh triangles are consistently oriented.2Throughout this paper,we use the words“triangulated”and“triangula-tion”in the general dimension-independent sense.Figure 1:Illustration of a simplicial complex K and some of its subsets.2BACKGROUND2.1Concepts from algebraic topologyTo precisely define both triangulated models and their PSC repre-sentations,we find it useful to introduce some elegant abstractions from algebraic topology (e.g.[15,25]).The geometry of a triangulated model is denoted as a tuple (K V )where the abstract simplicial complex K is a combinatorial structure specifying the adjacency of vertices,edges,triangles,etc.,and V is a set of vertex positions specifying the shape of the model in 3.More precisely,an abstract simplicial complex K consists of a set of vertices 1m together with a set of non-empty subsets of the vertices,called the simplices of K ,such that any set consisting of exactly one vertex is a simplex in K ,and every non-empty subset of a simplex in K is also a simplex in K .A simplex containing exactly d +1vertices has dimension d and is called a d -simplex.As illustrated pictorially in Figure 1,the faces of a simplex s ,denoted s ,is the set of non-empty subsets of s .The star of s ,denoted star(s ),is the set of simplices of which s is a face.The children of a d -simplex s are the (d 1)-simplices of s ,and its parents are the (d +1)-simplices of star(s ).A simplex with exactly one parent is said to be a boundary simplex ,and one with no parents a principal simplex .The dimension of K is the maximum dimension of its simplices;K is said to be regular if all its principal simplices have the same dimension.To form a triangulation from K ,identify its vertices 1m with the standard basis vectors 1m ofm.For each simplex s ,let the open simplex smdenote the interior of the convex hull of its vertices:s =m:jmj =1j=1jjsThe topological realization K is defined as K =K =s K s .The geometric realization of K is the image V (K )where V :m 3is the linear map that sends the j -th standard basis vector jm to j 3.Only a restricted set of vertex positions V =1m lead to an embedding of V (K )3,that is,prevent self-intersections.The geometric realization V (K )is often called a simplicial complex or polyhedron ;it is formed by an arbitrary union of points,segments,triangles,tetrahedra,etc.Note that there generally exist many triangulations (K V )for a given polyhedron.(Some of the vertices V may lie in the polyhedron’s interior.)Two sets are said to be homeomorphic (denoted =)if there ex-ists a continuous one-to-one mapping between them.Equivalently,they are said to have the same topological type .The topological realization K is a d-dimensional manifold without boundary if for each vertex j ,star(j )=d .It is a d-dimensional manifold if each star(v )is homeomorphic to either d or d +,where d +=d:10.Two simplices s 1and s 2are d-adjacent if they have a common d -dimensional face.Two d -adjacent (d +1)-simplices s 1and s 2are manifold-adjacent if star(s 1s 2)=d +1.Figure 2:Illustration of the edge collapse transformation and its inverse,the vertex split.Transitive closure of 0-adjacency partitions K into connected com-ponents .Similarly,transitive closure of manifold-adjacency parti-tions K into manifold components .2.2Review of progressive meshesIn the PM representation [13],a mesh with appearance attributes is represented as a tuple M =(K V D S ),where the abstract simpli-cial complex K is restricted to define an orientable 2-dimensional manifold,the vertex positions V =1m determine its ge-ometric realization V (K )in3,D is the set of discrete material attributes d f associated with 2-simplices f K ,and S is the set of scalar attributes s (v f )(e.g.normals,texture coordinates)associated with corners (vertex-face tuples)of K .An initial mesh M =M n is simplified into a coarser base mesh M 0by applying a sequence of n successive edge collapse transforma-tions:(M =M n )ecol n 1ecol 1M 1ecol 0M 0As shown in Figure 2,each ecol unifies the two vertices of an edgea b ,thereby removing one or two triangles.The position of the resulting unified vertex can be arbitrary.Because the edge collapse transformation has an inverse,called the vertex split transformation (Figure 2),the process can be reversed,so that an arbitrary mesh M may be represented as a simple mesh M 0together with a sequence of n vsplit records:M 0vsplit 0M 1vsplit 1vsplit n 1(M n =M )The tuple (M 0vsplit 0vsplit n 1)forms a progressive mesh (PM)representation of M .The PM representation thus captures a continuous sequence of approximations M 0M n that can be quickly traversed for interac-tive level-of-detail control.Moreover,there exists a correspondence between the vertices of any two meshes M c and M f (0c f n )within this sequence,allowing for the construction of smooth vi-sual transitions (geomorphs)between them.A sequence of such geomorphs can be precomputed for smooth runtime LOD.In addi-tion,PM’s support progressive transmission,since the base mesh M 0can be quickly transmitted first,followed the vsplit sequence.Finally,the vsplit records can be encoded concisely,making the PM representation an effective scheme for mesh compression.Topological constraints Because the definitions of ecol and vsplit are such that they preserve the topological type of the mesh (i.e.all K i are homeomorphic),there is a constraint on the min-imum complexity that K 0may achieve.For instance,it is known that the minimal number of vertices for a closed genus g mesh (ori-entable 2-manifold)is (7+(48g +1)12)2if g =2(10if g =2)[16].Also,the presence of boundary components may further constrain the complexity of K 0.Most importantly,K may consist of a number of components,and each is required to appear in the base mesh.For example,the meshes in Figure 7each have 117components.As evident from the figure,the geometry of PM meshes may deteriorate severely as they approach topological lower bound.M 1;100;(1)M 10;511;(7)M 50;4656;(12)M 200;1552277;(28)M 500;3968690;(58)M 2000;14253219;(108)M 5000;029010;(176)M n =34794;0068776;(207)Figure 3:Example of a PSC representation.The image captions indicate the number of principal 012-simplices respectively and the number of connected components (in parenthesis).3PSC REPRESENTATION 3.1Triangulated modelsThe first step towards generalizing PM’s is to let the PSC repre-sentation encode more general triangulated models,instead of just meshes.We denote a triangulated model as a tuple M =(K V D A ).The abstract simplicial complex K is not restricted to 2-manifolds,but may in fact be arbitrary.To represent K in memory,we encode the incidence graph of the simplices using the following linked structures (in C++notation):struct Simplex int dim;//0=vertex,1=edge,2=triangle,...int id;Simplex*children[MAXDIM+1];//[0..dim]List<Simplex*>parents;;To render the model,we draw only the principal simplices ofK ,denoted (K )(i.e.vertices not adjacent to edges,edges not adjacent to triangles,etc.).The discrete attributes D associate amaterial identifier d s with each simplex s(K ).For the sake of simplicity,we avoid explicitly storing surface normals at “corners”(using a set S )as done in [13].Instead we let the material identifier d s contain a smoothing group field [28],and let a normal discontinuity (crease )form between any pair of adjacent triangles with different smoothing groups.Previous vertex unification schemes [21,22]render principal simplices of dimension 0and 1(denoted 01(K ))as points and lines respectively with fixed,device-dependent screen widths.To better approximate the model,we instead define a set A that associates an area a s A with each simplex s 01(K ).We think of a 0-simplex s 00(K )as approximating a sphere with area a s 0,and a 1-simplex s 1=j k 1(K )as approximating a cylinder (with axis (j k ))of area a s 1.To render a simplex s 01(K ),we determine the radius r model of the corresponding sphere or cylinder in modeling space,and project the length r model to obtain the radius r screen in screen pixels.Depending on r screen ,we render the simplex as a polygonal sphere or cylinder with radius r model ,a 2D point or line with thickness 2r screen ,or do not render it at all.This choice based on r screen can be adjusted to mitigate the overhead of introducing polygonal representations of spheres and cylinders.As an example,Figure 3shows an initial model M of 68,776triangles.One of its approximations M 500is a triangulated model with 3968690principal 012-simplices respectively.3.2Level-of-detail sequenceAs in progressive meshes,from a given triangulated model M =M n ,we define a sequence of approximations M i :M 1op 1M 2op 2M n1op n 1M nHere each model M i has exactly i vertices.The simplification op-erator M ivunify iM i +1is the vertex unification transformation,whichmerges two vertices (Section 3.3),and its inverse M igvspl iM i +1is the generalized vertex split transformation (Section 3.4).Thetuple (M 1gvspl 1gvspl n 1)forms a progressive simplicial complex (PSC)representation of M .To construct a PSC representation,we first determine a sequence of vunify transformations simplifying M down to a single vertex,as described in Section 4.After reversing these transformations,we renumber the simplices in the order that they are created,so thateach gvspl i (a i)splits the vertex a i K i into two vertices a i i +1K i +1.As vertices may have different positions in the different models,we denote the position of j in M i as i j .To better approximate a surface model M at lower complexity levels,we initially associate with each (principal)2-simplex s an area a s equal to its triangle area in M .Then,as the model is simplified,wekeep constant the sum of areas a s associated with principal simplices within each manifold component.When2-simplices are eventually reduced to principal1-simplices and0-simplices,their associated areas will provide good estimates of the original component areas.3.3Vertex unification transformationThe transformation vunify(a i b i midp i):M i M i+1takes an arbitrary pair of vertices a i b i K i+1(simplex a i b i need not be present in K i+1)and merges them into a single vertex a i K i. Model M i is created from M i+1by updating each member of the tuple(K V D A)as follows:K:References to b i in all simplices of K are replaced by refer-ences to a i.More precisely,each simplex s in star(b i)K i+1is replaced by simplex(s b i)a i,which we call the ancestor simplex of s.If this ancestor simplex already exists,s is deleted.V:Vertex b is deleted.For simplicity,the position of the re-maining(unified)vertex is set to either the midpoint or is left unchanged.That is,i a=(i+1a+i+1b)2if the boolean parameter midp i is true,or i a=i+1a otherwise.D:Materials are carried through as expected.So,if after the vertex unification an ancestor simplex(s b i)a i K i is a new principal simplex,it receives its material from s K i+1if s is a principal simplex,or else from the single parent s a i K i+1 of s.A:To maintain the initial areas of manifold components,the areasa s of deleted principal simplices are redistributed to manifold-adjacent neighbors.More concretely,the area of each princi-pal d-simplex s deleted during the K update is distributed toa manifold-adjacent d-simplex not in star(a ib i).If no suchneighbor exists and the ancestor of s is a principal simplex,the area a s is distributed to that ancestor simplex.Otherwise,the manifold component(star(a i b i))of s is being squashed be-tween two other manifold components,and a s is discarded. 3.4Generalized vertex split transformation Constructing the PSC representation involves recording the infor-mation necessary to perform the inverse of each vunify i.This inverse is the generalized vertex split gvspl i,which splits a0-simplex a i to introduce an additional0-simplex b i.(As mentioned previously, renumbering of simplices implies b i i+1,so index b i need not be stored explicitly.)Each gvspl i record has the formgvspl i(a i C K i midp i()i C D i C A i)and constructs model M i+1from M i by updating the tuple (K V D A)as follows:K:As illustrated in Figure4,any simplex adjacent to a i in K i can be the vunify result of one of four configurations in K i+1.To construct K i+1,we therefore replace each ancestor simplex s star(a i)in K i by either(1)s,(2)(s a i)i+1,(3)s and(s a i)i+1,or(4)s,(s a i)i+1and s i+1.The choice is determined by a split code associated with s.Thesesplit codes are stored as a code string C Ki ,in which the simplicesstar(a i)are sortedfirst in order of increasing dimension,and then in order of increasing simplex id,as shown in Figure5. V:The new vertex is assigned position i+1i+1=i ai+()i.Theother vertex is given position i+1ai =i ai()i if the boolean pa-rameter midp i is true;otherwise its position remains unchanged.D:The string C Di is used to assign materials d s for each newprincipal simplex.Simplices in C Di ,as well as in C Aibelow,are sorted by simplex dimension and simplex id as in C Ki. A:During reconstruction,we are only interested in the areas a s fors01(K).The string C Ai tracks changes in these areas.Figure4:Effects of split codes on simplices of various dimensions.code string:41422312{}Figure5:Example of split code encoding.3.5PropertiesLevels of detail A graphics application can efficiently transitionbetween models M1M n at runtime by performing a sequence ofvunify or gvspl transformations.Our current research prototype wasnot designed for efficiency;it attains simplification rates of about6000vunify/sec and refinement rates of about5000gvspl/sec.Weexpect that a careful redesign using more efficient data structureswould significantly improve these rates.Geomorphs As in the PM representation,there exists a corre-spondence between the vertices of the models M1M n.Given acoarser model M c and afiner model M f,1c f n,each vertexj K f corresponds to a unique ancestor vertex f c(j)K cfound by recursively traversing the ancestor simplex relations:f c(j)=j j cf c(a j1)j cThis correspondence allows the creation of a smooth visual transi-tion(geomorph)M G()such that M G(1)equals M f and M G(0)looksidentical to M c.The geomorph is defined as the modelM G()=(K f V G()D f A G())in which each vertex position is interpolated between its originalposition in V f and the position of its ancestor in V c:Gj()=()fj+(1)c f c(j)However,we must account for the special rendering of principalsimplices of dimension0and1(Section3.1).For each simplexs01(K f),we interpolate its area usinga G s()=()a f s+(1)a c swhere a c s=0if s01(K c).In addition,we render each simplexs01(K c)01(K f)using area a G s()=(1)a c s.The resultinggeomorph is visually smooth even as principal simplices are intro-duced,removed,or change dimension.The accompanying video demonstrates a sequence of such geomorphs.Progressive transmission As with PM’s,the PSC representa-tion can be progressively transmitted by first sending M 1,followed by the gvspl records.Unlike the base mesh of the PM,M 1always consists of a single vertex,and can therefore be sent in a fixed-size record.The rendering of lower-dimensional simplices as spheres and cylinders helps to quickly convey the overall shape of the model in the early stages of transmission.Model compression Although PSC gvspl are more general than PM vsplit transformations,they offer a surprisingly concise representation of M .Table 1lists the average number of bits re-quired to encode each field of the gvspl records.Using arithmetic coding [30],the vertex id field a i requires log 2i bits,and the boolean parameter midp i requires 0.6–0.9bits for our models.The ()i delta vector is quantized to 16bitsper coordinate (48bits per),and stored as a variable-length field [7,13],requiring about 31bits on average.At first glance,each split code in the code string C K i seems to have 4possible outcomes (except for the split code for 0-simplex a i which has only 2possible outcomes).However,there exist constraints between these split codes.For example,in Figure 5,the code 1for 1-simplex id 1implies that 2-simplex id 1also has code 1.This in turn implies that 1-simplex id 2cannot have code 2.Similarly,code 2for 1-simplex id 3implies a code 2for 2-simplex id 2,which in turn implies that 1-simplex id 4cannot have code 1.These constraints,illustrated in the “scoreboard”of Figure 6,can be summarized using the following two rules:(1)If a simplex has split code c12,all of its parents havesplit code c .(2)If a simplex has split code 3,none of its parents have splitcode 4.As we encode split codes in C K i left to right,we apply these two rules (and their contrapositives)transitively to constrain the possible outcomes for split codes yet to be ing arithmetic coding with uniform outcome probabilities,these constraints reduce the code string length in Figure 6from 15bits to 102bits.In our models,the constraints reduce the code string from 30bits to 14bits on average.The code string is further reduced using a non-uniform probability model.We create an array T [0dim ][015]of encoding tables,indexed by simplex dimension (0..dim)and by the set of possible (constrained)split codes (a 4-bit mask).For each simplex s ,we encode its split code c using the probability distribution found in T [s dim ][s codes mask ].For 2-dimensional models,only 10of the 48tables are non-trivial,and each table contains at most 4probabilities,so the total size of the probability model is small.These encoding tables reduce the code strings to approximately 8bits as shown in Table 1.By comparison,the PM representation requires approximately 5bits for the same information,but of course it disallows topological changes.To provide more intuition for the efficiency of the PSC repre-sentation,we note that capturing the connectivity of an average 2-manifold simplicial complex (n vertices,3n edges,and 2n trian-gles)requires ni =1(log 2i +8)n (log 2n +7)bits with PSC encoding,versus n (12log 2n +95)bits with a traditional one-way incidence graph representation.For improved compression,it would be best to use a hybrid PM +PSC representation,in which the more concise PM vertex split encoding is used when the local neighborhood is an orientableFigure 6:Constraints on the split codes for the simplices in the example of Figure 5.Table 1:Compression results and construction times.Object#verts Space required (bits/n )Trad.Con.n K V D Arepr.time a i C K i midp i (v )i C D i C Ai bits/n hrs.drumset 34,79412.28.20.928.1 4.10.453.9146.1 4.3destroyer 83,79913.38.30.723.1 2.10.347.8154.114.1chandelier 36,62712.47.60.828.6 3.40.853.6143.6 3.6schooner 119,73413.48.60.727.2 2.5 1.353.7148.722.2sandal 4,6289.28.00.733.4 1.50.052.8123.20.4castle 15,08211.0 1.20.630.70.0-43.5-0.5cessna 6,7959.67.60.632.2 2.50.152.6132.10.5harley 28,84711.97.90.930.5 1.40.453.0135.7 3.52-dimensional manifold (this occurs on average 93%of the time in our examples).To compress C D i ,we predict the material for each new principalsimplex sstar(a i )star(b i )K i +1by constructing an ordered set D s of materials found in star(a i )K i .To improve the coding model,the first materials in D s are those of principal simplices in star(s )K i where s is the ancestor of s ;the remainingmaterials in star(a i )K i are appended to D s .The entry in C D i associated with s is the index of its material in D s ,encoded arithmetically.If the material of s is not present in D s ,it is specified explicitly as a global index in D .We encode C A i by specifying the area a s for each new principalsimplex s 01(star(a i )star(b i ))K i +1.To account for this redistribution of area,we identify the principal simplex from which s receives its area by specifying its index in 01(star(a i ))K i .The column labeled in Table 1sums the bits of each field of the gvspl records.Multiplying by the number n of vertices in M gives the total number of bits for the PSC representation of the model (e.g.500KB for the destroyer).By way of compari-son,the next column shows the number of bits per vertex required in a traditional “IndexedFaceSet”representation,with quantization of 16bits per coordinate and arithmetic coding of face materials (3n 16+2n 3log 2n +materials).4PSC CONSTRUCTIONIn this section,we describe a scheme for iteratively choosing pairs of vertices to unify,in order to construct a PSC representation.Our algorithm,a generalization of [13],is time-intensive,seeking high quality approximations.It should be emphasized that many quality metrics are possible.For instance,the quadric error metric recently introduced by Garland and Heckbert [9]provides a different trade-off of execution speed and visual quality.As in [13,20],we first compute a cost E for each candidate vunify transformation,and enter the candidates into a priority queueordered by ascending cost.Then,in each iteration i =n 11,we perform the vunify at the front of the queue and update the costs of affected candidates.4.1Forming set of candidate vertex pairs In principle,we could enter all possible pairs of vertices from M into the priority queue,but this would be prohibitively expensive since simplification would then require at least O(n2log n)time.Instead, we would like to consider only a smaller set of candidate vertex pairs.Naturally,should include the1-simplices of K.Additional pairs should also be included in to allow distinct connected com-ponents of M to merge and to facilitate topological changes.We considered several schemes for forming these additional pairs,in-cluding binning,octrees,and k-closest neighbor graphs,but opted for the Delaunay triangulation because of its adaptability on models containing components at different scales.We compute the Delaunay triangulation of the vertices of M, represented as a3-dimensional simplicial complex K DT.We define the initial set to contain both the1-simplices of K and the subset of1-simplices of K DT that connect vertices in different connected components of K.During the simplification process,we apply each vertex unification performed on M to as well in order to keep consistent the set of candidate pairs.For models in3,star(a i)has constant size in the average case,and the overall simplification algorithm requires O(n log n) time.(In the worst case,it could require O(n2log n)time.)4.2Selecting vertex unifications fromFor each candidate vertex pair(a b),the associated vunify(a b):M i M i+1is assigned the costE=E dist+E disc+E area+E foldAs in[13],thefirst term is E dist=E dist(M i)E dist(M i+1),where E dist(M)measures the geometric accuracy of the approximate model M.Conceptually,E dist(M)approximates the continuous integralMd2(M)where d(M)is the Euclidean distance of the point to the closest point on M.We discretize this integral by defining E dist(M)as the sum of squared distances to M from a dense set of points X sampled from the original model M.We sample X from the set of principal simplices in K—a strategy that generalizes to arbitrary triangulated models.In[13],E disc(M)measures the geometric accuracy of disconti-nuity curves formed by a set of sharp edges in the mesh.For the PSC representation,we generalize the concept of sharp edges to that of sharp simplices in K—a simplex is sharp either if it is a boundary simplex or if two of its parents are principal simplices with different material identifiers.The energy E disc is defined as the sum of squared distances from a set X disc of points sampled from sharp simplices to the discontinuity components from which they were sampled.Minimization of E disc therefore preserves the geom-etry of material boundaries,normal discontinuities(creases),and triangulation boundaries(including boundary curves of a surface and endpoints of a curve).We have found it useful to introduce a term E area that penalizes surface stretching(a more sophisticated version of the regularizing E spring term of[13]).Let A i+1N be the sum of triangle areas in the neighborhood star(a i)star(b i)K i+1,and A i N the sum of triangle areas in star(a i)K i.The mean squared displacement over the neighborhood N due to the change in area can be approx-imated as disp2=12(A i+1NA iN)2.We let E area=X N disp2,where X N is the number of points X projecting in the neighborhood. To prevent model self-intersections,the last term E fold penalizes surface folding.We compute the rotation of each oriented triangle in the neighborhood due to the vertex unification(as in[10,20]).If any rotation exceeds a threshold angle value,we set E fold to a large constant.Unlike[13],we do not optimize over the vertex position i a, but simply evaluate E for i a i+1a i+1b(i+1a+i+1b)2and choose the best one.This speeds up the optimization,improves model compression,and allows us to introduce non-quadratic energy terms like E area.5RESULTSTable1gives quantitative results for the examples in thefigures and in the video.Simplification times for our prototype are measured on an SGI Indigo2Extreme(150MHz R4400).Although these times may appear prohibitive,PSC construction is an off-line task that only needs to be performed once per model.Figure9highlights some of the benefits of the PSC representa-tion.The pearls in the chandelier model are initially disconnected tetrahedra;these tetrahedra merge and collapse into1-d curves in lower-complexity approximations.Similarly,the numerous polyg-onal ropes in the schooner model are simplified into curves which can be rendered as line segments.The straps of the sandal model initially have some thickness;the top and bottom sides of these straps merge in the simplification.Also note the disappearance of the holes on the sandal straps.The castle example demonstrates that the original model need not be a mesh;here M is a1-dimensional non-manifold obtained by extracting edges from an image.6RELATED WORKThere are numerous schemes for representing and simplifying tri-angulations in computer graphics.A common special case is that of subdivided2-manifolds(meshes).Garland and Heckbert[12] provide a recent survey of mesh simplification techniques.Several methods simplify a given model through a sequence of edge col-lapse transformations[10,13,14,20].With the exception of[20], these methods constrain edge collapses to preserve the topological type of the model(e.g.disallow the collapse of a tetrahedron into a triangle).Our work is closely related to several schemes that generalize the notion of edge collapse to that of vertex unification,whereby separate connected components of the model are allowed to merge and triangles may be collapsed into lower dimensional simplices. Rossignac and Borrel[21]overlay a uniform cubical lattice on the object,and merge together vertices that lie in the same cubes. Schaufler and St¨u rzlinger[22]develop a similar scheme in which vertices are merged using a hierarchical clustering algorithm.Lue-bke[18]introduces a scheme for locally adapting the complexity of a scene at runtime using a clustering octree.In these schemes, the approximating models correspond to simplicial complexes that would result from a set of vunify transformations(Section3.3).Our approach differs in that we order the vunify in a carefully optimized sequence.More importantly,we define not only a simplification process,but also a new representation for the model using an en-coding of gvspl=vunify1transformations.Recent,independent work by Schmalstieg and Schaufler[23]de-velops a similar strategy of encoding a model using a sequence of vertex split transformations.Their scheme differs in that it tracks only triangles,and therefore requires regular,2-dimensional trian-gulations.Hence,it does not allow lower-dimensional simplices in the model approximations,and does not generalize to higher dimensions.Some simplification schemes make use of an intermediate vol-umetric representation to allow topological changes to the model. He et al.[11]convert a mesh into a binary inside/outside function discretized on a three-dimensional grid,low-passfilter this function,。

利用有限域上奇异辛几何构造一个新的带仲裁的认证码

利用有限域上奇异辛几何构造一个新的带仲裁的认证码


) .
收稿 日 :0 00—5 作者简介:高有 (96 月生) 期 2 1—30 . 16 年7 ,男,博 士,教授. 究方 向:代数 、编码与密码 研 基金项 目:国家 自 然科学基金 (1 706 天津 自 6 192 ) 然科学基金 (8 c J 30 ) OJ YB c19 0.
分 别 称 S E , 和 为 信 源 集 ,发 方 编 码 规 则集 , 收 方 解 码 规 则 集 和信 息 集 , 集 , T ER
合 fIlT,ER, I S,E II Il 的基数称 为这个认证码 的参数 .在 一个具有仲裁 的认证系统 中,有 4 个
参与者 :发方 ,收方 ,敌方和仲裁人 ,共有下面五种攻击:

ma {, 一 一s x 0m )
mi{, 一2} n/m 8,
成立 .若 ( s尼2 m,,; +z 是非空 的,那么它 在 2 ,(q作用 下构成一子空 问轨道 . 由 , ) 蚪l F )
文献 [ ] 1 知 6
M ( s ;u ,) =g。+一 +)(一)一) m,, 2 +e l 2 。m k m 知 ( + (
1 敌方 的模 仿攻击 .敌方在未观 测到任何信 息的条件 下 ,发送 一个信息给 收方 ,若收 方 ) 将其 当作合法信 息接 收 ( 即该信 息满足 收方 的认证 条件) ,则称敌方 的模仿攻击成 功.假定编 码 规则服从均匀分布,将敌方模仿攻击成功的最大概率记 为 P, ,则

m ax



( P P 合 同于 i ) l
, 、
c s , =
( 。。 一8 。) s ; 一
- 4 -
记F
的所有的 ( s ) m,, 型子空间所成的集合 为M ( s ;u- 1 ) m,, 2 4 , .由文献 【 】 - 1 可 6

《神经网络与深度学习综述DeepLearning15May2014

《神经网络与深度学习综述DeepLearning15May2014

Draft:Deep Learning in Neural Networks:An OverviewTechnical Report IDSIA-03-14/arXiv:1404.7828(v1.5)[cs.NE]J¨u rgen SchmidhuberThe Swiss AI Lab IDSIAIstituto Dalle Molle di Studi sull’Intelligenza ArtificialeUniversity of Lugano&SUPSIGalleria2,6928Manno-LuganoSwitzerland15May2014AbstractIn recent years,deep artificial neural networks(including recurrent ones)have won numerous con-tests in pattern recognition and machine learning.This historical survey compactly summarises relevantwork,much of it from the previous millennium.Shallow and deep learners are distinguished by thedepth of their credit assignment paths,which are chains of possibly learnable,causal links between ac-tions and effects.I review deep supervised learning(also recapitulating the history of backpropagation),unsupervised learning,reinforcement learning&evolutionary computation,and indirect search for shortprograms encoding deep and large networks.PDF of earlier draft(v1):http://www.idsia.ch/∼juergen/DeepLearning30April2014.pdfLATEX source:http://www.idsia.ch/∼juergen/DeepLearning30April2014.texComplete BIBTEXfile:http://www.idsia.ch/∼juergen/bib.bibPrefaceThis is the draft of an invited Deep Learning(DL)overview.One of its goals is to assign credit to those who contributed to the present state of the art.I acknowledge the limitations of attempting to achieve this goal.The DL research community itself may be viewed as a continually evolving,deep network of scientists who have influenced each other in complex ways.Starting from recent DL results,I tried to trace back the origins of relevant ideas through the past half century and beyond,sometimes using“local search”to follow citations of citations backwards in time.Since not all DL publications properly acknowledge earlier relevant work,additional global search strategies were employed,aided by consulting numerous neural network experts.As a result,the present draft mostly consists of references(about800entries so far).Nevertheless,through an expert selection bias I may have missed important work.A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century.For these reasons,the present draft should be viewed as merely a snapshot of an ongoing credit assignment process.To help improve it,please do not hesitate to send corrections and suggestions to juergen@idsia.ch.Contents1Introduction to Deep Learning(DL)in Neural Networks(NNs)3 2Event-Oriented Notation for Activation Spreading in FNNs/RNNs3 3Depth of Credit Assignment Paths(CAPs)and of Problems4 4Recurring Themes of Deep Learning54.1Dynamic Programming(DP)for DL (5)4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL (6)4.3Occam’s Razor:Compression and Minimum Description Length(MDL) (6)4.4Learning Hierarchical Representations Through Deep SL,UL,RL (6)4.5Fast Graphics Processing Units(GPUs)for DL in NNs (6)5Supervised NNs,Some Helped by Unsupervised NNs75.11940s and Earlier (7)5.2Around1960:More Neurobiological Inspiration for DL (7)5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) (8)5.41979:Convolution+Weight Replication+Winner-Take-All(WTA) (8)5.51960-1981and Beyond:Development of Backpropagation(BP)for NNs (8)5.5.1BP for Weight-Sharing Feedforward NNs(FNNs)and Recurrent NNs(RNNs)..95.6Late1980s-2000:Numerous Improvements of NNs (9)5.6.1Ideas for Dealing with Long Time Lags and Deep CAPs (10)5.6.2Better BP Through Advanced Gradient Descent (10)5.6.3Discovering Low-Complexity,Problem-Solving NNs (11)5.6.4Potential Benefits of UL for SL (11)5.71987:UL Through Autoencoder(AE)Hierarchies (12)5.81989:BP for Convolutional NNs(CNNs) (13)5.91991:Fundamental Deep Learning Problem of Gradient Descent (13)5.101991:UL-Based History Compression Through a Deep Hierarchy of RNNs (14)5.111992:Max-Pooling(MP):Towards MPCNNs (14)5.121994:Contest-Winning Not So Deep NNs (15)5.131995:Supervised Recurrent Very Deep Learner(LSTM RNN) (15)5.142003:More Contest-Winning/Record-Setting,Often Not So Deep NNs (16)5.152006/7:Deep Belief Networks(DBNs)&AE Stacks Fine-Tuned by BP (17)5.162006/7:Improved CNNs/GPU-CNNs/BP-Trained MPCNNs (17)5.172009:First Official Competitions Won by RNNs,and with MPCNNs (18)5.182010:Plain Backprop(+Distortions)on GPU Yields Excellent Results (18)5.192011:MPCNNs on GPU Achieve Superhuman Vision Performance (18)5.202011:Hessian-Free Optimization for RNNs (19)5.212012:First Contests Won on ImageNet&Object Detection&Segmentation (19)5.222013-:More Contests and Benchmark Records (20)5.22.1Currently Successful Supervised Techniques:LSTM RNNs/GPU-MPCNNs (21)5.23Recent Tricks for Improving SL Deep NNs(Compare Sec.5.6.2,5.6.3) (21)5.24Consequences for Neuroscience (22)5.25DL with Spiking Neurons? (22)6DL in FNNs and RNNs for Reinforcement Learning(RL)236.1RL Through NN World Models Yields RNNs With Deep CAPs (23)6.2Deep FNNs for Traditional RL and Markov Decision Processes(MDPs) (24)6.3Deep RL RNNs for Partially Observable MDPs(POMDPs) (24)6.4RL Facilitated by Deep UL in FNNs and RNNs (25)6.5Deep Hierarchical RL(HRL)and Subgoal Learning with FNNs and RNNs (25)6.6Deep RL by Direct NN Search/Policy Gradients/Evolution (25)6.7Deep RL by Indirect Policy Search/Compressed NN Search (26)6.8Universal RL (27)7Conclusion271Introduction to Deep Learning(DL)in Neural Networks(NNs) Which modifiable components of a learning system are responsible for its success or failure?What changes to them improve performance?This has been called the fundamental credit assignment problem(Minsky, 1963).There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses(Sec.6.8).The present survey,however,will focus on the narrower,but now commercially important,subfield of Deep Learning(DL)in Artificial Neural Networks(NNs).We are interested in accurate credit assignment across possibly many,often nonlinear,computational stages of NNs.Shallow NN-like models have been around for many decades if not centuries(Sec.5.1).Models with several successive nonlinear layers of neurons date back at least to the1960s(Sec.5.3)and1970s(Sec.5.5). An efficient gradient descent method for teacher-based Supervised Learning(SL)in discrete,differentiable networks of arbitrary depth called backpropagation(BP)was developed in the1960s and1970s,and ap-plied to NNs in1981(Sec.5.5).BP-based training of deep NNs with many layers,however,had been found to be difficult in practice by the late1980s(Sec.5.6),and had become an explicit research subject by the early1990s(Sec.5.9).DL became practically feasible to some extent through the help of Unsupervised Learning(UL)(e.g.,Sec.5.10,5.15).The1990s and2000s also saw many improvements of purely super-vised DL(Sec.5).In the new millennium,deep NNs havefinally attracted wide-spread attention,mainly by outperforming alternative machine learning methods such as kernel machines(Vapnik,1995;Sch¨o lkopf et al.,1998)in numerous important applications.In fact,supervised deep NNs have won numerous of-ficial international pattern recognition competitions(e.g.,Sec.5.17,5.19,5.21,5.22),achieving thefirst superhuman visual pattern recognition results in limited domains(Sec.5.19).Deep NNs also have become relevant for the more generalfield of Reinforcement Learning(RL)where there is no supervising teacher (Sec.6).Both feedforward(acyclic)NNs(FNNs)and recurrent(cyclic)NNs(RNNs)have won contests(Sec.5.12,5.14,5.17,5.19,5.21,5.22).In a sense,RNNs are the deepest of all NNs(Sec.3)—they are general computers more powerful than FNNs,and can in principle create and process memories of ar-bitrary sequences of input patterns(e.g.,Siegelmann and Sontag,1991;Schmidhuber,1990a).Unlike traditional methods for automatic sequential program synthesis(e.g.,Waldinger and Lee,1969;Balzer, 1985;Soloway,1986;Deville and Lau,1994),RNNs can learn programs that mix sequential and parallel information processing in a natural and efficient way,exploiting the massive parallelism viewed as crucial for sustaining the rapid decline of computation cost observed over the past75years.The rest of this paper is structured as follows.Sec.2introduces a compact,event-oriented notation that is simple yet general enough to accommodate both FNNs and RNNs.Sec.3introduces the concept of Credit Assignment Paths(CAPs)to measure whether learning in a given NN application is of the deep or shallow type.Sec.4lists recurring themes of DL in SL,UL,and RL.Sec.5focuses on SL and UL,and on how UL can facilitate SL,although pure SL has become dominant in recent competitions(Sec.5.17-5.22). Sec.5is arranged in a historical timeline format with subsections on important inspirations and technical contributions.Sec.6on deep RL discusses traditional Dynamic Programming(DP)-based RL combined with gradient-based search techniques for SL or UL in deep NNs,as well as general methods for direct and indirect search in the weight space of deep FNNs and RNNs,including successful policy gradient and evolutionary methods.2Event-Oriented Notation for Activation Spreading in FNNs/RNNs Throughout this paper,let i,j,k,t,p,q,r denote positive integer variables assuming ranges implicit in the given contexts.Let n,m,T denote positive integer constants.An NN’s topology may change over time(e.g.,Fahlman,1991;Ring,1991;Weng et al.,1992;Fritzke, 1994).At any given moment,it can be described as afinite subset of units(or nodes or neurons)N= {u1,u2,...,}and afinite set H⊆N×N of directed edges or connections between nodes.FNNs are acyclic graphs,RNNs cyclic.Thefirst(input)layer is the set of input units,a subset of N.In FNNs,the k-th layer(k>1)is the set of all nodes u∈N such that there is an edge path of length k−1(but no longer path)between some input unit and u.There may be shortcut connections between distant layers.The NN’s behavior or program is determined by a set of real-valued,possibly modifiable,parameters or weights w i(i=1,...,n).We now focus on a singlefinite episode or epoch of information processing and activation spreading,without learning through weight changes.The following slightly unconventional notation is designed to compactly describe what is happening during the runtime of the system.During an episode,there is a partially causal sequence x t(t=1,...,T)of real values that I call events.Each x t is either an input set by the environment,or the activation of a unit that may directly depend on other x k(k<t)through a current NN topology-dependent set in t of indices k representing incoming causal connections or links.Let the function v encode topology information and map such event index pairs(k,t)to weight indices.For example,in the non-input case we may have x t=f t(net t)with real-valued net t= k∈in t x k w v(k,t)(additive case)or net t= k∈in t x k w v(k,t)(multiplicative case), where f t is a typically nonlinear real-valued activation function such as tanh.In many recent competition-winning NNs(Sec.5.19,5.21,5.22)there also are events of the type x t=max k∈int (x k);some networktypes may also use complex polynomial activation functions(Sec.5.3).x t may directly affect certain x k(k>t)through outgoing connections or links represented through a current set out t of indices k with t∈in k.Some non-input events are called output events.Note that many of the x t may refer to different,time-varying activations of the same unit in sequence-processing RNNs(e.g.,Williams,1989,“unfolding in time”),or also in FNNs sequentially exposed to time-varying input patterns of a large training set encoded as input events.During an episode,the same weight may get reused over and over again in topology-dependent ways,e.g.,in RNNs,or in convolutional NNs(Sec.5.4,5.8).I call this weight sharing across space and/or time.Weight sharing may greatly reduce the NN’s descriptive complexity,which is the number of bits of information required to describe the NN (Sec.4.3).In Supervised Learning(SL),certain NN output events x t may be associated with teacher-given,real-valued labels or targets d t yielding errors e t,e.g.,e t=1/2(x t−d t)2.A typical goal of supervised NN training is tofind weights that yield episodes with small total error E,the sum of all such e t.The hope is that the NN will generalize well in later episodes,causing only small errors on previously unseen sequences of input events.Many alternative error functions for SL and UL are possible.SL assumes that input events are independent of earlier output events(which may affect the environ-ment through actions causing subsequent perceptions).This assumption does not hold in the broaderfields of Sequential Decision Making and Reinforcement Learning(RL)(Kaelbling et al.,1996;Sutton and Barto, 1998;Hutter,2005)(Sec.6).In RL,some of the input events may encode real-valued reward signals given by the environment,and a typical goal is tofind weights that yield episodes with a high sum of reward signals,through sequences of appropriate output actions.Sec.5.5will use the notation above to compactly describe a central algorithm of DL,namely,back-propagation(BP)for supervised weight-sharing FNNs and RNNs.(FNNs may be viewed as RNNs with certainfixed zero weights.)Sec.6will address the more general RL case.3Depth of Credit Assignment Paths(CAPs)and of ProblemsTo measure whether credit assignment in a given NN application is of the deep or shallow type,I introduce the concept of Credit Assignment Paths or CAPs,which are chains of possibly causal links between events.Let usfirst focus on SL.Consider two events x p and x q(1≤p<q≤T).Depending on the appli-cation,they may have a Potential Direct Causal Connection(PDCC)expressed by the Boolean predicate pdcc(p,q),which is true if and only if p∈in q.Then the2-element list(p,q)is defined to be a CAP from p to q(a minimal one).A learning algorithm may be allowed to change w v(p,q)to improve performance in future episodes.More general,possibly indirect,Potential Causal Connections(PCC)are expressed by the recursively defined Boolean predicate pcc(p,q),which in the SL case is true only if pdcc(p,q),or if pcc(p,k)for some k and pdcc(k,q).In the latter case,appending q to any CAP from p to k yields a CAP from p to q(this is a recursive definition,too).The set of such CAPs may be large but isfinite.Note that the same weight may affect many different PDCCs between successive events listed by a given CAP,e.g.,in the case of RNNs, or weight-sharing FNNs.Suppose a CAP has the form(...,k,t,...,q),where k and t(possibly t=q)are thefirst successive elements with modifiable w v(k,t).Then the length of the suffix list(t,...,q)is called the CAP’s depth (which is0if there are no modifiable links at all).This depth limits how far backwards credit assignment can move down the causal chain tofind a modifiable weight.1Suppose an episode and its event sequence x1,...,x T satisfy a computable criterion used to decide whether a given problem has been solved(e.g.,total error E below some threshold).Then the set of used weights is called a solution to the problem,and the depth of the deepest CAP within the sequence is called the solution’s depth.There may be other solutions(yielding different event sequences)with different depths.Given somefixed NN topology,the smallest depth of any solution is called the problem’s depth.Sometimes we also speak of the depth of an architecture:SL FNNs withfixed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers.Certain SL RNNs withfixed weights for all connections except those to output units(Jaeger,2001;Maass et al.,2002; Jaeger,2004;Schrauwen et al.,2007)have a maximal problem depth of1,because only thefinal links in the corresponding CAPs are modifiable.In general,however,RNNs may learn to solve problems of potentially unlimited depth.Note that the definitions above are solely based on the depths of causal chains,and agnostic of the temporal distance between events.For example,shallow FNNs perceiving large“time windows”of in-put events may correctly classify long input sequences through appropriate output events,and thus solve shallow problems involving long time lags between relevant events.At which problem depth does Shallow Learning end,and Deep Learning begin?Discussions with DL experts have not yet yielded a conclusive response to this question.Instead of committing myself to a precise answer,let me just define for the purposes of this overview:problems of depth>10require Very Deep Learning.The difficulty of a problem may have little to do with its depth.Some NNs can quickly learn to solve certain deep problems,e.g.,through random weight guessing(Sec.5.9)or other types of direct search (Sec.6.6)or indirect search(Sec.6.7)in weight space,or through training an NNfirst on shallow problems whose solutions may then generalize to deep problems,or through collapsing sequences of(non)linear operations into a single(non)linear operation—but see an analysis of non-trivial aspects of deep linear networks(Baldi and Hornik,1994,Section B).In general,however,finding an NN that precisely models a given training set is an NP-complete problem(Judd,1990;Blum and Rivest,1992),also in the case of deep NNs(S´ıma,1994;de Souto et al.,1999;Windisch,2005);compare a survey of negative results(S´ıma, 2002,Section1).Above we have focused on SL.In the more general case of RL in unknown environments,pcc(p,q) is also true if x p is an output event and x q any later input event—any action may affect the environment and thus any later perception.(In the real world,the environment may even influence non-input events computed on a physical hardware entangled with the entire universe,but this is ignored here.)It is possible to model and replace such unmodifiable environmental PCCs through a part of the NN that has already learned to predict(through some of its units)input events(including reward signals)from former input events and actions(Sec.6.1).Its weights are frozen,but can help to assign credit to other,still modifiable weights used to compute actions(Sec.6.1).This approach may lead to very deep CAPs though.Some DL research is about automatically rephrasing problems such that their depth is reduced(Sec.4). In particular,sometimes UL is used to make SL problems less deep,e.g.,Sec.5.10.Often Dynamic Programming(Sec.4.1)is used to facilitate certain traditional RL problems,e.g.,Sec.6.2.Sec.5focuses on CAPs for SL,Sec.6on the more complex case of RL.4Recurring Themes of Deep Learning4.1Dynamic Programming(DP)for DLOne recurring theme of DL is Dynamic Programming(DP)(Bellman,1957),which can help to facili-tate credit assignment under certain assumptions.For example,in SL NNs,backpropagation itself can 1An alternative would be to count only modifiable links when measuring depth.In many typical NN applications this would not make a difference,but in some it would,e.g.,Sec.6.1.be viewed as a DP-derived method(Sec.5.5).In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth(Sec.6.2).DP algorithms are also essen-tial for systems that combine concepts of NNs and graphical models,such as Hidden Markov Models (HMMs)(Stratonovich,1960;Baum and Petrie,1966)and Expectation Maximization(EM)(Dempster et al.,1977),e.g.,(Bottou,1991;Bengio,1991;Bourlard and Morgan,1994;Baldi and Chauvin,1996; Jordan and Sejnowski,2001;Bishop,2006;Poon and Domingos,2011;Dahl et al.,2012;Hinton et al., 2012a).4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL Another recurring theme is how UL can facilitate both SL(Sec.5)and RL(Sec.6).UL(Sec.5.6.4) is normally used to encode raw incoming data such as video or speech streams in a form that is more convenient for subsequent goal-directed learning.In particular,codes that describe the original data in a less redundant or more compact way can be fed into SL(Sec.5.10,5.15)or RL machines(Sec.6.4),whose search spaces may thus become smaller(and whose CAPs shallower)than those necessary for dealing with the raw data.UL is closely connected to the topics of regularization and compression(Sec.4.3,5.6.3). 4.3Occam’s Razor:Compression and Minimum Description Length(MDL) Occam’s razor favors simple solutions over complex ones.Given some programming language,the prin-ciple of Minimum Description Length(MDL)can be used to measure the complexity of a solution candi-date by the length of the shortest program that computes it(e.g.,Solomonoff,1964;Kolmogorov,1965b; Chaitin,1966;Wallace and Boulton,1968;Levin,1973a;Rissanen,1986;Blumer et al.,1987;Li and Vit´a nyi,1997;Gr¨u nwald et al.,2005).Some methods explicitly take into account program runtime(Al-lender,1992;Watanabe,1992;Schmidhuber,2002,1995);many consider only programs with constant runtime,written in non-universal programming languages(e.g.,Rissanen,1986;Hinton and van Camp, 1993).In the NN case,the MDL principle suggests that low NN weight complexity corresponds to high NN probability in the Bayesian view(e.g.,MacKay,1992;Buntine and Weigend,1991;De Freitas,2003), and to high generalization performance(e.g.,Baum and Haussler,1989),without overfitting the training data.Many methods have been proposed for regularizing NNs,that is,searching for solution-computing, low-complexity SL NNs(Sec.5.6.3)and RL NNs(Sec.6.7).This is closely related to certain UL methods (Sec.4.2,5.6.4).4.4Learning Hierarchical Representations Through Deep SL,UL,RLMany methods of Good Old-Fashioned Artificial Intelligence(GOFAI)(Nilsson,1980)as well as more recent approaches to AI(Russell et al.,1995)and Machine Learning(Mitchell,1997)learn hierarchies of more and more abstract data representations.For example,certain methods of syntactic pattern recog-nition(Fu,1977)such as grammar induction discover hierarchies of formal rules to model observations. The partially(un)supervised Automated Mathematician/EURISKO(Lenat,1983;Lenat and Brown,1984) continually learns concepts by combining previously learnt concepts.Such hierarchical representation learning(Ring,1994;Bengio et al.,2013;Deng and Yu,2014)is also a recurring theme of DL NNs for SL (Sec.5),UL-aided SL(Sec.5.7,5.10,5.15),and hierarchical RL(Sec.6.5).Often,abstract hierarchical representations are natural by-products of data compression(Sec.4.3),e.g.,Sec.5.10.4.5Fast Graphics Processing Units(GPUs)for DL in NNsWhile the previous millennium saw several attempts at creating fast NN-specific hardware(e.g.,Jackel et al.,1990;Faggin,1992;Ramacher et al.,1993;Widrow et al.,1994;Heemskerk,1995;Korkin et al., 1997;Urlbe,1999),and at exploiting standard hardware(e.g.,Anguita et al.,1994;Muller et al.,1995; Anguita and Gomes,1996),the new millennium brought a DL breakthrough in form of cheap,multi-processor graphics cards or GPUs.GPUs are widely used for video games,a huge and competitive market that has driven down hardware prices.GPUs excel at fast matrix and vector multiplications required not only for convincing virtual realities but also for NN training,where they can speed up learning by a factorof50and more.Some of the GPU-based FNN implementations(Sec.5.16-5.19)have greatly contributed to recent successes in contests for pattern recognition(Sec.5.19-5.22),image segmentation(Sec.5.21), and object detection(Sec.5.21-5.22).5Supervised NNs,Some Helped by Unsupervised NNsThe main focus of current practical applications is on Supervised Learning(SL),which has dominated re-cent pattern recognition contests(Sec.5.17-5.22).Several methods,however,use additional Unsupervised Learning(UL)to facilitate SL(Sec.5.7,5.10,5.15).It does make sense to treat SL and UL in the same section:often gradient-based methods,such as BP(Sec.5.5.1),are used to optimize objective functions of both UL and SL,and the boundary between SL and UL may blur,for example,when it comes to time series prediction and sequence classification,e.g.,Sec.5.10,5.12.A historical timeline format will help to arrange subsections on important inspirations and techni-cal contributions(although such a subsection may span a time interval of many years).Sec.5.1briefly mentions early,shallow NN models since the1940s,Sec.5.2additional early neurobiological inspiration relevant for modern Deep Learning(DL).Sec.5.3is about GMDH networks(since1965),perhaps thefirst (feedforward)DL systems.Sec.5.4is about the relatively deep Neocognitron NN(1979)which is similar to certain modern deep FNN architectures,as it combines convolutional NNs(CNNs),weight pattern repli-cation,and winner-take-all(WTA)mechanisms.Sec.5.5uses the notation of Sec.2to compactly describe a central algorithm of DL,namely,backpropagation(BP)for supervised weight-sharing FNNs and RNNs. It also summarizes the history of BP1960-1981and beyond.Sec.5.6describes problems encountered in the late1980s with BP for deep NNs,and mentions several ideas from the previous millennium to overcome them.Sec.5.7discusses afirst hierarchical stack of coupled UL-based Autoencoders(AEs)—this concept resurfaced in the new millennium(Sec.5.15).Sec.5.8is about applying BP to CNNs,which is important for today’s DL applications.Sec.5.9explains BP’s Fundamental DL Problem(of vanishing/exploding gradients)discovered in1991.Sec.5.10explains how a deep RNN stack of1991(the History Compressor) pre-trained by UL helped to solve previously unlearnable DL benchmarks requiring Credit Assignment Paths(CAPs,Sec.3)of depth1000and more.Sec.5.11discusses a particular WTA method called Max-Pooling(MP)important in today’s DL FNNs.Sec.5.12mentions afirst important contest won by SL NNs in1994.Sec.5.13describes a purely supervised DL RNN(Long Short-Term Memory,LSTM)for problems of depth1000and more.Sec.5.14mentions an early contest of2003won by an ensemble of shallow NNs, as well as good pattern recognition results with CNNs and LSTM RNNs(2003).Sec.5.15is mostly about Deep Belief Networks(DBNs,2006)and related stacks of Autoencoders(AEs,Sec.5.7)pre-trained by UL to facilitate BP-based SL.Sec.5.16mentions thefirst BP-trained MPCNNs(2007)and GPU-CNNs(2006). Sec.5.17-5.22focus on official competitions with secret test sets won by(mostly purely supervised)DL NNs since2009,in sequence recognition,image classification,image segmentation,and object detection. Many RNN results depended on LSTM(Sec.5.13);many FNN results depended on GPU-based FNN code developed since2004(Sec.5.16,5.17,5.18,5.19),in particular,GPU-MPCNNs(Sec.5.19).5.11940s and EarlierNN research started in the1940s(e.g.,McCulloch and Pitts,1943;Hebb,1949);compare also later work on learning NNs(Rosenblatt,1958,1962;Widrow and Hoff,1962;Grossberg,1969;Kohonen,1972; von der Malsburg,1973;Narendra and Thathatchar,1974;Willshaw and von der Malsburg,1976;Palm, 1980;Hopfield,1982).In a sense NNs have been around even longer,since early supervised NNs were essentially variants of linear regression methods going back at least to the early1800s(e.g.,Legendre, 1805;Gauss,1809,1821).Early NNs had a maximal CAP depth of1(Sec.3).5.2Around1960:More Neurobiological Inspiration for DLSimple cells and complex cells were found in the cat’s visual cortex(e.g.,Hubel and Wiesel,1962;Wiesel and Hubel,1959).These cellsfire in response to certain properties of visual sensory inputs,such as theorientation of plex cells exhibit more spatial invariance than simple cells.This inspired later deep NN architectures(Sec.5.4)used in certain modern award-winning Deep Learners(Sec.5.19-5.22).5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) Networks trained by the Group Method of Data Handling(GMDH)(Ivakhnenko and Lapa,1965; Ivakhnenko et al.,1967;Ivakhnenko,1968,1971)were perhaps thefirst DL systems of the Feedforward Multilayer Perceptron type.The units of GMDH nets may have polynomial activation functions imple-menting Kolmogorov-Gabor polynomials(more general than traditional NN activation functions).Given a training set,layers are incrementally grown and trained by regression analysis,then pruned with the help of a separate validation set(using today’s terminology),where Decision Regularisation is used to weed out superfluous units.The numbers of layers and units per layer can be learned in problem-dependent fashion. This is a good example of hierarchical representation learning(Sec.4.4).There have been numerous ap-plications of GMDH-style networks,e.g.(Ikeda et al.,1976;Farlow,1984;Madala and Ivakhnenko,1994; Ivakhnenko,1995;Kondo,1998;Kord´ık et al.,2003;Witczak et al.,2006;Kondo and Ueno,2008).5.41979:Convolution+Weight Replication+Winner-Take-All(WTA)Apart from deep GMDH networks(Sec.5.3),the Neocognitron(Fukushima,1979,1980,2013a)was per-haps thefirst artificial NN that deserved the attribute deep,and thefirst to incorporate the neurophysiolog-ical insights of Sec.5.2.It introduced convolutional NNs(today often called CNNs or convnets),where the(typically rectangular)receptivefield of a convolutional unit with given weight vector is shifted step by step across a2-dimensional array of input values,such as the pixels of an image.The resulting2D array of subsequent activation events of this unit can then provide inputs to higher-level units,and so on.Due to massive weight replication(Sec.2),relatively few parameters may be necessary to describe the behavior of such a convolutional layer.Competition layers have WTA subsets whose maximally active units are the only ones to adopt non-zero activation values.They essentially“down-sample”the competition layer’s input.This helps to create units whose responses are insensitive to small image shifts(compare Sec.5.2).The Neocognitron is very similar to the architecture of modern,contest-winning,purely super-vised,feedforward,gradient-based Deep Learners with alternating convolutional and competition lay-ers(e.g.,Sec.5.19-5.22).Fukushima,however,did not set the weights by supervised backpropagation (Sec.5.5,5.8),but by local un supervised learning rules(e.g.,Fukushima,2013b),or by pre-wiring.In that sense he did not care for the DL problem(Sec.5.9),although his architecture was comparatively deep indeed.He also used Spatial Averaging(Fukushima,1980,2011)instead of Max-Pooling(MP,Sec.5.11), currently a particularly convenient and popular WTA mechanism.Today’s CNN-based DL machines profita lot from later CNN work(e.g.,LeCun et al.,1989;Ranzato et al.,2007)(Sec.5.8,5.16,5.19).5.51960-1981and Beyond:Development of Backpropagation(BP)for NNsThe minimisation of errors through gradient descent(Hadamard,1908)in the parameter space of com-plex,nonlinear,differentiable,multi-stage,NN-related systems has been discussed at least since the early 1960s(e.g.,Kelley,1960;Bryson,1961;Bryson and Denham,1961;Pontryagin et al.,1961;Dreyfus,1962; Wilkinson,1965;Amari,1967;Bryson and Ho,1969;Director and Rohrer,1969;Griewank,2012),ini-tially within the framework of Euler-LaGrange equations in the Calculus of Variations(e.g.,Euler,1744). Steepest descent in such systems can be performed(Bryson,1961;Kelley,1960;Bryson and Ho,1969)by iterating the ancient chain rule(Leibniz,1676;L’Hˆo pital,1696)in Dynamic Programming(DP)style(Bell-man,1957).A simplified derivation of the method uses the chain rule only(Dreyfus,1962).The methods of the1960s were already efficient in the DP sense.However,they backpropagated derivative information through standard Jacobian matrix calculations from one“layer”to the previous one, explicitly addressing neither direct links across several layers nor potential additional efficiency gains due to network sparsity(but perhaps such enhancements seemed obvious to the authors).。

基于辛几何构作一类新的带仲裁的认证码

基于辛几何构作一类新的带仲裁的认证码

i e s n to y te t n mi e ,a s c e su mp ro ai n b h e e v ra d a s c e s ls b t u in b mp ro ai n b h r s t r u c sf li e s n t y t e r c i e n u c s f u si t y a t o u t o
m2
其 中
的秩 为 2。 r 从而 P是一 个含 于 并 且满 足
中所 有元 素正交 的 向量 组成 的集合 , 则
( v) 2
与 的 和 为 ( r 型 子 空 间 的 ( m,) m—m. ) 子 空 一m , 型 r
P ∈ =
,1 、
K =0 V 期
中 国 民 航 大 学 学 报
J oURNAL VI AVI OF CI L ATI oN UNI VEI I Y T oF CHI NA
V0 . 9 12 No 1 .
21 0 1年 2月
F b ay 2 1 e r r 01 u
基 于 辛几 何 构 作 一 类新 的 带仲 裁 的认 证 码
显 然 , 知 ) G 勘 F ) 一 个 子 群 , 定 义 ( 是 L ( q的 则

( ) 在 驯上 的作 用如 下


x S Fq— p ) Fq
基 金 项 目 : 津 市 自然 科 学 基 金 项 目 ( 8 G J 1 9 0) 中 国 民航 大 学 科 研 项 目( 9 A C S 2 天 0 J YB G 3 0 ; 0C U — 0 )
称 为 维 非迷 向子空 间 。
设 P是 F 中的 m 维子 空 间 ,用 P 表 示 由与 P

MXM工具包:一种用于特征选择、交叉验证和贝叶斯网络的R包说明书

MXM工具包:一种用于特征选择、交叉验证和贝叶斯网络的R包说明书

A very brief guide to using MXMMichail Tsagris,Vincenzo Lagani,Ioannis Tsamardinos1IntroductionMXM is an R package which contains functions for feature selection,cross-validation and Bayesian Networks.The main functionalities focus on feature selection for different types of data.We highlight the option for parallel computing and the fact that some of the functions have been either partially or fully implemented in C++.As for the other ones,we always try to make them faster.2Feature selection related functionsMXM offers many feature selection algorithms,namely MMPC,SES,MMMB,FBED,forward and backward regression.The target set of variables to be selected,ideally what we want to discover, is called Markov Blanket and it consists of the parents,children and parents of children(spouses) of the variable of interest assuming a Bayesian Network for all variables.MMPC stands for Max-Min Parents and Children.The idea is to use the Max-Min heuristic when choosing variables to put in the selected variables set and proceed in this way.Parents and Children comes from the fact that the algorithm will identify the parents and children of the variable of interest assuming a Bayesian Network.What it will not recover is the spouses of the children of the variable of interest.For more information the reader is addressed to[23].MMMB(Max-Min Markov Blanket)extends the MMPC to discovering the spouses of the variable of interest[19].SES(Statistically Equivalent Signatures)on the other hand extends MMPC to discovering statistically equivalent sets of the selected variables[18,9].Forward and Backward selection are the two classical procedures.The functionalities or the flexibility offered by all these algorithms is their ability to handle many types of dependent variables,such as continuous,survival,categorical(ordinal,nominal, binary),longitudinal.Let us now see all of them one by one.The relevant functions are1.MMPC and SES.SES uses MMPC to return multiple statistically equivalent sets of vari-ables.MMPC returns only one set of variables.In all cases,the log-likelihood ratio test is used to assess the significance of a variable.These algorithms accept categorical only, continuous only or mixed data in the predictor variables side.2.wald.mmpc and wald.ses.SES uses MMPC using the Wald test.These two algorithmsaccept continuous predictor variables only.3.perm.mmpc and perm.ses.SES uses MMPC where the p-value is obtained using per-mutations.Similarly to the Wald versions,these two algorithms accept continuous predictor variables only.4.ma.mmpc and ma.ses.MMPC and SES for multiple datasets measuring the same variables(dependent and predictors).5.MMPC.temporal and SES.temporal.Both of these algorithms are the usual SES andMMPC modified for correlated data,such as clustered or longitudinal.The predictor vari-ables can only be continuous.6.fbed.reg.The FBED feature selection method[2].The log-likelihood ratio test or the eBIC(BIC is a special case)can be used.7.fbed.glmm.reg.FBED with generalised linear mixed models for repeated measures orclustered data.8.fbed.ge.reg.FBED with GEE for repeated measures or clustered data.9.ebic.bsreg.Backward selection method using the eBIC.10.fs.reg.Forward regression method for all types of predictor variables and for most of theavailable tests below.11.glm.fsreg Forward regression method for logistic and Poisson regression in specific.Theuser can call this directly if he knows his data.12.lm.fsreg.Forward regression method for normal linear regression.The user can call thisdirectly if he knows his data.13.bic.fsreg.Forward regression using BIC only to add a new variable.No statistical test isperformed.14.bic.glm.fsreg.The same as before but for linear,logistic and Poisson regression(GLMs).15.bs.reg.Backward regression method for all types of predictor variables and for most of theavailable tests below.16.glm.bsreg.Backward regression method for linear,logistic and Poisson regression(GLMs).17.iamb.The IAMB algorithm[20]which stands for Incremental Association Markov Blanket.The algorithm performs a forward regression at first,followed by a backward regression offering two options.Either the usual backward regression is performed or a faster variation, but perhaps less correct variation.In the usual backward regression,at every step the least significant variable is removed.In the IAMB original version all non significant variables are removed at every step.18.mmmb.This algorithm works for continuous or categorical data only.After applying theMMPC algorithm one can go to the selected variables and perform MMPC on each of them.A list with the available options for this argument is given below.Make sure you include the test name within””when you supply it.Most of these tests come in their Wald and perm (permutation based)versions.In their Wald or perm versions,they may have slightly different acronyms,for example waldBinary or WaldOrdinal denote the logistic and ordinal regression respectively.1.testIndFisher.This is a standard test of independence when both the target and the setof predictor variables are continuous(continuous-continuous).2.testIndSpearman.This is a non-parametric alternative to testIndFisher test[6].3.testIndReg.In the case of target-predictors being continuous-mixed or continuous-categorical,the suggested test is via the standard linear regression.If the robust option is selected,M estimators[11]are used.If the target variable consists of proportions or percentages(within the(0,1)interval),the logit transformation is applied beforehand.4.testIndRQ.Another robust alternative to testIndReg for the case of continuous-mixed(or continuous-continuous)variables is the testIndRQ.If the target variable consists of proportions or percentages(within the(0,1)interval),the logit transformation is applied beforehand.5.testIndBeta.When the target is proportion(or percentage,i.e.,between0and1,notinclusive)the user can fit a regression model assuming a beta distribution[5].The predictor variables can be either continuous,categorical or mixed.6.testIndPois.When the target is discrete,and in specific count data,the default test isvia the Poisson regression.The predictor variables can be either continuous,categorical or mixed.7.testIndNB.As an alternative to the Poisson regression,we have included the Negativebinomial regression to capture cases of overdispersion[8].The predictor variables can be either continuous,categorical or mixed.8.testIndZIP.When the number of zeros is more than expected under a Poisson model,thezero inflated poisson regression is to be employed[10].The predictor variables can be either continuous,categorical or mixed.9.testIndLogistic.When the target is categorical with only two outcomes,success or failurefor example,then a binary logistic regression is to be used.Whether regression or classifi-cation is the task of interest,this method is applicable.The advantage of this over a linear or quadratic discriminant analysis is that it allows for categorical predictor variables as well and for mixed types of predictors.10.testIndMultinom.If the target has more than two outcomes,but it is of nominal type(political party,nationality,preferred basketball team),there is no ordering of the outcomes,multinomial logistic regression will be employed.Again,this regression is suitable for clas-sification purposes as well and it to allows for categorical predictor variables.The predictor variables can be either continuous,categorical or mixed.11.testIndOrdinal.This is a special case of multinomial regression,in which case the outcomeshave an ordering,such as not satisfied,neutral,satisfied.The appropriate method is ordinal logistic regression.The predictor variables can be either continuous,categorical or mixed.12.testIndTobit(Tobit regression for left censored data).Suppose you have measurements forwhich values below some value were not recorded.These are left censored values and by using a normal distribution we can by pass this difficulty.The predictor variables can be either continuous,categorical or mixed.13.testIndBinom.When the target variable is a matrix of two columns,where the first one isthe number of successes and the second one is the number of trials,binomial regression is to be used.The predictor variables can be either continuous,categorical or mixed.14.gSquare.If all variables,both the target and predictors are categorical the default test isthe G2test of independence.An alternative to the gSquare test is the testIndLogistic.With the latter,depending on the nature of the target,binary,un-ordered multinomial or ordered multinomial the appropriate regression model is fitted.The predictor variables can be either continuous,categorical or mixed.15.censIndCR.For the case of time-to-event data,a Cox regression model[4]is employed.Thepredictor variables can be either continuous,categorical or mixed.16.censIndWR.A second model for the case of time-to-event data,a Weibull regression modelis employed[14,13].Unlike the semi-parametric Cox model,the Weibull model is fully parametric.The predictor variables can be either continuous,categorical or mixed.17.censIndER.A third model for the case of time-to-event data,an exponential regressionmodel is employed.The predictor variables can be either continuous,categorical or mixed.This is a special case of the Weibull model.18.testIndIGreg.When you have non negative data,i.e.the target variable takes positivevalues(including0),a suggested regression is based on the the inverse Gaussian distribution.The link function is not the inverse of the square root as expected,but the logarithm.This is to ensure that the fitted values will be always be non negative.An alternative model is the Weibull regression(censIndWR).The predictor variables can be either continuous, categorical or mixed.19.testIndGamma(Gamma regression).Gamma distribution is designed for strictly positivedata(greater than zero).It is used in reliability analysis,as an alternative to the Weibull regression.This test however does not accept censored data,just the usual numeric data.The predictor variables can be either continuous,categorical or mixed.20.testIndNormLog(Gaussian regression with a log link).Gaussian regression using the loglink(instead of the identity)allows non negative data to be handled naturally.Unlike the gamma or the inverse gaussian regression zeros are allowed.The predictor variables can be either continuous,categorical or mixed.21.testIndClogit.When the data come from a case-control study,the suitable test is via con-ditional logistic regression[7].The predictor variables can be either continuous,categorical or mixed.22.testIndMVReg.In the case of multivariate continuous target,the suggested test is viaa multivariate linear regression.The target variable can be compositional data as well[1].These are positive data,whose vectors sum to1.They can sum to any constant,as long as it the same,but for convenience reasons we assume that they are normalised to sum to1.In this case the additive log-ratio transformation(multivariate logit transformation)is applied beforehand.The predictor variables can be either continuous,categorical or mixed.23.testIndGLMMReg.In the case of a longitudinal or clustered target(continuous,propor-tions within0and1(not inclusive)),the suggested test is via a(generalised)linear mixed model[12].The predictor variables can only be continuous.This test is only applicable in SES.temporal and MMPC.temporal.24.testIndGLMMPois.In the case of a longitudinal or clustered target(counts),the suggestedtest is via a(generalised)linear mixed model[12].The predictor variables can only be continuous.This test is only applicable in SES.temporal and MMPC.temporal.25.testIndGLMMLogistic.In the case of a longitudinal or clustered target(binary),thesuggested test is via a(generalised)linear mixed model[12].The predictor variables can only be continuous.This test is only applicable in SES.temporal and MMPC.temporal.To avoid any mistakes or wrongly selected test by the algorithms you are advised to select the test you want to use.All of these tests can be used with SES and MMPC,forward and backward regression methods.MMMB accepts only testIndFisher,testIndSpearman and gSquare.The reason for this is that MMMB was designed for variables(dependent and predictors)of the same type.For more info the user should see the help page of each function.2.1A more detailed look at some arguments of the feature selection algorithmsSES,MMPC,MMMB,forward and backward regression offer the option for robust tests(the argument robust).This is currently supported for the case of Pearson correlation coefficient and linear regression at the moment.We plan to extend this option to binary logistic and Poisson regression as well.These algorithms have an argument user test.In the case that the user wants to use his own test,for example,mytest,he can supply it in this argument as is,without””. For all previously mentioned regression based conditional independence tests,the argument works as test=”testIndFisher”.In the case of the user test it works as user test=mytest.The max kargument must always be at least1for SES,MMPC and MMMB,otherwise it is a simple filtering of the variables.The argument ncores offers the option for parallel implementation of the first step of the algorithms.The filtering step,where the significance of each predictor is assessed.If you have a few thousands of variables,maybe this option will do no significant improvement.But, if you have more and a”difficult”regression test,such as quantile regression(testIndRQ),then with4cores this could reduce the computational time of the first step up to nearly50%.For the Poisson,logistic and normal linear regression we have included C++codes to speed up this process,without the use of parallel.The FBED(Forward Backward Early Dropping)is a variant of the Forward selection is per-formed in the first phase followed by the usual backward regression.In some,the variation is that every non significant variable is dropped until no mre significant variables are found or there is no variable left.The forward and backward regression methods have a few different arguments.For example stopping which can be either”BIC”or”adjrsq”,with the latter being used only in the linear regression case.Every time a variable is significant it is added in the selected variables set.But, it may be the case,that it is actually not necessary and for this reason we also calculate the BIC of the relevant model at each step.If the difference BIC is less than the tol(argument)threshold value the variable does not enter the set and the algorithm stops.The forward and backward regression methods can proceed via the BIC as well.At every step of the algorithm,the BIC of the relevant model is calculated and if the BIC of the model including a candidate variable is reduced by more that the tol(argument)threshold value that variable is added.Otherwise the variable is not included and the algorithm stops.2.2Other relevant functionsOnce SES or MMPC are finished,the user might want to see the model produced.For this reason the functions ses.model and mmpc.model can be used.If the user wants to get some summarised results with MMPC for many combinations of max k and treshold values he can use the mmpc.path function.Ridge regression(ridge.reg and ridge.cv)have been implemented. Note that ridge regression is currently offered only for linear regression with continuous predictor variables.As for some miscellaneous,we have implemented the zero inflated Poisson and beta regression models,should the user want to use them.2.3Cross-validationcv.ses and cv.mmpc perform a K-fold cross validation for most of the aforementioned regression models.There are many metric functions to be used,appropriate for each case.The folds can be generated in a stratified fashion when the dependent variable is categorical.3NetworksCurrently three algorithms for constructing Bayesian Networks(or their skeleton)are offered,plus modifications.MMHC(Max-Min Hill-Climbing)[23],(mmhc.skel)which constructs the skeleton of the Bayesian Network(BN).This has the option of running SES[18]instead.MMHC(Max-Min Hill-Climbing)[23],(local.mmhc.skel)which constructs the skeleton around a selected node.It identifies the Parents and Children of that node and then finds their Parents and Children.MMPC followed by the PC rules.This is the command mmpc.or.PC algorithm[15](pc.skel for which the orientation rules(pc.or)have been implemented as well.Both of these algorithms accept continuous only,categorical data only or a mix of continuous,multinomial and ordinal.The skeleton of the PC algorithm has the option for permutation based conditional independence tests[21].The functions ci.mm and ci.fast perform a symmetric test with mixed data(continuous, ordinal and binary data)[17].This is employed by the PC algorithm as well.Bootstrap of the PC algorithm to estimate the confidence of the edges(pc.skel.boot).PC skeleton with repeated measures(glmm.pc.skel).This uses the symetric test proposed by[17]with generalised linear models.Skeleton of a network with continuous data using forward selection.The command work does a similar to MMHC task.It goes to every variable and instead applying the MMPC algorithm it applies the forward selection regression.All data must be continuous,since the Pearson correlation is used.The algorithm is fast,since the forward regression with the Pearson correlation is very fast.We also have utility functions,such as1.rdag and rdag2.Data simulation assuming a BN[3].2.findDescendants and findAncestors.Descendants and ancestors of a node(variable)ina given Bayesian Network.3.dag2eg.Transforming a DAG into an essential(mixed)graph,its class of equivalent DAGs.4.equivdags.Checking whether two DAGs are equivalent.5.is.dag.In fact this checks whether cycles are present by trying to topologically sort theedges.BNs do not allow for cycles.6.mb.The Markov Blanket of a node(variable)given a Bayesian Network.7.nei.The neighbours of a node(variable)given an undirected graph.8.undir.path.All paths between two nodes in an undirected graph.9.transitiveClosure.The transitive closure of an adjacency matrix,with and without arrow-heads.10.bn.skel.utils.Estimation of false discovery rate[22],plus AUC and ROC curves based onthe p-values.11.bn.skel.utils2.Estimation of the confidence of the edges[16],plus AUC and ROC curvesbased on the confidences.12.plotnetwork.Interactive plot of a graph.4AcknowledgmentsThe research leading to these results has received funding from the European Research Coun-cil under the European Union’s Seventh Framework Programme(FP/2007-2013)/ERC Grant Agreement n.617393.References[1]John Aitchison.The statistical analysis of compositional data.Chapman and Hall London,1986.[2]Giorgos Borboudakis and Ioannis Tsamardinos.Forward-Backward Selection with Early Drop-ping,2017.[3]Diego Colombo and Marloes H Maathuis.Order-independent constraint-based causal structurelearning.Journal of Machine Learning Research,15(1):3741–3782,2014.[4]David Henry Cox.Regression Models and Life-Tables.Journal of the Royal Statistical Society,34(2):187–220,1972.[5]Silvia Ferrari and Francisco Cribari-Neto.Beta regression for modelling rates and proportions.Journal of Applied Statistics,31(7):799–815,2004.[6]Edgar C Fieller and Egon S Pearson.Tests for rank correlation coefficients:II.Biometrika,48:29–40,1961.[7]Mitchell H Gail,Jay H Lubin,and Lawrence V Rubinstein.Likelihood calculations for matchedcase-control studies and survival studies with tied death times.Biometrika,68(3):703–707, 1981.[8]Joseph M Hilbe.Negative binomial regression.Cambridge University Press,2011.[9]Vincenzo Lagani,Giorgos Athineou,Alessio Farcomeni,Michail Tsagris,and IoannisTsamardinos.Feature Selection with the R Package MXM:Discovering Statistically-Equivalent Feature Subsets.Journal of Statistical Software,80(7),2017.[10]Diane Lambert.Zero-inflated Poisson regression,with an application to defects in manufac-turing.Technometrics,34(1):1–14,1992.[11]RARD Maronna,Douglas Martin,and Victor Yohai.Robust statistics.John Wiley&Sons,Chichester.ISBN,2006.[12]Jose Pinheiro and Douglas Bates.Mixed-effects models in S and S-PLUS.Springer Science&Business Media,2006.[13]FW Scholz.Maximum likelihood estimation for type I censored Weibull data including co-variates,1996.[14]Richard L Smith.Weibull regression models for reliability data.Reliability Engineering&System Safety,34(1):55–76,1991.[15]Peter Spirtes,Clark Glymour,and Richard Scheines.Causation,Prediction,and Search.TheMIT Press,second edi edition,12001.[16]Sofia Triantafillou,Ioannis Tsamardinos,and Anna Roumpelaki.Learning neighborhoods ofhigh confidence in constraint-based causal discovery.In European Workshop on Probabilistic Graphical Models,pages487–502.Springer,2014.[17]Michail Tsagris,Giorgos Borboudakis,Vincenzo Lagani,and Ioannis Tsamardinos.Constraint-based Causal Discovery with Mixed Data.In The2017ACM SIGKDD Work-shop on Causal Discovery,14/8/2017,Halifax,Nova Scotia,Canada,2017.[18]I.Tsamardinos,gani,and D.Pappas.Discovering multiple,equivalent biomarker sig-natures.In In Proceedings of the7th conference of the Hellenic Society for Computational Biology&Bioinformatics,Heraklion,Crete,Greece,2012.[19]Ioannis Tsamardinos,Constantin F Aliferis,and Alexander Statnikov.Time and sampleefficient discovery of Markov blankets and direct causal relations.In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining,pages673–678.ACM,2003.[20]Ioannis Tsamardinos,Constantin F Aliferis,Alexander R Statnikov,and Er Statnikov.Al-gorithms for Large Scale Markov Blanket Discovery.In FLAIRS conference,volume2,pages 376–380,2003.[21]Ioannis Tsamardinos and Giorgos Borboudakis.Permutation testing improves Bayesian net-work learning.In ECML PKDD’10Proceedings of the2010European conference on Machine learning and knowledge discovery in databases,pages322–337.Springer-Verlag,2010.[22]Ioannis Tsamardinos and Laura E Brown.Bounding the False Discovery Rate in LocalBayesian Network Learning.In AAAI,pages1100–1105,2008.[23]Ioannis Tsamardinos,Laura E.Brown,and Constantin F.Aliferis.The Max-Min Hill-ClimbingBayesian Network Structure Learning Algorithm.Machine Learning,65(1):31–78,2006.。

Graph Regularized Nonnegative Matrix


Ç
1 INTRODUCTION
HE
techniques for matrix factorization have become popular in recent years for data representation. In many problems in information retrieval, computer vision, and pattern recognition, the input data matrix is of very high dimension. This makes learning from example infeasible [15]. One then hopes to find two or more lower dimensional matrices whose product provides a good approximation to the original one. The canonical matrix factorization techniques include LU decomposition, QR decomposition, vector quantization, and Singular Value Decomposition (SVD). SVD is one of the most frequently used matrix factorization techniques. A singular value decomposition of an M Â N matrix X has the following form: X ¼ UÆVT ; where U is an M Â M orthogonal matrix, V is an N Â N orthogonal matrix, and Æ is an M Â N diagonal matrix with Æij ¼ 0 if i 6¼ j and Æii ! 0. The quantities Æii are called the singular values of X, and the columns of U and V are called

International Journal of Pattern Recognition and Artificial Intelligence c ○ World Scienti


AUTOMATIC CLASSIFICATION OF DIGITAL PHOTOGRAPHS BASED ON DECISION FORESTS

RAIMONDO SCHETTINI DISCo, University of Milano Bicocca, Via Bicocca degli Arcimboldi 8 Milano, 20126, Italy schettini@disco.unimib.it CARLA BRAMBILLA IMATI, CNR, Via Bassini 15 Milano, 20131, Italy carla@r.it CLAUDIO CUSANO ITC, CNR, Via Bassini 15 Milano, 20131, Italy DISCo, University of Milano Bicocca, Via Bicocca degli Arcimboldi 8 Milano, 20126, Italy cusano@r.it GIANLUIGI CIOCCA ITC, CNR, Via Bassini 15 Milano, 20131, Italy DISCo, University of Milano Bicocca, Via Bicocca degli Arcimboldi 8 Milano, 20126, Italy ciocca@r.it
Annotating photographs with broad semantic labels can be useful in both image processing and content-based image retrieval. We show here how low-level features can be related to semantic photo categories, such as indoor, outdoor and close-up, using decision forests consisting of trees constructed according to CART methodology. We also show how the results can be improved by introducing a rejection option in the classification process. Experimental results on a test set of 4500 photographs are reported and discussed. Keywords : CART, decision forest, digital images, image classification, low-level features.

From Data Mining to Knowledge Discovery in Databases

s Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media atten-tion of late. What is all the excitement about?This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges in-volved in real-world applications of knowledge discovery, and current and future research direc-tions in the field.A cross a wide variety of fields, data arebeing collected and accumulated at adramatic pace. There is an urgent need for a new generation of computational theo-ries and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).At an abstract level, the KDD field is con-cerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easi-ly) into other forms that might be more com-pact (for example, a short report), more ab-stract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for exam-ple, a predictive model for estimating the val-ue of future cases). At the core of the process is the application of specific data-mining meth-ods for pattern discovery and extraction.1This article begins by discussing the histori-cal context of KDD and data mining and theirintersection with other related fields. A briefsummary of recent KDD real-world applica-tions is provided. Definitions of KDD and da-ta mining are provided, and the general mul-tistep KDD process is outlined. This multistepprocess has the application of data-mining al-gorithms as one particular step in the process.The data-mining step is discussed in more de-tail in the context of specific data-mining al-gorithms and their application. Real-worldpractical application issues are also outlined.Finally, the article enumerates challenges forfuture research and development and in par-ticular discusses potential opportunities for AItechnology in KDD systems.Why Do We Need KDD?The traditional method of turning data intoknowledge relies on manual analysis and in-terpretation. For example, in the health-careindustry, it is common for specialists to peri-odically analyze current trends and changesin health-care data, say, on a quarterly basis.The specialists then provide a report detailingthe analysis to the sponsoring health-care or-ganization; this report becomes the basis forfuture decision making and planning forhealth-care management. In a totally differ-ent type of application, planetary geologistssift through remotely sensed images of plan-ets and asteroids, carefully locating and cata-loging such geologic objects of interest as im-pact craters. Be it science, marketing, finance,health care, retail, or any other field, the clas-sical approach to data analysis relies funda-mentally on one or more analysts becomingArticlesFALL 1996 37From Data Mining to Knowledge Discovery inDatabasesUsama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1996 / $2.00areas is astronomy. Here, a notable success was achieved by SKICAT ,a system used by as-tronomers to perform image analysis,classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski,and Weir 1996). In its first application, the system was used to process the 3 terabytes (1012bytes) of image data resulting from the Second Palomar Observatory Sky Survey,where it is estimated that on the order of 109sky objects are detectable. SKICAT can outper-form humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a sur-vey of scientific applications.In business, main KDD application areas includes marketing, finance (especially in-vestment), fraud detection, manufacturing,telecommunications, and Internet agents.Marketing:In marketing, the primary ap-plication is database marketing systems,which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimat-ed that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for ex-ample, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-bas-ket analysis (Agrawal et al. 1996) systems,which find patterns such as, “If customer bought X, he/she is also likely to buy Y and Z.” Such patterns are valuable to retailers.Investment: Numerous companies use da-ta mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million;since its start in 1993, the system has outper-formed the broad stock market (Hall, Mani,and Barr 1996).Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of ac-counts. The FAIS system (Senator et al. 1995),from the U.S. Treasury Financial Crimes En-forcement Network, is used to identify finan-cial transactions that might indicate money-laundering activity.Manufacturing: The CASSIOPEE trou-bleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major Euro-pean airlines to diagnose and predict prob-lems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innova-intimately familiar with the data and serving as an interface between the data and the users and products.For these (and many other) applications,this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains.Databases are increasing in size in two ways:(1) the number N of records or objects in the database and (2) the number d of fields or at-tributes to an object. Databases containing on the order of N = 109objects are becoming in-creasingly common, for example, in the as-tronomical sciences. Similarly, the number of fields d can easily be on the order of 102or even 103, for example, in medical diagnostic applications. Who could be expected to di-gest millions of records, each having tens or hundreds of fields? We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.The need to scale up human analysis capa-bilities to handling the large number of bytes that we can collect is both economic and sci-entific. Businesses use data to gain competi-tive advantage, increase efficiency, and pro-vide more valuable services to customers.Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Be-cause computers have enabled humans to gather more data than we can digest, it is on-ly natural to turn to computational tech-niques to help us unearth meaningful pat-terns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital informa-tion era made a fact of life for all of us: data overload.Data Mining and Knowledge Discovery in the Real WorldA large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week , Newsweek , Byte , PC Week , and other large-circulation periodicals. Unfortu-nately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.In science, one of the primary applicationThere is an urgent need for a new generation of computation-al theories and tools toassist humans in extractinguseful information (knowledge)from the rapidly growing volumes ofdigital data.Articles38AI MAGAZINEtive applications (Manago and Auriol 1996).Telecommunications: The telecommuni-cations alarm-sequence analyzer (TASA) wasbuilt in cooperation with a manufacturer oftelecommunications equipment and threetelephone networks (Mannila, Toivonen, andVerkamo 1995). The system uses a novelframework for locating frequently occurringalarm episodes from the alarm stream andpresenting them as rules. Large sets of discov-ered rules can be explored with flexible infor-mation-retrieval tools supporting interactivityand iteration. In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. (1996) for an overview of issues in devel-oping industrial KDD applications.Data Mining and KDDHistorically, the notion of finding useful pat-terns in data has been given a variety ofnames, including data mining, knowledge ex-traction, information discovery, informationharvesting, data archaeology, and data patternprocessing. The term data mining has mostlybeen used by statisticians, data analysts, andthe management information systems (MIS)communities. It has also gained popularity inthe database field. The phrase knowledge dis-covery in databases was coined at the first KDDworkshop in 1989 (Piatetsky-Shapiro 1991) toemphasize that knowledge is the end productof a data-driven discovery. It has been popular-ized in the AI and machine-learning fields.In our view, KDD refers to the overall pro-cess of discovering useful knowledge from da-ta, and data mining refers to a particular stepin this process. Data mining is the applicationof specific algorithms for extracting patternsfrom data. The distinction between the KDDprocess and the data-mining step (within theprocess) is a central point of this article. Theadditional steps in the KDD process, such asdata preparation, data selection, data cleaning,incorporation of appropriate prior knowledge,and proper interpretation of the results ofmining, are essential to ensure that usefulknowledge is derived from the data. Blind ap-plication of data-mining methods (rightly crit-icized as data dredging in the statistical litera-ture) can be a dangerous activity, easilyleading to the discovery of meaningless andinvalid patterns.The Interdisciplinary Nature of KDDKDD has evolved, and continues to evolve,from the intersection of research fields such asmachine learning, pattern recognition,databases, statistics, AI, knowledge acquisitionfor expert systems, data visualization, andhigh-performance computing. The unifyinggoal is extracting high-level knowledge fromlow-level data in the context of large data sets.The data-mining component of KDD cur-rently relies heavily on known techniquesfrom machine learning, pattern recognition,and statistics to find patterns from data in thedata-mining step of the KDD process. A natu-ral question is, How is KDD different from pat-tern recognition or machine learning (and re-lated fields)? The answer is that these fieldsprovide some of the data-mining methodsthat are used in the data-mining step of theKDD process. KDD focuses on the overall pro-cess of knowledge discovery from data, includ-ing how the data are stored and accessed, howalgorithms can be scaled to massive data setsThe basicproblemaddressed bythe KDDprocess isone ofmappinglow-leveldata intoother formsthat might bemorecompact,moreabstract,or moreuseful.ArticlesFALL 1996 39A driving force behind KDD is the database field (the second D in KDD). Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fun-damental importance to KDD. Database tech-niques for gaining efficient data access,grouping and ordering operations when ac-cessing data, and optimizing queries consti-tute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memo-ry and pay no attention to how the algorithm breaks down if only limited views of the data are possible.A related field evolving from databases is data warehousing,which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2)data access.Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they pos-sess, they have to address the issues of map-ping data to a single naming convention,uniformly representing and handling missing data, and handling noise and errors when possible.Data access: Uniform and well-defined methods must be created for accessing the da-ta and providing access paths to data that were historically difficult to get to (for exam-ple, stored offline).Once organizations and individuals have solved the problem of how to store and ac-cess their data, the natural next step is the question, What else do we do with all the da-ta? This is where opportunities for KDD natu-rally arise.A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles pro-posed by Codd (1993). OLAP tools focus on providing multidimensional data analysis,which is superior to SQL in computing sum-maries and breakdowns along many dimen-sions. OLAP tools are targeted toward simpli-fying and supporting interactive data analysis,but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.Basic DefinitionsKDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimate-and still run efficiently, how results can be in-terpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (be-sides machine learning) to contribute to KDD. KDD places a special emphasis on find-ing understandable patterns that can be inter-preted as useful or interesting knowledge.Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and ro-bustness properties of modeling algorithms for large noisy data sets.Related AI research fields include machine discovery, which targets the discovery of em-pirical laws from observation and experimen-tation (Shrager and Langley 1990) (see Kloes-gen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery),and causal modeling for the inference of causal models from data (Spirtes, Glymour,and Scheines 1993). Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al.[1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quan-tifying the uncertainty that results when one tries to infer general patterns from a particu-lar sample of an overall population. As men-tioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data),one can find patterns that appear to be statis-tically significant but, in fact, are not. Clearly,this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct rele-vance to KDD. Thus, data mining is a legiti-mate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical as-pects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. KDD aims to provide tools to automate (to the degree pos-sible) the entire process of data analysis and the statistician’s “art” of hypothesis selection.Data mining is a step in the KDD process that consists of ap-plying data analysis and discovery al-gorithms that produce a par-ticular enu-meration ofpatterns (or models)over the data.Articles40AI MAGAZINEly understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).Here, data are a set of facts (for example, cases in a database), and pattern is an expres-sion in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; find-ing structure from data; or, in general, mak-ing any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple itera-tions. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the av-erage value of a set of numbers.The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and poten-tially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possi-ble to define measures of certainty (for exam-ple, estimated prediction accuracy on new data) or utility (for example, gain, perhaps indollars saved because of better predictions orspeedup in response time of a system). No-tions such as novelty and understandabilityare much more subjective. In certain contexts,understandability can be estimated by sim-plicity (for example, the number of bits to de-scribe a pattern). An important notion, calledinterestingness(for example, see Silberschatzand Tuzhilin [1995] and Piatetsky-Shapiro andMatheus [1994]), is usually taken as an overallmeasure of pattern value, combining validity,novelty, usefulness, and simplicity. Interest-ingness functions can be defined explicitly orcan be manifested implicitly through an or-dering placed by the KDD system on the dis-covered patterns or models.Given these notions, we can consider apattern to be knowledge if it exceeds some in-terestingness threshold, which is by nomeans an attempt to define knowledge in thephilosophical or even the popular view. As amatter of fact, knowledge in this definition ispurely user oriented and domain specific andis determined by whatever functions andthresholds the user chooses.Data mining is a step in the KDD processthat consists of applying data analysis anddiscovery algorithms that, under acceptablecomputational efficiency limitations, pro-duce a particular enumeration of patterns (ormodels) over the data. Note that the space ofArticlesFALL 1996 41Figure 1. An Overview of the Steps That Compose the KDD Process.methods, the effective number of variables under consideration can be reduced, or in-variant representations for the data can be found.Fifth is matching the goals of the KDD pro-cess (step 1) to a particular data-mining method. For example, summarization, clas-sification, regression, clustering, and so on,are described later as well as in Fayyad, Piatet-sky-Shapiro, and Smyth (1996).Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s)to be used for searching for data patterns.This process includes deciding which models and parameters might be appropriate (for ex-ample, models of categorical data are differ-ent than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more in-terested in understanding the model than its predictive capabilities).Seventh is data mining: searching for pat-terns of interest in a particular representa-tional form or a set of such representations,including classification rules or trees, regres-sion, and clustering. The user can significant-ly aid the data-mining method by correctly performing the preceding steps.Eighth is interpreting mined patterns, pos-sibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.Ninth is acting on the discovered knowl-edge: using the knowledge directly, incorpo-rating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving po-tential conflicts with previously believed (or extracted) knowledge.The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (al-though not the potential multitude of itera-tions and loops) is illustrated in figure 1.Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component,which has, by far, received the most atten-tion in the literature.patterns is often infinite, and the enumera-tion of patterns involves some form of search in this space. Practical computational constraints place severe limits on the sub-space that can be explored by a data-mining algorithm.The KDD process involves using the database along with any required selection,preprocessing, subsampling, and transforma-tions of it; applying data-mining methods (algorithms) to enumerate patterns from it;and evaluating the products of data mining to identify the subset of the enumerated pat-terns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which pat-terns are extracted and enumerated from da-ta. The overall KDD process (figure 1) in-cludes the evaluation and possible interpretation of the mined patterns to de-termine which patterns can be considered new knowledge. The KDD process also in-cludes all the additional steps described in the next section.The notion of an overall user-driven pro-cess is not unique to KDD: analogous propos-als have been put forward both in statistics (Hand 1994) and in machine learning (Brod-ley and Smyth 1996).The KDD ProcessThe KDD process is interactive and iterative,involving numerous steps with many deci-sions made by the user. Brachman and Anand (1996) give a practical view of the KDD pro-cess, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer’s viewpoint.Second is creating a target data set: select-ing a data set, or focusing on a subset of vari-ables or data samples, on which discovery is to be performed.Third is data cleaning and preprocessing.Basic operations include removing noise if appropriate, collecting the necessary informa-tion to model or account for noise, deciding on strategies for handling missing data fields,and accounting for time-sequence informa-tion and known changes.Fourth is data reduction and projection:finding useful features to represent the data depending on the goal of the task. With di-mensionality reduction or transformationArticles42AI MAGAZINEThe Data-Mining Stepof the KDD ProcessThe data-mining component of the KDD pro-cess often involves repeated iterative applica-tion of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algo-rithms that incorporate these methods.The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification,the sys-tem is limited to verifying the user’s hypothe-sis. With discovery,the system autonomously finds new patterns. We further subdivide the discovery goal into prediction,where the sys-tem finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presenta-tion to a user in a human-understandableform. In this article, we are primarily con-cerned with discovery-oriented data mining.Data mining involves fitting models to, or determining patterns from, observed data. The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the over-all, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeter-ministic effects in the model, whereas a logi-cal model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applica-tions given the typical presence of uncertain-ty in real-world data-generating processes.Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewilder-ing to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fun-damental techniques. The actual underlying model representation being used by a particu-lar method typically comes from a composi-tion of a small number of well-known op-tions: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primar-ily in the goodness-of-fit criterion used toevaluate model fit or in the search methodused to find a good fit.In our brief overview of data-mining meth-ods, we try in particular to convey the notionthat most (if not all) methods can be viewedas extensions or hybrids of a few basic tech-niques and principles. We first discuss the pri-mary methods of data mining and then showthat the data- mining methods can be viewedas consisting of three primary algorithmiccomponents: (1) model representation, (2)model evaluation, and (3) search. In the dis-cussion of KDD and data-mining methods,we use a simple example to make some of thenotions more concrete. Figure 2 shows a sim-ple two-dimensional artificial data set consist-ing of 23 cases. Each point on the graph rep-resents a person who has been given a loanby a particular bank at some time in the past.The horizontal axis represents the income ofthe person; the vertical axis represents the to-tal personal debt of the person (mortgage, carpayments, and so on). The data have beenclassified into two classes: (1) the x’s repre-sent persons who have defaulted on theirloans and (2) the o’s represent persons whoseloans are in good status with the bank. Thus,this simple artificial data set could represent ahistorical data set that can contain usefulknowledge from the point of view of thebank making the loans. Note that in actualKDD applications, there are typically manymore dimensions (as many as several hun-dreds) and many more data points (manythousands or even millions).ArticlesFALL 1996 43Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes.。

三维简单正交晶格伊辛(Ising)模型精确解猜想

三维简单正交晶格伊辛(Ising)模型精确解的猜想伊辛(Ising)模型是一个最简单的模型可以提供非常丰富的物理内容,可以被用来帮助我们发现物理世界的原则。

它不仅可以用来描述晶体的磁性,还可以用来描述非常广泛的现象,如合金中的有序-无序转变、液氦到超流态的转变、液体的冻结和蒸发、晶格气体、玻璃物质的性质、森林火灾、城市交通、蛋白质分子进入它们的活性形式的折叠等。

科学家对伊辛模型的广泛兴趣还源于它是描述相互作用的粒子(或原子或自旋)最简单的模型。

它可以用来测试研究相互作用的粒子在多体系统(特别是理解在临界点及其附近的合作现象和临界行为)任何近似方法的理想工具。

进一步说,三维伊辛模型可以研究从无限大温度到绝对零度相互作用的粒子(或原子或自旋) 系统的演变过程,如果将热力学中的温度作为动力学中时间来考量,它不仅可以理解热力学平衡的无限系统,如一个磁铁,还可以帮助理解我们的宇宙。

另外,平衡相变的理论可以用来研究连续的量子相变、基本粒子的超弦理论、在动力学系统到混沌的转变、系统偏离平衡的长时间行为和动力学临界行为等。

由于伊辛模型中的粒子(或原子或自旋)具有两种可能的状态(自旋向上或向下),它实际上可以对应黑白、上下、左右、前后、是非、正负……原则上,伊辛模型可以描述所有具有两种可能的状态的多体系统,描述两种极端条件间的相互竞争。

尽管伊辛模型是一个最简单的物理模型,目前仅有一维和二维的精确解。

Ising 在1925年解出的精确解表明一维伊辛模型中没有相变发生。

Onsager 于1944年获得二维伊辛模型的配分函数和比热的精确解,为统计物理领域的一个重大进展。

杨振宁于1952年求出二维伊辛模型的自发磁化强度。

二维正方伊辛模型的居里温度精确地存在于122−==−c K c e x 即1/K c = 2.26918531……。

二维伊辛模型的临界指数为α = 0, β = 1/8, γ = 7/4, δ = 15, η = 1/4 和 ν = 1。

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

Why haven’t you heard of this method before?
Consider using condensation on our 1st example
of a matrix:
3 1 5 2 0 2 2.5 7 4
Will condensation give the correct answer of 31?
• What are Alternating Sign Matrices? • A counting problem • How are they symmetric? • Some faces of Alternating Sign Matrices
Matrices & Determinants
• A Matrix is a rectangular array of numbers.
• An example of a 3-by-3 matrix:
3 1 5 2 0 2 2.5 7 4
Matrices & Determinants
• The determinant of a square matrix is a number.
Ans: No. We will end up with 0/0.
Matrices & Determinants
Ignoring the problem of division by zero, if we use condensation on a general 3-by-3 matrix:
a d g
aei afh bdi bdfhe1 bfg cdh ceg
1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0
Matrices & Determinants
The determinant of the 2-by-2 matrix
a c
b d
is defined to be the number ad bc.
Example: The determinant of
3 1 5 4
is equal to (3)(4) (1)(5) 12 5 7
Matrices & Determinants
1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0
Things to notice about these matrices: • The entries are 0, 1, or -1
Matrices & Determinants
The Method of Condensation –
Example:
1
2
2 4
1 3 1 2
3 2 4 2
2 0 12
5 8 8
7 10 10
4 2 6

6 3
0 1
54 2
40 4
2 0
2170
20 2 10
Matrices & Determinants
Matrices & Determinants
The Method of Condensation – • Let A be an n-by-n matrix. Compute the
determinant of each connected 2-by-2 minor of A and form an (n-1)-by-(n-1) matrix from these numbers. • Repeat this process until a 1-by-1 matrix results. • After 2 iterations, each entry in resulting matrices must be divided by the center entry of the corresponding 3-by-3 submatrix 2 steps prior.
b e h
c f i
ae dh
bd eg
bf ei
cf he
aei afh bdi bdfhe1 bdfhe1 bfg cdh ceg
We get 7 distinct terms (up to sign)…
Matrices & Determinants
For each term, create a 3-by-3 matrix whose entries are the exponents of the variables in their original positions.
Matrices & Determinants
• Charles Dodgson (a.k.a. Lewis Carroll) developed a method for computing determinants called the method of condensation.
• So how does it work?
• The study of Matrices and Determinants has been traced back to the 2nd century BC.
• Question: How are determinants computed? Answer: Lots of ways. Here is a lesser-known method…
Alternating Sign Matrices and Symmetry
(or)
Generalized Q*bert Games: An Introduction
(or)
The Problem With Lewis Carroll
By Nickolas Chura
What to discuss…
相关文档
最新文档