Ward Identities of $W_{\infty}$ Symmetry in Liouville Theory coupled to $c_M < 1$ Matter


A Mathematical Theory of Communication

Reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948.

A Mathematical Theory of Communication

By C. E. SHANNON

INTRODUCTION

The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist [1] and Hartley [2] on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley the most natural choice is the logarithmic function. Although this definition must be generalized considerably when we consider the influence of the statistics of the message and when we have a continuous range of messages, we will in all cases use an essentially logarithmic measure.

The logarithmic measure is more convenient for various reasons:

1. It is
practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities. For example, adding one relay to a group doubles the number of possible states of the relays. It adds 1 to the base 2 logarithm of this number. Doubling the time roughly squares the number of possible messages, or doubles the logarithm, etc.

2. It is nearer to our intuitive feeling as to the proper measure. This is closely related to (1) since we intuitively measure entities by linear comparison with common standards. One feels, for example, that two punched cards should have twice the capacity of one for information storage, and two identical channels twice the capacity of one for transmitting information.

3. It is mathematically more suitable. Many of the limiting operations are simple in terms of the logarithm but would require clumsy restatement in terms of the number of possibilities.

The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits, since the total number of possible states is 2^N and log2 2^N = N. If the base 10 is used the units may be called decimal digits. Since

    log2 M = log10 M / log10 2 = 3.32 log10 M,

a decimal digit is about 3 1/3 bits. […]

[1] Nyquist, H., "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, April 1924, p. 324; "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Trans., v. 47, April 1928, p. 617.
[2] Hartley, R. V. L., "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.

Fig. 1—Schematic diagram of a general communication system (information source, transmitter, noise source, receiver, destination).

We may roughly classify communication systems into three main categories: discrete,
continuous and mixed. By a discrete system we will mean one in which both the message and the signal are a sequence of discrete symbols. A typical case is telegraphy where the message is a sequence of letters and the signal a sequence of dots, dashes and spaces. A continuous system is one in which the message and signal are both treated as continuous functions, e.g., radio or television. A mixed system is one in which both discrete and continuous variables appear, e.g., PCM transmission of speech.

We first consider the discrete case. This case has applications not only in communication theory, but also in the theory of computing machines, the design of telephone exchanges and other fields. In addition the discrete case forms a foundation for the continuous and mixed cases which will be treated in the second half of the paper.

PART I: DISCRETE NOISELESS SYSTEMS

1. THE DISCRETE NOISELESS CHANNEL

Teletype and telegraphy are two simple examples of a discrete channel for transmitting information. Generally, a discrete channel will mean a system whereby a sequence of choices from a finite set of elementary symbols S1, …, Sn can be transmitted from one point to another. Each of the symbols Si is assumed to have a certain duration in time ti seconds (not necessarily the same for different Si, for example the dots and dashes in telegraphy). It is not required that all possible sequences of the Si be capable of transmission on the system; certain sequences only may be allowed. These will be possible signals for the channel. Thus in telegraphy suppose the symbols are: (1) A dot, consisting of line closure for a unit of time and then line open for a unit of time; (2) A dash, consisting of three time units of closure and one unit open; (3) A letter space consisting of, say, three units of line open; (4) A word space of six units of line open. We might place the restriction on allowable sequences that no spaces follow each other (for if two letter spaces are adjacent, it is identical with a word space). The question we now
consider is how one can measure the capacity of such a channel to transmit information.

In the teletype case where all symbols are of the same duration, and any sequence of the 32 symbols is allowed, the answer is easy. Each symbol represents five bits of information. If the system transmits n symbols per second it is natural to say that the channel has a capacity of 5n bits per second. This does not mean that the teletype channel will always be transmitting information at this rate; this is the maximum possible rate and whether or not the actual rate reaches this maximum depends on the source of information which feeds the channel, as will appear later.

In the more general case with different lengths of symbols and constraints on the allowed sequences, we make the following definition:

Definition: The capacity C of a discrete channel is given by

    C = Lim_{T→∞} (log N(T)) / T

where N(T) is the number of allowed signals of duration T. If all sequences of the symbols S1, …, Sn are allowed and these symbols have durations t1, …, tn, the number of sequences of duration t satisfies the difference equation

    N(t) = N(t − t1) + N(t − t2) + ··· + N(t − tn),

so that N(t) grows for large t as X0^t, where X0 is the largest real solution of the characteristic equation X^(−t1) + X^(−t2) + ··· + X^(−tn) = 1, and therefore C = log X0.

In case there are restrictions on allowed sequences we may still often obtain a difference equation of this type and find C from the characteristic equation. In the telegraphy case mentioned above

    N(t) = N(t−2) + N(t−4) + N(t−5) + N(t−7) + N(t−8) + N(t−10)

as we see by counting sequences of symbols according to the last or next to the last symbol occurring. Hence C is −log μ0, where μ0 is the positive root of 1 = μ^2 + μ^4 + μ^5 + μ^7 + μ^8 + μ^10. Solving this we find C = 0.539.

A very general type of restriction which may be placed on allowed sequences is the following: We imagine a number of possible states a1, a2, …, am. For each state only certain symbols from the set S1, …, Sn can be transmitted (different subsets for the different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted. The telegraph case is a simple example of this. There are two states depending on whether or not a space was the last symbol transmitted. If so, then only a dot or a dash can be sent next and the state always changes.
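The value C = 0.539 quoted above for the telegraph channel can be checked numerically. The following Python sketch (not part of the original paper) finds the positive root of the characteristic equation 1 = μ^2 + μ^4 + μ^5 + μ^7 + μ^8 + μ^10 by bisection:

```python
# Verify Shannon's telegraph capacity: the recursion
# N(t) = N(t-2)+N(t-4)+N(t-5)+N(t-7)+N(t-8)+N(t-10) gives the
# characteristic equation mu^2+...+mu^10 = 1, and C = -log2(mu0).
import math

# Term durations: dot (2), dash (4), dot/dash + letter space (5, 7),
# dot/dash + word space (8, 10), all in time units.
DURATIONS = [2, 4, 5, 7, 8, 10]

def telegraph_capacity():
    f = lambda x: sum(x ** d for d in DURATIONS) - 1.0
    lo, hi = 0.0, 1.0  # f(0) = -1 < 0 and f(1) = 5 > 0; f increases on [0, 1]
    for _ in range(60):  # bisection: 60 halvings give ample precision
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    mu0 = (lo + hi) / 2
    return -math.log2(mu0)

print(round(telegraph_capacity(), 3))  # 0.539
```

The root is μ0 ≈ 0.688, reproducing C ≈ 0.539 bits per unit time.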
If not, any symbol can be transmitted and the state changes if a space is sent, otherwise it remains the same. The conditions can be indicated in a linear graph as shown in Fig. 2. The junction points correspond to the states and the lines indicate the symbols possible in a state and the resulting state.

Fig. 2—Graphical representation of the constraints on telegraph symbols.

In Appendix 1 it is shown that if the conditions on allowed sequences can be described in this form C will exist and can be calculated in accordance with the following result:

Theorem 1: Let b(s)_ij be the duration of the s-th symbol which is allowable in state i and leads to state j. Then the channel capacity C is equal to log W where W is the largest real root of the determinant equation:

    | Σ_s W^(−b(s)_ij) − δ_ij | = 0

where δ_ij = 1 if i = j and is zero otherwise.

For example, in the telegraph case (Fig. 2) the determinant is:

    |      −1            W^−2 + W^−4     |
    | W^−3 + W^−6    W^−2 + W^−4 − 1 |  =  0.

On expansion this leads to the equation given above for this case.

2. THE DISCRETE SOURCE OF INFORMATION

We have seen that under very general conditions the logarithm of the number of possible signals in a discrete channel increases linearly with time. The capacity to transmit information can be specified by giving this rate of increase, the number of bits per second required to specify the particular signal used.

We now consider the information source. How is an information source to be described mathematically, and how much information in bits per second is produced in a given source? The main point at issue is the effect of statistical knowledge about the source in reducing the required capacity of the channel, by the use of proper encoding of the information. In telegraphy, for example, the messages to be transmitted consist of sequences of letters. These sequences, however, are not completely random. In general, they form sentences and have the statistical structure of, say, English. The letter E occurs more frequently than Q, the sequence TH more frequently than XP, etc. The existence of this structure allows
one to make a saving in time (or channel capacity) by properly encoding the message sequences into signal sequences. This is already done to a limited extent in telegraphy by using the shortest channel symbol, a dot, for the most common English letter E; while the infrequent letters, Q, X, Z are represented by longer sequences of dots and dashes. This idea is carried still further in certain commercial codes where common words and phrases are represented by four- or five-letter code groups with a considerable saving in average time. The standardized greeting and anniversary telegrams now in use extend this to the point of encoding a sentence or two into a relatively short sequence of numbers.

We can think of a discrete source as generating the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question. A physical system, or a mathematical model of a system which produces such a sequence of symbols governed by a set of probabilities, is known as a stochastic process. [3] We may consider a discrete source, therefore, to be represented by a stochastic process. Conversely, any stochastic process which produces a discrete sequence of symbols chosen from a finite set may be considered a discrete source. This will include such cases as:

1. Natural written languages such as English, German, Chinese.
2. Continuous information sources that have been rendered discrete by some quantizing process. For example, the quantized speech from a PCM transmitter, or a quantized television signal.
3. Mathematical cases where we merely define abstractly a stochastic process which generates a sequence of symbols.

The following are examples of this last type of source.

(A) Suppose we have five letters A, B, C, D, E which are chosen each with probability .2, successive choices being independent. This would lead to a sequence of which the following is a typical example.
B D C B C E C C C A D C B D D A A E C E E A A B B D A E E C A C E E B A E E C B C E A D.
This was constructed with the use of a table of random numbers. [4]

(B) Using the same five letters let the probabilities be .4, .1, .2, .2, .1, respectively, with successive choices independent. A typical message from this source is then:
A A A C D C B D C E A A D A D A C E D A E A D C A B E D A D D C E C A A A A A D.

[3] See, for example, S. Chandrasekhar, "Stochastic Problems in Physics and Astronomy," Reviews of Modern Physics, v. 15, No. 1, January 1943, p. 1.
[4] Kendall and Smith, Tables of Random Sampling Numbers, Cambridge, 1939.

(C) A more complicated structure is obtained if successive symbols are not chosen independently but their probabilities depend on preceding letters. In the simplest case of this type a choice depends only on the preceding letter and not on ones before that. The statistical structure can then be described by a set of transition probabilities p_i(j), the probability that letter i is followed by letter j. The indices i and j range over all the possible symbols. A second equivalent way of specifying the structure is to give the "digram" probabilities p(i,j), i.e., the relative frequency of the digram i j. The letter frequencies p(i) (the probability of letter i), the transition probabilities p_i(j) and the digram probabilities p(i,j) are related by the following formulas:

    p(i) = Σ_j p(i,j) = Σ_j p(j,i) = Σ_j p(j) p_j(i)
    p(i,j) = p(i) p_i(j)
    Σ_j p_i(j) = Σ_i p(i) = Σ_{i,j} p(i,j) = 1.

As a specific example suppose there are three letters A, B, C with the probability tables:

    p_i(j):      j                 p(i)        p(i,j):      j
             A    B     C                               A      B      C
    i = A    0   4/5   1/5    A:  9/27       i = A      0    4/15   1/15
    i = B   1/2  1/2    0     B: 16/27       i = B    8/27   8/27     0
    i = C   1/2  2/5   1/10   C:  2/27       i = C    1/27   4/135  1/135

A typical message from this source is the following:
A B B A B A B A B A B A B A B B B A B B B B B A B A B A B A B A B B B A C A C A B B A B B B B A B B A B A C B B B A B A.
The next increase in complexity would involve trigram frequencies but no more. The choice of a letter would depend on the preceding two letters but not on the message before that point. A set of trigram frequencies p(i,j,k) or equivalently a set of transition probabilities p_ij(k) would be required. Continuing in this way one obtains
successively more complicated stochastic processes. In the general n-gram case a set of n-gram probabilities p(i1, i2, …, in) or of transition probabilities p_{i1, i2, …, i(n−1)}(in) is required to specify the statistical structure.

(D) Stochastic processes can also be defined which produce a text consisting of a sequence of "words." Suppose there are five letters A, B, C, D, E and 16 "words" in the language with associated probabilities:

    .10 A      .16 BEBE   .11 CABED   .04 DEB
    .04 ADEB   .04 BED    .05 CEED    .15 DEED
    .05 ADEE   .02 BEED   .08 DAB     .01 EAB
    .01 BADD   .05 CA     .04 DAD     .05 EE

Suppose successive "words" are chosen independently and are separated by a space. A typical message might be:
DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE CABED BEBE BED DAB DEED ADEB.
If all the words are of finite length this process is equivalent to one of the preceding type, but the description may be simpler in terms of the word structure and probabilities. We may also generalize here and introduce transition probabilities between words, etc.

These artificial languages are useful in constructing simple problems and examples to illustrate various possibilities. We can also approximate to a natural language by means of a series of simple artificial languages. The zero-order approximation is obtained by choosing all letters with the same probability and independently. The first-order approximation is obtained by choosing successive letters independently but each letter having the same probability that it has in the natural language. [5]

[5] Letter, digram and trigram frequencies are given in Secret and Urgent by Fletcher Pratt, Blue Ribbon Books, 1939. Word frequencies are tabulated in Relative Frequency of English Speech Sounds, G. Dewey, Harvard University Press, 1923.

Thus, in the first-order approximation to English, E is chosen with probability .12 (its frequency in normal English) and W with probability .02, but there is no influence between adjacent letters and no tendency to form the preferred digrams such
as TH, ED, etc. In the second-order approximation, digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation, trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.

3. THE SERIES OF APPROXIMATIONS TO ENGLISH

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol "alphabet," the 26 letters and a space.

1. Zero-order approximation (symbols independent and equiprobable).
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text).
OCRO HLI RGWR NMIELWIS EU LL NBNESEBY A TH EEI ALHENHTTPA OOBTTV ANAH BRL.

3. Second-order approximation (digram structure as in English).
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English).
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with tetragram, …, n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO
EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.

The first two samples were constructed by the use of a book of random numbers in conjunction with (for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5), since digram, trigram and word frequency tables are available, but a simpler equivalent method was used. To construct (3) for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered.
The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5) and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.

4. GRAPHICAL REPRESENTATION OF A MARKOFF PROCESS

Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature. [6] The general case can be described as follows: There exist a finite number of possible "states" of a system; S1, S2, …, Sn. In addition there is a set of transition probabilities; p_i(j), the probability that if the system is in state S_i it will next go to state S_j. To make this Markoff process into an information source we need only assume that a letter is produced for each transition from one state to another. The states will correspond to the "residue of influence" from preceding letters.

The situation can be represented graphically as shown in Figs. 3, 4 and 5. The "states" are the junction points in the graph and the probabilities and letters produced for a transition are given beside the corresponding line.

Fig. 3—A graph corresponding to the source in example B.

Figure 3 is for the example B in Section 2, while Fig. 4 corresponds to the example C. In Fig. 3 there is only one state since successive letters are independent. In Fig. 4 there are as many states as letters. If a trigram example were constructed there would be at most n^2 states corresponding to the possible pairs of letters preceding the one being chosen.

Fig. 4—A graph corresponding to the source in example C.

Figure 5 is a graph for the case of word structure in example D.
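The transition table of example C (Fig. 4) can be turned directly into a generator of "typical messages" of the kind shown above. The Python sketch below (not part of the original paper) samples a letter sequence and also checks that the letter frequencies p(i) = (9/27, 16/27, 2/27) of example C are in equilibrium under the transition probabilities:

```python
# Simulate the Markoff source of example C: three letters A, B, C, where
# each letter's probability depends only on the preceding letter.
import random

# Transition probabilities p_i(j) from example C; each row sums to 1.
P = {
    "A": {"A": 0.0, "B": 4 / 5, "C": 1 / 5},
    "B": {"A": 1 / 2, "B": 1 / 2, "C": 0.0},
    "C": {"A": 1 / 2, "B": 2 / 5, "C": 1 / 10},
}

# The letter frequencies p(i) satisfy the equilibrium (stationarity)
# relation p(j) = sum_i p(i) p_i(j).
p = {"A": 9 / 27, "B": 16 / 27, "C": 2 / 27}
for j in "ABC":
    assert abs(p[j] - sum(p[i] * P[i][j] for i in "ABC")) < 1e-12

def generate(n, state="A", seed=0):
    """Emit n letters, drawing each successor from p_state(j)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        letters, weights = zip(*P[state].items())
        state = rng.choices(letters, weights)[0]
        out.append(state)
    return "".join(out)

print(generate(30))
```

Because p_A(A) = 0, the generated text never contains the digram AA, just as in the sample message above.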
Here S corresponds to the "space" symbol.

5. ERGODIC AND MIXED SOURCES

As we have indicated above a discrete source for our purposes can be considered to be represented by a Markoff process. Among the possible discrete Markoff processes there is a group with special properties of significance in communication theory. This special class consists of the "ergodic" processes and we shall call the corresponding sources ergodic sources. Although a rigorous definition of an ergodic process is somewhat involved, the general idea is simple. In an ergodic process every sequence produced by the process is the same in statistical properties. Thus the letter frequencies, digram frequencies, etc., obtained from particular sequences, will, as the lengths of the sequences increase, approach definite limits independent of the particular sequence. Actually this is not true of every sequence but the set for which it is false has probability zero. Roughly the ergodic property means statistical homogeneity.

[6] For a detailed treatment see M. Fréchet, Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles. Paris, Gauthier-Villars, 1938.

All the examples of artificial languages given above are ergodic. This property is related to the structure of the corresponding graph. If the graph has the following two properties [7] the corresponding process will be ergodic:

1. The graph does not consist of two isolated parts A and B such that it is impossible to go from junction points in part A to junction points in part B along lines of the graph in the direction of arrows and also impossible to go from junctions in part B to junctions in part A.

2. A closed series of lines in the graph with all arrows on the lines pointing in the same orientation will be called a "circuit." The "length" of a circuit is the number of lines in it. Thus in Fig. 5 series BEBES is a circuit of length 5. The second property required is that the greatest common divisor of the lengths of all
circuits in the graph be one.

If the first condition is satisfied but the second one violated by having the greatest common divisor equal to d > 1, the sequences have a certain type of periodic structure. The various sequences fall into d different classes which are statistically the same apart from a shift of the origin (i.e., which letter in the sequence is called letter 1). By a shift of from 0 up to d − 1 any sequence can be made statistically equivalent to any other. A simple example with d = 2 is the following: There are three possible letters a, b, c. Letter a is followed with either b or c with probabilities 1/3 and 2/3 respectively. Either b or c is always followed by letter a. Thus a typical sequence is

    a b a c a c a c a b a c a b a b a c a c.

This type of situation is not of much importance for our work.

If the first condition is violated the graph may be separated into a set of subgraphs each of which satisfies the first condition. We will assume that the second condition is also satisfied for each subgraph. We have in this case what may be called a "mixed" source made up of a number of pure components. The components correspond to the various subgraphs. If L1, L2, L3, … are the component sources we may write

    L = p1 L1 + p2 L2 + p3 L3 + ···

where p_i is the probability of the component source L_i.

[7] These are restatements in terms of the graph of conditions given in Fréchet.

Physically the situation represented is this: There are several different sources L1, L2, L3, … which are each of homogeneous statistical structure (i.e., they are ergodic). We do not know a priori which is to be used, but once the sequence starts in a given pure component L_i, it continues indefinitely according to the statistical structure of that component. As an example one may take two of the processes defined above and assume p1 = .2 and p2 = .8. A sequence from the mixed source

    L = .2 L1 + .8 L2

would be obtained by choosing first L1 or L2 with probabilities .2 and .8 and after this choice generating a sequence from whichever was chosen.

Except when the contrary is stated we shall assume a source
to be ergodic. This assumption enables one to identify averages along a sequence with averages over the ensemble of possible sequences (the probability of a discrepancy being zero). For example the relative frequency of the letter A in a particular infinite sequence will be, with probability one, equal to its relative frequency in the ensemble of sequences.

If P_i is the probability of state i and p_i(j) the transition probability to state j, then for the process to be stationary it is clear that the P_i must satisfy equilibrium conditions:

    P_j = Σ_i P_i p_i(j).

In the ergodic case it can be shown that with any starting conditions the probabilities P_j(N) of being in state j after N symbols approach the equilibrium values as N → ∞.

6. CHOICE, UNCERTAINTY AND ENTROPY

We have represented a discrete information source as a Markoff process. Can we define a quantity which will measure, in some sense, how much information is "produced" by such a process, or better, at what rate information is produced?

Suppose we have a set of possible events whose probabilities of occurrence are p1, p2, …, pn. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much "choice" is involved in the selection of the event or of how uncertain we are of the outcome?

If there is such a measure, say H(p1, p2, …, pn), it is reasonable to require of it the following properties:

1. H should be continuous in the p_i.

2. If all the p_i are equal, p_i = 1/n, then H should be a monotonic increasing function of n. With equally likely events there is more choice, or uncertainty, when there are more possible events.

3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H. The meaning of this is illustrated in Fig. 6. At the left we have three possibilities p1 = 1/2, p2 = 1/3, p3 = 1/6. On the right we first choose between two possibilities each with probability 1/2, and if the second occurs make another choice with probabilities 2/3, 1/3. The final results have the same probabilities as before. We require, in this special case, that

    H(1/2, 1/3, 1/6) = H(1/2, 1/2) + 1/2 H(2/3, 1/3).

The coefficient 1/2 is because this second choice only occurs half the time.

In Appendix 2, the following result is established:

Theorem 2: The only H satisfying the three above assumptions is of the form:

    H = −K Σ_{i=1}^{n} p_i log p_i

where K is a positive constant.

This theorem, and the assumptions required for its proof, are in no way necessary for the present theory. It is given chiefly to lend a certain plausibility to some of our later definitions. The real justification of these
definitions, however, will reside in their implications.

Quantities of the form H = −Σ p_i log p_i (the constant K merely amounts to a choice of a unit of measure) play a central role in information theory as measures of information, choice and uncertainty. The form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics [8] where p_i is the probability of a system being in cell i of its phase space. H is then, for example, the H in Boltzmann's famous H theorem. We shall call H = −Σ p_i log p_i the entropy of the set of probabilities p1, …, pn. If x is a chance variable we will write H(x) for its entropy; thus x is not an argument of a function but a label for a number, to differentiate it from H(y) say, the entropy of the chance variable y.

The entropy in the case of two possibilities with probabilities p and q = 1 − p, namely

    H = −(p log p + q log q),

is plotted in Fig. 7 as a function of p.

Fig. 7—Entropy in the case of two possibilities with probabilities p and (1 − p).

The quantity H has a number of interesting properties which further substantiate it as a reasonable measure of choice or information.

1. H = 0 if and only if all the p_i but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.

2. For a given n, H is a maximum and equal to log n when all the p_i are equal (i.e., 1/n).
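The properties of H listed above, and the binary-entropy curve of Fig. 7, are easy to confirm numerically. The following Python sketch (not part of the original paper) computes H in bits, i.e., with K = 1 and base-2 logarithms:

```python
# Entropy H = -K sum p_i log p_i with K = 1 and base-2 logs (H in bits).
import math

def entropy(probs):
    """Entropy in bits of a finite distribution; 0 log 0 is taken as 0."""
    # The "+ 0.0" turns a possible -0.0 into 0.0 for the certain case.
    return -sum(p * math.log2(p) for p in probs if p > 0) + 0.0

print(entropy([1.0, 0.0, 0.0]))  # 0.0  (property 1: a certain outcome)
print(entropy([0.25] * 4))       # 2.0  (property 2: log2 4 for equal p_i)
print(entropy([0.5, 0.5]))       # 1.0  (peak of the Fig. 7 curve at p = 1/2)
```

Any unequal split of two probabilities, e.g. (0.3, 0.7), gives less than 1 bit, consistent with the maximum of Fig. 7 at p = 1/2.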

The Subleading Isgur-Wise Form Factor $\chi_3(v\cdot v')$ to Order $\alpha_s$ in QCD Sum Rules

arXiv:hep-ph/9212266v1  16 Dec 1992

SLAC–PUB–6017
WIS–92/99/Dec–PH
December 1992
T/E

The Subleading Isgur-Wise Form Factor χ3(v·v′) to Order αs in QCD Sum Rules

Matthias Neubert
Stanford Linear Accelerator Center, Stanford University, Stanford, California 94309

Zoltan Ligeti and Yosef Nir
Weizmann Institute of Science, Physics Department, Rehovot 76100, Israel

We calculate the contributions arising at order αs in the QCD sum rule for the spin-symmetry violating universal function χ3(v·v′), which appears at order 1/m_Q in the heavy quark expansion of meson form factors. In particular, we derive the two-loop perturbative contribution to the sum rule. Over the kinematic range accessible in B → D(∗)ℓν decays, we find that χ3(v·v′) does not exceed the level of ∼1%, indicating that power corrections induced by the chromo-magnetic operator in the heavy quark expansion are small.

(submitted to Physical Review D)

I. INTRODUCTION

In the heavy quark effective theory (HQET), the hadronic matrix elements describing the semileptonic decays M(v) → M′(v′)ℓν, where M and M′ are pseudoscalar or vector mesons containing a heavy quark, can be systematically expanded in inverse powers of the heavy quark masses [1–5]. The coefficients in this expansion are m_Q-independent, universal functions of the kinematic variable y = v·v′. These so-called Isgur-Wise form factors characterize the properties of the cloud of light quarks and gluons surrounding the heavy quarks, which act as static color sources. At leading order, a single function ξ(y) suffices to parameterize all matrix elements [6]. This is expressed in the compact trace formula [5,7]

    ⟨M′(v′)| J(0) |M(v)⟩ = −ξ(y) tr{ M̄′(v′) Γ M(v) },    (1)

where

    M(v) = √m_M P_+ × { −γ5 (pseudoscalar meson) ; /ε (vector meson) }    (2)

is a spin wave function that describes correctly the transformation properties (under boosts and heavy quark spin rotations) of the meson states in the effective theory, and P_+ = (1 + /v)/2 is the positive-energy projection operator. […] At order 1/m_Q, spin symmetry is violated by insertions of the chromo-magnetic operator, (g_s/2m_Q) O_mag, with O_mag = […]; the corresponding matrix elements are parameterized by a form factor χ_αβ(v, v′) through

    ⟨M′(v′)| … |M(v)⟩ = ¯Λ tr{ χ_αβ(v, v′) M̄′(v′) Γ P_+ iσ^αβ M(v) }.    (4)

The mass parameter ¯Λ sets the canonical scale for power corrections in HQET. In the m_Q → ∞ limit, it measures
the finite mass difference between a heavy meson and the heavy quark that it contains [11]. By factoring out this parameter, χ_αβ(v, v′) becomes dimensionless. The most general decomposition of this form factor involves two real, scalar functions χ2(y) and χ3(y) defined by [10]

    χ_αβ(v, v′) = (v′_α γ_β − v′_β γ_α) χ2(y) − 2iσ_αβ χ3(y).    (5)

Irrespective of the structure of the current J, the form factor χ3(y) appears always in the following combination with ξ(y):

    ξ(y) + 2Z ¯Λ [ d_M / m_Q + d_M′ / m_Q′ ] χ3(y),    (6)

where d_P = 3 for a pseudoscalar and d_V = −1 for a vector meson. It thus effectively renormalizes the leading Isgur-Wise function, preserving its normalization at y = 1 since χ3(1) = 0 according to Luke's theorem [10]. Eq. (6) shows that knowledge of χ3(y) is needed if one wants to relate processes which are connected by the spin symmetry, such as B → Dℓν and B → D∗ℓν.

Being hadronic form factors, the universal functions in HQET can only be investigated using nonperturbative methods. QCD sum rules have become very popular for this purpose. They have been reformulated in the context of the effective theory and have been applied to the study of meson decay constants and the Isgur-Wise functions both in leading and next-to-leading order in the 1/m_Q expansion [12–21]. In particular, it has been shown that very simple predictions for the spin-symmetry violating form factors are obtained when terms of order αs are neglected, namely [17]

    χ2(y) = 0,    χ3(y) ∝ ⟨ q̄ g_s σ_αβ G^αβ q ⟩ [1 − ξ(y)].    (7)

In this approach χ3(y) is proportional to the mixed quark-gluon condensate, and it was estimated that χ3(y) ∼ 1% for large recoil (y ∼ 1.5). In a recent work we have refined the prediction for χ2(y) by including contributions of order αs in the sum rule analysis [20]. We found that these are as important as the contribution of the mixed condensate in (7). It is, therefore, worthwhile to include such effects also in the analysis of χ3(y). This is the purpose of this article.

II. DERIVATION OF THE SUM RULE

The QCD sum rule analysis of the functions χ2(y) and χ3(y) is very
similar. We shall, therefore, only briefly sketch the general procedure and refer for details to Refs. [17,20]. Our starting point is the correlator
$$\int d^4x\, d^4x'\, d^4z\; e^{i(k'\cdot x' - k\cdot x)}\; \langle 0|\,T\big\{[\bar q\,\Gamma_{M'} h'](x')\, O_{\rm mag}(z)\, [\bar h\,\Gamma_M q](x)\big\}\,|0\rangle\,, \qquad (8)$$
whose Dirac structure is decomposed into invariant coefficient functions; the terms of interest here are $\Xi_2(\omega,\omega',y)$ and $\Xi_3(\omega,\omega',y)$, which multiply trace structures built from $\sigma_{\alpha\beta}$ and the projectors $P_+ = \frac{1}{2}(1+\not v)$ and $P'_+ = \frac{1}{2}(1+\not v\,')$; we omit the velocity labels in $h$ and $h'$ for simplicity. The heavy-light currents interpolate pseudoscalar or vector mesons, depending on the choice $\Gamma_M = -\gamma_5$ or $\Gamma_M = \gamma_\mu - v_\mu$, respectively. The external momenta $k$ and $k'$ in (8) are the "residual" off-shell momenta of the heavy quarks. Due to the phase redefinition of the effective heavy quark fields in HQET, they are related to the total momenta $P$ and $P'$ by $k = P - m_Q v$ and $k' = P' - m_{Q'} v'$ [3].

The coefficient functions $\Xi_i$ are analytic in $\omega = 2v\cdot k$ and $\omega' = 2v'\cdot k'$, with discontinuities for positive values of these variables. They can be saturated by intermediate states which couple to the heavy-light currents. In particular, there is a double-pole contribution from the ground-state mesons $M$ and $M'$. To leading order in the $1/m_Q$ expansion the pole position is at $\omega = \omega' = 2\bar\Lambda$. In the case of $\Xi_2$, the residue of the pole is proportional to the universal function $\chi_2(y)$. For $\Xi_3$ the situation is more complicated, however, since insertions of the chromo-magnetic operator not only renormalize the leading Isgur-Wise function, but also the coupling of the heavy mesons to the interpolating heavy-light currents (i.e., the meson decay constants) and the physical meson masses, which define the position of the pole.[1] The correct expression for the pole contribution to $\Xi_3$ is [17]
$$\Xi_3^{\rm pole}(\omega,\omega',y) = \frac{F^2\,[\cdots]}{(\omega - 2\bar\Lambda + i\epsilon)\,(\omega' - 2\bar\Lambda + i\epsilon)}\,. \qquad (9)$$
Here $F$ is the analog of the meson decay constant in the effective theory ($F \sim f_M\sqrt{m_Q}$), $\delta\Lambda_2$ parameterizes the spin-symmetry violating shift of the pole position, and the coupling of the meson to the interpolating current receives a correction proportional to $G_2$,
$$\langle 0|\,j(0)\,|M(v)\rangle = \frac{iF}{2}\,\Big[[\cdots] + G_2\,{\rm tr}\big\{\tfrac{1}{2}\sigma_{\alpha\beta}\,\Gamma\,P_+\,\sigma^{\alpha\beta}\big\}\Big]\,M(v)\,, \qquad (10)$$
where the ellipses represent spin-symmetry conserving or higher order power corrections, and $j = \bar q\,\Gamma\, h(v)$. In terms of the vector-pseudoscalar mass splitting, the parameter $\delta\Lambda_2$ is given by $m_V^2 - m_P^2 = -8\bar\Lambda\,\delta\Lambda_2$.

[1] There are no such additional terms for $\Xi_2$ because of the peculiar trace structure associated with this coefficient function.

For not too small, negative values of $\omega$ and $\omega'$, the coefficient function $\Xi_3$ can be approximated as a perturbative series in $\alpha_s$, supplemented by the leading power corrections in $1/\omega$ and $1/\omega'$, which are proportional to vacuum expectation values of local quark-gluon operators, the so-called condensates [22]. This is how nonperturbative corrections are incorporated in this approach. The idea of QCD sum rules is to match this theoretical representation of $\Xi_3$ to the phenomenological pole contribution given in (9). To this end, one first writes the theoretical expression in terms of a double dispersion integral,
$$\Xi_3^{\rm th}(\omega,\omega',y) = \int d\nu\, d\nu'\; \frac{\rho_3^{\rm th}(\nu,\nu',y)}{(\nu - \omega)(\nu' - \omega')} + \dots\,, \qquad (11)$$
up to possible subtraction terms. Because of the flavor symmetry it is natural to set the Borel parameters associated with $\omega$ and $\omega'$ equal: $\tau = \tau' = 2T$. One then introduces new variables $\omega_\pm = \frac{1}{2}(\omega \pm \omega')$, leading to the sum rule
$$\Big[\chi_3(y) + \frac{[\cdots]}{2T}\,\xi(y)\Big]\, F^2 e^{-2\bar\Lambda/T} = \int_0^{\omega_0} d\omega_+\, e^{-\omega_+/T}\, \tilde\rho_3^{\rm th}(\omega_+, y) \equiv K(T,\omega_0,y)\,. \qquad (12)$$
The effective spectral density $\tilde\rho_3^{\rm th}$ arises after integration of the double spectral density over $\omega_-$. Note that for each contribution to it the dependence on $\omega_+$ is known on dimensional grounds. It thus suffices to calculate directly the Borel transform of the individual contributions to $\Xi_3^{\rm th}$, corresponding to the limit $\omega_0\to\infty$ in (12). The $\omega_0$-dependence can be recovered at the end of the calculation.

When terms of order $\alpha_s$ are neglected, contributions to the sum rule for $\Xi_3$ can only be proportional to condensates involving the gluon field, since there is no way to contract the gluon contained in $O_{\rm mag}$. The leading power correction of this type is represented by the diagram shown in Fig. 1(d). It is proportional to the mixed quark-gluon condensate and, as shown in
Ref. [17], leads to (7). Here we are interested in the additional contributions arising at order $\alpha_s$. They are shown in Fig. 1(a)-(c). Besides a two-loop perturbative contribution, one encounters further nonperturbative corrections proportional to the quark and the gluon condensate. Let us first present the result for the nonperturbative power corrections. We find
$$K^{\rm cond}(T,\omega_0,y) = \frac{\alpha_s\,\langle\bar q q\rangle\, T}{[\cdots]}\,[\cdots] + \frac{\langle\alpha_s GG\rangle}{[\cdots]}\,[\cdots] - \frac{\langle\bar q\, g_s\sigma_{\alpha\beta}G^{\alpha\beta} q\rangle}{[\cdots]}\,[\cdots]\,,$$
where the coefficient functions depend on $y$ through $(y+1)$ and $\sqrt{y^2-1}$, and where
$$\delta_n(x) = 1 - e^{-x}\sum_{k=0}^{n}\frac{x^k}{k!}$$
accounts for the continuum subtraction. The two-loop perturbative contribution can be written as the integral
$$\frac{C_F\,[\cdots]}{(4\pi)^D}\int_0^1 d\lambda\,\lambda^{1-D}\int_\lambda^\infty du_1\int_{1/\lambda}^\infty du_2\,(u_1 u_2 - 1)^{D/2-2}\,[\cdots]\,,$$
where $C_F = (N_c^2 - 1)/2N_c$, and $D$ is the dimension of space-time. For $D = 4$, the integrand diverges as $\lambda\to 0$. To regulate the integral, we assume $D < 2$ and use a triple integration by parts in $\lambda$ to obtain an expression which can be analytically continued to the vicinity of $D = 4$. Next we set $D = 4 + 2\epsilon$, expand in $\epsilon$, write the result as an integral over $\omega_+$, and introduce back the continuum threshold. This gives
$$K^{\rm pert}(T,\omega_0,y) = -\frac{\alpha_s}{[\cdots]}\Big(\frac{2}{y+1}\Big)^{[\cdots]}\int_0^{\omega_0} d\omega_+\,\omega_+^3\, e^{-\omega_+/T}\,\Big[\frac{1}{2} - \frac{2}{3}\,[\cdots]\Big]\,, \qquad (16)$$
which shows that divergences arise at order $\alpha_s$. At this order, the renormalization of the sum rule is thus accomplished by a renormalization of the "bare" parameter $G_2$ in (12),
$$G_2^{\rm bare} = G_2(\mu) + \frac{\alpha_s}{9\pi}\,\bar\Lambda\,\Big[[\cdots] + \ln\frac{1}{\mu^2}\Big] + O(g_s^3)\,. \qquad (18)$$
Hence a counterterm proportional to $\bar\Lambda\,\xi(y)$ has to be added to the bracket on the left-hand side of the sum rule (12). To evaluate its effect on the right-hand side, we note that in $D$ dimensions [17]
$$\bar\Lambda\,\xi(y)\, F^2 e^{-2\bar\Lambda/T} = \frac{3}{[\cdots]}\Big(\frac{2}{y+1}\Big)^{2}\int_0^{\omega_0} d\omega_+\,\omega_+^3\, e^{-\omega_+/T}\,\Big[1 + \epsilon\Big(\gamma_E - \ln 4\pi + 2\ln\omega_+ - \ln\frac{y+1}{2}\Big)\Big]\,. \qquad (19)$$
The renormalized sum rule then takes the form
$$\Big[\chi_3(y) + \frac{[\cdots]}{2T}\,\xi(y)\Big] F^2 e^{-2\bar\Lambda/T} = \frac{\alpha_s}{[\cdots]}\Big(\frac{2}{y+1}\Big)^{2}\int_0^{\omega_0} d\omega_+\,\omega_+^3\, e^{-\omega_+/T}\,\Big[2\ln\frac{\mu}{[\cdots]} + y\, r(y) - 1 + \ln\frac{y+1}{2}\Big] + [\cdots]\,. \qquad (20)$$
According to Luke's theorem, the universal function $\chi_3(y)$ vanishes at zero recoil [10]. Evaluating (20) for $y = 1$, we thus obtain a sum rule for $G_2(\mu)$ and $\delta\Lambda_2$. It reads
$$G_2(\mu) - [\cdots]\,\bar\Lambda\,\delta\Lambda_2 = \frac{[\cdots]}{24\pi^3}\int_0^{\omega_0} d\omega_+\,\omega_+^3\, e^{-\omega_+/T}\,\Big[\ln\frac{\mu}{[\cdots]} + [\cdots]\Big] + K^{\rm cond}(T,\omega_0,1)\,, \qquad (21)$$
where we have used that $r(1) = 1$. Precisely this sum rule has been derived previously, starting from a two-current correlator, in Ref. [16]. This provides a nontrivial check of our result. Using the fact that $\xi(y) = [2/(y+1)]^2 + O(g_s)$ according to (19), we find that the $\mu$-dependent terms cancel out when we eliminate $G_2(\mu)$ and $\delta\Lambda_2$ from the sum rule for $\chi_3(y)$. Before we
present our final result, there is one more effect which has to be taken into account, namely a spin-symmetry violating correction to the continuum threshold $\omega_0$. Since the chromo-magnetic interaction changes the masses of the ground-state mesons [cf. (10)], it also changes the masses of higher resonance states. Expanding the physical threshold as
$$\omega_{\rm phys} = \omega_0\Big(1 + \frac{d_M\,\delta\omega_2}{[\cdots]} + \dots\Big)\,, \qquad (22)$$
we obtain our final sum rule,
$$\chi_3(y)\,F^2\, e^{-2\bar\Lambda/T} = [\cdots]\,, \qquad (23)$$
in which the right-hand side collects the two-loop perturbative term, a term proportional to $\delta\omega_2\,\omega_0^3\, e^{-\omega_0/T}$, the condensate contributions, and coefficients involving $r(y)$, $\xi(y)$, $1-\xi(y)$ and the continuum functions $\delta_n(\omega_0/T)$. It explicitly exhibits the fact that $\chi_3(1) = 0$.

III. NUMERICAL ANALYSIS

Let us now turn to the evaluation of the sum rule (23). For the QCD parameters we take the standard values
$$\langle\bar q q\rangle = -(0.23~{\rm GeV})^3\,,\quad \langle\alpha_s GG\rangle = 0.04~{\rm GeV}^4\,,\quad \langle\bar q\, g_s\sigma_{\alpha\beta}G^{\alpha\beta} q\rangle = m_0^2\,\langle\bar q q\rangle\,,\quad m_0^2 = 0.8~{\rm GeV}^2\,. \qquad (24)$$
Furthermore, we use $\delta\omega_2 = -0.1$ GeV from above, and $\alpha_s/\pi = 0.1$ corresponding to the scale $\mu = 2\bar\Lambda \simeq 1$ GeV, which is appropriate for evaluating radiative corrections in the effective theory [15]. The sensitivity of our results to changes in these parameters will be discussed below. The dependence of the left-hand side of (23) on $\bar\Lambda$ and $F$ can be eliminated by using a QCD sum rule for these parameters, too. It reads [16]
$$\bar\Lambda\, F^2 e^{-2\bar\Lambda/T} = \frac{9T^4}{[\cdots]}\,[\cdots] - \frac{\langle\bar q\, g_s\sigma_{\alpha\beta}G^{\alpha\beta} q\rangle}{[\cdots]}\,[\cdots]\,, \qquad (25)$$
$$F^2 e^{-2\bar\Lambda/T} = \frac{[\cdots]}{4\pi^2}\,[\cdots] - \langle\bar q q\rangle\,[\cdots] + \frac{(2y+1)}{[\cdots]}\,\frac{[\cdots]}{4T^2}\,. \qquad (26)$$
Combining (23), (25) and (26), we obtain $\chi_3(y)$ as a function of $\omega_0$ and $T$. These parameters can be determined from the analysis of a QCD sum rule for the correlator of two heavy-light currents in the effective theory [16,18]. One finds good stability for $\omega_0 = 2.0 \pm 0.3$ GeV, and the consistency of the theoretical calculation requires that the Borel parameter be in the range $0.6 < T < 1.0$ GeV. It supports the self-consistency of the approach that, as shown in Fig. 2, we find stability of the sum rule (23) in the same region of parameter space. Note that it is in fact the $\delta\omega_2$-term that stabilizes the sum rule. Without it there would be no plateau.

Over the kinematic range accessible in semileptonic $B\to D^{(*)}\ell\,\nu$ decays, we show in Fig. 3(a) the range of predictions for $\chi_3(y)$ obtained for $1.7 < \omega_0 < 2.3$ GeV and $0.7 < T < 1.2$ GeV. From this we estimate a relative uncertainty of $\sim \pm 25\%$, which is mainly due to the uncertainty in the
continuum threshold. It is apparent that the form factor is small, not exceeding the level of 1%.

Finally, we show in Fig. 3(b) the contributions of the individual terms in the sum rule (23). Due to the large negative contribution proportional to the quark condensate, the terms of order $\alpha_s$, which we have calculated in this paper, cancel each other to a large extent. As a consequence, our final result for $\chi_3(y)$ is not very different from that obtained neglecting these terms [17]. This is, however, an accident. For instance, the order-$\alpha_s$ corrections would enhance the sum rule prediction by a factor of two if the $\langle\bar q q\rangle$-term had the opposite sign. From this figure one can also deduce how changes in the values of the vacuum condensates would affect the numerical results. As long as one stays within the standard limits, the sensitivity to such changes is in fact rather small. For instance, working with the larger value $\langle\bar q q\rangle = -(0.26~{\rm GeV})^3$, or varying $m_0^2$ between 0.6 and 1.0 GeV$^2$, changes $\chi_3(y)$ by no more than $\pm 0.15\%$.

In conclusion, we have presented the complete order-$\alpha_s$ QCD sum rule analysis of the subleading Isgur-Wise function $\chi_3(y)$, including in particular the two-loop perturbative contribution. We find that over the kinematic region accessible in semileptonic $B$ decays this form factor is small, typically of the order of 1%. When combined with our previous analysis [20], which predicted similarly small values for the universal function $\chi_2(y)$, these results strongly indicate that power corrections in the heavy quark expansion which are induced by the chromo-magnetic interaction between the gluon field and the heavy quark spin are small.

ACKNOWLEDGMENTS

It is a pleasure to thank Michael Peskin for helpful discussions. M.N. gratefully acknowledges financial support from the BASF Aktiengesellschaft and from the German National Scholarship Foundation. Y.N. is an incumbent of the Ruth E. Recanati Career Development Chair, and is supported in part by the Israel Commission for Basic Research and by the Minerva Foundation. This work was
also supported by the Department of Energy, contract DE-AC03-76SF00515.

REFERENCES

[1] E. Eichten and B. Hill, Phys. Lett. B 234, 511 (1990); 243, 427 (1990).
[2] B. Grinstein, Nucl. Phys. B 339, 253 (1990).
[3] H. Georgi, Phys. Lett. B 240, 447 (1990).
[4] T. Mannel, W. Roberts and Z. Ryzak, Nucl. Phys. B 368, 204 (1992).
[5] A. F. Falk, H. Georgi, B. Grinstein, and M. B. Wise, Nucl. Phys. B 343, 1 (1990).
[6] N. Isgur and M. B. Wise, Phys. Lett. B 232, 113 (1989); 237, 527 (1990).
[7] J. D. Bjorken, Proceedings of the 18th SLAC Summer Institute on Particle Physics, p. 167, Stanford, California, July 1990, edited by J. F. Hawthorne (SLAC, Stanford, 1991).
[8] M. B. Voloshin and M. A. Shifman, Yad. Fiz. 45, 463 (1987) [Sov. J. Nucl. Phys. 45, 292 (1987)]; 47, 801 (1988) [47, 511 (1988)].
[9] A. F. Falk, B. Grinstein, and M. E. Luke, Nucl. Phys. B 357, 185 (1991).
[10] M. E. Luke, Phys. Lett. B 252, 447 (1990).
[11] A. F. Falk, M. Neubert, and M. E. Luke, SLAC preprint SLAC-PUB-5771 (1992), to appear in Nucl. Phys. B.
[12] M. Neubert, V. Rieckert, B. Stech, and Q. P. Xu, in Heavy Flavours, edited by A. J. Buras and M. Lindner, Advanced Series on Directions in High Energy Physics (World Scientific, Singapore, 1992).
[13] A. V. Radyushkin, Phys. Lett. B 271, 218 (1991).
[14] D. J. Broadhurst and A. G. Grozin, Phys. Lett. B 274, 421 (1992).
[15] M. Neubert, Phys. Rev. D 45, 2451 (1992).
[16] M. Neubert, Phys. Rev. D 46, 1076 (1992).
[17] M. Neubert, Phys. Rev. D 46, 3914 (1992).
[18] E. Bagan, P. Ball, V. M. Braun, and H. G. Dosch, Phys. Lett. B 278, 457 (1992); E. Bagan, P. Ball, and P. Gosdzinsky, Heidelberg preprint HD-THEP-92-40 (1992).
[19] B. Blok and M. Shifman, Santa Barbara preprint NSF-ITP-92-100 (1992).
[20] M. Neubert, Z. Ligeti, and Y. Nir, SLAC preprint SLAC-PUB-5915 (1992).
[21] M. Neubert, SLAC preprint SLAC-PUB-5992 (1992).
[22] M. A. Shifman, A. I. Vainshtein, and V. I. Zakharov, Nucl. Phys. B 147, 385 (1979); B 147, 448 (1979).

FIGURES

FIG. 1. Diagrams contributing to the sum rule for the universal form factor $\chi_3(v\cdot v')$: two-loop perturbative contribution (a), and nonperturbative contributions proportional to the quark condensate (b), the gluon condensate (c), and the mixed condensate (d). Heavy quark propagators are drawn as double lines. The
square represents the chromo-magnetic operator.

FIG. 2. Analysis of the stability region for the sum rule (23): The form factor $\chi_3(y)$ is shown for $y = 1.5$ as a function of the Borel parameter. From top to bottom, the solid curves refer to $\omega_0 = 1.7$, 2.0, and 2.3 GeV. The dashed lines are obtained by neglecting the contribution proportional to $\delta\omega_2$.

FIG. 3. (a) Prediction for the form factor $\chi_3(v\cdot v')$ in the stability region $1.7 < \omega_0 < 2.3$ GeV and $0.7 < T < 1.2$ GeV. (b) Individual contributions to $\chi_3(v\cdot v')$ for $T = 0.8$ GeV and $\omega_0 = 2.0$ GeV: total (solid), mixed condensate (dashed-dotted), gluon condensate (wide dots), quark condensate (dashes). The perturbative contribution and the $\delta\omega_2$-term are indistinguishable in this figure and are both represented by the narrow dots.



Kernels and Regularization on Graphs

Alexander J. Smola (1) and Risi Kondor (2)

(1) Machine Learning Group, RSISE, Australian National University, Canberra, ACT 0200, Australia. Alex.Smola@.au
(2) Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, M.C. 0401, New York, NY 10027, USA. risi@

Abstract. We introduce a family of kernels on graphs based on the notion of regularization operators. This generalizes in a natural way the notion of regularization and Greens functions, as commonly used for real valued functions, to graphs. It turns out that diffusion kernels can be found as a special case of our reasoning. We show that the class of positive, monotonically decreasing functions on the unit interval leads to kernels and corresponding regularization operators.

1 Introduction

There has recently been a surge of interest in learning algorithms that operate on input spaces $X$ other than $\mathbb{R}^n$, specifically, discrete input spaces, such as strings, graphs, trees, automata etc. Since kernel-based algorithms, such as Support Vector Machines, Gaussian Processes, Kernel PCA, etc. capture the structure of $X$ via the kernel $K: X\times X\to\mathbb{R}$, as long as we can define an appropriate kernel on our discrete input space, these algorithms can be imported wholesale, together with their error analysis, theoretical guarantees and empirical success. One of the most general representations of discrete metric spaces are graphs.
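The point about importing kernel algorithms wholesale can be made concrete: once a kernel matrix on graph vertices is available, a standard kernel method runs unchanged. The sketch below is illustrative only; the toy path graph, the labels, the ridge parameter, and the use of a regularized-Laplacian kernel of the kind constructed later in the paper are choices made here, not prescriptions from the text.

```python
import numpy as np

# Toy graph: a path of 6 vertices (illustrative, not from the paper).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
d = W.sum(axis=1)
L_norm = np.eye(n) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]

# A graph kernel of the regularized-Laplacian type, K = (I + sigma^2 L)^{-1}.
sigma2 = 1.0
K = np.linalg.inv(np.eye(n) + sigma2 * L_norm)

# Kernel ridge regression uses nothing beyond K:
# alpha = (K + lam I)^{-1} y, predictions on training vertices are K @ alpha.
y = np.array([0., 0., 0., 1., 1., 1.])   # labels attached to vertices
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(n), y)
y_hat = K @ alpha
assert np.allclose(y_hat, y, atol=0.05)  # near-interpolation at small lam
```

Any other kernel method (an SVM, Gaussian Process regression, kernel PCA) could consume the same precomputed matrix `K` in exactly the same way.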
Even if all we know about our input space are local pairwise similarities between points $x_i, x_j \in X$, distances (e.g. shortest path length) on the graph induced by these similarities can give a useful, more global, sense of similarity between objects. In their work on Diffusion Kernels, Kondor and Lafferty [2002] gave a specific construction for a kernel capturing this structure. Belkin and Niyogi [2002] proposed an essentially equivalent construction in the context of approximating data lying on surfaces in a high dimensional embedding space, and in the context of leveraging information from unlabeled data.

In this paper we put these earlier results into the more principled framework of Regularization Theory. We propose a family of regularization operators (equivalently, kernels) on graphs that include Diffusion Kernels as a special case, and show that this family encompasses all possible regularization operators invariant under permutations of the vertices in a particular sense.

Outline of the Paper: Section 2 introduces the concept of the graph Laplacian and relates it to the Laplace operator on real valued functions. Next we define an extended class of regularization operators and show why they have to be essentially a function of the Laplacian. An analogy to real valued Greens functions is established in Section 3.3, and efficient methods for computing such functions are presented in Section 4. We conclude with a discussion.

2 Laplace Operators

An undirected unweighted graph $G$ consists of a set of vertices $V$ numbered 1 to $n$, and a set of edges $E$ (i.e., pairs $(i,j)$ where $i,j \in V$ and $(i,j) \in E \Leftrightarrow (j,i) \in E$). We will sometimes write $i \sim j$ to denote that $i$ and $j$ are neighbors, i.e. $(i,j) \in E$. The adjacency matrix of $G$ is an $n\times n$ real matrix $W$, with $W_{ij} = 1$ if $i \sim j$, and 0 otherwise (by construction, $W$ is symmetric and its diagonal entries are zero).
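The quantities defined next (the degree matrix $D$, the Laplacian $L = D - W$, and the normalized Laplacian $\tilde L$) are easy to check numerically. This is a minimal numpy sketch under an illustrative assumption (a 5-cycle as the graph); it also verifies the spectral facts stated in Theorem 1 below and the quadratic form (1).

```python
import numpy as np

# Illustrative example: adjacency matrix of a 5-cycle (not from the paper).
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

D = np.diag(W.sum(axis=1))              # degree matrix
L = D - W                               # graph Laplacian
D_isqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
L_norm = D_isqrt @ L @ D_isqrt          # I - D^{-1/2} W D^{-1/2}

eigvals = np.linalg.eigvalsh(L_norm)
# Spectrum lies in [0, 2]; one zero eigenvalue per connected component.
assert eigvals.min() > -1e-10 and eigvals.max() < 2 + 1e-10
assert np.sum(np.abs(eigvals) < 1e-8) == 1   # the 5-cycle is connected

# <f, L f> = 1/2 * sum over i~j of (f_i - f_j)^2
f = np.arange(n, dtype=float)
quad = f @ L @ f
assert np.isclose(quad, 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                                  for i in range(n) for j in range(n)))
```

The same construction works verbatim for weighted graphs with nonnegative $W_{ij}$, as noted in the text.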
These definitions and most of the following theory can trivially be extended to weighted graphs by allowing $W_{ij} \in [0,\infty)$.

Let $D$ be an $n\times n$ diagonal matrix with $D_{ii} = \sum_j W_{ij}$. The Laplacian of $G$ is defined as $L := D - W$ and the Normalized Laplacian is $\tilde L := D^{-\frac12} L D^{-\frac12} = I - D^{-\frac12} W D^{-\frac12}$. The following two theorems are well known results from spectral graph theory [Chung-Graham, 1997]:

Theorem 1 (Spectrum of $\tilde L$). $\tilde L$ is a symmetric, positive semidefinite matrix, and its eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ satisfy $0 \le \lambda_i \le 2$. Furthermore, the number of eigenvalues equal to zero equals the number of disjoint components in $G$.

The bound on the spectrum follows directly from Gerschgorin's Theorem.

Theorem 2 ($L$ and $\tilde L$ for Regular Graphs). Now let $G$ be a regular graph of degree $d$, that is, a graph in which every vertex has exactly $d$ neighbors. Then $L = dI - W$ and $\tilde L = I - \frac{1}{d} W = \frac{1}{d} L$. Finally, $W$, $L$, $\tilde L$ share the same eigenvectors $\{v_i\}$, where $v_i = \lambda_i^{-1} W v_i = (d - \lambda_i)^{-1} L v_i = (1 - d^{-1}\lambda_i)^{-1} \tilde L v_i$ for all $i$.

$L$ and $\tilde L$ can be regarded as linear operators on functions $f: V\to\mathbb{R}$, or, equivalently, on vectors $f = (f_1, f_2, \ldots, f_n)^\top$. We could equally well have defined $L$ by
$$\langle f, L f\rangle = f^\top L f = \frac{1}{2}\sum_{i\sim j} (f_i - f_j)^2 \quad \text{for all } f \in \mathbb{R}^n, \qquad (1)$$
which readily generalizes to graphs with a countably infinite number of vertices.

The Laplacian derives its name from its analogy with the familiar Laplacian operator $\Delta = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \dots + \frac{\partial^2}{\partial x_m^2}$ on continuous spaces. Regarding (1) as inducing a semi-norm $\|f\|_L^2 = \langle f, L f\rangle$ on $\mathbb{R}^n$, the analogous expression for $\Delta$ defined on a compact space $\Omega$ is
$$\|f\|_\Delta^2 = \langle f, \Delta f\rangle = \int_\Omega f\,(\Delta f)\, d\omega = \int_\Omega (\nabla f)\cdot(\nabla f)\, d\omega. \qquad (2)$$
Both (1) and (2) quantify how much $f$ varies locally, or how "smooth" it is over its respective domain.

More explicitly, when $\Omega = \mathbb{R}^m$, up to a constant, $-L$ is exactly the finite difference discretization of $\Delta$ on a regular lattice:
$$\Delta f(x) = \sum_{i=1}^m \frac{\partial^2}{\partial x_i^2} f \approx \sum_{i=1}^m \frac{\frac{\partial}{\partial x_i} f(x + \frac12 e_i) - \frac{\partial}{\partial x_i} f(x - \frac12 e_i)}{\delta} \approx \sum_{i=1}^m \frac{f(x+e_i) + f(x-e_i) - 2f(x)}{\delta^2} = \frac{1}{\delta^2}\sum_{i=1}^m \big(f_{x_1,\ldots,x_i+1,\ldots,x_m} + f_{x_1,\ldots,x_i-1,\ldots,x_m} - 2 f_{x_1,\ldots,x_m}\big) = -\frac{1}{\delta^2}\,[L f]_{x_1,\ldots,x_m}\,,$$
where $e_1, e_2, \ldots, e_m$ is an orthogonal basis for $\mathbb{R}^m$ normalized to $\|e_i\| = \delta$, the vertices
of the lattice are at $x = x_1 e_1 + \dots + x_m e_m$ with integer valued coordinates $x_i \in \mathbb{N}$, and $f_{x_1, x_2, \ldots, x_m} = f(x)$. Moreover, both the continuous and the discrete Laplacians are canonical operators on their respective domains, in the sense that they are invariant under certain natural transformations of the underlying space, and in this they are essentially unique.

[Figure: a regular grid in two dimensions]

The Laplace operator $\Delta$ is the unique self-adjoint linear second order differential operator invariant under transformations of the coordinate system under the action of the special orthogonal group $SO_m$, i.e. invariant under rotations. This well known result can be seen by using Schur's lemma and the fact that $SO_m$ is irreducible on $\mathbb{R}^m$.

We now show a similar result for $L$. Here the permutation group plays a similar role to $SO_m$. We need some additional definitions: denote by $S_n$ the group of permutations on $\{1, 2, \ldots, n\}$ with $\pi \in S_n$ being a specific permutation taking $i \in \{1, 2, \ldots, n\}$ to $\pi(i)$. The so-called defining representation of $S_n$ consists of $n\times n$ matrices $\Pi_\pi$, such that $[\Pi_\pi]_{i,\pi(i)} = 1$ and all other entries of $\Pi_\pi$ are zero.

Theorem 3 (Permutation Invariant Linear Functions on Graphs). Let $L$ be an $n\times n$ symmetric real matrix, linearly related to the $n\times n$ adjacency matrix $W$, i.e. $L = T[W]$ for some linear operator $T$ in a way invariant to permutations of vertices in the sense that
$$\Pi_\pi^\top T[W]\,\Pi_\pi = T\big[\Pi_\pi^\top W \Pi_\pi\big] \qquad (3)$$
for any $\pi \in S_n$. Then $L$ is related to $W$ by a linear combination of the following operations: identity; row/column sums; overall sum; row/column sum restricted to the diagonal of $L$; overall sum restricted to the diagonal of $W$.
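Before the proof, the invariance condition (3) can be spot-checked numerically for the graph Laplacian $T[W] = D - W$; the random graph, seed, and permutation below are illustrative choices, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random symmetric 0/1 adjacency matrix with zero diagonal (illustrative).
A = rng.integers(0, 2, size=(n, n))
W = np.triu(A, 1)
W = W + W.T

def laplacian(W):
    # T[W] = D - W with D the diagonal matrix of row sums.
    return np.diag(W.sum(axis=1)) - W

# Permutation matrix Pi with [Pi]_{i, pi(i)} = 1, as in the definition above.
pi = rng.permutation(n)
Pi = np.zeros((n, n))
Pi[np.arange(n), pi] = 1.0

# Invariance in the sense of (3): Pi^T T[W] Pi = T[Pi^T W Pi].
lhs = Pi.T @ laplacian(W) @ Pi
rhs = laplacian(Pi.T @ W @ Pi)
assert np.allclose(lhs, rhs)
```

Relabeling the vertices and then forming the Laplacian gives the same matrix as forming the Laplacian and then relabeling, which is exactly what Theorem 3 requires of $T$.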
Proof. Let
$$L_{i_1 i_2} = T[W]_{i_1 i_2} := \sum_{i_3=1}^n \sum_{i_4=1}^n T_{i_1 i_2 i_3 i_4}\, W_{i_3 i_4} \qquad (4)$$
with $T \in \mathbb{R}^{n^4}$. Eq. (3) then implies $T_{\pi(i_1)\pi(i_2)\pi(i_3)\pi(i_4)} = T_{i_1 i_2 i_3 i_4}$ for any $\pi \in S_n$.

The indices of $T$ can be partitioned by the equality relation on their values, e.g. $(2,5,2,7)$ is of the partition type $[13|2|4]$, since $i_1 = i_3$, but $i_2 \ne i_1$, $i_4 \ne i_1$ and $i_2 \ne i_4$. The key observation is that under the action of the permutation group, elements of $T$ with a given index partition structure are taken to elements with the same index partition structure, e.g. if $i_1 = i_3$ then $\pi(i_1) = \pi(i_3)$ and if $i_1 \ne i_3$, then $\pi(i_1) \ne \pi(i_3)$. Furthermore, an element with a given index partition structure can be mapped to any other element of $T$ with the same index partition structure by a suitable choice of $\pi$. Hence, a necessary and sufficient condition for (4) is that all elements of $T$ of a given index partition structure be equal. Therefore, $T$ must be a linear combination of the following tensors (i.e. multilinear forms):
$$A_{i_1 i_2 i_3 i_4} = 1$$
$$B^{[1,2]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_2}\,,\quad B^{[1,3]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_3}\,,\quad B^{[1,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_4}\,,\quad B^{[2,3]}_{i_1 i_2 i_3 i_4} = \delta_{i_2 i_3}\,,\quad B^{[2,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_2 i_4}\,,\quad B^{[3,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_3 i_4}$$
$$C^{[1,2,3]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_2}\delta_{i_2 i_3}\,,\quad C^{[2,3,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_2 i_3}\delta_{i_3 i_4}\,,\quad C^{[3,4,1]}_{i_1 i_2 i_3 i_4} = \delta_{i_3 i_4}\delta_{i_4 i_1}\,,\quad C^{[4,1,2]}_{i_1 i_2 i_3 i_4} = \delta_{i_4 i_1}\delta_{i_1 i_2}$$
$$D^{[1,2][3,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_2}\delta_{i_3 i_4}\,,\quad D^{[1,3][2,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_3}\delta_{i_2 i_4}\,,\quad D^{[1,4][2,3]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_4}\delta_{i_2 i_3}$$
$$E^{[1,2,3,4]}_{i_1 i_2 i_3 i_4} = \delta_{i_1 i_2}\delta_{i_1 i_3}\delta_{i_1 i_4}\,.$$

The tensor $A$ puts the overall sum in each element of $L$, while $B^{[1,2]}$ returns the same restricted to the diagonal of $L$. Since $W$ has vanishing diagonal, $B^{[3,4]}$, $C^{[2,3,4]}$, $C^{[3,4,1]}$, $D^{[1,2][3,4]}$ and $E^{[1,2,3,4]}$ produce zero. Without loss of generality we can therefore ignore them. By symmetry of $W$, the pairs $(B^{[1,3]}, B^{[1,4]})$, $(B^{[2,3]}, B^{[2,4]})$, $(C^{[1,2,3]}, C^{[4,1,2]})$ have the same effect on $W$, hence we can set the coefficient of the second member of each to zero. Furthermore, to enforce symmetry on $L$, the coefficient of $B^{[1,3]}$ and $B^{[2,3]}$ must be
the same (without loss of generality 1) and this will give the row/column sum matrix $(\sum_k W_{ik}) + (\sum_k W_{jk})$. Similarly, $C^{[1,2,3]}$ and $C^{[4,1,2]}$ must have the same coefficient and this will give the row/column sum restricted to the diagonal: $\delta_{ij}\,[(\sum_k W_{ik}) + (\sum_k W_{jk})]$. Finally, by symmetry of $W$, $D^{[1,3][2,4]}$ and $D^{[1,4][2,3]}$ are both equivalent to the identity map.

The various row/column sum and overall sum operations are uninteresting from a graph theory point of view, since they do not heed the topology of the graph. Imposing the conditions that each row and column in $L$ must sum to zero, we recover the graph Laplacian. Hence, up to a constant factor and trivial additive components, the graph Laplacian (or the normalized graph Laplacian if we wish to rescale by the number of edges per vertex) is the only "invariant" differential operator for given $W$ (or its normalized counterpart $\tilde W$).

Unless stated otherwise, all results below hold for both $L$ and $\tilde L$ (albeit with a different spectrum) and we will, in the following, focus on $\tilde L$ due to the fact that its spectrum is contained in $[0,2]$.

3 Regularization

The fact that $L$ induces a semi-norm on $f$ which penalizes the changes between adjacent vertices, as described in (1), indicates that it may serve as a tool to design regularization operators.

3.1 Regularization via the Laplace Operator

We begin with a brief overview of translation invariant regularization operators on continuous spaces and show how they can be interpreted as powers of $\Delta$. This will allow us to repeat the development almost verbatim with $\tilde L$ (or $L$) instead. Some of the most successful regularization functionals on $\mathbb{R}^n$, leading to kernels such as the Gaussian RBF, can be written as [Smola et al., 1998]
$$\langle f, Pf\rangle := \int |\tilde f(\omega)|^2\, r(\|\omega\|^2)\, d\omega = \langle f, r(\Delta) f\rangle. \qquad (5)$$
Here $f \in L_2(\mathbb{R}^n)$, $\tilde f(\omega)$ denotes the Fourier transform of $f$, $r(\|\omega\|^2)$ is a function penalizing frequency components $|\tilde f(\omega)|$ of $f$, typically increasing in $\|\omega\|^2$, and finally, $r(\Delta)$ is the extension of $r$ to operators simply by applying $r$ to the
spectrum of $\Delta$ [Dunford and Schwartz, 1958]:
$$\langle f, r(\Delta) f\rangle = \sum_i \langle f, \psi_i\rangle\, r(\lambda_i)\, \langle \psi_i, f\rangle\,,$$
where $\{(\psi_i, \lambda_i)\}$ is the eigensystem of $\Delta$. The last equality in (5) holds because applications of $\Delta$ become multiplications by $\|\omega\|^2$ in Fourier space. Kernels are obtained by solving the self-consistency condition [Smola et al., 1998]
$$\langle k(x,\cdot),\, P\, k(x',\cdot)\rangle = k(x,x'). \qquad (6)$$
One can show that $k(x,x') = \kappa(x-x')$, where $\kappa$ is equal to the inverse Fourier transform of $r^{-1}(\|\omega\|^2)$. Several $r$ functions have been known to yield good results. The two most popular are given below:

Gaussian RBF: $r(\|\omega\|^2) = \exp\big(\tfrac{\sigma^2}{2}\|\omega\|^2\big)$, $k(x,x') = \exp\big(-\tfrac{1}{2\sigma^2}\|x-x'\|^2\big)$, $r(\Delta) = \sum_{i=0}^\infty \tfrac{\sigma^{2i}}{2^i\, i!}\,\Delta^i$.
Laplacian RBF: $r(\|\omega\|^2) = 1 + \sigma^2\|\omega\|^2$, $k(x,x') = \exp\big(-\tfrac{1}{\sigma}\|x-x'\|\big)$, $r(\Delta) = 1 + \sigma^2\Delta$.

In summary, regularization according to (5) is carried out by penalizing $\tilde f(\omega)$ by a function of the Laplace operator. For many results in regularization theory one requires $r(\|\omega\|^2)\to\infty$ for $\|\omega\|^2\to\infty$.

3.2 Regularization via the Graph Laplacian

In complete analogy to (5), we define a class of regularization functionals on graphs as
$$\langle f, Pf\rangle := \langle f, r(\tilde L) f\rangle. \qquad (7)$$

Fig. 1. Regularization function $r(\lambda)$. From left to right: regularized Laplacian ($\sigma^2 = 1$), diffusion process ($\sigma^2 = 1$), one-step random walk ($a = 2$), 4-step random walk ($a = 2$), inverse cosine.

Here $r(\tilde L)$ is understood as applying the scalar valued function $r(\lambda)$ to the eigenvalues of $\tilde L$, that is,
$$r(\tilde L) := \sum_{i=1}^m r(\lambda_i)\, v_i v_i^\top\,, \qquad (8)$$
where $\{(\lambda_i, v_i)\}$ constitute the eigensystem of $\tilde L$. The normalized graph Laplacian $\tilde L$ is preferable to $L$, since $\tilde L$'s spectrum is contained in $[0,2]$. The obvious goal is to gain insight into what functions are appropriate choices for $r$.

- From (1) we infer that $v_i$ with large $\lambda_i$ correspond to rather uneven functions on the graph $G$. Consequently, they should be penalized more strongly than $v_i$ with small $\lambda_i$. Hence $r(\lambda)$ should be monotonically increasing in $\lambda$.
- Requiring that $r(\tilde L) \succeq 0$ imposes the constraint $r(\lambda) \ge 0$ for all $\lambda \in [0,2]$.
- Finally, we can limit ourselves to $r(\lambda)$ expressible as power series, since the latter are dense in the space of $C_0$ functions on bounded domains.

In Section 3.5 we will present additional
motivation for the choice of $r(\lambda)$ in the context of spectral graph theory and segmentation. As we shall see, the following functions are of particular interest:
$$r(\lambda) = 1 + \sigma^2\lambda \quad \text{(Regularized Laplacian)} \qquad (9)$$
$$r(\lambda) = \exp(\sigma^2/2\,\lambda) \quad \text{(Diffusion Process)} \qquad (10)$$
$$r(\lambda) = (a - \lambda)^{-1} \text{ with } a \ge 2 \quad \text{(One-Step Random Walk)} \qquad (11)$$
$$r(\lambda) = (a - \lambda)^{-p} \text{ with } a \ge 2 \quad \text{($p$-Step Random Walk)} \qquad (12)$$
$$r(\lambda) = (\cos \lambda\pi/4)^{-1} \quad \text{(Inverse Cosine)} \qquad (13)$$
Figure 1 shows the regularization behavior for the functions (9)-(13).

3.3 Kernels

The introduction of a regularization matrix $P = r(\tilde L)$ allows us to define a Hilbert space $\mathcal H$ on $\mathbb{R}^m$ via $\langle f, f'\rangle_{\mathcal H} := \langle f, P f'\rangle$. We now show that $\mathcal H$ is a reproducing kernel Hilbert space.

Theorem 4. Denote by $P \in \mathbb{R}^{m\times m}$ a (positive semidefinite) regularization matrix and denote by $\mathcal H$ the image of $\mathbb{R}^m$ under $P$. Then $\mathcal H$ with dot product $\langle f, f'\rangle_{\mathcal H} := \langle f, P f'\rangle$ is a Reproducing Kernel Hilbert Space and its kernel is $k(i,j) = [P^{-1}]_{ij}$, where $P^{-1}$ denotes the pseudo-inverse if $P$ is not invertible.

Proof. Since $P$ is a positive semidefinite matrix, we clearly have a Hilbert space on $P\mathbb{R}^m$. To show the reproducing property we need to prove that
$$f(i) = \langle f, k(i,\cdot)\rangle_{\mathcal H}. \qquad (14)$$
Note that $k(i,j)$ can take on at most $m^2$ different values (since $i,j \in [1:m]$). In matrix notation (14) means that for all $f \in \mathcal H$
$$f(i) = f^\top P K_{i,:} \text{ for all } i \iff f^\top = f^\top P K. \qquad (15)$$
The latter holds if $K = P^{-1}$ and $f \in P\mathbb{R}^m$, which proves the claim.

In other words, $K$ is the Greens function of $P$, just as in the continuous case. The notion of Greens functions on graphs was only recently introduced by Chung-Graham and Yau [2000] for $L$. The above theorem extended this idea to arbitrary regularization operators $\hat r(\tilde L)$.

Corollary 1. Denote by $P = r(\tilde L)$ a regularization matrix, then the corresponding kernel is given by $K = r^{-1}(\tilde L)$, where we take the pseudo-inverse wherever necessary. More specifically, if $\{(v_i, \lambda_i)\}$ constitute the eigensystem of $\tilde L$, we have
$$K = \sum_{i=1}^m r^{-1}(\lambda_i)\, v_i v_i^\top \quad \text{where we define } 0^{-1} \equiv 0. \qquad (16)$$

3.4 Examples of Kernels

By virtue of Corollary 1 we only need to take (9)-(13) and plug the
definition of $r(\lambda)$ into (16) to obtain formulae for computing $K$. This yields the following kernel matrices:
$$K = (I + \sigma^2\tilde L)^{-1} \quad \text{(Regularized Laplacian)} \qquad (17)$$
$$K = \exp(-\sigma^2/2\,\tilde L) \quad \text{(Diffusion Process)} \qquad (18)$$
$$K = (aI - \tilde L)^p \text{ with } a \ge 2 \quad \text{($p$-Step Random Walk)} \qquad (19)$$
$$K = \cos(\tilde L\pi/4) \quad \text{(Inverse Cosine)} \qquad (20)$$
Equation (18) corresponds to the diffusion kernel proposed by Kondor and Lafferty [2002], for which $K(x,x')$ can be visualized as the quantity of some substance that would accumulate at vertex $x$ after a given amount of time if we injected the substance at vertex $x'$ and let it diffuse through the graph along the edges. Note that this involves matrix exponentiation defined via the limit $K = \exp(B) = \lim_{n\to\infty}(I + B/n)^n$ as opposed to component-wise exponentiation $K_{i,j} = \exp(B_{i,j})$.

Fig. 2. The first 8 eigenvectors of the normalized graph Laplacian corresponding to the graph drawn above. Each line attached to a vertex is proportional to the value of the corresponding eigenvector at the vertex. Positive values (red) point up and negative values (blue) point down. Note that the assignment of values becomes less and less uniform with increasing eigenvalue (i.e. from left to right).

For (17) it is typically more efficient to deal with the inverse of $K$, as it avoids the costly inversion of the sparse matrix $\tilde L$. Such situations arise, e.g., in Gaussian Process estimation, where $K$ is the covariance matrix of a stochastic process [Williams, 1999].

Regarding (19), recall that $(aI - \tilde L)^p = ((a-1)I + \tilde W)^p$ is up to scaling terms equivalent to a $p$-step random walk on the graph with random restarts (see Section A for details). In this sense it is similar to the diffusion kernel. However, the fact that $K$ involves only a finite number of products of matrices makes it much more attractive for practical purposes. In particular, entries in $K_{ij}$ can be computed cheaply using the fact that $\tilde L$ is a sparse matrix.

[Figure: a nearest neighbor graph]

Finally, the inverse cosine kernel treats lower complexity functions almost equally, with a significant reduction in
the upper end of the spectrum. Figure 2 shows the leading eigenvectors of the graph drawn above and Figure 3 provides examples of some of the kernels discussed above.

3.5 Clustering and Spectral Graph Theory

We could also have derived $r(\tilde L)$ directly from spectral graph theory: the eigenvectors of the graph Laplacian correspond to functions partitioning the graph into clusters, see e.g., [Chung-Graham, 1997, Shi and Malik, 1997] and the references therein. In general, small eigenvalues have associated eigenvectors which vary little between adjacent vertices. Finding the smallest eigenvectors of $\tilde L$ can be seen as a real-valued relaxation of the min-cut problem.[3]

For instance, the smallest eigenvalue of $\tilde L$ is 0, and its corresponding eigenvector is $D^{\frac12}\mathbf{1}_n$ with $\mathbf{1}_n := (1,\ldots,1)^\top \in \mathbb{R}^n$. The second smallest eigenvalue/eigenvector pair, also often referred to as the Fiedler-vector, can be used to split the graph into two distinct parts [Weiss, 1999, Shi and Malik, 1997], and further eigenvectors with larger eigenvalues have been used for more finely-grained partitions of the graph. See Figure 2 for an example.

[3] Only recently, algorithms based on the celebrated semidefinite relaxation of the min-cut problem by Goemans and Williamson [1995] have seen wider use [Torr, 2003] in segmentation and clustering by use of spectral bundle methods.

Fig. 3. Top: regularized graph Laplacian; Middle: diffusion kernel with $\sigma = 5$; Bottom: 4-step random walk kernel. Each figure displays $K_{ij}$ for fixed $i$. The value $K_{ij}$ at vertex $i$ is denoted by a bold line. Note that only adjacent vertices to $i$ bear significant value.

Such a decomposition into functions of increasing complexity has very desirable properties: if we want to perform estimation on the graph, we will wish to bias the estimate towards functions which vary little over large homogeneous portions.[4] Consequently, we have the following interpretation of $\langle f, f\rangle_{\mathcal H}$. Assume that $f = \sum_i \beta_i v_i$, where $\{(v_i, \lambda_i)\}$ is the eigensystem of $\tilde L$. Then we can rewrite $\langle f, f\rangle_{\mathcal H}$ to
yield
$$\langle f, r(\tilde L) f\rangle = \Big\langle \sum_i \beta_i v_i\,,\; \sum_j r(\lambda_j)\, v_j v_j^\top \sum_l \beta_l v_l \Big\rangle = \sum_i \beta_i^2\, r(\lambda_i). \qquad (21)$$
This means that the components of $f$ which vary a lot over coherent clusters in the graph are penalized more strongly, whereas the portions of $f$ which are essentially constant over clusters are preferred. This is exactly what we want.

3.6 Approximate Computation

Often it is not necessary to know all values of the kernel (e.g., if we only observe instances from a subset of all positions on the graph). There it would be wasteful to compute the full matrix $r(L)^{-1}$ explicitly, since such operations typically scale with $O(n^3)$. Furthermore, for large $n$ it is not desirable to compute $K$ via (16), that is, by computing the eigensystem of $\tilde L$ and assembling $K$ directly.

[4] If we cannot assume a connection between the structure of the graph and the values of the function to be estimated on it, the entire concept of designing kernels on graphs obviously becomes meaningless.

Instead, we would like to take advantage of the fact that $\tilde L$ is sparse, and consequently any operation $\tilde L\alpha$ has cost at most linear in the number of nonzero elements of $\tilde L$, hence the cost is bounded by $O(|E| + n)$. Moreover, if $d$ is the largest degree of the graph, then computing $L^p e_i$ costs at most $|E|\sum_{i=1}^{p-1} \min(d+1, n)^i$ operations: at each step the number of non-zeros in the rhs increases by at most a factor of $d+1$. This means that as long as we can approximate $K = r^{-1}(\tilde L)$ by a low order polynomial, say $\rho(\tilde L) := \sum_{i=0}^N \beta_i \tilde L^i$, significant savings are possible. Note that we need not necessarily require a uniformly good approximation and put the main emphasis on the approximation for small $\lambda$. However, we need to ensure that $\rho(\tilde L)$ is positive semidefinite.

Diffusion Kernel: The fact that the series $r^{-1}(x) = \exp(-\beta x) = \sum_{m=0}^\infty \frac{(-\beta)^m x^m}{m!}$ has alternating signs shows that the approximation error of $r^{-1}(x)$ is bounded by $\frac{(2\beta)^{N+1}}{(N+1)!}$, if we use $N$ terms in the expansion (from Theorem 1 we know that $\|\tilde L\| \le 2$). For instance, for
β = 1, 10 terms are sufficient to obtain an error of the order of 10⁻⁴.

Variational Approximation: In general, if we want to approximate r⁻¹(λ) on [0, 2], we need to solve the L∞([0, 2]) approximation problem

minimize_{β, ε}  ε   subject to   | Σ_{i=0}^N β_i λ^i − r⁻¹(λ) | ≤ ε   ∀ λ ∈ [0, 2]   (22)

Clearly, (22) is equivalent to minimizing sup_{L̃} ‖ρ(L̃) − r⁻¹(L̃)‖, since the matrix norm is determined by the largest eigenvalues, and we can find an L̃ such that the discrepancy between ρ(λ) and r⁻¹(λ) is attained. Variational problems of this form have been studied in the literature, and their solution may provide much better approximations to r⁻¹(λ) than a truncated power series expansion.

4 Products of Graphs

As we have already pointed out, it is very expensive to compute K for arbitrary r̂ and L̃. For special types of graphs and regularization, however, significant computational savings can be made.

4.1 Factor Graphs

The work of this section is a direct extension of results by Ellis [2002] and Chung-Graham and Yau [2000], who study factor graphs to compute inverses of the graph Laplacian.

Definition 1 (Factor Graphs). Denote by (V, E) and (V′, E′) the vertices V and edges E of two graphs; then the factor graph (V_f, E_f) := (V, E) ⊗ (V′, E′) is defined as the graph where (i, i′) ∈ V_f if i ∈ V and i′ ∈ V′; and ((i, i′), (j, j′)) ∈ E_f if and only if either (i, j) ∈ E and i′ = j′, or (i′, j′) ∈ E′ and i = j.

For instance, the factor graph of two rings is a torus. The nice property of factor graphs is that we can compute the eigenvalues of the Laplacian on products very easily (see e.g. Chung-Graham and Yau [2000]):

Theorem 5 (Eigenvalues of Factor Graphs). The eigenvalues and eigenvectors of the normalized Laplacian for the factor graph between a regular graph of degree d with eigenvalues {λ_j} and a regular graph of degree d′ with eigenvalues {λ′_l} are of the form

λ^fact_{j,l} = (d/(d+d′)) λ_j + (d′/(d+d′)) λ′_l   (23)

and the eigenvectors satisfy e^{j,l}_{(i,i′)} = e^j_i e′^l_{i′}, where e^j is an eigenvector of L̃ and e′^l is an eigenvector of L̃′. This allows us to
apply Corollary 1 to obtain an expansion of K as

K = (r(L))⁻¹ = Σ_{j,l} r⁻¹(λ_{jl}) e_{j,l} e_{j,l}ᵀ.   (24)

While providing an explicit recipe for the computation of K_ij without the need to compute the full matrix K, this still requires O(n²) operations per entry, which may be more costly than what we want (here n is the number of vertices of the factor graph).

Two methods for computing (24) become evident at this point: if r has a special structure, we may exploit this to decompose K into products and sums of terms depending on one of the two graphs alone, and pre-compute these expressions beforehand. Secondly, if one of the two terms in the expansion can be computed for a rather general class of values of r(x), we can pre-compute this expansion and only carry out the remainder corresponding to (24) explicitly.

4.2 Product Decomposition of r(x)

Central to our reasoning is the observation that for certain r(x), the term 1/r(a+b) can be expressed in terms of a product and sum of terms depending on a and b only. We assume that

1/r(a+b) = Σ_{m=1}^M ρ_m(a) ρ̃_m(b).   (25)

In the following we will show that in such situations the kernels on factor graphs can be computed as an analogous combination of products and sums of kernel functions on the terms constituting the ingredients of the factor graph. Before we do so, we briefly check that many r(x) indeed satisfy this property:

exp(−β(a+b)) = exp(−βa) exp(−βb)   (26)
(A − (a+b)) = (A/2 − a) + (A/2 − b)   (27)
(A − (a+b))^p = Σ_{n=0}^p (p choose n) (A/2 − a)^n (A/2 − b)^{p−n}   (28)
cos((a+b)π/4) = cos(aπ/4) cos(bπ/4) − sin(aπ/4) sin(bπ/4)   (29)

In a nutshell, we will exploit the fact that for products of graphs the eigenvalues of the joint graph Laplacian can be written as the sum of the eigenvalues of the Laplacians of the constituent graphs. This way we can perform computations on ρ_m and ρ̃_m separately without the need to take the other part of the product of graphs into account. Define

k_m(i, j) := Σ_l ρ_m(dλ_l/(d+d′)) e^l_i e^l_j   and   k̃_m(i′, j′) := Σ_l ρ̃_m(d′λ′_l/(d+d′)) ẽ^l_{i′} ẽ^l_{j′}.   (30)

Then we
have the following composition theorem:

Theorem 6. Denote by (V, E) and (V′, E′) connected regular graphs of degrees d with m vertices (and d′, m′ respectively) and normalized graph Laplacians L̃, L̃′. Furthermore denote by r(x) a rational function with matrix-valued extension r̂(X). In this case the kernel K corresponding to the regularization operator r̂(L) on the product graph of (V, E) and (V′, E′) is given by

k((i, i′), (j, j′)) = Σ_{m=1}^M k_m(i, j) k̃_m(i′, j′)   (31)

Proof. Plug the expansion of 1/r(a+b) as given by (25) into (24) and collect terms.

From (26) we immediately obtain the corollary (see Kondor and Lafferty [2002]) that for diffusion processes on factor graphs the kernel on the factor graph is given by the product of kernels on the constituents, that is, k((i, i′), (j, j′)) = k(i, j) k′(i′, j′).

The kernels k_m and k̃_m can be computed either by using an analytic solution of the underlying factors of the graph, or alternatively they can be computed numerically. If the total number of kernels k_m is small in comparison to the number of possible coordinates, this is still computationally beneficial.

4.3 Composition Theorems

If no expansion as in (31) can be found, we may still be able to compute kernels by extending a reasoning from [Ellis, 2002]. More specifically, the following composition theorem allows us to accelerate the computation in many cases, whenever we can parameterize (r̂(L + αI))⁻¹ in an efficient way. For this purpose we introduce two auxiliary functions

K_α(i, j) := (r̂((dL + d′αI)/(d+d′)))⁻¹_{ij} = Σ_l (r((dλ_l + d′α)/(d+d′)))⁻¹ e^l(i) e^l(j)
G_α(i′, j′) := ((L′ + αI)⁻¹)_{i′j′} = Σ_l (1/(λ′_l + α)) e′^l(i′) e′^l(j′).   (32)

In some cases K_α(i, j) may be computed in closed form, thus obviating the need to perform expensive matrix inversion, e.g., in the case where the underlying graph is a chain [Ellis, 2002] and K_α = G_α.

Theorem 7. Under the assumptions of Theorem 6 we have

K((j, j′), (l, l′)) = (1/(2πi)) ∮_C K_α(j, l) G_{−α}(j′, l′) dα = Σ_v K_{λ′_v}(j, l) e′^v(j′) e′^v(l′)   (33)

where C ⊂ ℂ is a contour of the complex plane containing the poles of (V′, E′) including 0. For practical purposes, the third term of
(33) is more amenable to computation.

Proof. From (24) we have

K((j, j′), (l, l′)) = Σ_{u,v} (r((dλ_u + d′λ′_v)/(d+d′)))⁻¹ e^u(j) e^u(l) e′^v(j′) e′^v(l′)   (34)
= (1/(2πi)) ∮_C Σ_u (r((dλ_u + d′α)/(d+d′)))⁻¹ e^u(j) e^u(l) Σ_v (1/(λ′_v − α)) e′^v(j′) e′^v(l′) dα

Here the second equality follows from the fact that the contour integral over a pole p yields ∮_C f(α)/(p − α) dα = 2πi f(p), and the claim is verified by checking the definitions of K_α and G_α. The last equality can be seen from (34) by splitting up the summation over u and v.

5 Conclusions

We have shown that the canonical family of kernels on graphs are of the form of power series in the graph Laplacian. Equivalently, such kernels can be characterized by a real-valued function of the eigenvalues of the Laplacian. Special cases include diffusion kernels, the regularized Laplacian kernel and p-step random walk kernels. We have developed the regularization theory of learning on graphs using such kernels and explored methods for efficiently computing and approximating the kernel matrix.

Acknowledgments. This work was supported by a grant of the ARC. The authors thank Eleazar Eskin, Patrick Haffner, Andrew Ng, Bob Williamson and S.V.N. Vishwanathan for helpful comments and suggestions.

A Link Analysis

Rather surprisingly, our approach to regularizing functions on graphs bears resemblance to algorithms for scoring web pages such as PageRank [Page et al., 1998], HITS [Kleinberg, 1999], and randomized HITS [Zheng et al., 2001]. More specifically, the random walks on graphs used in all three algorithms and the stationary distributions arising from them are closely connected with the eigensystem of L and L̃ respectively.

We begin with an analysis of PageRank. Given a set of web pages and links between them we construct a directed graph in such a way that pages correspond
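The PageRank construction just mentioned is easy to sketch numerically. In the snippet below the four-page link structure and the damping factor 0.85 are illustrative choices of mine, not taken from the paper; the stationary distribution of the damped random walk is obtained by plain power iteration:

```python
import numpy as np

# Hypothetical link graph: page -> list of outgoing links.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4

# Column-stochastic transition matrix of the random surfer.
P = np.zeros((n, n))
for src, outs in links.items():
    for dst in outs:
        P[dst, src] = 1.0 / len(outs)

damping = 0.85
M = damping * P + (1 - damping) * np.ones((n, n)) / n  # teleportation mix

r = np.ones(n) / n
for _ in range(100):   # power iteration towards the stationary distribution
    r = M @ r
```

The vector r approximates the dominant eigenvector of M; this eigensystem view of a random walk is exactly the connection to the spectra of L and L̃ drawn in the text.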

Generalized WDVV equations for B_r and C_r pure N=2 Super-Yang-Mills theory


arXiv:hep-th/0102190v1 27 Feb 2001

Generalized WDVV equations for B_r and C_r pure N=2 Super-Yang-Mills theory

L.K. Hoevenaars, R. Martini

Abstract

A proof that the prepotential for pure N=2 Super-Yang-Mills theory associated with Lie algebras B_r and C_r satisfies the generalized WDVV (Witten-Dijkgraaf-Verlinde-Verlinde) system was given by Marshakov, Mironov and Morozov. Among other things, they use an associative algebra of holomorphic differentials. Later, Ito and Yang used a different approach to try to accomplish the same result, but they encountered objects of which it is unclear whether they form structure constants of an associative algebra. We show by explicit calculation that these objects are none other than the structure constants of the algebra of holomorphic differentials.

1 Introduction

In 1994, Seiberg and Witten [1] solved the low energy behaviour of pure N=2 Super-Yang-Mills theory by giving the solution of the prepotential F. The essential ingredients in their construction are a family of Riemann surfaces Σ, a meromorphic differential λ_SW on it, and the definition of the prepotential in terms of period integrals of λ_SW:

a_i = ∮_{A_i} λ_SW,    F_j := ∂F/∂a_j = ∮_{B_j} λ_SW,    F_ijk = ∂³F/(∂a_i ∂a_j ∂a_k).

Moreover, it was shown that the full prepotential for simple Lie algebras of type A, B, C, D [8] and type E [9] and F [10] satisfies this generalized WDVV system.¹ The approach used by Ito and Yang in [9] differs from the other two, due to the type of associative algebra that is being used: they use the Landau-Ginzburg chiral ring, while the others use an algebra of holomorphic differentials. For the A, D, E cases this difference in approach is negligible, since the two different types of algebras are isomorphic. For the Lie algebras of B, C type this is not the case, and this leads to some problems. The present article deals with these problems and shows that the proper algebra to use is the one suggested in [8]. A survey of these matters, as well as the results of the present paper, can be found in the internal publication [11]. This paper is outlined
as follows: in the first section we will review Ito and Yang's method for the A, D, E Lie algebras. In the second section their approach to B, C Lie algebras is discussed. Finally, in section three we show that Ito and Yang's construction naturally leads to the algebra of holomorphic differentials used in [8].

2 A review of the simply laced case

In this section, we will describe the proof in [9] that the prepotential of 4-dimensional pure N=2 SYM theory with Lie algebra of simply laced (ADE) type satisfies the generalized WDVV system. The Seiberg-Witten data [1], [12], [13] consists of:

• a family of Riemann surfaces Σ of genus g given by

z + µ/z = W(x)   (2.1)

• a meromorphic differential λ_SW   (2.2)

which has the property that ∂F_j/∂a_i is symmetric. This implies that F_j can be thought of as a gradient, which leads to the following

Definition 1. The prepotential is a function F(a_1, ..., a_r) such that F_j = ∂F/∂a_j.

Definition 2. Let f : C^r → C; then the generalized WDVV system [4], [5] for f is

f_i K⁻¹ f_j = f_j K⁻¹ f_i   ∀ i, j ∈ {1, ..., r}   (2.5)

where the f_i are matrices with entries (f_i)_jk = ∂³f(a_1, ..., a_r)/(∂a_i ∂a_j ∂a_k).

The rest of the proof deals with a discussion of the conditions 1-3. It is well-known [14] that the right hand side of (2.1) equals the Landau-Ginzburg superpotential associated with the corresponding Lie algebra. Using this connection, we can define the primary fields

φ_i(u) := −∂W/∂u_i mod (∂W/∂x)   (2.10)

Instead of using the u_i as coordinates on the part of the moduli space we're interested in, we want to use the a_i. For the chiral ring this implies that in the new coordinates

(−∂W/∂a_i)(−∂W/∂a_j) = (∂u_x/∂a_i)(∂u_y/∂a_j) C^z_xy(u) (∂a_k/∂u_z) (−∂W/∂a_k) mod (∂W/∂x)   (2.11)

(summation over repeated indices), which again is an associative algebra, but with different structure constants C^k_ij(a). This is the algebra we will use in the rest of the proof. For the relation (2.7) we turn to another aspect of Landau-Ginzburg theory: the Picard-Fuchs equations (see e.g. [15] and references therein). These form a coupled set of first order partial differential equations which express how the integrals of holomorphic differentials over homology cycles of a Riemann surface in a family depend on the
moduli.

Definition 6. Flat coordinates of the Landau-Ginzburg theory are a set of coordinates {t_i} on moduli space such that the primary fields φ_i(t) := −∂W/∂t_i satisfy

φ_i(t) φ_j(t) = C^k_ij(t) φ_k(t) + Q_ij ∂W/∂x   (2.12)

where Q_ij is a polynomial in x. In flat coordinates the Picard-Fuchs equations relate the period integrals ∮_Γ ∂λ_SW/∂t_i, ∮_Γ ∂λ_SW/∂t_k, ∮_Γ ∂λ_SW/∂a_i and ∮_Γ ∂λ_SW/∂a_l (2.15). Taking Γ = B_k we get

F_ijk = C^l_ij(a) K_kl   (2.16)

which is the intended relation (2.7). The only thing that is left to do is to prove that K_kl = ∂a_m

In conclusion, the most important ingredients in the proof are the chiral ring and the Picard-Fuchs equations. In the following sections we will show that in the case of B_r, C_r Lie algebras, the Picard-Fuchs equations can still play an important role, but the chiral ring should be replaced by the algebra of holomorphic differentials considered by the authors of [8]. These algebras are isomorphic to the chiral rings in the ADE cases, but not for Lie algebras B_r, C_r.

3 Ito & Yang's approach to B_r and C_r

In this section, we discuss the attempt made in [9] to generalize the contents of the previous section to the Lie algebras B_r, C_r. We will discuss only B_r, since the situation for C_r is completely analogous. The Riemann surfaces are given by

z + µ/z = W_BC(x)/x   (3.1)

where W_BC is the Landau-Ginzburg superpotential associated with the theory of type BC. From the superpotential we again construct the chiral ring in flat coordinates, where

φ_i(t) := −∂W_BC/∂t_i   (3.2)

However, the fact that the right-hand side of (3.1) does not equal the superpotential is reflected by the Picard-Fuchs equations, which no longer relate the third order derivatives of F with the structure constants C^k_ij(a). Instead, they read

F_ijk = C̃^l_ij(a) K_kl   (3.3)

where K_kl = ∂a_m and

C̃^k_ij(t) = C^k_ij(t) − Σ_{n,l} (2n t_n)/(2r−1) D^l_ij C̃^k_nl(t).   (3.4)

The D^l_ij are defined by

Q_ij = x D^l_ij φ_l   (3.5)

and we switched from C̃^k_ij(a) to C̃^k_ij(t) in order to compare these with the structure constants C^k_ij(t).
At this point, it is unknown² whether the C̃^k_ij(t) (and therefore the C̃^k_ij(a)) are structure constants of an associative algebra. This issue will be resolved in the next section.

4 The identification of the structure constants

The method of proof that is being used in [8] for the B_r, C_r case also involves an associative algebra. However, theirs is an algebra of holomorphic differentials, which is isomorphic to

φ_i(t) φ_j(t) = γ^k_ij(t) φ_k(t) mod (x ∂_x W_BC − W_BC)   (4.2)

² Except for rank 3 and 4, for which explicit calculations of C̃^k_ij(t) were made in [9].

We will rewrite it in such a way that it becomes of the form

φ_i(t) φ_j(t) = Σ_{k=1}^r C^k_ij(t) φ_k(t) + P_ij [x ∂_x W_BC − W_BC]   (4.3)

As a first step, we use (3.4):

φ_i φ_j = ( C_i·φ⃗ + D_i·φ⃗ x ∂_x W_BC )_j = ( (C_i − D_i·Σ_{n=1}^r (2n t_n)/(2r−1) C_n)·φ⃗ + D_i·φ⃗ x ∂_x W_BC )_j   (4.4)

The notation φ⃗ stands for the vector with components φ_k, and we used a matrix notation for the structure constants. The proof becomes somewhat technical, so let us first give a general outline of it. The strategy will be to get rid of the second term of (4.4) by cancelling it with part of the third term, since we want an algebra in which the first term gives the structure constants. For this cancelling we'll use equation (3.4) in combination with the following relation, which expresses the fact that W_BC is a graded function:

x ∂W_BC/∂x + Σ_{n=1}^r 2n t_n ∂W_BC/∂t_n = 2r W_BC   (4.5)

Cancelling is possible at the expense of introducing yet another term which then has to be cancelled, etcetera. This recursive process does come to an end however, and by performing it we automatically calculate modulo x ∂_x W_BC − W_BC instead of x ∂_x W_BC. We rewrite (4.4) by splitting up the third term and rewriting one part of it using (4.5):

( D_i·φ⃗ x ∂_x W_BC )_j = ( −(1/(2r−1)) D_i·φ⃗ x ∂_x W_BC )_j + ( (2r/(2r−1)) D_i·φ⃗ x ∂_x W_BC )_j   (4.6)

Now we use (4.2) to work out the product φ_k φ_n, and the result is:

φ_i φ_j = ( C_i·φ⃗ − (D_i/(2r−1))·Σ_{n=1}^r 2n t_n D_n·φ⃗ x ∂_x W_BC )_j + ( (2r D_i/(2r−1))·Σ_{n=1}^r 2n t_n ( −D_n·Σ_{m=1}^r (2m t_m)/(2r−1) )·φ⃗ )_j [x ∂_x W_BC − W_BC]   (4.8)

Note that by cancelling the one term, we automatically calculate modulo x ∂_x W_BC − W_BC. The expression between brackets in the first
line seems to spoil our achievement, but it doesn't: until now we rewrote

−D_i · Σ_{n=1}^r (2n t_n)/(2r−1) ( C_n·φ⃗ + D_n·φ⃗ x ∂_x W_BC )_j   (4.10)

This is a recursive process. If it stops at some point, then we get a multiplication structure

φ_i φ_j = Σ_{k=1}^r C^k_ij φ_k + P_ij (x ∂_x W_BC − W_BC)   (4.11)

for some polynomial P_ij, and the theorem is proven. To see that the process indeed stops, we refer to the lemma below.

Dividing by x and expanding in the φ_k, we have shown that D_i is nilpotent, since it is strictly upper triangular. Since

deg(φ_k) = 2r − 2k   (4.13)

we find that indeed for j ≥ k the degree of φ_k is bigger than the degree of Q_ij.

5 Conclusions and outlook

In this letter we have shown that the unknown quantities C̃^k_ij of [9] are none other than the structure constants of the algebra of holomorphic differentials introduced in [8]. Therefore this is the algebra that should be used, and not the Landau-Ginzburg chiral ring. However, the connection with Landau-Ginzburg can still be very useful, since the Picard-Fuchs equations may serve as an alternative to the residue formulas considered in [8].

References

[1] N. Seiberg and E. Witten, Nucl. Phys. B426, 19 (1994), hep-th/9407087.
[2] E. Witten, Two-dimensional gravity and intersection theory on moduli space, in Surveys in differential geometry (Cambridge, MA, 1990), pp. 243-310, Lehigh Univ., Bethlehem, PA, 1991.
[3] R. Dijkgraaf, H. Verlinde, and E. Verlinde, Nucl. Phys. B352, 59 (1991).
[4] G. Bonelli and M. Matone, Phys. Rev. Lett. 77, 4712 (1996), hep-th/9605090.
[5] A. Marshakov, A. Mironov, and A. Morozov, Phys. Lett. B389, 43 (1996), hep-th/9607109.
[6] R. Martini and P.K.H. Gragert, J. Nonlinear Math. Phys. 6, 1 (1999).
[7] A.P. Veselov, Phys. Lett. A261, 297 (1999), hep-th/9902142.
[8] A. Marshakov, A. Mironov, and A. Morozov, Int. J. Mod. Phys. A15, 1157 (2000), hep-th/9701123.
[9] K. Ito and S.-K. Yang, Phys. Lett. B433, 56 (1998), hep-th/9803126.
[10] L.K. Hoevenaars, P.H.M. Kersten, and R. Martini, (2000), hep-th/0012133.
[11] L.K. Hoevenaars and R. Martini, (2000), int. publ. 1529, www.math.utwente.nl/publications.
[12] A. Gorsky, I. Krichever, A. Marshakov, A. Mironov, and
A. Morozov, Phys. Lett. B355, 466 (1995), hep-th/9505035.
[13] E. Martinec and N. Warner, Nucl. Phys. B459, 97 (1996), hep-th/9509161.
[14] A. Klemm, W. Lerche, S. Yankielowicz, and S. Theisen, Phys. Lett. B344, 169 (1995), hep-th/9411048.
[15] W. Lerche, D.J. Smit, and N.P. Warner, Nucl. Phys. B372, 87 (1992), hep-th/9108013.
[16] K. Ito and S.-K. Yang, Phys. Lett. B415, 45 (1997), hep-th/9708017.
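For reference, the generalized WDVV system at the center of this letter can be stated compactly; the characterization of K as an invertible linear combination of the matrices F_k follows [5] (this restatement is a summary, not a display from the original letter):

```latex
F_i\,K^{-1}\,F_j \;=\; F_j\,K^{-1}\,F_i \qquad \forall\, i,j \in \{1,\dots,r\},
\qquad
(F_i)_{jk} \;=\; \frac{\partial^3 F}{\partial a_i\,\partial a_j\,\partial a_k},
\qquad
K \;=\; \sum_{k=1}^{r} \eta_k F_k ,
```

for arbitrary constants η_k such that K is invertible.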

Legendre Polynomials

∂²V/∂z² = (3z² − r²)/r⁵ = (3cos²θ − 1)/r³

∂³V/∂z³ = −(15z³ − 9zr²)/r⁷ = −(15cos³θ − 9cosθ)/r⁴

We notice that ∂ⁿV/∂zⁿ (with V = 1/r) is an n-th degree polynomial of cos θ divided by r^{n+1}. One formula which combines these observations is the generating function given in the definition below.

Some equations relating (x, y, z) to (r, θ, φ) are

x = r sin θ cos φ   (4)
y = r sin θ sin φ   (5)
z = r cos θ   (6)
r² = x² + y² + z²   (7)

From (7) it follows that ∂r/∂z = z/r = cos θ.

Legendre Polynomials

• Introduced in 1784 by the French mathematician A. M. Legendre (1752-1833).
• We only study Legendre polynomials, which are special cases of Legendre functions. See Sections 4.3, 4.7, 4.8, and 4.9 of Kreyszig.

Definition:

1/√(x² + y² + (z − h)²) = 1/√(r² − 2rh cos θ + h²) = Σ_{n=0}^∞ Pₙ(cos θ) hⁿ / r^{n+1}
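The definition can be checked numerically: Bonnet's recurrence (k+1)P_{k+1}(x) = (2k+1)x P_k(x) − k P_{k−1}(x) generates the polynomials, and their partial sums should reproduce the generating function 1/√(1 − 2xh + h²) for small h. A small sketch with exact rational coefficients (the helper names are mine, not from the notes):

```python
from fractions import Fraction

def legendre(n):
    """Coefficients of P_n (lowest degree first), via Bonnet's recurrence
    (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)."""
    p_prev, p_curr = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0, P_1
    if n == 0:
        return p_prev
    for k in range(1, n):
        shifted = [Fraction(0)] + p_curr                  # x * P_k
        nxt = [Fraction(2 * k + 1) * c for c in shifted]  # (2k+1) x P_k
        for i, c in enumerate(p_prev):
            nxt[i] -= Fraction(k) * c                     # - k P_{k-1}
        p_prev, p_curr = p_curr, [c / (k + 1) for c in nxt]
    return p_curr

def eval_poly(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))
```

Truncating Σₙ Pₙ(x)hⁿ and comparing with 1/√(1 − 2xh + h²) for, say, x = 0.3 and h = 0.1 gives agreement to about machine precision, since the neglected tail is O(h¹³).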

PhysRevA.87.042115

Instituto de Física, Universidade Federal Fluminense, Av. Gal. Milton Tavares de Souza s/n, Gragoatá, 24210-346, Niterói, RJ, Brazil

(Received 20 March 2013; published 22 April 2013)

Geometric quantum discord is a well-defined measure of quantum correlation if the Schatten one-norm (trace norm) is adopted as a distance measure. Here, we analytically investigate the dynamical behavior of the one-norm geometric quantum discord under the effect of decoherence. By starting from arbitrary Bell-diagonal mixed states under Markovian local noise, we provide the decays of the quantum correlation as a function of the decoherence parameters. In particular, we show that the one-norm geometric discord exhibits the possibility of double sudden changes and freezing behavior during its evolution. For nontrivial Bell-diagonal states under simple Markovian channels, these are new features that are in contrast with the Schatten two-norm (Hilbert-Schmidt) geometric discord. The necessary and sufficient conditions for double sudden changes as well as their exact locations in terms of decoherence probabilities are provided. Moreover, we illustrate our results by investigating decoherence in quantum spin chains in the thermodynamic limit.

DOI: 10.1103/PhysRevA.87.042115   PACS number(s): 03.65.Ud, 03.67.Mn, 75.10.Jm
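For Bell-diagonal states ρ = ¼(I⊗I + Σᵢ cᵢ σᵢ⊗σᵢ), the closed form reported for the one-norm geometric discord is the intermediate value of |c₁|, |c₂|, |c₃|; a sudden change occurs where the ordering of these moduli switches as the cᵢ decay. A minimal sketch (the phase-flip parametrization below is one common convention, assumed here rather than quoted from the paper):

```python
def trace_norm_discord(c1, c2, c3):
    """One-norm geometric discord of a Bell-diagonal state with
    correlation vector (c1, c2, c3): the intermediate of the moduli."""
    return sorted([abs(c1), abs(c2), abs(c3)])[1]

def evolve_phase_flip(c, p):
    """Bell-diagonal correlations under local phase-flip noise with
    decoherence probability p on each qubit (assumed convention):
    c1 and c2 decay as (1-p)^2 while c3 is left untouched."""
    c1, c2, c3 = c
    return (c1 * (1 - p) ** 2, c2 * (1 - p) ** 2, c3)
```

Scanning p from 0 to 1 and tracking which modulus is the intermediate one reproduces the sudden-change (and, for suitable initial states, freezing) behavior described in the abstract.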

Bode and Fano Impedance Matching


Weak Interactions of Light Flavours

2 Standard Model

The Standard Model Lagrangian has four parts:

L_SM = L_H(φ) + L_G(W, Z, G) + Σ_{ψ=fermions} ψ̄ iD̸ψ + Σ_{ψ,ψ′=fermions} g_{ψψ′} ψ̄φψ′
       (Higgs)   (Gauge)        (fermion kinetic)            (Yukawa)

QCD and QED conserve C, P, T separately. Local field theory by itself implies CPT. The fermion and Higgs² part of the SM Lagrangian conserves CP and T as well. The only part that violates CP, and as a consequence also T, is the Yukawa part. The Higgs part is responsible for two parameters, the gauge part for three, and the Higgs-fermion part contains in principle 27 complex parameters, neglecting Yukawa couplings to neutrinos. Luckily most of the 54 real parameters in the Yukawa sector are unobservable. After diagonalizing the lepton sector only the three charged lepton masses remain. The quark sector can be similarly diagonalized, leading to 6 quark masses, but some parts remain in the difference between weak interaction eigenstates and mass eigenstates. The latter is conventionally put in the couplings of the charged W boson, which is given by

−(g/(2√2)) W_µ ( ū_α  c̄_α  t̄_α ) γ^µ (1 − γ₅) [ Vud Vus Vub ; Vcd Vcs Vcb ; Vtd Vts Vtb ] ( d_α ; s_α ; b_α )
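The CKM matrix appearing in this charged-current coupling is nearly the identity, with a strong hierarchy between its off-diagonal entries. A quick numerical illustration via the Wolfenstein parametrization (the parameter values below are approximate standard inputs of my choosing, not taken from this text, and the parametrization is unitary only up to O(λ⁴)):

```python
import numpy as np

lam, A, rho, eta = 0.225, 0.82, 0.14, 0.35  # approximate Wolfenstein parameters

# CKM matrix to O(lambda^3) in the Wolfenstein expansion.
V = np.array([
    [1 - lam**2 / 2,                    lam,             A * lam**3 * (rho - 1j * eta)],
    [-lam,                              1 - lam**2 / 2,  A * lam**2                   ],
    [A * lam**3 * (1 - rho - 1j * eta), -A * lam**2,     1.0                          ],
])

# Deviation from unitarity; should be of order lambda^4 ~ 2.6e-3.
unitarity_defect = np.abs(V.conj().T @ V - np.eye(3)).max()
```

The hierarchy |Vus| ≫ |Vub| is immediate from the powers of λ in each entry.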

4. Brownian Motion and Itô's Formula


Application of the Central Limit Theorem
Consider the limit S^∆(t)/√t as ∆ → 0. For t_k ≤ t ≤ t_{k+1},

S^∆(t)/√t = (1/√t) [ ((t − t_k)/∆) S^∆_{k+1} + ((t_{k+1} − t)/∆) S^∆_k ]
          = (1/√t) [ ((t_{k+1} − t)/∆) √∆ Σ_{i=1}^{k} R_i + ((t − t_k)/∆) √∆ Σ_{i=1}^{k+1} R_i ]  →  X,

where, by the central limit theorem, X ~ N(0, 1).
Substituting these values into the original equation completes the proof of the lemma.
Geometric Brownian Motion
δ := (S*_{t+∆t} − S*_t) / S*_t = σ √∆t R(ω)

By Taylor expansion, ln(1 + δ) = δ − δ²/2 + O(δ³); neglecting the higher-order terms of ∆t, we have

ln(S*_{t+∆t} / S*_t) = σ √∆t R(ω) − (1/2) σ² ∆t

Brownian Motion (cont.)

S^∆(t) = S^∆_k for t = t_k, and
S^∆(t) = ((t − t_k)/∆) S^∆_{k+1} + ((t_{k+1} − t)/∆) S^∆_k for t_k ≤ t ≤ t_{k+1}.
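The log-return relation ln(S*_{t+∆t}/S*_t) = σ√∆t R(ω) − σ²∆t/2 can be checked by simulation: summing the increments over a horizon T gives a terminal log-price with mean close to −σ²T/2 and variance close to σ²T. A sketch (all parameter values are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

sigma, T, steps, paths = 0.3, 1.0, 250, 20000
dt = T / steps

# R takes the values +/-1 with equal probability, as in the binomial walk.
R = rng.choice([-1.0, 1.0], size=(paths, steps))
log_increments = sigma * np.sqrt(dt) * R - 0.5 * sigma**2 * dt
log_ST = log_increments.sum(axis=1)     # ln(S_T / S_0) along each path

mean_log = log_ST.mean()                # expect about -sigma^2 T / 2
var_log = log_ST.var()                  # expect about sigma^2 T
```

Exponentiating the cumulative sums gives sample paths of geometric Brownian motion in the small-∆t limit.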

The fine gradings of sl(3,C) and their symmetries


Introduction
Admissible gradings of a simple Lie algebra L over the complex or real number field are basic structural properties of each L. Examples of exploitation of coarse gradings like Z₂ are easy to find in the physics literature. Here we are interested in the opposite extreme: the fine gradings of L, which only recently were described [8,9]. Our aim is to point out the interesting symmetries of the fine gradings on the example of sl(3, C). Such symmetries can be used in the study of graded contractions of L. Indeed, they are the symmetries of the system of quadratic equations for the contraction parameters. Graded contractions are a systematic way of forming from L a family of equidimensional Lie algebras which are not isomorphic to L. An insight into such parameter-dependent families of Lie algebras provides a group theoretical tool for investigating relations between different physical theories through their symmetries. In this contribution we consider finite dimensional L and announce specific results for sl(3, C) only.

The decomposition Γ : L = ⊕_{i∈I} L_i is called a grading if, for any pair of indices i, j ∈ I, there exists an index k ∈ I such that 0 ≠ [L_i, L_j] ⊆ L_k. There are infinitely many gradings of a given Lie algebra. We do not need to distinguish those gradings which can be transformed into each other. More precisely, if Γ : L = ⊕_{i∈I} L_i is a grading and g is an automorphism of L, then Γ̃ : L = ⊕_{i∈I} g(L_i) is also a grading. Two such gradings are called equivalent. A grading Γ : L = ⊕_{i∈I} L_i is a refinement of the grading Γ′ : L = ⊕_{j∈J} L′_j if for any i ∈ I there exists j ∈ J such that L_i ⊆ L′_j. A grading which cannot be properly refined is called fine. For construction of a grading of L, one can use any diagonalizable automorphism g from Aut L. It is easy to see that the decomposition of L into eigenspaces of g is a
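One concrete fine grading of sl(3, C) obtained from diagonalizable automorphisms is the Pauli (generalized Pauli matrix) grading, in which eight one-dimensional subspaces are spanned by products of the clock and shift matrices and the index group is Z₃ × Z₃. The check below is my own numerical illustration of the grading property, not part of the contribution:

```python
import numpy as np

w = np.exp(2j * np.pi / 3)
P = np.diag([1, w, w**2])                           # clock matrix
Q = np.roll(np.eye(3), 1, axis=0).astype(complex)   # shift matrix, Q e_j = e_{j+1}

# Z_3 x Z_3 grading of sl(3,C): L_(a,b) = span{ Q^a P^b }, (a,b) != (0,0).
basis = {(a, b): np.linalg.matrix_power(Q, a) @ np.linalg.matrix_power(P, b)
         for a in range(3) for b in range(3) if (a, b) != (0, 0)}

def comm(X, Y):
    return X @ Y - Y @ X

# Grading property: [L_(a,b), L_(c,d)] lies in L_(a+c, b+d) (indices mod 3),
# and vanishes when (a+c, b+d) = (0, 0), since sl(3,C) has no multiple of I.
ok = True
for (a, b), X in basis.items():
    for (c, d), Y in basis.items():
        Z = comm(X, Y)
        k = ((a + c) % 3, (b + d) % 3)
        if k == (0, 0):
            ok = ok and np.allclose(Z, 0)
        else:
            B = basis[k]
            coeff = (Z * B.conj()).sum() / (B * B.conj()).sum()
            ok = ok and np.allclose(Z, coeff * B)
```

Each eigenspace here is a joint eigenspace of the diagonalizable automorphisms Ad_P and Ad_Q, which is exactly the eigenspace construction described in the text.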