
Complexity results for triangular sets


Journal of Symbolic Computation 36 (2003) 555–594

Complexity results for triangular sets

Éric Schost
Laboratoire GAGE, École Polytechnique, 91128 Palaiseau Cedex, France

Received 14 November 2002; accepted 19 March 2003

Abstract

We study the representation of the solutions of a polynomial system by triangular sets, and concentrate on the positive-dimensional case. We reduce to dimension zero by placing the free variables in the base field, so the solutions can be represented by triangular sets with coefficients in a rational function field. We give intrinsic-type bounds on the degree of the coefficients in such a triangular set, and on the degree of an associated degeneracy hypersurface. Then we show how to apply lifting techniques in this context, and point out the role played by the evaluation properties of the input system. Our algorithms are implemented in Magma; we present three applications, relevant to geometry and number theory. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Triangular sets; Complexity; Symbolic Newton operator

1. Introduction

This paper studies the triangular representation of the solutions of a polynomial system. Our first focus is on complexity results and algorithms; we also present a series of applications that were treated with these techniques. To make things clear, let us first display a concrete example of a triangular set.

An example in Q[X1, X2]. Consider the polynomial system in Q[X1, X2]:

F1 = −X1^3 X2 + 2X1^2 − 4X1 X2^2 + 2X1 X2 − 2,
F2 = X1^2 X2 − X1 + 4X2^2 − 2X2.

It admits the following Gröbner basis for the lexicographic order X1 < X2:

T1 = X1^2 − 2,   T2 = X2^2 − (1/4) X1.

Since T1 is in Q[X1] and T2 in Q[X1, X2], we say that (T1, T2) form a triangular set. In particular, T1 describes the projection of the zero-set of (F1, F2) on the X1-axis. From the field-theoretic point of view, the system (F1, F2) generates a prime zero-dimensional ideal, so Q → B := Q[X1, X2]/(F1, F2) defines a field extension. We let x1, x2 be the images of X1, X2 in B; then T1 is the minimal polynomial of x1 in Q → B and T2, seen in Q(x1)[X2], is the minimal polynomial of x2 in Q(x1) → B.
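As a quick check of this example, the same lexicographic Gröbner basis can be recomputed with SymPy. This is a sketch of ours, not from the paper (whose implementation is in Magma):

```python
# Recompute the lex Groebner basis of the introductory example with SymPy.
# Illustrative sketch; the paper's own implementation is in Magma.
from sympy import symbols, groebner

X1, X2 = symbols('X1 X2')

F1 = -X1**3*X2 + 2*X1**2 - 4*X1*X2**2 + 2*X1*X2 - 2
F2 = X1**2*X2 - X1 + 4*X2**2 - 2*X2

# Lexicographic order with X1 < X2 means X2 is eliminated first;
# SymPy orders the generators from greatest to smallest.
G = groebner([F1, F2], X2, X1, order='lex')
print(G.exprs)  # expected: [X2**2 - X1/4, X1**2 - 2]

# T1 lies in Q[X1]; T2 lies in Q[X1, X2] and is monic in X2: a triangular set.
```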
Generalization and first complexity considerations. Consider now an arbitrary field K, K̄ its algebraic closure, and a zero-dimensional variety W ⊂ A^n(K̄) defined over K. For simplicity, we take W irreducible over K; then just as above, the ideal defining W admits the following Gröbner basis for the lexicographic order X1 < ··· < Xn:

T1(X1), T2(X1, X2), ..., Tn(X1, ..., Xn),

with Tk in K[X1, ..., Xk], and monic in Xk, for k ≤ n. We will use this as an intuitive definition of a triangular set for the rest of this informal introduction. Note that if W is not irreducible, its defining ideal might not have such a triangular family of generators: several triangular sets may be necessary.

For k ≤ n, the family T1, ..., Tk describes the projection of W on the affine subspace of coordinates X1, ..., Xk. In particular, as above, T1 is the minimal polynomial of X1 modulo the ideal defining W. This close link between projections and triangular representations is central in what follows.

Let us turn to complexity considerations. The product of the degrees of the polynomials Tk in their "main variable", ∏_{k≤n} deg_{Xk} Tk, equals the number of points in W, and bounds the total degree of each polynomial Tk. Thus, in terms of degrees in the variables X1, ..., Xn, there is not much more to say. New questions arise when the base field K is endowed with a "size" function: if K is a rational function field, we may consider the degree of its elements; if K is a number field, we can talk about the height of its elements. In this context, it becomes natural to ask how the size of the coefficients in T1, ..., Tn relates to some invariants measuring the "complexity" of the variety W. In view of the above remarks, a more accurate question is actually, for k ≤ n, the relation between the size of the coefficients in T1, ..., Tk and the complexity of the projection of W on the subspace of coordinates X1, ..., Xk. In this paper, we focus on this question in the function field case. Here is the concrete situation from where the question originates.

Polynomial systems with parameters. A variety of problems can be described by polynomial systems involving free variables, or parameters. In such situations, we also often know that there are only finitely many solutions for a generic choice of the parameters. In other words, we are considering systems that are zero-dimensional over the field of rational functions on some parameter space; triangular sets with rational function coefficients can then be used to represent their solutions. The following applications motivated this approach; they are detailed in Section 8.

• Modular equations. In Gaudry and Schost (2002), we propose a definition of modular equations for hyperelliptic curves, with a view towards point-counting applications. For a given curve, these equations come from the resolution of zero-dimensional polynomial systems, as the minimal polynomial of one of the unknowns. Thus, they can be obtained from a triangular set computation, as in the introductory example. An interesting question is that of modular equations for a curve with generic coefficients, which can be precomputed and stored in a database. This was already done in the elliptic case, and is now done for a first hyperelliptic modular equation in the Magma package CrvHyp. This naturally raises the question of triangular sets with coefficients in a rational function field.

• Curves with split Jacobian. Curves of genus 2 with (2,2)-split Jacobian are of interest in number theory: over Q, torsion, rank and cardinality records are obtained for such curves, see Kulesz (1995, 1999) and Howe et al. (2000). Roughly speaking, these curves are characterized by the presence of elliptic quotients of degree 2 of their Jacobian. We studied such curves in Gaudry and Schost (2001), and showed that the elliptic quotients can be read off triangular sets coming from the resolution of a suitable polynomial system. Classification questions require treating this question for curves with generic coefficients, which leads again to the problem of computing triangular sets over a rational function field.

• Implicitization. Finally, we will show that the implicit equation of a parametrized surface in R^3 can be obtained using the triangular representation. Contrary to the above, this question is not a priori formalized in terms of a parametric system. Nevertheless, it actually reduces to the computation of a minimal polynomial over the rational function field Q(x1, x2), which can be done using triangular sets.

These examples share the following property: only partial information, such as a specific eliminating polynomial, is really wanted. We now see how triangular sets can answer this question with good complexity.
Overview of our results. The above discussion is formalized as follows: we consider a polynomial system F defined over a field K, depending on m parameters P1, ..., Pm and n unknowns X1, ..., Xn. Geometrically speaking, F defines a variety W of dimension m in A^{m+n}(K̄) and generates a zero-dimensional ideal when extended over the field of rational functions on A^m(K̄). Then its "generic solutions" can be represented by a family of triangular sets with coefficients in this rational function field. For this short overview, we assume that the generic solutions are represented by a single triangular set T1, ..., Tn. Using additional regularity hypotheses, we will answer the following questions: How do the degrees in this triangular set relate to geometric degrees? How accurately does this triangular set describe the solutions of the parametric system F? How fast can it be computed?

• Degree bounds. The coefficients of T1, ..., Tn are rational functions in the free variables P1, ..., Pm. We first show that their degrees are bounded by intrinsic geometric degrees, that is, independently of the Bézout number of the system F. Precisely, for k ≤ n, the coefficients of T1, ..., Tk have degree bounded in terms only of the degree of the projection Wk of W on the space of coordinates P1, ..., Pm, X1, ..., Xk. The precise bound is of order (deg Wk)^k.

• Geometric degree of the degeneracy locus. A triangular set with coefficients in a rational function field describes generic solutions. Thus, there is an open subset in the parameter space where none of the denominators of these rational functions vanishes, and where their specialization gives a description of the solutions of the parametric system F. We show that the locus where the specialization fails is contained in a hypersurface whose degree is quadratic in the geometric degree of W. Note the difference with the above degree bounds, which are not polynomial in this degree. The analysis of the probabilistic aspects of our algorithms is based on this result.

• Algorithms. Triangular sets are useful for structured problems. For instance, all the above examples can be reduced to the computation of the first k polynomials T1, ..., Tk, for some k ≤ n. We give probabilistic algorithms for computing these polynomials, whose complexity is polynomial in the size of the output. Using the above upper bound, the complexity actually depends on the degree of the projection Wk of W on the space of coordinates P1, ..., Pm, X1, ..., Xk, but not on the degree of W itself. Note nevertheless that our complexity results comprise an additional factor which is exponential in n, inherent to computations with triangular sets. Following the series of papers Giusti et al. (1995, 1997, 1998), Heintz et al. (2000), Giusti et al. (2001) and Heintz et al. (2001), our algorithms rely on symbolic Newton lifting techniques and the straight-line program representation of polynomials. Their practical behaviour matches their good complexity, as they enabled us to solve problems that were otherwise out of reach.

Comparison with primitive element techniques. This work is in the continuation of Schost (2003), which focuses on a representation by primitive element techniques, the geometric resolution, in a similar context. Caution must be taken when comparing the two approaches. They answer different questions; as such, their complexities cannot be compared directly, since they are stated in terms of different quantities. We use again the above notation: the geometric object of interest is a variety W defined by polynomials in K[P1, ..., Pm, X1, ..., Xn], and for k ≤ n, Wk is its projection on the space of coordinates
P1, ..., Pm, X1, ..., Xk. The degree bound on the coefficients in a geometric resolution is linear in the degree of W. This is to be compared with the results for the triangular representation, which are not polynomial in this degree. On the other hand, triangular sets take into account the degrees of the successive projections Wk, which cannot be reached using a primitive element. These degrees can be arbitrarily smaller than the degree of W, which is what makes the triangular representation attractive.

Consider now the algorithmic aspect. The algorithm in Schost (2003) computes a parametric geometric resolution with a complexity that depends on the degree of W. The algorithms proposed here compute k polynomials T1, ..., Tk, for any given k ≤ n; their complexity depends on the degree of the corresponding projection Wk of W on the space of coordinates (P1, ..., Pm, X1, ..., Xk), but not on the degree of W. Again, this suggests that triangular sets are of interest for problems with a structure, where projections might induce degree drops. We refer to Section 8 for a practical confirmation on several applications.

Related work. In dimension zero, a landmark paper for the triangular representation is Lazard (1992). Our definition of triangular sets is inspired by the one given there, as is the treatment of more technical questions such as splitting and combining triangular sets. In arbitrary dimension, several notions of triangular sets and algorithms exist, see Lazard (1991), Kalkbrener (1991), Maza (1997), Aubry (1999), Dellière (1999) and Szanto (1999). For a comparison of some of these approaches, see Aubry et al. (1999); we also refer to the detailed survey of Hubert (in press). Our choice to reduce the question to dimension zero over a field of rational functions yields algorithms with good complexity, which are easy to implement. Yet, our output is not as strong as, for instance, that of Lazard (1991), Maza (1997) and Dellière (1999): ours is only generically valid.

Upper bounds on the degrees of the polynomials in a triangular set were given in Gallo and Mishra (1990) and Szanto (1999); we recall these results in the next section.
In particular, the approach of Gallo and Mishra (1990) inspired Theorem 1 below. We also use results from Schost (2003), which follow notably Sabia and Solernó (1996).

Lifting techniques for polynomial systems were introduced in Trinks (1985) and Winkler (1988). They were used again in the series of papers by Giusti, Heintz, Pardo and collaborators (Giusti et al., 1995, 1997, 1998, 2001; Heintz et al., 2000, 2001). The joint use of the straight-line program representation led there to algorithms with the best known complexity for primitive element representations. The present work is in the continuation of the above; see also the survey of Pardo (1995) for a historical presentation of the use of straight-line programs in elimination theory. Finally, let us mention the results of Lecerf (2002), which extend lifting techniques to situations with multiplicities.

We note that the paper Heintz et al. (2000) precedes Schost (2003) and the present work, and considers similar questions of parametric systems. Nevertheless, we noted in Schost (2003) that the geometric hypotheses made in that paper are not satisfied in many "real life" applications, and this is again the case for the applications treated here.

It should be noted that our complexity statements are of an arithmetic nature, that is, we only estimate the number of base field operations. When the base field is the rational field, the notion of binary complexity will give a better description of the expected computation time. We have not developed this aspect, which requires arithmetic–geometric considerations. We refer to Krick and Pardo (1996), Giusti et al. (1997) and Krick et al. (2001) where such ideas are presented.

This work is based on a shorter version published in Schost (2002). The degree bounds given here are sharper. The whole analysis of the degeneracy locus and the subsequent error probability analyses for the algorithms are new. The complexity results are now precisely stated in terms of basic polynomial and power series arithmetic.

2. Notation, main results

Triangular sets in dimension zero. We first define triangular sets over a ring R. Our definition is directly inspired by that of reduced triangular sets given in Lazard (1992): a triangular set is a family of polynomials T = (T1, ..., Tn) in R[X1, ..., Xn] such that, for k ≤ n:

• Tk depends only on X1, ..., Xk,
• Tk is monic in Xk,
• Tk has degree in Xj less than the degree in Xj of Tj, for all j < k.

Let now K be a field, K̄ its algebraic closure and W ⊂ A^n(K̄) a zero-dimensional variety. Recall that W is defined over K if its defining ideal in K̄[X1, ..., Xn] is generated by polynomials in K[X1, ..., Xn]. In this case, a family {T^1, ..., T^J} of triangular sets with coefficients in K represents the points of W if the radical ideal defining W in K[X1, ..., Xn] is the intersection of the ideals generated by T^1, ..., T^J, and if for j ≠ j′, T^j and T^{j′} have no common zero. In this situation, all ideals (T^j) are radical by the Chinese remainder theorem. We then relate the degrees of the polynomials in the family {T^1, ..., T^J} and the cardinality of W:

• If W is irreducible, the family {T^1, ..., T^J} is actually reduced to a single triangular set T = (T1, ..., Tn) and the product ∏_{k≤n} deg_{Xk} Tk is the cardinality of W. Here, deg_{Xk} Tk denotes the degree of Tk in the variable Xk.

• If W is not irreducible, a family {T^1, ..., T^J} satisfying our conditions exists but is not unique (Lazard, 1992, Proposition 2 and Remark 1); now the sum ∑_{j≤J} ∏_{k≤n} deg_{Xk} T^j_k is the cardinality of W.
Hereafter, note that the superscript in the notation T^j_k does not denote a jth power. Note that it is necessary to work over the algebraically closed field K̄, or more generally to impose separability conditions, to obtain equalities as above, relating the degrees in the triangular sets T or {T^1, ..., T^J} and the number of points in the variety W.

The basic geometric setting. We now turn to more geometric considerations. Throughout this paper, we fix a field K, K̄ its algebraic closure, and work in the affine space A^{m+n}(K̄). We denote by P = P1, ..., Pm the first m coordinates in A^{m+n}(K̄) and by X = X1, ..., Xn the last n coordinates. We use the notion of geometric degree of an arbitrary affine variety (not necessarily irreducible, nor even equidimensional), introduced in Heintz (1983).

In what follows, the affine space A^{m+n}(K̄) is endowed with two families of projections. For k ≤ n, we define µk and πk as follows; hereafter, p denotes a point in A^m(K̄):

µk: A^{m+n}(K̄) → A^{m+k}(K̄),   (p, x1, ..., xn) → (p, x1, ..., xk),
πk: A^{m+k}(K̄) → A^m(K̄),   (p, x1, ..., xk) → p.

Note in particular that πn maps the whole space A^{m+n}(K̄) to A^m(K̄). The main geometric object is an m-dimensional variety W ⊂ A^{m+n}(K̄). Our first results are of an intrinsic nature, so we do not need an explicit reference to a defining polynomial system. The assumptions on W follow the description made in the introduction:

Assumption 1. Let {Wj}_{j≤J} denote the irreducible components of W. We assume that for j ≤ J:
(1) The image πn(Wj) is dense in A^m(K̄).
(2) The extension K(P1, ..., Pm) → K̄(Wj) is separable.

Assumption 1(1) implies that the fibres of the restriction of πn to each component of W are generically finite; this justifies treating the first m coordinates as distinguished variables and calling them parameters. Assumption 1(2) is of a more technical nature, and will help to avoid many difficulties; it is always satisfied in characteristic zero.

Under Assumption 1, we can define the generic solutions of the variety W. Let J ⊂ K[P, X] be the radical ideal defining W and J_P its extension in K(P)[X]. We call generic solutions of W the roots of J_P, which are in finite number. We now refer to the previous paragraph, taking K = K(P), and for W the finite set of generic solutions. Using Assumption 1(2), the ideal J_P remains radical in K̄(P)[X], so the generic solutions are indeed defined over K = K(P). Thus, they can be represented by a family of triangular sets in K(P)[X]; our purpose in this paper is to study their complexity properties, and provide algorithms to compute with them. Let us immediately note some particular cases:

• If W is irreducible, a single triangular set is enough to represent its generic solutions.
• If W is defined over K, it can be written W = ∪_{j≤J} Wj, where for all j, Wj is defined over K, and the defining ideal of Wj is prime in K[P, X]. Then the generic solutions of each Wj are represented by a triangular set in K(P)[X]; the generic solutions of W are represented by their reunion.

Projections of W. Before presenting the main results, we introduce some notation related to W and its successive projections. Let k be in 1, ..., n. First of all, we denote by X≤k the first k variables X1, ..., Xk; if T is a triangular set, T≤k is the sub-family T1, ..., Tk. We denote by Wk ⊂ A^{m+k}(K̄) the closure of µk(W), so in particular Wn coincides with W. It is a routine check that for all k, Wk satisfies Assumption 1 as well. Let Jk ⊂ K[P, X≤k] be the ideal defining Wk, and J_{P,k} its extension in K(P)[X≤k].
Under Assumption 1(1), J_{P,k} coincides with J_P ∩ K(P)[X≤k]. Thus, if the generic solutions of W are defined by a triangular set T, J_{P,k} is generated by T≤k. For p in A^m(K̄), we denote by Wk(p) the fibre πk^{-1}(p) ∩ Wk, and by Dk the generic cardinality of the fibres Wk(p). Finally, let Bk be the quotient K(P)[X≤k]/J_{P,k}; by Assumption 1(2), the extension K(P) → Bk is a product of separable field extensions. Using the separability, Bk has dimension Dk, by Proposition 1 in Heintz (1983).

Degree bounds. With this notation, we now present our main results. We assume that the generic solutions of W are represented by a triangular set T = (T1, ..., Tn) in K(P)[X]. In view of the above remarks, this is not a strong limitation: if this assumption is not satisfied, as soon as W is defined over K, the following upper bounds apply to all the K-defined irreducible components of W.

As mentioned in the preamble, the degree bounds of T in the X variables are easily dealt with: for all k ≤ n, the product ∏_{i≤k} deg_{Xi} Ti is the dimension of Bk over K(P), that is, the generic cardinality Dk of the fibres Wk(p). We will thus concentrate on the dependence with respect to the P variables.

For k ≤ n, the polynomial Tk depends only on the variables X1, ..., Xk, and has coefficients in K(P) = K(P1, ..., Pm). It is then natural to relate the degrees of these coefficients to the degree of the projection of W on the space of coordinates P1, ..., Pm, X1, ..., Xk, that is, Wk. This is the object of our first theorem. In all that follows, we call degree of a rational function the maximum of the degrees of its numerator and denominator.

Theorem 1. Let W be a variety satisfying Assumption 1, and suppose that the generic solutions of W are represented by a triangular set T in K(P)[X]. For k ≤ n, all coefficients in Tk have degree bounded by (2k^2 + 2)^k (deg Wk)^{2k+1}.

This result improves those of Gallo and Mishra (1990) and Szanto (1999) for respectively Ritt–Wu's and Kalkbrener's unmixed representations. If W is given as the zero-set of a system of n equations of degree d, then Gallo–Mishra's bound is 2n(8n)^{2n} d(d+1)^{4n^2} and Szanto's is d^{O(n^2)}. With this notation, the Bézout inequality (Theorem 1 in Heintz (1983)) implies that the degree of Wk is at most d^n for all k. Thus, according to Theorem 1, for k ≤ n, in a worst-case scenario the coefficients in the polynomial Tk have degree bounded by (2k^2 + 2)^k d^{2kn+n}. Hence the estimate is better for low indices k than for higher indices; this contrasts with the previous results, which gave the same bounds for all Tk. For the worst case k = n, our estimates are within the class d^{2n^2 + o(n^2)}, to be compared with Gallo and Mishra's bound of d^{4n^2 + o(n^2)}. All of these bounds are polynomial in d^{n^2}; we do not know if this is sharp.

More importantly, Theorem 1 reveals that the degrees of the coefficients of T are controlled by the intrinsic geometric quantities deg Wk, rather than by the degrees of a defining polynomial system. For instance, this indicates a good behaviour with respect to decomposition, e.g. into irreducibles. Also, these degrees may be bounded a priori: in the example presented in Section 8.3, the Bézout bound is 1024, but an estimate based on the semantics of the problem gives deg Wk ≤ 80.

Degree of the degeneracy locus. We still assume that the generic solutions of W are represented by a triangular set T = (T1, ..., Tn) in K(P)[X]. Since the coefficients of T are rational functions, there exists an open subset of the parameter space where they can be
specialized, and give a description of the fibres of πn. Theorem 2 below gives an upper bound on the degree of a hypersurface where this specialization fails.

Theorem 2. Let W be a variety satisfying Assumption 1, and suppose that the generic solutions of W are represented by a triangular set T in K(P)[X]. There exists a polynomial ∆W ∈ K[P] of degree at most (3n deg W + n^2) deg W such that, if p ∈ A^m(K̄) does not cancel ∆W:

(1) p cancels no denominator in the coefficients of (T1, ..., Tn). We denote by (t1, ..., tn) ⊂ K̄[X] these polynomials with coefficients specialized at p.
(2) (t1, ..., tn) is a radical ideal. Let Zn ⊂ A^n(K̄) be the zero-set of the polynomials (t1, ..., tn); then the fibre Wn(p) is {p} × Zn ⊂ A^{m+n}(K̄).

Just as in Theorem 1, this result is of an intrinsic nature, since it depends only on geometric quantities. Nevertheless, in strong contrast with the previous result, these bounds are polynomial in the geometric degree of W. In particular, Theorem 2 shows that the reunion of the zero-sets of all denominators of the coefficients of T is contained in a hypersurface of degree bounded polynomially in terms of the degree of W. Thus, the zero-set of any such denominator has degree bounded by the same quantity. Theorem 1 does not give such a polynomial bound for the degrees of the denominators. Were the upper bounds of Theorem 1 to be sharp, this would indicate that these denominators are (high) powers of polynomials of moderate degree.

Algorithms. The above results are purely geometric, and independent of any system of generators. For algorithmic considerations, we now assume that W is given as the zero-set of a polynomial system F = F1, ..., Fn in K[P, X]. We make the additional assumption that the Jacobian determinant with respect to X is invertible on a dense subset of W. Then Assumption 1 is satisfied, and we consider the problem of computing triangular sets that represent the generic solutions of W.

The underlying paradigm is that solving a zero-dimensional system over K by means of triangular sets is a well-solved task. Thus, the basic idea is first to specialize the indeterminates P in the system F, and solve the corresponding system in the remaining variables X, by means of triangular sets in K[X]. A lifting process then produces triangular sets with coefficients in a formal power series ring, from which we can recover the required information.
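To make the lifting step concrete, here is a minimal one-variable sketch of ours (not the paper's algorithm, which lifts entire triangular sets for systems given by straight-line programs): the symbolic Newton operator lifts a simple root of F at t = 0 to a truncated power series root, doubling the precision at each step.

```python
# Symbolic Newton lifting in one variable: lift a simple root x0 of F(X, t=0)
# to a power series root of F(X, t) mod t**prec. Toy illustration only.
from sympy import symbols, diff, Rational

t, X = symbols('t X')

def newton_lift(F, x0, prec):
    """Quadratically convergent lifting: precision doubles per Newton step."""
    dF = diff(F, X)
    x, k = Rational(x0), 1
    while k < prec:
        k = min(2 * k, prec)
        # one symbolic Newton step, truncated at order t**k
        x = (x - F.subs(X, x) / dF.subs(X, x)).series(t, 0, k).removeO()
    return x

# Example: lift the root X = 1 of X**2 - (1 + t) at t = 0,
# recovering the power series of sqrt(1 + t).
F = X**2 - (1 + t)
print(newton_lift(F, 1, 8))  # 1 + t/2 - t**2/8 + t**3/16 - ...
```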
Our first contribution treats the case when W is irreducible: its generic solutions are then represented by a single triangular set T = (T1, ..., Tn), and we propose a probabilistic algorithm that computes T1, ..., Tk for any k. If W is not irreducible, we compute the minimal polynomial of X1 modulo the extended ideal (F1, ..., Fn) in K(P)[X], using similar techniques. We do not treat the general question of computing a whole family of triangular sets when W is not irreducible. From the practical point of view, this might not be a strong restriction: our results cover all the applications that we had to treat.

We use the following complexity notations:

• We suppose that F is given by a straight-line program of size L, and that F1, ..., Fn have degree bounded by d.

• We say that f is in O_log(g) if there exists a constant a such that f is in O(g log(g)^a); this is sometimes also expressed by the notation f ∈ O~(g).

• M(D) denotes the cost of the multiplication of univariate polynomials of degree D, in terms of operations in the base ring. M(D) can be taken in O(D log D log log D), using the algorithm of Schönhage and Strassen (1971). We denote by C0 a universal constant such that for any ring R, any integer D and any monic polynomial T in R[X] of degree D, all operations (+, ×) in R[X]/(T) can be done in at most C0 M(D) operations, see Chapter 9 in von zur Gathen and Gerhard (1999). We assume that there exist constants C1 and α such that M(D)M(D′) ≤ C1 M(DD′) log(DD′)^α holds for all D, D′. This assumption is satisfied for all commonly used multiplication algorithms.

• M_s(D, M) denotes the cost of M-variate series multiplication at precision D. This can be taken less than M((2D+1)^M) using Kronecker's substitution. If the base field has characteristic zero, this complexity becomes linear in the size of the series, up to logarithmic factors; see Lecerf and Schost (in press, Theorem 1). We assume that there exists a constant C2 < 1 such that M_s(D, M) ≤ C2 M_s(2D, M) holds for all D and M. This is the case for all commonly used estimates, for instance for the ones mentioned above.

Apart from the above constants, the complexities below are stated in terms of the degrees D_k of the rational functions that appear in the output, and the number D_n. This number was defined earlier as the generic cardinality of the fibres Wn(p); it is thus the generic number of solutions of the parametric system F.

Theorem 3. Assume that W is irreducible. Let p, p′ be in K^m; assume that a description of the zeros of the systems F(p, X), F(p′, X) by triangular sets is known. For k ≤ n, let D_k be the maximum of the degrees of the coefficients of T1, ..., Tk. Then T1, ..., Tk can be computed within

O_log( (nL + n^3)(C0 C1)^n M(D_n) M_s(4D_k, m) + k m^2 D_n M(D_k) M_s(4D_k, m−1) )

operations in K. The algorithm chooses 3m − 1 values in K, including the coordinates of p and p′. If Γ is a subset of K, and these values are chosen in Γ^{3m−1}, then the algorithm fails for at most 50n(k^2 + 2)^{3k} d^{6kn+4n} |Γ|^{3m−2} choices.

Theorem 4. Let p, p′ be in K^m; assume that a description of the zeros of the systems F(p, X), F(p′, X) by triangular sets which define prime ideals in K[X] is known. Let M1 ∈ K(P)[U] be the minimal polynomial of X1 modulo the extended ideal (F1, ..., Fn) in K(P)[X], and D1 the maximum of the degrees of its coefficients. Then M1 can be computed within

O_log( (nL + n^3)(C0 C1)^n M(D_n) M_s(4D1, m) + m^2 D_n M(D1) M_s(4D1, m−1) )

operations in K. The algorithm chooses 3m − 1 values in K, including the coordinates of p and p′. If Γ is a subset of K, and these values are chosen in Γ^{3m−1}, then the algorithm fails for at most 50n d^{4n} |Γ|^{3m−2} choices.

These complexities are polynomial with respect to the possible number of monomials in the output. The exponential terms (C0 C1)^n reflect the cost of computing modulo a triangular set with n elements. Using Theorem 1, the above complexities are bounded in terms only of the degrees of the varieties Wk (for Theorem 3) and W1 (for Theorem 4). Triangular sets are thus useful when partial information is required: they avoid taking the whole degree of the variety W into account, as would be the case using primitive element techniques. Finally, note that we could give an alternative formulation for the estimates of probabilities. Referring for instance to Theorem 4, a probability of success greater than

SSD2 e-textbook, Unit 1

1.1 Overview of Computer Systems
This section provides a top-level view of the different components in a computer system. You will also gain a basic understanding of how a computer works through the interaction of its subcomponents.
The modern computer operates in a similar fashion. Input to a computer can be sent through the keyboard or mouse. The computer then processes the input, stores the result, and displays the result via the monitor, speaker, printer, or other output devices. For example, when you request a web page by typing in its URL (Uniform Resource Locator), "", the computer processes your input by fetching the requested page over the Internet. It then displays the fetched page on your monitor as output.

A computer system can be decomposed into the hardware system, the software system, and the network system. Each of these subsystems will be discussed in more detail in subsequent units of this course. The figure below illustrates the major subsystems in a computer system with some examples.

Branching functions of $A_{n-1}^{(1)}$ and Jantzen-Seitz problem for Ariki-Koike algebras



Introduction

Ariki and Koike [4] introduced an analogue of the Iwahori-Hecke algebra for the complex reflection group G(l, 1, m) = (Z/lZ) ≀ S_m. This algebra H_m(v) = H_m(v; u_0, ..., u_{l−1}) depends on l + 1 parameters v, u_0, ..., u_{l−1} and reduces to the group algebra of G(l, 1, m) when v = 1 and u_k = exp(2ikπ/l). Also, it generalises the Iwahori-Hecke algebras of the Coxeter groups of types A_{m−1} and B_m, which are obtained for l = 1 and l = 2 respectively. The algebra H_m(v) appeared independently in [5], where Hecke algebras of other types of complex reflection groups were also defined. When the parameters are generic, H_m(v) is semisimple and its simple modules S(λ) are labelled by l-tuples of partitions λ = (λ^{(0)}, ..., λ^{(l−1)}) such that |λ| := ∑_j |λ^{(j)}| = m. In this paper, we are concerned with the following choice of parameters. Fix an integer n ≥ 2, l integers 0 ≤ v_0 ≤ ... ≤ v_{l−1} < n, and set v = ζ = exp(2iπ/n), u_k = ζ^{v_k} (0 ≤ k < l).
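To make the labelling concrete, here is a small counting sketch of ours (purely illustrative, not from the paper): the number of l-tuples of partitions of total size m, i.e. the number of simple modules of H_m(v) in the generic semisimple case.

```python
# Count l-tuples of partitions ("l-multipartitions") of total size m, which
# label the simple modules S(lambda) of H_m(v) for generic parameters.
def partition_counts(m):
    """p(0..m): number of integer partitions, via the classic coin-style DP."""
    p = [1] + [0] * m
    for part in range(1, m + 1):
        for n in range(part, m + 1):
            p[n] += p[n - part]
    return p

def multipartitions(l, m):
    """Number of l-tuples of partitions with |lambda| = m (convolution DP)."""
    p = partition_counts(m)
    dp = [1] + [0] * m
    for _ in range(l):
        dp = [sum(dp[n - i] * p[i] for i in range(n + 1)) for n in range(m + 1)]
    return dp[m]

print(multipartitions(1, 4))  # 5: ordinary partitions of 4
print(multipartitions(2, 3))  # 10: bipartitions of total size 3
```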

First-Order Phase Transitions in Frustrated Spin Systems


arXiv:cond-mat/0204505v1 [cond-mat.str-el] 23 Apr 2002

First-Order Phase Transitions in Frustrated Spin Systems

Akihisa Koga, Akira Kawaguchi, Kouichi Okunishi* and Norio Kawakami

Department of Applied Physics, Osaka University, Osaka 565-0871
*Department of Physics, Niigata University, Igarashi 2, Niigata 950-2181

(Received )

We give a short review of our recent works on the first-order quantum phase transitions in frustrated spin chains with orthogonal-dimer structure. When the ratio of the competing antiferromagnetic exchange couplings is varied, a first-order transition occurs between the dimer phase and the plaquette phase, which is accompanied by a discontinuity in the spin excitation gap. We further show that strong frustration triggers phase transitions in a magnetic field, which exhibit plateaus and jumps in the magnetization curve. The hole-doping effect is also addressed for the orthogonal-dimer chain with linked-tetrahedra structure. It is found that the competing antiferromagnetic interactions result in a first-order metal-insulator transition upon hole doping.

§1. Introduction

Recently, antiferromagnetic quantum spin systems with strong frustration have been studied intensively. A typical example is SrCu2(BO3)2,1) where the spin-1/2 magnetic Cu ions are located on the orthogonal-dimer lattice (the so-called Shastry-Sutherland model).2) Various interesting phenomena were observed for this compound, such as dispersionless triplet excitations, correlated hopping of two magnons, and plateaus in the magnetization curve. It is suggested that novel properties in this compound are closely related to frustration due to the competing antiferromagnetic couplings.3),4) Besides such spin systems, metallic systems with frustration have also attracted much attention recently. For instance, it is suggested that the heavy-fermion behavior observed in LiV2O4 5) may be caused by geometrical frustration inherent in the pyrochlore-lattice structure.6)

One of the remarkable properties common in such frustrated systems is that strong frustration enhances the competition among several eigenstates with distinct character as candidates for the ground state, giving rise to first-order quantum phase transitions when we vary the exchange couplings, the magnetic field, the chemical potential, etc. In order to address the role of geometrical frustration in quantum phase transitions, in this paper we investigate a simple one-dimensional (1D) version of the frustrated spin model with orthogonal-dimer structure. By means of the series expansion technique,7) the exact diagonalization (ED) and the density matrix renormalization group (DMRG),8),9) we demonstrate that strong frustration indeed causes a wide variety of first-order quantum phase transitions.

§2. Quantum spin system with orthogonal-dimer structure

In this section, we study first-order transitions in the 1D orthogonal-dimer spin chain.10),11) The Hamiltonian we shall deal with is the standard S = 1/2 antiferromagnetic spin chain with orthogonal-dimer structure (Fig. 1), with the competing exchange couplings parametrized by the ratio j = J′/J. [Fig. 1 and fragments of the surrounding text, including the explicit Hamiltonian, are lost to extraction.] A remarkable feature of the system is that a direct product of dimer singlets (see Fig. 1) is always an exact eigenstate, and this dimer state should be the exact ground state up to a certain critical value of j.10) On the other hand, [text partly lost around Fig. 2, which concerns the dimer and plaquette phases for finite chains]. The corresponding excitation energy is shown as the solid line in Fig. 2(b). The other is an unusual four-fold degenerate excitation shown as the dashed
line, which is formed by breaking the plaquette singlet state first and then making a local dimer state accompanied by two free spins.11) Since the latter excitation becomes the lowest excitation in the region j_c < j < 0.87, as shown in Fig. 2, it plays an essential role for the phase transitions in the vicinity of the critical point, as will be shown below. In this way, the unusual four-fold degenerate excitation found here possesses intermediate properties between those typical for the dimer phase and the plaquette phase.

We have seen so far that several different kinds of low-energy excitations coexist in our frustrated model. It is thus interesting to see how these excitations affect the magnetization process. To this end, we focus here on the magnetization curve for the ratio of the exchange couplings j = 0.94, where the system belongs to the plaquette phase. As shown in Fig. 3, the lowest excitation is a plaquette-triplet excitation (solid line), and another four-fold dispersionless excitation (broken line) lies slightly above it. The coexistence of two kinds of distinct excitations influences [text lost around Fig. 3; the surviving fragments mention the magnetization curve, a 1/4-plateau, and the four-fold excitation in the 1D spin system].

§3. Correlated electron system with strong frustration

In this section, we extend our discussions to a hole-doped system.14) We here employ another orthogonal-dimer model15) shown in Fig. 4. A nice feature of this model is that it can describe not only the orthogonal-dimer spin system but also the linked-tetrahedra system, which plays an important role for understanding low-energy properties of the pyrochlore system. In the undoped case, a first-order quantum phase transition between the dimer phase and the plaquette phase occurs at the critical point j_c = 0.71.15),16) [Text lost around Figs. 4 and 5(a), which concern the correlated electron model.] The spin correlation functions are defined as (see Fig. 4)

S̄1 = (1/4)( ⟨S^z_1 S^z_3⟩ + ⟨S^z_1 S^z_4⟩ + ⟨S^z_2 S^z_3⟩ + ⟨S^z_2 S^z_4⟩ ),

[the analogous definition of S̄2 is lost]. In Fig. 5(b), the spin correlation functions calculated by DMRG are shown as a function of the chemical potential µ. Note that we have S̄1 = −1/4 and S̄2 = 0 at half filling, which accord with those expected for the isolated dimers. It is seen that the spin correlation along the rung, S̄1, rapidly decreases down to −1/4 as the system approaches half filling from the metallic side. On the other hand, S̄2 becomes almost zero along the chain direction, reflecting the fact that the system is an assembly of decoupled dimers at half filling. Therefore, near half filling, the metallic state is considered as a resonating state composed of such dimer pairs.

In contrast to the dimer phase, strong frustration dramatically changes the nature of the metal-insulator transition in the plaquette phase. In Fig. 6(a), we show [text lost around Fig. 6, which concerns the transition near half filling].
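Rung correlators of the S̄1 type are straightforward to evaluate on small clusters. The following is a minimal exact-diagonalization sketch of ours (arbitrary couplings and a single four-site unit, not the DMRG setup used in the paper):

```python
# Ground-state correlators <S^z_i S^z_j> for a small S=1/2 Heisenberg cluster,
# in the spirit of the rung correlator S1bar. Our own toy sketch; couplings
# and geometry are illustrative only.
import numpy as np

sz = np.diag([0.5, -0.5])
sp = np.array([[0.0, 1.0], [0.0, 0.0]])   # S+
sm = sp.T                                  # S-

def op(single, site, n):
    """Embed a single-site operator at `site` in an n-site tensor product."""
    m = np.eye(1)
    for i in range(n):
        m = np.kron(m, single if i == site else np.eye(2))
    return m

def heisenberg(bonds, n):
    h = np.zeros((2**n, 2**n))
    for (i, j, J) in bonds:
        h += J * (op(sz, i, n) @ op(sz, j, n)
                  + 0.5 * (op(sp, i, n) @ op(sm, j, n)
                           + op(sm, i, n) @ op(sp, j, n)))
    return h

# One dimer (sites 0,1, coupling J=1) frustratedly coupled to sites 2,3.
n, j = 4, 0.6
bonds = [(0, 1, 1.0), (0, 2, j), (0, 3, j), (1, 2, j), (1, 3, j)]
w, v = np.linalg.eigh(heisenberg(bonds, n))
g = v[:, 0]                                # ground state
szsz = lambda i, k: g @ op(sz, i, n) @ op(sz, k, n) @ g
print(0.25 * (szsz(0, 2) + szsz(0, 3) + szsz(1, 2) + szsz(1, 3)))  # S1bar-like
```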
§4. Summary

We have investigated the first-order quantum phase transitions in antiferromagnetic spin chains with orthogonal-dimer structure. It has been clarified that there appears rich structure in the excitation spectrum, which is caused by strong frustration. Such nontrivial excitations determine the nature of the quantum phase transitions when we change the interactions, the magnetic field, etc. We have also studied the hole-doping effect on another orthogonal-dimer chain with linked-tetrahedra structure. It has been shown that the first-order metal-insulator transition is triggered by strong geometrical frustration. We have checked in a preliminary calculation that such a first-order metal-insulator transition also appears in the model of §2, implying that it may be common in this class of the orthogonal-dimer models.

Acknowledgements

This work was partly supported by a Grant-in-Aid from the Ministry of Education, Science, Sports and Culture of Japan. A part of the computations was done at the Supercomputer Center at the Institute for Solid State Physics, University of Tokyo, and at the Yukawa Institute Computer Facility. A. Kawaguchi is supported by the Japan Society for the Promotion of Science.

References

1) H. Kageyama, K. Yoshimura, R. Stern, N. V. Mushnikov, K. Onizuka, M. Kato, K. Kosuge, C. P. Slichter, T. Goto and Y. Ueda: Phys. Rev. Lett. 82, 3168 (1999).
2) B. S. Shastry and B. Sutherland: Physica 108B, 1069 (1981).
3) S. Miyahara and K. Ueda: Phys. Rev. Lett. 82, 3701 (1999).
4) A. Koga and N. Kawakami: Phys. Rev. Lett. 84, 4461 (2000); C. Knetter, A. Bühler, E. Müller-Hartmann and G. S. Uhrig: Phys. Rev. Lett. 85, 3958 (2000); T. Momoi and K. Totsuka: Phys. Rev. B 61, 3231 (2000); Y. Fukumoto: J. Phys. Soc. Jpn. 69, 2755 (2000); G. Misguich, Th. Jolicoeur and S. M. Girvin: Phys. Rev. Lett. 87, 097203 (2001); C. H. Chung, J. B. Marston and S. Sachdev: Phys. Rev. B 64, 134407 (2001).
5) S. Kondo, D. C. Johnston, C. A. Swenson, F. Borsa, A. V. Mahajan, L. L. Miller, T. Gu, A. I. Goldman, M. B. Maple, D. A. Gajewski, E. J. Freeman, N. R. Dilley, R. P. Dickey, J. Merrin, K. Kojima, G. M. Luke, Y. J. Uemura, O. Chmaissem and J. D. Jorgensen: Phys. Rev. Lett. 78, 3729 (1997).
6) H. Kaps, N. Büttgen, W. Trinkl, A. Loidl, M. Klemm and S. Horn: J. Phys. Condens. Matter 13, 8497 (2001).
7) M. P. Gelfand and R. R. P. Singh: Adv. Phys. 49, 93 (2000).
8) S. R. White: Phys. Rev. Lett. 69, 2863 (1992); Phys. Rev. B 48, 10345 (1993).
9) T. Nishino and K. Okunishi: J. Phys. Soc. Jpn. 63, 4084 (1995); Y. Hieida, K. Okunishi and Y. Akutsu: Phys. Lett. A 233, 464 (1997).
10) N. B. Ivanov and J. Richter: Phys. Lett. A 232, 308 (1997); J. Richter, N. B. Ivanov and J. Schulenburg: J. Phys. Condens. Matter 10, 3635 (1998); J. Schulenburg and J. Richter: Phys. Rev. B 65, 054420 (2002).
11) A. Koga, K. Okunishi and N. Kawakami: Phys. Rev. B 62, 5558 (2000).
12) N. Kato and M. Imada: J. Phys. Soc. Jpn. 64, 4105 (1995).
13) A. Koga, S. Kumada, N. Kawakami and T. Fukui: J. Phys. Soc. Jpn. 67, 622 (1998).
14) M. Ogata, M. U. Luchini and T. M. Rice: Phys. Rev. B 44, 12083 (1991).
15) M. P. Gelfand: Phys. Rev. B 43, 8644 (1991).
16) A. Honecker, F. Mila and M. Troyer: Eur. Phys. J. B 15, 227 (2000).


July 17, 2006
Technical Report, Department of Computer Science, 140 Governors Drive, University of Massachusetts, Amherst, Massachusetts 01003-9624
Abstract

We present a novel hierarchical framework for solving Markov decision processes (MDPs) using a multiscale method called diffusion wavelets. Diffusion wavelet bases differ significantly from the Laplacian eigenfunctions studied in the companion paper (Mahadevan and Maggioni, 2006): the basis functions have compact support, are inherently multiscale both spectrally and spatially, and capture localized geometric features of the state space, and of functions on it, at different granularities in space-frequency. Classes of (value) functions that can be compactly represented in diffusion wavelets include piecewise smooth functions. Diffusion wavelets also provide a novel approach to approximating powers of transition matrices. Policy evaluation is usually the expensive step in policy iteration, requiring O(|S|^3) time to directly solve the Bellman equation (where |S| is the number of states for discrete state spaces, or the sample size in continuous spaces). Diffusion wavelets compactly represent powers of transition matrices, yielding a direct policy evaluation method requiring only O(|S|) complexity in many cases, which is remarkable because the Green's function (I − γP^π)^{−1} is usually a full matrix requiring quadratic space just to store each entry. A range of illustrative examples and experiments, from simple discrete MDPs to classic continuous benchmark tasks like inverted pendulum and mountain car, are used to evaluate the proposed framework.
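For context, here is a minimal sketch of the direct policy-evaluation baseline the abstract refers to: solving the Bellman linear system (I − γP^π)v^π = r^π with a dense O(|S|^3) solver. The chain MDP below is our own toy example, not one of the report's benchmarks:

```python
# Direct policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
# A dense solve costs O(|S|^3); diffusion wavelets aim to avoid this by
# compactly representing powers of P_pi. Toy chain MDP for illustration.
import numpy as np

def policy_evaluation(P_pi, r_pi, gamma=0.95):
    """Exact value function of a fixed policy via the Bellman equation."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

# A 5-state random walk with an absorbing, rewarding right end.
P = np.zeros((5, 5))
for s in range(4):
    P[s, max(s - 1, 0)] += 0.5
    P[s, s + 1] += 0.5
P[4, 4] = 1.0                      # absorbing state
r = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
print(policy_evaluation(P, r))     # v[4] = 1 / (1 - 0.95) = 20
```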



The Decision Reliability of MAP, Log-MAP, Max-Log-MAP and SOVA Algorithms for Turbo Codes

Lucian Andrei Perişoară and Rodica Stoian

Abstract — In this paper, we study the reliability of decisions of the MAP, Log-MAP, Max-Log-MAP and SOVA decoding algorithms for turbo codes, in terms of the a priori information, a posteriori information, extrinsic information and channel reliability. We also analyze how important an accurate estimate of the channel reliability factor is to the good performance of the iterative turbo decoder. The simulations are made for parallel concatenation of two recursive systematic convolutional codes with a block interleaver at the transmitter, an AWGN channel, and iterative decoding with the mentioned algorithms at the receiver.

Keywords — Convolutional Turbo Codes, Channel Reliability, Decision Reliability, Extrinsic Information, Iterative Decoding.

I. INTRODUCTION

In communication systems, like cellular, satellite and computer fields, the information is represented as a sequence of binary digits. The binary message is modulated to an analog signal and transmitted over a communication channel affected by noise that corrupts the transmitted signal. Channel coding is used to protect the information from noise and to reduce the number of error bits.

One of the most used channel codes are convolutional codes, with the decoding strategy based on the Viterbi algorithm. The advantages of convolutional codes are exploited in Turbo Codes (TC), which can achieve performances within 2 dB of channel capacity [1]. These codes are parallel concatenations of two Recursive Systematic Convolutional (RSC) codes separated by an interleaver. The performances of the turbo codes are due to the parallel concatenation of component codes, the interleaver schemes and the iterative decoding using Soft Input Soft Output (SISO) algorithms [2], [3].

In this paper we study the decision reliability problem for turbo coding schemes in the case of two different decoding strategies: the Maximum A Posteriori (MAP) algorithm and the Soft Output Viterbi Algorithm (SOVA). For the MAP algorithm we also consider two improved versions, named the Log-MAP and Max-Log-MAP algorithms. The first one is a simplified algorithm which offers the same optimal performance with a reasonable complexity. The second one and the SOVA are less complex again, but give a slightly degraded performance.

The paper is organized as follows. In Section II, the turbo encoder is presented. In Section III, the turbo decoder is explained in detail, presenting firstly the iterative decoding principle (turbo principle) and specifying the concepts of a priori information, a posteriori information, extrinsic information, channel reliability and source reliability.

Manuscript received December 10, 2008. This work was supported in part by the Romanian National University Research Council (CNCSIS) under the Grant type TD (young doctoral students), no. 24. L. A. Perişoară and R. Stoian are with the Applied Electronics and Information Engineering Department, Politehnica University of Bucharest, Romania (e-mail: lucian@orfeu.pub.ro, rodica@orfeu.pub.ro).
Then, we review the MAP, Log-MAP, Max-Log-MAP and SOVA decoding algorithms, for which we discuss the decision reliability. Section IV analyzes the influence of the channel reliability factor on decoding performances for the mentioned decoding algorithms. Section V presents some simulation results which we obtained.

II. THE TURBO CODING SCHEME

The turbo encoder can use two different or identical Recursive Systematic Convolutional (RSC) codes, connected in parallel, see Fig. 1 (the turbo encoder with rate 1/3). The first encoder operates on the input bits represented by the frame u, in their original order, while the second encoder operates on the input bits which are permuted by the interleaver, frame u′ [4]. The output of the turbo encoder is represented by the frame:

v = (u, c^1, c^2) = (u_1, c^1_1, c^2_1, u_2, c^1_2, c^2_2, ..., u_k, c^1_k, c^2_k),   (1)

where frame c^1 is the output of the first RSC and frame c^2 is the output of the second RSC. If the input frame u is of length k and the output frame is of length n, then the encoder rate is R = k/n.

For block encoding, data is segmented into non-overlapping blocks of length k, with each block encoded (and decoded) independently. This scheme imposes the use of a block interleaver, with the constraint that the RSCs must begin in the same state for each new block. This requires either trellis termination or trellis truncation. Trellis termination needs appending extra symbols (usually named tail bits) to the input frame to ensure that the shift registers of the constituent RSC encoders start and end at the same zero state. If the encoder has code rate 1/3, then it maps k data bits into 3k coded bits plus 3m tail bits. Trellis truncation simply involves resetting the state of the RSCs for each new block.

The interleaver used for parallel concatenation is a device that permutes coordinates either on a block basis (a generalized "block" interleaver) or on a sliding window basis (a generalized "convolutional" interleaver). The interleaver ensures that the set of code sequences generated by the turbo code has nice weight properties, which reduces the probability that the decoder will mistake one codeword for another.

The output codeword is then modulated, for example with Binary Phase Shift Keying (BPSK), resulting in the sequence x = (x^s, x^p), which is transmitted over an Additive White Gaussian Noise (AWGN) channel.

It is known that turbo codes are the best practical codes due to their performance at low SNR. One reason for their better performance is that turbo codes produce high weight codewords [4]. For example, if the input sequence u is originally low weight, the systematic output u and parity output c^1 may produce a low weight codeword. However, the parity output c^2 is less likely to be a low weight codeword, due to the interleaver in front of it. The interleaver shuffles the input sequence u in such a way that, when introduced to the second encoder, it is more likely to produce a high weight codeword. This is ideal for the code, because high weight codewords result in better decoder performance.
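As a toy illustration of the block interleaver idea (our own sketch; practical turbo interleavers are designed much more carefully): write the frame row-wise into an r × c array and read it out column-wise.

```python
# Row-in, column-out block interleaver: a toy permutation sketch.
# Real turbo interleaver designs are considerably more elaborate.
def block_interleave(frame, rows, cols):
    assert len(frame) == rows * cols
    return [frame[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(frame, rows, cols):
    # The inverse permutation: swap the roles of rows and columns.
    return block_interleave(frame, cols, rows)

u = list(range(12))
ui = block_interleave(u, 3, 4)
assert block_deinterleave(ui, 3, 4) == u
print(ui)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```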
III. THE TURBO DECODING SCHEME

Let y = (y^s, y^{p1}, y^{p2}) be the received sequence of length n, where the vector y^s is formed only by the received information symbols, and the vectors y^{p1} = (y^{p1}_1, y^{p1}_2, ..., y^{p1}_n) and y^{p2} = (y^{p2}_1, y^{p2}_2, ..., y^{p2}_n) by the received parity symbols. These three streams are applied to the input of the turbo decoder presented in Fig. 2.

[Fig. 2. The turbo decoder.]

At time j, decoder 1, using the partial received information (y^s_j, y^{p1}_j), makes its decision and outputs the a posteriori information L+(x^s_j). Then the extrinsic information is computed: Le(x^s_j) = L+(x^s_j) − L−(x^s_j) − Lc y^s_j. Decoder 2 makes its decision based on the extrinsic information Le(x^s_j) from decoder 1 and the received information (y′^s_j, y^{p2}_j). The term L+(x′^s_j) is the a posteriori information derived from decoder 2 and used by decoder 1 as a priori information about the received sequence, noted L−(x′^s_j). Now the second iteration can begin, and the first decoder decodes the same channel symbols, but now with additional information about the value of the input symbols provided by the second decoder in the first iteration. After some iterations, the algorithm converges and the extrinsic information values remain the same. Now the decision about the message bits u_j is made based on the a posteriori values L+(x^s_j).

Each constituent decoder operates based on the Log-Likelihood Ratio (LLR).

A. The Decision Reliability of the MAP Decoder

Bahl, Cocke, Jelinek and Raviv proposed the Maximum A Posteriori (MAP) decoding algorithm for convolutional codes in 1974 [1]. The iterative decoder developed by Berrou et al. [5] in 1993 attracted greatly increased attention. In their paper, they considered the iterative decoding of two RSC codes concatenated in parallel through a non-uniform interleaver, and the MAP algorithm was modified to minimize the sequence error probability instead of the bit error probability. Because of its increased complexity, the MAP algorithm was simplified in [6] and the optimal MAP algorithm called the Log-MAP algorithm was developed.

The LLR of a transmitted bit is defined as [2]:

L(x^s_j) = L−(x^s_j) = log [ P(x^s_j = +1) / P(x^s_j = −1) ],   (2)

where the sign of the LLR L(x^s_j) indicates whether the bit x^s_j is more likely to be +1 or −1, and the magnitude of the LLR gives an indication of the correct value of x^s_j. The term L−(x^s_j) is defined as the a priori information about x^s_j.

In channel coding theory, we are interested in the probability that x^s_j = ±1, conditioned on some received sequence y^s_j. Hence, we use the conditional LLR:

L(x^s_j | y^s_j) = L+(x^s_j) = log [ P(x^s_j = +1 | y^s_j) / P(x^s_j = −1 | y^s_j) ],

where the conditional probabilities P(x^s_j = ±1 | y^s_j) are the a posteriori probabilities of the decoded bit x^s_j, and L+(x^s_j) is the a posteriori information about x^s_j: the information that the decoder gives us, including the received frame, the a priori information for the systematic symbols y^s_j and the a priori information for the symbol x^s_j. It is the output of the MAP algorithm. In addition, we will use the conditional LLR L(y^s_j | x^s_j), based on the probability that the receiver's output would be y^s_j when the transmitted bit x^s_j was either +1 or −1:

L(y^s_j | x^s_j) = log [ P(y^s_j | x^s_j = +1) / P(y^s_j | x^s_j = −1) ].   (3)

For an AWGN channel using BPSK modulation, we can write the conditional probability density functions [7]:

P(y^s_j | x^s_j = ±1) ∝ exp [ −(Eb/N0)(y^s_j ∓ a)^2 ],   (4)

where Eb is the transmitted energy per bit, a is the fading amplitude and N0/2 is the noise variance. We can rewrite (3) as follows:

L(y^s_j | x^s_j) = −(Eb/N0) [ (y^s_j − a)^2 − (y^s_j + a)^2 ] = 4a (Eb/N0) y^s_j = Lc y^s_j,   (5)

where Lc = 4a Eb/N0 is the channel reliability, a is the fading amplitude and N0 is the noise power. For nonfading AWGN channels, a = 1 and Lc = 4Eb/N0.
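As a quick numerical illustration of (5), this sketch of ours (not from the paper; parameter values arbitrary) computes the channel LLRs Lc·y for BPSK over AWGN:

```python
# Channel LLRs for BPSK over AWGN: L(y|x) = Lc * y with Lc = 4*a*Eb/N0.
# Illustrative sketch; Eb/N0 and frame length are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
ebn0_db, a = 2.0, 1.0                   # Eb/N0 in dB, no fading
ebn0 = 10 ** (ebn0_db / 10)
Lc = 4 * a * ebn0                       # channel reliability factor

bits = rng.integers(0, 2, 8)
x = 2.0 * bits - 1.0                    # BPSK mapping {0,1} -> {-1,+1}
sigma = np.sqrt(1 / (2 * ebn0))         # noise std for unit-energy symbols
y = a * x + sigma * rng.normal(size=x.size)

llr = Lc * y                            # soft channel values for the decoder
print(bits, (llr > 0).astype(int))      # hard decision: the sign of the LLR
```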
The ratio Eb/N0 is defined as the Signal to Noise Ratio (SNR) of the channel.

The extrinsic information can be computed as [1], [2], [9]:

Le(x^s_j) = log [ P(x^s_j = +1 | y^s_j) / P(x^s_j = −1 | y^s_j) ] − log [ P(x^s_j = +1) / P(x^s_j = −1) ] − log [ P(y^s_j | x^s_j = +1) / P(y^s_j | x^s_j = −1) ] = L+(x^s_j) − L−(x^s_j) − Lc y^s_j.   (6)

The a posteriori information defined in (2) can be written as [1], [10]:

L+(x^s_j) = log [ Σ+ α_{j−1}(s′) · β_j(s) · γ^e_j(s′, s) / Σ− α_{j−1}(s′) · β_j(s) · γ^e_j(s′, s) ],   (7)

where Σ+ is the summation over all possible transition branch pairs (s′, s) in the trellis, at time j, given the transmitted symbol x^s_j = +1. Analogously, Σ− is for the transmitted symbol x^s_j = −1.

The forward and backward terms, represented in Fig. 3 as transitions between two consecutive states of the trellis, can be computed recursively as follows [7], [10], [11]:

α_j(s) = Σ_{s′} α_{j−1}(s′) γ_j(s′, s),   (8)
β_{j−1}(s′) = Σ_s β_j(s) γ_j(s′, s).   (9)

For systematic codes, which is our case, the branch transition probabilities γ_j(s′, s) are given by the relation:

γ_j(s′, s) = exp [ (1/2) x^s_j L−(x^s_j) + (1/2) Lc x^s_j y^s_j ] · γ^e_j(s′, s),   (10)

where:

γ^e_j(s′, s) = exp [ (1/2) Lc x^{p1}_j y^{p1}_j + (1/2) Lc x^{p2}_j y^{p2}_j ].   (11)

At each iteration and for each frame y, L+(x^s_j) is computed at the output of the second decoder and the decision is made, symbol by symbol, j = 1, ..., k, based on the sign of L+(x^s_j), the original information bit u_j being estimated as [2], [3]:

û_j = sign { L+(x^s_j) }.   (12)

In the iterative decoding procedure, the extrinsic information Le(x^s_j) is permuted by the interleaver and becomes the a priori information L−(x^s_j) for the next decoder. [...] influence on L+(x^s_j) is insignificant.

B. The Decision Reliability of the Max-Log-MAP Decoder

The MAP algorithm as described in the previous section is much more complex than the Viterbi algorithm, and with hard decision outputs it performs almost identically to it. Therefore, for almost 20 years it was largely ignored. However, its application in turbo codes renewed interest in this algorithm. Its complexity can be dramatically reduced without affecting its performance by using the sub-optimal Max-Log-MAP algorithm, proposed in [12]. This technique simplifies the MAP algorithm by transferring the recursions into the log domain and invoking the approximation:

ln ( Σ_i e^{x_i} ) ≈ max_i (x_i),   (13)

where max_i(x_i) means the maximum value of x_i. If we note:

A_j(s) = ln α_j(s),   (14)
B_j(s) = ln β_j(s),   (15)
G_j(s′, s) = ln γ_j(s′, s),   (16)

then equations (8), (9) and (10) can be written as:

A_j(s) = ln Σ_{s′} exp [ A_{j−1}(s′) + G_j(s′, s) ] ≈ max_{s′} [ A_{j−1}(s′) + G_j(s′, s) ],   (17)
B_{j−1}(s′) = ln Σ_s exp [ B_j(s) + G_j(s′, s) ] ≈ max_s [ B_j(s) + G_j(s′, s) ],   (18)
G_j(s′, s) = C + (1/2) x^s_j L−(x^s_j) + (1/2) Lc x^s_j y^s_j,   (19)

where C is a constant term [part of the surrounding explanation of the term x^s_j L−(x^s_j) is lost]. Finally, the a posteriori LLR L+(x^s_j) which the Max-Log-MAP algorithm calculates is:
[Fig. 3. Trellis state transitions for α_j(s) and β_{j−1}(s′).]

L+(x^s_j) ≈ max_{(s′,s), u_j=+1} [ A_{j−1}(s′) + G_j(s′, s) + B_j(s) ] − max_{(s′,s), u_j=−1} [ A_{j−1}(s′) + G_j(s′, s) + B_j(s) ].   (20)

In [12] and [13], the authors show that the complexity of the Max-Log-MAP algorithm is more than twice that of a classical Viterbi algorithm. Unfortunately, the storage requirements are much greater for the Max-Log-MAP algorithm, due to the need to store both the forward and backward recursively calculated metrics A_j(s) and B_j(s) before the L+(x^s_j) values can be calculated.

C. The Decision Reliability of the Log-MAP Decoder

The Max-Log-MAP algorithm gives a slight degradation in performance compared to the MAP algorithm, due to the approximation of (13). When used for the iterative decoding of turbo codes, Robertson found this degradation to result in a drop in performance of about 0.35 dB [12]. However, the approximation of (13) can be made exact by using the Jacobian logarithm:

ln(e^{x1} + e^{x2}) = max(x1, x2) + ln(1 + e^{−|x1−x2|}) = max(x1, x2) + f(|x1 − x2|) = g(x1, x2),   (21)

where f(δ) can be thought of as a correction term. The maximization in (17) and (18) is then completed by the correction term f(δ) in (21). This means that the exact rather than approximate values of A_j(s) and B_j(s) are calculated. For binary trellises, the maximization is done over only two terms. Therefore, we can correct the approximations in (17) and (18) by adding the term f(δ), where δ is the magnitude of the difference between the metrics of the two merging paths. This is the basis of the Log-MAP algorithm proposed by Robertson, Villebrun and Hoeher in [12]. Thus, we must generalize the previous equation for more than two terms, by nesting the g(x1, x2) operations as follows:

ln ( Σ_{i=1}^{n} e^{x_i} ) = g(x_n, g(x_{n−1}, ..., g(x_3, g(x_2, x_1)))).   (22)

The correction term f(δ) need not be computed for every value of δ, but can instead be stored in a look-up table. In [12], Robertson found that such a look-up table need contain only eight values for δ, ranging between 0 and 5. This means that the Log-MAP algorithm is only slightly more complex than the Max-Log-MAP algorithm, but it gives exactly the same performance as the MAP algorithm. Therefore, it is a very attractive algorithm to use in the component decoders of an iterative turbo decoder.
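The Jacobian logarithm in (21) is easy to make concrete. Below is a small sketch of ours (hypothetical, not the paper's code) of the max* operation with its correction term f(δ) = ln(1 + e^{−δ}), both exact and with an eight-entry look-up table as suggested in [12]:

```python
# max* (Jacobian logarithm): ln(e^x1 + e^x2) = max(x1, x2) + f(|x1 - x2|),
# with f(d) = ln(1 + exp(-d)). Log-MAP applies f (exactly or via a small
# look-up table); Max-Log-MAP drops f altogether. Hypothetical sketch.
import math
import numpy as np

def max_star_exact(x1, x2):
    return max(x1, x2) + math.log1p(math.exp(-abs(x1 - x2)))

# Eight-entry look-up table over [0, 5), following the suggestion in [12].
_DELTAS = np.linspace(0.0, 5.0, 8, endpoint=False)
_TABLE = np.log1p(np.exp(-_DELTAS))

def max_star_lut(x1, x2):
    d = abs(x1 - x2)
    f = _TABLE[int(d / 5.0 * 8)] if d < 5.0 else 0.0
    return max(x1, x2) + f

x1, x2 = 0.7, 1.3
print(math.log(math.exp(x1) + math.exp(x2)))  # exact value
print(max_star_exact(x1, x2))                 # identical
print(max_star_lut(x1, x2))                   # close; Max-Log would give 1.3
```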
D. The Decision Reliability of the SOVA Decoder

The MAP algorithm has a high computational complexity for providing Soft Input Soft Output (SISO) decoding, but it easily yields the optimal a posteriori probabilities for each decoded symbol. The Viterbi algorithm provides Maximum Likelihood (ML) decoding for convolutional codes, with optimal sequence estimation. The conventional Viterbi decoder has two main drawbacks in a serial decoding scheme: the inner Viterbi decoder produces bursts of error bits and a hard decision output, which degrades the performance of the outer Viterbi decoder [3]. Hagenauer and Hoeher modified the classical Viterbi algorithm and provided a substantially less complex, suboptimal alternative in their Soft Output Viterbi Algorithm (SOVA). The performance improvement is obtained if the Viterbi decoders are able to produce reliability values, or soft outputs, by using a modified metric [14]. These reliability values are passed on to the subsequent Viterbi decoders as a priori information.

In soft decision decoding, the receiver does not assign a zero or a one to each symbol received from the AWGN channel, but uses multi-bit quantized values for the received sequence $y$, because the channel alphabet is greater than the source alphabet [3]. In this case the metric derived from the Maximum Likelihood principle is used instead of the Hamming distance. For an AWGN channel, soft decision decoding produces a gain of 2 to 3 dB over hard decision decoding, and an eight-level quantization offers sufficient performance in comparison with infinite-precision quantization [7].

The original Viterbi algorithm searches for an information sequence $u$ that maximizes the a posteriori probability $P(s\mid y)$, $s$ being the state sequence generated by the message $u$. Using the Bayes theorem, and taking into account that the received sequence $y$ is fixed for the metric computation and can be discarded, the maximization of $P(s\mid y)$ is:

$$\max_u P(s\mid y) = \max_u P(y\mid s)\,P(s). \qquad (23)$$

For a systematic code, this relation can be expanded to:

$$\max_u \left\{ \prod_{j=1}^{k} P\left((y_j^s, y_j^{p_1}, y_j^{p_2}) \mid (s_{j-1}, s_j)\right) P(s_j) \right\}. \qquad (24)$$

Taking into account that:

$$P\left((y_j^s, y_j^{p_1}, y_j^{p_2}) \mid (s_{j-1}, s_j)\right) = P(y_j^s\mid x_j^s)\cdot P(y_j^{p_1}\mid x_j^{p_1})\cdot P(y_j^{p_2}\mid x_j^{p_2}), \qquad (25)$$

where $(s_{j-1}, s_j)$ denotes the transition between the states at time $j-1$ and the states at time $j$, the SOVA metric is obtained from (24) as [15]:

$$M_j = M_{j-1} + \sum x_j^{*}\,\log\frac{P(y_j^{*}\mid x_j^{*}=+1)}{P(y_j^{*}\mid x_j^{*}=-1)} + u_j\,\log\frac{P(u_j=1)}{P(u_j=0)}, \qquad (26)$$

where $x_j^{*} = (u_j, c_{j,1}, c_{j,2})$ is the RSC output code word at time $j$ at the channel input, and $y_j^{*} = (y_j^s, y_j^{p_1}, y_j^{p_2})$ is the channel output. The summation is made over the pair of information symbols $(u_j, y_j^s)$ and the pairs of parity symbols $(c_{j,1}, y_j^{p_1})$ and $(c_{j,2}, y_j^{p_2})$.

According to [14] and [7], relation (26) can be reduced to:

$$M_j = M_{j-1} + L_c \sum x_j^{*} y_j^{*} + u_j\,L(u_j), \qquad (27)$$

where the source reliability $L(u_j)$, defined in (26), is the log-likelihood ratio of the binary symbol $u_j$. The sign of $L(u_j)$ is the hard decision of $u_j$ and the magnitude of $L(u_j)$ is the decision reliability.

According to [10], the SOVA metric includes values from the past metric $M_{j-1}$, the channel reliability $L_c$, and the source reliability $L(u_j)$ as an a priori value. If the channel is very good, the second term in (27) is greater than the third term and the decoding relies on the received channel values; if the channel is very bad, the decoding relies on the a priori information $L(u_j)$. If $M_j^1$, $M_j^2$ are the two metrics of the survivor path and the concurrent path in the trellis at time $j$, then the metric difference is defined as [7]:

$$\Delta_j^0 = \tfrac{1}{2}\left(M_j^1 - M_j^2\right). \qquad (28)$$

The probability of path $m$, at time $j$, is related to its metric as:

$$P(\text{path } m) = P(s_j^m) = \exp\left(M_j^m/2\right), \qquad (29)$$

where $s_j^m$ is a state vector and $M_j^m$ is the metric. The probability of choosing the survivor path is:

$$P(\text{correct}) = \frac{P(\text{path 1})}{P(\text{path 1}) + P(\text{path 2})} = \frac{e^{\Delta_j^0}}{1 + e^{\Delta_j^0}}. \qquad (30)$$

The reliability of this path decision is calculated as:

$$\log\frac{P(\text{correct})}{1 - P(\text{correct})} = \Delta_j^0. \qquad (31)$$

The reliability values along the survivor path, for a particular node at time $j$, are denoted $\Delta_j^d$, where $d$ is the distance from the current node at time $j$. If the survivor path bit for $u_{j-d}$ is the same as the associated bit on the competing path, then there would be no error if the competing path were chosen, and the reliability value $\Delta_j^d$ remains unchanged. To improve the reliability values, an updating process must be used, so the "soft" values of a decision symbol are:

$$L(u'_{j-d}) = u'_{j-d}\sum_{i=0}^{d}\Delta_j^i, \qquad (32)$$

which can be approximated as:

$$L(u'_{j-d}) = u'_{j-d}\cdot\min_{i=0\dots d}\Delta_j^i. \qquad (33)$$
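The update rule (33) amounts to taking, for every decided bit, the minimum metric difference over the merge points whose competing path disagrees with the survivor. A simplified sketch follows; the interface and the flat (non-windowed) indexing are our own simplifications of the trace-back:

```python
def update_reliability(survivor, merges):
    """Soft outputs in the spirit of Eq. (33).

    survivor : decided bit sequence (0/1) on the survivor path.
    merges   : list of (delta, competing_bits) pairs, one per merge point,
               where delta is the metric difference (28) and competing_bits
               is the bit sequence of the competing path.
    Bits never contradicted by any competing path keep an infinite
    (fully reliable) magnitude.
    """
    reliab = [float("inf")] * len(survivor)
    for delta, competing in merges:
        for i, (b, c) in enumerate(zip(survivor, competing)):
            if b != c:  # choosing the competing path would flip bit i
                reliab[i] = min(reliab[i], delta)
    # signed soft outputs: sign = hard decision, magnitude = reliability
    return [(1 if b else -1) * r for b, r in zip(survivor, reliab)]
```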
The SOVA algorithm described here is the least complex of all the SISO decoders discussed in this section. In [12], Robertson shows that the SOVA algorithm is about half as complex as the Max-Log-MAP algorithm. However, the SOVA algorithm is also the least accurate of the algorithms described in this section and, when used in an iterative turbo decoder, performs about 0.6 dB worse than a decoder using the MAP algorithm. The outputs of the SOVA algorithm are significantly more noisy than those from the MAP algorithm, so an increased number of decoding iterations must be used for SOVA to obtain the same performance as for the MAP algorithm. The same results are reported for the iterative decoding (turbo decoding) of turbo product codes, which are based on two concatenated Hamming block codes rather than on convolutional codes [19].

IV. THE INFLUENCE OF $L_c$ ON DECODING PERFORMANCE

In this section we analyze how important an accurate estimate of the channel reliability factor $L_c$ is to the good performance of an iterative turbo decoder using the MAP, SOVA, Max-Log-MAP and Log-MAP algorithms.

In the MAP algorithm the channel inputs and the a priori information are used to calculate the transition probabilities from one state to another, which are then used to calculate the forward and backward recursion terms [2], [8]. Finally, the a posteriori information $L^+(x_j^s)$ is computed and the decision about the original message is made based on it. In iterative decoding with the MAP algorithm, the channel reliability is calculated from the received channel values. At the first iteration, decoder 1 has no a priori information available ($L^-(x_j^s)$ is zero) and the output of the algorithm is calculated based on the channel values. If an incorrect value of $L_c$ is used, the decoder will make more decision errors and the extrinsic information at the output of the first decoder will have incorrect values relative to the soft channel inputs [16].

In the SOVA algorithm the channel values are used to recursively calculate the metric $M_j$ for the current state $s$ along a path, from the metric $M_{j-1}$ for the previous state along that path, added to an a priori information term and to a cross-correlation term between the transmitted and the received channel values, $x_j^{*}$ and $y_j^{*}$, using (27). The channel reliability factor $L_c$ is used to scale this cross-correlation. When we use an incorrect value of $L_c$, e.g. $L_c = 1$, we are scaling the channel values applied to the inputs of the component decoders by a factor of one instead of the correct value of $L_c$. This has the effect of scaling all the metrics by the same factor, see (27), and the metric differences are also scaled by the same factor, see (28). This scaling of the metrics does not affect the path chosen by the algorithm as the survivor path or as the Maximum Likelihood (ML) path, so the hard decisions given by the algorithm are not affected by using an incorrect value of $L_c$ [16]-[18].

In iterative decoding with the SOVA algorithm, at the first iteration we assume that no a priori information about the transmitted bits is available to the decoder (the a priori information is zero), so the first component decoder takes only the channel values.
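This scaling argument can be checked in a few lines: multiplying the cross-correlation term of (27) by any positive factor leaves the winning path unchanged. A toy demonstration with made-up data (not a full decoder):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=8)                      # a toy block of received values
paths = np.sign(rng.normal(size=(4, 8)))    # four candidate +/-1 symbol paths
for lc in (0.25, 1.0, 4.0):                 # three guesses for L_c
    metrics = lc * (paths * y).sum(axis=1)  # cross-correlation term of (27)
    print(lc, int(np.argmax(metrics)))      # the survivor index never changes
```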
If the channel reliability factor $L_c$ is incorrect, the channel values are scaled, the extrinsic information will also be scaled by the same factor, and the a priori information for the second decoder will also be scaled. Because of the linearity of the SOVA, the effect of using an incorrect value of the channel reliability factor is that the output LLR from the decoder is scaled by a constant factor. The relative importance of the two inputs to the decoder, the a priori information and the channel information, will not change, since the LLRs for both these sources of information are scaled by the same factor. In the final iteration, the soft outputs from the final component decoder will have the same sign as those that would have been calculated using the correct value of $L_c$. So the hard outputs from a turbo decoder using the SOVA algorithm are not affected by the channel reliability factor $L_c$ [16].

The Max-Log-MAP algorithm has the same linearity that is found in the SOVA algorithm. Instead of one metric, two metrics $A_j(s)$ and $B_j(s)$ are now calculated, for the forward and backward recursions, see (17), (18) and (19), in which only simple additions of the cross-correlations of the transmitted and received symbols are used. If an incorrect value of the channel reliability $L_c$ is used, all the metrics are simply scaled by a factor, as in the SOVA algorithm. The soft outputs, given by the differences in metrics between different paths, are also scaled by the same factor, with the sign unchanged, and the final hard decisions given by the turbo decoder are not affected.

The Log-MAP algorithm is identical to the Max-Log-MAP algorithm except for the correction term $f(\delta) = \ln(1 + \exp(-\delta))$ used in the calculation of the forward and backward metrics $A_j(s)$ and $B_j(s)$ and of the soft output LLRs. The function $f(\delta)$ is not linear: it decreases asymptotically towards zero as $\delta$ increases. Hence the linearity present in the Max-Log-MAP and SOVA algorithms is not present in the Log-MAP algorithm. This non-linearity causes more hard decision errors in the component decoders when the channel reliability factor $L_c$ is incorrect, and the extrinsic information derived from the first component decoder has incorrect amplitudes relative to the soft channel inputs; this extrinsic information becomes the a priori information for the second decoder in the first iteration, and both decoders in subsequent iterations will see incorrect amplitudes relative to the soft channel inputs.

In iterative decoding with the Log-MAP algorithm, the extrinsic information exchange from one component decoder to the other produces a rapid decrease in the BER as the number of iterations increases. When an incorrect value of $L_c$ is used, no such rapid fall in the BER occurs, due to the incorrect scaling of the a priori information relative to the channel inputs; in fact, the performance of the decoder is then largely unaffected by the number of iterations used.

For wireless communications, some of them modeled as Multiple Input Multiple Output (MIMO) systems [23], the channel is considered to be a Rayleigh or Rician fading channel. If the Channel State Information (CSI) is not known at the receiver, a natural approach is to estimate the channel impulse response and to use the estimated values to compute the channel reliability factor $L_c$ required by the MAP algorithm to calculate the correct decoding metric.

In [20], the degradation in the performance of a turbo decoder using the MAP algorithm is studied when the channel SNR is not correctly estimated.
The authors propose a method for blind estimation of the channel SNR, using the ratio of the average squared received channel value to the square of the average of the magnitudes of the received channel values. In addition, they show that using these SNR estimates gives almost identical performance for the turbo decoder to that obtained using the true SNR.

In [8], the authors propose a simple estimation scheme for $L_c$ based on statistics computed over a block observation of matched filter outputs; the channel estimator includes the error variance of the channel estimates. In [24], the Minimum Mean Squared Error (MMSE) estimation criterion is used, and an iterative joint channel MMSE estimation and MAP decoding scheme is studied.

None of the above works requires a training sequence with pilot symbols to estimate the channel reliability factor. Other studies, like [22] and [25], used pilot symbols to estimate the channel parameters. In [22] it is shown that it is not necessary to estimate the channel SNR for a turbo decoder with the Max-Log-MAP or SOVA algorithms, and that if the MAP or the Log-MAP algorithm is used, then the value of $L_c$ does not have to be very close to the true value for a good BER performance to be obtained.

V. SIMULATION RESULTS

This section presents some simulation results for the turbo code ensembles, with the MAP, Max-Log-MAP, Log-MAP and SOVA decoding algorithms. The turbo encoder is the same for the four decoding algorithms and is described by two identical RSC codes with constraint length 3 and the generator polynomials $G_f = 1 + D + D^2$ and $G_b = 1 + D^2$. No tail bits and no puncturing are used. The two constituent encoders are parallel concatenated by a classical block interleaver, with dimensions variable according to the frame […]
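The blind estimator discussed above can be sketched as follows: compute the scale-invariant ratio of the mean squared value to the squared mean magnitude, then invert the BPSK/AWGN relation numerically. The grid inversion and the amplitude handling below are our own reconstruction, not the authors' code:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def _ratio(a, s):
    """E[y^2] / (E|y|)^2 for y = a*x + n, x = +/-1 equiprobable, n ~ N(0, s^2)."""
    m = a * erf(a / (s * sqrt(2.0))) + s * sqrt(2.0 / pi) * exp(-a * a / (2 * s * s))
    return (a * a + s * s) / (m * m)

def estimate_lc(y, amp=1.0):
    """Blind estimate of L_c = 4*Es/N0 = 2*amp^2/sigma^2 (with Es = amp^2
    and N0 = 2*sigma^2) from a received block, by matching the observed
    moment ratio to the BPSK/AWGN curve over a grid of noise levels.
    """
    r_obs = np.mean(np.square(y)) / np.mean(np.abs(y)) ** 2
    sigmas = np.linspace(0.05, 5.0, 1000)
    s_hat = min(sigmas, key=lambda s: abs(_ratio(amp, s) - r_obs))
    return 2.0 * amp * amp / (s_hat * s_hat)
```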

Bayesian Analysis (2007) 2, Number 1, pp. 167-212

On the Measure of the Information in a Statistical Experiment

Josep Ginebra∗

Abstract. Setting aside experimental costs, the choice of an experiment is usually formulated in terms of the maximization of a measure of information, often presented as an optimality design criterion. However, there does not seem to be a universal agreement on what objects can qualify as a valid measure of the information in an experiment. In this article we explicitly state a minimal set of requirements that must be satisfied by all such measures. Under that framework, the measure of the information in an experiment is equivalent to the measure of the variability of its likelihood ratio statistics or, what is the same, it is equivalent to the measure of the variability of its posterior to prior ratio statistics and to the measure of the variability of the distribution of the posterior distributions yielded by it. The larger that variability, the more peaked the likelihood functions and posterior distributions that tend to be yielded by the experiment, and the more informative the experiment is. By going through various measures of variability, this paper uncovers the unifying link underlying well known information measures as well as information measures that are not yet recognized as such.

The measure of the information in an experiment is then related to the measure of the information in a given observation from it. In this framework, the choice of experiment based on statistical merit only is posed as a decision problem where the reward is a likelihood ratio or posterior distribution, the utility function is convex, the utility of the reward is the information observed, and the expected utility is the information in an experiment. Finally, the information in an experiment is linked to the information and to the uncertainty in a probability distribution, and we find that the measure of the information in an experiment is not always interpretable as the uncertainty in the prior minus the expected uncertainty in the posterior.

Keywords: Convex ordering, design of experiments, divergence measure, Hellinger transform, likelihood ratio, measure of association, measure of diversity, measure of surprise, mutual information, optimal design, posterior to prior ratio, reference prior, stochastic ordering, sufficiency, uncertainty, utility, value of information.

1 Introduction

In the statistical science community there is a pervading feeling that the concept of "information carried by an experiment" is something intangible that can not be characterized. Review articles often list various measures, each addressing a particular aspect of what information means, but they do not identify any commonality among these measures. Papaioannou (2001) describes the current understanding by stating that "While information is a basic and fundamental concept in statistics, there is no universal agreement on how to define and measure it in a unique way". Clearly, there exists a need for an agreement on what qualifies as an information measure and on the features that make an experiment more informative than another.

Let E = (X; P_θ) denote a statistical experiment observing a random variable X with an unknown distribution P_θ, where the parameter θ ∈ Ω is an index for the list of possible distributions of X. When experimenting, the goal is to learn about the unknown θ that explains X. Since many aspects of the association between X and θ help in identifying the P_θ responsible for producing an observation X = x, the information about θ in E is typically a highly multidimensional concept that can not possibly be captured completely by any single real valued quantity. Nevertheless, to rank experiments in terms of the information "they carry", one has to do it through real valued measures that capture the one aspect of the information in E that one cares about the most. It follows the need for a framework that encompasses all valid measures of the information in E that can be used as scales to induce a total information ordering on the space of available experiments, and maybe choose one of them. This paper makes that framework explicit by building on the sufficiency ordering of experiments considered in Blackwell (1951, 1953) and Le Cam (1964, 1986).

Section 2 introduces the background and notation on statistical experiments. Section 3 reviews the sufficiency ordering of experiments, which is also called the 'always at least as informative' ordering. That section also presents the Blackwell-Sherman-Stein and Le Cam theorem, establishing that the sufficiency ordering of experiments is equivalent to the convex ordering of their likelihood ratio statistics and to the convex ordering of the distribution of the posterior distributions attained under a given prior.

Definition 4.1 in Section 4 identifies a minimum set of requirements that must be satisfied by every measure of the information in E, making the sufficiency ordering into the only essential ingredient in the characterization of these measures. That characterization does neither assume that θ is a random variable nor that the experiment will be used in a statistical decision problem, even though it can be given decision theoretic and/or Bayesian interpretations.

Section 5 then explains how, as a consequence of this characterization, measuring the information about θ in E is essentially the same as measuring the variability of its likelihood ratio statistics, as in Definitions 5.1-5.2. It follows that measuring the information in E is also the same as measuring the variability of its posterior to prior ratio statistics, and it is the same as measuring the variability of the distribution of the posterior distributions yielded by it. The larger that variability, the more peaked the likelihood functions and the posterior distributions that tend to be yielded by E, and the more informative E is. By considering various measures of the variability of these statistics, we present a broad spectrum of features associated to the informativity of E, and we uncover the framework underlying all the measures of the information being used by the design of experiments literature (DoE from now on), as well as measures of the information not yet recognized as such by them.

In this manuscript the comparison of experiments is always made based on statistical merit only, irrespective of experimental costs. In practice, choosing among experiments requires compromising between the information they carry and their cost, but when comparing experiments just in terms of the information in them, no apologies are to be made for doing as if their cost was the same.

A source of confusion is that the term information about θ is used to denote differing concepts. A secondary contribution of the paper is to help distinguish and relate

• the measure of the information about θ in experiment E = (X; P_θ), also recognized as the statistical information or the expected information in E, which is relevant for comparing experiments in terms of statistical merit; it is our main object of interest, and it is dealt with in Sections 3 to 5 and 7,

• the measure of the information about θ in an observation X = x, also recognized as the observed information in X = x, which is relevant after the experiment is selected and carried out, as a Bayesian model checking test statistic, and which is dealt with in Section 6.1, and

• the measure of the information about θ in a given distribution h on Ω, also recognized by Shannon as the self-information about θ in its own distribution, which is relevant when assessing the strength of knowledge about θ and as a measure of the homogeneity in a population h, and which is dealt with in Section 6.2.

Most of the statistical literature concentrates on inference for a given experiment and, as a consequence, its main focus is the measure of the information in X = x. In the non-Bayesian DoE literature the information in E is typically measured through real valued transformations of Fisher information matrices introduced in Kiefer (1959), and through divergence measures introduced in Csiszár (1963, 1967). On the other hand, the information theory literature stemming from Shannon (1948) starts by characterizing the information about a random variable in its own distribution through the negative of its entropy. In the Bayesian DoE literature the information in E is then measured through the cross entropy between X and θ as in Lindley (1956), in an approach generalized in Raiffa and Schlaifer (1961) and in DeGroot (1962, 1984), where the information in E is measured through the negative of the Bayes risk and it is interpreted as the uncertainty in the prior minus the expected uncertainty in the posterior.

Different from that, this manuscript starts from and focuses on a characterization of the measure of the information in experiment E that encompasses as special cases the measures of information in Kiefer (1959), Csiszár (1963, 1967), Lindley (1956), Raiffa and Schlaifer (1961) and DeGroot (1962).

In Section 6.1, the measure of the information about θ in an observation X = x from E is defined to be a non-negative convex function of the corresponding likelihood ratio or posterior distribution, in Definitions 6.1-6.2. All added, it turns out that the choice of the most informative experiment can be posed as a decision problem where the reward from choosing experiment E is its likelihood ratio or posterior distribution statistic, the utility function is convex, the utility of the reward is the information in the observed outcome, and the expected utility from choosing E is the information in E.

In Section 6.2, the information about θ in a distribution h on Ω is defined to be the information in an observed outcome that updates a baseline prior into a posterior h. The uncertainty about θ in h is then measured as the information in a one-point distribution minus the information in h. Section 7 explores the relationship between the information in E and the expected impact of E on the uncertainty about θ in its own distribution. By showing that the information in E is not always interpretable as the uncertainty in the prior minus the expected uncertainty in the posterior, this section clarifies the sense in which our definition of information generalizes the definition of information proposed in DeGroot (1962) and adopted most often in Bayesian DoE.

Section 8 illustrates through an example how the framework described in this manuscript allows for a unified approach to the selection of an experiment, to the construction of a reference prior for a given experiment, to the assessment of the validity of the model and to the quantification of the impact of X = x on the knowledge about θ.

It is important to emphasize that even though this manuscript might look like a review paper to some, Definitions 4.1, 5.1-5.2 and 6.1-6.2, their motivation, some examples and the interpretation of Proposition 3.1 are new. Readers mainly interested in statistical inference might want to read Sections 3.2.1, 6.1 and 5 first, because they provide a more intuitive starting point to the manuscript that does not rely on the axiomatic framework built in Sections 3 and 4. In that alternative presentation one first defines the measure of the information in a given observation X = x, and then presents the measure of the information in experiment E as the average of the information in all the observations that could have been yielded by it.
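Before the formal development, a small numerical sketch may help fix ideas: for a finite parameter and sample space, "information in E as an expected convex utility of the likelihood ratio" can be computed directly. The code below is our own illustration (all names and the choice of φ are ours, not the paper's), anticipating the statistic T_π of Section 3.2.1:

```python
import numpy as np

def information_in_E(P, prior, phi):
    """Expected phi-information in a finite experiment E = (X; P_theta).

    P[i, j] = p_{theta_i}(x_j) (each row is a sampling distribution),
    `prior` holds the weights pi_i, and `phi` is a convex function of the
    likelihood ratio vector.  Returns E_{p_pi}[ phi(T_pi(X)) ].
    """
    prior = np.asarray(prior, dtype=float)
    P = np.asarray(P, dtype=float)
    p_mix = prior @ P                 # predictive density p_pi(x)
    total = 0.0
    for j, pj in enumerate(p_mix):
        if pj > 0.0:
            T = P[:, j] / pj          # likelihood to averaged likelihood ratio
            total += pj * phi(T)
    return total

# Example: phi(t) = sum_i pi_i * t_i * log(t_i) recovers Lindley's (1956)
# mutual-information measure (our choice of phi for illustration).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])
phi = lambda t: float(np.sum(pi * t * np.log(np.maximum(t, 1e-300))))
print(information_in_E(P, pi, phi))
```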
2 Background and notation

2.1 Statistical experiments

Definition 2.1. A statistical experiment E = {(X, S_X); (P_θ, Ω)} yields an observation on a random variable X defined on S_X, with an unknown probability distribution that is known to be in the family (P_θ, θ ∈ Ω).

Following Wald (1950) and Blackwell (1951), here an experiment E is considered to be a family of probability measures, (P_θ, θ ∈ Ω), on a common sample space, S_X, one of which is assumed to be the distribution of X. One might think of each θ in Ω as representing a possible explaining "theory" and the parameter space Ω as representing the set of all conceivable "theories". When comparing experiments E and F = {(Y, S_Y); (Q_θ, Ω)}, the only necessary common thread between them is that the parameter space be the same, and that the same unknown θ governs the distribution of X and Y. Note that here, a statistical experiment coincides with what the inference literature calls a parametric model. Thus, "a measure of the information about θ in experiment E" is synonymous with "a measure of the information about θ in the statistical model (P_θ, θ ∈ Ω)".

To avoid measure theoretical details, throughout this paper we assume that the probability measures (P_θ, θ ∈ Ω) are dominated by a σ-finite measure µ (i.e., that there is a measure µ such that events of µ-measure zero also have P_θ-measure zero), and thus the corresponding density functions, p_θ, will always exist. We also assume that sample spaces are complete separable metric spaces. That covers all situations faced in the usual statistical inference and design of experiments practice.

As an example, linear normal experiments are the ones that yield X ∈ R^n distributed as a N_n(Aβ, σ²I), where A is a known n × p design matrix and β is a vector of regression parameters. Here θ denotes either β with Ω = R^p or (β, σ) with Ω = R^p × [0, ∞), depending on whether σ is assumed known or unknown, and selecting an experiment consists of choosing a design matrix A.

An experiment is said to be totally non-informative, denoted by E_tni, if the distributions of X are the same for all θ. One can not learn about θ by observing from E_tni, and it is the baseline relative to which the information in every experiment is measured. At the other end, an experiment is said to be totally informative, denoted by E_ti, if for every pair (θ_i, θ_j) ∈ Ω × Ω the intersection of the support sets for P_{θ_i} and P_{θ_j} is an empty set, and thus if it is a family of mutually singular distributions. After performing E_ti, the P_θ that generated X = x can be identified with certainty.

In Bayesian setups, the uncertainty about θ ∈ Ω is modelled through a prior distribution π on Ω, which allows one to represent the experiment or statistical model (P_θ, θ ∈ Ω) through the marginal distribution of X, P_π, with density function p_π(x) = E_π[p_θ(x)], and to construct the joint distribution for (X, θ) with density function

$$f_\pi(x,\theta) = p_\theta(x)\,\pi(\theta) = p_\pi(x)\,\pi_E(\theta\mid x), \qquad (1)$$

where π_E(θ|x) denotes the density of the posterior distribution of θ. Let the sampling, predictive and posterior densities of F = (Y; Q_θ) be denoted by q_θ(y), q_π(y) and π_F(θ|y).

Note that when designing an experiment, data as well as parameters are unknown, and the reasons against treating them symmetrically, by considering both X and θ as random and by averaging over both parameter and sample spaces, are a lot less compelling than for inference problems. In fact, it is our perception that the only difference between the Bayesian and the non-Bayesian way of planning for an experiment is in the way one interprets the optimality design criteria available.

2.2 Likelihood functions attainable and information in E

Given an observation X = x yielded by experiment E, likelihood functions, l_x(θ), are functions of θ proportional to p_θ(x), with the constant of proportionality being arbitrary. Before performing the experiment, l_X(θ) can be regarded as a random function on Ω. For totally non-informative experiments, E_tni, likelihood functions are always constant, while for totally informative experiments, E_ti, likelihood functions are always zero everywhere except for one value θ in Ω. For any experiment, the relatively "flatter" the attainable likelihoods, the harder it is to identify the θ that explains the observations, and the worse that experiment is for inferential purposes.

Adherents to likelihood based inference will recognize informative experiments to be the ones that provide highly concentrated likelihood functions. Given a choice, they will prefer experiments that tend to produce likelihoods with more pronounced peak(s), and the measures of the information in E should quantify this fact. Comments in this direction can be found in Barnard (1959), in Barnard, Jenkins, and Winsten (1962, p. 323), and in Birnbaum (1962, pp. 293, 304).

When θ is real valued and Ω is open, the relative peakedness of the likelihood at θ can be measured through the squared relative rate of change of the likelihood function,

$$r_x(\theta) = \left(\dot{l}_x(\theta)/l_x(\theta)\right)^2 = \left(\dot{p}_\theta(x)/p_\theta(x)\right)^2, \qquad (2)$$

where the dot indicates the derivative with respect to θ. Before the experiment is performed, x is unobserved and r_X(θ) is a random function on Ω with possibly a different distribution under each p_θ. Under regularity conditions (see, e.g., Lehmann 1983), one can assess the information about θ in E through the average of r_X(θ) under p_θ,

$$I^{\theta}_{Fi}(E) = E_{p_\theta}\left[\left(\dot{p}_\theta(x)/p_\theta(x)\right)^2\right], \qquad (3)$$

which was introduced in Fisher (1922) and is called the Fisher information in E. The larger I^θ_Fi(E), the smaller the asymptotic variance for the maximum likelihood estimator of θ, and the more informative E is. When Ω is an open subset of R^p, the Fisher information is defined to be the covariance matrix of the vector of ratios between the partial derivatives of l_x(θ) and l_x(θ). The DoE literature largely focuses on using real valued transformations of the Fisher information matrix, introduced in Kiefer (1959). However, Fisher information might not exist, and when it does exist it typically depends on the unknown value of θ, in which case different experiments could be optimal for different values of θ. Furthermore, the performance of E for tasks other than estimation under squared error loss should be assessed on the basis of how well those tasks can be performed, which may involve aspects of the association between X and θ not captured by Fisher information.
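Eq. (3) lends itself to a quick numerical check. The sketch below discretizes the sample space and approximates the score by finite differences; it is illustrative only, and the grid and step size are arbitrary choices of ours:

```python
import numpy as np

def fisher_information(log_density, theta, xs, weights):
    """Fisher information (3): the mean of the squared score
    d/dtheta log p_theta(x) under p_theta, on a discretized sample space."""
    h = 1e-5
    score = (log_density(xs, theta + h) - log_density(xs, theta - h)) / (2 * h)
    p = np.exp(log_density(xs, theta)) * weights   # p_theta on the grid
    p /= p.sum()
    return float(np.sum(p * score ** 2))

# Sanity check: one N(theta, 1) observation has Fisher information 1.
xs = np.linspace(-10.0, 10.0, 4001)
w = np.full_like(xs, xs[1] - xs[0])
logp = lambda x, t: -0.5 * (x - t) ** 2 - 0.5 * np.log(2 * np.pi)
print(fisher_information(logp, 0.0, xs, w))        # approximately 1.0
```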
3 Sufficiency ordering of experiments and information

In statistical decision theoretic terms, the information about θ in an experiment E depends on the performance of E in relation to the terminal consequences of the statistical decisions made based on the data obtained from it. Sometimes one needs to select an experiment E based on its expected performance on a given decision problem with loss function L(θ, d) defined on Ω × D, where d is a decision and D is the space of decisions. To make a decision based on data from E = (X; P_θ) one selects a decision rule, δ(x), that assigns to each x ∈ S_X a possible decision d, and the performance of δ(X) under each θ ∈ Ω is appraised through its risk function, R_E(θ, δ) = E_{p_θ}[L(θ, δ(x))].

In principle, one could assess the performance of E on that given terminal decision problem through the class of risk functions of its admissible rules (i.e., the rules that can not be improved upon for every θ), but that does not lead to any clear cut choice between two experiments. To narrow that choice down, one could compare experiments E and F on the basis of the risk function of a given pair of admissible rules, like their respective Bayes rules (minimizing the weighted average of R_E(θ, δ) under a given distribution on Ω), or their minimax rules (minimizing the maximum of R_E(θ, δ) over Ω). Most often though, one would still find E to be either better or worse than F depending on the value of θ. To attain a total ordering of the experiments available, one must compare them on the basis of a real number like the average risk for their Bayes rules (i.e., their Bayes risk), or the maximum risk for their minimax rule (i.e., their minimax risk).

By considering the choice between any two experiments, E and F, based on the Bayes or the minimax risk for a specific terminal decision problem, one might choose experiment E or F depending on the problem at hand. However, sometimes by observing X from E one can do at least as well as by observing Y from F for every terminal decision problem, and thus in particular for every statistical inference problem. These situations lead to the ordering of experiments considered next.

3.1 When is experiment E "sufficient for" or "always at least as informative as" experiment F?

Definition 3.1 (Blackwell 1951, 1953). Experiment E = (X; P_θ) is said to be "sufficient for" F = (Y; Q_θ) if there exists a stochastic transformation of X to a random variable W(X) such that W(X) and Y have identical distribution under each θ ∈ Ω.

Lehmann (1988, p. 521) re-phrases this by stating that "experiment E is sufficient for F if there exists a random variable Z with a known distribution and a function g(·, ·) such that for all θ ∈ Ω, X being distributed as P_θ implies that g(X, Z) is distributed as Q_θ." In fact, there is no loss of generality in assuming that the distribution of Z is uniform on (0, 1) and independent of X. Thus, E is "sufficient for" F whenever by using a realization X = x of experiment E and an auxiliary randomization, Z, one can simulate data distributed as Y without knowing θ.

The sufficiency ordering of experiments is the central subject of study of the comparison of experiments literature stemming out of Blackwell's seminal papers. Brief expositions can be found in Blackwell and Girshick (1954), Savage (1954), Lehmann (1959, 1988), DeGroot (1970), Le Cam (1975, 1996), Torgersen (1976), Vajda (1989), Shiryaev and Spokoiny (2000) and Gollier (2001). For a thorough presentation, see Heyer (1982), Strasser (1985), Le Cam (1986) or Torgersen (1991a). For a review with an exhaustive list of references and examples, see Goel and Ginebra (2003).

In statistical inference, experiment E = (X; P_θ) is typically fixed and given, and discussions are limited to the comparison between E and the sub-experiment E_T that yields a statistic based on X, T(X). Clearly E is always "sufficient for" E_T in the sense of Definition 3.1. Furthermore, when T(X) is a sufficient statistic for X, once given T(x) = t one can also generate data distributed as X without knowing θ, and hence in that case experiment E_T is also "sufficient for" E. Therefore, stating that a statistic T(X) is sufficient for X is equivalent to stating that experiments E and E_T are "sufficient for" each other. But Definition 3.1 applies more generally, since it allows for the comparison of experiments on unrelated sample spaces. (In fact, as remarked in Le Cam (1975), 'to state that E is "sufficient for" F is the same as to state that there exists an experiment EF yielding (X, Y) for which X is a sufficient statistic for EF'.)

As an example of a sufficiency ordering of experiments, let E and F be a pair of linear normal experiments that observe X and Y from N_{n_E}(Aβ, σ²I) and N_{n_F}(Bβ, σ²I) respectively, where A and B are known n_E × p and n_F × p matrices. Hansen and Torgersen (1974) prove that when σ² is known, E is "sufficient for" F if and only if I^θ_Fi(E) − I^θ_Fi(F) = A′A − B′B is non-negative definite, and that for unknown σ² an additional condition is that n_E ≥ n_F + rank(A′A − B′B).

Definition 3.1, making E sufficient for F if one can derive from X a random variable with the same distribution as Y using a known random mechanism that only depends on X, is grounded on a randomization argument seemingly devoid of statistical meaning. Nevertheless, that meaning follows from the well established fact that E is "sufficient for" F if, and only if, E is "always at least as good as" F in the sense that for every decision problem and for every decision rule δ(Y) based on F, there exists a decision rule δ*(X) based on E such that R_E(θ, δ*(x)) ≤ R_F(θ, δ(y)) for all θ. Consequently, when E is "sufficient for" F the Bayes and minimax risks under E are at most as large as the ones under F for every prior and loss.

Therefore, E is "sufficient for" F if and only if E is preferable to F for every statistical decision problem with a loss defined on Ω × D, which includes non-sequential estimation, testing, classification, the prediction of future observations and any other purely inferential problem, where learning about θ is the single goal of experimenting. Hence, the phrase "E is always at least as informative as F" is used as a synonym for "E is sufficient for F" (see, e.g., Blackwell and Girshick 1954; Lehmann 1988). This also explains why any ordering of experiments that respects the sufficiency ordering is called an information ordering in Torgersen (1991a), and why the sufficiency ordering will be essential in the characterization of the measure of the information in an experiment.

Comparisons in the sense of this sufficiency ordering are always made on the basis of statistical merit only, ignoring experimental costs. By stating that E is "sufficient for" or "always at least as informative as" F if and only if E is preferable to F for every decision problem with a loss on Ω × D, one excludes from consideration the comparison of experiments under one-period bandit and stochastic control type problems, which have loss functions defined on Ω × S_X (see, e.g., Gonzalez and Ginebra 2001), and the comparison under mixed problems with the goal of maximizing both information about θ and outcome, which have loss functions defined on Ω × D × S_X (see, e.g., Verdinelli and Kadane 1992). One also excludes the comparison of experiments in terms of their performance under problems with loss functions that depend on the experiment itself, like the ones that include in the loss experimental costs that depend on sample size.

3.2 Variability of likelihood ratio statistics and information in E

The Blackwell-Sherman-Stein and Le Cam theorems presented below establish the equivalence between the sufficiency ordering of experiments, the convex ordering of their likelihood ratio statistics, and the convex ordering of the distribution of the posterior distributions attained under a given prior. That will enable us to propose measuring the information in E through measures of the variability of its likelihood ratio and posterior distribution statistics, as described in Section 5. Here we focus on the case where Ω has a finite number of elements, Ω = {θ_1, ..., θ_k}, because it is an important case in its own right (under it, Fisher information is not even defined), and because it provides the basic tool for the cases where Ω is infinite.

3.2.1 The vector of likelihood ratios as a likelihood function

Let E = (X; P_θ) be an experiment on Ω, and let P_π = Σ_{i=1}^k π_i P_{θ_i} be a convex combination of the elements in (P_{θ_i}, θ_i ∈ Ω) that dominates all the elements in that family (i.e., P_π is such that any measurable set of P_π-measure zero has P_{θ_i}-measure zero for all i). In that case, the 'likelihood to averaged likelihood ratio statistic'

$$T_\pi(X) = \frac{1}{E_\pi[l_X(\theta)]}\,\big(l_X(\theta_1), \dots, l_X(\theta_k)\big), \qquad (4)$$

is minimal sufficient for E (see, e.g., Basu 1975; Lehmann 1983), and its distribution characterizes the statistical properties of E. In particular, P_π can be any convex combination with strictly positive weights (i.e. with π_i > 0 for i = 1, ..., k), and we focus on this case, but when all measures in E are dominated by one of them, say P_{θ_1}, then

$$T_{\theta_1}(X) = \frac{1}{l_X(\theta_1)}\,\big(l_X(\theta_1), \dots, l_X(\theta_k)\big) \quad[\dots]$$

Remark: A choice of a particular set of weights, (π_1, ..., π_k), is just a matter of choice of a dominating measure P_π, and of a version of the standardized likelihood function T_π(x), and all that can be devoid of any Bayesian connotation. In fact, in the sufficiency ordering literature one typically restricts attention to uniform weights, π_i = 1/k. On the other hand, in Bayesian terms one is entitled to interpret π as a prior distribution and T_π(X) as the 'posterior to prior ratio statistic',

$$T_\pi(X) = \left(\frac{\pi_E(\theta_1\mid X)}{\pi_1}, \dots, \frac{\pi_E(\theta_k\mid X)}{\pi_k}\right). \qquad (7)$$

3.2.2 The Blackwell-Sherman-Stein and Le Cam theorem

Here, we compare the experiments on Ω, E = (X; P_θ) and F = (Y; Q_θ), through the distribution of their corresponding likelihood ratio statistics, T_π(X) and S_π(Y). For totally non-informative experiments, E_tni, the likelihood function is constant, and T_π(X) = (1, ..., 1) with probability one. For totally informative experiments, E_ti, the likelihood is zero everywhere except at one θ_j ∈ Ω, and T_π(X) = (0, ..., 1/π_j, ..., 0), which is an extreme point of K_π. In general, the further T_π(X) tends to fall away from (1, ..., 1) towards an extreme point of K_π, the easier it is to guess θ and the more informative E is. Given that for every experiment

$$E_{p_\pi}[T_\pi(x)] = (1, \dots, 1), \qquad (8)$$

the more informative E is, the more spread out is the distribution of T_π(X) (under X ∼ P_π) away from (1, ..., 1) towards extreme points of K_π. Therefore it should not come as a surprise that the Blackwell-Sherman-Stein theorem, enunciated next, relates "E being always at least as informative as F" to the distribution of T_π(X) when X is P_π-distributed being more variable than the distribution of S_π(Y) when Y is Q_π-distributed, where Q_π = Σ_{i=1}^k π_i Q_{θ_i}.

Proposition 3.1. Experiment E = (X; P_θ) is "sufficient for" experiment F = (Y; Q_θ) if and only if for some strictly positive set of weights π,

$$E_{p_\pi}[\varphi(T_\pi(x))] \ge E_{q_\pi}[\varphi(S_\pi(y))] \qquad (9)$$

for every convex function φ(·) on K_π.

Equivalently, this proposition can be re-stated in terms of the convex ordering of likelihood ratio statistics as "experiment E is sufficient for F if and only if the distribution of T_π(X) when X ∼ P_π is larger in the convex order than the distribution of S_π(Y) when Y ∼ Q_π, i.e., if and only if

$$T_\pi(X)\mid p_\pi \;\ge_{cx}\; S_\pi(Y)\mid q_\pi.\text{''} \qquad (10)$$

Since convex functions take on their larger values over "extreme regions", any measure of the form E[φ(U)] with a convex φ(·) can be interpreted as a measure of the dispersion of the random variable U. Consequently, "E is sufficient for F" is equivalent to "the […]
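Proposition 3.1 can be exercised numerically: garble a finite experiment E through a stochastic matrix to produce F, and check inequality (9) for a few convex functions. The construction below, with random experiments, is our own toy example, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

k, nx, ny = 3, 5, 4
P = rng.dirichlet(np.ones(nx), size=k)   # experiment E: P[i] = p_{theta_i}
M = rng.dirichlet(np.ones(ny), size=nx)  # stochastic map (garbling) x -> y
Q = P @ M                                # experiment F: a garbling of E,
                                         # hence E is "sufficient for" F
pi = np.ones(k) / k

def phi_info(R, phi):
    """E_{r_pi}[ phi(T_pi) ], the quantity compared in Proposition 3.1."""
    r_mix = pi @ R
    T = R / r_mix                        # column j: likelihood ratio vector
    return float(np.sum(r_mix * phi(T)))

for phi in (lambda T: np.max(T, axis=0),                                  # convex
            lambda T: np.sum(T * np.log(np.maximum(T, 1e-300)), axis=0)): # convex
    assert phi_info(P, phi) >= phi_info(Q, phi) - 1e-12
print("sufficiency ordering respected for both test functions")
```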

Poincaré series and monodromy of the simple and unimodal boundary singularities
[…]se singularities admit a symmetry. This leads to the boundary singularities considered by Arnold in [A1]. Arnold and V. I. Matov classified the simple and unimodal boundary singularities. Here we investigate the relation between the characteristic polynomial of the monodromy and the Poincaré series of the ambient hypersurface singularity for such a singularity. The simple boundary singularities arise from simple hypersurface singularities, and there is a generalization of the McKay correspondence for these cases. R. Stekolshchik has extended our theorem for the Kleinian singularities to this generalized McKay correspondence. Here we give an interpretation of his result in terms of singularity theory. For 7 of the 12 exceptional unimodal boundary singularities we show that there is a direct relation between the Poincaré series of the ambient singularity and the characteristic polynomial of the monodromy. The author is grateful to Sabir Gusein-Zade for suggesting that he study relations between Poincaré series and monodromy for boundary singularities.

Dobiński-type relations and the Log-normal distribution

arXiv:quant-ph/0303030v1 6 Mar 2003

P Blasiak†‡, K A Penson† and A I Solomon†

† Université Pierre et Marie Curie, Laboratoire de Physique Théorique des Liquides, CNRS UMR 7600, Tour 16, 5ième étage, 4 place Jussieu, F 75252 Paris Cedex 05, France

Abstract. We consider sequences of generalized Bell numbers B(n), n = 0, 1, ..., which can be represented by Dobiński-type summation formulas, i.e. $B(n) = \frac{1}{C}\sum_{k=0}^{\infty}\frac{[P(k)]^n}{D(k)}$, with P(k) a polynomial, D(k) a function of k and C = const. They include the standard Bell numbers (P(k) = k, D(k) = k!, C = e), their generalizations B_{r,r}(n), r = 2, 3, ..., appearing in the normal ordering of powers of boson monomials (P(k) = (k+r)!/k!, D(k) = (k+r)!, C = e), variants of "ordered" Bell numbers B_o^{(p)}(n) (P(k) = k, D(k) = ((p+1)/p)^k, C = 1+p, p = 1, 2, ...), etc. We demonstrate that for α, β, γ, t non-negative integers (α, t ≠ 0), B(αn² + βn + γ) is the n-th moment of a positive function on (0, ∞) which is a weighted infinite sum of log-normal distributions.

In a recent investigation [1] we analysed sequences of integers which appear in the process of normal ordering of powers of monomials of boson creation a† and annihilation a operators, satisfying the commutation rule [a, a†] = 1. For r, s integers such that r ≥ s, we define the generalized Stirling numbers of the second kind S_{r,s}(n, k) as

$$\left((a^\dagger)^r a^s\right)^n = (a^\dagger)^{n(r-s)} \sum_{k=s}^{ns} S_{r,s}(n,k)\,(a^\dagger)^k a^k \qquad (1)$$

and the corresponding Bell numbers B_{r,s}(n) as

$$B_{r,s}(n) = \sum_{k=s}^{ns} S_{r,s}(n,k). \qquad (2)$$

In [1] explicit and exact expressions for S_{r,s}(n, k) and B_{r,s}(n) were found. In a parallel study [2] it was demonstrated that B_{r,s}(n) can be considered as the n-th moment of a probability distribution on the positive half-axis. In addition, for every pair (r, s) the corresponding distribution can be explicitly written down. These distributions constitute the solutions of a family of Stieltjes moment problems, with B_{r,s}(n) as moments. Of particular interest to us are the sequences with r = s, for which the following representation as an infinite series has been obtained:

$$B_{r,r}(n) = \frac{1}{e}\sum_{k=r}^{\infty} \frac{\left[k(k-1)\cdots(k-r+1)\right]^n}{k!} \qquad (3)$$

$$\phantom{B_{r,r}(n)} = \frac{1}{e}\sum_{k=0}^{\infty} \frac{\left[(k+1)(k+2)\cdots(k+r)\right]^n}{(k+r)!}, \qquad n \ge 0. \qquad (4)$$

The case r = 1 reduces to the classical Dobiński formula:

$$B_{1,1}(n) = \frac{1}{e}\sum_{k=0}^{\infty} \frac{k^n}{k!}, \qquad n \ge 0, \qquad (5)$$

which expresses the conventional Bell numbers B_{1,1}(n) as a rapidly convergent series. Its simplicity has inspired combinatorialists such as G.-C. Rota [4] and H. S. Wilf [5]. Eq. (5) has far-reaching implications in the theory of stochastic processes [6], [7], [8]. The probability distribution whose n-th moment is B_{r,r}(n) is an infinite ensemble of weighted Dirac delta functions located at a specific set of integers (a so-called Dirac comb):

$$B_{r,r}(n) = \int_0^{\infty} x^n \left[\frac{1}{e}\sum_{k=0}^{\infty} \frac{\delta\big(x - k(k+1)\cdots(k+r-1)\big)}{(k+r-1)!}\right] dx, \qquad n \ge 0. \qquad (6)$$

For r = 1 the discrete distribution of Eq. (6) is the weight function for the orthogonality relation for Charlier polynomials [9]. In contrast, we emphasize that for r > s the B_{r,s}(n) are moments of continuous distributions [2].
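Formulas (3)-(5) are easy to check numerically. In the sketch below the classical Dobiński series is compared against the Bell triangle recurrence, and the generalized series (3) is evaluated for r = 2; the truncation length is our own choice:

```python
from math import exp, factorial

def bell_dobinski(n, terms=100):
    """Classical Dobinski formula (5): B_{1,1}(n) = (1/e) * sum_k k^n / k!."""
    return sum(k ** n / factorial(k) for k in range(terms)) / exp(1)

def bell_rr(n, r, terms=100):
    """Generalized Bell numbers via the series (3); note the falling
    factorial k(k-1)...(k-r+1) and the 1/e prefactor."""
    return sum((factorial(k) // factorial(k - r)) ** n / factorial(k)
               for k in range(r, terms)) / exp(1)

def bell_triangle(n):
    """Reference values for B_{1,1}(n) from the Bell triangle recurrence."""
    row = [1]
    for _ in range(n):
        new = [row[-1]]
        for v in row:
            new.append(new[-1] + v)
        row = new
    return row[0]

for n in range(8):
    assert round(bell_dobinski(n)) == bell_triangle(n)
print(bell_rr(1, 2), bell_rr(2, 2))  # 1.0 and 7.0: the first two B_{2,2}(n)
```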
In this note we wish to point out an intimate relation between the formulas of Eqs. (3), (4), (5) and the log-normal distribution [10], [11]:

$$P_{\sigma,\mu}(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\; e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}, \qquad x \ge 0,\ \sigma, \mu > 0. \qquad (7)$$

First we quote the standard expression for its n-th moment:

$$M_n = \int_0^{\infty} x^n\, P_{\sigma,\mu}(x)\, dx = e^{n\mu + \frac{n^2\sigma^2}{2}}, \qquad n \ge 0, \qquad (8)$$

which can be reparametrized for k > 1 as

$$M_n = k^{\alpha n^2 + \beta n}, \qquad (9)$$

with

$$\mu = \beta \ln(k), \qquad (10)$$

$$\sigma = \sqrt{2\alpha \ln(k)} > 0. \qquad (11)$$

Given three integers α, β, γ (where α > 0), we wish to find a weight function W_{1,1}(α, β, γ; x) > 0 such that

$$B_{1,1}(\alpha n^2 + \beta n + \gamma) = \int_0^{\infty} x^n\, W_{1,1}(\alpha, \beta, \gamma; x)\, dx.$$

[Figure 1 near here. Caption: Figure 1. Weight function W_{1,1}(1, 1, 1; x), see Eq. (13); the plot shows a density on 0 ≤ x ≤ 3 together with a Dirac peak e^{−1} δ(x − 1).]

[…] and so there will be no Dirac peak in the formula. Then the function W_{r,r}(α, β, γ; x) > 0 defined by (α, β, γ integers, α, γ > 0):

$$B_{r,r}(\alpha n^2 + \beta n + \gamma) = \int_0^{\infty} x^n\, W_{r,r}(\alpha, \beta, \gamma; x)\, dx$$

[Figure 2 near here. Caption: Figure 2. Weight functions W_{r,r}(1, 1, 1; x) for r = 2, 3, 4, plotted for 0 ≤ x ≤ 3.]
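The moment identity (8) and the reparametrization (9)-(11) can be verified in a few lines; the parameter values below are arbitrary choices of ours:

```python
from math import exp, log, sqrt

def lognormal_moment(n, sigma, mu):
    """Eq. (8): M_n = exp(n*mu + (n*sigma)**2 / 2)."""
    return exp(n * mu + (n * sigma) ** 2 / 2)

# Reparametrization (9)-(11): mu = beta*ln(k) and sigma = sqrt(2*alpha*ln(k))
# turn the moments into M_n = k**(alpha*n**2 + beta*n).
k, alpha, beta = 3, 2, 1
mu, sigma = beta * log(k), sqrt(2 * alpha * log(k))
for n in range(5):
    m = lognormal_moment(n, sigma, mu)
    target = float(k ** (alpha * n * n + beta * n))
    assert abs(m - target) <= 1e-9 * target
print("moments match the reparametrized form (9)")
```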