The Maximum Likelihood Degree


Maximum likelihood detection

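Maximum likelihood (ML) detection is also known as maximum likelihood sequence estimation (MLSE). Strictly speaking, it is not an equalization scheme but a receiver approach: the detection process at the receiver explicitly takes the time dispersion of the radio channel into account.

Fundamentally, the ML detector accounts for the effect of time dispersion on the received signal and uses the entire received signal to determine the most likely transmitted sequence.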

The Viterbi algorithm is typically used to implement maximum likelihood detection.
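The following is a minimal sketch of Viterbi-based MLSE. It is an illustration, not the receiver described in the text. It assumes BPSK symbols, a known two-tap real channel h = (h0, h1), and that nothing was transmitted before the block. The trellis state is the previous symbol. At each step the algorithm keeps, for every state, only the best path that reaches it (add-compare-select). The function name `viterbi_mlse` is hypothetical.

```python
def viterbi_mlse(received, h, alphabet=(-1.0, 1.0)):
    """ML sequence estimate for a known 2-tap ISI channel via the Viterbi algorithm.

    received: real samples r_k = h[0]*s_k + h[1]*s_{k-1} + noise
    h:        (h0, h1), channel taps assumed known at the receiver
    The trellis state is the previous symbol; the symbol before the block
    is assumed to be 0 (nothing transmitted).
    """
    h0, h1 = h
    cost = {0.0: 0.0}   # state -> smallest accumulated squared error
    paths = {0.0: []}   # state -> survivor path ending in that state
    for r in received:
        new_cost, new_paths = {}, {}
        for s in alphabet:  # hypothesised current symbol
            # add-compare-select: keep only the best predecessor state
            metric, prev = min(
                (c + (r - h0 * s - h1 * p) ** 2, p) for p, c in cost.items()
            )
            new_cost[s] = metric
            new_paths[s] = paths[prev] + [s]
        cost, paths = new_cost, new_paths
    best_state = min(cost, key=cost.get)  # survivor with smallest total metric
    return paths[best_state]


# Symbols +1, -1, -1, +1, +1 sent through h = (1.0, 0.5), plus a little noise
noisy = [1.1, -0.55, -1.3, 0.4, 1.55]
print(viterbi_mlse(noisy, (1.0, 0.5)))
```

The cost of the search grows linearly with the block length but exponentially with the channel memory. That growth is why the text calls the approach too complex for wideband systems.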

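Viterbi-based maximum likelihood detection is widely used in 2G systems such as GSM. It is nevertheless too complex to apply in LTE, because the wider transmission bandwidth leads to stronger channel frequency selectivity and a higher sampling rate.

In summary, after channel estimation and equalization, the signal is mapped onto the different physical channels by resource demapping for further processing.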

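1.1 Principle of maximum likelihood estimation. Let $D$ be a probability distribution with probability density function (for a continuous distribution) or probability mass function (for a discrete distribution) $f_D$ and distribution parameter $\theta$. We can draw a sample $X_1, X_2, \dots, X_n$ of $n$ values from this distribution, and using $f_D$ we can compute its probability:
$$\mathbb{P}(x_1, x_2, \dots, x_n) = f_D(x_1, x_2, \dots, x_n \mid \theta).$$
However, we may not know the value of $\theta$, even though we know that the sample comes from the distribution $D$.

How, then, can we estimate $\theta$? A natural idea is to draw a sample of $n$ values $X_1, X_2, \dots, X_n$ from the distribution and use these data to estimate $\theta$. Once we have $X_1, X_2, \dots, X_n$, we can obtain an estimate of $\theta$ from them.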

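Maximum likelihood estimation looks for the most plausible value of $\theta$. That is, among all possible values of $\theta$, it chooses the one that makes the observed sample most "likely".

This differs from some other estimation methods, such as unbiased estimation of $\theta$. An unbiased estimator need not output the most plausible value; it outputs a value that, on average, neither overestimates nor underestimates $\theta$.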

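To implement maximum likelihood estimation mathematically, we first define the likelihood function
$$\operatorname{lik}(\theta) = f_D(x_1, x_2, \dots, x_n \mid \theta)$$
and then maximize it over all values of $\theta$.

The value that maximizes the likelihood is called the maximum likelihood estimate of $\theta$.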
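To make the definition concrete, the sketch below maximizes a log-likelihood numerically. The model choice is ours, not the text's: we assume $D$ is a normal distribution with known $\sigma = 1$ and unknown mean $\theta$. The search uses golden-section search on an interval. For this model the maximizer is known to be the sample mean, which gives a convenient check. Both function names are illustrative.

```python
import math


def gaussian_log_likelihood(theta, xs, sigma=1.0):
    """log lik(theta) = log prod_i N(x_i | theta, sigma^2)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2))


def maximize_1d(fn, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if fn(c) > fn(d):  # maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:              # maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


xs = [2.1, 1.9, 2.4, 1.6, 2.0]
theta_hat = maximize_1d(lambda t: gaussian_log_likelihood(t, xs), -10.0, 10.0)
print(theta_hat)  # close to 2.0, the sample mean
```

Maximizing the log of the likelihood rather than the likelihood itself gives the same maximizer and avoids numerical underflow for large samples.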

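1.2 Application of maximum likelihood decoding in LTE. Assume that all signals in the modulation constellation are equiprobable. Over all possible values of $\hat s_1$ and $\hat s_2$, the maximum likelihood decoder chooses from the constellation the pair of signals $(\hat s_1, \hat s_2)$ that minimizes the distance metric (1). Simplifying yields the ML decision rule (2). In that rule, $C$ is the set of all possible pairs of modulation symbols, and $\tilde s_1$ and $\tilde s_2$ are two decision statistics constructed by combining the received signals with the channel state information.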
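The text does not name the scheme or reproduce equations (1) and (2). The description (symbol pairs and two combined decision statistics) matches two-branch Alamouti transmit diversity, on which LTE's two-antenna transmit diversity is based. The sketch below assumes that scheme, one receive antenna, flat channel gains h1 and h2, and unit-energy QPSK; these are our assumptions, not the text's. It compares a brute-force search over all symbol pairs with symbol-by-symbol decisions on the combined statistics. The two agree because the joint metric separates into one term per symbol. All names are hypothetical.

```python
import itertools
import math

# Unit-energy QPSK constellation
QPSK = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]


def alamouti_receive(s1, s2, h1, h2, n1=0j, n2=0j):
    """Two received samples for one Alamouti block (one receive antenna).

    Slot 1: antennas send (s1, s2); slot 2: (-conj(s2), conj(s1)).
    """
    r1 = h1 * s1 + h2 * s2 + n1
    r2 = -h1 * s2.conjugate() + h2 * s1.conjugate() + n2
    return r1, r2


def ml_joint(r1, r2, h1, h2, constellation=QPSK):
    """Brute-force ML: try every symbol pair and keep the smallest distance metric."""
    def metric(pair):
        s1, s2 = pair
        return (abs(r1 - h1 * s1 - h2 * s2) ** 2
                + abs(r2 + h1 * s2.conjugate() - h2 * s1.conjugate()) ** 2)
    return min(itertools.product(constellation, repeat=2), key=metric)


def ml_decoupled(r1, r2, h1, h2, constellation=QPSK):
    """Form the decision statistics first, then decide each symbol on its own."""
    s1_t = h1.conjugate() * r1 + h2 * r2.conjugate()  # = g*s1 + noise
    s2_t = h2.conjugate() * r1 - h1 * r2.conjugate()  # = g*s2 + noise
    g = abs(h1) ** 2 + abs(h2) ** 2

    def nearest(stat):
        return min(constellation, key=lambda s: abs(stat - g * s))

    return nearest(s1_t), nearest(s2_t)


h1, h2 = 0.8 + 0.3j, -0.4 + 0.9j
r1, r2 = alamouti_receive(QPSK[0], QPSK[3], h1, h2, 0.05 - 0.02j, -0.03 + 0.04j)
print(ml_joint(r1, r2, h1, h2) == ml_decoupled(r1, r2, h1, h2))  # True
```

The decoupled form searches |C|·2 candidates instead of |C|², which is the practical reason for rewriting the rule in terms of the decision statistics.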

Parameter estimation methods for Markov networks (Part 10)

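A Markov network is a mathematical tool for describing random processes. It can be used to model time-series data, natural language, and data from other fields.

In practice, we usually need to estimate the parameters of a Markov network so that the system's behavior can be simulated and predicted more accurately.

In this article we discuss some common methods for estimating the parameters of Markov networks and compare their strengths and weaknesses.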

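1. Maximum likelihood estimation (MLE). MLE is a common parameter-estimation method that estimates parameter values by maximizing the likelihood function of the observed data.

For a Markov chain model, the state-transition matrix can be estimated from the transitions observed in the data.

Specifically, we define the likelihood of the observed data as the joint probability of all observed state transitions. We then estimate the entries of the transition matrix by maximizing this likelihood. The maximizer is the matrix of normalized transition counts, $\hat p_{ij} = n_{ij} / \sum_k n_{ik}$, where $n_{ij}$ is the number of observed transitions from state $i$ to state $j$.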
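The counting estimator described above can be sketched in a few lines. This assumes a single fully observed state sequence; the function name is illustrative. A state that is never left in the data has no ML estimate for its row, so the sketch marks that row as `None` rather than inventing values.

```python
from collections import Counter


def mle_transition_matrix(sequence, states):
    """ML estimate p_ij = n_ij / sum_k n_ik from one observed state sequence."""
    counts = Counter(zip(sequence, sequence[1:]))  # n_ij: observed i -> j transitions
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        # A state never left in the data has no ML estimate; mark it as None.
        matrix[i] = ({j: counts[(i, j)] / row_total for j in states}
                     if row_total else None)
    return matrix


seq = list("AABABBBA")
P = mle_transition_matrix(seq, ["A", "B"])
print(P["A"])  # A -> A in 1 of 3 observed transitions, A -> B in 2 of 3
```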

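Although MLE is intuitive and simple, it also has some drawbacks.

First, when observations are scarce the estimates can be unstable; for example, transitions that never appear in the data are assigned probability zero. For models with hidden states, the likelihood may also have several local optima.

Second, when the model has many parameters, MLE can overfit, which hurts the model's ability to generalize.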

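2. Bayesian estimation. Bayesian estimation is a parameter-estimation method based on Bayesian statistics. It incorporates a prior probability distribution over the parameters.

For a Markov chain model, we can estimate the transition matrix by placing a prior distribution on the transition probabilities.

Specifically, we choose a suitable prior and update it with the observed data to obtain the posterior distribution of the parameters. We then derive the parameter estimates from that posterior.

The advantage of Bayesian estimation is that it makes effective use of prior information, which improves the stability and generalization of the estimates.

In addition, Bayesian estimation quantifies the uncertainty of the estimates, which is very helpful for model evaluation and selection.

However, Bayesian estimation also has drawbacks: the choice of prior can influence the results, and the computation can be expensive.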
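A minimal sketch of the Bayesian approach for a Markov chain follows. It assumes the common conjugate choice of an independent symmetric Dirichlet(alpha) prior on each row of the transition matrix; the text does not specify a prior. Under that assumption the posterior of each row is again Dirichlet, and its mean has a closed form. Note how a state with no observed transitions falls back to the prior mean instead of being undefined.

```python
from collections import Counter


def posterior_mean_transitions(sequence, states, alpha=1.0):
    """Posterior-mean transition matrix under a symmetric Dirichlet(alpha) prior per row.

    With counts n_ij, row i has posterior Dirichlet(alpha + n_i1, ..., alpha + n_iK),
    whose mean is (n_ij + alpha) / (n_i + K * alpha).
    """
    counts = Counter(zip(sequence, sequence[1:]))
    k = len(states)
    estimate = {}
    for i in states:
        n_i = sum(counts[(i, j)] for j in states)
        estimate[i] = {j: (counts[(i, j)] + alpha) / (n_i + k * alpha) for j in states}
    return estimate


seq = list("AABABBBA")
post = posterior_mean_transitions(seq, ["A", "B", "C"])
print(post["A"]["B"], post["C"])  # unseen state C gets the uniform prior mean
```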

3. Maximum a posteriori (MAP) estimation. MAP estimation is a special case of Bayesian estimation that estimates the parameters by maximizing the posterior probability.
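Continuing the same assumed setup (an independent symmetric Dirichlet(alpha) prior per row, which the text does not specify), the MAP estimate is the mode of each row's Dirichlet posterior. With alpha = 1 (a flat prior) it reduces to the maximum likelihood estimate; larger alpha pulls the rows toward uniform. The function name is illustrative.

```python
from collections import Counter


def map_transitions(sequence, states, alpha=2.0):
    """MAP transition matrix under a symmetric Dirichlet(alpha) prior per row.

    The mode of Dirichlet(alpha + n_i1, ..., alpha + n_iK) is
    (n_ij + alpha - 1) / (n_i + K * (alpha - 1)), valid for alpha >= 1
    whenever the denominator is positive.
    """
    if alpha < 1:
        raise ValueError("this closed form needs alpha >= 1")
    counts = Counter(zip(sequence, sequence[1:]))
    k = len(states)
    estimate = {}
    for i in states:
        n_i = sum(counts[(i, j)] for j in states)
        denom = n_i + k * (alpha - 1)
        estimate[i] = ({j: (counts[(i, j)] + alpha - 1) / denom for j in states}
                       if denom > 0 else None)
    return estimate


seq = list("AABABBBA")
print(map_transitions(seq, ["A", "B"], alpha=1.0)["A"]["B"])  # equals the MLE, 2/3
print(map_transitions(seq, ["A", "B"], alpha=2.0)["A"]["B"])  # shrunk toward 1/2
```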

Machine learning question bank

I. Maximum likelihood

1. ML estimation of exponential model (10 points). A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative.

In such cases one can fit an exponential distribution, whose probability density function is given by
$$p(x) = \frac{1}{b} e^{-x/b}, \qquad x \ge 0.$$
Given $N$ observations $x_i$ drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter $b$.
(b) Write down the derivative of the log likelihood.

(c) Give a simple expression for the ML estimate for $b$.
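A worked answer, offered as a sketch. (a) $L(b) = \prod_{i=1}^N \frac{1}{b} e^{-x_i/b} = b^{-N} \exp\!\big(-\sum_i x_i / b\big)$. (b) $\frac{d}{db} \log L(b) = -\frac{N}{b} + \frac{\sum_i x_i}{b^2}$. (c) Setting the derivative to zero gives $\hat b = \frac{1}{N}\sum_i x_i$, the sample mean. The code checks these formulas numerically on a small made-up sample.

```python
import math


def exp_log_likelihood(b, xs):
    """log L(b) = -N log b - sum(x) / b for the density (1/b) exp(-x/b)."""
    return -len(xs) * math.log(b) - sum(xs) / b


def exp_score(b, xs):
    """Derivative of the log likelihood with respect to b: -N/b + sum(x)/b^2."""
    return -len(xs) / b + sum(xs) / b ** 2


def exp_mle(xs):
    """Setting the score to zero gives b_hat = sample mean."""
    return sum(xs) / len(xs)


sample = [0.5, 1.2, 0.3, 2.0]
b_hat = exp_mle(sample)
print(b_hat, exp_score(b_hat, sample))  # score vanishes at the estimate
```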

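2. Repeat with the Poisson distribution:
$$p(x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!}, \qquad x = 0, 1, 2, \dots$$
$$l(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_{i=1}^{N} \left[x_i \log\theta - \theta - \log(x_i!)\right] = \Big(\sum_{i=1}^{N} x_i\Big)\log\theta - N\theta - \sum_{i=1}^{N} \log(x_i!)$$

II. Bayes

1. Applying Bayes' formula. Suppose that in a multiple-choice exam an examinee knows the correct answer with probability $p$ and guesses with probability $1 - p$. Assume that an examinee who knows the correct answer answers correctly with probability 1, and that a guess is correct with probability $1/m$, where $m$ is the number of options.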
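For the Poisson part of problem 2, differentiating the log-likelihood above gives $l'(\theta) = \frac{\sum_i x_i}{\theta} - N$, so $\hat\theta = \frac{1}{N}\sum_i x_i$. The sketch below evaluates $l(\theta)$ directly and confirms numerically that the sample mean beats nearby values on a small made-up sample. It uses `math.lgamma(x + 1)` for $\log(x!)$.

```python
import math


def poisson_log_likelihood(theta, xs):
    """l(theta) = sum_i [x_i log(theta) - theta - log(x_i!)], with lgamma(x+1) = log(x!)."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in xs)


def poisson_mle(xs):
    """l'(theta) = sum(x)/theta - N = 0  =>  theta_hat = sample mean."""
    return sum(xs) / len(xs)


counts = [2, 0, 3, 1, 4]
theta_hat = poisson_mle(counts)
print(theta_hat)
```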

An example of maximum likelihood estimation in Stata

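A step-by-step walkthrough. Introduction: Maximum likelihood estimation (MLE) is a common parameter-estimation method. Its basic idea is to find the parameter values that maximize the likelihood function of the given data.

Stata, a popular statistical package, provides a rich set of features and commands for maximum likelihood estimation.

This article shows by example how to perform maximum likelihood estimation in Stata, explaining the relevant steps and concepts along the way.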

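Background: suppose we have data from a binomial distribution and want to estimate the distribution's parameter by maximum likelihood.

Step 1: Prepare the data.

Suppose the binomial sample has size 100, with 40 successes and 60 failures.

Step 2: Construct the likelihood function. Before running maximum likelihood estimation, we need to write down the likelihood function.

For the binomial distribution the likelihood is $L(p) = \binom{n}{k} p^k (1-p)^{n-k}$, where $n$ is the sample size, $k$ is the number of successes, and $p$ is the probability of success.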
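Before turning to Stata, it helps to know what answer to expect. Maximizing $L(p)$ gives the closed form $\hat p = k/n$, which is 0.4 for these data. The sketch below evaluates the log-likelihood on a fine grid over (0, 1) and checks that the grid maximizer agrees; whatever optimizer is used should land on the same value. The function names are illustrative.

```python
import math


def binom_log_likelihood(p, n, k):
    """log[ C(n, k) p^k (1 - p)^(n - k) ]."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def binom_mle_grid(n, k, steps=1000):
    """Grid search over the open interval (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: binom_log_likelihood(p, n, k))


print(binom_mle_grid(100, 40), 40 / 100)  # grid maximizer vs closed form k/n
```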

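In Stata, the "ml model" command specifies the model and the form of the likelihood function.

In this example we use the binomial likelihood, with $p$ as the parameter to be estimated.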

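Step 3: Specify the model and the likelihood function. In Stata, the log likelihood is supplied by a small evaluator program, and "ml model" is then pointed at that program. For the counts above, the following commands can be used. The constant term log C(n, k) is omitted because it does not depend on p.

    clear
    input success failure
    40 60
    end

    * Method-lf evaluator: fills in the log-likelihood contribution of each observation.
    * p is parameterized as invlogit(xb) so that it always stays in (0, 1).
    capture program drop binom_lf
    program define binom_lf
        args lnf xb
        quietly replace `lnf' = $ML_y1*ln(invlogit(`xb')) + $ML_y2*ln(1 - invlogit(`xb'))
    end

    ml model lf binom_lf (xb: success failure = )
    ml maximize
    display invlogit(_b[xb:_cons])

These commands clear the data in memory, enter the sample data, define the evaluator, and then use "ml model" to declare the model. Here "lf" is the evaluator method: the program returns one log-likelihood contribution per observation, and the method does not name a distribution. "binom_lf" is the evaluator program, and "(xb: success failure = )" declares a constant-only equation named xb whose two dependent variables reach the evaluator as $ML_y1 and $ML_y2.

Finally, "ml maximize" maximizes the likelihood, and the last line converts the estimated constant back to the probability scale.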

Step 4: Inspect the estimation results. After estimation, Stata reports the estimated parameter values along with other statistics.

Cutoff value corresponding to the maximum Youden index

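Maximum likelihood estimation (MLE) is a statistical method for estimating parameter values and is commonly used to estimate the parameters of probability distributions. It is distinct from the Youden index $J = \text{sensitivity} + \text{specificity} - 1$, whose maximum over candidate thresholds defines the optimal cutoff in ROC analysis.

In maximum likelihood estimation, the parameter estimates are the parameter values at which the likelihood function attains its maximum.

A cutoff (truncation) value refers to the situation in which a continuous random variable is known to take values only above, or only below, some specific value.

When parameters are estimated by maximum likelihood, a cutoff value may be taken into account to restrict the range of the parameters.

Specifically, a cutoff restricts the admissible range by setting an upper or a lower bound.

The choice of cutoff may differ depending on the distribution and the problem at hand.

For example, for a normal distribution, if we know that the variable must exceed some value, we can set a lower bound so that all parameter estimates stay above that bound.

The choice of cutoff should be based on the specific problem and the characteristics of the sample data, and it requires careful judgment and investigation.

There is no universal formula or rule for determining a cutoff; it is usually set according to practical needs and domain knowledge.

Therefore, when a cutoff must be taken into account in maximum likelihood estimation, choose an appropriate value based on the specific problem and the characteristics of the data.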
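For the cutoff named in this section's title, here is a minimal sketch of choosing a threshold by maximizing the Youden index $J = \text{sensitivity} + \text{specificity} - 1$. It assumes a continuous test score, binary labels (1 = condition present), and the convention that a score at or above the cutoff counts as positive; candidate cutoffs are the observed scores. The data and the function name are made up for illustration.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= cutoff; labels are 1 (condition) or 0.
    Ties in J keep the smallest cutoff found first.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        j = tp / pos + tn / neg - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(youden_cutoff(scores, labels))
```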

The method of maximum likelihood

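The method of maximum likelihood was first proposed by Gauss. It was later rediscovered by the British statistician R. A. Fisher in a 1912 paper, where he proved some of its properties, and the name "maximum likelihood estimation" was also coined by Fisher. It is a statistical method built on the maximum likelihood principle. To build an intuitive understanding of this principle, consider an example.

Example. There are two boxes that look identical. Box A contains 99 white balls and 1 black ball; box B contains 1 white ball and 99 black balls. One box is chosen at random, and a ball is drawn from it; the ball turns out to be white. Was the ball drawn from box A or from box B?

Analysis. Note that what we are doing here is statistical inference, not logical inference.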

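Statistical inference means drawing conclusions about a whole population from partial, known data.

Inferring the whole from a part inevitably carries some probability of error.

If one insists on strict logic, therefore, statistical inference might seem insufficiently rigorous and be excluded from "scientific inference".

In everyday life, however, reasoning only by strict logic would cause a great deal of trouble.

For example, whenever you go out there is some probability of an accident, so "getting home safely" is never logically certain, and you would have no choice but to stay at home.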

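Back to the example.

From the fact that the ball is white alone, we cannot determine with logical certainty whether the box is A or B.

But if we must make a choice, we can reason as follows. Given that a white ball was drawn, which box looks more like the one it came from? If the box is A, the probability of drawing a white ball is 0.99; if it is B, that probability is 0.01. Hence "the box is A" explains the white ball more convincingly, and we judge that box A is the more plausible one.

So we infer that the ball came from box A. In fact, the original English term "maximum likelihood" literally means "the one that looks most like it".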
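The box example reduces to comparing two likelihoods. The sketch below makes that explicit. For comparison, it also computes the Bayesian posterior under the assumption (implied by "chosen at random" but ours to state) that each box is chosen with probability 1/2. With equal priors the posterior favors the same box that maximizes the likelihood.

```python
def ml_choice(likelihood):
    """Return the hypothesis h maximizing the likelihood P(observation | h)."""
    return max(likelihood, key=likelihood.get)


def posterior(likelihood, prior):
    """Bayes' rule, shown for comparison with the ML choice."""
    evidence = sum(likelihood[h] * prior[h] for h in likelihood)
    return {h: likelihood[h] * prior[h] / evidence for h in likelihood}


# P(white | box): box A holds 99 white balls out of 100, box B holds 1 out of 100
p_white = {"A": 0.99, "B": 0.01}
print(ml_choice(p_white))                        # A
print(posterior(p_white, {"A": 0.5, "B": 0.5}))  # A has posterior probability about 0.99
```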

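In many situations, "looking most like it" is exactly the basis on which we make decisions.

A population usually has several important parameters.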

Econometrics final exam review summary

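Chapter 1 Introduction

1. What kind of discipline is econometrics? Answer: Econometrics studies not only methods for measuring economic problems but also the quantitative laws by which economic phenomena develop and change.

Econometrics can be regarded as a branch of applied economics. Guided by economic theory, based on economic data, and using mathematical and statistical methods, it builds, estimates, and tests economic models in order to reveal the stochastic causal relationships present in real economic activity.

2. What are the connections and differences between econometrics and economic theory, mathematics, and statistics? Answer: Econometrics combines economic theory, mathematics, and statistics; it is an interdisciplinary (or boundary) field spanning economics, mathematics, and statistics.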

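6. Which aspects does testing an econometric model cover, and why must models be tested? Answer: Model testing usually covers four aspects: economic-significance tests, statistical inference tests, econometric tests, and forecasting tests.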

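8. How are dependent and explanatory variables, and endogenous and exogenous variables, distinguished in an econometric model? Answer: In simultaneous-equations econometric models, variables are divided into two classes according to whether they are determined by the model system: endogenous variables and exogenous variables.

Endogenous variables are determined by the model system and may in turn affect it; they are random variables with some probability distribution. Exogenous variables are not determined by the model system but do affect it; they are treated as deterministic (non-random) variables.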

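9. What are the main relationships among the variables in an econometric model? Answer: Mainly the causal relationship between the explanatory variables and the dependent variable, which includes one-way causality, mutual influence, and identity relationships.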

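12. What types of data are commonly used in econometrics? Answer: Based on differences in how they are generated and structured, the data used in econometrics can be divided into time series data, cross-sectional data, panel data, and dummy variable data.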

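13. What are the completeness, accuracy, comparability, and consistency of data? Answer: 1) Completeness means that every variable in the model must have observed data at every sample point, so that all variables have the same number of sample observations.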

Methods for estimating parameters from incomplete data

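In statistics, incomplete data means that some observations in a sample are missing or defective.

In this situation, special methods are needed to estimate the parameters.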

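The following are several common methods for estimating parameters from incomplete data.

1. Maximum likelihood estimation (MLE): MLE is a widely used parameter-estimation method. Applied to incomplete data, it assumes that the data are missing at random, that is, whether a value is missing does not depend on the unobserved values themselves.

The method seeks the parameter values that maximize the likelihood of the observed data, with the missing values integrated out.

In other words, it finds the parameters that best fit the observed data by maximizing the likelihood function.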

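2. Exponential empirical likelihood estimation (EEL): EEL is a robust parameter-estimation method for handling incomplete data.

It uses the exponential distribution to estimate the probability density function of the unobserved data and maximizes the empirical likelihood function of the observed data.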

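3. Multiple imputation: Multiple imputation is a common method for handling incomplete data that estimates parameters by generating several completed data sets.

First, the missing values are imputed by random draws. Next, the parameters are estimated on each imputed data set. Finally, these estimates are combined into a single final estimate.

Compared with imputing each missing value only once, this approach gives more reliable estimates and more accurate standard errors.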
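The final combining step is usually done with Rubin's rules. The sketch below assumes each of the m imputed data sets has already produced a point estimate and its squared standard error. The pooled estimate is their average, and the total variance adds the within-imputation variance to an inflated between-imputation variance. That between-imputation term is what makes the standard errors more honest than single imputation.

```python
import math


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimates from each imputed data set
    variances: their squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled estimate
    w = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = w + (1 + 1 / m) * b                                  # total variance
    return q_bar, math.sqrt(t)


print(pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04]))
```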

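4. Expectation-maximization (EM) algorithm: EM is an iterative method for estimating the parameters of models with incomplete data.

The algorithm alternates between two steps: the expectation step (E-step) and the maximization step (M-step).

In the E-step, given the current parameter estimates, it computes the expected values of the missing data (more precisely, the expected complete-data log-likelihood). In the M-step, it updates the parameter estimates by maximizing that expected complete-data log-likelihood.

The method performs well for parameter estimation in missing-data models.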
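As a concrete illustration of the two steps, the sketch below runs EM on a standard textbook example chosen by us. It uses multinomial counts (125, 18, 20, 34) with cell probabilities $(\tfrac12 + \tfrac\theta4, \tfrac{1-\theta}4, \tfrac{1-\theta}4, \tfrac\theta4)$. The first cell is treated as the sum of two unobserved cells with probabilities 1/2 and θ/4; the split between them is the missing data. Each E-step fills in the expected split, and each M-step solves the resulting complete-data problem in closed form. The fixed point satisfies the observed-data score equation $\frac{125}{2+\theta} - \frac{38}{1-\theta} + \frac{34}{\theta} = 0$, giving $\hat\theta \approx 0.6268$.

```python
def em_linkage(counts=(125, 18, 20, 34), theta=0.5, iters=100):
    """EM for cell probabilities (1/2 + t/4, (1-t)/4, (1-t)/4, t/4).

    The first cell is split into two hidden cells with probabilities 1/2 and t/4.
    """
    y1, y2, y3, y4 = counts
    for _ in range(iters):
        # E-step: expected count falling in the hidden t/4 part of cell 1
        x2 = y1 * (theta / 4) / (0.5 + theta / 4)
        # M-step: complete-data MLE, a binomial-type proportion
        theta = (x2 + y4) / (x2 + y2 + y3 + y4)
    return theta


print(em_linkage())  # about 0.6268
```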

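The methods above are among those commonly used to estimate parameters from incomplete data.

Choosing a method that suits the situation and the characteristics of the data can substantially improve the accuracy and reliability of the parameter estimates.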


arXiv:math/0406533v1 [math.AG] 25 Jun 2004

The Maximum Likelihood Degree

Fabrizio Catanese, Serkan Hoşten, Amit Khetan, and Bernd Sturmfels

Abstract

Maximum likelihood estimation in statistics leads to the problem of maximizing a product of powers of polynomials. We study the algebraic degree of the critical equations of this optimization problem. This degree is related to the number of bounded regions in the corresponding arrangement of hypersurfaces, and to the Euler characteristic of the complexified complement. Under suitable hypotheses, the maximum likelihood degree equals the top Chern class of a sheaf of logarithmic differential forms. Exact formulae in terms of degrees and Newton polytopes are given for polynomials with generic coefficients.

1 Introduction

In algebraic statistics [13, 21, 22], a model for discrete data is a map $f: \mathbb{R}^d \to \mathbb{R}^n$ whose coordinates $f_1, \dots, f_n$ are polynomial functions in the parameters $(\theta_1, \dots, \theta_d) =: \theta$. The parameter vector $\theta$ ranges over an open subset $U$ of $\mathbb{R}^d$ such that $f(\theta)$ lies in the positive orthant $\mathbb{R}^n_{>0}$. The image $f(U)$ represents a family of probability distributions on an $n$-element state space, provided we make the extra assumption that $f_1 + \cdots + f_n - 1$ is the zero polynomial. A given data set is a vector $u = (u_1, \dots, u_n)$ of positive integers. The problem of maximum likelihood estimation is to find parameters $\theta$ which best explain the data $u$. This leads to the following optimization problem:

$$\text{Maximize } f_1(\theta)^{u_1} f_2(\theta)^{u_2} \cdots f_n(\theta)^{u_n} \text{ subject to } \theta \in U. \qquad (1)$$

Under suitable assumptions we have an optimal solution $\hat\theta$ to the problem (1), which is an algebraic function of the data $u$. Our goal is to compute the degree of that algebraic function. We call this number the maximum likelihood degree of the model $f$. Equivalently, the ML degree is the number of complex solutions to the critical equations of (1), for a general data vector $u$. In this paper we prove results of the following form:

Theorem 1. Let $f_1, \dots, f_n$ be polynomials of degrees $b_1, \dots, b_n$ in $d$ unknowns. If the
maximum likelihood degree of the model $f = (f_1, \dots, f_n)$ is finite then it is less than or equal to the coefficient of $z^d$ in the generating function

$$\frac{(1-z)^d}{(1-zb_1)(1-zb_2)\cdots(1-zb_n)}. \qquad (2)$$

…

$$\frac{(1-z)^2}{(1-2z)^4} = 1 + 6z + 25z^2 + \cdots$$

… of the top Chern class of $\Omega^1(\log D)$. If $X$ is projective $d$-space then this leads to Theorem 1. In Section 3 we study the case when $X$ is a smooth toric variety, and we derive a formula for the ML degree when the $f_i$'s are Laurent polynomials which are generic relative to their Newton polytopes. For instance, Example 8 shows that the ML degree is 13 if we replace (3) by

$$f_i = \alpha_i + \beta_i\theta_1 + \gamma_i\theta_2 + \delta_i\theta_1\theta_2 \qquad (i = 1, 2, 3, 4).$$

Section 4 is concerned with the relationship of the ML degree to the bounded regions of the complement of $\{f_i = 0\}$ in $\mathbb{R}^d$. The number of these regions is a lower bound to the number of real solutions of the critical equations, and therefore a lower bound to the ML degree. We show that for plane quadrics all three numbers can be equal. However, for other combinations of plane curves the ML degree and the number of bounded regions diverge, and we prove a tight upper bound on the latter in Theorem 12. Also, following work of Terao [24] and Varchenko [25], we show in Theorem 13 that the ML degree coincides with the number of bounded regions of the arrangement of hyperplanes $\{f_i = 0\}$ when the $f_i$'s are (not necessarily generic) linear forms.

Section 5 revisits the ML degree for toric varieties, replacing the smoothness assumption by a much milder condition. Theorem 15 gives a purely combinatorial formula for the ML degree in terms of the Newton polytopes of the polynomials $f_i$. This section also discusses how resolution of singularities can be used to compute the ML degree for nongeneric polynomials. Section 6 deals with topological methods for determining the ML degree.
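The bound in Theorem 1 is easy to evaluate mechanically. The sketch below computes the coefficient of $z^d$ in $(1-z)^d / \prod_i (1 - b_i z)$ as a truncated power series. It uses the fact that multiplying by $1/(1 - bz)$ amounts to the running update $c_k \leftarrow c_k + b\,c_{k-1}$. The function name is ours. The sketch reproduces the value 25 from the expansion $1 + 6z + 25z^2$ for four plane quadrics. It also reproduces the counts $2n^2 - 2n + 1$ for $n$ plane quadrics and $\binom{n-1}{d}$ for $n$ generic hyperplanes that appear later in the paper.

```python
from math import comb


def ml_degree_bound(d, degrees):
    """Coefficient of z^d in (1 - z)^d / prod_i (1 - b_i z), via truncated power series."""
    # Power series of (1 - z)^d, kept up to degree d
    series = [comb(d, k) * (-1) ** k for k in range(d + 1)]
    for b in degrees:
        # Multiply by 1/(1 - b z) = sum_k b^k z^k: in-place c_k += b * c_{k-1}
        for k in range(1, d + 1):
            series[k] += b * series[k - 1]
    return series[d]


print(ml_degree_bound(2, [2, 2, 2, 2]))  # four plane quadrics
print(ml_degree_bound(2, [3, 3]))        # two plane cubics
```

The in-place update works because `series[k - 1]` has already been updated when `series[k]` is computed, which is exactly the recurrence for a product with a geometric series.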
Theorem 19 shows that, under certain restrictive hypotheses, it coincides with the Euler characteristic of the complex manifold $X \setminus D$, and Theorem 22 offers a general version of the semi-continuity principle which underlies the inequality in Theorem 1. In Section 7 we relate the ML degree to the sheaf of logarithmic vector fields along $D$, which is the sheaf dual to $\Omega^1(\log D)$.

This paper was motivated by recent appearances of the concept of ML degree in statistics and computational biology. Chor, Khetan and Snir [7] showed that the ML degree of a phylogenetic model equals 9, and Geiger, Meek and Sturmfels [14] proved that an undirected graphical model has ML degree one if and only if it is decomposable. The notion of ML degree also makes sense for certain parametrized models for continuous data: Drton and Richardson [10] showed that the ML degree of a Gaussian graphical model equals 5, and Bout and Richards [5] studied the ML degree of certain mixture models. The ML degree always provides an upper bound on the number of local maxima of the likelihood function. Our ultimate hope is that a better understanding of the ML degree will lead to the development of custom-tailored algorithms for solving the critical equations $d\log(f) = 0$. There is a need for such new algorithms, given that methods currently used in statistics (notably the EM algorithm) often produce only local maxima in (1).

2 Critical Points of Rational Functions

In this section we work in the following general set-up of algebraic geometry. Let $X$ be a complete factorial algebraic variety over the complex numbers $\mathbb{C}$.
We also assume that $X$ is irreducible of dimension $d \ge 1$. In applications to statistics, the variety $X$ will often be a smooth projective toric variety. Suppose that $f \in \mathbb{C}(X)$ is a rational function on $X$. Since $X$ is factorial, the local rings $\mathcal{O}_{X,x}$ are unique factorization domains. This means that the function $f$ has a global factorization which is unique up to constants:

$$f = F_1^{u_1} F_2^{u_2} \cdots F_r^{u_r}. \qquad (4)$$

Here $F_i$ is a prime section of an invertible sheaf $\mathcal{O}_X(D_i)$ where $D_i$ is the divisor on $X$ defined by $F_i$. In our applications we usually assume that $r \ge n$ where $n$ is the number considered in the Introduction. For instance, if $f_1, \dots, f_n$ are polynomials and $X = \mathbb{P}^d$ then $r = n + 1$; namely, $F_1, \dots, F_n$ are the homogenizations of $f_1, \dots, f_n$ using $\theta_0$, and $F_{n+1} = \theta_0$ (see the proof of Theorem 1 for details). By (4), we can write the divisor of the rational function $f$ uniquely as

$$\operatorname{div}(f) = \sum_{i=1}^r u_i D_i,$$

where the $u_i$'s are (possibly negative) integers. Let $D$ be the reduced union of the codimension one subvarieties $D_i \subset X$, or, as a divisor, $D := \sum_{i=1}^r D_i$. We are interested in computing the critical points of the rational function $f$ on the open set $V := X \setminus D$ complementary to the divisor $D$. Especially, we wish to know the number of critical points, counted with multiplicities.

A critical point is by definition a point $x \in X$ where the differential 1-form $df$ vanishes. If $x$ is a smooth point on $X$, and $x_1, \dots, x_d$ are local coordinates, then $df = \sum_{j=1}^d (\partial f / \partial x_j)\, dx_j$. Hence $x$ is a critical point of $f$ if and only if

$$\frac{\partial f}{\partial x_1} = \frac{\partial f}{\partial x_2} = \cdots = \frac{\partial f}{\partial x_d} = 0.$$

…

$$\frac{df}{f} = \sum_{i=1}^r u_i \, d\log(F_i) = \sum_{i=1}^r u_i \frac{dF_i}{F_i}$$

… invertible in this local ring for $j \neq i$, and $\omega$ is regular if and only if $F_i$ divides $\psi_i$. This implies that the homomorphism which sends $\omega$ to the vector $(\psi_i \bmod F_i)_{i=1,\dots,r}$ is well defined, and it induces an isomorphism from the quotient $\Omega^1_X(\log D)/\Omega^1_X$ onto $\bigoplus_{i=1}^r \mathcal{O}_{D_i}$.

Assume now that $X$ is smooth. Then both sheaves $\Omega^1_X(D)$ and $\Omega^1_X$ are locally free of rank $d = \dim(X)$. Hence the intermediate sheaf $\Omega^1_X(\log D)$ is torsion free of the same rank. Our next result shows that $\Omega^1_X(\log D)$ is locally free if and only if the divisors $D_i$
are smooth and intersect transversally.

Proposition 3. Let $x \in X$ be a smooth point, $x_1, \dots, x_d$ local coordinates at $x$ and $D_1, \dots, D_h$ the divisors which contain $x$. Then the sheaf $\Omega^1_X(\log D)$ is locally free at $x$ if and only if the $h \times d$ matrix $(\partial F_i / \partial x_j)$ has rank $h$ at $x$.

Proof. Any local section of $\Omega^1_X(\log D)$ can be written in the form

$$\omega = \sum_{i=1}^r \psi_i \cdot d\log(F_i) + \eta = \sum_{i=1}^h \psi_i \cdot d\log(F_i) + \sum_{j=1}^d \eta_j \cdot dx_j. \qquad (8)$$

This observation gives rise to a local exact sequence

$$0 \to \mathcal{O}^h_{X,x} \to \mathcal{O}^h_{X,x} \oplus \mathcal{O}^d_{X,x} \to \Omega^1_{X,x}(\log D) \to 0. \qquad (9)$$

The surjective map on the right takes $((\psi_i), (\eta_j))$ to the sum on the right hand side of (8). The injective map on the left takes the $h$-tuple $(A_1, \dots, A_h)$ to $((\psi_i), (\eta_j))$ with $\psi_i = F_i A_i$ and $\eta_j = -\sum_{l=1}^h A_l \, \partial F_l / \partial x_j$. …

In the above situation where $X$ is smooth and $\Omega^1_X(\log D)$ is locally free we shall say that the divisor $D$ has global normal crossings (or GNC).

Theorem 4. Let $X$ be smooth and assume that $D$ is a GNC divisor. Then
1. the section $d\log(f)$ of $\Omega^1_X(\log D)$ does not vanish at any point of $D$,
2. if the divisor $D$ intersects every curve in $X$ (in particular, if $D$ is ample) then $d\log(f)$ vanishes only on a finite subset of $V = X \setminus D$,
3. if the above conclusions hold, then the number of critical points of $f$ on $V$, counted with multiplicities, equals the degree of the top Chern class $c_d(\Omega^1_X(\log D))$.

Proof. We abbreviate $\sigma := d\log(f) = \sum_{i=1}^r u_i \, d\log(F_i)$. By the proof of Proposition 3 it follows that if $(\partial F_i / \partial x_j)_{i=1,\dots,h;\ j=1,\dots,d}$ has rank $h$ at $x$, then $\Omega^1_X(\log D)$ is locally free of rank $d$ with generators $d\log(F_i)$ and some choice of $d - h$ of the $dx_j$. If we write $\sigma$ in this basis, the coefficients of $d\log(F_i)$ are the constants $u_i$ while the coefficients of the $dx_j$ are some regular functions.
The first assertion follows immediately since the exponents $u_i$ are all nonzero. The second assertion follows from the first: let $Z_\sigma$ be the zero set of the section $\sigma$. Since $Z_\sigma$ does not intersect $D$, it follows that $\dim(Z_\sigma) = 0$. Thirdly, if $\mathcal{F}$ is a locally free sheaf of rank $d$ on a smooth variety $X$ of dimension $d$, and $\sigma$ is a section of $H^0(\mathcal{F})$ with a zero scheme $Z_\sigma$ of dimension 0, then the length of $Z_\sigma$ equals the degree of the top Chern class $c_d(\mathcal{F})$.

The total Chern class of a sheaf $\mathcal{F}$ is the sum $c_{\mathrm{tot}}(\mathcal{F}) = \sum_{i=0}^d c_i(\mathcal{F}) z^i$. This is a polynomial in $z$ whose coefficients are elements in the Chow ring $A^*(X)$. Recall that every element in $A^*(X)$ has a well-defined degree which is the image of its degree $d$ part under the degree homomorphism $A^d(X) \to \mathbb{Z}$.

Corollary 5. Suppose that $X$ is smooth and $D$ is a GNC divisor on $X$ which intersects every curve. Then the number of critical points of $f$, counted with multiplicities, is the degree of the coefficient of $z^d$ in the following polynomial:

$$c_{\mathrm{tot}}(\Omega^1_X) \cdot \prod_{i=1}^r (1 - zD_i)^{-1} \in A^*(X)[z]. \qquad (10)$$

Proof. The total Chern class $c_{\mathrm{tot}}(\mathcal{F})$ is multiplicative with respect to exact sequences, i.e., if $0 \to A \to B \to C \to 0$ is an exact sequence of sheaves, then $c_{\mathrm{tot}}(B) = c_{\mathrm{tot}}(A) \cdot c_{\mathrm{tot}}(C)$. Hence the sequence (7) implies the result.

In the next section, we apply the formula (10) in the case when $X$ is a smooth projective toric variety. The Chow group $A^d(X)$ has rank one and is generated by the class of any point. This canonically identifies $A^d(X)$ with $\mathbb{Z}$ and so any top Chern class can be considered to be a number.

Corollary 6. Suppose $X$ is a smooth toric variety with boundary divisors $\Delta_1, \dots, \Delta_s$ and $D$ is GNC and meets every curve. The number of critical points of $f$, counted with multiplicity, equals the coefficient of $z^d$ in

$$\frac{\prod_{j=1}^s (1 - z\Delta_j)}{\prod_{i=1}^r (1 - zD_i)}. \qquad (11)$$

… The global factorization (4) of this $F$ has $r = n + 1$ prime factors, namely,

$$F_i = \theta_0^{b_i} \cdot f_i\Big(\frac{\theta_1}{\theta_0}, \dots, \frac{\theta_d}{\theta_0}\Big) \quad \text{for } i = 1, \dots, n,$$

and $F_{n+1} = \theta_0$ with $u_{n+1} = -b_1u_1 - b_2u_2 - \cdots - b_nu_n$. The Chow ring of $X = \mathbb{P}^d$ is $\mathbb{Z}[H]/\langle H^{d+1} \rangle$, where $H$ represents the hyperplane class. By our genericity hypothesis, the $r = n + 1$ prime factors of $F$ are smooth and global
normal crossing. They correspond to the following divisor classes: $D_1 = b_1H, D_2 = b_2H, \dots, D_n = b_nH$ and $D_{n+1} = H$. Projective space $\mathbb{P}^d$ is a smooth toric variety with $d + 1$ torus-invariant divisors $\Delta_j$, each having the same class $H$. Hence the formula in (11) specializes to

$$\frac{(1-zH)^{d+1}}{(1-zH)(1-zb_1H)\cdots(1-zb_nH)}.$$

Since we work in the Chow ring of projective space $\mathbb{P}^d$, the coefficient of $(zH)^d$ is the same as the coefficient of $z^d$ in the generating function in (2).

We now generalize our results from polynomials of fixed degrees to Laurent polynomials with fixed Newton polytopes. Recall that the Newton polytope of a Laurent polynomial $f(\theta_1, \dots, \theta_d)$ is the convex hull of the set of exponent vectors of the monomials appearing in $f$ with nonzero coefficient. Given a convex polytope $P \subset \mathbb{R}^d$ with vertices in $\mathbb{Z}^d$, by a generic Laurent polynomial with Newton polytope $P$ we will mean a sufficiently general $\mathbb{C}$-linear combination of monomials with exponent vectors in $P \cap \mathbb{Z}^d$.

In the next theorem we consider $n$ Laurent polynomials $f_1, f_2, \dots, f_n$ having respective Newton polytopes $P_1, P_2, \dots, P_n$. Because the $f_i$'s are Laurent polynomials, i.e., their monomials may have negative exponents, we only consider those critical points of $f = f_1^{u_1} f_2^{u_2} \cdots f_n^{u_n}$ which lie in the algebraic torus $(\mathbb{C}^*)^d$. The number of such critical points (counted with multiplicity) will be called the toric ML degree of the rational function $f$. Let $P = P_1 + P_2 + \cdots + P_n$ denote the Minkowski sum of the given Newton polytopes, and let $X$ be the projective toric variety defined by $P$. Let $\eta_1, \dots, \eta_s \in \mathbb{Z}^d$ be the primitive inner normal vectors of the facets of $P$.
They span the rays of the fan of $X$. Let $\Delta_1, \dots, \Delta_s$ denote the corresponding torus-invariant divisors on $X$. Each of the Newton polytopes $P_i$ is the solution set of a system of linear inequalities of the specific form

$$P_i = \{x \in \mathbb{R}^d \mid \langle x, \eta_j \rangle \ge -a_{ij} \text{ for } j = 1, \dots, s\}.$$

The divisor on $X$ defined by the Laurent polynomial $f_i$ is linearly equivalent to $D_i = \sum_{j=1}^s a_{ij} \Delta_j$. The $a_{ij}$ are integers which can be positive or negative. The divisor on $X$ defined by $f = f_1^{u_1} f_2^{u_2} \cdots f_n^{u_n}$ is linearly equivalent to

$$\sum_{i=1}^n u_i D_i = \sum_{j=1}^s \Big(\sum_{i=1}^n u_i a_{ij}\Big) \cdot \Delta_j. \qquad (12)$$

We abbreviate the support of this divisor by

$$I = \Big\{ j \in \{1, \dots, s\} \;\Big|\; \sum_{i=1}^n u_i a_{ij} \neq 0 \Big\}. \qquad (13)$$

A toric variety $X$ is smooth if all the cones in its normal fan are unimodular.

Theorem 7. If the toric variety $X$ is smooth and the toric ML degree of the rational function $f$ is finite then it is bounded above by the coefficient of $z^d$ in the following generating function with coefficients in the Chow ring of $X$:

$$\frac{\prod_{j \notin I} (1 - z\Delta_j)}{\prod_{i=1}^n (1 - zD_i)}. \qquad (14)$$

… it is ample on $X$ by construction. So $D_i$ meets every curve on $X$ and therefore so does $D$, and we can apply Corollary 6. A variable $x_j$ appears as a factor in $F$ if and only if $j \in I$, in which case $1 - z\Delta_j$ appears in both the numerator and denominator of (11), and we get the expression (14).

Consider now arbitrary Laurent polynomials $f_1, \dots, f_n$ in $\theta_1, \dots, \theta_d$ such that $f = \prod f_i^{u_i}$ has only finitely many critical points in $(\mathbb{C}^*)^d$. Let $\nu$ be the coefficient of $z^d$ in (14). Let $\mathbb{C}^m$ be the space of all $n$-tuples of Laurent polynomials with the given Newton polytopes. Consider the critical equations of $f = \prod f_i^{u_i}$ and clear denominators. The resulting collection of $d$ Laurent polynomials defines an algebraic subset $\tilde W$ in the product space $\mathbb{C}^m \times (\mathbb{C}^*)^d$.
Saturate $\tilde W$ to remove any components along the hypersurfaces $\{f_i = 0\}$ and get a new algebraic subset $W$. The map from $W$ onto $\mathbb{C}^m$ is dominant and generically finite, and the generic fiber of this map consists of $\nu$ points. Our given Laurent polynomials $f_1, \dots, f_n$ represent a point $\phi$ in $\mathbb{C}^m$. Let $\theta^{(1)}, \dots, \theta^{(\kappa)}$ be the isolated critical points of $f$. For each $i$, consider any irreducible component $W^{(i)}$ of $W$ containing the point $(\phi, \theta^{(i)})$ in $W \subset \mathbb{C}^m \times (\mathbb{C}^*)^d$. By Krull's Principal Ideal Theorem, the component $W^{(i)}$ of $W$ has codimension $\le d$ and hence it has dimension $\ge m$. As the generic fiber is finite, the dimension of $W^{(i)}$ is exactly $m$ and the projection to $\mathbb{C}^m$ is dominant. Since $\theta^{(i)}$ is an isolated solution of the critical equations, the projection map to $\mathbb{C}^m$ is open [19, (3.10)], so the intersection of $W^{(i)}$ with an open neighborhood of $(\phi, \theta^{(i)})$ maps onto an open neighborhood of $\phi$. Hence every generic point $\tilde\phi$ near $\phi$ has a preimage $(\tilde\phi, \tilde\theta^{(i)})$ near $(\phi, \theta^{(i)})$, and these preimages are distinct for $i = 1, \dots, \kappa$. We conclude that $\kappa \le \nu$. This semicontinuity argument is called the "specialization principle" stated in Mumford's book [19, (3.26)] and also works when the $\theta^{(i)}$ have multiplicities, as shown in Theorem 22 below.

We illustrate Theorem 7 with two examples which we revisit in Section 5.
Example 8. Consider $n$ generic polynomials $f_1(\theta_1, \theta_2), \dots, f_n(\theta_1, \theta_2)$ where the support of $f_i$ consists of monomials $\theta_1^p \theta_2^q$ with $0 \le p \le s_i$ and $0 \le q \le t_i$, and suppose the $u_i$'s are generic. The Newton polytope of $f_i$ is the rectangle $P_i = \operatorname{conv}\{(0,0), (s_i,0), (0,t_i), (s_i,t_i)\}$. The Minkowski sum of these rectangles is another rectangle, and $X = \mathbb{P}^1 \times \mathbb{P}^1$. In the numerator of (14), the contribution of the two torus-invariant divisors $D$ and $E$ corresponding to the left and the bottom edge of this rectangle survives. The denominator comes from the product of the divisors of $f_1, \dots, f_n$:

$$\frac{(1-zD)(1-zE)}{\prod_{i=1}^n \big(1 - z(s_iD + t_iE)\big)}$$

…

Figure 2: The fan of a smooth projective toric surface

The three divisors corresponding to the polygons $P_1, P_2, P_3$ in Figure 1 are

$$D_1 = 2x_3 + 2x_4 + 2x_5 + x_6, \quad D_2 = 2x_3 + 2x_4 + 2x_5 + x_6 + x_7 + x_8, \quad D_3 = x_4 + 3x_5 + 2x_6 + x_7.$$

If all $u_i$ are positive, then the support of the divisor $u_1D_1 + u_2D_2 + u_3D_3$ is $I = \{3, \dots, 8\}$. It follows that the toric ML degree is the coefficient of $z^2$ in $(1 - zx_1)(1 - zx_2)(1 - zD_1)^{-1}(1 - zD_2)^{-1}(1 - zD_3)^{-1}$. This coefficient is $14x_1x_2$, which means that the toric ML degree is 14.

The toric ML degree of the model $f$ is the toric ML degree defined above for generic $u$. In this case, there is no cancellation among the coefficients in (13), and $I$ is the set of all indices $j$ such that for some $P_i$ the supporting hyperplane normal to $\eta_j$ does not pass through the origin. The toric ML degree of $f$ is a numerical invariant of the polytopes $P_1, \dots, P_n$. A combinatorial formula for this invariant will be presented in Theorem 15 of Section 5.

4 Bounded Regions in Arrangements

As in the Introduction, we consider $n$ polynomials $f_1, \dots, f_n$ in $d$ unknowns $\theta_1, \dots, \theta_d$. We now assume that all coefficients of the $f_i$'s are real numbers, and we also assume that $u_1, \dots, u_n$ are positive integers. However, we do not assume that the union of the divisors of the $f_i$'s has global normal crossings. This is the case of interest in statistics. Consider the arrangement of hypersurfaces defined by the $f_i$'s and let $V_{\mathbb{R}} = \mathbb{R}^d \setminus \bigcup_{i=1}^n \{f_i = 0\}$ be the complement of this arrangement. A connected component of $V_{\mathbb{R}}$ is a bounded
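The Example 8 computation can be carried out in the Chow ring of $\mathbb{P}^1 \times \mathbb{P}^1$, which is $\mathbb{Z}[D, E]/(D^2, E^2)$, with $DE$ the class of a point. The sketch below assumes, as our reading of the example, that a polynomial with rectangle $[0, s_i] \times [0, t_i]$ has divisor class $s_iD + t_iE$. Because every power of $z$ multiplies a class of matching degree, one may set $z = 1$ and read off the coefficient of $DE$. It also uses $(1 - c)^{-1} = 1 + c + c^2$, which holds here because classes of degree 3 vanish on a surface. With four generic bilinear polynomials ($s_i = t_i = 1$) it returns 13, the value quoted in the Introduction. All helper names are hypothetical.

```python
def mul(p, q):
    """Product in Z[D, E]/(D^2, E^2); an element is a dict {(a, b): coefficient} for D^a E^b."""
    out = {}
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            a, b = a1 + a2, b1 + b2
            if a <= 1 and b <= 1:  # D^2 = E^2 = 0 on P^1 x P^1
                out[(a, b)] = out.get((a, b), 0) + c1 * c2
    return out


def add(p, q, sign=1):
    """Return p + sign * q."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + sign * c
    return out


ONE = {(0, 0): 1}


def divisor(s, t):
    """Class s*D + t*E (assumed class of a polynomial with rectangle [0,s] x [0,t])."""
    return {(1, 0): s, (0, 1): t}


def toric_ml_degree_p1xp1(rects):
    """Coefficient of the point class DE in (1 - D)(1 - E) / prod_i (1 - D_i)."""
    total = mul(add(ONE, divisor(1, 0), -1), add(ONE, divisor(0, 1), -1))  # (1 - D)(1 - E)
    for s, t in rects:
        c = divisor(s, t)
        inverse = add(add(ONE, c), mul(c, c))  # (1 - c)^(-1) = 1 + c + c^2 on a surface
        total = mul(total, inverse)
    return total.get((1, 1), 0)


print(toric_ml_degree_p1xp1([(1, 1)] * 4))  # 13
```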
region if it is bounded as a subset of $\mathbb{R}^d$. Then the following observation holds.

Proposition 10. For any polynomial map $f: \mathbb{R}^d \to \mathbb{R}^n$ and any $u \in \mathbb{N}^n_{>0}$,

$$\#\{\text{bounded regions of } V_{\mathbb{R}}\} \le \#\{\text{critical points of } f_1^{u_1} \cdots f_n^{u_n} \text{ in } \mathbb{R}^d\} \le \text{ML degree of } f.$$

Proof. The function $f = f_1^{u_1} \cdots f_n^{u_n}$ is continuous, and on the boundary of the closure of each bounded region its value is zero. Hence it has to have at least one (real) critical point in the interior of each region. The second inequality holds trivially, since the ML degree was defined as the number of critical points of $f_1^{u_1} \cdots f_n^{u_n}$ in $\mathbb{C}^d$, counted with multiplicities.

This observation raises the question whether the inequalities above could be realized as equalities. We next show that this is the case when $f_1, \dots, f_n$ are quadrics in the plane. Here the ML degree is $2n^2 - 2n + 1$ by Theorem 1.

Proposition 11. For each $n$, there are $n$ quadrics $f_1, \dots, f_n$ in $\mathbb{R}^2$ such that

$$\#\{\text{bounded regions of } V_{\mathbb{R}}\} = \text{ML degree of } f = 2n^2 - 2n + 1.$$

Hence all critical points are real.

Proof. We will take $n$ quadrics that define "nested" ellipses with center at the origin, as suggested by Figure 3. The proof follows by induction: assume we have $2(n-1)^2 - 2(n-1) + 1$ bounded regions with $n - 1$ ellipses. Observe that the $(n-1)$st ellipse contains $2n - 3$ bounded regions. Then we add a new long and skinny ellipse which replaces the $2n - 3$ regions with $3(2n-3) + 2$ regions. The total count comes out to be $2n^2 - 2n + 1$.

We will see such an equality holding for $n$ linear hyperplanes in $\mathbb{R}^d$ below. However, even in the plane $\mathbb{R}^2$, the number of critical points and the number of bounded regions of $V_{\mathbb{R}}$ diverge for curves of degree $\ge 3$. Theorem 1 implies that for $n$ generic plane curves of degrees $b_1, \dots, b_n$ the ML degree is

$$\sum_{i=1}^n b_i(b_i - 2) + \sum_{i<j} b_ib_j + 1.$$

Figure 3: The "nested" ellipse construction

The optimal upper bound for the number of bounded regions of $V_{\mathbb{R}}$ is smaller than the ML degree, by the following unpublished result due to Oleg Viro.
Theorem 12 (Viro). Let $f_1, \dots, f_n$ be real plane curves of degrees $b_1, \dots, b_n$, and let $K$ be the number of odd degree curves among them. Then

$$\#\{\text{bounded regions of } V_{\mathbb{R}}\} \le \sum_{i=1}^n (b_i - 1)(b_i - 2) \cdots$$

…

In order to get any meaningful lower bound on the number of bounded regions of $V_{\mathbb{R}}$ one needs to make some assumptions. Without any assumptions the lower bound is zero: for $f_i$ of even degree we take an empty (real) curve, and for $f_i$ of odd degree we take the union of an empty curve with a line. If we let all the lines intersect in a single point there will not be any bounded region. If we insist on at least having a GNC configuration, then by the same construction the lower bound we get is the number of bounded regions in a generic arrangement of $K$ lines where $K$ is the number of odd degrees $b_i$. This idea leads us to studying the ML degree of a hyperplane arrangement.

Theorem 13. Let $f$ be given by $n$ linear polynomials $f_1, \dots, f_n$ with real coefficients. Then the ML degree of $f$ is equal to the number of the bounded regions of $V_{\mathbb{R}}$, and all critical points of the optimization problem (1) are real.

This theorem does not assume any hypothesis such as global normal crossing. Under the GNC hypothesis, the hyperplanes would be in general position and the number of bounded regions equals $\binom{n-1}{d}$, as predicted by Theorem 1. Theorem 13 is essentially due to Varchenko [25]. We shall give a new proof.

Proof. In light of Proposition 10, we need to show that the number of bounded regions of $V_{\mathbb{R}}$ equals the number of complex solutions of the ML equations. Let $f_i = \sum_{j=1}^d a_{ij}\theta_j + c_i$ for $i = 1, \dots, n$. The ML equations are

$$\sum_{i=1}^n \frac{u_i a_{i1}}{f_i} = \cdots = \sum_{i=1}^n \frac{u_i a_{id}}{f_i} = 0. \qquad (16)$$

Consider the map $\psi: \mathbb{C}^{d+1} \to \mathbb{C}^n$ given by $\psi(\theta_0, \dots, \theta_d) = (1/F_1, \dots, 1/F_n)$.
Here $F_i = c_i\theta_0 + \sum_{j=1}^d a_{ij}\theta_j$ is the homogenization of $f_i$. We let $\bar{\mathcal{H}}$ be the central hyperplane arrangement in $\mathbb{R}^{d+1}$ given by the $F_i$. We assume that the intersection of all the hyperplanes in $\bar{\mathcal{H}}$ is just the origin; otherwise, the linear forms $F_i$ depend on fewer than $d$ coordinates, and then we get infinitely many critical points. The Zariski closure of $\operatorname{im}(\psi)$ in $\mathbb{P}^{n-1}$ is a $d$-dimensional complex variety $V$. The solution set on $V$ of the $d$ linear equations

$$\sum_{i=1}^n (u_i a_{i1}) y_i = \cdots = \sum_{i=1}^n (u_i a_{id}) y_i = 0$$

consists of finitely many points provided $u_1, \dots, u_n$ are generic. Obviously, the solutions to (16) lift to such complex solutions. In other words, the degree of the projective variety $V$ is an upper bound on the ML degree of $f$.

Now we will compute the degree of $V$. This variety is the projective spectrum of the $\mathbb{N}$-graded algebra $R = \mathbb{C}[1/F_i : i = 1, \dots, n]$ where $\deg(1/F_i) = 1$. Terao [24, Theorem 1.4] showed that the Hilbert series of $R$ is equal to $\sum_{X \in L} (-1)^{\operatorname{codim}(X)} \mu(X) \cdots$ … $\frac{t^d}{d!}$. We conclude that the degree of the projective variety $V$ is $(-1)^{d+1}\mu(0)$. By Zaslavsky [26], this number equals the number of bounded regions of $V_{\mathbb{R}}$.
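Theorem 13 can be checked by hand in the simplest case $d = 1$. The sketch assumes $f_i = \theta - c_i$ with distinct centers $c_i$ and positive weights $u_i$. The ML equation is $\sum_i u_i/(\theta - c_i) = 0$. Clearing denominators gives a polynomial of degree $n - 1$, so there are at most $n - 1$ complex solutions. Between consecutive centers the left-hand side decreases strictly from $+\infty$ to $-\infty$, so each of the $n - 1$ bounded intervals contains exactly one real root. Hence all critical points are real and their number equals the number of bounded regions. The bisection routine is an illustrative choice.

```python
def ml_critical_points_on_line(centers, weights, iters=200):
    """Real solutions of sum_i u_i / (theta - c_i) = 0 for f_i = theta - c_i (d = 1).

    Assumes distinct centers and positive weights, so that between consecutive
    centers the left-hand side falls strictly from +inf to -inf; each bounded
    interval then holds exactly one root, which we locate by bisection.
    """
    pts = sorted(zip(centers, weights))
    cs = [c for c, _ in pts]

    def score(theta):
        return sum(u / (theta - c) for c, u in pts)

    roots = []
    for lo, hi in zip(cs, cs[1:]):
        a, b = lo, hi
        for _ in range(iters):
            mid = (a + b) / 2
            if score(mid) > 0:
                a = mid  # the root lies to the right of mid
            else:
                b = mid
        roots.append((a + b) / 2)
    return roots


roots = ml_critical_points_on_line([0.0, 1.0, 3.0, 6.0], [1, 2, 1, 3])
print(len(roots))  # 3 = number of bounded intervals = n - 1
```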
Example 14. A family of important statistical models where Theorem 13 applies is the linear polynomial model of [22]. Such a model is given by a polynomial in $r$ unknowns $x = (x_1, \dots, x_r)$ with indeterminate coefficients,

$$p(x) = \sum_{j=1}^d \theta_j x^{a_j} \quad (\text{with } a_j \in \mathbb{N}^r),$$

together with $n$ data points $v_1, \dots, v_n \in \mathbb{R}^r$. The model is parametrized by

$$f_1(\theta) = \sum_{j=1}^d \theta_j v_1^{a_j}, \quad \dots, \quad f_n(\theta) = \sum_{j=1}^d \theta_j v_n^{a_j}.$$

The ML degree is the number of bounded regions of this arrangement.

5 Polytopes and Resolution of Singularities

We now return to the setting of Section 3, with the aim of relaxing the restrictive smoothness hypothesis in Theorem 7. Our aim is to derive a combinatorial formula for the toric ML degree of any model $f$ defined by generic Laurent polynomials satisfying a mild hypothesis. The derivation of Theorem 15 involves resolution of singularities in the toric category. In the end of the section we shall comment on using resolution of singularities for bounding the ML degree in general.

Given a polytope $P$ in $\mathbb{R}^d$ and a linear functional $v$ on $\mathbb{R}^d$, we write

$$P_v = \{p \in P \mid \langle v, p \rangle \le \langle v, p' \rangle \text{ for all } p' \in P\}$$

for the face of $P$ at which $v$ attains its minimum. Two linear functionals $v$ and $v'$ are equivalent if $P_v = P_{v'}$. The equivalence classes are the relative interiors of cones of the inner normal fan $\Sigma_P$. If $\sigma$ is a cone in $\Sigma_P$, or $\sigma$ is a cone in any fan which refines $\Sigma_P$, then we write $P_\sigma = P_v$ for $v$ in the relative interior of $\sigma$. If $f$ is a polynomial with Newton polytope $P$ then $f_\sigma$ denotes the leading form consisting of all terms of $f$ which are supported on $P_\sigma$.

As in Section 3, let $f_1, \dots, f_n$ be Laurent polynomials with Newton polytopes $P_1, \dots, P_n \subset \mathbb{R}^d$. Consider any fan $\Sigma$ which is a common refinement of the inner normal fans $\Sigma_{P_1}, \dots, \Sigma_{P_n}$. Suppose $\tau$ is a cone in $\Sigma$ and let $k$ be the dimension of $(P_1 + \cdots + P_n)_\tau$. There exists a $k$-dimensional linear subspace $L$ of $\mathbb{R}^d$ and vectors $q_1, \dots, q_n \in \mathbb{R}^d$ such that $q_i + (P_i)_\tau$ lies in $L$ for all $i = 1, \dots, n$. The subspace $L$ is unique and satisfies $L \cap \mathbb{Z}^d \simeq \mathbb{Z}^k$. Let $V(\cdot, \dots, \cdot)$ denote the normalized mixed volume on the subspace $L$. Here "normalized" refers to the lattice $L \cap \mathbb{Z}^d$, as is
customary in toric geometry [12]. For any $k$-element subset $\{i_1, \dots, i_k\}$ of $\{1, 2, \dots, n\}$ we abbreviate

$$V(P_{i_1}, \dots, P_{i_k}; \tau) = V\big(q_{i_1} + (P_{i_1})_\tau, \dots, q_{i_k} + (P_{i_k})_\tau\big) \quad \text{if } \operatorname{codim}(\tau) = k, \qquad (19)$$

and $V(P_{i_1}, \dots, P_{i_k}; \tau) = 0$ if $\operatorname{codim}(\tau) > k$. If $k = d$ and $\tau = \{0\}$ then we simply write $V(P_{i_1}, \dots, P_{i_d})$ for the mixed volume in (19). If $k = 0$ and $\tau$ is full-dimensional then (19) equals 1; this happens in the last sum of (20).

We are now ready to state our more general toric ML degree formula. As in Section 3, let $X$ be the toric variety corresponding to the Minkowski sum $P = P_1 + \cdots + P_n$ and $\Sigma_X$ the normal fan with rays $\eta_1, \dots, \eta_s$. We consider the function $f = f_1^{u_1} \cdots f_n^{u_n}$. Each polytope $P_i$ corresponds to a divisor $D_i$ so the divisor of $f$ is $D = \sum u_i D_i$. Let $I$ be the support of $D$ as in (13). Label the rays of $\Sigma_X$ so that $\{1, \dots, r\}$ are the indices not in $I$.

For each subset $J$ of $\{1, \dots, r\}$ let $\tau_J$ denote the smallest cone of $\Sigma$ which contains the vectors $\eta_j$ for $j \in J$. If no such cone exists then $\tau_J$ is just a formal symbol and the expression (19) is declared to be zero for $\tau = \tau_J$. The mild smoothness hypothesis we need is that every singular cone of $\Sigma$ contains at least one ray from $I$. Equivalently, all cones $\tau_J$ are smooth.

Theorem 15. Suppose every singular cone of $\Sigma_X$ contains some ray in the support of the divisor $D$. Then the toric ML degree of the rational function $f$ is bounded above by the following alternating sum of mixed volumes:

$$\sum_{1 \le i_1 \le \cdots \le i_d \le n} V(P_{i_1}, \dots, P_{i_d}) \;-\; \sum_{j \in \{1, \dots, r\}} \; \sum_{1 \le i_1 \le \cdots \le i_{d-1} \le n} V(P_{i_1}, \dots, P_{i_{d-1}}; \tau_{\{j\}})$$
$$+ \sum_{\{j_1, j_2\} \subset \{1, \dots, r\}} \; \sum_{1 \le i_1 \le \cdots \le i_{d-2} \le n} V(P_{i_1}, \dots, P_{i_{d-2}}; \tau_{\{j_1, j_2\}}) \;+\; \cdots \;+\; (-1)^d \sum_{\{j_1, \dots, j_d\} \subset \{1, \dots, r\}} V(\emptyset; \tau_{\{j_1, \dots, j_d\}}). \qquad (20)$$

Equality holds if each $f_i$ is generic relative to its Newton polytope $P_i$.

Proof. In order to apply Corollary 6 we must resolve the singularities of $X$.
For toric varieties this is done in two steps. First we get a simplicial toric variety without adding any new rays to the fan. Second we resolve the remaining singular (but simplicial) cones by adding new rays. This procedure is described in detail in [12]. Typically the first step involves taking the pulling subdivision at each ray in the fan. However, under the given hypothesis it is enough to perform pulling subdivisions only at the rays in the support of $D$ to obtain a simplicial fan $\Sigma_{\tilde X}$. This fine detail will be important below. Our hypothesis holds for this intermediate fan as well, and subsequently we take a smooth refinement $\Sigma_{X'}$ of $\Sigma_{\tilde X}$ by adding new rays in the relative interiors of each of the singular cones. Let $\pi: X' \to X$ be the induced map. We will show that we get no new critical points under the resolution. Hence the number of critical points can be computed on $X'$. We finally claim that the Chern class formula expands into the given combinatorial formula. We investigate critical points of the pullback of our rational function:

$$F' = \pi^*(F) = \Big(x^{-\sum_i u_i \pi^*(D_i)}\Big) \prod_i \pi^*(F_i(x))^{u_i}.$$