Maximum entropy modeling of species geographic distributions

Ecological Modelling 190 (2006) 231–259
doi:10.1016/j.ecolmodel.2005.03.026

Steven J. Phillips a,*, Robert P. Anderson b,c, Robert E. Schapire d

a AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, USA
b Department of Biology, City College of the City University of New York, J-526 Marshak Science Building, Convent Avenue at 138th Street, New York, NY 10031, USA
c Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
d Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA

Received 23 February 2004; received in revised form 11 March 2005; accepted 28 March 2005
Available online 14 July 2005

Abstract

The availability of detailed environmental data, together with inexpensive and powerful computers, has fueled a rapid increase in predictive modeling of species environmental requirements and geographic distributions. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, absence data are not available for most species. In this paper, we introduce the use of the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data. Maxent is a general-purpose machine learning method with a simple and precise mathematical formulation, and it has a number of aspects that make it well-suited for species distribution modeling. In order to investigate the efficacy of the method, here we perform a continental-scale case study using two Neotropical mammals: a lowland species of sloth, Bradypus variegatus, and a small montane murid rodent, Microryzomys minutus. We compared Maxent predictions with those of a commonly used presence-only modeling method, the Genetic Algorithm for Rule-Set Prediction (GARP). We made predictions on 10 random subsets of the occurrence records for both species, and then used the remaining localities for testing. Both algorithms provided reasonable estimates of the species' range, far superior to the shaded outline maps available in field guides. All models were significantly better than random in both binomial tests of omission and receiver operating characteristic (ROC) analyses. The area under the ROC curve (AUC) was almost always higher for Maxent, indicating better discrimination of suitable versus unsuitable areas for the species. The Maxent modeling approach can be used in its present form for many applications with presence-only datasets, and merits further research and development.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Maximum entropy; Distribution; Modeling; Niche; Range

* Corresponding author. Tel.: +1 973 360 8704; fax: +1 973 360 8871.


1. Introduction

Predictive modeling of species geographic distributions based on the environmental conditions of sites of known occurrence constitutes an important technique in analytical biology, with applications in conservation and reserve planning, ecology, evolution, epidemiology, invasive-species management and other fields (Corsi et al., 1999; Peterson and Shaw, 2003; Peterson et al., 1999; Scott et al., 2002; Welk et al., 2002; Yom-Tov and Kadmon, 1998). Sometimes both presence and absence occurrence data are available for the development of models, in which case general-purpose statistical methods can be used (for an overview of the variety of techniques currently in use, see Corsi et al., 2000; Elith, 2002; Guisan and Zimmerman, 2000; Scott et al., 2002). However, while vast stores of presence-only data exist (particularly in natural history museums and herbaria), absence data are rarely available, especially for poorly sampled tropical regions where modeling potentially has the most value for conservation (Anderson et al., 2002; Ponder et al., 2001; Soberón, 1999). In addition, even when absence data are available, they may be of questionable value in many situations (Anderson et al., 2003). Modeling techniques that require only presence data are therefore extremely valuable (Graham et al., 2004).

1.1. Niche-based models from presence-only data

We are interested in devising a model of a species' environmental requirements from a set of occurrence localities, together with a set of environmental variables that describe some of the factors that likely influence the suitability of the environment for the species (Brown and Lomolino, 1998; Root, 1988). Each occurrence locality is simply a latitude–longitude pair denoting a site where the species has been observed; such georeferenced occurrence records often derive from specimens in natural history museums and herbaria (Ponder et al., 2001; Stockwell and Peterson, 2002a). The environmental variables, in GIS format, all pertain to the same geographic area, the study area, which has been partitioned into a grid of pixels. The task of a modeling method is to predict environmental suitability for the species as a function of the given environmental variables.

A niche-based model represents an approximation of a species' ecological niche in the examined environmental dimensions. A species' fundamental niche consists of the set of all conditions that allow for its long-term survival, whereas its realized niche is that subset of the fundamental niche that it actually occupies (Hutchinson, 1957). The species' realized niche may be smaller than its fundamental niche, due to human influence, biotic interactions (e.g., inter-specific competition, predation), or geographic barriers that have hindered dispersal and colonization; such factors may prevent the species from inhabiting (or even encountering) conditions encompassing its full ecological potential (Pulliam, 2000; Anderson and Martínez-Meyer, 2004). We assume here that occurrence localities are drawn from source habitat, rather than sink habitat, which may contain a given species without having the conditions necessary to maintain the population without immigration; this assumption is less realistic with highly vagile taxa (Pulliam, 2000). By definition, then, environmental conditions at the occurrence localities constitute samples from the realized niche. A niche-based model thus represents an approximation of the species' realized niche, in the study area and environmental dimensions being considered.

If the realized niche and fundamental niche do not fully coincide, we cannot hope for any modeling algorithm to characterize the species' full fundamental niche: the necessary information is simply not present in the occurrence localities. This problem is likely exacerbated when occurrence records are drawn from too small a geographic area. In a larger study region, however, spatial variation exists in community composition (and, hence, in the resulting biotic interactions) as well as in the environmental conditions available to the species. Therefore, given sufficient sampling effort, modeling in a study region with a larger geographic extent is likely to increase the fraction of the fundamental niche represented by the sample of occurrence localities (Peterson and Holt, 2003), and is preferable. In practice, however, the departure between the fundamental niche (a theoretical construct) and realized niche (which can be observed) of a species will remain unknown.

Although a niche-based model describes suitability in ecological space, it is typically projected into geographic space, yielding a geographic area of predicted presence for the species. Areas that satisfy the conditions of a species' fundamental niche represent its potential distribution, whereas the geographic areas it actually inhabits constitute its realized distribution. As mentioned above, the realized niche may be smaller than the fundamental niche (with respect to the environmental variables being modeled), in which case the predicted distribution will be smaller than the full potential distribution. However, to the extent that the model accurately portrays the species' fundamental niche, the projection of the model into geographic space will represent the species' potential distribution.

Whether or not a model captures a species' full niche requirements, areas of predicted presence will typically be larger than the species' realized distribution. Due to many possible factors (such as geographic barriers to dispersal, biotic interactions, and human modification of the environment), few species occupy all areas that satisfy their niche requirements. If required by the application at hand, the species' realized distribution can often be estimated from the modeled distribution through a series of steps that remove areas that the species is known or inferred not to inhabit. For example, suitable areas that have not been colonized due to contingent historical factors (e.g., geographic barriers) can be excluded (Peterson et al., 1999; Anderson, 2003). Similarly, suitable areas not inhabited due to biotic interactions (e.g., competition with closely related morphologically similar species) can be identified and removed from the prediction (Anderson et al., 2002). Finally, when a species' present-day distribution is desired, such as for conservation purposes, a current land-cover classification derived from remotely sensed data can be used to exclude highly altered habitats (e.g., removing deforested areas from the predicted distribution of an obligate-forest species; Anderson and Martínez-Meyer, 2004).

There are implicit ecological assumptions in the set of environmental variables used for modeling, so selection of that set requires great care. First, temporal correspondence should exist between occurrence localities and environmental variables; for example, a current land-cover classification should not be used with occurrence localities that derive from museum records collected over many decades (Anderson and Martínez-Meyer, 2004). Secondly, the variables should affect the species' distribution at the relevant scale, determined by the geographic extent and grain of the modeling task (Pearson et al., 2004). For example, using the terminology of Mackey and Lindenmayer (2001), climatic variables such as temperature and precipitation are appropriate at global and meso-scales; topographic variables (e.g., elevation and aspect) likely affect species distributions at meso- and topo-scales; and land-cover variables like percent canopy cover influence species distributions at the micro-scale. The choice of variables to use for modeling also affects the degree to which the model generalizes to regions outside the study area or to different environmental conditions (e.g., other time periods). This is important for applications such as invasive-species management (e.g., Peterson and Robins, 2003) and predicting the impact of climate change (e.g., Thomas et al., 2004). Bioclimatic and soil-type variables measure availability of the fundamental primary resources of light, heat, water and mineral nutrients (Mackey and Lindenmayer, 2001). Their impact, as measured in one study area or time frame, should generalize to other situations. On the other hand, variables representing latitude or elevation will not generalize well; although they are correlated with variables that have biophysical impact on the species, those correlations vary over space and time.

A number of other serious potential pitfalls may affect the accuracy of presence-only modeling; some of these also apply to presence–absence modeling. First, occurrence localities may be biased. For example, they are often highly correlated with the nearby presence of roads, rivers or other access conduits (Reddy and Dávalos, 2003). The location of occurrence localities may also exhibit spatial auto-correlation (e.g., if a researcher collects specimens from several nearby localities in a restricted area). Similarly, sampling intensity and sampling methods often vary widely across the study area (Anderson, 2003). In addition, errors may exist in the occurrence localities, be it due to transcription errors, lack of sufficient geographic detail (especially in older records), or species misidentification. Frequently, the number of occurrence localities may be too low to estimate the parameters of the model reliably (Stockwell and Peterson, 2002b). Similarly, the set of available environmental variables may not be sufficient to describe all the parameters of the species' fundamental niche that are relevant to its distribution at the grain of the modeling task. Finally, errors may be present in the variables, perhaps due to errors in data manipulation, or due to inaccuracies in the climatic models used to generate climatic variables, or interpolation of lower-resolution data. In sum, determining and possibly mitigating the effects of these factors represent worthy topics of research for all presence-only modeling techniques. With these caveats, we proceed to introduce a modeling approach that may prove useful whenever the above concerns are adequately addressed.

1.2. Maxent

Maxent is a general-purpose method for making predictions or inferences from incomplete information. Its origins lie in statistical mechanics (Jaynes, 1957), and it remains an active area of research with an annual conference, Maximum Entropy and Bayesian Methods, that explores applications in diverse areas such as astronomy, portfolio optimization, image reconstruction, statistical physics and signal processing. We introduce it here as a general approach for presence-only modeling of species distributions, suitable for all existing applications involving presence-only datasets. The idea of Maxent is to estimate a target probability distribution by finding the probability distribution of maximum entropy (i.e., that is most spread out, or closest to uniform), subject to a set of constraints that represent our incomplete information about the target distribution. The information available about the target distribution often presents itself as a set of real-valued variables, called "features", and the constraints are that the expected value of each feature should match its empirical average (average value for a set of sample points taken from the target distribution). When Maxent is applied to presence-only species distribution modeling, the pixels of the study area make up the space on which the Maxent probability distribution is defined, pixels with known species occurrence records constitute the sample points, and the features are climatic variables, elevation, soil category, vegetation type or other environmental variables, and functions thereof.

Maxent offers many advantages, and a few drawbacks; a comparison with other modeling methods will be made in Section 2.1.4 after the Maxent approach is described in detail. The advantages include the following: (1) It requires only presence data, together with environmental information for the whole study area. (2) It can utilize both continuous and categorical data, and can incorporate interactions between different variables. (3) Efficient deterministic algorithms have been developed that are guaranteed to converge to the optimal (maximum entropy) probability distribution. (4) The Maxent probability distribution has a concise mathematical definition, and is therefore amenable to analysis. For example, as with generalized linear and generalized additive models (GLM and GAM), in the absence of interactions between variables, additivity of the model makes it possible to interpret how each environmental variable relates to suitability (Dudík et al., 2004; Phillips et al., 2004). (5) Over-fitting can be avoided by using ℓ1-regularization (Section 2.1.2). (6) Because dependence of the Maxent probability distribution on the distribution of occurrence localities is explicit, there is the potential (in future work) to address the issue of sampling bias formally, as in Zadrozny (2004). (7) The output is continuous, allowing fine distinctions to be made between the modeled suitability of different areas. If binary predictions are desired, this allows great flexibility in the choice of threshold. If the application is conservation planning, the fine distinctions in predicted relative environmental suitability can be valuable to reserve planning algorithms. (8) Maxent could also be applied to species presence/absence data by using a conditional model (as in Berger et al., 1996), as opposed to the unconditional model used here. (9) Maxent is a generative approach, rather than discriminative, which can be an inherent advantage when the amount of training data is limited (see Section 2.1.4). (10) Maximum entropy modeling is an active area of research in statistics and machine learning, and progress in the field as a whole can be readily applied here. (11) As a general-purpose and flexible statistical method, we expect that it can be used for all the applications outlined in Section 1 above, and at all scales.

Some drawbacks of the method are: (1) It is not as mature a statistical method as GLM or GAM, so there are fewer guidelines for its use in general, and fewer methods for estimating the amount of error in a prediction. Our use of an "unconditional" model (cf. advantage 8) is rare in machine learning. (2) The amount of regularization (see Section 2.1.2) requires further study (e.g., see Phillips et al., 2004), as does its effectiveness in avoiding over-fitting compared with other variable-selection methods (for alternatives, see Guisan et al., 2002). (3) It uses an exponential model for probabilities, which is not inherently bounded above and can give very large predicted values for environmental conditions outside the range present in the study area. Extra care is therefore needed when extrapolating to another study area or to future or past climatic conditions (for example, feature values outside the range of values in the study area should be "clamped", or reset to the appropriate upper or lower bound). (4) Special-purpose software is required, as Maxent is not available in standard statistical packages.

1.3. Existing approaches for presence-only modeling

Many methods have been used for presence-only modeling of species distributions, and we only attempt here to give a broad overview of existing methods. Some methods use only presences to derive a model. BIOCLIM (Busby, 1986; Nix, 1986) predicts suitable conditions in a "bioclimatic envelope", consisting of a rectilinear region in environmental space representing the range (or some percentage thereof) of observed presence values in each environmental dimension. DOMAIN (Carpenter et al., 1993) uses a similarity metric, where a predicted suitability index is given by computing the minimum distance in environmental space to any presence record.

Other techniques use presence and background data. General-purpose statistical methods such as generalized linear models (GLMs) and generalized additive models (GAMs) are commonly used for modeling with presence–absence datasets. Recently, they have been applied to presence-only situations by taking a random sample of pixels from the study area, known as "background pixels" or "pseudo-absences", and using them in place of absences during modeling (Ferrier and Watson, 1996; Ferrier et al., 2002). A sample of the background pixels can be chosen purely at random (sometimes excluding sites with presence records, Graham et al., 2004), or from sites where sampling is known to have occurred or from a model of such sites (Zaniewski et al., 2002; Engler et al., 2004). Similarly, a Bayesian approach (Aspinall, 1992) proposed modeling presence versus a random sample. The Genetic Algorithm for Rule-Set Prediction (Stockwell and Noble, 1992; Stockwell and Peters, 1999) uses an artificial-intelligence framework called genetic algorithms. It produces a set of positive and negative rules that together give a binary prediction; rules are favored in the algorithm according to their significance (compared with random prediction) based on a sample of background pixels and presence pixels. Environmental-Niche Factor Analysis (ENFA, Hirzel et al., 2002) uses presence localities together with environmental data for the entire study area, without requiring a sample of the background to be treated like absences. It is similar to principal components analysis, involving a linear transformation of the environmental space into orthogonal "marginality" and "specialization" factors. Environmental suitability is then modeled as a Manhattan distance in the transformed space.

As a first step in the evaluation of Maxent, we chose to compare it with GARP, as the latter has recently seen extensive use in presence-only studies (Anderson, 2003; Joseph and Stockwell, 2002; Peterson and Kluza, 2003; Peterson and Robins, 2003; Peterson and Shaw, 2003 and references therein). While further studies are needed comparing Maxent with other widely used methods that have been applied to presence-only datasets, such studies are beyond the scope of this paper.

2. Methods

2.1. Maxent details

2.1.1. The principle

When approximating an unknown probability distribution, the question arises, what is the best approximation? E.T. Jaynes gave a general answer to this question: the best approach is to ensure that the approximation satisfies any constraints on the unknown distribution that we are aware of, and that subject to those constraints, the distribution should have maximum entropy (Jaynes, 1957). This is known as the maximum-entropy principle. For our purposes, the unknown probability distribution, which we denote π, is over a finite set X (which we will later interpret as the set of pixels in the study area). We refer to the individual elements of X as points. The distribution π assigns a non-negative probability π(x) to each point x, and these probabilities sum to 1. Our approximation of π is also a probability distribution, and we denote it π̂. The entropy of π̂ is defined as

H(\hat{\pi}) = -\sum_{x \in X} \hat{\pi}(x) \ln \hat{\pi}(x)

where ln is the natural logarithm. The entropy is non-negative and is at most the natural log of the number of elements in X. Entropy is a fundamental concept in information theory: in the paper that originated that field, Shannon (1948) described entropy as "a measure of how much 'choice' is involved in the selection of an event". Thus a distribution with higher entropy involves more choices (i.e., it is less constrained). Therefore, the maximum entropy principle can be interpreted as saying that no unfounded constraints should be placed on π̂, or alternatively,

The fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies use of that distribution for inference; it agrees with everything that is known, but carefully avoids assuming anything that is not known (Jaynes, 1990).
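To make the entropy quantity concrete, the short Python sketch below (an illustration of the definition above, not part of the Maxent software; the variable names are ours) computes the entropy of a discrete distribution over a handful of pixels and confirms that the uniform distribution attains the maximum value ln|X|.

```python
import numpy as np

def entropy(p):
    """Entropy H(p) = -sum_x p(x) ln p(x) of a discrete distribution.

    Points with zero probability contribute nothing to the sum.
    """
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# A distribution over four "pixels": the uniform distribution has the
# largest entropy, ln(4); a more concentrated distribution has less.
uniform = np.array([0.25, 0.25, 0.25, 0.25])
peaked = np.array([0.70, 0.20, 0.05, 0.05])
print(entropy(uniform), np.log(4))   # both ~1.386
print(entropy(peaked))               # ~0.927, i.e. "fewer choices"
```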

2.1.2. A machine learning perspective

The maximum entropy principle has seen recent interest in the machine learning community, with a major contribution being the development of efficient algorithms for finding the Maxent distribution (see Berger et al., 1996 for an accessible introduction and Ratnaparkhi, 1998 for a variety of applications and a favorable comparison with decision trees). The approach consists of formalizing the constraints on the unknown probability distribution π in the following way. We assume that we have a set of known real-valued functions f_1, ..., f_n on X, known as "features" (which for our application will be environmental variables or functions thereof). We assume further that the information we know about π is characterized by the expectations (averages) of the features under π. Here, each feature f_j assigns a real value f_j(x) to each point x in X. The expectation of the feature f_j under π is defined as

\sum_{x \in X} \pi(x) f_j(x)

and denoted by π[f_j]. In general, for any probability distribution p and function f, we use the notation p[f] to denote the expectation of f under p.

The feature expectations π[f_j] can be approximated using a set of sample points x_1, ..., x_m drawn independently from X (with replacement) according to the probability distribution π. The empirical average of f_j is \frac{1}{m}\sum_{i=1}^{m} f_j(x_i), which we can write as π̃[f_j] (where π̃ is the uniform distribution on the sample points), and use as an estimate of π[f_j]. By the maximum entropy principle, therefore, we seek the probability distribution π̂ of maximum entropy subject to the constraint that each feature f_j has the same mean under π̂ as observed empirically, i.e.

\hat{\pi}[f_j] = \tilde{\pi}[f_j], \quad \text{for each feature } f_j   (1)
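As an illustration of the constraint in (1) (our own sketch with made-up numbers, not code from the Maxent package), the snippet below computes the empirical feature averages π̃[f_j] from a set of sample pixels; a candidate distribution π̂ over all pixels is required to reproduce these averages.

```python
import numpy as np

# Hypothetical setup: 6 pixels (rows) and 2 environmental features (columns).
features = np.array([
    [12.0, 0.3],
    [15.0, 0.5],
    [18.0, 0.7],
    [21.0, 0.9],
    [24.0, 0.4],
    [27.0, 0.6],
])

# Indices of pixels where the species was recorded (the sample points).
sample_idx = [1, 2, 3]

# Empirical averages pi~[f_j]: mean feature values over the sample points.
empirical_means = features[sample_idx].mean(axis=0)

def expectation(p, features):
    """Expectation p[f_j] = sum_x p(x) f_j(x) for each feature j."""
    return p @ features

# The Maxent constraint asks that expectation(pi_hat, features) match these.
print(empirical_means)
```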

It turns out that the mathematical theory of convex duality can be used (Della Pietra et al., 1997) to show that this characterization uniquely determines π̂, and that π̂ has an alternative characterization, which can be described as follows. Consider all probability distributions of the form

q_\lambda(x) = \frac{e^{\lambda \cdot f(x)}}{Z_\lambda}   (2)

where λ is a vector of n real-valued coefficients or feature weights, f denotes the vector of all n features, and Z_λ is a normalizing constant that ensures that q_λ sums to 1. Such distributions are known as Gibbs distributions. Convex duality shows that the Maxent probability distribution π̂ is exactly equal to the Gibbs probability distribution q_λ that maximizes the likelihood (i.e., probability) of the m sample points. Equivalently, it minimizes the negative log likelihood of the sample points

\tilde{\pi}[-\ln(q_\lambda)]   (3)

which can also be written \ln Z_\lambda - \frac{1}{m}\sum_{i=1}^{m} \lambda \cdot f(x_i) and termed the "log loss".
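The following sketch (a minimal illustration with hypothetical feature values, not the Maxent implementation itself) evaluates a Gibbs distribution q_λ of the form (2) over a small set of pixels and the corresponding log loss (3) of the sample points.

```python
import numpy as np

def gibbs(lam, features):
    """Gibbs distribution q_lambda(x) = exp(lambda . f(x)) / Z_lambda."""
    scores = features @ lam           # lambda . f(x) for every pixel x
    scores -= scores.max()            # shift for numerical stability (cancels in q)
    unnorm = np.exp(scores)
    return unnorm / unnorm.sum()      # divide by Z_lambda

def log_loss(lam, features, sample_idx):
    """Negative average log likelihood of the sample points under q_lambda."""
    q = gibbs(lam, features)
    return -np.mean(np.log(q[sample_idx]))

# Hypothetical data: 6 pixels, 2 features, 3 presence pixels.
features = np.array([[12, .3], [15, .5], [18, .7],
                     [21, .9], [24, .4], [27, .6]], dtype=float)
sample_idx = [1, 2, 3]
lam = np.array([0.05, -1.0])
print(gibbs(lam, features))               # sums to 1
print(log_loss(lam, features, sample_idx))
```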

As described so far, Maxent can be prone to over-fitting the training data. The problem derives from the fact that the empirical feature means will typically not equal the true means; they will only approximate them. Therefore the means under π̂ should only be restricted to be close to their empirical values. One way this can be done is to relax the constraint in (1) above (Dudík et al., 2004), replacing it with

|\hat{\pi}[f_j] - \tilde{\pi}[f_j]| \le \beta_j, \quad \text{for each feature } f_j   (4)

for some constants β_j. This also changes the dual characterization, resulting in a form of ℓ1-regularization: the Maxent distribution can now be shown to be the Gibbs distribution that minimizes

\tilde{\pi}[-\ln(q_\lambda)] + \sum_j \beta_j |\lambda_j|   (5)

where the first term is the log loss (as in (3) above), while the second term penalizes the use of large values for the weights λ_j. Regularization forces Maxent to focus on the most important features, and ℓ1-regularization tends to produce models with few non-zero λ_j values (Williams, 1995). Such models are less likely to overfit, because they have fewer parameters; as a general rule, the simplest explanation of a phenomenon is usually best (the principle of parsimony, Occam's Razor). Note that ℓ1-regularization has also been applied to GLM/GAMs, and is called the "lasso" in that context (Guisan et al., 2002 and references therein).

This maximum likelihood formulation suggests a natural approach for finding the Maxent probability distribution: start from the uniform probability distribution, for which λ = (0, ..., 0), then repeatedly make adjustments to one or more of the weights λ_j in such a way that the regularized log loss decreases. Regularized log loss can be shown to be a convex function of the weights, so no local minima exist, and several convex optimization methods exist for adjusting the weights in a way that guarantees convergence to the global minimum (see Section 2.2 for the algorithm used in this study).
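A minimal sketch of this idea follows. It is our own simplified coordinate-descent loop with a fixed step size, not the sequential-update algorithm of Dudík et al. (2004) used by the Maxent software; it relies only on the convexity of the regularized log loss (5) to creep toward the global minimum.

```python
import numpy as np

def reg_log_loss(lam, features, sample_idx, beta):
    """Regularized log loss: log Z_lambda - mean(lambda . f(x_i)) + sum_j beta_j |lambda_j|."""
    scores = features @ lam
    m = scores.max()
    log_z = m + np.log(np.exp(scores - m).sum())
    fit = log_z - np.mean(scores[sample_idx])
    return fit + np.sum(beta * np.abs(lam))

def fit_maxent(features, sample_idx, beta, step=0.01, iters=2000):
    """Crude coordinate descent: nudge one weight at a time if it lowers the loss."""
    lam = np.zeros(features.shape[1])     # start from the uniform distribution
    loss = reg_log_loss(lam, features, sample_idx, beta)
    for _ in range(iters):
        for j in range(len(lam)):
            for delta in (step, -step):
                trial = lam.copy()
                trial[j] += delta
                new = reg_log_loss(trial, features, sample_idx, beta)
                if new < loss:            # accept only improving moves
                    lam, loss = trial, new
                    break
    return lam, loss

# Hypothetical example: 6 pixels, 2 features, presence at pixels 1-3.
features = np.array([[12, .3], [15, .5], [18, .7],
                     [21, .9], [24, .4], [27, .6]], dtype=float)
lam, loss = fit_maxent(features, sample_idx=[1, 2, 3], beta=np.array([1e-4, 1e-4]))
print(lam, loss)
```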

The above presentation describes an "unconditional" maximum entropy model. "Conditional" models are much more common in the machine learning literature. The task of a conditional Maxent model is to approximate a joint probability distribution p(x, y) of the inputs x and output label y. Both presence and absence data would be required to train a conditional model of a species' distribution, which is why we use unconditional models.

2.1.3. Application to species distribution modeling

Austin (2002) examines three components needed for statistical modeling of species distributions: an ecological model concerning the ecological theory being used, a data model concerning collection of the data, and a statistical model concerning the statistical theory. Maxent is a statistical model, and to apply it to model species distributions successfully, we must consider how it relates to the two other modeling components (the data model and ecological model). Using the notation of Section 2.1.2, we define the set X to be the set of pixels in the study area, and interpret the recorded presence localities for the species as sample points x_1, ..., x_m taken from an unknown probability distribution π. The data model consists of the method by which the presence localities were collected. One idealized sampling strategy is to pick a random pixel, and record 1 if the species is present there, and 0 otherwise. If we denote the response variable as y, then under this sampling strategy, π is the probability distribution p(x|y=1). By applying Bayes' rule, we get that π is proportional to the probability of occurrence, p(y=1|x), although with presence-only data we cannot determine the constant of proportionality.

However, most presence-only datasets derive from surveys where the data model is much less well-defined than the idealized model presented above. The various sampling biases described in Section 1 seriously violate this data model. In practice, then, π (and π̂) can be more conservatively interpreted as a relative index of environmental suitability, where higher values represent a prediction of better conditions for the species (similar to the relaxed interpretation of GLMs with presence-only data in Ferrier et al. (2002)).

The critical step in formulating the ecological model is defining a suitable set of features. Indeed, the constraints imposed by the features represent our ecological assumptions, as we are asserting that they represent all the environmental factors that constrain the geographical distribution of the species. We consider five feature types, described in Dudík et al. (2004). We did not use the fourth in our present study, as it may require more data than were available for our study species.

1. A continuous variable f is itself a "linear feature". It imposes the constraint on π̂ that the mean of the environmental variable, π̂[f], should be close to its observed value, i.e., its mean on the sample localities.

2. The square of a continuous variable f is a "quadratic feature". When used with the corresponding linear feature, it imposes the constraint on π̂ that the variance of the environmental variable should be close to its observed value, since the variance is equal to π̂[f²] − π̂[f]². It models the species' tolerance for variation from its optimal conditions.

3. The product of two continuous environmental variables f and g is a "product feature". Together with the linear features for f and g, it imposes the constraint that the covariance of those two variables should be close to its observed value, since the covariance is π̂[fg] − π̂[f]π̂[g]. Product features therefore incorporate interactions between predictor variables.

4. For a continuous environmental variable f, a "threshold feature" is equal to 1 when f is above a given threshold, and 0 otherwise. It imposes the following constraint: the proportion of π that has values for f above the threshold should be close to the observed proportion. All possible threshold features for f together allow Maxent to model an arbitrary response curve of the species to f, as any smooth function can be approximated by a linear combination of threshold functions.

5. For a categorical environmental variable that takes on values v_1, ..., v_k, we use k "binary features", where the i-th feature is 1 wherever the variable equals v_i, and 0 otherwise. As with threshold features, these binary features constrain the proportion of π̂ in each category to be close to the observed proportion.

For each of these feature types, the corresponding regularization parameter β_j governs how close the expectation under π̂ is required to be to the observed value; without regularization, they are required to be equal (Section 2.1.2). The above list of feature types is not exhaustive, and additional feature types could be derived from the same environmental variables. The features used should be those that likely constrain the geographic distribution of the species.
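By way of illustration, the sketch below (with hypothetical helper functions and toy data of our own; the Maxent software derives such features internally) assembles the five feature types from raw environmental grids flattened to one value per pixel.

```python
import numpy as np

def make_features(temp, precip, veg, threshold_temps, veg_classes):
    """Assemble linear, quadratic, product, threshold and binary features.

    temp, precip : continuous variables, one value per pixel
    veg          : categorical variable (integer class per pixel)
    """
    cols = [
        temp,                    # linear features: constrain the means
        precip,
        temp ** 2,               # quadratic features: constrain the variances
        precip ** 2,
        temp * precip,           # product feature: constrains the covariance
    ]
    for t in threshold_temps:    # threshold features: constrain proportions above t
        cols.append((temp > t).astype(float))
    for v in veg_classes:        # binary features: one indicator per category
        cols.append((veg == v).astype(float))
    return np.column_stack(cols)

# Tiny hypothetical example: 5 pixels.
temp = np.array([10.0, 14.0, 18.0, 22.0, 26.0])
precip = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
veg = np.array([1, 1, 2, 3, 3])
X = make_features(temp, precip, veg, threshold_temps=[15.0, 20.0], veg_classes=[1, 2, 3])
print(X.shape)   # (5 pixels, 10 features)
```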

The applicability of the maximum entropy principle to species distributions is supported by thermodynamic theories of ecological processes (Aoki, 1989; Schneider and Kay, 1994). The second law of thermodynamics specifies that in systems without outside influences, processes move in a direction that maximizes entropy. Thus, in the absence of influences other than those included as constraints in the model, the geographic distribution of a species will indeed tend toward the distribution of maximum entropy.

2.1.4. Relationships to other modeling approaches

Maxent has strong similarities to some existing methods for modeling species distributions, in particular, generalized linear models (GLMs), generalized additive models (GAMs) and machine learning methods such as Bayesian approaches and neural networks. GLMs, GAMs, Bayesian approaches and neural networks are all broad classes of techniques, and we refer here only to the way they have been applied to presence-only modeling of species distributions. Similarly, Maxent generally refers to a class of techniques, but we restrict our discussion to ℓ1-regularized unconditional Maxent, as described in Section 2.1.2.

Theoretically, Maxent is most similar to GLMs and GAMs. In what follows, we use the terminology of Yee and Mitchell (1991). A frequently used GLM is the Gaussian logit model, in which the logit of the predicted probability of occurrence is

\alpha + \beta_1 f_1(x) + \gamma_1 f_1(x)^2 + \cdots + \beta_n f_n(x) + \gamma_n f_n(x)^2   (6)

where the f_j are environmental variables, α, β_j and γ_j are fitted coefficients, and the logit function is defined by logit(p) = \ln\left(\frac{p}{1-p}\right). The expression in (6) has the same form as the log (rather than logit) of the probability of the pixel x in a Maxent model with linear and quadratic features. A common method for modeling interactions between variables in a GLM is to create product variables, which is analogous to the use of product features in Maxent.

In the same way, if probability of occurrence is modeled with a GAM using a logit link function, the logit of the predicted probability has the form

g_1(f_1(x)) + \cdots + g_n(f_n(x))

where the f_i are again environmental variables. The g_i are smooth functions fit by the model, with the amount of smoothing controlled by a width parameter. This is the same form as the log probability of the pixel x in a Maxent model with threshold features, and regularization has an analogous effect to smoothing on the otherwise arbitrary functions g_i. In both cases, the shape of the response curve to each environmental variable is determined by the data.

Despite these similarities, important differences exist between GLM/GAMs and Maxent, causing them to make different predictions. When GLM/GAMs are used to model probability of occurrence, absence data are required. When applied to presence-only data, background pixels must be used instead of true absences (Ferrier and Watson, 1996; Ferrier et al., 2002). However, the interpretation of the result is less clear-cut: it must be interpreted as a relative index of environmental suitability. In contrast, Maxent models a probability distribution over the pixels in the study region, and in no sense are pixels without species records interpreted as absences. In addition, Maxent is a generative approach, whereas GLM/GAMs are discriminative, and generative methods may give better predictions when the amount of training data is small (Ng and Jordan, 2001). For a joint probability distribution p(x, y), a discriminative classifier models the posterior probability p(y|x) directly, in order to choose the most likely label y for given inputs x. Typically, a generative classifier models the distribution p(x, y) or p(x|y), and relies on Bayes' rule to determine p(y|x). Our unconditional Maxent models are generative: we model a distribution p(x|y=1).

Maxent shares with other machine learning methods an emphasis on probabilistic reasoning. Regularization, which penalizes the use of large values of model parameters, can be interpreted as the use of a Bayesian prior (Williams, 1995). However, Maxent is quite different from the particular Bayesian species modeling approach of Aspinall (1992). The latter approach is known as "naive Bayes" in the machine learning literature, and assumes independence of the environmental variables. This assumption is frequently not met for environmental data.

The data requirements of Maxent are closest to those of environmental niche factor analysis (ENFA), which also uses presence data in combination with environmental data for the whole study area (although both could use only a random sample of background pixels to improve running time).

2.2. A Maxent implementation for modeling species distributions

In order to make the Maxent method available for modeling species geographic distributions, we implemented an efficient algorithm together with a choice of feature types that are well suited to the task. Our implementation uses a sequential-update algorithm (Dudík et al., 2004) that iteratively picks a weight λ_j and adjusts it so as to minimize the resulting regularized log loss. The algorithm is deterministic, and is guaranteed to converge to the Maxent probability distribution. The algorithm stops when a user-specified number of iterations has been performed, or when the change in log loss in an iteration falls below a user-specified value (convergence), whichever happens first.

As described in Section 2.1, Maxent assigns a non-negative probability to each pixel in the study area. Because these probabilities must sum to 1, each probability is typically extremely small. Although these "raw" probabilities are an optional output, by default our software presents the probability distribution in another form that is easier to use and interpret, namely a "cumulative" representation. The value assigned to a pixel is the sum of the probabilities of that pixel and all other pixels with equal or lower probability, multiplied by 100 to give a percentage. The cumulative representation can be interpreted as follows: if we resample pixels according to the modeled Maxent probability distribution, then t% of the resampled pixels will have cumulative value of t or less. Thus, if the Maxent distribution π̂ is a close approximation of the probability distribution π that represents reality, the binary model obtained by setting a threshold of t will have approximately t% omission of test localities and minimum predicted area among all such models (cf. the "minimal predicted area" evaluation measure of Engler et al. (2004)). This provides a theoretical foundation that aids in the selection of a threshold when a binary prediction is required.
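A sketch of this transformation follows (our own illustration of the stated definition, not code from the Maxent package): each pixel receives the summed probability of all pixels with equal or lower raw probability, expressed as a percentage.

```python
import numpy as np

def cumulative_output(raw):
    """Convert raw Maxent probabilities (summing to 1) to cumulative percentages.

    The value of a pixel is the sum of the probabilities of that pixel and all
    pixels with equal or lower probability, times 100.
    """
    raw = np.asarray(raw, dtype=float)
    order = np.argsort(raw)                 # pixels from least to most probable
    cum = np.empty_like(raw)
    cum[order] = np.cumsum(raw[order])      # running total in that order
    # Pixels with equal raw probability share the same cumulative value.
    for value in np.unique(raw):
        mask = raw == value
        cum[mask] = cum[mask].max()
    return 100.0 * cum

raw = np.array([0.05, 0.20, 0.05, 0.50, 0.20])
print(cumulative_output(raw))   # the most probable pixel gets 100.0
```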

Our Maxent implementation has a straightforward graphical user interface (Fig. 1). It also has a command-line interface, allowing it to be run automatically from scripts for batch processing. It is written in Java, so it can be used on all modern computing platforms, and is freely available on the world-wide web at www.cs.princeton.edu/~schapire/maxent. The user-specified parameters and their default values (which we used in all runs described below) are: convergence threshold = 10⁻⁵, maximum iterations = 1000, regularization value β = 10⁻⁴, and use of linear, quadratic, product and binary features. The first two parameters are conservative values that allow the algorithm to get close to convergence. The small value of β has minimal effect on the prediction but avoids potential numerical difficulties by keeping λ values from tending to infinity; how to choose the best regularization parameters is a topic of ongoing research (see Dudík et al. (2004)).¹

¹ A later version of the software, Version 1.8.1, was posted on the web site during review of this paper. It allows each β_j to depend on observed variability in the corresponding feature, as described in Dudík et al. (2004). The recommended regularization is now obtained by setting the regularization parameter to "auto", allowing the program to select an amount of regularization that is appropriate for the types of features used and the number of sample localities. The version of the software used in the present study (Version 1.0, also available on the web site) uses the same value β for all β_j.

Fig. 1. User interface for the Maxent application (Version 1.0) for modeling species geographic distributions using georeferenced occurrence records and environmental variables. The interface allows for the use of both continuous and categorical environmental data, and linear, quadratic, and product features. See Section 2 for further documentation.

2.3. GARP

In its simplest form, GARP seeks a collection of rules that together produce a binary prediction. Positive rules predict suitable conditions for pixels satisfying some set of environmental conditions; similarly, negative rules predict unsuitable conditions. Rules are favored in the algorithm according to their significance (compared with random prediction) based on a sample of 1250 presence pixels and 1250 background pixels, sampled with replacement. Some pixels may receive no prediction, if no rule in the rule-set applies to them, and some may require resolution of conflicting predictions. A genetic algorithm is used to search heuristically for a good rule-set (Stockwell and Noble, 1992).

There is considerable random variability in GARP predictions, so we implemented the best-subset model selection procedure as follows, similar to Peterson and Shaw (2003) and following the general recommendations of Anderson et al. (2003). First, we generated 100 binary models, with pixels that did not receive a prediction interpreted as predicted absence, using GARP version 1.1.3 with default values for its parameters (0.01 convergence limit, 1000 maximum iterations, and allowing the use of atomic, range, negated range and logit rules). We then eliminated all models with more than 5% intrinsic omission (of training localities). If at most 10 models remained, they then constituted the best subset (this happened 4 out of 44 times, yielding best subsets with 5, 7, 8 and 9 models). In all other cases, we determined the median value of the predicted area of the remaining models, and selected the 10 models whose predicted area was closest to the median. Finally, we combined the best-subset models to make a composite GARP prediction, in which the value of a pixel was equal to the number of best-subset models in which the pixel was predicted present (0–10).
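A minimal sketch of this selection procedure follows (our own illustration of the steps just described; the model arrays, omission rates and areas passed in are hypothetical placeholders, not output of the GARP software).

```python
import numpy as np

def best_subset(models, intrinsic_omission, predicted_area,
                max_omission=0.05, subset_size=10):
    """Select best-subset GARP models as described in the text.

    models             : list of binary prediction arrays, one per model
    intrinsic_omission : omission rate of each model on the training localities
    predicted_area     : proportional predicted area of each model
    """
    # 1. Discard models omitting more than 5% of training localities.
    keep = [i for i, om in enumerate(intrinsic_omission) if om <= max_omission]
    if len(keep) <= subset_size:
        return keep
    # 2. Otherwise take the 10 models whose predicted area is closest to the median.
    areas = np.array([predicted_area[i] for i in keep])
    closest = np.argsort(np.abs(areas - np.median(areas)))[:subset_size]
    return [keep[i] for i in closest]

def composite_prediction(models, chosen):
    """Composite prediction: per pixel, count of best-subset models predicting presence."""
    return np.sum([models[i] for i in chosen], axis=0)

# Hypothetical usage with random placeholder models over 100 pixels.
rng = np.random.default_rng(0)
models = [rng.integers(0, 2, size=100) for _ in range(100)]
omission = rng.uniform(0, 0.1, size=100)
area = [m.mean() for m in models]
chosen = best_subset(models, omission, area)
composite = composite_prediction(models, chosen)   # values 0-10 per pixel
```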

2.4. Data sources

2.4.1. Study species

The brown-throated three-toed sloth Bradypus variegatus (Xenarthra: Bradypodidae) is a large arboreal mammal (3–6 kg) that is widely distributed in the Neotropics from Honduras to northern Argentina. It is found primarily in lowland areas but also ranges up to middle elevations. It has been documented in regions of deciduous forest, evergreen rainforest and montane forest, but is absent from xeric areas and non-forested regions (Anderson and Handley, 2001). Three other species are known in the genus. B. pygmaeus is endemic to Isla Escudo on the Caribbean coast of Panama, and two species have geographic distributions restricted to South America: B. tridactylus in the Guianan region and B. torquatus in the Atlantic forests of Brazil. The latter two species show geographic distributions that likely come into contact (or did historically) with that of B. variegatus, but areas of sympatry are apparently minimal.

Microryzomys minutus (Rodentia: Muridae) is a small-bodied rodent (10–20 g) known from middle-to-high elevations of the Andes and associated mountain chains from Venezuela to Bolivia (Carleton and Musser, 1989). It occupies an elevational range of approximately 1000–4000 m and has been recorded primarily in wet montane forests, although sometimes in mesic páramo habitats above treeline (in the páramo-forest ecotone). A congeneric species, M. altissimus, occupies generally higher elevations in much of this region, but occasionally the two have been found in sympatry. M. minutus has not been encountered in lowland regions (below approximately 1000 m). Likewise, it is apparently absent from open páramo far from forests, dry puna habitat above treeline, and obviously from permanent glaciers on the highest mountain peaks.

These two species hold several characteristics conducive to their use in evaluating the utility of Maxent in modeling species distributions. First of all, they show widespread geographic distributions with clear ecological/environmental patterns. Secondly, they have been the subject of recent taxonomic revisions by specialists. Finally, those revisions provide a reasonable number of georeferenced occurrence localities for each species based on confirmed museum specimens (128 for B. variegatus, Anderson and Handley, 2001; 88 for M. minutus, Carleton and Musser, 1989; Fig. 2).

Fig. 2. Occurrence records for Bradypus variegatus (triangles; left, 116 records) and Microryzomys minutus (circles; right, 88 records) used in this study. Data derive from vouchered museum specimens reported in recent taxonomic revisions (Anderson and Handley, 2001; Carleton and Musser, 1989).

2.4.2. Environmental variables

We examine the species' potential distributions in the Neotropics from southeastern Mexico to Argentina (23.55°N–56.05°S, 94.8°W–34.2°W), including the Caribbean from Cuba southward. The environmental variables fall into three categories: climate, elevation and potential vegetation. All variables are recorded at a pixel size of 0.05° by 0.05°, yielding a 1212 × 1592 grid, with 648,658 pixels containing data for all variables.

The climatic variables derive from data provided by the Intergovernmental Panel on Climate Change (IPCC; New et al., 1999). The original variables have a resolution of 0.5° by 0.5°, and were produced using thin-plate spline interpolation based on readings taken at weather stations around the world from 1961 to 1990. They describe mean monthly values of various variables, which we processed to convert to ascii raster grid format, as required by GARP and Maxent. From these monthly data, we also created annual variables by averaging or taking the minimum or maximum as appropriate.

Of the many monthly and annual variables available, we selected the following twelve, based on our assessment that they would likely have relevance for the species being modeled (see also Peterson and Cohoon, 1999): annual cloud cover; annual diurnal temperature range; annual frost frequency; annual vapor pressure; January, April, July, October and annual precipitation; and minimum, maximum and mean annual temperature. We used bilinear interpolation to resample to a pixel size of 0.05° by 0.05°. Although this resampling clearly does not actually increase the resolution of the data, bilinear interpolation is likely more realistic than simply using nearest-neighbor interpolation.

Two other variables were used in addition to the climatic data. An elevation variable was derived from USGS HYDRO1k data (USGS, 2001) by resampling from the original finer resolution (1 km pixels) to 0.05° by 0.05°. Finally, we used a potential vegetation variable, consisting of a partition of Latin America and the Caribbean into "major habitat types", produced as part of a terrestrial conservation assessment (Dinerstein et al., 1995). This variable does not take into account historical (contingent) biogeographic information or human-induced changes, and represents a reconstruction of original vegetation types in the region. We used digital data on 15 major habitat types in a vector coverage (shapefile), which we converted to a grid with resolution of 0.05° by 0.05° coincident with the climatic and elevational variables. The digital data differed slightly from the description and map in Dinerstein et al. (1995) by having 15 rather than 11 major habitat types. The differences arise from the addition of a snow/ice/glaciers/rock category, a tundra category and a water category; deletion of the restingas category; splitting of grassland savannas and shrublands into temperate versus tropical/subtropical categories; and splitting of temperate forests into temperate coniferous and temperate broadleaf and mixed forests. The processed climatic variables (at the original resolution), all resampled variables, and the occurrence localities are available at www.cs.princeton.edu/~schapire/maxent.

2.5. Model building

For each species, we made 10 random partitions of the occurrence localities. Each partition was created by randomly selecting 70% of the occurrence localities as training data, with the remaining 30% reserved for testing the resulting models. Twelve of the original 128 localities for B. variegatus lay in coastal areas or on islands that were missing data for one or more of the environmental variables, and were excluded from this study. Each partition for B. variegatus thus held 81 training localities and 35 test localities, and those for M. minutus held 61 training localities and 27 test localities.

We made 10 random partitions rather than a single one in order to assess the average behavior of the algorithms, and to allow for statistical testing of observed differences in performance (via Wilcoxon signed-rank tests). In addition, the algorithms were also run on the full set of occurrence localities, taking advantage of all available data to provide best estimates of the species' potential distributions for visual interpretation.
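The partitioning scheme can be sketched as follows (illustrative code of our own, not the authors' scripts): ten independent 70/30 splits of the occurrence localities by index.

```python
import numpy as np

def make_partitions(n_localities, n_partitions=10, train_frac=0.7, seed=0):
    """Return a list of (train_indices, test_indices) pairs."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_localities))
    partitions = []
    for _ in range(n_partitions):
        perm = rng.permutation(n_localities)
        partitions.append((perm[:n_train], perm[n_train:]))
    return partitions

# B. variegatus: 116 usable localities -> 81 training, 35 test per partition.
parts = make_partitions(116)
print(len(parts[0][0]), len(parts[0][1]))   # 81 35
```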

The algorithms (Maxent and GARP) were run with two suites of environmental variables: first with only climatic and elevational data, and then with those variables plus potential vegetation. The reasons for treating potential vegetation separately are three-fold: (1) climatic and elevational data are readily available for the whole world (whereas potential vegetation is not), and we wished to determine whether good models can be created using uniformly available data. (2) The potential vegetation coverage is rather subjective, whereas the others are objectively produced from measured data. (3) Potential vegetation is the only categorical variable, and the potential existed for the algorithms to respond differently to categorical versus continuous data.

2.6. Model evaluation

The first step in evaluating the models produced by the two algorithms was to verify that both performed significantly better than random. For this purpose, we first used a (threshold-dependent) binomial test based on omission and predicted area. However, it does not allow for comparisons between algorithms, as the significance of the test is highly dependent on predicted area. Indeed, comparison of the algorithms is made difficult by the fact that neither gives a binary prediction. Hence, we also used two comparative statistical tests that employ very different means to overcome this complication. First, we employed a new threshold-dependent method of model evaluation, which we term the "equalized predicted area" test, whose purpose is to answer the following question: at the commonly used thresholds representing the extremes of the GARP prediction, how does Maxent compare? Second, we used (threshold-independent) receiver operating characteristic (ROC) analysis, which characterizes the performance of a model at all possible thresholds by a single number, the area under the curve (AUC), which may then be compared between algorithms.

2.6.1. Threshold-dependent evaluation

After applying a threshold, model performance can be investigated using the extrinsic omission rate, which is the fraction of the test localities that fall into pixels not predicted as suitable for the species, and the proportional predicted area, which is the fraction of all the pixels that are predicted as suitable for the species. A low omission rate is a necessary (but not sufficient) condition for a good model (Anderson et al., 2003). In contrast, it might be necessary to predict a large proportional area to model the species' potential distribution adequately.

A one-tailed binomial test can be used to determine whether a model predicts the test localities significantly better than random (Anderson et al., 2002). Say there are t test localities, the omission rate is r, and the proportional predicted area is a. The null hypothesis states that the model is no better than one randomly selected from the set of all models with proportional predicted area a. It is tested using a one-tailed binomial test to determine the probability of having at least t(1−r) successes out of t trials, each with probability a. Although the probabilities for such tests are often approximated using a χ² or z test (for large sample sizes), we calculated exact probabilities for the binomial test using Minitab (1998).
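The test can be carried out as in the sketch below (our illustration using scipy rather than Minitab, with placeholder numbers): with t test localities, omission rate r and proportional predicted area a, the p-value is the probability of at least t(1−r) successes in t trials, each with success probability a.

```python
from scipy.stats import binom

def binomial_test_pvalue(t, omission_rate, predicted_area):
    """One-tailed exact binomial test for better-than-random prediction."""
    successes = round(t * (1.0 - omission_rate))   # test localities predicted present
    # P(X >= successes) when X ~ Binomial(t, predicted_area)
    return binom.sf(successes - 1, t, predicted_area)

# Hypothetical example: 35 test localities, 2 omitted, 30% of the area predicted.
print(binomial_test_pvalue(35, omission_rate=2 / 35, predicted_area=0.30))
```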

The binomial test requires that thresholds be used, in order to convert continuous Maxent and discrete GARP predictions into binary predictions delimiting the suitable versus unsuitable areas for the species. A good general rule for determining an appropriate threshold would depend at least on the following factors: the predicted values assigned to the training localities, the number of training localities and the context in which the prediction is to be used. Nevertheless, for each run of each algorithm, we simply used the minimum predicted value assigned to any of the training localities as the threshold. However, for four of the twenty GARP runs, such a threshold would cause the whole study area to be predicted (as some training localities fell in pixels not predicted by any of the best-subset models). In those cases, we used the smallest non-zero predicted value among the training localities.

Because this omission test is highly sensitive to the proportional predicted area (Anderson et al., 2003), it cannot be used to compare model performance between two algorithms directly. Hence, we propose an "equalized predicted area" test, which chooses thresholds so that the two binary models have the same predicted area, allowing direct comparison of omission rates. Here, composite GARP models have little flexibility in the choice of threshold. On the other hand, Maxent predictions, being continuous, can be thresholded to obtain any desired predicted area. So, we set a threshold for each Maxent prediction to give the same predicted area as the corresponding GARP prediction. A two-tailed Wilcoxon signed-rank test (a non-parametric equivalent of a paired t-test) can then be used to determine whether the observed difference in omission rates between the two algorithms at the given predicted area is statistically significant. We used this test to compare Maxent predictions with two thresholds of the composite GARP predictions, namely 1 (any best-subset model) and 10 (all best-subset models; see Anderson and Martínez-Meyer, 2004). These are natural thresholds for GARP that are frequently used in practice, so for reasons of conciseness, we do not consider intermediate thresholds. For some data partitions for B. variegatus, the maximum value of the composite GARP model was less than 10 (because fewer than 10 GARP models met the best-subset criteria), in which case we used the maximum predicted value instead of 10.

The thresholds and resulting predicted areas used above are not necessarily optimal for either algorithm. Rather, they were chosen to facilitate statistical analysis of the algorithms. Note that we are not suggesting that GARP should or need be used in general to select a threshold for Maxent predictions when binary predictions are desired. Rather, we took advantage of the flexibility of Maxent's continuous outputs to allow direct comparisons of omission rates between it and GARP. Determining optimal thresholds for Maxent models remains a topic of future research. In practice, thresholds would currently be chosen by hand, since no general-purpose thresholding rule has been developed yet for either algorithm (but see Section 2.2 for theoretical expectations for Maxent).
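A sketch of the equalized-predicted-area comparison follows (our own illustration; the prediction arrays and omission values are random placeholders, not results from this study). The Maxent threshold is chosen so that its binary prediction covers the same proportional area as the thresholded GARP prediction, after which the paired omission rates from the ten partitions can be compared with a Wilcoxon signed-rank test.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

def threshold_for_area(pred, target_area):
    """Smallest threshold at which the binary prediction covers target_area of the pixels."""
    values = np.sort(pred)[::-1]                     # pixel predictions, high to low
    n_present = max(int(round(target_area * len(values))), 1)
    return values[n_present - 1]

def omission(pred_at_test, threshold):
    """Fraction of test localities falling below the threshold (omitted)."""
    return np.mean(pred_at_test < threshold)

# Hypothetical example for one partition: a continuous Maxent prediction over
# 10,000 pixels, equalized to a GARP prediction covering 25% of the study area.
maxent_pred = rng.uniform(0, 100, size=10_000)
thr = threshold_for_area(maxent_pred, target_area=0.25)
maxent_test = rng.uniform(20, 100, size=35)          # predictions at 35 test localities
print(omission(maxent_test, thr))

# Across the 10 partitions, paired omission rates (placeholder values here)
# are compared with a two-tailed Wilcoxon signed-rank test.
maxent_om = rng.uniform(0.0, 0.15, size=10)
garp_om = maxent_om + rng.uniform(0.0, 0.10, size=10)
print(wilcoxon(maxent_om, garp_om).pvalue)
```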

2.6.2. Threshold-independent evaluation

A second common approach compares model

performance using receiver operating characteris-

tic(ROC)curves.ROC analysis was developed in

signal processing and is widely used in clinical

medicine(Hanley and McNeil,1982,1983;Zweig and

Campbell,1993).The main advantage of ROC analy-

sis is that area under the ROC curve(AUC)provides

a single measure of model performance,independent

of any particular choice of threshold.ROC analysis

has recently been applied to a variety of classi?cation

problems in machine learning(for example Provost

and Fawcett,1997)and in the evaluation of models

of species distributions(Elith,2002;Fielding and Bell,

1997).

Here we will first describe ROC curves in general terms, following Fawcett (2003), before demonstrating how they apply to presence-only modeling. Consider a classification problem, where each instance is either positive or negative. A classifier assigns a real value to each instance, to which a threshold may be applied to predict class membership; for clarity we use labels {Y, N} for the class predictions. The sensitivity of a classifier for a particular threshold is the fraction of all positive instances that are classified Y, while specificity is the fraction of all negative instances that are classified N. Sensitivity is also known as the true positive rate, and represents absence of omission error. The quantity 1 − specificity is also known as the false positive rate, and represents commission error.

The ROC curve is obtained by plotting sensitivity on the y axis and 1 − specificity on the x axis for all possible thresholds. For a continuous prediction, the ROC curve typically contains one point for each test instance, while for a discrete prediction, there will typically be one point for each of the different predicted values, in addition to the origin. The area under the curve (AUC) is usually determined by connecting the points with straight lines; this is called the trapezoid method (as opposed to parametric methods, which fit a curve to the points). The AUC has an intuitive interpretation, namely the probability that a random positive instance and a random negative instance are correctly ordered by the classifier. This interpretation indicates that the AUC is not sensitive to the relative numbers of positive and negative instances in the test data set.
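The trapezoid construction and the rank-order interpretation can be illustrated with a short sketch (the scores below are hypothetical, not model outputs from this study); the two computations agree, with ties counted as half-successes.

```python
import numpy as np

def trapezoid_auc(pos_scores, neg_scores):
    """AUC by the trapezoid method: one ROC point per distinct predicted
    value, with the points joined by straight lines."""
    pos, neg = np.asarray(pos_scores), np.asarray(neg_scores)
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]
    fpr, tpr = [0.0], [0.0]          # (1 - specificity, sensitivity) pairs
    for t in thresholds:
        tpr.append(np.mean(pos >= t))
        fpr.append(np.mean(neg >= t))
    return np.trapz(tpr, fpr)

def rank_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive instance outranks a
    random negative instance, counting ties as one half."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

pos = [0.9, 0.8, 0.8, 0.6, 0.4]   # hypothetical scores of positive instances
neg = [0.7, 0.5, 0.4, 0.3, 0.1]   # hypothetical scores of negative instances
print(trapezoid_auc(pos, neg), rank_auc(pos, neg))  # both 0.86
```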

When only presence data are available, it would appear that ROC curves are inapplicable, since without absences, there seems to be no source of negative instances with which to measure specificity (see Section 1.1, and the discussion of real and apparent commission error in Anderson et al. (2003) and Karl et al. (2002)). However, we can avoid this problem by considering a different classification problem, namely, the task of distinguishing presence from random, rather than presence from absence. More formally, for each pixel x in the study area, we define a negative instance x_random. Similarly, for each pixel x that is included in the species' true geographic distribution, we define a positive instance x_presence. A species distribution model can then make predictions for the pixels corresponding to these instances, without seeing the labels random or presence. Thus, we can make predictions for both a sample of positive instances (the presence localities) and a sample of negative instances (background pixels chosen uniformly at random, or according to another background distribution as described in Section 1.3). Together these are sufficient to define an ROC curve, which can then be analyzed with all the standard statistical methods of ROC analysis. This process can be interpreted as using pseudo-absence in place of absence in the ROC analysis, as is done in Wiley et al. (2003). However, we believe that the observation that the statistical methods of ROC analysis can be applied without prejudice to presence/random data is new.


The above treatment differs from the use of ROC analysis on presence/absence data in one important respect: with presence-only data, the maximum achievable AUC is less than 1 (Wiley et al., 2003). If the species' distribution covers a fraction a of the pixels, then the maximum achievable AUC can be shown to be exactly 1 − a/2. Unfortunately, we typically do not know the value of a, so we cannot say how close to optimal a given AUC value is. Nevertheless, we can still use standard methods to determine statistical significance of the AUC, and to distinguish between the predictive power of different classifiers. We note that random prediction still corresponds to an AUC of 0.5.
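The 1 − a/2 bound can be checked with a small simulation (all quantities hypothetical): a model that ranks every suitable pixel above every unsuitable one, evaluated against presences and a uniform random background, attains an AUC near 1 − a/2 rather than 1, because a fraction a of the background pixels are themselves suitable.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_pixels, a = 200_000, 0.3              # the species occupies a fraction a of the pixels

quality = rng.random(n_pixels)          # latent habitat quality of each pixel
suitable = quality >= np.quantile(quality, 1.0 - a)
scores = quality                        # a "perfect" model scores pixels by true quality

presence = scores[suitable]                               # positive instances
background = scores[rng.integers(0, n_pixels, 10_000)]    # random background pixels

y_true = np.concatenate([np.ones(presence.size), np.zeros(background.size)])
y_score = np.concatenate([presence, background])
print(roc_auc_score(y_true, y_score), 1 - a / 2)          # both close to 0.85
```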

We used AccuROC Version 2.5 (Vida, 1993) for the ROC analyses. AccuROC uses the trapezoid method, as described above. To test if a prediction is significantly better than random, AccuROC uses a ties-corrected Mann-Whitney U statistic, which it approximates using a z-statistic. It uses a non-parametric test (DeLong et al., 1988) to determine whether one prediction is significantly better than another when using correlated samples (i.e., with both predictions evaluated on the same test instances), and reports the result as a χ² statistic and corresponding p value. For each ROC analysis, we used all the test localities for the species as presence instances, and a sample of 10,000 pixels drawn randomly from the study region as random instances.
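The better-than-random test can be sketched without AccuROC (the values below are simulated stand-ins for the model's predictions at the test localities and at 10,000 random background pixels): the normalized Mann-Whitney U statistic equals the AUC of the presence/random ROC curve. The paired comparison of two correlated AUCs (DeLong et al., 1988) is not reproduced here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
# Hypothetical stand-ins for predicted values at test localities and at
# 10,000 background pixels drawn at random from the study region.
presence_scores = rng.normal(3.0, 1.0, size=35)
background_scores = rng.normal(0.0, 1.0, size=10_000)

# One-tailed, ties-corrected Mann-Whitney U test of "better than random".
u, p_value = mannwhitneyu(presence_scores, background_scores, alternative="greater")
auc = u / (presence_scores.size * background_scores.size)
print(f"AUC = {auc:.3f}, p = {p_value:.3e}")
```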

3. Results

3.1. Quantitative results

3.1.1. Threshold-dependent omission tests

Both algorithms consistently produced predictions that were better than random. Using the simple threshold rule (Section 2.6.1), the binomial omission test was highly significant (p < 0.001, one-tailed) for both algorithms on all data partitions for each species (see Table 1 for details on runs with the climatic and elevational variables; results on the variable suite including potential vegetation were similar). For Maxent, the thresholds determined by the simple threshold rule ranged from 0.022 to 2.564 for B. variegatus and 0.543 to 3.822 for M. minutus. For GARP, the thresholds ranged from 1 to 7 for B. variegatus and 2 to 10 for M. minutus. In addition to statistical significance, omission rates were consistently low or zero, never exceeding 17% (Table 1).

The results of the equalized predicted area test differed between the species (Tables 2 and 3). For B. variegatus, the omission rates of the two algorithms were lower for Maxent in 16 cases, equal in 15 cases, and lower for GARP in 9 cases. However, two-tailed Wilcoxon signed-rank tests did not reveal a significant difference in median omission rates for either threshold or either variable suite (p = 0.178 and 0.314 for thresholds of 1 and 10, respectively, with climatic and elevational variables; p = 0.371 and 0.155 for thresholds of 1 and 10, respectively, with addition of the potential vegetation variable).

Maxent almost always had equal or lower omission than GARP for M. minutus (19 out of 20 models). The difference in median omission rates was significant at both thresholds on runs with climatic and elevational variables (p = 0.036 and p = 0.014 for thresholds of 1 and 10, respectively; two-tailed Wilcoxon signed-rank test). When the potential vegetation variable was added, the difference in median omission rates was highly significant for a threshold of 10, but not for a threshold of 1 (p = 0.009 and 0.345, respectively), largely because Maxent had greater omission than before on data partition 2, discussed below (Section 4.3).

3.1.2. Threshold-independent tests

For all partitions of the occurrence data, the AUC values (calculated on extrinsic test data) were highly statistically significant for both algorithms and variable suites (p < 0.0001), again indicating better-than-random predictions. The Maxent AUC was significantly greater than that of GARP (p < 0.05; two-tailed non-parametric test of DeLong et al., 1988; see Methods) in all data partitions except B. variegatus-4 and B. variegatus-8 for models using the climatic and elevational variables, and B. variegatus-8 and M. minutus-2 when potential vegetation was added (Table 4).

Addition of the potential vegetation variable should increase the AUC, since there is more information available to the classifier. This was true in general for Maxent and in some cases for GARP (Table 4). For Maxent on B. variegatus, the overall increase in median AUC approached significance (p = 0.093, one-tailed Wilcoxon signed-rank test). However, for GARP the test was not significant (p = 0.949); indeed, the AUC generally decreased.


Table 1
Results of the threshold-dependent binomial tests of omission

Data partition            Maxent                    GARP
                          Area     Omission rate    Area     Omission rate
Bradypus variegatus-1     0.51     0.03             0.41     0.11
B. variegatus-2           0.66     0                0.56     0.06
B. variegatus-3           0.80     0                0.61     0.03
B. variegatus-4           0.42     0.17             0.51     0
B. variegatus-5           0.75     0.03             0.57     0.06
B. variegatus-6           0.62     0                0.54     0
B. variegatus-7           0.59     0                0.53     0
B. variegatus-8           0.59     0.06             0.62     0
B. variegatus-9           0.69     0                0.66     0
B. variegatus-10          0.62     0.06             0.44     0.06
Average                   0.626    0.034            0.545    0.031
Microryzomys minutus-1    0.03     0.11             0.06     0.15
M. minutus-2              0.04     0.11             0.06     0.15
M. minutus-3              0.03     0.11             0.07     0.15
M. minutus-4              0.04     0.04             0.08     0.04
M. minutus-5              0.03     0.04             0.06     0.15
M. minutus-6              0.04     0.15             0.06     0.11
M. minutus-7              0.05     0                0.09     0.07
M. minutus-8              0.04     0.04             0.10     0
M. minutus-9              0.03     0.07             0.10     0
M. minutus-10             0.03     0.11             0.08     0.07
Average                   0.035    0.078            0.075    0.089

Area (proportion of the study area predicted) and extrinsic omission rate (proportion of the test localities falling outside the prediction) are given for each of 10 random data partitions for Maxent and GARP. For both B. variegatus and M. minutus, the binomial test was highly significant for all partitions (p < 0.001, one-tailed). Models were derived using the climatic and elevational variables for each random partition of occurrence records, and area and omission rates were calculated using simple threshold rules based on the training localities (see Section 2). The results for models made with the addition of the potential vegetation variable were similar but are not shown here (see Section 3). The omission rates should not be compared between algorithms, as they are strongly affected by differences in predicted area. The simple threshold rule used here for Maxent is not recommended for general use in practice; in this case, it gives too high a threshold for Maxent on B. variegatus-4, causing a high omission rate, and too low a threshold on B. variegatus-3, resulting in too much predicted area.

For M. minutus, the AUC usually increased for both Maxent and GARP, with results significant or nearly so for both (p = 0.051 and 0.033, respectively, although performance was poorer for Maxent on data partition 2; see Section 4.3). While the differences in AUC values are very small, the changes may still be meaningful biologically. For example, the largest visual effect of adding potential vegetation for Maxent was to (correctly) exclude some non-forested areas from the prediction for B. variegatus (Section 3.2.2). However, because of the small geographic extent of those areas, the effect on AUC values was small.

The ROC curves for the two algorithms showed distinct patterns, evident in the curves for the first random data partition for each species, for models made using climatic and elevational variables (Fig. 3). In the case of M. minutus, the performance of Maxent was better across the entire spectrum: for any given omission rate, Maxent achieved that rate with a lower false positive rate (1 − specificity, which is almost identical to proportional predicted area, see Section 2). The results with B. variegatus were more complex. There is a point where the ROC curves for the two algorithms intersect, corresponding to a sensitivity of 0.83 (omission rate of 0.17) and a false positive rate of 0.27. At that point, therefore, the performance of the two algorithms was the same. A small component of the higher AUC for Maxent was due to the lower omission rate it achieved to the right of that point. However, most of Maxent's higher AUC occurred to the left of that point, where many test localities fell in small areas very strongly predicted by Maxent. In contrast, GARP did not differentiate environmental quality to the left of that point, as all pixels there were predicted by all 10 of the best-subset models. Results for other data partitions were roughly similar (not shown).

Table 2
Results of the equalized predicted area tests of omission for B. variegatus and M. minutus produced with Maxent and GARP using the climatic and elevational variables

Data partition     GARP threshold = 1                           GARP threshold = 10
                   Area    Maxent omission   GARP omission      Area    Maxent omission   GARP omission
B. variegatus-1    0.59    0.03              0.03               0.27    0.17              0.17
B. variegatus-2    0.56    0.03              0.06               0.34    0.11              0.17
B. variegatus-3    0.61    0.06              0.03               0.33    0.09              0.17
B. variegatus-4    0.63    0.14              0                  0.40    0.17              0.06
B. variegatus-5    0.67    0.03              0                  0.36    0.11              0.26
B. variegatus-6    0.69    0                 0                  0.29    0.14              0.11
B. variegatus-7    0.74    0                 0                  0.31    0.03              0.14
B. variegatus-8    0.69    0                 0                  0.33    0.17              0.11
B. variegatus-9    0.72    0                 0                  0.36    0.06              0.11
B. variegatus-10   0.61    0.06              0.03               0.34    0.14              0.17
Average            0.652   0.034             0.014              0.333   0.120             0.149
M. minutus-1       0.12    0                 0.07               0.06    0.04              0.15
M. minutus-2       0.10    0                 0.07               0.06    0.04              0.15
M. minutus-3       0.16    0                 0.04               0.07    0.07              0.15
M. minutus-4       0.17    0                 0.04               0.08    0.04              0.04
M. minutus-5       0.12    0                 0.07               0.06    0                 0.15
M. minutus-6       0.12    0                 0.04               0.06    0.07              0.11
M. minutus-7       0.16    0                 0                  0.09    0                 0.07
M. minutus-8       0.17    0                 0                  0.09    0                 0
M. minutus-9       0.17    0                 0                  0.09    0                 0.04
M. minutus-10      0.18    0                 0                  0.08    0                 0.07
Average            0.146   0                 0.033              0.073   0.026             0.093

Area (proportion of the study area predicted by GARP with the indicated threshold) and extrinsic omission rate (proportion of test localities falling outside the prediction) for each algorithm are given for each random partition of occurrence records under two threshold scenarios. Thresholds were set for the extremes of the GARP predictions: any GARP model predicting presence (GARP threshold = 1) and all 10 GARP models predicting presence (GARP threshold = 10). To allow for direct comparison of omission rates between the algorithms, thresholds were then set for each Maxent model to yield a binary prediction with the same area as the corresponding GARP prediction.


3.2. Visual interpretation

The output format differs dramatically between Maxent and GARP, so care must be taken when making comparisons between them. Maxent produces a continuous prediction with values ranging from 0 to 100, whereas GARP, as used here, yields a discrete composite prediction with integer values from 0 to 10. Visual inspection of the Maxent predictions for both species indicated that a low threshold was appropriate, and in general terms, pixels with predicted values of at least 1 may be interpreted as a reasonable approximation of the species' potential distribution. This is in concordance with the thresholds obtained in Section 3.1.1, and the theoretical expectation that the omission rate for a thresholded cumulative prediction will be approximately equal to the threshold value (see Section 2.2). For GARP, visual inspection suggested a higher threshold in the range 5–10 was appropriate for approximating the species' potential distribution. In the following sections, we interpret the models under this framework.

3.2.1. Models derived from climatic and elevational variables

When using the full set of occurrence localities for each species, the two algorithms produced broadly similar predictions for the potential geographic distribution of B. variegatus (Fig. 4). For this species, both algorithms indicated suitable conditions throughout most of lowland Central America, wet lowland areas of northwestern South America, most of the Amazon basin, large areas of Atlantic forests in southeastern Brazil, and most large Caribbean islands in the study area. The species was generally predicted absent from high montane areas, temperate areas in southern South America, and non-forested areas of central Brazil (e.g., cerrado). The algorithms differed in their predictions for non-forested savannas in northern South America. The composite GARP model indicated the species' potential presence there, but Maxent excluded some non-forested savannas in Venezuela (llanos) and the Guianas.


Table 3
Results of the equalized predicted area tests of omission for B. variegatus and M. minutus produced with Maxent and GARP using the climatic, elevational and potential vegetation variables

Data partition     GARP threshold = 1                           GARP threshold = 10
                   Area    Maxent omission   GARP omission      Area    Maxent omission   GARP omission
B. variegatus-1    0.57    0.03              0.03               0.28    0.20              0.23
B. variegatus-2    0.58    0                 0.06               0.29    0.11              0.29
B. variegatus-3    0.67    0                 0.03               0.33    0.14              0.11
B. variegatus-4    0.67    0                 0                  0.42    0.06              0.11
B. variegatus-5    0.67    0.03              0.03               0.36    0.14              0.17
B. variegatus-6    0.71    0                 0                  0.28    0.17              0.17
B. variegatus-7    0.74    0                 0                  0.33    0.06              0.20
B. variegatus-8    0.67    0                 0                  0.34    0.20              0.17
B. variegatus-9    0.78    0                 0                  0.39    0.03              0.06
B. variegatus-10   0.67    0                 0                  0.36    0.14              0.17
Average            0.672   0.006             0.014              0.337   0.126             0.169
M. minutus-1       0.12    0                 0.04               0.06    0.04              0.15
M. minutus-2       0.11    0.11              0.04               0.06    0.15              0.19
M. minutus-3       0.13    0                 0.04               0.07    0.04              0.15
M. minutus-4       0.15    0                 0.04               0.08    0.04              0.04
M. minutus-5       0.12    0                 0.07               0.06    0                 0.15
M. minutus-6       0.14    0                 0                  0.05    0.04              0.11
M. minutus-7       0.16    0                 0.04               0.08    0                 0.07
M. minutus-8       0.16    0                 0                  0.08    0                 0.04
M. minutus-9       0.16    0                 0                  0.08    0                 0.07
M. minutus-10      0.17    0                 0                  0.07    0                 0.04
Average            0.142   0.011             0.026              0.070   0.030             0.100

Area (proportion of the study area predicted by GARP with the indicated threshold) and extrinsic omission rate (proportion of test localities falling outside the prediction) for each algorithm are given for each random partition of occurrence records under two threshold scenarios. Thresholds were set for the extremes of the GARP predictions: any GARP model predicting presence (GARP threshold = 1) and all 10 GARP models predicting presence (GARP threshold = 10). To allow for direct comparison of omission rates between the algorithms, thresholds were then set for each Maxent model to yield a binary prediction with the same area as the corresponding GARP prediction.


In contrast, the algorithms yielded quite different predictions for M. minutus (Fig. 4). Maxent indicated suitable conditions for the species in the northern and central Andes (and associated mountain chains) from Bolivia and northern Chile to northern Colombia and Venezuela. It also included highland areas in Jamaica, the Dominican Republic and Haiti, as well as very small highland areas in Brazil, southeastern Mexico, Costa Rica and Panama. In contrast, GARP predicted a much more extensive potential distribution for the species. In addition to a broad highland prediction in the northern and central Andes and the Caribbean, the composite GARP prediction also included areas of the southern Andes as well as extensive highland regions in Mesoamerica, the Guianan-shield region and southeastern Brazil. The prediction in the Brazilian highlands extended into adjacent lowland areas of that country as well as into Uruguay and northern Argentina.

3.2.2. Addition of potential vegetation variable

The two algorithms responded differently to the inclusion of the potential vegetation variable (Fig. 5). The Maxent prediction with potential vegetation for B. variegatus was generally similar to the original one, but now indicated unsuitable conditions for the species in the llanos of Colombia and Venezuela and in other non-forested areas in Bolivia and Brazil. In contrast, the composite GARP prediction with potential vegetation included was very similar to the original prediction, still indicating suitable environmental conditions for the species in non-forested areas of Colombia, Venezuela, Guyana, Brazil, Paraguay and Bolivia.

Table 4
Results of threshold-independent receiver operating characteristic (ROC) analyses for B. variegatus and M. minutus produced with Maxent and GARP using the climatic and elevational variables (left) and climatic, elevational and potential vegetation variables (right)

Data partition     Without potential vegetation         With potential vegetation
                   Maxent AUC   GARP AUC   p            Maxent AUC   GARP AUC   p
B. variegatus-1    0.889        0.807      <0.01        0.879        0.793      <0.01
B. variegatus-2    0.892        0.765      <0.01        0.899        0.769      <0.01
B. variegatus-3    0.872        0.779      0.01         0.887        0.790      <0.01
B. variegatus-4    0.819        0.789      0.51         0.858        0.757      <0.01
B. variegatus-5    0.868        0.740      <0.01        0.885        0.753      <0.01
B. variegatus-6    0.881        0.818      <0.01        0.868        0.812      0.03
B. variegatus-7    0.902        0.812      <0.01        0.919        0.784      <0.01
B. variegatus-8    0.839        0.807      0.34         0.829        0.786      0.13
B. variegatus-9    0.903        0.794      <0.01        0.897        0.784      <0.01
B. variegatus-10   0.866        0.779      0.01         0.879        0.769      <0.01
Average            0.873        0.789                   0.880        0.780
M. minutus-1       0.985        0.926      0.01         0.986        0.946      0.02
M. minutus-2       0.987        0.931      0.02         0.932        0.943      0.75
M. minutus-3       0.985        0.938      <0.01        0.987        0.939      <0.01
M. minutus-4       0.983        0.938      <0.01        0.984        0.941      <0.01
M. minutus-5       0.988        0.926      0.02         0.990        0.926      0.01
M. minutus-6       0.983        0.947      0.05         0.986        0.966      <0.01
M. minutus-7       0.989        0.950      <0.01        0.988        0.936      <0.01
M. minutus-8       0.988        0.954      <0.01        0.989        0.956      <0.01
M. minutus-9       0.989        0.952      <0.01        0.990        0.955      <0.01
M. minutus-10      0.985        0.955      <0.01        0.987        0.961      <0.01
Average            0.986        0.942                   0.982        0.947

For each random partition of occurrence records, the area under the ROC curve (AUC) is given for Maxent and GARP, as well as the probability of the observed difference in the AUC values between the two algorithms (under a null hypothesis that the true AUCs are equal). All AUC values for both algorithms were significantly better than a random prediction (p < 0.0001; individual p values not shown). AUC values are given to three decimal places to reveal small changes under addition of the potential vegetation coverage.


Addition of the potential vegetation variable changed the Maxent and GARP predictions for M. minutus only minimally. The Maxent prediction with potential vegetation differed principally by a sharp reduction in the area predicted for the species along the western slopes of the Andes in central and southern Peru and in northern Chile. The composite GARP prediction with potential vegetation differed from the original one mainly by indicating a smaller area of suitable environmental conditions for the species in central Chile and in central-eastern Argentina.

4. Discussion and conclusions

4.1. Statistical tests

Both algorithms consistently performed significantly better than random, and Maxent frequently achieved better results than GARP. Threshold-dependent binomial tests (Table 1) showed low omission of test localities and significant predictions for both algorithms across the board. The equalized predicted area test generally indicated better performance for Maxent on M. minutus, but the test did not detect a significant difference between the two algorithms for B. variegatus (Tables 2 and 3). Threshold-independent ROC analysis also showed significantly better-than-random performance for both algorithms. The area under the ROC curve (AUC) was significantly higher for Maxent on almost all data partitions for both species (Table 4). Use of the categorical potential vegetation variable (in addition to the continuous climatic and elevational variables) generally improved performance for both algorithms on M. minutus and for Maxent on B. variegatus, but the changes had limited statistical significance, likely due to the small amount of data.


Fig. 3. Extrinsic receiver operating characteristic (ROC) curves for Maxent and GARP on the first random partition of occurrence records of B. variegatus (left) and M. minutus (right). Models were produced using the climatic and elevational variables. Sensitivity equals the proportion of test localities correctly predicted present (1 − extrinsic omission rate). The quantity (1 − specificity) equals the proportion of all map pixels predicted to have suitable conditions for the species. Note that both algorithms perform much better than random, and that Maxent is generally superior to GARP; see Table 4 for results of statistical analyses. B. variegatus is a wider-ranging species than M. minutus, so it has a smaller maximum achievable AUC in these ROC analyses performed without true absence data (see Section 2.6.2). The curves therefore do not necessarily imply that the algorithms are performing better on M. minutus.

4.2. Biological interpretations

Both algorithms produced reasonable predictions of the potential distribution for B. variegatus. The areas predicted by 5–10 GARP models were similar geographically to those areas predicted with a value of at least 1 (out of 100) for Maxent. Although much research addressing the issue of operationally determining an optimal threshold remains for both algorithms, these thresholds produce good maps of the species' potential distributions (areas of suitable environmental conditions). In particular, the models are far superior to the shaded outline maps available in standard field guides (e.g., Eisenberg and Redford, 1999; Emmons, 1997), and in digital compilations of species ranges designed for use in conservation biology and macroecological studies (Patterson et al., 2003). Most strikingly, the models correctly indicate an expansive region of unsuitable environmental conditions for B. variegatus in the non-forested cerrado of Brazil, whereas the shaded outline maps indicate continuous distribution for the species from Amazonian forests to coastal Atlantic forests. Although GARP has the capacity to consider categorical variables, the inclusion of the potential vegetation variable did not rectify the deficiencies seen in the original composite GARP prediction for B. variegatus. In contrast, Maxent successfully integrated this additional information. This is most evident in close-up images in Fig. 4.2, where GARP (incorrectly) predicted suitable conditions for the species in the non-forested llanos of Colombia and Venezuela.

In contrast to the generally similar predictions for B. variegatus, different deficiencies were evident in the predictions produced by the two algorithms for M. minutus. Maxent produced an impressive prediction within the species' known range. The Maxent prediction lay almost entirely within the Andes. However, wet montane forests also exist in Mesoamerican, Guianan and Brazilian highlands. Those areas likely contain conditions suitable for the species, but it apparently has not been able to colonize them due to geographic barriers. We investigated possible reasons for Maxent's low prediction in these areas, by examining environmental characteristics of six classical highland sites that are well sampled for small mammals: Monteverde and Cerro de la Muerte in Costa Rica, Auyan-tepui and Mount Duida in southern Venezuela, and Itatiaia and
