Likelihood ratios and Bayesian inference for Poisson channels


Clinical Epidemiology - Session 4 Diagnosis and LR, ROC

What Test Characteristic is Represented by the Formula?
TP / (TP + FP)
Probability that disease is present when the test result is positive: P(D+|T+)
Probability of getting a
                 Breast Cancer +    Breast Cancer −    Total
Mammogram +
Mammogram −
Total            28                                    10,000

2. Put a large, round number below and to the right of the table for your total N, for example 10,000.
4. Subtract the left column total from the total N to get the number without disease: 10,000 − 28 = 9972.
Using a 2x2 Table to Update Prior Probability
Set up 2 x 2 table of diagnostic results and presence of disease
                               Presence of Disease
                               +        −
Diagnostic Test Results   +    TP       FP
                          −    FN       TN

TP = number of true positive specimens
FP = number of false positive specimens
FN = number of false negative specimens
TN = number of true negative specimens
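
The 2x2 bookkeeping above can be sketched in a few lines of Python. This is only an illustration: the sensitivity and specificity values (0.9 each) are hypothetical and do not come from the slides, and the function names are made up for this sketch. Only N = 10,000 and the 28 diseased cases are taken from the text. The sketch fills the table, computes TP / (TP + FP), and shows that the same number comes out of updating the prior odds with the positive likelihood ratio.

```python
def two_by_two(n_total, n_disease, sensitivity, specificity):
    """Fill a 2x2 table from column totals and test characteristics."""
    n_healthy = n_total - n_disease      # e.g. 10,000 - 28 = 9972
    tp = sensitivity * n_disease         # diseased, test positive
    fn = n_disease - tp                  # diseased, test negative
    tn = specificity * n_healthy         # healthy, test negative
    fp = n_healthy - tn                  # healthy, test positive
    return tp, fp, fn, tn


def positive_predictive_value(tp, fp):
    """P(D+ | T+) = TP / (TP + FP)."""
    return tp / (tp + fp)


def posterior_from_lr(prior, likelihood_ratio):
    """Update a prior probability via odds: posterior odds = prior odds * LR."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test characteristics (assumptions, not from the slides).
SENS, SPEC = 0.9, 0.9
tp, fp, fn, tn = two_by_two(10_000, 28, SENS, SPEC)
ppv_table = positive_predictive_value(tp, fp)

lr_positive = SENS / (1 - SPEC)          # LR+ = sensitivity / (1 - specificity)
ppv_odds = posterior_from_lr(28 / 10_000, lr_positive)

print(f"PPV from table: {ppv_table:.4f}")
print(f"PPV from prior odds x LR+: {ppv_odds:.4f}")
```

Both routes give the same posterior probability because the table is simply a scaled-up version of Bayes' rule.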

MIT-SCIENCE-Lectures-feb192002

The Bayesian Central Limit Theorem, with some instructions on #1(e) of the second problem set

We have seen Bayes' formula:

posterior probability density = constant · prior probability density · likelihood function.

The "Bayesian Central Limit Theorem" says that under certain circumstances, the posterior probability distribution is approximately a normal distribution. Actually more than one proposition is called "the Bayesian Central Limit Theorem," and we will look at two of them.

First version: Suppose $\Theta$ is distributed according to the probability density function $f_\Theta(\theta)$, and

$$X_1, X_2, X_3, \ldots \mid [\Theta=\theta] \;\sim\; \text{i.i.d. } f_{X_1 \mid [\Theta=\theta]}(x),$$

i.e., $X_1, X_2, X_3, \ldots$ are conditionally independent given that $\Theta=\theta$, and $f_{X_1 \mid [\Theta=\theta]}(x)$, as a function of $x$ with $\theta$ fixed, is the conditional probability density function of $X_1$ given that $\Theta=\theta$. Suppose the posterior probability density function $f_{\Theta \mid [X_1=x_1,\ldots,X_n=x_n]}$ is everywhere twice-differentiable, for all $n$.

Let $c$ = the posterior expected value = $E(\Theta \mid X_1=x_1,\ldots,X_n=x_n)$, and let $d$ = the posterior standard deviation = $\sqrt{\operatorname{Var}(\Theta \mid X_1=x_1,\ldots,X_n=x_n)}$.

Then if the distribution of $U$ is the posterior distribution, the distribution of $(U-c)/d$ is approximately standard normal; more precisely: that distribution approaches the standard normal distribution as $n$ approaches infinity.
More tersely: The posterior distribution approaches a normal distribution as the sample size $n$ grows.

How big a value of $n$ is needed to get a good approximation? For a partial answer to that, consider #1(e) on the second problem set. A 90% posterior probability interval found without using the Bayesian Central Limit Theorem is (0.419705351626, 0.788091845180). These are correct to 12 digits after the decimal point if I can trust my software. The endpoints of this interval differ from the approximations given by the above version of the Bayesian Central Limit Theorem by slightly less than 0.01.

In the case of beta distributions, finding the expected value and the standard deviation is trivial: Just use what it says on page 306 of Schervish & DeGroot. For some other distributions, finding the expected value and the standard deviation may require laborious integrals, and hence we might use the next version of the Bayesian Central Limit Theorem:

Second version: Start with the same assumptions as in the first version.

Let $c$ = the posterior mode = the value of $\theta$ that maximizes the posterior density function $f_{\Theta \mid [X_1=x_1,\ldots,X_n=x_n]}(\theta)$.

Let $k(\theta) = \log f_{\Theta \mid [X_1=x_1,\ldots,X_n=x_n]}(\theta)$.

Let $d = (-k''(c))^{-1/2}$.

Then if the distribution of $U$ is the posterior distribution, the distribution of $(U-c)/d$ is approximately standard normal; more precisely: that distribution approaches the standard normal distribution as $n$ approaches infinity.
Important: In #1(e) on the second problem set, try both versions of the Bayesian Central Limit Theorem. Compare the result of each with the result alleged to be correct in the second-to-last paragraph on the first page of this handout.

Here is a rough sketch of a proof of the second version. Denote $f_{\Theta \mid X_1=x_1,\ldots,X_n=x_n}(\theta)$ by $f(\theta)$, and let $k(\theta)=\log f(\theta)$. Then

$$
\begin{aligned}
f(\theta) &= e^{\log f(\theta)} = e^{k(\theta)} \\
&= e^{\,k(c) + k'(c)(\theta-c) + k''(c)(\theta-c)^2/2 + k'''(c)(\theta-c)^3/6 + \cdots} \\
&\approx e^{\,k(c) + k'(c)(\theta-c) + k''(c)(\theta-c)^2/2} \\
&= e^{\,k(c) + 0 + k''(c)(\theta-c)^2/2} \\
&= [\text{constant}] \cdot e^{\,k''(c)(\theta-c)^2/2} \\
&= [\text{constant}] \cdot e^{-(\theta-c)^2/(2\sigma^2)}
\end{aligned}
$$

(We must have $k'(c)=0$ and $k''(c)<0$ since there is a maximum at $c$.) The last step holds provided $-1/\sigma^2 = k''(c)$, i.e., $\sigma = (-k''(c))^{-1/2}$, which is the same thing that followed the words "Let $d$ =" above.

Why do the 3rd- and higher-degree terms evaporate as $n \to \infty$ above? To answer that you need to know how they depend on $n$. Those interested in such theoretical questions can work out the details.
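
To see how the two versions differ in practice, here is a minimal Python sketch for a beta posterior. The parameters Beta(8, 5) are an assumption chosen purely for illustration; they are not the posterior from problem #1(e), which the handout does not restate. The first version uses the closed-form posterior mean and standard deviation of a beta distribution, and the second uses the posterior mode and the second derivative of the log density, $k''(\theta) = -(a-1)/\theta^2 - (b-1)/(1-\theta)^2$.

```python
from math import sqrt
from statistics import NormalDist


def version1(a, b):
    """Center and scale from the posterior mean and standard deviation of Beta(a, b)."""
    c = a / (a + b)
    d = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return c, d


def version2(a, b):
    """Center and scale from the posterior mode and curvature (requires a, b > 1)."""
    c = (a - 1) / (a + b - 2)                          # mode of Beta(a, b)
    k2 = -(a - 1) / c ** 2 - (b - 1) / (1 - c) ** 2    # k''(c), negative at a maximum
    d = (-k2) ** -0.5
    return c, d


def normal_interval(c, d, level=0.90):
    """Central interval of the approximating normal distribution N(c, d^2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return c - z * d, c + z * d


A, B = 8, 5   # hypothetical posterior parameters, for illustration only
c1, d1 = version1(A, B)
c2, d2 = version2(A, B)
print("version 1 interval:", normal_interval(c1, d1))
print("version 2 interval:", normal_interval(c2, d2))
```

With so few observations the two intervals differ noticeably; both versions claim agreement only in the limit of large n.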

Sequencing and analysis of the complete mitochondrial genome of Illeis bistigmosa

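Sequencing and analysis of the complete mitochondrial genome of Illeis bistigmosa

Zhu Guoyuan1, Zhang Yongke1,2*, Kong Xiangdong1, Xu Liyue1, Li Xiaoqing1, Zhang Xiaojiao1, Wang Jinqiang1, Duan Bo1*
(1 Yunnan Institute of Tropical Crops, Jinghong, Yunnan 666100; 2 College of Plant Protection, Yunnan Agricultural University, Kunming, Yunnan 650201)

Abstract: [Objective] To characterize the structure of the complete mitochondrial genome of Illeis bistigmosa (Mulsant, 1850) and to analyze its taxonomic position within the family Coccinellidae, providing a theoretical basis for understanding the phylogeny and evolutionary relationships of ladybird beetles.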

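[Methods] The mitochondrial genome of I. bistigmosa was sequenced with Illumina second-generation sequencing. The genome sequence was assembled and annotated, and its structural features and base composition were analyzed. Phylogenetic trees of the mitochondrial genomes of 16 Coccinellidae species were constructed with maximum likelihood and Bayesian methods to analyze the phylogenetic position of the species within the family.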

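[Results] The mitochondrial genome of I. bistigmosa is 17840 bp long and contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 A+T rich region. The AT content of the whole genome is 78.44%, showing a clear A+T bias. Of the 13 protein-coding genes, cox1 uses TTG as its start codon, while the other 12 genes start with ATN; the stop codons of nad4, nad5, cox2, and cox3 end with a single T. Except for trnS1 and trnP, the remaining 20 tRNAs have the typical cloverleaf secondary structure, and U-U or U-G base mismatches occur in the secondary structures. Phylogenetic analysis shows that all species of the subfamily Coccinellinae cluster in one clade, supporting the monophyly of the subfamily. I. bistigmosa clusters in one clade with two other species of the genus Illeis, I. koebelei and I. cincta, forming a sister-group relationship.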

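[Conclusion] The complete mitochondrial genome sequence of I. bistigmosa was obtained, and it conforms to the general features of mitochondrial genomes in Coccinellidae. I. bistigmosa is closely related to I. koebelei and I. cincta, which is consistent with the traditional morphological classification.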

Probability and Statistics-cookbook-en

[Figure: probability mass functions of discrete distributions. Panels: Uniform (discrete); Binomial (n = 40, p = 0.3; n = 30, p = 0.6; n = 25, p = 0.9); Geometric (p = 0.2, p = 0.5, p = 0.8); Poisson (λ = 1, λ = 4, λ = 10).]
Footnote 1: We use the notation γ(s, x) and Γ(x) to refer to the Gamma functions (see §22.1), and use B(x, y) and I_x to refer to the Beta functions (see §22.2).
Distribution          Notation           E[X]         V[X]                      M_X(s)
Uniform (discrete)    Unif{a, ..., b}    (a + b)/2    ((b − a + 1)^2 − 1)/12    (e^{as} − e^{(b+1)s}) / ((b − a + 1)(1 − e^s))
Bernoulli                                p            p(1 − p)                  1 − p + p e^s
Binomial                                 np           np(1 − p)                 (1 − p + p e^s)^n
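
The binomial entries, E[X] = np, V[X] = np(1 − p), and M_X(s) = (1 − p + p e^s)^n, can be checked numerically straight from the probability mass function. The following Python sketch does that for n = 40, p = 0.3 (one of the parameter pairs shown in the figure); the helper names are made up for this illustration.

```python
from math import comb, exp


def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


def mgf(pmf, s):
    """Moment generating function E[exp(sX)] evaluated by direct summation."""
    return sum(exp(s * k) * w for k, w in enumerate(pmf))


n, p, s = 40, 0.3, 0.1
pmf = binomial_pmf(n, p)
mean, var = mean_and_variance(pmf)

print(mean, n * p)                                  # numeric mean vs np
print(var, n * p * (1 - p))                         # numeric variance vs np(1 - p)
print(mgf(pmf, s), (1 - p + p * exp(s)) ** n)       # numeric MGF vs closed form
```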
This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature and in-class material from courses of the statistics department at the University of net/probability-and-statistics-cookbook/. To reproduce, please contact me.

Bayesian Inference

p(black) = 18/38 = 0.474
p(red) = 18/38 = 0.474
p(color )

What is a conditional probability?
p(black|col1) = 6/12 = 0.5
p(color | col1)

p(black|col2) = 8/12 = 0.667
p(x)
Bayesian Roulette
• We’re interested in which column will win.
• p(column) is our prior.
• We learn color=black.
• What is p(color=black|column)?
What do you need to know to use it?
• You need to be able to express your prior beliefs about x as a probability distribution, p(x).
• You must be able to relate your new evidence to your variable of interest in terms of its likelihood, p(y|x).
• You must be able to multiply.
p(black|col1) = 6/12 = 0.5
p(black|col2) = 8/12 = 0.667
p(black|col3) = 4/12 = 0.333
p(black|zeros) = 0/2 = 0
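
Putting the prior and the likelihoods together gives the posterior over columns after learning color=black. The Python sketch below assumes the prior p(column) is the share of pockets in each group (12/38 for each column and 2/38 for the zeros); the slides only say that p(column) is the prior, so this is an assumption, although it matches the denominators used above. Exact fractions are used so that the "multiply, then normalize" step is easy to follow.

```python
from fractions import Fraction

# Assumed prior: proportion of the 38 pockets in each group.
prior = {
    "col1": Fraction(12, 38),
    "col2": Fraction(12, 38),
    "col3": Fraction(12, 38),
    "zeros": Fraction(2, 38),
}

# Likelihoods p(black | column), as listed above.
likelihood_black = {
    "col1": Fraction(6, 12),
    "col2": Fraction(8, 12),
    "col3": Fraction(4, 12),
    "zeros": Fraction(0, 2),
}

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalized = {k: prior[k] * likelihood_black[k] for k in prior}
evidence = sum(unnormalized.values())            # this is p(black)
posterior = {k: v / evidence for k, v in unnormalized.items()}

print("p(black) =", evidence)
for column, prob in posterior.items():
    print(f"p({column} | black) = {prob} = {float(prob):.3f}")
```

The normalizing constant recovered here equals p(black) = 18/38, which ties the calculation back to the marginal probability given at the start of the slides.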

Blind Segmentation and Labeling of Speakers via the Bayesian Information Criterion for Vide

clustering of the final segments identifies them with a proper speaker label.

2.1 Break-point hypothesis generator

Once we have the speech features for each frame, instead of hypothesizing a break point in every frame of the input signal, we select a reduced set of frames for which the system estimates that the likelihood of an acoustic change is high enough. This set will contain the most silent regions in the recording. This saves a substantial number of calculations in the ACD procedure. The hypotheses of the ACD are the initial and final time points of the hypothesized segments.

To decide whether a frame belongs to a silence region, we take into account only the frame energy. It is essential to know the energy characteristics of silence and speech, and to obtain them automatically from each new recording to be processed. As a first step, we obtain the energy histogram and set a threshold to split it into two regions. An example is shown in Fig. 2, where the abscissa is the energy in dB and the ordinate is the number of frames.

sEparaTe package documentation: manual for the maximum likelihood estimation and likelihood ratio test functions

Package 'sEparaTe'
August 18, 2023

Title: Maximum Likelihood Estimation and Likelihood Ratio Test Functions for Separable Variance-Covariance Structures
Version: 0.3.2
Maintainer: Timothy Schwinghamer <***************************.CA>
Description: Maximum likelihood estimation of the parameters of matrix and 3rd-order tensor normal distributions with unstructured factor variance covariance matrices, two procedures, and for unbiased modified likelihood ratio testing of simple and double separability for variance-covariance structures, two procedures. References: Dutilleul P. (1999) <doi:10.1080/00949659908811970>, Manceur AM, Dutilleul P. (2013) <doi:10.1016/j.cam.2012.09.017>, and Manceur AM, Dutilleul P. (2013) <doi:10.1016/j.spl.2012.10.020>.
Depends: R (>= 4.3.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.0
NeedsCompilation: no
Author: Ameur Manceur [aut], Timothy Schwinghamer [aut, cre], Pierre Dutilleul [aut, cph]
Repository: CRAN
Date/Publication: 2023-08-18 07:50:02 UTC

R topics documented: data2d, data3d, lrt2d_svc, lrt3d_svc, mle2d_svc, mle3d_svc, sEparaTe


data2d: Two dimensional data set

Description
An i.i.d. random sample of size 7 from a 2 x 3 matrix normal distribution, for a small numerical example of the use of the functions mle2d_svc and lrt2d_svc from the sEparaTe package.

Usage
    data2d

Format
A frame (excluding the headings) with 42 lines of data and 4 variables:
    K          an integer ranging from 1 to 7, the size of an i.i.d. random sample from a 2 x 3 matrix normal distribution
    Id1        an integer ranging from 1 to 2, the number of rows of the matrix normal distribution
    Id2        an integer ranging from 1 to 3, the number of columns of the matrix normal distribution
    value2d    the sample data for the observed variable


data3d: Three dimensional data set

Description
An i.i.d. random sample of size 13 from a 2 x 3 x 2 tensor normal distribution, for a small numerical example of the use of the functions mle3d_svc and lrt3d_svc from the sEparaTe package.

Usage
    data3d

Format
A frame (excluding the headings) with 156 lines of data and 5 variables:
    K          an integer ranging from 1 to 13, the size of an i.i.d. random sample from a 2 x 3 x 2 tensor normal distribution
    Id3        an integer ranging from 1 to 2, the number of rows of the 3rd-order tensor normal distribution
    Id4        an integer ranging from 1 to 3, the number of columns of the 3rd-order tensor normal distribution
    Id5        an integer ranging from 1 to 2, the number of edges of the 3rd-order tensor normal distribution
    value3d    the sample data for the observed variable


lrt2d_svc: Unbiased modified likelihood ratio test for simple separability of a variance-covariance matrix

Description
A likelihood ratio test (LRT) for simple separability of a variance-covariance matrix, modified to be unbiased in finite samples. The modification is a penalty-based homothetic transformation of the LRT statistic. The penalty value is optimized for a given mean model, which is left unstructured here. In the required function, the Id1 and Id2 variables correspond to the row and column subscripts, respectively; "value2d" refers to the observed variable.

Usage
    lrt2d_svc(value2d, Id1, Id2, subject, data_2d, eps, maxiter, startmat, sign.level, n.simul)

Arguments
    value2d       from the formula value2d ~ Id1 + Id2
    Id1           from the formula value2d ~ Id1 + Id2
    Id2           from the formula value2d ~ Id1 + Id2
    subject       the replicate, also called the subject or individual, the first column in the matrix (2d) data file
    data_2d       the name of the matrix data
    eps           the threshold in the stopping criterion for the iterative mle algorithm (estimation)
    maxiter       the maximum number of iterations for the mle algorithm (estimation)
    startmat      the value of the second factor variance-covariance matrix used for initialization, i.e., to start the mle algorithm (estimation) and obtain the initial estimate of the first factor variance-covariance matrix
    sign.level    the significance level, or rejection rate in the testing of the null hypothesis of simple separability for a variance-covariance structure, when the unbiased modified LRT is used, i.e., the critical value in the chi-square test is derived by simulations from the sampling distribution of the LRT statistic
    n.simul       the number of simulations used to build the sampling distribution of the LRT statistic under the null hypothesis, using the same characteristics as the i.i.d. random sample from a matrix normal distribution

Output
    "Convergence", TRUE or FALSE
    "chi.df", the theoretical number of degrees of freedom of the asymptotic chi-square distribution that would apply to the unmodified LRT statistic for simple separability of a variance-covariance structure
    "Lambda", the observed value of the unmodified LRT statistic
    "critical.value", the critical value at the specified significance level for the chi-square distribution with "chi.df" degrees of freedom
    "Decision.lambda", whether or not the null hypothesis of separability was rejected, based on the theoretical LRT statistic
    "Simulation.critical.value", the critical value at the specified significance level that is derived from the sampling distribution of the unbiased modified LRT statistic
    "Decision.lambda.simulation", the decision (acceptance/rejection) regarding the null hypothesis of simple separability, made using the unbiased modified LRT
    "Penalty", the optimized penalty value used in the homothetic transformation between the biased unmodified and unbiased modified LRT statistics
    "U1hat", the estimated variance-covariance matrix for the rows
    "Standardized_U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
    "U2hat", the estimated variance-covariance matrix for the columns
    "Standardized_U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
    "Shat", the sample variance-covariance matrix computed from the vectorized data matrices

References
Manceur AM, Dutilleul P. 2013. Unbiased modified likelihood ratio tests for simple and double separability of a variance-covariance structure. Statistics and Probability Letters 83: 631-636.

Examples
    output <- lrt2d_svc(data2d$value2d, data2d$Id1, data2d$Id2,
                        data2d$K, data_2d = data2d, n.simul = 100)
    output


lrt3d_svc: An unbiased modified likelihood ratio test for double separability of a variance-covariance structure

Description
A likelihood ratio test (LRT) for double separability of a variance-covariance structure, modified to be unbiased in finite samples. The modification is a penalty-based homothetic transformation of the LRT statistic. The penalty value is optimized for a given mean model, which is left unstructured here. In the required function, the Id3, Id4 and Id5 variables correspond to the row, column and edge subscripts, respectively; "value3d" refers to the observed variable.

Usage
    lrt3d_svc(value3d, Id3, Id4, Id5, subject, data_3d, eps, maxiter,
              startmatU2, startmatU3, sign.level, n.simul)

Arguments
    value3d       from the formula value3d ~ Id3 + Id4 + Id5
    Id3           from the formula value3d ~ Id3 + Id4 + Id5
    Id4           from the formula value3d ~ Id3 + Id4 + Id5
    Id5           from the formula value3d ~ Id3 + Id4 + Id5
    subject       the replicate, also called individual
    data_3d       the name of the tensor data
    eps           the threshold in the stopping criterion for the iterative mle algorithm (estimation)
    maxiter       the maximum number of iterations for the mle algorithm (estimation)
    startmatU2    the value of the second factor variance-covariance matrix used for initialization
    startmatU3    the value of the third factor variance-covariance matrix used for initialization, i.e., startmatU3 together with startmatU2 are used to start the mle algorithm (estimation) and obtain the initial estimate of the first factor variance-covariance matrix U1
    sign.level    the significance level, or rejection rate in the testing of the null hypothesis of double separability for a variance-covariance structure, when the unbiased modified LRT is used, i.e., the critical value in the chi-square test is derived by simulations from the sampling distribution of the LRT statistic
    n.simul       the number of simulations used to build the sampling distribution of the LRT statistic under the null hypothesis, using the same characteristics as the i.i.d. random sample from a tensor normal distribution

Output
    "Convergence", TRUE or FALSE
    "chi.df", the theoretical number of degrees of freedom of the asymptotic chi-square distribution that would apply to the unmodified LRT statistic for double separability of a variance-covariance structure
    "Lambda", the observed value of the unmodified LRT statistic
    "critical.value", the critical value at the specified significance level for the chi-square distribution with "chi.df" degrees of freedom
    "Decision.lambda", the decision (acceptance/rejection) regarding the null hypothesis of double separability, made using the theoretical (biased unmodified) LRT
    "Simulation.critical.value", the critical value at the specified significance level that is derived from the sampling distribution of the unbiased modified LRT statistic
    "Decision.lambda.simulation", the decision (acceptance/rejection) regarding the null hypothesis of double separability, made using the unbiased modified LRT
    "Penalty", the optimized penalty value used in the homothetic transformation between the biased unmodified and unbiased modified LRT statistics
    "U1hat", the estimated variance-covariance matrix for the rows
    "Standardized_U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
    "U2hat", the estimated variance-covariance matrix for the columns
    "Standardized_U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
    "U3hat", the estimated variance-covariance matrix for the edges
    "Shat", the sample variance-covariance matrix computed from the vectorized data tensors

References
Manceur AM, Dutilleul P. 2013. Unbiased modified likelihood ratio tests for simple and double separability of a variance-covariance structure. Statistics and Probability Letters 83: 631-636.

Examples
    output <- lrt3d_svc(data3d$value3d, data3d$Id3, data3d$Id4, data3d$Id5,
                        data3d$K, data_3d = data3d, n.simul = 100)
    output


mle2d_svc: Maximum likelihood estimation of the parameters of a matrix normal distribution

Description
Maximum likelihood estimation for the parameters of a matrix normal distribution X, which is characterized by a simply separable variance-covariance structure. In the general case, which is the case considered here, two unstructured factor variance-covariance matrices determine the covariability of random matrix entries, depending on the row (one factor matrix) and the column (the other factor matrix) where two X-entries are. In the required function, the Id1 and Id2 variables correspond to the row and column subscripts, respectively; "value2d" indicates the observed variable.

Usage
    mle2d_svc(value2d, Id1, Id2, subject, data_2d, eps, maxiter, startmat)

Arguments
    value2d     from the formula value2d ~ Id1 + Id2
    Id1         from the formula value2d ~ Id1 + Id2
    Id2         from the formula value2d ~ Id1 + Id2
    subject     the replicate, also called individual
    data_2d     the name of the matrix data
    eps         the threshold in the stopping criterion for the iterative mle algorithm
    maxiter     the maximum number of iterations for the iterative mle algorithm
    startmat    the value of the second factor variance-covariance matrix used for initialization, i.e., to start the algorithm and obtain the initial estimate of the first factor variance-covariance matrix

Output
    "Convergence", TRUE or FALSE
    "Iter", the number of iterations needed for the mle algorithm to converge
    "Xmeanhat", the estimated mean matrix (i.e., the sample mean)
    "First", the row subscript, or the second column in the data file
    "U1hat", the estimated variance-covariance matrix for the rows
    "Standardized.U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
    "Second", the column subscript, or the third column in the data file
    "U2hat", the estimated variance-covariance matrix for the columns
    "Standardized.U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
    "Shat", the sample variance-covariance matrix computed from the vectorized data matrices

References
Dutilleul P. 1990. Apport en analyse spectrale d'un periodogramme modifie et modelisation des series chronologiques avec repetitions en vue de leur comparaison en frequence. D.Sc. Dissertation, Universite catholique de Louvain, Departement de mathematique.
Dutilleul P. 1999. The mle algorithm for the matrix normal distribution. Journal of Statistical Computation and Simulation 64: 105-123.

Examples
    output <- mle2d_svc(data2d$value2d, data2d$Id1, data2d$Id2, data2d$K, data_2d = data2d)
    output


mle3d_svc: Maximum likelihood estimation of the parameters of a 3rd-order tensor normal distribution

Description
Maximum likelihood estimation for the parameters of a 3rd-order tensor normal distribution X, which is characterized by a doubly separable variance-covariance structure. In the general case, which is the case considered here, three unstructured factor variance-covariance matrices determine the covariability of random tensor entries, depending on the row (one factor matrix), the column (another factor matrix) and the edge (remaining factor matrix) where two X-entries are. In the required function, the Id3, Id4 and Id5 variables correspond to the row, column and edge subscripts, respectively; "value3d" indicates the observed variable.

Usage
    mle3d_svc(value3d, Id3, Id4, Id5, subject, data_3d, eps, maxiter, startmatU2, startmatU3)

Arguments
    value3d       from the formula value3d ~ Id3 + Id4 + Id5
    Id3           from the formula value3d ~ Id3 + Id4 + Id5
    Id4           from the formula value3d ~ Id3 + Id4 + Id5
    Id5           from the formula value3d ~ Id3 + Id4 + Id5
    subject       the replicate, also called individual
    data_3d       the name of the tensor data
    eps           the threshold in the stopping criterion for the iterative mle algorithm
    maxiter       the maximum number of iterations for the iterative mle algorithm
    startmatU2    the value of the second factor variance-covariance matrix used for initialization
    startmatU3    the value of the third factor variance-covariance matrix used for initialization, i.e., startmatU3 together with startmatU2 are used to start the algorithm and obtain the initial estimate of the first factor variance-covariance matrix U1

Output
    "Convergence", TRUE or FALSE
    "Iter", the number of iterations needed for the mle algorithm to converge
    "Xmeanhat", the estimated mean tensor (i.e., the sample mean)
    "First", the row subscript, or the second column in the data file
    "U1hat", the estimated variance-covariance matrix for the rows
    "Standardized.U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
    "Second", the column subscript, or the third column in the data file
    "U2hat", the estimated variance-covariance matrix for the columns
    "Standardized.U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
    "Third", the edge subscript, or the fourth column in the data file
    "U3hat", the estimated variance-covariance matrix for the edges
    "Shat", the sample variance-covariance matrix computed from the vectorized data tensors

Reference
Manceur AM, Dutilleul P. 2013. Maximum likelihood estimation for the tensor normal distribution: Algorithm, minimum sample size, and empirical bias and dispersion. Journal of Computational and Applied Mathematics 239: 37-49.

Examples
    output <- mle3d_svc(data3d$value3d, data3d$Id3, data3d$Id4, data3d$Id5,
                        data3d$K, data_3d = data3d)
    output


sEparaTe: MLE and LRT functions for separable variance-covariance structures

Description
A package for maximum likelihood estimation (MLE) of the parameters of matrix and 3rd-order tensor normal distributions with unstructured factor variance-covariance matrices (two procedures), and for unbiased modified likelihood ratio testing (LRT) of simple and double separability for variance-covariance structures (two procedures).

Functions
    mle2d_svc, for maximum likelihood estimation of the parameters of a matrix normal distribution
    mle3d_svc, for maximum likelihood estimation of the parameters of a 3rd-order tensor normal distribution
    lrt2d_svc, for the unbiased modified likelihood ratio test of simple separability for a variance-covariance structure
    lrt3d_svc, for the unbiased modified likelihood ratio test of double separability for a variance-covariance structure

Data
    data2d, a two-dimensional data set
    data3d, a three-dimensional data set

References
Dutilleul P. 1999. The mle algorithm for the matrix normal distribution. Journal of Statistical Computation and Simulation 64: 105-123.
Manceur AM, Dutilleul P. 2013. Maximum likelihood estimation for the tensor normal distribution: Algorithm, minimum sample size, and empirical bias and dispersion. Journal of Computational and Applied Mathematics 239: 37-49.
Manceur AM, Dutilleul P. 2013. Unbiased modified likelihood ratio tests for simple and double separability of a variance-covariance structure. Statistics and Probability Letters 83: 631-636.

BayesMixSurv: manual for Bayesian dynamic survival models using additive mixture-of-Weibull hazards, with Lasso shrinkage and stratification

Package‘BayesMixSurv’October12,2022Type PackageTitle Bayesian Mixture Survival Models using AdditiveMixture-of-Weibull Hazards,with Lasso Shrinkage andStratiﬁcationVersion0.9.1Date2016-09-08Author Alireza S.Mahani,Mansour T.A.SharabianiMaintainer Alireza S.Mahani<**************************>Description Bayesian Mixture Survival Models using Additive Mixture-of-Weibull Hazards,with Lasso Shrinkage andStratiﬁcation.As a Bayesian dynamic survival model,it relaxes the proportional-hazard sso shrinkage controlsoverﬁtting,given the increase in the number of free parameters in the model due to pres-ence of two Weibull componentsin the hazard function.License GPL(>=2)Depends survivalNeedsCompilation noRepository CRANDate/Publication2016-09-0810:24:27R topics documented:bayesmixsurv (2)bayesmixsurv.crossval (5)plot.bayesmixsurv (7)predict.bayesmixsurv (8)summary.bayesmixsurv (10)Index121bayesmixsurv Dynamic Bayesian survival model-with stratiﬁcation and Lassoshrinkage-for right-censored data using two-component additivemixture-of-Weibull hazards.DescriptionBayesian survival model for right-censored data,using a sum of two hazard functions,each hav-ing a power dependence on time,corresponding to a Weibull distribution on event density.(Note that event density function for the mixture model does NOT remain a Weibull distribution.)Each component has a different shape and scale parameter,with scale parameters each being the ex-ponential of a linear function of covariates speciﬁed in formula1and formula2.Stratiﬁcation is implemented using a common set of intercepts between the two sso shrinkage-using Laplace prior on coefﬁcients(Park and Casella2008)-allows for variable selection in the presence of low observation-to-variable ratio.The mixture model allows for time-dependent(and context-dependent)hazard ratios.Conﬁdence intervals for coefﬁcient estimation and prediction are generated using full Bayesian paradigm,i.e.by keeping all samples rather than summarizing them into mean 
and sd.Posterior distribution is estimated via MCMC sampling,using univariate slice sampler with stepout and shrinkage(Neal2003).Usagebayesmixsurv(formula1,data,formula2=formula1,stratCol=NULL,weights,subset,na.action=na.fail,control=bayesmixsurv.control(),print.level=2)bayesmixsurv.control(single=FALSE,alpha2.fixed=NULL,alpha.boundary=1.0,lambda1=1.0 ,lambda2=lambda1,iter=1000,burnin=round(iter/2),sd.thresh=1e-4,scalex=TRUE,nskip=round(iter/10))##S3method for class bayesmixsurvprint(x,...)Argumentsformula1Survival formula expressing the time/status variables as well as covariates usedin theﬁrst component.data Data frame containing the covariates and response variable,as well as the strat-iﬁcation column.formula2Survival formula expressing the covariates used in the second component.Noleft-hand side is necessary since the response variable information is extractedfrom formula1.Defaults to formula1.stratCol Name of column in data used for stratiﬁcation.Must be a factor or coerced intoone.Default is no stratiﬁcation(stratCol=NULL).weights Optional vector of case weights.*Not supported yet*subset Subset of the observations to be used in theﬁt.*Not supported yet*na.action Missing-dataﬁlter function.*Not supported yet(only na.fail behavior works)*control See bayesmixsurv.control for a description of the parameters inside the control list.print.level Controlling verbosity level.single If TRUE,a single-component model,equivalent to Bayesian Weibull survival re-gression,with Lasso shrinkage,is implemented.Default is FALSE,i.e.a two-component mixture-of-Weibull model.alpha2.fixed If provided,it speciﬁes the shape parameter of the second component.Defaultis NULL,which allows the MCMC sampling to estimate both shape parameters.alpha.boundary When single=FALSE and alpha2.fixed=NULL,this parameter speciﬁes an up-per bound for the shape parameter of theﬁrst component,and a lower bound forthe shape parameter of the second component.These boundary conditions areenforced in the 
univariate slice sampler function calls.lambda1Lasso Shrinkage parameter used in the Laplace prior on covariates used in theﬁrst component.lambda2Lasso Shrinkage parameter used in the Laplace prior on covariates used in thesecond component.Defaults to lambda1.iter Number of posterior MCMC samples to generate.burnin Number of initial MCMC samples to discard before calculating summary statis-tics.sd.thresh Threshold for standard deviation of a covariate(after possible centering/scaling).If below the threshold,the corresponding coefﬁcient is removed from sampling,i.e.its value is clamped to zero.scalex If TRUE,each covariate vector is centered and scaled before model estimation.The scaling parameters are saved in return object,and used in subsequent callsto predict ers are strongly advised against turning this feature off,since the quality of Gibbs sampling MCMC is greatly enhanced by covariatecentering and scaling.nskip Controlling how often to print progress report during MCMC run.For example,if nskip=10,progress will be reported after10,20,30,...samples.x Object of class’bayesmixsurv’,usually the result of a call to bayesmixsurv....Arguments to be passed to/from other methods.ValueThe function bayesmixsurv.control return a list with the same elements as its input parameters.The function bayesmixsurv returns object of class bayesmixsurv,with the following components: call The matched callformula1Same as input.formula2Same as input.weights Same as input.*Not supported yet*subset Same as input.*Not supported yet*na.action Same as input.*Not supported yet*(current behavior is na.fail)control Same as input.X1Model matrix used for component1,after potential centering and scaling.X2Model matrix used for component2,after potential centering and scaling.y Survival response variable(time and status)used in the model.contrasts1The contrasts used for component1(where relevant).contrasts2The contrasts used for component2(where relevant).xlevels1A record of the levels of the 
factors used inﬁtting for component1(where relevant).xlevels2A record of the levels of the factors used inﬁtting for component2(where relevant).terms1The terms object used for component1.terms2The terms object used for component2.colnamesX1Names of columns for X1,also names of scale coefﬁcients for component1. colnamesX2Names of columns for X1,also names of scale coefﬁcients for component2. apply.scale.X1Index of columns of X1where scaling has been applied.apply.scale.X2Index of columns of X2where scaling has been applied.centerVec.X1Vector of centering parameters for columns of X1indicated by apply.scale.X1. centerVec.X2Vector of centering parameters for columns of X2indicated by apply.scale.X2. scaleVec.X1Vector of scaling parameters for columns of X1indicated by apply.scale.X1. scaleVec.X2Vector of scaling parameters for columns of X2indicated by apply.scale.X2. Xg Model matrix associated with stratiﬁcation(if any).stratContrasts The contrasts used for stratiﬁcation model matrix,if any.stratXlevels A record of the levels of the factors used in stratiﬁcation(if any)). 
stratTerms   The terms object used for stratification.
colnamesXg   Names of columns for Xg.
idx1         Vector of indexes into X1 for which sampling occurred. All columns of X1 whose standard deviation falls below sd.thresh are excluded from sampling and their corresponding coefficients are clamped to 0.
idx2         Vector of indexes into X2 for which sampling occurred. All columns of X2 whose standard deviation falls below sd.thresh are excluded from sampling and their corresponding coefficients are clamped to 0.
median       List of median values, with elements including alpha1, alpha2 (shape parameters of components 1 and 2), beta1, beta2 (coefficients of the scale parameter for components 1 and 2), gamma (stratification intercept adjustments, shared by the 2 components), and sigma.gamma (standard deviation of the zero-mean Gaussian distribution that is the prior for the gamma's).
max          Currently, a list with one element, loglike, containing the maximum sampled log-likelihood of the model.
smp          List of coefficient samples, with elements alpha1, alpha2 (shape parameters for components 1 and 2), beta1, beta2 (scale parameter coefficients for components 1 and 2), loglike (model log-likelihood), gamma (stratification intercept adjustments, shared by the 2 components), and sigma.gamma (standard deviation of the zero-mean Gaussian distribution that is the prior for the gamma's). Each parameter has iter samples. For vector parameters, the first dimension is the number of samples (iter), while the second dimension is the length of the vector.

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

References

Neal R.M. (2003). Slice Sampling. Annals of Statistics, 31, 705-767.
Park T. and Casella G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103, 681-686.

Examples

# NOTE: to ensure convergence, typically more than 100 samples are needed
# fit the most general model, with two Weibull components and unspecified shape parameters
ret <- bayesmixsurv(Surv(time, status) ~ as.factor(trt) + age + as.factor(celltype) + prior,
                    veteran, control = bayesmixsurv.control(iter = 100))
# fix one of the two shape
# parameters
ret2 <- bayesmixsurv(Surv(time, status) ~ as.factor(trt) + age + as.factor(celltype) + prior,
                     veteran, control = bayesmixsurv.control(iter = 100, alpha2.fixed = 1.0))


bayesmixsurv.crossval   Convenience functions for cross-validation-based selection of shrinkage parameter in the bayesmixsurv model.

Description

bayesmixsurv.crossval calculates the cross-validation-based, out-of-sample log-likelihood of a bayesmixsurv model for a data set, given the supplied folds. bayesmixsurv.crossval.wrapper applies bayesmixsurv.crossval to a set of combinations of shrinkage parameters (lambda1, lambda2) and produces the resulting vector of log-likelihood values as well as the specific combination of shrinkage parameters associated with the maximum log-likelihood. bayesmixsurv.generate.folds generates random partitions, while bayesmixsurv.generate.folds.eventbalanced generates random partitions with events evenly distributed across partitions. The latter feature is useful for cross-validation of small data sets with low event rates, since it prevents over-accumulation of events in one or two partitions, and lack of events altogether in other partitions.

Usage

bayesmixsurv.generate.folds(ntot, nfold = 5)
bayesmixsurv.generate.folds.eventbalanced(formula, data, nfold = 5)
bayesmixsurv.crossval(data, folds, all = FALSE, print.level = 1,
    control = bayesmixsurv.control(), ...)
bayesmixsurv.crossval.wrapper(data, folds, all = FALSE, print.level = 1,
    control = bayesmixsurv.control(), lambda.min = 0.01, lambda.max = 100, nlambda = 10,
    lambda1.vec = exp(seq(from = log(lambda.min), to = log(lambda.max), length.out = nlambda)),
    lambda2.vec = NULL,
    lambda12 = if (is.null(lambda2.vec)) cbind(lambda1 = lambda1.vec, lambda2 = lambda1.vec)
               else as.matrix(expand.grid(lambda1 = lambda1.vec, lambda2 = lambda2.vec)),
    plot = TRUE, ...)

Arguments

ntot          Number of observations to create partitions for. It must typically be set to nrow(data).
nfold         Number of folds or partitions to generate.
formula       Formula specifying the covariates to be used in component 1, and the time/status response variable in the survival model.
data          Data frame containing the covariates and response, used in training and prediction.
folds         An integer vector of length nrow(data), defining fold/partition membership of each observation. For example, in 5-fold cross-validation for a data set of 200 observations, folds must be a 200-long vector with elements from the set {1,2,3,4,5}. Convenience functions bayesmixsurv.generate.folds and bayesmixsurv.generate.folds.eventbalanced can be used to generate the folds vector for a given survival data frame.
all           If TRUE, estimation objects from each cross-validation task are collected and returned for diagnostic purposes.
print.level   Verbosity of progress report.
control       List of control parameters, usually the output of bayesmixsurv.control.
lambda.min    Minimum value used to generate lambda.vec.
lambda.max    Maximum value used to generate lambda.vec.
nlambda       Length of lambda.vec vector.
lambda1.vec   Vector of shrinkage parameters to be tested for component-1 coefficients.
lambda2.vec   Vector of shrinkage parameters to be tested for component-2 coefficients.
lambda12      A data frame that enumerates all combinations of lambda1 and lambda2 to be tested. By default, it is constructed by forming all combinations of lambda1.vec and lambda2.vec. If lambda2.vec=NULL, it will only try equal values of the two parameters in each combination.
plot          If TRUE, and if the lambda1 and lambda2 entries in lambda12 are identical, a plot of loglike as a function of either vector is produced.
...           Further arguments passed to bayesmixsurv.

Value

Functions bayesmixsurv.generate.folds and bayesmixsurv.generate.folds.eventbalanced produce integer vectors of length ntot or nrow(data) respectively. The output of these functions can be directly passed to
bayesmixsurv.crossval or bayesmixsurv.crossval.wrapper. Function bayesmixsurv.crossval returns the log-likelihood of data under the assumed bayesmixsurv model, calculated using a cross-validation scheme with the supplied fold parameter. If all=TRUE, the estimation objects for each of the nfold estimation jobs will be returned as the "estobjs" attribute of the returned value. Function bayesmixsurv.crossval.wrapper returns a list with elements lambda1 and lambda2, the optimal shrinkage parameters for components 1 and 2, respectively. Additionally, the following attributes are attached:

loglike.vec   Vector of log-likelihood values, one for each tested combination of lambda1 and lambda2.
loglike.opt   The maximum log-likelihood value from loglike.vec.
lambda12      Data frame with columns lambda1 and lambda2. Each row of this data frame contains one combination of shrinkage parameters that are tested in the wrapper function.
estobjs       If all=TRUE, a list of length nrow(lambda12) is returned, with each element being itself a list of nfold estimation objects associated with each call to the bayesmixsurv function. This object can be examined by the user for diagnostic purposes, e.g. by applying plot against each object.

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

Examples

# NOTE: to ensure convergence, typically more than 30 samples are needed
folds <- bayesmixsurv.generate.folds.eventbalanced(Surv(futime, fustat) ~ 1, ovarian, 5)
cv <- bayesmixsurv.crossval(ovarian, folds, formula1 = Surv(futime, fustat) ~ ecog.ps + rx,
                            control = bayesmixsurv.control(iter = 30, nskip = 10), print.level = 3)
cv2 <- bayesmixsurv.crossval.wrapper(ovarian, folds, formula1 = Surv(futime, fustat) ~ ecog.ps + rx,
                                     control = bayesmixsurv.control(iter = 30, nskip = 10),
                                     lambda1.vec = exp(seq(from = log(0.1), to = log(1), length.out = 3)))


plot.bayesmixsurv   Plot diagnostics for a bayesmixsurv object

Description

Four sets of MCMC diagnostic plots are currently generated: 1) log-likelihood trace plots, 2) coefficient trace plots, 3) coefficient autocorrelation plots, 4) coefficient histograms.

Usage

## S3 method for class
'bayesmixsurv'
plot(x, pval = 0.05, burnin = round(x$control$iter/2), nrow = 2, ncol = 3, ...)

Arguments

x        A bayesmixsurv object, typically the output of the bayesmixsurv function.
pval     The P-value at which lower/upper bounds on coefficients are calculated and overlaid on trace plots and histograms.
burnin   Number of samples discarded from the beginning of an MCMC chain, after which parameter quantiles are calculated.
nrow     Number of rows of subplots within each figure, applied to plot sets 2-4.
ncol     Number of columns of subplots within each figure, applied to plot sets 2-4.
...      Further arguments to be passed to/from other methods.

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

Examples

est <- bayesmixsurv(Surv(futime, fustat) ~ ecog.ps + rx, ovarian,
                    control = bayesmixsurv.control(iter = 800, nskip = 100))
plot(est)


predict.bayesmixsurv   Predict method for bayesmixsurv model fits

Description

Calculates log-likelihood and hazard/cumulative hazard/survival functions over a user-supplied vector of time values, based on a bayesmixsurv model object.

Usage

## S3 method for class 'bayesmixsurv'
predict(object, newdata = NULL, tvec = NULL, burnin = object$control$burnin, ...)
## S3 method for class 'predict.bayesmixsurv'
summary(object, idx = 1:dim(object$smp$h)[3], burnin = object$burnin, pval = 0.05,
    popmean = identical(idx, 1:dim(object$smp$h)[3]), make.plot = TRUE, ...)

Arguments

object    For predict.bayesmixsurv, an object of class "bayesmixsurv", usually the result of a call to bayesmixsurv; for summary.predict.bayesmixsurv, an object of class "predict.bayesmixsurv", usually the result of a call to predict.bayesmixsurv.
newdata   An optional data frame in which to look for variables with which to predict. If omitted, the fitted values (training set) are used.
tvec      An optional vector of time values, along which time-dependent entities (hazard, cumulative hazard, survival) will be predicted. If omitted, only the time-independent entities (currently only log-likelihood) will be calculated. If a single integer is provided for tvec, it is interpreted as the number of time points, equally spaced from 0 to
object$tmax: tvec <- seq(from = 0.0, to = object$tmax, length.out = tvec).
burnin      Number of samples to discard from the beginning of each MCMC chain before calculating median value(s) for time-independent entities.
idx         Index of observations (rows of newdata or training data) for which to generate summary statistics. Default is the entire data.
pval        Desired p-value, based on which lower/upper bounds will be calculated. Default is 0.05.
popmean     Whether population averages must be calculated or not. By default, population averages are only calculated when the entire data is included in prediction.
make.plot   Whether population mean and other plots must be created or not.
...         Further arguments to be passed to/from other methods.

Details

The time-dependent predicted objects (except loglike) are three-dimensional arrays of size (nsmp x nt x nobs), where nsmp = number of MCMC samples, nt = number of time values in tvec, and nobs = number of rows in newdata. Therefore, even for modest data sizes, these objects can occupy large chunks of memory. For example, for nsmp=1000, nt=100, nobs=1000, the three objects h, H, S have a total size of 2.2GB. Since applying quantile to these arrays is time-consuming (as needed for calculation of median and lower/upper bounds), we have left such summaries out of the scope of predict. Users can instead apply summary to the prediction object to obtain summary statistics. During cross-validation-based selection of the shrinkage parameter lambda, there is no need to supply tvec since we only need the log-likelihood value. This significantly speeds up the parameter-tuning process. The function summary.predict.bayesmixsurv allows the user to calculate summary statistics for a subset (or all) of the data, if desired. This approach is in line with the overall philosophy of delaying data summarization until necessary, to avoid unnecessary loss of accuracy due to premature blending of information contained in individual samples.

Value

The function predict.bayesmixsurv returns an object of class "predict.bayesmixsurv" with
the following fields:

tvec     Actual vector of time values (if any) used for prediction.
burnin   Same as input.
median   List of median values for predicted entities. Currently, only loglike is produced. See 'Details' for explanation.
smp      List of MCMC samples for predicted entities. Elements include h1, h2, h (hazard functions for components 1, 2 and their sum), H1, H2, H (cumulative hazard functions for components 1, 2 and their sum), S (survival function), and loglike (model log-likelihood). All functions are evaluated over the time values specified in tvec.
km.fit   Kaplan-Meier fit of the data used for prediction (if data contains response fields).

The function summary.predict.bayesmixsurv returns a list with the following fields:

lower     A list of lower-bound values for h, H, S, hr (hazard ratio of the idx[2] to the idx[1] observation), and S.diff (survival probability of idx[2] minus idx[1]). The last two are only included if length(idx)==2.
median    List of median values for the same entities described in lower.
upper     List of upper-bound values for the same entities described in lower.
popmean   Lower-bound/median/upper-bound values for the population average of survival probability.
km.fit    Kaplan-Meier fit associated with the prediction object (if available).

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

Examples

est <- bayesmixsurv(Surv(futime, fustat) ~ ecog.ps + rx + age, ovarian,
                    control = bayesmixsurv.control(iter = 400, nskip = 100))
pred <- predict(est, tvec = 50)
predsumm <- summary(pred, idx = 1:10)


summary.bayesmixsurv   Summarizing BayesMixSurv model fits

Description

summary method for class "bayesmixsurv".

Usage

## S3 method for class 'bayesmixsurv'
summary(object, pval = 0.05, burnin = object$control$burnin, ...)
## S3 method for class 'summary.bayesmixsurv'
print(x, ...)

Arguments

object   An object of class 'bayesmixsurv', usually the result of a call to bayesmixsurv.
x        An object of class "summary.bayesmixsurv", usually the result of a call to summary.bayesmixsurv.
pval     Desired p-value, based on which lower/upper bounds will be calculated. Default is 0.05.
burnin   Number of samples to
discard from the beginning of each MCMC chain before calculating median and lower/upper bounds.
...      Further arguments to be passed to/from other methods.

Value

An object of class summary.bayesmixsurv, with the following elements:

call           The matched call.
pval           Same as input.
burnin         Same as input.
single         Copied from object$control$single. See bayesmixsurv.control for explanation.
coefficients   A list including matrices alpha, beta1, beta2, and gamma (if stratification is used). Each matrix has columns named 'Estimate', 'Lower Bound', 'Upper Bound', and 'P-val'. alpha has two rows, one for each component, while each of beta1 and beta2 has one row per covariate. gamma has one row per stratum (except for the reference stratum).

Author(s)

Alireza S. Mahani, Mansour T.A. Sharabiani

See Also

See summary for a description of the generic method. The model fitting function is bayesmixsurv.

Examples

est <- bayesmixsurv(Surv(futime, fustat) ~ ecog.ps + rx, ovarian,
                    control = bayesmixsurv.control(iter = 800, nskip = 100))
summary(est, pval = 0.1)

Index

bayesmixsurv
bayesmixsurv.control
bayesmixsurv.crossval
bayesmixsurv.generate.folds (bayesmixsurv.crossval)
plot.bayesmixsurv
predict.bayesmixsurv
print.bayesmixsurv (bayesmixsurv)
print.summary.bayesmixsurv (summary.bayesmixsurv)
summary
summary.bayesmixsurv
summary.predict.bayesmixsurv (predict.bayesmixsurv)


arXiv:0709.1211v2 [cs.IT] 26 Nov 2007

Likelihood ratios and Bayesian inference for Poisson channels

Anthony Réveillac
Laboratoire de Mathématiques, Université de La Rochelle, Avenue Michel Crépeau, 17042 La Rochelle Cedex, France

Abstract: In recent years, infinite-dimensional methods have been introduced for the estimation of Gaussian channels. The aim of this note is to study the application of similar methods to Poisson channels. In particular we compute the Bayesian estimator of a Poisson channel using the likelihood ratio and the discrete Malliavin gradient. This algorithm is suitable for numerical implementation via the Monte-Carlo scheme. The result is extended to mixed Poisson-Gaussian channels and to other non-Gaussian channels.

Index Terms: Poisson process, Bayesian estimation, Malliavin calculus.

Mathematics Subject Classification: 94A40, 60J75, 62C10, 60H07.

I. INTRODUCTION

Recently in [12], infinite-dimensional methods have been used to derive a new expression of the conditional mean estimator for infinite-dimensional additive Gaussian channels. More precisely, the conditional mean estimator is obtained as the Malliavin derivative of the logarithm of the likelihood ratio. This relationship has been recently applied in [8] in order to link the quadratic risk of the conditional mean estimator to the Monge-Kantorovitch measure transportation theory. Let us give some details about these results. In the general framework of the additive Gaussian channel, an observed signal $Y$ is decomposed into the sum of an input signal $X$ plus an independent Gaussian noise $W$ as

$$Y = \sqrt{\rho}\,X + W, \tag{I.1}$$

where $\rho$ is the "signal to noise ratio". In this context the signals "lie" in an abstract Wiener space $(W, H, \mu_W)$ (in particular the input $X$ is an $H$-valued random variable). This setting contains the case of an observed continuous-time stochastic process $(Y_t)_{t \in [0,T]}$ related to an input stochastic process $(X_t)_{t \in [0,T]}$ (with values in the Hilbert space $H := L^2([0,T])$) by the following stochastic differential
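equation,

$$dY_t = \sqrt{\rho}\,X_t\,dt + dW_t, \quad t \in [0,T].$$

[…]

$$\mathbb{E}[X \mid \mathcal{Y}] = \frac{1}{\sqrt{\rho}}\,\nabla \log l(Y), \tag{I.3}$$

where $\mathcal{Y}$ denotes the sigma-field generated by $Y$, $\nabla$ denotes the Malliavin gradient, which is an infinite-dimensional counterpart of the usual derivative on $\mathbb{R}^n$, and $l$ is the likelihood ratio associated to model (I.1), that is,

$$l := \frac{d\mu_Y}{d\mu_W}.$$

[…]

$$\frac{dI(X;Y)}{d\rho} = \frac{1}{2}\,\mathbb{E}\big[\|X - \mathbb{E}[X \mid \mathcal{Y}]\|_H^2\big],$$

where $I(X;Y)$ denotes the mutual information between $X$ and $Y$, defined as

$$I(X;Y) := \int_{H \times W} \log \frac{d\mu_{X,Y}}{d(\mu_X \otimes \mu_Y)}\,d\mu_{X,Y}.$$

Note that (I.2) is an infinite-dimensional counterpart of a well known result for finite-dimensional additive Gaussian channels […]

[…]

$$L(y,x) := \frac{d\mu_{\alpha x + \lambda_0}}{d\mu_{\lambda_0}}(y) = \exp(-\alpha x)\left(\frac{\alpha x + \lambda_0}{\lambda_0}\right)^{y}, \qquad m(y) := \int_{\mathbb{R}_+} \frac{d\mu_{\alpha x + \lambda_0}}{d\mu_{\lambda_0}}(y)\,\mu_X(dx), \quad y \in \mathbb{N}. \tag{II.1}$$

Now we can state the following lemma, which will be extended in Section IV as Proposition IV.6 and Corollary IV.8.

Lemma II.1. If the Bayesian risk $\mathbb{E}\big[|X - \mathbb{E}[X \mid Y]|^2\big]$ is finite then

$$\mathbb{E}[X \mid Y] = \frac{\lambda_0\,\big(m(Y+1) - m(Y)\big)}{\alpha\,m(Y)}. \tag{II.2}$$

Proof: Let $y$ in $\mathbb{N}$.

$$\begin{aligned}
m(y+1) - m(y) &= \frac{\alpha}{\lambda_0} \int_{\mathbb{R}_+} x\,\frac{d\mu_{\alpha x + \lambda_0}}{d\mu_{\lambda_0}}(y)\,\mu_X(dx) \\
&\overset{(*)}{=} \frac{\alpha}{\lambda_0}\,m(y) \int_{\mathbb{R}_+} x\,\frac{d\mu_{X \mid Y=y}}{d\mu_X}(x)\,\mu_X(dx), \quad P\text{-a.s.} \\
&= \frac{\alpha}{\lambda_0}\,m(y)\,\mathbb{E}[X \mid Y = y].
\end{aligned}$$

Equality $(*)$ is justified by a relation of the form iii) of Proposition IV.1.

[…]

$$\frac{d\mu_{x+1}}{d\mu_1}(y) = e^{-x}(x+1)^{y}, \quad (x,y) \in \mathbb{R}_+ \times \mathbb{N},$$

and one obtains

$$\mathbb{E}[X \mid Y] = \frac{a\,e^{-a}(1+a)^{Y} + b\,e^{-b}(1+b)^{Y}}{m(Y)}. \tag{II.3}$$

To obtain results for more general Poisson channels we first have to recall some elements of analysis on the Poisson space.

III. ANALYSIS ON THE POISSON SPACE

In this Section we introduce some elements of analysis on the Poisson space in a general framework; then we will describe these elements in a concrete example.

Let $(S, \mathcal{B}(S), \nu)$ be a measure space where $\nu$ is an atomless $\sigma$-finite intensity measure. Define the Poisson space $\Omega^S$ as

$$\Omega^S = \Big\{ y = \sum_{k=1}^{n} \delta_{z_k},\ n \in \bar{\mathbb{N}},\ z_k \in S,\ 1 \le k \le n \Big\},$$

with $\bar{\mathbb{N}} := \mathbb{N} \cup \{\infty\}$, and let, for $\sum_{k=1}^{n} \delta_{z_k}$,

$$C\Big(\sum_{k=1}^{n} \delta_{z_k}\Big) := \{z_1, \ldots, z_n\}. \tag{III.1}$$

Define the canonical process $(N_A)_{A \in \mathcal{B}(S)}$ on $\Omega^S$ as

$$N_A(y) := y(A), \quad y \in \Omega^S.$$

We define the $\sigma$-field $\mathcal{F}^S$ on $\Omega^S$ by $\mathcal{F}^S = \sigma(\{y \mapsto y(B),\ B \in \mathcal{B}(S)\})$. There exists a probability measure $P^S$ on $(\Omega^S, \mathcal{F}^S)$, called the Poisson measure, such that

• $\forall B \in \mathcal{B}(S)$, $\forall n \in \mathbb{N}$, $P^S(\{y \mid y(B) = n\}) = \exp(-\nu(B))\,\dfrac{\nu(B)^n}{n!}$,

[…]

In this case $H^{[0,T]}$ can be defined in a more tractable way by

$$H^{[0,T]} = \Big\{ v : [0,T] \to \mathbb{R},\ v(t) = \int_0^t \dot v_s\,ds,\ \dot v \in L^2([0,T]) \Big\},$$

equipped with

$$\langle h_1, h_2 \rangle_{H^{[0,T]}} := \langle \dot h_1, \dot h_2 \rangle_{L^2([0,T])}, \quad h_1, h_2 \in H^{[0,T]}.$$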
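As a numerical check of Lemma II.1 from Section II, the following Python sketch compares the finite-difference expression $(\lambda_0/\alpha)\,(m(y+1) - m(y))/m(y)$ that the proof yields with a direct computation of the posterior mean. It is only an illustration under stated assumptions: the prior is a hypothetical two-point distribution, and the function names (`likelihood_ratio`, `marginal`, and so on) are chosen here and do not come from the paper.

```python
import math

def likelihood_ratio(y, x, alpha=1.0, lam0=1.0):
    """Density of Poisson(alpha*x + lam0) with respect to Poisson(lam0), at y."""
    return math.exp(-alpha * x) * ((alpha * x + lam0) / lam0) ** y

def marginal(y, prior, alpha=1.0, lam0=1.0):
    """m(y): the likelihood ratio averaged over a finitely supported prior."""
    return sum(p * likelihood_ratio(y, x, alpha, lam0) for x, p in prior.items())

def posterior_mean_direct(y, prior, alpha=1.0, lam0=1.0):
    """E[X | Y = y] computed directly from Bayes' formula."""
    num = sum(x * p * likelihood_ratio(y, x, alpha, lam0) for x, p in prior.items())
    return num / marginal(y, prior, alpha, lam0)

def posterior_mean_difference(y, prior, alpha=1.0, lam0=1.0):
    """E[X | Y = y] through the difference m(y+1) - m(y), as in Lemma II.1."""
    m0 = marginal(y, prior, alpha, lam0)
    m1 = marginal(y + 1, prior, alpha, lam0)
    return (lam0 / alpha) * (m1 - m0) / m0

# Hypothetical prior (an assumption for illustration): X = 0.5 or X = 3.0.
prior = {0.5: 0.5, 3.0: 0.5}
for y in range(6):
    print(y, posterior_mean_direct(y, prior), posterior_mean_difference(y, prior))
```

The two columns agree up to rounding because multiplying $L(y,x)$ by $(\alpha x + \lambda_0)/\lambda_0$ gives $L(y+1,x)$, so the difference $m(y+1) - m(y)$ carries exactly one extra factor $\alpha x / \lambda_0$ under the integral.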
The Malliavin operator $\nabla$ we introduce will be of interest in Section IV. Let $L^0(\Omega^S, \mathcal{F}^S, P^S)$ be the space of measurable mappings from $(\Omega^S, \mathcal{F}^S, P^S)$ to $\mathbb{R}$. Define first the operator $D$ by

$$D : L^0(\Omega^S, \mathcal{F}^S, P^S) \to L^0(\Omega^S \times S, \mathcal{F}^S \otimes \mathcal{B}(S), P^S \otimes \nu), \qquad F \mapsto D_z F(y) := F(y + \delta_z) - F(y).$$

Technical justifications about the measurability of the previous map can be found in [11] and references therein. This allows us to define the operator $\nabla$.

Definition III.1. For $F : \Omega^S \to \mathbb{R}$ we define $\nabla F$ as the $H^S$-valued random variable

$$\nabla_A F := \int_A D_z F\,\nu(dz), \quad A \in \mathcal{B}(S).$$

Finally, in the case of the classical Poisson space the Malliavin derivative $\nabla$ can be expressed in a different way: for $F : \Omega^{[0,T]} \to \mathbb{R}$, $\nabla F$ is an $H^{[0,T]}$-valued random variable and

$$\nabla_{[0,t]} F := \int_0^t D_s F\,ds, \quad t \in [0,T].$$

IV. CONDITIONAL MEAN ESTIMATORS FOR POISSON CHANNELS

We introduce in this section the Bayesian framework and we compute the conditional mean estimator in the setting of Poisson point processes. Let $X$ be an input signal with values in a space $(H, \sigma(H))$ with distribution $\mu_X$. Consider a probability space $(\Omega, \mathcal{F}, P)$ and assume the output $Y$ lies in $\Omega$. Until the end of this paper we will denote by $\mathcal{Y}$ the $\sigma$-field generated by $Y$. We make the following assumptions:

(H1) The "noise" and the "signal" are independent, i.e. the law of the pair (output, input) $= (Y, X)$ is absolutely continuous with respect to $P \times \mu_X$.
(H2) For all $x$ in $H$, $\mu_{Y \mid X = x}$ (the distribution of $Y$ given $X = x$) is absolutely continuous with respect to $P$, and we denote by $L$ the corresponding Radon-Nikodym density.
(H3) $L$ is $(\sigma(H) \otimes \mathcal{F})$-measurable.
(H4) The Bayesian risk $\mathbb{E}\big[\|X - \mathbb{E}[X \mid Y]\|_H^2\big]$ with respect to $\mu_X$ is finite.

Then the function

$$H \times \mathcal{F} \to [0,1], \qquad (x, B) \mapsto \mu_{Y \mid X = x}(B)$$

is a transition probability in the sense of [4, Definition III-2-1 p. 69]. Moreover, from [4, Proposition III-2-1 p. 69-70] there exists a probability measure $\mu$ on $(H \times \Omega, \sigma(H) \otimes \mathcal{F})$ such that

$$\mu(A \times B) = \int_A \mu_{Y \mid X = x}(B)\,\mu_X(dx), \quad A \in \sigma(H),\ B \in \mathcal{F}, \tag{IV.1}$$

and $\mu$ is the joint distribution of $(X, Y)$. Denote by $M$ the marginal distribution of $\mu$ on $\Omega$, defined by

$$M(B) := \mu(H \times B), \quad B \in \mathcal{F}. \tag{IV.2}$$

Proposition IV.1 is mainly devoted to showing the existence of the following transition
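probability

$$\Omega \times \sigma(H) \to [0,1], \qquad (y, A) \mapsto \mu_{X \mid Y = y}(A), \tag{IV.3}$$

and that the couple $(M, (\mu_{X \mid Y}(\cdot, y))_{y \in \Omega})$ allows us to recover $\mu$ as

$$\mu(A \times B) = \int_B \mu_{X \mid Y = y}(A)\,M(dy), \quad A \in \sigma(H),\ B \in \mathcal{F}. \tag{IV.4}$$

Proposition IV.1. If (H1), (H2), and (H3) are satisfied then

i) $\mu$ is absolutely continuous with respect to $\mu_X \times P$ and the corresponding Girsanov-Radon-Nikodym density is $L$.
ii) $M$ is absolutely continuous with respect to $P$ with $m$ as Radon-Nikodym density.
iii) For $M$-almost all $y$ in $\Omega$, $\mu_{X \mid Y = y}$ is absolutely continuous with respect to $\mu_X$, and for $y$ such that $m(y) \neq 0$ the Radon-Nikodym density is given by
$$\frac{d\mu_{X \mid Y = y}}{d\mu_X}(x) = \frac{L(y,x)}{m(y)}.$$
iv) For a $(\sigma(H) \otimes \mathcal{F})$-measurable function $f : H \times \Omega \to \mathbb{R}$,
$$\int_\Omega \int_H f(x,y)\,\mu_{X \mid Y = y}(dx)\,M(dy) = \int_H \int_\Omega f(x,y)\,\mu_{Y \mid X = x}(dy)\,\mu_X(dx).$$

Proof: See [2, Theorem 1.8].

[…]

$$L(y,x) = \frac{d\mu_{Y \mid X = x}}{dP^S}(y) = \exp\Big(-\alpha \int_S \dot x_z\,\nu(dz)\Big) \times \prod_{k=1}^{y(S)} \big(1 + \alpha\,\dot x(z_k)\big), \tag{IV.6}$$

where $y = \sum_{k=1}^{n} \delta_{z_k}$. So hypothesis (H2) is satisfied. Finally assume (H4) holds. We recall a result about the Bayesian estimator under quadratic loss.

Proposition IV.4. The Bayesian estimator $(B_A)_{A \in \mathcal{B}(S)}$ is

$$B_A(Y) = \mathbb{E}[X(A) \mid Y] = \int_{H^S} x(A)\,\mu_{X \mid Y}(dx), \quad M\text{-a.e.} \tag{IV.7}$$

Remark IV.5. Note that the expression (IV.7) is theoretical and cannot be used in practice. In contradistinction, relation (IV.9) obtained below enables a numerical approximation of the Bayesian estimator, as mentioned in Remark IV.9. In fact it is more tractable to estimate the densities rather than the intensity measures. So we denote by $\dot X$ the $L^2(S, d\nu)$-valued random variable associated to $X$. For $z \in S$, (IV.7) can be rewritten as

$$\dot B_z(y) = \mathbb{E}[\dot X_z \mid Y = y] = \int_{H^S} \dot x_z\,\mu_{X \mid Y}(dx, y), \quad M\text{-a.e.} \tag{IV.8}$$

We can state the main result of this Note. It allows us to express the Bayesian estimator of the input as a discrete logarithmic Malliavin gradient of the likelihood ratio $m$.

Proposition IV.6.

$$\mathbb{E}[X_A \mid Y] = \frac{\nabla_A m(Y)}{\alpha\,m(Y)}. \tag{IV.9}$$

[…] So

$$\begin{aligned}
\nabla_A m(y) &= \int_A D_z m(y)\,\nu(dz) \\
&= \alpha \int_{H^S} L(y,x) \int_A \mathbf{1}_{\{z \notin C(y)\}}\,\dot x_z\,\nu(dz)\,\mu_X(dx) \\
&= \alpha \int_{H^S} L(y,x) \int_A \dot x_z\,\nu(dz)\,\mu_X(dx), \quad \text{as } \nu \text{ is atomless}, \\
&= \alpha \int_{H^S} L(y,x)\,x(A)\,\mu_X(dx) \\
&= \alpha \int_{H^S} x(A)\,m(y)\,\mu_{X \mid Y}(dx, y), \quad \text{by iii) of Proposition IV.1.}
\end{aligned}$$

By Proposition IV.4, this leads to

$$\mathbb{E}[X_A \mid Y] = \frac{\nabla_A m(Y)}{\alpha\,m(Y)}.$$

Remark IV.7. Neither $\nabla$ nor $D$ satisfies the chain rule of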
derivation, and consequently $\nabla_A F$ […]

[…]

$$\mathbb{E}[X_t \mid Y] = \frac{\nabla_{[0,t]} m(Y)}{\alpha\,m(Y)}, \quad t \in [0,T]. \tag{IV.10}$$

Remark IV.9. The nonlinear filter given by equations (IV.6) and (IV.10) can be numerically approximated by evaluating $m$ in (IV.5) by a Monte-Carlo scheme. This computation is tractable since the Malliavin derivative $\nabla$ is a difference operator.

V. A GENERALIZATION TO A CLASS OF NON-GAUSSIAN CHANNELS

In this Section we give a generalization of the results from Section IV. We use some notations and definitions presented in Section VI. Let $(M_t)_{t \in [0,T]}$ be a normal martingale which satisfies a structure equation of the form (VI.1) and which has the chaos representation property (see Definition VI.4). Let $(X_t)_{t \in [0,T]}$ be a real-valued input process with $X_t = \int_0^t \dot X_s\,ds$, $t \in [0,T]$. Assume the output signal $(Y_t)_{t \in [0,T]}$ is a normal martingale such that the measure of the output given the input, $\mu_{Y \mid X}$, is absolutely continuous with respect to $P$ with likelihood given by

$$L(y,x) = \frac{d\mu_{Y \mid X = x}}{dP}(y) = \exp\Big(\int_0^T \dot x_s\,dy_s - \frac{1}{2} \int_0^T \dot x_s^2\,\mathbf{1}_{\{\varphi_s = 0\}}\,ds\Big) \times \prod_{s \le T} \big(1 + \dot x_s \varphi(s)\big)\,e^{-\dot x_s \varphi(s)}.$$

We give a brief explanation of the previous formula (see [6, Theorem 36, p. 77]).

• The continuous martingale parts of $Y$ given $X$ and of $Y$ have the same quadratic variations.
• The random measure which defines the pure jump martingale part of $Y$ given $X$ has intensity $(1 + \dot X_t)\,\nu(dt)$, where $\nu$ denotes the random measure associated to the pure jump martingale part of $Y$.

Lemma V.1. With the notations of Definition VI.3 we have

$$L(y,x) = \sum_{n=0}^{\infty} \frac{1}{n!}\,I_n^{y}\big(\dot x^{\otimes n}\big). \tag{V.1}$$

[…] $\sum_{n=0}^{\infty} \frac{1}{n!}\,I_n^{y}\big(\dot x^{\otimes n}\,\mathbf{1}_{[0,t]^n}\big)$ is a solution to the stochastic differential equation

$$dZ_t = \dot x_t\,Z_t\,dy_t, \quad t \in [0,T], \tag{V.2}$$

cf. [5]. Furthermore the process defined in (V.1) is also a solution of the SDE (V.2), which ends the proof.

This formulation of $L$ and Definition VI.5 of the Malliavin derivative in this context give

$$D_t L(y,x) = \dot x_t\,L(y,x), \quad t \in [0,T].$$

By using the general Bayesian results presented in Section IV we have the following Proposition.
Proposition V.2.

$$\mathbb{E}[X_t \mid Y] = \frac{\nabla_t m(Y)}{m(Y)}.$$

We conclude this section by giving two important examples of the normal martingales considered above.

• Assume $(\varphi_t)_{t \in [0,T]}$ is deterministic. Then $(M_t)_{t \in [0,T]}$ has the chaos representation property, see [1], and $(M_t)_{t \in [0,T]}$ can be represented as
$$dM_t = i_t\,dB_t + \varphi_t\,(dN_t - \lambda_t\,dt), \quad M_0 = 0,\ t \in [0,T],$$
where $(B_t)_{t \in [0,T]}$ is a standard Brownian motion, $i_t = \mathbf{1}_{\{\varphi_t = 0\}}$, $j_t = \mathbf{1}_{\{\varphi_t = 1\}}$, and $(N_t)_{t \in [0,T]}$ is a Poisson process independent of $(B_t)_{t \in [0,T]}$ with intensity $\nu_t = \int_0^t \frac{j_s}{\varphi_s^2}\,ds$.
  – For $\varphi_t = 0$, $t \in [0,T]$, $(M_t)_{t \in [0,T]}$ is a standard Brownian motion.
• Consider $\varphi_t = \beta M_t$, $\beta \in [-2, 0)$. Then $(M_t)_{t \in [0,T]}$ is an Azéma martingale. This process has the chaos decomposition property but its increments are not independent, contrary to the previous example.

VI. APPENDIX

In this Appendix we give some further elements of stochastic analysis in the framework of normal martingales.

Definition VI.1. A stochastic process $(M_t)_{t \in [0,T]}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with a right-continuous filtration $(\mathcal{F}_t)_{t \in [0,T]}$ is a normal martingale in $L^2(\Omega)$ if it is a martingale, that is, $\mathbb{E}[M_t^2] < \infty$, $t \in [0,T]$, and

$$\mathbb{E}[M_t \mid \mathcal{F}_s] = M_s, \quad 0 \le s < t \le T,$$

such that

$$\mathbb{E}[(M_t - M_s)^2 \mid \mathcal{F}_s] = t - s, \quad 0 \le s < t \le T.$$

Let $(M_t)_{t \in [0,T]}$ be a normal martingale on a probability space $(\Omega, \mathcal{F}, P)$ with right-continuous filtration $(\mathcal{F}_t)_{t \in [0,T]}$.

Definition VI.2. $(M_t)_{t \in [0,T]}$ satisfies a structure equation if there exists an adapted process $(\varphi_t)_{t \in [0,T]}$ such that

$$[M, M]_t = t + \int_0^t \varphi_s\,dM_s, \quad t \in [0,T]. \tag{VI.1}$$

Definition VI.3. For $n \ge 1$, let $L^2([0,T])^{\circ n}$ be the space of symmetric functions $f_n$ in $n$ variables. For $f_n$ in $L^2([0,T])^{\circ n}$ define the iterated stochastic integral $I_n^M(f_n)$ by

$$I_n^M(f_n) := n! \int_0^T \int_0^{t_n} \cdots \int_0^{t_2} f_n(t_1, \ldots, t_n)\,dM_{t_1} \cdots dM_{t_n}.$$

For $f_0$ in $\mathbb{R}$ we let $I_0(f_0) := f_0$.

With the notations of the previous Definition one can show that

$$I_n^M(f_n) = n \int_0^T I_{n-1}^M\big(f_n(*, t)\,\mathbf{1}_{[0,t]^{n-1}}(*)\big)\,dM_t, \quad n \ge 1.$$
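To make Definitions VI.1 and VI.2 concrete, the sketch below simulates one standard example that is an assumption of this illustration rather than a computation from the paper: the compensated Poisson process $M_t = N_t - t$ with unit rate, for which $\varphi \equiv 1$. Every jump has size 1, so $[M,M]_T = N_T$, and the structure equation (VI.1) reads $[M,M]_T = T + M_T$ on every path. The code checks this identity and estimates $\mathbb{E}[M_T]$ and $\mathbb{E}[M_T^2]$, which should be close to $0$ and $T$ for a normal martingale. The function name and the sample size are illustrative choices.

```python
import random

def simulate_compensated_poisson(T, rng):
    """Return (M_T, [M,M]_T) for M_t = N_t - t, with N a unit-rate Poisson process."""
    n_jumps = 0
    t = rng.expovariate(1.0)       # first jump time
    while t <= T:
        n_jumps += 1
        t += rng.expovariate(1.0)  # independent exponential inter-arrival times
    m_T = n_jumps - T              # compensated value at time T
    bracket_T = float(n_jumps)     # each jump has size 1, so [M,M]_T = N_T
    return m_T, bracket_T

rng = random.Random(0)             # fixed seed so the run is reproducible
T = 2.0
samples = [simulate_compensated_poisson(T, rng) for _ in range(20000)]

# Structure equation with phi = 1: [M,M]_T = T + M_T, path by path.
structure_ok = all(abs(b - (T + m)) < 1e-12 for m, b in samples)

# Normal martingale moments: E[M_T] = 0 and E[M_T^2] = T.
mean_M = sum(m for m, _ in samples) / len(samples)
second_moment_M = sum(m * m for m, _ in samples) / len(samples)

print(structure_ok, mean_M, second_moment_M)
```

Here the identity $[M,M]_T = T + M_T$ holds exactly on each path, while the two moments are only Monte Carlo estimates and fluctuate with the seed and the sample size.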
Definition VI.4. Denote for $n \ge 1$, $H_n = \{I_n^M(f_n),\ f_n \in L^2([0,T])^{\circ n}\}$. We say that $(M_t)_{t \in [0,T]}$ has the chaos representation property if

$$L^2(\Omega) = \bigoplus_{n=0}^{\infty} H_n,$$

that is, for every $F$ in $L^2(\Omega)$ there exists $(f_n)_{n \in \mathbb{N}}$ such that $f_n \in L^2([0,T])^{\circ n}$, $n \ge 1$, and

$$F = \sum_{n=0}^{\infty} I_n^M(f_n).$$

We introduce the Malliavin derivative with respect to $(M_t)_{t \in [0,T]}$.

Definition VI.5. Let

$$\mathcal{S} = \Big\{ \sum_{k=0}^{n} I_k^M(f_k),\ f_k \in L^2([0,T])^{\circ k},\ 0 \le k \le n,\ n \in \mathbb{N} \Big\}.$$

We define the Malliavin derivative $D$ as the linear operator from $\mathcal{S}$ to $L^2(\Omega \times [0,T])$ by

$$D_t I_n^M(f_n) = n\,I_{n-1}^M(f_n(*, t)), \quad dP \times dt\text{-a.e.}$$

For $t$ in $[0,T]$ define $\nabla_t$ as

$$\nabla_t F = \int_0^t D_s F\,ds, \quad F \in \mathrm{Dom}(D).$$

REFERENCES

[1] M. Émery, "On the Azéma martingales," in Séminaire de Probabilités, XXIII, vol. 1372 of Lecture Notes in Math., pp. 66-87, Springer, Berlin, 1990.
[2] D. Fourdrinier, "Statistique inférentielle," Sciences Sup, Dunod, 2002.
[3] D. Guo, S. Shamai and S. Verdu, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inform. Theory, vol. 51, pp. 1261-1282, 2005.
[4] J. Neveu, "Bases mathématiques du calcul des probabilités," Masson, 1964.
[5] N. Privault, "An introduction to stochastic analysis in discrete and continuous settings," Lecture Notes, 2007.
[6] P. Protter, "Stochastic integration and differential equations. A new approach," vol. 21 of Applications of Mathematics, Springer-Verlag, Berlin, 1990.
[7] R. Reiss, "A course on point processes," Springer Series in Statistics, Springer-Verlag, New York, 1993.
[8] A. Üstünel, "Estimation for the additive Gaussian channel and Monge-Kantorovitch measure transportation," Stochastic Process. Appl., vol. 117, pp. 1316-1329, 2007.
[9] S. Verdu, "Poisson Communication Theory," Invited talk, March 25, 1999. The International Technion Communication Day in honor of Israel Bar-David, 1999. Available at /verdu/reprints/VerduPoisson1999.pdf
[10] E. Wong and M. Zakai, "A characterization of the kernels associated with the multiple integral representation of some functionals of the Wiener process," Systems Control Lett., vol. 2, no. 2, pp. 94-98, 1982.
[11] L. Wu, "A new modified logarithmic Sobolev inequality for Poisson point processes and several applications," Probab. Theory Related Fields, vol. 118, no. 3, pp. 427-438, 2000.
[12] M. Zakai, "On mutual information, likelihood-ratios and estimation error for the additive Gaussian channel," IEEE Trans. Inform. Theory, vol. 51, no. 9, pp. 3017-3024, 2005.