14 Exploiting 3D Spatial Sampling in Inverse Modeling of Thermochronological Data


The Development of Artificial Intelligence

Failures in automated systems can have serious consequences, for example traffic accidents involving self-driving cars.

Algorithmic bias
AI algorithms can make discriminatory decisions because of biased training data, undermining fairness; gender bias in recruitment software is one example.

Privacy protection regulations
To prevent AI from misusing personal data, many countries have enacted strict privacy laws, such as the EU's GDPR.

Ethical review
The use of AI in fields such as medicine and the military raises ethical controversies and calls for ethical oversight.

Machine translation systems
Machine translation systems such as Google Translate use deep learning to enable cross-language communication and break down language barriers.

Sentiment analysis applications
Sentiment analysis is widely used in social media monitoring, analyzing user comments to gauge public opinion and brand reputation.

Automated production lines
AI is used to optimize production processes, enabling fully automated production from raw materials to finished goods and improving efficiency and quality.

Intelligent quality inspection
Machine learning algorithms analyze production data to monitor product quality in real time, reducing defect rates and raising the proportion of products that pass inspection.

Supervised learning
With labelled training data, machine learning models can predict or classify new data, as in spam filtering.

Unsupervised learning
Models find patterns in unlabelled data; common uses include market segmentation and social network analysis.

Reinforcement learning
Models are trained to make decisions through reward signals, with applications in self-driving cars and game AI.

The structure of neural networks
The core of deep learning is the neural network, which mimics the structure of the human brain and processes information through multiple layers to perform complex tasks.

1. Disease diagnosis
AI analyzes medical images with deep learning to help doctors make more accurate diagnoses, such as cancer screening.

2. Personalized treatment
AI can propose personalized treatment plans based on a patient's genetic information and medical history, improving treatment outcomes.

3. Drug development
AI accelerates drug development by simulating and predicting the effects of candidate compounds, shortening the time needed to bring new drugs to market.

2014_ICASSP_EFFICIENT CONVOLUTIONAL SPARSE CODING

EFFICIENT CONVOLUTIONAL SPARSE CODING

Brendt Wohlberg
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

ABSTRACT

When applying sparse representation techniques to images, the standard approach is to independently compute the representations for a set of image patches. This method performs very well in a variety of applications, but the independent sparse coding of each patch results in a representation that is not optimal for the image as a whole. A recent development is convolutional sparse coding, in which a sparse representation for an entire image is computed by replacing the linear combination of a set of dictionary vectors by the sum of a set of convolutions with dictionary filters. A disadvantage of this formulation is its computational expense, but the development of efficient algorithms has received some attention in the literature, with the current leading method exploiting a Fourier domain approach. The present paper introduces a new way of solving the problem in the Fourier domain, leading to substantially reduced computational cost.

Index Terms— Sparse Representation, Sparse Coding, Convolutional Sparse Coding, ADMM

1. INTRODUCTION

Over the past 15 years or so, sparse representations [1] have become a very widely used technique for a variety of problems in image processing. There are numerous approaches to sparse coding, the inverse problem of computing a sparse representation of a signal or image vector s, one of the most widely used being Basis Pursuit DeNoising (BPDN) [2]

    \arg\min_x \frac{1}{2} \| D x - s \|_2^2 + \lambda \| x \|_1 ,    (1)

where D is a dictionary matrix, x is the sparse representation, and \lambda is a regularization parameter. When applied to images, the decomposition is usually applied independently to a set of overlapping image patches covering the image; this approach is convenient, but often necessitates somewhat ad hoc subsequent handling of the overlap between patches, and results in a representation over the whole image that is suboptimal. More recently, these techniques have also begun to be applied, with considerable success, to computer vision problems such as face recognition [3] and image classification [4, 5, 6]. (This research was supported by the U.S. Department of Energy through the LANL/LDRD Program.)
It is in this application context that convolutional sparse representations were introduced [7], replacing (1) with

    \arg\min_{\{x_m\}} \frac{1}{2} \left\| \sum_m d_m * x_m - s \right\|_2^2 + \lambda \sum_m \| x_m \|_1 ,    (2)

where {d_m} is a set of M dictionary filters, * denotes convolution, and {x_m} is a set of coefficient maps, each of which is the same size as s. Here s is a full image, and the {d_m} are usually much smaller. For notational simplicity s and x_m are considered to be N dimensional vectors, where N is the number of pixels in an image, and the notation {x_m} is adopted to denote all M of the x_m stacked as a single column vector. The derivations presented here are for a single image with a single color band, but the extension to multiple color bands (for both image and filters) and simultaneous sparse coding of multiple images is mathematically straightforward.

The original algorithm proposed for convolutional sparse coding [7] adopted a splitting technique with alternating minimization of two subproblems, the first consisting of the solution of a large linear system via an iterative method, and the other a simple shrinkage. The resulting alternating minimization algorithm is similar to one that would be obtained within an Alternating Direction Method of Multipliers (ADMM) [8, 9] framework, but requires continuation on the auxiliary parameter to enforce the constraint inherent in the splitting. All computation is performed in the spatial domain, the authors expecting that computation in the Discrete Fourier Transform (DFT) domain would result in undesirable boundary artifacts [7]. Other algorithms that have been proposed for this problem include coordinate descent [10] and a proximal gradient method [11], both operating in the spatial domain.

Very recently, an ADMM algorithm operating in the DFT domain has been proposed for dictionary learning for convolutional sparse representations [12]. The use of the Fast Fourier Transform (FFT) in solving the relevant linear systems is shown to give substantially better asymptotic performance than the original spatial domain method, and evidence is presented to support the claim that the resulting boundary effects are not significant.

The present paper describes a convolutional sparse coding algorithm that is derived within the ADMM framework and exploits the FFT for computational advantage. It is very similar to the sparse coding component of the dictionary learning algorithm of [12], but introduces a method for solving the linear systems that dominate the computational cost of the algorithm in time that is linear in the number of filters, instead of cubic as in the method of [12].

2. ADMM ALGORITHM

Rewriting (2) in a form suitable for ADMM by introducing auxiliary variables {y_m}, we have

    \arg\min_{\{x_m\},\{y_m\}} \frac{1}{2} \left\| \sum_m d_m * x_m - s \right\|_2^2 + \lambda \sum_m \| y_m \|_1 \quad \text{such that} \quad x_m - y_m = 0 \;\; \forall m,    (3)

for which the corresponding iterations (see [8, Sec. 3]), with dual variables {u_m}, are

    \{x_m\}^{(k+1)} = \arg\min_{\{x_m\}} \frac{1}{2} \left\| \sum_m d_m * x_m - s \right\|_2^2 + \frac{\rho}{2} \sum_m \left\| x_m - y_m^{(k)} + u_m^{(k)} \right\|_2^2    (4)

    \{y_m\}^{(k+1)} = \arg\min_{\{y_m\}} \lambda \sum_m \| y_m \|_1 + \frac{\rho}{2} \sum_m \left\| x_m^{(k+1)} - y_m + u_m^{(k)} \right\|_2^2    (5)

    u_m^{(k+1)} = u_m^{(k)} + x_m^{(k+1)} - y_m^{(k+1)} .    (6)

Subproblem (5) is solved via shrinkage/soft thresholding as

    y_m^{(k+1)} = S_{\lambda/\rho}\left( x_m^{(k+1)} + u_m^{(k)} \right),    (7)

where

    S_\gamma(u) = \operatorname{sign}(u)\,\max(0, |u| - \gamma),    (8)

with sign(·) and |·| of a vector considered to be applied element-wise. The computational cost is O(MN). The only computationally expensive step is solving (4), which is of the form

    \arg\min_{\{x_m\}} \frac{1}{2} \left\| \sum_m d_m * x_m - s \right\|_2^2 + \frac{\rho}{2} \sum_m \| x_m - z_m \|_2^2 .    (9)
2.1. DFT Domain Formulation

An obvious approach is to attempt to exploit the FFT for efficient implementation of the convolution via the DFT convolution theorem. (This does involve some increase in memory requirement since the d_m are zero-padded to the size of the x_m before application of the FFT.) Define linear operators D_m such that D_m x_m = d_m * x_m, and denote the variables D_m, x_m, s, and z_m in the DFT domain by \hat{D}_m, \hat{x}_m, \hat{s}, and \hat{z}_m respectively. It is easy to show via the DFT convolution theorem that (9) is equivalent to

    \arg\min_{\{\hat{x}_m\}} \frac{1}{2} \left\| \sum_m \hat{D}_m \hat{x}_m - \hat{s} \right\|_2^2 + \frac{\rho}{2} \sum_m \| \hat{x}_m - \hat{z}_m \|_2^2    (10)

with the {x_m} minimizing (9) being given by the inverse DFT of the {\hat{x}_m} minimizing (10). Defining

    \hat{D} = ( \hat{D}_0 \;\; \hat{D}_1 \;\; \ldots ), \quad \hat{x} = \begin{pmatrix} \hat{x}_0 \\ \hat{x}_1 \\ \vdots \end{pmatrix}, \quad \hat{z} = \begin{pmatrix} \hat{z}_0 \\ \hat{z}_1 \\ \vdots \end{pmatrix},    (11)

this problem can be expressed as

    \arg\min_{\hat{x}} \frac{1}{2} \| \hat{D}\hat{x} - \hat{s} \|_2^2 + \frac{\rho}{2} \| \hat{x} - \hat{z} \|_2^2 ,    (12)

the solution being given by

    (\hat{D}^H \hat{D} + \rho I)\,\hat{x} = \hat{D}^H \hat{s} + \rho \hat{z} .    (13)

2.2. Independent Linear Systems

Matrix \hat{D} has a block structure consisting of M concatenated N×N diagonal matrices, where M is the number of filters and N is the number of samples in s. \hat{D}^H \hat{D} is an MN×MN matrix, but due to the diagonal block (not block diagonal) structure of \hat{D}, a row of \hat{D}^H with its non-zero element at column n will only have a non-zero product with a column of \hat{D} with its non-zero element at row n. As a result, there is no interaction between elements of \hat{D} corresponding to different frequencies, so that (as pointed out in [12]) one need only solve N independent M×M linear systems to solve (13). Bristow et al. [12] do not specify how they solve these linear systems (and their software implementation was not available for inspection), but since they rate the computational cost of solving them as O(M^3), it is reasonable to conclude that they apply a direct method such as Gaussian elimination. This can be very effective [8, Sec. 4.2.3] when it is possible to precompute and store a Cholesky or similar decomposition of the linear system(s), but in this case it is not practical unless M is very small, having an O(M^2 N) memory requirement for storage of these decompositions. Nevertheless, this remains a reasonable approach, the only obvious alternative being an iterative method such as conjugate gradient (CG).

A more careful analysis of the unique structure of this problem, however, reveals that there is an alternative, and vastly more effective, solution. First, define the m-th block of the right hand side of (13) as

    \hat{r}_m = \hat{D}_m^H \hat{s} + \rho \hat{z}_m ,    (14)

so that

    \begin{pmatrix} \hat{r}_0 \\ \hat{r}_1 \\ \vdots \end{pmatrix} = \hat{D}^H \hat{s} + \rho \hat{z} .    (15)

Now, denoting the n-th element of a vector x by x(n) to avoid confusion between indexing of the vectors themselves and selection of elements of these vectors, define

    v_n = \begin{pmatrix} \hat{x}_0(n) \\ \hat{x}_1(n) \\ \vdots \end{pmatrix}, \quad b_n = \begin{pmatrix} \hat{r}_0(n) \\ \hat{r}_1(n) \\ \vdots \end{pmatrix},    (16)

and define a_n as the column vector containing all of the non-zero entries from column n of \hat{D}^H, i.e., writing

    \hat{D} = \begin{pmatrix} \hat{d}_{0,0} & 0 & 0 & \ldots & \hat{d}_{1,0} & 0 & 0 & \ldots \\ 0 & \hat{d}_{0,1} & 0 & \ldots & 0 & \hat{d}_{1,1} & 0 & \ldots \\ 0 & 0 & \hat{d}_{0,2} & \ldots & 0 & 0 & \hat{d}_{1,2} & \ldots \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \end{pmatrix}    (17)

then

    a_n = \begin{pmatrix} \hat{d}_{0,n}^{*} \\ \hat{d}_{1,n}^{*} \\ \vdots \end{pmatrix},    (18)

where * denotes complex conjugation. The linear system to solve corresponding to element n of the {x_m} is

    (a_n a_n^H + \rho I)\, v_n = b_n .    (19)

The critical observation is that the matrix on the left hand side of this system consists of a rank-one matrix plus a scaled identity. Applying the Sherman-Morrison formula

    (A + u v^H)^{-1} = A^{-1} - \frac{A^{-1} u v^H A^{-1}}{1 + v^H A^{-1} u}    (20)

gives

    (\rho I + a a^H)^{-1} = \rho^{-1} \left( I - \frac{a a^H}{\rho + a^H a} \right),    (21)

so that the solution to (19) is

    v_n = \rho^{-1} \left( b_n - \frac{a_n^H b_n}{\rho + a_n^H a_n}\, a_n \right).    (22)

The only vector operations here are inner products, element-wise addition, and scalar multiplication, so that this method is O(M) instead of O(M^3) as in [12]. The cost of solving N of these systems is O(MN), and the cost of the FFTs is O(MN log N). Here it is the cost of the FFTs that dominates, whereas in [12] the cost of solving the DFT domain linear systems dominates the cost of the FFTs.

This approach can be implemented in an interpreted language such as Matlab in a form that avoids explicit iteration over the N frequency indices by passing data for all N indices as a single array to the relevant linear-algebraic routines (commonly referred to as vectorization in Matlab terminology). Some additional computation time improvement is possible, at the cost of additional memory requirements, by precomputing a_n^H / (\rho + a_n^H a_n) in (22).
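As a quick sanity check on Eqs. (19)–(22), the short NumPy sketch below (illustrative only, not the paper's code; all names are made up) compares the O(M) Sherman-Morrison solution with a direct dense solve for a single frequency index:

```python
import numpy as np

# M plays the role of the number of filters at one DFT frequency index n.
rng = np.random.default_rng(0)
M, rho = 64, 10.0
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # a_n: conjugated filter DFT values
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # b_n: right-hand side

# Direct O(M^3) solve of (a a^H + rho I) v = b
A = np.outer(a, a.conj()) + rho * np.eye(M)
v_direct = np.linalg.solve(A, b)

# O(M) Sherman-Morrison solve, Eq. (22): v = (b - a * (a^H b) / (rho + a^H a)) / rho
aHb = np.vdot(a, b)          # a^H b
aHa = np.vdot(a, a).real     # a^H a
v_sm = (b - a * (aHb / (rho + aHa))) / rho

print(np.allclose(v_direct, v_sm))   # True
```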
2.3. Algorithm Summary

The proposed algorithm is summarized in Alg. 1. The stopping criteria are those discussed in [8, Sec. 3.3], together with an upper bound on the number of iterations. The options for the ρ update are (i) fixed ρ (i.e., no update), (ii) the adaptive update strategy described in [8, Sec. 3.4.1], and (iii) the multiplicative increase scheme advocated in [12].

    Input: image s, filter dictionary {d_m}, parameters λ, ρ
    Precompute: FFTs of {d_m} → {\hat{D}_m}, FFT of s → \hat{s}
    Initialize: {y_m} = {u_m} = 0
    while stopping criteria not met do
        Compute FFTs of {y_m} → {\hat{y}_m}, {u_m} → {\hat{u}_m}
        Compute {\hat{x}_m} using the method in Sec. 2.2
        Compute inverse FFTs of {\hat{x}_m} → {x_m}
        {y_m} = S_{λ/ρ}({x_m} + {u_m})
        {u_m} = {u_m} + {x_m} − {y_m}
        Update ρ if appropriate
    end
    Output: Coefficient maps {x_m}

    Algorithm 1: Summary of proposed ADMM algorithm

The computational cost of the algorithm components is O(MN log N) for the FFTs, O(MN) for the proposed linear solver, and O(MN) for both the shrinkage and dual variable update, so that the cost of the entire algorithm is O(MN log N), dominated by the cost of the FFTs. In contrast, the cost of the algorithm proposed in [12] is O(M^3 N) (there is also an O(MN log N) cost for FFTs, but it is dominated by the O(M^3 N) cost of the linear solver), and the cost of the original spatial-domain algorithm [7] is O(M^2 N^2 L), where L is the dimensionality of the filters.
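A compact NumPy sketch of Algorithm 1 is given below (an illustration under the notation above, not the authors' implementation); the x-update applies the vectorized Sherman-Morrison solution (22) across all frequencies at once:

```python
import numpy as np

def csc_admm(s, D, lmbda=0.05, rho=10.0, n_iter=100):
    """Sketch of Algorithm 1: convolutional sparse coding by ADMM.
    s : 2-D image; D : array of M small 2-D filters with shape (M, fh, fw)."""
    H, W = s.shape
    M = D.shape[0]
    # Zero-pad filters to the image support and precompute DFTs.
    Dpad = np.zeros((M, H, W))
    Dpad[:, :D.shape[1], :D.shape[2]] = D
    Dhat = np.fft.fft2(Dpad)                    # \hat{D}_m, shape (M, H, W)
    shat = np.fft.fft2(s)                       # \hat{s}
    DHs = Dhat.conj() * shat                    # D^H s at every frequency
    DSq = np.sum(np.abs(Dhat) ** 2, axis=0)     # a^H a at every frequency

    x = np.zeros((M, H, W))
    y = np.zeros_like(x)
    u = np.zeros_like(x)
    for _ in range(n_iter):
        # x-update: solve (D^H D + rho I) xhat = D^H shat + rho(yhat - uhat), Eq. (13)
        zhat = np.fft.fft2(y - u)
        rhat = DHs + rho * zhat
        corr = np.sum(Dhat * rhat, axis=0) / (rho + DSq)   # a^H b / (rho + a^H a)
        xhat = (rhat - Dhat.conj() * corr) / rho           # Sherman-Morrison, Eq. (22)
        x = np.real(np.fft.ifft2(xhat))
        # y-update: soft thresholding, Eqs. (7)-(8)
        v = x + u
        y = np.sign(v) * np.maximum(0.0, np.abs(v) - lmbda / rho)
        # dual variable update, Eq. (6)
        u = u + x - y
    return x
```

After the loop, the image estimate Σ_m d_m ∗ x_m can be checked by summing the filter responses, e.g. `np.real(np.fft.ifft2((Dhat * np.fft.fft2(x)).sum(axis=0)))`.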
3. DICTIONARY LEARNING

The extension of (2) to learning a dictionary from training data involves replacing the minimization with respect to x_m with minimization with respect to both x_m and d_m. The optimization is invariably performed via alternating minimization between the two variables, the most common approach consisting of a sparse coding step followed by a dictionary update [13]. The commutativity of convolution suggests that the DFT domain solution of Sec. 2.1 can be directly applied in minimizing with respect to d_m instead of x_m, but this is not possible since the d_m are of constrained size, and must be zero-padded to the size of the x_m prior to a DFT domain implementation of the convolution. If the size constraint is implemented in an ADMM framework [14], however, the problem is decomposed into a computationally cheap subproblem corresponding to a projection onto the constraint set, and another subproblem that can be efficiently solved by extending the method in Sec. 2.1. This iterative algorithm for the dictionary update can alternate with a sparse coding stage to form a more traditional dictionary learning method [15], or the subproblems of the sparse coding and dictionary update algorithms can be combined into a single ADMM algorithm [12].

4. RESULTS

A comparison of execution times for the algorithm (λ = 0.05) with different methods of solving the linear system, for a set of overcomplete 8×8 DCT dictionaries and the 512×512 greyscale Lena image, is presented in Fig. 1. It is worth emphasizing that this is a large image by the standards of prior publications on convolutional sparse coding; the test images in [12], for example, are 50×50 and 128×128 pixels in size. The Gaussian elimination solution is computed using a Cholesky decomposition (since it is, in general, impossible to cache this decomposition, it is necessary to recompute it at every solution), as implemented by the Matlab mldivide function, and is applied by iterating over all frequencies in the apparent absence of any practical alternative. The conjugate gradient solution is computed using two different relative error tolerances. A significant part of the computational advantage here of CG over the direct method is that it is applied simultaneously over all frequencies. The two curves for the proposed solver based on the Sherman-Morrison formula illustrate the significant gain from an implementation that simultaneously solves over all frequencies, and that the relative advantage of doing so decreases with increasing M.

Fig. 1. A comparison of execution times for 10 steps of the ADMM algorithm for different methods of solving the linear system: Gaussian elimination (GE), Conjugate Gradient with relative error tolerance 10^-5 (CG 10^-5) and 10^-3 (CG 10^-3), and Sherman-Morrison implemented with a loop over frequencies (SM-L) or jointly over all frequencies (SM-V). (Axes: dictionary size M from 64 to 512; execution time in seconds.)

The performance of the three ρ update strategies discussed in the previous section was compared by sparse coding a 256×256 Lena image using a 9×9×512 dictionary (from [16], by the authors of [17]) with a fixed value of λ = 0.02 and a range of initial ρ values ρ0. The resulting values of the functional in (2) after 100, 500, and 1000 iterations of the proposed algorithm are displayed in Table 1. The adaptive update strategy uses the default parameters of [8, Sec. 3.4.1], and the increasing strategy uses a multiplicative update by a factor of 1.1 with a maximum of 10^5, as advocated by [12].

In summary, a fixed ρ can perform well, but is sensitive to a good choice of parameter. When initialized with a small ρ0, the increasing ρ strategy provides the most rapid decrease in functional value, but thereafter converges very slowly. Overall, unless rapid computation of an approximate solution is desired, the adaptive ρ strategy appears to provide the best performance, with the least sensitivity to choice of ρ0. This issue is complex, however, and further experimentation is necessary before drawing any general conclusions that could be considered valid over a broad range of problems.

    Iter.             ρ0:  10^-2   10^-1   10^0    10^1    10^2    10^3
    Fixed ρ
      100                  28.27   27.80   18.10   10.09    9.76   11.60
      500                  28.05   22.25   11.11    8.89    9.11   10.13
      1000                 27.80   17.00    9.64    8.82    8.96    9.71
    Adaptive ρ
      100                  21.62   16.97   14.56   10.71   11.14   11.41
      500                  10.81   10.23    9.81    9.01    9.18    9.09
      1000                  9.44    9.21    9.06    8.83    8.87    8.84
    Increasing ρ
      100                  14.78    9.82    9.50    9.90   11.51   15.15
      500                   9.55    9.45    9.46    9.89   11.47   14.51
      1000                  9.53    9.44    9.45    9.88   11.41   13.97

Table 1. Comparison of functional value convergence for the same problem with three different ρ update strategies.

5. CONCLUSION

A computationally efficient algorithm is proposed for solving the convolutional sparse coding problem in the Fourier domain. This algorithm has the same general structure as a previously proposed approach [12], but enables a very significant reduction in computational cost by careful design of a linear solver for the most critical component of the iterative algorithm. The theoretical computational cost of the algorithm is reduced from O(M^3) to O(MN log N) (where N is the dimensionality of the data and M is the number of elements in the dictionary), and is also shown empirically to result in greatly reduced computation time. The significant improvement in efficiency of the proposed approach is expected to greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
6. REFERENCES

[1] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34–81, 2009. doi:10.1137/060657704
[2] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998. doi:10.1137/S1064827596304010
[3] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, February 2009. doi:10.1109/tpami.2008.79
[4] Y. Boureau, F. Bach, Y. A. LeCun, and J. Ponce, "Learning mid-level features for recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010, pp. 2559–2566. doi:10.1109/cvpr.2010.5539963
[5] J. Yang, K. Yu, and T. S. Huang, "Supervised translation-invariant sparse coding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3517–3524. doi:10.1109/cvpr.2010.5539958
[6] J. Mairal, F. Bach, and J. Ponce, "Task-driven dictionary learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 791–804, April 2012. doi:10.1109/tpami.2011.156
[7] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, "Deconvolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010, pp. 2528–2535. doi:10.1109/cvpr.2010.5539957
[8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010. doi:10.1561/2200000016
[9] J. Eckstein, "Augmented Lagrangian and alternating direction methods for convex optimization: A tutorial and some illustrative computational results," Rutgers Center for Operations Research, Rutgers University, Rutcor Research Report RRR 32-2012, December 2012. [Online]. Available: /pub/rrr/reports2012/322012.pdf
[10] K. Kavukcuoglu, P. Sermanet, Y. Boureau, K. Gregor, M. Mathieu, and Y. A. LeCun, "Learning convolutional feature hierarchies for visual recognition," in Advances in Neural Information Processing Systems (NIPS 2010), 2010.
[11] R. Chalasani, J. C. Principe, and N. Ramakrishnan, "A fast proximal method for convolutional sparse coding," in Proceedings of the International Joint Conference on Neural Networks (IJCNN), Aug. 2013, pp. 1–5. doi:10.1109/IJCNN.2013.6706854
[12] H. Bristow, A. Eriksson, and S. Lucey, "Fast convolutional sparse coding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2013, pp. 391–398. doi:10.1109/CVPR.2013.57
[13] B. Mailhé and M. D. Plumbley, "Dictionary learning with large step gradient descent for sparse representations," in Latent Variable Analysis and Signal Separation, ser. Lecture Notes in Computer Science, F. J. Theis, A. Cichocki, A. Yeredor, and M. Zibulevsky, Eds. Springer Berlin Heidelberg, 2012, vol. 7191, pp. 231–238. doi:10.1007/978-3-642-28551-6_29
[14] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, "An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems," IEEE Transactions on Image Processing, vol. 20, no. 3, pp. 681–695, March 2011. doi:10.1109/tip.2010.2076294
[15] K. Engan, S. O. Aase, and J. H. Husøy, "Method of optimal directions for frame design," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5, 1999, pp. 2443–2446. doi:10.1109/icassp.1999.760624
[16] J. Mairal, software available from http://lear.inrialpes.fr/people/mairal/denoise_ICCV09.tar.gz
[17] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2009, pp. 2272–2279. doi:10.1109/iccv.2009.5459452

AI Training Models and Algorithms

Artificial intelligence (AI) is the science of making machines think and act like humans. Training models and algorithms are among the key technologies used to realize AI.

A trained model is obtained by learning from large amounts of data, so that a machine can extract the underlying patterns and features in the data and use them to make predictions and judgements. Algorithms are the mathematical and logical methods used during training to optimize the model's performance and accuracy.

The most common approaches are machine learning and deep learning.

Machine learning gives machines the ability to learn by training models on large amounts of data: the machine discovers patterns and features in the data and uses them for prediction and classification. Typical machine learning models and algorithms include decision trees, support vector machines, and naive Bayes.
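As a small, hedged illustration of supervised machine learning (not tied to any specific system described here), the scikit-learn sketch below trains a decision tree on a labelled dataset:

```python
# Illustrative sketch only: a decision tree classifier trained on labelled data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)                      # learn rules from labelled examples
print("test accuracy:", model.score(X_test, y_test))
```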

Deep learning is a machine learning approach based on neural networks, modeled loosely on the way neurons in the human brain are connected.

Deep learning uses multi-layer neural networks to extract high-level features from data and can automatically learn and adjust the network parameters, enabling learning and recognition on complex data. Deep learning models and algorithms include convolutional neural networks, recurrent neural networks, and generative adversarial networks.

Besides these, other methods and techniques are used for training AI models, such as reinforcement learning and genetic algorithms.

Reinforcement learning trains a model through trial and error and a reward/penalty mechanism: by repeatedly trying and adjusting its behavior, the machine learns an optimal policy.

A genetic algorithm simulates the process of biological evolution: by evaluating the fitness of candidate models and selecting among them, it progressively improves the model's performance and accuracy.
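The evolutionary loop described above can be sketched in a few lines (a toy illustration only; the fitness function is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(p):
    # Hypothetical objective: higher is better; here, closeness to a target vector.
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((p - target) ** 2)

pop = rng.standard_normal((50, 3))                  # initial population of parameter vectors
for generation in range(200):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-10:]]         # selection: keep the fittest individuals
    children = parents[rng.integers(0, 10, size=50)]             # reproduction
    pop = children + 0.1 * rng.standard_normal(children.shape)   # mutation
best = pop[np.argmax([fitness(p) for p in pop])]
print(best)   # approaches the target as generations proceed
```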

AI training models and algorithms are widely applied across many fields.

In healthcare, they can help doctors diagnose diseases and plan treatment; in finance, they can support risk assessment and credit scoring; in transportation, they can enable autonomous navigation and traffic management for intelligent vehicles.

Through training models and algorithms, AI systems can keep learning and improving, bringing greater convenience and innovation. Training models and algorithms remain one of the key technologies for realizing artificial intelligence.

Large-Model Object Tracking Algorithms

Large-model object tracking algorithms are computer vision algorithms designed to identify and follow specific targets by analyzing video streams or image sequences. These algorithms are usually based on deep learning and are trained on large-scale datasets so that they can track targets accurately in complex scenes.

From a technical perspective, large-model object tracking algorithms are typically built on deep learning architectures such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs). By extracting and learning the target's features, they can accurately predict and track the target's position and motion state across consecutive frames.

In addition, some algorithms combine multi-object tracking, instance segmentation, and motion estimation to improve the robustness and accuracy of tracking.
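One basic building block behind many such trackers is frame-to-frame association of detections; the hedged sketch below uses simple intersection-over-union (IoU) matching (illustrative only, not a specific published tracker — deep trackers usually add learned appearance features on top of this step):

```python
import numpy as np

def iou(a, b):
    """Boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, threshold=0.3):
    """Greedy matching of existing tracks to new-frame detections by IoU."""
    matches, used = [], set()
    for ti, t in enumerate(tracks):
        scores = [iou(t, d) if di not in used else -1.0 for di, d in enumerate(detections)]
        best = int(np.argmax(scores)) if scores else -1
        if best >= 0 and scores[best] >= threshold:
            matches.append((ti, best))
            used.add(best)
    return matches

tracks = [(10, 10, 50, 50), (100, 80, 140, 120)]
detections = [(102, 82, 142, 124), (12, 11, 52, 49)]
print(associate(tracks, detections))   # [(0, 1), (1, 0)]
```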

From an application perspective, large-model object tracking has broad prospects in autonomous driving, video surveillance, drone tracking, and human-computer interaction.

In autonomous driving, tracking algorithms help vehicles identify and follow other vehicles, pedestrians, and obstacles, supporting safe driving and intelligent navigation. In video surveillance, they can be used to monitor and track specific targets in real time, such as criminal suspects or missing persons, improving the efficiency and accuracy of surveillance systems.

In summary, large-model object tracking algorithms are deep-learning-based computer vision algorithms capable of high-precision tracking in complex scenes, with broad application prospects in autonomous driving, video surveillance, and related fields.

Review of Wireless Link Quality Estimation and Prediction Methods (lecture slides)

Its accuracy depends heavily on how the model parameters are set, and it is difficult to adapt to dynamically changing network conditions; joint resource scheduling in heterogeneous networks is gradually moving from the user level down to the packet level.

XCoPred (using Cross-Correlation to Predict): a link quality prediction mechanism based on pattern matching.

Proposed by K. Farkas in "Link quality prediction in mesh networks".

The burstiness characteristics of intermediate-quality links were introduced:
- continuously usable
- intermittently usable

Reacting quickly to rapid changes in wireless links can significantly improve network performance.


3. Wireless Link Quality Estimation and Prediction Methods

3.1 Classification of link quality estimation and prediction methods

3.2 Classical hard metrics and soft metrics

Hard metrics are radio parameters measured at the physical layer when a node receives a packet, including:
➢ Received Signal Strength Indicator (RSSI)
➢ Signal-to-Noise Ratio (SNR)
➢ Link Quality Indicator (LQI)

Several studies establish direct mappings between these hard metrics and link performance parameters such as distance and packet reception rate (PRR). Y. Xu et al., in "Exploring spatial correlation for link quality estimation in wireless sensor networks", combine a path-loss model (free-space plus log-normal shadowing) with the bit error rate to obtain the relationship between PRR and communication distance. G. Carles et al., in "Impact of LQI based routing metrics on the performance of a one-to-one routing protocol for IEEE 802.15.4 multihop networks", propose LETX (LQI-based ETX), a metric built from a piecewise linear model of LQI. M. Senel et al., in "A Kalman filter based link quality estimation scheme for wireless sensor networks", use a pre-fitted SNR-PRR mapping curve, indexed by the estimated SNR, to obtain the current PRR.
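The Kalman-filter-plus-mapping idea in the last of these can be sketched as follows (illustrative only; the noise variances and the logistic SNR-to-PRR curve below are invented placeholders, not the parameters fitted in the cited paper):

```python
import numpy as np

def kalman_smooth(measurements, process_var=0.05, meas_var=4.0):
    """Scalar Kalman filter over noisy per-packet SNR readings."""
    x, p = measurements[0], 1.0              # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var                  # predict
        k = p / (p + meas_var)               # Kalman gain
        x = x + k * (z - x)                  # update with measurement z
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)

def snr_to_prr(snr_db, midpoint=3.0, slope=1.2):
    # Hypothetical pre-fitted mapping curve (logistic in SNR).
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint)))

raw_snr = np.array([2.0, 6.5, 1.0, 4.2, 5.8, 3.9, 7.1, 2.5])   # noisy per-packet SNR (dB)
smoothed = kalman_smooth(raw_snr)
print(snr_to_prr(smoothed))   # estimated packet reception rate per packet
```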

Using Importance Sampling to Improve the Learning Efficiency of Deep Learning Models

Deep learning has achieved great success in computer science and artificial intelligence. However, training deep learning models often requires a great deal of time and computing resources, and training becomes slow when processing large-scale datasets. Importance sampling is a commonly used technique that can improve the learning efficiency of deep learning models. This article discusses the principle of importance sampling and its applications in deep learning.

Importance sampling is a technique for reducing sampling bias and improving sampling efficiency. In deep learning, model training is usually based on a large sampled dataset. However, some samples matter more than others: they have a greater influence on the outcome of training. Conventional uniform random sampling can therefore overlook some important samples during training, leading to low training efficiency. Importance sampling assigns each sample a different sampling weight, raising the probability of drawing the important samples and thereby exploring the sample space more effectively.

In deep learning, importance sampling is applied in two main ways: importance-sampled training and importance-sampled gradient adjustment.

First, importance-sampled training is a model training method based on importance sampling. It adjusts the sample weights so that low-importance samples receive less attention and high-importance samples receive more. The model is then more likely to learn the features of the samples that contribute most, which improves learning efficiency. This idea can be applied at several stages of deep learning, including data preprocessing, model training, and optimization.

Second, importance-sampled gradient adjustment is a parameter update method based on importance sampling. In conventional gradient descent, every sample's gradient is treated as equally important. With importance sampling, each sample's gradient is reweighted according to its sampling weight, so that more important samples contribute more to the parameter update, while the weights also compensate for the non-uniform sampling so that the update remains a faithful estimate of the full gradient. In this way the model updates its parameters more effectively, converging faster and learning more efficiently.
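A minimal sketch of this scheme on a toy logistic regression problem is shown below (illustrative only; the loss-proportional sampling rule and all constants are assumptions for the example, and each gradient is reweighted by 1/(N·p_i) to keep the update unbiased):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.standard_normal((N, d))
true_w = rng.standard_normal(d)
yv = (X @ true_w + 0.1 * rng.standard_normal(N) > 0).astype(float)

w = np.zeros(d)
lr, batch = 0.1, 32
for step in range(500):
    logits = X @ w
    probs = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-6, 1.0 - 1e-6)
    losses = -(yv * np.log(probs) + (1 - yv) * np.log(1.0 - probs))
    p = losses / losses.sum()                      # sampling distribution: harder samples first
    idx = rng.choice(N, size=batch, p=p)           # importance-sampled minibatch
    weights = 1.0 / (N * p[idx])                   # correction weights keep the update unbiased
    grad = ((probs[idx] - yv[idx]) * weights) @ X[idx] / batch
    w -= lr * grad

print("accuracy:", np.mean((X @ w > 0) == yv.astype(bool)))
```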

In addition, importance sampling can be combined with other techniques to further improve the learning efficiency of deep learning models. For example, combined with adaptive sampling, the sampling probabilities can be adjusted dynamically according to each sample's importance, striking a better balance between sampling efficiency and sample quality. Combined with optimization methods such as gradient-based optimizers, the information carried by the importance weights can be exploited to accelerate the learning process.

EISCAT_3D

EISCAT_3D: The next generation European Incoherent Scatter radar system

Introduction and Brief Background

The high latitude environment is of increasing importance, not only for purely scientific studies, but because of the direct effects on technological systems and climate which are principally mediated through interactions with solar produced particles and fields and whose effects are overwhelmingly concentrated in the polar and high latitude areas. These effects are of importance not only from a European dimension, but globally, since the European arctic and high arctic areas are the most accessible, and best supported by installed infrastructure and existing communities, of any place on the Earth from which the necessary observations and measurements can be made.

Mankind is entering a period where full knowledge and understanding of the Earth's environment as part of the linked Sun-Earth system is essential, and it is important to exploit existing advantages to provide effective and continuous monitoring of the critical interaction regions. Incoherent scatter radar is the most effective ground-based technique for studying and monitoring the upper atmosphere and ionosphere.

The radars of the EISCAT Scientific Association have defined the present state of the art within the World's incoherent scatter community for the last several years. EISCAT has been very successful in exploiting its two radars located on the Scandinavian mainland (one operating at 931 MHz and the second at 231 MHz) and, indeed, this success led directly to the design, construction, and operation of the EISCAT Svalbard Radar nearly 1000 km further north almost ten years ago. Other radars exist at Irkutsk (Russia), Kharkov (Ukraine), Kyoto (Japan), Söndreström (Greenland), Millstone Hill (USA), Arecibo (Puerto Rico, USA), Jicamarca (Peru), and in Indonesia. Substantial recent investment by the USA will lead to the availability of new American sector high-latitude radars, to be located in central Alaska, and in northern Canada, by late 2007. These radars will have technical abilities beyond those which can be provided by the existing EISCAT radars either in their present form or through reasonable upgrades.

The EISCAT Scientific Association, in co-operation with the University of Tromsø, Luleå University of Technology, and the Rutherford Appleton Laboratory, has therefore started a four-year design exercise, supported by European Union funding under the Sixth Framework initiative, which builds on its past successes and aims to maintain its world leadership role in this field. Mindful that the driving issues in ionospheric research continuously evolve, this document outlines the required specifications of a new concept in incoherent scatter radars which can both replace the two existing, but now aging, European mainland systems and also substantially extend the systems' capabilities as required to address the scientific and service requirements of the next fifteen to twenty years.
The facility envisaged in this design study will surpass all other facilities, both existing and under construction, and will provide researchers with access to the World's most advanced and capable facility. In order to best exploit and enhance European technological resources and to arrive at the best possible design to address the needs of the present, and expanding, European community, the design exercise must evaluate a number of different approaches which are now possible in order to meet the required performance as cheaply, and efficiently, as possible. It will require the development of new radar and signal processing technology, together with crucial developments in polarisation control, built-in interferometric capabilities, the provision of remote receiving installations with electronic beam forming, signal processing, and automated data analysis.

The design study also includes a component to design communication, data distribution, and data archiving systems which leverage the available skills and existing network and Grid structures within the Community. These developments will allow European scientists and other users to access data from the new systems irrespective of their location within the community.

The new facility will greatly extend the range of available data, dramatically improving its temporal and spatial resolution as well as the geographic, altitude, and temporal extent. The design goals mandate improvements in the achievable temporal and spatial resolution (both parallel and perpendicular to the radar line-of-sight) by about an order of magnitude, to extend the unambiguous instantaneous measurement of full-vector ionospheric drift velocities from a single point to cover the entire altitude range of the radar, and to increase the operational time by a factor of at least four (from 12 to 50%), and possibly to full-time.

The facility will provide high-quality ionospheric and atmospheric parameters on an essentially continuous basis for both academic researchers and practical consumers, as well as providing near-instantaneous response capabilities for scientists and users who need data to study unusual and unpredicted disturbances and phenomena in the high-latitude ionosphere and atmosphere. While users will sometimes visit the facility to obtain maximum access and response from the system, it will be possible for the radar to be operated entirely remotely, and access to both the control and monitoring systems and the raw and processed data streams will be provided through secure network connections. Besides supporting data consumers for service driven applications (such as Space Weather effect forecasting), the new facility will support studies of such topics as: ion outflow to the magnetosphere, auroral acceleration, small scale plasma physics, induced changes in the ionosphere, magnetic reconnection, sub-storms, ionosphere-neutral atmosphere coupling, mesospheric physics, and solar wind acceleration.

Science Drivers

Interest in high latitude ionospheric physics has been concentrated for some years on the large scale convection processes occurring within the Polar Cap and at the poleward edge of the auroral oval. However, it has become increasingly clear that the processes which mediate even the largest scale effects are predominantly controlled by the physics of very small scale, rapid interactions, and this has led to a renewed interest in auroral electrodynamics and plasma physics.
Scientists now need to study short time-scale, small spatial scale targets and processes typified by:
• Quasi-coherent echoes from small plasma targets in the E and F regions,
• Controlled experiments on PMSE using the EISCAT Heater,
• Routine D-region / middle atmosphere incoherent scatter measurements,
• Statistics of E-region micrometeor head echoes for planetology,
• Instantaneous E fields at many altitudes along a whole magnetic field line.

Users also need to study ionospheric topside processes, which have proved very difficult in the European arctic sector, perhaps as a result of asymmetries in the Earth's magnetic field which appear to lead to relatively low levels of precipitation in this area, almost a mirror image of the situation in the vicinity of the better known South Atlantic Anomaly. For these studies, good observations are required of, for example:
• H/He/O ratios in the polar ionosphere
• "Polar wind" plasma outflow

System Requirements

For applications of this type, the existing EISCAT mainland radar systems have almost reached their limits:
• The present best cross-beam resolution is ≈1 km at 100 km altitude (UHF), whereas the required resolution is now <<100 m,
• The present best along-beam resolution is ≈300 m (set by the available transmitter bandwidth), whereas the required resolution is now <100 m,
• The present best temporal resolution in the D-region, lower E-region, and throughout the topside is of the order of many minutes, whereas it is clear that resolutions <10 s are often required to resolve the observed phenomena,
• True three-dimensional electric field measurements can currently only be made at one altitude at a time, and integration time constraints, as well as antenna motion times, result in little correlation between measurements made sequentially at different altitudes. Measurements made using the radars in monostatic modes suffer from essentially the same problem and cannot unambiguously differentiate between temporal and spatial effects when measuring rapid, small scale effects. Effective observations require the determination of true three-dimensional parameters over at least 5-10 altitudes simultaneously.

EISCAT_3D

The EISCAT_3D design study is intended to develop a feasible design for a new radar to address the needs of the European, and global, community for the next 15-20 years. The four-year design study, which started on May 1, 2005, is supported by the European Union under the Sixth Framework program, and European Union funding accounts for approximately half of the total cost of approximately 4 M€. The study is being conducted by a consortium led by the EISCAT Scientific Association, and including the University of Tromsø (Norway), Luleå Technical University (Sweden), and the Rutherford Appleton Laboratory (United Kingdom).
Within EISCAT, the effort has required the temporary diversion of more than 10% of staff effort from radar operations and maintenance. Completion of a successful design, and the timely availability of adequate construction funds (presently estimated to be in the region of 60 M€), could see substantial parts of the new system in place, and producing good scientific data, as early as 2012.

EISCAT_3D Design (current plan)

Overall configuration

Common transmitter facility with RX capabilities:
– Close to the present Tromsø (NO) EISCAT site
– Operating frequency in the (225-240) MHz range
– Power amplifiers utilising VHF TV power FETs
– Phased-array system with > 16 K elements, Ppk > 2 MW
– Actual antenna configuration and performance TBD
– > 3 outlier, RX-only array modules for interferometry
– Fully digital, post-sampling beam-forming on receive
– Comprehensive interferometric capabilities built-in

2 + 2 very large receive-only ("remote") arrays:
– Actual sites TBD, four promising sites investigated
– Filled apertures, long enough to provide ~1 km beam resolution at E-region altitudes above the transmitter
– Medium gain (~10 dBi) element antennas
– Fully digital, post-sampling beam-forming
– Sufficient local signal processing power to generate at least five simultaneous beams
– 10 Gb/s connections for data transfer and remote control and monitoring

Radar field-of-view (FOV)

The beam generated by the central core transmit/receive antenna array will be steerable to a maximum zenith angle of ≈ 40° in all azimuth directions. At 300 km altitude, the radius of the resulting field-of-view is approximately 200 km. In the N-S plane this corresponds to a latitudinal coverage of ± 1.80° relative to the transmitter site. The antenna arrays at the EISCAT_3D receiving facilities will be arranged to permit tri-static observations to be made throughout the central core FOV at all altitudes up to 800 km.

Beam steering

It will be possible to steer the beam from the central core TX/RX antenna array into any one of > 12000 discrete pointing directions, regularly distributed over its FOV and separated by on average 0.625° in each of two orthogonal planes. The beam steering system will operate on a < 500 μs timescale.

Central core parameters:

                               First phase    Fully instrumented
    Number of elements:        16000          30000
    Diameter [λ]:              87             116
    Element separation [λ]:    0.6            0.6
    P x A [GW m-2]:            91             295
    Half Power BW [degrees]:   0.62           0.46

For comparison, for the EISCAT VHF system in Mode 1 (full antenna, 3 MW): P x A = 2.4 GW m-2, HPBW = 0.6 x 1.7 degrees.

Transmitter parameters

    Centre frequency                      220-250 MHz, subject to allocation
    Peak output power                     at least 2 MW
    Instantaneous -1 dB power bandwidth   at least 5 MHz
    Pulse length                          0.5-2000 μs
    Pulse repetition frequency            0-3000 Hz
    Modulation                            arbitrary waveforms, limited only by power bandwidth

Receiver parameters

    Centre frequency              matching transmitter centre frequency
    Instantaneous bandwidth       ±15 MHz
    Overall noise temperature     better than 50 K, referenced to input terminals
    Spurious-free dynamic range   better than 70 dB

Sensor performance in incoherent scatter mode

Potential distribution in Northern Scandinavia

The EISCAT_3D Test Array ("Demonstrator")

A 200 m² filled array has been erected at the EISCAT Kiruna site to provide facilities for validating several critical aspects of the full-scale 3D "remote" (receive-only) array in practice under realistic climatic conditions. The array is oriented in the Tromsø-Kiruna plane and consists of 48 short, 6+6 element crossed Yagis at 55° elevation, providing coverage between ~200 and 800 km above Tromsø.
A centre frequency of 224 ± 3 MHz allows reception of transmissions from the existing Tromsø VHF system, and the achievable SNR is estimated to be sufficient for useful bistatic incoherent scatter observations (> 6% at 300 km, for electron densities of 10^11 m^-3).

The test array will be used to investigate:
– Receiver front ends, A/D conversion
– SERDES, copper/optical/copper conversion
– Time-delay beam-steering
– Simultaneous forming of multiple beams
– Adaptive pointing and (self-) calibration
– Adaptive polarisation matching
– Interferometry trigger processor
– Digital back-end/correlator for standard incoherent scatter
– Time-keeping

Test array at Kiruna.

KhanEtAl_MUB_ModularRecursiveKinDyn

Department of Mechanical Engineering, IIT Delhi
Jorge Angeles (angeles@cim.mcgill.ca)
Centre for Intelligent Machines, McGill University

Abstract. Constrained multibody systems typically possess multiple closed kinematic loops that constrain the relative motions and forces within the system. Typically, such systems possess far more articulated degrees-of-freedom (within the chains) than overall end-effector degrees-of-freedom. While this creates many possibilities for selecting the location of actuation within the articulations, the resulting constrained systems also feature mixtures of active and passive joints within the chains. The presence of these passive joints interferes with the effective modular formulation of the dynamic equations of motion in terms of a minimal set of actuator coordinates, as well as with the subsequent recursive solution for both forward and inverse dynamics applications. Thus, in this paper, we examine the development of modular and recursive formulations of the equations of motion in terms of a minimal set of actuated-joint coordinates for such constrained multibody systems, exemplified by exactly-actuated parallel manipulators. The 3-RRR planar parallel manipulator, selected to serve as a case study, is an illustrative example of a multi-loop, multi-degree-of-freedom system with mixtures of active/passive joints. The concept of the decoupled natural orthogonal complement (DeNOC) is combined with the spatial parallelism inherent in parallel mechanisms to develop a dynamics formulation that is both recursive and modular. An algorithmic approach to the development of both forward and inverse dynamics is highlighted. The presented simulation studies highlight the overall good numerical behavior of the developed formulation, both in terms of accuracy and lack of formulation stiffness.

Keywords: recursive kinematics, modular dynamics, decoupled natural orthogonal complement, parallel manipulators.

14    Reviews in Mineralogy & Geochemistry, Vol. 58, pp. 375-387, 2005
Copyright © Mineralogical Society of America    1529-6466/05/0058-0014$05.00    DOI: 10.2138/rmg.2005.58.14

Exploiting 3D Spatial Sampling in Inverse Modeling of Thermochronological Data

Kerry Gallagher 1, John Stephenson 1, Roderick Brown 2, Chris Holmes 3, Pedro Ballester 1

1 Dept. of Earth Sciences and Engineering, Imperial College London, South Kensington, London, SW7 2AS, England
2 Division of Earth Sciences, Gregory Building, University of Glasgow, Glasgow, G12 8QQ, Scotland
3 Dept. of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, England

INTRODUCTION

The development of quantitative models for fission track annealing (Laslett et al. 1987; Carlson 1990; Laslett and Galbraith 1996; Ketcham et al. 1999) and, more recently, helium diffusion in apatite (Wolf et al. 1996; Farley 2000) has allowed direct inference of the temperature history of the host rocks, and a more indirect inference of denudation chronologies (see Kohn et al. this volume, and references therein). An example of a model prediction of AFT parameters and (U-Th)/He age for a specified thermal history is given in Figure 1. Various approaches exist to extract a thermal history model directly from the data, and these focus around inverse modeling (Corrigan 1991; Gallagher 1995; Issler 1996; Willett 1997; Ketcham et al. 2000). The user specifies some constraints on the thermal history (e.g., upper and lower bounds on temperature and time, and the heating/cooling rate), and then typically some form of stochastic sampling is adopted to infer either the most likely thermal history (ideally with some measure of the uncertainty of the solution), and/or a family of acceptable thermal histories. In both the forward and inverse approaches, the thermal history is typically parameterized as nodes in time-temperature space, with some form of interpolation between the nodes.

Over recent years, one of the major applications of low temperature thermochronology has been the study of long term denudation as recorded in the cooling history of surface samples. More recently, some studies have specifically tried to link relatively short term, local estimates of denudation (e.g., from cosmogenic surface exposure dating) to these longer term estimates (Cockburn et al. 2000; Brown et al. 2001; Reiners et al. 2003). The step from the thermal history to denudation chronology is less direct than inferring the thermal history from the data, in that we need to make some assumptions in order to convert temperature to depth. This may involve an assumption that a 1D steady state with a constant temperature gradient over time is appropriate, or alternatively that a full 3D diffusion-advection model is required. The latter situation is not particularly amenable to an inversion approach, although recent applications have been made with a restricted number of parameters, to identify plausible solutions to relatively specific questions, such as the timing of relief development (Braun 2005). In practice, heat transfer can vary spatially (in both horizontal and vertical dimensions) as a consequence of variations in thermal properties, in the mode of heat transfer (conduction and advection), and spatial variations in erosion rate and surface topography.

In the simplest case, assuming a constant gradient, it is common to adopt a "representative" geotherm of around 25–30 °C/km and to additionally specify that this is constant over time.
Often, we do not know the present day gradient in crystalline basement areas, although it may be possible to adopt a local value from a global heat flow database (Pollack et al. 1993), which may also require assumptions regarding the thermal conductivity of the material that has been removed by erosion. The role of thermal conductivity is often neglected, but the importance lies in the fact that thermal conductivities of common rocks can vary by a factor of 2–3 (Somerton 1992), and so for a constant heat flow, the geothermal gradient will vary by a similar factor. However, the thermal conductivity of rocks is reasonably predictable, in terms of lithology, mineralogy and porosity. Consequently, if it is possible to infer the nature of the eroded material, then it is possible to make an informed judgement of the thermal conductivity.

Here we consider some aspects related to inverse modeling of the thermal history and describe a strategy which aims to identify good, but simple, thermal history models. Such models are found by jointly fitting data from multiple samples, rather than taking each data set independently. The method has been developed to identify spatial variations in the thermal history, particularly in the context of identifying boundaries (such as faults), across which the thermal history may vary significantly.

Figure 1. A typical forward model—the thermal history is specified, and having chosen an annealing/diffusion model, we can predict the apatite fission track parameters (age, length distribution) and (U-Th)/He data. PRZ and PAZ are the partial retention zone and partial annealing zone, over which the He and AFT systems are most sensitive on geological timescales. [Panels: input thermal history (temperature vs. time in Ma); predicted track length distribution (μm); observed vs. predicted AFT and He ages (Ma); annotations give the Laslett et al. annealing model and Durango apatite He diffusion parameters (spherical grain radius, ejection distance, Ea, Do).]

What is a good but simple thermal history model?

Although it is relatively straightforward to find a thermal history that fits the observed thermochronological data, a more difficult, but significant, stage in modeling is to understand how good this thermal history is. Intuitively, we can argue a good model is one that fits the observations satisfactorily without being overly complex, i.e., having structure that is not supported or required by the available data. Thus, the important criteria are the measure of the data fit and also a measure of the model complexity. For fission track data, a natural choice of data fit is the log likelihood function given by Gallagher (1995). This is defined in terms of the observed spontaneous and induced track counts, N_s^j and N_i^j, for each crystal j of a total of N_c, and the N_t individual track length measurements, l_k, k = 1, N_t, and is given as

    L = \sum_{j=1}^{N_c} \left\{ N_s^j \ln(\theta) + N_i^j \ln(1-\theta) \right\} + \sum_{k=1}^{N_t} \ln[P(l_k)]    (1)

where θ is a function of the predicted spontaneous and induced track densities (ρ_s, ρ_i), given as

    \theta = \frac{\rho_s}{\rho_s + \rho_i}    (2)

P(l_k) is the probability of having a track of length l_k in the observed distribution, given that we have predicted the track length distribution for a particular thermal history (for details see Gallagher 1995).
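Equations (1)–(2) can be evaluated directly once an annealing model has produced predicted track densities and a predicted length distribution. The sketch below is illustrative only (made-up numbers; the predicted length distribution is assumed to be supplied as a discretized probability density over length bins):

```python
import numpy as np

def aft_log_likelihood(Ns, Ni, lengths, rho_s, rho_i, length_bins, length_pdf):
    """Eq. (1)-(2): Ns, Ni are per-grain spontaneous/induced counts; lengths are the
    measured track lengths; length_pdf is the predicted, discretized length pdf."""
    theta = rho_s / (rho_s + rho_i)                                   # Eq. (2)
    count_term = np.sum(Ns * np.log(theta) + Ni * np.log(1.0 - theta))
    # P(l_k): probability of each measured length under the predicted distribution
    k = np.clip(np.digitize(lengths, length_bins) - 1, 0, len(length_pdf) - 1)
    length_term = np.sum(np.log(length_pdf[k] + 1e-12))
    return count_term + length_term                                   # Eq. (1)

# Toy usage with made-up numbers
Ns = np.array([12, 30, 21]); Ni = np.array([40, 95, 60])              # per-grain counts
lengths = np.array([13.1, 14.2, 12.7, 14.8])                          # measured lengths (um)
bins = np.linspace(0.0, 20.0, 41)                                     # 0.5 um bins
pdf = np.exp(-0.5 * ((0.5 * (bins[:-1] + bins[1:]) - 14.0)) ** 2)     # assumed predicted pdf
pdf /= pdf.sum()
print(aft_log_likelihood(Ns, Ni, lengths, rho_s=1.2e6, rho_i=3.5e6,
                         length_bins=bins, length_pdf=pdf))
```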
A common form of likelihood function, probably appropriate for (U-Th)/He dating, is based on a sum of squares statistic between observed and predicted ages, weighted by the error, which for N (U-Th)/He ages is given as

    L = -\frac{1}{2} \sum_{i=1}^{N} \left( \frac{t_{obs}^{i} - t_{pred}^{i}}{\sigma_i} \right)^{2}    (3)

where t_obs and t_pred are the observed and predicted He ages. This form, interpreted as a log-likelihood, implicitly assumes normally distributed errors.

In practice, the log-likelihood is a negative number, and we look for the thermal history that produces the maximum value of the log-likelihood (i.e., closest to zero). This is equivalent to the thermal history that has the maximum probability of producing the observed data. It is clear from Equation (1) that the value of the log-likelihood will depend on the number of data. As already alluded to above, the likelihood value will also depend on the complexity of the model, in that we expect a model with more parameters to provide a better fit to the observed data. However, the issue then is whether the improvement in the data fit is sufficient to justify the additional model parameters. One straightforward way of assessing this is through the Bayesian Information Criterion (Schwartz 1978), which is defined for a model, m_i, as

    BIC(m_i) = -2L(m_i) + \nu_{m_i} \log(N)    (4)

where L is the log-likelihood, ν_{m_i} is the number of model parameters in the current model, and N is the number of data (observations). The second term in Equation (4) penalizes the improvement in the data fit as a consequence of increasing the complexity of the model. If we consider two models, m_1 and m_2, where m_2 has more model parameters than m_1, then if BIC(m_1) < BIC(m_2), we infer that model m_1 is preferable to m_2. The BIC is useful for model choice when we use the same number of data for all models, and an implicit assumption is that the true model is contained in all the models we consider.

Figure 2 shows three thermal history models inferred from the same set of synthetic data in which the parameterization captures the true model (which has 5 parameters, 3 temperatures and 2 times—we know the present day time). The first model is under-parameterized (3 parameters), and the third model is over-parameterized (7 parameters). As we expect, the 7 parameter model provides the best fit to the data, but the BIC implies that the improvement over the 5 parameter model is not significant, while the 5 parameter model is significantly better than the 3 parameter model. Heuristically, we could infer this by looking at the form of the thermal histories. The 7 parameter model does not really introduce any significantly new features compared with the 5 parameter model, in that the extra time-temperature node effectively falls on the cooling trajectory from the maximum temperature to the present day for the 5 parameter model.

Figure 2. Thermal histories (black lines) derived from fitting AFT synthetic data; ν is the number of model parameters (time and temperature nodes). The original thermal history has ν = 5 and is shown as the grey line (BIC = 1129.4). The shaded regions are the approximate 95% confidence regions (see Gallagher 1995 for details). L is the log-likelihood, and although the model with ν = 7 yields the maximum likelihood, the BIC of 1140.0 implies the improvement over the model with ν = 5 does not warrant the extra model parameters. The model with ν = 3 has BIC = 1142.7.
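The model-choice rule in Equation (4) is simple to apply in code; the sketch below uses made-up likelihood values (not those of Figure 2) and prefers the model with the lowest BIC:

```python
import numpy as np

def bic(log_likelihood, n_params, n_data):
    # Eq. (4): BIC = -2L + k*log(N)
    return -2.0 * log_likelihood + n_params * np.log(n_data)

n_data = 150                           # e.g. track counts plus length measurements
candidates = {                         # (log-likelihood, number of t-T parameters), toy values
    "3-parameter history": (-480.0, 3),
    "5-parameter history": (-470.5, 5),
    "7-parameter history": (-469.8, 7),
}
scores = {name: bic(L, k, n_data) for name, (L, k) in candidates.items()}
for name, score in scores.items():
    print(f"{name}: BIC = {score:.1f}")
print("preferred model:", min(scores, key=scores.get))   # extra parameters must earn their keep
```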
Another aspect of modeling that is relevant to the approach advocated in this contribution is the role of the number of data used. Although there is likely to be redundancy in thermochronological data, incorporating more data to constrain a thermal history model generally leads to less variability in the acceptable solutions, or smaller confidence regions about the inferred thermal history. This is illustrated in Figure 3, where 2 synthetic data sets were generated by sampling the predicted parameters for the same thermal history shown in Figure 2. We calculated the 95% confidence regions about the best thermal history using the methods outlined in Gallagher (1995), and the results show that the inferred thermal history is better constrained with a larger amount of data than with relatively few data. This then implies that if we can group the data from multiple samples and model them jointly with a common thermal history, then the resolution of the inferred thermal history should be better than if we model each sample independently. The likelihood function for the collective samples is just the sum of the log-likelihoods for the individual samples, each calculated using the same, common thermal history. This approach will also tend to produce simpler models, as there will be a degree of compromise in jointly fitting multiple data sets.

Figure 3. Inferred thermal histories based on different amounts of AFT data. The left panel has 10 track lengths and 5 single grain ages, the central panel 200 lengths and 30 single grain ages, and the right panel 500 lengths and 50 single grain ages. The absolute value of the likelihood depends on the number of data. The approximate 95% confidence regions are based on differences in the likelihood and, when more data are used, these are smaller (i.e., the thermal history is better resolved).

The philosophy behind our preferred strategy to modeling thermochronological data can be summarized as follows: we aim to incorporate multiple data sets into a common model, and try to find the simplest thermal history models that can satisfy the observed data. The BIC can be used to address the second aspect, but a remaining issue in addressing the first aspect is how best to group data together from a suite of irregularly distributed spatial samples. In some cases, there are natural groupings. For example, a suite of samples from a borehole, or vertical profiles, in which a suite of samples is collected over a range of elevation at effectively one location. In these cases, the spatial relationship between the samples is the vertical offset, and this can be regarded as a 1D geometry (i.e., the vertical dimension). Provided there have not been thermal perturbations within the section (e.g., due to fluid flow or faulting), this can be directly translated into a temperature offset, such that samples at great depth in a borehole, or shallower elevation for a vertical profile, will have been at higher temperatures than the shallower depth (or higher elevation) samples. In practice, the thermochronological definition of a vertical profile does not require the samples to be aligned vertically, but does imply that the dominant direction of heat transfer is vertical. This means that factors such as spatial variations in thermal properties, erosion rate or surface topography, leading to lateral heat transfer, do not significantly influence the thermal history of the samples (vertical profile) being considered. When dealing with sedimentary basins, there may also be complications in terms of preserving provenance signatures (reflecting the pre-depositional thermal history of detrital grains), which can complicate the inference of the post-depositional thermal history (Carter and Gallagher 2004).
However, it is straightforward to allow for this, by incorporating extra model parameters to account for the pre-depositional thermal history (which may or may not be specified to be independent between samples).

When dealing with samples irregularly distributed in two spatial dimensions (e.g., latitude and longitude), there is not such an obvious way to group samples in order to share a common thermal history. One approach, that underlies the geostatistical method of kriging (e.g., Isaaks and Srivastava 1989), is to assume that samples which are close in space will have experienced similar thermal histories. As the distance between samples becomes greater, this requirement is relaxed. One problem with this assumption is that two nearby samples may be separated by a fault (which is a spatial discontinuity). Furthermore, such a fault may or may not have been active over the time span of the thermal history retrievable from the data, or the presence of a discontinuity may not have even been recognized. The most general case, in 3D, incorporates the features of the 1D vertical offset case, and the irregularly distributed 2D samples, which may be separated by unknown discontinuities.

In the next sections, we review a general strategy to deal with these situations, and demonstrate the application to synthetic apatite fission track data, although the basic approach is completely general in terms of application to other thermochronological systems, and to combinations of different types of data, provided suitable likelihood functions can be defined. In all cases, we parameterize the thermal history as a series of time-temperature nodes, and specify bounds on the possible values of the temperature and time as described earlier. To find the thermal history models, we use stochastic sampling methods, primarily genetic algorithms (GA) and Markov chain Monte Carlo (MCMC). The former method is an efficient optimizer, i.e., for rapidly identifying the better data-fitting models. The latter method provides reliable estimates of the joint and marginal probability density functions of the model parameters, from which it is possible to examine correlation between parameters, and to quantify the uncertainty in terms of, for example, the 95% credible range on individual model parameters. More detail on the methodology and applications to real data sets can be found in Gallagher et al. (2005) and Stephenson et al. (2005). We first consider the 1D vertical profile case, then the 2D case with an unknown number of spatial discontinuities, and finally demonstrate the generalization to 3D.

1D modeling

Here we want to exploit the spatial relationship of samples in the vertical dimension, in which we implicitly assume the lowermost sample was always the hottest and the uppermost sample was always the coolest.
The situation we consider is shown in Figure 4, where samples are collected from a vertical profile (e.g., a borehole or up the side of a valley). We specified a thermal history and generated synthetic data for a suite of such samples. These "synthetic samples" were first modeled independently and then modeled jointly. In the second case, the parameters for the thermal history model were specified in the same way as adopted for modeling the samples independently, with additional model parameters which deal with the temperature offset between the upper and lower samples. We consider two cases, one with a constant temperature offset and one with a time-varying offset. In both cases, we choose the palaeo-offset to be independent of the present day offset, as many vertical profiles are collected on surface samples which are often at similar present day temperatures (or the temperature offset is effectively the atmospheric temperature lapse rate, typically 5–6 °C/km).

The results are shown in Figure 5. Modeling the samples independently leads to a better log-likelihood (L = −7514.60), as we expect, but 72 model parameters are required (9 parameters for each of the 8 samples). There are some common features, such as the rapid cooling recorded in the deepest samples, but generally the individual thermal histories show little coherence. When treating the samples jointly, and assuming a constant temperature offset, the inferred thermal history model is much simpler, with only 11 model parameters, and the 95% credible regions about each time-temperature point imply the thermal history is well resolved.

Figure 4. A vertical profile is obtained by sampling from different depths in a borehole, or different elevations in a valley. The distribution of fission track age and mean track length with elevation is characteristic of the thermal history.

Figure 5. (a) Results for modeling a synthetic vertical profile. The "observed" data and the predictions are shown as symbols and lines, respectively, as a function of elevation. The solid lines are the predictions when modeling each sample independently, and the dashed lines are the predictions when the samples are modeled jointly (on this scale there is no difference between the predictions from the models shown in panels c and d). (b) Inferred thermal histories when modeling the samples independently, labeled according to the present day elevation (BIC = 15569.5). The true thermal histories for the uppermost and lowermost samples are shown as the dashed lines, and those of the intermediate samples are all parallel to these. (c) Thermal histories inferred by modeling the samples jointly, with a constant temperature offset over time (BIC = 15161.9). The grey shaded areas around each time-temperature point are the distributions obtained from MCMC sampling and approximate the 95% confidence regions. The lighter grey regions (around the lower temperature thermal history) incorporate the uncertainty on the temperature offset between the 2 thermal histories. (d) Thermal histories inferred by modeling the samples jointly, but allowing the temperature offset to vary over time (BIC = 15175.9). This model is only marginally better than the constant offset model in terms of the likelihood, but the extra model parameters are not justified when assessed with the BIC.
While we do not fit the data quite as well (L = −7539.60), the BIC tells us that this simpler model is readily acceptable (the difference in log-likelihood is 25). In fact the difference in the log-likelihood would need to be about an order of magnitude greater before we reject the simpler model. Allowing for a variable temperature gradient over time produces a slightly better model (L = −7539.04), but the incorporation of the 3 extra model parameters is not warranted, based on the BIC.

As part of the model formulation, we infer the temperature offset over time, which then gives an estimate of the temperature gradient directly from the thermochronological data. Moreover, as we use MCMC to characterize the model parameter space, we also obtain the probability distribution on the temperature gradient (Fig. 6). As mentioned earlier, the temperature gradient is a key requirement to convert the thermal history to an equivalent depth or denudation chronology. The probability distributions on the thermal history and the temperature offset can be readily sampled to construct the probability distribution of the denudation estimates. If the temperature gradient changed over time, as a consequence of rapid denudation, and there is information on this in the thermochronological data, then this approach should extract that information (and its uncertainty). However, from our experience, introducing a time-varying offset tends to introduce too much variation, and we would certainly recommend exploring whether the difference in data fit in comparison with a constant offset model is justified.

2D modeling

In this situation, the spatial relationship is nearness, i.e., samples close together are likely to have similar thermal histories. As mentioned earlier, in the real world there are discontinuities (e.g., faults). The problem then is how to group samples spatially, allowing for the presence of unknown discontinuities. Here we classify the samples into different sub-groups defined by discrete spatial regions, or partitions, such that the thermal history is the same within a given partition, but varies between partitions. Also, we do not know how many partitions we should look for, i.e., one of the unknown parameters is the number of partitions. The problem as formulated here does not allow for lateral variations in the thermal history within a partition, although this is not a major problem to implement (it just requires some form of interpolation across a partition). Another requirement is that samples do not move laterally relative to each other, which may limit the application to active mountain belts involving large scale lateral transfer (e.g., the Southern Alps of New Zealand).

This is solved with a form of Bayesian Partition Modeling (BPM), which is more formally described by Denison et al. (2002). In essence, BPM provides a method for spatial clustering of different samples, according to the spatial structure of the data. In our case, we have an additional complication in that we are interested in spatial clustering based on the thermal history inferred from the data for particular samples. The 2D space is parameterized with a Voronoi tessellation (Okabe et al. 2000), a set of polygonal regions each defined by an internal point, such that any sample location falling within a given Voronoi cell is closer to that cell's internal point than to any other. The boundaries of the Voronoi cells are drawn as the perpendicular bisectors of the lines joining the internal points of adjacent cells (Fig. 7).
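A minimal sketch of this cell-membership rule, with purely hypothetical coordinates: each sample is assigned to the nearest internal point (centre), and samples sharing a label are assumed to share a common thermal history.

    import numpy as np

    def assign_partitions(sample_xy, centre_xy):
        # Assign each sample to the Voronoi cell whose centre is nearest.
        # sample_xy: (n_samples, 2) coordinates; centre_xy: (n_partitions, 2).
        d2 = ((sample_xy[:, None, :] - centre_xy[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    # Hypothetical example: 15 samples in a 100 x 100 region, 3 trial centres.
    rng = np.random.default_rng(0)
    samples = rng.uniform(0.0, 100.0, size=(15, 2))
    centres = np.array([[25.0, 25.0], [75.0, 30.0], [50.0, 80.0]])
    labels = assign_partitions(samples, centres)

Note that the cell boundaries never need to be constructed explicitly for this step; nearest-centre assignment is equivalent to the Voronoi partition.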
It is the boundaries of the partitions that are our proxy for geological discontinuities, such as faults, where the thermal history may change rapidly over a small distance.

Figure 6. Distributions on the temperature gradient for the model shown in Figure 5c, which assumes a constant palaeogradient. The true solution for both the present and palaeogradient is ~28.6 °C km−1. These distributions can be sampled to produce uncertainty estimates for denudation.

The implementation of BPM we adopt uses a dimension-changing version of MCMC, known as Reversible Jump (RJ) MCMC (Green 1995), as we need to deal with an unknown number of partitions. In this approach, we can specify a priori the minimum and maximum number of partitions we allow. The maximum possible range is from 1 to the number of samples, but we typically set the maximum to a value less than the number of samples; otherwise, we could just model all the samples independently. In order to deal with the unknown thermal histories in each partition, we use the GA described by Ballester and Carter (2004) to find the optimal thermal history within each partition for a given partition configuration generated during the MCMC run. When a given partition configuration is repeated during the MCMC (in that the sample groupings have previously been considered), we take the earlier best GA thermal history model for the partitions in that configuration. This particular approach can lead to the algorithm becoming somewhat static and sub-optimal. However, we can modify the algorithm to run another MCMC run on the thermal history in each partition, for a given partition configuration, which improves the combined sampling of the model space for the thermal histories and partitions (Stephenson et al. 2005); a simplified sketch of such a dimension-changing move is given below, after the caption to Figure 7. The examples we consider in this paper are based on synthetic data, and here the resolution on the thermal history is not our primary objective. Rather, we want to demonstrate the concept of implementing the partition model approach to irregularly distributed spatial samples.

Figure 8 shows the method applied to a 3-partition problem, with 15 sample locations, where the RJ-MCMC was run allowing for up to 7 partitions. The results show that we can recover the correct number of partitions with high probability, with the correct allocation of samples to each partition, and also a good representation of the thermal history within each partition. Note that here we chose thermal histories for each partition that are distinct and relatively easy for the method to identify, as our aim is to demonstrate the ability of the methodology to identify the form and spread of the inferred partition structure. The spread in the solutions for 3 partitions is also an indication of the uncertainty in the location of the partition boundaries. With the implementation we have used here, all partition geometries that correctly allocate the samples will use the same thermal histories (and so have the same likelihood). It is then clear that the range of boundary locations is determined by the sample locations, subject to the requirement that the boundaries are straight. Thus, in the top right of Figure 8, there is a relatively large spread in the location of the boundaries, as there are no sample locations there, but the spread is constrained by the samples around x = 50 and y = 70–80. Similarly, the
Figure 7. The geometry of 2D Voronoi cells and their centres (indicated by the stars). The boundaries of each cell are defined as the perpendicular bisectors of the lines joining its centre to all other centres. Any sample location (filled circles) that falls within a given cell is closer to the centre of that cell than to any other centre. The linear boundaries are used in our modeling approach to characterize spatial discontinuities, although their number and positions are unknown.
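For illustration only, the Python sketch below shows the shape of one such dimension-changing (birth/death) move on the set of Voronoi centres, as referred to above. It assumes uniform priors on both the number of centres and their positions, with birth proposals drawn from the same uniform distribution as the prior, so that the prior and proposal density terms of the reversible-jump acceptance ratio cancel and only the likelihood ratio remains; log_likelihood_fn is a placeholder which, in the scheme described in the text, would itself involve finding (e.g., with the GA) the best thermal history for each partition of the proposed configuration. The full algorithm (Green 1995; Stephenson et al. 2005) includes terms and move types omitted here.

    import numpy as np

    def rj_partition_step(centres, log_l, log_likelihood_fn, lo, hi,
                          k_min, k_max, rng):
        # One simplified birth/death move on the number of Voronoi centres.
        k = len(centres)
        birth = (rng.random() < 0.5 and k < k_max) or k <= k_min
        if birth:
            # Birth: propose a new centre drawn uniformly over the region.
            proposal = np.vstack([centres, rng.uniform(lo, hi, size=2)])
        else:
            # Death: remove one centre chosen uniformly at random.
            proposal = np.delete(centres, rng.integers(k), axis=0)
        log_l_new = log_likelihood_fn(proposal)
        # Under the simplifying assumptions stated above, acceptance reduces to
        # the likelihood ratio. (At k_min or k_max the forced move type would,
        # in a full implementation, contribute an extra proposal-ratio term.)
        if np.log(rng.random()) < log_l_new - log_l:
            return proposal, log_l_new
        return centres, log_l

    # Usage sketch: iterate rj_partition_step many times, recording the number
    # of centres and the sample allocations visited, to approximate their
    # posterior distributions.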
