图像融合技术中英文对照外文翻译文献


摄影测量外文文献及译文


数字摄影测量和激光扫描在文化遗产文档中的一体化

叶海亚·奥尔莎娃卜克赫*、诺伯特·哈拉
德国斯图加特大学摄影测量学院,斯图加特兄妹大街24D,D-70174,yahyashaw@ifp.uni-stuttgart.de
委员会V,第4工作组

关键词:文化遗产,摄影测量,激光扫描,融合,线性表面特征,半自动化

摘要:本文对结合地面激光扫描和近景摄影测量进行文物古迹文档记录的潜力进行了讨论。

除了改进几何建模之外,一体化的目的还在于提升历史场景中诸如边缘和缝隙等线性特征的视觉质量。

虽然激光扫描仪提供了非常丰富的表面细节,但它并没有提供足够的数据来勾勒被扫描物体所有表面特征的轮廓,即使它们在现实中都有清晰的轮廓。

物体边缘和线性特征的信息则依赖于基于图像的分析。

为此,一个基于图像数据的集成分割过程将支持从激光数据中提取对象的几何信息。

该方法采用基于图像的半自动化技术来填补激光扫描数据中的空白并补充新的细节,以便对场景体量建立更真实的感知。

实验研究与实施基于从艾尔卡兹尼宝库(约旦佩特拉的一处著名古迹)获得的数据。

1、绪论

在文物古迹的文档中经常需要生成历史建筑物的三维模型。

例如用于旅游目的,或为学生和研究人员提供教育资源。

在这些模型生成过程中,对高几何精度、所有信息的可用性、模型尺寸大小和影像写实效率的要求,必须通过不同的数据收集方法来满足[埃尔-哈基姆等人,2002]。

近景摄影测量是一个在遗产文档的背景下经常采用且广为接受的技术[格鲁恩等人,2002和德贝韦茨,1996]。

在过去的十年中,这些传统的陆地方法也受益于这样一个事实:用单架相机进行数字图像采集现在是可行的。

因此,摄影测量数据采集的效率可以通过基于数字图像处理的自动工具的集成大大改善。

此外,激光扫描已成为三维数据采集的标准工具,用于为文化遗产和历史建筑生成高质量的3D模型[伯勒尔和马布斯,2002]。

这些系统根据反射光脉冲的传播时间,能够快速、可靠地生成数以百万计的三维点。

图像的分割和配准中英文翻译


外文文献资料翻译:李睿钦指导老师:刘文军Medical image registration with partial dataSenthil Periaswamy,Hany FaridThe goal of image registration is to find a transformation that aligns one image to another. Medical image registration has emerged from this broad area of research as a particularly active field. This activity is due in part to the many clinical applications including diagnosis, longitudinal studies, and surgical planning, and to the need for registration across different imaging modalities (e.g., MRI, CT, PET, X-ray, etc.). Medical image registration, however, still presents many challenges. Several notable difficulties are (1) the transformation between images can vary widely and be highly non-rigid in nature; (2) images acquired from different modalities may differ significantly in overall appearance and resolution; (3) there may not be a one-to-one correspondence between the images (missing/partial data); and (4) each imaging modality introduces its own unique challenges, making it difficult to develop a single generic registration algorithm.In estimating the transformation that aligns two images we must choose: (1) to estimate the transformation between a small number of extracted features, or between the complete unprocessed intensity images; (2) a model that describes the geometric transformation; (3) whether to and how to explicitly model intensity changes; (4) an error metric that incorporates the previous three choices; and (5) a minimization technique for minimizing the error metric, yielding the desired transformation.Feature-based approaches extract a (typically small) number of corresponding landmarks or features between the pair of images to be registered. The overall transformation is estimated from these features. Common features include corresponding points, edges, contours or surfaces. These features may be specified manually or extracted automatically. Fiducial markers may also be used as features;these markers are usually selected to be visible in different modalities. Feature-based approaches have the advantage of greatly reducing computational complexity. Depending on the feature extraction process, these approaches may also be more robust to intensity variations that arise during, for example, cross modality registration. Also, features may be chosen to help reduce sensor noise. These approaches can be, however, highly sensitive to the accuracy of the feature extraction. Intensity-based approaches, on the other hand, estimate the transformation between the entire intensity images. Such an approach is typically more computationally demanding, but avoids the difficulties of a feature extraction stage.Independent of the choice of a feature- or intensity-based technique, a model describing the geometric transform is required. A common and straightforward choice is a model that embodies a single global transformation. The problem of estimating a global translation and rotation parameter has been studied in detail, and a closed form solution was proposed by Schonemann. Other closed-form solutions include methods based on singular value decomposition (SVD), eigenvalue-eigenvector decomposition and unit quaternions. One idea for a global transformation model is to use polynomials. For example, a zeroth-order polynomial limits the transformation to simple translations, a first-order polynomial allows for an affine transformation, and, of course, higher-order polynomials can be employed yielding progressively more flexible transformations. 
For example, the registration package Automated Image Registration (AIR) can employ (as an option) a fifth-order polynomial consisting of 168 parameters (for 3-D registration). The global approach has the advantage that the model consists of a relatively small number of parameters to be estimated, and the global nature of the model ensures a consistent transformation across the entire image. The disadvantage of this approach is that estimation of higher-order polynomials can lead to an unstable transformation, especially near the image boundaries. In addition, a relatively small and local perturbation can cause disproportionate and unpredictable changes in the overall transformation. An alternative to these global approaches are techniques that model the global transformation as a piecewise collection of local transformations. For example, the transformation between each local region may bemodeled with a low-order polynomial, and global consistency is enforced via some form of a smoothness constraint. The advantage of such an approach is that it is capable of modeling highly nonlinear transformations without the numerical instability of high-order global models. The disadvantage is one of computational inefficiency due to the significantly larger number of model parameters that need to be estimated, and the need to guarantee global consistency. Low-order polynomials are, of course, only one of many possible local models that may be employed. Other local models include B-splines, thin-plate splines, and a multitude of related techniques. The package Statistical Parametric Mapping (SPM) uses the low-frequency discrete cosine basis functions, where a bending-energy function is used to ensure global consistency. Physics-based techniques that compute a local geometric transform include those based on the Navier–Stokes equilibrium equations for linear elastici and those based on viscous fluid approaches.Under certain conditions a purely geometric transformation is sufficient to model the transformation between a pair of images. Under many real-world conditions, however, the images undergo changes in both geometry and intensity (e.g., brightness and contrast). Many registration techniques attempt to remove these intensity differences with a pre-processing stage, such as histogram matching or homomorphic filtering. The issues involved with modeling intensity differences are similar to those involved in choosing a geometric model. Because the simultaneous estimation of geometric and intensity changes can be difficult, few techniques build explicit models of intensity differences. A few notable exceptions include AIR, in which global intensity differences are modeled with a single multiplicative contrast term, and SPM in which local intensity differences are modeled with a basis function approach.Having decided upon a transformation model, the task of estimating the model parameters begins. As a first step, an error function in the model parameters must be chosen. This error function should embody some notion of what is meant for a pair of images to be registered. Perhaps the most common choice is a mean square error (MSE), defined as the mean of the square of the differences (in either feature distance or intensity) between the pair of images. This metric is easy to compute and oftenaffords simple minimization techniques. A variation of this metric is the unnormalized correlation coefficient applicable to intensity-based techniques. 
This error metric is defined as the sum of the point-wise products of the image intensities, and can be efficiently computed using Fourier techniques. A disadvantage of these error metrics is that images that would qualitatively be considered to be in good registration may still have large errors due to, for example, intensity variations, or slight misalignments. Another error metric (included in AIR) is the ratio of image uniformity (RIU) defined as the normalized standard deviation of the ratio of image intensities. Such a metric is invariant to overall intensity scale differences, but typically leads to nonlinear minimization schemes. Mutual information, entropy and the Pearson product moment cross correlation are just a few examples of other possible error functions. Such error metrics are often adopted to deal with the lack of an explicit model of intensity transformations .In the final step of registration, the chosen error function is minimized yielding the desired model parameters. In the most straightforward case, least-squares estimation is used when the error function is linear in the unknown model parameters. This closed-form solution is attractive as it avoids the pitfalls of iterative minimization schemes such as gradient-descent or simulated annealing. Such nonlinear minimization schemes are, however, necessary due to an often nonlinear error function. A reasonable compromise between these approaches is to begin with a linear error function, solve using least-squares, and use this solution as a starting point for a nonlinear minimization.译文:部分信息的医学图像配准Senthil Periaswamy,Hany Farid图像配准的目的是找到一种能把一副图像对准另外一副图像的变换算法。
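As a concrete illustration of the feature-based, least-squares route described above, the following sketch estimates a 2-D affine transformation from corresponding landmark points, which is exactly the case where the error function is linear in the unknown model parameters and a closed-form least-squares solution exists. This is only a minimal Python/NumPy example under the assumption of a single global affine model; the function name `estimate_affine_2d` and the toy data are illustrative, not the paper's implementation.

```python
import numpy as np

def estimate_affine_2d(src_pts, dst_pts):
    """Least-squares estimate of a 2-D affine transform mapping src_pts to dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding landmarks, N >= 3.
    Returns a 2x3 matrix [A | t] such that dst ~= A @ src + t.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = src.shape[0]
    # Each correspondence contributes two rows to the linear system X * params = y.
    X = np.zeros((2 * n, 6))
    X[0::2, 0:2] = src     # dst_x = a11*x + a12*y + tx
    X[0::2, 4] = 1.0
    X[1::2, 2:4] = src     # dst_y = a21*x + a22*y + ty
    X[1::2, 5] = 1.0
    y = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    a11, a12, a21, a22, tx, ty = params
    return np.array([[a11, a12, tx],
                     [a21, a22, ty]])

# Toy usage: recover a known affine map from noisy landmarks.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(10, 2))
A_true = np.array([[1.05, 0.10, 3.0],
                   [-0.08, 0.95, -2.0]])
dst = src @ A_true[:, :2].T + A_true[:, 2] + rng.normal(0, 0.1, size=src.shape)
print(estimate_affine_2d(src, dst))
```

For non-linear error functions (e.g. higher-order polynomial or intensity-based metrics), such a closed-form estimate is typically used only as the starting point for an iterative minimization, as the text notes.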

image fusion


基于KSVD的图像融合

1.研究背景及意义

随着科学技术不断进步,数字图像处理技术已经被广泛应用于复杂工业生产线、卫生医疗、智能交通、公共安全监控以及卫星遥感等诸多领域。

数字图像处理不仅拓宽人类视觉范围,同时也为人类生产生活提供一项重要工具。

随着数字图像处理技术的发展,研究内容不断拓宽并涉及到多个学科领域。

数字图像处理已经发展成为一门前沿学科,得到了众多学科的研究者和各行工程技术人员的广泛关注和深入研究。

在图像处理各个研究领域中,图像信息的表示方法一直是其最根本问题。

一个有效的图像信息表示方法应该符合自然规律,能精确地揭示出图像内在特性。

由于人类对图像信息需求日益增加,图像数据量随之极速增长,给数字图像处理技术带来了前所未有的发展机遇,同时也给研究人员和工程技术人员带来了巨大的挑战。

因此,研究有效的图像表示方法具有重要意义。

图像稀疏表示是一种新型的图像表示理论。

通过稀疏表示,图像可以简洁地表示成字典中少数原子的线性组合形式[1]。

这些稀疏表示系数和对应的原子充分地揭示了图像的内在本质,同时也符合人类的视觉特性[2]。

稀疏表示以其优越的性能,得到了国内外研究人员密切关注。

早期的稀疏表示研究主要针对正交基的稀疏表示,比如傅立叶变换、离散余弦变换以及主成分分析等。

随着Wavelet变换的提出,基于多尺度几何分析的稀疏表示研究得到了空前的发展,比如Ridgelet、Curvelet、Contourlet、Complex Wavelet以及Bandelet等。

然而正交基或者多尺度几何分析无法对图像中所有的特征信息进行有效的稀疏表示。

学习字典是稀疏表示理论研究的一项最新进展。

通过大量样本学习得到的字典能对图像中的各种特征信息进行自适应的稀疏表示,极大地提高了稀疏表示性能。

目前,稀疏表示已经被广泛运用于图像处理的各个领域,如图像去噪、图像修复[12]、图像去模糊[13]、图像超分辨率分析[14-15]、图像融合[16-17]以及端元提取[18]等。
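下面给出一个利用正交匹配追踪(OMP)在给定字典下求取稀疏表示系数的最小示例,仅用于说明"图像(信号)可表示为字典中少数原子的线性组合"这一思想。示例基于Python/NumPy,字典为随机生成,函数名与参数均为示意性假设;它并不是文中所述K-SVD方法本身(K-SVD的字典更新步骤在此省略)。

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x as a k-sparse combination
    of the columns (atoms) of dictionary D.

    D: (m, n) dictionary with unit-norm columns; x: (m,) signal; k: sparsity.
    Returns the (n,) sparse coefficient vector.
    """
    m, n = D.shape
    residual = x.copy()
    support = []
    coeffs = np.zeros(n)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Jointly re-fit all selected atoms (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

# Toy usage: a random dictionary and a signal that is exactly 3-sparse in it.
rng = np.random.default_rng(1)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)
truth = np.zeros(256)
truth[[10, 50, 200]] = [1.0, -0.5, 2.0]
x = D @ truth
print(np.nonzero(omp(D, x, 3))[0])   # expected to recover indices 10, 50, 200
```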

图像科学综述 外文文献 外文翻译 英文文献


附录 图像科学综述

近几年来,图像处理与识别技术得到了迅速的发展,现在人们已充分认识到图像处理和识别技术是认识世界、改造世界的重要手段。

目前它已应用于许多领域,成为21世纪信息时代的一门重要的高新科学技术。

1.图像处理与识别技术概述

图像就是用各种观测系统以不同形式和手段观测客观世界而获得的、可以直接或间接作用于人眼而产生视知觉的实体。

科学研究和统计表明,人类从外界获得的信息约有75%来自于视觉系统,也就是说,人类的大部分信息都是从图像中获得的。

图像处理是人类视觉延伸的重要手段,可以使人们看到任意波长上所测得的图像。

例如,借助伽马相机、X光机,人们可以看到红外和超声图像;借助CT可看到物体内部的断层图像;借助相应工具可看到立体图像和剖视图像。

1964年,美国在太空探索中拍回了大量月球照片,但是由于种种环境因素的影响,这些照片是非常不清晰的,为此,美国喷射推进实验室(JPL)使用计算机对图像进行处理,使照片中的重要信息得以清晰再现。

这是这门技术发展的重要里程碑。

此后,图像处理技术在空间研究方面得到广泛的应用。

总体来说,图像处理技术的发展大致经历了初创期、发展期、普及期和实用化期4个阶段。

初创期开始于20世纪60年代,当时的图像采用像素型光栅进行扫描显示,大多采用中、大型机对其进行处理。

在这一时期,由于图像存储成本高,处理设备造价高,因而其应用面很窄。

20世纪70年代进入了发展期,开始大量采用中、小型机进行处理,图像处理也逐渐改用光栅扫描显示方式,特别是出现了CT和卫星遥感图像,对图像处理技术的发展起到了很好的促进作用。

到了20世纪80年代,图像处理技术进入普及期,此时微机已经能够担当起图形图像处理的任务。

VLSI的出现更使得处理速度大大提高,其造价也进一步降低,极大地促进了图形图像系统的普及和应用。

20世纪90年代是图像技术的实用化时期,图像处理的信息量巨大,对处理速度的要求极高。

21世纪的图像技术要向高质量化方面发展,主要体现在以下几点:①高分辨率、高速度,图像处理技术发展的最终目标是要实现图像的实时处理,这在移动目标的生成、识别和跟踪上有着重要意义;②立体化,立体化所包括的信息最为完整和丰富,数字全息技术将有利于达到这个目的;③智能化,其目的是实现图像的智能生成、处理、识别和理解。

3D打印外文文献翻译译文


文献出处:Paul G. 3D printing technology and its application[J]. Anatomical sciences education, 2015, 10(3): 430-450.原文3D printing technology and its applicationPaul GAbstract3D printing technology in the industrial product design, especially the application of digital product model manufacturing is being a trend and hot topic.Desktop level gradually mature and application of 3D printing devices began to promote the rise of the Global 3D printing market, Global industrial Analysis pany (Global Industry Analysis Inc) research report predicts Global 3D printing market in 2018 will be $2.99 billion. Keywords: 3D printing;Application; Trend13D printing and 3D printers3D printing and 3D printing are two entirely different concepts.3D printing is separated into different angles the picture of the red, blue two images, then the two images according to the regulation of parallax distance overprint together, using special glasses to create the 3D visual effect, or after special treatment, the picture printed directly on the special grating plate, thus rendering 3D visual effect of printing technology.And 3D printing refers to the 3D ink-jet printing technology, stacked with hierarchical processing forms, print increase step by step a material to generate a 3D entity, meet with 3D models, such as laser forming technology of manufacturing the same real 3D object digital manufacturing technology.3D printers, depending on thetechnology used by its working principle can be divided into two categories:1.1 3D printer based on 3D printing technologyBased on 3D printing technology of 3D printer, by stored barrels out a certain amount of raw material powder, powder on processing platform is roller pushed into a thin layer, then the print head in need of forming regional jet is a kind of special glue.At this time, met the adhesive will rapidly solidified powder binder, and does not meet the adhesive powder remain loose state.After each spray layer, the processing platform will automatically fall a bit, according to the result of puter chip cycle, until the real finished.After just remove the outer layer of the loose powder can obtain required for manufacturing three-dimensional physical.1.2 3D printers based on fused deposition manufacturing technologyBased on fused deposition manufacturing technology of the working principle of 3D printer is first in the control software of 3D printers into physical data generated by CAD and treated generated to support the movement of materials and thermal spray path.Then hot nozzle will be controlled by puter according to the physical section contour information in printed planar motion on the plane, at the same time by thermoplastic filamentous material for wire agency sent to the hot shower, and after the nozzle to add heat and melt into a liquid extrusion, and spraying in the corresponding work platform.Spray thermoplastic material on the platform after rapid cooling form the outline of a thickness of 0.1 mm wafer, forming a 3D printing section.The process cycle, load, decrease of bench height then layers of cladding forming stacked 3D printing section, ultimately achieve the desired three-dimensional object.2 The application of 3D printing needsThe 3D printing technology support for a variety of materials, can be widely used in jewelry, footwear, industrial design, construction, automotive, aerospace, dental, medical, and even food, etc. 
Different areas., according to the requirements of application targets used by material with resin, nylon, gypsum, ABS, polycarbonate (PC) or food ingredients, etc.3D printers of rapid prototyping technology has a distinct advantage in the market, the huge potential in the production application, hot applications outlined below.2.1 Industrial applications"Air cycling" is located in Bristol, UK the European aeronautic defense and Space pany using 3D printers, application of 3D printing technology to create the world's first print bike.The bike to use as strong as steel and aluminum alloy material of nylon, the weight is 65% lighter than metal materials.More interestingly, "air bike", chain wheels and bearings are printed at a time, without the original manufacture parts first, and then the parts together of assembly process, after printing, bicycles will be able to move freely.Bicycle manufacturing process like printing discontinuous in graphic print as simple lines, 3D printer can print out the object space is not connected to each other. 2.2 Medical applicationsIn medicine, the use of 3D printing will two-photon polymer and biological functional materials bination modified into the capillaries, not only has good flexibility and patibility of human body, also can be used to replace the necrosis of blood vessels, bined with artificial organs, partly replacing experimental animals in drugdevelopment.Biotechnology in Germany in October 2011 show, Biotechnical Fair), using 3D printers print artificial blood capillary to attract the attention of the participants, these artificial capillary has been applied in clinical medicine.2.3 application of daily life"3D food printer" is developed by Cornell University in New York, the United States food manufacturing equipment.The "3D food printer" used similar routine puter printers, the working principle of ingredients and ingredients in the container (cartridge) in advance only need to enter the required recipe, by supporting the CAD software can keep the food "print out".For many chefs, the new kitchen cooking means that they can create new dishes make food more individuality, higher food ing the "3D food printer" making food, from raw materials to finished products can significantly reduce the link, so as to avoid the pollution in the links of food processing, transportation, packing and so on and preservation, etc.Because of the cooking materials and ingredients must be placed in the printer, so food raw materials must be liquid or other can "print" state.2.4 IT applicationsRecently, a group of researchers in Disney's use of 3D printing in the same effect with the organic glass high pervious to light plastic, at low cost to print out the LCD screen with a variety of sensors, realize the new breakthrough in the IT ing 3D printing light pipe can produce high-tech international chess; the chess pieces can detect and display the current location.Although the monochrome screen pared with in the daily life, rich and colorful display some insignificant, but it hasa 3D printing the advantages of low cost, simple manufacturing process.In addition to the display screen, the use of 3D printing will also be able to print out a variety of sensors.These sensors can be through the stimulation such as infrared light to detect touch, vibration, and the results output.3D printing will create more for life and wisdom city of IT applications.3 The development trend of 3D printing technology3D printing technology continues to develop, greatly reduce the cost of the 
already from research and development of niche space into the mainstream market, the momentum of development is unstoppable, has bee a widespread concern and civil market rapidly emerging new areas.3D printing production model, the application of gifts, souvenirs and arts and crafts, greatly attracted social attention and investment, development speed, the market began to quantity and qualitative leap.It is predicted that in 2020, 3D printing products will account for 50% of the total production.In the next 10 years on the puter to plete the product design blueprint, gently press the "print" key, 3D printers can bit by bit with the designed model.Now some foundry enterprises began to develop selective laser sintering, 3D printer and its application to plex casting time reduced from 3 months to 10 days.Engine manufacturers through 3D printing, large six-cylinder diesel engine cylinder head of sand core development cycles, reduced to 1 week from the past 5 months.The biggest advantage of 3D printing is to expand the designers’ imagination space.As long as you can on the puter design into 3D graphics, whether is different styles of dress, elegant handicraft, or personalized car, as long as can solve the problem of material, can achieve 3D printing.With 3D printing technology breakthroughs, constantly improved increasingly, the new material of 3D printing in improving speed, size, its technology is constantly optimized, expanding application fields, especially in the field of graphic art potential, producer of the concept of 3D model can better municate ideas or solutions, a picture can be more than a few hundred or even thousands of words of description. Professionals believe that personalized or customized 3D printing can be envisioned a real-time 3D model in the eyes, can quickly improve product, growth will be more than imagine, will shape the future of social applications.3D printing technology to eliminate traditional production line, shorten the production cycle, greatly reduce production waste, raw materials consumption will be reduced to a fraction of the original.3D printing is not only cost savings, improve production precision, also will make up for the inadequacy of traditional manufacturing, and will rise rapidly in the civilian market, thus opening a new era of manufacturing, bring new opportunities and hope for the printing industry.译文3D打印技术及其应用Paul G摘要3D打印技术在工业产品设计,特别是数字产品模型制造领域的应用正在成为一种潮流和热门话题。

图像融合报告


研究内容
• 目前,国内外学术界在图像融合领域已取得了丰硕的研究成果。在理论和方法方面,主要有主成分分析法、演化计算法、神经网络法、小波变换法和模糊逻辑等图像融合方法;在融合效果客观评价方面,有Shannon提出的信息熵、交叉熵、互信息、联合熵,以及均方根误差、均值、标准差、平均误差、偏差、相对偏差、空间频率、灰度标准差、相关系数、信噪比和峰值信噪比等客观评价标准。
• 例如,对于聚焦不同的多幅已对准图像,如果图像中的一些景物在其中一幅图像中很清晰,而在别的图像中较为模糊,那么可以采取图像融合的方法获得一幅新的图像,融合后的图像比融合前的任意一幅图像具有更多的信息量。
图一:左聚焦图像
图二:右聚焦图像
图三:融合图像
研究动态
• 图像融合近年来在很多领域都有了越来越多的应用和发展。在医学上,医学图像的配准和融合为医生提供更加丰富、可靠的图像依据,以便更加直观地利用这些信息并结合临床经验做出准确诊断;随着遥感技术的发展,高空间分辨率、高波谱分辨率和高时间分辨率的图像数据已经问世;由于各种不同类型的多光谱数据信息之间存在着重叠和互补,利用图像融合技术对遥感图像进行融合,近年来在土地动态监测、防洪防灾和军事侦察方面得到应用。
文章理解
• 本文中,我们提出一种在空间域实行的、适合多聚焦图像的融合方法。直观的想法是:图像应在区域或对象级别而不是像素级别上被理解。方法包含三个步骤:图像的分割、区域的明确计算和融合图像的构建(参见本节之后的示例代码)。
  第一步,通过简单求均值的方法合并两幅源图像;
  第二步,对合并后的图像使用归一化分割(Ncut)方法进行分割,并根据分割结果对图像进行划分;
  第三步,两幅源图像中相匹配的区域使用空间频率的方法进行融合。
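下面是对上述第三步"按区域用空间频率选择较清晰源图像"的一个简化示意(Python/NumPy)。其中区域分割结果以标签图 labels 的形式给定(文中由归一化分割 Ncut 得到,这里不实现该分割步骤);spatial_frequency、fuse_by_regions 等函数名均为示意性假设,并非原报告的实现。

```python
import numpy as np

def spatial_frequency(block):
    """Spatial frequency: sqrt(RF^2 + CF^2), where RF and CF are the RMS
    row-wise and column-wise first differences of the block."""
    rf = np.sqrt(np.mean(np.diff(block, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(block, axis=0) ** 2))
    return np.sqrt(rf ** 2 + cf ** 2)

def fuse_by_regions(img_a, img_b, labels):
    """Region-wise multi-focus fusion: for every labelled region, copy pixels
    from whichever source image is sharper (higher spatial frequency) there.
    labels is an integer region map, e.g. from a normalized-cuts segmentation."""
    fused = np.zeros_like(img_a, dtype=float)
    for r in np.unique(labels):
        mask = labels == r
        ys, xs = np.nonzero(mask)
        box = (slice(ys.min(), ys.max() + 1), slice(xs.min(), xs.max() + 1))
        sharper_a = spatial_frequency(img_a[box]) >= spatial_frequency(img_b[box])
        src = img_a if sharper_a else img_b
        fused[mask] = src[mask]
    return fused
```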

人脸识别文献翻译(中英双文)

人脸识别文献翻译(中英双文)4 Two-dimensional Face Recognition4.1 Feature LocalizationBefore discussing the methods of comparing two facial images we now take a brief look at some at the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localization. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system for example) then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localization here, with a brief discussion of face detection in the literature review .The eye localization method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of the face recognition accuracy and not a product of the performance of the eye localization routine, all image alignments are manually checked and any errors corrected, prior to testing and evaluation.We detect the position of the eyes within an image using a simple template based method. A training set of manually pre-aligned images of faces is taken, and each image cropped to an area around both eyes. The average image is calculated and used as a template.Figure 4-1 The average eyes. Used as a template for eye detection.Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose, provide a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. Although this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason being that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from eyebrows (the area just below the eyebrows contain eyes, whereas the area below the eyes contains only plain skin).A window is passed over the test images and the absolute difference taken to that of the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.This basic template-based method of eye localization, although providing fairly precise localizations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.Eye localization is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful; and those in which eye detection failed. Taking the set of successful localizations we compute the average distance from the eye template (Figure 4-2 top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect. 
However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.Figure 4-2 – Distance to the eye template for successful detections (top) indicating variance due to noise and failed detections (bottom) showing credible variance due tomiss-detected features.In the lower image (Figure 4-2 bottom), we have taken the set of failed localizations(images of the forehead, nose, cheeks, background etc. falsely detected by the localization routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasize the difference of the pupil regions for these failed matches and minimize the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.Figure 4-3 - Eye template weights used to give higher priority to those pixels that bestrepresent the eyes.4.2 The Direct Correlation ApproachWe begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio) involving the direct comparison of pixel intensity values taken from facial images. We use the term ‘Direct Correlation’ to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not infer that Pearson’s correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson’s correlation and can be considered as a scale and translation sensitive form of image correlation), as this persists with the contrast made between image space and subspace approaches in later sections.Firstly, all facial images must be aligned such that the eye centers are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as grayscale bitmaps of 65 by 82 pixels and prior to recognition converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each corresponding vector can be thought of as describing a point within a 5330 dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d, between two facial image vectors (often referred to as the query image q, and gallery image g), we get an indication of similarity. 
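A minimal sketch of the direct correlation comparison just described, assuming the two face images are already aligned to the eye centres and cropped to the same size. It is written in Python/NumPy with an illustrative function name; it is not the thesis's actual code.

```python
import numpy as np

def face_distance(img_q, img_g):
    """Direct correlation: flatten two aligned, cropped grayscale face images
    (e.g. 65 x 82 pixels -> 5330-element vectors) and return the Euclidean
    distance between them; a lower score means more similar faces."""
    q = np.asarray(img_q, dtype=float).ravel()
    g = np.asarray(img_g, dtype=float).ravel()
    return np.linalg.norm(q - g)
```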
A threshold is then applied to make the final verification decision.4.2.1 Verification TestsThe primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system’s ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (i.e. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition,the results generated from each analysis method may be presented in a variety of formats. Throughout the experimentations in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher’s Linear Discriminate to analyze individual subspace components in section 7 and the identification test for the final evaluations described in section 8. The verification test measures a system’s ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps as a swipe card, proximity chip or PIN number. This number is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image) and compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large such that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.However, it should also be noted that if the results are to be representative of system performance in a real world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition, which may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of ‘difficult’ images than would be expected in the perceived operating conditions and hence higher error rates in the results produced. Below we provide the algorithm for executing the verification test. 
The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: Compare Faces (FaceA, FaceB). This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score the more similar thetwo face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.Every image is compared with every other image, no image is compared with itself and no pair is compared more than once (we assume that the relationship is symmetrical). Once two images have been compared, producing a similarity score, the ground-truth is used to determine if the images are of the same person or different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as either a false acceptance or false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional FAR, FRR pair, which when plotted on a graph produces the error rate curve shown below.Figure 4-5 - Example Error Rate Curve produced by the verification test.The equal error rate (EER) can be seen as the point at which FAR is equal to FRR. This EER value is often used as a single figure representing the general recognitionperformance of a biometric system and allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances was equal to the percentage of false rejections. Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections: unwilling to tolerate intruders at the cost of inconvenient access denials.Surveillance systems on the other hand would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves, also used in such experiments. 
Both graphs are simply two visualizations of the same results, in that the ROC format uses the True Acceptance Rate (TAR), where TAR = 1.0 – FRR in place of the FRR, effectively flipping the graph vertically. Another visualization of the verification test results is to display both the FRR and FAR as functions of the threshold value. This presentation format provides a reference to determine the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves intersect.Figure 4-6 - Example error rate curve as a function of the score thresholdThe fluctuation of these error curves due to noise and other errors is dependant on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will results in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840verification operations, hence a drop of 1% EER represents an additional 2588 correct decisions, whereas the quality of a single image could cause the EER to fluctuate by up to4 二维人脸识别4.1 特征定位在讨论两幅人脸图像的比较之前,我们先简单看下面部图像特征定位的初始过程。
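The verification test described above can be condensed into a short sketch: given distance scores for same-person and different-person comparisons, compute FAR and FRR at a threshold, then sweep the threshold over all observed scores to approximate the equal error rate. This is an illustrative Python/NumPy reconstruction under those assumptions, not the author's implementation.

```python
import numpy as np

def error_rates(same_scores, diff_scores, threshold):
    """Scores are distances (lower = more similar).
    FRR: fraction of same-person comparisons rejected (score >= threshold).
    FAR: fraction of different-person comparisons accepted (score < threshold)."""
    same_scores = np.asarray(same_scores, dtype=float)
    diff_scores = np.asarray(diff_scores, dtype=float)
    frr = float(np.mean(same_scores >= threshold))
    far = float(np.mean(diff_scores < threshold))
    return far, frr

def equal_error_rate(same_scores, diff_scores):
    """Sweep the threshold over every observed score and return the operating
    point where FAR and FRR are closest, as an approximation of the EER."""
    thresholds = np.unique(np.concatenate([same_scores, diff_scores]))
    gaps = [(abs(np.subtract(*error_rates(same_scores, diff_scores, t))), t)
            for t in thresholds]
    _, best_t = min(gaps)
    far, frr = error_rates(same_scores, diff_scores, best_t)
    return (far + frr) / 2.0, best_t
```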

一种有效地自动图像增强方法-外文文献翻译

附录A:外文文献An Effective Automatic Image Enhancement MethodABSTRACT Otsu method is proper to deal with two conditions: (1) two or more classes with distintive gray-values respectively; (2) classes without distinctive gray-values, but with similar areas. However, when the gray-value differences among classes are not so distinct, and the object is small relative to backgroud, the separabilities among classes are insufficient. In order to overcome the above problem, this paper presents an improved spatial low-pass filter with a parameter and presents an unsupervised method of automatic parameter selection for image enhancement based on Otsu method. This method combines image enhancement with image segmentation as one procedure through a discriminant criterion. The optimal parameter of the filter is selected by the discriminant criterion given to maximize the separability between object and background. The optimal threshold for image segmentation is computed simultaneously. The method is used to detect the surface defect of container. Experiments illustrate the validity of the method.KEYWORDS image processing; automated image enhancement; image segmentation; automated visual inspection1 IntroductionAutomated visual inspection of cracked container (A VICC) is a practical application of machine vision technology. To realize our goal, four essential operations must be dealt with – image preprocessing, object detection, feature description and final cracked object classification. Image enhancement is to provide a result more suitable than original image for specific applications. In this paper the objective of enhancement, followed by image segmentation, is to obtain an image with a higher content about the object interesting with less content about noise and background. Gonzalez [1] discusses that image enhancement approaches fall into two main categories, in that spatial domain and frequency domain methods. Burton [2] applies image averaging technique to face recognition system, making it able to recognise familiar faces easily across large variations in image quality. Centeno [3] proposes an adaptive image enhancement algorithm, which reverse the processing order of image enhancement and segmentation in order to avoid sharpening noise and blurring borders. Munteanu [4] applies artificial intelligence technology to image enhancement providing denoising function. In addition to spatial domain methods, frequency domain processing techniques are based on modifying the Fourier transform of an image. Bakir [5] discusses image enhancement used for medical image processing in frequency space. Besides, Wang [6] presents a global multiscale analysis of images based on Haar wavelet technique for image denoising. Recently, Agaian [7] proposes image enhancement methods based on the properties of the logarithmic transform domain histogram and histogram equalization. We apply spatial processing here in order to guarantee the real-time and sufficient accuracy property of the system.Segmentation is discussed in [8]. The most simplest, represented by Otsu [9], is method using only the gray level histogram analysis to maximize the separability ofthe resultant classes. Kuntimad [10] describes a method for segmenting digital images using pulse coupled neural networks (PCNN). Salzenstein [11] deals with a comparison of recent statistical models on fuzzy Markov random fields and chains for multispectral image segmentation. Due to ill-defined, there is no unique segmentation of an image. 
Evaluation of segmentation algorithms thus far has been largely subjective. Ranjith [12] demonstrates how a recently proposed measureof similarity can be used to perform a quantitative comparison among image segmentation algorithms.In this paper, we present an improved spatial low-pass filter with a tunable parameter in the mask making all elements no longer sum to unity. The optimal parameter for the filter can be determined by the improved discriminant criterion based on the one mentioned in [9]. Convolving images with this mask, the background uninteresting can be removed easily leaving the object intact to some extent. The remainder of the paper is organized as follows: Sect.2 presents how to enhance an input image in theory and presents the algorithm. Sect.3 illustrates the validity of the method in Sect.2. Finally, conclusion and discussion are presented in Sect.4.2Image Enhancement2.1 Analysis of Prior KnowledgeThe preprocessing quality influences the latter work directly, in that, feature description. Therefore, analysis for the characteristics related to input images should be presented. A standard image of cracked container is shown as Fig.1 (a). From the image, we see the cracked part occupies small region. Much noise, such as rust, shadow, smear etc, appears within the background. At a coarse glance, however, we find gray level of the hole is less than the other parts distinctly. Further study shows gray level of pixels, around the edge of the hole, is the minimal. Fig.1(b) displays the histogram of Fig.1(a) and edge of the hole is marked.Fig.1 (a) is a standard gray level image of a cracked container(b) is the histogram of Fig.1 (a), indicating gray level region of the hole’s edge.2.2 FormulationThis section discusses the principal content in the paper. Traditional spatial filter uses a 3×3 mask, the elements of which sum to unity, to convolve with the input image. This method can deal with some cases shown in equation (1):=+(1)(,)(,)(,)G x y I x y N x ywhere, I is image interested, N is Gaussian white noise, (x,y) denotes each pair ofcoordinates. N can be deliminated by blurring G. Our objective, however, is todeliminate not only white noise, but any other background uninteresting. Thusequation (1) is improved by equation (2):'(,)'(,)'(,)G x y I x y N x y =+ (2)where, I' is the object, N' consists of white noise and the other parts except I'.Fig.2 (c) displays an improved mask with a parameter Para. We will later illustratethat tuning Para properly is to facilitate object segmentation. The smoothing functionused is shown in equation (3):1111(,)'(,)(,)f m n I x y G x y F x m y n =-=-=++∑∑ (3)where, F(x,y) denotes the smoothing filter, in that, the mask shown as Fig.2 (c).Now, we only consider gray -level images, and define Mg as the maximum graylevel of an image. Then the following equations are set to distinguish the object ofinterest and the non -object :',,f f g fg f g I if I M I M if I M <⎧⎪=⎨ ≥⎪⎩ (4) In essence, convolution operator is a low -pass filtering process, which blurs animage by sliding a mask through the image and leaves the filtering response at theposition corresponding to central location of the mask. One question occurs that, whynot enhance value of each pixel by the same scale directly for the distinct gray levelsbetween the object and background. The reason is that it doesn’t consider therelationship of adjacent pixels. When individual noise point occur, enhancing its grayvalue directly will preserve the noise point. 
Experiments illustrate the latter methodwill leave lots of noise points can’t be removed, but the former method will not.Now, we will search the optimal parameter Para so as to maximize the separabilitybetween object and background. Let a given image be represented in L gray levels.The number of pixels at level i is denoted by ni and the total number of pixels by N.The probability of each level is denoted by Pi as follow [9]:1/,0,1li i i i i P n N P P ==≥=∑ (5)Suppose that we partition the pixels into two classes C0 and C1 (object andbackground) by a threshold at level k; C0 denotes pixels with levels [1, … , k], andC1 denotes pixels with levels [k+1, … , L]. Then the probabilities of classoccurrence w0,w1 and the class mean levels u0,u1 respectively,are given by01==ni i P k =ωω()∑ (6)11==1-ni i k P k =+ωω()∑ (7)001=()/=/k i i iP k k =μωμ()ω()∑ (8)111=()/=[/[1-Li i iP k k τ=μωμ-μ()]ω()]∑ (9)1=Li i iP τ=μ∑ (10)220001=[()]/K i i i p σ=-μω∑ (11) 221111=[()]/K i i i p σ=-μω∑ (12) The procedure of obtaining optimal para is based on obtaining optimal thresholdfor every filtered image. The optimal threshold is determined by maximizing theseparability between object and background using the following discriminant criterionmeasure as mentioned in [9] :22=/B T σση (13)where222200110110()()()B T T σ=ωμ-μ+ωμ-μ=ωωμ-μ (14)2B σand 2T σare the between class variance and the total variance of levels,respectively.221=()LTT i i i p σ=-μ∑ (15) The optimal threshold k* that maximizes n is selected in the following sequentialsearch by using equation (5)-(14):*1()max ()k Lk k ≤<η=η (16) Equation (16) is a discriminant criterion to select the gray level to maximize theseparability between object and background for a given picture. In this paper, aparameter Para is introduced, so the equations (6)~(9), (11)~(14), (16) isparameterized by Para and k and equations (10), (15) is parameterized by Para.Equation (13) can be rewritten as:22(,)/(,)B T para k para k σση= (17)Where 2T σ is not a constant any more and is not negligible, but somecomputation reduction can be operated on 2(,)Bpara k σand 2(,)T para k σ Here, what we want to acquire is the proper filtered picture including vividobject by searching parameter Para, the discriminant criterion used is improved asfollow :**1max()k Lk ≤<η= (18) In the above representation, parameter Para plays an important role, becauseoptimal Para makes the separability between object and background maximal, andmake Otsu segmentation method effective to segment small object from largebackground without distinctive gray -value between them, which can be observed laterfrom image histogram after image enhancement2.3 Existence Discussion of Para and k*The problem above is reduced to search for a threshold k* under the condition ofPara which maximizes the discriminant criterion in equation (18). The conditiondiscussed is the image with two class at least. Subsequently, the following two casesdon’t occur, in that,(1) w0 or w1 is zero originally without setting Para,in which thereis only one class;(2) w0 or w1 is zero with certain increasing Para,in which there isalso one class finally;The above two cases are decribed as:()()()()()A {Para,k |Para,k Para,k 0,1/2n 1*2n 1Para ,0k L 1}==++<<+<<-⎡⎤⎣⎦The case concerned is A,Thus,there is certain Para with proper k to makediscriminant criterion maximal.3 ExperimentsThis paper aims at monochrome images. 
First, the initial values are presented.Several values should be set: Para = 1/9 (beginning with an averaging filter for 3*3mask),Mg=L=256 (the range of gray -level, shown in equation (4) and (5)). Using thealgorithm above, we can compute each value of discriminant criterion (k*) computed,in that image I 'f that is most proper to be segmented is obtained. Here, we takeimages of cracked container for example. Fig.3 and Fig.4 show the experimentalprocess, in which the first rows show the filtered pictures, the second rows show thecorresponding histograms and the third rows show curves of correspondingdiscriminant criterion. The last columns are the optimal results of image enhancement,from which we can observe all the noise such as rust, shadow, smear, etc is almostremoved, leaving the cracked parts intact. Tab.1 and Tab.2 present the varying courseof corresponding to each parameter Para. Subsequently, the optimal Para is obtainby comparing all n(k*) computed, in that image I 'f that is most proper to besegmented is obtained. Here, we take images of cracked container for example. Fig.3and Fig.4 show the experimental process, in which the first rows show the filteredpictures, the second rows show the corresponding histograms and the third rows show curves of corresponding discriminant criterion. The last columns are the optimal results of image enhancement, from which we can observe all the noise such as rust, shadow, smear, etc is almost removed, leaving the cracked parts intact. Tab.1 and Tab.2 present the varying course of n(k*) along with Para respectively in terms of Fig.3 and Fig.4. When Para increases to 5/9 for both examples, n(k*) will reach their maximum and the most proper filtered image are obtained. When Para continue to increase, n(k*) will decrease and the integrity of the crack part will be destroyed seriously as last two columns in Fig.5.4ConclusionThis paper is to overcome the disadvantage of Otsu method in dealing with the condition: when the gray-value differences among classes are not so distinct, and the object is small relative to backgroud, the separabilities among classes are not sufficient. This paper proposes an effective image enhancement method in spatial domain. We define all the non-objects as noise, which urges us to design an effective filter to remove noise at one time. We propose an improved mask, according to the characteristic of gray level of cracked container, to make gray value of non-objectsabove a threshold and leave the object below it. The filtered image, most proper to be segmented, is computed automaticly by using the improved discriminant criterion in terms of the principle of maximize the separability between object interesting and background uninteresting. After the proposed image enhancement, subsequent operations can be carried on easily. Experiments illustrate the proposed method is valid and effective.译文:一种有效地自动图像增强方法1.简介基于集装箱裂纹的自动视觉检测(A VICC)是一个应用机器视觉技术。
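A rough sketch of the parameter-selection idea in the paper above: filter the image with a 3x3 mask scaled by a candidate Para (so its elements no longer sum to unity), score the filtered result with Otsu's discriminant criterion, and keep the Para whose filtered image maximizes the separability between object and background. The uniform mask, the clipping to the gray-level range, and all function names are simplifying assumptions for illustration in Python/NumPy/SciPy; they do not reproduce the paper's exact filter.

```python
import numpy as np
from scipy.ndimage import convolve

def otsu_criterion(img):
    """Return (best threshold k*, eta(k*) = sigma_B^2 / sigma_T^2) for an
    8-bit grayscale image, following Otsu's discriminant criterion."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # zeroth-order cumulative moment w(k)
    mu = np.cumsum(p * np.arange(256))         # first-order cumulative moment mu(k)
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)
    sigma_t2 = np.sum(((np.arange(256) - mu_t) ** 2) * p)
    k = int(np.argmax(sigma_b2))
    return k, sigma_b2[k] / sigma_t2

def select_parameter(img, paras):
    """For each candidate Para, filter with a 3x3 averaging mask scaled by Para,
    clip to the gray-level range, and keep the Para maximizing Otsu's criterion."""
    best = None
    for para in paras:
        mask = np.full((3, 3), para)
        filtered = np.clip(convolve(img.astype(float), mask), 0, 255)
        k, eta = otsu_criterion(filtered.astype(np.uint8))
        if best is None or eta > best[0]:
            best = (eta, para, k)
    return best   # (eta(k*), optimal Para, optimal threshold k*)
```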

测绘工程摄影测量中英文对照外文翻译文献

基于活动熔岩流可见光和热感影像的倾斜摄影测量

手持相机获取的数码图像越来越多地被用于科学目的,特别是在需要非接触式测量的场合。

然而,这些图像往往以斜视角拍摄,相机到对象的深度变化显著并存在遮挡,这使定量分析变得复杂。

在这里,我们报告如何通过倾斜摄影测量技术确定基于地面的热感照相机的方位(位置和指向),并生成西西里岛埃特纳火山熔岩流场景的信息。

地形模型由大众化的消费级单反数码相机获取的多幅图像构建,并用于对基于地面的热图像进行参考定位。

我们展示了在2004-2005年火山爆发期间收集的数据,并利用派生的表面模型对热感图像进行逐像素的观测距离改正(以考虑大气衰减)。

对于约100至400米的观测距离,与假设整幅图像采用统一平均观测距离计算得到的数值相比,该改正使辐射强度产生高达±3%的系统性变化。

关键词:近景摄影测量,埃特纳(Etna)火山,熔岩流,热感图像

引言:为了增进我们对熔岩如何流动并最终停止的理解,需要更先进的技术来测量熔岩流的推进和冷却(Hidaka等,2005)。

卫星数据经常用于监测活火山,但为获取准确的温度信息,可行的空间分辨率为30 m左右的中红外或近红外区域(陆地卫星TM和ASTER数据)或更大的热红外波段(60m的陆地卫星ETM +和ASTER的90米; Donegan和Flynn 2004; Pieri 和 Abrams 2004)。

这些尺寸都大大高于在熔岩流表面热性结构的空间变异,限制冷却模型从而限制卫星数据的使用。

最近,手持式热成像仪的使用提供了一个潜在的解决方案:通过1~10^3 m的观测距离,可将空间分辨率提高约10^3倍(达到1毫米左右)。

因此,手持式热成像仪获取的图像有潜力为熔岩流动模型提供丰富的信息,并为卫星数据的解释提供"地面实况"信息(例如Calvari等,2005)。

不过,大多数近距离数据集存在的关键缺点(强烈的斜视角、未知的成像几何关系与传感器空间位置)通常阻碍地理参考,并在目前制约着定量分析。

ChatGPT在PPT幻灯片设计中的图像处理与可视化呈现技术探索

ChatGPT在PPT幻灯片设计中的图像处理与可视化呈现技术探索(英文中文双语版优质文档)Exploration of ChatGPT's image processing and visual presentation technology in PPT slide design (English and Chinese bilingual version high-quality documents) Exploration of image processing and visual presentation technology in PPT slide designSummaryThis research explores the image processing and visual presentation technology in ChatGPT-based PPT slide design. We introduce the application of ChatGPT in image processing and visualization, and propose some innovative technical explorations. Through experiments and user feedback, we evaluate the effectiveness and feasibility of these techniques in PPT slide design. Based on the research results, we discuss the advantages and challenges of ChatGPT in image processing and visualization, and look forward to future development directions.1 IntroductionPPT slides are a common information display tool, in which image processing and visual presentation play an important role in improving the slideshow effect and attractiveness. In recent years, artificial intelligence technologies such as ChatGPT have been applied to PPT slide design, providing new possibilities and innovative image processing and visualization technologies.2. Application of ChatGPT in image processingChatGPT can assist image processing tasks in PPT slide design. It has capabilities such as image recognition, semantic understanding, and idea generation, which can automatically generate and optimize image elements in slides.3. Application of ChatGPT in visual presentationChatGPT can also assist the visual presentation of PPT slides. It can understand text descriptions and data information, and generate corresponding charts, graphics and animation effects to better display and convey information.4. Technology exploration and empirical researchIn order to explore ChatGPT's image processing and visual presentation technology in PPT slide design, we conducted a series of technical exploration and empirical research. We designed experimental scenarios and tasks, and used ChatGPT to assist in image processing and visualization.4.1 Exploration of image processing technologyWe explore ChatGPT-based image processing techniques, including image recognition, image style transfer, and image generation. Through the image understanding and generation capabilities of ChatGPT, we try to automatically generate and optimize image elements in slideshows, and evaluate their effects and feasibility.4.2 Exploration of visual presentation technologyWe also explored ChatGPT-based visual presentation techniques, including data visualization, chart generation, and animation effect design. Through the semantic understanding and idea generation capabilities of ChatGPT, we try to generate optimizedVisualizations and assess their impact on slide presentations.5. Results and DiscussionBased on the analysis of experiments and user feedback, we draw the following conclusions:5.1 Effect evaluation of image processing technologyThe image processing technology assisted by ChatGPT can provide fast and high-quality image generation and optimization effects. Presenters can more easily use image elements in PPT slide design, and customize and modify them as needed.5.2 Visual presentation technology effect evaluationChatGPT-assisted visualization technology can generate attractive and readable charts, graphs and animation effects. These effects help improve your slideshow's messaging and audience engagement.6. 
Advantages and challenges of ChatGPT in image processing and visualizationBased on the result analysis, we discuss the advantages and challenges of ChatGPT in image processing and visualization. Benefits include quickly generating and optimizing images, providing personalized visualizations, and more. Challenges include limitations on the ability to understand and generate domain-specific knowledge.7. Development direction and future prospectIn order to further develop the image processing and visualization technology of ChatGPT in PPT slide design, we propose the following development directions: First, further optimize the ChatGPT model to improve image understanding and generation capabilities; second, combine user feedback and personalized needs to realize More precise image processing and visualization.8. SummaryThis research explores the image processing and visual presentation technology in ChatGPT-based PPT slide design. Through technology exploration and empirical research, we evaluate the effectiveness and feasibility of these technologies and discuss their advantages and challenges. Future research can further develop and improve these techniques to enhance the design quality and user experience of PPT slides.PPT幻灯片设计中的图像处理与可视化呈现技术探索摘要本研究探索了基于ChatGPT的PPT幻灯片设计中的图像处理与可视化呈现技术。


使用不变特征的全景图像自动拼接

摘要:本文研究全自动全景图像的拼接问题。尽管一维问题(单一旋转轴)已得到很好的研究,但二维或多行拼接却比较困难。

以前的方法使用人工输入或限制图像序列,以建立匹配的图像,在这篇文章中,我们假定拼接是一个多图像匹配问题,并使用不变的局部特征来找到所有图像的匹配特征。

由于以上这些,该方法对输入图像的顺序、方向、尺度和亮度变化都不敏感;它也对不属于全景图一部分的噪声图像不敏感,并可以在一个无序的图像数据集中识别多个全景图。

此外,为了提供更多有关的细节,本文通过引入增益补偿和自动校直步骤延伸了我们以前在该领域的工作。

1. 简介

全景图像拼接已经有了大量的研究文献和一些商业应用。

这个问题的基本几何学很好理解,对于每个图像由一个估计的3×3的摄像机矩阵或对应矩阵组成。

估计处理通常由用户输入近似的校直图像或者一个固定的图像序列来初始化,例如,佳能数码相机内的图像拼接软件需要水平或垂直扫描,或图像的方阵。

在自动定位进行前,第4版的REALVIZ拼接软件有一个用户界面,用鼠标在图像大致定位,而我们的研究是有新意的,因为不需要提供这样的初始化。

根据研究文献,图像自动对齐和拼接的方法大致可分为两类——直接的和基于特征的。

直接的方法有这样的优点,它们使用所有可利用的图像数据,因此可以提供非常准确的定位,但是需要一个只有细微差别的初始化处理。

基于特征的配准不需要初始化,但是传统的特征匹配方法(例如,Harris角点图像块的相关)缺乏对任意全景图像序列进行可靠匹配所需的不变性。

在本文中,我们描述了一个基于不变特征的方法实现全自动全景图像的拼接,相比以前的方法有以下几个优点。

第一,不变特征的使用实现全景图像序列的可靠匹配,尽管在输入图像中有旋转、缩放和光照变化。

第二,通过假定图像拼接是一个多图像匹配问题,我们可以自动发现这些图像间的匹配关系,并且在无序的数据集中识别出全景图。

第三,通过使用多波段融合呈现无缝输出的全景图,可以产生高质量的结果。

本文通过引入增益补偿和自动校直步骤延伸了我们以前在该领域的工作,我们还描述了一个高效的捆绑调整实现并展示对任意数量波段的多个重叠图像如何进行多波段融合。
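作为对"多波段融合"思想的一个简化示意,下面的Python/NumPy/SciPy代码用拉普拉斯金字塔将两幅已对准的图像按掩模融合:低频成分在较宽的范围内混合,高频成分在较窄的范围内混合,从而隐藏接缝。这只是在若干简化假设下的草图(固定的高斯平滑、简单的上/下采样、仅两幅图像),并非论文中多波段融合方法的完整实现。

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def down(img):
    """Blur then subsample by 2 in each dimension."""
    return gaussian_filter(img, 1.0)[::2, ::2]

def up(img, shape):
    """Upsample by pixel repetition to the requested shape, then smooth."""
    out = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return gaussian_filter(out[:shape[0], :shape[1]], 1.0)

def laplacian_pyramid(img, levels):
    pyr, cur = [], img.astype(float)
    for _ in range(levels - 1):
        nxt = down(cur)
        pyr.append(cur - up(nxt, cur.shape))   # band-pass layer
        cur = nxt
    pyr.append(cur)                            # low-frequency residual
    return pyr

def multiband_blend(img_a, img_b, mask, levels=4):
    """Blend two aligned images; mask is 1.0 where img_a should dominate.
    The mask is blurred/downsampled per level, so coarse bands mix widely."""
    la = laplacian_pyramid(img_a, levels)
    lb = laplacian_pyramid(img_b, levels)
    gm, m = [], mask.astype(float)
    for _ in range(levels):
        gm.append(m)
        m = down(m)
    out = None
    for a, b, w in reversed(list(zip(la, lb, gm))):
        band = w * a + (1.0 - w) * b
        out = band if out is None else up(out, band.shape) + band
    return out
```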

本文其余部分的结构如下。

第二部分说明所研究问题的几何学和我们选择不变特征的原因。

第三部分介绍了图像匹配方法(RANSAC )和验证图像匹配的概率模型。

第四部分中,我们描述了图像对准算法(捆绑调整),即共同优化每个摄像头的参数。

五到七部分描述了处理过程,包括自动校直、增益补偿和多波段融合。

第九部分中,我们给出了结论和对未来工作的展望。

2. 特征匹配

全景识别算法的第一步是在所有图像之间提取和匹配SIFT特征点。

SIFT 特征检测子位于不同尺度空间高斯插值函数的极值点处,对每一个特征点,特征尺度和方向被确定,这为测量提供了一个相似不变的结构。

尽管在这个结构中简单的采样强度值是相似不变的,但是不变描述子实际上是通过对方向直方图的局部梯度值进行累积计算得到的,这样就允许边缘有轻微的移动而不会改变描述子的矢量,对仿射变换提供了一定的鲁棒性。

空间累积计算对平移不变性同样重要,因为感兴趣点位置通常仅在0~3个像素的范围内是精确的。

为了实现亮度不变性可以使用梯度(消除偏差)和对描述子矢量归一化(消除增益)。

由于SIFT 特征在旋转和尺度变化时是不变的,我们可以处理具有变化的方向和大小的图像(见图8)。

值得注意的是,这是传统的特征匹配技术不能实现的,例如Harris 角点图像修补的相关性。

传统的相关性在图像旋转时是变化的,Harris 角点在改变图像尺度时也是变化的。

假设相机绕光学中心旋转,图像的变换群是一个对应矩阵的特殊群。

每个摄像头由一个旋转矢量 $\theta_i = [\theta_{i1}, \theta_{i2}, \theta_{i3}]$ 和焦距 $f_i$ 参数化,由此给出成对的对应矩阵 $\tilde{u}_i = H_{ij}\tilde{u}_j$,其中

$$H_{ij} = K_i R_i R_j^{T} K_j^{-1} \qquad (1)$$

并且 $\tilde{u}_i$、$\tilde{u}_j$ 是齐次图像坐标($\tilde{u}_i = s_i[u_i, 1]$,其中 $u_i$ 是二维图像坐标)。

4参数的相机模型定义为:

$$K_i = \begin{bmatrix} f_i & 0 & 0 \\ 0 & f_i & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2)$$

对旋转使用指数表示:

$$R_i = e^{[\theta_i]_\times}, \qquad [\theta_i]_\times = \begin{bmatrix} 0 & -\theta_{i3} & \theta_{i2} \\ \theta_{i3} & 0 & -\theta_{i1} \\ -\theta_{i2} & \theta_{i1} & 0 \end{bmatrix} \qquad (3)$$

在这个变换群中,理想条件下将会使用不变的图像特征。

可是,对于图像坐标中的小的变化,有

$$u_i = u_{i0} + \frac{\partial u_i}{\partial u_j}\bigg|_{u_{i0}} \Delta u_j \qquad (4)$$

或者等价地 $\tilde{u}_i = A_{ij}\tilde{u}_j$,其中

$$A_{ij} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \qquad (5)$$

是通过将对应矩阵关于 $u_{i0}$ 线性化得到的仿射变换。
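下面用一段Python/NumPy代码示意式(1)-(3)的含义:由旋转矢量与焦距构造 $K_i$、$R_i$,再组合出成对的对应矩阵 $H_{ij}$,并用它变换图像点。其中用Rodrigues公式计算 $e^{[\theta]_\times}$;函数名与接口均为示意性假设,并非论文作者的实现。

```python
import numpy as np

def rotation_from_vector(theta):
    """Rodrigues formula: R = exp([theta]_x) for a 3-element rotation vector."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)
    k = theta / angle
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def intrinsics(f):
    """4-parameter camera model of equation (2): focal length f, principal
    point at the origin."""
    return np.diag([f, f, 1.0])

def pairwise_homography(theta_i, f_i, theta_j, f_j):
    """H_ij = K_i R_i R_j^T K_j^{-1}, mapping homogeneous points of image j
    into image i (equation 1)."""
    Ki, Kj = intrinsics(f_i), intrinsics(f_j)
    Ri = rotation_from_vector(np.asarray(theta_i, dtype=float))
    Rj = rotation_from_vector(np.asarray(theta_j, dtype=float))
    return Ki @ Ri @ Rj.T @ np.linalg.inv(Kj)

def warp_point(H, u):
    """Apply a homography to a 2-D point and de-homogenize."""
    x = H @ np.array([u[0], u[1], 1.0])
    return x[:2] / x[2]
```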

这意味着每个小的图像块经历一次仿射变换,因而使用在仿射变换下局部不变的SIFT特征是合理的。

一旦从所有n 个图像中提取特征点后(线性时间内),需对特征点进行匹配。

由于多个图像可能重叠在同一条光线上,每个特征点需在特征空间内和它最近的k个近邻匹配(k=4)。通过使用k-d树算法可以找到近似最近邻,时间复杂度为 $O(n\log n)$。

k-d树是一种轴对齐的二叉空间划分结构,它在方差最大的维度上按均值递归地划分特征空间。
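下面给出一个利用k-d树在所有图像的描述子之间寻找k近邻候选匹配的简化示意(Python/NumPy/SciPy)。这里假定每幅图像的SIFT描述子已提取为 (n_i, 128) 的数组;scipy.spatial.cKDTree 做的是精确近邻查询,文中所用的近似近邻策略在此省略,函数名均为示意。

```python
import numpy as np
from scipy.spatial import cKDTree

def match_features(descs, k=4):
    """For every descriptor from every image, find its k nearest neighbours in
    descriptor space over all other images, using a k-d tree.

    descs: list of (n_i, 128) descriptor arrays, one per image.
    Returns a list of (image_a, feat_a, image_b, feat_b) candidate matches."""
    all_desc = np.vstack(descs)
    owners = np.concatenate([np.full(len(d), i) for i, d in enumerate(descs)])
    offsets = np.concatenate([[0], np.cumsum([len(d) for d in descs])])
    tree = cKDTree(all_desc)
    _, idx = tree.query(all_desc, k=k + 1)     # +1: the query point itself comes back
    matches = []
    for q, neighbours in enumerate(idx):
        for n in neighbours[1:]:
            if owners[n] != owners[q]:         # keep only cross-image matches
                matches.append((int(owners[q]), int(q - offsets[owners[q]]),
                                int(owners[n]), int(n - offsets[owners[n]])))
    return matches
```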

3. 图像匹配

图像匹配的目标是找到所有相互匹配(即重叠)的图像,随后这些匹配图像的连通集将构成全景图。

由于每幅图像都可能与其他任意一幅图像匹配,这个问题初看起来复杂度是图像数量的二次方。

为了得到一个好的拼接结果,对于图像几何而言,每个图像只需要和少数重叠的图像来匹配。

从特征匹配这个步骤中,我们已找出图像间有大量匹配点的图像。

对于当前图像,我们将m 幅图像作为可能的匹配图像(m=6),这m 幅图像与当前图像有最大数量的特征匹配点。

首先,使用RANSAC 算法选择一系列和图像间对应矩阵兼容的内点,然后应用概率模型做进一步的验证。

3.1 使用RANSAC算法的鲁棒对应矩阵估计

RANSAC(随机抽样一致性)算法是一种鲁棒估计过程,它使用最少数量的一组随机采样匹配点来估计图像变换参数,并找到与数据一致性最好的解。

在全景图的情况下,我们选择r=4对匹配特征点,使用直接线性变换(DLT)方法计算图像间的对应矩阵H 。

重复500次试验,选择内点数最大的解决方案(在像素误差范围内,其预测和H 是一致的)。

假设一对匹配图像间的一个特征匹配点是正确的概率(内点概率)为 $p_i$,则 $n$ 次试验后找到正确变换的概率为:

$$p(H\ \text{正确}) = 1 - (1 - p_i^{\,r})^n \qquad (6)$$

经过大量试验后,找到正确对应矩阵的概率非常大。

例如,对于内点概率 $p_i = 0.5$,在500次试验后,未找到正确对应矩阵的概率约为 $1 \times 10^{-14}$。

RANSAC 算法本质上是一种估计H 的采样方法,如果用对数似然和的最大化代替内点数量的最大化,结果是最大似然估计(MLE )。

此外,如果变换参数的先验值是有效的,可以计算出最大后验概率(MAP )。

这些算法被分别称为MLESAC 和MAPSAC 。
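下面是用RANSAC估计对应矩阵的一个简化示意(Python/NumPy):每次随机抽取 r=4 对匹配点,用直接线性变换(DLT)求解 H,统计重投影误差小于给定阈值的内点数,保留内点最多的解。像素阈值、试验次数等具体数值以及函数名均为示意性假设,并非论文实现。

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct Linear Transform: estimate H from four (or more) point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)          # null vector of the constraint matrix

def ransac_homography(src, dst, n_trials=500, tol=3.0, rng=None):
    """RANSAC: repeatedly fit H to r = 4 random correspondences and keep the
    hypothesis with the most inliers (reprojection error below tol pixels)."""
    rng = np.random.default_rng(rng)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_trials):
        sample = rng.choice(len(src), size=4, replace=False)
        H = dlt_homography(src[sample], dst[sample])
        proj = np.c_[src, np.ones(len(src))] @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```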

3.2 图像匹配关系验证的概率模型

对于两两图像间是否存在匹配关系,我们使用一系列几何上一致的特征匹配点(RANSAC内点)和一系列位于重叠区域内但不一致的特征点(RANSAC外点)来验证。

验证模型通过比较这些正确匹配产生的一系列内点和错误匹配产生的一系列外点的概率来进行验证。

对于一幅给定的图像,重叠区域内总的匹配特征点数为 $n_f$,内点数为 $n_i$。

图像是否为有效匹配用二进制变量 $m \in \{0, 1\}$ 表示。

第 $i$ 个特征匹配点是否为内点用 $f^{(i)} \in \{0, 1\}$ 表示,并被假定为相互独立的伯努利变量,从而内点总数服从二项分布:

$$p(f^{(1:n_f)} \mid m=1) = B(n_i;\, n_f,\, p_1) \qquad (7)$$

$$p(f^{(1:n_f)} \mid m=0) = B(n_i;\, n_f,\, p_0) \qquad (8)$$

其中,$p_1$ 是图像正确匹配时一个特征点为内点的概率,$p_0$ 是图像匹配错误时一个特征点为内点的概率;$f^{(1:n_f)}$ 表示特征匹配点变量的集合 $\{f^{(i)},\, i = 1, 2, \ldots, n_f\}$,内点数 $n_i = \sum_{i=1}^{n_f} f^{(i)}$,$B(\cdot)$ 为二项分布:

$$B(x;\, n,\, p) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \qquad (9)$$

我们选择 $p_1 = 0.6$,$p_0 = 0.1$,则可以使用贝叶斯规则(式10、11)计算图像匹配正确的后验概率。

$$p(m=1 \mid f^{(1:n_f)}) = \frac{p(f^{(1:n_f)} \mid m=1)\, p(m=1)}{p(f^{(1:n_f)})} \qquad (10)$$

$$= \frac{1}{1 + \dfrac{p(f^{(1:n_f)} \mid m=0)\, p(m=0)}{p(f^{(1:n_f)} \mid m=1)\, p(m=1)}} \qquad (11)$$

如果满足 $p(m=1 \mid f^{(1:n_f)}) > p_{\min}$,即

$$\frac{B(n_i;\, n_f,\, p_1)\, p(m=1)}{B(n_i;\, n_f,\, p_0)\, p(m=0)} \;\mathop{\gtrless}\limits^{\text{接受}}_{\text{拒绝}}\; \frac{1}{1/p_{\min} - 1} \qquad (12)$$

我们就接受该图像匹配。

假定 $p(m=1) = 10^{-6}$,$p_{\min} = 0.999$,进一步得出正确图像匹配的判定条件:

$$n_i > \alpha + \beta\, n_f \qquad (13)$$

其中 $\alpha = 8.0$,$\beta = 0.3$。

尽管在这里我们选择了 $p_0$、$p_1$、$p(m=0)$、$p(m=1)$ 和 $p_{\min}$ 的值,但在原理上可以从数据中进一步确定这些值。

例如,可以通过在大的数据集中计算与正确对应矩阵相一致的匹配点所占的比例来估计 $p_1$。
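下面的小例子按式(9)-(13)对"图像匹配是否正确"做判定:先用二项分布比较似然并由贝叶斯规则得到后验概率,再与简化后的线性判据 $n_i > \alpha + \beta n_f$ 对照。代码为Python示意,其中数值($p_1=0.6$、$p_0=0.1$、$p(m=1)=10^{-6}$、$p_{\min}=0.999$、$\alpha=8.0$、$\beta=0.3$)取自正文,函数名为假设。

```python
from math import comb

def binomial_pmf(x, n, p):
    """B(x; n, p) = n! / (x! (n-x)!) * p^x * (1-p)^(n-x), as in equation (9)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def match_is_valid(n_inliers, n_features,
                   p1=0.6, p0=0.1, prior=1e-6, p_min=0.999):
    """Bayesian test of equations (10)-(12): accept the image match when the
    posterior p(m=1 | f) exceeds p_min. Also returns the simplified linear
    rule n_i > alpha + beta * n_f of equation (13), alpha = 8.0, beta = 0.3."""
    ratio = (binomial_pmf(n_inliers, n_features, p1) * prior) / \
            (binomial_pmf(n_inliers, n_features, p0) * (1 - prior))
    posterior = ratio / (1.0 + ratio)
    return posterior > p_min, n_inliers > 8.0 + 0.3 * n_features

print(match_is_valid(30, 50))   # many inliers: both criteria accept
print(match_is_valid(8, 50))    # few inliers: both criteria reject
```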

一旦图像间的匹配点对确定,我们可以找到全景序列作为连接匹配图像集,它可以识别图像集中的多个全景,拒绝不匹配的噪声图像(见图2)。

图1:(a) 图像一;(b) 图像二;(c) SIFT匹配点1;(d) SIFT匹配点2;(e) RANSAC内点1;(f) RANSAC内点2;(g) 依据对应矩阵的图像对准。从所有图像中提取SIFT特征点。

使用k-d树匹配所有特征点后,对于一个给定图像,用有最多特征匹配点的m幅图像进行图像匹配。

首先执行RANSAC算法计算出对应矩阵,然后调用概率模型验证基于内点数的图像匹配,在这个例子中,输入图像是517×374像素,有247个正确特征匹配点。

图2:可识别全景图。(a) 图像匹配点;(b) 图像匹配点的连接分量;(c) 输出全景图。

考虑一个特征匹配点的噪声集,我们使用RANSAC算法和概率验证过程找到一致的图像匹配(a ),每个图像对间的箭头表示在图像对间找到一致的特征匹配点集,图像匹配连接分量被找到(b ),拼接成全景图(c );注意到该算法对不属于全景图的噪声图像不敏感。

4. 捆绑调整

给定图像间几何上一致的匹配集,我们使用捆绑调整联合求解所有相机参数。这是重要的一步,因为成对对应矩阵的级联拼接会造成累积误差,并且忽略了图像间的多重约束,例如全景图的两端应当拼合起来。
