Face Recognition under Varying Lighting Conditions Using Self Quotient Image


Face recognition / facial / digital image processing: Chinese-English parallel translation of a foreign-language paper (graduation project material, high-quality human translation, with original source)

Face-recognition-related literature translation, fully hand-translated, with the original source given (original text and translation). The translated original is from Thomas David Heseltine BSc. Hons., The University of York, Department of Computer Science, for the qualification of PhD, September 2005: "Face Recognition: Two-Dimensional and Three-Dimensional Techniques".

4 Two-dimensional Face Recognition

4.1 Feature Localisation

Before discussing the methods of comparing two facial images we now take a brief look at some of the preliminary processes of facial feature alignment. This process typically consists of two stages: face detection and eye localisation. Depending on the application, if the position of the face within the image is known beforehand (for a cooperative subject in a door access system, for example) then the face detection stage can often be skipped, as the region of interest is already known. Therefore, we discuss eye localisation here, with a brief discussion of face detection in the literature review (section 3.1.1).

The eye localisation method is used to align the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are representative of the face recognition accuracy and not a product of the performance of the eye localisation routine, all image alignments are manually checked and any errors corrected, prior to testing and evaluation.

We detect the position of the eyes within an image using a simple template-based method. A training set of manually pre-aligned images of faces is taken, and each image cropped to an area around both eyes. The average image is calculated and used as a template.

Figure 4-1 - The average eyes. Used as a template for eye detection.

Both eyes are included in a single template, rather than individually searching for each eye in turn, as the characteristic symmetry of the eyes either side of the nose provides a useful feature that helps distinguish between the eyes and other false positives that may be picked up in the background. However, this method is highly susceptible to scale (i.e. subject distance from the camera) and also introduces the assumption that eyes in the image appear near horizontal. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just beneath the eyes. The reason is that in some cases the eyebrows can closely match the template, particularly if there are shadows in the eye-sockets, but the area of skin below the eyes helps to distinguish the eyes from eyebrows (the area just below the eyebrows contains eyes, whereas the area below the eyes contains only plain skin).

A window is passed over the test images and the absolute difference taken to that of the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the individual left and right eyes then refines each eye position.

This basic template-based method of eye localisation, although providing fairly precise localisations, often fails to locate the eyes completely. However, we are able to improve performance by including a weighting scheme.

Eye localisation is performed on the set of training images, which is then separated into two sets: those in which eye detection was successful, and those in which eye detection failed. Taking the set of successful localisations we compute the average distance from the eye template (Figure 4-2 top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect.
However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.

Figure 4-2 - Distance to the eye template for successful detections (top) indicating variance due to noise, and failed detections (bottom) showing credible variance due to mis-detected features.

In the lower image (Figure 4-2 bottom), we have taken the set of failed localisations (images of the forehead, nose, cheeks, background etc. falsely detected by the localisation routine) and once again computed the average distance from the eye template. The bright pupils surrounded by darker areas indicate that a failed match is often due to the high correlation of the nose and cheekbone regions overwhelming the poorly correlated pupils. Wanting to emphasise the difference of the pupil regions for these failed matches and minimise the variance of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector as shown in Figure 4-3. When applied to the difference image before summing a total error, this weighting scheme provides a much improved detection rate.

Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.

4.2 The Direct Correlation Approach

We begin our investigation into face recognition with perhaps the simplest approach, known as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29]), involving the direct comparison of pixel intensity values taken from facial images. We use the term "direct correlation" to encompass all techniques in which face images are compared directly, without any form of image space analysis, weighting schemes or feature extraction, regardless of the distance metric used. Therefore, we do not infer that Pearson's correlation is applied as the similarity function (although such an approach would obviously come under our definition of direct correlation). We typically use the Euclidean distance as our metric in these investigations (inversely related to Pearson's correlation, and it can be considered a scale- and translation-sensitive form of image correlation), as this persists with the contrast made between image space and subspace approaches in later sections.

Firstly, all facial images must be aligned such that the eye centres are located at two specified pixel coordinates and the image cropped to remove any background information. These images are stored as greyscale bitmaps of 65 by 82 pixels and prior to recognition converted into a vector of 5330 elements (each element containing the corresponding pixel intensity value). Each corresponding vector can be thought of as describing a point within a 5330-dimensional image space. This simple principle can easily be extended to much larger images: a 256 by 256 pixel image occupies a single point in 65,536-dimensional image space and again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculating the Euclidean distance d between two facial image vectors (often referred to as the query image q and gallery image g), we get an indication of similarity. A threshold is then applied to make the final verification decision:

    d = ||q - g||;   d <= threshold => accept;   d > threshold => reject.   (Equ. 4-1)
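To make the direct correlation comparison concrete, here is a minimal Python sketch (not from the thesis; the file paths and threshold are hypothetical, and images are assumed to be pre-aligned as described above):

import numpy as np
from PIL import Image

def load_face_vector(path):
    # Load an aligned greyscale face image and flatten it to a vector
    # (65 x 82 pixels -> 5330 elements, matching the text).
    img = Image.open(path).convert("L").resize((65, 82))
    return np.asarray(img, dtype=np.float64).ravel()

def verify(query_path, gallery_path, threshold):
    # Direct correlation: Euclidean distance in image space, then a
    # threshold makes the accept/reject decision (Equ. 4-1).
    q = load_face_vector(query_path)
    g = load_face_vector(gallery_path)
    d = np.linalg.norm(q - g)
    return "accept" if d <= threshold else "reject"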
4.2.1 Verification Tests

The primary concern in any face recognition system is its ability to correctly verify a claimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system's ability to perform these tasks, a variety of evaluation methodologies have arisen. Some of these analysis methods simulate a specific mode of operation (i.e. secure site access or surveillance), while others provide a more mathematical description of data distribution in some classification space. In addition, the results generated from each analysis method may be presented in a variety of formats. Throughout the experimentations in this thesis, we primarily use the verification test as our method of analysis and comparison, although we also use Fisher's Linear Discriminant to analyse individual subspace components in section 7 and the identification test for the final evaluations described in section 8.

The verification test measures a system's ability to correctly accept or reject the proposed identity of an individual. At a functional level, this reduces to two images being presented for comparison, for which the system must return either an acceptance (the two images are of the same person) or rejection (the two images are of different people). The test is designed to simulate the application area of secure site access. In this scenario, a subject will present some form of identification at a point of entry, perhaps as a swipe card, proximity chip or PIN number. This number is then used to retrieve a stored image from a database of known subjects (often referred to as the target or gallery image) and compared with a live image captured at the point of entry (the query image). Access is then granted depending on the acceptance/rejection decision.

The results of the test are calculated according to how many times the accept/reject decision is made correctly. In order to execute this test we must first define our test set of face images. Although the number of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficiently large such that statistical anomalies become insignificant (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recognition systems, they must be applied to the same test set.

However, it should also be noted that if the results are to be representative of system performance in a real world situation, then the test data should be captured under precisely the same circumstances as in the application environment. On the other hand, if the purpose of the experimentation is to evaluate and improve a method of face recognition, which may be applied to a range of application environments, then the test data should present the range of difficulties that are to be overcome. This may mean including a greater percentage of 'difficult' images than would be expected in the perceived operating conditions, and hence higher error rates in the results produced. Below we provide the algorithm for executing the verification test. The algorithm is applied to a single test set of face images, using a single function call to the face recognition algorithm: CompareFaces(FaceA, FaceB).
This call is used to compare two facial images, returning a distance score indicating how dissimilar the two face images are: the lower the score, the more similar the two face images. Ideally, images of the same face should produce low scores, while images of different faces should produce high scores.

Every image is compared with every other image, no image is compared with itself and no pair is compared more than once (we assume that the relationship is symmetrical). Once two images have been compared, producing a similarity score, the ground truth is used to determine if the images are of the same person or different people. In practical tests this information is often encapsulated as part of the image filename (by means of a unique person identifier). Scores are then stored in one of two lists: a list containing scores produced by comparing images of different people, and a list containing scores produced by comparing images of the same person. The final acceptance/rejection decision is made by application of a threshold. Any incorrect decision is recorded as either a false acceptance or false rejection. The false rejection rate (FRR) is calculated as the percentage of scores from the same people that were classified as rejections. The false acceptance rate (FAR) is calculated as the percentage of scores from different people that were classified as acceptances.

For IndexA = 0 to length(TestSet)
    For IndexB = IndexA+1 to length(TestSet)
        Score = CompareFaces(TestSet[IndexA], TestSet[IndexB])
        If IndexA and IndexB are the same person
            Append Score to AcceptScoresList
        Else
            Append Score to RejectScoresList

For Threshold = Minimum Score to Maximum Score:
    FalseAcceptCount, FalseRejectCount = 0
    For each Score in RejectScoresList
        If Score <= Threshold
            Increase FalseAcceptCount
    For each Score in AcceptScoresList
        If Score > Threshold
            Increase FalseRejectCount
    FalseAcceptRate = FalseAcceptCount / length(RejectScoresList)
    FalseRejectRate = FalseRejectCount / length(AcceptScoresList)
    Add plot to error curve at (FalseRejectRate, FalseAcceptRate)

These two error rates express the inadequacies of the system when operating at a specific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or FRR (by altering the threshold value) will inevitably result in increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the entire range of scores produced. The application of each threshold value produces an additional FAR, FRR pair, which when plotted on a graph produces the error rate curve shown below.

Figure 4-5 - Example error rate curve produced by the verification test (false rejection rate plotted against false acceptance rate, %).

The equal error rate (EER) can be seen as the point at which FAR is equal to FRR. This EER value is often used as a single figure representing the general recognition performance of a biometric system and allows for easy visual comparison of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real world application. It is unlikely that any real system would use a threshold value such that the percentage of false acceptances were equal to the percentage of false rejections.
Secure site access systems would typically set the threshold such that false acceptances were significantly lower than false rejections: unwilling to tolerate intruders at the cost of inconvenient access denials. Surveillance systems, on the other hand, would require low false rejection rates to successfully identify people in a less controlled environment. Therefore we should bear in mind that a system with a lower EER might not necessarily be the better performer towards the extremes of its operating capability.

There is a strong connection between the above graph and the receiver operating characteristic (ROC) curves also used in such experiments. Both graphs are simply two visualisations of the same results, in that the ROC format uses the true acceptance rate (TAR), where TAR = 1.0 - FRR, in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and FAR as functions of the threshold value. This presentation format provides a reference to determine the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves intersect.

Figure 4-6 - Example error rate curve as a function of the score threshold.

The fluctuation of these error curves due to noise and other errors is dependent on the number of face image comparisons made to generate the data. A small dataset that only allows for a small number of comparisons will result in a jagged curve, in which large steps correspond to the influence of a single image on a high proportion of the comparisons made. A typical dataset of 720 images (as used in section 4.2.2) provides 258,840 verification operations, hence a drop of 1% EER represents an additional 2588 correct decisions, whereas the quality of a single image could cause the EER to fluctuate by up to 0.28%.

4.2.2 Results

As a simple experiment to test the direct correlation method, we apply the technique described above to a test set of 720 images of 60 different people, taken from the AR Face Database [39]. Every image is compared with every other image in the test set to produce a likeness score, providing 258,840 verification operations from which to calculate false acceptance rates and false rejection rates. The error curve produced is shown in Figure 4-7.

Figure 4-7 - Error rate curve produced by the direct correlation method using no image preprocessing.

We see that an EER of 25.1% is produced, meaning that at the EER threshold approximately one quarter of all verification operations carried out resulted in an incorrect classification. There are a number of well-known reasons for this poor level of accuracy. Tiny changes in lighting, expression or head orientation cause the location in image space to change dramatically. Images in face space are moved far apart due to these image capture conditions, despite being of the same person's face. The distance between images of different people becomes smaller than the area of face space covered by images of the same person, and hence false acceptances and false rejections occur frequently. Other disadvantages include the large amount of storage necessary for holding many face images and the intensive processing required for each comparison, making this method unsuitable for applications applied to a large database. In section 4.3 we explore the eigenface method, which attempts to address some of these issues.
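The verification-test procedure of section 4.2.1 can be rendered as runnable Python; the sketch below is an assumed implementation, not the thesis author's code (compare_faces stands in for the CompareFaces call, and labels gives the ground-truth person identifier per image):

import itertools
import numpy as np

def verification_curve(test_set, labels, compare_faces):
    # Compare every image with every other image exactly once.
    accept_scores, reject_scores = [], []
    for (i, a), (j, b) in itertools.combinations(enumerate(test_set), 2):
        score = compare_faces(a, b)
        (accept_scores if labels[i] == labels[j] else reject_scores).append(score)

    # Sweep the threshold through the entire range of scores produced,
    # collecting one (FRR, FAR) pair per threshold value.
    curve = []
    for t in np.unique(accept_scores + reject_scores):
        far = float(np.mean(np.array(reject_scores) <= t))  # impostors accepted
        frr = float(np.mean(np.array(accept_scores) > t))   # genuine pairs rejected
        curve.append((frr, far))

    # The EER is the point where FAR and FRR are (approximately) equal.
    eer_point = min(curve, key=lambda p: abs(p[0] - p[1]))
    return curve, eer_point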

Edge Detection Based on Retinex-Theory Image Enhancement

Edge Detection Based on Retinex-Theory Image Enhancement. Yuan Hong; Tang Hui. [Abstract] This paper analyses the image enhancement algorithm based on Retinex theory and the traditional Canny edge detection algorithm, and proposes combining the two methods effectively for edge detection in images with uneven exposure, blur, or low contrast.

Experiments show that the method accurately detects object edges in an image, yields richer edge detail than applying the Canny algorithm alone, and preserves the connectivity of object edges well.

[Journal] Journal of Kunming Metallurgy College. [Year (Volume), Issue] 2015(000)003. [Pages] 4 (P51-53, 95). [Keywords] Retinex theory; image enhancement; Canny algorithm; edge detection. [Authors] Yuan Hong; Tang Hui. [Affiliations] Intelligent Management Department, Kunming Metallurgy College, Kunming 650033, Yunnan, China; Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, Yunnan, China. [Language] Chinese. [CLC] TP391.41.

0 Introduction. Image enhancement strengthens the display of required local regions of an image and weakens or removes unnecessary information, so that the image can be recognised by a computer.
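The abstract above gives no implementation; as a rough illustration of the combination it describes, the following OpenCV sketch applies a single-scale Retinex-style enhancement before Canny (the sigma and hysteresis thresholds are assumptions, not values from the paper):

import cv2
import numpy as np

def retinex_then_canny(gray, sigma=30.0, low=50, high=150):
    # Single-scale Retinex: r = log(I) - log(F * I), where F * I is a
    # Gaussian-smoothed illumination estimate; this flattens uneven
    # exposure before the gradient-based edge detector runs.
    img = gray.astype(np.float64) + 1.0
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    r = np.log(img) - np.log(illumination)
    r = cv2.normalize(r, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.Canny(r, low, high)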

Facial Recognition: Reading Comprehension Notes

1. Introduction. Facial recognition is an artificial-intelligence-based technology that identifies and analyses faces in images, enabling identity verification and locating of target subjects.

In this article we discuss the technology's basic concepts, application scenarios, advantages and disadvantages, and the relevant laws and regulations.

2. Basic Principles. The core of facial recognition is image processing and machine-learning algorithms.

The input image is first preprocessed, for example with lighting adjustment and face alignment, to improve recognition accuracy.

A model is then trained with a machine-learning algorithm; the model extracts facial features from an image and compares them against known faces in a database.

If a match is found, recognition succeeds.
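As a toy illustration of the matching step just described (not from the original notes), the sketch below compares a probe feature vector against a database of known embeddings using cosine similarity; the embedding function and the 0.6 threshold are assumptions:

import numpy as np

def identify(probe, database, threshold=0.6):
    # database: {name: reference embedding vector}
    best_name, best_sim = None, -1.0
    for name, ref in database.items():
        sim = float(np.dot(probe, ref) /
                    (np.linalg.norm(probe) * np.linalg.norm(ref)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    # Accept the best match only if it is similar enough.
    return best_name if best_sim >= threshold else None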

3. Application Scenarios. 1. Security surveillance: in public places such as airports, railway stations, and large event venues, facial recognition enables identity verification and crowd control, improving security.

2. Access control: in residential compounds, office buildings, and similar sites, facial recognition access-control systems verify visitors quickly and conveniently, improving gate-management efficiency.

3. Social media: many social media applications use facial recognition to identify people in user-uploaded photos and to offer tags and recommendations.

4. Law enforcement: facial recognition also plays an important role in criminal investigation and demographic statistics.

4. Advantages and Disadvantages. Advantages: 1. High accuracy: as the technology advances, facial recognition keeps becoming more accurate.

2. Speed and efficiency: compared with traditional manual comparison, facial recognition greatly improves recognition speed and efficiency.

3. Contactless: facial recognition systems are usually contactless, which is more convenient and comfortable for users.

Disadvantages: 1. Privacy: facial recognition involves personal privacy and must comply with the relevant laws and regulations.

2. Technical limitations: the technology may fail in special cases such as make-up, hairstyle changes, or facial occlusion.

3. Security risk: if a system is hacked or abused to identify innocent citizens, it may create security risks.

Face Illumination Compensation Based on the Wavelet Quotient Image

Face Illumination Compensation Based on the Wavelet Quotient Image. Liu Lihua; Wang Yinghui; Deng Fang'an. [Journal] Computer Engineering and Design. [Year (Volume), Issue] 2009(030)014.

[Abstract] Face recognition under complex illumination conditions is a difficult but unavoidable problem, so an effective illumination compensation method is proposed. Based on quotient image theory, the illumination condition of the image to be recognised is estimated on a wavelet-reduced-dimension illumination training set, and illumination compensation is then achieved through two basic strategies: adding light and reducing light. Compared with the traditional quotient image method, this approach improves computational efficiency through the wavelet transform. Experimental results show that the improved method achieves a competitive recognition rate at low computational cost.

[Pages] 4 (P3402-3405). [Authors] Liu Lihua; Wang Yinghui; Deng Fang'an. [Affiliations] Department of Mathematics, Shaanxi University of Technology, Hanzhong 723001, Shaanxi, China; School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, China. [Language] Chinese. [CLC] TP391.41.
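The abstract gives no algorithmic detail; purely as a sketch of the wavelet dimension-reduction idea it alludes to, the snippet below forms a quotient in the wavelet approximation subband using PyWavelets (the wavelet choice, decomposition level, and the reference lighting image are all assumptions):

import numpy as np
import pywt

def wavelet_quotient(face, reference_lighting, level=2, eps=1e-3):
    # Keep only the low-frequency approximation coefficients of each
    # image, then form a quotient on that reduced-dimension band.
    fa = pywt.wavedec2(face, "haar", level=level)[0]
    ra = pywt.wavedec2(reference_lighting, "haar", level=level)[0]
    return fa / (ra + eps)   # eps avoids division by zero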

An English Essay on Facial Recognition Technology

Here is an English essay on the topic of facial recognition technology, with the content exceeding 1000 words as requested.

Facial recognition technology has become increasingly prevalent in our modern society with its wide-ranging applications across various sectors. This advanced biometric identification system utilizes algorithms to map an individual's facial features and compare them against a database of stored facial profiles. While the technology offers numerous benefits in terms of security enhancement and convenience, it also raises significant ethical and privacy concerns that warrant careful consideration.

One of the primary advantages of facial recognition technology is its ability to enhance security measures. In the realm of law enforcement, this technology can aid in the identification and apprehension of criminals, potentially leading to improved public safety. By cross-referencing facial data with criminal databases, authorities can quickly and accurately pinpoint suspects, streamlining investigative processes. Additionally, this technology can be implemented in secure access control systems, ensuring that only authorized individuals can gain entry to sensitive areas or facilities, thereby reducing the risk of unauthorized access.

Furthermore, facial recognition technology has proven invaluable in various commercial and social applications. Retailers can leverage this technology to personalize customer experiences, offering targeted advertisements and recommendations based on individual preferences. In the financial sector, banks can utilize facial recognition for secure authentication, reducing the reliance on traditional password-based systems and enhancing the overall security of transactions. Similarly, social media platforms can employ facial recognition to facilitate features such as automatic tagging and photo organization, improving user experience and engagement.

However, the widespread adoption of facial recognition technology also raises significant ethical and privacy concerns. One of the primary issues is the potential for infringement on individual privacy. The collection and storage of biometric data, such as facial profiles, without explicit consent or adequate safeguards, can be perceived as a violation of an individual's right to privacy. This concern is amplified by the possibility of unauthorized access or misuse of this sensitive information, which could lead to identity theft, stalking, or other malicious activities.

Moreover, the accuracy and reliability of facial recognition systems have come under scrutiny. Studies have shown that these systems can exhibit biases, often performing less accurately for individuals from certain demographic groups, such as women and people of color. This bias can lead to disproportionate targeting or false identifications, which can have severe consequences, particularly in law enforcement contexts. The potential for such errors to perpetuate societal inequalities and undermine the principles of fairness and justice is a significant concern.

Another pressing issue is the lack of comprehensive regulatory frameworks governing the use of facial recognition technology. In many countries, the legal landscape surrounding the collection, storage, and application of biometric data remains unclear, leaving individuals vulnerable to potential abuses.
Without clear guidelines and oversight, there is a risk of unchecked surveillance, profiling, and discrimination, which can erode the fundamental civil liberties and democratic principles that societies strive to uphold.

Furthermore, the potential for the misuse of facial recognition technology by authoritarian regimes or malicious actors is a grave concern. Unchecked access to this technology could enable the monitoring and suppression of dissent, the targeting of minority groups, and the erosion of freedom of expression and association. The dystopian scenarios envisioned in works of science fiction have the potential to become a reality if appropriate safeguards and ethical considerations are not prioritized.

In response to these concerns, there have been growing calls for greater regulation and oversight of facial recognition technology. Policymakers and civil society organizations have advocated for the implementation of robust privacy laws, data protection frameworks, and algorithmic accountability measures. These efforts aim to ensure that the development and deployment of facial recognition systems are aligned with fundamental human rights and that individuals are granted the necessary protections against potential abuses.

Additionally, there have been calls for increased transparency and public discourse around the use of facial recognition technology. Engaging with diverse stakeholders, including privacy advocates, technology experts, and affected communities, can help shape policies that strike a balance between the benefits of the technology and the preservation of individual rights and civil liberties.

It is also crucial that the developers and deployers of facial recognition systems prioritize the principles of fairness, non-discrimination, and ethical design. This includes addressing the issue of algorithmic bias, improving the accuracy and reliability of the technology, and implementing rigorous testing and auditing procedures to identify and mitigate potential harms.

In conclusion, the rapid advancement of facial recognition technology presents both opportunities and challenges. While the technology offers valuable applications in enhancing security and enabling convenient commercial and social experiences, the ethical and privacy concerns it raises cannot be overlooked. Striking the right balance between the benefits of the technology and the protection of individual rights will require a multifaceted approach involving robust regulation, transparent governance, and a commitment to ethical and responsible development. As we navigate the evolving landscape of facial recognition technology, it is crucial that we prioritize the preservation of fundamental human rights and the well-being of individuals and society as a whole.

IEEE_SMC_A_gao_CAS-PEAL

The CAS-PEAL Large-Scale Chinese Face Database and Baseline Evaluations

Wen Gao, Senior Member, IEEE, Bo Cao, Shiguang Shan, Member, IEEE, Xilin Chen, Member, IEEE, Delong Zhou, Xiaohua Zhang, and Debin Zhao

Abstract—In this paper, we describe the acquisition and contents of a large-scale Chinese face database: the CAS-PEAL face database. The goals of creating the CAS-PEAL face database include the following: 1) providing the worldwide researchers of face recognition with different sources of variations, particularly pose, expression, accessories, and lighting (PEAL), and exhaustive ground-truth information in one uniform database; 2) advancing the state-of-the-art face recognition technologies aiming at practical applications by using off-the-shelf imaging equipment and by designing normal face variations in the database; and 3) providing a large-scale face database of Mongolian. Currently, the CAS-PEAL face database contains 99594 images of 1040 individuals (595 males and 445 females). A total of nine cameras are mounted horizontally on an arc arm to simultaneously capture images across different poses. Each subject is asked to look straight ahead, up, and down to obtain 27 images in three shots. Five facial expressions, six accessories, and 15 lighting changes are also included in the database. A selected subset of the database (CAS-PEAL-R1, containing 30863 images of the 1040 subjects) is available to other researchers now. We discuss the evaluation protocol based on the CAS-PEAL-R1 database and present the
performance of four algorithms as a baseline to do the following: 1) elementarily assess the difficulty of the database for face recognition algorithms; 2) provide reference evaluation results for researchers using the database; and 3) identify the strengths and weaknesses of the commonly used algorithms.

Index Terms—Accessory, evaluation protocol, expression, face databases, face recognition, lighting, pose.

I. INTRODUCTION

AUTOMATIC face recognition (AFR) has been studied for over 30 years [1]-[3]. Especially in recent years, it has become one of the most active research areas in pattern recognition, computer vision, and psychology due to the extensive public expectation of its wide potential applications in public security, financial security, entertainment, intelligent human-computer interaction, etc. In addition, much progress has been made in the past few years. However, AFR remains a research area far from maturity, and its applications are still limited in controllable environments. Therefore, it is becoming more and more significant to discover the bottleneck and the valuable future research topics by evaluating and comparing the potential AFR technologies exhaustively and objectively. Aiming at these goals, large-scale and diverse face databases are obviously one of the basic requirements.

Internationally, face recognition technology (FERET) [4], [5], the face recognition vendor test (FRVT) [6], [7], and the face recognition grand challenge (FRGC) [8] have pioneered both evaluation protocols and database construction. Furthermore, FERET has released its database that contains 14051 face images of over 1000 subjects and has variations in expression, lighting, pose, and acquisition time. Despite its success in the evaluations of face recognition algorithms, the FERET database has limitations in the relatively simple and unsystematically controlled variations of face images for research purposes. FRGC has released its training and validation partitions. The training partition consists of two training sets: the large still training set (6388 controlled and 6388 uncontrolled still images from 222 subjects) and the 3-D training set (3-D scans, and controlled and uncontrolled still images from 943 subject sessions). The validation partition contains images from 466 subjects collected in 4007 subject sessions. Other publicly available face databases include the CMU PIE [9], AR [10], XM2VTSDB [11], ORL [12], UMIST [13], MIT [14], Yale [15], (Extended) Yale Face Database B [16], [17], BANCA [18], etc. Among them, both the CMU PIE and the (Extended) Yale Face Database B have well-controlled variations in pose and illumination. The CMU PIE contains 68 subjects, which may not satisfy the practical requirements for training and evaluating most face recognition algorithms.

[Table I: Overview of the recording conditions in some face databases.]

To complement the existing face databases, we design and construct a large-scale Chinese face database, the CAS-PEAL face database, which covers variations in pose, expression, accessory, lighting, backgrounds, etc. Currently, it contains 99594 images of 1040 individuals (595 males and 445 females). A selected subset, CAS-PEAL-R1, which contains 30863 images of 1040 subjects, is now made available for other researchers. Table I gives a brief overview of these databases to help researchers choose the most appropriate one for their specific needs. Some older databases are not included in the table (for a complete reference, refer to [19]). It is obvious that the CAS-PEAL-R1 database has
advantages both in the quantity of subjects and in the number of controlled variations of the recording conditions, which facilitate the training and evaluation of face recognition algorithms, particularly statistical learning-based techniques. Furthermore, most of the current face databases mainly consist of Caucasian people, whereas the CAS-PEAL database consists of Mongolian people. Such a difference makes it possible to study the "cross-race" effect in face recognition algorithms [20]-[22].

This paper describes the design, collection, and categorization of the CAS-PEAL database in detail. In addition, we present an evaluation protocol to regulate potential future evaluation on the CAS-PEAL-R1 face database, based on which we then evaluate the performance of several typical face recognition methods, including the eigenface [principal components analysis (PCA)] [14], the fisherface [PCA + linear discriminant analysis (LDA)] [15], [23], [24], the Gabor-based PCA + LDA (G PCA + LDA) [25], [26], and the local Gabor binary-pattern histogram sequence (LGBPHS) [27], in combination with different preprocessing methods. The evaluation results assess the difficulty of the database for face recognition algorithms on the basis of individual probe sets containing different variations. By analyzing their performance, some insights into the commonly used algorithms and preprocessing methods are obtained.

The remaining part of this paper is organized as follows. The setup of the photographic room is described in Section II. Then, the design of the CAS-PEAL face database is detailed in Section III. The publicly released CAS-PEAL-R1 and its accompanying evaluation protocol are described in Section IV. The evaluation results of four baseline algorithms on the CAS-PEAL-R1 database are presented in Section V. Finally, some conclusions are drawn in the last section with some further discussions.

[Fig. 1. Illustration of the camera configuration. Note that, in our face database, α is equal to 22.5 degrees for subjects #001-#101, while for the other subjects, i.e., #102-#1042, α is equal to 15 degrees.]

II. PHOTOGRAPHIC ROOM

To capture face images with varying poses, expressions, accessories, and lighting conditions, a special photographic room with a dimension of 4.0 × 5.0 m and a 3.5-m height is set up in our laboratory, and the necessary apparatuses are configured in the room, including a camera system, a lighting system, accessories, and various backgrounds. The details are described in the following sections.

[Fig. 2. Setup of the photographic room.]
[Fig. 3. Configuration of the lamps and their serial numbers. "U," "M," and "D" denote the rough positions of the lamps: "upper," "middle," and "down," respectively.]

A. Camera System

In our photographic room, a camera system consisting of nine digital cameras and a computer is elaborately designed. The cameras we used are Web-Eye PC631 with a 640 × 480 pixel charge-coupled device (CCD). All nine cameras are mounted on a horizontal semicircular arm of 0.8-m radius and 1.1-m height. They all point to the center of the semicircular arm and are labeled C0-C8 from the subject's right to left. A sketch map of the cameras' distribution on the semicircular arm is shown in Fig. 1. All nine cameras are connected to a computer through a USB interface. The computer is specially configured to support up to 12 USB ports. We developed software to control the nine cameras and capture the images in one shot. In each shot, the software can obtain nine images of the subject across the nine
poses and store these images on the hard drive using a uniform naming convention. Each subject is asked to sit down in a height-adjustable chair. Before photographs are taken, the chair is adjusted to keep the head of the subject at the center of the arm, and the subject is asked to look directly into the camera C4 located at the middle of the semicircular arm (as Fig. 1 shows). Fig. 2 shows the scene in which one subject sat on the chair, ready for the photography procedure.

B. Lighting System

To simulate ambient illumination, two high-power photographic sun-lamps covered with ground glass are used to irradiate the rough white ceiling, which gives more uniform lighting and mimics the normal indoor-lighting environment (overhead lighting sources).

[Table II: All possible sources of variations covered in the CAS-PEAL face database.]

To generate the various lighting conditions needed, we set up a lighting system in the photographic room using multiple lamps and lampshades. Fifteen fluorescent lamps are placed at the "lamp" positions, as shown in Fig. 3, to form varying directional lighting environments. In a spherical coordinate system whose origin is the center of the circle that coincides with the semicircular shelf (the x axis is the middle camera's optical axis, and the y axis is horizontal), these positions are located at the crossover of five azimuths (-90°, -45°, 0°, +45°, and +90°) and three elevations (-45°, 0°, and +45°). By turning each lamp on/off while the aforementioned ambient lamps are kept on, different directional lighting conditions are simulated. A switch matrix is exploited to control the on/off conditions of these lamps. It should be noted that flash systems like those of the CMU PIE [9] or Yale Face Database B [16] are not exploited in our system. Therefore, the illumination variations are not as strictly controlled as those in the PIE or Yale. However, these illumination variations are more natural and complicated.

C. Accessories: Glasses and Hats

In the tasks of face detection, landmark localization, and face recognition, wearing accessories such as glasses and hats may cause great difficulty because they sometimes result in lighting change or occlusion or both. However, this is hardly avoidable in practical applications such as video surveillance. In the existing face databases, the accessory variations are not adequate. Therefore, we have carefully used several types of glasses and hats as accessories to further increase the diversity of the CAS-PEAL database. The glasses we used include dark-frame glasses, glasses without frames, and sunglasses. There are also several hats with brims of different sizes and shapes. In the image collection, some of the subjects are asked to wear these accessories. Another purpose of evaluating face recognition systems with some heads wearing different hats is to emphasize the variability of hairstyles. Typically, the hairstyle of a specific subject is constant in a face database that was captured in a single session and thus may be used as a discriminating feature, whereas it is changeable in daily life.

D. Backgrounds

Background variations, in theory, may not influence the performance of face recognition algorithms provided that the face region is correctly segmented from the background. However, in real-world applications, many cameras work in the mode of automatic white balance or automatic intensity gain, which may change the face appearance evidently under different imaging conditions, particularly for consumer video cameras. Therefore, it is necessary to mimic this situation in the
database. In the current version of the CAS-PEAL, we just consider the cases when the background color has been changed. Concretely, five different unicolor (blue, white, black, red, and yellow) blankets are used.

[Fig. 4. The 27 images of one subject under pose variation in the CAS-PEAL database. The nine cameras (C0-C8) are mounted on the horizontal semicircular arm (see Fig. 1 for the camera locations). The subject was asked to look upward, look right into the camera C4, and look downward.]

[Fig. 5. Example images of one subject with six expressions across three poses (from cameras C3, C4, and C5).]

III. DESIGN OF THE CAS-PEAL DATABASE

By utilizing the devices described in Section II, seven variations are applied to construct the CAS-PEAL face database: pure-pose, expression, lighting, accessory, background, time, and distance variations. Due to the fact that nine cameras from different directions are used to capture each subject simultaneously, all the variations are automatically combined with nine pose (viewpoint) changes. Table II lists all the possible sources of variations. For some subjects in the database, not all the variations are captured. However, any subject is captured under at least two kinds of these variations. The following sections describe each of the variations and demonstrate some example face images.

A. Pure-Pose Variation

To capture images with varying poses, the subject is asked to look upward (about 30°), look right into the camera C4 (the middle one), and look downward (about 30°). In each facing direction, nine images are obtained from the nine cameras in one shot. Thus, a total of 27 images of the subject will be obtained. Fig. 4 shows the 27 images of one subject.

B. Expression Variation

In addition to the neutral expression, some subjects are asked to smile, to frown, to be surprised, to close their eyes, and to open their mouths. For each expression, nine images of the subject under different poses are obtained using the nine cameras. Fig. 5 shows some example images of the six expressions (including the neutral one) across three poses.

C. Lighting Variation

Using the lighting system described in Section II-B, we capture the images of a number of subjects under 15 different illumination conditions. Example images of one subject under these conditions are shown in Fig. 6. Note that, in all cases, the ambient lighting lamps are turned on.

D. Accessory Variation

For those subjects who are willing to perform this session, the prepared accessories, three hats and three pairs of glasses, are adorned one by one. Fig. 7 shows the example images of one subject recorded by the camera C4.

E. Background Variation

As mentioned in Section II-D, the background is changed by using different unicolor blankets. Example images under five different backgrounds are shown in Fig. 8. It can be found that the exposures of these images are highly dependent on the backgrounds.

[Fig. 6. Example images of one subject illuminated by a fluorescent light source located at different azimuth and elevation coordinates, from camera C4.]

[Fig. 7. Example images (cropped) of one subject with six different accessories.]

[Fig. 8. Example images of one subject with different backgrounds.]

F. Time Difference

In FERET, FRVT, and other face recognition competitions, time difference is another important factor decreasing the accuracy. In most face databases, images of one subject captured at different times are insufficient or absent because the subjects are hard
to be traced. In the CAS-PEAL database, 66 subjects have been captured in two sessions half a year apart. Fig. 9 shows six images captured in the two sessions. We are further extending this part of the database.

[Fig. 9. Example images captured with time differences. The images in the bottom row were captured half a year after those in the top row.]

G. Different Distance

In real-world applications, the distance between the subject and the camera is subject to change, which may not be simply treated as a scale problem. To make possible the evaluation of this problem's effect on face recognition, we collect some images at different distances for some subjects. In our system, the focal length of the cameras is equal to 36 mm. Three distances are used: 0.8, 1.0, and 1.2 m. Fig. 10 shows three images of one subject at these distances from the camera.

[Fig. 10. Example images at different distances from the camera.]

IV. PUBLICLY RELEASED CAS-PEAL-R1 AND CORRESPONDING EVALUATION PROTOCOL

A subset of the CAS-PEAL face database, named CAS-PEAL-R1, has been made publicly available to researchers working on AFR. This section describes the CAS-PEAL-R1 as well as its accompanying evaluation protocol.

A. Publicly Released CAS-PEAL-R1 Face Database

Contents of the CAS-PEAL-R1: CAS-PEAL-R1 is a subset of the entire CAS-PEAL face database. It contains 30863 images of 1040 subjects. These images belong to two main subsets: the frontal and nonfrontal subsets.

1) In the frontal subset, all images are captured by the camera C4 (see Fig. 1), with the subjects looking right into this camera. Among them, 377 subjects have images with six different expressions. Some 438 subjects have images wearing six different accessories. Some 233 subjects have
variation.The following “x”can be “N,”“L,”“F,”“S,”“C,”or “O.”Its meaning is as shown in Table VIII.6)Accessory field.The initial character “A”represents ac-cessory variation.The following “n”can be a value ranging from 0to 6(see Table IX).7)Distance field.The initial character “D”represents dis-tance variation.The following “n”has a value ranging from 0to 2,indicating different distances from the subject to the camera C4.8)Time field.The initial character “T”indicates time vari-ation.The following “n”has value denoting different sessions (see Table X).9)Background field.The initial character “B”represents background variation.Table XI gives the value for “x”.10)This field is reserved for future use.11)Privacy field.Only images whose ID is less than 100and with an “R1”label in this field will be published or released in technical reports and papers in the face recognition research area only.12)Resolution field.The initial character “S”represents res-olution.The “n”has two values:0and 1,denoting two different resolutions of the image (see Table XII).Because the filename of each image describes the property of the subject in that image,the images in the database can be re-trieved and reorganized easily to meet any specific requirement.In addition,the ground-truth eye locations of all the images are provided in a text file (named FaceFP_2.txt).Image Format:The original 30863RGB color images of size 640×480in CAS-PEAL-R1require about 26.6GB storage space.To facilitate the release,all the images wereGAO et al.:CAS-PEAL LARGE-SCALE CHINESE FACE DATABASE AND BASELINE EV ALUATIONS155TABLEXIFig.11.Several examples of the cropped face images in CAS-PEAL-R1. converted to grayscale images and cropped to size360×480 excluding most of the background.The cropped images are stored as TIFFfiles with lossless Lempel–Ziv–Welch compres-sion.Several cropped images are shown in Fig.11.B.Evaluation ProtocolGiven a face database,there are many possible methods to evaluate a specific AFR method.To facilitate the comparisons among the results of different methods,we have specified a standard evaluation protocol accompanying the database,and expect the potential users of the database to evaluate their methods according to the protocol.In the following part of this section,we describe the proposed evaluation protocol by presenting the definition,design,and some underlying design philosophies of the data sets as well as the evaluation methods.1)Data Sets for Evaluation:In the proposed evaluation protocol,three kinds of data sets are composed from the CAS-PEAL-R1database:one training set,one gallery set,and several probe sets.Their definitions and descriptions are as follows.a)Training set:A training set is a collection of images used to build a recognition model or to tune the parameters of the model or both.We construct a training set containing1200 images of300subjects,which are randomly selected from the 1040subjects in the CAS-PEAL-R1database,with each subject contributing four images randomly selected from the frontal subset of the CAS-PEAL-R1database.b)Gallery set:A gallery set is a collection of images of known individuals against which a probe image is matched. 
In the evaluation protocol, we formed a gallery set containing 1040 images of the 1040 subjects, with each subject having one image under a normal condition. The gallery set consists of all the normal images mentioned in Table III.

c) Probe sets: A probe set is a collection of probe images of unknown individuals that need to be recognized. In the evaluation, nine probe sets are composed from the CAS-PEAL-R1 database, and each probe set contains images restricted to one main variation, as described in Section III. These partitions can be used to identify the strengths and weaknesses of a specific algorithm and to address the performance variations associated with the changes in the probe sets. Among them, six probe sets correspond to the six subsets in the frontal subset: expression, lighting, accessory, background, distance, and time, as described in Table III. The other three probe sets correspond to the images of subjects in the nonfrontal subset: looking upward, looking right into the camera C4 (the middle one), and looking downward. All the images that appear in the training set are excluded from these probe sets. The data sets used in the evaluation are summarized in Table XIII.

2) Evaluation Methods: Based on the aforementioned data sets, one may set up many meaningful evaluation methods for a specific face recognition algorithm. Basically, we believe that how an evaluation method is configured depends on the following three criteria.

a) Is the training set for constructing and tuning the face model restricted or open? For most statistics- or learning-based face recognition algorithms, performance on the designated testing sets heavily depends on the composition of the training set, such as its size (the number of subjects and the number of images per subject), the variations (lighting, pose, expression, etc.) contained in it, and so on. Generally speaking, training images with attributes similar to those in the testing set lead to superior performance. Therefore, in most of the literature, the performance comparison of different algorithms is conducted on the same training set for fairness. On the other hand, the proposed training set may not be appropriate for a specific face recognition method, or may not be adequate to fully utilize the learning capability of the method; thus, the evaluation results are still biased. Considering these aspects, we have defined two training modes for constructing face models: one is the restricted mode, using and only using the TS training set specified in Section IV-B1; the other is the open mode, with no restriction on the training set except that no testing images are included.
Hereinafter, these two modes are denoted as "R" and "O," respectively.

b) Does the face recognition algorithm work in a fully automatic mode or a partially automatic one? A fully automatic mode means that the presented face recognition algorithm completes face detection, facial landmark localization, and identification without any interaction. On the other hand, in a partially automatic mode, the precise facial landmark locations are provided to the algorithm beforehand. In most cases, the coordinates of the two eye centers are given. The partially automatic mode has been exploited by FERET and most of the academic publications so far, since it facilitates a "clean" comparison for researchers. However, perfect automatic eye localization is impossible, and many face recognition algorithms degrade abruptly as the eye-location error increases [28]. Therefore, it is necessary to compare different algorithms in the fully automatic mode to investigate their practicability in real-world applications. Hereinafter, these two modes are denoted as "F" and "P," respectively.

c) What task does the algorithm complete: identification or verification? In practical applications, there are typically three different tasks: identification, verification, and watch list [7]. While identification and verification are special cases of the watch-list task, they are still the most fundamental and distinct tasks. For an identification task, one needs to determine the identity of the given face image by matching
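The filename convention of Section IV-A lends itself to mechanical parsing. The sketch below is hypothetical (the example filename is invented, and the field labels simply follow the 12-field description in the text):

def parse_caspeal_name(stem):
    # Split a CAS-PEAL image filename into its 12 underscore-separated
    # fields and label them following Section IV-A.
    fields = stem.split("_")
    assert len(fields) == 12, "expected 12 underscore-separated fields"
    keys = ["gender_age", "id", "lighting", "pose", "expression",
            "accessory", "distance", "time", "background",
            "reserved", "privacy", "resolution"]
    return dict(zip(keys, fields))

# Invented example: subject 000123, ambient lighting, middle camera,
# neutral expression, no accessory, nearest distance, first session.
info = parse_caspeal_name("MY_000123_IEU+00_PM+00_EN_A0_D0_T0_BB_M0_R1_S0")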

3D Facial Expression Recognition Capability under Different LED Lighting Conditions on Pedestrian Roads

3D Facial Expression Recognition Capability under Different LED Lighting Conditions on Pedestrian Roads. Li Tianyu; Jiang Xin; Yang Biao. [Abstract] The development of LED lighting technology and its flexible continuous spectrum provide a new way to save lighting energy while preserving visual performance. We investigated the effects of LED illuminance, spectral power distribution (SPD), and S/P ratio on 3D facial-expression recognition capability, setting up 15 lighting environments that combine three LED sources with different SPDs (HPS, MH, HIGH-SP) and five illuminance levels (0.33, 1.00, 3.33, 10.00, and 30.0 lx). The experimental results and statistical analysis show that: (1) a vertical illuminance of 10 lx can serve as a reference threshold for the minimum illuminance level of pedestrian road lighting; and (2) at the same illuminance, an LED with a higher S/P ratio does not improve the efficiency of facial-expression recognition. Our results provide a reference for drafting pedestrian road-lighting standards. [Journal] China Illuminating Engineering Journal. [Year (Volume), Issue] 2018(029)004. [Pages] 6 (P95-99, 115). [Keywords] pedestrian lighting; 3D facial-expression recognition; spectral power distribution; S/P ratio; LED. [Authors] Li Tianyu; Jiang Xin; Yang Biao. [Affiliations] School of Architecture, Harbin Institute of Technology (Shenzhen), Shenzhen 518000, Guangdong, China. [Language] Chinese. [CLC] TM923.

Introduction. With the development of LED lighting technology, its potential advantages in road-lighting applications have attracted wide attention; however, owing to the lack of systematic theoretical guidance and scientific evidence, the energy-saving potential of LED sources in terms of visual performance has not yet been fully realised.

In the Guiding Catalogue of Key Products and Services in Strategic Emerging Industries (2016 edition), released in 2017, the National Development and Reform Commission listed "high-efficiency, low-cost LED replacement light sources" as a priority, including high-efficiency, low-cost downlights, spotlights, street lights, tunnel lights, bulb lamps, and other replacement semiconductor lighting sources and new LED lighting products.

The International Commission on Illumination (CIE) standard CIE 115-2010 [1] and the British road-lighting standard BS 5489-1:2013 [2] state that pedestrian road lighting is intended to ensure a safe night-time walking environment and to satisfy the need for visual information, in which recognising the intentions of others is considered a key visual task for safe walking.

Face Recognition (English) PPT Slides
• A computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source.
Processing Flow
Face Recognition
Gan Xianhao, Zhang Xiangyu, Sun Jigang, Lao Jiancheng, Fan Chao
Contents
Face Recognition / Processing Flow / Fundamentals / Application
Step 1 / Step 2 / Step 3
What is Face Recognition
• An advanced biometric identification technique
Fundamentals
step 1) face detection
In this step, the system checks whether the input image contains a face.
Face detection is a computer technology that identifies human faces in digital images. It locates human faces, which can then be passed on for recognizing a particular face. This technology is used in a wide variety of applications nowadays.
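As a concrete illustration of this step, one of many possible detectors (the slides do not commit to a specific method) is OpenCV's bundled Haar-cascade frontal-face detector; a minimal Python sketch:

```python
import cv2

# OpenCV ships a pre-trained frontal-face Haar cascade with the library.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image_path):
    """Return bounding boxes (x, y, w, h) for faces found in an image file."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are common starting values, not tuned ones.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```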
Application
Face Recognition Access Control System

Face Recognition under Varying Lighting Conditions Using Self Quotient Image

1 Haitao Wang, 2 Stan Z. Li, 1 Yangsheng Wang
1 Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China. Email: {htwang,wys}@
2 Beijing Sigma Center, Microsoft Research Asia, Beijing 100080, China. Email: szli@

Abstract

In this paper, we introduce the concept of the Self-Quotient Image (SQI) for robust face recognition under varying lighting conditions. It builds on the Quotient Image method [4][5] to achieve lighting invariance. However, the SQI has three advantages: (1) it needs only one face image to extract the intrinsic, lighting-invariant property of a face while removing the extrinsic factor corresponding to the lighting; (2) no alignment is needed; (3) it works in shadow regions. We present a theoretical analysis of the conditions under which the algorithm is applicable, together with a non-iterative filtering algorithm for computing the SQI. Experimental results demonstrate the effectiveness of our method for robust face recognition under varying lighting conditions.

1. Introduction

Lighting variation is one of the most difficult problems for face recognition and has received much attention [1-13] in recent years. It is well known that image variation due to lighting changes is more significant than that due to different personal identities [12]. Many algorithms have been proposed; let us highlight three major approaches to lighting modeling for 3D object recognition: the Illumination Cone [1]-[3], the Quotient Image [4][5], and the Spherical Harmonic Subspace [6-10].

The Illumination Cone method explained theoretically that the face images of a subject under varying lighting directions form an illumination cone. In this algorithm, both self-shadow and cast shadow were considered, and its experimental results outperformed most existing methods.

Ramamoorthi [6-8] and Basri [9][10] independently developed the spherical harmonic representation. This representation explained why images of an object under different lighting conditions can be described by a low-dimensional subspace, as observed in earlier empirical experiments [16-18]. Given enough training images with the same pose and different lightings for 3D modeling, the above two approaches achieved almost perfect recognition rates on some face databases [3][19]. However, the requirement of large training sets restricts their application in many areas.

Whereas the above two approaches assume that each individual has a different 3D geometry and need a training set for each person, the Quotient Image proposed by Shashua and Riklin-Raviv [4][5] is a simple yet practical algorithm for extracting a lighting-invariant representation. It was shown that the quotient image, i.e., the image ratio between a test image and a linear combination of three non-coplanarly illuminated images, depends only on the albedo information, which is illumination free.

Still another approach for dealing with the lighting problem comes from the 2D image processing viewpoint. Jobson et al. [13][14] presented a multi-scale version of the Retinex method [20] for high-quality visual display of high-dynamic-range images on low-dynamic-range devices, such as printers and computer screens. This is closely related to the lighting issue. More recently, Gross and Brajovic [15] presented an anisotropic version of Retinex for lighting normalization.
Both of the above groups of authors proposed algorithms that estimate the low-frequency component of the input image as the light field and compensate for illumination variations by subtracting it from the input image (in the logarithmic domain).

We would like to differentiate between extrinsic and intrinsic factors in the imaging process and analyze their influence on object recognition. The Lambertian model states that the image of a 3D object is determined by three factors, namely the surface normal, the albedo, and the lighting. These can be factorized into an intrinsic part, the surface normal and albedo, and an extrinsic part, the lighting. The identity of an object is determined by the intrinsic factor only, and this in fact largely motivated the existing work [1-11] on lighting normalization. Pure appearance-based learning methods, such as PCA, ICA, and LDA, learn appearance models from examples that mix intrinsic and extrinsic factors; our analysis explains why they cannot achieve high performance with features extracted from the learned models.

In this paper, we analyze the Retinex algorithm in terms of the Lambertian model and demonstrate its relationship with the Quotient Image. Based on this, we propose a novel concept called the Self-Quotient Image (SQI), defined as the ratio of the input image to its smoothed version. We analyze the properties of the SQI and show that it is a lighting-invariant representation of 3D objects. Compared with the Quotient Image, the SQI has the following advantages: (1) no need for alignment; (2) validity in both shadowed and non-shadowed regions; (3) validity for any type of lighting source.

The paper is organized as follows. Section 2 analyzes the related approaches, the Retinex method and the Quotient Image method. Section 3 establishes the concept of the SQI, presents our multi-scale anisotropic approach for computing it together with its relationship to the QI, and analyzes its properties in different imaging cases. The experimental results are analyzed in Section 4, and the conclusion is drawn in Section 5.

2. Related Approaches

Retinex algorithms are based on the reflectance-illumination model (1) instead of the Lambertian model (4):

I = R L    (1)

where I is the image, R is the reflectance of the scene, and L is the lighting.

The lighting L can be considered as the low-frequency component of the image I, as has been proved theoretically by the spherical harmonics analysis [6-10], and hence can be estimated with a low-pass filter:

L ≈ F * I    (2)

where F is a Gaussian filter and * denotes the convolution operation. The general form of the center/surround Retinex can then be defined as

R = I / L = I / (F * I)    (3)

The Retinex in [13][14] was designed for dynamic range compression, applied to displaying high-dynamic-range images on low-dynamic-range devices such as printers and screens. Images were processed to obtain the best visual effect, and there is no quantitative standard by which to measure the Retinex results.
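A minimal single-scale center/surround Retinex, following equations (2) and (3), can be written in a few lines of Python (NumPy + SciPy). The log-domain subtraction mirrors the formulation of Jobson et al.; the kernel width `sigma` and the small `eps` guard are illustrative assumptions, not values taken from the papers cited above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(image, sigma=30.0, eps=1e-6):
    """Center/surround Retinex: the Gaussian-blurred image estimates the
    lighting L (eq. 2), and R = I / L is computed in the log domain (eq. 3)."""
    illumination = gaussian_filter(image, sigma)   # L ≈ F * I
    return np.log(image + eps) - np.log(illumination + eps)
```

Multi-scale Retinex simply averages this quantity over several values of `sigma`.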
The Quotient Image method [4][5] is designed to deal with lighting changes in face recognition and provides an invariant representation of face images under different lighting conditions. The theory is based on the Lambertian model

I(x, y) = ρ(x, y) n(x, y)^T s    (4)

where ρ is the albedo (surface texture) of the face, n(x, y)^T is the surface normal (3D shape) of the object (assumed the same for all objects of the class), and s is the point light source, which may vary arbitrarily.

The quotient image Q_y of face y against face a is defined by

Q_y(u, v) = ρ_y(u, v) / ρ_a(u, v) = [ρ_y(u, v) n(u, v)^T s_y] / [ρ_a(u, v) n(u, v)^T s_y] = I_y(u, v) / Σ_{j=1..3} x_j I_j(u, v)    (5)

where u and v range over the image, I_y is an image of object y under illumination direction s_y, the x_j are combining coefficients estimated by least squares on a training set [4][5], and I_1, I_2, I_3 are three non-collinearly illuminated images.

The following assumptions are made in the quotient image framework: (a) the imaging process follows the Lambertian model without shadow, and the object is illuminated by a point light source; (b) all faces under consideration have the same shape, i.e., the same surface normal; (c) there is no shadow in the face images; (d) accurate alignment between faces is known; and (e) a training set of faces under at least three non-collinear lightings is available as a basis for estimating the lighting directions.

However, in a face recognition system the above assumptions cannot all be satisfied at the same time. For example, the light sources are generally not point sources; the 3D face shapes of different people are not in general the same; shadows can exist; and accurate alignment is still an unsolved problem.

In the following, we introduce what we call the self-quotient image, and demonstrate that it has lighting-invariant properties similar to those of the original quotient image, yet offers several technical advantages.

3. SQI and Its Lighting-Invariant Properties

3.1 Definition of SQI and Analysis of Its Invariant Properties

In the following, we define the self-quotient image as an intrinsic property of the face images of a person.

Definition 1 (Self-Quotient Image, SQI): The self-quotient image Q of an image I is defined by

Q = I / Î = I / (F * I)    (6)

where Î is the smoothed version of I, F is the smoothing kernel, and the division is point-wise, as in the original quotient image.

We call Q the self-quotient image because it is derived from a single image yet has the same quotient form as in the quotient image method. We will demonstrate in the remainder of this section that the SQI has a lighting-invariant property similar to that of the quotient image. There are, however, significant differences between the two: (1) the self-quotient image is calculated from one image only; (2) only image processing techniques are used, and no empirical learning is needed; and (3) no assumption is made about the face images.
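Definition 1 translates almost directly into code. Below is a minimal single-scale sketch in Python (NumPy + SciPy), with an ordinary isotropic Gaussian standing in for the smoothing kernel F; the anisotropic weighting of Section 3.2 is deliberately omitted, so halo effects near strong edges are to be expected, and `sigma` and `eps` are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def self_quotient_image(image, sigma=5.0, eps=1e-6):
    """Single-scale SQI: Q = I / (F * I), eq. (6), with point-wise division.

    Only the input image itself is needed, so numerator and denominator
    are perfectly aligned by construction; no registration step is required."""
    smoothed = gaussian_filter(image, sigma)   # the smoothed version of I
    return image / (smoothed + eps)            # eps avoids division by zero
```

In regions satisfying Case 1 of the analysis below, this ratio reduces approximately to ρ / (ρ * F) and is therefore independent of the lighting.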
The lighting-invariant properties are demonstrated below using the Lambertian model, but now with shadow. When shadows are present, the Lambertian model can be written as

I = ρ max(n^T s, 0)    (7)

We consider three cases of different shapes and shadow conditions in the analysis of the SQI.

Case 1: Regions without shadow and with small variation of n^T. In this case, n(u, v)^T s ≈ C_1, where C_1 is a constant. Then we have

Q = I(u, v) / Î(u, v) ≈ ρ(u, v) C_1 / ([ρ(u, v) C_1] * F) = ρ(u, v) / (ρ(u, v) * F)    (8)

In this case, Q is approximately illumination free and depends only on the albedo of the face. Equation (8) is similar in form to the quotient image; however, it is calculated from the image itself alone.

Case 2: Regions without shadow but with large variation of n^T. In this case, n(u, v)^T s is not constant. The SQI is

Q = I(u, v) / Î(u, v) = [ρ(u, v) n(u, v)^T s] / (F * [ρ(u, v) n(u, v)^T s])    (9)

In such regions, Q depends on the 3D shape n^T, the albedo, and the lighting. Therefore, Q is not illumination free in this case.

Case 3: Shadowed regions. In these regions, the gray value is low and varies little. We assume that in shadow regions light arrives uniformly from all directions, i.e., for any n(u, v)^T in shadow, the visible lights form a semi-hemisphere. Therefore, the summation of the dot products between n^T and the s_i is constant in such regions:

n(u, v)^T Σ_{i=1..∞} s_i(u, v) = Σ_{i=1..∞} n(u, v)^T s_i(u, v) = C_2    (10)

where C_2 is a constant. Therefore, Q in shadow regions can be written as in equation (8). As in Case 1, the SQI in this kind of region is also illumination free; in other words, the SQI can remove the shadow effect, as shown in Fig. 1.

Fig. 1. De-shadowing effect of the SQI.

Although the analysis is based on the Lambertian model with point lighting, it is also valid for other types of lighting sources, because any lighting can be expressed as a linear combination of L point lighting sources:

I = ρ n^T S = ρ n^T Σ_{i=1..L} s_i    (11)

If we replace the point lighting source s in Cases 1-3 with S as above, the analytic results still hold.

The above analysis shows two properties of the self-quotient image: (1) the algorithm is robust to lighting variation in Cases 1 and 3; (2) the SQI is not the expected reflectance as in Retinex, but rather the albedo ratio in Cases 1 and 3 and a lighting-dependent image ratio in Case 2.

For face recognition, if we can ensure that the filter's kernel size is small compared with the scale of variation of the face surface normal n^T, the self-quotient image will be illumination free, as analyzed previously. However, when the filter's kernel size is too small, the SQI approaches one and the albedo information is lost.

Figure 2 shows results of self-quotient image filtering. The self-quotient image of the boy's hair is not as dark as in the original image; the whole face looks flatter; and shadows are removed.

Figure 2. (left) Original image. (right) Self-quotient image.

The advantages of the self-quotient method over the original quotient image are summarized as follows: (1) the alignment between the image I and its smoothed version Î is automatically perfect, so no alignment procedure is needed; (2) no training images are needed for estimating the lighting direction, because the lighting fields of I and Î are similar; (3) the self-quotient image is good at removing shadows, whereas in previous approaches [1-11] the shadow problem was either ignored or solved by complex 3D rendering; (4) the lighting sources can be of any type.

Note that the property of Q depends on the kernel size. If the kernel size of F is too small, Q approaches one and the albedo information is severely reduced. If the kernel size of F is too large, halo effects appear near step-edge regions. We use a multi-scale technique to make the result more robust, and in practice we choose kernel sizes that favor the smoother regions.

3.2 Implementation of SQI

The only processing needed for the SQI is smoothing filtering. We design a weighted Gaussian filter for anisotropic smoothing, as illustrated in Fig. 3, where W is the weight, G is the Gaussian kernel, and N is the normalization factor, chosen such that

(1/N) Σ_Ω W G = 1    (12)

where Ω is the convolution kernel size. We divide the convolution region into two sub-regions, M_1 and M_2, with respect to a threshold τ.
Assuming that there are more pixels in M_1 than in M_2, τ is calculated by

τ = Mean(I_Ω)    (13)

For the two sub-regions, W takes the corresponding value:

W(i, j) = 1 if I(i, j) ∈ M_1;  W(i, j) = 0 if I(i, j) ∈ M_2    (14)

If the convolution region is smooth, i.e., has little gray-value variation (a non-edge region), there is little difference between smoothing the whole region and smoothing part of it. If there is large gray-value variation in the convolution region, i.e., an edge region, the threshold divides the convolution region into the two parts M_1 and M_2 along the edge, and the filter kernel convolves only with the larger part M_1, which contains more pixels. The halo effects can therefore be significantly reduced by the weighted Gaussian kernel.

Figure 3. Anisotropic smoothing filter.

The essence of this anisotropic filter is that it smooths only the main part of the convolution region, i.e., only one side of the edge in the case of a step-edge region.

The division operation in the SQI may magnify high-frequency noise, especially in regions with a low signal-to-noise ratio, such as shadows. To reduce noise in Q, we use a nonlinear transformation to map Q to D:

D = T(Q)    (15)

where T is a nonlinear transform. Jobson et al. [13][14] used the logarithm, which is considered similar in characteristic to human visual response. We find in our experiments that the arctangent and sigmoid nonlinear functions give similar or superior results in dynamic range compression for recognition purposes.

Our implementation of the SQI approach is summarized below (see the sketch after this list):

(1) Select several smoothing kernels G_1, G_2, …, G_n, calculate the corresponding weights W_1, W_2, …, W_n according to the image I, and then smooth I with each weighted anisotropic filter W_k G_k:

Î_k = (1/N) I * (W_k G_k),  k = 1, 2, …, n    (16)

and calculate the self-quotient image between the input image I and each smoothed version:

Q_k = I / Î_k,  k = 1, 2, …, n    (17)

(2) Transform each self-quotient image with the nonlinear function:

D_k = T(Q_k),  k = 1, 2, …, n    (18)

(3) Sum the nonlinearly transformed results:

Q = Σ_{k=1..n} m_k D_k    (19)

The m_1, m_2, …, m_n are the weights for each filter scale, and we set them all to one in our experiments.
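Putting Section 3.2 together, the following sketch (Python, NumPy + SciPy) implements the weighted Gaussian smoothing of equations (12)-(14) with a straightforward sliding-window loop, followed by the multi-scale combination of equations (16)-(19). It is written for clarity rather than speed, uses the sigmoid variant of the nonlinear transform T mentioned in the text, and the kernel sizes are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import generic_filter

def _gaussian_kernel(size, sigma):
    """2-D Gaussian kernel of odd side length `size`, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def weighted_gaussian_smooth(image, size, sigma):
    """Anisotropic smoothing, eqs. (12)-(14): within each window, pixels on
    the minority side of the window mean get weight 0, the majority side
    weight 1, and the weighted Gaussian kernel is renormalized to sum to 1."""
    g = _gaussian_kernel(size, sigma).ravel()

    def smooth_window(patch):
        tau = patch.mean()                      # threshold tau, eq. (13)
        above = patch >= tau
        # M1 = majority sub-region; W is 1 on M1 and 0 on M2, eq. (14)
        w = above if above.sum() >= (~above).sum() else ~above
        wg = w * g
        return np.sum(wg * patch) / wg.sum()    # 1/N normalization, eq. (12)

    return generic_filter(image, smooth_window, size=(size, size), mode="nearest")

def multiscale_sqi(image, scales=((7, 1.0), (11, 2.0), (15, 3.0)), eps=1e-6):
    """Multi-scale SQI, eqs. (16)-(19), with T taken to be a sigmoid."""
    out = np.zeros_like(image, dtype=float)
    for size, sigma in scales:                  # illustrative kernel sizes
        smoothed = weighted_gaussian_smooth(image, size, sigma)   # eq. (16)
        q = image / (smoothed + eps)            # eq. (17)
        d = 1.0 / (1.0 + np.exp(-q))            # eq. (18), sigmoid T
        out += d                                # eq. (19), with all m_k = 1
    return out
```

The per-window Python callback makes `generic_filter` slow on large images; a production version would vectorize the weighting, but the loop form keeps the correspondence to equations (12)-(14) explicit.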
4. Experiments and Discussion

Experiments are performed to evaluate the SQI for face recognition, using the Yale face database B [3] and the CMU PIE face database [19]. Frontal face images with lighting variation are selected from the two databases so that the image changes are due only to lighting. There are 68 subjects in CMU PIE, and for each subject we select the frontal face images taken under 20 different illuminations without background lighting. There are 640 images (10 subjects with 64 images each) from Yale B. The eyes, nose, and mouth are located manually for each image, and the face is then aligned and cropped. The PCA and original QI methods are included as baselines, in which the PCA (60-dimensional) is learned using all the examples from either the PIE or the Yale B data set.

Figure 4 shows some results of SQI-based lighting normalization. We can see that the convolution-based anisotropic filtering is very effective in smoothing the noisy image without blurring the step edges, and that shadows are removed.

For the PIE data set, the leave-one-out scheme is used, i.e., each image serves as the template in turn and the others as test images. The results are compared in Figure 5 for the 20 different leave-one-out partitions. For the Yale B data set, the images are divided into 4 subsets of increasing illumination angle, and only the frontally illuminated images are used as templates. The results are shown in Figure 6 for the 4 data sets. Compared with PCA and the original QI, the new algorithm, SQI, significantly improves the recognition rate on both the CMU PIE and Yale B face databases.

Figure 4. Example results of SQI lighting removal: (a) Yale B; (b) CMU PIE.
Figure 5. Recognition results on CMU PIE.
Figure 6. Recognition results on Yale B.

5. Conclusion

We have introduced a new algorithm, the self-quotient image (SQI), for robust face recognition under various lighting conditions. The illumination-invariant and illumination-variant properties of the self-quotient algorithm are analyzed according to the Lambertian model. Although the SQI has a form similar to that of the QI, it needs only one face image, and no alignment is needed in the SQI procedure. The algorithm has the particular ability to remove shadows. The experimental results show that the SQI method can significantly improve the recognition rate for face images under different lighting conditions.

References

[1] P. N. Belhumeur and D. J. Kriegman, "What is the set of images of an object under all possible lighting conditions?", IEEE Conf. on Computer Vision and Pattern Recognition, 1996.
[2] A. S. Georghiades and P. N. Belhumeur, "Illumination cone models for face recognition under variable lighting", CVPR, 1998.
[3] A. S. Georghiades and P. N. Belhumeur, "From few to many: Illumination cone models for face recognition under variable lighting and pose", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 6, pp. 643-660, 2001.
[4] A. Shashua and T. Riklin-Raviv, "The quotient image: Class-based re-rendering and recognition with varying illuminations", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 2, pp. 129-139, 2001.
[5] T. Riklin-Raviv and A. Shashua, "The quotient image: Class based recognition and synthesis under varying illumination", in Proceedings of the 1999 Conference on Computer Vision and Pattern Recognition, pp. 566-571, Fort Collins, CO, 1999.
[6] R. Ramamoorthi and P. Hanrahan, "On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object", J. Opt. Soc. Am., Vol. 18, No. 10, 2001.
[7] R. Ramamoorthi, "Analytic PCA construction for theoretical analysis of lighting variability in images of a Lambertian object", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 10, 2002.
[8] R. Ramamoorthi and P. Hanrahan, "An efficient representation for irradiance environment maps", SIGGRAPH 01, pp. 497-500, 2001.
[9] R. Basri and D. Jacobs, "Lambertian reflectance and linear subspaces", NEC Research Institute Technical Report 2000-172R.
[10] R. Basri and D. Jacobs, "Lambertian reflectance and linear subspaces", IEEE Transactions on Pattern Analysis and Machine Intelligence, forthcoming.
[11] T. Sim and T. Kanade, "Illuminating the face", CMU-RI-TR-01-31, Sept. 28, 2001.
[12] Y. Adini, Y. Moses, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 721-732, 1997.
[13] D. J. Jobson, Z. Rahman, and G. A. Woodell, "Properties and performance of a center/surround Retinex", IEEE Transactions on Image Processing, Vol. 6, No. 3, pp. 451-462, 1997.
[14] D. J. Jobson, Z. Rahman, and G. A. Woodell, "A multiscale Retinex for bridging the gap between color images and the human observation of scenes", IEEE Transactions on Image Processing, Vol. 6, No. 7, pp. 965-976, 1997.
[15] R. Gross and V. Brajovic, "An image preprocessing algorithm for illumination invariant face recognition", 4th International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 10-18, 2003.
[16] R. Epstein, A. L. Yuille, and P. N. Belhumeur, "Learning object representations from lighting variations", in Object Representation in Computer Vision II (J. Ponce, A. Zisserman, and M. Hebert, eds.), pp. 179-199, Springer-Verlag, 1996.
[17] P. Hallinan, "A low-dimensional representation of human faces for arbitrary lighting conditions", IEEE Conf. on Computer Vision and Pattern Recognition, pp. 995-999, 1994.
[18] R. Epstein, P. Hallinan, and A. L. Yuille, "5±2 eigenimages suffice: An empirical investigation of low-dimensional lighting models", IEEE Workshop on Physics-Based Vision, pp. 108-116, 1995.
[19] T. Sim, S. Baker, and M. Bsat, "The CMU Pose, Illumination, and Expression (PIE) database", Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, May 2002.
[20] E. Land, "An alternative technique for the computation of the designator in the Retinex theory of color vision", Proc. Nat. Acad. Sci., Vol. 83, 1986.
