Face Recognition: A Hybrid Neural Network Approach
Multimodal Face Recognition

Multimodal face recognition is a technique that combines multiple sensing modalities to improve the accuracy and robustness of face recognition.
Traditional face recognition relies mainly on a single sensing modality, such as still images or video.
However, single-modality face recognition often performs poorly under changes in illumination, pose, and expression.
By combining several sensing modalities, such as visible images, video, and infrared, multimodal face recognition can overcome the limitations of traditional methods and achieve better results.
Multimodal face recognition involves three key steps: feature extraction, feature fusion, and classifier design.
In the feature extraction stage, features are extracted from each sensing modality and converted to a unified dimensionality for subsequent processing.
Common feature extraction methods include Local Binary Patterns (LBP), Principal Component Analysis (PCA), and deep learning methods; a sketch of an LBP descriptor follows.
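As a concrete illustration of one of these extractors, the sketch below computes an LBP histogram descriptor with scikit-image. The radius, number of sampling points, and histogram size are illustrative assumptions, not values prescribed by the text above.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(gray_face, points=8, radius=1):
    """Histogram of uniform LBP codes for a grayscale face image."""
    codes = local_binary_pattern(gray_face, points, radius, method="uniform")
    # "uniform" LBP yields points + 2 distinct code values
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2))
    return hist / hist.sum()  # normalize so faces of any size are comparable

# Usage with a random stand-in for a face crop:
face = np.random.randint(0, 256, (112, 92)).astype(np.uint8)
print(lbp_descriptor(face))
```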
In the feature fusion stage, the features obtained from the different modalities are combined and integrated to produce a joint feature that is more representative and more discriminative.
Common fusion strategies fall into feature-level fusion and decision-level fusion.
Feature-level fusion concatenates, links, or weight-sums the features from the different modalities into a single joint feature vector.
Decision-level fusion combines the classification decisions from the individual modalities, for example by weighting or voting, to produce the final result; a sketch of both strategies follows.
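To make the two strategies concrete, here is a minimal NumPy sketch; the modality names, feature sizes, and fusion weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-modality features and classifier scores for one face.
rgb_feat, ir_feat = np.random.rand(128), np.random.rand(64)
rgb_scores, ir_scores = np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.4, 0.1])

# Feature-level fusion: concatenate per-modality vectors into one joint
# feature, which a single downstream classifier then consumes.
joint_feature = np.concatenate([rgb_feat, ir_feat])

# Decision-level fusion: weight each modality's class scores and pick the
# class with the highest combined score (the weights are assumptions).
weights = {"rgb": 0.6, "ir": 0.4}
combined = weights["rgb"] * rgb_scores + weights["ir"] * ir_scores
predicted_class = int(np.argmax(combined))
print(joint_feature.shape, predicted_class)
```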
In the classifier design stage, a classifier is designed on top of the joint feature produced by extraction and fusion to perform the face recognition task.
Commonly used classifiers include Support Vector Machines (SVM), Nearest Neighbor (NN) classifiers, and Deep Neural Networks (DNN).
Multimodal face recognition has broad prospects in practical applications.
First, in the security domain, it improves recognition accuracy and robustness and reduces false positive and false negative rates, thereby improving safety.
Second, in finance, it can be used for identity verification and transaction security, improving both user experience and transaction safety.
In addition, in healthcare, it can be used for patient identity verification and to assist diagnosis, improving the quality and efficiency of medical services.
Experimental Methods for Facial Feature Swapping

Introduction. Facial feature swapping is a research area in which computer techniques are used to exchange features between face images.
Without altering the subject's identity and overall appearance, the method transfers one person's facial features onto another person's face image, thereby achieving the exchange of facial features; it has considerable application value.
This article introduces the methods used in facial feature swapping experiments and their applications.
Feature extraction and landmarking. Before a feature swapping experiment, the face images must first undergo feature extraction and landmark annotation.
Feature extraction refers to extracting face-related information from the image, such as the facial contour and the positions of the eyes and mouth.
Common approaches include deep learning methods and traditional computer vision methods.
Deep learning approaches typically use a convolutional neural network (CNN) for feature extraction.
By training a CNN model, high-level feature representations can be learned from face images.
Commonly used CNN models include VGG and ResNet.
In a feature swapping experiment, a pretrained CNN model can be used directly for feature extraction.
Traditional computer vision methods mainly rely on face analysis algorithms for feature extraction.
Typical techniques include landmark annotation, contour extraction, and texture extraction.
These algorithms extract facial features by detecting keypoints, appearance, and shape information.
Alignment and warping. A feature swapping experiment requires aligning and warping the two face images.
Alignment maps the facial features in the two images to corresponding positions so that the correspondence between them is accurate.
Common alignment methods include:
1. Landmark-based alignment: extract keypoints from each face image (e.g. eyes, nose, mouth), put the keypoints of the two images into correspondence, and compute the transformation between them (rotation, translation, scaling, etc.) to align the facial features; a code sketch follows this list.
2. Texture-based alignment: extract texture features from each image, measure the similarity between textures, find the most similar regions of the two images, and align them.
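As referenced in method 1 above, here is a minimal sketch of landmark-based alignment with OpenCV. The five landmark coordinates and the file name are made-up placeholders; `estimateAffinePartial2D` fits a similarity transform (rotation, translation, uniform scale) rather than a full affine map.

```python
import cv2
import numpy as np

# Hypothetical landmark coordinates (eyes, nose tip, mouth corners) for two
# faces; in practice they would come from a landmark detector.
src_pts = np.array([[38, 52], [72, 50], [55, 72], [42, 92], [70, 90]], dtype=np.float32)
dst_pts = np.array([[36, 54], [74, 52], [54, 75], [40, 94], [72, 92]], dtype=np.float32)

# Fit a similarity transform between the two point sets; RANSAC discards
# outlier correspondences.
M, inliers = cv2.estimateAffinePartial2D(src_pts, dst_pts, method=cv2.RANSAC)

# Warp the source face into the destination face's coordinate frame.
src_img = cv2.imread("face_a.jpg")  # placeholder file name
aligned = cv2.warpAffine(src_img, M, (src_img.shape[1], src_img.shape[0]))
```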
After alignment, the facial features still need to be warped.
Warping consists mainly of shape warping and texture warping.
Shape warping deforms one person's facial features toward the other's so that the two face shapes are as similar as possible.
Texture warping transforms one person's facial texture toward the other's so that the two facial textures are as similar as possible.
Face Recognition Literature

Face recognition technology is widely used in today's society, with applications spanning security surveillance, face-based payment, face unlock, and more.
To trace the development of the field, several relevant references are presented below.
1. "Face Recognition: A Literature Survey", by Rabia Jafri, Shehzad Tanveer, and Mubashir Ahmad. This survey reviews research across the face recognition field, covering face detection, feature extraction, feature matching, and performance evaluation of recognition systems.
It provides a comparative assessment of different approaches, from traditional statistical and linear discriminant analysis methods to recent deep learning methods.
2. "Deep Face Recognition: A Survey", by Mei Wang and Weihong Deng. This survey focuses on the application of deep learning to face recognition.
It describes convolutional neural networks (CNNs) in detail, along with their use in facial feature learning and face recognition.
It also reviews representative deep learning face recognition methods such as DeepFace, VGG-Face, and FaceNet.
3. "A Survey on Face Recognition: Advances and Challenges", by Anil K. Jain, Arun Ross, and Prabhakar. This survey reviews the progress and challenges of face recognition technology.
It first introduces the basic concepts and pipeline of face recognition, then surveys traditional methods and machine-learning-based methods.
It also covers related tasks such as facial expression recognition, age estimation, and gender recognition.
4. "Face Recognition Across Age Progression: A Comprehensive Survey", by Weihong Deng, Jiani Hu, and Jun Guo. This survey focuses on the problem of face recognition across age progression.
English Essay: The Application of Facial Recognition in China

Facial recognition technology has been rapidly advancing in recent years, and China has emerged as a global leader in the development and implementation of this innovative technology. China's vast population, coupled with its ambitious plans to build a comprehensive surveillance system, has made facial recognition a crucial component of the country's technological landscape. This essay will explore the various applications of facial recognition in China, its benefits, and the ethical concerns surrounding its use.

One of the primary applications of facial recognition in China is its integration into the country's extensive surveillance network. China has been investing heavily in building a nationwide network of surveillance cameras, with estimates suggesting that the country has over 200 million surveillance cameras installed, making it the world's largest video surveillance system. Facial recognition technology is used to identify and track individuals as they move through public spaces, providing the government with a powerful tool for monitoring and controlling its citizens.

The Chinese government has justified the use of facial recognition by claiming that it enhances public safety and security. The technology has been employed to identify and apprehend criminals, as well as to monitor the movements of individuals deemed to be potential threats to social stability. For example, the government has used facial recognition to track and monitor the Uyghur minority population in the Xinjiang region, a practice that has been widely criticized by human rights organizations as a violation of individual privacy and a form of ethnic discrimination.

In addition to its use in surveillance, facial recognition technology has also been integrated into various other aspects of daily life in China. The technology is widely used in mobile payment systems, allowing users to authenticate their identity and make payments using their facial features. This has led to a significant increase in the adoption of mobile payment platforms, such as Alipay and WeChat Pay, which have become ubiquitous in the country.

Furthermore, facial recognition has been implemented in various public services, such as accessing public transportation, entering office buildings, and even checking into hotels. This has led to increased efficiency and convenience for users, but it has also raised concerns about the potential for abuse and the erosion of personal privacy.

One of the most controversial applications of facial recognition in China is its use in the country's social credit system. The social credit system is a government-run initiative that aims to monitor and assess the behavior of Chinese citizens, with the goal of incentivizing "good" behavior and punishing "bad" behavior. Facial recognition is used to identify individuals and track their activities, which can then be used to assign them a social credit score. This score can have significant consequences, affecting an individual's access to various public services and opportunities.

The use of facial recognition in China's social credit system has been widely criticized by human rights organizations and international observers. They argue that the system represents a significant threat to individual privacy and civil liberties, as it gives the government unprecedented power to monitor and control its citizens.

Despite these concerns, the Chinese government has continued to invest heavily in the development and deployment of facial recognition technology. The country has become a global leader in this field, with Chinese companies such as Hikvision, Dahua, and SenseTime emerging as major players in the global facial recognition market.

The rapid advancement of facial recognition technology in China has also raised concerns about the potential for abuse and the erosion of individual privacy. There are fears that the technology could be used to suppress dissent, target minority groups, and create a highly invasive surveillance state. Moreover, the lack of robust privacy protections and oversight mechanisms in China has exacerbated these concerns.

In response to these concerns, the Chinese government has attempted to address some of the ethical issues surrounding the use of facial recognition. For example, the government has introduced regulations that require companies to obtain user consent before collecting and using facial recognition data. Additionally, the government has established guidelines for the ethical use of facial recognition technology, which include measures to protect individual privacy and prevent discrimination.

However, critics argue that these measures are largely inadequate and that the Chinese government's commitment to protecting individual privacy is questionable. They point to the government's continued use of facial recognition for surveillance and social control purposes as evidence of its prioritization of national security over individual rights.

In conclusion, the application of facial recognition technology in China is a complex and multifaceted issue. While the technology has brought about increased efficiency and convenience in various aspects of daily life, it has also raised significant ethical concerns about the potential for abuse and the erosion of individual privacy. As China continues to push the boundaries of this technology, it will be crucial for the government to strike a delicate balance between national security and individual rights, and to implement robust safeguards and oversight mechanisms to ensure the ethical and responsible use of facial recognition technology.
HSANet: A Hybrid Self-Attention Network Method for Recognizing Faces after Micro Plastic Surgery

Computer Science and Application, 2023, 13(3), 301-310. Published online March 2023 in Hans (DOI: 10.12677/csa.2023.133029).
HSANet: A Hybrid Self-Attention Network Method for Recognizing Faces after Micro Plastic Surgery
Pazilaiti Nuermaiti, Gulinazi Ailimujiang* (*corresponding author)
School of Network Security and Information Technology, Yili Normal University, Yining, Xinjiang
Received: February 5, 2023; accepted: March 3, 2023; published: March 14, 2023.

Abstract: Micro plastic surgery poses a new challenge to face recognition in everyday use: because facial features change considerably, the rate at which the original face is correctly recognized is low. Addressing this, the study proposes a hybrid self-attention block structure for recognizing faces whose features have changed, and builds a small-sample dataset of 26 classes of micro-plastic-surgery face images for the experiments.
Fusing self-attention into the bottleneck blocks of a residual network improves the hybrid self-attention block's ability to capture features in every region of the image. Experiments on the small-sample micro-plastic-surgery dataset show that the proposed hybrid self-attention network achieves a high correct recognition rate of 89.70%, an improvement of 2.65% over ResNet50; the hybrid model with improved connections outperforms the one without by 1.12%, and overall network performance also improves.
Keywords: Convolutional Neural Network, Residual Network, Bottleneck Block, Self-Attention, Hybrid Self-Attention Network
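The paper's exact block design is not reproduced here; as a rough illustration of the idea named in the abstract (self-attention fused into a residual bottleneck block), a PyTorch sketch under assumed layer sizes:

```python
import torch
import torch.nn as nn

class SelfAttentionBottleneck(nn.Module):
    """ResNet-style bottleneck with multi-head self-attention applied to the
    reduced feature map. Illustrative only: channel counts and head count
    are assumptions, not the HSANet authors' configuration."""
    def __init__(self, channels=256, reduced=64, heads=4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(reduced)
        self.attn = nn.MultiheadAttention(reduced, heads, batch_first=True)
        self.expand = nn.Conv2d(reduced, channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        y = self.relu(self.bn1(self.reduce(x)))
        b, c, h, w = y.shape
        seq = y.flatten(2).transpose(1, 2)      # (B, H*W, C): pixels as tokens
        attended, _ = self.attn(seq, seq, seq)  # self-attention over positions
        y = attended.transpose(1, 2).reshape(b, c, h, w)
        y = self.bn2(self.expand(y))
        return self.relu(y + identity)          # residual (skip) connection

block = SelfAttentionBottleneck()
print(block(torch.randn(2, 256, 14, 14)).shape)  # torch.Size([2, 256, 14, 14])
```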
Face Recognition

I. Definitions
1. Face recognition refers specifically to the computer technique of identifying people by analyzing and comparing visual facial feature information.
Broadly, face recognition covers the whole family of technologies needed to build a recognition system, including face image acquisition, face localization, preprocessing, identity verification, and identity search; narrowly, it refers specifically to the technique or system that verifies or looks up identity through the face.
Face recognition is a popular research area in computing. It belongs to biometrics, which distinguishes individual organisms (usually people) by the organism's own biological characteristics.
2. LFW. Labeled Faces in the Wild is a well-known face image collection in face recognition research. Its images were collected from Yahoo! News: 13,233 images of 5,749 people, of whom 1,680 have two or more images and 4,069 have only one. Most images were obtained with a Viola-Jones face detector and cropped to a fixed size; a small number were added manually from false positives.
All images come from real-world scenes (as opposed to laboratory settings), with natural lighting, expressions, poses, and occlusion, and most of the subjects are public figures, which introduces further complicating factors such as makeup and stage lighting.
Face recognition algorithms validated on this dataset are therefore, in principle, closer to real applications, which also presents a substantial challenge to researchers.
3. FDDB. FDDB (Face Detection Data Set and Benchmark) is a public database maintained by the computer science department of the University of Massachusetts, offering researchers worldwide a standard face detection evaluation platform; it covers faces in a wide range of poses in natural environments. As one of the most authoritative face detection benchmarks in the world, FDDB uses 2,845 images containing 5,171 faces from the Faces in the Wild database as its test set, and its published evaluations represent the state of the art in face detection.
4. 300-W: a facial landmark localization benchmark.
5. FRVT: the Face Recognition Vendor Test, administered by the US National Institute of Standards and Technology.
It is a face recognition test that comes closer to real-world application conditions.
Face Recognition Educational Courseware

Image enhancement improves the quality of the face image so that it is visually clearer and easier to recognize.
➢ Normalization
The goal of normalization is to obtain standardized face images with consistent size and an identical grayscale value range.
2 Grayscale Conversion
Converting a color image to a grayscale image is called grayscale processing. In a color image, the color of each pixel is determined by its R, G, and B components, each of which can take values 0-255, so the color range of a pixel is very large. A grayscale image is a special color image whose R, G, and B components are equal, which greatly reduces the subsequent computation.
02
Face Image Preprocessing
Preprocessing is a key step in the face recognition pipeline. Because acquisition environments differ, the input image may be affected by lighting or occlusion, so the captured sample can be flawed.
2 Image Preprocessing
➢ Grayscale conversion
Convert the color image to a grayscale image using one of three methods: the maximum method, the average method, or the weighted average method (a sketch follows).
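A minimal sketch of the three conversion methods named above; the 0.299/0.587/0.114 weights used for the weighted average are the common ITU-R BT.601 luma coefficients, an assumed convention rather than one stated by the slides.

```python
import numpy as np

def to_gray(img, method="weighted"):
    """Convert an H x W x 3 RGB array (uint8) to grayscale."""
    rgb = img.astype(np.float32)
    if method == "max":                      # maximum of R, G, B per pixel
        gray = rgb.max(axis=2)
    elif method == "average":                # arithmetic mean of the channels
        gray = rgb.mean(axis=2)
    else:                                    # weighted average (luma)
        gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return gray.astype(np.uint8)

# Usage with a random stand-in for a color face image:
color = np.random.randint(0, 256, (112, 92, 3)).astype(np.uint8)
print(to_gray(color, "average").shape)
```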
➢ Geometric transformation
Process the acquired image with geometric transformations such as translation, transposition, mirroring, rotation, and scaling, to correct systematic errors of the image acquisition system.
Face Recognition
Artificial Intelligence && Face Recognition
Definition
Face recognition is built on computer image processing and biometric recognition technology: it extracts facial feature information from images or video and compares it against known faces to identify each person. It integrates professional techniques from artificial intelligence, machine learning, model theory, and video/image processing.
01 Face Recognition: Applications
1 Application Scenarios
ID card verification and record keeping
At present, ID checks mostly rely on scanning or photocopying the ID card and manually comparing the photo. A scan or photocopy serves only as a record and cannot effectively verify that the card is genuine. To ensure that business is handled with a real ID card, a technical means of verifying the presented card is required.
Dormitory access by face; e-commerce payment by face
Introduction to Multimodal Biometric Technology in Face Recognition

1. Introduction. As an important biometric technology, face recognition is widely applied in security, finance, healthcare, and other fields.
Face Recognition: A Hybrid Neural Network Approach
Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, Andrew D. Back
NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
Electrical and Computer Engineering, University of Queensland, St. Lucia, Australia
Technical Report UMIACS-TR-96-16 and CS-TR-3608
Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742
April 1996 (Revised August 1996)

Abstract
Faces represent complex, multidimensional, meaningful visual stimuli, and developing a computational model for face recognition is difficult (Turk and Pentland, 1991). We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loève transform in place of the self-organizing map, and a multilayer perceptron in place of the convolutional network. The Karhunen-Loève transform performs almost as well (5.3% error versus 3.8%). The multilayer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach (Turk and Pentland, 1991) on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output, and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze computational complexity and discuss how new classes could be added to the trained recognizer.
Keywords: Convolutional Networks, Hybrid Systems, Face Recognition, Self-Organizing Map

1 Introduction
The requirement for reliable personal identification in computerized access control has resulted in an increased interest in biometrics. Biometrics being investigated include fingerprints (Blue, Candela, Grother, Chellappa and Wilson, 1994), speech (Burton, 1987), signature dynamics (Qi and Hunt, 1994), and face recognition (Chellappa, Wilson and Sirohey, 1995). Sales of identity verification products exceed $100 million (Miller, 1994). Face recognition has the benefit of being a passive, non-intrusive system for verifying personal identity. The techniques used in the best face recognition systems may depend on the application of the system. There are at least two broad categories of face recognition systems:
1. The goal is to find a person within a large database of faces (e.g. in a police database). These systems typically return a list of the most likely people in the database (Pentland, Starner, Etcoff, Masoiu, Oliyide and Turk, 1993). Often only one image is available per person. It is usually not necessary for recognition to be done in real-time.
2. The goal is to identify particular people in
real-time (e.g. in a security monitoring system, location tracking system, etc.), or to allow access to a group of people and deny access to all others (e.g. access to a building, computer, etc.) (Chellappa et al., 1995). Multiple images per person are often available for training, and real-time recognition is required.
This paper is primarily concerned with the second case. This work considers recognition with varying facial detail, expression, pose, etc. Invariance to high degrees of rotation or scaling is not considered; it is assumed that a minimal preprocessing stage is available if required (i.e. to locate the position and scale of a face in a larger image). We are interested in rapid classification and hence we do not assume that time is available for extensive preprocessing and normalization. Good algorithms for locating faces in images can be found in (Turk and Pentland, 1991; Sung and Poggio, 1995; Rowley, Baluja and Kanade, 1995).
The remainder of this paper is organized as follows. The data used is presented in section 2 and related work with this and other databases is discussed in section 3. The components and details of our system are described in sections 4 and 5 respectively. Results are presented and discussed in sections 6 and 7. Computational complexity is considered in section 8 and conclusions are drawn in section 10.

2 Data
The database used is the ORL database, which contains photographs of faces taken between April 1992 and April 1994 at the Olivetti Research Laboratory in Cambridge, UK. There are 10 different images of 40 distinct subjects. For some of the subjects, the images were taken at different times. There are variations in facial expression (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All of the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some tilting and rotation of up to about 20 degrees. There is some variation in scale of up to about 10%. Thumbnails of all of the images are shown in figure 1 and a larger set of images for one subject is shown in figure 2. The images are greyscale with a resolution of 92 by 112 pixels.
Figure 1. The ORL face database. There are 10 images each of the 40 subjects.

3 Related Work
This section summarizes related work on face recognition: geometrical feature based approaches, template matching, neural network approaches, and the popular eigenfaces technique.
Figure 2. The set of 10 images for one subject. Considerable variation can be seen.

3.1 Geometrical Features
Many people have explored geometrical feature based methods for face recognition. Kanade (1973) presented an automatic feature extraction method based on ratios of distances (between feature points such as the location of the eyes, nose, etc.) and reported a recognition rate of between 45-75% with a database of 20 people. Brunelli and Poggio (1993) compute a set of geometrical features such as nose width and length, mouth position, and chin shape. They report a 90% recognition rate on a database of 47 people. However, they show that a simple template matching scheme provides 100% recognition for the same database. Cox, Ghosn and Yianilos (1995) have recently introduced a mixture-distance technique which achieves a recognition rate of 95% using 95 test images and 685 training images (one image per person in each case). Each face is represented by 30 manually extracted distances.
Systems which employ precisely measured distances between features may be most useful for finding possible matches in a large mugshot database (a mugshot database typically contains side views where the performance of feature
point methods is known to improve (Chellappa et al., 1995)). For other applications, automatic identification of these points would be required, and the resulting system would be dependent on the accuracy of the feature location algorithm. Current algorithms for automatic location of feature points do not consistently provide a high degree of accuracy (Sutherland, Renshaw and Denyer, 1992).

3.2 Eigenfaces
High-level recognition tasks are typically modeled with many stages of processing, as in the Marr paradigm of progressing from images to surfaces to three-dimensional models to matched models (Marr, 1982). However, Turk and Pentland (1991) argue that it is likely that there is also a recognition process based on low-level, two-dimensional image processing. Their argument is based on the early development and extreme rapidity of face recognition in humans, and on physiological experiments in monkey cortex which claim to have isolated neurons that respond selectively to faces (Perret, Rolls and Caan, 1982). However, these experiments do not exclude the possibility of the sole operation of the Marr paradigm.
Turk and Pentland (1991) present a face recognition scheme in which face images are projected onto the principal components of the original set of training images. The resulting eigenfaces are classified by comparison with known individuals. Turk and Pentland (1991) present results on a database of 16 subjects with various head orientation, scaling, and lighting. Their images appear identical otherwise, with little variation in facial expression, facial details, pose, etc. For lighting, orientation, and scale variation their system achieves 96%, 85% and 64% correct classification respectively. Scale is renormalized to the eigenface size based on an estimate of the head size. The middle of the faces is accentuated, reducing any negative effect of changing hairstyle and backgrounds.
In Pentland et al. (1993; 1994) good results are reported on a large database (95% recognition of 200 people from a database of 3,000). It is difficult to draw broad conclusions, as many of the images of the same people look very similar (in the sense that there is little difference in expression, hairstyle, etc.), and the database has accurate registration and alignment (Moghaddam and Pentland, 1994). In Moghaddam and Pentland (1994), very good results are reported with the US Army FERET database: only one mistake was made in classifying 150 frontal view images. The system used extensive preprocessing for head location, feature detection, and normalization for the geometry of the face, translation, lighting, contrast, rotation, and scale.
In summary, it appears that eigenfaces is a fast, simple, and practical algorithm. However, it may be limited because optimal performance requires a high degree of correlation between the pixel intensities of the training and test images. This limitation has been addressed by using extensive preprocessing to normalize the images.

3.3 Template Matching
Template matching methods such as (Brunelli and Poggio, 1993) operate by performing direct correlation of image segments (e.g. by computing the Euclidean distance). Template matching is only effective when the query images have the same scale, orientation, and illumination as the training images (Cox et al., 1995).

3.4 Neural Network Approaches
Much of the present literature on face recognition with neural networks presents results with only a small number of classes (often below 20). For example, in (DeMers and Cottrell, 1993) the first 50 principal components of the images are extracted and reduced to 5 dimensions using an autoassociative neural network. The resulting representation is classified using a standard multilayer perceptron. Good results are reported, but the database is quite simple: the pictures are manually aligned and there is no lighting variation, rotation, or tilting. There are 20 people in the database.

3.5 The ORL Database and Application of HMM and Eigenfaces Methods
In (Samaria and Harter, 1994) a HMM-based approach is used for classification of the ORL database images.
HMMs are typically used for the stochastic modeling of non-stationary vector time series. In this case, they are applied to images: a sampling window is passed over the image to generate a vector at each step. The best model resulted in a 13% error rate. Samaria also performed extensive tests using the popular eigenfaces algorithm (Turk and Pentland, 1991) on the ORL database and reported a best error rate of around 10% when the number of eigenfaces was between 175 and 199. Around 10% error was also observed in this work when implementing the eigenfaces algorithm. In (Samaria, 1994) Samaria extends the top-down HMM of (Samaria and Harter, 1994) with pseudo two-dimensional HMMs. The pseudo-2D HMMs are obtained by linking one-dimensional HMMs to form vertical superstates. The network is not fully connected in two dimensions (hence "pseudo"). The error rate reduces to 5% at the expense of high computational complexity: a single classification takes four minutes on a Sun Sparc II. Samaria notes that, although an increased recognition rate was achieved, the segmentation obtained with the pseudo two-dimensional HMMs appeared quite erratic. Samaria uses the same training and test set sizes as used later in this paper (200 training images and 200 test images with no overlap between the two sets). The 5% error rate is the best error rate previously reported for the ORL database that we are aware of.

4 System Components

4.1 Overview
The following sections introduce the techniques which form the components of the proposed system and describe the motivation for using them. Briefly, the investigations consider local image sampling and a technique for partial lighting invariance, a self-organizing map (SOM) for projection of the local image sample representation into a quantized lower dimensional space, the Karhunen-Loève (KL) transform for comparison with the self-organizing map, a convolutional network (CN) for partial translation and deformation invariance, and a multilayer perceptron (MLP) for comparison with the convolutional network.

4.2 Local Image Sampling
Two different methods of representing local image samples have been evaluated. In each method a window is scanned over the image as shown in figure 3.
1. The first method simply creates a vector from a local window on the image using the intensity values at each point in the window. Let $x_{ij}$ be the intensity at the $i$th column and the $j$th row of the given image. If the local window is a square of sides $2W+1$ long, centered on $(i,j)$, then the vector associated with this window is simply $[x_{i-W,j-W},\ \dots,\ x_{ij},\ \dots,\ x_{i+W,j+W}]$.
2. The second method creates a representation of the local sample by forming a vector out of a) the intensity of the center pixel, and b) the difference in intensity between the center pixel and all other pixels within the square window. The vector is given by $[w\,x_{ij},\ x_{ij}-x_{i-W,j-W},\ \dots,\ x_{ij}-x_{i+W,j+W}]$. The resulting representation becomes partially invariant to variations in intensity of the complete sample. The degree of invariance can be modified by adjusting the weight $w$ connected to the central intensity component.
Figure 3. A depiction of the local image sampling process. A window is stepped over the image and a vector is created at each location.
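A sketch of the second (intensity-difference) representation, under assumed window size, step, and weight; with w = 2 it yields 25-dimensional vectors, matching the dimensionality referred to in section 5.

```python
import numpy as np

def local_samples(img, w=2, step=4, weight=1.0):
    """Extract local window samples from a 2-D grayscale array: the
    (weighted) center intensity plus the differences between the center
    and every other pixel in the (2w+1) x (2w+1) window."""
    vectors = []
    for i in range(w, img.shape[0] - w, step):
        for j in range(w, img.shape[1] - w, step):
            win = img[i - w:i + w + 1, j - w:j + w + 1].astype(np.float32)
            center = win[w, w]
            diffs = (center - win).flatten()
            diffs = np.delete(diffs, (2 * w + 1) * w + w)  # drop center's zero diff
            vectors.append(np.concatenate([[weight * center], diffs]))
    return np.array(vectors)

samples = local_samples(np.random.randint(0, 256, (112, 92)))
print(samples.shape)  # (n_windows, 25) for w = 2
```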
4.3 The Self-Organizing Map

4.3.1 Overview
Maps are an important part of both natural and artificial neural information processing systems (Bauer and Pawelzik, 1992). Examples of maps in the nervous system are retinotopic maps in the visual cortex (Obermayer, Blasdel and Schulten, 1991), tonotopic maps in the auditory cortex (Kita and Nishikawa, 1993), and maps from the skin onto the somatosensoric cortex (Obermayer, Ritter and Schulten, 1990). The self-organizing map, or SOM, introduced by Teuvo Kohonen (1990; 1995), is an unsupervised learning process which learns the distribution of a set of patterns without any class information. A pattern is projected from an input space to a position in the map; information is coded as the location of an activated node. The SOM is unlike most classification or clustering techniques in that it provides a topological ordering of the classes. Similarity in input patterns is preserved in the output of the process. The topological preservation of the SOM process makes it especially useful in the classification of data which includes a large number of classes. In the local image sample classification, for example, there may be a very large number of classes in which the transition from one class to the next is practically continuous (making it difficult to define hard class boundaries).

4.3.2 Algorithm
We give a brief description of the SOM algorithm; for more details see (Kohonen, 1995). The SOM defines a mapping from an input space onto a topologically ordered set of nodes, usually in a lower dimensional space. An example of a two-dimensional SOM is shown in figure 4. A reference vector in the input space, $w_k$, is assigned to each node $k$ in the SOM. During training, each input $x$ is compared to all of the $w_k$, obtaining the location of the closest match, node $c$. The input point is mapped to this location in the SOM. Nodes in the SOM are updated according to:

$w_k(t+1) = w_k(t) + h_{ck}(t)\,[x(t) - w_k(t)]$  (1)

where $t$ is the time during learning and $h_{ck}(t)$ is the neighborhood function, a smoothing kernel which is maximum at node $c$. Usually $h_{ck}(t) = h(\lVert r_c - r_k \rVert, t)$, where $r_c$ and $r_k$ represent the locations of the nodes in the SOM output space. $r_c$ is the node with the closest weight vector to the input sample and $r_k$ ranges over all nodes. $h_{ck}(t)$ approaches 0 as $t$ increases and also as $\lVert r_c - r_k \rVert$ increases. A widely applied neighborhood function is the Gaussian kernel

$h_{ck}(t) = \alpha(t)\,\exp\!\left(-\frac{\lVert r_c - r_k \rVert^2}{2\sigma^2(t)}\right)$

where $\alpha(t)$ is a scalar valued learning rate and $\sigma(t)$ defines the width of the kernel. They are generally both monotonically decreasing with time. The use of the neighborhood function means that nodes which are topographically close in the SOM structure activate each other to learn something from the same input. A relaxation or smoothing effect results, which leads to a global ordering of the map. Note that $\sigma(t)$ should not be reduced too far, as the map will lose its topographical order if neighboring nodes are not updated along with the closest node. The SOM can be considered a non-linear projection of the probability density $p(x)$ (Kohonen, 1995).
Figure 4. A two-dimensional SOM showing a square neighborhood function which starts large and reduces in size over time.

4.3.3 Improving the Basic SOM
The original self-organizing map is computationally expensive because:
1. In the early stages of learning, many nodes are adjusted in a correlated manner. Luttrell (1989) proposed a method, which is used here, where learning starts in a small network, and the network is doubled in size periodically during training. When doubling, new nodes are inserted between the current nodes. The weights of the new nodes are set equal to the average of the weights of the immediately neighboring nodes.
2. Each learning pass requires computation of the distance of the current sample to all nodes in the network. However, this may be reduced using a hierarchy of networks which is created using the above node doubling strategy. This has not been used for the results reported here.
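For concreteness, a minimal NumPy SOM following equation (1) and the Gaussian neighborhood above; the linear decay schedules here are simple placeholders, not the two-phase schedule the paper describes in section 5.1, and the network-doubling speed-up is omitted.

```python
import numpy as np

def train_som(data, shape=(5, 5, 5), updates=10000, alpha0=0.1, seed=0):
    """Minimal SOM training sketch: random sampling of inputs, best-matching
    node search, and the update rule of equation (1)."""
    rng = np.random.default_rng(seed)
    grid = np.indices(shape).reshape(len(shape), -1).T     # node coordinates r_k
    weights = rng.random((grid.shape[0], data.shape[1]))   # reference vectors w_k
    for t in range(updates):
        x = data[rng.integers(len(data))]
        c = np.argmin(((weights - x) ** 2).sum(axis=1))    # best-matching node
        frac = t / updates
        alpha = alpha0 * (1 - frac)                        # learning rate decay
        sigma = max(shape) * (1 - frac) + 1e-3             # kernel width decay
        d2 = ((grid - grid[c]) ** 2).sum(axis=1)
        h = alpha * np.exp(-d2 / (2 * sigma ** 2))         # neighborhood kernel
        weights += h[:, None] * (x - weights)              # equation (1)
    return grid, weights

grid, weights = train_som(np.random.rand(1000, 25))
print(weights.shape)  # (125, 25): a 5 x 5 x 5 map over 25-dim samples
```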
4.4 Karhunen-Loève Transform
The optimal linear method (in the least mean squared error sense) for reducing redundancy in a dataset is the Karhunen-Loève (KL) transform, or eigenvector expansion via Principal Components Analysis (PCA) (Fukunaga, 1990). The basic idea behind the KL transform is to transform possibly correlated variables in a data set into uncorrelated variables. The transformed variables will be ordered so that the first one describes most of the variation of the original data set. The second will try to describe the remaining part of the variation under the constraint that it should be uncorrelated with the first variable. This continues until all the variation is described by the new transformed variables, which are called principal components. PCA appears to be involved in some biological processes, e.g. edge segments are principal components, and edge segments are among the first features extracted in the primary visual cortex (Hubel and Wiesel, 1962). Mathematically, the KL transform can be written as (Dony and Haykin, 1995):

$\mathbf{y} = \mathbf{W}\mathbf{x}$  (3)

where $\mathbf{x}$ is an $N$-dimensional input vector, $\mathbf{y}$ is an $M$-dimensional output vector ($M \le N$), and $\mathbf{W}$ is an $M \times N$ transformation matrix. The transformation matrix $\mathbf{W}$ consists of $M$ rows of the eigenvectors which correspond to the largest eigenvalues of the sample autocovariance matrix (Dony and Haykin, 1995):

$\hat{\mathbf{\Sigma}} = E[\mathbf{x}\mathbf{x}^T]$  (4)

where $E$ represents expectation. The KL transform is used here for comparison with the SOM in the dimensionality reduction of the local image samples. The KL transform is also used in eigenfaces; however, in that case it is used on the entire images, whereas it is only used on small local image samples in this work.
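A compact sketch of the KL/PCA projection of equations (3) and (4); the helper name and the zero-meaning step are conventional assumptions, and the paper applies the transform to the 25-dimensional local samples with M = 3.

```python
import numpy as np

def kl_transform(X, m=3):
    """Project row vectors in X onto the m leading eigenvectors of the
    sample autocovariance matrix."""
    Xc = X - X.mean(axis=0)              # zero-mean the data
    cov = (Xc.T @ Xc) / len(Xc)          # sample autocovariance, eq. (4)
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :m].T           # m rows: leading eigenvectors
    return Xc @ W.T                      # y = W x for every sample, eq. (3)

projected = kl_transform(np.random.rand(1000, 25))
print(projected.shape)  # (1000, 3)
```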
Images which are approximately centered and normalized enter at the input layer.Each unit in a plane receives input from a small neighborhood in the planes of the previous layer.The idea of connecting units tolocal receptivefields dates back to the1960s with the perceptron and Hubel and Wiesel’s(1962)discoveryof locally sensitive,orientation-selective neurons in the visual system of a cat(Le Cun and Bengio,1995).The weights forming the receptivefield for a plane are forced to be equal at all points in the plane.Eachplane can be considered as a feature map which has afixed feature detector that is convolved with a localwindow which is scanned over the planes in the previous layer.Multiple planes are usually used in eachlayer so that multiple features can be detected.These layers are called convolutional layers.Once a featurehas been detected,its exact location is less important.Hence,the convolutional layers are typically followedby another layer which does a local averaging and subsampling operation(e.g.for a subsampling factor of2:where is the output of a subsampling plane at position and is the output of the same plane in the previous layer).The network is trained with theusual backpropagation gradient descent procedure(Haykin,1994).Figure5.A typical convolutional network.A connection strategy can be used to reduce the number of weights in the network.For example,withreference tofigure5,Le Cun,Boser,Denker,Henderson,Howard,Hubbard and Jackel(1990)connect thefeature maps in the second convolutional layer only to1or2of the maps in thefirst subsampling layer(the connection strategy was chosen manually).This can reduce training time and improve performance(Le Cun,Boser,Denker,Henderson,Howard,Hubbard and Jackel,1990).Convolutional networks are similar to the Neocognitron(Fukushima,1980;Fukushima,Miyake and Ito,1983;Hummel,1995)which is a neural network model of deformation-resistant pattern recognition.TheNeocognitron is similar to the convolutional neural network.Alternating S and C-cell layers in the Neocog-nitron correspond to the convolutional and blurring layers in the convolutional network.However,in theNeocognitron,the C-cell layers respond to the most active input S-cell as opposed to performing an av-eraging operation.The Neocognitron can be trained using either unsupervised or supervised approaches(Fukushima,1995).5System DetailsThe system used for face recognition in this paper is a combination of the preceding parts–a high-level block diagram is shown infigure6andfigure7shows a breakdown of the various subsystems that are experimented with or discussed.Figure6.A high-level block diagram of the system used for face recognition.Figure7.A diagram of the system used for face recognition showing alternative methods which are considered in this paper.The top“multilayer perceptron style classifier”(5)represents thefinal MLP style fully connected layer of the convolutional network(the CN is a constrained MLP,however thefinal layer has no constraints).This decomposition of the convolutional network is shown in order to highlight the possibility of replacing thefinal layer(or layers)with a different type of classifier.The nearest-neighbor style classifier is potentially interesting because it may make it possible to add new classes with minimal extra training time.The bottom“multilayer perceptron”(7)shows that the entire convolutional network can be replaced with a multilayer perceptron.Results are presented with either a self-organizing map(2)or the Karhunen-Lo`e ve transform(3)for 
5 System Details
The system used for face recognition in this paper is a combination of the preceding parts. A high-level block diagram is shown in figure 6, and figure 7 shows a breakdown of the various subsystems that are experimented with or discussed.
Figure 6. A high-level block diagram of the system used for face recognition.
Figure 7. A diagram of the system used for face recognition showing alternative methods which are considered in this paper. The top "multilayer perceptron style classifier" (5) represents the final MLP style fully connected layer of the convolutional network (the CN is a constrained MLP; however, the final layer has no constraints). This decomposition of the convolutional network is shown in order to highlight the possibility of replacing the final layer (or layers) with a different type of classifier. The nearest-neighbor style classifier is potentially interesting because it may make it possible to add new classes with minimal extra training time. The bottom "multilayer perceptron" (7) shows that the entire convolutional network can be replaced with a multilayer perceptron.
Results are presented with either a self-organizing map (2) or the Karhunen-Loève transform (3) for dimensionality reduction, and either a convolutional neural network (4, 5) or a multilayer perceptron (7) for classification. The system works as follows (complete details of dimensions etc. are given later):
1. For the images in the training set, a fixed size window (e.g. 5 by 5) is stepped over the entire image as shown in figure 3, and local image samples are extracted at each step. At each step the window is moved by 4 pixels.
2. A self-organizing map (e.g. with three dimensions and five nodes per dimension, 125 total nodes) is trained on the vectors from the previous stage. The SOM quantizes the 25-dimensional input vectors into 125 topologically ordered values. The three dimensions of the SOM can be thought of as three features. The SOM is used primarily as a dimensionality reduction technique, and it is therefore of interest to compare the SOM with a more traditional technique. Hence, experiments were performed with the SOM replaced by the Karhunen-Loève transform. In this case, the KL transform projects the vectors in the 25-dimensional space into a 3-dimensional space.
3. The same window as in the first step is stepped over all of the images in the training and test sets. The local image samples are passed through the SOM at each step, thereby creating new training and test sets in the output space of the self-organizing map. (Each input image is now represented by 3 maps, each of which corresponds to a dimension in the SOM. The size of these maps is equal to the size of the input image divided by the step size: for a step size of 4, the maps are 23 by 28.)
4. A convolutional neural network is trained on the newly created training set. Training a standard MLP was also investigated for comparison.

5.1 Simulation Details
Details of the best performing system from all experiments are given in this section.
For the SOM, training was split into two phases as recommended by Kohonen (1995): an ordering phase and a fine-adjustment phase. 100,000 updates were performed in the first phase, and 50,000 in the second. In the first phase, the neighborhood radius started at two-thirds of the size of the map and was reduced linearly to 1; the learning rate during this phase was a decreasing function of the current update number and the total number of updates. In the second phase, the neighborhood radius started at 2 and was reduced to 1, again with a decreasing learning rate.
The convolutional network contained five layers excluding the input layer. A confidence measure was calculated for each classification: $y_m - y_{2m}$, where $y_m$ is the maximum output and $y_{2m}$ is the second maximum output, for outputs which have been transformed using the softmax transformation $y_i = \exp(u_i) / \sum_j \exp(u_j)$. (Footnote: this helps avoid saturating the sigmoid function. If targets were set to the asymptotes of the sigmoid, this would tend to: a) drive the weights to infinity, b) cause outlier data to produce very large gradients due to the large weights, and c) produce binary outputs even when incorrect, leading to decreased reliability of the confidence measure.)
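A small sketch of the confidence measure as reconstructed above (softmax followed by the gap between the two largest outputs); the exact form of the measure was garbled in extraction, so treat this reconstruction as an assumption from the surrounding text.

```python
import numpy as np

def softmax(u):
    z = np.exp(u - u.max())          # subtract max for numerical stability
    return z / z.sum()

def confidence(outputs):
    """Difference between the two largest softmax outputs; a classification
    can be rejected when this value is low."""
    y = np.sort(softmax(outputs))[::-1]
    return y[0] - y[1]

print(confidence(np.array([2.0, 0.5, 0.1])))   # high: clear winner
print(confidence(np.array([1.0, 0.95, 0.1])))  # low: ambiguous
```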
constant learning rate results in significant parameter and performance fluctuation during the entire training cycle such that the performance of the network can alter significantly from the beginning to the end of the final epoch.Moody and Darkin have proposed “search then converge”learning rate schedules.We have found that these schedules still result in considerable parameter fluctuation and hence we have added another term to further reduce the learning rate over the final epochs.We have found the use of learning rate schedules to improve performance considerably.。