Spatio-temporal Saliency Detection Using Phase Spectrum of Quaternion Fourier Transform

Chenlei Guo, Qi Ma and Liming Zhang

Department of Electronic Engineering, Fudan University

No. 220, Handan Road, Shanghai, 200433, China


Abstract

Salient areas in natural scenes are generally regarded as candidates for the focus of attention in human vision, which is a key stage in object detection. In computer vision, many models have been proposed to simulate this behavior of the eyes, such as SaliencyToolBox (STB) and Neuromorphic Vision Toolkit (NVT), but they demand high computational cost and their remarkable results mostly rely on the choice of parameters. Recently a simple and fast approach based on the Fourier transform, called spectral residual (SR), was proposed, which uses the SR of the amplitude spectrum to obtain the saliency map. Its results are good, but the underlying reasoning is questionable.

In this paper, we propose that it is the phase spectrum, not the amplitude spectrum, of the Fourier transform that is the key to finding the location of salient areas. We provide examples showing that the phase spectrum of the Fourier transform (PFT) obtains better results than SR while requiring less computational complexity. Furthermore, PFT can easily be extended from the two-dimensional Fourier transform to the Quaternion Fourier Transform (QFT) if the value of each pixel is represented as a quaternion composed of intensity, color and a motion feature. The added motion dimension allows the phase spectrum to represent spatio-temporal saliency, so that attention selection can be performed for videos as well as images.

Extensive tests on videos, natural images and psychological patterns show that the proposed method is more effective than the other models. Moreover, it is very robust against white-colored noise and meets real-time requirements, which gives it great potential in engineering applications.

1. Introduction

Most traditional object detectors need training in order to detect specific object categories [1, 2, 3], but human vision can focus on generally salient objects rapidly in a cluttered visual scene without any training, thanks to the visual attention mechanism. Humans therefore handle general object detection easily, which makes it an intriguing subject for more and more research.

What attracts people's attention? Treisman [4] proposed a theory which describes visual attention as having two stages. A set of basic visual features such as color, motion and edges is processed in parallel at the pre-attentive stage. A limited-capacity process stage then performs other, more complex operations such as face recognition [5]. Distinctive features (e.g. luminous color or high-velocity motion) "pop out" automatically in the pre-attentive stage and become the object candidates.

Several computational models have been proposed to simulate human visual attention. Itti et al. proposed a bottom-up model and built a system called the Neuromorphic Vision C++ Toolkit (NVT) [6]. After that, following Rensink's theory [7], Walther extended this model to attend to proto-object regions and created SaliencyToolBox (STB) [8]; he also applied it to object recognition tasks [9]. However, high computational cost and variable parameters remain the weaknesses of these models. Recently, the Spectral Residual (SR) approach based on the Fourier transform was proposed by [10]; it does not rely on parameters and can detect salient objects rapidly. In this approach, the difference (SR) between the original signal and a smoothed one in the log amplitude spectrum is calculated, and the saliency map is then obtained by transforming the SR back to the spatial domain. All the models mentioned above, however, only consider static images. Incorporating motion into these models is a challenging task, which motivates us to develop a novel method to generate a spatio-temporal saliency map.

After careful analysis, we find that the SR of the amplitude spectrum is not essential for obtaining the saliency map in [10]; the saliency map can be calculated from the image's phase spectrum of the Fourier transform (PFT) alone. Moreover, this discovery provides an easy way to extend our work to the Quaternion Fourier Transform (QFT) [11]. Each pixel of the image is represented by a

978-1-4244-2243-2/08/$25.00 ©2008 IEEE

quaternion that consists of color, intensity and motion features. The phase spectrum of the QFT (PQFT) is used to obtain the spatio-temporal saliency map, which considers not only salient spatial features like color and orientation within a single frame but also temporal features between frames, such as motion.

In section 2, we propose PFT and discuss the relationship between PFT and SR. In section 3, we introduce the quaternion representation of an image and propose PQFT to obtain the spatio-temporal saliency map. Experimental results comparing our methods with others are shown in section 4, and conclusions and discussions are given thereafter.

2. From the Phase Spectrum of the Fourier Transform to the Saliency Map

It was discovered that the SR of an image's log amplitude spectrum represents its innovation. By using the exponential of the SR instead of the original amplitude spectrum and keeping the phase spectrum, the reconstruction of the image yields the saliency map [10]. However, we find that this saliency map can be calculated by PFT regardless of the amplitude spectrum's value, which motivates us to explore the role of PFT in obtaining the saliency map.

2.1. What does the phase spectrum represent?

What can be detected from a reconstruction that is calculated only from the phase spectrum of the input signal? To show the underlying principle, we give three one-dimensional waveforms in Fig. 1 (left) and look for intrinsic rules in their PFT reconstructions.

Note that for the following examples in Fig. 1, the reconstruction is obtained from the phase spectrum alone. When the waveform is a positive or negative pulse, its reconstruction contains the largest spikes at the jump edges of the input pulse, because many varying sinusoidal components are located there. In contrast, when the input is a single sinusoidal component of constant frequency, there is no distinct spike in the reconstruction. Less periodicity or less homogeneity at a location, in comparison with the entire waveform, creates more "pop out".
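The pulse example can be reproduced in a few lines of NumPy (a sketch for illustration, not the authors' code; the signal length, pulse position and noise floor are arbitrary choices):

```python
import numpy as np

# Phase-only reconstruction of a 1-D pulse: keep each frequency's phase,
# force every amplitude to 1, and transform back to the spatial domain.
N = 128
rng = np.random.default_rng(0)
x = np.zeros(N)
x[40:60] = 1.0                        # positive pulse
x += 1e-3 * rng.standard_normal(N)    # tiny noise avoids exactly-zero bins
phase = np.angle(np.fft.fft(x))
recon = np.abs(np.fft.ifft(np.exp(1j * phase))) ** 2
# The largest spikes of `recon` should sit at the jump edges (n = 40, n = 60),
# where many sinusoidal components align in phase.
```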

The same rule applies to two-dimensional signals such as images. [12] pointed out that the amplitude spectrum specifies how much of each sinusoidal component is present, while the phase information specifies where each sinusoidal component resides within the image. Locations with less periodicity or less homogeneity in the vertical or horizontal orientation create the "pop out" proto-objects in the reconstruction of the image, indicating where the object candidates are located.

Please note that we do not take the borders of the signals or images into consideration because of their discontinuity.

Figure 1. One-dimensional data examples. (left) Original data. (right) Reconstructions from the phase spectrum alone.

2.2. The steps of the PFT approach

According to the discovery above, we propose the PFT approach to calculate the saliency map. The steps (Eq. 1-3) can be summarized as follows:

Given an image I(x,y),

f(x, y) = F(I(x, y))   (1)

p(x, y) = P(f(x, y))   (2)

sM(x, y) = g(x, y) ∗ ‖F^{-1}[e^{i·p(x,y)}]‖²   (3)

where F and F^{-1} denote the Fourier transform and the inverse Fourier transform, respectively, P(f) represents the phase spectrum of the image, and g(x, y) is a 2D Gaussian filter (σ = 8), the same as in SR [10]. The value of the saliency map at location (x, y) is given by Eq. 3. In the SR approach, the spectral residual of the log amplitude spectrum must additionally be added inside the square brackets of Eq. 3.
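Assuming standard NumPy/SciPy routines, Eq. 1-3 can be sketched as follows (a minimal illustration, not the authors' implementation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(img, sigma=8):
    """PFT saliency map (Eq. 1-3): Fourier transform, keep only the
    phase spectrum, reconstruct with unit amplitude, then square the
    magnitude and smooth with a 2D Gaussian filter."""
    f = np.fft.fft2(img)                   # Eq. 1
    p = np.angle(f)                        # Eq. 2
    recon = np.fft.ifft2(np.exp(1j * p))   # phase only, amplitude = 1
    return gaussian_filter(np.abs(recon) ** 2, sigma)  # Eq. 3
```

A small distinct patch on a uniform noisy background should dominate the resulting map, since its edges break the homogeneity of the scene.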

We use the intensity of the image as the input to PFT and SR in the following experiments.

2.3. Comparison between PFT and SR

It is obvious that PFT omits the computation of the SR of the amplitude spectrum, which saves about one third of the computational cost (see Sections 4.2 and 4.3 for details). But how different are the saliency maps? In this subsection, we analyze the saliency maps computed by PFT and SR and give some results.
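For reference, a minimal SR sketch differs from a phase-only pipeline only by the spectral-residual term on the log amplitude spectrum (assuming, as in [10], a 3×3 local average filter; this is an illustrative sketch, not the authors' code):

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def sr_saliency(img, sigma=8):
    """Spectral residual saliency [10]: subtract a locally averaged log
    amplitude spectrum from the original, keep the phase, and invert."""
    f = np.fft.fft2(img)
    log_amp = np.log(np.abs(f) + 1e-12)                   # log amplitude
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)  # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(sal, sigma)
```

Dropping `log_amp` and `residual` from this pipeline recovers PFT, which is where the saving of roughly one third of the computation comes from.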

We use the database of [10] as the test images. The database contains 62 natural images with a resolution of around 800×600.

Define the saliency map of image i from PFT as sM1_i and that from SR as sM2_i. Fig. 2 shows the results for three natural images, in which sM1_i and sM2_i look the same. In order to evaluate the similarity between these saliency maps

to find them all. The fourth and fifth images are patterns that are salient in both color and orientation, which should be the easiest task. PQFT and NVT can find the targets with only one fixation. PFT and STB both find the salient red vertical bar but fail to detect the salient red horizontal bar. SR fails to find any of the salient targets above.

In Fig. 10, NVT and STB can find all the patterns within five fixations. Please note that they are capable of finding the target in the closure pattern because of the denser intensity of the enclosed pieces, not because of enclosure. PQFT, PFT and SR fail in the closure search, which is one common limitation of these methods.

Our PQFT and PFT attend to the missing vertical black bar at the first fixation in Fig. 11, which agrees with human behavior. The other methods, however, fail in this test. Please note that the saliency maps from PFT and SR are quite different in this case, although they look very similar in the other tests.

Fig. 12 shows that none of the models can perform conjunction search effectively; conjunction search is believed to require thinking and prior (top-down) knowledge, while all these models consider only bottom-up information.

In sum, our PQFT method performs best: it encounters only three detection failures (one in the closure pattern and two in conjunction search) and needs only one fixation to detect the targets in all the other patterns, which shows that our spatio-temporal saliency map provides effective information for detecting salient visual patterns.

5. Conclusions and Discussions

We proposed a method called PQFT to calculate the spatio-temporal saliency map, which is used to detect salient objects in both natural images and videos. [10] discovered SR and used it to detect proto-objects; however, our work indicates that the saliency map can easily be obtained when the amplitude spectrum of the image is set to any nonzero constant value. Thus the phase spectrum is critical to the saliency map, and the experimental results show that PFT is better and faster than SR, especially in the test of psychological patterns (Fig. 11). As SR still preserves the phase spectrum, we doubt whether computing the SR is a necessary step.

The effect of the phase spectrum provides a very easy way to extend PFT to PQFT, which considers not only color, orientation and intensity but also the motion feature between frames. We incorporate these features into a quaternion image and process them in parallel, which is better than processing each feature separately, because some features cannot "pop out" if they are projected onto each dimension individually. As a result, the spatio-temporal saliency map that PQFT produces deals with videos, natural images and psychological patterns better than the other state-of-the-art models, which shows that PQFT is an effective saliency detection method. In addition, it is very robust against white-colored noise and fast enough to meet real-time requirements. Our PQFT is also independent of parameters and prior knowledge. Considering its good performance, is it possible that the human vision system performs the same function as PQFT or PFT? We hope that more work will be done to discover the role of the phase spectrum in early human vision.

Figure 9. Search results of salient color or orientation patterns.
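The parallel quaternion processing can be illustrated with a simplified sketch. This is not the paper's exact formulation: the opponent-color and motion channels below are naive differences rather than the tuned channels of [6], and the QFT is taken with transform axis i (instead of the paper's μ), so that the quaternion image q = M + RG·i + BY·j + I·k reduces to two complex FFTs of its symplectic parts c1 = M + RG·i and c2 = BY + I·i:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pqft_saliency(frame, prev_intensity, sigma=8):
    """Simplified PQFT sketch (illustrative channels, not the paper's).
    With transform axis i, the QFT of q = c1 + c2*j is just the pair of
    complex FFTs of c1 and c2; the quaternion modulus is then set to 1
    so that only the quaternion phase spectrum is kept."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    inten = (r + g + b) / 3.0                 # intensity feature I
    rg = r - g                                # naive red-green opponent
    by = b - (r + g) / 2.0                    # naive blue-yellow opponent
    motion = np.abs(inten - prev_intensity)   # naive motion feature M
    f1 = np.fft.fft2(motion + 1j * rg)        # FFT of symplectic part c1
    f2 = np.fft.fft2(by + 1j * inten)         # FFT of symplectic part c2
    mod = np.sqrt(np.abs(f1) ** 2 + np.abs(f2) ** 2) + 1e-12
    q1 = np.fft.ifft2(f1 / mod)               # unit quaternion modulus:
    q2 = np.fft.ifft2(f2 / mod)               # only the phase survives
    sal = np.abs(q1) ** 2 + np.abs(q2) ** 2   # squared modulus of recon
    return gaussian_filter(sal, sigma)
```

Setting the quaternion modulus sqrt(|F1|² + |F2|²) to one while keeping both complex phases is the quaternion analogue of keeping only the phase spectrum in PFT.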

Compared with human vision, our methods still have some limitations. First, our methods cannot yet handle the closure pattern, whereas humans find such patterns in a very short time. Second, our experimental results show the strong robustness of PQFT against white-colored noise; however, if the noise is very similar to the salient feature of the target, our spatio-temporal saliency map will fail to detect the target. Finally, Wang et al. suggested that people can perform efficient conjunction search [14], and our experimental results showed that all the models, including our own, fail on these patterns. We will do more work to explore these unsolved issues.

The potential of our work lies in engineering fields; it can be extended to applications such as object recognition and video coding. In addition, as our model only considers bottom-up information, it is necessary to add top-down signals (e.g. visual memory) to develop an effective vision system for robot applications.

Figure 10. Search results of salient orientation patterns.

Figure 11. Only PQFT and PFT can find the missing item.

Figure 12. Conjunction search results of five models.

References

[1] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. Proc. CVPR, 2, 2003.

[2] T. Liu, J. Sun, N. Zheng, X. Tang and H. Shum. Learning to Detect a Salient Object. Proc. CVPR, 2007.

[3] D. Gao and N. Vasconcelos. Integrated learning of saliency, complex features, and object detectors from cluttered scenes. Proc. CVPR, 2005.

[4] A. Treisman and G. Gelade. A Feature-Integration Theory of Attention. Cognitive Psychology, 12(1):97-136, 1980.

[5] J. Wolfe. Guided Search 2.0: A Revised Model of Guided Search. Psychonomic Bulletin & Review, 1(2):202-238, 1994.

[6] L. Itti, C. Koch, E. Niebur, et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259, 1998.

[7] R. Rensink. Seeing, sensing, and scrutinizing. Vision Research, 40(10-12):1469-1487, 2000.

[8] D. Walther and C. Koch. Modeling attention to salient proto-objects. Neural Networks, 19:1395-1407, 2006.

[9] D. Walther, L. Itti, M. Riesenhuber, T. Poggio, and C. Koch. Attentional Selection for Object Recognition - a Gentle Way. Lecture Notes in Computer Science, 2525(1):472-479, 2002.

[10] X. Hou and L. Zhang. Saliency Detection: A Spectral Residual Approach. Proc. CVPR, 2007.

[11] T. Ell and S. Sangwine. Hypercomplex Fourier Transforms of Color Images. IEEE Transactions on Image Processing, 16(1):22-35, 2007.

[12] K. Castleman. Digital Image Processing. Prentice-Hall, New York, 1996.

[13] S. Engel, X. Zhang, and B. Wandell. Colour Tuning in Human Visual Cortex Measured With Functional Magnetic Resonance Imaging. Nature, 388(6637):68-71, July 1997.

[14] D. L. Wang, A. Kristjansson, and K. Nakayama. Efficient visual search without top-down or bottom-up guidance. Perception & Psychophysics, 67:239-253.
