Qin CVPR 2013 - Query Adaptive Similarity for Large Scale Object Retrieval
Selective Two-Stage Ensemble of Neural Networks Based on the Stochastic Gradient Method

Shi Yan; Huang Congming; Hou Chaozhen
[Journal] Computer Engineering
[Year (Vol.), Issue] 2004, 30(16)
[Abstract] To address the "local minimum" and "overfitting" problems that arise when selective neural network ensembles are built with greedy methods or genetic algorithms, a class of selective two-stage neural network ensemble methods based on the stochastic gradient method is proposed. Theoretical analysis and experiments show that, compared with the selective ensemble methods above, the proposed method is easier to implement and clearly more effective.
[Pages] 4 (P133-135, 159)
[Authors] Shi Yan; Huang Congming; Hou Chaozhen
[Affiliations] School of Chemical Engineering and Environment, Beijing Institute of Technology, Beijing 100081 (Shi, Huang); School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081 (Hou)
[Language] Chinese
[CLC Number] TP301.6
A Collaborative Filtering Algorithm Based on Dual-Threshold Neighbor Search

Li Ying; Li Yongli; Cai Guanyang
[Journal] Journal of Jilin University (Information Science Edition)
[Year (Vol.), Issue] 2013, 31(6)
[Abstract] To improve the recommendation accuracy of collaborative filtering, this paper starts from the selection of the neighbor user/item group and proposes a collaborative filtering algorithm based on dual-threshold neighbor search. The algorithm makes full use of the existing sparse user-item rating matrix to find candidate users who are strongly correlated with the target user and can actually take part in the rating-prediction step. Experimental results show that, compared with the traditional collaborative filtering algorithm and several improved variants, the recommendation accuracy improves measurably, making the algorithm a useful reference for practical applications.
[Pages] 7 (P647-653)
[Authors] Li Ying; Li Yongli; Cai Guanyang
[Affiliations] College of Computer, Jilin Normal University, Siping 136000; School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117; College of Computer Science and Technology, Jilin University, Changchun 130012
[Language] Chinese
[CLC Number] TP301.6
Three-Stage Network Image Inpainting Based on an Efficient Attention Module

Software Guide, Vol. 22 No. 8, Aug. 2023
Three-Stage Network Image Inpainting Based on an Efficient Attention Module
Zhou Zunfu, Zhang Qian, Li Wei, Li Xiaoyu (1. School of Data Science and Information Engineering, Guizhou Minzu University; 2. Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou, Guiyang 550025, China)
Abstract: Under large missing regions or high resolutions, existing face image inpainting methods synthesize images with inconsistent texture structure and inconsistent contextual semantics, and cannot accurately reproduce fine structures such as eyes and eyebrows. To solve this problem, a three-stage network (RLGNet) face inpainting method based on a Normalization-based Attention Module (NAM) is proposed, which speeds up model convergence and lowers computational cost. A coarse inpainting network first produces an initial repair of the damaged image; a local inpainting network then refines local regions in detail; finally, a NAM-based global refinement network repairs the image as a whole to improve semantic coherence and the consistency of texture structure. On the CelebA-HQ dataset, with mask ratios of 20%-30%, the method reaches a PSNR of 30.35 dB, an SSIM of 0.926 9 and an FID of 2.55, and synthesizes face images with natural transitions between adjacent pixels and plausible texture structure.
Keywords: face image inpainting; normalization-based attention module; three-stage inpainting network; activation function
DOI: 10.11907/rjdk.231474  CLC Number: TP391.41  Document code: A  Article ID: 1672-7800(2023)008-0196-07
0 Introduction. Image inpainting refers to completing missing regions of varying proportions in an image.
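The NAM block itself is not reproduced in this abstract; the following numpy sketch illustrates only its core channel-attention idea, re-weighting channels by normalized batch-norm scale factors gamma. The function shape and parameter names are my assumptions, not the paper's code:

```python
import numpy as np

def nam_channel_attention(x, gamma):
    """Normalization-based attention over channels (sketch).

    x     : feature map, shape (C, H, W)
    gamma : per-channel batch-norm scale factors, shape (C,)
    Channels with larger |gamma| are treated as more informative and are
    re-weighted accordingly; a sigmoid keeps the gate in (0, 1).
    """
    w = np.abs(gamma) / np.abs(gamma).sum()          # normalized channel saliency
    gated = x * w[:, None, None]                     # channel re-weighting
    return x * (1.0 / (1.0 + np.exp(-gated)))        # sigmoid gate

C, H, W = 4, 2, 2
x = np.ones((C, H, W))
out = nam_channel_attention(x, np.array([4.0, 2.0, 1.0, 1.0]))
```

With these toy scale factors, channel 0 (largest gamma) receives the strongest gate.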
Feature Selection Based on Adaptive Graph Learning and Neighborhood Embedding

As data continue to grow in scale and dimensionality, feature selection is becoming increasingly important in machine learning and data mining. It improves the efficiency and accuracy of learning algorithms and makes model results easier to interpret. Traditional feature selection methods, however, often cannot cope with high-dimensional, complex data, so new methods are needed.
In this paper, we propose a feature selection method based on adaptive graph learning and neighborhood embedding. We first review adaptive graph learning, an unsupervised technique that builds a graph over the data to capture pairwise similarity; because it handles nonlinear and non-Gaussian data well, it is a promising basis for feature selection. We then review neighborhood embedding, an unsupervised dimensionality-reduction technique that maps high-dimensional data into a low-dimensional space while preserving neighborhood relations, which helps reveal the latent structure of the data and the correlations among features.
Building on these two techniques, we propose the following feature selection procedure. First, adaptive graph learning constructs a data graph in which each sample is a node and edge weights are determined by similarity. Next, neighborhood embedding maps the data into a low-dimensional space, where the importance of each feature is computed. By comparing feature importance before and after the mapping, we identify which features matter most for discriminating the data. Finally, we select the features with the highest importance scores.
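The steps above can be sketched end to end; since the exact objective is not given here, a Gaussian kNN graph and an alignment-based feature score stand in for the described components (all parameter choices are illustrative):

```python
import numpy as np

def rank_features(X, k=3, dims=2):
    """X: (n_samples, n_features). Returns feature indices, most important first."""
    n = X.shape[0]
    # 1. similarity graph: Gaussian weights, pruned to the k nearest neighbors
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    sigma = np.median(D) + 1e-12
    W = np.exp(-(D / sigma) ** 2)
    far = np.argsort(D, axis=1)[:, k + 1:]           # drop all but self + k nearest
    for i in range(n):
        W[i, far[i]] = 0.0
    W = np.maximum(W, W.T)                           # symmetrize
    # 2. neighborhood-preserving embedding via the graph Laplacian
    L = np.diag(W.sum(1)) - W
    _, vecs = np.linalg.eigh(L)                      # ascending eigenvalues
    Y = vecs[:, 1:1 + dims]                          # skip the trivial eigenvector
    # 3. score each feature by its alignment with the embedding coordinates
    f = X - X.mean(0)
    f = f / (np.linalg.norm(f, axis=0) + 1e-12)
    Yn = Y / (np.linalg.norm(Y, axis=0) + 1e-12)
    scores = np.abs(f.T @ Yn).sum(1)                 # higher = better preserved
    return np.argsort(-scores)

# toy data: feature 0 separates two clusters, feature 1 is constant,
# feature 2 is low-amplitude noise
X = np.array([[0.0, 1.0,  0.01], [0.1, 1.0, -0.01], [0.2, 1.0,  0.01],
              [5.0, 1.0, -0.01], [5.1, 1.0,  0.01], [5.2, 1.0, -0.01]])
ranking = rank_features(X, k=3, dims=1)
```

On this toy set the cluster-separating feature ranks first and the constant feature last.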
To validate the proposed method, we ran experiments on several public datasets. The results show that the method achieves high accuracy and stability for feature selection. Compared with traditional feature selection methods, it better captures the latent structure of the data and the correlations among features, and yields better feature subsets. It also has low computational complexity and completes feature selection quickly.
In summary, this paper proposes a feature selection method based on adaptive graph learning and neighborhood embedding. By combining the two techniques, the method handles high-dimensional, complex data well and produces better feature subsets. In future work we will refine the method further and apply it to a broader range of domains.
Classification of Transfer Learning Methods (II)

– Task: consists of the objective (prediction) function and the learning result; it is the outcome of learning
• Formal definition of transfer learning
– Given: a source domain D_s with its learning task T_s, and a target domain D_t with its learning task T_t
– Goal: use the knowledge in D_s and T_s to learn the prediction function f_T(·) on the target domain
– Constraint: D_s ≠ D_t or T_s ≠ T_t
• Application areas
• By transfer setting  • By feature space  • By transfer method
• The domain adaptation problem (domain adaptation; cross-domain learning)
– Problem definition: a labeled source domain and an unlabeled target domain share the same features and classes but differ in feature distribution; how can the source be used to label the target?
– Feature-based transfer methods:
• Transfer Component Analysis (TCA) [Pan, TKDE-11]  • Geodesic Flow Kernel (GFK) [Gong, CVPR-12]  • Transfer Kernel Learning [Long, TKDE-15]  • TransEMDT [Zhao, IJCAI-11]
• Transfer Component Analysis (TCA) [Pan, TKDE-11]
– Transform the source and target domains into a shared space and minimize the distance between them
• TCA, continued:
– Optimization objective: minimize the Maximum Mean Discrepancy (MMD) between the transformed domains,
  MMD^2(X_s, X_t) = || (1/n_s) Σ_i φ(x_i^s) − (1/n_t) Σ_j φ(x_j^t) ||^2_H
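TCA's objective is built on the MMD named above; with a linear kernel the empirical quantity reduces to the squared distance between the domain means. A minimal sketch (the toy domains are mine):

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Squared maximum mean discrepancy with a linear kernel:
    || mean(Xs) - mean(Xt) ||^2, the quantity TCA drives down
    in its learned latent space."""
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(200, 3))   # source domain
Xt = rng.normal(2.0, 1.0, size=(200, 3))   # target domain, shifted distribution
```

A domain shift of 2 per dimension gives an MMD of roughly 12 here; identical samples give exactly 0.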
• GFK (Geodesic Flow Kernel) [Gong, CVPR-12]
• TrAdaBoost [Dai, ICML-07]
– Uses boosting to filter out source-domain instances that are dissimilar to the target domain, then performs instance-based transfer learning
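The slide only names the idea; the following sketch shows one round of TrAdaBoost-style re-weighting, where misclassified source instances are shrunk and misclassified target instances are boosted. The update constants follow the commonly cited formulation; the toy inputs are invented:

```python
import numpy as np

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, eps_t, n_rounds):
    """One boosting round of TrAdaBoost-style weight updates.

    w_src, w_tgt : current instance weights (arrays)
    err_src/tgt  : boolean arrays, True where the weak learner erred
    eps_t        : weighted error on the *target* data this round
    """
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))
    beta_t = eps_t / (1.0 - eps_t)
    w_src = w_src * np.where(err_src, beta, 1.0)           # shrink bad source points
    w_tgt = w_tgt * np.where(err_tgt, 1.0 / beta_t, 1.0)   # boost hard target points
    return w_src, w_tgt

w_src, w_tgt = tradaboost_reweight(
    np.ones(4), np.ones(2),
    np.array([True, False, False, False]),
    np.array([True, False]),
    eps_t=0.25, n_rounds=10)
```

The asymmetry is the point: a source error lowers that instance's influence, while a target error raises it.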
• MsTL-MvAdaboost [Xu, ICONIP-12]
Super-Resolution Reconstruction of Magnetic Resonance Images Based on an Adaptive Dual Dictionary
Liu Zhenqi, Bao Lijun, Chen Zhong
(Department of Electronic Science, Xiamen University, Xiamen 361005, China)
Electro-Optic Technology Application, Vol. 28, No. 4, August 2013, Signal and Information Processing section
Abstract: To improve the image quality of magnetic resonance imaging, a super-resolution denoising reconstruction method based on an adaptive dual dictionary is proposed. A denoising step is built into the super-resolution reconstruction process, so that noise is effectively filtered out while resolution is improved, organically combining super-resolution reconstruction with denoising. The method uses a clustering-PCA algorithm to extract the principal features of the image and construct a principal-feature dictionary, and uses a training procedure to design a self-learned dictionary that expresses image detail; together the two form an adaptive dual dictionary with good sparsity and adaptivity. Experiments show that, compared with other super-resolution algorithms, the method reconstructs markedly better, improving both peak signal-to-noise ratio and mean structural similarity.
Beijing Institute of Technology Technology-Transfer Result: A Real-Time Visual Tracking System Based on Adaptive Sampling

Overview
Visual tracking is a prominent research topic in computer vision. It has wide applications in intelligent transportation, human-computer interaction, video surveillance, and military systems. In practical applications, visual trackers must handle complex target motion, such as abrupt motion changes and drastic changes in geometric appearance, which traditional tracking techniques cannot cope with. We propose a method based on adaptive MCMC sampling that solves this difficulty well and makes the tracking of targets with complex motion dependable in real applications.
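The blurb gives no algorithmic detail; as a rough illustration of MCMC-style state estimation used in such trackers, here is a one-dimensional Metropolis-Hastings sketch with an invented Gaussian observation likelihood (step size and sample counts are arbitrary):

```python
import numpy as np

def mh_track(likelihood, x0, n_samples=2000, step=0.5, seed=0):
    """Metropolis-Hastings chain over a 1-D target state (e.g. position).
    Returns the posterior-mean estimate of the state."""
    rng = np.random.default_rng(seed)
    x, lx = x0, likelihood(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.normal(0.0, step)           # random-walk proposal
        lp = likelihood(prop)
        if rng.random() < min(1.0, lp / lx):       # accept/reject step
            x, lx = prop, lp
        samples.append(x)
    return float(np.mean(samples[n_samples // 2:]))  # discard burn-in half

# synthetic observation model: the target actually sits near position 3.0
lik = lambda x: np.exp(-0.5 * (x - 3.0) ** 2)
est = mh_track(lik, 0.0)
```

An adaptive variant would additionally tune the proposal scale online, which is what lets such samplers follow abrupt motion.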
Project source: independently developed.
Technical field: computer application technology; artificial intelligence and pattern recognition.
Scope of application: real-time tracking of people, vehicles and other targets in video surveillance of roads, squares, stations, meeting venues, etc.
Status and features: the system's tracking accuracy exceeds 95%; on a current commodity personal computer it runs in real time at 30 frames per second, and by embedding the tracking algorithm in dedicated hardware, real-time tracking of complex target motion can be achieved.
Stage: the tracking algorithm has been trialled in several application systems with good results.
Intellectual property: independently owned.
Transfer mode: technical cooperation.
Market and benefit analysis: video surveillance infrastructure is already deployed in all kinds of public places, and the national "Sky Net" program of the 12th Five-Year Plan period will extend it further, so the project requires little additional infrastructure investment. Given the broad application prospects, the expected market return is considerable.
(Figure caption: pedestrian tracking across a shot cut, a case in which traditional tracking methods fail.)
An AI-Based Adaptive Anti-Jamming Optimization Algorithm for Multimodal Radar

Xu Cheng; Cheng Qiang; Zhao Peng; Cheng Weiqing
[Journal] Modern Electronics Technique
[Year (Vol.), Issue] 2024, 47(7)
[Abstract] Multimodal radar systems are easily affected by environmental interference such as weather conditions and electromagnetic jamming, which can degrade the accuracy and stability of multimodal radar data. Since anti-jamming performance determines the radar's measurement accuracy, an AI-based adaptive anti-jamming optimization algorithm for multimodal radar is proposed to improve that performance. Starting from a multimodal radar signal model, the algorithm analyzes the principles of range-velocity synchronized deception jamming and smeared-spectrum jamming, and computes the total echo signal received by the radar under deception jamming. The computed echo is fed into a YOLOv5s deep learning model; through the model's training and mapping, adaptive anti-jamming optimization for multimodal radar is completed and deceptive jamming signals are suppressed. Tests show an interference cancellation ratio above 0.935 and an interference output power below 0.017; the algorithm reliably suppresses both multiple-source and single-source jamming, achieving adaptive anti-jamming optimization for multimodal radar.
[Pages] 4 (P73-76)
[Authors] Xu Cheng; Cheng Qiang; Zhao Peng; Cheng Weiqing
[Affiliation] Air Force Early Warning Academy
[Language] Chinese
[CLC Numbers] TN95-34; TN911.1; TP391
Query Adaptive Similarity for Large Scale Object Retrieval
Danfeng Qin, Christian Wengert, Luc van Gool
ETH Zürich, Switzerland. {qind,wengert,vangool}@vision.ee.ethz.ch
2013 IEEE Conference on Computer Vision and Pattern Recognition

Abstract
Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature-to-feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image-to-image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.

1. Introduction
We consider the problem of content-based image retrieval for applications such as object recognition or similar image retrieval. This problem has applications in web image retrieval, location recognition, mobile visual search, and tagging of photos.
Most of the recent state-of-the-art large scale image retrieval systems rely on local features, in particular the SIFT descriptor [14] and its variants. Moreover, these descriptors are typically used jointly with a bag-of-words (BOW) approach, considerably reducing the computational burden and memory requirements in large scale scenarios.
The similarity between two images is usually expressed by aggregating the similarities between corresponding local features. However, to the best of our knowledge, few attempts have been made to systematically analyze how to model the employed similarity measures.
In this paper we present a probabilistic view of the feature-to-feature similarity. We then derive a measure that is adaptive to the query feature. We show, both on simulated and real data, that the Euclidean distance density distribution is highly query dependent and that our model adapts the original distance accordingly. While it is difficult to know the distribution of true correspondences, it is actually quite easy to estimate the distribution of the distance to non-corresponding features. The expected distance to the non-corresponding features can be used to adapt the original distance and can be efficiently estimated by introducing a small set of random features as negative examples. Furthermore, we derive a global similarity function that scores the feature-to-feature similarities; based on simulated data, this function approximates the analytical result.
Moreover, in contrast to some existing methods, our method does not require any parameter tuning to achieve its best performance on different datasets. Despite its simplicity, experimental results on standard benchmarks show that our method improves the retrieval accuracy consistently and significantly and compares favorably to the state-of-the-art. Furthermore, all recently presented post-processing steps can still be applied on top of our method and yield an additional performance gain.
The rest of this paper is organized as follows. Section 2 gives an overview of related research. Section 3 describes our method in more detail. The experiments for evaluating our approach are described in Section 4. Results in a large scale image retrieval system are presented in Section 5 and compared with the state-of-the-art.

2. Related Work
Most of the recent works addressing the image similarity problem in image retrieval can be roughly grouped into three categories.
Feature-feature similarity. The first group mainly works on establishing local feature correspondences. The most famous work in this group is the bag-of-words (BOW) approach [24]. Two features are considered to be similar if they are assigned to the same visual word. Despite the efficiency of the BOW model, the hard visual word assignment significantly reduces the discriminative power of the local features. In order to reduce quantization artifacts, [20] proposed to assign each feature to multiple visual words. In contrast, [8] relies on smaller codebooks, but in conjunction with short binary codes for each local feature, refining the feature matching within the same Voronoi cell. Additionally, product quantization [12] was used to estimate the pairwise Euclidean distance between features, and the top k nearest neighbors of a query feature are considered as matches. Recently, several researchers have addressed the problem of the Euclidean distance not being the optimal similarity measure in most situations. For instance, in [16] a probabilistic relationship between visual words is learned from a large collection of corresponding feature tracks. Alternatively, in [21] a projection is learned from the original feature space to a new space, such that the Euclidean metric in this new space can appropriately model feature similarity.
Intra-image similarity. The second group focuses on effectively weighting the similarity of a feature pair considering its relationship to other matched pairs. Several authors exploit the property that the local features inside the same image are not independent. As a consequence, a direct accumulation of local feature similarities can lead to inferior performance. This problem was addressed in [4] by down-weighting the contribution of non-incidentally co-occurring features. In [9] this problem was approached by re-weighting features according to their burstiness measurement. As the BOW approach discards spatial information, a scoring step can be introduced which exploits the property that truly matched feature pairs should follow a consistent spatial transformation. The authors of [19] proposed to use RANSAC to estimate the homography between images, and to count only the contribution of feature pairs consistent with this model. [26] and [23] propose to quantize the image transformation parameter space in a Hough
voting manner, and let each matching feature pair vote for its corresponding parameter cells. A feature pair is considered valid if it supports the cell with the maximum number of votes.
Inter-image similarity. Finally, the third group addresses the problem of how to improve the retrieval performance by exploiting additional information contained in other database images that depict the same object as the query image. [5] relies on query expansion: after retrieving a set of spatially verified database images, this new set is used to query the system again to increase recall. In [22], a set of relevant images is constructed using k-reciprocal nearest neighbors, and the similarity score is evaluated on how similar a database image is to this set.
Our work belongs to the first group. By formulating the feature-feature matching problem in a probabilistic framework, we propose a similarity that adapts to each query feature, and a similarity function to approximate the quantitative result. Although the idea of adapting similarity by dissimilarity has already been exploited in [11][17], we propose to measure dissimilarity by the mean distance of the query to a set of random features, while those works use k nearest neighbors (kNN). Since, in a realistic dataset, different objects may have different numbers of relevant images, it is actually quite hard for a kNN-based method to find a generalized k for all queries. Moreover, as kNN is an order statistic, it can be sensitive to outliers and cannot be used reliably as an estimator in realistic scenarios. In contrast, in our work the set of random features can be considered a clean set of negative examples, and the mean operator is actually quite robust, as shown later.
Considering the large amount of data in a typical large scale image retrieval system, it is impractical to compute the pairwise distances between high-dimensional original feature vectors. However, several approaches exist to relieve that burden using efficient
approximations, such as [12, 13, 3, 6]. For simplicity, we adopt the method proposed in [12] to estimate the distance between features.

3. Our Approach
In this section, we present a theoretical framework for modeling the visual similarity between a pair of features, given a pairwise measurement. We then derive an analytical model for computing the accuracy of the similarity estimation in order to compare different similarity measures. Following the theoretical analysis, we continue the discussion on simulated data. Since the distribution of the Euclidean distance varies enormously from one query feature to another, we propose to normalize the distance locally to obtain a similar degree of measurement across queries. Furthermore, using the adaptive measure, we quantitatively analyze the similarity function on the simulated data and propose a function to approximate the quantitative result. Finally, we discuss how to integrate our findings into a retrieval system.

3.1. A probabilistic view of similarity estimation
We are interested in modeling the visual similarity between features based on a pairwise measurement. Let us denote by x_i the local feature vectors from a query image and by Y = {y_1, ..., y_j, ..., y_n} a set of local features from a collection of database images. Furthermore, let m(x_i, y_j) denote a pairwise measurement between x_i and y_j. Finally, T(x_i) represents the set of features which are visually similar to x_i, and F(x_i) the set of features which are dissimilar to x_i.
Instead of considering whether y_j is similar to x_i and how similar they look, we want to evaluate how likely it is that y_j belongs to T(x_i) given a measurement m. This can be modeled as follows:

    f(x_i, y_j) = p(y_j ∈ T(x_i) | m(x_i, y_j))    (1)

For simplicity, we denote m_j = m(x_i, y_j), T_i = T(x_i), and F_i = F(x_i). As y_j belongs to either T_i or F_i, we have

    p(y_j ∈ T_i | m_j) + p(y_j ∈ F_i | m_j) = 1    (2)

Furthermore, according to Bayes' theorem,

    p(y_j ∈ T_i | m_j) = p(m_j | y_j ∈ T_i) × p(y_j ∈ T_i) / p(m_j)    (3)

and

    p(y_j ∈ F_i | m_j) = p(m_j | y_j ∈ F_i) × p(y_j ∈ F_i) / p(m_j)    (4)

Finally, by combining Equations 2, 3 and 4 we get

    p(y_j ∈ T_i | m_j) = { 1 + [p(m_j | y_j ∈ F_i) / p(m_j | y_j ∈ T_i)] × [p(y_j ∈ F_i) / p(y_j ∈ T_i)] }^(−1)    (5)

For large datasets the quantity p(y_j ∈ T_i) can be modeled by the occurrence frequency of x_i. Therefore, p(y_j ∈ T_i) and p(y_j ∈ F_i) depend only on the query feature x_i. In contrast, p(m_j | y_j ∈ T_i) and p(m_j | y_j ∈ F_i) are the probability density functions of the distribution of m_j, for {y_j | y_j ∈ T_i} and {y_j | y_j ∈ F_i}. We will show in Section 3.3 how to generate simulated data for estimating these distributions. In Section 3.5 we will further exploit these distributions in our framework.

3.2. Estimation accuracy
Since the pairwise measurement between features is the only observation for our model, it is essential to estimate its reliability. Intuitively, an optimal measurement should be able to perfectly separate the true correspondences from the false ones. In other words, the better the measurement distinguishes the true correspondences from the false ones, the more accurately the feature similarity based on it can be estimated. Therefore, the measurement accuracy can be modeled as the expected pureness.
Let T be the collection of all matched pairs of features, i.e.,

    T = {(x, y) | y ∈ T(x)}    (6)

The probability that a pair of features is a true match given the measurement value z can be expressed as

    p(T | z) = p((x, y) ∈ T | m(x, y) = z)    (7)

Furthermore, the probability of observing a measurement value z given a corresponding feature pair is

    p(z | T) = p(m(x, y) = z | (x, y) ∈ T)    (8)

Then, the accuracy of the similarity estimation is

    Acc(m) = ∫ p(T | z) × p(z | T) dz    (9)

with m some pairwise measurement and Acc(m) the accuracy of the model based on m. Since

    p(T | z) ≤ 1  and  ∫ p(z | T) dz = 1    (10)

the accuracy of a measure m is

    Acc(m) ≤ 1    (11)

and

    Acc(m) = 1 ⇔ p(T | z) = 1, ∀ p(z | T) > 0    (12)

This measure allows comparing the accuracy of different distance measurements, as will be shown in the next section.

3.3. Ground truth data generation
In order to model the property of
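Equation 5 can be evaluated directly once the two conditional densities and the prior ratio are available; a sketch with synthetic Gaussian stand-ins for the measured distance densities (the means, widths and prior ratio are illustrative, not the paper's fits):

```python
import numpy as np

def posterior_match(z, pdf_T, pdf_F, prior_ratio):
    """Eq. (5): p(match | measurement z), with prior_ratio = p(F) / p(T)."""
    return 1.0 / (1.0 + prior_ratio * pdf_F(z) / pdf_T(z))

# stand-in conditional densities for matching / non-matching distances
pdf_T = lambda z: np.exp(-0.5 * ((z - 0.4) / 0.1) ** 2)
pdf_F = lambda z: np.exp(-0.5 * ((z - 1.0) / 0.1) ** 2)
```

Even with a large prior ratio (non-matches vastly outnumber matches), small distances still yield a posterior near 1, while distances typical of non-matches are driven to 0.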
T(x_i), we simulate corresponding features using the following method. First, regions r_{i,0} are detected on a random set of images by the Hessian affine detector [15]. Then, we apply numerous random affine warpings (using the affine model proposed by ASIFT [25]) to r_{i,0}, generating a set of related regions. Finally, SIFT features are computed on all regions, resulting in {x_{i,1}, x_{i,2}, ..., x_{i,n}} as a subset of T(x_{i,0}).
The parameters of the simulated affine transformation are selected randomly, and some random jitter is added to model the detection errors occurring in a practical setting. The non-corresponding features F(x_i) are simply generated by selecting 500K random patches extracted from a different and unrelated dataset. In this way, we also generate a dataset D containing 100K matched pairs of features from different images, and 1M non-matched pairs. Figure 1 depicts two corresponding image patches randomly selected from the simulated data.
Figure 1. Corresponding image patches for two randomly selected points of the simulated data.

3.4. Query adaptive distance
It has been observed that the Euclidean distance is not an appropriate measurement for similarity [21, 16, 11]. We argue that the Euclidean distance is a robust estimator when normalized locally.
As an example, Figure 2 depicts the distributions of the Euclidean distance of the corresponding and non-corresponding features for the two different interest points shown in Figure 1. For each sample point x_i, we collected a set of 500 corresponding features T(x_i) using the procedure from Section 3.3 and a set of 500K random non-corresponding features F(x_i). It can be seen that the Euclidean distance separates the matching from the non-matching features quite well in the local neighborhood of a given query feature x_i.
Figure 2. Distribution of the Euclidean distance for two points from the simulated data. The solid lines show the distribution for corresponding features T(x_i); the dotted lines depict non-corresponding ones F(x_i).
However, by averaging the distributions of T(x_i) and F(x_i) respectively over all queries x_i, the Euclidean distance loses its discriminative power. This explains why the Euclidean distance has inferior performance in estimating visual similarity from a global point of view. A local adaptation is therefore necessary to recover the discriminability of the Euclidean distance.
Another property can also be observed in Figure 2: if a feature has a large distance to its correspondences, it also has a large distance to the non-matching features. By exploiting this property, a normalization of the distance can be derived for each query feature:

    d_n(x_i, y_j) = d(x_i, y_j) / N_d(x_i)    (13)

where d_n(·,·) represents the normalized distance, d(·,·) the original Euclidean distance, and N_d(x_i) the expected distance of x_i to its non-matching features. It is intractable to estimate the distance distribution between all features and their correspondences, but it is simple to estimate the expected distance to non-corresponding features. Since the non-corresponding features are independent of the query, a set of randomly sampled, thus unrelated, features can be used to represent the set of non-corresponding features for each query. Moreover, if we assume the distance distribution of the non-corresponding set to follow a normal distribution N(μ, σ), then the estimation error of its mean based on a subset follows another normal distribution N(0, σ/√N), with N the size of the subset. Therefore, N_d(x_i) can be estimated sufficiently well and very efficiently from even a small set of random, i.e. non-corresponding, features.
The probability that an unknown feature matches the query one when observing their distance z can be modeled as

    p(T | z) = N_T × p(z | T) / [N_T × p(z | T) + N_F × p(z | F)]
             = { 1 + (N_F / N_T) × p(z | F) / p(z | T) }^(−1)    (14)

with N_T and N_F the number of corresponding and
non-corresponding pairs, respectively. In practical settings, N_F is usually many orders of magnitude larger than N_T. Therefore, once p(z | F) starts getting bigger than 0, p(T | z) rapidly decreases, and the corresponding features quickly get confused with the non-corresponding ones. Figure 3 illustrates how the adaptive distance recovers more correct matches compared to the Euclidean distance.
Moreover, by assuming that N_F / N_T ≈ 1000, the measurement accuracy following Equation 9 can be computed. For the Euclidean distance, the estimation accuracy is 0.7291; for the adaptive distance, it is 0.7748. Our proposed distance thus significantly outperforms the Euclidean distance.

3.5. Similarity function
In this section, we show how to derive a globally appropriate feature similarity in a quantitative manner. After having established the distance distribution of the query adaptive distance in the previous section, the only unknown in Equation 5 remains p(y_j ∈ F_i) / p(y_j ∈ T_i).
As discussed in Section 3.1, this quantity is inversely proportional to the occurrence frequency of x_i, and it is generally a very large term. Assuming c = p(y_j ∈ F_i) / p(y_j ∈ T_i) to be between 10 and 100000, the full similarity function can be estimated; it is depicted in Figure 4.
The resulting curves follow an inverse sigmoid form such that the similarity is 1 for d_n → 0 and 0 for d_n → 1. They all have roughly the same shape and differ approximately only by an offset. It is to be noted that they show a very sharp transition, making it very difficult to correctly estimate the transition point and thus to achieve a good separation between true and false matches.
In order to reduce the estimation error due to such sharp transitions, a smoother curve would be desirable. Since the distance distributions are all long-tailed, we have fitted different kinds of exponential functions to those curves; however, we observe similar results. For simplicity, we choose to approximate the similarity
function as

    f(x_i, y_j) = exp(−α × d_n(x_i, y_j)^4)    (15)

As can be seen in Figure 4, this curve is flatter and covers approximately the full range of possible values for c. In Equation 15, α can be used to tune the shape of the final function and roughly steers its slope; we achieved best results with α = 9 and keep this value throughout all experiments. In the next section, the robustness of this function in a real image retrieval system is evaluated.

3.6. Overall method
In this section we integrate the query adaptive distance measurement and the similarity function presented
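Equation 15 with α = 9, combined with the β cut-off introduced in Section 3.6, amounts to only a few lines (the normalized distances are assumed precomputed):

```python
import numpy as np

ALPHA, BETA = 9.0, 0.85

def feature_similarity(d_n):
    """f = exp(-alpha * d_n^4); pairs with d_n > beta are dropped early,
    since their contribution to the score is negligible (cf. Table 2)."""
    d_n = np.asarray(d_n, dtype=float)
    sim = np.exp(-ALPHA * d_n ** 4)
    sim[d_n > BETA] = 0.0
    return sim
```

The quartic exponent keeps the similarity near 1 for small normalized distances and lets it decay smoothly, which is exactly the "flatter" behavior the paper prefers over the sharp analytical sigmoid.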
before,d n (x i ,y j )and N d (x i )areestimated using the random set of features.For retrieval,we use a standard bag-of-words inverted file.However,in order to have an estimation of the pairwise distance d (x i ,y j )between query and database features,we add a product quantization scheme as in [12]and select the same parameters as the original author.The feature space is firstly partitioned into N c =20 000V oronoi cells ac-cording to a coarse quantization codebook K c .All features located in the same V oronoi cell are grouped into the same inverted list.Each feature is further quantized with respect to its coarse quantization centroid.That is,the residual be-tween the feature and its closest centroid is equally split into m =8parts and each part is separately quantized according to a product quantization codebook K p with N p =256cen-troids.Then,each feature is encoded using its related image identifier and a set of quantization codes,and is stored in its corresponding inverted list.We select random features from Flickr and add 100of them to each inverted list.For performance reasons,we make sure that the random features are added to the inverted list before adding the database vectors.At query time,all inverted lists whose related coarse quantization centers are in the k nearest neighborhood of the query vector are scanned.With our indexing scheme,the distances to non-matching features are always computed first,with their mean value being directly N d (x i ).Then,the query adap-tive distance d n (x i ,y j )to each database vector can directly be computed as in Equation 13.In order to reduce un-necessary computation even more,a threshold βis used to quickly drop features whose Euclidean distance is larger than β×N d (x i ).This parameter has little influence on the retrieval performance,but reduces the computational load significantly.Its influence is evaluated in Section 4.As pointed out by [9],local features of an image tend to occur in bursts.In order to 
avoid multiple counting of statistically correlated features, we incorporate both "intra-burstiness" and "inter-burstiness" normalization [9] to re-weight the contributions of every pair of features. The similarity function thus changes to

    sim(q, d) = Σ_{i=1}^{m} Σ_{j=1}^{n} w(x_i, y_j) f(x_i, y_j)    (17)

with w(x_i, y_j) the burstiness weighting.

4. Experiments
In this part, we first introduce the evaluation protocol. Then we give some implementation details of our algorithm. Furthermore, we discuss the influence of each parameter and experimentally select the best ones. Finally, we evaluate each part of our method separately.

4.1. Datasets and performance evaluation protocol
We evaluated our method on the Oxford5k [19], Paris [20], Holidays [8] and Oxford105k datasets. Oxford105k consists of Oxford5k plus 100 285 distractor images, a set of random images that we downloaded from Flickr, having the same resolution of 1024×768 as the original Oxford5k dataset. We follow the same evaluation measurement as proposed in the original publications; that is, the mean average precision (mAP) is calculated as the overall performance of the retrieval system.

4.2. Implementation details
Preprocessing. For all experiments, all images are resized such that their maximum resolution is 1024×768. In each image, interest points are detected using the Hessian affine detector and a SIFT descriptor is computed around each point. As in [2], a square-root scaling is applied to each SIFT vector, yielding significantly better retrieval performance when using the Euclidean metric.
Codebook training. The vocabularies were trained on an independent dataset of images randomly downloaded from Flickr, in order to prevent overfitting to the test datasets.
Random feature dataset preparation. Random images from Flickr (different, however, from the codebook training dataset) are used to generate the random feature dataset.
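The image-level score of Equations 16 and 17 is then a weighted double sum over pairwise similarities; a sketch with an optional burstiness weight matrix (uniform by default, since the weighting scheme of [9] is not reproduced here):

```python
import numpy as np

def image_similarity(Dn, w=None):
    """Dn : (m, n) matrix of normalized distances between the m query
            features and the n database-image features.
    w      : optional (m, n) burstiness weights (Eq. 17); uniform if None.
    Returns the scalar image-to-image score of Eq. 16 / Eq. 17."""
    f = np.exp(-9.0 * Dn ** 4)        # Eq. 15 pairwise similarity, alpha = 9
    if w is None:
        w = np.ones_like(f)           # Eq. 16: unweighted accumulation
    return float((w * f).sum())
```

In the real system the double sum is never materialized: only pairs surviving the inverted-file scan and the β cut-off contribute nonzero terms.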
4.3. Parameter selection
In this section, we evaluate the retrieval performance of our approach on the Oxford5k dataset for different settings of the two parameters of our method: the number of random features in each inverted list, and the cut-off threshold β for filtering out features whose contribution is negligible.
The influence of the number of random features. Table 1 shows the retrieval performance as the number of random features per inverted list varies. The performance remains almost constant over a very large range, which supports the assumption that the mean distance of a query feature to the dissimilar features can be robustly estimated even from a small number of random features. We select 100 random features per inverted list throughout the rest of this paper.

Table 1. Influence of the size of the random feature set per inverted list on Oxford5k
    Length:  50      100     500     1000    10000
    mAP:     0.739   0.739   0.739   0.739   0.738

The influence of the cut-off threshold β. Table 2 shows that features with a distance larger than β × N_d(x_i), with β ∈ [0.8, 0.95], contribute almost nothing to the retrieval performance. In order to reduce the number of updates of the scoring table, we select β = 0.85 for all experiments.

Table 2. Influence of the cut-off value β on Oxford5k
    β:                   0.80    0.85    0.90    0.95
    similarity score:    0.025   0.009   0.003   0.001
    #selected features:  13      43      124     292
    mAP:                 0.733   0.739   0.740   0.739

4.4. Effectiveness of our method
Local adaptive distance. In order to compare the adaptive distance function to the Euclidean distance, we use a threshold for separating matching and non-matching features. Figure 5 shows the retrieval performance for a varying threshold, both for the Euclidean distance and for the adaptive distance. Overall, the best mAP using the adaptive distance is 3% better than with the Euclidean distance. Furthermore, the adaptive distance is less sensitive to the selection of a non-optimal threshold. It is to be noted that in the final setup, our method does not require any
thresholding.

[Figure 5: mAP vs. matching threshold for the query-adapted distance and the Euclidean distance. Caption: Comparison of our adaptive distance with the Euclidean distance on the Oxford5k dataset.]

Contributions of other steps. In order to quantify the contribution of the other steps contained in our method, we evaluate the performance of our method with each of them taken out of the pipeline. For the experiment on Oxford5k, we find that without the feature scaling, mAP drops from 0.739 to 0.707, while without burstiness weighting, mAP drops to 0.692. With multi-assignment on the query side only, mAP increases from 0.739 to 0.773 for MA = 5, and to 0.780 for MA = 10. MA denotes the number of inverted lists that are traversed per query feature.

5. Results

Throughout all experiments, the set of parameters was fixed to the values obtained in the previous section, and vocabularies were always trained on independent datasets. Table 3 shows the retrieval performance on all typical benchmarks, both with single assignment (SA) and multi-assignment (MA = 10). As expected, multi-assignment (scanning several inverted lists) reduces the quantization artifacts and improves the performance consistently, however, in exchange for more computational load.

Furthermore, we applied an image-level post-processing step on top of our method. We choose reciprocal nearest neighbors (RNN) [22], for the reason that it can easily be integrated on top of a retrieval system independently of the image similarity function. We adopt the publicly available code [1] provided by the original authors with the default settings. RNN significantly improves the results on the Oxford5k and Paris datasets, but slightly lowers the result on Holidays. Considering that RNN tries to exploit additional information contained in other relevant database images, which are scarce in Holidays (on average only 2 to 3 relevant database images per query), it is difficult for query expansion methods to perform much better there.
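The reciprocity criterion underlying the RNN post-processing can be sketched as below. This is a simplified illustration of the reciprocal-nearest-neighbor test only, not the authors' released code [1]; `sim` is an assumed precomputed image-similarity matrix.

```python
def knn_lists(sim, k):
    """For each image, the set of its k most similar images
    (by a precomputed similarity matrix sim, self excluded)."""
    n = len(sim)
    out = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sim[i][j], reverse=True)
        out.append(set(order[:k]))
    return out

def reciprocal_neighbors(sim, k):
    """Image pairs (i, j), i < j, that appear in each other's
    k-nearest-neighbor lists -- the reciprocity test that RNN-style
    re-ranking builds on."""
    nn = knn_lists(sim, k)
    return {(i, j) for i in range(len(sim)) for j in nn[i]
            if i in nn[j] and i < j}
```

The intuition matches the Holidays observation in the text: with only 2 to 3 relevant database images per query, few reciprocal pairs exist, so such post-processing has little to exploit.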
Dataset      SA      MA      MA+RNN
Oxford5k     0.739   0.780   0.850
Oxford105k   0.678   0.728   0.816
Paris        0.703   0.736   0.855
Holidays     0.814   0.821   0.801

Table 3. Performance of our method on public datasets.

Comparison with state-of-the-art

[Figure 6: Retrieval performance (mAP) as a function of the top k nearest neighbors used as similar features [12], on INRIA Holidays, Oxford5k, and Paris.]

We first compare the performance of our method to [12], which relies on using the top-k nearest neighbors under the Euclidean distance for selecting the similar features of a query. This work is closest to ours, both in memory overhead and in computational complexity. It can be seen in Figure 6 that no single k maximizes the performance for all datasets, showing that this parameter is very sensitive to the data. Moreover, our method outperforms the peak results of [12] consistently by roughly 10 points of mAP.

Table 4 shows the comparison to several other methods without applying any image-level post-processing step. As pointed out by [10], training a vocabulary on independent data rather than on the evaluated dataset itself better represents the search performance on a very large dataset. We therefore only compare to state-of-the-art methods using codebooks trained on independent datasets. We achieve the best performance on Oxford5k, Oxford105k, and Holidays, and fall only slightly behind [16] on Paris.

Dataset      Ours    [16]      [7]     [18]
Oxford5k     0.780   0.742     0.704   0.725
Oxford105k   0.728   0.675*    -       0.652
Paris        0.736   0.749     -       -
Holidays     0.821   0.749**   0.817   0.769/0.818**

Table 4. Comparisons with state-of-the-art methods without applying image-level post-processing. * indicates the score obtained by merging Oxford5k, Paris, and 100k distractor images. ** denotes the result obtained by manually rotating all images in the Holidays dataset to be upright.

Furthermore, Table 5 gives a comparison of the results when additional image-level post-processing steps are applied. We argue that any post-processing step can directly benefit from our method, and illustrate with RNN as an example that the best
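For contrast with the adaptive cut-off, the fixed top-k selection used by the baseline [12] can be sketched as follows (a hypothetical illustration; the distances are placeholder values):

```python
import numpy as np

def topk_similar(distances, k):
    """Indices of the k candidate features closest to the query
    feature under the Euclidean distance, regardless of how far
    they actually are -- the fixed-k rule of the baseline [12]."""
    distances = np.asarray(distances)
    k = min(k, len(distances))
    return np.argsort(distances)[:k]
```

Because k is fixed per dataset rather than adapted per query feature, no single value works well everywhere, which is the sensitivity Figure 6 illustrates.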
performance can be achieved.

Dataset      Ours+RNN   [16]      [18]    [2]
Oxford5k     0.850      0.849     0.822   0.809
Oxford105k   0.816      0.795     0.772   0.722
Paris        0.855      0.824     -       0.765
Holidays     0.801      0.758**   0.78    -

Table 5. Comparisons with state-of-the-art methods with image-level post-processing. ** denotes the result obtained by manually rotating all images in the Holidays dataset to be upright.

In all of the previous experiments, each feature costs 12 bytes of memory: 4 bytes for the image identifier and 8 bytes for the quantization codes. As [11] mainly shows results using more bytes for feature encoding, we also compare our method to theirs with more bytes per feature. As shown in Table 6, using more bytes further improves the retrieval results. Even with fewer bytes than [11], better performance is achieved on all datasets.

In all experiments, we compare favorably to the state-of-the-art by exploiting a simple similarity function without any parameter tuning per dataset. The good results
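The 12-byte-per-feature memory layout (a 4-byte image identifier plus 8 bytes of quantization codes) could be packed as in the sketch below. The field layout beyond the stated sizes is an assumption for illustration, not the paper's actual encoding.

```python
import struct

# One inverted-list entry: 4-byte image id + 8 bytes of
# quantization codes, little-endian, no padding.
ENTRY = struct.Struct("<IQ")  # I = uint32, Q = uint64 -> 12 bytes

def pack_entry(image_id, codes):
    """Serialize one posting-list entry to its 12-byte form."""
    return ENTRY.pack(image_id, codes)

def unpack_entry(buf):
    """Recover (image_id, codes) from a 12-byte entry."""
    return ENTRY.unpack(buf)
```

At this size, the Oxford105k index (on the order of a hundred million features) stays around a gigabyte, which is what makes the 12-byte budget attractive.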