A Probabilistic Model for Semantic Word Vectors

Andrew L. Maas and Andrew Y. Ng
Computer Science Department
Stanford University
Stanford, CA 94305
[amaas,ang]@

Abstract

Vector representations of words capture relationships in words' functions and meanings. Many existing techniques for inducing such representations from data use a pipeline of hand-coded processing techniques. Neural language models offer principled techniques to learn word vectors using a probabilistic modeling approach. However, learning word vectors via language modeling produces representations with a syntactic focus, where word similarity is based upon how words are used in sentences. In this work we wish to learn word representations to encode word meaning, that is, semantics. We introduce a model which learns semantically focused word vectors using a probabilistic model of documents. We evaluate the model's word vectors in two tasks of sentiment analysis.

1 Introduction

Word representations are a critical component of many natural language processing systems. Representing words as indices in a vocabulary fails to capture the rich structure of synonymy and antonymy among words. Vector representations encode continuous similarities between words as distance or angle between word vectors in a high-dimensional space. Word representation vectors have proved useful in tasks such as named entity recognition, part of speech tagging, and document retrieval [23, 6, 21].

Neural language models [2, 6, 14, 15] induce word vectors by back-propagating errors in a language modeling task through nonlinear neural networks or linear transforms. Language modeling, predicting the next word in a sentence given a few preceding words, is primarily a syntactic task. Issues of syntax concern word function and the structural arrangement of words in a sentence, while issues of semantics concern word meaning. Learning word vectors using the syntactic task of language modeling produces representations which are syntactically focused. Word similarities with syntactic focus would pair "wonderful" with other highly polarized adjectives such as "terrible" or "awful." These similarities result from the fact that these words have similar syntactic properties; they are likely to occur at the same location in sentences like "the food was absolutely." In contrast, word representations capturing semantic similarity would associate "wonderful" with words of similar meaning such as "fantastic" and "prize-winner" because they have similar meaning despite possible differences in syntactic function. The construction of neural language models makes them unable to learn word representations which are primarily semantic.

Neural language models are instances of vector space models (VSMs), which broadly refer to any method for inducing vector representations of words. Turney and Pantel [23] give a recent review of both syntactic and semantic vector space models. Most VSMs implement some combination of weighting, smoothing, and dimension reducing a word association matrix (e.g. TF-IDF weighting). For semantic or syntactic word representations, VSMs use a term-document or word-context matrix respectively for word association. For each VSM processing stage there are dozens of possibilities, making the design space of VSMs overwhelming. Furthermore, many methods have little theoretical foundation, and a particular weighting or dimension reduction technique is selected simply because it has been shown to work in practice. Neural language models offer a VSM for syntactic word vectors which has a complete probabilistic foundation.
The present work offers a similarly well-founded probabilistic model which learns semantic, as opposed to syntactic, word vectors.

This work develops a model which learns semantically oriented word vectors using unsupervised learning. Word vectors are discovered from data as part of a probabilistic model of word occurrence in documents similar to a probabilistic topic model. Learning vectors from document-level word co-occurrence allows our model to learn word representations based on the topical information conveyed by words. Building a VSM with probabilistic foundation allows us to offer a principled solution to word vector learning in place of the hand-designed processing pipelines typically used. Our experiments show that our model learns vectors more suitable for document-level tasks when compared with other VSMs.

2 Log-Bilinear Language Model

Prior work introduced neural probabilistic language models [2], which predict the n-th word in a sequence given the n-1 preceding context words. More formally, a model defines a distribution P(w_n | w_{1:n-1}) where the number of context words is often small (n ≤ 6). Neural language models encode this distribution using word vectors. Let φ_w be the vector representation of word w; a neural language model uses P(w_n | w_{1:n-1}) = P(w_n | φ_{1:n-1}).

Mnih and Hinton [14] introduce a neural language model which uses a log-bilinear energy function (lblLm). The model parametrizes the log probability of a word occurring in a given context using an inner product of the form

$$\log p(w_n \mid w_{1:n-1}) = \log p(w_n \mid \phi_{1:n-1}) \propto \Big\langle \phi_n,\ \sum_{i=1}^{n-1} \phi_i^T C_i \Big\rangle. \qquad (1)$$

This is an inner product between the query word's representation φ_n and a sum of the context words' representations after each is transformed by a position-specific matrix C_i. The φ vectors learned as part of the language modeling task are useful features for syntactic natural language processing tasks such as named-entity recognition and chunking [21]. As a VSM, the lblLm is a theoretically well-founded approach to learning syntactic word representations from word-context information.

The lblLm method does not provide a tractable solution for inducing word vectors from term-document data. The model introduces a transform matrix C_i for each context word, which causes the number of model parameters to grow linearly as the number of context words increases. For 100-dimensional word representation vectors, each C_i contains 10^4 parameters, which makes for an unreasonably large number of parameters when trying to learn representations from documents containing hundreds or thousands of words. Furthermore, it is unclear how the model could handle documents of variable length, or if predicting a single word given all other words in the document is a good objective for training semantic word representations. Though the details of other neural language models differ, they face similar challenges in learning semantic word vectors because of their parametrization and language modeling objective.
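As a concrete illustration of Equation (1), the following is a minimal numpy sketch of the log-bilinear context scoring. The dimensions, random initialization, and variable names are illustrative assumptions, not values or code from [14]; it is only meant to show how each context position carries its own D x D transform, which is the source of the parameter growth discussed above.

```python
import numpy as np

# Minimal sketch of the lblLm score in Equation (1) under assumed toy sizes:
# V words, D-dimensional vectors, and a fixed number of context positions,
# each with its own D x D transform matrix C_i. Random values stand in for
# trained parameters.
rng = np.random.default_rng(0)
V, D, n_context = 1000, 50, 4

Phi = 0.1 * rng.normal(size=(V, D))          # one row per word representation phi_w
C = 0.1 * rng.normal(size=(n_context, D, D))  # one transform per context position

def next_word_log_probs(context_ids):
    # Sum of position-transformed context vectors: sum_i phi_i^T C_i
    pred = sum(Phi[w] @ C[i] for i, w in enumerate(context_ids))
    scores = Phi @ pred                       # inner product with every candidate word
    return scores - np.logaddexp.reduce(scores)  # log-softmax over the vocabulary

log_p = next_word_log_probs([3, 17, 42, 8])
print(log_p.shape, float(np.exp(log_p).sum()))   # (1000,) and ~1.0
```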
The lblLm method does not provide a tractable solution for inducing word vectors from term-document data.The model introduces a transform matrix C i for each context word,which causes the number of model parameters to grow linearly as the number of context words increases.For 100-dimensional word representation vectors,each C i contains104parameters,which makes for an unreasonably large number of parameters when trying to learn representations from documents containing hundreds or thousands of words.Furthermore it is unclear how the model could handle documents of variable length,or if predicting a single word given all other words in the document is a good objective for training semantic word representations.Though the details of other neural language models differ,they face similar challenges in learning semantic word vectors because of their parametrization and language modeling objective.3Log-Bilinear Document ModelWe now introduce a model which learns word representations from term-document information using principles similar to those used in the lblLm and other neural language models.However unlike previous work in the neural language model literature our model naturally handles term-document data to learn semantic word vectors.We derive a probabilistic model with log-bilinear energy function to model the bag of words distribution of a document.This approach naturally handles long,variable length documents,and learns representations sensitive to long-range word correlations.Maximum likelihood learning can then be efficiently performed with coordinate ascent optimization.3.1ModelStarting with the broad goal of matching the empirical distribution of words in a document,we model a document using a continuous mixture distribution over words indexed by a random variable θ.We assume words in a document are conditionally independent given the mixture variableθ. We assign a probability to a document d using a joint distribution over the document and a random variableθ.The model assumes each word w i∈d is conditionally independent of the other words givenθ.The probability of a document is thus,p(d)= p(d,θ)dθ= p(θ)N i=1p(w i|θ)dθ.(2) Where N is the number of words in d and w i is the i th word in d.We use a Gaussian prior onθ. 
We define the conditional distribution p(w i|θ)using a log-linear model with parameters R and b.The energy function uses a word representation matrix R∈R(βx|V|)where each word w(represented as a one-hot vector)in the vocabulary V has aβ-dimensional vector representationφw=Rw corresponding to that word’s column in R.The random variableθis also aβ-dimensional vector,θ∈R(β)which weights each of theβdimensions of words’representation vectors.We additionally introduce a bias b w for each word to capture differences in overall word frequencies.The energy assigned to a word w given these model parameters is,E(w;θ,φw,b w)=−θTφw−b w.(3) To obtain thefinal distribution p(w|θ)we use a softmax,p(w|θ;R,b)=exp(−E(w;θ,φw,b w))w′∈V exp(−E(w′;θ,φw′,b w′))=exp(θTφw+b w)w′∈V exp(θTφw′+b w′).(4)The number of terms in the denominator’s summation grows linearly in|V|,making exact compu-tation of the distribution possible.For a givenθ,a word w’s occurrence probability is proportional to how closely its representation vectorφw matches the scaling direction ofθThis idea is similar to the word vector inner product used in the lblLm model.Equation2resembles the probabilistic model of latent Dirichlet allocation(LDA)[3],which models documents as mixtures of latent topics.One could view the entries of a word vectorφas that word’s association strength with respect to each latent topic dimension.The random variableθthen defines a weighting over topics.However,our model does not attempt to model individual topics,but instead directly models word probabilities conditioned on the topic weighting variableθ.Because of the log-linear formulation of the conditional distribution,θis a vector in Rβand not restricted to the unit simplex as it is in LDA.3.2LearningGiven a document collection D,we assume documents are i.i.d samples and denote the k th docu-ment as d k.We wish to learn model parameters R and b to maximize,maxR,bp(D;R,b)= d k∈D p(θ)N k i=1p(w i|θ;R,b)dθ.(5) Using MAP estimates forθ,we approximate this learning problem as,max R,b d k∈D p(ˆθk)N ki=1p(w i|ˆθk;R,b),(6)whereˆθk denotes the MAP estimate ofθfor d k.We introduce a regularization term for the word representation matrix R.The word biases b are not regularized reflecting the fact that we want the biases to capture whatever overall word frequency statistics are present in the data.By taking the logarithm and simplifying we obtain thefinal learning problem,maxR,bλ||R||2F+ d k∈Dλ||ˆθk||22+N k i=1log p(w i|ˆθk;R,b).(7)The free parameters in the model are the regularization weightλand the word vector dimensionality β.We use a single regularization weightλfor R andθbecause the two are linearly linked in the conditional distribution p(w|θ;R,b).The problem offinding optimal values for R and b requires optimization of the non-convex objec-tive function.We use coordinate ascent,whichfirst optimizes the word representations(R and b)while leaving the MAP estimates(ˆθ)fixed.Then wefind the new MAP estimate for each document while leaving the word representationsfixed,and continue this process until convergence.The opti-mization algorithm quicklyfinds a global solution for eachˆθk because we have a low-dimensional, convex problem in eachˆθk.Because the MAP estimation problems for different documents are in-dependent,we can solve them on separate machines in parallel.This facilitates scaling the model to document collections with hundreds of thousands of documents.4ExperimentsWe evaluate our model with document-level and sentence-level categorization tasks in the domain of 
4 Experiments

We evaluate our model with document-level and sentence-level categorization tasks in the domain of online movie reviews. These are sub-tasks of sentiment analysis, which has recently received much attention as a challenging set of problems in natural language processing [4, 18, 22]. In both tasks we compare our model with several existing methods for word vector induction, and with previously reported results from the literature. We also qualitatively evaluate the model's word representations by visualizing word similarities.

4.1 Word Representation Learning

We induce word representations with our model using 50,000 movie reviews from The Internet Movie Database (IMDB). Because some movies receive substantially more reviews than others, we limited ourselves to including at most 30 reviews from any movie in the collection. Previous work [5] shows that function and negating words, usually treated as stop words, are in fact indicative of sentiment, so we build our dictionary by keeping the 20,000 most frequent unigram tokens without stop word removal. Additionally, because certain non-word tokens (e.g. "!" and ":-)") are indicative of sentiment, we allow them in our vocabulary.

As a qualitative assessment of word representations, we visualize the words most similar to a query word using vector similarity of the learned representations. Given a query word w and another word w', we obtain their vector representations φ_w and φ_{w'} and evaluate their cosine similarity as

Similarity(\phi_w, \phi_{w'}) = \frac{\phi_w^T \phi_{w'}}{\lVert \phi_w \rVert \cdot \lVert \phi_{w'} \rVert}.

By assessing the similarity of w with all other words w' in the vocabulary, we can find the words deemed most similar by the model. Cosine similarity is often used with word vectors because it ignores differences in magnitude.

Table 1 shows the most similar words to given query words using our model's word representations. The vector similarities capture our intuitive notions of semantic similarity. The most similar words have a broad range of part-of-speech and functionality, but adhere to the theme suggested by the query word. Previous work on term-document VSMs demonstrated similar results, and compared the recovered word similarities to human concept organization [12, 20]. Table 1 also shows the most similar words to query words using word vectors trained via the lblLm on news articles (obtained already trained from [21]). Word similarities captured by the neural language model are primarily syntactic, where part-of-speech similarity dominates semantic similarity. Word vectors obtained from LDA perform poorly on this task (not shown), presumably because LDA word/topic distributions do not meaningfully embed words in a vector space.

Table 1: Similarity of learned word vectors. The five words most similar to the target word (top row) using cosine similarity applied to the word vectors discovered by our model and the log-bilinear language model.

             romance        mothers    murder       comedy     awful         amazing
Our Model    romantic       lesbian    murdered     funny      terrible      absolutely
             love           mother     crime        laughs     horrible      fantastic
             chemistry      jewish     murders      hilarious  ridiculous    truly
             relationship   mom        committed    serious    bad           incredible
             drama          tolerance  murderer     few        stupid        extremely
LblLm        colours        parents    fraud        drama      unsettling    unbelievable
             paintings      families   kidnapping   monster    vice          incredible
             joy            veterans   rape         slogan     energetic     obvious
             diet           patients   corruption   guest      hires         perfect
             craftsmanship  adults     conspiracy   mentality  unbelievable  clear
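For reference, the nearest-word queries behind Table 1 amount to a few lines of NumPy; the sketch below assumes a vocabulary list aligned with the columns of R and is illustrative rather than the exact code used to produce the table.

import numpy as np

def most_similar(query, vocab, R, k=5):
    """Return the k words with highest cosine similarity to `query`.

    vocab : list of words, aligned with the columns of R
    R     : (beta, V) word representation matrix; column i represents vocab[i]
    """
    unit = R / np.linalg.norm(R, axis=0, keepdims=True)  # length-normalize every column
    q = unit[:, vocab.index(query)]
    sims = q @ unit                                      # cosine similarity to all words
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked if vocab[i] != query][:k]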
4.2 Other Word Representations

We implemented several alternative vector space models for comparison. With the exception of the lblLm, we induce word representations for each of the models using the same training data used to induce our own word representations.

Latent Semantic Analysis (LSA) [7]. One of the most commonly used tools in information retrieval, LSA applies the singular value decomposition (SVD) algorithm to factor a term-document co-occurrence matrix. To obtain a k-dimensional representation for a given word, only the entries corresponding to the k largest singular values are taken from the word's basis in the factored matrix.

Latent Dirichlet Allocation (LDA) [3]. LDA is a probabilistic model of documents which assumes each document is a mixture of latent topics. This model is often used to categorize or cluster documents by topic. For each latent topic, the model learns a conditional distribution p(word | topic) for the probability that a word occurs within the given topic. To obtain a k-dimensional vector representation of each word w, we use each p(w | topic) value in the vector after training a k-topic model on the data. We normalize this vector to unit length because more frequent words often have high probability in many topics. To train the LDA model we use code released by the authors of [3]. When training LDA we remove very frequent and very rare words from our vocabulary.

Log-Bilinear Language Model (lblLm) [15]. This is the model given in [14] and discussed in Section 2, but extended to reduce training time. We obtained the word representations from this model used in [21], which were trained on roughly 37 million words from a news corpus with a context window of size five.

4.3 Sentiment Classification

Our first evaluation task is document-level sentiment classification. A classifier must predict whether a given review is positive or negative (thumbs up vs. thumbs down) given only the text of the review. As a document-level categorization task, sentiment classification is substantially more difficult than topic-based categorization [22]. We chose this task because word vectors trained using term-document matrices are most commonly used in document-level tasks such as categorization and retrieval.

The evaluation dataset is the polarity dataset version 2.0 introduced by Pang and Lee [17].¹ This dataset consists of 2,000 movie reviews, where each is associated with a binary sentiment polarity label. We report 10-fold cross validation results using the authors' published folds to make our results comparable to those previously reported in the literature. We use a linear support vector machine (SVM) classifier trained with LibLinear [8] and set the SVM regularization parameter to the same value used in [18, 17].

¹ /people/pabo/movie-review-data

Because we are interested in evaluating the capabilities of various word representation learners, we use as features the mean representation vector, an average of the word representations for all words present in the document. The number of times a word appears in a document is often used as a feature when categorizing documents by topic. However, previous work found a binary indicator of whether or not the word is present to be a more useful feature in sentiment classification [22, 18]. For this reason we used term presence for our bag-of-words features. We also evaluate performance using mean representation vectors concatenated with the original bag-of-words vector. In all cases we normalize each feature vector to unit norm, and, following the technique of [21], scale the word representation matrices to have unit standard deviation.
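A minimal sketch of the feature construction just described (mean word vector, binary term-presence vector, and their concatenation, each scaled to unit norm); the tokenization and variable names are assumptions for illustration, and R is taken to be already scaled to unit standard deviation as in [21].

import numpy as np

def document_features(tokens, vocab, R):
    """Build the three feature variants compared in Tables 2 and 3 for one document.

    tokens : list of tokens in the document (or sentence)
    vocab  : dict mapping word -> column index of R
    R      : (beta, V) word representation matrix
    """
    idx = [vocab[t] for t in tokens if t in vocab]
    bow = np.zeros(R.shape[1])
    bow[idx] = 1.0                                           # term presence, not counts
    mean_vec = R[:, idx].mean(axis=1) if idx else np.zeros(R.shape[0])

    def unit(v):
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    return unit(mean_vec), unit(bow), unit(np.concatenate([mean_vec, bow]))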
Table 2: Sentiment classification results on the movie review dataset from [17]. Features labeled with "mean" are arithmetic means of the word vectors for words present in the review. Our model's representation outperforms other word vector methods, and is competitive with systems specially designed for sentiment classification.

Features                            Accuracy (%)
Bag of Words (BoW)                  86.75
LblLm Mean                          71.30
LDA Mean                            66.70
LSA Mean                            77.45
Our Method Mean                     88.50
LblLm Mean + BoW                    86.10
LDA Mean + BoW                      86.70
LSA Mean + BoW                      85.25
Our Method Mean + BoW               89.35
BoW SVM reported in [17]            87.15
Contextual Valence Shifters [11]    86.20
δ TF-IDF Weighting [13]             88.10
Appraisal Taxonomy [25]             90.20

Table 2 shows the classification performance of our method, the other VSMs we implemented, and previously reported results from the literature. Our method's features clearly outperform those of other VSMs. On its own, our method's word vectors outperform bag-of-words features with two orders of magnitude fewer features. When concatenated with the bag-of-words features, our method is competitive with previously reported results which use models engineered specifically for the task of sentiment classification. To our knowledge, the only method which outperforms our model's mean vectors concatenated with bag-of-words features is the work of Whitelaw et al. [25]. This work builds a feature set of adjective phrases expressing sentiment using hand-selected words indicative of sentiment, WordNet, and online thesauri. That such a task-specific model narrowly outperforms our method is evidence for the power of unsupervised feature learning.

4.4 Subjectivity Detection

As a second evaluation task, we performed sentence-level subjectivity classification. In this task, a classifier is trained to decide whether a given sentence is subjective, expressing the writer's opinions, or objective, expressing purely facts. We used the dataset of Pang and Lee [17], which gathered subjective sentences from movie review summaries and objective sentences from movie plot summaries. This task is substantially different from the review classification task because it uses sentences as opposed to entire documents, and the target concept is subjectivity instead of opinion polarity. We randomly split the 10,000 examples into 10 folds and report 10-fold cross validation accuracy using the SVM training protocol of [17].
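Both classification experiments follow the same protocol: a linear SVM over the fixed document features, evaluated by 10-fold cross validation. The sketch below uses scikit-learn's LinearSVC (a wrapper around the LIBLINEAR solver used here) as a stand-in, and the regularization value C is a placeholder rather than the value taken from [18, 17].

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def cross_validated_accuracy(X, y, C=1.0):
    """X: (n_examples, n_features) unit-norm feature vectors; y: binary labels."""
    clf = LinearSVC(C=C)                          # C is a placeholder value
    scores = cross_val_score(clf, X, y, cv=10)    # 10-fold cross validation
    return scores.mean()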
Table 3 shows classification accuracies from the sentence subjectivity experiment. Our model provided superior features when compared against the other VSMs, and slightly outperformed the bag-of-words baseline. Further improvement over the bag-of-words baseline is obtained by concatenating the two sets of features together.

Table 3: Sentence subjective/objective classification accuracies using the movie review subjectivity dataset of [17]. Features labeled with "mean" are arithmetic means of the word vectors for words present in the sentence.

Features                    Accuracy (%)
Bag of Words (BoW)          90.25
LblLm Mean                  78.45
LDA Mean                    66.65
LSA Mean                    84.11
Our Method Mean             90.36
LblLm Mean + BoW            87.29
LDA Mean + BoW              88.82
LSA Mean + BoW              88.75
Our Method Mean + BoW       91.54
BoW SVM reported in [17]    90

5 Related Work

Prior work has developed several models to learn word representations via a probabilistic language modeling objective. Mnih and Hinton [14, 15] introduced an energy-based log-bilinear model for word representations, following earlier work on neural language models [2, 16]. Successful applications of these word representation learners and other neural network models include semantic role labeling, chunking, and named entity recognition [6, 21]. In contrast to the syntactic focus of language models, probabilistic topic models aim to capture document-level correlations among words [20]. Our probabilistic model is similar to LDA [3], which is related to pLSI [10]. However, pLSI does not give a well-defined probabilistic model over previously unseen novel documents. The recently introduced replicated softmax model [19] uses an undirected graphical model to learn topics in a document collection. Turney and Pantel [23] offer an extensive review of VSMs, which employ a matrix factorization technique after applying some weighting or smoothing operation to the matrix entries.

Several recent techniques learn word representations in a principled manner as part of an application of interest. These applications include retrieval and ranking systems [1, 9], and systems that represent images and textual tags in the same vector space [24]. Our work learns word representations via the more basic task of topic modeling, as compared to these more specialized representation learners.
6 Discussion

We presented a vector space model which learns semantically sensitive word representations via a probabilistic model of word occurrence in documents. Its probabilistic foundation gives a theoretically justified technique for word vector induction as an alternative to the overwhelming number of matrix-factorization-based techniques commonly used. Our model is parametrized as a log-bilinear model, following recent success in using similar techniques for language models [2, 6, 14, 15]. By assuming word order independence and replacing the language modeling objective with a document modeling objective, our model captures word relations at the document level.

Our model's foundation is closely related to probabilistic latent topic models [3, 20]. However, we parametrize our topic model in a manner which aims to capture word representations instead of latent topics. In our experiments, our method performed better than LDA, which models latent topics directly.

We demonstrated the utility of our learned word vectors on two tasks of sentiment classification. Both were tasks of a semantic nature, and our method's word vectors performed better than word vectors trained with the more syntactic objective of language modeling. Using the mean of word vectors to represent documents ignores vast amounts of information that could help categorization (negated phrases, for example). Future work could better capture the information conveyed by words in sequence using convolutional models over word vectors.

Acknowledgments

We thank Chris Potts, Dan Ramage, Richard Socher, and Chris Manning for insightful discussions. This work is supported by the DARPA Deep Learning program under contract number FA8650-10-C-7020.

References

[1] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In Proceedings of CIKM, 2009.
[2] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(6):1137–1155, August 2003.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(4-5):993–1022, May 2003.
[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the ACL, 2007.
[5] C. K. Chung and J. W. Pennebaker. The psychological function of function words. Social Communication, pages 343–359, 2007.
[6] R. Collobert and J. Weston. A unified architecture for natural language processing. In Proceedings of the 25th ICML, 2008.
[7] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, September 1990.
[8] R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
[9] D. Grangier, F. Monay, and S. Bengio. A discriminative approach for the retrieval of images from text queries. In Proceedings of the ECML, 2006.
[10] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of ACM SIGIR, 1999.
[11] A. Kennedy and D. Inkpen. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110–125, May 2006.
[12] T. K. Landauer, P. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25(2):259–284, 1998.
[13] J. Martineau and T. Finin. Delta TFIDF: An improved feature space for sentiment analysis. In Proceedings of the Third AAAI International Conference on Weblogs and Social Media, 2009.
[14] A. Mnih and G. E. Hinton. Three new graphical models for statistical language modelling. In Proceedings of the 24th ICML, 2007.
[15] A. Mnih and G. E. Hinton. A scalable hierarchical distributed language model. In Neural Information Processing Systems, volume 22, 2009.
[16] F. Morin and Y. Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, 2005.
[17] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, 2004.
[18] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Empirical Methods in Natural Language Processing, 2002.
[19] R. Salakhutdinov and G. E. Hinton. Replicated softmax: An undirected topic model. In Advances in Neural Information Processing Systems, volume 22, 2009.
[20] M. Steyvers and T. L. Griffiths. Probabilistic topic models. In Latent Semantic Analysis: A Road to Meaning, 2006.
[21] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the ACL, 2010.
[22] P. D. Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the ACL, 2002.
[23] P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188, 2010.
[24] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. In Proceedings of the ECML, 2010.
[25] C. Whitelaw, N. Garg, and S. Argamon. Using appraisal taxonomies for sentiment analysis. In Proceedings of CIKM, 2005.