Spatio-temporal Shape and Flow Correlation for Action Recognition

Yan Ke¹, Rahul Sukthankar²,¹, Martial Hebert¹
¹School of Computer Science, Carnegie Mellon; ²Intel Research Pittsburgh

{yke,rahuls,hebert}@cs.cmu.edu

Abstract

This paper explores the use of volumetric features for action recognition. First, we propose a novel method to correlate spatio-temporal shapes to video clips that have been automatically segmented. Our method works on over-segmented videos, which means that we do not require background subtraction for reliable object segmentation. Next, we discuss and demonstrate the complementary nature of shape- and flow-based features for action recognition. Our method, when combined with a recent flow-based correlation technique, can detect a wide range of actions in video, as demonstrated by results on a long tennis video. Although not specifically designed for whole-video classification, we also show that our method's performance is competitive with current action classification techniques on a standard video classification dataset.
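
The "volumetric features" above are computed over a clip treated as a single space-time volume, i.e., frames stacked along a time axis. As a minimal illustrative sketch of that data structure (not code from the paper; OpenCV and numpy assumed, with a placeholder file name):

```python
# Illustrative only: build the space-time volume V(t, y, x) that volumetric
# methods operate on, by stacking grayscale frames along a time axis.
# "clip.avi" is a hypothetical input path.
import cv2
import numpy as np

def load_space_time_volume(path):
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    # Shape (T, H, W): volume[t] is frame t, and volume[:, y, x] is the
    # temporal profile of a single pixel through the clip.
    return np.stack(frames, axis=0)

volume = load_space_time_volume("clip.avi")
print(volume.shape)  # e.g., (120, 240, 320) for a 120-frame clip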

1. Introduction

The goal of action recognition is to localize a particular event of interest in video, such as a tennis serve, both in space and in time. Just as object recognition is a key problem in image understanding, action recognition is a fundamental challenge for interpreting video. A recent trend in action recognition has been the emergence of techniques based on the volumetric analysis of video, where a sequence of images is treated as a three-dimensional space-time volume. Eschewing the building of explicit models of the actor or environment (e.g., kinematic models of humans), these approaches attempt to perform recognition directly on the raw video. An obvious benefit is that recognition need not be limited to a specific set of actors or actions but can, in principle, extend to a variety of events, given appropriate training data. The drawback is that volumetric representations do not easily generalize across appearance changes due to different actors, varying environmental conditions and camera viewpoint. This observation has motivated the employment of video features that are robust to appearance; these can be broadly categorized as shape-based (e.g., background-subtracted human silhouettes) and flow-based (e.g., motion fields generated using optical flow). However, as discussed below, both of these types of methods have significant limitations.

Figure 1. Our goal is to detect specific actions in realistic videos with cluttered environments. First, we segment input video into space-time volumes. Then, we correlate action templates with the volumes using shape and flow features. We are able to localize events in space-time without the need for background-subtracted videos.

Silhouette-based approaches attempt to recognize actions by characterizing the shape of the actor's silhouette through space-time, and thus are robust to variations in clothing and lighting [2, 3, 21]. There are two major limitations with such approaches. First, they assume that the silhouettes can be accurately delineated from the background. Second, they assume that the entire person is represented as one region. Therefore, such techniques typically require static cameras and a good background model. Unfortunately, even state-of-the-art background subtraction techniques generate holes when parts of the actor blend in with the background, or create protrusions on the silhouette when strong shadows are present. These artifacts consequently reduce the accuracy of shape-based action recognition techniques. A more subtle limitation of silhouette-based techniques is that they ignore features inside the boundary, such as internal motion of the object.

Flow-based techniques estimate the optical flow field between adjacent frames and use that as the basis for action recognition.
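
As a concrete illustration of the "optical flow field between adjacent frames" that such techniques start from, the sketch below computes dense Farneback flow over the space-time volume from the earlier snippet. This is a generic stand-in for exposition, not the specific flow machinery of any of the methods cited in this paper:

```python
# Illustrative sketch of the raw signal that flow-based methods build on:
# a dense optical flow field between each pair of adjacent frames.
import cv2
import numpy as np

def dense_flow_field(volume):
    """volume: (T, H, W) uint8 grayscale. Returns (T-1, H, W, 2) flow."""
    flows = []
    for t in range(len(volume) - 1):
        flow = cv2.calcOpticalFlowFarneback(
            volume[t], volume[t + 1], None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow)  # flow[y, x] = (dx, dy) displacement in pixels
    return np.stack(flows, axis=0)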

Figure 2. Illustration of actions detected in a tennis sequence: (a) serve action; (b) run right action; (c) return serve action. Top row: templates; bottom row: example detections.

Ke et al. learn a discriminative cascade of 3D box features on the flow [12]. Shechtman and Irani use a template matching approach to correlate the flow consistency between the template and the video [19]. In addition to being invariant to appearance variations, an important advantage of flow-based approaches is that they require no background subtraction and thus these methods can process videos with limited camera motion. However, optical flow is a very coarse feature and therefore many scenes are likely to exhibit similar flows over short periods of time. For example, Ke et al. observed that in the KTH actions dataset [18], their boxing detector was triggered near a hand-clap action because those regions contained the same flow [12].

Common to all appearance-based approaches are limitations due to changes in camera view and variability in the speed of actions. Very few representations are robust to these variations, and the standard approach is to span the space of variations using multiple training examples. Others have attempted to use space-time interest point features for added robustness [10, 16, 18]. While the sparsity of the interest points is certainly appealing from an efficiency standpoint, it is unclear how these methods compare against volumetric approaches. This paper evaluates shape- and flow-based volumetric features against interest point techniques on a standard dataset.

Our paper makes two major contributions. First, we propose a simple yet effective shape-based representation for matching videos that does not require background subtraction, nor explicit background models. Second, we combine our shape-based method with recent flow-based techniques and demonstrate improved recognition performance. Our shape-based matching consists of spatio-temporal region extraction and region matching. For region extraction, we employ an unsupervised clustering technique to segment the video into three-dimensional volumes that are internally consistent in appearance; we term these "supervoxels" since they are conceptually analogous to superpixels [17]. We observe that real object boundaries in spatio-temporal volumes typically fall on supervoxel borders, just as superpixel borders correspond to useful segmentation boundaries [15]. As with all bottom-up segmentation techniques, we do not expect the region extractor to segment the entire object as a single region, and thus we err on the side of over-segmentation. We propose a shape matching technique that works despite over-segmented videos. This is similar in spirit to recent work in shape-guided figure-ground segmentation [4]. We then discuss the limitations of shape- and flow-based techniques for action recognition and argue that their complementary nature allows them to mitigate each other's limitations. To show the benefits of the combined features, we incorporate Shechtman and Irani's flow-based features [19] into our classifier and demonstrate improved performance on a challenging event detection task (see Figure 2) and a standard video classification task.
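
The region-extraction step can be sketched as unsupervised clustering of voxels on joint position-plus-appearance features. The excerpt does not name the clustering algorithm, so the mean-shift choice, the (t, y, x, intensity) feature design, and the parameters below are assumptions for illustration only, not the paper's actual settings:

```python
# A hedged sketch of "supervoxel" extraction: cluster voxels so that each
# region is internally consistent in appearance. Assumes scikit-learn.
import numpy as np
from sklearn.cluster import MeanShift

def extract_supervoxels(volume, pos_scale=0.05, bandwidth=0.3):
    """volume: (T, H, W) grayscale in [0, 255]. Returns (T, H, W) labels."""
    T, H, W = volume.shape
    t, y, x = np.mgrid[0:T, 0:H, 0:W]
    # Joint (t, y, x, intensity) feature per voxel; position is down-weighted
    # so clusters stay appearance-consistent but spatially compact.
    feats = np.stack([t.ravel() * pos_scale,
                      y.ravel() * pos_scale,
                      x.ravel() * pos_scale,
                      volume.ravel() / 255.0], axis=1)
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(feats)
    return labels.reshape(T, H, W)

# Mean shift scales poorly with voxel count, so in practice one would run a
# sketch like this on a heavily downsampled clip. Producing many small
# regions (over-segmentation) is acceptable here, since the matching stage
# described above is designed to cope with it.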
