Visual Tracking and Recognition using Appearance-Adaptive Models in Particle Filters

Mechanical Engineering and Automation: Translated Foreign Paper -- Two-Armed Bipedal Robot that can Walk, Roll Over and Stand Up

Original text:

Two-Armed Bipedal Robot that can Walk, Roll Over and Stand Up

Masayuki Inaba, Fumio Kanehiro, Satoshi Kagami, Hirochika Inoue
Department of Mechano-Informatics, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, 113 Tokyo, Japan

Abstract

Focusing attention on flexibility and intelligent reactivity in the real world, it is more important to build, not a robot that won't fall down, but a robot that can get up if it does fall down. This paper presents research on a two-armed bipedal robot, an ape-like robot, which can perform biped walking, rolling over and standing up. The robot consists of a head, two arms, and two legs. The control system of the biped robot is designed based on the remote-brained approach, in which a robot does not carry its own brain within the body but talks with it over radio links. This remote-brained approach enables a robot to have both a heavy brain with powerful computation and a lightweight body with multiple joints. The robot can keep its balance while standing using tracking vision, detect whether it has fallen down by a set of vertical sensors, and perform a getting-up motion by coordinating its two arms and two legs. The developed system and experimental results are described with illustrated real examples.

1 Introduction

As human children show, the capability of getting up is indispensable for learning biped locomotion. In order to build a robot which tries to learn biped walking automatically, the body should be designed with structures that support getting up, as well as sensors to know whether it is lying down or not.

When a biped robot has arms, it can perform various behaviors beyond walking. Research on biped walking robots has led to working realizations [1][2][3]. It has mainly focused on the dynamics of walking, treating it as an advanced problem in control [3][4][5]. However, focusing attention on intelligent reactivity in the real world, it is more important to build, not a robot that won't fall down, but a robot that can get up if it does fall down. In order to build such a robot, it needs a sensing system to keep the body balanced and to know whether it has fallen down. Although vision is one of the most important sensing functions of a robot, it is hard to build a robot with a powerful vision system on its own body because of the size and power limitations of vision hardware. If we want to advance research on vision-based robot behaviors requiring dynamic reactions and intelligent reasoning based on experience, the robot body has to be lightweight enough to react quickly and have many DOFs in actuation to show a variety of intelligent behaviors.

As for legged robots [6][7][8], there is only a little research on vision-based behaviors [9]. The difficulty in advancing experimental research on vision-based legged robots is the limitation of the vision hardware; it is hard to keep developing advanced vision software on limited hardware. In order to solve these problems and advance the study of vision-based behaviors, we have adopted a new approach: building remote-brained robots. The body and the brain are connected by wireless links, using wireless cameras and remote-controlled actuators. As a robot body does not need computers on board, it becomes easier to build a lightweight body with many DOFs in actuation. In this research, we developed a two-armed bipedal robot in the remote-brained robot environment and made it perform balancing based on vision and getting up through coordinating its arms and legs. The system and experimental results are described below.
2 The Remote-Brained System

A remote-brained robot does not carry its own brain within the body. It leaves the brain in the mother environment and communicates with it over radio links. This allows us to build a robot with a free body and a heavy brain. The connection link between the body and the brain defines the interface between software and hardware. Bodies are designed to suit each research project and task. This enables us to advance research with a variety of real robot systems [10].

A major advantage of remote-brained robots is that the robot can have a large and heavy brain based on super-parallel computers. Although hardware technology for vision has advanced and produced powerful compact vision systems, the size of the hardware is still large. Wireless connection between the camera and the vision processor has been a research tool. The remote-brained approach allows us to make progress on a variety of experimental issues in vision-based robotics. Another advantage of the remote-brained approach is that robot bodies can be lightweight. This opens up the possibility of working with legged mobile robots. As with animals, if a robot has four limbs it can walk. We are focusing on vision-based adaptive behaviors of four-limbed robots, mechanical animals, experimenting in a field as yet not much studied.

The brain is raised in a mother environment inherited over generations. The brain and the mother environment can be shared with newly designed robots. A developer using the environment can concentrate on the functional design of a brain. A robot whose brain is raised in a mother environment benefits directly from the mother's 'evolution', meaning that the software gains power easily when the mother is upgraded to a more powerful computer. Figure 1 shows the configuration of the remote-brained system, which consists of the brain base, the robot body and the brain-body interface.

In the remote-brained approach, the design and performance of the interface between brain and body is the key. Our current implementation adopts a fully remote-brained approach, which means the body has no computer on board. The current system consists of the vision subsystems, the non-vision sensor subsystem and the motion control subsystem. A block can receive video signals from cameras on robot bodies. The vision subsystems are parallel sets, each consisting of eight vision boards. A body just has a receiver for motion instruction signals and a transmitter for sensor signals. The sensor information is transmitted from a video transmitter. It is possible to transmit other sensor information, such as touch and servo error, through the video transmitter by integrating the signals into a video image [11]. The actuator is a geared module which includes an analog servo circuit and receives a position reference value from the motion receiver. The motion control subsystem can handle up to 104 actuators through 13 wave bands and sends the reference values to all the actuators every 20 msec.

3 The Two-Armed Bipedal Robot

Figure 2 shows the structure of the two-armed bipedal robot. The main electrical components of the robot are joint servo actuators, control signal receivers, an orientation sensor with a transmitter, a battery set for actuators and sensors, and a camera with a video transmitter. There is no computer on board. A servo actuator includes a geared motor and an analog servo circuit in its box. The control signal to each servo module is a position reference. The available servo modules cover torques of 2 kgf-cm to 14 kgf-cm at a speed of about 0.2 sec/60 deg. The control signal transmitted on a radio link encodes eight reference values; the robot in Figure 2 has two receiver modules on board to control 16 actuators.

Figure 3 explains the orientation sensor, which uses a set of vertical switches. Each vertical switch is a mercury switch: when the mercury switch (a) is tilted, the drop of mercury closes the contact between the two electrodes. The orientation sensor mounts two mercury switches as shown in (b). The switches provide a two-bit signal that distinguishes the four orientations of the sensor shown in (c). The robot carries this sensor at its chest, so it can distinguish four orientations: face up, face down, standing and upside down.

The body structure is designed and simulated in the mother environment. The kinematic model of the body is described in an object-oriented Lisp, EusLisp, which has enabled us to describe the geometric solid model and the window interface for behavior design. Figure 4 shows some of the classes in the programming environment for remote-brained robots written in EusLisp. The class hierarchy provides rich facilities for the development of various robots.
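The two-bit orientation encoding described above can be decoded with a simple lookup table. The sketch below is our own illustration; the concrete bit-to-orientation assignment is an assumption, since the actual mapping depends on how the two switches are mounted in Figure 3.

```python
# Hypothetical decoding of the orientation sensor's two mercury-switch
# bits into the four body orientations named in the text. Only the
# 2-bit -> 4-state idea comes from the paper; the bit assignment below
# is an illustrative assumption.
ORIENTATIONS = {
    (0, 0): "standing",
    (0, 1): "face up",
    (1, 0): "face down",
    (1, 1): "upside down",
}

def body_orientation(switch_a: int, switch_b: int) -> str:
    """Map the open/closed states of the two switches to an orientation."""
    return ORIENTATIONS[(switch_a, switch_b)]
```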
4 Vision-Based Balancing

The robot can stand up on two legs. As it can shift the center of gravity of its body by controlling the ankle angles, it can perform static bipedal walks. During static walking the robot has to control its body balance if the ground is not flat and stable.

In order to perform vision-based balancing, a high-speed vision system is required to keep observing the moving scene. We have developed a tracking vision board using a correlation chip [13]. The vision board consists of a transputer augmented with a special LSI chip (MEP [14]: Motion Estimation Processor) which performs local image block matching. The inputs to the MEP are an image used as a reference block and an image used as a search window. The size of the reference block is up to 16 by 16 pixels. The size of the search window depends on the size of the reference block and is usually up to 32 by 32 pixels, so that it can include 16 x 16 possible matches. The processor calculates the 256 values of SAD (sum of absolute differences) between the reference block and the 256 blocks in the search window and finds the best matching block, that is, the one with the minimum SAD value.

Block matching is very powerful when the target moves only in translation. However, the ordinary block matching method cannot track a target when it rotates. In order to overcome this difficulty, we developed a new method which adapts the candidate templates to the actual rotation of the target. The rotated template method first generates all the rotated target images in advance, and several adequate candidate reference templates are selected and matched while tracking the scene in the front view. It remembers the vertical orientation of an object as the reference for visual tracking and generates several rotated images of the reference image. By tracking the reference object with the rotated images, the vision system can measure the body rotation. In order to keep the body balanced, the robot feedback-controls its body rotation to control the center of gravity of the body. The rotational visual tracker [15] can track the image at video rate.
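A software sketch of the matching operation that the MEP performs in hardware may help; the function below is our own illustration (names and array layout assumed), not the board's actual interface.

```python
import numpy as np

def best_match(window, template, n_disp=16):
    """Slide the reference template over an n_disp x n_disp grid of
    displacements inside the search window and return the displacement
    with the minimum sum of absolute differences (SAD). For a 16x16
    template and a 32x32 window this tests the 256 candidates the MEP
    evaluates in hardware."""
    t = template.astype(np.int32)
    th, tw = template.shape
    best_sad, best_pos = None, None
    for r in range(n_disp):
        for c in range(n_disp):
            block = window[r:r + th, c:c + tw].astype(np.int32)
            sad = int(np.abs(block - t).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_pos = sad, (r, c)
    return best_pos, best_sad
```

The rotated-template method then simply repeats this search once per pre-rotated template and keeps the global minimum; the index of the winning template directly gives the measured body rotation.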
5 Biped Walking

If a bipedal robot can control its center of gravity freely, it can perform a biped walk. As the robot shown in Figure 2 has degrees of freedom toward the left and right at the ankles, it can perform bipedal walking in a static way.

The motion sequence of one cycle of biped walking consists of eight phases, as shown in Figure 6. One step consists of four phases: move-gravity-center-on-foot, lift-leg, move-forward-leg, and place-leg. As the body is described by a solid model, the robot can generate a body configuration for move-gravity-center-on-foot according to a parameter giving the height of the center of gravity. After this movement, the robot can lift the other leg and move it forward. While lifting a leg, the robot has to control its configuration in order to keep the center of gravity above the supporting foot. As the stability of balance depends on the height of the center of gravity, the robot selects suitable knee angles. Figure 7 shows a sequence of experiments of the robot walking on two legs.

6 Rolling Over and Standing Up

Figure 8 shows the sequence of rolling over, sitting and standing up. This motion requires coordination between the arms and legs. As each robot foot contains a battery, the robot can make use of the weight of the batteries for the roll-over motion. When the robot throws up the left leg, moves the left arm back and the right arm forward, it gains a rotational moment around the body. Once the body starts turning, the right leg moves back and the left foot returns to its position so that the robot lies on its face. This roll-over motion changes the body orientation from face up to face down, which can be verified by the orientation sensor.

After reaching the face-down orientation, the robot moves its arms down to sit on its two feet. This motion causes a slipping movement between the hands and the ground. If the length of the arms is not enough to carry the center of gravity of the body onto the feet, this sitting motion requires a dynamic pushing motion by the arms. The standing motion is controlled so as to keep the balance.

7 Integration through Building a Sensor-Based Transition Net

In order to integrate the basic actions described above, we adopted a method of describing a sensor-based transition network, in which transitions are taken according to sensor status. Figure 9 shows a state transition diagram of the robot which integrates the basic actions: biped walking, rolling over, sitting, and standing up. This integration provides the robot with the capability of continuing to walk even when it falls down. The ordinary biped walk is composed by taking two states, Left-leg Fore and Right-leg Fore, successively. The poses in 'Lie on the Back' and 'Lie on the Face' are the same as the one in 'Stand'; that is, the shape of the robot body is the same but the orientation is different. The robot can detect whether it lies on its back or its face using the orientation sensor. When the robot detects that it has fallen down, it changes the state to 'Lie on the Back' or 'Lie on the Face' by moving to the neutral pose. If the robot gets up from 'Lie on the Back', the motion sequence is planned to execute the Roll-over, Sit and Stand-up motions. If the state is 'Lie on the Face', it does not execute Roll-over but moves its arms up to perform the sitting motion.
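The integration logic lends itself to a small table-driven controller. The sketch below is one reading of Figure 9; the state names come from the text, while the event names and the per-transition motion sequences are illustrative assumptions.

```python
# Table-driven sensor-based transition net (after Figure 9). State names
# follow the paper; sensed events and motion sequences are assumptions.
TRANSITIONS = {
    ("Stand", "sensor: face up"):   ("Lie on the Back", ["neutral-pose"]),
    ("Stand", "sensor: face down"): ("Lie on the Face", ["neutral-pose"]),
    ("Lie on the Back", "get up"):  ("Stand", ["roll-over", "sit", "stand-up"]),
    ("Lie on the Face", "get up"):  ("Stand", ["arms-up", "sit", "stand-up"]),
    ("Stand", "walk"):              ("Left-leg Fore", ["step-left"]),
    ("Left-leg Fore", "walk"):      ("Right-leg Fore", ["step-right"]),
    ("Right-leg Fore", "walk"):     ("Left-leg Fore", ["step-left"]),
}

def step(state, event):
    """Return the next state and the motion sequence to execute;
    stay in the current state for an unknown event."""
    return TRANSITIONS.get((state, event), (state, []))
```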
8 Concluding Remarks

This paper has presented a two-armed bipedal robot which can perform a static biped walk, rolling over and standing up. The key to building such behaviors is the remote-brained approach. As the experiments have shown, wireless technologies permit robot bodies free movement. The approach also seems to change the way we conceptualize robotics. In our laboratory it has enabled the development of a new research environment, better suited to robotics and real-world AI.

The robot presented here is a legged robot. As legged locomotion requires dynamic visual feedback control, its vision-based behaviors demonstrate the effectiveness of the vision system and the remote-brained system. Our vision system is based on a high-speed block matching function implemented with a motion estimation LSI. The vision system provides the mechanical bodies with dynamic and adaptive capabilities in interaction with humans. The mechanical dog has shown adaptive behaviors based on distance measurement by tracking. The mechanical ape has shown tracking and memory-based visual functions and their integration in interactive behaviors.

Research with a two-armed bipedal robot opens a new field in intelligent robotics research because of the variety of possible behaviors created by the flexibility of the body. The remote-brained approach will support learning-based behaviors in this research field. The next tasks in this research include how to learn from human actions and how to allow the robots to improve their own learned behaviors.

References

[1] I. Kato and H. Tsuiki. The hydraulically powered biped walking machine with a high carrying capacity. In Proc. of 4th Int. Symp. on External Control of Human Extremities, 1972.
[2] H. Miura and I. Shimoyama. Dynamic walk of a biped. International Journal of Robotics Research, Vol. 3, No. 2, pp. 60-74, 1984.
[3] S. Kawamura, T. Kawamura, D. Fujino, F. Miyazaki, and S. Arimoto. Realization of biped locomotion by motion pattern learning. Journal of the Robotics Society of Japan, Vol. 3, No. 3, pp. 177-187, 1985.
[4] Jessica K. Hodgins and Marc H. Raibert. Biped gymnastics. International Journal of Robotics Research, Vol. 9, No. 2, pp. 115-132, 1990.
[5] A. Takanishi, M. Ishida, Y. Yamazaki, and I. Kato. The realization of dynamic walking by the biped walking robot WL-10RD. Journal of the Robotics Society of Japan, Vol. 3, No. 4, pp. 325-336, 1985.
[6] R. B. McGhee and G. I. Iswandhi. Adaptive locomotion of a multilegged robot over rough terrain. IEEE Trans. on Systems, Man and Cybernetics, Vol. SMC-9, No. 4, pp. 176-182, 1979.
[7] M. H. Raibert, H. B. Brown, Jr., and S. S. Murthy. 3-D balance using 2-D algorithms. In Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 279-301, 1983.
[8] S. Hirose, M. Nose, H. Kikuchi, and Y. Umetani. Adaptive gait control of a quadruped walking vehicle. In Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 253-269, 1983.
[9] R. B. McGhee, F. Ozguner, and S. J. Tsai. Rough terrain locomotion by a hexapod robot using a binocular ranging system. In Robotics Research: the First International Symposium on Robotics Research (ISRR1), pp. 228-251, 1984.
[10] M. Inaba. Remote-brained robotics: interfacing AI with real world behaviors. In Robotics Research: The Sixth International Symposium, pp. 335-344. International Foundation for Robotics Research, 1993.
[11] M. Inaba, S. Kagami, K. Sakaki, F. Kanehiro, and H. Inoue. Vision-based multisensor integration in remote-brained robots. In 1994 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 747-754, 1994.
[12] M. Inaba, T. Kamada, and H. Inoue. Rope handling by mobile hand-eye robots. In Proceedings of the International Conference on Advanced Robotics (ICAR '93), pp. 121-126, 1993.
[13] H. Inoue, T. Tachikawa, and M. Inaba. Robot vision system with a correlation chip for real-time tracking, optical flow and depth map generation. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation, pp. 1621-1626, 1992.
[14] SGS-THOMSON Microelectronics. STI3220 motion estimation processor (tentative data). In Image Processing Data Book, pp. 115-138. SGS-THOMSON Microelectronics, 1990.
[15] M. Inaba, S. Kagami, and H. Inoue. Real time vision-based control in sumo playing robot. In Proceedings of the 1993 JSME International Conference on Advanced Mechatronics, pp. 854-859, 1993.
Chinese translation: A Two-Armed, Two-Legged Robot that can Walk, Roll Over and Stand Up. Abstract: Focusing attention on flexibility and intelligent reactivity in practice, what matters more is to conceive not a robot that will never fall down, but a robot that can get back up when it does.

Siamese Network Target Tracking Algorithms

Journal of Fujian Computer, Vol. 37, No. 2, Feb. 2021

Siamese Network Target Tracking Algorithms

CHENG Dongdong (1), LV Zongwang (1), ZHU Yuhua (2)
(1) School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450000, China
(2) Yellow River Water Conservancy Vocational and Technical College, Kaifeng 475004, China

Abstract: Convolutional neural networks play an increasingly important role in computer vision. Driven by massive data, deep learning has shown feature-representation power superior to traditional methods. Target tracking algorithms based on Siamese networks have attracted growing attention for their accuracy and real-time performance. This paper first explains the research significance of computer vision, then presents several Siamese-network-based target tracking algorithms, and finally summarizes their advantages and future research directions.

Keywords: deep learning; Siamese network; target tracking. CLC classification: TP391. DOI: 10.16707/j.cnki.fjpc.2021.02.026

1 Introduction

Research in computer vision is inseparable from modern production and daily life; the technology is applied in intelligent video surveillance, automated factory production, autonomous driving and more [1]. Target tracking is an important research direction within computer vision. It is usually defined as obtaining the displacement of a specified object across a continuous video sequence, so as to trace the object's trajectory, analyze its motion data, and ultimately understand its motion behavior [2].
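As background for the algorithms this survey covers, the scoring step shared by SiamFC-style Siamese trackers is a cross-correlation of template features over search-region features. The following minimal PyTorch sketch (our own illustration; tensor shapes assumed) shows just that step.

```python
import torch
import torch.nn.functional as F

def siamese_score(template_feat: torch.Tensor, search_feat: torch.Tensor):
    """Cross-correlate template features over search-region features,
    the core similarity step of SiamFC-style trackers.
    template_feat: (C, th, tw); search_feat: (C, sh, sw).
    Returns a response map whose peak locates the target."""
    # conv2d computes cross-correlation, so the template features can be
    # used directly as the convolution kernel.
    return F.conv2d(search_feat.unsqueeze(0), template_feat.unsqueeze(0))[0, 0]
```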

Design of a Monitoring System for the Whole Training Process of High-Level Swimmers

Journal of Langfang Normal University (Natural Science Edition), Vol. 20, No. 4, Dec. 2020

Design of a Monitoring System for the Whole Training Process of High-Level Swimmers

Wang Xiaoyu (Huainan Normal University, Huainan 232038, China)

Abstract: This paper designs a video acquisition system for the whole training process of high-level swimmers. It performs information-conversion analysis on the collected video images, uses visual-feature tracking and recognition to fuse the movement feature points of the training process, and, using information recognition on the video action images, builds an extraction model for the key movement feature points across the whole training process. It then extracts the key features of the video images, establishes a three-dimensional reconstruction model of the training movements through spatial 3-D information fusion, and, combined with video tracking and process trajectory recognition, extracts effective action information for detection, thereby optimizing the monitoring system for the whole training process of high-level swimmers.

Keywords: high-level swimmers; whole training process; monitoring; system design. CLC classification: TP391; document code: A; article ID: 1674-3229(2020)04-0095-05

0 Introduction

With the development of machine-vision information recognition technology, an image-based visual monitoring method can be used to supervise the whole training process of high-level swimmers. Optimized image processing is used to build a video monitoring and image analysis model of the whole training process under machine vision. By sampling video features across the whole training process, extracting image information, and combining motion-video tracking and recognition, the approach improves the ability to monitor and analyze the whole training process, which is of real significance for raising the level of swimming training [1].

Classic Papers in Image Processing and Computer Vision

Preface: Through my recent work I have come across many classic papers I had never even heard of. While marveling at how great these papers are, I also suddenly felt how narrow my own field of view is. I looked online for a compilation of the classic papers in computer vision, but never found one. Disappointed, I decided to put one together myself, in the hope that it helps fellow students in the CV field. Given my limited perspective there are surely many omissions; consider this a modest first attempt to get the discussion going.

Before 1990
1990
1991
1992
1993
1994
1995
1996
1997
1998

1998 was a year in which classic image processing and computer vision papers poured out, and roughly from this year on a new trend emerged: as competition intensified, good algorithms were published first at conferences to stake an early claim, and then extended into journal versions a year or two later.

1999
2000

At the turn of the century, surveys of every kind appeared.

2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012

Extraction and Tracking of Objects of Interest Based on Dynamic Visual Saliency (Li Hui)

Figure 1: Model for extracting objects of interest based on dynamic visual saliency.

The pyramids are formed with sizes ranging from 1/2 to 1/256 of the input video frame. Features are then computed between two pyramid levels by a center-surround difference operation: the smaller-scale image, which represents the surrounding background, is linearly interpolated to the size of the larger-scale image, which represents the center, and a point-by-point subtraction, denoted $\ominus$, is performed. With $f_n$ the current frame, the intensity feature of the image is computed as

$$I_{f_n}(c, s) = I_{f_n}(c) \ominus I_{f_n}(s), \qquad c = 2, \; s = c + 3. \qquad (1)$$

The orientation conspicuity map is obtained by normalizing and summing the orientation feature maps over scales and the four orientations:

$$CO^{f_n} = \sum_{c=0}^{3} N\!\left[ N\!\left( O_{f_n}(c, s, \theta) \right) \right], \qquad \theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}. \qquad (6)$$

Figure 2 shows the static-component saliency maps of a color image: (a) is the input video frame, and (b), (c), (d) are the color, intensity, and orientation component saliency maps extracted by the method above.
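Equation (1) can be made concrete with a short sketch. The pyramid depth below matches the 1/2 to 1/256 range quoted above, while the exact center and surround scale ranges are assumptions borrowed from the Itti-Koch convention rather than taken from this paper.

```python
import cv2
import numpy as np

def center_surround(gray, centers=(2, 3, 4), deltas=(3, 4)):
    """Across-scale center-surround difference on a Gaussian pyramid,
    in the spirit of Eq. (1): the coarse 'surround' level s = c + delta
    is linearly upsampled to the size of the finer 'center' level c and
    subtracted point by point."""
    pyr = [gray.astype(np.float32)]
    for _ in range(8):                      # levels at 1/2 ... 1/256 scale
        pyr.append(cv2.pyrDown(pyr[-1]))
    maps = []
    for c in centers:
        for d in deltas:
            s = c + d
            h, w = pyr[c].shape
            surround = cv2.resize(pyr[s], (w, h), interpolation=cv2.INTER_LINEAR)
            maps.append(np.abs(pyr[c] - surround))
    return maps
```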
Abstract: When tracking a moving object in video image sequences with a particle filter, the object to be tracked is usually selected manually in the first video frame or segmented by background subtraction. Inspired by the human visual mechanism, an algorithm is proposed that can locate the object of interest automatically, based on dynamic visual saliency modeled by scale-invariant feature transform (SIFT) flow. To detect the object of interest, static features such as color, intensity, and orientation are also extracted. A saliency map is built by combining the static saliency and the dynamic saliency, and the most salient object is selected as the object of interest. The method is further applied to object tracking with a particle filter, where the object template is constructed by fusing color, gradient, and local binary pattern (LBP) texture features. The results show that the proposed method can simulate the human dynamic attention process to some extent when an object starts to move in a scene, and can track the object of interest robustly.

Keywords: visual saliency; motion saliency; scale-invariant feature transform flow; particle filter; local binary pattern; object tracking
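Among the template features named in the abstract, the LBP texture component is easy to sketch. The following is a generic 8-neighbour LBP, not necessarily the exact variant used in the paper.

```python
import numpy as np

def lbp8(gray):
    """Basic 8-neighbour local binary pattern: threshold each pixel's
    ring of neighbours against the pixel itself and pack the results
    into an 8-bit code. Border pixels are skipped for simplicity."""
    g = np.asarray(gray, dtype=np.int32)
    center = g[1:-1, 1:-1]
    code = np.zeros_like(center)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (neighbour >= center).astype(np.int32) << bit
    return code
```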

Visual Perception Test (English Article)

Visual Perception Test: Unraveling the Secrets of Your Mind

Visual perception is a fascinating and complex process that allows us to make sense of the world around us. Through our eyes, we receive a vast amount of information that our brains must swiftly and efficiently interpret, creating a coherent and meaningful representation of our surroundings. Visual perception tests are designed to assess various aspects of this remarkable ability, providing insights into how we see, process, and respond to visual stimuli.

Types of Visual Perception Tests

Visual perception tests encompass a wide range of assessments, each designed to evaluate specific aspects of this cognitive function. Some of the most common types include:

- Acuity tests: measure the sharpness or clarity of vision, determining how well you can perceive fine details at different distances.
- Contrast sensitivity tests: evaluate your ability to distinguish between objects of different brightness levels.
- Color vision tests: determine whether you can differentiate between colors and detect color deficiencies.
- Depth perception tests: measure your ability to perceive depth and three-dimensional space, assessing binocular vision and stereopsis.
- Visual field tests: map the extent of your peripheral vision, determining how much of your surroundings you can see without moving your eyes.
- Motion perception tests: assess your ability to perceive movement and detect moving objects, evaluating visual tracking and coordination.
- Visual memory tests: measure your ability to remember and recognize visual information, assessing visual working memory and long-term visual memory.

Purpose and Applications

Visual perception tests serve a variety of purposes, including:

- Diagnosis of visual impairments: the tests help diagnose problems such as nearsightedness, farsightedness, astigmatism, and color blindness.
- Monitoring eye health: regular testing can track changes in your vision over time, detecting potential eye diseases or conditions that may require treatment.
- Evaluating cognitive function: the tests can provide insights into cognitive abilities such as attention, memory, and processing speed, which are often impaired in neurodegenerative diseases like Alzheimer's.
- Research and development: the tests are essential for understanding how the visual system works and for developing new treatments for visual impairments.
- Occupational screening: many industries require certain levels of visual acuity and other visual abilities for employment, and these tests help ensure that candidates meet the necessary standards.

How to Prepare for a Visual Perception Test

Preparing for a visual perception test is generally straightforward. Here are some tips:

- Get a good night's sleep before the test.
- Avoid caffeine and alcohol before the test.
- Bring your eyeglasses or contact lenses if you normally wear them.
- Inform the examiner about any medications you are taking that may affect your vision.

Understanding Your Results

After completing a visual perception test, your examiner will interpret your results and provide you with a report. Your results will indicate:

- your visual acuity and other measures of visual function;
- any visual impairments or abnormalities that may require further evaluation or treatment;
- recommendations for follow-up care or lifestyle modifications to enhance your visual health.

Conclusion

Visual perception tests are invaluable tools for assessing the health and function of our visual system. By providing insights into how we see and process visual information, these tests help diagnose visual impairments, monitor eye health, evaluate cognitive function, and guide research and development. By understanding the results of visual perception tests, we can take proactive steps to maintain optimal visual health and function throughout our lives.

Visual Tracking

- Video content production and post-production (compositing, augmented reality, editing, re-purposing, stereo 3D authoring, motion capture for animation, clickable hyper-videos, etc.)
- Video content management (indexing, annotation, search, browsing)
- Valuable video
With 3D (cinematic) shape prior
http://cvlab.epfl.ch/research/completed/realtime_tracking/
/~black/3Dtracking.html
With no appearance prior

Tracking bounding box and segmentation from user selection
/~cbibby/index.shtml
Why?
Elementary or principal tool for multiple CV systems

- Other sciences (neuroscience, ethology, biomechanics, sport, medicine, biology, fluid mechanics, meteorology, oceanography)
- Defense, surveillance, safety, monitoring, control, assistance
- Robotics, human-computer interfaces
- Disposable video (the camera as a sensor)

A Target Tracking Method Combining Deformable Convolution and an Attention Mechanism

Science and Technology & Innovation, 2024, No. 01. DOI: 10.15913/j.cnki.kjycx.2024.01.008

A Target Tracking Method Combining Deformable Convolution and an Attention Mechanism

You Liping, Bei Shaoyi (Jiangsu University of Technology, Changzhou, Jiangsu 213000, China)

Abstract: To address the weak feature extraction and the poor adaptability to target deformation and occlusion of most Siamese-network target tracking algorithms, this paper proposes a Siamese tracking algorithm that combines deformable convolution with an attention mechanism. First, deformable convolution is adopted in the feature extraction network so that it can adaptively learn the target's offsets, improving the applicability of the model. Then, the SimAM attention mechanism is introduced into the backbone, improving feature extraction while reducing computation. Finally, performance is compared against other algorithms on the public OTB2015 and VOT2018 datasets. The experimental results show that, in deformation and occlusion scenarios, the proposed method achieves better precision, success rate and robustness than the baseline algorithm SiamFC.

Keywords: Siamese network; target tracking; deformable convolution; attention mechanism. CLC classification: TP391.41; document code: A; article ID: 2095-6835(2024)01-0031-04

Target tracking is an important research direction in computer vision and is widely applied in fields such as intelligent driving and modern military systems, but problems such as target deformation and occlusion during tracking keep the task challenging. In recent years, deep convolution has been introduced into target tracking for its strong feature extraction. The C-COT algorithm proposed by Danelljan et al. (2016) [1] and the MDNet algorithm proposed by Nam and Han (2016) [2] both extract features with deep convolutions; although these trackers perform well, they are slow and cannot track in real time. The SiamFC algorithm proposed by Bertinetto et al. (2016) [3] attracted wide attention for striking a good balance between tracking accuracy and speed. SiamFC extracts features through two weight-sharing AlexNet [4] branches, but AlexNet is shallow and its feature extraction is weak, so the tracking quality suffers. The RASNet proposed by Wang et al. (2018) [5] weights the spatial and channel dimensions of the features with channel, residual and global attention modules, improving tracker performance.
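To make the two building blocks concrete, here is a minimal PyTorch sketch, our own illustration rather than the paper's code: a deformable convolution whose sampling offsets are predicted from its input (via torchvision's DeformConv2d), and the parameter-free SimAM weighting. Module names and the regularizer value are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Deformable convolution whose sampling offsets are predicted
    from the input, letting the kernel adapt to target deformation."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.dconv = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        return self.dconv(x, self.offset(x))

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention: weight each activation by an
    energy-based saliency term and squash it with a sigmoid."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    e_inv = d / (4 * (v + e_lambda)) + 0.5
    return x * torch.sigmoid(e_inv)
```

Because SimAM derives its weights from an analytic energy function, it adds attention without any extra learnable parameters, which is consistent with the abstract's claim of reduced computation.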

MITSUBISHI ELECTRIC RESEARCH LABORATORIES

Visual Tracking and Recognition using Appearance-Adaptive Models in Particle Filters

Shaohua Kevin Zhou (1), Rama Chellappa (1), and Baback Moghaddam (2)

(1) Center for Automation Research (CfAR) and Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20740. Email: {shaohua, rama}@
(2) Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, Cambridge, MA 02139. Email: {baback}@

TR2004-028, June 2004 (first printing). Published in: IEEE Transactions on Image Processing, 2004.

Index terms: visual tracking, visual recognition, particle filtering, appearance-adaptive model, occlusion.
Abstract: We present an approach that incorporates appearance-adaptive models in a particle filter to realize robust visual tracking and recognition algorithms. Tracking needs modeling of inter-frame motion and appearance changes, whereas recognition needs modeling of appearance changes between frames and gallery images. In conventional tracking algorithms, the appearance model is either fixed or rapidly changing, and the motion model is simply a random walk with fixed noise variance. Also, the number of particles is typically fixed. All these factors make the visual tracker unstable. To stabilize the tracker, we propose the following modifications: an observation model arising from an adaptive appearance model, an adaptive-velocity motion model with adaptive noise variance, and an adaptive number of particles. The adaptive-velocity model is derived using a first-order linear predictor based on the appearance difference between the incoming observation and the previous particle configuration. Occlusion analysis is implemented using robust statistics. Experimental results on tracking visual objects in long outdoor and indoor video sequences demonstrate the effectiveness and robustness of our tracking algorithm. We then perform simultaneous tracking and recognition by embedding them in one particle filter. For recognition purposes, we model the appearance changes between frames and gallery images by constructing the intra- and extra-personal spaces. Accurate recognition is achieved when confronted by pose and view variations.
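As a rough illustration of the abstract's adaptive ingredients, the sketch below performs one filtering step with a translation-only state. Function names are ours, and the paper's full method additionally uses an affine state, an online appearance mixture and occlusion-robust statistics; none of that is reproduced here.

```python
import numpy as np

def particle_filter_step(particles, template, crop, prev_velocity,
                         base_sigma=2.0, sigma_obs=10.0):
    """One simplified step of a particle filter with an adaptive-velocity
    motion model. particles: (N, 2) integer positions; template: the
    current appearance model; crop(pos) returns the image patch at pos
    (rounding internally); prev_velocity: the shift produced by the
    first-order linear predictor at the previous frame."""
    n = len(particles)
    # Adaptive-velocity motion model: drift every particle by the
    # predicted velocity; scale the diffusion noise with the recent
    # prediction magnitude as a crude proxy for prediction error.
    sigma = base_sigma * (1.0 + np.linalg.norm(prev_velocity))
    particles = particles + prev_velocity + np.random.randn(n, 2) * sigma
    # Observation model: Gaussian likelihood of the appearance residual
    # between each candidate patch and the adaptive template.
    residuals = np.array([np.mean((crop(p) - template) ** 2) for p in particles])
    weights = np.exp(-residuals / (2.0 * sigma_obs ** 2))
    weights /= weights.sum()
    # Systematic resampling; an adaptive scheme would also grow or
    # shrink the particle set here based on the noise level.
    positions = (np.arange(n) + np.random.rand()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], weights[idx]
```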