Contour-based Classification of Video Objects
Visual attention detection in video sequences using spatiotemporal cues (ACM, 2006)

Yun Zhai
University of Central Florida Orlando, Florida 32816
Mubarak Shah
University of Central Florida Orlando, Florida 32816
shah@

Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding, Perceptual Reasoning; I.4.10 [Image Processing and Computer Vision]: Image Representation.

Keywords
Video Attention Detection, Spatiotemporal Saliency Map.

1. INTRODUCTION
Visual attention detection in still images has long been studied, while comparatively little work addresses spatiotemporal attention analysis. Psychology studies suggest that the human visual system perceives external features separately (Treisman and Gelade [25]) and is sensitive to the difference between a target region and its neighborhood (Duncan and Humphreys [6]). Following this suggestion, many works have focused on detecting the feature contrasts that stimulate the human visual system. This is usually referred to as the "stimuli-driven" mechanism. Itti et al. [10] proposed one of the earliest works in visual attention detection by utilizing
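A minimal sketch of the feature-contrast idea behind such stimuli-driven models, assuming a simple center-surround intensity contrast rather than the full multi-scale, multi-feature pipeline of Itti et al.; the function name and kernel sizes are illustrative assumptions only.

```python
import cv2
import numpy as np

def intensity_contrast_saliency(frame_bgr, center_ksize=5, surround_ksize=31):
    """Toy spatial saliency map: center-surround contrast on the intensity channel."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    center = cv2.GaussianBlur(gray, (center_ksize, center_ksize), 0)
    surround = cv2.GaussianBlur(gray, (surround_ksize, surround_ksize), 0)
    contrast = np.abs(center - surround)      # local feature contrast
    contrast /= contrast.max() + 1e-8         # normalize to [0, 1]
    return contrast

# usage: saliency = intensity_contrast_saliency(cv2.imread("frame.png"))
```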
Improved definition video frame enhancement

The human visual system seems to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. This paper addresses how to utilize both the spatial and temporal information present in an image sequence to create a high-resolution video still. A novel observation model based on motion-compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with an edge-preserving prior image model is used to extract a high-resolution video frame from a low-resolution sequence. Estimates computed from an image sequence containing a camera pan show dramatic improvement over bilinear, cubic B-spline, and Bayesian single-frame interpolations. Improved definition is also shown for a video sequence containing objects moving with independent trajectories.

2. VIDEO OBSERVATION MODEL
The video frame enhancement problem is stated in this section, and an observation model is proposed for a video sequence that includes motion-compensated subsampling.

2.1. Problem Statement
The objective is to estimate a high-resolution frame by reconstructing the high-frequency components of the image lost through undersampling the data. Assume that each frame in a low-resolution image sequence contains N1 × N2 square pixels. A lexicographical ordering of the lth frame
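A minimal NumPy sketch of the kind of motion-compensated subsampling forward model described above (warp by the frame's motion, blur by the sensor PSF, decimate, add noise); the factors, parameter values, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import shift, gaussian_filter

def observe_low_res(high_res, motion, decimation=2, blur_sigma=1.0, noise_sigma=0.01):
    """Simulate one low-resolution frame from a high-resolution still:
    warp by the frame's (row, col) motion, blur, subsample, add noise."""
    warped = shift(high_res, shift=motion, order=1, mode="nearest")   # motion compensation F_l
    blurred = gaussian_filter(warped, sigma=blur_sigma)               # sensor blur H
    low_res = blurred[::decimation, ::decimation]                     # subsampling D
    return low_res + np.random.normal(scale=noise_sigma, size=low_res.shape)  # noise n_l

# A Bayesian (MAP) reconstruction inverts this forward model jointly over several
# observed frames, regularized by an edge-preserving prior on the high-resolution image.
```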

… are visible-light video, and videos 10 and 11 are infrared video.
Videos 1 and 2 are simulated static-shot videos: random jitter is added to a still image to generate multiple frames, which are then assembled into a simulated video. Since the magnitude and direction of the jitter are known, these videos can be used to evaluate stabilization accuracy and to measure the jitter-detection error.
Stabilization accuracy is evaluated by computing the root mean square error (RMSE) between the estimated and the true jitter.
RMSE is defined as

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_i-X_i)^2+(y_i-Y_i)^2\right]} \qquad (11)$$

where $(x_i, y_i)$ and $(X_i, Y_i)$ are, respectively, the estimated compensation and the true jitter between frame $i$ and frame $i+1$, and $N$ is the total number of frames.
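A small worked example of this metric, assuming the estimated compensation and true jitter are stored as N×2 arrays of per-frame (x, y) displacements; the names are illustrative only.

```python
import numpy as np

def stabilization_rmse(estimated, truth):
    """RMSE between estimated compensation and true jitter, as in Eq. (11)."""
    estimated = np.asarray(estimated, dtype=float)  # shape (N, 2): (x_i, y_i)
    truth = np.asarray(truth, dtype=float)          # shape (N, 2): (X_i, Y_i)
    sq_err = np.sum((estimated - truth) ** 2, axis=1)
    return float(np.sqrt(sq_err.mean()))

# e.g. an RMSE of about 0.024 pixels indicates sub-pixel stabilization accuracy
```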
Videos 3 and 4 are simulated dynamic-shot videos, generated by adding directional random jitter to a smooth video. Since the jitter direction is known, they can be used to measure the jitter-detection error.
The stabilization results on the simulated videos are shown in Table 1: the average jitter-detection accuracy is 91%, and the average RMSE of the jitter estimates is 0.024 pixels, indicating that stabilization accuracy reaches the sub-pixel level.
For the real inspection videos, to characterize the processing-time efficiency of the stabilization algorithm on standard-definition video, all videos were converted to a resolution of 720×576. The processing times are summarized in Table 2: the average processing time is 39 ms per frame, i.e. a processing rate of 25 fps, the same as the original frame rate, so the electronic stabilization algorithm supports real-time processing of standard-definition visible-light and infrared video.
Figure 4 compares original frames (left) with stabilized frames (right) for frames 804 and 2328 of visible-light video 8; Figure 5 shows the same comparison for infrared video 10, frames 950 and 2376 (left to right, top to bottom).
3 Conclusion
The electronic image stabilization algorithm for the electro-optical pod used in helicopter/UAV power-line inspection comprises a motion pre-judgment module, a motion estimation module, and a motion compensation module.
Experiments on simulated videos and on real overhead transmission-line inspection videos show that the jitter-detection accuracy is 91%, the average RMSE of the jitter estimates is 0.024 pixels (sub-pixel stabilization accuracy), and the average per-frame processing time for de-jittering standard-definition inspection video is 39 ms (i.e. 25 fps), meeting the real-time requirement.
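A minimal skeleton of the three-module pipeline named in the conclusion (motion pre-judgment, motion estimation, motion compensation), assuming phase-correlation-based global translation estimation and approximating pre-judgment by separating the low-frequency (intentional) camera path from high-frequency jitter; this is an illustration, not the authors' implementation.

```python
import cv2
import numpy as np

def stabilize(frames, smooth=0.9):
    """Sketch: estimate global motion, judge the jitter component, compensate it."""
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY).astype(np.float32)
    path = np.zeros(2)          # accumulated raw camera path
    smoothed = np.zeros(2)      # low-pass path, treated as intentional motion
    out = [frames[0]]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        (dx, dy), _ = cv2.phaseCorrelate(prev_gray, gray)    # motion estimation
        path += (dx, dy)
        smoothed = smooth * smoothed + (1 - smooth) * path   # motion pre-judgment
        jitter = path - smoothed                             # high-frequency component
        M = np.float32([[1, 0, -jitter[0]], [0, 1, -jitter[1]]])
        out.append(cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0])))  # compensation
        prev_gray = gray
    return out
```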
A survey on sentiment detection of reviews

A survey on sentiment detection of reviewsHuifeng Tang,Songbo Tan *,Xueqi ChengInformation Security Center,Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100080,PR Chinaa r t i c l e i n f o Keywords:Sentiment detection Opinion extractionSentiment classificationa b s t r a c tThe sentiment detection of texts has been witnessed a booming interest in recent years,due to the increased availability of online reviews in digital form and the ensuing need to organize them.Till to now,there are mainly four different problems predominating in this research community,namely,sub-jectivity classification,word sentiment classification,document sentiment classification and opinion extraction.In fact,there are inherent relations between them.Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text.Document sen-timent classification and opinion extraction have often involved word sentiment classification tech-niques.This survey discusses related issues and main approaches to these problems.Ó2009Published by Elsevier Ltd.1.IntroductionToday,very large amount of reviews are available on the web,as well as the weblogs are fast-growing in blogsphere.Product re-views exist in a variety of forms on the web:sites dedicated to a specific type of product (such as digital camera ),sites for newspa-pers and magazines that may feature reviews (like Rolling Stone or Consumer Reports ),sites that couple reviews with commerce (like Amazon ),and sites that specialize in collecting professional or user reviews in a variety of areas (like ).Less formal reviews are available on discussion boards and mailing list archives,as well as in Usenet via Google ers also com-ment on products in their personal web sites and blogs,which are then aggregated by sites such as , ,and .The information mentioned above is a rich and useful source for marketing intelligence,social psychologists,and others interested in extracting and mining opinions,views,moods,and attitudes.For example,whether a product review is positive or negative;what are the moods among Bloggers at that time;how the public reflect towards this political affair,etc.To achieve this goal,a core and essential job is to detect subjec-tive information contained in texts,include viewpoint,fancy,atti-tude,sensibility etc.This is so-called sentiment detection .A challenging aspect of this task seems to distinguish it from traditional topic-based detection (classification)is that while top-ics are often identifiable by keywords alone,sentiment can be ex-pressed in a much subtle manner.For example,the sentence ‘‘What a bad picture quality that digital camera has!...Oh,thisnew type camera has a good picture,long battery life and beautiful appearance!”compares a negative experience of one product with a positive experience of another product.It is difficult to separate out the core assessment that should actually be correlated with the document.Thus,sentiment seems to require more understand-ing than the usual topic-based classification.Sentiment detection dates back to the late 1990s (Argamon,Koppel,&Avneri,1998;Kessler,Nunberg,&SchÄutze,1997;Sper-tus,1997),but only in the early 2000s did it become a major sub-field of the information management discipline (Chaovalit &Zhou,2005;Dimitrova,Finn,Kushmerick,&Smyth,2002;Durbin,Neal 
Richter,&Warner,2003;Efron,2004;Gamon,2004;Glance,Hurst,&Tomokiyo,2004;Grefenstette,Qu,Shanahan,&Evans,2004;Hil-lard,Ostendorf,&Shriberg,2003;Inkpen,Feiguina,&Hirst,2004;Kobayashi,Inui,&Inui,2001;Liu,Lieberman,&Selker,2003;Rau-bern &Muller-Kogler,2001;Riloff and Wiebe,2003;Subasic &Huettner,2001;Tong,2001;Vegnaduzzo,2004;Wiebe &Riloff,2005;Wilson,Wiebe,&Hoffmann,2005).Until the early 2000s,the two main popular approaches to sentiment detection,espe-cially in the real-world applications,were based on machine learn-ing techniques and based on semantic analysis techniques.After that,the shallow nature language processing techniques were widely used in this area,especially in the document sentiment detection.Current-day sentiment detection is thus a discipline at the crossroads of NLP and IR,and as such it shares a number of characteristics with other tasks such as information extraction and text-mining.Although several international conferences have devoted spe-cial issues to this topic,such as ACL,AAAI,WWW,EMNLP,CIKM etc.,there are no systematic treatments of the subject:there are neither textbooks nor journals entirely devoted to sentiment detection yet.0957-4174/$-see front matter Ó2009Published by Elsevier Ltd.doi:10.1016/j.eswa.2009.02.063*Corresponding author.E-mail addresses:tanghuifeng@ (H.Tang),tansongbo@ (S.Tan),cxq@ (X.Cheng).Expert Systems with Applications 36(2009)10760–10773Contents lists available at ScienceDirectExpert Systems with Applicationsjournal homepage:/locate/eswaThis paperfirst introduces the definitions of several problems that pertain to sentiment detection.Then we present some appli-cations of sentiment detection.Section4discusses the subjectivity classification problem.Section5introduces semantic orientation method.The sixth section examines the effectiveness of applying machine learning techniques to document sentiment classification. 
The seventh section discusses opinion extraction problem.The eighth part talks about evaluation of sentiment st sec-tion concludes with challenges and discussion of future work.2.Sentiment detection2.1.Subjectivity classificationSubjectivity in natural language refers to aspects of language used to express opinions and evaluations(Wiebe,1994).Subjectiv-ity classification is stated as follows:Let S={s1,...,s n}be a set of sentences in document D.The problem of subjectivity classification is to distinguish sentences used to present opinions and other forms of subjectivity(subjective sentences set S s)from sentences used to objectively present factual information(objective sen-tences set S o),where S s[S o=S.This task is especially relevant for news reporting and Internet forums,in which opinions of various agents are expressed.2.2.Sentiment classificationSentiment classification includes two kinds of classification forms,i.e.,binary sentiment classification and multi-class senti-ment classification.Given a document set D={d1,...,d n},and a pre-defined categories set C={positive,negative},binary senti-ment classification is to classify each d i in D,with a label expressed in C.If we set C*={strong positive,positive,neutral,negative,strong negative}and classify each d i in D with a label in C*,the problem changes to multi-class sentiment classification.Most prior work on learning to identify sentiment has focused on the binary distinction of positive vs.negative.But it is often helpful to have more information than this binary distinction pro-vides,especially if one is ranking items by recommendation or comparing several reviewers’opinions.Koppel and Schler(2005a, 2005b)show that it is crucial to use neutral examples in learning polarity for a variety of reasons.Learning from negative and posi-tive examples alone will not permit accurate classification of neu-tral examples.Moreover,the use of neutral training examples in learning facilitates better distinction between positive and nega-tive examples.3.Applications of sentiment detectionIn this section,we will expound some rising applications of sen-timent detection.3.1.Products comparisonIt is a common practice for online merchants to ask their cus-tomers to review the products that they have purchased.With more and more people using the Web to express opinions,the number of reviews that a product receives grows rapidly.Most of the researches about these reviews were focused on automatically classifying the products into‘‘recommended”or‘‘not recom-mended”(Pang,Lee,&Vaithyanathan,2002;Ranjan Das&Chen, 2001;Terveen,Hill,Amento,McDonald,&Creter,1997).But every product has several features,in which maybe only part of them people are interested.Moreover,a product has shortcomings in one aspect,probably has merits in another place(Morinaga,Yamanishi,Tateishi,&Fukushima,2002;Taboada,Gillies,&McFe-tridge,2006).To analysis the online reviews and bring forward a visual man-ner to compare consumers’opinions of different products,i.e., merely with a single glance the user can clearly see the advantages and weaknesses of each product in the minds of consumers.For a potential customer,he/she can see a visual side-by-side and fea-ture-by-feature comparison of consumer opinions on these prod-ucts,which helps him/her to decide which product to buy.For a product manufacturer,the comparison enables it to easily gather marketing intelligence and product benchmarking information.Liu,Hu,and Cheng(2005)proposed a novel framework for ana-lyzing and comparing consumer 
opinions of competing products.A prototype system called Opinion Observer is implemented.To en-able the visualization,two tasks were performed:(1)Identifying product features that customers have expressed their opinions on,based on language pattern mining techniques.Such features form the basis for the comparison.(2)For each feature,identifying whether the opinion from each reviewer is positive or negative,if any.Different users can visualize and compare opinions of different products using a user interface.The user simply chooses the prod-ucts that he/she wishes to compare and the system then retrieves the analyzed results of these products and displays them in the interface.3.2.Opinion summarizationThe number of online reviews that a product receives grows rapidly,especially for some popular products.Furthermore,many reviews are long and have only a few sentences containing opin-ions on the product.This makes it hard for a potential customer to read them to make an informed decision on whether to purchase the product.The large number of reviews also makes it hard for product manufacturers to keep track of customer opinions of their products because many merchant sites may sell their products,and the manufacturer may produce many kinds of products.Opinion summarization(Ku,Lee,Wu,&Chen,2005;Philip et al., 2004)summarizes opinions of articles by telling sentiment polari-ties,degree and the correlated events.With opinion summariza-tion,a customer can easily see how the existing customers feel about a product,and the product manufacturer can get the reason why different stands people like it or what they complain about.Hu and Liu(2004a,2004b)conduct a work like that:Given a set of customer reviews of a particular product,the task involves three subtasks:(1)identifying features of the product that customers have expressed their opinions on(called product features);(2) for each feature,identifying review sentences that give positive or negative opinions;and(3)producing a summary using the dis-covered information.Ku,Liang,and Chen(2006)investigated both news and web blog articles.In their research,TREC,NTCIR and articles collected from web blogs serve as the information sources for opinion extraction.Documents related to the issue of animal cloning are selected as the experimental materials.Algorithms for opinion extraction at word,sentence and document level are proposed. 
The issue of relevant sentence selection is discussed,and then top-ical and opinionated information are summarized.Opinion sum-marizations are visualized by representative sentences.Finally, an opinionated curve showing supportive and non-supportive de-gree along the timeline is illustrated by an opinion tracking system.3.3.Opinion reason miningIn opinion analysis area,finding the polarity of opinions or aggregating and quantifying degree assessment of opinionsH.Tang et al./Expert Systems with Applications36(2009)10760–1077310761scattered throughout web pages is not enough.We can do more critical part of in-depth opinion assessment,such asfinding rea-sons in opinion-bearing texts.For example,infilm reviews,infor-mation such as‘‘found200positive reviews and150negative reviews”may not fully satisfy the information needs of different people.More useful information would be‘‘Thisfilm is great for its novel originality”or‘‘Poor acting,which makes thefilm awful”.Opinion reason mining tries to identify one of the critical ele-ments of online reviews to answer the question,‘‘What are the rea-sons that the author of this review likes or dislikes the product?”To answer this question,we should extract not only sentences that contain opinion-bearing expressions,but also sentences with rea-sons why an author of a review writes the review(Cardie,Wiebe, Wilson,&Litman,2003;Clarke&Terra,2003;Li&Yamanishi, 2001;Stoyanov,Cardie,Litman,&Wiebe,2004).Kim and Hovy(2005)proposed a method for detecting opinion-bearing expressions.In their subsequent work(Kim&Hovy,2006), they collected a large set of h review text,pros,cons i triplets from ,which explicitly state pros and cons phrases in their respective categories by each review’s author along with the re-view text.Their automatic labeling systemfirst collects phrases in pro and confields and then searches the main review text in or-der to collect sentences corresponding to those phrases.Then the system annotates this sentence with the appropriate‘‘pro”or‘‘con”label.All remaining sentences with neither label are marked as ‘‘neither”.After labeling all the data,they use it to train their pro and con sentence recognition system.3.4.Other applicationsThomas,Pang,and Lee(2006)try to determine from the tran-scripts of US Congressionalfloor debates whether the speeches rep-resent support of or opposition to proposed legislation.Mullen and Malouf(2006)describe a statistical sentiment analysis method on political discussion group postings to judge whether there is oppos-ing political viewpoint to the original post.Moreover,there are some potential applications of sentiment detection,such as online message sentimentfiltering,E-mail sentiment classification,web-blog author’s attitude analysis,sentiment web search engine,etc.4.Subjectivity classificationSubjectivity classification is a task to investigate whether a par-agraph presents the opinion of its author or reports facts.In fact, most of the research showed there was very tight relation between subjectivity classification and document sentiment classification (Pang&Lee,2004;Wiebe,2000;Wiebe,Bruce,&O’Hara,1999; Wiebe,Wilson,Bruce,Bell,&Martin,2002;Yu&Hatzivassiloglou, 2003).Subjectivity classification can prevent the polarity classifier from considering irrelevant or even potentially misleading text. 
Pang and Lee(2004)find subjectivity detection can compress re-views into much shorter extracts that still retain polarity informa-tion at a level comparable to that of the full review.Much of the research in automated opinion detection has been performed and proposed for discriminating between subjective and objective text at the document and sentence levels(Bruce& Wiebe,1999;Finn,Kushmerick,&Smyth,2002;Hatzivassiloglou &Wiebe,2000;Wiebe,2000;Wiebe et al.,1999;Wiebe et al., 2002;Yu&Hatzivassiloglou,2003).In this section,we will discuss some approaches used to automatically assign one document as objective or subjective.4.1.Similarity approachSimilarity approach to classifying sentences as opinions or facts explores the hypothesis that,within a given topic,opinion sen-tences will be more similar to other opinion sentences than to fac-tual sentences(Yu&Hatzivassiloglou,2003).Similarity approach measures sentence similarity based on shared words,phrases, and WordNet synsets(Dagan,Shaul,&Markovitch,1993;Dagan, Pereira,&Lee,1994;Leacock&Chodorow,1998;Miller&Charles, 1991;Resnik,1995;Zhang,Xu,&Callan,2002).To measure the overall similarity of a sentence to the opinion or fact documents,we need to go through three steps.First,use IR method to acquire the documents that are on the same topic as the sentence in question.Second,calculate its similarity scores with each sentence in those documents and make an average va-lue.Third,assign the sentence to the category(opinion or fact) for which the average value is the highest.Alternatively,for the frequency variant,we can use the similarity scores or count how many of them for each category,and then compare it with a prede-termined threshold.4.2.Naive Bayes classifierNaive Bayes classifier is a commonly used supervised machine learning algorithm.This approach presupposes all sentences in opinion or factual articles as opinion or fact sentences.Naive Bayes uses the sentences in opinion and fact documents as the examples of the two categories.The features include words, bigrams,and trigrams,as well as the part of speech in each sen-tence.In addition,the presence of semantically oriented(positive and negative)words in a sentence is an indicator that the sentence is subjective.Therefore,it can include the counts of positive and negative words in the sentence,as well as counts of the polarities of sequences of semantically oriented words(e.g.,‘‘++”for two con-secutive positively oriented words).It also include the counts of parts of speech combined with polarity information(e.g.,‘‘JJ+”for positive adjectives),as well as features encoding the polarity(if any)of the head verb,the main subject,and their immediate modifiers.Generally speaking,Naive Bayes assigns a document d j(repre-sented by a vector dÃj)to the class c i that maximizes Pðc i j dÃjÞby applying Bayes’rule as follow,Pðc i j dÃjÞ¼Pðc iÞPðdÃjj c iÞPðdÃjÞð1Þwhere PðdÃjÞis the probability that a randomly picked document dhas vector dÃjas its representation,and P(c)is the probability that a randomly picked document belongs to class c.To estimate the term PðdÃjj cÞ,Naive Bayes decomposes it byassuming all the features in dÃj(represented by f i,i=1to m)are con-ditionally independent,i.e.,Pðc i j dÃjÞ¼Pðc iÞQ mi¼1Pðf i j c iÞÀÁPðdÃjÞð2Þ4.3.Multiple Naive Bayes classifierThe hypothesis of all sentences in opinion or factual articles as opinion or fact sentences is an approximation.To address this, multiple Naive Bayes classifier approach applies an algorithm using multiple classifiers,each relying on a 
different subset of fea-tures.The goal is to reduce the training set to the sentences that are most likely to be correctly labeled,thus boosting classification accuracy.Given separate sets of features F1,F2,...,F m,it train separate Na-ive Bayes classifiers C1,C2,...,C m corresponding to each feature set. Assuming as ground truth the information provided by the docu-ment labels and that all sentences inherit the status of their docu-ment as opinions or facts,itfirst train C1on the entire training set,10762H.Tang et al./Expert Systems with Applications36(2009)10760–10773then use the resulting classifier to predict labels for the training set.The sentences that receive a label different from the assumed truth are then removed,and train C2on the remaining sentences. This process is repeated iteratively until no more sentences can be removed.Yu and Hatzivassiloglou(2003)report results using five feature sets,starting from words alone and adding in bigrams, trigrams,part-of-speech,and polarity.4.4.Cut-based classifierCut-based classifier approach put forward a hypothesis that, text spans(items)occurring near each other(within discourse boundaries)may share the same subjectivity status(Pang&Lee, 2004).Based on this hypothesis,Pang supplied his algorithm with pair-wise interaction information,e.g.,to specify that two particu-lar sentences should ideally receive the same subjectivity label. This algorithm uses an efficient and intuitive graph-based formula-tion relying onfinding minimum cuts.Suppose there are n items x1,x2,...,x n to divide into two classes C1and C2,here access to two types of information:ind j(x i):Individual scores.It is the non-negative estimates of each x i’s preference for being in C j based on just the features of x i alone;assoc(x i,x k):Association scores.It is the non-negative estimates of how important it is that x i and x k be in the same class.Then,this problem changes to calculate the maximization of each item’s score for one class:its individual score for the class it is assigned to,minus its individual score for the other class,then minus associated items into different classes for penalization. Thus,after some algebra,it arrives at the following optimization problem:assign the x i to C1and C2so as to minimize the partition cost:X x2C1ind2ðxÞþXx2C2ind1ðxÞþXx i2C1;x k2C2assocðx i;x kÞð3ÞThis situation can be represented in the following manner.Build an undirected graph G with vertices{v1,...,v n,s,t};the last two are, respectively,the source and sink.Add n edges(s,v i),each with weight ind1(x i),and n edges(v i,t),each with weight ind2(x i).Finally, addðC2nÞedges(v i,v k),each with weight assoc(x i,x k).A cut(S,T)of G is a partition of its nodes into sets S={s}US0and T={t}UT0,where s R S0,t R T0.Its cost cost(S,T)is the sum of the weights of all edges crossing from S to T.A minimum cut of G is one of minimum cost. 
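A minimal sketch of the minimum-cut formulation just described, building the source/sink graph from individual scores ind_j(x_i) and association scores assoc(x_i, x_k) and solving it with networkx; the data structures are assumptions for illustration, not the original authors' code.

```python
import networkx as nx

def mincut_subjectivity(ind1, ind2, assoc):
    """Partition items 0..n-1 into C1/C2 via a minimum s-t cut (Pang & Lee style).
    ind1[i], ind2[i]: non-negative preference of item i for class C1 / C2.
    assoc[(i, k)]: non-negative cost of putting items i and k in different classes."""
    g = nx.DiGraph()
    n = len(ind1)
    for i in range(n):
        g.add_edge("s", i, capacity=ind1[i])   # cutting s->i puts i into C2, cost ind1(x_i)
        g.add_edge(i, "t", capacity=ind2[i])   # cutting i->t puts i into C1, cost ind2(x_i)
    for (i, k), w in assoc.items():
        g.add_edge(i, k, capacity=w)           # association penalty edges
        g.add_edge(k, i, capacity=w)
    cut_cost, (s_side, t_side) = nx.minimum_cut(g, "s", "t")
    c1 = sorted(x for x in s_side if x != "s")
    c2 = sorted(x for x in t_side if x != "t")
    return c1, c2, cut_cost
```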
Then,finding solution of this problem is changed into looking for a minimum cut of G.5.Word sentiment classificationThe task on document sentiment classification has usually in-volved the manual or semi-manual construction of semantic orien-tation word lexicons(Hatzivassiloglou&McKeown,1997; Hatzivassiloglou&Wiebe,2000;Lin,1998;Pereira,Tishby,&Lee, 1993;Riloff,Wiebe,&Wilson,2003;Turney&Littman,2002; Wiebe,2000),which built by word sentiment classification tech-niques.For instance,Das and Chen(2001)used a classifier on investor bulletin boards to see if apparently positive postings were correlated with stock price,in which several scoring methods were employed in conjunction with a manually crafted lexicon.Classify-ing the semantic orientation of individual words or phrases,such as whether it is positive or negative or has different intensities, generally using a pre-selected set of seed words,sometimes using linguistic heuristics(For example,Lin(1998)&Pereira et al.(1993) used linguistic co-locations to group words with similar uses or meanings).Some studies showed that restricting features to those adjec-tives for word sentiment classification would improve perfor-mance(Andreevskaia&Bergler,2006;Turney&Littman,2002; Wiebe,2000).However,more researches showed most of the adjectives and adverb,a small group of nouns and verbs possess semantic orientation(Andreevskaia&Bergler,2006;Esuli&Sebas-tiani,2005;Gamon&Aue,2005;Takamura,Inui,&Okumura, 2005;Turney&Littman,2003).Automatic methods of sentiment annotation at the word level can be grouped into two major categories:(1)corpus-based ap-proaches and(2)dictionary-based approaches.Thefirst group in-cludes methods that rely on syntactic or co-occurrence patterns of words in large texts to determine their sentiment(e.g.,Hatzi-vassiloglou&McKeown,1997;Turney&Littman,2002;Yu&Hat-zivassiloglou,2003and others).The second group uses WordNet (/)information,especially,synsets and hierarchies,to acquire sentiment-marked words(Hu&Liu, 2004a;Kim&Hovy,2004)or to measure the similarity between candidate words and sentiment-bearing words such as good and bad(Kamps,Marx,Mokken,&de Rijke,2004).5.1.Analysis by conjunctions between adjectivesThis method attempts to predict the orientation of subjective adjectives by analyzing pairs of adjectives(conjoined by and,or, but,either-or,or neither-nor)which are extracted from a large unlabelled document set.The underlying intuition is that the act of conjoining adjectives is subject to linguistic constraints on the orientation of the adjectives involved(e.g.and usually conjoins two adjectives of the same-orientation,while but conjoins two adjectives of opposite orientation).This is shown in the following three sentences(where thefirst two are perceived as correct and the third is perceived as incorrect)taken from Hatzivassiloglou and McKeown(1997):‘‘The tax proposal was simple and well received by the public”.‘‘The tax proposal was simplistic but well received by the public”.‘‘The tax proposal was simplistic and well received by the public”.To infer the orientation of adjectives from analysis of conjunc-tions,a supervised learning algorithm can be performed as follow-ing steps:1.All conjunctions of adjectives are extracted from a set ofdocuments.2.Train a log-linear regression classifier and then classify pairs ofadjectives either as having the same or as having different ori-entation.The hypothesized same-orientation or different-orien-tation links between all pairs form a graph.3.A clustering algorithm partitions the 
graph produced in step2into two clusters.By using the intuition that positive adjectives tend to be used more frequently than negative ones,the cluster containing the terms of higher average frequency in the docu-ment set is deemed to contain the positive terms.The log-linear model offers an estimate of how good each pre-diction is,since it produces a value y between0and1,in which 1corresponds to same-orientation,and one minus the produced value y corresponds to dissimilarity.Same-and different-orienta-tion links between adjectives form a graph.To partition the graph nodes into subsets of the same-orientation,the clustering algo-rithm calculates an objective function U scoring each possible par-tition P of the adjectives into two subgroups C1and C2as,UðPÞ¼X2i¼11j C i jXx;y2C i;x–ydðx;yÞ!ð4Þwhere j C i j is the cardinality of cluster i,and d(x,y)is the dissimilarity between adjectives x and y.H.Tang et al./Expert Systems with Applications36(2009)10760–1077310763In general,because the model was unsupervised,it required an immense word corpus to function.5.2.Analysis by lexical relationsThis method presents a strategy for inferring semantic orienta-tion from semantic association between words and phrases.It fol-lows a hypothesis that two words tend to be the same semantic orientation if they have strong semantic association.Therefore,it focused on the use of lexical relations defined in WordNet to calcu-late the distance between adjectives.Generally speaking,we can defined a graph on the adjectives contained in the intersection between a term set(For example, TL term set(Turney&Littman,2003))and WordNet,adding a link between two adjectives whenever WordNet indicates the presence of a synonymy relation between them,and defining a distance measure using elementary notions from graph theory.In more de-tail,this approach can be realized as following steps:1.Construct relations at the level of words.The simplest approachhere is just to collect all words in WordNet,and relate words that can be synonymous(i.e.,they occurring in the same synset).2.Define a distance measure d(t1,t2)between terms t1and t2onthis graph,which amounts to the length of the shortest path that connects t1and t2(with d(t1,t2)=+1if t1and t2are not connected).3.Calculate the orientation of a term by its relative distance(Kamps et al.,2004)from the two seed terms good and bad,i.e.,SOðtÞ¼dðt;badÞÀdðt;goodÞdðgood;badÞð5Þ4.Get the result followed by this rules:The adjective t is deemedto belong to positive if SO(t)>0,and the absolute value of SO(t) determines,as usual,the strength of this orientation(the con-stant denominator d(good,bad)is a normalization factor that constrains all values of SO to belong to the[À1,1]range).5.3.Analysis by glossesThe characteristic of this method lies in the fact that it exploits the glosses(i.e.textual definitions)that one term has in an online ‘‘glossary”,or dictionary.Its basic assumption is that if a word is semantically oriented in one direction,then the words in its gloss tend to be oriented in the same direction(Esuli&Sebastiani,2005; Esuli&Sebastiani,2006a,2006b).For instance,the glosses of good and excellent will both contain appreciative expressions;while the glosses of bad and awful will both contain derogative expressions.Generally,this method can determine the orientation of a term based on the classification of its glosses.The process is composed of the following steps:1.A seed set(S p,S n),representative of the two categories positiveand negative,is provided as input.2.Search new terms 
to enrich S p and S e lexical relations(e.g.synonymy)with the terms contained in S p and S n from a thesau-rus,or online dictionary,tofind these new terms,and then append them to S p or S n.3.For each term t i in S0p [S0nor in the test set(i.e.the set of termsto be classified),a textual representation of t i is generated by collating all the glosses of t i as found in a machine-readable dic-tionary.Each such representation is converted into a vector by standard text indexing techniques.4.A binary text classifier is trained on the terms in S0p [S0nandthen applied to the terms in the test set.5.4.Analysis by both lexical relations and glossesThis method determines sentiment of words and phrases both relies on lexical relations(synonymy,antonymy and hyponymy) and glosses provided in WordNet.Andreevskaia and Bergler(2006)proposed an algorithm named ‘‘STEP”(Semantic Tag Extraction Program).This algorithm starts with a small set of seed words of known sentiment value(positive or negative)and implements the following steps:1.Extend the small set of seed words by adding synonyms,ant-onyms and hyponyms of the seed words supplied in WordNet.This step brings on average a5-fold increase in the size of the original list with the accuracy of the resulting list comparable to manual annotations.2.Go through all WordNet glosses,identifies the entries that con-tain in their definitions the sentiment-bearing words from the extended seed list,and adds these head words to the corre-sponding category–positive,negative or neutral.3.Disambiguate the glosses with part-of-speech tagger,and elim-inate errors of some words acquired in step1and from the seed list.At this step,it alsofilters out all those words that have been assigned contradicting.In this algorithm,for each word we need compute a Net Overlap Score by subtracting the total number of runs assigning this word a negative sentiment from the total of the runs that consider it posi-tive.In order to make the Net Overlap Score measure usable in sen-timent tagging of texts and phrases,the absolute values of this score should be normalized and mapped onto a standard[0,1] interval.STEP accomplishes this normalization by using the value of the Net Overlap Score as a parameter in the standard fuzzy mem-bership S-function(Zadeh,1987).This function maps the absolute values of the Net Overlap Score onto the interval from0to1,where 0corresponds to the absence of membership in the category of sentiment(in this case,these will be the neutral words)and1re-flects the highest degree of membership in this category.The func-tion can be defined as follows,Sðu;a;b;cÞ¼0if u6a2uÀac a2if a6u6b1À2uÀacÀa2if b6u6c1if u P c8>>>>>><>>>>>>:ð6Þwhere u is the Net Overlap Score for the word and a,b,c are the three adjustable parameters:a is set to1,c is set to15and b,which represents a crossover point,is defined as b=(a+c)/2=8.Defined this way,the S-function assigns highest degree of membership (=1)to words that have the Net Overlap Score u P15.Net Overlap Score can be used as a measure of the words degree of membership in the fuzzy category of sentiment:the core adjec-tives,which had the highest Net Overlap Score,were identified most accurately both by STEP and by human annotators,while the words on the periphery of the category had the lowest scores and were associated with low rates of inter-annotator agreement.5.5.Analysis by pointwise mutual informationThe general strategy of this method is to infer semantic orienta-tion from semantic association.The underlying assumption is that a 
phrase has a positive semantic orientation when it has good associations (e.g., "romantic ambience") and a negative semantic orientation when it has bad associations (e.g., "horrific events") (Turney, 2002).
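A minimal sketch of Turney-style SO-PMI scoring under the assumption that (co-)occurrence counts, for example from corpus statistics or search-engine hit counts, are already available; `hits` is a hypothetical lookup function and the paradigm word lists are illustrative.

```python
import math

def so_pmi(phrase, hits, total, pos_words=("excellent",), neg_words=("poor",)):
    """Semantic orientation of a phrase via pointwise mutual information.
    hits(x) / hits(x, y): assumed (co-)occurrence counts; total: corpus size."""
    def pmi(a, b):
        p_ab = (hits(a, b) + 0.01) / total          # smoothing avoids log(0)
        p_a, p_b = hits(a) / total, hits(b) / total
        return math.log2(p_ab / (p_a * p_b + 1e-12))
    pos = sum(pmi(phrase, w) for w in pos_words)
    neg = sum(pmi(phrase, w) for w in neg_words)
    return pos - neg   # > 0 suggests a positive orientation, < 0 a negative one

# usage sketch: so_pmi("romantic ambience", hits=my_count_lookup, total=1_000_000)
```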
GLOBAL MOTION COMPENSATION FOR VIDEO PICTURES

Patent title: GLOBAL MOTION COMPENSATION FOR VIDEO PICTURES
Inventors: SJOEBERG, RICKARD; EINARSSON, TORBJOERN; FROEJDH, PER
Application number: SE0202206; filing date: 2002-11-29
Publication number: WO03047268A3; publication date: 2003-10-09
Applicant: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Abstract: A system and a method for coding and decoding video data are disclosed. In a system and method of video data compression, a video frame (32) is divided into a sequence of image blocks (38), wherein one of several possible block-coding modes is an implicit global motion compensation (IGMC) mode, which is used to copy pixels from a previous frame (32) displaced by a predicted motion vector. In another embodiment of the invention, in a system and method of video data compression, a video frame (32) is segmented into a sequence of slices (36), wherein each slice (36) includes a number of macroblocks (38). Respective slices (36) are encoded, and a signal is included in the header (44) of an encoded slice (40) to indicate whether the slice (40) is GMC-enabled, that is, whether global motion compensation is to be used in reconstructing the encoded slice. If so, GMC information, such as information representing a set of motion vectors (42a-42d), is included with the slice. In a useful embodiment, each slice (36) of a frame (32) contains the same GMC information, to enhance resiliency against errors. In another embodiment, different slices (36) of a frame (32) contain different GMC information. In either embodiment, motion vectors (42a-42d) for each image of a particular encoded slice (40) can be reconstructed using GMC information contained only in that particular encoded slice.
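A minimal, hypothetical sketch of the GMC prediction step the abstract describes, i.e. copying pixels from the previous frame displaced by a motion vector, enabled per slice; this is illustrative pseudo-decoding, not the patent's or any video standard's actual bitstream syntax.

```python
import numpy as np

def copy_block_with_motion(prev_frame, top_left, block_size, motion_vector):
    """Predict one block by copying pixels from the previous frame, displaced."""
    y, x = top_left
    dy, dx = motion_vector
    h, w = prev_frame.shape[:2]
    rows = np.clip(np.arange(y + dy, y + dy + block_size), 0, h - 1)
    cols = np.clip(np.arange(x + dx, x + dx + block_size), 0, w - 1)
    return prev_frame[np.ix_(rows, cols)]

def reconstruct_slice(prev_frame, block_positions, gmc_enabled, gmc_motion=(0, 0), block_size=16):
    """If the slice header signals GMC, every block of the slice is predicted
    using only the GMC information carried in that slice."""
    if not gmc_enabled:
        raise NotImplementedError("normal per-block decoding path not shown in this sketch")
    return [copy_block_with_motion(prev_frame, pos, block_size, gmc_motion)
            for pos in block_positions]
```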
Video Special Effects (translated foreign-language literature)

Video Special EffectsPeng HuangObject-space NPARMeier was the first one who produced painterly animations from object-space scenes[53].He triangulated surfaces in object-space and distributed strokes over each triangle in proportion to its area. Since his initial work, many object-space NPAR systems have been presented. Hall’s Q-maps[29](A Q-map is a 3D texture which adapts to the intensity of light to give the object in the image a 3D look, for example, more marks are made where an object is darker) may be applied to create coherent pen-and-ink shaded system capable of rendering object-space eometries in a sketchy style was outlined by Curtis[15], and operates by tracing the paths of particles traveling stochastically around contours of a depth image generated from a 3D object. See Figure for some addition, most modern graphical modeling packages(3D Studio MAX!,Maya,XSI Soft-Image) support plug-ins which offer the option of rendering object-space scenes to give a flat shaded, cartoon-like appearance. Image-space NPARMost NPAR systems in image-space are still based on static painterly rendering techniques,brushing strokes frame by frame and trying to avoid unappealing swimming which distractsthe audience from the content of the animation. Liwinowicz extends his static method and makes use of optical flow to estimate a motion vector field to translate the strokes painted on the first frame to successive frames[47]. A similar method is employed by Kovacs and Sziranyi[42]. A simpler solution is proposed by Hertzmann[33], who differences consecutive frames of video, re-painting only those areas which have changed above some global(userdefined) threshold. Hays and Essa’s approach[32] builds on and improves these techniques by using edges to guide painterly refinement. See Figure for some examples. In their current work, they are looking into studying region-based methods to extend beyond pixels tocell-based renderings, which implies the trend from low-level analysis to higher-level scene understanding.We also find various image-space tools which are highly interactive to assist users in the process of creating digital non-photorealistic animations. Fekete et al. describe a system[23] to assist in the creation of line art cartoons. Agarwala proposes an interactive system[2] that allows children and others untrained in “cel animation” to create 2D cartoons from images and video. Users have to hand-segment the first image, and active contours(snakes) are used to track the segmentation boundaries from frame to frame. It is labor intensive(usersneed to correct the contours every frame), unstable(due to susceptibility of snakes to local minima and tracking fails under occlusion) and limited to video material with distinct objects and well-defined edges. Another technique is called “advanced rotoscoping” by the Computer Graphics community, which requires artists to draw a shape in key-frames, and then interpolate the shape over the interval between key-frames – a process referred to as“in-betweening” by animators. The film “Waking Life”[26] used this technique. See Figure for some examples.NPAR techniques in image-space as well as commercial video effects software, such as Adobe Premier, which provide low-level effects( slow-motion, spatial warping, and motion blur etc.), fail to do a high-level video analysis and are unable to create more complicated visual effects(. motion emphasis). Lake et al. 
present techniques for emphasizing motion of cartoon objects by introducing geometry into the cartoon scene[43]. However, their work is limited to object-space, avoiding the complexhigh-level video analysis, and their “motion lines” are quite simple. In their current work, they are trying to integrate other traditional cartoon effects into their system. Collomosse and Hall first introduce high-level Computer Vision analysis to NPAR in “VideoPaintbox”[12]. They argue that comprehensive video analysis should be the first step in the artistic rendering(AR) process; salient information(such as object boundaries or trajectories) must be extracted prior to representation in an artistic style. By developing novel Computer Vision techniques for AR, they are able to emphasize motion using traditional animation cues[44] such as streak-lines, anticipation and deformation. Their unique contribution is to build a video based NPR system which can process over an extended period of time rather than on a per frame basis. This advance allows them to analyze trajectories, make decisions regarding occlusions and collisions and do motion emphasis. In this work we will also regard video as a whole rather than the sum of individual frames. However, their segmentation in “computer vision component” suffers labor intensity, since users have to manually identify polygons, which are “shrink wrapped” to the feature’s edge contour using snake relaxation[72] before tracking. And their tracking is based on the assumption that contour motion may be modeled by a linear conformal affine transform(LCAT) in the image plane. We try to use a more automatic segmentation and non-rigid region tracking to improve the capability of video analysis. See Figure for some examples. Another high-level video-based NPAR system is provided by Wang et al.[69]. They regard video as a 3D lattice(x,y,t) and then implement spatio-temporal segmentation of homogeneous regions using mean shift[14] or improved mean shift[70] to get volumes of contiguous pixels with similar colour. Users have to define salient regions by manually sketching on key-frames and the system thus automatically generates salient regions per frame. This naturally build the correspondence between successive frames, avoiding non-rigid region tracking. Their rendering is based on mean shift guided interpolation. The rendering style is limited to several approaches, such as changing segment colour and placing strokes and ignores motion analysis and motion emphasis. Our system segments key-frame using 2D mean shift, identifies salient regions and then tracks them over the whole sequence. We extract motion information from the results and then do motion emphasis. See Figure for some examples. Some other NPAR techniquesBregler et la present a technique called “cartoon capture and retargeting” in [7] which is used to track the motion from traditional animated cartoon and then retarget it onto different output media, including 2D cartoon animation, 3D CG models, andphoto-realistic output. They describe vision-based tracking techniques and new modeling techniques. Our research tries to borrow this idea to extract motion information, from general video rather than a cartoon, using different computer vision algorithms and then represent this in different output media. 
See Figure for some examples.

NPR application on Sports Analysis
Due to an increasingly competitive sports market, sports media companies try to attract audiences by providing ever more special or more specific graphics and effects. Sports statistics are often presented graphically on TV during sporting events, such as the percentage of time in a soccer game that the ball has been in one half compared to the other. These statistics are collected in many ways, both manually and automatically. It is desirable to be able to generate many statistics directly from the video of the game, and there are many products in the broadcast environment that provide this capability. The Telestrator [78] is a simple but efficient tool that allows users to draw lines within a 2D video image using a mouse. The product is sold as a dedicated box with a touch screen and a video input, and it outputs the video with the graphics produced. Typically, four very simple controls, such as draw arrow and draw dotted line, are provided.

Video Special Effects, Huang Peng (translation)
Meier was the first to create a painterly three-dimensional feel: he rendered objects proportionally onto three-dimensional surfaces to achieve a sense of depth on a two-dimensional medium. Following his pioneering work, many object-space NPAR systems have been developed.
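A minimal sketch of the frame-differencing repaint strategy attributed to Hertzmann earlier in this excerpt (repaint only areas that change above a global threshold between consecutive frames); the threshold and morphology choices are assumptions for illustration.

```python
import cv2
import numpy as np

def repaint_mask(prev_frame, cur_frame, threshold=25):
    """Mask of pixels whose change exceeds a global threshold; only these
    areas would receive new strokes when painting the next frame."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(cur_gray, prev_gray)
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    # close small holes so strokes are repainted over coherent regions
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```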
VIDEO CONTENT SEARCHES IN A MEDICAL CONTEXT

Abstract: A method of searching video content for specific subject matter of interest augments traditional image data analysis methods with analysis of contemporaneously gathered non-image data. One application involves video recordings of medical procedures performed using a medical device. When a user desires to locate portions of video recordings showing a medical event of interest, one or more medical device system events likely to correspond to the occurrence of the medical event of interest are identified from one or more procedure event logs. Timestamps associated with these system events are used to identify candidate video clips from the procedure video recordings. Image data analysis is performed on the candidate video clips to determine whether each candidate video clip contains the medical event of interest.
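A minimal sketch of the timestamp-matching step described in the abstract: given device event-log entries and a set of system events associated with the medical event of interest, select candidate clip windows around each matching timestamp. The field names and clip window lengths are illustrative assumptions.

```python
from datetime import datetime, timedelta

def candidate_clips(event_log, system_events_of_interest,
                    pre=timedelta(seconds=30), post=timedelta(seconds=60)):
    """Return (start, end) windows in the procedure video that may show the
    medical event of interest, based on device system-event timestamps."""
    clips = []
    for entry in event_log:   # e.g. {"event": "stapler_fired", "time": datetime(...)}
        if entry["event"] in system_events_of_interest:
            clips.append((entry["time"] - pre, entry["time"] + post))
    return clips

# Image-data analysis would then run only on these candidate clips to confirm
# whether each one actually contains the medical event of interest.
log = [{"event": "stapler_fired", "time": datetime(2023, 5, 1, 10, 15, 0)}]
print(candidate_clips(log, {"stapler_fired"}))
```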
Chinese-English Glossary of Film Terminology

电影专业术语中英文对照AAbove-the-line 线上费用A-B roll A-B 卷AC 交流电Academy ratio 学院标准画框比Adaptation 改编ADR editor ADR 剪辑师Aligator clamp=gaffer grip 固定灯具的弹簧夹,又称鳄鱼夹Ambient Sounds 环境音Amp 安培Amplification 信号放大Amplitude 振幅Analog 模拟Anamorphic lens 变形镜头Aperture 光圈Answer Print 校正拷贝Arc 摄像机的弧度运动Art director 艺术指导Aspect ratio 画框比Atmosphere sound 气氛音Attack 起音Audio board 调音台Audio mixer 混音器,混音师Automatic dialogue replacement 自动对白补录Automatic focus 自动对焦Automatic gain control 自动增益控制Automatic iris 自动光圈(少用)Axis of action 表演轴线BBack light 轮廓光Background light=Scenery light 场景光Balance 平衡Balanced 平衡电缆Barndoor 遮扉(灯具上的黑色金属活动板,遮光用的)Barney 隔音套Base 片基(用来附着感光乳剂的胶片基底)Base plate 底座(固定灯的)Baselight level 基本亮度Batch capturing 批次采集Below-the-line 线下费用Bidirectional 双向麦克风Bit depth 位元深度(在数字声音中每次取样的数目,通常是8,12,16) Bit player 客串演员Blimp 隔声罩Blocking 走位(几乎等同为“ Staging 排练”)Blue-screen 蓝幕效果Body brace 身体支架Boom 吊杆(也有上,下移动神相机的意思)Boom arm 可伸缩杆Boom operator 吊杆操作员Bounce light 反射光Bracket 括号式曝光Breakdown 剧本分解Brightness 亮度Broad 方灯(泛光灯)Butterfly scrim 顶光棚CCall sheet 通告表Camcorder 摄录一体机Cameo 浮雕Camera assistant 摄影助理Camera operator 摄影机操作者Camera report 摄影机报表Canted shot 倾斜镜头Capacitor 电容器Carpuring 采集Casting 选角Casting director 选角指导Chroma key 色度键Chrominance 彩色信号Cinematographer 电影摄影师Clip capture 素材段落采集Close-up 特写镜头Color balance 色平衡Colorbars 彩条Color circle 圆形彩图Color corrector 色彩校正器Color compensating filter 色温转换滤镜Color correction 色彩校正Color sampling 色彩取样Color temperature 色温Colorization 调色Coproducer 联合制片Composer 作曲家Composting 合成画面Compression 压缩Compression ratio 压缩比Concept 故事梗概Condenser 电容话筒Conductor 指挥家Conform 套片Construction coordinator 搭景协调Continuity 连续性Continuity editing 连续性剪辑Contrast 对比度Contrast range 最大对比度Contrast ratio 对比度系数Copyright 版权Core 片心Costumer 服装师Costume designer 服装设计师Countdown 倒计时Craft service 膳食服务Crane 摇臂Cross-cutting 交叉剪辑(又称平行剪辑)Cross-fade 交互混音Crystal 晶控同步元件(摄像机里控制速度的一个晶体组件)Cue sheet 混录提示表Cut 卡接(一个画面切换到另一个画面,有一种转场的效果)Cutaway 切离镜头Cut-in 切入镜头Cutting-on-action 剪辑中的“接动作”DD1 一种数字录像带Dailies 工作样片Daily production report 每日拍摄报表Day-for-night 日间拍夜景Daylight 日光片Decay 衰减Decibel 分贝Decompression 解压缩Depth of field 景深Dialogue 对白Diegetic sound 故事中的声音Diffusion filter 扩散滤镜Digital 数字Digital autio workstation 数字声音工作站Digital audiotape 数字录音带Digital film printer 数字胶片印片机Digitalize 数字化Dimmer board 调光控制器Diopter 屈光镜Direct capture 直接采集Direct sound 直接声音Directionality 方向性Diretor 's cu导t 演剪辑版本Dissolve 叠化Distortion 失真Dolly 轮组Dot 圆形遮光罩Dropouts 断磁(又叫脱音)Dynamic mic 动圈话筒Dynamic range 动态变化范围EEcho 回音Edge numbers 边缘号码(胶片边缘的数字)Editor ' s cu剪t 辑师版本8mm 8 毫米胶片Electromagnetic spectrum 电磁波谱Electronic stabilizer 电子影像稳定器Emulsion 感光乳剂Equalization 均衡Establishing shot 定位镜头Executive producer 执行制片Exposure index 曝光指数Extendable lighting pole 伸缩灯杆Extras 群众演员Eye light 眼神光Eyeline match 视线匹配FFade in 淡入Fade out 淡出Fast film stock 感光快的胶片Fast lens 强光透镜Field 场Fill light 辅助光Filter 滤镜Filter box 滤镜斗Filter factor 滤镜系数Filter wheel 滤镜转轮Final cut 精剪,最后一个剪辑的版本(同名剪辑软件)Finger 狭长挡光板Firewire 火线接头Fishpole 鱼竿(传声器吊杆)5.1 sound 5.1 环绕声Fixed lens 定焦镜头Flag 旗形挡光板Flashback 闪回Flash cut 闪切Flashforward 闪后Flat 平线Flatbed 平台式剪辑机Floodlight 散光Floor stand 落地式麦克风支架Fluid head 液压云台Flying spot scanner 飞点扫描仪Focal length 焦距Focus 调焦Fog filter 雾镜Foley 特殊音效Foley artist 特殊音效师Foley editor 特殊音效剪辑师Foley mixer 特殊音效混音师Foley walkers 特殊音效师Font 字体Footcandle 尺烛光Forced development 增感冲洗Formats 格式Frame 格(画框)Freeze-frame 定格Frequency 频率Frequency response 频率响应Fresnel spotlight 一种有透镜镜片的聚光灯Friction head 摩擦云台Front focus 前焦点F-stops 光圈系数Fundamental 基音G Gaffer 电工Gaffer grip 灯光夹Gaffer tape 电工胶布Gain 增益Gelatin filter 明胶滤光片Graduated neutral density filter 中灰渐变滤镜Graphic designer 美术设计Grip 
场务HHandle 控制点Hairstylist 发型师Hard effects 动作音效Hard light 硬光,直射光Harmonics 谐音Head 云台,磁头Hertz 赫兹Hidden editing 隐性剪辑Hidden mics 隐藏式麦克风Hi-Fi 高保真音响High hat 仰摄座High-angle shot 俯拍镜头High-frequency fluorescent 高频荧光灯High-key lighting 高调光线Hiss 高频噪音HMI light 太阳灯Horizontal resolution 水平分辨率Hue 色相(颜色特定的色调)Hypercardioid 超指向性麦克风IImage enhancer 图像增强器Image stabilization 影响稳定Imaging device 成像装置Impedance 阻抗Incandescent light 白炽灯Incident light meter 入射式测光表Impoint 开始点Intercutting 交互剪接Interlaced scanning 隔行扫描Internegative 中间负片Interpositive 中间正片Inverse square rule 平方反比率Invisible editing 隐性剪辑Iris 光阑JJib-arm 悬吊手臂Jog 格放Jump cut 跳接KKelvin 开尔文数值Key light 主光Key grip 场务Keyframe 关键影格Keykode 片边号码Kicker light 侧光LLatitude 宽容度Lavaliere 领夹式麦克风Leader 导带Leadroom 导引空间Lens 镜头Lens hood 遮光罩Letterbox 上下黑框式Light meter 测光表Light ratio 光线比例Light stand 灯架Light-balancing filter 校色温滤镜Light crew 灯光组Light-emitting diode 发光二极管Lighting director 灯光指导Lighting plot 照明规划图Lighting ratio 光比Lighting technician 灯光师Line inputs 线材输入Line producer 现场制片人Linear 线性Linear CCD array 线状CCD 排列确切意思待进一步查阅)Location manager 场地管理经理Log 工作记录Long lens 长焦距镜头Long shot 远景镜头Look space 视线空间Loop 循环播放Lossless 无损压缩Low-angle shot 仰拍镜头Low-Contrast filter 低反差滤镜Low-key lighting 低调照明Luminance 亮度MMacro 近摄摄影Magnetic film 磁性声带Makeup artist 化妆师Master scene script 分镜头剧本Master scene shooting method 主场景拍摄法Master shot 主镜头Match cut 匹配剪辑Matte artist 套色绘图师Matte box 遮光斗Matting 套片合成Meditum shot 中景镜头Mic inputs 麦克风输入接口Midside miking M-S 拾音制式Miniature designer 缩小物设计师Mini-DV 迷你DV Miniphone plug 小型耳机Mise-en-sc nèe 场面调度Mixed lighting 混合照明Mixer 混音师Model maker 模型师Modulate 调制Monaural 单声道Monitor 监视器Montage editing 蒙太奇剪辑Morphing 变形Motif 母题Motion capture 运动扑捉Multimedia 多媒体Multitrack audiotape recorder 多轨录音机Music editor 配乐剪辑师Musical Instrument Digital Interface 数字音乐乐器界面Music(MIDI)supervisor 音乐总监NNegative 负片Negative cutter 负片剪辑师Neutral density filter 中灰滤镜Night-for-night 夜间拍夜景Noise 噪波(干扰信号)Nonlinear editing 非线性剪辑(非线编)Normal lens 标准镜头Noseroom 鼻前空间(视线空间)Off-line 脱机剪辑Offline editor 线外剪辑师Off-screen space 银幕外空间Ohms 欧姆Omnidirectional 全方向型麦克风180-degree rule 180°规则,又称轴线规则On-line 在线剪辑Online editor 线上剪辑师Opacity 暗度On-set editing 现场剪辑On-set editor 现场剪辑师Open casting 公开选角Optical printer 光学印片机Optical stabilization 光学影像稳定器Outpoint 结束点Outttakes 备用镜头Overlap action 重叠表演Overlap cutting 重叠剪接Overmodulation 过度调制Over-the-shoulder shot 过肩镜头Overtones 泛音Over-under method 上下方式PPan 横移Pan and scan 横摇纵摇法Parabolic reflector 抛物面反射器Parallel editing 平行剪辑Peaking in the red 指针在红色区域Pedestal 基准黑Perspective 距离感Phase 相位Phone plug 耳机Phono play 莲花接头Photoflood 摄影散光灯Pickup pattern 拾音范围Pitch 音高Pixel 像素Playhead 播放点Point-of-view shot 主观镜头Polarizing filter 偏光滤镜Positive 正片Postproduction 后期制作Practical light 实际光源Premix 预先混音Preproduction 前期制作Preroll 预卷Presence 表现力Prime lens = Fixed lens 定焦镜头Principal actors 主要演员Prism block 棱镜Producer 制片人Production placement 置入行销(就是在影片中植入广告)Production 拍摄阶段Production assistant 制片助理Production coordinator 协调制片Production designer 制作设计师Production manager 制片主任Production schedule 制作日程表Production sound mixer 现场混音师Progressive scanning 逐行扫描Property master 道具管理Props 道具Proximity effect 听讲效果Piblic domain 版权公有Publicist 宣传人员Pulling focus 移焦Pushed processing 增感冲洗Pyro technician 烟火技师QQuartz-halogen lamp 石英灯RRack focus 移焦Radio frequency 射频(简称RF )Random assess 随机存取Random-assess memory 内存RCA plug = Phone plug 莲花插头Reaction shot 反映镜头Read through 口念对白Reel 片卷Reflected light meter 反射式测光表Reflector 反光板Reframing 调整取景范围Release 释放Release print 发行拷贝Render 渲染Resolution 分辨率Reverberation 混响Reversal 反转片Rigging grip 吊具场务Ripple 涟漪效果Room tone 空间音Rough cut 粗剪Rule of 
thirds 三分规则SSafe area 安全区域(银幕的概念里恒定成像的区域)Sampling rate 取样频率Saturation 饱和度Scanning 扫描Scene 场景Scene outline 分场大纲Scenic artist 布景工S-connector S 端子(一种电子影像接头)Scratch dish 第二空间(非线编系统中的一个可命令分类项目的空间)Scratch track 参考音轨Screen direction 银幕方向Screenplay 剧本Screen test 试镜Scrim 柔光布Script breakdown 剧本分解Script breakdown sheet 剧本分解表Script supervisor 场记Scrubber 处理器Second unit 第二摄制组Separation light 修饰光Sequence 段落Set decorator 场景装饰工Serifs 横细线Server 伺服器70mm 70 毫米胶片Set decorations 布景装饰Set designer 布景设计师Shock mount 减震架Shooting schedule 拍摄日程表Shooting script 分镜头剧本Short lens 广角镜头Shot 镜头Shotgun mic 枪式麦克风,一般用来采集比较远的声音Shot/reaction shot 反应镜头(偶尔也称脸部特写镜头)Shot/reverse shot 反打镜头Shutter 快门Shuttle 飞梭Signal-to-noise ratio 信噪比Silhouette lighting 剪影光线Single system 单系统16mm 16 毫米胶片Sky filter 天空滤镜Slate 拍板Slow film stock 感光慢的胶片Slow lens 小相对孔径透镜/小光强透镜SMPTE time code SMPTE 时间码Snoot 圆锥形光罩Soft light 柔光/ 散射光Soft-contrast filter 软调反差滤镜Softlight reflector 柔光反射器Software engineer 美术设计软件工程师Sound designer 声音设计师Sound editor 声音剪辑师Sound effects 音效Sound effects editor 音效剪辑师Sound effects mixer 音效混音师Sound flashback 声音闪回Sound flashforward 声音闪前Sound mix 混音Source lighting 有源光线Space clamp 高空夹Spatial compression 空间压缩Special assistant 特别助理Special effects 特效Special effects coordinator 特效协调者Special-effects generator 特效生成器Speech bump 语音衰减Split screen 分割画面Spot 聚光Spot meter 点式测光表Spotlight 聚光灯Spotting sheet 注记表Sprocket holes 片孔Standard definition television 标准清晰度电视Standby 待机Stand-ins 替身Star filter 星型条纹滤镜Steadicam 斯坦尼康Stereo 立体声Still frame 定格Still photographer 剧照师Storage area network 存储局域网络Storyboard 分镜故事板Storyboard artisit 分镜表绘图师Stream 串流Stripboard 提示板Stunt coordinator 特技指导Stunt people 特技演员Super 8 超8 毫米胶片Super 16 超16 毫米胶片Supercardiord 超指向性麦克风Supporting role 配角Surround sound 环绕声Sustain 持续Sweetening 调音Switcher 视频混合处理器Sync 同步信号Synchronous sound 同期声TTable stand 台式麦克风架Take 拍摄次数Technical rehearsal 技术彩排Telecine operator 影视格式转换员Telephoto lens 长焦镜头Template 模板30-degree rule 30°原则35mm 35 毫米胶片3-D computer artist 3-D 电脑立体绘图师Three-point lighting 三点式布光Three-to-one rule 三比一规则Three-two pulldown 三二抓片法(将每秒24 格的胶片要转换成每秒的电子30 个画框影像的过程)Threshold of pain 120 分贝的声音(声音达到刺耳程度的临界点)Thumbnail 缩略图Tilt 纵摇(摄像机垂直上下摇摄)Timbre 音色Time code 时间码Time code generator 时间码生成器Time code reader 时间码读取器Timeline 时间轴Timing 调光Timing sheet 时间表Title designer 标题设计师Tonality 色调Tone 基准音Track 横轨Trailer 预告片Transitions 镜头转换Treatment 文学脚本Trimming 修整Tripod 三角架Trombone 吊灯栓Truck 横移T-stop t 光圈Tungsten 钨丝灯24P 高分辨率的数字电子影像格式UUltracardiord 锐心型麦克风/超指向性麦克风Ultraviolet filter 紫外线滤镜U-Matic 标准3/4 英寸录像带Umbrella reflector 反光伞Unbalanced 不平衡式的Undo 恢复原状Unit production manager 剧组制片主任Universal clamp 万能夹Upright 直立式剪辑机Utility person 剧务VVariable focal length lens = zoom lens 变焦镜头Vectorscope 矢量示波器Velocity 速率Vertical resolution 垂直分辨率Videographer 电子影像摄影师Video assist 录像辅助系统Video-on-demand 随选视频Viewfinder 观景器Virtual set 虚拟场景Visual effects 视觉效果Visual effects editor 视觉特效师Voice-over 旁白Volt 伏特Volume unit meter 音量表WWalla walla 背景人声Watt 瓦特Waveform 波形Waveform monitor 波形监视器White balance 白平衡White reference 白基准Wide-angle lens 广角镜头Widescreen 宽银幕Wild sound 自然音Window dub 视窗剪辑Windscreen 防风罩Wipe 划像Wireless mic 无线麦克风Workprint 工作正片Writer 编剧XXLR connector 卡侬插头X-Y miking X-Y 拾音制式ZZebra stripe 斑马条纹Zoom 变焦Zoom lens 变焦镜头Zoom mic 指向性可调麦克风附录:一些基本电影词汇documentary 记录片,文献片filmdom 电影界literary film 文艺片musicals 音乐片comedy 喜剧片tragedy 悲剧片dracula movie 恐怖片sowordsmen film 武侠片detective film 侦探片ethical film 伦理片affectional film 爱情片erotic film 黄色片western movies 西部片film d' avan-tgarde 前卫片serial 系列片trailer 预告片cartoon (film) 卡通片,动画片footage 影片长度full-length film, feature film 长片short(film) 
短片colour film 彩色片(美作:color film) silent film 默片,无声片dubbed film 配音复制的影片,译制片silent cinema, silent films 无声电影sound motion picture, talkie 有声电影cinemascope, CinemaScope 西涅玛斯科普型立体声宽银幕电影幕电影cinerama, Cinerama 西涅拉玛型立体声宽银幕电影,全景电影title 片名original version 原著dialogue 对白subtitles, subtitling 字幕credits, credit titles对原作者及其他有贡献者的谢启和姓名telefilm 电视片演员actors Starring cast阵容film star, movie star 电影明星star, lead 主角double, stand-in 替身演员stunt man 特技替身演员extra, walker-on 临时演员character actor 性格演员regular player 基本演员,变形镜头式宽银extra 特别客串film star 电影明星film actor 男电影明星film actress 女电影明星support 配角util 跑龙套工作人员technicians adapter 改编scenarist, scriptwriter 脚本作者dialogue writer 对白作者production manager 制片人producer 制片主任film director 导演assistant director 副导演,助理导演cameraman, set photographer director of photography assistant cameraman 摄影助理property manager, propsman 道具员art director 布景师(美作:set decorator) stagehand 化装师lighting engineer 灯光师film cutter film editor 剪辑师sound engineer, recording director 录音师script girl, continuity girl 场记员scenario writer, scenarist 剧作家放映projection reel, spool (影片的)卷,本sound track 音带,声带showing, screening, projection 放映projector 放映机projection booth, projection room 放映室panoramic screen 宽银幕film industry 电影工业cinematograph 电影摄影机, 电影放映机cinema, pictures 电影院(美作:movie theater) first-run cinema 首轮影院second-run cinema 二轮影院art theatre 艺术影院continuous performance cinema 循环场电影院film society 电影协会,电影俱乐部(美作:film摄影师club) film library 电影资料馆premiere 首映式film festival电影节distributor 发行人Board of Censors 审查署shooting schedule 摄制计划censor ' c s ertificate 审查级别release 准予上映banned film 禁映影片A-certificate A 级(儿童不宜) U-certificate U 级X-certificate X 级(成人级) direction 导演production 制片adaptation 改编scenario, screenplay, script 编剧scene 场景exterior 外景lighting 灯光shooting 摄制to shoot 拍摄dissolve 渐隐,化入,化出fade-out 淡出fade-in 淡入special effects 特技slow motion 慢镜头editing, cutting 剪接montage 剪辑recording, sound recording 录音sound effects 音响效果mix, mixing 混录dubbing 配音postsynchronization 后期录音合成studio 制片厂,摄影棚(motion)film studio 电影制片厂set, stage, floor 场地properties, props 道具dolly 移动式摄影小车spotlight 聚光灯clapper boards 拍板microphone 麦克风,话筒boom 长杆话筒scenery 布景电影摄制filming shooting camera 摄影机shooting angle 拍摄角度high angle shot 俯拍long shot 远景full shot 全景close-up, close shot 特写,近景medium shot 中景background 背景three-quarter shot 双人近景pan 摇镜头frame, picture 镜头still 静止double exposure 两次曝光superimposition 叠印exposure meter 曝光表printing 洗印影片类型films typesfilm, motion picture 影片,电影(美作:movie) newsreel 新闻片,纪录片• 电影名词解释| 电影名词解释(中英文对照)ABERRATION 像差摄影影头因制作不精密,或人为的损害,不能将一点所发出的所有光线聚焦于底片感光膜上的同一位置,使影像变形,或失焦模糊不清。
Stephan Richter, Gerald Kühne, Oliver Schuster
University of Mannheim, Germany
{richter, kuehne, schuster}@informatik.uni-mannheim.de

ABSTRACT
The recognition of objects that appear in a video sequence is an essential aspect of any video content analysis system. We present an approach which classifies a segmented video object based on its appearance (object views) in successive video frames. The classification is performed by matching curvature features of the contours of these object views to a database containing preprocessed views of prototypical objects, using a modified curvature scale space technique. By integrating the results of a number of successive frames and by using the modified curvature scale space technique as an efficient representation of object contours, our approach enables robust, tolerant, and rapid classification of video objects.

Keywords: video content analysis, video object recognition, shape analysis, curvature scale space

1 INTRODUCTION
The use of video as an information medium has become commonplace. To provide access to the information contained in video data, appropriate content analysis and indexing methods are necessary. The distant goal of research in automatic content analysis of continuous media is to enable functionality like that already existing for textual information retrieval. Various methods covering different aspects such as shot boundary detection, scene determination, text extraction, and human face detection have been developed in this field.5,8,18,20,23

The recognition of objects that appear in a video sequence constitutes another essential part of any video content analysis system. In general, object recognition can be addressed at different levels of abstraction. For instance, an object might be classifiable as a "cat" (object class), as a "Siamese cat" (subordinate level), or as "my neighbour's cat" (individual object).

We present a contour-based approach to classifying a wide range of objects in video sequences on the object-class level. Curvature features of the contour of each two-dimensional appearance of an object in a video frame are calculated. These features are matched to those of views of prototypical video objects stored in a database. The final classification of the object is achieved by integrating the matching results over a number of successive frames. This adds reliability to our approach, since unrecognizable single object views occurring in the video sequence are insignificant with respect to the whole sequence.

The calculation of the contour description relies on the curvature scale space (CSS) method developed by Mokhtarian11-13 for shape-based image retrieval. We extended this technique for the processing of video sequences and enhanced it by extracting additional information from the CSS image and by developing a new matching algorithm.

Our approach is restricted by the assumption that a reliable segmentation of the object to be classified is available. Although the segmentation of common objects in arbitrary scenes is still beyond the capabilities of an artificial system, there exist a number of algorithms that succeed under constrained conditions.7,16

The remainder of the paper is organized as follows: Section 2 summarizes related work. Section 3 describes our recognition approach in detail. Experimental results appear in Section 4. Finally, Section 5 offers concluding remarks.

Figure 1. Extract from the two-dimensional views representing the object class "car".

2 RELATED WORK
Contour-analysis techniques have existed in computer science for some time now.
One of the first overviews of algorithms in the area of shape analysis was published by Pavlidis as early as 1978.17 He restricted his review to the analysis of "silhouettes", as he called the shapes and contours of two-dimensional objects. Already in 1978, Pavlidis noted that shape analysis is "an enormous subject". More than twenty years later, the subject is still enormous, many new approaches have been tried, and some progress has been made. Pavlidis mentioned that it seemed possible to develop rigorous mathematical algorithms that analyse shapes and provide results similar to human perception. Our research found that no straightforward mathematical metric can be found which models human perception with regard to shape analysis.

A more recent general survey of shape-analysis techniques was done by Loncaric.9 The survey contains a section which covers aspects of the human visual system. Some of the theories of visual forms from the field of psychology are introduced. Among them is the Gestalt theory, which assumes that form is perceived as a whole. Opposed to the Gestalt theory, several decomposition theories are mentioned. Nevertheless, the human visual system is not understood well enough to discard either the Gestalt theory or the decomposition theories. In the later sections of his paper, Loncaric introduces several shape-analysis techniques grouped into boundary scalar transform, boundary space domain, global scalar transform, and global space domain techniques. The method which we use is a boundary scalar transform technique.

A paper which compares human perceptual judgments with the results of seven different shape-matching algorithms was written by Scassellati, Alexopoulos and Flickner.19 They asked 40 volunteers to match 20 query shapes to a database of about 1,400 images taken from the QBIC3,14,15 project. The algorithm which came closest to the results obtained from the volunteers was the turning-angle approach, which basically describes in which direction one has to travel along the perimeter of a contour. Nevertheless, this approach was best in only seven out of 20 queries. These results support our statement that no straightforward mathematical metric can be found to model human shape perception.

One of the most recent works to present an overview and the state of the art in theory and practice of shape analysis is a book by Costa and Cesar.6 This book provides a good introduction to the subject, ranging from the basic mathematical concepts to the acquisition and preprocessing of shapes. Several concepts of shape representation and characterization are presented.

One of the more promising contour-analysis techniques is the CSS method introduced by Mokhtarian.1,2,11-13 The advantages of his method are that it is size and rotation invariant and robust to noise.
In the next section, we briefly describe the CSS method and outline the architecture of our video object recognition system.

3 VIDEO OBJECT CLASSIFICATION
Our system for object classification consists of two major parts: (1) a database containing contour-based representations of prototypical video objects and (2) an algorithm for matching objects extracted from a video sequence with the database. The object representation and the database are discussed in Section 3.1, the matching algorithm in Section 3.2.

3.1 Object Representation
In general, there are two ways to generate object-related information. First, one could extract features from a 3D object model; second, it is feasible to use two-dimensional views of an object, taken from different perspectives, as the basis. In cognitive psychology a number of theories have been developed with regard to object representation in the human brain.4,10,22 Although a general theory is not available, psychophysical evidence indicates that humans encode three-dimensional objects as multiple viewpoint-specific representations that are largely two-dimensional.21

We adhere to this theory and store for each object class a number of different two-dimensional views, so-called object views. Furthermore, we pool different views of different objects into one object class to obtain a reliable class definition. Figure 1 illustrates the object class car by depicting several object views from this class.

Of the views that can be generated from different perspectives, we prefer so-called canonical views. A canonical view shows an object in a perspective that is typical with respect to human perception and provides a sufficient number of object characteristics to allow for rapid recognition. For instance, for a car, one possible canonical view is a slightly elevated view of the frontal and side parts of the object (see Figure 1). A view of the bottom of a car is generally not considered canonical.

Different sources of information are available to characterize a two-dimensional view (e.g. contour, colour, texture, motion, or the relative location of the object to other objects). However, most common objects can be identified by their contours only.22 In our approach, for each object view a few parameters are extracted from its contour and stored in conjunction with the object view's class name in a database. The parameters are calculated using a modified curvature scale space (CSS) technique.

Figure 2. Construction of the CSS image. Left: (a)-(f) Smoothed contour after 10, 30, 100, 200 and 300 iterations. The small dots on the contour mark the curvature zero crossings. Right: Resulting CSS image.

3.1.1 Basic Curvature Scale Space Representation
The CSS technique1,2,11-13 is based on the idea of curve evolution, i.e. basically the deformation of a curve over time. The technique provides a multi-scale representation of the curvature zero crossings of a closed planar contour. A zero crossing occurs, for instance, at the transition from a convex to a concave contour segment. The contour is scanned iteratively for inflection points of the curvature while being smoothed by a Gaussian kernel. During the deformation process, zero crossings merge as transitions between contour segments of different curvature are equalized (see Figure 2). Consequently, after a certain number of iterations inflection points cease to exist and the shape of the closed curve is convex. Note that, due to the dependence on curvature zero crossings, convex object views cannot be represented with the CSS technique.
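The curve-evolution step described above can be made concrete with a short sketch. The following is only a minimal illustration, not the authors' implementation: it assumes the contour is given as two NumPy arrays of x and y coordinates sampled along the closed boundary, and the function names, the use of SciPy, and the small epsilon in the denominator are our own choices.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def _cyclic_derivative(a):
        # central difference on a closed (cyclic) sequence of samples
        return (np.roll(a, -1) - np.roll(a, 1)) / 2.0

    def curvature_zero_crossings(x, y, sigma):
        # Smooth a closed contour with a Gaussian of width sigma and return
        # the sample indices at which the curvature changes sign.
        xs = gaussian_filter1d(x, sigma, mode="wrap")   # "wrap" keeps the contour closed
        ys = gaussian_filter1d(y, sigma, mode="wrap")
        dx, dy = _cyclic_derivative(xs), _cyclic_derivative(ys)
        ddx, ddy = _cyclic_derivative(dx), _cyclic_derivative(dy)
        # curvature of a planar parametric curve
        kappa = (dx * ddy - dy * ddx) / (np.power(dx * dx + dy * dy, 1.5) + 1e-12)
        # indices i where the sign of the curvature differs between i and i+1 (cyclically)
        return np.where(np.sign(kappa) != np.sign(np.roll(kappa, -1)))[0]

Repeating this for increasing values of sigma corresponds to the iterative smoothing of the contour; the zero crossings returned at each scale are exactly the points plotted in the CSS image.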
From the positions of the zero crossings at different scales, a so-called CSS image is constructed. The CSS image shows the zero crossings with respect to their position on the contour and the width of the Gaussian kernel (or the number of iterations, see Figure 2). Therefore, significant contour properties that are visible for a large number of iterations result in high peaks in the CSS image, whereas areas with rapidly changing curvatures caused by noise produce only small local maxima.

To include size invariance in the CSS technique, we sample for each contour a fixed number of equidistant contour points (in our implementation we use 200 sample points). In many cases the peaks of the CSS image provide a robust and compact representation of an object view's contour.2,12,13 Note that a rotation of an object view in the image plane can be accomplished by shifting the CSS image left or right in the horizontal direction. Furthermore, a representation of a mirrored object view is obtained by mirroring the CSS image.

It is sufficient to extract the significant maxima (above a certain noise level) from the CSS image, i.e. for each maximum its position on the contour and its value (iteration or Gaussian kernel width) are selected. For instance, for the example depicted in Figure 2, only four data pairs have to be stored, assuming a noise level of 30 iterations.

3.1.2 Modified Curvature Scale Space Representation
A main drawback of the basic CSS technique described in the last section is the occurrence of ambiguities. Under certain conditions, shallow and deep concavities on a contour may result in peaks of the same height in the CSS image. Figure 3 depicts this problem: the shallow concavity of the object contour shown in (a) and the deep concavity of the object contour displayed in (b) result in peaks of nearly the same height (relative difference about 1%) in the CSS images (c). Consequently, certain contours that differ significantly in their visual appearance are claimed by the basic CSS technique to be similar.

Figure 3. Ambiguities in CSS images. (a) Shallow concavity: object view (left), contour (right). (b) Deep concavity: object view (left), contour (right). (c) Left: CSS image of (a); right: CSS image of (b).

Abbasi2 presented several approaches to avoiding these ambiguities. However, the proposed strategies raise the computational costs significantly. In our extension we utilize additional information already available in the CSS image. In addition to the height of a peak in the CSS image, we also extract the width at the bottom of the arc-shaped contour corresponding to the peak. As shown in Figure 3, the widths of shallow and deep concavities that create CSS maxima of the same height differ significantly (the relative width difference is about 80%). The width specifies the normalized arc-length distance of the two curvature zero crossings enframing the contour segment represented by the peak in the CSS image.

Let us summarise our approach to mapping prototypical video objects to database entries. Each prototypical object is represented by a collection of object views. For the object views, in turn, a number of data triples consisting of positions, heights, and widths of the CSS maxima are stored in the database. The matching algorithm described in the following section utilises this information to compare extracted video objects and prototypical video objects.
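Building on the zero-crossing function above, the (position, height, width) triples could be collected roughly as follows. This is again a hedged sketch rather than the authors' code: the resampling to a fixed number of points and the noise level of 30 iterations follow the text, while the function name, the tolerance parameter, and the simple pairing of vanished zero crossings stand in for a proper tracking of the merging arcs in the CSS image.

    def css_maxima(xs, ys, max_scale=400, noise_level=30, tol=3):
        # Describe a closed contour, already resampled to a fixed number of
        # equidistant points (e.g. 200), by (position, height, width) triples.
        # Positions and widths are given in normalized arc length.
        n = len(xs)
        prev = curvature_zero_crossings(xs, ys, sigma=1.0)
        maxima = []
        for scale in range(2, max_scale):   # the Gaussian width plays the role of the iteration count
            cur = curvature_zero_crossings(xs, ys, sigma=float(scale))

            def survives(z):
                d = np.abs(cur - z)
                return np.any(np.minimum(d, n - d) <= tol)

            # zero crossings of the previous scale without a counterpart at this scale
            vanished = np.array([z for z in prev if not survives(z)])
            # simplification: vanished crossings are assumed to merge in adjacent pairs
            for a, b in zip(vanished[0::2], vanished[1::2]):
                if scale > noise_level:                     # ignore small maxima caused by noise
                    maxima.append(((a + b) / (2.0 * n),     # position of the peak on the contour
                                   scale,                   # height: scale at which the arc closes
                                   (b - a) / float(n)))     # width between the enclosing zero crossings
            if len(cur) == 0:                               # the contour has become convex
                break
            prev = cur
        return maxima

Each object view in the database is then represented by its class name together with the list of triples returned by such a routine.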
3.2 Object Matching
Object matching is done in two steps. In the first, each individual object view of a sequence is compared to all object views in the database by comparing the peaks characterised by the stored triples. A list of the best matches is built for further processing. This first step is described in Section 3.2.1. In the second step, the results from the first step are accumulated and a confidence value is calculated. Based on the confidence value, the object class of the object in the sequence is determined. The second step is described in Section 3.2.2.

3.2.1 Object-based Matching
In order to find the object view in the database that is most similar to a query object from a sequence, a matching algorithm is needed. This algorithm is shown as Algorithm 1. The general idea of the algorithm is to compare the peaks in the two CSS images cm1 and cm2 to each other, based on the characterisation by the triples of height, position on the arc, and width. This is done by first determining the best position at which to compare the two images. It might be necessary to rotate or mirror one of the images so that the peaks are aligned best. Next, a matching peak is determined for each peak in cm1. If a matching peak is found, the Euclidean distance of the height and position of the peaks is calculated and added to the difference between the images. If no matching peak is found, the height of the peak in cm1 multiplied by a penalty factor is added to the total difference. Several hygiene factors exist which need to be met. For all peaks, the matched peak needs to be within a certain position and width range. Only for the highest peaks does the height also need to be within a certain range. The ranges are set via threshold parameters.

The matching algorithm has to take into account that the object from the image might be mirrored or rotated compared to the best match in the object-view database. Therefore, the matching algorithm needs to be executed multiple times until the best-matching position is found. A heuristic is used to shorten the execution time: only the most promising rotations are evaluated. These are determined by shifting the CSS images so that the highest peaks of the CSS images are aligned. As mentioned before, shifting the CSS image corresponds to a rotation of the original object. Since not all possible rotations of an object view are stored in the database, it is reasonable to compensate for this shortcoming during the matching process. The algorithm which determines the relevant shifting offsets is shown as Algorithm 2.

Algorithm 1 needs to be called several times to compensate for mirroring of the object in the sequence or of the object view in the database. The call which results in the lowest difference is used for further processing.

Algorithm 1 might return ∞ if, for example, the shift list is empty and therefore no adequate rotation could be found, or if the highest maxima in the CSS images do not match within a given tolerance range. If this is the case, the two objects are significantly different and a match is not possible. A clear rejection helps to improve the overall results of the matching algorithm, since object views which do not bear much resemblance to objects from a sequence are eliminated from further evaluation in this way.
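Mirroring and rotation can be handled outside the peak matching by transforming the CSS description of the query before each call. The following small sketch is our own illustration of that idea; mirror_css and best_difference are hypothetical names, and match_css_images stands for a matching function in the spirit of Algorithm 1 below.

    def mirror_css(maxima):
        # CSS description of the mirrored object view: reflect the peak
        # positions along the (cyclic, normalized) arc length.
        return [((1.0 - pos) % 1.0, height, width) for pos, height, width in maxima]

    def best_difference(query, prototype, match_css_images):
        # Call the peak matching for the plain and for the mirrored query and
        # keep the smaller difference; infinity signals a clear rejection.
        return min(match_css_images(query, prototype),
                   match_css_images(mirror_css(query), prototype))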
Algorithm 1. matchCssImages(cm1, cm2)
    mindifference := ∞
    for each offset in the shift list do
        difference := 0
        for each maximum m1 of cm1 do
            match found := false
            for each unvisited maximum m2 of cm2 do
                if the width of m2 is within T_width% of the width of m1
                   and the position of m2 is within the position tolerance of m1 then
                    match found := true; break
                fi
            od
            if match found then
                if m1 is one of the highest maxima
                   and the height of m2 is not within T_height% of the height of m1 then
                    difference := ∞                       (reject this shift offset)
                else
                    posdiff := position of m1 - position of m2
                    heightdiff := height of m1 - height of m2
                    difference := difference + sqrt(posdiff^2 + heightdiff^2)
                fi
            else
                difference := difference + penalty * height of m1
            fi
        od
        for each maximum m2 of cm2 that was not matched do
            difference := difference + penalty * height of m2
        od
        if difference < mindifference then
            mindifference := difference
        fi
    od
    return mindifference

Algorithm 2. Determination of the relevant shift offsets
    offsets := empty list
    for each of the highest maxima m1 of cm1 do
        for each of the highest maxima m2 of cm2 do
            if the widths of m1 and m2 differ by more than 20% then
                continue                                  (width is not in tolerance, ignore)
            fi
            add the position difference of m1 and m2 to offsets
        od
    od
    return offsets

3.2.2 Sequence-based Matching
Once the matching algorithm has been executed, a list of matches exists for each object view of the sequence. Each entry in this list contains the difference to an object view in the database and the object class of that object view. Only the top match, i.e. the object view with the smallest difference, is used for evaluation. The difference of the top match may be ∞; if so, no reasonable match could be found in the database. Since the database does not contain all possible object views, such a result may occur frequently, depending on the object in the sequence. A difference of ∞ therefore clearly indicates that no conclusive statement can be made about the class of the object from the sequence.

All top matches which were recognised are used for accumulation. In the accumulation process, the inverse difference for each object view of the sequence is added to an entry for the recognised object class. This procedure yields a list which contains one value for each object class. The sum of these values gives the total accumulated score for the processed sequence. Each entry in the list is divided by this total, resulting in a percentage. The object class with a percentage higher than 70% is considered to be the object class of the sequence. Higher percentages stand for better recognition rates. Examples of test runs can be found in Section 4.
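The accumulation rule translates directly into code. The sketch below assumes that the object-based matching has already produced the best (difference, class) pair for every frame; the function name, the handling of a zero difference, and the None return value are our additions, while the 70% threshold follows the text.

    import math
    from collections import defaultdict

    def classify_sequence(top_matches, threshold=0.70):
        # top_matches: one (difference, object_class) pair per frame, where an
        # infinite difference means that no prototype was accepted for that frame.
        # Returns the object class of the sequence, or None if no class exceeds 70%.
        scores = defaultdict(float)
        for difference, object_class in top_matches:
            if math.isinf(difference):
                continue                                         # frame gives no evidence
            scores[object_class] += 1.0 / max(difference, 1e-9)  # accumulate inverse differences
        total = sum(scores.values())
        if total == 0.0:
            return None
        for object_class, value in scores.items():
            if value / total > threshold:                        # share of the accumulated total
                return object_class
        return None

With this rule, a single unrecognizable frame only dilutes, but does not overturn, the evidence gathered from the remaining frames of the sequence.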
4 EXPERIMENTAL RESULTS
Our test database contains five object classes: animals, birds, cars, people, and miscellaneous objects. For each object class we collected 25-102 images from a clip-art library. The clip arts are typical representatives of their object class with easily recognizable perspectives (canonical views). See Figure 1 for examples of prototypes of the object class car.

The object class people contains the most objects (102 images). The contours of humans differ greatly in image sequences; for example, the position of the arms and legs has a great impact on the contour. To make recognition of a query object possible, it is necessary to have a great variety of human images in the database. Otherwise, Algorithm 1 will reject all images in the database and no results are obtainable.

The object class car is very well represented in the database, too. With a limited number of 48 cars, most perspectives and types are covered. The object classes birds, animals, and miscellaneous objects hold 25, 42, and 30 images, respectively.

The creation of the database with about 250 objects requires 30 seconds of computation time on a standard personal computer; thus 8 CSS images per second can be calculated. The database stores for each CSS image the name of the object class and the data of the relevant peaks (height, position, width).

Several short real-world video sequences were tested. Five sequences contain rigid objects (cars) and six sequences show non-rigid objects (people and one bird sequence). The results for these sequences are shown in Table 1. For each sequence the type of segmentation (automatic or manual) and the number of frames are given. The length of each sequence ranges from 2 to 8 seconds. When an object enters or leaves a scene, only parts of it are visible and its contour is heavily deformed. If this is the case, the threshold parameters reject all objects in the database: no match is possible. The column matched frames in Table 1 shows the number of frames where the matching to at least one object in the database was successful.

To match a sequence to the database, first the CSS features of the sequence need to be calculated. In a second step, the calculated CSS features of the sequence are matched to the precalculated CSS features of the object views stored in the database. The whole matching process can be done at 5 frames per second.

The first three people sequences were segmented automatically. Kim describes a segmentation method which calculates differences between a background image and the unknown frame based on edges.7 The sequence People–3 was automatically segmented using a level-set based method described by Paragios.16 The other sequences were segmented manually.

The sequence entitled People–1 is a talking-head scene with small changes between the different frames. In People–2, a human walks around and changes orientation towards the camera; only the upper part of the person is visible (see Figure 5 for sample images and best matches). In sequence People–3, a human runs from a great distance towards the camera.

Table 1. Results for the test sequences, listing for each sequence the type of segmentation (automatic or manual), the number of frames, the number of matched frames, the detected object class, and the resulting confidence (for example, People–2 and People–4 were detected as People with 96% and 71%, Bird–1 as People with 57%, and Car–2 and Car–4 as Cars with 87% and 100%).

Figure 4. Results for the sequence Car–4. Top, from left to right: segmented object views from frames 7, 11, 15, 17 of the video sequence Car–4. Bottom, from left to right: best matches from the database for the object view displayed above.

Figure 5. Results for the sequence People–2. Top, from left to right: segmented object views from frames 22, 26, 29, 32 of the video sequence People–2. Bottom, from left to right: best matches from the database for the object view displayed above.

Figure 6. Results for the sequence People–5. Top, from left to right: segmented object views from frames 200, 204, 209, 225, 228 of the video sequence People–5. Bottom, from left to right: best matches from the database for the object view displayed above.

There are several ways to improve the results on non-rigid objects. First of all, it is straightforward to extend the database and provide more prototypes covering different motions and postures. Second, the rotation invariance of the matching procedure could be restricted. Following our approach of using canonical views, it is reasonable to allow only a certain degree of rotation when matching object views to those in the database. Finally, the CSS technique leaves room for improvement. In most cases, peaks in the CSS image result from concave contour segments in the object view. However, under certain conditions convex segments may also result in CSS maxima.
In the current implementation, the CSS maxima extracted from the CSS image are not classified appropriately. Future work may comprise the considerations mentioned above and the integration of reliable object segmentation techniques.

REFERENCES
1. Sadegh Abbasi and Farzin Mokhtarian. Shape similarity retrieval under affine transform: Application to multi-view object representation and recognition. In Proc. International Conference on Computer Vision, pages 450-455. IEEE, 1999.
2. Sadegh Abbasi, Farzin Mokhtarian, and Josef Kittler. Enhancing CSS-based shape retrieval for objects with shallow concavities. Image and Vision Computing, 18(3):199-211, 2000.
3. Jonathan Ashley, Ron Barber, Myron Flickner, James Hafner, Denis Lee, Wayne Niblack, and Dragutin Petkovic. Automatic and semi-automatic methods for image annotation and retrieval in QBIC. In Proceedings of the SPIE, volume 2420, pages 24-35, 1995.
4. Irving Biederman. Recognition-by-components: a theory of human image understanding. Psychological Review, 94:115-147, 1987.
5. Shih-Fu Chang, William Chen, and Hari Sundaram. VideoQ: A fully automated video retrieval system using motion sketches. In Proceedings Fourth IEEE Workshop on Applications of Computer Vision, 1998.
6. Luciano da Fontoura Costa and Roberto Marcondes Cesar, Jr. Shape Analysis and Classification. CRC Press, Boca Raton, FL, September 2000.
7. Changick Kim and Jenq-Neng Hwang. An integrated scheme for object-based video abstraction. In Proceedings ACM Multimedia, 2000.
8. Huiping Li and David Doermann. Text enhancement in digital video using multiple frame integration. In Proceedings ACM Multimedia, pages 19-22, 1999.
9. Sven Loncaric. A survey of shape analysis techniques. Pattern Recognition, 31(8):983-1001, August 1998.
10. David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco, CA, 1982.
11. Farzin Mokhtarian. Silhouette-based isolated object recognition through curvature scale space. IEEE Trans. Pattern Analysis and Machine Intelligence, 17(5):539-544, 1995.
12. Farzin Mokhtarian, Sadegh Abbasi, and Josef Kittler. Efficient and robust retrieval by shape content through curvature scale space. In Proc. International Workshop on Image DataBases and MultiMedia Search, pages 35-42, 1996.
13. Farzin Mokhtarian, Sadegh Abbasi, and Josef Kittler. Robust and efficient shape indexing through curvature scale space. In British Machine Vision Conference, 1996.
14. W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin. The QBIC project: Querying images by content using color, texture, and shape. In Proceedings of the SPIE, volume 1908, pages 173-187, 1993.
15. Wayne Niblack, Xiaoming Zhu, James Lee Hafner, Tom Breuel, Dulce Ponceleon, Dragutin Petkovic, Myron Flickner, Eli Upfal, Siegfredo I. Nin, Sanghoon Sull, Byron Dom, Boon-Lock Yeo, Savitha Srinivasan, Dan Zivkovic, and Mike Penner. Updates to the QBIC system. In Proceedings of the SPIE, volume 3312, pages 150-161, 1997.
16. Nikos Paragios and Rachid Deriche. Geodesic active contours and level sets for the detection and tracking of moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3), March 2000.
17. Theodosios Pavlidis. Review of algorithms for shape analysis. Computer Graphics and Image Processing, 7(2):243-258, April 1978.
18. Silvia Pfeiffer, Rainer Lienhart, and Wolfgang Effelsberg. Scene determination based on video and audio features. In IEEE Conference on Multimedia Computing and Systems, 1999.
19. Brian Scassellati, Sophoclis Alexopoulos, and Myron Flickner. Retrieving images by 2D shape: a comparison of computation methods with human perceptual judgments. In Wayne Niblack and Ramesh C. Jain, editors, Proceedings of SPIE, Storage and Retrieval for Image and Video Databases II, volume 2185, pages 2-14, 1994.
20. Savitha Srinivasan, Dulce Ponceleon, Arnon Amir, and Dragutin Petkovic. What is in that video anyway?: In search of better browsing. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1999.
21. Michael D. Tarr and Heinrich H. Bülthoff, editors. Object Recognition in Man, Monkey, and Machine. MIT Press, Cambridge, MA, 1998.
22. Shimon Ullman. High-level Vision: Object Recognition and Visual Cognition. MIT Press, Cambridge, MA, 1996.
23. Howard D. Wactlar. Informedia: search and summarization in the video medium. In Proceedings of Imagina, 2000.