On Deep Generative Models with Applications to Recognition_cvpr2011
ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images from the ImageNet ILSVRC-2010 contest into 1000 different classes.
On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art.
The network has 60 million parameters and 650,000 neurons, and consists of five convolutional layers, some of which are followed by max-pooling layers, three fully connected layers, and a final 1000-way softmax.
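To make the layer stack concrete, here is a minimal sketch in PyTorch. This is an illustration, not the paper's original two-GPU cuda-convnet implementation: the channel counts follow the common single-GPU torchvision-style variant, and local response normalization is omitted.

```python
import torch.nn as nn

# Five conv layers (some followed by max-pooling), three fully connected
# layers, and a 1000-way output. Channel sizes are an assumption based on
# the torchvision variant, not the exact 2012 two-GPU model.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way scores; softmax is applied in the loss
)
```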
To make training faster, we used non-saturating neurons (ReLUs) and a very efficient GPU implementation of the convolution operation.
To reduce overfitting in the fully connected layers, we employed a recently developed regularization method called "dropout" that proved to be very effective.
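A minimal numpy sketch of the dropout idea. Note that this uses the now-common "inverted dropout" formulation (rescaling at training time); the paper itself instead multiplies activations by 0.5 at test time.

```python
import numpy as np

def dropout(activations, p=0.5, train=True, rng=np.random.default_rng()):
    """Randomly zero each unit with probability p during training.

    Inverted dropout: survivors are scaled by 1/(1-p), so nothing needs to
    change at test time (the paper's variant halves activations at test time).
    """
    if not train or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p  # keep each unit with prob 1-p
    return activations * mask / (1.0 - p)
```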
We also entered a variant of this model in the ILSVRC-2012 competition and won, achieving a top-5 test error rate of 15.3%, compared to 26.2% for the second-best entry.
1 Introduction
Current approaches to object recognition make essential use of machine learning methods.
To improve their performance, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting.
Until recently, datasets of labeled images were relatively small, on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]).
Simple recognition tasks can be solved quite well with datasets of this size, especially when they are augmented with label-preserving transformations.
For example, the current best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance [4].
But objects in realistic settings exhibit considerable variability, so learning to recognize them requires much larger training sets.
Indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al. [21]), but it has only recently become possible to collect labeled datasets with millions of images.
Beyond Bags of Features: Spatial Pyramid Matching

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Svetlana Lazebnik (Beckman Institute, University of Illinois), Cordelia Schmid (INRIA Rhône-Alpes, Montbonnot, France), Jean Ponce (Ecole Normale Supérieure, Paris, France)

Abstract
This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Specifically, our proposed method exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insights into the success of several recently proposed image descriptions, including Torralba's "gist" and Lowe's SIFT descriptors.

1. Introduction
In this paper, we consider the problem of recognizing the semantic category of an image. For example, we may want to classify a photograph as depicting a scene (forest, street, office, etc.) or as containing a certain object of interest. For such whole-image categorization tasks, bag-of-features methods, which represent an image as an orderless collection of local features, have recently demonstrated impressive levels of performance [7, 22, 23, 25]. However, because these methods disregard all information about the spatial layout of the features, they have severely limited descriptive ability. In particular, they are incapable of capturing shape or of segmenting an object from its background. Unfortunately, overcoming these limitations to build effective structural object descriptions has proven to be quite challenging, especially when the recognition system must be made to work in the presence of heavy clutter, occlusion, or large viewpoint changes. Approaches based on generative part models [3, 5] and geometric correspondence search [1, 11] achieve robustness at significant computational expense. A more efficient approach is to augment a basic bag-of-features representation with pairwise relations between neighboring local features, but existing implementations of this idea [11, 17] have yielded inconclusive results. One other strategy for increasing robustness to geometric deformations is to increase the level of invariance of local features (e.g., by using affine-invariant detectors), but a recent large-scale evaluation [25] suggests that this strategy usually does not pay off.

Though we remain sympathetic to the goal of developing robust and geometrically invariant structural object representations, we propose in this paper to revisit "global" non-invariant representations based on aggregating statistics of local features over fixed subregions. We introduce a kernel-based recognition method that works by computing rough geometric correspondence on a global scale using an efficient approximation technique adapted from the pyramid matching scheme of Grauman and Darrell [7]. Our method involves repeatedly subdividing the image and computing histograms of local features at increasingly fine resolutions. As shown by experiments in Section 5, this simple operation suffices to significantly improve performance over a basic bag-of-features representation, and even over methods based on detailed geometric correspondence.
Previous research has shown that statistical properties of the scene considered in a holistic fashion, without any analysis of its constituent objects, yield a rich set of cues to its semantic category [13]. Our own experiments confirm that global representations can be surprisingly effective not only for identifying the overall scene, but also for categorizing images as containing specific objects, even when these objects are embedded in heavy clutter and vary significantly in pose and appearance. This said, we do not advocate the direct use of a global method for object recognition (except for very restricted sorts of imagery). Instead, we envision a subordinate role for this method. It may be used to capture the "gist" of an image [21] and to inform the subsequent search for specific objects (e.g., if the image, based on its global description, is likely to be a highway, we have a high probability of finding a car, but not a toaster). In addition, the simplicity and efficiency of our method, in combination with its tendency to yield unexpectedly high recognition rates on challenging data, could make it a good baseline for "calibrating" new datasets and for evaluating more sophisticated recognition approaches.

2. Previous Work
In computer vision, histograms have a long history as a method for image description (see, e.g., [16, 19]). Koenderink and Van Doorn [10] have generalized histograms to locally orderless images, or histogram-valued scale spaces (i.e., for each Gaussian aperture at a given location and scale, the locally orderless image returns the histogram of image features aggregated over that aperture). Our spatial pyramid approach can be thought of as an alternative formulation of a locally orderless image, where instead of a Gaussian scale space of apertures, we define a fixed hierarchy of rectangular windows. Koenderink and Van Doorn have argued persuasively that locally orderless images play an important role in visual perception. Our retrieval experiments (Fig. 4) confirm that spatial pyramids can capture perceptually salient features and suggest that "locally orderless matching" may be a powerful mechanism for estimating overall perceptual similarity between images.

It is important to contrast our proposed approach with multiresolution histograms [8], which involve repeatedly subsampling an image and computing a global histogram of pixel values at each new level. In other words, a multiresolution histogram varies the resolution at which the features (intensity values) are computed, but the histogram resolution (intensity scale) stays fixed. We take the opposite approach of fixing the resolution at which the features are computed, but varying the spatial resolution at which they are aggregated. This results in a higher-dimensional representation that preserves more information (e.g., an image consisting of thin black and white stripes would retain two modes at every level of a spatial pyramid, whereas it would become indistinguishable from a uniformly gray image at all but the finest levels of a multiresolution histogram). Finally, unlike a multiresolution histogram, a spatial pyramid, when equipped with an appropriate kernel, can be used for approximate geometric matching.

The operation of "subdivide and disorder" (i.e., partition the image into subblocks and compute histograms, or histogram statistics such as means, of local features in these subblocks) has been practiced numerous times in computer vision, both for global image description [6, 18, 20, 21] and for local description of interest regions [12].
Thus, though the operation itself seems fundamental, previous methods leave open the question of what is the right subdivision scheme (although a regular 4×4 grid seems to be the most popular implementation choice), and what is the right balance between "subdividing" and "disordering." The spatial pyramid framework suggests a possible way to address this issue: namely, the best results may be achieved when multiple resolutions are combined in a principled way. It also suggests that the reason for the empirical success of "subdivide and disorder" techniques is the fact that they actually perform approximate geometric matching.

3. Spatial Pyramid Matching
We first describe the original formulation of pyramid matching [7], and then introduce our application of this framework to create a spatial pyramid image representation.

3.1. Pyramid Match Kernels
Let X and Y be two sets of vectors in a d-dimensional feature space. Grauman and Darrell [7] propose pyramid matching to find an approximate correspondence between these two sets. Informally, pyramid matching works by placing a sequence of increasingly coarser grids over the feature space and taking a weighted sum of the number of matches that occur at each level of resolution. At any fixed resolution, two points are said to match if they fall into the same cell of the grid; matches found at finer resolutions are weighted more highly than matches found at coarser resolutions. More specifically, let us construct a sequence of grids at resolutions 0, ..., L, such that the grid at level \ell has 2^\ell cells along each dimension, for a total of D = 2^{d\ell} cells. Let H_X^\ell and H_Y^\ell denote the histograms of X and Y at this resolution, so that H_X^\ell(i) and H_Y^\ell(i) are the numbers of points from X and Y that fall into the i-th cell of the grid. Then the number of matches at level \ell is given by the histogram intersection function [19]:

    I(H_X^\ell, H_Y^\ell) = \sum_{i=1}^{D} \min\left( H_X^\ell(i), H_Y^\ell(i) \right).    (1)

In the following, we will abbreviate I(H_X^\ell, H_Y^\ell) to I^\ell. Note that the number of matches found at level \ell also includes all the matches found at the finer level \ell+1. Therefore, the number of new matches found at level \ell is given by I^\ell - I^{\ell+1} for \ell = 0, ..., L-1. The weight associated with level \ell is set to 1/2^{L-\ell}, which is inversely proportional to cell width at that level. Intuitively, we want to penalize matches found in larger cells because they involve increasingly dissimilar features. Putting all the pieces together, we get the following definition of a pyramid match kernel:

    \kappa^L(X, Y) = I^L + \sum_{\ell=0}^{L-1} \frac{1}{2^{L-\ell}} \left( I^\ell - I^{\ell+1} \right)    (2)
                   = \frac{1}{2^L} I^0 + \sum_{\ell=1}^{L} \frac{1}{2^{L-\ell+1}} I^\ell.    (3)

Both the histogram intersection and the pyramid match kernel are Mercer kernels [7].
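Eqs. (1)-(3) translate directly into a few lines of code. The following is a minimal sketch (not the authors' implementation); hists_x and hists_y are assumed to be precomputed lists, indexed by level, of flattened histogram arrays.

```python
import numpy as np

def histogram_intersection(hx, hy):
    # Eq. (1): number of matches between two histograms at one level.
    return np.minimum(hx, hy).sum()

def pyramid_match_kernel(hists_x, hists_y, L):
    """Eq. (2): weighted sum of new matches per level.

    hists_x[l] and hists_y[l] are the level-l histograms (assumed inputs).
    """
    I = [histogram_intersection(hists_x[l], hists_y[l]) for l in range(L + 1)]
    k = I[L]  # matches at the finest level carry weight 1
    for l in range(L):
        # new matches at level l, penalized in proportion to cell width
        k += (I[l] - I[l + 1]) / 2 ** (L - l)
    return k
```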
3.2. Spatial Matching Scheme
As introduced in [7], a pyramid match kernel works with an orderless image representation. It allows for precise matching of two collections of features in a high-dimensional appearance space, but discards all spatial information. This paper advocates an "orthogonal" approach: perform pyramid matching in the two-dimensional image space, and use traditional clustering techniques in feature space. (Footnote 1: In principle, it is possible to integrate geometric information directly into the original pyramid matching framework by treating image coordinates as two extra dimensions in the feature space.) Specifically, we quantize all feature vectors into M discrete types, and make the simplifying assumption that only features of the same type can be matched to one another. Each channel m gives us two sets of two-dimensional vectors, X_m and Y_m, representing the coordinates of features of type m found in the respective images. The final kernel is then the sum of the separate channel kernels:

    K^L(X, Y) = \sum_{m=1}^{M} \kappa^L(X_m, Y_m).    (4)

This approach has the advantage of maintaining continuity with the popular "visual vocabulary" paradigm; in fact, it reduces to a standard bag of features when L = 0.

Because the pyramid match kernel (3) is simply a weighted sum of histogram intersections, and because c \min(a, b) = \min(ca, cb) for positive numbers, we can implement K^L as a single histogram intersection of "long" vectors formed by concatenating the appropriately weighted histograms of all channels at all resolutions (Fig. 1). For L levels and M channels, the resulting vector has dimensionality M \sum_{\ell=0}^{L} 4^\ell = M \frac{1}{3}(4^{L+1} - 1). Several experiments reported in Section 5 use the settings of M = 400 and L = 3, resulting in 34,000-dimensional histogram intersections. However, these operations are efficient because the histogram vectors are extremely sparse (in fact, just as in [7], the computational complexity of the kernel is linear in the number of features). It must also be noted that we did not observe any significant increase in performance beyond M = 200 and L = 2, where the concatenated histograms are only 4200-dimensional.

Figure 1. Toy example of constructing a three-level pyramid. The image has three feature types, indicated by circles, diamonds, and crosses. At the top, we subdivide the image at three different levels of resolution. Next, for each level of resolution and each channel, we count the features that fall in each spatial bin. Finally, we weight each spatial histogram according to eq. (3).

The final implementation issue is that of normalization. For maximum computational efficiency, we normalize all histograms by the total weight of all features in the image, in effect forcing the total number of features in all images to be the same. Because we use a dense feature representation (see Section 4), and thus do not need to worry about spurious feature detections resulting from clutter, this practice is sufficient to deal with the effects of variable image size.
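To make Section 3.2 concrete, the sketch below (an illustration under assumed inputs, not the authors' code) builds the weighted, concatenated "long" vector for one image from quantized features, so that the kernel K^L of eq. (4) reduces to a single histogram intersection of two such vectors.

```python
import numpy as np

def spatial_pyramid_vector(xy, types, M, L):
    """xy: (N, 2) feature coordinates normalized to [0, 1); types: (N,) codes in [0, M).

    Returns the weighted concatenation of all channel histograms at levels
    0..L, of length M * (4 ** (L + 1) - 1) // 3.
    """
    parts = []
    for l in range(L + 1):
        cells = 2 ** l
        bins = np.minimum((xy * cells).astype(int), cells - 1)  # spatial bin per feature
        idx = (bins[:, 1] * cells + bins[:, 0]) * M + types     # joint (cell, type) index
        hist = np.bincount(idx, minlength=cells * cells * M).astype(float)
        weight = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)  # eq. (3) weights
        parts.append(weight * hist)
    v = np.concatenate(parts)
    return v / max(v.sum(), 1e-12)  # normalize by total feature weight

# K^L(X, Y) is then np.minimum(v_x, v_y).sum() for two such vectors.
```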
4. Feature Extraction
This section briefly describes the two kinds of features used in the experiments of Section 5. First, we have so-called "weak features," which are oriented edge points, i.e., points whose gradient magnitude in a given direction exceeds a minimum threshold. We extract edge points at two scales and eight orientations, for a total of M = 16 channels. We designed these features to obtain a representation similar to the "gist" [21] or to a global SIFT descriptor [12] of the image.

For better discriminative power, we also utilize higher-dimensional "strong features," which are SIFT descriptors of 16×16 pixel patches computed over a grid with spacing of 8 pixels. Our decision to use a dense regular grid instead of interest points was based on the comparative evaluation of Fei-Fei and Perona [4], who have shown that dense features work better for scene classification. Intuitively, a dense image description is necessary to capture uniform regions such as sky, calm water, or road surface (to deal with low-contrast regions, we skip the usual SIFT normalization procedure when the overall gradient magnitude of the patch is too weak). We perform k-means clustering of a random subset of patches from the training set to form a visual vocabulary. Typical vocabulary sizes for our experiments are M = 200 and M = 400.

Figure 2. Example images from the scene category database (office, kitchen, living room, bedroom, store, industrial, tall building*, inside city*, street*, highway*, coast*, open country*, mountain*, forest*, suburb). The starred categories originate from Oliva and Torralba [13].

Table 1. Classification results for the scene category database; the highest results for each kind of feature are in bold in the original.

L        | Weak features (M=16)    | Strong features (M=200) | Strong features (M=400)
         | Single-level | Pyramid  | Single-level | Pyramid  | Single-level | Pyramid
0 (1×1)  | 45.3±0.5     |          | 72.2±0.6     |          | 74.8±0.3     |
1 (2×2)  | 53.6±0.3     | 56.2±0.6 | 77.9±0.6     | 79.0±0.5 | 78.8±0.4     | 80.1±0.5
2 (4×4)  | 61.7±0.6     | 64.7±0.7 | 79.4±0.3     | 81.1±0.3 | 79.7±0.5     | 81.4±0.5
3 (8×8)  | 63.3±0.8     | 66.8±0.6 | 77.2±0.4     | 80.7±0.3 | 77.2±0.5     | 81.1±0.6

5. Experiments
In this section, we report results on three diverse datasets: fifteen scene categories [4], Caltech-101 [3], and Graz [14]. We perform all processing in grayscale, even when color images are available. All experiments are repeated ten times with different randomly selected training and test images, and the average of per-class recognition rates (Footnote 2) is recorded for each run. The final result is reported as the mean and standard deviation of the results from the individual runs. Multi-class classification is done with a support vector machine (SVM) trained using the one-versus-all rule: a classifier is learned to separate each class from the rest, and a test image is assigned the label of the classifier with the highest response.

Footnote 2: The alternative performance measure, the percentage of all test images classified correctly, can be biased if test set sizes for different classes vary significantly. This is especially true of the Caltech-101 dataset, where some of the "easiest" classes are disproportionately large.
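A minimal sketch of this one-versus-all protocol using scikit-learn's precomputed-kernel interface (an assumed toolchain, not what the authors used); the pyramid vectors are assumed to come from a function like the spatial_pyramid_vector sketch above.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def intersection_gram(A, B):
    """Gram matrix of histogram intersections between rows of A and rows of B."""
    # pairwise min-sum; fine for datasets of modest size
    return np.stack([np.minimum(A, b).sum(axis=1) for b in B], axis=1)

def fit_predict(train_vecs, train_labels, test_vecs):
    # One-versus-all SVMs on the precomputed intersection kernel, as in Section 5.
    clf = OneVsRestClassifier(SVC(kernel="precomputed"))
    clf.fit(intersection_gram(train_vecs, train_vecs), train_labels)
    return clf.predict(intersection_gram(test_vecs, train_vecs))
```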
5.1. Scene Category Recognition
Our first dataset (Fig. 2) is composed of fifteen scene categories: thirteen were provided by Fei-Fei and Perona [4] (eight of these were originally collected by Oliva and Torralba [13]), and two (industrial and store) were collected by ourselves. Each category has 200 to 400 images, and average image size is 300×250 pixels. The major sources of the pictures in the dataset include the COREL collection, personal photographs, and Google image search. This is one of the most complete scene category datasets used in the literature thus far.

Table 1 shows detailed results of classification experiments using 100 images per class for training and the rest for testing (the same setup as [4]). First, let us examine the performance of strong features for L = 0 and M = 200, corresponding to a standard bag of features. Our classification rate is 72.2% (74.7% for the 13 classes inherited from Fei-Fei and Perona), which is much higher than their best results of 65.2%, achieved with an orderless method and a feature set comparable to ours. We conjecture that Fei-Fei and Perona's approach is disadvantaged by its reliance on latent Dirichlet allocation (LDA) [2], which is essentially an unsupervised dimensionality reduction technique and as such is not necessarily conducive to achieving the highest classification accuracy. To verify this, we have experimented with probabilistic latent semantic analysis (pLSA) [9], which attempts to explain the distribution of features in the image as a mixture of a few "scene topics" or "aspects" and performs very similarly to LDA in practice [17]. Following the scheme of Quelhas et al. [15], we run pLSA in an unsupervised setting to learn a 60-aspect model of half the training images. Next, we apply this model to the other half to obtain probabilities of topics given each image (thus reducing the dimensionality of the feature space from 200 to 60). Finally, we train the SVM on these reduced features and use them to classify the test set. In this setup, our average classification rate drops to 63.3% from the original 72.2%. For the 13 classes inherited from Fei-Fei and Perona, it drops to 65.9% from 74.7%, which is now very similar to their results. Thus, we can see that latent factor analysis techniques can adversely affect classification performance, which is also consistent with the results of Quelhas et al. [15].

Next, let us examine the behavior of spatial pyramid matching. For completeness, Table 1 lists the performance achieved using just the highest level of the pyramid (the "single-level" columns), as well as the performance of the complete matching scheme using multiple levels (the "pyramid" columns). For all three kinds of features, results improve dramatically as we go from L = 0 to a multi-level setup. Though matching at the highest pyramid level seems to account for most of the improvement, using all the levels together confers a statistically significant benefit, and any gains from a larger vocabulary are eliminated at higher pyramid levels. Thus, we can conclude that the coarse-grained geometric cues provided by the pyramid have more discriminative power than an enlarged visual vocabulary. Of course, the optimal way to exploit structure both in the image and in the feature space may be to combine them in a unified multiresolution framework; this is subject for future research.

Fig. 3 shows a confusion table between the fifteen scene categories. Not surprisingly, confusion occurs between the indoor classes (kitchen, bedroom, living room), and also between some natural classes, such as coast and open country. Fig. 4 shows examples of image retrieval using the spatial pyramid kernel and strong features with M = 200. These examples give a sense of the kind of visual information captured by our approach. In particular, spatial pyramids seem successful at capturing the organization of major pictorial elements or "blobs," and the directionality of dominant lines and edges. Because the pyramid is based on features computed at the original image resolution, even high-frequency details can be preserved. For example, query image (b) shows white kitchen cabinet doors with dark borders. Three of the retrieved "kitchen" images contain similar cabinets, the "office" image shows a wall plastered with white documents in dark frames, and the "inside city" image shows a white building with darker window frames.

Figure 3. Confusion table between the fifteen scene categories; entry (i, j) is the percentage of images from class i that were misidentified as class j. Diagonal entries (percent correct): office 92.7, kitchen 68.5, living room 60.4, bedroom 68.3, store 76.2, industrial 65.4, tall building 91.1, inside city 80.5, street 90.2, highway 86.6, coast 82.4, open country 70.5, mountain 88.8, forest 94.7, suburb 99.4.

Figure 4. Retrieval from the scene category database. The query images are on the left, and the eight images giving the highest values of the spatial pyramid kernel (for L = 2, M = 200) are on the right. The actual class of incorrectly retrieved images is listed below them.

5.2. Caltech-101
Our second set of experiments is on the Caltech-101 database [3] (Fig. 5). This database contains from 31 to 800 images per category. Most images are medium resolution, i.e., about 300×300 pixels. Caltech-101 is probably the most diverse object database available today, though it is not without shortcomings. Namely, most images feature relatively little clutter, and the objects are centered and occupy most of the image. In addition, a number of categories, such as minaret (see Fig. 5), are affected by "corner" artifacts resulting from artificial image rotation. Though these artifacts are semantically irrelevant, they can provide stable cues resulting in misleadingly high recognition rates.

We follow the experimental setup of Grauman and Darrell [7] and J. Zhang et al. [25], namely, we train on 30 images per class and test on the rest.
For efficiency, we limit the number of test images to 50 per class. Note that, because some categories are very small, we may end up with just a single test image per class. Table 2 gives a breakdown of classification rates for different pyramid levels for weak features and strong features with M = 200. The results for M = 400 are not shown, because just as for the scene category database, they do not bring any significant improvement. For L = 0, strong features give 41.2%, which is slightly below the 43% reported by Grauman and Darrell. Our best result is 64.6%, achieved with strong features at L = 2. This exceeds the highest classification rate previously published (Footnote 3), that of 53.9% reported by J. Zhang et al. [25]. Berg et al. [1] report 48% accuracy using 15 training images per class. Our average recognition rate with this setup is 56.4%. The behavior of weak features on this database is also noteworthy: for L = 0, they give a classification rate of 15.5%, which is consistent with a naive graylevel correlation baseline [1], but in conjunction with a four-level spatial pyramid, their performance rises to 54%, on par with the best results in the literature.

Fig. 5 shows a few of the "easiest" and "hardest" object classes for our method. The successful classes are either dominated by rotation artifacts (like minaret), have very little clutter (like windsor chair), or represent coherent natural "scenes" (like joshua tree and okapi). The least successful classes are either textureless animals (like beaver and cougar), animals that camouflage well in their environment (like crocodile), or "thin" objects (like ant). Table 3 shows the top five of our method's confusions, all of which are between closely related classes.

Footnote 3: See, however, H. Zhang et al. [24] in these proceedings, for an algorithm that yields a classification rate of 66.2±0.5% for 30 training examples, and 59.1±0.6% for 15 examples.

Figure 5. Caltech-101 results. Top: some classes on which our method (L = 2, M = 200) achieved high performance: minaret (97.6%), windsor chair (94.6%), joshua tree (87.9%), okapi (87.8%). Bottom: some classes on which our method performed poorly: cougar body (27.6%), beaver (27.5%), crocodile (25.0%), ant (25.0%).

Table 2. Classification results for the Caltech-101 database.

L | Weak features: Single-level | Pyramid | Strong features (200): Single-level | Pyramid
0 | 15.5±0.9 |          | 41.2±1.2 |
1 | 31.4±1.2 | 32.8±1.3 | 55.9±0.9 | 57.0±0.8
2 | 47.2±1.1 | 49.3±1.4 | 63.6±0.9 | 64.6±0.8
3 | 52.2±0.8 | 54.0±1.1 | 60.3±0.9 | 64.6±0.7

Table 3. Top five confusions of our method (L = 2, M = 200) on the Caltech-101 database.

class 1 / class 2          | % of class 1 misclassified as class 2 | % of class 2 misclassified as class 1
ketch / schooner           | 21.6 | 14.8
lotus / water lily         | 15.3 | 20.0
crocodile / crocodile head | 10.5 | 10.0
crayfish / lobster         | 11.3 | 9.1
flamingo / ibis            | 9.5  | 10.4

To summarize, our method has outperformed both state-of-the-art orderless methods [7, 25] and methods based on precise geometric correspondence [1]. Significantly, all these methods rely on sparse features (interest points or sparsely sampled edge points). However, because of the geometric stability and lack of clutter of Caltech-101, dense features combined with global spatial relations seem to capture more discriminative information about the objects.
5.3. The Graz Dataset
As seen from Sections 5.1 and 5.2, our proposed approach does very well on global scene classification tasks, or on object recognition tasks in the absence of clutter with most of the objects assuming "canonical" poses. However, it was not designed to cope with heavy clutter and pose changes. It is interesting to see how well our algorithm can do by exploiting the global scene cues that still remain under these conditions. Accordingly, our final set of experiments is on the Graz dataset [14] (Fig. 6), which is characterized by high intra-class variation. This dataset has two object classes, bikes (373 images) and persons (460 images), and a background class (270 images). The image resolution is 640×480, and the range of scales and poses at which exemplars are presented is very diverse, e.g., a "person" image may show a pedestrian in the distance, a side view of a complete body, or just a closeup of a head. For this database, we perform two-class detection (object vs. background) using an experimental setup consistent with that of Opelt et al. [14]. Namely, we train detectors for persons and bikes on 100 positive and 100 negative images (of which 50 are drawn from the other object class and 50 from the background), and test on a similarly distributed set. We generate ROC curves by thresholding raw SVM output, and report the ROC equal error rate averaged over ten runs.

Table 4. Results for the Graz database (ROC equal error rates) and comparison with two existing methods.

Class  | L=0      | L=2      | Opelt [14] | Zhang [25]
Bikes  | 82.4±2.0 | 86.3±2.5 | 86.5       | 92.0
People | 79.5±2.3 | 82.3±3.1 | 80.8       | 88.0

Table 4 summarizes our results for strong features with M = 200. Note that the standard deviation is quite high because the images in the database vary greatly in their level of difficulty, so the performance for any single run is dependent on the composition of the training set (in particular, for L = 2, the performance for bikes ranges from 81% to 91%). For this database, the improvement from L = 0 to L = 2 is relatively small. This makes intuitive sense: when a class is characterized by high geometric variability, it is difficult to find useful global features. Despite this disadvantage of our method, we still achieve results very close to those of Opelt et al. [14], who use a sparse, locally invariant feature representation. In the future, we plan to combine spatial pyramids with invariant features for improved robustness against geometric changes.

6. Discussion
This paper has presented a "holistic" approach for image categorization based on a modification of pyramid match kernels [7]. Our method, which works by repeatedly subdividing an image and computing histograms of image features over the resulting subregions, has shown promising results.
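As a footnote to the evaluation protocol of Section 5.3, here is a minimal sketch of computing the ROC equal error rate from raw SVM outputs (a hypothetical helper, not code from the paper):

```python
import numpy as np

def roc_equal_error_rate(scores, labels):
    """Approximate ROC equal error rate: the error at the threshold where the
    false-positive and false-negative rates cross.

    scores: raw SVM outputs; labels: 1 = object, 0 = background.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    best_gap, eer = np.inf, 1.0
    for t in np.sort(scores):
        fnr = np.mean(pos < t)   # positives rejected at this threshold
        fpr = np.mean(neg >= t)  # negatives accepted at this threshold
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer
```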
Complete genome sequence of C. parvum

Mitchell S. Abrahamsen et al., "Complete Genome Sequence of the Apicomplexan, Cryptosporidium parvum," Science 304, 441 (2004). DOI: 10.1126/science.1094786.
Complete Genome Sequence of the Apicomplexan, Cryptosporidium parvum
Mitchell S. Abrahamsen, Thomas J. Templeton, Shinichiro Enomoto, Juan E. Abrahante, Guan Zhu, Cheryl A. Lancto, Mingqi Deng, Chang Liu, Giovanni Widmer, Saul Tzipori, Gregory A. Buck, Ping Xu, Alan T. Bankier, Paul H. Dear, Bernard A. Konfortov, Helen F. Spriggs, Lakshminarayan Iyer, Vivek Anantharaman, L. Aravind, Vivek Kapur

Abstract
The apicomplexan Cryptosporidium parvum is an intestinal parasite that affects healthy humans and animals, and causes an unrelenting infection in immunocompromised individuals such as AIDS patients. We report the complete genome sequence of C. parvum, type II isolate. Genome analysis identifies extremely streamlined metabolic pathways and a reliance on the host for nutrients. In contrast to Plasmodium and Toxoplasma, the parasite lacks an apicoplast and its genome, and possesses a degenerate mitochondrion that has lost its genome.
Several novel classes of cell-surface and secreted proteins with a potential role in host interactions and pathogenesis were also detected. Elucidation of the core metabolism, including enzymes with high similarities to bacterial and plant counterparts, opens new avenues for drug development.

Cryptosporidium parvum is a globally important intracellular pathogen of humans and animals. The duration of infection and pathogenesis of cryptosporidiosis depends on host immune status, ranging from a severe but self-limiting diarrhea in immunocompetent individuals to a life-threatening, prolonged infection in immunocompromised patients. A substantial degree of morbidity and mortality is associated with infections in AIDS patients. Despite intensive efforts over the past 20 years, there is currently no effective therapy for treating or preventing C. parvum infection in humans.

Cryptosporidium belongs to the phylum Apicomplexa, whose members share a common apical secretory apparatus mediating locomotion and tissue or cellular invasion. Many apicomplexans are of medical or veterinary importance, including Plasmodium, Babesia, Toxoplasma, Neospora, Sarcocystis, Cyclospora, and Eimeria.
The life cycle of C. parvum is similar to that of other cyst-forming apicomplexans (e.g., Eimeria and Toxoplasma), resulting in the formation of oocysts that are shed in the feces of infected hosts. C. parvum oocysts are highly resistant to environmental stresses, including chlorine treatment of community water supplies; hence, the parasite is an important water- and food-borne pathogen (1). The obligate intracellular nature of the parasite's life cycle and the inability to culture the parasite continuously in vitro greatly impair researchers' ability to obtain purified samples of the different developmental stages. The parasite cannot be genetically manipulated, and transformation methodologies are currently unavailable. To begin to address these limitations, we have obtained the complete C. parvum genome sequence and its predicted protein complement. (This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accession AAEE00000000. The version described in this paper is the first version, AAEE01000000.)

The random shotgun approach was used to obtain the complete DNA sequence (2) of the Iowa "type II" isolate of C. parvum. This isolate readily transmits disease among numerous mammals, including humans. The resulting genome sequence has roughly 13× genome coverage containing five gaps and 9.1 Mb of total DNA sequence within eight chromosomes. The C. parvum genome is thus quite compact relative to the 23-Mb, 14-chromosome genome of Plasmodium falciparum (3); this size difference is predominantly the result of shorter intergenic regions, fewer introns, and a smaller number of genes (Table 1). Comparison of the assembled sequence of chromosome VI to that of the recently published sequence of chromosome VI (4) revealed that our assembly contains an additional 160 kb of sequence and a single gap versus two, with the common sequences displaying a 99.993% sequence identity (2).

The relative paucity of introns greatly simplified gene predictions and facilitated annotation (2) of predicted open reading frames (ORFs). These analyses provided an estimate of 3807 protein-encoding genes for the C. parvum genome, far fewer than the estimated 5300 genes predicted for the Plasmodium genome (3). This difference is primarily due to the absence of an apicoplast and mitochondrial genome, as well as the presence of fewer genes encoding metabolic functions and variant surface proteins, such as the P. falciparum var and rifin molecules (Table 2). An analysis of the encoded protein sequences with the program SEG (5) shows that these protein-encoding genes are not enriched in low-complexity sequences (34%) to the extent observed in the proteins from Plasmodium (70%).

Our sequence analysis indicates that Cryptosporidium, unlike Plasmodium and Toxoplasma, lacks both mitochondrion and apicoplast genomes. The overall completeness of the genome sequence, together with the fact that similar DNA extraction procedures used to isolate total genomic DNA from C. parvum efficiently yielded mitochondrion and apicoplast genomes from Eimeria sp. and Toxoplasma (6, 7), indicates that the absence of organellar genomes was unlikely to have been the result of methodological error. These conclusions are consistent with the absence of nuclear genes for the DNA replication and translation machinery characteristic of mitochondria and apicoplasts, and with the lack of mitochondrial or apicoplast targeting signals for tRNA synthetases.

A number of putative mitochondrial proteins were identified, including components of a mitochondrial protein import apparatus, chaperones, uncoupling proteins, and solute translocators (table S1).
However, the genome does not encode any Krebs cycle enzymes, nor the components constituting the mitochondrial complexes I to IV; this finding indicates that the parasite does not rely on complete oxidation and respiratory chains for synthesizing adenosine triphosphate (ATP). Similar to Plasmodium, no orthologs for the γ, δ, or ε subunits or the c subunit of the F0 proton channel were detected (whereas all subunits were found for a V-type ATPase). Cryptosporidium, like Eimeria (8) and Plasmodium, possesses a pyridine nucleotide transhydrogenase integral membrane protein that may couple reduced nicotinamide adenine dinucleotide (NADH) and reduced nicotinamide adenine dinucleotide phosphate (NADPH) redox to proton translocation across the inner mitochondrial membrane. Unlike Plasmodium, the parasite has two copies of the pyridine nucleotide transhydrogenase gene. Also present is a likely mitochondrial membrane-associated, cyanide-resistant alternative oxidase (AOX) that catalyzes the reduction of molecular oxygen by ubiquinol to produce H2O, but not superoxide or H2O2. Several genes were identified as involved in biogenesis of iron-sulfur [Fe-S] complexes with potential mitochondrial targeting signals (e.g., nifS, nifU, frataxin, and ferredoxin), supporting the presence of a limited electron flux in the mitochondrial remnant (table S2).

Our sequence analysis confirms the absence of a plastid genome (7) and, additionally, the loss of plastid-associated metabolic pathways, including the type II fatty acid synthases (FASs) and isoprenoid synthetic enzymes that are otherwise localized to the plastid in other apicomplexans.

Table 1. General features of the C. parvum genome and comparison with other single-celled eukaryotes. Values are derived from respective genome project summaries (3, 26-28). ND, not determined.

Feature                                  | C. parvum | P. falciparum | S. pombe | S. cerevisiae | E. cuniculi
Size (Mbp)                               | 9.1       | 22.9          | 12.5     | 12.5          | 2.5
(G+C) content (%)                        | 30        | 19.4          | 36       | 38.3          | 47
No. of genes                             | 3807      | 5268          | 4929     | 5770          | 1997
Mean gene length (bp, excluding introns) | 1795      | 2283          | 1426     | 1424          | ND
Gene density (bp per gene)               | 2382      | 4338          | 2528     | 2088          | 1256
Percent coding                           | 75.3      | 52.6          | 57.5     | 70.5          | 90
Genes with introns (%)                   | 5         | 53.9          | 43       | 5             | ND
Intergenic regions: (G+C) content (%)    | 23.9      | 13.6          | 32.4     | 35.1          | 45
Intergenic regions: mean length (bp)     | 566       | 1694          | 952      | 515           | 129
No. of tRNA genes                        | 45        | 43            | 174      | 299           | 44
No. of 5S rRNA genes                     | 6         | 3             | 30       | 100–200       | 3
No. of 5.8S, 18S, and 28S rRNA units     | 5         | 7             | 200–400  | 100–200       | 22

Table 2. Comparison between predicted C. parvum and P. falciparum proteins.

Feature                        | C. parvum   | P. falciparum | Common
Total predicted proteins       | 3807        | 5268          | 1883
Mitochondrial targeted/encoded | 17 (0.45%)  | 246 (4.7%)    | 15
Apicoplast targeted/encoded    | 0           | 581 (11.0%)   | 0
var/rif/stevor                 | 0           | 236 (4.5%)    | 0
Annotated as protease          | 50 (1.3%)   | 31 (0.59%)    | 27
Annotated as transporter       | 69 (1.8%)   | 34 (0.65%)    | 34
Assigned EC function           | 167 (4.4%)  | 389 (7.4%)    | 113
Hypothetical proteins          | 925 (24.3%) | 3208 (60.9%)  | 126

Notes to Table 2: Values for P. falciparum are as reported in (3), except for proteins annotated as protease or transporter. "Common" counts TBLASTN hits (e < 1e-5) between C. parvum and P. falciparum. var/rif/stevor counts are as reported in (3). Protease/peptidase and transporter annotations follow the CryptoGenome and PlasmoDB databases. EC assignments are bidirectional BLAST hits (e < 1e-15) to orthologs with assigned Enzyme Commission numbers, excluding protein kinases and phosphatases (inconsistent annotation across genomes) and DNA/RNA polymerases (subunit-inclusion issues); for consistency, 46 proteins were excluded from the reported P. falciparum values.
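Summary statistics like the genome size, (G+C) content, and gene density of Table 1 can in principle be recomputed from a genome FASTA. The sketch below is illustrative only: the file name and gene count are placeholders, and the published values come from the full annotation pipeline, not from this calculation.

```python
def genome_summary(fasta_path, n_genes):
    """Compute genome size, G+C content, and gene density from a FASTA file."""
    seq_len = gc = 0
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                continue  # skip sequence headers
            s = line.strip().upper()
            seq_len += len(s)
            gc += s.count("G") + s.count("C")
    return {
        "size_mbp": seq_len / 1e6,
        "gc_percent": 100.0 * gc / seq_len,
        "bp_per_gene": seq_len / n_genes,
    }

# Hypothetical usage: genome_summary("c_parvum_genome.fasta", 3807)
# would be expected to give roughly 9.1 Mbp, ~30% G+C, ~2382 bp per gene.
```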
C. parvum fatty acid biosynthesis appears to be cytoplasmic, conducted by a large (8252 amino acids) modular type I FAS (9) and possibly by another large enzyme that is related to the multidomain bacterial polyketide synthase (10). Comprehensive screening of the C. parvum genome sequence also did not detect orthologs of Plasmodium nuclear-encoded genes that contain apicoplast-targeting and transit sequences (11).

C. parvum metabolism is greatly streamlined relative to that of Plasmodium, and in certain ways it is reminiscent of that of another obligate eukaryotic parasite, the microsporidian Encephalitozoon. The degeneration of the mitochondrion and associated metabolic capabilities suggests that the parasite largely relies on glycolysis for energy production. The parasite is capable of uptake and catabolism of monosugars (e.g., glucose and fructose) as well as synthesis, storage, and catabolism of polysaccharides such as trehalose and amylopectin. Like many anaerobic organisms, it economizes ATP through the use of pyrophosphate-dependent phosphofructokinases. The conversion of pyruvate to acetyl-coenzyme A (CoA) is catalyzed by an atypical pyruvate-NADPH oxidoreductase (CpPNO) that contains an N-terminal pyruvate-ferredoxin oxidoreductase (PFO) domain fused with a C-terminal NADPH-cytochrome P450 reductase domain (CPR). Such a PFO-CPR fusion has previously been observed only in the euglenozoan protist Euglena gracilis (12). Acetyl-CoA can be converted to malonyl-CoA, an important precursor for fatty acid and polyketide biosynthesis. Glycolysis leads to several possible organic end products, including lactate, acetate, and ethanol. The production of acetate from acetyl-CoA may be economically beneficial to the parasite via coupling with ATP production.

Ethanol is potentially produced via two independent pathways: (i) from the combination of pyruvate decarboxylase and alcohol dehydrogenase, or (ii) from acetyl-CoA by means of a bifunctional dehydrogenase (adhE) with acetaldehyde and alcohol dehydrogenase activities; adhE first converts acetyl-CoA to acetaldehyde and then reduces the latter to ethanol.
AdhE predominantly occurs in bacteria but has recently been identified in several protozoans, including vertebrate gut parasites such as Entamoeba and Giardia (13, 14). Adjacent to the adhE gene resides a second gene encoding only the AdhE C-terminal Fe-dependent alcohol dehydrogenase domain. This gene product may form a multisubunit complex with AdhE, or it may function as an alternative alcohol dehydrogenase that is specific to certain growth conditions. C. parvum has a glycerol 3-phosphate dehydrogenase similar to those of plants, fungi, and the kinetoplastid Trypanosoma, but (unlike trypanosomes) the parasite lacks an ortholog of glycerol kinase and thus this pathway does not yield glycerol production. In addition to the modular fatty acid synthase (CpFAS1) and polyketide synthase homolog (CpPKS1), C. parvum possesses several fatty acyl-CoA synthases and a fatty acyl elongase that may participate in fatty acid metabolism. Further, enzymes for the metabolism of complex lipids (e.g., glycerolipid and inositol phosphate) were identified in the genome. Fatty acids are apparently not an energy source, because enzymes of the fatty acid oxidative pathway are absent, with the exception of a 3-hydroxyacyl-CoA dehydrogenase.

C. parvum purine metabolism is greatly simplified, retaining only an adenosine kinase and enzymes catalyzing conversions of adenosine 5′-monophosphate (AMP) to inosine, xanthosine, and guanosine 5′-monophosphates (IMP, XMP, and GMP). Among these enzymes, IMP dehydrogenase (IMPDH) is phylogenetically related to ε-proteobacterial IMPDH and is strikingly different from its counterparts in both the host and other apicomplexans (15). In contrast to other apicomplexans such as Toxoplasma gondii and P. falciparum, no gene encoding hypoxanthine-xanthine-guanine phosphoribosyltransferase (HXGPRT) is detected, in contrast to a previous report on the activity of this enzyme in C. parvum sporozoites (16). The absence of HXGPRT suggests that the parasite may rely solely on a single enzyme system including IMPDH to produce GMP from AMP. In contrast to other apicomplexans, the parasite appears to rely on adenosine for purine salvage, a model supported by the identification of an adenosine transporter. Unlike other apicomplexans and many parasitic protists that can synthesize pyrimidines de novo, C. parvum relies on pyrimidine salvage and retains the ability for interconversions among uridine and cytidine 5′-monophosphates (UMP and CMP), their deoxy forms (dUMP and dCMP), and dAMP, as well as their corresponding di- and triphosphonucleotides. The parasite has also largely shed the ability to synthesize amino acids de novo, although it retains the ability to convert select amino acids, and instead appears to rely on amino acid uptake from the host by means of a set of at least 11 amino acid transporters (table S2).

Most of the Cryptosporidium core processes involved in DNA replication, repair, transcription, and translation conform to the basic eukaryotic blueprint (2). The transcriptional apparatus resembles Plasmodium in terms of basal transcription machinery. However, a striking numerical difference is seen in the complements of two RNA binding domains, Sm and RRM, between P. falciparum (17 and 71 domains, respectively) and C. parvum (9 and 51 domains). This reduction results in part from the loss of conserved proteins belonging to the spliceosomal machinery, including all genes encoding Sm domain proteins belonging to the U6 spliceosomal particle, which suggests that this particle activity is degenerate or entirely lost.
This reduction in spliceosomal machinery is consistent with the reduced number of predicted introns in Cryptosporidium (5%) relative to Plasmodium (>50%). In addition, key components of the small RNA-mediated posttranscriptional gene silencing system are missing, such as the RNA-dependent RNA polymerase, Argonaute, and Dicer orthologs; hence, RNA interference-related technologies are unlikely to be of much value in targeted disruption of genes in C. parvum.

Cryptosporidium invasion of columnar brush border epithelial cells has been described as "intracellular, but extracytoplasmic," as the parasite resides on the surface of the intestinal epithelium but lies underneath the host cell membrane. This niche may allow the parasite to evade immune surveillance but take advantage of solute transport across the host microvillus membrane or the extensively convoluted parasitophorous vacuole. Indeed, Cryptosporidium has numerous genes (table S2) encoding families of putative sugar transporters (up to 9 genes) and amino acid transporters (11 genes). This is in stark contrast to Plasmodium, which has fewer sugar transporters and only one putative amino acid transporter (GenBank identification number 23612372).

As a first step toward identification of multi-drug-resistant pumps, the genome sequence was analyzed for all occurrences of genes encoding multitransmembrane proteins. Notable are a set of four paralogous proteins that belong to the sbmA family (table S2) that are involved in the transport of peptide antibiotics in bacteria. A putative ortholog of the Plasmodium chloroquine resistance-linked gene PfCRT (17) was also identified, although the parasite does not possess a food vacuole like the one seen in Plasmodium.

Unlike Plasmodium, C. parvum does not possess extensive subtelomeric clusters of antigenically variant proteins (exemplified by the large families of var and rif/stevor genes) that are involved in immune evasion. In contrast, more than 20 genes were identified that encode mucin-like proteins (18, 19) having hallmarks of extensive Thr or Ser stretches suggestive of glycosylation and signal peptide sequences suggesting secretion (table S2). One notable example is an 11,700-amino acid protein with an uninterrupted stretch of 308 Thr residues (cgd3_720). Although large families of secreted proteins analogous to the Plasmodium multigene families were not found, several smaller multigene clusters were observed that encode predicted secreted proteins, with no detectable similarity to proteins from other organisms (Fig. 1, A and B). Within this group, at least four distinct families appear to have emerged through gene expansions specific to the Cryptosporidium clade.
These families, SKSR, MEDLE, WYLE, FGLN, and GGC, were named after well-conserved sequence motifs (table S2). Reverse transcription polymerase chain reaction (RT-PCR) expression analysis (20) of one cluster, a locus of seven adjacent CpLSP genes (Fig. 1B), shows coexpression during the course of in vitro development (Fig. 1C). An additional eight genes were identified that encode proteins having a periodic cysteine structure similar to the Cryptosporidium oocyst wall protein; these eight genes are similarly expressed during the onset of oocyst formation and likely participate in the formation of the coccidian rigid oocyst wall in both Cryptosporidium and Toxoplasma (21). Whereas the extracellular proteins described above are of apparent apicomplexan or lineage-specific invention, Cryptosporidium possesses many genes encoding secreted proteins having lineage-specific multidomain architectures composed of animal- and bacterial-like extracellular adhesive domains (fig. S1).

Lineage-specific expansions were observed for several proteases (table S2), including an aspartyl protease (six genes), a subtilisin-like protease, a cryptopain-like cysteine protease (five genes), and a Plasmodium falcilysin-like (insulin-degrading enzyme-like) protease (19 genes). Nine of the Cryptosporidium falcilysin genes lack the Zn-chelating "HXXEH" active site motif and are likely to be catalytically inactive copies that may have been reused for specific protein-protein interactions on the cell surface. In contrast to the Plasmodium falcilysin, the Cryptosporidium genes possess signal peptide sequences and are likely trafficked to a secretory pathway. The expansion of this family suggests either that the proteins have distinct cleavage specificities or that their diversity may be related to evasion of a host immune response.

Completion of the C. parvum genome sequence has highlighted the lack of conventional drug targets currently pursued for the control and treatment of other parasitic protists. On the basis of molecular and biochemical studies and drug screening of other apicomplexans, several putative Cryptosporidium metabolic pathways or enzymes have been erroneously proposed to be potential drug targets (22), including the apicoplast and its associated metabolic pathways, the shikimate pathway, the mannitol cycle, the electron transport chain, and HXGPRT. Nonetheless, complete genome sequence analysis identifies a number of classic and novel molecular candidates for drug exploration, including numerous plant-like and bacterial-like enzymes (tables S3 and S4). Although the C. parvum genome lacks HXGPRT, a potent drug target in other apicomplexans, it has only the single pathway dependent on IMPDH to convert AMP to GMP. The bacterial-type IMPDH may be a promising target because it differs substantially from that of eukaryotic enzymes (15). Because of the lack of de novo biosynthetic capacity for purines, pyrimidines, and amino acids, C. parvum relies solely on scavenge from the host via a series of transporters, which may be exploited for chemotherapy. C. parvum possesses a bacterial-type thymidine kinase, and the role of this enzyme in pyrimidine metabolism and its drug target candidacy should be pursued. The presence of an alternative oxidase, likely targeted to the remnant mitochondrion, gives promise to the study of salicylhydroxamic acid (SHAM), ascofuranone, and their analogs as inhibitors of energy metabolism in the parasite (23).
either absent in or highly divergent from those typically found in mammals (table S3).Within the glycolytic pathway,the plant-like PPi-PFK has been shown to be a potential target in other parasites including T.gondii ,and PEPCL and PGI ap-pear to be plant-type enzymes in C.parvum .Another example is a trehalose-6-phosphate synthase/phosphatase catalyzing trehalose bio-synthesis from glucose-6-phosphate and uridine diphosphate –glucose.Trehalose may serve as a sugar storage source or may function as an antidesiccant,antioxidant,or protein stability agent in oocysts,playing a role similar to that of mannitol in Eimeria oocysts (24).Orthologs of putative Eimeria mannitol synthesis enzymes were not found.However,two oxidoreductases (table S2)were identified in C.parvum ,one of which belongs to the same families as the plant mannose dehydrogenases (25)and the other to the plant cinnamyl alcohol dehydrogenases.In principle,these enzymes could synthesize protective polyol compounds,and the former enzyme could use host-derived mannose to syn-thesize mannitol.References and Notes1.D.G.Korich et al .,Appl.Environ.Microbiol.56,1423(1990).2.See supportingdata on Science Online.3.M.J.Gardner et al .,Nature 419,498(2002).4.A.T.Bankier et al .,Genome Res.13,1787(2003).5.J.C.Wootton,Comput.Chem.18,269(1994).Fig.1.(A )Schematic showing the chromosomal locations of clusters of potentially secreted proteins.Numbers of adjacent genes are indicated in paren-theses.Arrows indicate direc-tion of clusters containinguni-directional genes (encoded on the same strand);squares indi-cate clusters containingg enes encoded on both strands.Non-paralogous genes are indicated by solid gray squares or direc-tional triangles;SKSR (green triangles),FGLN (red trian-gles),and MEDLE (blue trian-gles)indicate three C.parvum –specific families of paralogous genes predominantly located at telomeres.Insl (yellow tri-angles)indicates an insulinase/falcilysin-like paralogous gene family.Cp LSP (white square)indicates the location of a clus-ter of adjacent large secreted proteins (table S2)that are cotranscriptionally regulated.Identified anchored telomeric repeat sequences are indicated by circles.(B )Schematic show-inga select locus containinga cluster of coexpressed large secreted proteins (Cp LSP).Genes and intergenic regions (regions between identified genes)are drawn to scale at the nucleotide level.The length of the intergenic re-gions is indicated above or be-low the locus.(C )Relative ex-pression levels of CpLSP (red lines)and,as a control,C.parvum Hedgehog-type HINT domain gene (blue line)duringin vitro development,as determined by semiquantitative RT-PCR usingg ene-specific primers correspondingto the seven adjacent g enes within the CpLSP locus as shown in (B).Expression levels from three independent time-course experiments are represented as the ratio of the expression of each gene to that of C.parvum 18S rRNA present in each of the infected samples (20).R E P O R T S16APRIL 2004VOL 304SCIENCE 444 o n O c t o b e r 7, 2009w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o m。
Eye-Gaze Data Collection Methods

• Smith B A, Yin Q, Feiner S, et al. Gaze locking: passive eye contact detection for human-object interaction[C]. User Interface Software and Technology, 2013: 271-280.
EYEDIAP (ETRA 2014)
• Collection equipment: a Kinect depth camera plus an RGB camera
• Collection method: volunteers sit in front of the depth camera with their eyes fixed on a moving ping-pong ball while the RGB camera records the session. In the captured video, the 2D coordinates of the eye center and the ball are annotated manually and mapped into the point cloud to obtain the corresponding 3D coordinates; subtracting the two points gives the 3D gaze vector (see the sketch after this list)
• Scale: 94 videos covering 16 subjects of different ethnicities
• Intended use: gaze estimation
• Limitations: requires a depth camera; relatively little data
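The gaze-vector construction described above reduces to a point subtraction and a normalization. A minimal sketch, assuming hypothetical 3D eye-center and target coordinates already recovered from the point cloud (the names and values below are illustrative, not from the EYEDIAP toolchain):

```python
import numpy as np

def gaze_vector(eye_center_3d: np.ndarray, target_3d: np.ndarray) -> np.ndarray:
    """Unit 3D gaze direction from the eye center toward the gaze target.

    Both inputs are (x, y, z) points in the same (e.g., camera) coordinate
    frame, obtained here by mapping annotated 2D points into the point cloud.
    """
    v = target_3d - eye_center_3d        # difference of the two 3D points
    return v / np.linalg.norm(v)         # normalize to a unit direction

# Illustrative values only: eye center and ping-pong ball position in meters.
eye = np.array([0.03, -0.02, 0.55])
ball = np.array([-0.10, 0.05, 1.20])
print(gaze_vector(eye, ball))
```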
Synthetic Datasets
• Collection method: manually or automatically synthesized eye images
• SynthesEyes (ICCV 2015)
• UnityEyes (ETRA 2016)
  • Provides an automatic generation tool directly
  • Built with the Unity engine
  • Gaze direction, head pose, and other factors can be customized
• SimGAN (CVPR 2017): uses a GAN for gaze transfer, refining synthetic eye images toward real ones
• Unsupervised Representation Learning (CVPR 2020): gaze redirection
• RT-GENE (eye tracker + depth + RGB) • gaze tracking
• SynthesEyes (synthetic)
• GazeFollow
• UnityEyes (synthetic)
• VideoGaze
MPIIGaze (CVPR 2015)
• Collection equipment: a single RGB camera with known intrinsic parameters • Collection method: the camera parameters and a mirror-based calibration algorithm are used to compute and calibrate the 3D positions of the eyes,
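MPIIGaze-style pipelines commonly express the 3D gaze direction as a (pitch, yaw) angle pair in the camera frame. A minimal sketch of that standard conversion (an assumption about the downstream processing, not taken from the truncated text above):

```python
import numpy as np

def vector_to_pitch_yaw(g: np.ndarray) -> tuple[float, float]:
    """Convert a 3D gaze vector (camera frame, z pointing away from the
    camera) into (pitch, yaw) angles in radians, as is common in
    appearance-based gaze estimation."""
    g = g / np.linalg.norm(g)
    pitch = np.arcsin(-g[1])          # vertical angle
    yaw = np.arctan2(-g[0], -g[2])    # horizontal angle
    return float(pitch), float(yaw)

print(vector_to_pitch_yaw(np.array([0.1, -0.05, -0.99])))
```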
Global Features

Scene Parsing Using Scene Attributes As Global Features
Hang Su, Department of Computer Science, Brown University
*******************
May 13, 2013

Abstract
Data-driven methods have been proven very effective for the task of scene parsing. A crucial step in these methods is to retrieve a set of visually similar scenes from existing image collections for the query image according to certain global scene representations. In this work, we incorporate scene attributes into data-driven scene parsing systems as global scene features. We show that when used as global features, our compact attribute-based scene representation can compete with or improve on traditional low-level scene representations for the task of scene parsing and scene retrieval in general.

1 Introduction
Scene parsing is the task of segmenting all the objects in a natural image and identifying their categories. Categorical labels can be given to either each pixel or each region (e.g., superpixel) of the input image, giving a thorough interpretation of the scene content. Most methods proposed for this problem require a generative or discriminative model to be trained for each category, and thus only work with a handful of pre-defined categories [2, 3, 4, 5, 8, 11, 13, 14, 15]. The training process can be very time-consuming and must be done in advance. Even worse, the entire training has to be repeated whenever new training images or class labels are added to the dataset. Recently, several nonparametric, data-driven approaches have been proposed for the scene parsing problem [7, 16, 1]. These approaches require no training in advance. They can easily scale to hundreds of categories and have the potential to work with internet-scale, continuously growing datasets like LabelMe [12].

[Figure 1: Sample outputs of scene attribute detection (from [10]).]

There are low-level representations and high-level representations (e.g.,
attributes or categories) of natural scenes. While much research has been done on various low-level representations, such as the gist descriptor [9] or spatial pyramid [6], less attention has been given to high-level scene representations and their applications for data-driven vision. Compared with other high-level representations, scene attributes keep the benefit of being compact and carrying semantic meanings, while giving more flexible and comprehensive interpretations to natural scenes. We adopt scene attributes designed in [10], which have 102 discriminative attributes discovered and learned from crowdsourcing. Figure 1 shows some sample outputs of the attribute detector provided in [10]. In this paper we show how well we can improve nonparametric, data-driven scene parsing by adopting scene attributes. Tighe and Lazebnik investigate nonparametric, data-driven scene parsing and achieve state-of-the-art performance [16]. We follow their system pipeline (Section 2) and show that by simply adding scene attributes as one of the features used for global scene representation we can achieve significant performance improvement (Section 3).

2 System Pipeline
The following is a summary of the steps taken by the parsing system for every query image (Figure 2).

[Figure 2: System pipeline of scene parsing (from [16]).]

Retrieval Set. The first step in parsing a query image is to find a retrieval set of images similar to the query image. The purpose of finding such a subset of training images (there is actually no training process, though we still call the images from which we try to learn the "training images") is to expedite the parsing process and at the same time throw away irrelevant information which otherwise can be confusing. In [16], three types of global image features are used in this step: gist, spatial pyramid, and color histogram. For each feature type, Tighe and Lazebnik sort all the training images in increasing order of Euclidean distance from the query image. They take the minimum rank across all feature types for each training image and then sort the minimum ranks in increasing order to get a ranking among the training images for the query image. The top ranking K images are used as the retrieval set.

Local Superpixel Labeling. After building the retrieval set, the query image and the images in the retrieval set are segmented into superpixels. Each superpixel is then described using 20 different features. A detailed list of these features can be found in Table 1 in [16]. For each superpixel in the query image, nearest-neighbor superpixels in the retrieval set are found according to the 20 features for that superpixel. A likelihood score is then computed for each class based on the nearest-neighbor matches.

Classification. In the last step, we can simply assign the class with the highest likelihood score to each superpixel in the query image, or use a Markov Random Field (MRF) framework to further incorporate pairwise co-occurrence information learned from the training dataset. As in [1], we report the performance without using the MRF layer in this paper so differences in local classification performance can be observed more clearly.

3 Scene Attributes As Global Features
Our main goal in investigating scene parsing is to see how well our scene attributes work as a scene representation. Thus, we keep most parts of the system in [16] unchanged but use the scene attributes as the global feature, or one of the global features in addition to other low-level features, for finding retrieval sets. The dataset we use for this experiment is the SIFT-Flow dataset [7]. It is composed
of 2,688 annotated images from LabelMe and has 33 semantic labels. Since the class frequencies are highly unbalanced, we report both the per-pixel classification rate and the per-class rate, which is the average of the per-pixel rates over all classes. We also report the performance of an "optimal retrieval set", which uses ground-truth class labels instead of global features to find similar scenes for the query image. This retrieval set is called Maximum Histogram Intersection. It is found by ranking training images according to the class histogram intersections they have with the query image:

\cap(Target, Query) = \frac{\sum_{j=1}^{33} \min(H_T[j], H_Q[j])}{\sum_{j=1}^{33} H_Q[j]}

where H_T and H_Q are the histograms of the target image and query image respectively. This optimal retrieval set is meant to be a performance upper bound and should provide an insight into how much room for improvement there still is in the image retrieval step. In [16], Tighe and Lazebnik proposed a different type of "optimal" retrieval set by ranking training images in terms of the number of pixels their ground truth label maps share with the label map of the query. Our experiment shows ours is usually better in terms of both per-pixel rates and per-class rates.

Figure 3 and Figure 4 show the performance comparison among different global features. As we can see from the result, using only scene attributes as global features we get higher per-pixel rates than [16], which uses three global features (G+SP+CH), while getting similar per-class rates.

[Figure 3: Evaluation of using our scene attributes as a global feature for scene parsing on the SIFT-Flow dataset. The x-axis represents mean per-class classification rate and the y-axis per-pixel classification rate; the best performance sits in the top-right corner. The plots also show the impact of changing retrieval set size K. The blue plot shows the result of using gist (G), spatial pyramid (SP), and color histogram (CH) together as scene descriptors for finding retrieval sets [16]. Using scene attributes by itself improves the per-pixel rates while the per-class rates are comparable. Using scene attributes together with the previous three features increases both the per-pixel rates and the per-class rates. "Maximum Histogram Intersection" is the upper bound we get by finding the retrieval set using ground-truth labels of the query image.]

[Figure 4: Comparison of using various global features for scene parsing. The left figure shows per-pixel rates and the right one per-class rates. Both the twelve features described in [17] and the three features used in [16] (G, SP, CH) are tried separately, as well as our scene attributes. We also report the performance of using all features together (G+SP+CH+SUN12+Attr) and using the three features in [16] (G+SP+CH).]

When combining our scene attributes with those three global features (Attributes+G+SP+CH), both the per-pixel rates and the per-class rates increase significantly (73.4%, 29.8% (K=200) vs. 76.2%, 33.0% (K=100)).
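The Maximum Histogram Intersection ranking defined above reduces to a few lines of NumPy. A minimal sketch, assuming each image's ground-truth labels have already been reduced to a 33-bin class histogram (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def histogram_intersection_score(h_target: np.ndarray, h_query: np.ndarray) -> float:
    """Normalized class-histogram intersection from Section 3: the sum of
    bin-wise minima divided by the total mass of the query histogram."""
    return float(np.minimum(h_target, h_query).sum() / h_query.sum())

def optimal_retrieval_set(train_hists: np.ndarray, h_query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the K training images with the largest intersection score."""
    scores = np.array([histogram_intersection_score(h, h_query) for h in train_hists])
    return np.argsort(-scores)[:k]

# Illustrative use: 100 training images, 33 semantic classes.
rng = np.random.default_rng(0)
train_hists = rng.integers(0, 50, size=(100, 33)).astype(float)
h_query = rng.integers(0, 50, size=33).astype(float)
print(optimal_retrieval_set(train_hists, h_query, k=5))
```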
Considering the compact size of our scene attributes, 102 dimensions compared with the 5184-dimension G+SP+CH, this result demonstrates the scene attributes' strong ability for high-level scene representation. It is also worth noting that adding more features beyond this point does not necessarily improve the performance. For instance, by using all the 12 features described in [17] together with the scene attributes, the per-pixel rate and the per-class rate drop to 74.6% and 30.4% respectively (K=100).

4 Conclusion
Scene parsing provides much deeper understanding of scenes than traditional category-based recognition. We investigated the use of attribute-based representation as global features for scene parsing. These experiments show its capability as a compact yet rich representation, and suggest the possible uses of scene attributes for future data-driven computer vision tasks.

References
[1] D. Eigen and R. Fergus. Nonparametric image parsing using adaptive neighbor sets. In CVPR 2012, pages 2799-2806, June 2012.
[2] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV 2009, pages 1-8, 2009.
[3] Xuming He, R. S. Zemel, and M. A. Carreira-Perpinan. Multiscale conditional random fields for image labeling. In CVPR 2004, volume 2, pages II-695-II-702, 2004.
[4] Derek Hoiem, Alexei A. Efros, and Martial Hebert. Recovering surface layout from an image. Int. J. Comput. Vision, 75(1):151-172, October 2007.
[5] L'ubor Ladický, Paul Sturgess, Karteek Alahari, Chris Russell, and Philip H. S. Torr. What, where and how many? Combining object detectors and CRFs. In ECCV 2010, Part IV, pages 424-437. Springer-Verlag, 2010.
[6] Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR 2006, volume 2, pages 2169-2178. IEEE, 2006.
[7] Ce Liu, Jenny Yuen, and Antonio Torralba. Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2368-2382, December 2011.
[8] T. Malisiewicz and A. A. Efros. Recognition by association via learning per-exemplar distances. In CVPR 2008, pages 1-8, June 2008.
[9] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145-175, 2001.
[10] Genevieve Patterson and James Hays. SUN attribute database: Discovering, annotating, and recognizing scene attributes. In CVPR 2012, pages 2751-2758. IEEE, 2012.
[11] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In ICCV 2007, pages 1-8, October 2007.
[12] Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1):157-173, 2008.
[13] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR 2008, pages 1-8, June 2008.
[14] Jamie Shotton, John Winn, Carsten Rother, and Antonio Criminisi. TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV 2006, Part I, pages 1-15. Springer-Verlag, 2006.
[15] Richard Socher, Cliff C. Lin, Andrew Y. Ng, and Christopher D. Manning. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the International Conference on Machine Learning (ICML), 2011.
[16] Joseph Tighe and Svetlana Lazebnik. Superparsing. International Journal of Computer Vision, 101:329-349, 2013.
[17] Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR 2010, pages 3485-3492. IEEE, 2010.
Analysis of Deep-Learning-Based Image Segmentation Techniques

China Computer & Communication, 2020, No. 23

Research Review on Image Segmentation Based on Deep Learning
ZHANG Ying (College of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China)

Abstract: In recent years, deep learning has been widely used in computer vision, covering image segmentation, feature extraction and target recognition, among which image segmentation has always been a classic problem. In this paper, the methods and research status of image segmentation technology based on deep learning are summarized, and the image processing technology of deep learning is discussed in detail. The methods of image segmentation are mainly discussed from four aspects. Finally, the development of image segmentation technology is summarized.

Keywords: deep learning; image segmentation; deep network
CLC number: TP391.4  Document code: A  Article ID: 1003-9767(2020)23-068-02

0 Introduction
In computer vision, image processing, pattern recognition, and image recognition have all been research hotspots in recent years; deep-learning-based segmentation includes classification and localization, object detection, semantic segmentation, and related tasks.
Highly Cited Papers
The following are some examples of highly cited papers:
1. "Deep Residual Learning for Image Recognition" - published by Kaiming He et al. at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2016.
It introduced deep residual learning for image recognition: layers fit a residual mapping around identity shortcut connections, which yielded significant performance gains on image classification, detection, and segmentation tasks (a minimal residual block is sketched below).
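As a concrete illustration of the residual idea, here is a minimal PyTorch sketch of a basic residual block (a generic simplification, not the exact block from the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the block learns the residual F instead of the full mapping."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + x)  # identity shortcut connection

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```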
2. "Attention is All You Need" - 这篇论文由Vaswani等人于2017年发表在NeurIPS会议上。
它引入了一种名为Transformer的神经网络架构,该架构通过自注意力机制实现了端到端的序列到序列的模型,取得了在机器翻译等任务上优异的结果。
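The Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal single-head NumPy sketch of that formula (illustrative shapes; masking and the multi-head projections are omitted):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```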
3. "Generative Adversarial Networks" - 这篇论文由Goodfellow 等人于2014年发表在NeurIPS会议上。
它提出了一种新颖的生成模型架构,即生成对抗网络(GANs),该架构通过竞争式学习的方式同时训练生成器和判别器模型,取得了在图像生成和样式迁移等领域的突破性进展。
4. "The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence" - 这篇论文由LeCun等人于2015年发表在IEEE Conference on Computer Vision and Pattern Recognition (CVPR)会议上。
它综述了深度学习在人工智能领域的应用,阐述了深度学习方法在计算机视觉、语音识别、自然语言处理等任务上的优秀表现。
5. "A Neural Algorithm of Artistic Style" - 这篇论文由Gatys等人于2015年发表在Journal of Vision上。
Introduction to Classifier Models such as AdaBoost and Neural Networks (NN)
Matlab code • Gentle boosting • Object detector using a part-based model
Dataset with cars and computer monitors
Haar wavelets
Haar filters and integral image
Viola and Jones, ICCV 2001
The average intensity in the block is computed with four sums independently of the block size.
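This four-lookup property is the integral-image (summed-area table) trick behind the Viola-Jones detector. A minimal sketch, assuming a grayscale image stored as a NumPy array:

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero-padded first row/column so that
    ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def block_sum(ii: np.ndarray, y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over img[y0:y1, x0:x1] using exactly four table lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(25, dtype=np.float64).reshape(5, 5)
ii = integral_image(img)
assert block_sum(ii, 1, 1, 4, 4) == img[1:4, 1:4].sum()
# Average intensity = block sum / area, independent of the block size:
print(block_sum(ii, 1, 1, 4, 4) / 9.0)
```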
Boosting
• A simple algorithm for learning robust classifiers
– Freund & Shapire, 1995 – Friedman, Hastie, Tibshhirani, 1998
• Provides efficient algorithm for sparse visual feature selection
/torralba/iccv2005/
Boosting
Boosting fits the additive model

F(x) = \sum_{m=1}^{M} f_m(x)

by minimizing the exponential loss over the training samples \{(x_i, y_i)\}_{i=1}^{N}, with labels y_i \in \{-1, +1\}:

J = \sum_{i=1}^{N} \exp(-y_i F(x_i))

The exponential loss is a differentiable upper bound to the misclassification error (see the sketch below).
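A toy illustration of gentle boosting fitting this additive model with regression stumps, in the spirit of Friedman, Hastie, and Tibshirani; this is an illustrative Python sketch, not the course's Matlab code:

```python
import numpy as np

def fit_stump(x, y, w):
    """Weighted least-squares regression stump f(x) = a*[x > t] + b."""
    best = None
    for t in np.unique(x):
        idx = x > t
        if idx.all() or (~idx).all():
            continue
        b = np.average(y[~idx], weights=w[~idx])       # output left of threshold
        a = np.average(y[idx], weights=w[idx]) - b     # offset right of threshold
        err = np.sum(w * (y - (a * idx + b)) ** 2)
        if best is None or err < best[0]:
            best = (err, t, a, b)
    return best[1:]

def gentle_boost(x, y, rounds=20):
    """Returns stumps whose sum F(x) is the additive model; y in {-1, +1}."""
    w = np.ones_like(y, dtype=float) / len(y)
    F, stumps = np.zeros_like(y, dtype=float), []
    for _ in range(rounds):
        t, a, b = fit_stump(x, y, w)
        f = a * (x > t) + b
        F += f
        w *= np.exp(-y * f)     # reweight: misclassified samples gain weight
        w /= w.sum()
        stumps.append((t, a, b))
    return stumps, F

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.where(x + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)
stumps, F = gentle_boost(x, y)
print("training accuracy:", np.mean(np.sign(F) == y))
```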
Haar wavelets
Research on the Application of Neural Network Models to Image Super-Resolution
...0.001, with the deconvolution layer acting as the upscaling layer. To learn more precisely, a smaller learning rate lr = 0.0001 is used; the first-order and second-order exponential decay rates use the system default parameters, β1 = 0.9 and β2 = 0.999; the mean squared error of the Y channel is used as the loss function, as shown in Eq. (1), and the experiment is set... (see the optimizer sketch below)
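The training configuration described here (Adam with lr = 0.0001, β1 = 0.9, β2 = 0.999, and an MSE loss on the Y channel) maps directly onto a few lines of PyTorch. A minimal sketch, with `model` as a stand-in for any of the super-resolution networks discussed (the network and tensors are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for an FSRCNN/VDSR-style network
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

# Adam with the hyperparameters quoted in the text.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
criterion = nn.MSELoss()            # mean squared error on the Y channel

lr_y = torch.rand(8, 1, 32, 32)     # low-resolution Y-channel patches
hr_y = torch.rand(8, 1, 32, 32)     # high-resolution Y-channel targets

optimizer.zero_grad()
loss = criterion(model(lr_y), hr_y)
loss.backward()
optimizer.step()
print(float(loss))
```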
Han Weijuan¹, Dong Xinjie², Dong Wenlong²
(1. Zhongyuan Institute of Science and Technology, Zhengzhou 450000, Henan, China; 2. Henan Provincial Public Security Department, Zhengzhou 450003, Henan, China)

Abstract: Image super-resolution, i.e., reconstructing a high-resolution image from a low-resolution one, has important application value in medical imaging and other fields. Traditional interpolation-based methods give unsatisfactory results, and in recent years deep learning has been applied to this field. This paper reviews the principles of three neural networks applied to image super-resolution: the fast super-resolution convolutional neural network (FSRCNN), the very deep super-resolution network (VDSR), and the super-resolution generative adversarial network (SRGAN). Experiments are designed to test the network structures, measuring peak signal-to-noise ratio and structural similarity on the Set5, Set14, and Urban100 datasets (a PSNR sketch follows this excerpt). VDSR performs best; the VDSR structure is then improved by extending the original Y channel to three channels (VDSR-RGB), further...

...linear mapping; experimental results show that its reconstruction quality is better than interpolation and other traditional methods.

FSRCNN improves on SRCNN by adding a deconvolution layer at the end to enlarge the output, so the original low-resolution image can be fed into the network directly. In practice, the image is converted to YCbCr and the Y channel is used; the network output is then merged with the interpolated CbCr channels to produce the color image.

1.2 VDSR
The SRGAN designed in this paper consists of three parts: a generator module, a discriminator module, and an adversarial module.
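The abstract above quotes PSNR as one of the two evaluation metrics; it is simple to compute. A minimal sketch, assuming images scaled to [0, 1] (SSIM is more involved and omitted here):

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

rng = np.random.default_rng(0)
hr = rng.random((64, 64))                                  # ground-truth patch
sr = np.clip(hr + 0.02 * rng.normal(size=(64, 64)), 0, 1)  # noisy reconstruction
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```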
Large Language Model Algorithm Interview Questions (3 Sets)
Set 1
1. What attention mechanism is used in Llama 2? Describe grouped-query attention.
Llama 2 uses multi-head attention, with grouped-query attention (GQA) adopted in its largest variant. Grouped-query attention is a modified multi-head attention in which the query heads are divided into groups and each group shares a single key/value head; this shrinks the key/value cache and memory traffic at inference time while keeping accuracy close to full multi-head attention (see the sketch below).
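A minimal sketch of the KV-head sharing at the heart of grouped-query attention: fewer key/value heads are stored, and each is broadcast to a group of query heads (illustrative shapes; RoPE, masking, and caching are omitted):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), where n_q_heads is
    a multiple of n_kv_heads. Each KV head serves a group of query heads."""
    n_q, n_kv = q.shape[0], k.shape[0]
    group = n_q // n_kv
    k = np.repeat(k, group, axis=0)     # broadcast each KV head to its group
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)       # softmax over keys
    return w @ v                        # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))         # 8 query heads
k = rng.normal(size=(2, 5, 16))         # only 2 KV heads are stored
v = rng.normal(size=(2, 5, 16))
print(grouped_query_attention(q, k, v).shape)  # (8, 5, 16)
```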
2. Describe the structure of LangChain in detail.
LangChain is not itself a Transformer model but an open-source framework for building applications around large language models. Its structure centers on a few core abstractions: model wrappers (LLM and chat-model interfaces), prompt templates, chains that compose calls into pipelines, agents that decide which tools to invoke, memory for conversational state, and retrievers/vector stores for retrieval-augmented generation.
3. Are you familiar with positional encodings? Discuss the similarities and differences among several of them.
Yes. Positional encodings inject token-order information, which self-attention otherwise ignores. Common schemes include fixed sinusoidal encodings, learned absolute position embeddings, rotary position embeddings (RoPE), and relative-position biases such as ALiBi. They differ mainly in being fixed versus learned, absolute versus relative, and in how well they extrapolate to sequences longer than those seen in training (a sinusoidal sketch follows).
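For reference, the fixed sinusoidal variant is a few lines of NumPy; this follows the textbook formula from the original Transformer paper:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

print(sinusoidal_positional_encoding(4, 8).round(3))
```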
4. What does the RLHF pipeline consist of concretely, and how many models does it involve?
RLHF (Reinforcement Learning from Human Feedback) aligns a model with human preferences in stages: supervised fine-tuning of the base model, training a reward model on human preference comparisons, and reinforcement-learning optimization of the policy (typically PPO) against that reward, usually with a KL penalty toward a frozen reference model. A typical PPO-based setup therefore involves four models: the policy, the frozen reference, the reward model, and a value (critic) model.
5. Describe encoder-only, decoder-only, and encoder-decoder architectures, with representative large models of each.
Encoder-only: BERT, RoBERTa, ALBERT. Decoder-only: GPT-2, GPT-3, and the LLaMA family. Encoder-decoder: the original Transformer, T5, BART.
6. Explain p-tuning, LoRA, and similar fine-tuning methods, and point out how they differ from traditional fine-tuning.
p-tuning and LoRA are two lightweight, parameter-efficient fine-tuning methods. They differ from traditional fine-tuning mainly in the following respect: (1) parameter updates: traditional fine-tuning adjusts all model parameters, whereas p-tuning freezes the backbone and learns only continuous prompt embeddings, and LoRA freezes the pretrained weights and trains small low-rank matrices that are added to selected weight matrices (sketched below).
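A minimal sketch of the LoRA idea: a frozen weight W is augmented with a trainable low-rank update, scaled by alpha/r (a generic simplification, not any particular library's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x W^T + (alpha/r) * x A^T B^T, with W frozen and A, B trainable."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)        # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(64, 64)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the low-rank A and B are trained
```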
On Deep Generative Models with Applications to Recognition
Marc'Aurelio Ranzato∗, Joshua Susskind†, Volodymyr Mnih∗, Geoffrey Hinton∗
∗Department of Computer Science, University of Toronto ({ranzato, vmnih, hinton}@cs.toronto.edu)
†Institute for Neural Computation, University of California, San Diego (josh@mplab.ucsd.edu)

Abstract
The most popular way to use probabilistic models in vision is first to extract some descriptors of small image patches or object parts using well-engineered features, and then to use statistical learning tools to model the dependencies among these features and eventual labels. Learning probabilistic models directly on the raw pixel values has proved to be much more difficult and is typically only used for regularizing discriminative methods. In this work, we use one of the best, pixel-level, generative models of natural images, a gated MRF, as the lowest level of a deep belief network (DBN) that has several hidden layers. We show that the resulting DBN is very good at coping with occlusion when predicting expression categories from face images, and it can produce features that perform comparably to SIFT descriptors for discriminating different types of scene. The generative ability of the model also makes it easy to see what information is captured and what is lost at each level of representation.

1. Introduction and Previous Work
Over the past few years most of the research on object and scene recognition has converged towards a particular paradigm. Most methods [11] start by applying some well-engineered features, like SIFT [15], HoG [4], SURF [1], or PHoG [2], to describe patches of the image, and then they aggregate these features at different spatial resolutions and on different parts of the image to produce a feature vector which is subsequently fed into a general purpose classifier such as a Support Vector Machine (SVM). Although very successful, these methods rely heavily on human design of good patch descriptors and ways to aggregate them. Given the large and growing amount of easily available image data and continued advances in machine learning, it should be possible to learn better patch descriptors and better ways of aggregating them. This will be particularly significant for data where human expertise is limited such as microscopic, radiographic or hyper-spectral imagery.

[Figure 1: Outline of a deep generative model composed of three layers. The input is a high-resolution image (or a large patch). The first layer applies filters that tile the image with different offsets (squares of different color are filters with different parameters). The filters of this layer are learned on natural images. Afterwards, a second layer is added. This uses as input the expectation of the first layer latent variables. This layer is trained to fit the distribution of its input. The procedure is repeated again for the third layer (and as many as desired). Because of the tiling, the spatial resolution is greatly reduced at each layer, while each latent variable increases its receptive field size as we go up in the hierarchy. After training, inference of the top level representation is well approximated by propagating the expectation of the latent variables given their input starting from the input image (see dashed arrows), which is efficient since their distribution is factorial. Generation is performed by first sampling from the top layer, and then using the conditional over the input at each layer (starting from the top layer) to back-project the samples in image space (see continuous line arrows). Except at the first layer, all other latent variables are binary. Generation can be used to restore images and cope with occlusion. The latent variables can be used as features for a subsequent discriminative classifier trained to recognize objects.]

A major issue when learning end-to-end discriminative systems [12] is overfitting, since there are millions of parameters that need to be optimized and the amount of labeled data is seldom sufficient to constrain so many parameters. Replicating local filters is one very effective method of reducing the number of independent parameters and this has recently been combined with the use of generative models to impose additional constraints on the parameters by forcing them to be good for generation (or reconstruction) as well as for recognition [13, 9]. From a purely discriminative viewpoint, it is a waste of capacity to model aspects of the image that are not directly relevant to the discriminative goal. In practice, however, this waste is more than outweighed by the fact that a subset of the high level features that are good for generating images are also much more directly useful than raw pixels for defining the boundaries between the classes of interest. Learning features that constitute a good generative model is therefore a much better regularizer than domain-independent terms such as L2 or L1 penalties on the parameters or latent variable values. A computationally efficient way to use generative modeling to regularize a discriminative system is to start by learning a good, multi-layer generative model of images. This creates layers of feature detectors that capture the higher-order statistical structure of the input domain. These features can then be discriminatively fine-tuned to adjust the class boundaries without requiring the discriminative learning to design good features from scratch. The generative "pre-training" is particularly efficient for DBNs and stacked auto-encoders because it is possible to learn one layer of features at a time [7] and the fact that this does not require labeled data is an added bonus. For DBNs, there is also a guarantee that adding a new layer, if done correctly, creates a model that has a better variational lower bound on the log probability of the training data than the previous, shallower model [7]. Slightly better performance can usually be achieved by combining the generative and discriminative objective functions during the fine-tuning stage [10]. We can interpret sparse coding methods [17], convolutional DBNs [13], many energy-based models [9] and other related methods [23] as generative models used for regularization of an ultimately discriminative system.

The question we address in this paper is whether it is a good idea to use a proper probabilistic model of pixels to regularize a discriminative system, as opposed to using a simpler algorithm that learns representations without attempting to fit a proper distribution to the input images [17, 9]. In fact, learning a proper probabilistic model is not the most popular choice, even among researchers interested in end-to-end adaptive recognition systems, because learning really good probabilistic models of high-resolution images is a well-known open problem in natural image statistics [20, 19, 18]. Learning a proper generative model is therefore seen as making an already difficult recognition problem even harder. In this work, we show how to use a DBN [7] to improve one of the best generative models of images, namely a gated Markov Random Field (MRF) that uses one set of hidden variables to create an image-specific model of the covariance structure of the pixels and another set of hidden variables to model the intensities of the pixels [18]. The DBN uses several layers of Bernoulli hidden variables to model the statistical structure in the hidden activities of the gated MRF. By replicating features in the lower layers it is possible to learn a very good generative model of high-resolution images. Section 2 reviews this model and the learning procedure. Section 3 demonstrates the advantage of such a probabilistic model over other methods based on engineered features or learned features using non-probabilistic energy-based (or loss-based) models. We show that our deep model is more interpretable because we can easily visualize the state of its internal representation at any given layer. We can also use the model to draw samples which give an intuitive interpretation of the learned structure. More quantitatively, we then use the model to learn features analogous to SIFT which are used to recognize scene categories. Finally, we show that we can exploit the generative ability of the model to predict missing pixels in occluded face images and ultimately improve the recognition of facial expressions. These experiments aim at demonstrating that the same deep model performs comparably or even better than the state-of-the-art on two very different datasets because it can fully adapt to the input data. The deep model can also be seen as a flexible framework to learn robust features, with the advantage that it can naturally cope with ambiguities in the raw sensory inputs, like missing pixel values caused by occlusion, which is per se a very important problem in vision.
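The greedy layer-wise scheme described here (train one layer, freeze it, feed its expected hidden activities to the next layer) can be sketched compactly. Below is a minimal sketch of stacking binary RBMs trained with one step of contrastive divergence, in the spirit of [7]; it is a generic simplification with illustrative hyperparameters, not the paper's gated-MRF first layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=10):
    """Binary RBM trained with one step of contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0 @ W + b_h)                       # positive phase
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_v)                     # reconstruction
        p_h1 = sigmoid(p_v1 @ W + b_h)                     # negative phase
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        b_v += lr * (v0 - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pretraining: each new layer models the expected
    hidden activities of the layer below."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        layers.append((W, b_h))
        x = sigmoid(x @ W + b_h)       # propagate expectations upward
    return layers

data = (rng.random((200, 64)) < 0.3).astype(float)  # toy binary "images"
dbn = pretrain_dbn(data, layer_sizes=[32, 16])
print([w.shape for w, _ in dbn])
```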