The Isotope Wavelet: A Signal Theoretic Framework for Analyzing Mass Spectrometry Data


Non-Radioactive Iodine Labeling with Inductively Coupled Plasma Mass Spectrometry for Immunoassay Research

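... there are few reports on analysis and detection. In this study, when iodine-labeled goat anti-rabbit IgG protein and the coated plate ...
Funding: supported by the Marine Public Welfare Industry Research Special Project ((０７５１）２０００１), the China Marine Surveillance Technical Support System Project, and the 2008 Special Project on Marine Environmental Protection, Energy Conservation and Emission Reduction. About the author: Li Jingxi, born in 1890 [sic], research intern at the Research Center for Ecology, First Institute of Oceanography, State Oceanic Administration.
Abstract: A non-radioactive iodine labeling and inductively coupled plasma mass spectrometry (ICP-MS) immunoassay system was studied. With rabbit anti-Escherichia coli as the model antigen and goat anti-rabbit IgG protein as the antibody, a new immunoassay method was established. N-bromosuccinimide (NBS) was used as the oxidant to label goat anti-rabbit IgG protein with non-radioactive iodine. The optimal labeling conditions were explored, and the labeling rate was 6.2. After the labeled product was separated and purified on a Sephadex G-type column (printed as "Ｓｐａｅ５３１ｅｈｄｘＧ０"), the stability of the iodine-labeled goat anti-rabbit protein was studied. The results show that after 96 h at 4 °C the labeled product had lost almost no iodine and retained a certain level of activity. A polystyrene 96-well plate served as the solid-phase support for the immunoreaction, and ICP-MS was the detection method. The detection limit and RSD are printed as "０１ｇ・～，Ｓ（＝９为３；．２ｍＬＲＤｎ）". The system is also applicable to the labeling and analysis of other active proteins, nucleic acids, and similar molecules.
Keywords: non-radioactive iodine; immunoassay; inductively coupled plasma mass spectrometry
CLC number: ０５．６７３. Document code: A. DOI: １．９４ｊｉｎ１０ —５３２１）３０８ —４ＯＩ０３６／．ｓ．０００９（０００—７８０ｓ
... easily causes molecular changes and structural damage in deoxyribonucleic acid, proteins, enzymes, and other molecules ...
... stable iodine isotope, which is readily available, inexpensive, and easy to label with, and whose waste liquid ...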

WAters31843

Desolvation gas flow (liters/hr): 200 to 250; 250 to 400; 250 to 400; 400 to 750
Higher desolvation temperatures give increased sensitivity. However, increasing the temperature above the suggested range reduces beam stability. Increasing the gas flow rate above the quoted values leads to unnecessarily high nitrogen consumption. Avoid operating the desolvation heater for long periods of time without proper gas flow; doing so could damage the source.
©2004 Waters Corporation
MS Troubleshooting Strategy
Try to simplify
- assess impact on lab efficiency
- inspect the MS or MS/MS
- try to categorize
- troubleshoot the easiest-to-fix items first
Mass Spectrometry
Source: ESI, APcI, Nano-ESI
Mass analyzers: magnetic sectors, electric sectors, time of flight, quadrupole, ion trap, FT-ICR

Mass Spectrometry

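[Fragmentation scheme] Position of bond cleavage: CH3 | CH2-N(CH3)2. Cleavage at the marked bond forms two fragment ions of mass 15 (CH3) and mass 58 (CH2N(CH3)2, carrying the positive charge).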
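The two fragment masses quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming the parent molecule in the scheme is CH3-CH2-N(CH3)2 (C4H11N) and using nominal integer isotope masses; the function name and the dictionary layout are illustrative choices, not part of the source.

```python
# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14}


def nominal_mass(composition):
    """Sum nominal isotope masses for a composition given as {element: count}."""
    return sum(NOMINAL_MASS[element] * count for element, count in composition.items())


# Assumed parent molecule: CH3-CH2-N(CH3)2, i.e. C4H11N.
parent = {"C": 4, "H": 11, "N": 1}
methyl_fragment = {"C": 1, "H": 3}          # CH3
amine_fragment = {"C": 3, "H": 8, "N": 1}   # CH2N(CH3)2

print(nominal_mass(methyl_fragment))  # 15
print(nominal_mass(amine_fragment))   # 58
print(nominal_mass(parent))           # 73
```

The two fragment masses add up to the nominal mass of the assumed parent (15 + 58 = 73), which is what a single bond cleavage with no rearrangement should give.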
6. Types of ions in a mass spectrum
Molecular ions
Isotope ions
Fragment ions
Metastable ions
Multiply charged ions
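I. Molecular ions and identifying the molecular ion peak
1. Molecular ion (M+· or P): the ion formed when a high-energy electron beam knocks one electron out of an organic molecule. The corresponding peak in the mass spectrum is the molecular ion peak. It lies at the high mass-to-charge end of the spectrum, and its mass number gives the molecular weight. The molecular weight measured by mass spectrometry is the actual mass of the molecule, whereas the molecular weight obtained by chemical calculation is the sum of the isotope-averaged masses of the elements.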
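The difference between the mass seen in a spectrum and the tabulated molecular weight can be illustrated numerically. This is a minimal sketch, assuming methyl stearate (C19H38O2) as the example compound and using rounded standard isotope and atomic masses; the tables and function name are illustrative.

```python
# Mass of the lightest (most abundant) isotope of each element, in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.9949146}
# Isotope-averaged (standard) atomic weights, in daltons.
AVERAGE = {"C": 12.011, "H": 1.008, "O": 15.999}


def molecular_mass(composition, mass_table):
    """Sum per-element masses for a composition given as {element: count}."""
    return sum(mass_table[element] * count for element, count in composition.items())


methyl_stearate = {"C": 19, "H": 38, "O": 2}

monoisotopic = molecular_mass(methyl_stearate, MONOISOTOPIC)
average = molecular_mass(methyl_stearate, AVERAGE)

print(f"monoisotopic mass: {monoisotopic:.4f} Da")
print(f"average molecular weight: {average:.3f} Da")
print(f"difference: {average - monoisotopic:.3f} Da")
```

The monoisotopic value is the one that corresponds to the molecular ion peak, while the average value is the one found on a reagent bottle; for a compound of this size the two differ by roughly 0.2 Da.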
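Schematic of Koichi Tanaka's soft laser desorption mass spectrometry technique
It solved the problem of "seeing clearly" which biological macromolecule is present.
Marine Environment and Pollution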
HP6890 GC (Agilent 4 USA)/Micromass Autospec-Ultima NT (Micromass UK)
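Mass spectrometer: double-focusing magnetic sector instrument. Mass range: 3000/2000 (8 kV accelerating voltage). Resolution: >80,000 (10% valley). Sensitivity: with electron-impact ionization at a resolution of 1000, 1 µg of methyl stearate yields 5×10^-7 C of charge. Vacuum system: diffusion pump. Scan speed: 015s/decade. GC-MS accurate mass: 2mu RMS.
Ions with a small mass-to-charge ratio pass through the collector slit at low magnetic field strength.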
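The statement that light ions reach the collector slit at low field follows from the standard magnetic-sector relation m/z = e·B²·r²/(2V), so the required field scales with the square root of m/z. Below is a minimal sketch of that relation; the 8 kV accelerating voltage comes from the specification above, while the 0.30 m sector radius and the function name are assumptions made only for illustration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def field_for_mz(mz, accel_voltage, radius):
    """Magnetic field (tesla) that steers an ion of the given m/z (Da per
    elementary charge) along a circular path of the given radius (m) after
    acceleration through accel_voltage (V).

    Derived from z*e*V = m*v**2/2 and m*v**2/r = z*e*v*B.
    """
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE  # kg per coulomb
    return math.sqrt(2.0 * accel_voltage * mass_per_charge) / radius


ACCEL_VOLTAGE = 8000.0  # V, from the instrument specification
SECTOR_RADIUS = 0.30    # m, hypothetical value for illustration

for mz in (15, 58, 298):
    field = field_for_mz(mz, ACCEL_VOLTAGE, SECTOR_RADIUS)
    print(f"m/z {mz:>3}: B = {field:.3f} T")
```

Scanning the field from low to high therefore brings ions to the slit in order of increasing m/z, and quadrupling m/z doubles the field needed.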
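2 Basic Components of a Mass Spectrometer
Mass spectrometers come in many types, such as single-focusing, double-focusing, quadrupole mass analyzer, and ion trap instruments, and they differ considerably in the devices used to generate, separate, and detect ions. Whatever the type, however, the basic components are the same: an ion source, a mass analyzer, a detector, and a vacuum system. This section mainly introduces the basic structure and working principle of organic mass spectrometers. I. Single-focusing mass spectrometer. 1. Vacuum system: the entire mass spectrometer must be kept under high vacuum (10^-5 to 10^-6 mmHg), so various types of vacuum pump can be used as required.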

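Waters Ion Mobility Mass Spectrometry

Ion mobility mass spectrometry (IM-MS) is an analytical method that combines ion mobility separation with mass spectrometry and allows complex samples to be analyzed quickly and efficiently.

The Waters ion mobility mass spectrometry system is currently one of the more widely used IM-MS instruments and is characterized by high resolution, high sensitivity, and high throughput.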

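I. Principle of ion mobility mass spectrometry

IM-MS relies on differences in the rate at which ions migrate through a gas. Sample molecules are first ionized in the ion source and then enter the ion mobility device. There the ions are driven through a buffer gas by an electric field, and collisions with the gas retard each ion to a different degree. The migration rate depends on the size, shape, and charge state of the ion, among other factors, so different ions separate as they migrate. Finally, the ions enter the mass spectrometer for mass analysis, which yields their mass-to-charge ratios.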
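The separation described above can be made concrete with the textbook low-field drift-tube model, in which the drift velocity is v = K·E and the drift time is t = L/(K·E). This is a sketch under that assumption only: commercial instruments, including the Waters systems discussed here, may use a different separation scheme, and the mobility values, tube length, and voltage below are hypothetical.

```python
def drift_time_ms(mobility_cm2_per_vs, length_cm, voltage_v):
    """Drift time in milliseconds for an ion in a uniform-field drift tube.

    mobility_cm2_per_vs: ion mobility K in cm^2/(V*s)
    length_cm: drift tube length L in cm
    voltage_v: voltage applied across the tube in V
    """
    field = voltage_v / length_cm            # E in V/cm
    velocity = mobility_cm2_per_vs * field   # v = K * E in cm/s
    return length_cm / velocity * 1000.0     # t = L / v, converted to ms


# Hypothetical ions: a compact ion (higher K) and an extended ion (lower K).
for label, mobility in (("compact ion", 2.0), ("extended ion", 1.0)):
    t = drift_time_ms(mobility, length_cm=10.0, voltage_v=3000.0)
    print(f"{label}: K = {mobility} cm^2/(V*s), drift time = {t:.1f} ms")
```

In this model an ion with half the mobility takes twice as long to cross the tube, which is why ions of the same mass-to-charge ratio but different shape arrive at the mass analyzer at different times.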

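II. Features of the Waters ion mobility mass spectrometry system

1. High resolution: the system uses high-pressure ion mobility technology to separate ions with high resolution. Ions in complex samples can therefore be separated and identified effectively, which improves the accuracy and reliability of the analysis.

2. High sensitivity: the system can detect target compounds at low concentrations, which matters in areas such as drug metabolism studies and biomarker discovery.

3. High throughput: analysis is fast, so a large number of samples can be processed in a short time, which matters for applications such as high-throughput screening and rapid analysis.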

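III. Applications of Waters ion mobility mass spectrometry in the life sciences

1. Drug metabolism: the system can rapidly identify and quantify drug metabolites, helping researchers understand the metabolic pathways of a drug in the body and the structural features of its metabolites, and providing an important reference for drug development.

2. Protein research: the system can be used for protein structure analysis and for studying post-translational modifications. IM-MS analysis provides information on the mass, structure, and function of proteins.

3. Metabolomics: the system can comprehensively analyze the metabolites in an organism, helping researchers understand changes in metabolic pathways and the structural features of metabolites, and providing important tools and methods for metabolomics research.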

C. parvum Complete Genome Sequence

Complete Genome Sequence of the Apicomplexan, Cryptosporidium parvum

Mitchell S. Abrahamsen et al., Science 304, 441 (2004); DOI: 10.1126/science.1094786
Reports, Science, Vol. 304, 16 April 2004, p. 441. Downloaded from www.sciencemag.org on October 7, 2009.
Copyright 2004 by the American Association for the Advancement of Science; all rights reserved.

Mitchell S. Abrahamsen (1, 2, *, †), Thomas J. Templeton (3, †), Shinichiro Enomoto (1), Juan E. Abrahante (1), Guan Zhu (4), Cheryl A. Lancto (1), Mingqi Deng (1), Chang Liu (1, ‡), Giovanni Widmer (5), Saul Tzipori (5), Gregory A. Buck (6), Ping Xu (6), Alan T. Bankier (7), Paul H. Dear (7), Bernard A. Konfortov (7), Helen F. Spriggs (7), Lakshminarayan Iyer (8), Vivek Anantharaman (8), L. Aravind (8), Vivek Kapur (2, 9)

(1) Department of Veterinary and Biomedical Science, College of Veterinary Medicine, (2) Biomedical Genomics Center, University of Minnesota, St. Paul, MN 55108, USA. (3) Department of Microbiology and Immunology, Weill Medical College and Program in Immunology, Weill Graduate School of Medical Sciences of Cornell University, New York, NY 10021, USA. (4) Department of Veterinary Pathobiology, College of Veterinary Medicine, Texas A&M University, College Station, TX 77843, USA. (5) Division of Infectious Diseases, Tufts University School of Veterinary Medicine, North Grafton, MA 01536, USA. (6) Center for the Study of Biological Complexity and Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23198, USA. (7) MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK. (8) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. (9) Department of Microbiology, University of Minnesota, Minneapolis, MN 55455, USA.

* To whom correspondence should be addressed. E-mail: abe@
† These authors contributed equally to this work.
‡ Present address: Bioinformatics Division, Genetic Research, GlaxoSmithKline Pharmaceuticals, 5 Moore Drive, Research Triangle Park, NC 27009, USA.

The apicomplexan Cryptosporidium parvum is an intestinal parasite that affects healthy humans and animals, and causes an unrelenting infection in immunocompromised individuals such as AIDS patients. We report the complete genome sequence of C. parvum, type II isolate. Genome analysis identifies extremely streamlined metabolic pathways and a reliance on the host for nutrients. In contrast to Plasmodium and Toxoplasma, the parasite lacks an apicoplast and its genome, and possesses a degenerate mitochondrion that has lost its genome. Several novel classes of cell-surface and secreted proteins with a potential role in host interactions and pathogenesis were also detected. Elucidation of the core metabolism, including enzymes with high similarities to bacterial and plant counterparts, opens new avenues for drug development.

Cryptosporidium parvum is a globally important intracellular pathogen of humans and animals. The duration of infection and pathogenesis of cryptosporidiosis depends on host immune status, ranging from a severe but self-limiting diarrhea in immunocompetent individuals to a life-threatening, prolonged infection in immunocompromised patients. A substantial degree of morbidity and mortality is associated with infections in AIDS patients. Despite intensive efforts over the past 20 years, there is currently no effective therapy for treating or preventing C. parvum infection in humans.

Cryptosporidium belongs to the phylum Apicomplexa, whose members share a common apical secretory apparatus mediating locomotion and tissue or cellular invasion. Many apicomplexans are of medical or veterinary importance, including Plasmodium, Babesia, Toxoplasma, Neospora, Sarcocystis, Cyclospora, and Eimeria. The life cycle of C. parvum is similar to that of other cyst-forming apicomplexans (e.g., Eimeria and Toxoplasma), resulting in the formation of oocysts that are shed in the feces of infected hosts. C. parvum oocysts are highly resistant to environmental stresses, including chlorine treatment of community water supplies; hence, the parasite is an important water- and food-borne pathogen (1).

The obligate intracellular nature of the parasite's life cycle and the inability to culture the parasite continuously in vitro greatly impair researchers' ability to obtain purified samples of the different developmental stages. The parasite cannot be genetically manipulated, and transformation methodologies are currently unavailable. To begin to address these limitations, we have obtained the complete C. parvum genome sequence and its predicted protein complement. (This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accession AAEE00000000. The version described in this paper is the first version, AAEE01000000.)

The random shotgun approach was used to obtain the complete DNA sequence (2) of the Iowa "type II" isolate of C. parvum. This isolate readily transmits disease among numerous mammals, including humans. The resulting genome sequence has roughly 13× genome coverage containing five gaps and 9.1 Mb of total DNA sequence within eight chromosomes. The C. parvum genome is thus quite compact relative to the 23-Mb, 14-chromosome genome of Plasmodium falciparum (3); this size difference is predominantly the result of shorter intergenic regions, fewer introns, and a smaller number of genes (Table
1). Comparison of the assembled sequence of chromosome VI to that of the recently published sequence of chromosome VI (4) revealed that our assembly contains an additional 160 kb of sequence and a single gap versus two, with the common sequences displaying a 99.993% sequence identity (2).

The relative paucity of introns greatly simplified gene predictions and facilitated annotation (2) of predicted open reading frames (ORFs). These analyses provided an estimate of 3807 protein-encoding genes for the C. parvum genome, far fewer than the estimated 5300 genes predicted for the Plasmodium genome (3). This difference is primarily due to the absence of an apicoplast and mitochondrial genome, as well as the presence of fewer genes encoding metabolic functions and variant surface proteins, such as the P. falciparum var and rifin molecules (Table 2). An analysis of the encoded protein sequences with the program SEG (5) shows that these protein-encoding genes are not enriched in low-complexity sequences (34%) to the extent observed in the proteins from Plasmodium (70%).

Our sequence analysis indicates that Cryptosporidium, unlike Plasmodium and Toxoplasma, lacks both mitochondrion and apicoplast genomes. The overall completeness of the genome sequence, together with the fact that similar DNA extraction procedures used to isolate total genomic DNA from C. parvum efficiently yielded mitochondrion and apicoplast genomes from Eimeria sp. and Toxoplasma (6, 7), indicates that the absence of organellar genomes was unlikely to have been the result of methodological error. These conclusions are consistent with the absence of nuclear genes for the DNA replication and translation machinery characteristic of mitochondria and apicoplasts, and with the lack of mitochondrial or apicoplast targeting signals for tRNA synthetases.

A number of putative mitochondrial proteins were identified, including components of a mitochondrial protein import apparatus, chaperones, uncoupling proteins, and solute translocators (table S1). However, the genome does not encode any Krebs cycle enzymes, nor the components constituting the mitochondrial complexes I to IV; this finding indicates that the parasite does not rely on complete oxidation and respiratory chains for synthesizing adenosine triphosphate (ATP). Similar to Plasmodium, no orthologs for the γ, δ, or ε subunits or the c subunit of the F0 proton channel were detected (whereas all subunits were found for a V-type ATPase). Cryptosporidium, like Eimeria (8) and Plasmodium, possesses a pyridine nucleotide transhydrogenase integral membrane protein that may couple reduced nicotinamide adenine dinucleotide (NADH) and reduced nicotinamide adenine dinucleotide phosphate (NADPH) redox to proton translocation across the inner mitochondrial membrane. Unlike Plasmodium, the parasite has two copies of the pyridine nucleotide transhydrogenase gene. Also present is a likely mitochondrial membrane-associated, cyanide-resistant alternative oxidase (AOX) that catalyzes the reduction of molecular oxygen by ubiquinol to produce H2O, but not superoxide or H2O2. Several genes were identified as involved in biogenesis of iron-sulfur [Fe-S] complexes with potential mitochondrial targeting signals (e.g., nifS, nifU, frataxin, and ferredoxin), supporting the presence of a limited electron flux in the mitochondrial remnant (table S2).

Table 1. General features of the C. parvum genome and comparison with other single-celled eukaryotes. Values are derived from respective genome project summaries (3, 26–28). ND, not determined.

Feature                                   C. parvum  P. falciparum  S. pombe  S. cerevisiae  E. cuniculi
Size (Mbp)                                9.1        22.9           12.5      12.5           2.5
(G+C) content (%)                         30         19.4           36        38.3           47
No. of genes                              3807       5268           4929      5770           1997
Mean gene length (bp) excluding introns   1795       2283           1426      1424           ND
Gene density (bp per gene)                2382       4338           2528      2088           1256
Percent coding                            75.3       52.6           57.5      70.5           90
Genes with introns (%)                    5          53.9           43        5              ND
Intergenic regions
  (G+C) content (%)                       23.9       13.6           32.4      35.1           45
  Mean length (bp)                        566        1694           952       515            129
RNAs
  No. of tRNA genes                       45         43             174       299            44
  No. of 5S rRNA genes                    6          3              30        100–200        3
  No. of 5.8S, 18S, and 28S rRNA units    5          7              200–400   100–200        22

Table 2. Comparison between predicted C. parvum and P. falciparum proteins.

Feature                           C. parvum     P. falciparum*  Common†
Total predicted proteins          3807          5268            1883
Mitochondrial targeted/encoded    17 (0.45%)    246 (4.7%)      15
Apicoplast targeted/encoded       0             581 (11.0%)     0
var/rif/stevor‡                   0             236 (4.5%)      0
Annotated as protease§            50 (1.3%)     31 (0.59%)      27
Annotated as transporter‖         69 (1.8%)     34 (0.65%)      34
Assigned EC function¶             167 (4.4%)    389 (7.4%)      113
Hypothetical proteins             925 (24.3%)   3208 (60.9%)    126

* Values indicated for P. falciparum are as reported (3) with the exception of those for proteins annotated as protease or transporter.
† TBLASTN hits (e < –5) between C. parvum and P. falciparum.
‡ As reported in (3).
§ Predicted proteins annotated as "protease or peptidase" for C. parvum (CryptoGenome database) and P. falciparum (PlasmoDB database).
‖ Predicted proteins annotated as "transporter, permease of P-type ATPase" for C. parvum (CryptoGenome) and P. falciparum (PlasmoDB).
¶ Bidirectional BLAST hit (e < –15) to orthologs with assigned Enzyme Commission (EC) numbers. Does not include EC assignment numbers for protein kinases or protein phosphatases (due to inconsistent annotation across genomes), or DNA polymerases or RNA polymerases, as a result of issues related to subunit inclusion. (For consistency, 46 proteins were excluded from the reported P. falciparum values.)

Our sequence analysis confirms the absence of a plastid genome (7) and, additionally, the loss of plastid-associated metabolic pathways including the type II fatty acid synthases (FASs) and isoprenoid synthetic enzymes that are otherwise localized to the plastid in other apicomplexans. C. parvum fatty acid biosynthesis appears to be cytoplasmic, conducted by a large (8252 amino acids) modular type I FAS (9) and possibly by another large enzyme that is related to the multidomain bacterial polyketide
synthase (10). Comprehensive screening of the C. parvum genome sequence also did not detect orthologs of Plasmodium nuclear-encoded genes that contain apicoplast-targeting and transit sequences (11).

C. parvum metabolism is greatly streamlined relative to that of Plasmodium, and in certain ways it is reminiscent of that of another obligate eukaryotic parasite, the microsporidian Encephalitozoon. The degeneration of the mitochondrion and associated metabolic capabilities suggests that the parasite largely relies on glycolysis for energy production. The parasite is capable of uptake and catabolism of monosugars (e.g., glucose and fructose) as well as synthesis, storage, and catabolism of polysaccharides such as trehalose and amylopectin. Like many anaerobic organisms, it economizes ATP through the use of pyrophosphate-dependent phosphofructokinases. The conversion of pyruvate to acetyl-coenzyme A (CoA) is catalyzed by an atypical pyruvate-NADPH oxidoreductase (CpPNO) that contains an N-terminal pyruvate-ferredoxin oxidoreductase (PFO) domain fused with a C-terminal NADPH-cytochrome P450 reductase domain (CPR). Such a PFO-CPR fusion has previously been observed only in the euglenozoan protist Euglena gracilis (12). Acetyl-CoA can be converted to malonyl-CoA, an important precursor for fatty acid and polyketide biosynthesis. Glycolysis leads to several possible organic end products, including lactate, acetate, and ethanol. The production of acetate from acetyl-CoA may be economically beneficial to the parasite via coupling with ATP production.

Ethanol is potentially produced via two independent pathways: (i) from the combination of pyruvate decarboxylase and alcohol dehydrogenase, or (ii) from acetyl-CoA by means of a bifunctional dehydrogenase (adhE) with acetaldehyde and alcohol dehydrogenase activities; adhE first converts acetyl-CoA to acetaldehyde and then reduces the latter to ethanol. AdhE predominantly occurs in bacteria but has recently been identified in several protozoans, including vertebrate gut parasites such as Entamoeba and Giardia (13, 14). Adjacent to the adhE gene resides a second gene encoding only the AdhE C-terminal Fe-dependent alcohol dehydrogenase domain. This gene product may form a multisubunit complex with AdhE, or it may function as an alternative alcohol dehydrogenase that is specific to certain growth conditions. C. parvum has a glycerol 3-phosphate dehydrogenase similar to those of plants, fungi, and the kinetoplastid Trypanosoma, but (unlike trypanosomes) the parasite lacks an ortholog of glycerol kinase and thus this pathway does not yield glycerol production. In addition to the modular fatty acid synthase (CpFAS1) and polyketide synthase homolog (CpPKS1), C. parvum possesses several fatty acyl-CoA synthases and a fatty acyl elongase that may participate in fatty acid metabolism. Further, enzymes for the metabolism of complex lipids (e.g., glycerolipid and inositol phosphate) were identified in the genome. Fatty acids are apparently not an energy source, because enzymes of the fatty acid oxidative pathway are absent, with the exception of a 3-hydroxyacyl-CoA dehydrogenase.

C. parvum purine metabolism is greatly simplified, retaining only an adenosine kinase and enzymes catalyzing conversions of adenosine 5′-monophosphate (AMP) to inosine, xanthosine, and guanosine 5′-monophosphates (IMP, XMP, and GMP). Among these enzymes, IMP dehydrogenase (IMPDH) is phylogenetically related to ε-proteobacterial IMPDH and is strikingly different from its counterparts in both the host and other apicomplexans (15). In contrast to other apicomplexans such as Toxoplasma gondii and P. falciparum, no gene encoding hypoxanthine-xanthine-guanine phosphoribosyltransferase (HXGPRT) is detected, in contrast to a previous report on the activity of this enzyme in C. parvum sporozoites (16). The absence of HXGPRT suggests that the parasite may rely solely on a single enzyme system including IMPDH to produce GMP from AMP. In contrast to other apicomplexans, the parasite appears to rely on adenosine for purine salvage, a model supported by the identification of an adenosine transporter. Unlike other apicomplexans and many parasitic protists that can synthesize pyrimidines de novo, C. parvum relies on pyrimidine salvage and retains the ability for interconversions among uridine and cytidine 5′-monophosphates (UMP and CMP), their deoxy forms (dUMP and dCMP), and dAMP, as well as their corresponding di- and triphosphonucleotides. The parasite has also largely shed the ability to synthesize amino acids de novo, although it retains the ability to convert select amino acids, and instead appears to rely on amino acid uptake from the host by means of a set of at least 11 amino acid transporters (table S2).

Most of the Cryptosporidium core processes involved in DNA replication, repair, transcription, and translation conform to the basic eukaryotic blueprint (2). The transcriptional apparatus resembles Plasmodium in terms of basal transcription machinery. However, a striking numerical difference is seen in the complements of two RNA binding domains, Sm and RRM, between P. falciparum (17 and 71 domains, respectively) and C. parvum (9 and 51 domains). This reduction results in part from the loss of conserved proteins belonging to the spliceosomal machinery, including all genes encoding Sm domain proteins belonging to the U6 spliceosomal particle, which suggests that this particle activity is degenerate or entirely lost. This reduction in spliceosomal machinery is consistent with the reduced number of predicted introns in Cryptosporidium (5%) relative to Plasmodium (>50%). In addition, key components of the small RNA-mediated posttranscriptional gene silencing system are missing, such as the RNA-dependent RNA polymerase, Argonaute, and Dicer orthologs; hence, RNA interference-related technologies are unlikely to be of much value in targeted disruption of genes in C. parvum.

Cryptosporidium invasion of columnar brush border epithelial cells has been described as "intracellular, but extracytoplasmic," as the parasite resides on the surface of the intestinal epithelium but lies underneath the host cell membrane. This niche may allow the parasite to evade immune surveillance but take advantage of solute transport across the host microvillus membrane or the extensively convoluted parasitophorous vacuole. Indeed, Cryptosporidium has numerous genes (table S2) encoding families of putative sugar transporters (up to 9 genes) and amino acid transporters (11 genes). This is in stark contrast to Plasmodium, which has fewer sugar transporters and only one putative amino acid transporter (GenBank identification number 23612372).

As a first step toward identification of multi-drug-resistant pumps, the genome sequence was analyzed for all occurrences of genes encoding multitransmembrane proteins. Notable are a set of four paralogous proteins that belong to the sbmA family (table S2) that are involved in the transport of peptide antibiotics in bacteria. A putative ortholog of the Plasmodium chloroquine resistance-linked gene PfCRT (17) was also identified, although the parasite does not possess a food vacuole like the one seen in Plasmodium.

Unlike Plasmodium, C. parvum does not possess extensive subtelomeric clusters of antigenically variant proteins (exemplified by the large families of var and rif/stevor genes) that are involved in immune evasion. In contrast, more than 20 genes were identified that encode mucin-like proteins (18, 19) having hallmarks of extensive Thr or Ser stretches suggestive of glycosylation and signal peptide sequences suggesting secretion (table S2). One notable example is an 11,700-amino acid protein with an uninterrupted stretch of 308 Thr residues (cgd3_720). Although large families of secreted proteins analogous to the Plasmodium multigene families were not found, several smaller multigene clusters were observed that encode predicted secreted proteins, with no detectable similarity to proteins from other organisms (Fig. 1, A and B). Within this group, at
least four distinct families appear to have emerged through gene expansions specific to the Cryptosporidium clade. These families (SKSR, MEDLE, WYLE, FGLN, and GGC) were named after well-conserved sequence motifs (table S2). Reverse transcription polymerase chain reaction (RT-PCR) expression analysis (20) of one cluster, a locus of seven adjacent CpLSP genes (Fig. 1B), shows coexpression during the course of in vitro development (Fig. 1C). An additional eight genes were identified that encode proteins having a periodic cysteine structure similar to the Cryptosporidium oocyst wall protein; these eight genes are similarly expressed during the onset of oocyst formation and likely participate in the formation of the coccidian rigid oocyst wall in both Cryptosporidium and Toxoplasma (21).

Whereas the extracellular proteins described above are of apparent apicomplexan or lineage-specific invention, Cryptosporidium possesses many genes encoding secreted proteins having lineage-specific multidomain architectures composed of animal- and bacterial-like extracellular adhesive domains (fig. S1).

Lineage-specific expansions were observed for several proteases (table S2), including an aspartyl protease (six genes), a subtilisin-like protease, a cryptopain-like cysteine protease (five genes), and a Plasmodium falcilysin-like (insulin degrading enzyme-like) protease (19 genes). Nine of the Cryptosporidium falcilysin genes lack the Zn-chelating "HXXEH" active site motif and are likely to be catalytically inactive copies that may have been reused for specific protein-protein interactions on the cell surface. In contrast to the Plasmodium falcilysin, the Cryptosporidium genes possess signal peptide sequences and are likely trafficked to a secretory pathway. The expansion of this family suggests either that the proteins have distinct cleavage specificities or that their diversity may be related to evasion of a host immune response.

Completion of the C. parvum genome sequence has highlighted the lack of conventional drug targets currently pursued for the control and treatment of other parasitic protists. On the basis of molecular and biochemical studies and drug screening of other apicomplexans, several putative Cryptosporidium metabolic pathways or enzymes have been erroneously proposed to be potential drug targets (22), including the apicoplast and its associated metabolic pathways, the shikimate pathway, the mannitol cycle, the electron transport chain, and HXGPRT. Nonetheless, complete genome sequence analysis identifies a number of classic and novel molecular candidates for drug exploration, including numerous plant-like and bacterial-like enzymes (tables S3 and S4).

Although the C. parvum genome lacks HXGPRT, a potent drug target in other apicomplexans, it has only the single pathway dependent on IMPDH to convert AMP to GMP. The bacterial-type IMPDH may be a promising target because it differs substantially from that of eukaryotic enzymes (15). Because of the lack of de novo biosynthetic capacity for purines, pyrimidines, and amino acids, C. parvum relies solely on scavenge from the host via a series of transporters, which may be exploited for chemotherapy. C. parvum possesses a bacterial-type thymidine kinase, and the role of this enzyme in pyrimidine metabolism and its drug target candidacy should be pursued. The presence of an alternative oxidase, likely targeted to the remnant mitochondrion, gives promise to the study of salicylhydroxamic acid (SHAM), ascofuranone, and their analogs as inhibitors of energy metabolism in the parasite (23).

Cryptosporidium possesses at least 15 "plant-like" enzymes that are either absent in or highly divergent from those typically found in mammals (table S3). Within the glycolytic pathway, the plant-like PPi-PFK has been shown to be a potential target in other parasites including T. gondii, and PEPCL and PGI appear to be plant-type enzymes in C. parvum. Another example is a trehalose-6-phosphate synthase/phosphatase catalyzing trehalose biosynthesis from glucose-6-phosphate and uridine diphosphate-glucose. Trehalose may serve as a sugar storage source or may function as an antidesiccant, antioxidant, or protein stability agent in oocysts, playing a role similar to that of mannitol in Eimeria oocysts (24). Orthologs of putative Eimeria mannitol synthesis enzymes were not found. However, two oxidoreductases (table S2) were identified in C. parvum, one of which belongs to the same families as the plant mannose dehydrogenases (25) and the other to the plant cinnamyl alcohol dehydrogenases. In principle, these enzymes could synthesize protective polyol compounds, and the former enzyme could use host-derived mannose to synthesize mannitol.

References and Notes
1. D. G. Korich et al., Appl. Environ. Microbiol. 56, 1423 (1990).
2. See supporting data on Science Online.
3. M. J. Gardner et al., Nature 419, 498 (2002).
4. A. T. Bankier et al., Genome Res. 13, 1787 (2003).
5. J. C. Wootton, Comput. Chem. 18, 269 (1994).

Fig. 1. (A) Schematic showing the chromosomal locations of clusters of potentially secreted proteins. Numbers of adjacent genes are indicated in parentheses. Arrows indicate direction of clusters containing unidirectional genes (encoded on the same strand); squares indicate clusters containing genes encoded on both strands. Nonparalogous genes are indicated by solid gray squares or directional triangles; SKSR (green triangles), FGLN (red triangles), and MEDLE (blue triangles) indicate three C. parvum-specific families of paralogous genes predominantly located at telomeres. Insl (yellow triangles) indicates an insulinase/falcilysin-like paralogous gene family. CpLSP (white square) indicates the location of a cluster of adjacent large secreted proteins (table S2) that are cotranscriptionally regulated. Identified anchored telomeric repeat sequences are indicated by circles. (B) Schematic showing a select locus containing a cluster of coexpressed large secreted proteins (CpLSP). Genes and intergenic regions (regions between identified genes) are drawn to scale at the nucleotide level. The length of the intergenic regions is indicated above or below the locus. (C) Relative expression levels of CpLSP (red lines) and, as a control, C. parvum Hedgehog-type HINT domain gene (blue line) during in vitro development, as determined by semiquantitative RT-PCR using gene-specific primers corresponding to the seven adjacent genes within the CpLSP locus as shown in (B). Expression levels from three independent time-course experiments are represented as the ratio of the expression of each gene to that of C. parvum 18S rRNA present in each of the infected samples (20).

Atomic Decomposition by Basis Pursuit

SIAM REVIEW, Vol. 43, No. 1, pp. 129–159
© 2001 Society for Industrial and Applied Mathematics

Atomic Decomposition by Basis Pursuit∗

Scott Shaobing Chen†, David L. Donoho‡, Michael A. Saunders§

Abstract. The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB).

Basis pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest $\ell^1$ norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising.

BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear and quadratic programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

Key words. overcomplete signal representation, denoising, time-frequency analysis, time-scale analysis, $\ell^1$ norm optimization, matching pursuit, wavelets, wavelet packets, cosine packets, interior-point methods for linear programming, total variation denoising, multiscale edges, MATLAB code

AMS subject classifications. 94A12, 65K05, 65D15, 41A45

PII. S003614450037906X

∗ Published electronically February 2, 2001. This paper originally appeared in SIAM Journal on Scientific Computing, Volume 20, Number 1, 1998, pages 33–61. This research was partially supported by NSF grants DMS-92-09130, DMI-92-04208, and ECS-9707111, by the NASA Astrophysical Data Program, by ONR grant N00014-90-J1242, and by other sponsors. /journals/sirev/43-1/37906.html
† Renaissance Technologies, 600 Route 25A, East Setauket, NY 11733 (schen@).
‡ Department of Statistics, Stanford University, Stanford, CA 94305 (donoho@).
§ Department of Management Science and Engineering, Stanford University, Stanford, CA 94305 (saunders@).

Downloaded 08/09/14 to 58.19.126.38. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

1. Introduction. Over the last several years, there has been an explosion of interest in alternatives to traditional signal representations. Instead of just representing signals as superpositions of sinusoids (the traditional Fourier representation) we now have available alternate dictionaries, collections of parameterized waveforms, of which the wavelets dictionary is only the best known. Wavelets, steerable wavelets, segmented wavelets, Gabor dictionaries, multiscale Gabor dictionaries, wavelet packets, cosine packets, chirplets, warplets, and a wide range of other dictionaries are now available. Each such dictionary $D$ is a collection of waveforms $(\phi_\gamma)_{\gamma\in\Gamma}$, with $\gamma$ a parameter, and we envision a decomposition of a signal $s$ as

$$ s = \sum_{\gamma\in\Gamma} \alpha_\gamma \phi_\gamma, \qquad (1.1) $$

or an approximate decomposition

$$ s = \sum_{i=1}^{m} \alpha_{\gamma_i} \phi_{\gamma_i} + R^{(m)}, \qquad (1.2) $$

where $R^{(m)}$ is a residual. Depending on the dictionary, such a representation decomposes the signal into pure tones (Fourier dictionary), bumps (wavelet dictionary), chirps (chirplet dictionary), etc.

Most of the new dictionaries are overcomplete, either because they start out that way or because we merge complete dictionaries, obtaining a new megadictionary consisting of several types of waveforms (e.g., Fourier and wavelets dictionaries). The decomposition (1.1) is then nonunique, because some elements in the dictionary have representations in terms of other elements.

1.1. Goals of Adaptive Representation. Nonuniqueness gives us the possibility of adaptation, i.e., of choosing from among many representations one that is most suited to our purposes. We are motivated by the aim of achieving simultaneously the following goals.
• Sparsity. We should obtain the sparsest possible representation of the object, the one with the fewest significant coefficients.
• Superresolution. We should obtain a resolution of sparse objects that is much higher than that possible with traditional nonadaptive approaches.
An important constraint, which is perhaps in conflict with both the goals, follows.
• Speed. It should be possible to obtain a representation in order $O(n)$ or $O(n \log(n))$ time.

1.2. Finding a Representation. Several methods have been proposed for obtaining signal representations in overcomplete dictionaries. These range from general approaches, like the method of frames (MOF) [9] and the method of matching pursuit (MP) [29], to clever schemes derived for specialized dictionaries, like the method of best orthogonal basis (BOB) [7]. These methods are described briefly in section 2.3.

In our view, these methods have both advantages and shortcomings. The principal emphasis of the proposers of these methods is on achieving sufficient computational speed. While the resulting methods are practical to apply to real data, we show below by computational examples that the methods, either quite generally or in important special cases, lack qualities of sparsity preservation and of stable superresolution.

1.3. Basis Pursuit. Basis pursuit (BP) finds signal representations in overcomplete dictionaries by convex optimization: it obtains the decomposition that minimizes the $\ell^1$ norm of the coefficients occurring in the representation. Because of the nondifferentiability of the $\ell^1$ norm, this optimization principle leads to decompositions that can have very different properties from the MOF; in particular, they can be much sparser. Because it is based on global optimization, it can stably superresolve in ways that MP cannot.
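The $\ell^1$ minimization described in section 1.3 can be written out as a linear program. The block below is a sketch of that standard reformulation, not a statement of the solver the authors used: $\Phi$ is the matrix whose columns are the dictionary atoms (the notation of section 2.2), and the split variables $u$ and $v$ are introduced here purely for illustration.

```latex
\begin{aligned}
\text{(BP)}\quad & \min_{\alpha}\ \|\alpha\|_1
      \quad\text{subject to}\quad \Phi\alpha = s, \\[4pt]
\text{split}\quad & \alpha = u - v, \qquad u \ge 0,\ v \ge 0, \\[4pt]
\text{(LP)}\quad & \min_{u,v}\ \mathbf{1}^T u + \mathbf{1}^T v
      \quad\text{subject to}\quad \Phi u - \Phi v = s,\quad u \ge 0,\ v \ge 0 .
\end{aligned}
```

At an optimum of (LP) the vectors $u$ and $v$ have disjoint supports, since otherwise subtracting the common part from both would keep $\Phi u - \Phi v$ unchanged and lower the objective. Hence $\mathbf{1}^T u + \mathbf{1}^T v = \|\alpha\|_1$, and the two problems have the same minimizers.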
BP can be used with noisy data by solving an optimization problem trading off a quadratic misfit measure with an $\ell^1$ norm of coefficients. Examples show that it can stably suppress noise while preserving structure that is well expressed in the dictionary under consideration.

BP is closely connected with linear programming. Recent advances in large-scale linear programming, associated with interior-point methods, can be applied to BP and can make it possible, with certain dictionaries, to nearly solve the BP optimization problem in nearly linear time. We have implemented primal-dual log barrier interior-point methods as part of a MATLAB [31] computing environment called Atomizer, which accepts a wide range of dictionaries. Instructions for Internet access to Atomizer are given in section 7.3. Experiments with standard time-frequency dictionaries indicate some of the potential benefits of BP. Experiments with some nonstandard dictionaries, like the stationary wavelet dictionary and the heaviside dictionary, indicate important connections between BP and methods like Mallat and Zhong's [29] multiscale edge representation and Rudin, Osher, and Fatemi's [35] total variation-based denoising methods.

1.4. Contents. In section 2 we establish vocabulary and notation for the rest of the article, describing a number of dictionaries and existing methods for overcomplete representation. In section 3 we discuss the principle of BP and its relations to existing methods and to ideas in other fields. In section 4 we discuss methodological issues associated with BP, in particular some of the interesting nonstandard ways it can be deployed. In section 5 we describe BP denoising, a method for dealing with problem (1.2). In section 6 we discuss recent advances in large-scale linear programming (LP) and resulting algorithms for BP. For reasons of space we refer the reader to [4] for a discussion of related work in statistics and analysis.

2. Overcomplete Representations. Let $s = (s_t : 0 \le t < n)$ be a discrete-time signal of length $n$; this may also be viewed as a vector in $\mathbb{R}^n$. We are interested in the reconstruction of this signal using superpositions of elementary waveforms. Traditional methods of analysis and reconstruction involve the use of orthogonal bases, such as the Fourier basis, various discrete cosine transform bases, and orthogonal wavelet bases. Such situations can be viewed as follows: given a list of $n$ waveforms, one wishes to represent $s$ as a linear combination of these waveforms. The waveforms in the list, viewed as vectors in $\mathbb{R}^n$, are linearly independent, and so the representation is unique.

2.1. Dictionaries and Atoms. A considerable focus of activity in the recent signal processing literature has been the development of signal representations outside the basis setting. We use terminology introduced by Mallat and Zhang [29]. A dictionary is a collection of parameterized waveforms $D = (\phi_\gamma : \gamma \in \Gamma)$. The waveforms $\phi_\gamma$ are discrete-time signals of length $n$ called atoms. Depending on the dictionary, the parameter $\gamma$ can have the interpretation of indexing frequency, in which case the dictionary is a frequency or Fourier dictionary, of indexing time-scale jointly, in which case the dictionary is a time-scale dictionary, or of indexing time-frequency jointly, in which case the dictionary is a time-frequency dictionary. Usually dictionaries are complete or overcomplete, in which case they contain exactly $n$ atoms or more than $n$ atoms, but one could also have continuum dictionaries containing an infinity of atoms and undercomplete dictionaries for special purposes, containing fewer than $n$ atoms. Dozens of interesting dictionaries have been proposed over the last few years; we focus in this paper on a half dozen or so; much of what we do applies in other cases as well.

2.1.1. Trivial Dictionaries. We begin with some overly simple examples. The Dirac dictionary is simply the collection of waveforms that are zero except in one point: $\gamma \in \{0, 1, \dots, n-1\}$ and $\phi_\gamma(t) = 1_{\{t = \gamma\}}$. This is of course also an orthogonal basis of $\mathbb{R}^n$, the standard basis. The heaviside dictionary is the collection of waveforms that jump at one particular point: $\gamma \in \{0, 1, \dots, n-1\}$; $\phi_\gamma(t) = 1_{\{t \ge \gamma\}}$. Atoms in this dictionary are not orthogonal, but every signal has a representation

$$ s = s_0 \phi_0 + \sum_{\gamma=1}^{n-1} (s_\gamma - s_{\gamma-1}) \phi_\gamma. \qquad (2.1) $$

2.1.2. Frequency Dictionaries. A Fourier dictionary is a collection of sinusoidal waveforms $\phi_\gamma$ indexed by $\gamma = (\omega, \nu)$, where $\omega \in [0, 2\pi)$ is an angular frequency variable and $\nu \in \{0, 1\}$ indicates phase type: sine or cosine. In detail,

$$ \phi_{(\omega,0)} = \cos(\omega t), \qquad \phi_{(\omega,1)} = \sin(\omega t). $$

For the standard Fourier dictionary, we let $\gamma$ run through the set of all cosines with Fourier frequencies $\omega_k = 2\pi k/n$, $k = 0, \dots, n/2$, and all sines with Fourier frequencies $\omega_k$, $k = 1, \dots, n/2 - 1$. This dictionary consists of $n$ waveforms; it is in fact a basis, and a very simple one: the atoms are all mutually orthogonal. An overcomplete Fourier dictionary is obtained by sampling the frequencies more finely. Let $\ell$ be a whole number $> 1$ and let $\Gamma_\ell$ be the collection of all cosines with $\omega_k = 2\pi k/(\ell n)$, $k = 0, \dots, \ell n/2$, and all sines with frequencies $\omega_k$, $k = 1, \dots, \ell n/2 - 1$. This is an $\ell$-fold overcomplete system. We also use complete and overcomplete dictionaries based on discrete cosine transforms and sine transforms.

2.1.3. Time-Scale Dictionaries. There are several types of wavelet dictionaries; to fix ideas, we consider the Haar dictionary with "father wavelet" $\varphi = 1_{[0,1]}$ and "mother wavelet" $\psi = 1_{(1/2,1]} - 1_{[0,1/2]}$. The dictionary is a collection of translations and dilations of the basic mother wavelet, together with translations of a father wavelet. It is indexed by $\gamma = (a, b, \nu)$, where $a \in (0, \infty)$ is a scale variable, $b \in [0, n]$ indicates location, and $\nu \in \{0, 1\}$ indicates gender. In detail,

$$ \phi_{(a,b,1)} = \psi(a(t-b)) \cdot \sqrt{a}, \qquad \phi_{(a,b,0)} = \varphi(a(t-b)) \cdot \sqrt{a}. $$

For the standard Haar dictionary, we let $\gamma$ run through the discrete collection of mother wavelets with dyadic scales $a_j = 2^j/n$, $j = j_0, \dots, \log_2(n) - 1$, and locations that are integer multiples of the scale $b_{j,k} = k \cdot a_j$, $k = 0, \dots, 2^j - 1$, and the collection of father wavelets at the coarse scale $j_0$. This dictionary consists of $n$ waveforms; it is an orthonormal basis. An overcomplete wavelet dictionary is obtained by sampling the locations more finely: one location per sample point. This gives the so-called stationary Haar dictionary, consisting of $O(n \log_2(n))$ waveforms. It is called stationary since the whole dictionary is invariant under circulant shift.

A variety of other wavelet bases are possible. The most important variations are smooth wavelet bases, using splines or using wavelets defined recursively from two-scale filtering relations [10]. Although the rules of construction are more complicated (boundary conditions [33], orthogonality versus biorthogonality [10], etc.), these have the same indexing structure as the standard Haar dictionary. In this paper, we use symmlet-8 smooth wavelets, i.e., Daubechies nearly symmetric wavelets with eight vanishing moments; see [10] for examples.
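The claim in section 2.1.3 that the standard Haar dictionary has exactly $n$ atoms and forms an orthonormal basis can be checked numerically. The sketch below is an illustration under stated assumptions: it uses sampled, unit-norm Haar atoms, takes the coarsest possible scale ($j_0 = 0$, a single father wavelet), and the function names `haar_basis` and `gram_error` are hypothetical helpers rather than part of any library.

```python
import math


def haar_basis(n):
    """Return the n sampled Haar atoms (as lists of length n); n must be a power of two."""
    # Father wavelet at the coarsest scale: a constant, normalized to unit norm.
    atoms = [[1.0 / math.sqrt(n)] * n]
    width = n
    while width >= 2:
        half = width // 2
        amp = 1.0 / math.sqrt(width)  # unit-norm amplitude for support `width`
        for start in range(0, n, width):
            atom = [0.0] * n
            # Mother wavelet: negative on the first half, positive on the second,
            # matching psi = 1_(1/2,1] - 1_[0,1/2].
            for t in range(start, start + half):
                atom[t] = -amp
            for t in range(start + half, start + width):
                atom[t] = amp
            atoms.append(atom)
        width = half
    return atoms


def gram_error(atoms):
    """Largest deviation of the Gram matrix of `atoms` from the identity."""
    worst = 0.0
    for i, a in enumerate(atoms):
        for j, b in enumerate(atoms):
            inner = sum(x * y for x, y in zip(a, b))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(inner - target))
    return worst


basis = haar_basis(8)
print(len(basis), gram_error(basis))
```

Building the atoms explicitly costs $O(n^2)$ storage and is only meant for small checks; the fast implicit algorithms discussed later in the paper avoid forming the matrix at all.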
[Figure 2.1: Time-frequency phase plot of a wavelet packet atom. Panel (c): time domain.]

2.1.4. Time-Frequency Dictionaries. Much recent activity in the wavelet communities has focused on the study of time-frequency phenomena. The standard example, the Gabor dictionary, is due to Gabor [19]; in our notation, we take $\gamma = (\omega, \tau, \theta, \delta t)$, where $\omega \in [0, \pi)$ is a frequency, $\tau$ is a location, $\theta$ is a phase, and $\delta t$ is the duration, and we consider atoms

$$ \phi_\gamma(t) = \exp\{-(t-\tau)^2/(\delta t)^2\} \cdot \cos(\omega(t-\tau) + \theta). $$

Such atoms indeed consist of frequencies near $\omega$ and essentially vanish far away from $\tau$. For fixed $\delta t$, discrete dictionaries can be built from time-frequency lattices, $\omega_k = k\,\Delta\omega$ and $\tau_\ell = \ell\,\Delta\tau$, and $\theta \in \{0, \pi/2\}$; with $\Delta\tau$ and $\Delta\omega$ chosen sufficiently fine these are complete. For further discussions see, e.g., [9].

Recently, Coifman and Meyer [6] developed the wavelet packet and cosine packet dictionaries especially to meet the computational demands of discrete-time signal processing. For one-dimensional discrete-time signals of length $n$, these dictionaries each contain about $n \log_2(n)$ waveforms. A wavelet packet dictionary includes, as special cases, a standard orthogonal wavelets dictionary, the Dirac dictionary, and a collection of oscillating waveforms spanning a range of frequencies and durations. A cosine packet dictionary contains, as special cases, the standard orthogonal Fourier dictionary and a variety of Gabor-like elements: sinusoids of various frequencies weighted by windows of various widths and locations.

In this paper, we often use wavelet packet and cosine packet dictionaries as examples of overcomplete systems, and we give a number of examples decomposing signals into these time-frequency dictionaries. A simple block diagram helps us visualize the atoms appearing in the decomposition. This diagram, adapted from Coifman and Wickerhauser [7], associates with each cosine packet or wavelet packet a rectangle in the time-frequency phase plane. The association is illustrated in Figure 2.1 for a certain wavelet packet. When a signal is a superposition of several such waveforms, we indicate which waveforms appear in the superposition by shading the corresponding rectangles in the time-frequency plane.

2.1.5. Further Dictionaries. We can always merge dictionaries to create megadictionaries; examples used below include mergers of wavelets with heavisides.

2.2. Linear Algebra. Suppose we have a discrete dictionary of $p$ waveforms and we collect all these waveforms as columns of an $n$-by-$p$ matrix $\Phi$, say. The decomposition problem (1.1) can be written

$$ \Phi\alpha = s, \qquad (2.2) $$

where $\alpha = (\alpha_\gamma)$ is the vector of coefficients in (1.1). When the dictionary furnishes a basis, then $\Phi$ is an $n$-by-$n$ nonsingular matrix and we have the unique representation $\alpha = \Phi^{-1}s$. When the atoms are, in addition, mutually orthonormal, then $\Phi^{-1} = \Phi^T$ and the decomposition formula is very simple.

2.2.1. Analysis versus Synthesis. Given a dictionary of waveforms, one can distinguish analysis from synthesis. Synthesis is the operation of building up a signal by superposing atoms; it involves a matrix that is $n$-by-$p$: $s = \Phi\alpha$. Analysis involves the operation of associating with each signal a vector of coefficients attached to atoms; it involves a matrix that is $p$-by-$n$: $\tilde\alpha = \Phi^T s$. Synthesis and analysis are very different linear operations, and we must take care to distinguish them. One should avoid assuming that the analysis operator $\tilde\alpha = \Phi^T s$ gives us coefficients that can be used as is to synthesize $s$. In the overcomplete case we are interested in, $p \gg n$ and $\Phi$ is not invertible. There are then many solutions to (2.2), and a given approach selects a particular solution. One does not uniquely and automatically solve the synthesis problem by applying a simple, linear analysis operator.

We now illustrate the difference between synthesis ($s = \Phi\alpha$) and analysis ($\tilde\alpha = \Phi^T s$). Figure 2.2a shows the signal Carbon. Figure 2.2b shows the time-frequency structure of a sparse synthesis of Carbon, a vector $\alpha$ yielding $s = \Phi\alpha$, using a wavelet packet dictionary. To visualize the decomposition, we present a phase-plane display with shaded rectangles, as described above. Figure 2.2c gives an analysis of Carbon, with the coefficients $\tilde\alpha = \Phi^T s$, again displayed in a phase plane. Once again, between analysis and synthesis there is a large difference in sparsity. In Figure 2.2d we compare the sorted coefficients of the overcomplete representation (synthesis) with the analysis coefficients.

[Figure 2.2: Analysis versus synthesis of the signal Carbon. Panel (d): sorted coefficients; synthesis solid, analysis dashed.]

2.2.2. Computational Complexity of $\Phi$ and $\Phi^T$. Different dictionaries can impose drastically different computational burdens. In this paper we report computational experiments on a variety of signals and dictionaries. We study primarily one-dimensional signals of length $n$, where $n$ is several thousand. Signals of this length occur naturally in the study of short segments of speech (a quarter-second to a half-second) and in the output of various scientific instruments (e.g., FT-NMR spectrometers). In our experiments, we study dictionaries overcomplete by substantial factors, say, 10. Hence the typical matrix $\Phi$ we are interested in is of size "thousands" by "tens-of-thousands."

The nominal cost of storing and applying an arbitrary $n$-by-$p$ matrix to a $p$-vector is a constant times $np$. Hence with an arbitrary dictionary of the sizes we are interested in, simply to verify whether (1.1) holds for given vectors $\alpha$ and $s$ would require tens of millions of multiplications and tens of millions of words of memory. In contrast, most signal processing algorithms for signals of length 1000 require only thousands of memory words and a few thousand multiplications.

Fortunately, certain dictionaries have fast implicit algorithms. By this we mean that $\Phi\alpha$ and $\Phi^T s$ can be computed, for arbitrary vectors $\alpha$ and $s$, (a) without ever storing the matrices $\Phi$ and $\Phi^T$, and (b) using special properties of the matrices to accelerate computations.

The most well-known example is the standard Fourier dictionary for which we have the fast Fourier transform algorithm. A typical implementation requires $2 \cdot n$ storage locations and $4 \cdot n \cdot J$ multiplications if $n$ is dyadic: $n = 2^J$. Hence for very long signals we can apply $\Phi$ and $\Phi^T$ with much less storage and time than the matrices would nominally require. Simple adaptation of this idea leads to an algorithm for overcomplete Fourier dictionaries.

Wavelets give a more recent example of a dictionary with a fast implicit algorithm; if the Haar or S8-symmlet is used, both $\Phi$ and $\Phi^T$ may be applied in $O(n)$ time. For the stationary wavelet dictionary, $O(n \log(n))$ time is required. Cosine packets and wavelet packets also have fast implicit algorithms. Here both $\Phi$ and $\Phi^T$ can be applied in order $O(n \log(n))$ time and order $O(n \log(n))$ space, much better than the nominal $np = n^2 \log_2(n)$ one would expect from naive use of the matrix definition.

For the viewpoint of this paper, it only makes sense to consider dictionaries with fast implicit algorithms. Among dictionaries we have not discussed, such algorithms may or may not exist.

2.3. Existing Decomposition Methods. There are several currently popular approaches to obtaining solutions to (2.2).

2.3.1. Frames. The MOF [9] picks out, among all solutions of (2.2), one whose coefficients have minimum $\ell^2$ norm:

$$ \min \|\alpha\|_2 \quad \text{subject to} \quad \Phi\alpha = s. \qquad (2.3) $$

The solution of this
problem is unique; label it $\alpha^\dagger$. Geometrically, the collection of all solutions to (2.2) is an affine subspace in $\mathbb{R}^p$; MOF selects the element of this subspace closest to the origin. It is sometimes called a minimum-length solution. There is a matrix $\Phi^\dagger$, the generalized inverse of $\Phi$, that calculates the minimum-length solution to a system of linear equations:

$$ \alpha^\dagger = \Phi^\dagger s = \Phi^T (\Phi\Phi^T)^{-1} s. \qquad (2.4) $$

For so-called tight frame dictionaries MOF is available in closed form. A nice example is the standard wavelet packet dictionary. One can compute that for all vectors $v$, $\|\Phi^T v\|_2^2 = L_n \cdot \|v\|_2^2$, $L_n = \log_2(n)$. In short $\Phi^\dagger = L_n^{-1}\Phi^T$. Notice that $\Phi^T$ is simply the analysis operator.

[Figure 2.3: MOF representation is not sparse.]

There are two key problems with the MOF. First, MOF is not sparsity preserving. If the underlying object has a very sparse representation in terms of the dictionary, then the coefficients found by MOF are likely to be very much less sparse. Each atom in the dictionary that has nonzero inner product with the signal is, at least potentially and also usually, a member of the solution.

Figure 2.3a shows the signal Hydrogen made of a single atom in a wavelet packet dictionary. The result of a frame decomposition in that dictionary is depicted in a phase-plane portrait; see Figure 2.3c. While the underlying signal can be synthesized from a single atom, the frame decomposition involves many atoms, and the phase-plane portrait exaggerates greatly the intrinsic complexity of the object.

Second, MOF is intrinsically resolution limited. No object can be reconstructed with features sharper than those allowed by the underlying operator $\Phi^\dagger\Phi$. Suppose the underlying object is sharply localized: $\alpha = 1_{\{\gamma = \gamma_0\}}$. The reconstruction will not be $\alpha$, but instead $\Phi^\dagger\Phi\alpha$, which, in the overcomplete case, will be spatially spread out. Figure 2.4 presents a signal TwinSine consisting of the superposition of two sinusoids that are separated by less than the so-called Rayleigh distance $2\pi/n$. We analyze these in a fourfold overcomplete discrete cosine dictionary. In this case, reconstruction by MOF (Figure 2.4b) is simply convolution with the Dirichlet kernel. The result is the synthesis from coefficients with a broad oscillatory appearance, consisting not of two but of many frequencies and giving no visual clue that the object may be synthesized from two frequencies alone.

[Figure 2.4: Analyzing TwinSine with a fourfold overcomplete discrete cosine dictionary.]

2.3.2. Matching Pursuit. Mallat and Zhang [29] discussed a general method for approximate decomposition (1.2) that addresses the sparsity issue directly. Starting from an initial approximation $s^{(0)} = 0$ and residual $R^{(0)} = s$, it builds up a sequence of sparse approximations stepwise. At stage $k$, it identifies the dictionary atom that best correlates with the residual and then adds to the current approximation a scalar multiple of that atom, so that $s^{(k)} = s^{(k-1)} + \alpha_k \phi_{\gamma_k}$, where $\alpha_k = \langle R^{(k-1)}, \phi_{\gamma_k} \rangle$ and $R^{(k)} = s - s^{(k)}$. After $m$ steps, one has a representation of the form (1.2), with residual $R = R^{(m)}$. Similar algorithms were proposed by Qian and Chen [39] for Gabor dictionaries and by Villemoes [48] for Walsh dictionaries. For an earlier instance of a related algorithm, see [5].

An intrinsic feature of the algorithm is that when stopped after a few steps, it yields an approximation using only a few atoms. When the dictionary is orthogonal, the method works perfectly. If the object is made up of only $m \ll n$ atoms and the algorithm is run for $m$ steps, it recovers the underlying sparse structure exactly.

When the dictionary is not orthogonal, the situation is less clear. Because the algorithm is myopic, one expects that, in certain cases, it might choose wrongly in the first few iterations and end up spending most of its time correcting for any mistakes made in the first few terms. In fact this does seem to happen.

To see this, we consider an attempt at superresolution. Figure 2.4a portrays again the signal TwinSine consisting of sinusoids at two closely spaced frequencies. When MP is applied in this case (Figure 2.4c), using the fourfold overcomplete discrete cosine dictionary, the initial frequency selected is in between the two frequencies making up the signal. Because of this mistake, MP is forced to make a series of alternating corrections that suggest a highly complex and organized structure. MP misses entirely the doublet structure. One can certainly say in this case that MP has failed to superresolve.

[Figure 2.5: Counterexamples for MP.]

Second, one can give examples of dictionaries and signals where MP is arbitrarily suboptimal in terms of sparsity. While these are somewhat artificial, they have a character not so different from the superresolution example.

DeVore and Temlyakov's Example. Vladimir Temlyakov, in a talk at the IEEE Conference on Information Theory and Statistics in October 1994, described an example in which the straightforward greedy algorithm is not sparsity preserving. In our adaptation of this example, based on Temlyakov's joint work with DeVore [12], one constructs a dictionary having $n + 1$ atoms. The first $n$ are the Dirac basis; the final atom involves a linear combination of the first $n$ with decaying weights. The signal $s$ has an exact decomposition in terms of $A$ atoms, but the greedy algorithm goes on forever, with an error of size $O(1/\sqrt{m})$ after $m$ steps. We illustrate this decay in Figure 2.5a. For this example we set $A = 10$ and choose the signal $s_t = 10^{-1/2} \cdot 1_{\{1 \le t \le 10\}}$. The dictionary consists of Dirac elements $\phi_\gamma = \delta_\gamma$ for $1 \le \gamma \le n$ and

$$ \phi_{n+1}(t) = \begin{cases} c, & 1 \le t \le 10, \\ c/(t-10), & 10 < t \le n, \end{cases} $$

with $c$ chosen to normalize $\phi_{n+1}$ to unit norm.

Shaobing Chen's Example. The DeVore–Temlyakov example applies to the original MP algorithm as announced by Mallat and Zhang in 1992. A later refinement of the algorithm (see Pati, Rezaiifar, and Krishnaprasad [38] and Davis, Mallat, and Zhang [11]) involves an extra step of orthogonalization. One takes all $m$ terms that have entered at stage $m$ and solves the least-squares problem

$$ \min_{(\alpha_i)} \Big\| s - \sum_{i=1}^{m} \alpha_i \phi_{\gamma_i} \Big\|_2 $$
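The matching pursuit recursion of section 2.3.2 and the DeVore–Temlyakov dictionary described above can be combined in a small experiment. This is a minimal sketch, not the authors' code: it implements plain MP (no orthogonalization step), assumes unit-norm atoms, uses $n = 256$ as an arbitrary illustrative size, and the names `matching_pursuit` and `devore_temlyakov` are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matching_pursuit(atoms, s, steps):
    """Plain MP: at each step pick the atom most correlated with the residual.

    Returns the list of selected atom indices and the final residual norm.
    Atoms are assumed to have unit norm.
    """
    residual = list(s)
    picks = []
    for _ in range(steps):
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        # R^(k) = R^(k-1) - <R^(k-1), phi_k> phi_k
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
        picks.append(k)
    return picks, math.sqrt(dot(residual, residual))


def devore_temlyakov(n=256, A=10):
    """Dirac basis plus one extra atom with decaying weights; signal uses A Dirac atoms."""
    atoms = [[1.0 if t == g else 0.0 for t in range(n)] for g in range(n)]
    # Extra atom: constant on the first A samples, then c/(t - A) in 1-based time t.
    raw = [1.0] * A + [1.0 / (t - A) for t in range(A + 1, n + 1)]
    norm = math.sqrt(dot(raw, raw))
    atoms.append([x / norm for x in raw])
    s = [A ** -0.5] * A + [0.0] * (n - A)
    return atoms, s


atoms, s = devore_temlyakov()
picks, err = matching_pursuit(atoms, s, steps=10)
print("first atom chosen:", picks[0], "(index of the extra atom is", len(atoms) - 1, ")")
print("residual norm after 10 steps:", err)
```

The signal is an exact combination of ten Dirac atoms, yet the first greedy choice is the extra atom, so ten MP steps cannot reach zero residual. This mirrors the qualitative behavior described in the text without attempting to reproduce Figure 2.5.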

TIMS: Thermal Ionization Mass Spectrometry

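Thermal ionization mass spectrometry (TIMS) is a high-precision technique for isotope analysis.

The sample to be measured, typically after separation and purification, is coated onto the surface of a high-melting-point metal filament (ribbon) such as Re, Ta, or Pt, and is then heated to high temperature to produce thermal ionization.

Once produced, the ions pass one or more thin metal strips under vacuum and are then accelerated, still under vacuum, into a sector magnetic field, where they are separated according to their mass-to-charge ratio.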
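The separation by mass-to-charge ratio in a magnetic sector can be illustrated with the textbook relation $r = \sqrt{2mV/q}\,/\,B$ for an ion of mass $m$ and charge $q$ accelerated through a voltage $V$ into a field $B$. The sketch below is an illustration only: the relation is standard physics rather than something stated in the text, and the 10 kV voltage, 0.5 T field, and nominal masses 86 and 87 are hypothetical values, not instrument settings from the source.

```python
import math

AMU_KG = 1.66053906660e-27    # atomic mass unit in kilograms
E_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def sector_radius(mass_amu, charge_state, accel_voltage_v, field_t):
    """Radius (m) of an ion's circular path in a uniform magnetic field.

    The ion gains kinetic energy q*V, so v = sqrt(2*q*V/m) and
    r = m*v/(q*B) = sqrt(2*m*V/q) / B.
    """
    mass_kg = mass_amu * AMU_KG
    charge_c = charge_state * E_CHARGE_C
    return math.sqrt(2.0 * mass_kg * accel_voltage_v / charge_c) / field_t


# Hypothetical settings: singly charged ions of nominal mass 86 and 87.
r86 = sector_radius(86.0, 1, 10_000.0, 0.5)
r87 = sector_radius(87.0, 1, 10_000.0, 0.5)
print(f"r(86) = {r86:.4f} m, r(87) = {r87:.4f} m")
```

Because the radius scales with the square root of the mass-to-charge ratio, neighboring isotopes follow slightly different paths, which is what allows them to be collected separately.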

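Finally, the ions are introduced into the mass spectrometer for detection and measurement.

TIMS can analyze most elements from Li to U. It is used mainly to measure rare isotopes, for example radioactive isotopes in the earth sciences, including the uranium-series and thorium-series isotopes.

In addition, the technique is widely used for isotope analysis of elements such as B, Cl, Br, Ba, Sr, Nd, and Pb. Together with an associated ultra-clean chemistry laboratory, it is used mainly for Sr, Nd, and Pb isotope analysis of geological samples of all kinds.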

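Mass Spectrometry

III. Fragment Ions
Molecular ions often carry excess energy and readily fragment to give fragment ions. These may fragment again to give ions of lower mass, and rearrangements can accompany fragmentation, so the mass spectrum of a compound usually shows many fragment-ion peaks. Knowing the fragmentation rules of ions helps in deducing the structure of a compound from its mass spectrum.
Matrix-assisted laser desorption/ionization (MALDI)
Electrospray ionization (ESI); atmospheric pressure chemical ionization (APCI)
Electron impact ionization (EI)
Electron impact ionization, also called electron bombardment or electron ionization, is the most widely used and most mature ionization method.
The bombarding electron energy is 50-70 eV, whereas the ionization potential of organic molecules is typically 7-15 eV.
The molecular ions (M+) of different compound classes can be ranked by stability as follows: aromatic rings (including heteroaromatic rings) > alicyclic compounds > thioethers and thioketones > conjugated alkenes > straight-chain alkanes > amides > ketones > aldehydes > amines > esters > ethers > carboxylic acids > branched hydrocarbons > nitriles > primary alcohols > tertiary alcohols > acetals.
Aromatic rings, conjugated alkenes, and sulfur compounds are stabilized by charge delocalization. An alicyclic ring must break at least two bonds before it can split into fragments, which makes its molecular ion peak strong.
In straight-chain ketones, esters, aldehydes, amides, ethers, and halides, the lone electron pairs on the heteroatom (for example oxygen or nitrogen) can stabilize the positive charge center of a fragment ion. The molecular ion therefore fragments easily, and the molecular ion peak is weak.
a. If only peaks corresponding to the loss of small groups are observed and M+ itself does not appear, M+ can be inferred from the chemically reasonable peaks.
b. If a peak is observed 3 to 14 atomic mass units below the peak of highest m/z, the peak of highest m/z is probably not the molecular ion peak. Losing many hydrogen atoms (or molecules) requires a great deal of energy and is unlikely. Methylene is a high-energy neutral fragment, so loss of CH2 from the molecular ion is also unlikely. Whenever a peak appears 14 atomic mass units below the highest peak, one should suspect that the sample contains a homologue with one CH2 fewer than the analyte. If a peak appears 3 atomic mass units below the peak of highest m/z, the molecular ion peak may lie 15 atomic mass units above the highest observed peak, that is, 18 atomic mass units above the second-highest peak. These two peaks then correspond to loss of a methyl group from the molecular ion (giving the highest observed peak) and loss of water (giving the second-highest peak).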
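
Rule b above amounts to a simple arithmetic check on the peak list. The sketch below is only an illustration under stated assumptions: the function name `suspicious_losses`, the use of nominal integer masses, and the example peak lists are hypothetical and not from the source.

```python
def suspicious_losses(peaks_mz, low=3, high=14):
    """List peaks lying 3 to 14 u below the highest m/z peak.

    A non-empty result suggests that the highest peak is probably not
    the molecular ion peak (rule b).
    """
    top = max(peaks_mz)
    return [
        (mz, top - mz)
        for mz in sorted(peaks_mz, reverse=True)
        if low <= top - mz <= high
    ]

# Highest peak at m/z 100 with a neighbor 4 u lower: flagged.
flagged = suspicious_losses([100, 96, 85, 72])

# Losses of 15 u (methyl) and 18 u (water) are ordinary: nothing flagged.
plausible = suspicious_losses([100, 85, 82])
```

A pair of top peaks 3 u apart, such as [100, 97], is flagged. Following the text, the true molecular ion may then lie 15 u above the higher peak, at m/z 115 in this hypothetical case.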

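Wavelet Analysis: Translated Foreign-Language Literature (Chinese and English)

(The document contains the English original and the Chinese translation.) Translation:

1. Significance and background of wavelet research

In practical applications, finding the best processing method to reduce noise, given signals and interference of different kinds, has long been a widely discussed problem in signal processing.

Many methods are currently available for signal denoising, such as median filtering, low-pass filtering, and Fourier-transform filtering, but all of them also remove useful parts of the signal detail.

Traditional denoising methods assume that the signal is stationary and give statistical averages in the time domain or in the frequency domain only.

They remove noise according to the time-domain or frequency-domain characteristics of the useful signal, but cannot account for the local and global behavior of the signal in both domains at once.

Practice has shown that classical filtering based on the Fourier transform cannot analyze and process nonstationary signals effectively, and its denoising performance no longer meets the needs of engineering applications.

The commonly used hard and soft thresholding rules remove noise by setting to zero the high-frequency wavelet coefficients whose magnitude falls below a threshold. Soft thresholding additionally shrinks the remaining coefficients toward zero by the threshold value.

Practice has shown that these wavelet thresholding methods are near-optimal and perform well on nonstationary signals.

Wavelet theory grew out of the Fourier transform and the short-time Fourier transform. It offers multiresolution analysis and can characterize local features of a signal in both the time and frequency domains, which makes it an excellent tool for time-frequency analysis.

The wavelet transform is multiresolution, localized in time and frequency, and fast to compute, so it is widely applied in geophysics.

As the field developed, wavelet packet analysis emerged as an extension of wavelet analysis with a wide range of applications.

It provides a finer analysis of the signal. It divides the frequency band on multiple levels and further decomposes the high-frequency part that the discrete wavelet transform leaves unsplit. It can also select frequency bands adaptively, according to the characteristics of the signal under analysis, so that they match the signal spectrum, which improves the time-frequency resolution.

An intuitive and effective way to denoise a signal with wavelet packets is to threshold the wavelet packet decomposition coefficients directly, choose suitable filtering factors, and reconstruct the signal from the retained coefficients.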
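
The thresholding step described above can be sketched in a few lines. This is a deliberately simplified illustration, not a wavelet packet implementation: it uses a single-level Haar transform written by hand, and every function name and the threshold value are assumptions chosen for the example.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def hard_threshold(coeffs, thr):
    """Keep coefficients with magnitude above thr, zero the rest."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]

def soft_threshold(coeffs, thr):
    """Zero small coefficients and shrink the others toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def haar_forward(signal):
    """One Haar level on an even-length signal: (approximation, detail)."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * SQRT_HALF for a, b in pairs]
    detail = [(a - b) * SQRT_HALF for a, b in pairs]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out

def denoise(signal, thr, rule=soft_threshold):
    """Threshold only the detail (high-frequency) coefficients, then rebuild."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, rule(detail, thr))

# Small fluctuations around two plateaus are smoothed away.
cleaned = denoise([1.0, 1.1, 5.0, 4.9], thr=0.2)
```

A wavelet packet version would apply the same thresholding rule to every node of the packet tree, including the repeatedly split high-frequency branches, before reconstruction.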

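IMS: Ion Mobility Spectrometry

Ion mobility spectrometry (IMS) is an analytical technique for detecting and identifying compounds in the gas phase.

It separates gas-phase ions by their mobility rather than by mass alone, and it is often coupled with gas chromatography or mass spectrometry.

IMS relies on differences in the speed at which ions move through an electric field. The measured drift time depends on the charge, mass, and shape of the ion, so it can be used to identify and quantify the compounds in a sample.

Research and development of IMS began in the 1970s.

Today IMS is widely used in chemical and biological analysis.

Its applications include explosives detection, drug identification, detection of chemical and biological weapons, and environmental monitoring.

Because IMS is fast, sensitive, and selective, it is widely used in security, health care, and food safety.

An IMS instrument consists of three main modules: the ionization source, the drift tube, and the detector.

First, the sample enters the ionization chamber, where one of several ionization techniques converts the gas-phase compounds into ions.

Common ionization techniques include chemical ionization, ultraviolet photoionization, and discharge ionization.

The ions then enter the drift tube, across which an electric field is applied. Ions with different charge, mass, and shape drift through the field at different velocities.

Finally, the ions reach the detector. The arrival time identifies the ion species, and the signal intensity indicates its concentration.

The working principle of IMS rests on these differences in drift velocity.

By Newton's second law, the acceleration of a body is proportional to the force acting on it.

An ion in an electric field experiences a force proportional to the field strength and is accelerated accordingly.

In the drift tube, however, the ion collides repeatedly with neutral drift gas molecules, so it quickly settles to a constant average drift velocity $v_d = K E$, where $E$ is the field strength and $K$ is the mobility of the ion. Different ions have different mobilities and therefore different drift velocities.

The time an ion needs to reach the detector is inversely proportional to its drift velocity: for a drift tube of length $L$, $t_d = L / (K E)$.

By measuring the arrival time, the ion species can be inferred, and the peak intensity gives its concentration.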
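
The relation between drift velocity and arrival time can be made concrete with a short calculation. This is a minimal sketch assuming a uniform field E = V / L and a constant mobility K. The function name and the numerical values (10 cm tube, 2000 V, mobilities of 1.5 and 2.0 cm²/(V·s)) are illustrative assumptions, not values from the source.

```python
def drift_time_s(length_cm, voltage_v, mobility_cm2_per_vs):
    """Drift time t = L / (K * E) for a uniform field E = V / L."""
    field_v_per_cm = voltage_v / length_cm
    drift_velocity_cm_per_s = mobility_cm2_per_vs * field_v_per_cm
    return length_cm / drift_velocity_cm_per_s

# Two hypothetical ions in the same tube: the more mobile one arrives first.
fast_ion = drift_time_s(10.0, 2000.0, 2.0)
slow_ion = drift_time_s(10.0, 2000.0, 1.5)
```

With these assumed numbers the drift times are in the tens of milliseconds, and the difference between the two arrival times is what separates the ions at the detector.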

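IMS has several advantages.

First, it is highly sensitive.

Its sensitivity is high compared with many other gas-phase analytical techniques.

Second, it is fast.

An analysis typically takes from a few seconds to a few minutes, which makes IMS suitable for rapid screening.

In addition, IMS offers good resolution and low detection limits.

IMS has a very wide range of applications.

In explosives detection, it detects the vapors released by explosives.

In drug identification, it detects the gas-phase components of drugs.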


The Isotope Wavelet: A Signal Theoretic Framework for Analyzing Mass Spectrometry Data

Rene Hussong 1, Andreas Hildebrandt 2

Keywords: Proteomics, Mass Spectrometry, Wavelets, Peak Picking

1 Introduction. Computational proteomics is one of today's foremost challenges in bioinformatics with applications in fields as diverse as toxicology, diagnostics, or target identification and validation. The current de-facto standard for high-throughput proteomics studies, HPLC-MS, is capable of generating huge amounts of data for a single analysis: the LC-MS maps, which can be thought of as a collection of one-dimensional mass spectrometric scans for a possibly large number of retention times. Here, the signal of interest (the amount of certain proteins or peptides contained in the sample) is hidden in a complex mixture of baseline terms and chemical as well as instrument noise effects. Thus, most biologically or medically relevant applications require a certain, often manual, preprocessing of the raw instrument data, during which the mass spectrometric peaks in the sample are detected and those of potential interest are retained. We present a novel wavelet-based approach to detect regions of interest in mass spectrometric scans. To this end, we designed a tailored isotope wavelet that has been successfully tested in a myoglobin quantification study [3]. Here, we give more detailed information about the design and characteristics of the isotope wavelet.

Previous work. Recently, considerable research has been devoted to the problem of automatically detecting and analyzing peaks contained in MS data sets in an efficient manner. While most suggested techniques follow a model-fitting approach, wavelet-based methods are of particular interest since they are, by design, robust against noise and baseline terms and therefore do not require potentially destructive preprocessing operations to remove these artifacts from the "real" signal. Moreover, wavelets are well-suited to separate even strongly overlapping peaks [2], an artifact that often further complicates the analysis of MS data. To the best of our knowledge, all wavelets proposed so far for processing mass spectrometric scans only use very localized information about the sample: they try to detect single peaks, independent of their surroundings. On the other hand, the neighborhood of a mass spectrometric peak contains information very valuable to an automated analysis scheme: in reality, the detected molecular fragments always occur in different isotope mixtures, and hence, with one particular mass of a certain fragment, we will always find a number of neighboring peaks in a well-defined distance, following a binomial intensity profile.

Isotope pattern recognition. Using the specific shape of an isotope pattern, we can reliably exclude a large number of spurious noise peaks from further analysis if their neighborhood does not fit into the expected isotopic distribution. If we indeed find the expected profile, we can directly infer the charge of the current fragment, and will be able to estimate its mass with high accuracy, since errors in the computation of a peak's centroid can be averaged out over the peaks in its isotopic pattern. While it is possible, and common practice, to first detect independent peaks in the spectrum and afterwards try to fit them into isotopic patterns, we believe that an approach that does not detect isolated peaks, but rather full isotopic patterns directly at the signal processing stage would be more reasonable. By incorporating isotope information into the peak picking stage, we can exclude a large number of noise peaks at the first possible stage, improving the performance of later applications in speed and quality of fit. Since nearly every analysis pipeline relies on this basic, but nevertheless important seeding step, there is a need for precise and accurate algorithms that also cope with the immense amount of data.

1 Center for Bioinformatics, Saarland University. E-mail: rene@bioinf.uni-sb.de

2 Center for Bioinformatics, Saarland University. E-mail: anhi@bioinf.uni-sb.de
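
The charge inference and centroid averaging mentioned in the isotope pattern recognition paragraph can be illustrated with a toy calculation. This is a hedged sketch, not the authors' method: it assumes the input already contains consecutive peaks of one isotopic pattern, uses the neutron mass (approximately 1.00866 u) as the spacing for charge 1, and all function names and peak values are hypothetical.

```python
M_N = 1.00866  # approximate neutron mass in u (assumed spacing for charge 1)

def infer_charge(mz_peaks):
    """Estimate the charge from the mean spacing of consecutive isotopic peaks."""
    peaks = sorted(mz_peaks)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return max(1, round(M_N / mean_gap))

def averaged_monoisotopic_mz(mz_peaks):
    """Average the monoisotopic m/z estimates obtained from every peak.

    Peak i is shifted back by i * M_N / z, so centroid errors of the
    individual peaks partly cancel in the mean.
    """
    peaks = sorted(mz_peaks)
    z = infer_charge(peaks)
    estimates = [mz - i * M_N / z for i, mz in enumerate(peaks)]
    return sum(estimates) / len(estimates)

# Hypothetical doubly charged pattern: peaks roughly 0.5 m/z units apart.
charge = infer_charge([500.0, 500.5, 501.0])
mono_mz = averaged_monoisotopic_mz([500.0, 500.5, 501.0])
```

The isotope wavelet approach described in the text performs this kind of pattern matching directly at the signal processing stage. The sketch only shows why a complete pattern carries charge and mass information that a single peak does not.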

2 The isotope wavelet. We present a novel class of wavelet functions that is specifically designed to correlate with isotopic patterns contained in MS data. The basic building block of the isotope wavelet is given by

$$\psi(t,\lambda,\mu) := \theta(t)\,\sin\!\left(\frac{2\pi\mu t}{m_n}\right)\cdot\exp(-\lambda)\cdot\frac{\lambda^{\mu t}}{\Gamma(\mu t+1)} \qquad (1)$$

where $\theta$ denotes the Heaviside function and $m_n$ is the mass of a neutron and thus the characteristic distance between two subsequent singly charged isotopic peaks. $\mu$ represents the charge state and therefore stretches or squeezes the pattern accordingly. $\lambda = \lambda(m)$ is a low-rank polynomial describing the mean mass signal [1]. Equation (1) can be interpreted as an oscillating sine wave with frequency adapted to the isotopic pattern and an amplitude following a continuous analogue of a Poisson distribution. $\psi$ as denoted in formula (1) is not yet a wavelet, since it has non-vanishing average. To fulfill this necessary requirement we subtract the resulting mean from $\psi$. The involved power and gamma function render an efficient computation of $\psi$ difficult. Using analytical approximations it becomes possible to speed up computations and infer important properties of the isotope wavelet by the use of Fourier transforms.
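
Equation (1) can be evaluated directly, if slowly, as a reference for any faster approximation. The following is a minimal sketch under stated assumptions: the neutron mass value, the function names, the sampling grid, and the fixed value of λ are illustrative choices, and the mean is subtracted over the sampled support rather than analytically. The envelope is computed in log space with `math.lgamma` to avoid overflow in the power and gamma terms.

```python
import math

M_N = 1.00866  # approximate neutron mass in u (assumed value)

def poisson_envelope(t, lam, mu):
    """Continuous Poisson analogue exp(-lam) * lam**(mu*t) / Gamma(mu*t + 1)."""
    x = mu * t
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1.0))

def psi(t, lam, mu):
    """Building block of equation (1); zero for t < 0 (Heaviside factor)."""
    if t < 0.0:
        return 0.0
    return math.sin(2.0 * math.pi * mu * t / M_N) * poisson_envelope(t, lam, mu)

def sampled_wavelet(lam, mu, t_max, n_points):
    """Sample psi on [0, t_max] and subtract the sample mean."""
    step = t_max / (n_points - 1)
    values = [psi(i * step, lam, mu) for i in range(n_points)]
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Illustrative parameters: charge 1, lambda fixed at 2.0 instead of lambda(m).
WAVELET = sampled_wavelet(lam=2.0, mu=1, t_max=6.0, n_points=601)
```

At integer values of µt the envelope reduces to the ordinary Poisson probability, which gives a quick sanity check. In practice λ would be evaluated from the polynomial λ(m) at the mass under consideration, which this sketch does not attempt.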