Constructing Bio-molecular Databases on a DNA-based Computer


Cloning of a DELLA Protein Gene of the Cassava Gibberellin Pathway and Its Response to Drought Stress


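Liao Wenbin; Peng Ming

[Abstract] The gibberellin (GA) signal transduction pathway is regulated by DELLA repressor proteins. Using the sequence of an Arabidopsis DELLA protein gene, we cloned a cassava DELLA protein gene for the first time by in silico (electronic) cloning. The cDNA is 1 857 bp long, contains a complete protein-coding frame, and was named MeGAI. Bioinformatic analysis showed that the encoded protein has the same conserved domains as Arabidopsis DELLA proteins, including the DELLA domain, VHYNP domain, poly(S/T) domain, nuclear localization signal, VHVID domain, leucine domain, and GRAS domain. Expression analysis showed that the gene is down-regulated under drought stress. Expression analysis of the GA20-oxidase gene, a key gene in GA biosynthesis, showed that its expression pattern under drought stress correlates well with that of MeGAI, suggesting that the GA pathway may be involved in the drought-resistance mechanism of cassava.
[Journal] Journal of Tropical Biology (热带生物学报)
[Year (Vol.), Issue] 2012 (003) 004
[Pages] 7 (pp. 298–304)
[Keywords] cassava DELLA protein; gene cloning; bioinformatic analysis; drought stress response
[Affiliation] Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences / Key Laboratory of Biology and Genetic Resource Utilization of Tropical Crops, Ministry of Agriculture, Haikou, Hainan 571101 (both authors)
[Language] Chinese
[CLC number] Q344+.12

In plants, gibberellin (GA) regulates growth and development through both its biosynthetic pathway and its signal transduction pathway.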

In higher plants, the biosynthetic precursor of GA is geranylgeranyl pyrophosphate (GGPP) [1].

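The main enzymes involved in GA biosynthesis include geranylgeranyl pyrophosphate synthase (GGPS), ent-copalyl diphosphate synthase (CPS), ent-kaurene synthase (KS), ent-kaurene oxidase (KO), ent-kaurenoic acid 7β-hydroxylase, GA12-aldehyde synthase, GA 13-hydroxylase, GA 20-oxidase, GA 2-oxidase (GA2ox), and GA 3-oxidase (GA3ox).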

Application Progress of Self-assembled Ferritin in Nano-vaccine


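Current Biotechnology, 2019, Vol. 9, No. 3, pp. 240–245. ISSN 2095-2341. Reviews

Received: 2018-12-26; Accepted: 2019-02-22
Funding: National Key Research and Development Program of China (2017YFD0500706; 2016YFD0500108); National Natural Science Foundation of China (31670156).
About the authors: WEI Zhenzhen, master's student, research interest: viral microbiology. E-mail: 646122815@qq.com. *Corresponding author: YI Yongzhu, associate research fellow, research interest: viral microbiology. E-mail: Yiyongzhu@126.com

WEI Zhenzhen1, LIU Xingjian2, WANG Peng1, ZHANG Zhifang2, YI Yongzhu3*
1. College of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, China;
2. Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China;
3. Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang 212018, Jiangsu, China

Abstract: Self-assembled proteins are ubiquitous in eukaryotic and prokaryotic cells; they are essential for the normal functioning of living organisms and are even related to their evolution. Common self-assembled proteins include virus particles, serum albumin, silk protein, and ferritin. Among them, ferritin can form nanomaterials with uniform particle size and good biocompatibility. It also has unique physical and chemical properties, such as pH sensitivity, high-temperature tolerance, and resistance to most denaturants, so that its self-assembly can be controlled by adjusting the pH. Ferritin is a natural protein found in most living organisms, and it has broad application prospects in tumor diagnostic imaging and therapy, drug carriers, and nano-vaccines. This review mainly discusses the biomimetic synthesis of ferritin and its application in nano-vaccines, in order to provide a reference for the research and development of novel animal nano-vaccines.
Keywords: self-assembled protein; recombinant ferritin; nano-vaccine
DOI: 10.19586/j.2095-2341.2018.0139

Self-assembled proteins are ubiquitous in eukaryotic and prokaryotic cells. Protein subunits spontaneously assemble into highly ordered structures, which both ensures the normal functioning of the organism and drives its evolution [1]. Nanomaterials formed by self-assembled proteins not only have good biocompatibility and uniform, stable particle sizes, but also have broad application prospects in cell imaging, lesion detection, and sustained drug release. To date, the most studied self-assembled protein nanoparticles are virus particles, serum albumin, silk protein, and ferritin. The self-assembly of virus particles, which infect host cells and assemble within them, is a typical way in which biological nanomaterials form in nature; virus particles are mainly used for specific detection and for studying the mechanisms and routes by which viruses infect host cells [2,3], and after genetic modification they can also be used to develop drugs that rely on viruses to deliver genes [4]. Serum albumin is the most abundant protein in vertebrate plasma; its molecules are highly elastic and readily recover after structural changes, and the spatial structures of serum albumins from different sources are highly conserved [5], giving them potential applications in drug delivery systems [6]. Silk protein is a fibrous biopolymer that resists ultraviolet light and proteolytic enzymes; it is flexible and highly fatigue-resistant, has a tensile strength comparable to that of steel, and shows good thermal, acid, and alkali stability and biocompatibility, so it is widely used as a biomaterial [7] and as a drug carrier [8]. Ferritin, in turn, is a natural protein present in most organisms and has unique physicochemical properties:
① Ferritin is sensitive to pH. Under acidic conditions (pH 2.0) the ferritin shell disassembles into subunits, and when the pH returns to physiological conditions (pH 7.4) the subunits reassemble into intact ferritin [9,10].
② The native quaternary structure of ferritin is unaffected by many denaturants. Most proteins denature in 1–4 mol/L guanidine hydrochloride or urea, whereas ferritin depolymerizes only in 6 mol/L guanidine hydrochloride or 8 mol/L urea; that is, ferritin is highly tolerant of denaturants [11].
③ Ferritin is highly heat-tolerant. Most proteins readily denature above physiological temperature, but ferritin can withstand high temperatures (70–80 ℃) for more than 10 min without denaturing, and its quaternary structure remains intact [12].
Based on these unique physicochemical properties, this paper reviews the biomimetic synthesis of ferritin and its applications in tumor diagnostic imaging and therapy, drug carriers, and nano-vaccines. It describes the structure and modification of natural ferritin and the progress in the artificial preparation of recombinant ferritin, and analyzes the applications of recombinant ferritin in these fields, in the hope of providing a reference for developing novel vaccines that are harmless to the body and adaptable to different organisms.

1 Structure and Modification of Ferritin

In living organisms, natural ferritin consists of two parts: a hydrated iron oxide core and a protein shell. Its structure is highly symmetric: a closed cage formed by 24 subunits. The mammalian ferritin shell has a molecular weight of about 480 kDa and an outer diameter of about 12 nm, and its inner cavity, about 8 nm in diameter, can hold about 4 500 iron atoms. In mammals the ferritin shell is made of H and L subunits, but the ferroxidase center is present only in the H subunit [13]. Iron is a component of many proteins and cofactors that play important roles in the body, and ferritin, which is widespread in the body, plays a crucial role in iron metabolism by maintaining iron homeostasis and resisting oxidative stress. In addition, ferritin can capture free ferrous iron, oxidize it, and store it as a stable iron core, thereby eliminating the toxic effects of excess metal ions [14]. Naturally occurring ferritin contains an iron core of ferrihydrite (5Fe2O3·9H2O) and is also called holoferritin (i.e., ferritin); ferritin without an iron core is called apoferritin. The hollow spherical structure of ferritin has three interfaces: the inner surface, the outer surface, and the subunit–subunit interface (Fig. 1) [15]. When ferritin is modified, the inner surface can encapsulate materials within the cavity, so that ferritin serves as a nanoreactor for synthesizing nanocomposites; the outer surface can be conjugated to ligands that give ferritin special functions; and the subunit interfaces can be disassembled and reassembled by adjusting the solution pH, enabling new functions.

Fig. 1 Three interfaces of ferritin that can be used for modification [16]

2 Artificial Preparation of Recombinant Ferritin

With the rapid development of interdisciplinary research and the combination of biology with nanotechnology, techniques for the biomimetic synthesis of ferritin have gradually improved. In 1991, researchers at the University of Bath (UK) synthesized magnetic ferritin for the first time: using natural horse spleen ferritin as a template, they removed its natural ferrihydrite (5Fe2O3·9H2O) core and synthesized a magnetic iron core within the cavity of the horse spleen ferritin [17]. This work opened up a new field: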
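the biomimetic synthesis of nanoparticles. The approach also has a problem, however. Before the natural horse spleen ferritin shell can be used as a template for nanoparticle synthesis, its natural ferrihydrite core must be removed, and core removal requires treating the ferritin with strong reducing agents that can damage the protein shell. As a result, not all ferrous ions enter the cavity of the shell; some adsorb onto its outer surface and are oxidized there, causing the synthesized ferritin to aggregate [18].

The self-assembly of natural ferritin makes it possible to express recombinant ferritin in bulk in Escherichia coli. Ferritin subunits expressed in E. coli self-assemble into a 24-mer ferritin shell that, compared with natural ferritin, has the same structure, good dispersity, and uniform particle size. E. coli can therefore serve as an excellent model organism for the biomimetic synthesis of ferritin nanoparticles without compromising the integrity of the ferritin shell. In 2006, researchers at Montana State University (USA) first obtained nearly pure ferritin shells from E. coli and used them as templates for the biomimetic synthesis of magnetic ferritin [19]. This technique not only greatly simplifies the isolation and purification of ferritin shells compared with natural ferritin, but also avoids damage to the shell by strong reducing agents, preserving its integrity and making the whole synthesis efficient and fast. Notably, although ferritin structurally similar to natural ferritin can be biomimetically synthesized in E. coli, the cores of the two have different crystal forms: the core of biomimetically synthesized ferritin is Fe3O4, which is superparamagnetic, and this is why it is called magnetic ferritin. A prokaryotic ferritin expression system based on E. coli has now been established: after induction with IPTG, followed by purification and refolding, ferritin nanoparticles with the same structure as natural ferritin can be obtained, and these have broad application prospects in biomedicine [20].

Compared with other nanoparticles, biomimetically synthesized ferritin nanoparticles have the following advantages:
① Small size (about 12 nm), which favors their penetration into and accumulation in lesion tissues such as tumors [21].
② Uniform size: ferritin nanoparticles of uniform size and good dispersity can be biomimetically synthesized in E. coli.
③ Good biocompatibility: biotechnological drugs made from human recombinant ferritin nanoparticles expressed in E. coli are unlikely to cause immune rejection after administration, and their toxicity to the body is considerably reduced.
④ Easy targeted modification: ferritin nanoparticles can be genetically modified directly during synthesis, adding desired peptides to the outer shell or subunit interfaces, which turns them into nanocarriers.
In addition, the core of biomimetically synthesized magnetic ferritin nanoparticles is Fe3O4, which gives them the dual functions of superparamagnetism and peroxidase activity. The Fe3O4 core is 4–7 nm in diameter and superparamagnetic, making the particles a potential MRI contrast agent [22]. In 2007, Chinese scientists discovered that Fe3O4 magnetic nanoparticles also have peroxidase activity [23]: when a chromogenic substrate contains H2O2, Fe3O4 magnetic nanoparticles catalyze its oxidation and produce a color reaction. Studies have shown that ferritin is expressed at higher levels in diseased brain tissue and in many types of tumor cells than in normal tissues and cells [24]. Magnetic resonance imaging (MRI) is currently the noninvasive means of detecting neurodegenerative diseases and various tumors, and it can quantify the iron content of diseased tissue [25]. Biomimetically synthesized magnetic ferritin nanoparticles therefore hold great promise for lesion diagnosis and therapy (Fig. 2).

Fig. 2 Magnetic ferritin nanoparticles that can be used to target and visualize tumors. Note: A, biomimetic synthesis of magnetic ferritin [26]; B, dual functions of magnetic ferritin; C, conventional immunohistochemistry; D, new magnetic-ferritin technique for tumor detection [27].

3 Applications of Ferritin Nanoparticles
3.1 Applications of ferritin nanoparticles as drug carriers

As drug carriers, ferritin nanoparticles can serve not only as the carrier itself but also as signaling molecules. Thanks to their good biocompatibility and distinctive spherical cavity, ferritin nanoparticles are ideal carriers for small-molecule probes such as ferricyanide and fluorescein. Researchers at the University of Nottingham (UK) used core-free ferritin shells as carriers for nanomaterials and systematically evaluated how ferritin encapsulation affects the stability and biocompatibility of the nanomaterials. The probe-loaded nanoparticles retained the excellent fluorescence of quantum dots while showing reduced toxicity owing to the ferritin encapsulation; after further modification of the ferritin shell, the quantum dot-loaded ferritin nanoparticles could also recognize target cells, with the targeting process made visible [28], which provides important technical support for later clinical diagnosis and treatment of lesion tissues. Ferritin can also act as a signaling molecule: in biosensors, its nanomaterial properties can be used for dual amplification of the electrical signal in an electrochemical immunoassay. For example, an AuNPs-Ab2-ferritin complex was prepared with gold nanoparticles, and a glassy carbon electrode was modified with rGO-AuNPs; two immunoreactions then formed a special sandwich immunostructure, AuNPs-Ab2-ferritin/Ag/Ab1/rGO-Au-chi/GC, which enables detection of nitrated ceruloplasmin in human plasma [29].

3.2 Applications of ferritin nanoparticles in nano-vaccines

Researchers have engineered ferritin on the basis of its distinctive spatial structure. The results show that genetic engineering does not affect the self-assembly of ferritin subunits and that the genes of all 24 subunits can be modified; this finding has made ferritin nanoparticles a platform for vaccine development and antigen presentation [30]. In 2006, the US company New Century Pharmaceuticals first used the ferritin shell as a vaccine platform for antigen presentation: it fused the Tat peptide of HIV-1 to the N-terminus of the ferritin L subunit, used ferritin self-assembly to produce the fusion protein particles, and showed in animal immunization experiments that the fusion protein elicits immune responses in vivo [30]. In 2013, the US National Institutes of Health and the National Institute of Allergy and Infectious Diseases applied ferritin to influenza vaccine development by fusing the N-terminus of the Helicobacter pylori ferritin subunit to the influenza virus hemagglutinin (HA) gene. When the fusion protein self-assembles, the introduced HA projects outward from the protein core; because ferritin has threefold symmetry axes (its 24 subunits group into 8 trimers), 8 HA spikes are formed, resembling the spikes on the surface of the influenza virus (Fig. 3) [32]. When these fusion protein nanoparticles were used as antigens in animal immunization experiments, they successfully induced neutralizing antibodies, fulfilling the role of an influenza vaccine. Compared with a conventional inactivated virus vaccine, the HA fusion protein nanoparticles elicited neutralizing antibody titers more than 10 times higher, and the antibodies elicited by the HA spikes displayed on the ferritin surface specifically recognized two highly conserved sites on the influenza HA trimer, in the stem and in the head. This new vaccine also offers broader protection, neutralizing most viruses of the same subtype. Through genetic modification, self-assembled ferritin nanoparticles can also be fused with other viral antigens to serve as an antigen-presenting vaccine platform, providing good technical support for the prevention and control of various animal viral diseases.

Fig. 3 Molecular design and characterization of ferritin nanoparticles displaying influenza virus HA [32]. Note: negative-stain TEM image of the nanoparticles; 1–6 number the HA spikes in the image.

Attempts have also been made to prepare two-component ferritin nanoparticles, i.e., ferritin nanoparticles that display more than one antigen at once (Fig. 4), since multimerizing antigens on nanoparticles can improve neutralizing antibody responses [33]. In that study, two-component ferritin variants were designed that allow two different antigens to be attached to one particle in a defined ratio and geometric pattern. The two-component ferritin was designed specifically for trimeric antigens, each particle accepting four trimers of each antigen, and was tested with antigens from the HIV-1 envelope (Env) and influenza hemagglutinin (HA). Guinea pigs immunized with two-component ferritin particles bearing different Env antigens, different HA antigens, or both kinds of antigen produced neutralizing antibody responses against each virus. These results show that the ferritin surface can display more than one antigen and provide proof of principle for the self-assembly of two-component nanoparticles, which could become a general technique for presenting trimeric antigens as multimeric immunogens. This study opens new routes for designing novel vaccines.

Fig. 4 Design of two-component ferritin nanoparticles for attachment of different trimeric antigens [33]. Note: schematic of single-component ferritin, which carries 8 copies of trimeric antigen A (black), and of two-component ferritin, which carries 4 copies each of trimeric antigens A (black) and B (gray).

Instead of expressing antigens directly on the ferritin surface, the ovalbumin-derived antigenic peptides OT-1 (SIINFEKL) or OT-2 (ISQAVHAAHAEINEAGR) can also be attached to the ferritin surface or inside its cavity, and the recombinant ferritin is then delivered to dendritic cells, which initiate and control antigen-specific immune responses. Dendritic cells play the central role: they internalize the antigen, process it, and present it to naive T lymphocytes, inducing them to proliferate and differentiate into effector cells (Fig. 5), which leads to selective killing of antigen-specific target cells [21]. The production of the cytokines IFN-γ/IL-2 and IL-10/IL-13 further confirms that the ferritin nano-vaccine enhances the immune response. Dendritic cell-based ferritin nanoparticle vaccines have thus become a very promising approach for directly inducing antigen-specific adaptive immunity in vivo.

Fig. 5 Antigen-specific T cell proliferation and subsequent immune responses induced by ferritin protein cage nanoparticles carrying OT peptides [34].

4 Outlook

Self-assembled proteins are widespread in organisms. Compared with other self-assembled proteins, self-assembled ferritin has a unique mode of disassembly and reassembly and tolerates high temperatures and high concentrations of denaturants; its distinctive quaternary structure also facilitates site-directed genetic modification, so the modification process can be controlled precisely to some extent. Combining biological and chemical modification methods, for example covalently attaching various macromolecules to the ferritin surface, enables site-specific modification and endows ferritin with new properties, broadening its range of applications. Expressing marker proteins as fusions with ferritin subunits, so that the fusion proteins are displayed in an ordered way on the outer surface of the ferritin shell, can increase the loading and efficiency of target proteins such as antibodies or drugs, making ferritin a potential new type of vaccine. Because of its nanoparticle properties, ferritin can also act as a signaling molecule that amplifies signals in biosensors, forming the basis of electrochemical immunoassays with broad prospects in disease diagnosis and treatment. Multifunctional engineering and modification of ferritin is therefore an important direction for future research. Three aspects of self-assembled ferritin still need in-depth study: ① the magnetic properties and physiological mechanisms of ferritin; ② the specific mechanisms and pathways of action of ferritin once it displays fusion proteins on its surface; ③ the classification of, and differences among, ferritins from organisms other than insects and horse spleen, which are the main sources of ferritin currently used as antigen carriers. Producing vaccines from natural, harmless proteins extracted from organisms is a promising prospect, and nanoscale vaccines are a current research focus; vaccines based on ferritin displaying one, or even several, fused antigens on its surface are bound to become a research hotspot.

References
[1] Berger B, Waldispühl J. Novel perspectives on protein structure prediction[A]. In: Problem Solving Handbook in Computational Biology and Bioinformatics[M]. Boston: Springer, 2010, 179-207.
[2] Beecher JF. Organic materials: Wood, trees and nanotechnology[J]. Nat. Nanotechnol., 2007, 2(8): 466-467.
[3] Douglas T, Young M. Host-guest encapsulation of materials by assembled virus protein cages[J]. Nature, 1998, 393(6681): 152-155.
[4] Weaver J, Zakeri R, Aouadi S, et al. Synthesis and characterization of quantum dot-polymer composites[J]. J. Mater. Chem., 2009, 19(20): 3198-3206.
[5] Beattie WG, Dugaiczyk A. Structure and evolution of human α-fetoprotein deduced from partial sequence of cloned cDNA[J]. Gene, 1982, 20(3): 415-422.
[6] He Naipu, Pan Sujuan, Wang Rongmin. Self-assembly of heat-induced albumin and chitosan in solution[J]. Acta Polymerica Sinica, 2015(1): 61-69. (in Chinese)
[7] Wu Lei. Design of oriented silk fibroin gel/hydroxyapatite composite scaffolds and their regulation of the osteogenic performance of bone marrow mesenchymal stem cells[D]. Suzhou, Jiangsu: Soochow University, master's thesis, 2017. (in Chinese)
[8] Lei Rong. Preparation of porous silk fibroin particles and their use as a doxorubicin drug carrier[D]. Hangzhou: Zhejiang Sci-Tech University, master's thesis, 2018. (in Chinese)
[9] Kang S, Oltrogge LM, Broomell CC, et al. Controlled assembly of bifunctional chimeric protein cages and composition analysis using noncovalent mass spectrometry[J]. J. Am. Chem. Soc., 2008, 130(49): 16527-16529.
[10] Wang Zhantong. Ferritin nanoparticle-based integrated diagnostic and therapeutic probes[D]. Xiamen, Fujian: Xiamen University, doctoral dissertation, 2017. (in Chinese)
[11] Santambrogio P, Pinto P, Sonia L, et al. Effects of modifications near the 2-, 3- and 4-fold symmetry axes on human ferritin renaturation[J]. Biochem. J., 1997, 322(2): 461-468.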
[12] Stefanini S, Cavallo S, Wang CQ, et al. Thermal stability of horse spleen apoferritin and human recombinant H apoferritin[J]. Arch. Biochem. Biophys., 1996, 325(1): 58-64.
[13] Stillman TJ, Hempstead PD, Artymiuk PJ, et al. The high-resolution X-ray crystallographic structure of the ferritin (EcFtnA) of Escherichia coli; comparison with human H ferritin (HuHF) and the structures of the Fe3+ and Zn2+ derivatives[J]. J. Mol. Biol., 2001, 307(2): 587-603.
[14] Alkhateeb AA, Connor JR. Nuclear ferritin: A new role for ferritin in cell biology[J]. BBA Gen. Subjects, 2010, 1800(8): 793-797.
[15] Uchida M, Kang S, Reichhardt C, et al. The ferritin superfamily: Supramolecular templates for materials synthesis[J]. BBA Gen. Subjects, 2010, 1800(8): 834-845.
[16] Hu Yousheng, Zou Guolin. Research progress in the synthesis of nanoparticles using ferritin[J]. Amino Acids & Biotic Resources, 2003, 25(3): 34-36. (in Chinese)
[17] Meldrum FC, Wade VJ, Nimmo DL, et al. Synthesis of inorganic nanophase materials in supramolecular protein cages[J]. Nature, 1991, 349(6311): 684-687.
[18] Moskowitz BM, Frankel RB, Walton SA, et al. Determination of the preexponential frequency factor for superparamagnetic maghemite particles in magnetoferritin[J]. J. Geophys. Res. Sol. Ea., 1997, 102(B10): 22671-22680.
[19] Okuda M, Kobayashi Y, Suzuki K, et al. Self-organized inorganic nanoparticle arrays on protein lattices[J]. Nano Lett., 2005, 5(5): 991-993.
[20] Li Zhipeng, Liu Fuhang, Cui Kuiqing, et al. Prokaryotic expression and purification of ferritin and extracellular self-assembly of its nanoparticles[J]. Acta Veterinaria et Zootechnica Sinica, 2018, 49(1): 75-82. (in Chinese)
[21] Dreher MR, Liu W, Michelich CR, et al. Tumor vascular permeability, accumulation, and penetration of macromolecular drug carriers[J]. J. Natl Cancer I., 2006, 98(5): 335-344.
[22] Uchida M, Terashima M, Cunningham CH, et al. A human ferritin iron oxide nano-composite magnetic resonance contrast agent[J]. Magnet. Reson. Med., 2008, 60(5): 1073-1081.
[23] Yan Xiyun, Gao Lizeng, Nie Leng, et al. New functions and new uses of magnetic nanomaterials: China, 101037676B[P]. 2011-05-04. (in Chinese)
[24] Sabbah EN, Kadouche J, Ellison D, et al. In vitro and in vivo comparison of DTPA- and DOTA-conjugated antiferritin monoclonal antibody for imaging and therapy of pancreatic cancer[J]. Nucl. Med. Biol., 2007, 34(3): 293-304.
[25] Hammond KE, Metcalf M, Carvajal L, et al. Quantitative in vivo magnetic resonance imaging of multiple sclerosis at 7 Tesla with sensitivity to iron[J]. Ann. Neurol., 2008, 64(6): 707-713.
[26] Fan K, Cao C, Pan Y, et al. Magnetoferritin nanoparticles for targeting and visualizing tumour tissues[J]. Nat. Nanotechnol., 2012, 7(7): 459-464.
[27] Fan K, Gao L, Yan X. Human ferritin for tumor detection and therapy[J]. WIREs Nanomed. Nanobiotechnol., 2013, 5(4): 287-298.
[28] Turyanska L, Bradshaw TD, Sharpe J, et al. The biocompatibility of apoferritin-encapsulated PbS quantum dots[J]. Small, 2009, 5(15): 1738-1741.
[29] Liu Birong. Application of nanotechnology-based immunosensors in biomarker detection[D]. Wuhan: Central China Normal University, master's thesis, 2014. (in Chinese)
[30] Zhang Tingting. Controllable self-assembly and functionalization of ferritin-based nanostructures[D]. Kaifeng, Henan: Henan University, master's thesis, 2016. (in Chinese)
[31] Carter DC, Li CQ. Ferritin fusion proteins for use in vaccines and other applications: US, 20040006001A1[P]. 2004-01-08.
[32] Kanekiyo M, Wei CJ, Yassine HM, et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies[J]. Nature, 2013, 499(7456): 102-106.
[33] Georgiev IS, Joyce MG, Chen RE, et al. Two-component ferritin nanoparticles for multimerization of diverse trimeric antigens[J]. ACS Infect. Dis., 2018, 4(5): 788-796.
[34] Han JA, Kang YJ, Shin C, et al. Ferritin protein cage nanoparticles as versatile antigen delivery nanoplatforms for dendritic cell (DC)-based vaccine development[J]. Nanomedicine, 2014, 10(3): 561-569.

Synthetic Biology Notes


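This course... Building on Wikipedia and WikiGenes and combining the joint efforts of the students in the class, this compilation is intended as a reference for future teaching of synthetic biology.
All editors (listed by surname in pinyin order):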
陈鹏祥、陈颂赞、丁彦甫、高嘉豪、胡大辉、林汉扬、刘苏滢、蒋刘一琦、潘唯玮、沈浩卿、盛涛涛、冉雪彬、王紫鑫、吴芑柔、肖雨曦、薛继统、杨文君、叶青、袁略真、张霈婧、张正越、郑炯壕、仲策、周丽娜
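The rapid development of synthetic biology has been enabled by three key technologies: mathematical modeling, DNA synthesis, and DNA sequencing.
4.1 Mathematical Modeling
As in systems biology, progress in synthetic biology depends on mathematical models of biological processes. Recently, researchers have begun to build larger-scale, multilevel models of gene regulatory networks to simulate the interactions among biomolecules across an entire regulatory network, including transcription, translation, and the activation and repression of gene expression. Many commercial and free software packages are available to systems biologists, but we have also noted that synthetic biologists need integrated development environments (IDEs), analogous to the computer-aided design (CAD) systems used in many engineering fields. Beyond IDEs, high-throughput computing also plays a key role in synthetic biology research, for example parallel and cloud computing for efficient drug discovery. Specifically, design, modeling, and validation of synthetic biological devices and systems, together with the quantification of biological parameters, are all essential parts of modeling in synthetic biology: when a model's predictions diverge from what is actually observed, the discrepancy can expose flaws in our assumptions about the underlying biological process and point to "faults" in the synthetic biological system. In the future, powerful synthetic biology tools will help us measure time-dependent parameters and measure large numbers of parameters in parallel.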
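To make the idea of a gene-regulation model concrete, here is a minimal sketch of one gene whose transcription is repressed by a regulator, with translation and first-order decay. The model form (Hill-type repression, linear degradation) and every parameter value in `GeneParams` are illustrative assumptions, not taken from the notes above; the closed-form steady state is included so the simulation can be checked against it, in the spirit of comparing model predictions with expected behavior.

```python
from dataclasses import dataclass


@dataclass
class GeneParams:
    """Hypothetical parameters for one repressible gene (arbitrary units)."""
    alpha: float = 10.0   # maximal transcription rate
    K: float = 1.0        # repressor level giving half-maximal repression
    n: float = 2.0        # Hill coefficient (cooperativity of repression)
    delta_m: float = 1.0  # mRNA degradation rate constant
    beta: float = 5.0     # translation rate per mRNA
    delta_p: float = 0.5  # protein degradation rate constant


def transcription_rate(repressor: float, prm: GeneParams) -> float:
    """Hill-type repression: the rate falls as the repressor level rises."""
    return prm.alpha / (1.0 + (repressor / prm.K) ** prm.n)


def simulate(repressor: float, prm: GeneParams, t_end: float = 50.0, dt: float = 1e-3):
    """Forward-Euler integration, starting from m = p = 0, of
        dm/dt = transcription_rate(R) - delta_m * m
        dp/dt = beta * m - delta_p * p
    Returns the final (mRNA, protein) levels."""
    m, p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dm = transcription_rate(repressor, prm) - prm.delta_m * m
        dp = prm.beta * m - prm.delta_p * p
        m, p = m + dm * dt, p + dp * dt
    return m, p


def steady_state(repressor: float, prm: GeneParams):
    """Closed-form steady state of the same equations (both derivatives set to 0)."""
    m_ss = transcription_rate(repressor, prm) / prm.delta_m
    return m_ss, prm.beta * m_ss / prm.delta_p


if __name__ == "__main__":
    prm = GeneParams()
    for r in (0.0, 1.0, 3.0):
        _, p_sim = simulate(r, prm)
        _, p_ss = steady_state(r, prm)
        print(f"repressor={r}: simulated protein={p_sim:.2f}, steady state={p_ss:.2f}")
```

Raising the repressor level lowers the protein level, and the simulated value should agree closely with the closed-form steady state. Forward Euler is used only to keep the sketch dependency-free; larger or stiffer networks would normally be integrated with an adaptive ODE solver.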
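(Some tools used in synthetic biology, with descriptions of their applications, are listed at the following link: /e/art/e/187.html)
4.2 DNA Synthesis
The chemical synthesis of DNA or oligonucleotides is an important component of synthetic biology. Thanks to advances in automated DNA synthesizers, it is now possible to synthesize and assemble complete genes, regulatory elements, genetic circuits, and even entire microbial genomes. Khorana and colleagues pioneered the synthesis of DNA from oligonucleotides and were the first to complete a yeast tRNA gene. The process is also called artificial gene synthesis, because no starting DNA template is required. Somatostatin was the first chemically synthesized peptide chain, and leukocyte interferon was the first artificial protein-coding gene to be expressed in bacteria. These studies revealed the potential applications of synthetic biology. Chemical DNA synthesis is usually more direct and economical than recombinant DNA cloning and is used routinely in biotechnology.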
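Template-free gene synthesis from oligonucleotides relies on short single-stranded pieces whose ends overlap, so that they can anneal into a full-length duplex. The helper below is a hypothetical design sketch, not a method from the notes: it splits a target sequence into fixed-length oligos with a fixed overlap, alternating between the top and bottom strands, and includes a check that the overlaps reassemble the original sequence. The default lengths are arbitrary assumptions; practical designs usually also balance melting temperatures and avoid secondary structure.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(target: str, oligo_len: int = 40, overlap: int = 20):
    """Split target into overlapping oligos on alternating strands.

    Returns a list of (start, strand, sequence). '-' oligos are written 5'->3'
    on the bottom strand, i.e. as reverse complements of the top-strand segment.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("require 0 < overlap < oligo_len")
    step = oligo_len - overlap
    oligos = []
    start, index = 0, 0
    while True:
        end = min(start + oligo_len, len(target))
        segment = target[start:end]
        if index % 2 == 0:
            oligos.append((start, "+", segment))
        else:
            oligos.append((start, "-", reverse_complement(segment)))
        if end == len(target):
            return oligos
        start += step
        index += 1


def reassemble(oligos) -> str:
    """Rebuild the top strand from the oligos, checking that overlaps agree."""
    assembled = ""
    for start, strand, seq in oligos:
        top = seq if strand == "+" else reverse_complement(seq)
        shared = assembled[start:]
        if not top.startswith(shared):
            raise ValueError(f"overlap mismatch at position {start}")
        assembled = assembled[:start] + top
    return assembled


if __name__ == "__main__":
    gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAG"  # arbitrary example
    for start, strand, seq in design_oligos(gene, oligo_len=30, overlap=12):
        print(start, strand, seq)
```

Alternating strands is what lets neighboring oligos anneal through their shared overlap; the `reassemble` check is only a sanity test of the split, not a model of the annealing or polymerase steps.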

J. Comput. Chem.


2D Depiction of Nonbonding Interactions for Protein Complexes
PENG ZHOU,1 FEIFEI TIAN,2 ZHICAI SHANG1
1Institute of Molecular Design & Molecular Thermodynamics, Department of Chemistry, Zhejiang University, Hangzhou 310027, China
2College of Bioengineering, Chongqing University, Chongqing 400044, China
Received 7 May 2008; Revised 25 June 2008; Accepted 22 July 2008
DOI 10.1002/jcc.21109
Published online 22 October 2008 in Wiley InterScience ().

Abstract: A program called 2D-GraLab is described for automatically generating schematic representations of nonbonding interactions across protein binding interfaces. The input file of this program takes the standard PDB format, and the outputs are two-dimensional PostScript diagrams giving an intuitive and informative description of protein–protein interactions and their energetic properties, including hydrogen bonds, salt bridges, van der Waals interactions, hydrophobic contacts, π–π stacking, disulfide bonds, desolvation effects, and loss of conformational entropy. To ensure that this interaction information is determined accurately and reliably, the methods and standalone programs employed in 2D-GraLab are all widely used in the chemistry and biology community. The generated diagrams allow intuitive visualization of the interaction mode and binding specificity between two subunits in protein complexes, and by providing information on nonbonding energetics and geometric characteristics, the program offers the possibility of comparing different protein binding profiles in a detailed, objective, and quantitative manner. We expect this 2D molecular graphics tool to be useful for experimentalists and theoreticians interested in protein structure and protein engineering.
© 2008 Wiley Periodicals, Inc. J Comput Chem 30: 940–951, 2009
Key words: protein–protein interaction; nonbonding energetics; molecular graphics; PostScript; 2D-GraLab

Introduction
Protein–protein recognition and association play crucial roles in signal transduction and many other key biological processes.
Although numerous studies have addressed protein–protein interactions (PPIs), the principles governing PPIs are not fully understood.1,2 The ready availability of structural data for protein complexes, both from experimental determination such as X-ray crystallography and from theoretical modeling such as protein docking, has made it necessary to find ways to interpret the results easily, and molecular graphics tools are usually employed for this purpose.3 Although many software packages are available for visualizing the three-dimensional (3D) structures (e.g., PyMOL,4 GRASP,5 VMD6) and interaction modes (e.g., MolSurfer,7 ProSAT,8 PIPSA9) of biomolecules, options for producing schematic two-dimensional (2D) representations of the nonbonding interactions in PPIs are very scarce. A few 2D graphics programs have been developed to depict protein–small ligand interactions (e.g., LIGPLOT,10 PoseView,11 MOE12), but these tools cannot handle macromolecular complexes. Other tools that present macromolecular interactions at the 2D level mainly include DIMPLOT,10 NUCPLOT,13 and MONSTER.14 Among them, only DIMPLOT can be used to visualize the nonbonding interactions of PPIs aesthetically, and it provides only a simple description of hydrogen bonds, hydrophobic interactions, and steric clashes across the binding interface.
In this article, we describe a new molecular graphics tool, the two-dimensional graphics lab for biosystem interactions (2D-GraLab), which uses a page description language (PDL) to reproduce the nonbonding interactions and energetic properties of PPIs on a PostScript page intuitively, accurately, and in detail. 2D-GraLab emphasizes three points: (i) Reliability. To ensure reliability, the programs and methods employed in 2D-GraLab are all widely used in the chemistry and biology community; (ii) Comprehensiveness.
2D-GraLab can handle almost all nonbonding interactions (and even covalent interactions) across the binding interface of protein complexes, such as hydrogen bonds, salt bridges, van der Waals (vdW) interactions, hydrophobic contacts, π–π stacking, disulfide bonds, desolvation effects, and loss of conformational entropy, and its output diagrams are varied, including individual schematic diagrams and summarized schematic diagrams; (iii) Artistry. We carefully designed the layout, color scheme, and page style of the different diagrams, with the goal of producing aesthetically pleasing 2D images of PPIs. In addition, 2D-GraLab provides a graphical user interface (GUI), which allows users to interact with the program and displays the spatial structure and interfacial features of protein complexes (see Fig. S1).

Additional Supporting Information may be found in the online version of this article.
Correspondence to: Z. Shang; e-mail: shangzc@

Identifying Protein Binding Interfaces
An essential step in understanding the molecular basis of PPIs is the accurate identification of interprotein contacts, on which the subsequent analysis and layout of nonbonding interactions are based. Common methods for identifying protein–protein binding interfaces include a Voronoi polyhedra-based approach, changes in solvent accessible surface area (ΔSASA), and various radial cutoffs (e.g., closest atom, Cβ, and centroid).15 2D-GraLab can identify protein–protein binding interfaces at both the residue and the atom level.

Identifying Binding Interfaces at Residue Level
All residue-level interface identification methods are radial cutoff approaches. In a radial cutoff approach, a reference point is defined in advance for each residue, and two residues are considered in contact if their reference points fall within the defined cutoff distance. Usually, the Cα, the Cβ, or the centroid is used as the reference point.16–18 In 2D-GraLab the cutoff distance is more flexible: cutoff distance = rA + rB + d, where rA and rB are the residue radii and d is set
by the user (the default is d = 4 Å, as suggested by Cootes et al.19).

Identifying Binding Interfaces at Atom Level
At the atom level, binding interfaces are identified using a closest atom-based radial cutoff approach20 and a ΔSASA-based approach.21 In the closest atom-based radial cutoff approach, two residues from different chains are considered in contact if the distance between any atom of one and any atom of the other is less than a cutoff value. In the ΔSASA-based approach, the SASA is calculated twice, once for the monomers and once for the complex; if the SASA of a residue changes (ΔSASA) on going from the monomer to the dimer, the residue is considered to be involved in the binding interface.
In 2D-GraLab, three ways of visualizing the binding interface are provided: spatial structure exhibition, a residue distance plot, and a residue-pair contact map (see Figs. S2–S4).

Analysis and 2D Layout of Nonbonding Interactions
The input file of 2D-GraLab is in standard PDB format, and the outputs are two-dimensional PostScript files giving intuitive and informative representations of the PPIs and their strengths, including hydrogen bonds, salt bridges, vdW interactions, desolvation effects, ion-pairs, side-chain conformational entropy (SCE), etc. The outputs take two forms: the individual schematic diagram, a detailed depiction of each nonbonding profile, and the summarized schematic diagram, which covers all nonbonding interactions and disulfide bonds across the binding interface.
To produce high-quality layouts based on reliable and accurate parameters, 2D-GraLab employs several widely used programs, listed in Table 1, to perform the core calculations and analysis of the different nonbonding interactions. 2D-GraLab prechecks protein structures and warns of structural errors, but does not provide revision and refinement
functions. Therefore, before 2D-GraLab analysis, protein structures are strongly recommended to be preprocessed with programs such as PROCHECK (structure validation),27 Scwrl3 (side-chain repair),28 and X-PLOR (structure refinement).29

Table 1. Standalone Programs Employed in 2D-GraLab.
Program (ref.)       Function
Reduce v3.03 (22)    Adding hydrogen atoms to proteins
HBplus v3.15 (23)    Identifying hydrogen bonds and calculating their geometric parameters
Probe v2.12 (24)     Identifying steric contacts and clashes at the atom level
MSMS v2.61 (25)      Calculating SASA values of protein atoms and residues
Delphi v4.0 (26)     Calculating Coulombic and reaction field energies; determining the electrostatic energy of ion-pairs
DIMPLOT v4.1 (10)    Providing an application programming interface; users can set up and execute DIMPLOT directly in the 2D-GraLab GUI

Individual Schematic Diagram
Hydrogen Bond
The program we use to analyze hydrogen bonds across binding interfaces is HBplus,23 which calculates all possible positions of hydrogen atoms attached to donor atoms that satisfy specified geometric criteria with nearby acceptor atoms. In 2D-GraLab, users can freely select the hydrogen bonds of interest involving N, O, and/or S atoms, and water-mediated hydrogen bonds are also taken into account. The strength of conventional hydrogen bonds (excluding water-mediated hydrogen bonds) is calculated using a Lennard-Jones 8-6 potential with angle weighting:30

$$\Delta U_{\mathrm{HB}} = E_m\left[3\left(\frac{d_m}{d}\right)^{8} - 4\left(\frac{d_m}{d}\right)^{6}\right]\cos^{4}\theta \qquad (\theta > 90^\circ) \tag{1}$$

where d is the distance in angstroms between the heavy acceptor atom and the donor hydrogen atom, θ is the donor–hydrogen–acceptor angle, E_m is the optimum hydrogen-bond energy, and d_m is the optimum hydrogen-bond length for the particular hydrogen-bonding atoms considered; E_m and d_m vary with the chemical type of those atoms. The hydrogen bond potential is set to zero when θ ≤ 90°.31 Hydrogen bond parameters are taken from the CHARMM force field (for N and O atoms) and AutoDock (for S atoms).32,33

Figure 1. (a) Schematic representation of a conventional hydrogen bond and a water-mediated hydrogen bond across the binding interface of the IGFBP/IGF complex (PDB entry: 2dsr), produced using 2D-GraLab. The conventional hydrogen bond is formed between atom N (in the backbone of residue Leu69 in chain B) and atom OE1 (in the side chain of residue Glu3 in chain I); the water-mediated hydrogen bond is formed between atom ND1 (in the side chain of residue His5 in chain B) and atom O (in the backbone of residue Asp20 in chain I). Because the hydrogen positions of water are almost never known in a PDB file, the H···A length and D–H···A angle of a water molecule acting as hydrogen bond donor are undetermined and are marked "????". Chains, residues, and atoms are labeled according to the PDB format. (b) Spatial conformation of the conventional hydrogen bond. (c) Spatial conformation of the water-mediated hydrogen bond.

Figure 1a schematically represents a conventional hydrogen bond and a water-mediated hydrogen bond across the binding interface of the insulin-like growth factor-binding protein (IGFBP)/insulin-like growth factor (IGF) complex. The diagram presents abundant information about hydrogen-bond geometry and energetics in a readily understandable way. Figures 1b and 1c show the spatial conformations of the corresponding conventional and water-mediated hydrogen bonds.

Van der Waals Interaction
The small-probe approach developed in Richardson's laboratory detects the all-atom contact profile of protein packing. 2D-GraLab uses the program Probe24 to apply this method to identify steric contacts and clashes at binding interfaces. Word et al. pointed out that explicit hydrogen atoms can effectively improve Probe's performance.24 However, because calculations with explicit hydrogen atoms are time-consuming and the implicit hydrogen mode may also be
used in some cases, 2D-GraLab offers both explicit and implicit hydrogen modes. Hydrogen atoms are added to proteins with Reduce,22 which was also developed in Richardson's laboratory and is fully compatible with Probe. Following the earlier definition, the vdW interaction between two adjacent atoms is classified as a wide contact, close contact, small overlap, or bad overlap.24 A vdW potential function typically has two terms, one repulsive and one attractive; in 2D-GraLab, the vdW interaction is expressed as a Lennard-Jones 12-6 potential:34

$$\Delta U_{\mathrm{SI}} = E_m\left[\left(\frac{d_m}{d}\right)^{12} - 2\left(\frac{d_m}{d}\right)^{6}\right] \tag{2}$$

where E_m is the Lennard-Jones well depth, d_m is the distance at the Lennard-Jones minimum, and d is the distance between the two atoms. Lennard-Jones parameters for pairs of different atom types are obtained from the Lorentz–Berthelot combining rules,35 and atomic Lennard-Jones parameters are taken from Probe and the AMBER force field.24,36
Figure 2a, produced with 2D-GraLab, schematically represents the steric contacts and clashes (overlaps) between heavy-chain residue Tyr131 and two light-chain residues, Ser121 and Gln124, of the cross-reaction complex FAB (the antibody fragment of hen egg lysozyme). This diagram gives details of the local vdW interactions around Tyr131 that are inaccessible in the 3D structural figure (Fig. 2b).

Desolvation Effect
In 2D-GraLab, the program MSMS25 is used to calculate the atom-level SASA values of interfacial residues, and four atomic radii sets are provided for the SASA calculation: Bondi64, Chothia75, Li98, and CHARMM83.32,37–39 Bondi64 is based on contact distances in crystals of small molecules; Chothia75 is based on contact distances in crystals of amino acids; Li98 is derived from 1169 high-resolution protein crystal structures; and CHARMM83 is the atomic radii set of the CHARMM force field. The desolvation free energy of interfacial residues is calculated using the empirical
additive model proposed by Eisenberg andFigure 2.(a)Schematic representation of steric contacts and overlaps between the residue Tyr131in heavy chain (chain H)and the surrounding residues Ser121and Gln124in light chain (chain L)of cross-reaction complex FAB (PDB entry:1fbi).This diagram was produced using 2D-Gralab in explicit hydrogen mode.In this diagram,interface is denoted by the broken line;Wide contact,close contact,small overlap,and bad overlap are marked by blue circle,green triangle,yellow square,and pink rhombus,respectively;Moreover,vdW potential of each atom-pair is given in the histogram,with the value measured by energy scale,and the red and blue indicate favorable (D U \0)and unfav-orable (D U [0)contributions to the binding,respectively;Interaction potential 20.324kcal/mol in the center circle denotes the total vdW contribution by residue Tyr131;Chains,residues,and heavy atoms are labeled according to the PDB format,and hydrogen atoms are labeled in Reduce format.(b)Spatial conformation of chain H residue Tyr131and its local environment.Green or yellow stands forgood contacts (green for close contact and yellow for slight overlaps \0.2A˚),blue for wide contacts [0.25A˚,hot pink spikes for bad overlaps !0.4A ˚.It is revealed that Tyr131is in an intensive clash with chain L Gln124,while in slight contact with chain L Ser121,which is well consistent with the 2D schematic diagram.9432D Depiction of Nonbonding Interactions for Protein Complexes944Zhou,Tian,and Shang•Vol.30,No.6•Journal of Computational ChemistryFigure2.(Legend on page943.)Maclachlam,40and the conformation of interfacial residues is assumed to be invariant during the binding process.D G dslv¼Xic i D A i(3)where the sum is over all the atoms;c i and D A i are the atomic solvation parameter(ASP)and the changes in solvent accessible surface area(D SASA)of atom i,respectively.Juffer et al.41 found that although desolvation free energies calculated from different ASP sets are linear 
correlation to each other,the abso-lute values are greatly different.In view of that,2D-GraLab pro-vides four ASP sets published in different periods:Eisenberg86, Kim90,Schiffer93,and Zhou02.40,42–44As shown in Figure3,the D SASA and desolvation free energy of interfacial residues in chain A of HLA-A*0201pro-tein complex during the binding process are reproduced in a rotiform diagram form using2D-GraLab.In this diagram,the desolvation free energy contributed by chain A is28.056kcal/ mol,and moreover,the D SASA value of each interfacial residue is also presented clearly.Ion-PairThere are six types of residue-pairs in the ion-pairs:Lys-Asp, Lys-Glu,Arg-Asp,Arg-Glu,His-Asp,and ually,ion-pairs include three kinds:salt bridge,NÀÀO bridge,and longer-range ion-pair,and found that most of the salt bridges are stabi-lizing toward proteins;the majority of NÀÀO bridges are stabi-lizing;the majority of the longer-range ion-pairs are destabiliz-ing toward the proteins.45The salt bridge can be further distin-guished as hydrogen-bonded salt bridge(HB-salt bridge)and nonhydrogen-bonded salt bridge(NHB-salt bridge or salt bridge).46In2D-GraLab,the longer-range ion-pair is neglected, and for short-range ion-pair,four kinds are defined:HB-salt bridge,NHB-salt bridge or salt bridge,hydrogen-bonded NÀÀO bridge(HB-NÀÀO bridge),and nonhydrogen-bonded N-O bridge (NHB-NÀÀO bridge or NÀÀO bridge).Although both the N-terminal and C-terminal residues of a given protein are also charged,the large degree offlexibility usually experienced by the ends of a chain and the poor structural resolution resulting from it.47Therefore,we preclude these terminal residues in the 2D-GraLab.A modified Hendsch–Tidor’s method is used for calculating association energy of ion-pairs across binding interfaces.48D G assoc¼D G dslvþD G brd(4)where D G dslv represents the sum of the unfavorable desolvation penalties incurred by the individual ion-pairing residues due to the change in their environment from a high 
dielectric solvent (water) in the unassociated state to the low-dielectric protein environment in the associated state, and ΔG_brd represents the favorable bridge energy due to the electrostatic interaction of the charged side-chain groups. We used finite-difference solutions to the linearized Poisson–Boltzmann equation in DelPhi [26] to calculate ΔG_dslv and ΔG_brd. The centroid of the ion-pair system is used as the grid center, the temperature is 298.15 K (so that 1 kT = 0.593 kcal/mol), and Debye–Hückel boundary conditions are applied [49]. Because atomic parameter sets strongly influence continuum electrostatic calculations of ion-pair association energy [50], 2D-GraLab offers three classical atomic parameter sets: PARSE, AMBER, and CHARMM [51–53].

Figure 4 is a schematic representation of the four ion-pairs formed across the binding interface of the penicillin acylase enzyme complex. This diagram clearly illustrates the geometric and energetic properties of the ion-pairs, such as bond length, centroid distance, association energy, and angle. The ion-pair angle is defined as the angle between two unit vectors, each of which joins the Cα atom and the centroid of the side-chain charged group in one ion-pairing residue [54]. In this diagram, the four ion-pairs formed across the binding interface, two HB-salt bridges and two HB-N–O bridges, are shown.
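The geometric quantities reported for each ion-pair in Figure 4 (centroid distance and ion-pair angle) can be sketched in a few lines. This is a minimal illustration, not 2D-GraLab's code: it assumes each charged group is represented by the plain, unweighted centroid of its charged atoms, and the coordinates in the usage line are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def centroid(points: list[Vec]) -> Vec:
    """Unweighted mean of 3D points, e.g. the charged atoms of one side chain."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def unit_vector(start: Vec, end: Vec) -> Vec:
    """Unit vector pointing from start to end."""
    d = (end[0] - start[0], end[1] - start[1], end[2] - start[2])
    length = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    return (d[0] / length, d[1] / length, d[2] / length)


def ion_pair_geometry(ca_a: Vec, charged_a: list[Vec],
                      ca_b: Vec, charged_b: list[Vec]) -> tuple[float, float]:
    """Return (centroid distance in Å, ion-pair angle in degrees).

    The angle is measured between the unit vectors that join each residue's
    C-alpha atom to the centroid of its side-chain charged group.
    """
    cen_a, cen_b = centroid(charged_a), centroid(charged_b)
    distance = math.dist(cen_a, cen_b)
    u_a, u_b = unit_vector(ca_a, cen_a), unit_vector(ca_b, cen_b)
    cos_angle = sum(x * y for x, y in zip(u_a, u_b))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # clamp rounding noise before acos
    return distance, math.degrees(math.acos(cos_angle))


# Hypothetical Lys (NZ) / Asp (OD1, OD2) pair whose charged groups face each other.
print(ion_pair_geometry((0.0, 0.0, 0.0), [(0.0, 0.0, 4.0)],
                        (0.0, 0.0, 10.0), [(-1.0, 0.0, 7.0), (1.0, 0.0, 7.0)]))
# (3.0, 180.0)
```

An angle near 180° means the two side chains point straight at each other; the association energy of Eq. (4) still requires a Poisson–Boltzmann solver and is not reproduced here.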
The association energies of the two HB-salt bridges are both below −1.5 kcal/mol, whereas those of the two HB-N–O bridges are both above −0.5 kcal/mol. This indicates that HB-salt bridges are more stable than HB-N–O bridges, which is consistent with the conclusion of Kumar and Nussinov [45, 46].

Side-Chain Conformational Entropy

In general, SCE can be divided into vibrational and conformational components [55]. A comparison of results obtained with different techniques shows that during protein folding, the mean conformational free energy change (TΔS) is about 1 kcal/mol per side chain, or 0.5 kcal/mol per bond. Changes in vibrational entropy appear to be negligible compared with the entropy change resulting from the loss of accessible rotamers [56]. SCE (S) can be calculated quite simply using Boltzmann's formulation [57]:

$$S = -R \sum_i p_i \ln p_i \qquad (5)$$

where R is the universal gas constant, the sum is taken over all conformational states of the system, and p_i is the probability of being in state i. Typical methods for SCE calculation include self-consistent mean-field theory [58], molecular dynamics [59], and Monte Carlo simulation [60]; all of them are time-consuming and thus unsuitable for 2D-GraLab. The problem is therefore simplified: when the SCE of an interfacial residue is calculated, its local surroundings are fixed in the crystal conformation, and the SCE of each interfacial residue is calculated in turn. Of the 20 coded amino acids, Gly, Ala, and Pro, as well as Cys in disulfide bonds, are excluded [57]. For the other residues, each side-chain conformation is modeled as a rotamer with a finite number of discrete states [61]. We used the penultimate rotamer library developed by Lovell et al. [62], as recommended by Dunbrack for the study of SCE [63]. For an interfacial residue, the potential E_i of each rotamer i is calculated in both the bound and the unbound state; the rotamer probability distribution p of the residue is then obtained from the Boltzmann distribution law, and the SCE in each state is solved using Eq. (5). The
situation of rotamer i is classified as either a serious clash or a nonclash. A serious clash means that the clash score of rotamer i exceeds a given threshold, in which case E_i = +∞. In the nonclash case, four potential functions are available in 2D-GraLab: (i) E_i = E_0, a constant [61]; (ii) a statistical potential, in which the potential energy E_i of rotamer i is calculated from a database-derived probability [61]; (iii) a coarse-grained model, in which E_i is estimated from atomic contact energies (ACE) [64]; and (iv) the Lennard-Jones potential [58].

Figure 3. Schematic representation of the desolvation effect for interfacial residues in chain A of the HLA-A*0201 complex (PDB entry: 1duz). This diagram was produced using 2D-GraLab. The pie chart is divided equally, with each sector indicating an interfacial residue in chain A. In a sector, red + blue is the SASA of the corresponding residue in the unbound state, blue is its SASA in the bound state, and red is thus the ΔSASA. The green polygonal line links the desolvation free energies of the interfacial residues. At the purple circle the desolvation free energy is 0 (ΔU = 0); points beyond this circle indicate unfavorable contributions to binding (ΔU > 0), and points inside it favorable contributions (ΔU < 0). In the periphery, residue symbols are colored red, blue, and black for favorable, unfavorable, and neutral contributions to binding, respectively. The SASA and desolvation free energy of each interfacial residue can be read qualitatively from the horizontal black and green scales.

Figure 4. Four ion-pairs formed across the binding interface of the penicillin acylase enzyme complex (PDB entry: 1gkf). Left: 2D schematic diagram produced using 2D-GraLab, with positively and negatively charged residues colored blue and red, respectively. Bridge bonds formed between the charged atoms of the ion-pairs are drawn as green, blue, and yellow dashed lines for hydrogen-bonded bridges, nonhydrogen-bonded bridges, and long-range interactions, respectively. The three parameters in brackets are ion-pair type, angle, and association energy. Right: spatial conformations of the corresponding ion-pairs.

Figure 5. (a) Loss of side-chain conformational entropy of the chain B interfacial residues in the HIV-1 reverse transcriptase complex (PDB entry: 1rt1). This diagram was produced using 2D-GraLab. The pie chart is divided equally, with each sector indicating an interfacial residue in chain B. In a sector, side-chain conformational entropies in the unbound and bound states are colored yellow and blue, respectively. The green polygonal line links the conformational free energies of the interfacial residues. The conformational entropy and conformational free energy of each interfacial residue can be read qualitatively from the horizontal black and green scales, respectively. In the periphery, residue symbols are colored yellow, blue, and black for favorable, unfavorable, and neutral contributions to binding, respectively. (b) Rotamers of chain B interfacial residues Lys20, Lys22, Tyr56, Asn136, Ile393, and Trp401 in the HIV-1 reverse transcriptase complex. These rotamers were generated using 2D-GraLab.

Figure 5a schematically represents the loss of binding entropy of the chain B interfacial residues in the HIV-1 reverse transcriptase complex. As in the desolvation-effect diagram, the loss of binding entropy is presented as a rotiform diagram. The diagram reveals that during formation of the HIV-1 reverse transcriptase complex, the total loss of conformational free energy of chain B is 9.14 kcal/mol, indicating a strongly unfavorable contribution to binding (ΔG > 0), and the average loss of conformational free energy per residue is about 0.3 kcal/mol, much less than that in protein folding (about 1 kcal/mol [56]). Figure 5b shows the rotamers of six interfacial residues in chain B.

Figure 6. Summarized schematic diagram of the nonbonding interactions and the disulfide bond across the interface of the AIV hemagglutinin H5 complex (PDB entry: 1jsm). Chains A and B, with lengths of 321 and 160, are represented as two bold horizontal lines. The interface parts of the bold lines are colored orange, and interacting residue pairs are linked by straight lines. Conventional hydrogen bonds, water-mediated hydrogen bonds, ion-pairs, hydrophobic forces, steric clashes, π–π stacking, and disulfide bonds are colored aqua, bottle green, red, blue, purple, yellow, and brown, respectively. The "dumbbell-shaped" symbols also give the residue-pair types and distances.

Summarized Schematic Diagram

Figure 6 illustrates the nonbonding interactions and the disulfide bond formed across the binding interface of avian influenza virus (AIV) hemagglutinin H5. This protein is a dimer linked by a disulfide bond. In this diagram, conventional hydrogen bonds, water-mediated hydrogen bonds, ion-pairs, hydrophobic forces, steric clashes, π–π stacking, and disulfide bonds are represented in different colors.

Hydrogen bonds, colored aqua, are calculated by the program HBplus [23]. Data in this diagram are the separations between the acceptor atom and the heavy donor atom.

Water-mediated hydrogen bonds are colored bottle green and are also calculated by HBplus [23].

Ion-pairs, colored red, include salt bridges and N–O bridges, as determined by Kumar's rule [45, 46]. Data in this diagram are the centroid distances of the ion-pairs.

Hydrophobic forces are colored blue. According to the ΔSASA rule, if two apolar and/or aromatic
interfacial residues (Leu, Ala, Val, Ile, Met, Cys, Pro, Tyr, Phe, and Trp) lie within the distance d < r_A + r_B + 2.8 Å (r_A and r_B are the side-chain radii, and 2.8 Å is the diameter of a water molecule), they are considered to be in hydrophobic contact. Data in this diagram are the centroid–centroid separations between the two residues.

Steric clashes are colored purple. Here, only bad overlaps calculated by Probe [24] are presented. 2D-GraLab provides explicit and implicit hydrogen modes; in explicit hydrogen mode, hydrogen atoms are added using Reduce [22]. Data in this diagram are the center-to-center separations of the two badly overlapping atoms.

π–π stacking is colored yellow. At present, studies on stacking interactions in proteins are scarce. In 2D-GraLab, π–π stacking is identified using McGaughey's rule [65]: if the centroid–centroid separation between two aromatic rings is within 7.5 Å, they are regarded as π–π stacked (the aromatic residues are Phe, Tyr, Trp, and His). Cho et al. successfully adopted this rule to study π–π stacking across protein interfaces [66]. In addition, 2D-GraLab sets constraints on the stacking angle (the dihedral angle between the planes of the two aromatic rings). Data in this diagram are the centroid–centroid separations between two aromatic rings in the stacking state.

Disulfide bonds are colored brown and are taken from the PDB records. Data in this diagram are the separations of the two sulfur atoms.

Conclusions

Most, if not all, biological processes are regulated through the association and dissociation of protein molecules and are essentially controlled by nonbonding energetics [67]. Graphically intuitive visualization of these nonbonding interactions is an important approach for understanding how a complex forms between two proteins. Although a large number of software packages are available for visualizing 3D structures, the options for producing schematic 2D summaries of the nonbonding interactions in a protein complex are comparatively few. In practice, the 2D and 3D visualization
methods are complementary. In this article, we have described a new 2D molecular graphics tool for analyzing and visualizing PPIs from spatial structures; its intended goal is to present the nonbonding interactions that stabilize a macromolecular complex schematically and in a graphically intuitive manner. We anticipate that renewed interest in the automated generation of 2D diagrams will significantly reduce the burden of protein structure analysis and provide insights into the mechanisms of PPIs.

2D-GraLab is written in C++ and OpenGL, and the output 2D schematic diagrams of nonbonding interactions are described in PostScript. 2D-GraLab v1.0 is currently available free of charge to academic users on request.
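The two distance rules used for hydrophobic contacts and π–π stacking in the summarized diagram (Figure 6) can be written as simple predicates. The sketch below is an illustrative reading of those rules, not 2D-GraLab's source: residue names are assumed to be three-letter PDB codes, the side-chain radii r_A and r_B are supplied by the caller because the text does not list their values, and the stacking-angle constraint is omitted because its thresholds are not given.

```python
import math

Point = tuple[float, float, float]

# Residue sets as listed in the text.
APOLAR_OR_AROMATIC = {"LEU", "ALA", "VAL", "ILE", "MET", "CYS", "PRO", "TYR", "PHE", "TRP"}
AROMATIC = {"PHE", "TYR", "TRP", "HIS"}

WATER_DIAMETER = 2.8    # Å, added to the two side-chain radii
STACKING_CUTOFF = 7.5   # Å, maximum ring centroid-centroid separation


def hydrophobic_contact(res_a: str, res_b: str, cen_a: Point, cen_b: Point,
                        r_a: float, r_b: float) -> bool:
    """Delta-SASA rule: apolar/aromatic pair with d < r_A + r_B + 2.8 Å."""
    if res_a not in APOLAR_OR_AROMATIC or res_b not in APOLAR_OR_AROMATIC:
        return False
    return math.dist(cen_a, cen_b) < r_a + r_b + WATER_DIAMETER


def pi_pi_stacking(res_a: str, res_b: str, ring_a: Point, ring_b: Point) -> bool:
    """McGaughey-style rule: two aromatic rings whose centroids lie within 7.5 Å."""
    if res_a not in AROMATIC or res_b not in AROMATIC:
        return False
    return math.dist(ring_a, ring_b) <= STACKING_CUTOFF


# Hypothetical side-chain centroids (Å) and radii.
print(hydrophobic_contact("LEU", "PHE", (0.0, 0.0, 0.0), (6.0, 0.0, 0.0), 2.0, 2.5))  # True
print(pi_pi_stacking("PHE", "TYR", (0.0, 0.0, 0.0), (0.0, 0.0, 8.0)))                 # False
```

Keeping the residue-type check separate from the distance check mirrors the text, where each rule first restricts the residue classes and then applies a single distance cutoff.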

Heterostructure: NaGdF4:Yb,Er Nanorods Loaded on Metal-Organic Frameworks for Tuning Upconversion Photoluminescence


Journal of Infrared and Millimeter Waves (J. Infrared Millim. Waves), Vol. 40, No. 2, April 2021. Article ID: 1001-9014(2021)02-0166-06. DOI: 10.11972/j.issn.1001-9014.2021.02.005

The heterostructure NaGdF4:Yb,Er nanorods loaded on metal-organic frameworks for tuning upconversion photoluminescence

LIU Yi, JIAO Ji-Qing*, LYU Bai-Ze, WANG Jiu-Xing (College of Materials Science and Engineering, National Center of International Joint Research for Hybrid Materials Technology, National Base of International Sci. & Tech. Cooperation, Qingdao University, Qingdao 266071, China)

Abstract: Multi-component heterostructure nanocomposites can not only inherit the original properties of each component but also induce new chemical and electronic properties through the interaction between the components. The heterostructure zeolitic imidazolate framework/NaGdF4:Yb,Er (ZIF-67/NaGdF4:Yb,Er) was prepared by a stepwise synthesis strategy. This approach avoided agglomeration and quenching of the upconversion (UC) nanoparticles and gave better stability. In the heterostructure nanocomposites, ZIF-67 is employed as an energy-transmission platform under 980 nm excitation. Compared with pure NaGdF4:Yb,Er nanorods, the UC photoluminescence of heterostructure ZIF-67/NaGdF4:Yb,Er is tuned from green to red owing to the synergistic effect of the components.

Key words: heterostructure, controllable synthesis, nanocomposite, luminescence, upconversion
PACS: 42

Database Systems (English-Language Reference Text)


Database Systems

1. Fundamental Concepts of Database

Databases and database technology are having a major impact on the growing use of computers. It is fair to say that databases will play a critical role in almost all areas where computers are used, including business, engineering, medicine, law, education, and library science, to name a few. The word "database" is in such common use that we must begin by defining what a database is. Our initial definition is quite general.

A database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of all the people you know. You may have recorded this data in an indexed address book, or you may have stored it on a diskette using a personal computer and software such as DBASE III or Lotus 1-2-3. This is a collection of related data with an implicit meaning and hence is a database.

The above definition of database is quite general; for example, we may consider the collection of words that make up this page of text to be related data and hence a database. However, the common use of the term database is usually more restricted. A database has the following implicit properties:

• A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot be referred to as a database.
• A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
• A database represents some aspect of the real world, sometimes called the mini world. Changes to the mini world are reflected in the database.

In other words, a database has some source from which data are derived, some degree of interaction with events in the real world, and an audience that is actively interested in the contents of the database.

A database can be of any size and of varying complexity.
For example, the list of names and addresses referred to earlier may have only a couple of hundred records in it, each with a simple structure. On the other hand, the card catalog of a large library may contain half a million cards stored under different categories (by primary author's last name, by subject, by book title, and the like), with each category organized in alphabetic order. A database of even greater size and complexity may be the one maintained by the Internal Revenue Service to keep track of the tax forms filed by taxpayers of the United States. If we assume that there are 100 million taxpayers and each taxpayer files an average of five forms with approximately 200 characters of information per form, we would get a database of 100 × 10^6 × 200 × 5 characters (bytes) of information. Assuming the IRS keeps the past three returns for each taxpayer in addition to the current return, we would get a database of 4 × 10^11 bytes. This huge amount of information must somehow be organized and managed so that users can search for, retrieve, and update the data as needed.

A database may be generated and maintained manually or by machine. Of course, in this text we are mainly interested in computerized databases. The library card catalog is an example of a database that may be created and maintained manually. A computerized database may be created and maintained either by a group of application programs written specifically for that task or by a database management system.

A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. Defining a database involves specifying the types of data to be stored in the database, along with a detailed description of each type of data.
Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the mini world, and generating reports from the data.

Note that it is not necessary to use general-purpose DBMS software to implement a computerized database. We could write our own set of programs to create and maintain the database, in effect creating our own special-purpose DBMS software. In either case, whether we use a general-purpose DBMS or not, we usually have a considerable amount of software to manipulate the database in addition to the database itself. The database and software are together called a database system.

2. Data Models

One of the fundamental characteristics of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. A data model is the main tool for providing this abstraction. A data model is a set of concepts that can be used to describe the structure of a database. By structure of a database, we mean the data types, relationships, and constraints that should hold on the data. Most data models also include a set of operations for specifying retrievals and updates on the database.

Categories of Data Models

Many data models have been proposed. We can categorize data models based on the types of concepts they provide to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data, whereas low-level or physical data models provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users.
Between these two extremes is a class of implementation data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Implementation data models hide some details of data storage but can be implemented on a computer system in a direct way.

High-level data models use concepts such as entities, attributes, and relationships. An entity is an object that is represented in the database. An attribute is a property that describes some aspect of an object. Relationships among objects are easily represented in high-level data models, which are sometimes called object-based models because they mainly describe objects and their interrelationships.

Implementation data models are the ones used most frequently in current commercial DBMSs and include the three most widely used data models: relational, network, and hierarchical. They represent data using record structures and hence are sometimes called record-based data models.

Physical data models describe how data is stored in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records much faster.

3. Classification of Database Management Systems

The main criterion used to classify DBMSs is the data model on which the DBMS is based. The data models used most often in current commercial DBMSs are the relational, network, and hierarchical models. Some recent DBMSs are based on conceptual or object-oriented models. We will categorize DBMSs as relational, hierarchical, and others.

Another criterion used to classify DBMSs is the number of users supported by the DBMS. Single-user systems support only one user at a time and are mostly used with personal computers. Multiuser systems include the majority of DBMSs and support many users concurrently.

A third criterion is the number of sites over which the database is distributed.
Most DBMSs are centralized, meaning that their data is stored at a single computer site. A centralized DBMS can support multiple users, but the DBMS and database themselves reside totally at a single computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites connected by a computer network. Homogeneous DDBMSs use the same DBMS software at multiple sites. A recent trend is to develop software to access several autonomous preexisting databases stored under heterogeneous DBMSs. This leads to a federated DBMS (or multidatabase system), where the participating DBMSs are loosely coupled and have a degree of local autonomy.

We can also classify a DBMS on the basis of the types of access path options available for storing files. One well-known family of DBMSs is based on inverted file structures. Finally, a DBMS can be general purpose or special purpose. When performance is a prime consideration, a special-purpose DBMS can be designed and built for a specific application; such a system cannot be used for other applications. Many airline reservations and telephone directory systems are special-purpose DBMSs.

Let us briefly discuss the main criterion for classifying DBMSs: the data model. The relational data model represents a database as a collection of tables, which look like files. Most relational databases have high-level query languages and support a limited form of user views.

The network model represents data as record types and also represents a limited type of 1:N relationship, called a set type. The network model, also known as the CODASYL DBTG model, has an associated record-at-a-time language that must be embedded in a host programming language.

The hierarchical model represents data as hierarchical tree structures. Each hierarchy represents a number of related records. There is no standard language for the hierarchical model, although most hierarchical DBMSs have record-at-a-time languages.

4. Client-Server Architecture

Many varieties of modern software use a client-server architecture, in which requests by one process (the client) are sent to another process (the server) for execution. Database systems are no exception. In the simplest client/server architecture, the entire DBMS is a server, except for the query interfaces that interact with the user and send queries or other commands across to the server. For example, relational systems generally use the SQL language for representing requests from the client to the server. The database server then sends the answer, in the form of a table or relation, back to the client. The division of work between client and server can be shifted to put more work in the client, since the server will become a bottleneck if there are many simultaneous database users.

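Workflow and Characteristics of Rural Collective Land Rights Confirmation and Registration: A Case Study of Huadu District, Guangzhou, Guangdong Province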


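ZHOU Yu. [Abstract] Improving China's system for confirming, registering, and certifying rural collective land rights is one of the important tasks of land management in China.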

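This will help strengthen rural land management, put rural land to productive use, promote reform of the land system, and achieve intensive land use.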

This work involves multiple departments and units, and the workload is enormous.

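Its main tasks include determining parcel boundaries, areas, and rights holders; building the cadastral database; examining and approving applications; and archiving.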

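Taking Huadu District, Guangzhou, as an example, this paper investigates the completeness of the basic rural cadastral data in the district, analyzes the land situation of each town and subdistrict, reviews the entire workflow of rural collective land rights confirmation and registration, and summarizes the new methods and characteristics adopted in the 2012–2013 nationwide work of confirming and registering rural collective land rights down to the "economic cooperative" level.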

Improving the system of rural collective land right verification, registration, and certification is among the most important tasks of land management in China; it will strengthen rural land management, make efficient use of land, promote institutional reform of land, and achieve intensive use of land. Many departments are involved in this work, whose main tasks include identifying parcel boundaries, areas, and ownership, constructing databases, examining and approving applications, and archiving the files. Taking Huadu District as an example, this paper analyzes and summarizes the new modes and characteristics of rural collective land verification and registration from 2012 to 2013, based on an investigation of basic rural land registration information and an analysis of the land situation.

Journal: Scientific and Technological Management of Land and Resources (《国土资源科技管理》). Year (Vol.), Issue: 2014(000)004. Total pages: 4 (pp. 139–142). Keywords: rural land; workflow; rights confirmation and registration. Author: ZHOU Yu. Affiliation: Guangdong Nuclear Industry Geological Surveying and Mapping Institute, Guangzhou, Guangdong 510800. Language: Chinese. CLC number: F301(265)

Rural collective land rights confirmation and certification covers two aspects: the confirmation and registration of rural collective land ownership, and the confirmation and registration of rural collective land use rights [1].

A New Domain-Based Method for Predicting Protein Functional Classes

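This method computes the likelihood that a domain belongs to each functional class locally: only the domain-composition information of the proteins that belong to a given functional class is counted, while the information from proteins outside that class is ignored. This simple method therefore does not make full use of all the available information.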
Vol. 49, No. 20
October 2004
Article
YU Xiaojing①②*, LIN Jiancheng①*, SHI Tieliu①†, LI Yixue①†
Functional classification catalogue
Function
The simple likelihood measure is calculated by the following formula:

$$F(D_m, C_n) = \frac{S_{mn}}{N_{mn}}$$
3 CELL CYCLE AND DNA PROCESSING
4 TRANSCRIPTION
5 PROTEIN SYNTHESIS
6 PROTEIN FATE (folding, modification, destination)
7 CELLULAR TRANSPORT AND TRANSPORT MECHANISMS
8 CELL RESCUE, DEFENSE AND VIRULENCE
9 REGULATION OF/INTERACTION WITH CELLULAR ENVIRONMENT
10 CELL FATE
11 CONTROL OF CELLULAR ORGANIZATION
12 SUBCELLULAR LOCALISATION
13 TRANSPORT FACILITATION
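The simple measure F(D_m, C_n) = S_mn / N_mn can be computed directly from a training set of proteins. The sketch below reads S_mn as the number of occurrences of domain D_m in proteins of class C_n and N_mn as the total number of domain occurrences in proteins of class C_n, following the method's verbal definition. The toy proteins, the domain names DomA, DomB, and DomC, and the class assignments are hypothetical, and counting a multi-class protein once in each of its classes is an assumption.

```python
from collections import Counter, defaultdict


def simple_f(training):
    """Return {(domain, cls): F} for the 'simple' likelihood measure.

    training: iterable of (domains, classes) pairs, one per protein, where
    domains lists the protein's domain occurrences and classes its categories.
    F(D_m, C_n) = occurrences of D_m in class C_n / all domain occurrences in C_n.
    """
    per_class = defaultdict(Counter)          # class -> Counter of domain occurrences
    for domains, classes in training:
        for cls in classes:
            per_class[cls].update(domains)    # only proteins inside cls contribute
    f = {}
    for cls, counts in per_class.items():
        total = sum(counts.values())
        for domain, count in counts.items():
            f[(domain, cls)] = count / total
    return f


training = [
    (["DomA", "DomB"], {"CELL CYCLE AND DNA PROCESSING"}),
    (["DomA"], {"CELL CYCLE AND DNA PROCESSING", "TRANSCRIPTION"}),
    (["DomC", "DomC"], {"TRANSCRIPTION"}),
]
f = simple_f(training)
print(round(f[("DomA", "CELL CYCLE AND DNA PROCESSING")], 3))  # 0.667
```

The comment on the counting loop marks exactly the limitation discussed for this method: proteins outside a class never influence that class's F values.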
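Requirement (structural or catalytic)" contain too few ORFs (fewer than 25) for statistical calculation, so they were not included in the analysis. The PFAM database contains complete profiles of protein domains obtained with multiple sequence alignments and hidden Markov models (HMMs). Domain boundaries, family members, and alignments are obtained by a semi-automatic method based on expert knowledge, sequence similarity, HMM profiles, and other protein family databases [33,34]. This study used PFAM-A (Version 8.0), a subset of the PFAM database that contains 5193 well-characterized domain types. Another subset of PFAM, PFAM-B, contains a large number of small families generated automatically by the Domainer program [35]; however, these alignments are less reliable, do not yield HMM profiles, and are not stably supported or annotated [33], so PFAM-B was not used for the domain composition of proteins. The PFAM domain composition of all yeast proteins in Swiss-Prot 40 was extracted [36]. Of the 3010 yeast proteins with domain information, 1517 have ORFs that are already classified in the MIPS functional catalogue. The data were therefore divided into two independent sets: a training set of 1200 proteins and a test set of 317 proteins. The domain composition of the proteins in the training set was then used to calculate the likelihood that each domain belongs to each functional class. Next, the method was evaluated by predicting the classes of the proteins in the test set. Finally, the results obtained with the "simple" method and the MLE method were compared.

(ii) Simple method [28]. F(D_m, C_n) denotes the likelihood that domain D_m belongs to functional class C_n. We constructed an intuitive, simple F measure: the number of occurrences of domain D_m in all proteins of functional class C_n divided by the total number of domains contained in all proteins of functional class C_n. In this way, the simple likelihood measure is given by the ratio of these two counts.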

Constructing Bio-molecular Databases on a DNA-based Computer

Weng-Long Chang¹
¹ Contact author: Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, 415 Chien Kung Road, Kaohsiung 807, Taiwan, R.O.C. E-mail: changwl@.tw

Michael (Shan-Hui) Ho²
² Department of Information Management, School of Information Technology, Ming Chuan University, 5 Teh-Ming Rd., Gwei-Shan, 333 Taoyuan, Taiwan, R.O.C. E-mail: MHoInCerritos@

Minyi Guo³
³ Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan. E-mail: minyi@u-aizu.ac.jp

Codd [Codd 1970] wrote the first paper in which the model of a relational database was proposed. Adleman [Adleman 1994] wrote the first paper in which DNA strands in a test tube were used to solve an instance of the Hamiltonian path problem. [Adleman 1994] indicates that storing information in DNA molecules allows an information density of approximately 1 bit per cubic nanometer, a dramatic improvement over existing storage media such as videotape, which stores information at a density of approximately 1 bit per 10^12 cubic nanometers. This paper demonstrates that biological operations can be applied to construct bio-molecular databases in which the data records of relational tables are encoded as DNA strands. To achieve this goal, DNA algorithms are proposed to perform eight operations of relational algebra (calculus) on bio-molecular relational databases: Cartesian product, union, set difference, selection, projection, intersection, join, and division.
Furthermore, this work presents clear evidence of the ability of molecular computing to perform data retrieval operations on bio-molecular relational databases.

Categories and Subject Descriptors: H.3.0 [Information Storage and Retrieval]: General; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Retrieval models; D.3.0 [Programming Languages]: General; D.3.1 [Programming Languages]: Formal Definitions and Theory – Syntax, Semantics; D.3.m [Programming Languages]: Miscellaneous

General Terms: Relational Databases, Bio-molecular Relational Databases, Molecular Computing

Additional Key Words and Phrases: Relational Algebra (Calculus), Bio-molecular Relational Algebra (Calculus), DNA-based Supercomputing

1. INTRODUCTION

In 1970, Codd [Codd 1970] wrote the first paper in which a new model for database structure and design appeared: the relational model. The relational model of [Codd 1970] is the first incarnation of relational database systems and an enormous advance over other database models. In 1994, Adleman [Adleman 1994] succeeded in solving an instance of the Hamiltonian path problem in a test tube by manipulating DNA strands. [Guo et al. 2005] points out that the optimal solution of every NP-complete or NP-hard problem is determined from its characteristic. DNA-based algorithms have been proposed to solve many computational problems. These include satisfiability [Lipton 1995], the maximal clique problem [Ho et al. 2004], the set-packing problem [Ho et al. 2004], the set-splitting problem [Chang et al. 2004], the set-cover problem and the problem of exact cover by 3-sets [Chang and Guo 2004], the subset-product problem [Ho 2005], the binary integer programming problem [Yeh et al. 2006], the dominating-set problem [Guo et al. 2004], the maximum cut problem [Xiao et al.
2004], real DNA experiments of Knapsack problems [Henkel et al. 2007] and the set-partition problem [Chang 2007]. One potentially significant area of application for DNA algorithms is the breaking of encryption schemes [Chang et al. 2005; Boneh et al. 1996; Adleman et al. 1999; Chang et al. 2004]. From [Guarnieri et al. 2006; Ahrabian and Nowzari-Dalini 2004] DNA-based arithmetic algorithms are proposed.On the other hand, molecular dynamics and (sequential) membrane systems from the viewpoint of Markov chain theory were proposed from [Muskulus et al. 2006]. Reif and LaBean [Reif and LaBean 2007] overviewed the past and current states of the emerging research area of the field of bio-molecular devices. Wu and Seeman [Wu and Seeman 2006] described the computation using a DNA strand as the basic unit and they had used this unit to achieve the function of multiplication. It was reported in [Macdonald et al. 2006] that a second-generation deoxyribozyme-based automaton MAYA-II, which plays a complete game of tic-tac-toe according to a perfect strategy, integrates 128 deoxyribozyme-based logic gates, 32 input DNA molecules, and 8 two-channel fluorescent outputs across 8 wells. The first direct observations of the tile-based DNA self-assembly in solution, using fluorescent nanotubes composed of a single tile, was presented in [Ekani-Nkodo et al. 2004]. In [Dehnert et al 2006], it was found that with increasing range of correlations the capacity to distinguish between the species on the basis of this correlation profile is getting better and requires ever shorter sequence segments for obtaining a full species separation. In [Müller et al. 2006], it was shown that “open” tweezers exist in a single conformation with minimal FRET efficiency. From [Dirks et al. 
2007], the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands was proposed.DES (the United States Data Encryption Standard) is one of the most widely used cryptographic systems. It produces a 64-bit ciphertext from a 64-bit plaintext under the control of a 56-bit key. A cryptanalyst obtains a plaintext and its corresponding ciphertext and wishes to determine the key used to perform the encryption. The most naive approach to this problem is to try all 256 keys, encrypting the plaintext under each key until a key that produces the ciphertext is found and is called the plaintext-ciphertext attack. Adleman and his co-authors [Adleman et al. 1999] provided a description of such an attack using the sticker model of molecular computation. Start with approximately 256 identical ssDNA memory strands each 11580 nucleotides long. Each memory strand contains 579 contiguous blocks each 20 nucleotides long. As it is appropriate in the sticker model there are 579 stickers⎯one complementaryto each block. Memory strands with annealed stickers are called memory complexes. When the 256 memory complexes have half of their sticker positions occupied at the end of the computation, they weigh approximately 0.7 g and, in solution at 5 g/liter, would occupy approximately 140 ml. Hence, the volume of the 1303 tubes needs be no more than 140 ml each. It follows that the 1303 tubes occupy, at most, 182 L and can, for example, be arrayed in 1 m long and wide and 18 cm deep.Adleman and his co-authors [Adleman et al. 1999] indicated that at the end of computation for breaking DES, 256× (56 key bits + 64 ciphertext bits) pairs were generated and processed. Adleman and his co-authors [Adleman et al. 1999] also pointed out that this codebook for breaking DES has approximately 263 (8 × 1018) bits of information (the equivalent of approximately one billion 1 gigabyte CDs). 
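The sticker-model figures quoted from [Adleman et al. 1999] can be cross-checked with simple arithmetic. The Python sketch below only recomputes numbers already stated in the text (strand length, solution volume, total tube volume, codebook size); it does not model any chemistry, and the decimal-gigabyte convention used for the CD comparison is an assumption.

```python
import math

# Figures as stated in the text for the sticker-model DES attack.
BLOCKS = 579           # blocks (and stickers) per memory strand
BLOCK_LEN = 20         # nucleotides per block
MASS_G = 0.7           # mass of the 2^56 memory complexes, in grams
CONC_G_PER_L = 5.0     # concentration of the solution, grams per liter
TUBES = 1303           # number of tubes

strand_len = BLOCKS * BLOCK_LEN          # nucleotides per memory strand
tube_l = MASS_G / CONC_G_PER_L           # volume of all complexes; bounds each tube
total_l = TUBES * tube_l                 # upper bound for all tubes together
array_l = 1.0 * 1.0 * 0.18 * 1000        # 1 m x 1 m x 18 cm, in liters

codebook_bits = 2**56 * (56 + 64)        # 2^56 (key, ciphertext) pairs
log2_bits = math.log2(codebook_bits)     # compare with 2^63
cds = codebook_bits / 8 / 1e9            # 1-gigabyte CDs, decimal GB assumed

print(strand_len)                # 11580
print(round(tube_l * 1000))      # 140 (ml)
print(round(total_l, 1))         # 182.4 (L)
print(round(array_l, 1))         # 180.0 (L), roughly the 182 L bound
print(f"{codebook_bits:.2e}")    # 8.65e+18
print(round(log2_bits, 1))       # 62.9, i.e. approximately 2^63
print(f"{cds:.2e}")              # 1.08e+09, about one billion CDs
```

The 1 m x 1 m x 18 cm arrangement gives 180 L, which is why the text presents it only as an example of a space that roughly accommodates the 182 L bound.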
The actual running time of the algorithm for breaking DES depends on how fast the operations can be performed. If each operation requires 1 day, the computation will require 18 years. If each operation requires 1 hour, it will require approximately 9 months. If each operation can be completed in 1 minute, it will take 5 days. Finally, if the effective duration of a step can be reduced to 1 second, breaking DES will require 2 hours. While it has been argued that special-purpose electronic hardware [Adleman et al. 1999] or massively parallel supercomputers (the IBM Blue Gene/L machine is capable of 183.5 TFLOPS, or $183.5 \times 10^{12}$ floating-point operations per second) might be used to break DES in a reasonable amount of time, it appears that today's most powerful sequential machines would be unable to accomplish the task.

In this paper, we first use the DNA sequence design method of [Braich et al. 2000; Braich et al. 2002] to construct solution spaces of DNA strands that encode every domain of a relational model [Codd 1970; Ullman and Widom 1997]. Then, using basic biological operations, we develop DNA-based algorithms that perform eight operations of relational algebra (calculus): Cartesian product, union, set difference, selection, projection, intersection, join and division. Furthermore, this work offers clear evidence of the ability of molecular computing to perform data retrieval operations on bio-molecular relational databases.

The paper is organized as follows. Section 2 introduces the DNA models of computation proposed by Adleman and his co-authors. Section 3 introduces DNA programs that perform the eight operations of relational algebra (calculus) on bio-molecular relational databases. Experimental results from simulated DNA computing and the conclusions are presented in Section 4 and Section 5, respectively.

2. BACKGROUND

In this section we review the basic structure of the DNA molecule and then discuss the available techniques for manipulating DNA that will be used to perform the eight operations of relational algebra (calculus): Cartesian product, union, set difference, selection, projection, intersection, join and division.

2.1. THE STRUCTURE OF DNA

According to [Sinden 1994; Paun et al. 1998], DNA (DeoxyriboNucleic Acid) is the molecule that plays the main role in DNA-based computing. In the biochemical world of large and small molecules, polymers and monomers, DNA is a polymer strung together from monomers called deoxyribonucleotides. Each deoxyribonucleotide contains three components: a sugar, a phosphate group, and a nitrogenous base. The sugar has five carbon atoms; for the sake of reference they are numbered from 1' to 5'. The phosphate group is attached to the 5' carbon, the nitrogenous base is attached to the 1' carbon, and a hydroxyl group is attached to the 3' carbon. Figure 1 shows the chemical structure of a nucleotide [Sinden 1994; Paun et al. 1998].

As stated in [Sinden 1994; Paun et al. 1998], nucleotides are distinguished only by their bases, which come in two sorts: purines and pyrimidines. Purines include adenine and guanine, abbreviated A and G. Pyrimidines include cytosine and thymine, abbreviated C and T. Because nucleotides differ only in their bases, they are simply called A, G, C, or T nucleotides, depending on the base they carry.

Figure 1: The chemical structure of a nucleotide.

According to [Sinden 1994; Paun et al. 1998], nucleotides can be linked together in two different ways. In the first, the 5'-phosphate group of one nucleotide is joined with the 3'-hydroxyl group of the other, forming a phosphodiester bond.
The resulting molecule has the 5'-phosphate group of one nucleotide (the 5' end) and the 3'-hydroxyl group of the other nucleotide (the 3' end) available for further bonding. This gives the molecule a directionality, so we can speak of the 5'-to-3' or the 3'-to-5' direction. In the second, the base of one nucleotide interacts with the base of another to form hydrogen bonds. This bonding is subject to the following restriction on base pairing: A pairs with T, and C pairs with G; no other pairings are possible. This pairing principle is called Watson-Crick complementarity (named after James D. Watson and Francis H. C. Crick, who deduced the double helix structure of DNA in 1953 and won the Nobel Prize for the discovery).

According to [Sinden 1994; Paun et al. 1998], a DNA strand is essentially a sequence (polymer) of four types of nucleotides, identified by the base each one contains. Two strands of DNA can form (under appropriate conditions) a double strand if their respective bases are Watson-Crick complements of each other (A matches T and C matches G) and the strands are antiparallel, so that the 3' end of one strand lies opposite the 5' end of the other. The length of a single-stranded DNA is the number of nucleotides it comprises. Thus, if a single-stranded DNA includes 20 nucleotides, we say that it is a 20-mer (a polymer containing 20 monomers). The length of a double-stranded DNA (where each nucleotide is base paired) is counted in base pairs. Thus, if we make a double-stranded DNA from a single-stranded 20-mer, its length is 20 base pairs, also written 20 bp. Hybridization is the technical term for the pairing of two single DNA strands into a double helix; it exploits the specificity of DNA base pairing to detect specific DNA strands (for more discussion of the relevant biological background, please refer to [Sinden 1994; Paun et al. 1998]).

2.2. ADLEMAN'S EXPERIMENT FOR SOLUTION OF A SATISFIABILITY PROBLEM

Adleman and his co-authors [Braich et al. 2000; Braich et al. 2002] performed experiments that solved, respectively, a 6-variable 11-clause formula and a 20-variable 24-clause 3-conjunctive normal form (3-CNF) formula. A Lipton encoding [Lipton 1994] was used to represent all possible variable assignments for the chosen 6-variable or 20-variable SAT problem. For each of the 6 variables $x_1, \ldots, x_6$, two distinct 15-base value sequences were designed: one represents true (T), $x_k^T$, and the other represents false (F), $x_k^F$, for $1 \le k \le 6$. Each of the $2^6$ truth assignments was represented by a library sequence of 90 bases consisting of the concatenation of one value sequence for each variable. DNA molecules with library sequences are termed library strands, and a combinatorial pool containing library strands is termed a library. The 6-variable library strands were synthesized using a mix-and-split combinatorial synthesis technique [Braich et al. 2002]. The library strands were assigned library sequences with $x_1$ at the 5' end and $x_6$ at the 3' end (5'-$x_1$-$x_2$-$x_3$-$x_4$-$x_5$-$x_6$-3'). Synthesis therefore began by assembling the two 15-base oligonucleotides with sequences $x_6^T$ and $x_6^F$. This process was repeated until all 6 variables had been treated.

The probes used for separating the library strands have sequences complementary to the value sequences. Errors in the separation of the library strands are errors in the computation. Sequences must be designed so that library strands have little secondary structure that might inhibit the intended probe-library hybridization. The design must also exclude sequences that might encourage unintended probe-library hybridization. To help achieve these goals, sequences were computer-generated to satisfy the seven constraints proposed in [Braich et al. 2002]. A similar method was applied to solve the 20-variable 3-SAT instance [Braich et al. 2002].

2.3. DNA MANIPULATIONS

In the last decade there have been revolutionary advances in the field of biomedical engineering, particularly in recombinant DNA and RNA manipulation. Due to the industrialization of the biotechnology field, laboratory techniques for recombinant DNA and RNA manipulation are becoming highly standardized. Basic principles of recombinant DNA can be found in [Sinden 1994; Paun et al. 1998]. In this subsection we describe eight biological operations that are useful for implementing the eight operations of relational algebra (calculus). The method of constructing the DNA solution space for these operations is based on the method proposed in [Braich et al. 2000; Braich et al. 2002].

A (test) tube is a set of molecules of DNA (a multiset of finite strings over the alphabet {A, C, G, T}). Given a tube, one can perform the following operations:

1. Extract. Given a tube P and a short single strand of DNA, S, the operation produces two tubes, +(P, S) and −(P, S), where +(P, S) contains all of the molecules of DNA in P that contain S as a substrand and −(P, S) contains all of the molecules of DNA in P that do not contain S.
2. Merge. Given tubes P1 and P2, yield ∪(P1, P2), where ∪(P1, P2) = P1 ∪ P2. This operation pours two tubes into one, without any change to the individual strands.
3. Detect. Given a tube P, return 'yes' if P includes at least one DNA molecule, and 'no' if P contains no DNA molecule.
4. Discard. Given a tube P, the operation discards P.
5. Amplify. Given a tube P, the operation Amplify(P, P1, P2) produces two new tubes, P1 and P2, that are exact copies of P (so P1 and P2 are identical), and P becomes an empty tube.
6. Append. Given a tube P and a short strand of DNA, Z, the operation appends Z onto the end of every strand in P.
7. Append-head. Given a tube P and a short strand of DNA, Z, the operation appends Z onto the head of every strand in P.
8. Read.
Given a tube P, the operation describes a single molecule contained in tube P. Even if P contains many different molecules, each encoding a different set of bases, the operation gives an explicit description of exactly one of them.

3. CONSTRUCTING BIO-MOLECULAR RELATIONAL DATABASES

3.1. THE INTRODUCTION TO A RELATIONAL VIEW OF DATA

The term relation is used here in its accepted mathematical sense. Given sets $S_1, S_2, \ldots, S_n$ (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from $S_1$, its second element from $S_2$, and so on [Codd 1970]. More concisely, R is a subset of the Cartesian product $S_1 \times S_2 \times \cdots \times S_n$. We refer to $S_j$ as the j-th domain of R. As defined above, R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary. For expository reasons, we frequently use an array representation of relations. An array that represents an n-ary relation R has the following properties [Codd 1970]:

(1) Each row represents an n-tuple of R.
(2) The ordering of rows is immaterial.
(3) All rows are distinct.
(4) The ordering of columns is significant: it corresponds to the ordering $S_1, S_2, \ldots, S_n$ of the domains on which R is defined.
(5) The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

The example in Figure 2 illustrates a relation of degree 2, called employee, which maps each employee's number to that employee's name within the same company.

Employee's number | Employee's name
1                 | Carrie Fisher
2                 | Mark Hamill

Figure 2: A relation of degree 2.

3.2. DNA ALGORITHMS FOR THE CARTESIAN PRODUCT ON BIO-MOLECULAR DATABASES

The Cartesian product (or cross-product, or just product) of n sets $S_1, S_2, \ldots, S_n$ is the set of n-tuples that can be formed by choosing the first element of the tuple to be any element of $S_1$, the second element to be any element of $S_2$, and so on [Codd 1970; Ullman and Widom 1997]. Assume that $L_k$ is the number of bits for the value of each element in $S_k$, for $1 \le k \le n$. Also suppose that R is an n-ary relation with m elements, $R = \{(r_{i,1}, \ldots, r_{i,n}) \mid r_{i,k} \in S_k,\ 1 \le k \le n,\ 1 \le i \le m\}$. Suppose further that the value $r_{i,k}$ in R can be represented as a binary number $v_{i,k,1} \ldots v_{i,k,L_k}$ for $1 \le k \le n$ and $1 \le i \le m$, where the bits $v_{i,k,1}$ and $v_{i,k,L_k}$ are, respectively, the first and the last bit of $r_{i,k}$. Following [Braich et al. 2000; Braich et al. 2002], two distinct 15-base value sequences are designed for every bit $v_{i,k,j}$, $1 \le j \le L_k$: one represents the value "0" for $v_{i,k,j}$ and the other represents the value "1". For convenience of presentation, $v_{i,k,j}^1$ denotes that $v_{i,k,j}$ has the value 1, $v_{i,k,j}^0$ denotes that it has the value 0, and $v_{i,k,j}$ denotes that it has the value 0 or 1. The following DNA algorithms implement a relational algebra (calculus) operation, the Cartesian product, to construct a bio-molecular database R.

Procedure Insert(T80, i)
(1) For k = 1 to n
(2)   For j = 1 to $L_k$
(2a)     Append(T80, $v_{i,k,j}$).
      EndFor
    EndFor
EndProcedure

Lemma 3-1: One record of a bio-molecular database R can be constructed as a library sequence by the algorithm Insert(T80, i).

Proof: The algorithm Insert(T80, i) is implemented via the append operation. It consists of one nested loop. The outer loop runs over the n fields of the record being inserted into the bio-molecular database R. The inner loop constructs each field of that record.
Each execution of Step (2a) appends a DNA value sequence, representing the value 0 or 1 of $v_{i,k,j}$, onto the end of every strand in tube T80; that is, the value of the j-th bit in the k-th field of the i-th record of R now appears in tube T80. After Step (2a) has been executed for every bit, tube T80 contains a DNA sequence of $15 \times (L_1 + L_2 + \cdots + L_n)$ bases representing one record of R. Therefore, one record of a bio-molecular database R can be constructed as a library sequence. ■

Insert(T80, i) thus takes ($n \times L_k$) append operations and one test tube to insert one record into the bio-molecular database R, where, here and below, $n \times L_k$ abbreviates the total number of bits $L_1 + L_2 + \cdots + L_n$ (the two are equal when every field uses the same number of bits). A binary number of ($n \times L_k$) bits corresponds to one record of R, and the value sequence for every bit contains 15 bases. Therefore, the DNA strand encoding a record of R is ($15 \times n \times L_k$) bases long, consisting of the concatenation of one value sequence for each bit.

Procedure CartesianProduct(T0, m)
(1) For i = 1 to m
(1a)   Insert(T80, i).
(1b)   T0 = ∪(T0, T80).
    EndFor
EndProcedure

Lemma 3-2: A bio-molecular database R can be constructed from library sequences by the algorithm CartesianProduct(T0, m).

Proof: The algorithm CartesianProduct(T0, m) is implemented via the append and merge operations. It contains a single loop, which inserts m records into the bio-molecular database R. Each execution of Step (1a) calls the procedure Insert(T80, i) to construct one record (including n fields) of R; that is, the i-th record of R appears in tube T80. Step (1b) then pours tube T80 into tube T0, so the i-th record of R appears in tube T0 and tube T80 becomes empty. After repeated execution of Steps (1a) and (1b), tube T0 contains m DNA sequences representing the m records of R. Therefore, a bio-molecular database R can be constructed from library sequences. ■

CartesianProduct(T0, m) takes ($m \times n \times L_k$) append operations, m merge operations and two tubes to construct the bio-molecular database R. As with Insert(T80, i), each record corresponds to a binary number of ($n \times L_k$) bits and each bit is encoded by a 15-base value sequence, so each DNA strand encoding a record of R is ($15 \times n \times L_k$) bases long, consisting of the concatenation of one value sequence for each bit.

3.3. DNA ALGORITHM FOR SET OPERATIONS ON BIO-MOLECULAR DATABASES

The three most common operations on sets are union, intersection, and difference. The following definitions, cited from [Ullman and Widom 1997], explain how these operations act on arbitrary sets X and Y.

Definition 3-1: X ∪ Y, the union of X and Y, is the set of elements that are in X or Y or both. An element appears only once in the union even if it is present in both X and Y.

Definition 3-2: X ∩ Y, the intersection of X and Y, is the set of elements that are in both X and Y.

Definition 3-3: X − Y, the difference of X and Y, is the set of elements that are in X but not in Y. Note that X − Y differs from Y − X; the latter is the set of elements that are in Y but not in X.

When we apply these operations to n-ary relations, we need to put some conditions on X and Y. First, X and Y must have identical sets of columns, and the domain of each column must be the same in X and Y. Second, before we compute the set-theoretic union, intersection, or difference of sets of tuples, the columns of X and Y must be ordered so that their order is the same for both relations. DNA algorithms for these operations are proposed in subsections 3.3.1, 3.3.2 and 3.3.3, respectively.

3.3.1. A DNA ALGORITHM FOR THE UNION OPERATOR ON BIO-MOLECULAR DATABASES

Assume that X and Y are n-ary relations with p and q elements, respectively. Suppose that $X = \{(r_{i,1}, \ldots, r_{i,n}) \mid r_{i,k} \in S_k,\ 1 \le k \le n,\ 1 \le i \le p\}$ and $Y = \{(r_{i,1}, \ldots, r_{i,n}) \mid r_{i,k} \in S_k,\ 1 \le k \le n,\ 1 \le i \le q\}$. After the two DNA algorithms CartesianProduct(T1, p) and CartesianProduct(T2, q) have been performed, tube T1 contains p DNA sequences representing the p records of X and tube T2 contains q DNA sequences representing the q records of Y. The following DNA algorithm performs X ∪ Y; the notation it uses is defined in Section 3.2.

Procedure Union(T1, T2, T3, p)
(1) Amplify(T1, T11, T12).
(2) Amplify(T2, T21, T22).
(3) T1 = ∪(T1, T11).
(4) T2 = ∪(T2, T21).
(5) For i = 1 to p
(6)   For k = 1 to n
(7)     For j = 1 to $L_k$
(7a)       T22 = +(T22, $v_{i,k,j}$) and T22OFF = −(T22, $v_{i,k,j}$).
(7b)       T22ON = ∪(T22ON, T22OFF).
        EndFor
      EndFor
(7c)  Discard(T22).
(7d)  T22 = ∪(T22, T22ON).
    EndFor
(8) T3 = ∪(T12, T22).
EndProcedure

Lemma 3-3: The union operator on two n-ary relations can be performed with library sequences by the algorithm Union(T1, T2, T3, p).

Proof: The algorithm Union(T1, T2, T3, p) is implemented via the amplify, merge, extract and discard operations. DNA strands in tube T1 represent the p elements of X, and DNA strands in tube T2 represent the q elements of Y. Step (1) amplifies tube T1 to generate two new tubes, T11 and T12, which are copies of T1, and tube T1 becomes empty. Step (2) likewise amplifies tube T2 to generate two new tubes, T21 and T22, which are copies of T2, and tube T2 becomes empty. Step (3) pours tube T11 into tube T1, so DNA strands representing the p elements of X are still preserved in tube T1. Step (4) then pours tube T21 into tube T2, so DNA strands representing the q elements of Y are still preserved in tube T2.
Steps (3) and (4) thus ensure that the elements of X and Y themselves are left unchanged while X ∪ Y is computed. Step (5) is the outer loop of the nested loop and checks whether each element of X also appears in Y. Steps (6) and (7) are the inner loops and examine whether the i-th element of X also appears in Y. Each execution of Step (7a) uses the extract operation to form two test tubes, T22 and T22OFF: the strands in tube T22 contain the value sequence for $v_{i,k,j}$, and the strands in tube T22OFF do not. Each execution of Step (7b) then uses the merge operation to pour tube T22OFF into tube T22ON, so tube T22ON collects the strands encoding elements of Y that differ from the i-th element of X. After the loops over k and j finish, tube T22 contains only strands encoding the i-th element of X (if that element also appears in Y), and tube T22ON contains the strands encoding all other elements of Y. Step (7c) then applies the discard operation to discard tube T22, and Step (7d) applies the merge operation to pour tube T22ON back into tube T22. After all p iterations of Steps (7a) through (7d), every element of Y that also appears in X has been removed from tube T22, while tube T12 still holds every element of X. This guarantees that each element in both X and Y appears only once in X ∪ Y. Finally, Step (8) uses the merge operation to pour tubes T12 and T22 into tube T3, so the DNA strands in tube T3 represent X ∪ Y. Therefore, X ∪ Y is performed by the algorithm Union(T1, T2, T3, p). ■

Union(T1, T2, T3, p) takes two amplify operations, ($p \times n \times L_k + p + 3$) merge operations, ($p \times n \times L_k$)
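The idealized behavior of Insert, CartesianProduct and Union can be checked with a small software simulation. The sketch below is a hypothetical model, not the authors' implementation: each strand is a Python tuple of labels (k, j, bit) standing in for the 15-base value sequences $v_{i,k,j}^0$ and $v_{i,k,j}^1$, a tube is a `collections.Counter` multiset, and extract, merge and append are assumed to be error-free. The helper names (`extract`, `merge`, `append`, `encode`, `decode`, `insert`, `cartesian_product`, `union`), the empty seed strand used by `insert`, and the 2-bit/3-bit example domains are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical, error-free model: a strand is a tuple of labels (k, j, bit),
# each label standing in for one 15-base value sequence; a tube is a multiset.
Tube = Counter


def extract(p, s):
    """Return (+(P, S), -(P, S)): strands that do / do not contain label s."""
    plus, minus = Tube(), Tube()
    for strand, count in p.items():
        (plus if s in strand else minus)[strand] += count
    return plus, minus


def merge(p1, p2):
    """Pour two tubes into one; the individual strands are unchanged."""
    return p1 + p2


def append(p, z):
    """Append label z onto the end of every strand in tube p."""
    return Tube({strand + (z,): count for strand, count in p.items()})


def encode(record, widths):
    """Labels v_{i,k,j} for one record; field k is an integer of widths[k-1] bits."""
    labels = []
    for k, (value, width) in enumerate(zip(record, widths), start=1):
        bits = format(value, f"0{width}b")
        labels += [(k, j, int(b)) for j, b in enumerate(bits, start=1)]
    return labels


def decode(strand, widths):
    """Recover the integer fields of a record from its labels."""
    fields, pos = [], 0
    for width in widths:
        bits = "".join(str(bit) for _, _, bit in strand[pos:pos + width])
        fields.append(int(bits, 2))
        pos += width
    return tuple(fields)


def insert(record, widths):
    """Insert(T80, i): append every bit label, starting from one empty strand."""
    t80 = Tube({(): 1})  # assumption: a single empty seed strand to append onto
    for label in encode(record, widths):
        t80 = append(t80, label)
    return t80


def cartesian_product(records, widths):
    """CartesianProduct(T0, m): build each record, then merge it into T0."""
    t0 = Tube()
    for record in records:
        t0 = merge(t0, insert(record, widths))
    return t0


def union(t1, t2, x_records, widths):
    """Union(T1, T2, T3, p): drop from a copy of Y every element of X, then merge."""
    t12, t22 = Tube(t1), Tube(t2)        # Steps (1)-(4): working copies of X and Y
    for record in x_records:             # Step (5): i = 1 .. p
        t22_on = Tube()
        for label in encode(record, widths):      # Steps (6)-(7)
            t22, t22_off = extract(t22, label)    # Step (7a)
            t22_on = merge(t22_on, t22_off)       # Step (7b)
        t22 = t22_on    # Steps (7c)-(7d): discard matches, keep the rest of Y
    return merge(t12, t22)               # Step (8)


widths = (2, 3)                 # hypothetical domains: L_1 = 2 bits, L_2 = 3 bits
X = [(1, 5), (2, 3)]
Y = [(2, 3), (0, 7)]
T1 = cartesian_product(X, widths)
T2 = cartesian_product(Y, widths)
T3 = union(T1, T2, X, widths)
print(sorted(decode(s, widths) for s in T3.elements()))  # [(0, 7), (1, 5), (2, 3)]
```

The shared record (2, 3) survives only once, through tube T12, matching Definition 3-1. Because each label is matched as a whole token, this model sidesteps the unintended hybridization that the sequence-design constraints of [Braich et al. 2002] are meant to prevent in real strands.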
