Transcriptome Data Analysis: Interpretation and Hands-on Practice, Part 1
– http://bowtie-bio.sourceforge.net/index.shtml/
– http://samtools.sourceforge.net/
– http://tophat.cbcb.umd.edu/
– http://cufflinks.cbcb.umd.edu/
– http://compbio.mit.edu/cummeRbund/
*Linux, 64-bit CPU, 16 GB memory
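Transcriptome Data Analysis: Interpretation and Hands-on Practice
Luo Qibin (罗奇斌), Beijing Institute of Genomics, Chinese Academy of Sciences; Technical University of Munich, Germany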
Second-generation sequencers
Routine analysis
Experimental workflow
Tools required for the analysis
• Bowtie software
• SAMtools
Web-based tools
• rQuant.web - a web service that provides convenient access to tools for the quantitative analysis of RNA-Seq data.
• Galaxy - mapping pipeline for Illumina, 454, and SOLiD sequencing data.
• UCSC Genome Browser - this site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.
• Bioconductor - an open source and open development software project for the analysis and comprehension of genomic data.
• ExpEdit - a web application for assessing RNA editing in human at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments.
• Myrna - a cloud computing tool for RNA sequence.
• GenePattern - a powerful genomic analysis platform that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis and common data processing tasks.
Others
• Scripture - a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio.
• CisGenome - an integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.
• TopHat software
• Cufflinks software
• CummeRbund software
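How the listed tools fit together can be sketched as an ordered set of command lines: build a Bowtie index, align reads with TopHat, then assemble and quantify transcripts with Cufflinks. The sketch below only builds the command lists and does not run anything. The function name `tuxedo_commands`, the file names, and the directory layout are hypothetical, and the assumption that TopHat writes `accepted_hits.bam` into its output directory depends on the TopHat version installed. Only the `-p` (threads) and `-o` (output directory) options are used.

```python
from pathlib import PurePosixPath


def tuxedo_commands(genome_fasta, index_prefix, reads_fastq, out_dir, threads=4):
    """Return the command lines for a minimal Bowtie -> TopHat -> Cufflinks run.

    All paths are illustrative; nothing is executed here.
    """
    out = PurePosixPath(out_dir)
    tophat_dir = out / "tophat"
    cufflinks_dir = out / "cufflinks"
    # Assumption: TopHat writes its alignments to accepted_hits.bam
    # (very old releases wrote SAM instead).
    alignments = tophat_dir / "accepted_hits.bam"
    return [
        # 1. Index the reference genome for Bowtie/TopHat.
        ["bowtie-build", genome_fasta, index_prefix],
        # 2. Spliced alignment of RNA-seq reads against the index.
        ["tophat", "-p", str(threads), "-o", str(tophat_dir),
         index_prefix, reads_fastq],
        # 3. Transcript assembly and abundance estimation.
        ["cufflinks", "-p", str(threads), "-o", str(cufflinks_dir),
         str(alignments)],
    ]


if __name__ == "__main__":
    for cmd in tuxedo_commands("genome.fa", "genome", "reads.fq", "run1"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; CummeRbund then reads the downstream Cuffdiff output from within R.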
• RNA-seq is a powerful tool for detecting the whole transcriptome in cells and tissues.
• Previous RNA-seq research focused on mRNA, but recent studies show that some functional noncoding transcripts and protein-coding RNAs lack a poly(A) tail.
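Biological replicates and standards for RNA-seq
1. Use at least two biological replicates, except for sampling over a short time gradient (overlapping time points with high temporal resolution); technical replicates are not required.
2. For species with good gene annotation, more than 20M reads is enough for purely quantitative comparison studies; a transcriptome used to annotate a genome needs more than 100M reads.
3. Ideally, include absolute-quantification controls (spike-ins) of varying concentration and length to assess mapping quality, sequencing uniformity, and RNA-seq quantification performance.
4. The 3'/5' ratio is a key indicator of RNA integrity (ideal value: 1) and should also be calculated and assessed.
5. Report everything needed to reproduce the work: sample processing protocol, library construction protocol, sequencing instrument, sequencing type, analysis software, key sample-assessment metrics, and key results as RPKM values.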
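The RPKM values mentioned in point 5 follow the standard definition, reads per kilobase of transcript per million mapped reads: RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the gene, N the total number of mapped reads in the sample, and L the gene (exon) length in base pairs. A minimal sketch, with a hypothetical function name and made-up example numbers:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    read_count         -- reads mapped to the gene (C)
    gene_length_bp     -- gene or exon-model length in base pairs (L)
    total_mapped_reads -- all mapped reads in the sample (N)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Illustrative numbers only: 500 reads on a 2 kb gene in a 20M-read library.
    print(rpkm(500, 2000, 20_000_000))  # 12.5
```

Because the value is normalized by both gene length and library size, it allows rough comparison between genes and between samples sequenced to different depths.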
Background
mRNA-seq
RNA-Seq Data Analysis Tools
Mapping and Assembly tools
• BWA - a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome.
• SeqMap - a tool for mapping millions of short sequences to the genome.
• MAQ - stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to reference sequences.
• ERANGE - mapping and quantifying mammalian transcriptomes by RNA-Seq.
• Cufflinks - assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
• iAssembler - a standalone package to assemble ESTs generated using Sanger and/or Roche-454 pyrosequencing technologies into contigs.
• MapPER - an RNA-seq paired-end read (PER) protocol. Supports splice mapping and quantification.
• TopHat - a fast splice junction mapper for RNA-Seq reads.
• SpliceMap - a de novo splice junction discovery tool. It offers high sensitivity and support for arbitrarily long RNA-seq read lengths.
• MapSplice - splice junction mapping tool.
• Trinity RNA-Seq Assembly - software solutions targeted to the reconstruction of full-length transcripts and alternatively spliced isoforms from Illumina RNA-Seq data.
• PALMapper - a combination of the spliced alignment method QPALMA with the short read alignment tool GenomeMapper.
Content of the transcriptome
1. Genes: expression, alternative splicing
2. Noncoding RNA: snoRNA, mRNA-like ncRNA, snRNA, some antisense transcripts, pseudogenes, retrotransposons, and other functional RNAs
3. Some repeat elements