– http://bowtie-bio.sourceforge.net/index.shtml/
– http://samtools.sourceforge.net/
– http://tophat.cbcb.umd.edu/
– http://cufflinks.cbcb.umd.edu/
– http://compbio.mit.edu/cummeRbund/
* Linux, 64-bit CPU, 16 GB memory
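Transcriptome Data Analysis: Interpretation and Hands-On Examples
Luo Qibin (罗奇斌), Institute of Genomics, Chinese Academy of Sciences; Technical University of Munich, Germany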
Second generation sequencers
• Bowtie software
• SAMtools
Web-based tools
• rQuant.web - a web service that provides convenient access to tools for the quantitative analysis of RNA-Seq data.
• Galaxy - Mapping pipeline for Illumina, 454, and SOLiD sequencing data.
• UCSC Genome Browser - This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.
• Bioconductor - an open source and open development software project for the analysis and comprehension of genomic data.
• ExpEdit - a web application for assessing RNA editing in human at known or user-specified sites, supported by transcript data obtained from RNA-Seq experiments.
• Myrna - a cloud computing tool for RNA sequence.
• GenePattern - a powerful genomic analysis platform that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis and common data processing tasks.
Others
• Scripture - a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio.
• CisGenome - An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.
• TopHat software
• Cufflinks software
• CummeRbund software
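The tools listed above are commonly chained as alignment (TopHat, which uses a Bowtie index) followed by transcript assembly (Cufflinks). The following is a minimal sketch that only builds the command lines and does not run them. The file names (`genome_index`, `sample_1.fq`, `genes.gtf`) and the helper function names are hypothetical and chosen for illustration; the options `-p` (threads), `-o` (output directory) and `-G` (annotation GTF) should be checked against the manual of the installed tool versions.

```python
import os


def tophat_command(index_prefix, reads, out_dir, annotation_gtf=None, threads=4):
    """Build a TopHat command line (hypothetical helper, nothing is executed)."""
    cmd = ["tophat", "-p", str(threads), "-o", out_dir]
    if annotation_gtf is not None:
        # Optional gene annotation to guide spliced alignment.
        cmd += ["-G", annotation_gtf]
    cmd.append(index_prefix)  # prefix of the Bowtie index files
    cmd.extend(reads)         # one FASTQ file, or two for paired-end reads
    return cmd


def cufflinks_command(alignments_bam, out_dir, threads=4):
    """Build a Cufflinks command line for assembling transcripts from alignments."""
    return ["cufflinks", "-p", str(threads), "-o", out_dir, alignments_bam]


# Illustrative inputs only: none of these files come from the slides.
align_dir = "tophat_out"
align = tophat_command("genome_index", ["sample_1.fq", "sample_2.fq"],
                       align_dir, annotation_gtf="genes.gtf")
# TopHat writes its alignments to accepted_hits.bam inside the output directory.
assemble = cufflinks_command(os.path.join(align_dir, "accepted_hits.bam"),
                             "cufflinks_out")

print(" ".join(align))
print(" ".join(assemble))
```

To actually run a step, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed.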
• RNA-seq is a powerful tool for detecting the whole transcriptome in cells and tissues.
• Earlier RNA-seq research focused on mRNA, but recent studies show that some functional noncoding transcripts and protein-coding RNAs lack a poly(A) tail.
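1. Use at least two biological replicates, unless the design is a short time series ("overlapping time points with high temporal resolution"). Technical replicates are not required.
2. For species with good gene annotation, where the study is only a quantitative comparison, more than 20M usable reads are needed; for a transcriptome used to annotate a genome, more than 100M.
3. Ideally, include absolute-quantification controls (spike-ins) of different concentrations and lengths to assess mapping quality, sequencing uniformity, and RNA-seq quantification performance.
4. The 3'/5' ratio is a key indicator of RNA integrity (the ideal value is 1) and should also be calculated and assessed.
5. Report completely: the sample processing procedure, library construction procedure, sequencing instrument, sequencing type, analysis software, key sample-assessment metrics, and key results such as RPKM values.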
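The RPKM value mentioned in the last point normalizes a gene's read count by both gene length and sequencing depth. A minimal sketch using the standard definition (reads per kilobase of transcript per million mapped reads) is shown below. The gene names, counts, and lengths are made-up illustration values, and the 20M total corresponds to the minimum depth suggested above.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up example: two genes with the same read count but different lengths.
total_mapped = 20_000_000
genes = {
    "geneA": {"reads": 500, "length_bp": 2000},
    "geneB": {"reads": 500, "length_bp": 1000},
}

for name, info in genes.items():
    value = rpkm(info["reads"], info["length_bp"], total_mapped)
    print(f"{name}: RPKM = {value:.2f}")
```

With the same number of reads, the shorter gene gets the higher RPKM, which is the purpose of the length normalization.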
RNA-Seq Data Analysis Tools
Mapping and Assembly tools
• BWA - a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome.
• SeqMap - A Tool For Mapping Millions Of Short Sequences To The Genome.
• MAQ - stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to reference sequences.
• ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq.
• Cufflinks - assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
• iAssembler - a standalone package to assemble ESTs generated using Sanger and/or Roche-454 pyrosequencing technologies into contigs.
• MapPER - an RNA-seq paired-end read (PER) protocol.
Support splice mapping and quantify
• TopHat - a fast splice junction mapper for RNA-Seq reads.
• SpliceMap - a de novo splice junction discovery tool. It offers high sensitivity and support for arbitrarily long RNA-seq read lengths.
• MapSplice - Splice Junction Mapping Tool.
• Trinity RNA-Seq Assembly - software solutions targeted to the reconstruction of full-length transcripts and alternatively spliced isoforms from Illumina RNA-Seq data.
• PALMapper - a combination of the spliced alignment method QPALMA with the short read alignment tool GenomeMapper.
Content of transcriptome
1. Genes: expression, alternative splicing
2. Noncoding RNA: snoRNA, mRNA-like ncRNA, snRNA, some antisense transcripts, pseudogenes, retrotransposons, and other functional RNAs
3. Some repeat elements