Deep RNA-Seq uncovers the peach transcriptome landscape


Lu Wang · Shuang Zhao · Chao Gu · Ying Zhou · Hui Zhou · Juanjuan Ma · Jun Cheng · Yuepeng Han

Received: 15 April 2013 / Accepted: 15 June 2013 © Springer Science+Business Media Dordrecht 2013

Abstract Peach (Prunus persica) is one of the most important deciduous fruit trees worldwide. To facilitate the isolation of genes controlling important horticultural traits of peach, transcriptome sequencing was conducted in this study. A total of 133 million paired-end RNA-Seq reads were generated from leaf, flower, and fruit, and 90% of the reads were mapped to the peach draft genome. Sequence assembly revealed 1,162 transcription factors and 2,140 novel transcribed regions (NTRs). Of these 2,140 NTRs, 723 contain an open reading frame, while the remaining 1,417 are non-coding RNAs. A total of 9,587 SNPs were identified across six peach genotypes, with an average density of one SNP per ~5.7 kb. The top of chromosome 2 has a higher density of expressed SNPs than the rest of the peach genome. The average density of SSRs is 312.5/Mb, with tri-nucleotide repeats being the most abundant. Most of the detected SSRs are AT-rich repeats, and the most common di-nucleotide repeat is CT/TC. The predominant type of alternative splicing (AS) event in peach is exon skipping, which accounts for 43% of all the observed AS events. In addition, the most actively transcribed regions in the peach genome were analyzed. Our study reveals for the first time the complexity of the peach transcriptome, and our results will be helpful for functional genomics research in peach.

Keywords Peach · Transcriptome · Alternative splicing · RNA-Seq · Non-coding RNA

Introduction

Peach (Prunus persica), a member of the family Rosaceae, is the third most important deciduous fruit tree worldwide, ranking only after apple and pear. It is a diploid with a base chromosome number of 8. Peach is not only a major economic fruit crop grown worldwide, but also serves as an important model species for functional genomics research of woody perennial angiosperms owing to several distinct advantages, including self-compatibility, a short juvenile phase (2–3 years), and a small genome size (~230 Mb) (Arús et al. 2012). Over the last several decades, great efforts have been made to develop various genomics resources such as ESTs (Yamamoto et al. 2002), genetic maps (Joobeur et al. 2000; Dirlewanger et al. 2004), and BAC libraries (Zhebentyayeva et al. 2008) and

Electronic supplementary material The online version of this article (doi:10.1007/s11103-013-0093-5) contains supplementary material, which is available to authorized users.

L. Wang · S. Zhao · C. Gu · Y. Zhou · H. Zhou · J. Ma · J. Cheng · Y. Han (✉)
Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden of the Chinese Academy of Sciences, Wuhan 430074, People’s Republic of China
e-mail: yphan@https://www.360docs.net/doc/ac1665272.html,

L. Wang
e-mail: wanglu@https://www.360docs.net/doc/ac1665272.html,

S. Zhao
e-mail: 247907101@https://www.360docs.net/doc/ac1665272.html,

C. Gu
e-mail: caswhgc@https://www.360docs.net/doc/ac1665272.html,

Y. Zhou
e-mail: zhouying_613@https://www.360docs.net/doc/ac1665272.html,

H. Zhou
e-mail: huichou1987@https://www.360docs.net/doc/ac1665272.html,

J. Ma
e-mail: mabaiquan10@https://www.360docs.net/doc/ac1665272.html,

J. Cheng
e-mail: jcheng2007@https://www.360docs.net/doc/ac1665272.html,

Plant Mol Biol

DOI 10.1007/s11103-013-0093-5

to address the molecular mechanisms underlying various horticultural traits in peach (Boudehri et al. 2009; Li et al. 2009; Jiménez et al. 2010a, b; Brandi et al. 2011). However, only limited information is available on gene networks associated with economically important traits. The genetic identification of functionally important genes in peach is hampered by the lack of comprehensive transcript and physical maps.

Recently, the high-quality genome sequence of the doubled haploid peach cv. ‘Lovell’ was released (The International Peach Genome Initiative 2013), which marks the entry of peach research into the post-genomic era. Extensive functional genomics work is underway to identify the activity of various functional elements in the peach genome. Progress in functional genomics research depends on the availability of detailed transcriptome information. However, most molecular studies in peach have focused on structural genomics, and few transcriptome studies have been conducted (Shulaev et al. 2008; Martínez-Gómez et al. 2011). As of now, only 79,689 expressed sequence tags (ESTs) have been deposited in NCBI. To facilitate both functional annotation of the peach genome and identification of genes controlling important traits, it is important to explore the transcriptome landscape of peach.

In the past decade, several approaches such as EST sequencing and microarray analysis have been used to investigate features of the transcriptome in fruit trees (Yamamoto et al. 2002; Newcomb et al. 2006; Trainotti et al. 2006; Vecchietti 2009; Soria-Guerra et al. 2011). However, Sanger-based EST sequencing generates highly redundant sequences from highly expressed transcripts and few sequences from weakly expressed transcripts, which makes it less suitable than deep transcriptome analysis. Microarray design requires prior knowledge of ESTs or genomic sequences, and microarray analysis cannot detect RNA variants such as alternative splicing (AS) transcripts or novel transcripts. More recently, a deep RNA sequencing methodology, also called RNA-Seq, has been developed and has shown tremendous power in characterizing transcriptomes because it can detect weakly expressed transcripts, splice variants, and novel transcripts (Mortazavi et al. 2008; Socquet-Juglard et al. 2013). Therefore, RNA-Seq is now regarded as the latest and most powerful tool for sequencing and profiling of transcriptomes.

In peach, two versions of microarrays developed from 4,806 and 7,862 non-redundant ESTs, respectively, have been used to investigate transcriptome changes associated with biological processes such as response to hormone treatments, fruit ripening, and chilling injury (Livio et al. 2007; Bonghi et al. 2011; Martínez-Gómez et al. 2011). However, the use of these microarrays in peach transcriptome analysis has been greatly limited because they were designed for only several thousand genes, which is inadequate on the whole-transcriptome scale.

To obtain a global view of the peach transcriptome, RNA-Seq was performed on different tissues of peach. Based on extensive data analyses, we identified a substantial number of novel transcripts that significantly improve the current genome annotation of peach. Moreover, other features were also investigated, including alternatively spliced (AS) isoforms, single nucleotide polymorphisms (SNPs), simple sequence repeats (SSRs), and untranslated region (UTR) boundaries. The transcriptome data allow us to make accurate predictions of gene structures. Our results will be very helpful for future functional genomics research in peach and other fruit trees.

Materials and methods

Plant materials

Four peach varieties (‘Baifeng’, ‘Jinxiang’, ‘Dahongpao’, and ‘Mantianhong’) and two ornamental peach varieties (‘Hongbaihua’ and ‘Hongyetao’), maintained at Wuhan Botanical Garden of the Chinese Academy of Sciences (Hubei Province, PRC), were used for transcriptome analysis. Leaves were collected from cv. Hongyetao and Mantianhong at the juvenile stage in spring. Flowers were collected from cv. Hongbaihua and Mantianhong at the pink stage. Fruits were collected from cv. Baifeng, Jinxiang, and Dahongpao at 65 and 85 days after pollination. All samples were immediately frozen in liquid nitrogen and then stored at -75 °C until use.

cDNA library preparation and Illumina sequencing

Total RNA was extracted using TRIzol (Invitrogen, CA, USA) according to the manufacturer’s instructions and treated with RNase-free DNase I (Takara, Dalian, China) to remove residual DNA. Equal amounts of total RNA from fruit tissues of the same genotype were pooled and subjected to purification of poly(A) mRNA. Poly(A) mRNAs were purified using oligo-dT attached to magnetic beads. The mRNAs were fragmented by sonication and then subjected to first- and second-strand cDNA synthesis using random hexamer primers. The cDNA libraries were prepared according to Illumina’s protocols. Fragments of ~300 bp were excised and enriched by PCR for 18 cycles. In total, we constructed two, two, and three paired-end cDNA libraries for leaf, flower, and fruit tissues, respectively. The cDNA libraries were sequenced using an Illumina HiSeq 2000 sequencer according to the manufacturer’s instructions.


Mapping RNA-Seq reads to the peach genome and transcript annotation

RNA-Seq reads were aligned against the peach genome sequences (https://www.360docs.net/doc/ac1665272.html,/node/355) using the programs TopHat, Bowtie, and BWA (Trapnell et al. 2009; Langmead et al. 2009; Li and Durbin 2009). Overlapping RNA-Seq reads were merged into continuous transcribed sequences using the Cufflinks package (Trapnell et al. 2009), and the splice junction maps and splicing isoforms were generated simultaneously. UTRs were identified according to the method described previously by Lu et al. (2010). The sequences of the assembled transcripts were compared against the NCBI RefSeq nucleotide database and the Swiss-Prot and UniProt protein databases. Homologues were annotated sequentially according to the BLAST results, followed by the pathway annotation pipelines, including COG (http:// https://www.360docs.net/doc/ac1665272.html,/COG/), GO (http://www.geneontology.org), and KEGG (www.genome.jp/kegg/).

Identification of SNPs, SSRs, and alternative splicing events

RNA-Seq reads from all six genotypes were used for SNP identification. SNP calling was conducted using VarScan 2.2.10 with the default parameters. The distribution of SNPs in coding and UTR regions was analyzed using ANNOVAR (Wang et al. 2010).

The assembled transcribed sequences were searched for perfect microsatellites with a basic motif length of 2–6 bp using the SSR scanning program (Temnykh et al. 2001). Repeats with a minimum length of 12 bp for di- to tetra-nucleotide repeats, 15 bp for penta-nucleotide repeats, and 18 bp for hexa-nucleotide repeats were recorded. The mapping of RNA-Seq reads to the peach genome was used to detect AS events with the MATS package (Shen et al. 2012). RNA-Seq reads from all six genotypes were used for identification of splicing events.
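The SSR criteria above can be illustrated with a short sketch. This is not the SSR scanning program of Temnykh et al. (2001); it is a minimal regular-expression search written only to show how the stated motif lengths and minimum repeat lengths translate into a filter. The function names `is_primitive` and `find_ssrs` and the example sequence are hypothetical.

```python
import re

# Minimum total repeat length (bp) per motif length, as stated in the text:
# 12 bp for di- to tetra-, 15 bp for penta-, 18 bp for hexa-nucleotide repeats.
MIN_LENGTH_BP = {2: 12, 3: 12, 4: 12, 5: 15, 6: 18}


def is_primitive(motif):
    """Return True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_len in MIN_LENGTH_BP.items():
        min_repeats = -(-min_len // k)  # ceiling division
        # One captured motif of length k followed by at least
        # (min_repeats - 1) further exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as "CTCT", which are really CT repeats.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical test sequence: a (CT)7 repeat and an (AAG)4 repeat.
example = "GG" + "CT" * 7 + "ACGT" + "AAG" * 4 + "TTTT" + "AT" * 3
print(find_ssrs(example))
```

In this sketch the (AT)3 run and the TTTT run are not reported because they fall below the 12 bp minimum or consist of a mononucleotide unit.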

Differentially or strongly expressed gene assessment

The RNA-Seq read-mapping result was used to predict gene expression profiles, and gene expression level was quantified using FPKM values (fragments per kilobase per million reads). FPKM values were calculated using the program Cufflinks (http://cuf?https://www.360docs.net/doc/ac1665272.html,/) with the RSEM statistical method. We set a threshold of 0.3 FPKM to determine whether a gene was expressed in a specific tissue. To identify actively transcribed regions in the peach genome, RNA-Seq reads from all samples were used to calculate the FPKM value of each gene. The genes with the top 1 and 10% highest FPKM values were considered to derive from actively transcribed regions.

Real-time PCR analysis and AS validation

Total RNA was extracted using TRIzol (Invitrogen, CA, USA) following the manufacturer’s instructions. Approximately 5 µg of total RNA per sample was treated with DNase I (Takara, Dalian, China) and then subjected to first-strand cDNA synthesis. A SYBR Green-based real-time PCR assay was carried out in a total volume of 20 µL containing 10.0 µL of 2× SYBR Green I Master Mix (Takara, Dalian, China), 0.2 µM of each primer, and 100 ng of template cDNA. The peach gene PpEF2 was used as a constitutive control (Tong et al. 2009). Amplifications were performed using a StepOne real-time PCR system (Applied Biosystems). The amplification program consisted of one cycle of 95 °C for 30 s, followed by 40 cycles of 95 °C for 15 s and 60 °C for 30 s. Fluorescent products were detected in the last step of each cycle. Melting curve analysis was performed at the end of the 40 cycles to ensure proper amplification of target fragments. All analyses were repeated three times using biological replicates.

The validation of AS events was performed using RT-PCR. A mixture of cDNAs prepared from leaves of cv. Hongyetao and Mantianhong was used as the template. The PCR program consisted of one cycle of 95 °C for 5 min, followed by 45 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min. The primer sequences are listed in Table S1.

Results

Overview of the RNA-Seq data

A total of 40.8, 61.2, and 26.9 million paired-end reads of 81 bp in length were generated from flower, fruit, and leaf tissues, respectively (Table 1). Raw reads were trimmed by removing adaptor sequences, empty reads, and low-quality sequences. As a result, 260.9 million (98.1%) high-quality reads, designated as clean reads, were obtained. Of the clean reads, 98.8 and 1.2% were paired- and single-end reads, respectively (Table 1). The majority of the clean reads (89.3%) were mapped to the peach genome. Most of the mapped reads (95.4%) were anchored onto eight mega-scaffolds that represent 96% of the peach genome and correspond to the eight haploid chromosomes (n = 8) of peach. A small portion of the mapped reads (4.6%) were located on 194 other sequence scaffolds, which have not been anchored onto chromosomes.
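The clean-read totals quoted above can be reproduced from the Table 1 values with a few lines of arithmetic. The sketch below assumes that the "Paired" column counts read pairs (so each entry contributes two reads) and that the "Single" column counts reads whose mate was discarded; this interpretation is an assumption, although it does recover the 260.9 million total and the 98.8/1.2% split. The names `TABLE1` and `clean_read_summary` are illustrative.

```python
# Clean-read counts from Table 1, in millions.
# Assumption: "paired" is the number of read pairs, "single" the number of
# unpaired reads that survived trimming.
TABLE1 = {
    "flower": {"paired": 40.80, "single": 0.77},
    "fruit": {"paired": 61.16, "single": 1.61},
    "leaf": {"paired": 26.89, "single": 0.80},
}


def clean_read_summary(table):
    """Total clean reads (millions) and the paired/single percentages."""
    pairs = sum(row["paired"] for row in table.values())
    singles = sum(row["single"] for row in table.values())
    reads_in_pairs = 2 * pairs  # two reads per pair
    total = reads_in_pairs + singles
    return {
        "total_clean_reads_million": total,
        "paired_pct": 100 * reads_in_pairs / total,
        "single_pct": 100 * singles / total,
    }


summary = clean_read_summary(TABLE1)
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```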

Identification of transcribed regions in the peach genome

RNA-Seq reads from all three tissues were mapped to the peach genome, and 35,263 consensus sequences were identified. These 35,263 consensus sequences were generated from 24,427 genes. Of the 24,427 genes, 22,287 were previously predicted in the peach reference gene set in the GDR database (https://www.360docs.net/doc/ac1665272.html,/peach/genome). The remaining 2,140 genes were not included in the peach reference gene set and represent novel transcribed regions (NTRs) (Fig. 1a). Moreover, our analysis revealed ~20.0 Mb of UTRs. On average, each gene contained 876 bp of UTR sequence. The transcript sizes ranged from 0.1 to 13.3 kb, with an average of 1.8 kb (Fig. 1b). The predicted peach transcripts in the GDR database were substantially improved when combined with our results: the percentage of RNA-Seq reads mapped to the transcribed regions identified in our study was higher than the percentage mapped to the GDR predicted transcripts.

The annotation of genes identified in this study is shown in Fig. 1c. Of the 24,427 genes, 15,684 (64.2%) had one or more GO (Gene Ontology) annotations, resulting in 11,118, 11,311, and 12,980 biological process, cellular component, and molecular function terms, respectively. Of the 24,427 genes, 4,745 had one or more KEGG annotations, and they belonged to 2,489 pathways. A total of 1,162 genes were identified as putative transcription factors (TFs). Among these peach TFs, 37 were not included in the GDR predicted gene set.

Of the 2,140 NTRs, 723 (33.8%) contained an open reading frame, while the remaining 1,417 (66.2%) were non-coding RNA (ncRNA) genes. The physical distributions of NTRs and ncRNAs on the peach genome are shown in Fig. 2. The protein-coding NTRs were evenly distributed throughout the peach genome, while a high density of ncRNAs was observed on chromosomes 1, 4, 5, 7, and 8. Of the 2,140 NTRs, 2,072 (96.8%) and 69 (3.2%) were located on genetically anchored and non-anchored scaffolds, respectively. Of the anchored NTRs, 692 (32.3%) and 1,380 (64.5%) were protein-coding and ncRNAs, respectively. In contrast, 31 and 38 of the non-anchored NTRs were protein-coding and ncRNAs, respectively.

Analysis of SNPs in the transcribed regions of peach

In this study, transcriptome sequencing data were generated from six genotypes, which provided an opportunity to investigate the frequency of SNPs in transcribed regions. A total of 9,587 SNPs were identified in the peach transcribed regions, with an average density of one SNP per ~5.7 kb. Of all the SNPs, 61.7 and 38.3% were transitions and transversions, respectively (Table 2). The two transitions, A/G and C/T, were the most abundant SNPs and accounted for 31.00 and 30.71% of all SNPs, respectively. The four transversions, i.e., A/C, A/T, G/C, and G/T, were evenly represented, with each accounting for ~10% of all SNPs. Moreover, 16 and 6 SNPs caused stop codon gain and loss, respectively.
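The transition/transversion split used above follows from base chemistry: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The following sketch shows that classification on a handful of hypothetical SNPs; it is not the VarScan or ANNOVAR workflow used in the study, and the function names are illustrative.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_snp(ref, alt):
    """Label a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(snps):
    """Fraction of (ref, alt) pairs that are transitions."""
    labels = [classify_snp(ref, alt) for ref, alt in snps]
    return labels.count("transition") / len(labels)


# Hypothetical SNP list, for illustration only.
snps = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]
print([classify_snp(r, a) for r, a in snps])
print(transition_fraction(snps))
```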

The physical distribution of expressed SNPs on the peach genome is summarized in Fig. 3. All 9,587 SNPs were located on 31 scaffolds (Table S2). Scaffold 2 had the highest density of expressed SNPs, followed by scaffolds 6 and 4. Of the 9,587 SNPs, 9,109 (95.0%) and 478 (5.0%) were located in 5,127 protein-coding genes and 359 ncRNAs, respectively. Of the 5,127 protein-coding transcripts, 3,143 contained one SNP and 1,984 had two or more SNPs. A transcript encoding a CC-NBS-LRR protein (GDR accession no. ppa016901m) involved in disease resistance showed the highest genetic variation, with 30 SNPs. However, most of the transcripts (77.9%) had no SNPs. Moreover, 5,148, 916, and 3,045 SNPs were identified in coding sequences (CDSs), 5′ UTRs, and 3′ UTRs, respectively. Of the 5,148 SNPs in CDSs, 2,687 (52.2%) were synonymous and located in 1,741 genes, whereas 2,461 (47.8%) were non-synonymous and located in 2,025 genes.

Analysis of SSRs in the transcribed regions of peach

A total of 17,979 SSRs were identified in the peach transcriptome, with an average density of one SSR per 3.2 kb (Table 3). Tri-nucleotide repeats were the most abundant, accounting for 36.5% of all SSRs. Di-, tetra-, penta-, and hexa-nucleotide repeats accounted for 32.5, 17.6, 6.5, and 6.9% of all SSRs, respectively. Of the dimers, CT/TC repeats were the most abundant, accounting for 57.2% of all dimers, while AG/GA, AT/TA, AC/CA, and GT/TG repeats accounted for 27.4, 10.2, 3.1, and 2.0%, respectively. Only one GC/CG repeat, 12 bp in length, was found. Of the trimers, AAG/AGA/GAA repeats were the most abundant, accounting for 15.8% of all trimers, followed by CTT/TCT/TTC (14.6%) and CCT/CTC/TCC (8.0%).

Table 1 RNA-Seq clean reads and their physical mapping result in peach

Samples   No. of reads (million)   Total sizes (Mb)          No. of mapped reads (million)
          Paired     Single        Left read    Right read   Left      Right
Flower    40.80      0.77          3,191.2      3,180.6      37.48     37.63
Fruit     61.16      1.61          4,699.5      4,729.3      55.80     55.79
Leaf      26.89      0.80          2,084.4      2,088.8      24.20     24.35


Among tetramers, AT-rich repeats were the most abundant, accounting for 20.7% of all tetramers. Moreover, 76.9% of SSR motifs were <20 bp in length.

The physical distribution of SSRs across the peach genome is shown in Fig. 3. Overall, the SSR density was consistent with transcript density in the peach genome. In addition, the distribution of SSRs in UTRs and CDSs was investigated (Table 4). Among UTR-SSRs, di-nucleotide repeats were the most abundant, accounting for 42.0%. CT/TC and AG/GA dimers prevailed in UTR sequences, accounting for 23.3 and 11.9% of all UTR-SSRs, respectively. Of CDS-SSRs, tri-nucleotide repeats were the most abundant, accounting for 68.2% of all CDS-SSRs. AAG/AGA/GAA and CTT/TCT/TTC trimers were frequently encountered in CDSs, accounting for 10.3 and 7.4% of all CDS-SSRs, respectively (Fig. S1).

Identification of alternative splicing events and exons in the peach transcriptome

Five types of AS events were analyzed in the peach transcriptome: exon skipping (ES), alternative 5′ splice site (A5SS), alternative 3′ splice site (A3SS), mutually exclusive exons (MXE), and retained intron (RI). In total, 10,835 AS events were identified in 5,520 transcribed regions, including 496 NTRs (Fig. 4a). ES was the most abundant type of AS event (42.8%), followed by A3SS (37.1%) and A5SS (15.6%) (Fig. 4b). RI and MXE were rare, accounting for only 2.8 and 1.7%, respectively.

All these AS events occurred in 5,634 transcripts. The physical distribution of the transcripts containing AS events across the peach genome is shown in Fig. 4c. Nearly all the transcripts containing AS events were located on scaffolds 1–9.

Fig. 1 Overview of the peach transcriptome. a Comparison of the estimated gene numbers between the GDR database (I) and our study (II). b Distribution of transcript sizes. c GO annotation of genes identified in our study

Genes expressed in peach leaf, flower, and fruit

Overall, 19,300, 18,932, and 18,435 genes were identified in leaf, flower, and fruit tissues, respectively (Fig. 5a). The expression of 16,472 genes was detected in all three tissues, while 831, 892, and 680 genes were specifically expressed in leaf, flower, and fruit, respectively. In addition, 856, 427, and 1,141 genes were expressed in two tissues, i.e., leaf/fruit, flower/fruit, and flower/leaf, respectively. A further 3,128 genes had extremely low expression in all three tissues. As mentioned above, 1,162 TFs were identified in all samples analyzed. Of the 1,162 TFs, 874 were expressed in all analyzed tissues, while 21, 33, and 52 TFs were specifically expressed in flower, fruit, and leaf, respectively (Fig. 5b). In addition, 77, 68, and 31 TFs were expressed in two tissues, i.e., flower/leaf, leaf/fruit, and flower/fruit, respectively. Six TFs had extremely low expression in all three tissues. Among the TFs expressed in the tested tissues, 100, 66, 21, 33, 67, and 45 encoded MYB, bHLH, WD40, MADS-box, ERF, and WRKY proteins, respectively (Fig. 5c). Overall, most of these typical TFs (79%) were expressed in all three tissues, while 26, 14, 1, 13, 14, and 9 of the MYB, bHLH, WD40, MADS-box, ERF, and WRKY TFs, respectively, were differentially expressed (Fig. 5d). Of the 26 differentially expressed MYB TFs, 11, 4, and 2 were exclusively expressed in leaves, flowers, and fruits, respectively, while 8 and 1 were expressed in two types of tissues, viz. flower/leaf and flower/fruit, respectively. Of the 14 differentially expressed bHLH TFs, 3, 2, and 1 were exclusively expressed in leaf, flower, and fruit tissues, respectively, while 5 and 3 were expressed in two types of tissues, viz. flower/leaf and leaf/fruit, respectively. Of the 14 differentially expressed ERF TFs, 4, 1, and 5 were exclusively expressed in leaf, flower, and fruit tissues, respectively, while 2, 1, and 1 were expressed in two tissues, viz. flower/leaf, flower/fruit, and leaf/fruit, respectively. Among the 13 differentially expressed MADS TFs, 6 and 3 were exclusively expressed in leaf and flower, respectively, while 1, 1, and 2 were expressed in two tissues, viz. flower/leaf, flower/fruit, and leaf/fruit, respectively. Of the 9 differentially expressed WRKY TFs, 5 were exclusively expressed in leaf, while 4 were expressed in leaf and fruit. Intriguingly, only one out of 23 WD40 TFs showed differential expression, and it was exclusively expressed in fruits (Fig. 5d).

Fig. 2 Distribution of transcripts on the peach genome
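The gene counts in this section form a three-set partition (leaf, flower, fruit), so the per-tissue totals can be recovered by summing the patterns that include each tissue. The sketch below does this for the gene counts reported above and also shows how a single gene could be assigned to a pattern with the 0.3 FPKM cutoff from the Methods. The function names and the example FPKM values are hypothetical and are not taken from the authors' pipeline.

```python
TISSUES = ("leaf", "flower", "fruit")
FPKM_THRESHOLD = 0.3  # expression cutoff stated in the Methods


def expressed_tissues(fpkm_by_tissue, threshold=FPKM_THRESHOLD):
    """Return the set of tissues in which a gene exceeds the FPKM cutoff."""
    return frozenset(
        t for t in TISSUES if fpkm_by_tissue.get(t, 0.0) > threshold
    )


# Number of genes per expression pattern, as reported in the text above.
GENE_COUNTS = {
    frozenset(TISSUES): 16472,
    frozenset({"leaf"}): 831,
    frozenset({"flower"}): 892,
    frozenset({"fruit"}): 680,
    frozenset({"leaf", "fruit"}): 856,
    frozenset({"flower", "fruit"}): 427,
    frozenset({"flower", "leaf"}): 1141,
    frozenset(): 3128,  # extremely low expression in all three tissues
}


def per_tissue_totals(counts):
    """Sum the counts of every pattern that includes each tissue."""
    return {
        t: sum(n for pattern, n in counts.items() if t in pattern)
        for t in TISSUES
    }


totals = per_tissue_totals(GENE_COUNTS)
print(totals)
print(sum(GENE_COUNTS.values()))

# Hypothetical gene: above the cutoff in leaf only.
print(expressed_tissues({"leaf": 1.2, "flower": 0.3, "fruit": 0.0}))
```

With these inputs the per-tissue sums match the 19,300, 18,932, and 18,435 genes reported for leaf, flower, and fruit, and the eight patterns add up to the 24,427 genes identified in this study.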

The most active transcribed regions in peach genome

The expression of 24,427 genes was quantified using FPKM values, and 21,299 (87.2%) had an FPKM value >0.3 in at least one tissue. The physical distribution of the top 500 most highly expressed genes is shown in Fig. 6a. Scaffolds 1, 2, 3, 5, 7, and 8 contained more top-500 genes at the bottom than at the top. In contrast, scaffold 4 had more top-500 genes at the top than at the bottom. Scaffold 6 contained slightly more top-500 genes at the top than at the bottom. The annotation of the top 500 genes is shown in Fig. 6b. Most of the top 500 genes (97.5%) had one or more GO annotations, including biological process, cellular component, and molecular function terms.

Validation of gene expression profiles and AS events using RT-PCR

Seven pairs of primers were designed to verify AS events. Of the seven primer pairs, five generated more than one band of the predicted sizes (Fig. S2). In addition, five primer pairs were designed to validate the in silico expression data (Fig. S3). Four genes showed high expression in all tested tissues and one was expressed only in leaf tissue. This result was consistent with the expression profiles estimated from the RNA-Seq data.

Table 2 Composition of SNPs in the transcribed regions of peach

Region   Type           SNP     No.     %
5′ UTR   Transition     A↔G     276     2.91
                        C↔T     290     3.06
                        Total   566     5.97
         Transversion   A↔C     89      0.94
                        A↔T     91      0.96
                        G↔C     167     1.76
                        G↔T     81      0.86
                        Total   428     4.52
3′ UTR   Transition     A↔G     823     8.69
                        C↔T     823     8.69
                        Total   1,646   17.38
         Transversion   A↔C     257     2.71
                        A↔T     292     3.08
                        G↔C     511     5.39
                        G↔T     240     2.53
                        Total   1,300   13.72
CDS      Transition     A↔G     1,707   18.02
                        C↔T     1,640   17.31
                        Total   3,347   35.33
         Transversion   A↔C     401     4.23
                        A↔T     454     4.79
                        G↔C     523     5.52
                        G↔T     420     4.43
                        Total   1,798   18.98
ncRNA    Transition     A↔G     107     1.13
                        C↔T     136     1.44
                        Total   243     2.57
         Transversion   A↔C     34      0.36
                        A↔T     29      0.31
                        G↔C     41      0.43
                        G↔T     24      0.25
                        Total   145     1.53

Fig. 3 Distribution of expressed SNPs and SSRs on the peach genome


Table 3 Composition and length distribution of SSR motifs in the peach transcriptome

Repeat unit   Repeat type                      Repeat length (bp)   Total    Frequency (%)
                                               <20      ≥20
Dimer         AC/CA                            132      46          178      0.99
              GT/TG                            94       25          119      0.66
              AG/GA                            800      805         1,605    8.93
              CT/TC                            1,541    1,805       3,346    18.61
              AT/TA                            388      211         599      3.33
              CG                               1        0           1        0.01
              Total                            2,956    2,892       5,848    32.53
Trimer        AAT/ATA/TAA                      143      15          158      0.88
              ATT/TAT/TTA                      160      19          179      1.00
              AAC/ACA/CAA                      308      40          348      1.94
              GTT/TGT/TTG                      154      7           161      0.90
              AAG/AGA/GAA                      924      115         1,039    5.78
              CTT/TCT/TTC                      849      107         956      5.32
              ACC/CAC/CCA                      416      35          451      2.51
              GGT/GTG/TGG                      271      12          283      1.57
              CCT/CTC/TCC                      472      54          526      2.93
              AGG/GAG/GGA                      297      42          339      1.89
              CCG/CGC/GCC                      83       2           85       0.47
              CGG/GCG/GGC                      83       5           88       0.49
              Others                           1,774    176         1,950    10.84
              Total                            5,934    629         6,563    36.50
Tetramer      AAAC/AACA/ACAA/CAAA              101      3           104      0.58
              AAAG/AAGA/AGAA/GAAA              276      13          289      1.61
              AAAT/AATA/ATAA/TAAA              221      5           226      1.26
              AACC/ACCA/CAAC/CCAA              54       0           54       0.30
              AAGG/AGGA/GAAG/GGAA              67       2           69       0.38
              AATT/ATTA/TTAA/TAAT              125      3           128      0.71
              ACCC/CACC/CCAC/CCCA              34       0           34       0.19
              ATTT/TATT/TTAT/TTTA              283      18          301      1.67
              AGGG/GAGG/GGAG/GGGA              53       2           55       0.31
              CCCT/CCTC/CTCC/CCCT              105      10          115      0.64
              CTTT/TCTT/TTCT/TTTC              286      21          307      1.71
              CCTT/CTTC/TCCT/TTCC              74       3           77       0.43
              GGGT/GGTG/GTGG/TGGG              19       0           19       0.11
              GGTT/GTTG/TGGT/TTGG              49       1           50       0.28
              GTTT/TGTT/TTGT/TTTG              122      6           128      0.71
              Others                           1,129    81          1,210    6.73
              Total                            2,998    168         3,166    17.61
Pentamer      AAAAT/AAATA/AATAA/ATAAA/TAAAA    74       11          85       0.47
              AAAAG/AAAGA/AAGAA/AGAAA/GAAAA    76       18          94       0.52
              CTTTT/TCTTT/TTCTT/TTTCT/TTTTC    80       21          101      0.56
              ATTTT/TATTT/TTATT/TTTAT/TTTTA    76       11          87       0.48
              GTTTT/TGTTT/TTGTT/TTTGT/TTTTG    43       8           51       0.28
              Others                           629      123         752      4.18
              Total                            978      192         1,170    6.51
Hexamer       Total                            953      279         1,232    6.85

SSRs recorded for the final dataset included dimers and trimers with at least 12 bp in length and tetramers to hexamers with at least 3 repeats

Discussion

Frequency of SSRs and SNPs identified in the peach transcriptome

SSRs and SNPs derived from transcribed sequences serve as gene-tagged markers and will be very helpful for genetics and functional genomics studies. Here, our study reveals that SSRs are abundant in the peach transcriptome, with an average density of 312 SSRs per Mb. The SSR density in the peach transcriptome is similar to the overall density of SSRs in the expressed sequences of other dicots such as Arabidopsis (357 SSRs/Mb), Medicago (324 SSRs/Mb), soybean (403 SSRs/Mb), poplar (424 SSRs/Mb), grapevine (247 SSRs/Mb), and cucumber (370 SSRs/Mb), but lower than that of monocots such as rice (739 SSRs/Mb) and sorghum (646 SSRs/Mb) (Cavagnaro et al. 2010). Moreover, tri-nucleotide repeats are the most frequent SSR type in the peach transcriptome, followed by di- and tetra-nucleotide repeats. This result is in agreement with previous reports that tri-nucleotide repeats are the most abundant type of SSRs in the expressed sequences of plant species such as Arabidopsis, Medicago, soybean, poplar, grapevine, rice, and sorghum (Cavagnaro et al. 2010).

The most common di-nucleotide repeat in the peach transcriptome is AG/GA/CT/TC, which is consistent with previous findings in the expressed sequences of other plant species, including monocots such as rice, maize, barley, and sorghum and dicots such as apple, almond, rose, Arabidopsis, Medicago, soybean, poplar, grapevine, tomato, and cotton (Cardle et al. 2000; Kantety et al. 2002; Jung et al. 2005; Cavagnaro et al. 2010; Zhang et al. 2012). The rarity of GC/CG repeats observed in the peach transcriptome appears to be common to the expressed sequences of other species. However, the most common type of tri-nucleotide repeat in the expressed sequences varies among plant species. For example, AAG/AGA/GAA are the predominant tri-nucleotide repeats in the expressed sequences of dicots such as Arabidopsis, peach, apple, and cucumber, while CCG/CGC/GCC repeats prevail in the expressed sequences of monocots such as rice and sorghum

Table 4 Distribution of SSRs in CDSs and UTRs in the peach transcriptome

Repeat     Region   Repeat length (bp)   Sum      %
                    <20      ≥20
Dimer      CDS      313      309         622      3.46
           UTR      2,643    2,583       5,226    29.07
Trimer     CDS      3,444    325         3,769    20.96
           UTR      2,490    304         2,794    15.54
Tetramer   CDS      476      8           484      2.69
           UTR      2,522    160         2,682    14.92
Pentamer   CDS      103      19          122      0.68
           UTR      875      173         1,048    5.83
Hexamer    CDS      396      132         528      2.94
           UTR      557      147         704      3.92

Fig. 4 AS events in the peach transcriptome. a Diagram of five major types of AS events (Shen et al. 2012). b Proportions of different types of AS events. c Distribution of AS events on the peach genome

Fig. 5 Characterization of genes in the peach transcriptome. a Genes expressed in leaf, fruit, and flower. b TFs expressed in leaf, fruit, and flower. c Typical TFs in the peach transcriptome. d Typical TFs expressed in leaf, fruit, and flower

Fig. 6 Physical distribution (a) and GO annotation (b) of the top 500 highest expressed genes in the peach transcriptome


(Zhang et al. 2012; Cavagnaro et al. 2010). This variation in the frequency of tri-nucleotide repeats may be partially attributed to the fact that GC contents in monocots are generally higher than those observed in dicots (Cavagnaro et al. 2010).

In contrast to the high frequency of SSRs in the peach transcriptome, a low density of SNPs in the expressed sequences (~0.2/kb) was observed across the six varieties. This SNP density is far lower than those reported for other fruit tree species. For example, an average density of 15.6 SNPs per kb has been reported in the expressed sequences from grapevine (Lijavetzky et al. 2007). In apple, 71,482 SNPs were identified from 9,555 EST contigs, with an overall density of 6.7 SNPs per kb (Chagné et al. 2008). Similarly, Khan et al. (2012) detected 37,807 SNPs from 6,888 apple EST contigs, with an average density of 5.3 SNPs per kb. The low density of SNPs in the peach transcriptome could be partially attributed to the small sample of peach varieties used in this study. On the other hand, whole-genome re-sequencing of 56 peach breeding accessions revealed 1,022,354 SNPs, with an overall density of 4.4 SNPs/kb in the peach genome (Ahmad et al. 2011; Verde et al. 2012). The frequency of SNPs in genomic DNA sequences is thus much higher than that observed in the transcribed sequences in this study. This result implies that selection pressure could be stronger in genic regions than in non-genic regions during the process of peach domestication and adaptation.
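The densities compared in this section are quoted in different units (one marker per N kb, markers per kb, markers per Mb), so a small conversion sketch may help when checking them against each other. The helper names below are hypothetical; the inputs are the figures quoted in this paper, and the last line simply back-calculates the sequence length implied by the cited re-sequencing SNP count and density.

```python
def per_kb_from_spacing(kb_per_marker):
    """Markers per kb, given the average spacing between markers in kb."""
    return 1.0 / kb_per_marker


def per_mb_from_spacing(kb_per_marker):
    """Markers per Mb, given the average spacing between markers in kb."""
    return 1000.0 / kb_per_marker


def implied_span_mb(n_markers, markers_per_kb):
    """Sequence length (Mb) implied by a marker count and a per-kb density."""
    return n_markers / markers_per_kb / 1000.0


snp_per_kb = per_kb_from_spacing(5.7)        # expressed SNPs: one per ~5.7 kb
ssr_per_mb = per_mb_from_spacing(3.2)        # SSRs: one per 3.2 kb
genome_mb = implied_span_mb(1_022_354, 4.4)  # re-sequencing SNPs cited above

print(f"expressed SNPs: {snp_per_kb:.2f} per kb")
print(f"SSRs: {ssr_per_mb:.1f} per Mb")
print(f"span implied by the re-sequencing figures: {genome_mb:.0f} Mb")
```

The first value rounds to the ~0.2 SNPs/kb quoted above, the second reproduces the 312.5 SSRs/Mb given in the abstract, and the third is close to the ~230 Mb genome size mentioned in the Introduction.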

It is worth noting that a putative resistance gene (ppa016901m) that contains the highest density of expressed SNPs is located at the top of scaffold 2. Intriguingly, the density of expressed SNPs, including both synonymous and non-synonymous coding SNPs, is higher at the top of scaffold 2 than in the rest of the peach genome (Fig. 3). Peach genome sequencing of cultivated varieties and wild species also reveals a high SNP density at the top of scaffold 2 (The International Peach Genome Initiative 2013). The top of scaffold 2 is rich in resistance genes. Therefore, our study is consistent with the previous finding that regions hosting resistance genes are evolving rapidly (McHale et al. 2006).

Novel transcribed regions and AS events in the peach transcriptome

In this study, we produced over 21.5 Gb of Illumina RNA-Seq data, which represent ~96-fold coverage of the peach genome and over 550-fold coverage of the peach reference gene set. In addition, up to 90% of the RNA-Seq reads were mapped to the peach reference genome. This percentage of mapped reads is much higher than the ratio of ~60% previously reported in rice (Lu et al. 2010). This outcome suggests that both the transcriptome sequences and the peach reference genome are of high quality. Therefore, our deep cDNA sequencing data provide a good opportunity to identify AS events and NTRs in peach.

First, alternative splicing is common in plants, and over 20% of plant genes produce two or more transcript isoforms (Campbell et al. 2006). Here, 22.6% of peach genes are observed to undergo AS, which is commensurate with the levels observed in rice and Arabidopsis. However, the predominant type of AS event in peach is exon skipping, which accounts for 43% of all the observed AS events. This result contradicts the previous finding that exon skipping is relatively rare in plants (Barbazuk et al. 2008). For example, the proportions of AS events that involve exon skipping in Arabidopsis, rice, and maize are 3, 11, and 5%, respectively. Similarly, intron retention isoforms are rare in peach, accounting for only about 3% of all the observed AS events, as opposed to over 30% reported in Arabidopsis, rice, and maize (Barbazuk et al. 2008). These contrasts suggest that the preferential type of AS event is not conserved across a wide spectrum of plant species.

In peach, 838 AS events have been discovered through EST sequencing (The International Peach Genome Initiative 2013). However, this conventional approach is limited by sequencing depth and by EST representation that is biased towards highly expressed genes. For example, although the human EST database is well developed, 31% of exons are still represented by no EST or by only a single EST (Johnson et al. 2003). The human EST collection is much deeper than any EST collection available for plants (Barbazuk et al. 2008). Thus, EST collections are of limited use for investigating AS events in plants and may lead to underestimation of their number. High-throughput sequencing approaches such as RNA-Seq are more powerful than Sanger-based EST sequencing for investigating AS events. In this study, the transcribed regions of peach are deeply sequenced, with an average coverage of 550-fold. Therefore, our estimates of AS events in peach should be both repeatable and reliable. Nevertheless, more cDNA libraries covering different tissues, developmental stages and a range of stress conditions still need to be sequenced to obtain a full view of AS events in peach.

Secondly, our study reveals 2,140 NTRs, most of which (66.2%) are long non-coding RNAs (lncRNAs). Recently, lncRNAs have become a hot research topic in plants. For example, two classes of lncRNAs, which play an important role in the regulation of vernalization, have been identified at the Arabidopsis FLC locus (Swiezewski et al. 2009; Heo and Sung 2011). Likewise, several lncRNAs that are responsive to powdery mildew infection and heat stress have been reported in wheat (Xin et al. 2011). Here, we report lncRNA transcripts at the genome-wide level in peach, and our findings will be helpful for functional genomics research in peach.
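The 66.2% figure can be reproduced from the NTR counts given in the abstract (723 NTRs with an open reading frame and 1,417 non-coding NTRs). The following is a minimal sketch of that check; the variable names are illustrative.

```python
# NTR counts as reported in the abstract of this paper.
TOTAL_NTRS = 2140
WITH_ORF = 723        # NTRs containing an open reading frame
NON_CODING = 1417     # NTRs classified as non-coding RNAs

# The two classes should account for every NTR.
assert WITH_ORF + NON_CODING == TOTAL_NTRS

non_coding_pct = round(100 * NON_CODING / TOTAL_NTRS, 1)
with_orf_pct = round(100 * WITH_ORF / TOTAL_NTRS, 1)

print(f"Non-coding NTRs: {non_coding_pct}%")
print(f"NTRs with ORF:   {with_orf_pct}%")
```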

Plant Mol Biol

Transcriptome sequence serves as a complement to the draft sequence of the peach genome

Compared with the reference genomes of apple and strawberry, the peach reference genome has a high-quality sequence assembly (The International Peach Genome Initiative 2013). However, most of the predicted genes contain no UTRs, suggesting that there is considerable room for improving the annotation of the peach genome. In this study, the UTRs of the peach reference genes have been extended to an average size of 876 bp. These UTRs are very useful for digital gene expression profiling because UTR sequences are unique and can serve as gene tags (Nishiyama et al. 2012). Moreover, our study also reveals 2,983 novel transcripts. Surprisingly, these new transcripts include 15 S-locus genes (6 encoding S-haplotype-specific S-RNase and 9 encoding S-haplotype-specific F-box protein). All of these S-locus genes are highly expressed in the tested tissues. It is well known that S-locus genes are responsible for self-incompatibility in fruit trees of the Rosaceae such as almond, pear and apple (Wang et al. 2009). Peach, unlike its close relative almond, is self-fertile. Thus, it is not clear whether these S-locus genes have retained the same function as their orthologs since the divergence of peach from other Rosaceae species. In addition, 10,835 alternative splicing events and 2,461 non-synonymous SNPs have also been identified in this study. The expressed SNPs are a highly informative resource for genotyping applications such as the design of a peach SNP chip.

In short, our study reveals for the first time the complexity of the peach transcriptome, and provides extensive new knowledge about alternative splicing, NTRs, and gene boundaries. The results will not only serve as a complement to the predicted gene database of peach, but also provide an invaluable resource for functional genomics research in peach and other fruit trees in the future.

Acknowledgments This project was supported by funds received from the National 863 Program of China (No. 2011AA100206), the National 948 Project from the Ministry of Agriculture of China, and the National Natural Science Foundation of China (Nos. 31201604 and 31000139).

References

Ahmad R, Parfitt DE, Fass J, Ogundiwin E, Dhingra A, Gradziel TM, Lin D, Joshi NA, Martínez-García PJ, Crisosto CH (2011) Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection. BMC Genomics 12:569

Arús P, Verde I, Sosinski B, Zhebentyayeva T, Abbott AG (2012) The peach genome. Tree Genet Genomes 8:1–17

Barbazuk WB, Fu Y, McGinnis KM (2008) Genome wide analyses of alternative splicing in plants: opportunities and challenges. Genome Res 18:1381–1392

Bonghi C, Trainotti L, Botton A, Tadiello A, Rasori A, Ziliotto F, Zaffalon V, Casadoro G, Ramina A (2011) A microarray approach to identify genes involved in seed-pericarp cross-talk and development in peach. BMC Plant Biol 11:107

Boudehri K, Bendahmane A, Cardinet G, Troadec C, Moing A, Dirlewanger E (2009) Phenotypic and fine genetic characterization of the D locus controlling fruit acidity in peach. BMC Plant Biol 19:59

Brandi F, Bar E, Mourgues F, Horváth G, Turcsi E, Giuliano G, Liverani A, Tartarini S, Lewinsohn E, Rosati C (2011) Study of ‘Redhaven’ peach and its white-fleshed mutant suggests a key role of CCD4 carotenoid dioxygenase in carotenoid and norisoprenoid volatile metabolism. BMC Plant Biol 11:24

Campbell MA, Haas BJ, Hamilton JP, Mount SM, Buell CR (2006) Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics 7:327

Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R (2000) Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics 56:847–854

Cavagnaro PF, Senalik DA, Yang L, Simon PW, Harkins TT, Kodira CD, Huang S, Weng Y (2010) Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.). BMC Genomics 11:569

Chagné D, Gasic K, Crowhurst RN, Han Y, Bassett HC, Bowatte DR, Lawrence TJ, Rikkerink EH, Gardiner SE, Korban SS (2008) Development of a set of SNP markers present in expressed genes of the apple. Genomics 92:353–358

Dirlewanger E, Cosson P, Howad W, Capdeville G, Bosselut N, Claverie M, Voisin R, Poizat C, Lafargue B, Baron O, Laigret F, Kleinhentz M, Arús P, Esmenjaud D (2004) Microsatellite genetic linkage maps of myrobalan plum and an almond-peach hybrid-location of root-knot nematode resistance genes. Theor Appl Genet 109:827–838

Heo JB, Sung S (2011) Vernalization-mediated epigenetic silencing by a long intronic noncoding RNA. Science 331:76–79

Jiménez S, Li ZG, Reighard GL, Bielenberg DG (2010a) Identification of genes associated with growth cessation and bud dormancy entrance using a dormancy-incapable tree mutant. BMC Plant Biol 10:25

Jiménez S, Reighard GL, Bielenberg DG (2010b) Gene expression of DAM5 and DAM6 is suppressed by chilling temperatures and inversely correlated with bud break. Plant Mol Biol 73:157–167

Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302:2141–2144

Joobeur T, Periam N, de Vicente MC, King GJ, Arús P (2000) Development of a second generation linkage map for almond using RAPD and SSR markers. Genome 43:649–655

Jung S, Abbott A, Jesudurai C, Tomkins J, Main D (2005) Frequency, type, distribution and annotation of simple sequence repeats in Rosaceae ESTs. Funct Integr Genomics 5:136–143

Kantety V, Rota L, Matthews E, Sorrells E (2002) Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol 48:501–510

Khan MA, Han Y, Zhao YF, Korban SS (2012) A high-throughput apple SNP genotyping platform using the GoldenGate™ assay. Gene 494:196–201

Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25

Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754–1760

Li ZG, Reighard GL, Abbott AG, Bielenberg DG (2009) Dormancy associated MADS genes from the EVG locus of peach [Prunus persica (L.) Batsch] have distinct seasonal and photoperiodic expression patterns. J Exp Bot 60:3521–3530

Lijavetzky D, Cabezas JA, Ibáñez A, Rodríguez V, Martínez-Zapater JM (2007) High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology. BMC Genomics 8:424

Livio T, Tadiello A, Casadoro G (2007) Variations of the peach fruit transcriptome during ripening and in response to hormone treatments. Caryologia 60:156–159

Lu T, Lu G, Fan D, Zhu C, Li W, Zhao Q, Feng Q, Zhao Y, Guo Y, Li W, Huang X, Han B (2010) Function annotation of the rice transcriptome at single-nucleotide resolution by RNA-seq. Genome Res 20:1238–1249

Martínez-Gómez P, Crisosto CH, Bonghi C, Rubio M (2011) New approaches to Prunus transcriptome analysis. Genetica 139:755–769

McHale L, Tan X, Koehl P, Michelmore RW (2006) Plant NBS-LRR proteins: adaptable guards. Genome Biol 7:212

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628

Newcomb RD, Crowhurst RN, Gleave AP, Rikkerink EH, Allan AC, Beuning LL, Bowen JH, Gera E, Jamieson KR, Janssen BJ, Laing WA, McArtney S, Nain B, Ross GS, Snowden KC, Souleyre EJ, Walton EF, Yauk YK (2006) Analyses of expressed sequence tags from apple. Plant Physiol 141:147–166

Nishiyama R, Le DT, Watanabe Y, Matsui A, Tanaka M et al (2012) Transcriptome analyses of a salt-tolerant cytokinin-deficient mutant reveal differential regulation of salt stress response by cytokinin deficiency. PLoS One 7:e32124

Shen S, Park JW, Huang J, Dittmar KA, Lu ZX, Zhou Q, Carstens RP, Xing Y (2012) MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic Acids Res 40:e61

Shulaev V, Korban SS, Sosinski B, Abbott AG, Aldwinckle HS, Folta KM, Iezzoni A, Main D, Arús P, Dandekar AM, Lewers K, Gardiner SE, Potter D, Veilleux E (2008) Multiple models for Rosaceae genomics. Plant Physiol 147:985–1003

Socquet-Juglard D, Kamber T, Pothier JF, Christen D, Gessler C, Duffy B, Patocchi A (2013) Comparative RNA-Seq analysis of early-infected peach leaves by the invasive phytopathogen Xanthomonas arboricola pv. pruni. PLoS One 8:e541969

Soria-Guerra RE, Rosales-Mendoza S, Gasic K, Wisniewski ME, Band M, Korban SS (2011) Gene expression is highly regulated in early developing fruit of apple. Plant Mol Biol Rep 29:885–897

Swiezewski S, Liu F, Magusin A, Dean C (2009) Cold-induced silencing by long antisense transcripts of an Arabidopsis Polycomb target. Nature 462:799–802

Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res 11:1441–1452

The International Peach Genome Initiative (2013) The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet. doi:10.1038/ng.2586

Tong Z, Gao Z, Wang F, Zhou J, Zang Z (2009) Selection of reliable reference genes for gene expression studies in peach using real-time PCR. BMC Mol Biol 10:71

Trainotti L, Bonghi C, Ziliotto F, Zanin D, Rasori A, Casadoro G, Ramina A, Tonutti P (2006) The use of microarray µPEACH1.0 to investigate transcriptome changes during transition from preclimacteric to climacteric phase in peach fruit. Plant Sci 170:606–613

Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–1111

Vecchietti A (2009) Comparative analysis of expressed sequence tags from tissues in ripening stages of peach. Tree Genet Genomes 5:377–391

Verde I, Bassil N, Scalabrin S, Gilmore B, Lawley CT, Gasic K, Micheletti D, Rosyara UR, Cattonaro F, Vendramin E, Main D, Aramini V, Blas AL, Mockler TC, Bryant DW, Wilhelm L, Troggio M, Sosinski B, Aranzana MJ, Arús P, Iezzoni A, Morgante M, Peace C (2012) Development and evaluation of a 9K SNP array for peach by internationally coordinated SNP detection and validation in breeding germplasm. PLoS One 7:e35668

Wang C, Xu G, Jiang X, Chen G, Wu J, Wu H, Zhang S (2009) S-RNase triggers mitochondrial alteration and DNA degradation in the incompatible pollen tube of Pyrus pyrifolia in vitro. Plant J 57:220–229

Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164

Xin MM, Wang Y, Yao YY, Song N, Hu ZR, Qin DD, Xie CJ, Peng HR, Ni ZF, Sun QX (2011) Identification and characterization of wheat long non-protein coding RNAs responsive to powdery mildew infection and heat stress by using microarray analysis and SBS sequencing. BMC Plant Biol 11:61–73

Yamamoto T, Mochida K, Imai T, Shi IZ, Ogiwara I, Hayashi T (2002) Microsatellite markers in peach [Prunus persica (L.) Batsch] derived from an enriched genomic and cDNA libraries. Mol Ecol Notes 2:298–302

Zhang Q, Ma B, Li H, Chang Y, Han Y, Li J, Wei G, Zhao S, Khan MA, Zhou Y, Gu C, Zhang X, Han Z, Korban SS, Li S, Han Y (2012) Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple. BMC Genomics 13:537

Zhebentyayeva T, Swire-Clark G, Georgi L, Garay L, Jung S, Forrest S, Blenda A, Blackmon B, Mook J, Horn R, Howad W, Arús P, Main D, Tomkins J, Sosinski B, Baird W, Reighard G, Abbott A (2008) A framework physical map for peach, a model Rosaceae species. Tree Genet Genomes 4:745–756
