Genome-wide association analysis


Candidate gene selection: the more supporting evidence, the better
Biochemical evidence (a clear pathway)
QTL evidence
Other molecular evidence (e.g., expression, microarray, etc.)
[Figure: type I error rates (false positives) of the Simple, Q, K, Q+K and GC models for flowering time (high population structure), ear height (moderate population structure) and ear diameter (low population structure); panels a-c plot cumulative P against observed P]
The Q + K model had the highest power to detect SNPs with true effects.
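GC in these model comparisons presumably refers to genomic control, which rescales each test statistic by an inflation factor λ estimated from the median chi-square statistic across markers. The sketch below is a minimal illustration, assuming two-sided P-values from 1-df tests; the function names are hypothetical and this is not TASSEL's or the cited study's implementation.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936423119572

def chi2_from_p(p):
    """Convert a two-sided P-value (0 < p <= 1) into a 1-df chi-square statistic."""
    z = -statistics.NormalDist().inv_cdf(p / 2.0)  # |z| whose two-sided tail is p
    return z * z

def genomic_control(p_values):
    """Hypothetical helper: genomic-control inflation factor and adjusted P-values."""
    stats = [chi2_from_p(p) for p in p_values]
    lam = max(statistics.median(stats) / CHI2_1DF_MEDIAN, 1.0)  # never deflate below 1
    # Divide each statistic by lambda and convert back to a two-sided P-value.
    adjusted = [math.erfc(math.sqrt(s / lam) / math.sqrt(2.0)) for s in stats]
    return lam, adjusted

lam, adj = genomic_control([0.01, 0.02, 0.05, 0.1, 0.2])
print(round(lam, 2), [round(p, 3) for p in adj])
```

Here five P-values with a median of 0.05 give λ ≈ 8.4, and after correction the median P-value becomes 0.5, as expected under the null.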
Phenotyping
Nothing more to say: it is the most important step! Use multiple locations/replications.
[Figure: cumulative P against observed P for the Simple, Q, K, Q+K and GC models; power against genetic effect (phenotypic variation explained, %)]
A straight diagonal line indicates appropriate control of false positives.
[Figure: extent of LD along chromosomes Chr1 to Chr10, with LD decay distances ranging from <1 kb to 5-10 kb among chromosomal regions; r² plotted against distance from 0 to 5 Mb]
Haplotype analysis
LCYE associations across seasons
[Table: LCYE associations under the mixed model in each environment (2002, 2003, 2004, 2005) and averaged across environments, with the number of observations per environment]
Sequence part of the gene in the whole panel
Look for associations based on LD
Estimate the LD of the target gene
Align sequences using BioLign/BioEdit/Clustal
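Once partial gene sequences are aligned and SNPs called, lines can be grouped into haplotypes by concatenating their alleles at the SNPs of the target gene. A minimal sketch, assuming homozygous inbred lines with one call per SNP; `haplotype_frequencies` and the input layout are hypothetical:

```python
from collections import Counter

def haplotype_frequencies(snp_calls):
    """Hypothetical helper: count multi-SNP haplotypes across inbred lines.

    snp_calls maps line name -> tuple of alleles at the gene's SNPs, in
    positional order; lines with any missing call (None) are skipped.
    Returns (haplotype, count, frequency) tuples, most common first."""
    haps = Counter("".join(calls) for calls in snp_calls.values() if None not in calls)
    total = sum(haps.values())
    return [(hap, count, count / total) for hap, count in haps.most_common()]

calls = {
    "Line1": ("A", "G", "T"),
    "Line2": ("A", "G", "T"),
    "Line3": ("G", "G", "C"),
    "Line4": (None, "G", "T"),  # missing call, excluded
}
for hap, count, freq in haplotype_frequencies(calls):
    print(hap, count, round(freq, 2))
```

Each haplotype can then be treated as a multi-allelic marker in the association test.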
[Figure: carotenoid biosynthesis pathway, from IPP/DMADP and GGPP through phytoene, lycopene, α-, β-, γ- and δ-carotene and β-cryptoxanthin to lutein, zeaxanthin, antheraxanthin, violaxanthin and ABA, with pathway genes (e.g., DXS, DXR, GGPS, PSY1, PSY2, PDS, ZDS, CRTISO, LCY-e, LCY-b, HYD-b, HYD-e, ZEP1, VDE1) placed on chromosome bins 6.00 to 10.08]
[Table (partial): sites 708, 753, 1003; SNPs PZB01400.2, PZB00728.1, LYCE.4; MAF 0.063, 0.326, 0.313; candidate genes zmAO (aldehyde oxidase), acp (acyl carrier protein), lcye (lycopene epsilon-cyclase); further sites 1257, 1305, 1379; SNP PZB01482.3]
Population structure
False positives and power
Section 2
Various association samples
[Figure: association samples (a-e) differing in population structure and familial relatedness]
Yu et al., Nat Genet 38: 203-208 (2006)
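The K term in Q+K-type models is a kinship matrix among the lines. The sketch below uses simple identity-by-state (IBS) similarity as a stand-in; this is an assumption for illustration, and the relative kinship estimator used in the cited work may differ. `ibs_kinship` is a hypothetical name.

```python
def ibs_kinship(genotypes):
    """Hypothetical helper: pairwise identity-by-state similarity between lines.

    genotypes maps line name -> list of marker calls (e.g. 'A', 'G'), all of
    the same length; None marks missing data. Returns a nested dict where
    K[a][b] is the fraction of markers, scored in both lines, with identical calls."""
    names = list(genotypes)
    K = {a: {} for a in names}
    for a in names:
        for b in names:
            shared = scored = 0
            for x, y in zip(genotypes[a], genotypes[b]):
                if x is None or y is None:
                    continue  # skip markers missing in either line
                scored += 1
                shared += x == y  # True counts as 1
            K[a][b] = shared / scored if scored else float("nan")
    return K

lines = {
    "L1": ["A", "G", "T", "C"],
    "L2": ["A", "G", "C", "C"],
    "L3": ["G", "A", "C", "T"],
}
K = ibs_kinship(lines)
print(K["L1"]["L2"], K["L1"]["L3"], K["L2"]["L3"])
```

IBS similarity is easy to compute but does not correct for allele frequencies, which is one reason dedicated kinship estimators are usually preferred in practice.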
Section 3
Association analysis: TASSEL
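Several issues worth discussing
Allele frequency
Haplotype analysis
Effect of LD
Allele frequency
The frequencies of functional sites often deviate strongly from 1:1, which is consistent with biological logic. Examples: the VA gene and the drought-tolerance gene.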
[Figure: early carotenoid pathway from GGPP, with the enzymes PSY, PDS, Z-ISO and ZDS/CRTISO]
See another presentation
Estimate the LD of the target gene
Software: TASSEL, as demonstrated by Xiaohong; results are shown in two ways.
Linkage disequilibrium
[Figure: two loci with alleles A/a and B/b]
See the review by Yang Xiaohong et al., Acta Agronomica Sinica (作物学报), 2007.
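LD between two biallelic loci A/a and B/b is commonly summarized by r² = D² / [p_A(1 - p_A) p_B(1 - p_B)], where D = p_AB - p_A p_B. A minimal sketch computing it from phased two-locus haplotypes; the input format and `ld_r2` are hypothetical:

```python
from collections import Counter

def ld_r2(haplotypes):
    """Hypothetical helper: r^2 between two biallelic loci.

    haplotypes: list of two-letter strings such as 'AB', 'Ab', 'aB', 'ab'
    (upper case = allele A or B, lower case = allele a or b)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_A = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_B = sum(c for h, c in counts.items() if h[1] == "B") / n
    p_AB = counts["AB"] / n
    D = p_AB - p_A * p_B  # coefficient of linkage disequilibrium
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    return D * D / denom if denom > 0 else float("nan")  # undefined if a locus is monomorphic

print(ld_r2(["AB", "AB", "ab", "ab"]))  # 1.0: alleles always travel together
print(ld_r2(["AB", "Ab", "aB", "ab"]))  # 0.0: loci in linkage equilibrium
```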
[Figure: power of the Simple, Q, K, Q+K and GC models against genetic effect 0, 0.2, 0.4, 0.6, 0.8 and 1 (0, 0.8, 3.3, 7.1, 11.9 and 17.4% of phenotypic variation explained)]
[Table (partial): MAF 0.056, 0.085, 0.481, 0.061, 0.076; candidate genes set105 (SET domain-containing protein), set104 (SET domain-containing protein), mitochondrial phosphate transporter, zmet3 (DNA cytosine methyltransferase), putative SF16 protein]
[Table: counts for LCYE polymorphisms (SNP216, 3'TE, 5'TE) and HYDB1 polymorphisms (D4, 3'TE) in populations P1, P2 and P3]
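[Figure: carotenoid branch point: lycopene is converted by LCYE and LCYB (via δ-carotene) to α-carotene, or by LCYB (via γ-carotene) to β-carotene, which HYDb then hydroxylates]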
[Table (partial): sites 1383, 1429, 1455, 1486, 1497; SNPs PZA03371.2, PZB01389.1, PZA03637.3, PZA03635.1, PZB01186.1, PZA03573.4, PZA03395.2; MAF 0.110, 0.052, 0.430; candidate genes gn1 (homeobox transcription factor), ? abi1 (ABA insensitive 1)]
[Figure: LD decay, r² (0 to 1) plotted against distance between sites (0 to 2500 bp)]
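An LD decay distance like those summarized here can be read off by binning marker pairs by physical distance and finding where the mean r² first falls below a cutoff. A sketch under stated assumptions: the 250 bp bin width and the r² < 0.1 cutoff are illustrative choices, not values from the source, and `ld_decay_distance` is hypothetical.

```python
def ld_decay_distance(pairs, bin_size=250, threshold=0.1):
    """Hypothetical helper: distance (bp) at which mean r^2 first drops below threshold.

    pairs: iterable of (distance_bp, r2) for marker pairs. Pairs are grouped
    into bins of width bin_size; the lower edge of the first bin (in order of
    distance) whose mean r^2 is below threshold is returned, or None."""
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(int(dist // bin_size), []).append(r2)
    for b in sorted(bins):  # bins without any marker pairs are simply skipped
        values = bins[b]
        if sum(values) / len(values) < threshold:
            return b * bin_size
    return None

pairs = [(50, 0.8), (100, 0.6), (300, 0.4), (400, 0.3),
         (600, 0.12), (700, 0.09), (900, 0.05), (1200, 0.02)]
print(ld_decay_distance(pairs))
```

The estimate depends on both the bin width and the cutoff, so these should be reported alongside any decay distance; fitting a decay curve is a common alternative to simple binning.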
Population development
[Table: LD decay distance (e.g., 2-5 kb) in total, for each chromosome (Chr1 to Chr10) and on average]
Diverse inbred lines are the best choice for developing an association mapping panel.
Discussion of some issues in association analysis
1) Candidate gene strategy
2) Genome-wide strategy
Line    Genotype
Line1   AA
Line2   GG
Line3   AA
Line4   AA
Line5   GG
Line6   GG
Line7   AA
Line8   GG
Line9   AA
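For genotypes like these, a simple model (no Q or K) tests the SNP by regressing phenotype on allele count, and the squared correlation gives the proportion of phenotypic variation explained by the marker. The phenotype values below are invented for illustration, and `marker_r2` is a hypothetical name.

```python
def marker_r2(genotypes, phenotypes, allele="G"):
    """Hypothetical helper for a simple model (no Q or K): the proportion of
    phenotypic variance explained by the count of `allele` at one SNP,
    i.e. the R^2 of an ordinary least-squares regression on allele count."""
    x = [g.count(allele) for g in genotypes]  # 'AA' -> 0, 'GG' -> 2
    n = len(x)
    mx, my = sum(x) / n, sum(phenotypes) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype is monomorphic")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, phenotypes))
    return sxy * sxy / (sxx * syy)

# Genotypes of Line1..Line9 from the table; phenotypes are invented for illustration.
genotypes = ["AA", "GG", "AA", "AA", "GG", "GG", "AA", "GG", "AA"]
phenotypes = [10.1, 12.3, 9.8, 10.4, 11.9, 12.0, 10.0, 12.5, 9.9]
print(round(marker_r2(genotypes, phenotypes), 3))
```

Because this model ignores structure and kinship, a high R² can also arise when the allele simply tracks subpopulation membership, which is the false-positive problem the Q, K and Q+K models address.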
Candidate gene selection
Population development
Gene sequencing
Phenotyping
Association analysis
The Q + K model has the best Type I error control, which matters most when the trait is correlated with population structure (e.g., flowering time).
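The Q+K mixed model is usually written y = Xβ + Sα + Qv + Zu + e, with the population-structure matrix Q as fixed covariates and a random polygenic effect u whose covariance is proportional to the kinship matrix K. Below is a minimal generalized least squares sketch of that idea, assuming the variance components are already known (in practice they are estimated, e.g. by REML) and one observation per line (Z = I); scaling constants on K are absorbed into `sigma_g2`. `qk_snp_test` is hypothetical and is not TASSEL's implementation. If the columns of Q sum to 1, drop one column to avoid collinearity with the intercept.

```python
import math
import numpy as np

def qk_snp_test(y, snp, Q, K, sigma_g2, sigma_e2):
    """Hypothetical helper: Wald test of one SNP in a Q+K-style mixed model.

    y: (n,) phenotypes; snp: (n,) allele counts; Q: (n, q) population
    structure covariates; K: (n, n) kinship matrix. Assumes one observation
    per line and known variance components sigma_g2 and sigma_e2."""
    n = len(y)
    W = np.column_stack([np.ones(n), Q, snp])  # fixed effects: intercept, Q, SNP
    V = sigma_g2 * K + sigma_e2 * np.eye(n)    # covariance of y given the fixed effects
    Vinv_W = np.linalg.solve(V, W)
    Vinv_y = np.linalg.solve(V, y)
    cov_b = np.linalg.inv(W.T @ Vinv_W)        # covariance of the GLS estimates
    b = cov_b @ (W.T @ Vinv_y)                 # GLS estimates of all fixed effects
    alpha = b[-1]                              # SNP effect (last column of W)
    z = alpha / math.sqrt(cov_b[-1, -1])
    return float(alpha), math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value

# Simulated example (all values invented): 200 lines, 500 background markers.
rng = np.random.default_rng(0)
n, m = 200, 500
M = rng.integers(0, 2, size=(n, m)) * 2.0              # inbred lines: 0 or 2 copies
Mc = M - M.mean(axis=0)
K = Mc @ Mc.T / m                                      # marker-based relationship matrix
Q = rng.random((n, 1))                                 # one structure covariate
snp = rng.integers(0, 2, size=n) * 2.0
u = Mc @ rng.normal(0.0, math.sqrt(0.5 / m), size=m)   # polygenic effect, Var(u) = 0.5 K
e = rng.normal(0.0, math.sqrt(0.5), size=n)
y = 1.0 + 0.5 * Q[:, 0] + 0.8 * snp + u + e            # true SNP effect 0.8
alpha, p = qk_snp_test(y, snp, Q, K, 0.5, 0.5)
print(round(alpha, 2), p)
```

Setting `sigma_g2` to 0 and dropping Q reduces this to the Simple model, which makes the role of each term in the comparison easy to explore.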
Statistical power
Flowering time (High population structure)
Site  SNP         MAF    Candidate or nearest gene(s)
21    PZB01403.4  0.054  zmAO (aldehyde oxidase)
24    PZD00056.3  0.212  mads2 (MADS box protein 2)
144   PZB02194.1  0.373  ivr1 (invertase gene)
221   PZD00027.3  0.090  zmm16 (putative MADS-domain transcription factor)
307   PZB00137.1  0.420  pif3 (Phytochrome Interacting Factor 3)
563   PZA03301.5  0.056  Harpin-induced 1 domain containing protein
[Figure (panels d-f): adjusted average power of the Simple, Q, K, Q+K and GC models for flowering time (high population structure), ear height (moderate population structure) and ear diameter (low population structure)]
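[Figure: hydroxylation steps: α-carotene is converted via zeinoxanthin to lutein (HYDb1, HYDE), and β-carotene via β-cryptoxanthin to zeaxanthin (HYDb), leading on to ABA]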
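Low-frequency sites require a sufficiently large population, and a few abnormal sites can seriously distort the results.
Before association analysis: 1) process the phenotypic data and determine whether they are suitable for association analysis
2) keep markers with MAF > 0.05 (or 0.01)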
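The MAF threshold can be applied per marker before any testing. A minimal sketch, assuming homozygous inbred calls such as 'AA' and 'GG'; `minor_allele_frequency` and `filter_by_maf` are hypothetical names. The first marker reuses the nine-line genotype example from this deck (five AA and four GG lines, so MAF = 8/18 ≈ 0.44).

```python
from collections import Counter

def minor_allele_frequency(calls):
    """Hypothetical helper: MAF of one biallelic marker from inbred calls
    such as 'AA' or 'GG'; None marks a missing call."""
    alleles = Counter(a for call in calls if call is not None for a in call)
    if len(alleles) > 2:
        raise ValueError("marker is not biallelic")
    if len(alleles) < 2:
        return 0.0  # monomorphic (or all missing)
    return min(alleles.values()) / sum(alleles.values())

def filter_by_maf(markers, min_maf=0.05):
    """Keep markers whose MAF exceeds min_maf (0.05 by default; 0.01 is the looser option)."""
    return {name: calls for name, calls in markers.items()
            if minor_allele_frequency(calls) > min_maf}

markers = {
    "snp1": ["AA", "GG", "AA", "AA", "GG", "GG", "AA", "GG", "AA"],  # Line1..Line9
    "snp2": ["AA"] * 19 + ["GG"],                                     # rare allele, MAF 0.05
    "snp3": ["CC"] * 10 + [None],                                     # monomorphic
}
print(sorted(filter_by_maf(markers)))
```

Only `snp1` passes the 0.05 threshold here; lowering `min_maf` to 0.01 also keeps the rare `snp2`, which, as noted above, then requires a large enough population to be tested reliably.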
Summary of candidate genes in the 17 loci associated with metabolite traits
[Figure: carotenoid pathway genes (e.g., HYD1, HYD2, IspF, ZDS, DXS, IPP1, IPP2, LYCe) placed on chromosome bins 6.02 to 10.05; α-branch: δ-carotene, LCY-b, α-carotene, HYD-e, lutein]
Next potential pathways
VE pathway
Oil pathway
Disease
ABA ...
Gene sequencing and alignment
Sequence the whole gene in a core set
Look for potential functional SNPs/InDels and develop markers to score the whole panel
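Candidate SNPs and InDels can be located by scanning the aligned core-set sequences column by column. A minimal sketch, assuming the alignment has been exported as equal-length strings with '-' for gaps (for example from BioEdit); `find_polymorphisms` is hypothetical. Each gapped column is reported separately, so a multi-base InDel appears as adjacent columns.

```python
def find_polymorphisms(alignment):
    """Hypothetical helper: list polymorphic columns in an alignment.

    alignment maps line name -> aligned sequence (equal-length strings,
    '-' for gaps). Returns (kind, column, states) tuples with 0-based columns,
    where kind is 'InDel' if any line has a gap there and 'SNP' otherwise."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to one common length")
    sites = []
    for col in range(lengths.pop()):
        states = {seq[col].upper() for seq in alignment.values()}
        states.discard("N")  # ignore ambiguous base calls
        if len(states) > 1:
            kind = "InDel" if "-" in states else "SNP"
            sites.append((kind, col, sorted(states)))
    return sites

alignment = {
    "Line1": "ACGT-A",
    "Line2": "ACGTTA",
    "Line3": "ATGT-A",
}
for site in find_polymorphisms(alignment):
    print(site)
```

The reported sites are only candidates; markers for scoring the whole panel would still be chosen by their likely function and LD with neighbouring sites.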