Bioinformatics Lab Guide, Lab 2: Using Ensembl


Lab 2: Using Ensembl

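1.1 On the Ensembl home page, select Human from the All genomes drop-down menu to view the details for this species. The human chromosomes and gene counts are shown in the figure; for the gene counts, refer mainly to the Alternative sequence panel. Genetic variation comprises Short Variants (329,179,721) and Structural variants (5,955,877).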

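1.2 On the Ensembl home page, search for MAPK4 in human. On the results page, set Restrict category to to Gene, which narrows the results to 117 entries. Open the target entry whose accession starts with ENSG and view the Gene-based displays.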

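1.2.1 This gene has 6 splice variants (transcripts) that differ in sequence length. Four of them encode proteins, and the encoded proteins differ in amino acid count.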
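The transcript counts in step 1.2.1 can also be checked programmatically. The following is a minimal sketch, assuming the public Ensembl REST service at rest.ensembl.org and its `lookup/symbol` endpoint with `expand=1`; the helper names (`lookup_url`, `fetch_gene`, `count_transcripts`) are illustrative and not part of the lab instructions, and the counts returned depend on the Ensembl release that is queried.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public REST server


def lookup_url(symbol, species="homo_sapiens"):
    """Build the URL for a gene-symbol lookup with transcripts expanded."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return REST_SERVER + path + "?expand=1"


def fetch_gene(symbol, species="homo_sapiens"):
    """Download the gene record as a dict (needs network access)."""
    request = urllib.request.Request(
        lookup_url(symbol, species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_transcripts(gene):
    """Return (all transcripts, protein-coding transcripts) for a gene record."""
    transcripts = gene.get("Transcript", [])
    coding = [t for t in transcripts if t.get("biotype") == "protein_coding"]
    return len(transcripts), len(coding)
```

A possible use is `total, coding = count_transcripts(fetch_gene("MAPK4"))`, which can then be compared with the numbers read from the Gene-based displays.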

1.2.2 Under Comparative Genomics > Genomic alignments, choose Multiple, then select the 27 amniota vertebrates Pecan alignment. In Configure this page, tick Show conservation regions. In the Alignments (text) view, the conserved regions are then highlighted in blue.

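1.2.3 MAPK4 is located at Chromosome 18: 50,560,078-50,731,824. It has 10 exons and 9 introns. The number of core exons can be read from the Sequence view, and the exon and intron counts can also be read from the gene structure diagram.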
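The arithmetic behind step 1.2.3 can be sketched as follows. The example parses the location string as written above, computes the length of the span (both coordinates are inclusive, so 1 is added), and derives the intron count from the exon count for a single linear transcript. The function names are illustrative only.

```python
import re


def parse_location(location):
    """Split 'Chromosome N: start-end' (commas allowed) into (chrom, start, end)."""
    match = re.fullmatch(
        r"\s*(?:Chromosome\s+)?(\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*", location
    )
    if match is None:
        raise ValueError("unrecognised location: {!r}".format(location))
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if end < start:
        raise ValueError("end coordinate is smaller than start coordinate")
    return match.group(1), start, end


def span_length(start, end):
    """Length in base pairs of an inclusive coordinate range."""
    return end - start + 1


def intron_count(exons):
    """A transcript with n exons has n - 1 introns between them."""
    return max(exons - 1, 0)


chrom, start, end = parse_location("Chromosome 18: 50,560,078-50,731,824")
print(chrom, span_length(start, end), intron_count(10))  # 18 171747 9
```

The gene region therefore spans 171,747 bp, and 10 exons imply the 9 introns reported above.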

1.2.4 MAPK4 belongs to the protein family PTHR24055_SF25 (2 genes). The other family members listed are MAPK4-001, MAPK4-002, MAPK4-003 and MAPK4-005.

1.2.5 The GO annotation describes the four protein-coding transcripts of MAPK4 in terms of molecular function, biological process and cellular component.

1.2.6 The genetic variation listed under Structural variants for this gene includes SNP, deletion, insertion, CNV (copy number variation) and short tandem repeat variation.

1.2.7 The Gene expression view shows the expression of MAPK4 across 32 experiments. Download the Table content as a TSV file and open it in Excel to inspect the expression values. MAPK4 is expressed to varying degrees in 144 human tissues (downloaded file attached). The tissues are:

cerebellum; spinal cord; diencephalon; midbrain; hindbrain; brain fragment; adrenal gland; forebrain; basal ganglion; temporal lobe; medulla oblongata; cerebral cortex; choroid plexus; telencephalon; heart; kidney; testis; ovary; lung; caudate nucleus; bronchus; cervix, uterine; heart muscle; nasopharynx; parathyroid gland; locus ceruleus; nucleus accumbens; umbilical cord; diencephalon and midbrain;

oral mucosa; telencephalic ventricle; globus pallidus; muscle of arm; prefrontal cortex; putamen; pons; brain; eye; hindbrain without cerebellum; frontal lobe; esophagus; pituitary and diencephalon; cerebellar hemisphere; right renal cortex; right renal pelvis; hippocampus; rectum; left kidney; left renal cortex; renal pelvis; hippocampal formation; endometrium; saliva-secreting gland; tonsil; thyroid gland; Brodmann (1909) area 9; duodenum; left renal pelvis; forebrain fragment; dorsal thalamus; Brodmann (1909) area 24; skeletal muscle of trunk; hindbrain fragment; small intestine; occipital lobe; brain meninx; hypothalamus; throat; thymus; forebrain and midbrain; placenta; adipose tissue; prostate gland; amygdala; gall bladder; parietal lobe; smooth muscle tissue; trachea; muscle of leg; colon;

seminal vesicle; liver; fallopian tube; urinary bladder; skeletal muscle tissue;

diaphragm; large intestine; sigmoid colon; epididymis; tibial artery; stomach; lymph node; olfactory apparatus; substantia nigra; hippocampus proper;

vermiform appendix; cortex of kidney; occipital cortex; atrium auricular region; zone of skin; bone marrow; middle frontal gyrus; middle temporal gyrus; C1 segment of cervical spinal cord; pancreas; breast; spleen; vagina; coronary artery; pituitary gland; heart left ventricle; mitral valve; vas deferens; esophagogastric junction; tongue; esophagus muscularis mucosa; pineal body; pulmonary valve; aorta; tibial nerve; uterus; ectocervix; endocervix; transverse colon; left cardiac atrium; tricuspid valve; lower leg skin; minor salivary gland; suprapubic skin; dura mater; esophagus mucosa; subcutaneous adipose tissue; artery; parotid gland; penis; small intestine Peyer's patch; uterine cervix; submandibular gland; soft tissue;

leukocyte; EBV-transformed lymphocyte; blood; greater omentum; transformed skin fibroblast
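As an alternative to opening the TSV in Excel, the tissue list can be extracted with a short script. This is only a sketch: it assumes a layout with one row per experiment, one column per tissue and an identifier column named `Experiment`, which may not match the downloaded file exactly. The values in `SAMPLE` are made-up placeholders used to show the mechanics, not real expression data.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def tissues_with_values(rows, id_columns=("Experiment",)):
    """Return the sorted names of columns that hold at least one non-empty cell."""
    found = set()
    for row in rows:
        for column, value in row.items():
            # Skip identifier columns and any overflow fields (key None).
            if column is None or column in id_columns:
                continue
            if value is not None and value.strip():
                found.add(column)
    return sorted(found)


# Placeholder table (hypothetical layout and values, for illustration only).
SAMPLE = (
    "Experiment\tcerebellum\tspinal cord\tliver\n"
    "experiment A\t1\t\t2\n"
    "experiment B\t\t\t3\n"
)

rows = read_tsv(io.StringIO(SAMPLE))
print(tissues_with_values(rows))  # ['cerebellum', 'liver']
```

For the real download, the table would be read with `open(path, newline="", encoding="utf-8")` in place of `io.StringIO(SAMPLE)`, and `id_columns` adjusted to the actual header.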

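1.2.8 The diagram under Regulation shows the Regulatory Build, which can be used to examine the gene's motif features, enhancers, promoters and transcription factor binding sites. Below the diagram, the function, sequence, sequence length and exact position of each regulatory region are listed.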

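1.2.9 Download the sequence of the biomolecule you are studying in RTF format and browse it in Word. The downloaded file is attached, and part of the sequence is shown below.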

50559478 CGGAATCCCAGGCCGGCTGGGGACCGGTGCACTTGGGCTCCGCGCCCCCTCGACCCTCGG 50559537
50559538 CCCAGTGCCCCTTCCCGCGCGCGCGGGTCTCCCCGGTTCCAGAGCCCACCGGTCCCCGCC 50559597
50559598 GGCTCCTTCTCCCCACCCACCCTCCCACCGGGCCCCCGGCGGCTGCAGCCGCGCGGGGCT 50559657
50559658 GGCGGGGCGGCGACCGGGCTCAGGCAGATCCCCGCTTCCCGCCTTCTCGGCGCCCCCTCC 50559717
50559718 CTCCCGGACGGAGCCCGAGGATCCCCCACCCACGGCGGGCGTGAGGAAGGGCTTCTGAGT 50559777
50559778 GACTGGAGCTCTACCGCGTGTGCCCCGGGAAGGCCAGGCTACCCGGGACGGGGCTCGGCT 50559837
50559838 CCCCAGGTGAGCTCGTCTCCGCGGGACTGGGTCCGGGAAGGCCCCAGGACCGCGCGGCTG 50559897
50559898 AGCGGCCTGGAGGCTGCGGGAGGGCAGAGCAGGGCGCGCGGGAGACTGCCGCCCCCGGGC 50559957
50559958 GCCCAGGGCCCGGCTCCCCAGCGCCACCGCCGCAGCAGGTGGGGGCCCAGTGGGCGGGGG 50560017
50560018 CGGGGCCCGGCTCTGGGCGGAGCCGAGGCGGCGGCGGCGCAGGCTGGGGCCGGGGCCGGG 50560077
50560078 GCGGGAGCCGGAGCCCGAGCTGGAGCAGCGAGCCGGGCTGTCGGGGCGACCGCGGGAGCT 50560137
50560138 CGCCGTGCGCCGTGGCTGGGACCGGCCTGGCCGAGCGCGCCGGCGCCGCGGCCGCAGACA 50560197
50560198 AAGGGCGGCTCGCGCCCGGGCCGCCACGCTCTCGGGCTCTGCCTCG GTAAGTGGCTCCCC 50560257
50560258 TCCGCTGGCTTTCTCCTCCCGCCGCCTGCGCCTCTCGGAGTTCGGCGGGCTCCGGAGAAG 50560317
50560318 CGGGGAAGAGATGAGACTTCCCCGCCCGCACTGCCTCCCCACCTTACCCTAACAATAAGC 50560377
50560378 CCCCCAGGCCAAGCCACTGCCAAACTAGCGAGTTTCCGAGCGGCGGGGGTCTCCCGCGGG 50560437
50560438 ACCCGCCCGGCTGCCCTGGGTGAGCTCCTCGCCTGCAGACCGCGCGCCGGTGCTGTCCTG 50560497
50560498 GACCCGTTTGGGATGGGAGGTTGCCGCTGGGCTCCTCGCGTTGTGTTTAGGGGAGGAGGA 50560557
50560558 CGCAGGGGCCGGGCGCCGCTAGGGGACCCCACCCCCGGGGACAGTCCGGAGCGCTTGGGG 50560617
50560618 TCGCCGAGGGGCAGTTCACACTGC GAGTTCAGATTCGGATCGCAGTCCCGATTATCCTCC 50560677
50560678 CCTCCAGCCTCTCCCTTTCTCGTTGAAGGGTTAATACAGCGTCCTCTCCCCTCGCCACCC 50560737
50560738 GACAGAGGCGCCTACACTGGCG GTAGGTAGCCCCTGGGAGAGGGGGAGTGGGGGGACCCC 50560797
50560798 GCCGCTTTCGCCGCTGGGCGACCCAGAGCCCCAGCCTGCCGGAGAGGGCAGCGGCTCGGG 50560857
50560858 TTTGACATCCCAGCTGGGTCCCGGGCCGGCTCCCTGAGCCTCCTCCCGGGTTGCTCTCTA 50560917
50560918 TCAGGAAAGCAATCGGAAGTCAGGCCGGCTTTTGCTTTTGTTCTGCCAGCTACTCTACGG 50560977
50560978 AATCGTAGGTGAAGCCGGGGTGGGCGGATGCCCCGGGAGGGGGCTGTGGCGGGAGTTCCA 50561037
50561038 GGTGCGTCCCCGAAATGACCATTGGAGGCGGCGGCTGTTTCCCGCCCCTGGGTGGGGAAT 50561097
50561098 GGATTCCGATCGCTAATCGATACCCTGGAGCCAGCAGTGGGTCAGCAGCGTCCCGACAGA 50561157

…………………………………………..
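The coordinate-numbered excerpt above can be turned back into a plain sequence with a few lines of code. The sketch below assumes each row has the form "start coordinate, bases, end coordinate" and that spaces inside a row are display markers rather than bases (an interpretation, not something the export states). It checks that every row holds exactly end - start + 1 bases and that consecutive rows are contiguous. `EXCERPT` contains the first two rows of the excerpt; the function names are illustrative.

```python
import re

# One row: start coordinate, bases (possibly with spaces), end coordinate.
ROW = re.compile(r"(\d+)\s+([ACGTNacgtn ]+?)\s+(\d+)")

EXCERPT = (
    "50559478 CGGAATCCCAGGCCGGCTGGGGACCGGTGCACTTGGGCTCCGCGCCCCCTCGACCCTCGG 50559537 "
    "50559538 CCCAGTGCCCCTTCCCGCGCGCGCGGGTCTCCCCGGTTCCAGAGCCCACCGGTCCCCGCC 50559597"
)


def parse_rows(text):
    """Return a list of (start, end, sequence) tuples, validating row lengths."""
    rows = []
    for start, bases, end in ROW.findall(text):
        start, end = int(start), int(end)
        sequence = "".join(bases.split()).upper()  # drop spaces inside the row
        if len(sequence) != end - start + 1:
            raise ValueError("row {}-{} has {} bases".format(start, end, len(sequence)))
        rows.append((start, end, sequence))
    return rows


def join_rows(rows):
    """Concatenate contiguous rows into (first start, last end, sequence)."""
    if not rows:
        raise ValueError("no rows to join")
    for (_, prev_end, _), (next_start, _, _) in zip(rows, rows[1:]):
        if next_start != prev_end + 1:
            raise ValueError("gap between {} and {}".format(prev_end, next_start))
    return rows[0][0], rows[-1][1], "".join(seq for _, _, seq in rows)


first, last, sequence = join_rows(parse_rows(EXCERPT))
print(first, last, len(sequence))  # 50559478 50559597 120
```

The same functions work on text copied out of the RTF file, whether the rows are on separate lines or run together.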

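2. In BioMart, enter the MAPK4 accession under Filters > GENE. Under Attributes, choose to output the gene and protein IDs of the human paralogs and of the orthologs in six species. Save the exported gene and protein IDs; they are used to download the homologous sequences. To download the sequences, first choose the corresponding Dataset. For homologous genes, select Sequences (unspliced gene) under Attributes and set Filters > GENE to Gene stable ID. For homologous proteins, select Sequences (peptide) under Attributes and set Filters > GENE to Protein stable ID. Export the sequences in FASTA format. (Downloaded sequences attached.)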
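The ID export in step 2 can also be expressed as a BioMart XML query, which is what the web interface generates behind its XML button. The sketch below only builds the query text and the request URL; it does not contact the server. The dataset name, filter name and attribute names shown in the usage note are assumptions based on common Ensembl BioMart naming and should be checked against the XML that BioMart itself produces. The gene ID is a placeholder, and mouse stands in for one of the six species, which the lab text does not name.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BIOMART_URL = "https://www.ensembl.org/biomart/martservice"  # assumed endpoint


def build_query(dataset, filters, attributes):
    """Build a BioMart <Query> element for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    node = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(node, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(node, "Attribute", {"name": name})
    return query


def to_xml(query):
    """Serialise the query with the header BioMart expects."""
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def query_url(xml_text):
    """Return a GET URL that submits the query to the mart service."""
    return BIOMART_URL + "?" + urllib.parse.urlencode({"query": xml_text})
```

A possible call, with every name treated as an assumption to verify, is `build_query("hsapiens_gene_ensembl", {"ensembl_gene_id": "ENSG_PLACEHOLDER"}, ["ensembl_gene_id", "hsapiens_paralog_ensembl_gene", "hsapiens_paralog_ensembl_peptide", "mmusculus_homolog_ensembl_gene", "mmusculus_homolog_ensembl_peptide"])`. Passing the result through `to_xml` and `query_url` gives a URL whose response is the same TSV table that the web export would save.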

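Accessions of the homologous sequences downloaded from BioMart

ensemble人类MAPK4旁系同源蛋白.txt (human MAPK4 paralogous proteins)

ensemble人类MAPK4直系同源蛋白.txt (human MAPK4 orthologous proteins)

ensemble人类MAPK4直系同源基因.txt (human MAPK4 orthologous genes)

ensemble 人类MAPK4旁系同源基因.txt (human MAPK4 paralogous genes)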
