    A Field Guide part 2.ppt

Format: PPT, 98 pages, 2.82 MB

A Field Guide, Part 2
August 30, 2005
University of Colorado Health Sciences Center

Part 2
• Entrez: text searching; a GenBank record; preview/index
• BLAST: sequence searching; pre-computed searches; algorithms; what's new?
• VAST: structure searching
• Example: mapping oligos to a genome

GenBank Records: The Flat File Format

A Typical GenBank Record
LOCUS       NM_019570    4279 bp    mRNA    linear   INV 28-OCT-2004
DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
ACCESSION   NM_019570
VERSION     NM_019570.3  GI:50811869
KEYWORDS    .

GenBank Record: Feature Table
• GenPept identifier

GenBank Record: Feature Table, cont.

GenBank Record: sequence (skip)

Indexing for Nucleotide UID 59958365
Field              Indexed Terms
primary accession  NM_001012399
title              Bos taurus hemochromatosis (hfe), mRNA.
organism           Bos taurus
sequence length    1168
modification date  2005/02/19
properties         biomol mrna; gbdiv mam; srcdb refseq

Global Entrez Search: HFE

Entrez Nucleotide: HFE
• 137 records
• Not HFE

Smarter Query
hfe[title] AND human[orgn]

hfe[title] AND human[orgn] (cont.)
• Primary data

Preview/Index
Preview/Index: Properties, srcdb
• AND srcdb refseq[Properties]
• AND srcdb ddbj/embl/genbank[Properties]

Database Queries
Search  Results  Query
#1      137      hfe
#2       42      hfe[title] AND human[orgn]
#3       11      #2 AND srcdb refseq[prop]
#4       31      #2 AND srcdb ddbj/embl/genbank[prop]
#5       29      #4 AND gbdiv pri[prop]
#4        2      #4 AND gbdiv est[prop]

Molecule Queries
Search  Results  Query
#1      116      hfe
#2       42      hfe[title] AND human[orgn]
#3       29      #2 AND biomol mrna[prop]
#4       13      #2 AND biomol genomic[prop]

More Queries
• Fields are database-specific

Other Entrez Databases
• UniSTS: markers on the Genethon map of human chromosome 12
  Genethon[Map Name] AND human[organism] AND 12[chromosome]
• UniGene: rat clusters that have at least one mRNA
  rat[organism] NOT 0[mrna count]
• Structure: structures of bacterial kinases with resolutions below 2 Å
  bacteria[organism] AND kinase AND 000.00:002.00[resolution]
• SNP: uniquely mapped microsatellites on human chr2
  (microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome]) AND human[orgn]

Basic Local Alignment Search Tool
[Chart: BLAST Web Searches, 2005; axis value 200,000]

Precomputed BLAST Services
• Nucleotide or protein: Related Sequences
• BLAST link: BLink
• Transcript clusters: UniGene
• Protein homologs: HomoloGene

Link to Related Sequences
Related Sequences: most similar . . . least similar

BLink (BLAST Link)
BLink Output

Global vs Local Alignment
Seq1: WHEREISWALTERNOW (16 aa)
Seq2: HEWASHEREBUTNOWISHERE (21 aa)

The Flavors of BLAST
• Standard BLAST: nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx); traditional "contiguous" word hit
• Megablast: optimized for large batch searches; can use discontiguous words
• PSI-BLAST: constructs PSSMs automatically and uses them as the query; very sensitive protein search
• RPS-BLAST: searches a database of PSSMs; tool for conserved domain searches
[Figure: "contiguous" vs discontiguous words]

Why Is BLAST So Popular?
• Fast: heuristic approach based on Smith-Waterman
• Local alignments
• Statistical significance: Expect value
• Versatile: blastn, blastp, blastx, tblastn, tblastx, rps-blast, psi-blast; www, standalone, and network clients

How BLAST Works
1. Make lookup table of "words" for query
2. Scan database for hits
3. Ungapped extensions of hits (initial HSPs)
4. Gapped extensions (no traceback)
5. Gapped extensions (traceback; alignment details)

Nucleotide Words
GTACTGGACAT
 TACTGGACATG
  ACTGGACATGG
   CTGGACATGGA
    TGGACATGGAC
     GGACATGGACC
      GACATGGACCC
       ACATGGACCCT
. . .
Make a lookup table of words

Protein Words
GTQ TQI QIT ITV TVE VED EDL DLF . . .
Make a lookup table of words (-f 11 = blastp default)

Minimum Requirements for a Hit
• Nucleotide BLAST requires one exact match
• Protein BLAST requires two neighboring matches within 40 aa (-A 40 = blastp default)
[Figure: protein query GTQITVEDLFYNISEI YYN with neighborhood words and two matches; nucleotide sequence ATCGCCATGCTTAATTGGGCTTCATGCTTAATT with one exact match]

BLASTP Summary
• High-scoring pair (HSP)

Scoring Systems - Nucleotides
     A   G   C   T
A   +1  -3  -3  -3
G   -3  +1  -3  -3
C   -3  -3  +1  -3
T   -3  -3  -3  +1
Identity matrix (-r 1 -q -3)

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||
CACGTAGCAAGCTTG-GTGTCA
raw score = 19 - 9 = 10

Scoring Systems - Proteins
Position-independent matrices
• PAM matrices (Percent Accepted Mutation)
  - Derived from observation; small dataset of alignments
  - Implicit model of evolution
  - All calculated from PAM1
  - PAM250 widely used
• BLOSUM matrices (BLOck SUbstitution Matrices)
  - Derived from observation; large dataset of highly conserved blocks
  - Each matrix derived separately from blocks with a defined percent identity cutoff
  - BLOSUM62: default matrix for BLAST
Position-Specific Score Matrices (PSSMs)
• PSI- and RPS-BLAST

BLOSUM62
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X

Position-Specific Score Matrix
DAF-1, serine/threonine protein kinases: catalytic loop
       A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
435 K -1  0  0 -1 -2  3  0  3  0 -2 -2  1 -1 -1 -1 -1 -1 -1 -1 -2
436 E  0  1  0  2 -1  0  2 -1  0 -1 -1  0  0  0 -1  0  0 -1 -1 -1
437 S  0  0 -1  0  1  1  0  1  1  0 -1  0  0  0  2  0 -1 -1  0 -1
438 N -1  0 -1 -1  1  0 -1  3  3 -1 -1  1 -1  0  0 -1 -1  1  1 -1
439 K -2  1  1 -1 -2  0 -1 -2 -2 -1 -2  5  1 -2 -2 -1 -1 -2 -2 -1
440 P -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1  0 -3  7 -1 -2 -3 -1 -1
441 A  3 -2  1 -2  0 -1  0  1 -2 -2 -2  0 -1 -2  3  1  0 -3 -3  0
442 M -3 -4 -4 -4 -3 -4 -4 -5 -4  7  0 -4  1  0 -4 -4 -2 -4 -1  2
443 A  4 -4 -4 -4  0 -4 -4 -3 -4  4 -1 -4 -2 -3 -4 -1 -2 -4 -3  4
444 H -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5  0 -5
445 R -4  8 -3 -4  0 -1 -2 -3 -2 -5 -4  0 -3 -2 -4 -3 -3  0 -4 -5
446 D -4 -4 -1  8 -6 -2  0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5
447 I -4 -5 -6 -6 -3 -4 -5 -6 -5  3  5 -5  1  1 -5 -5 -3 -4 -3  1
448 K  0  0  1 -3 -5 -1 -1 -3 -3 -5 -5  7 -4 -5 -3 -1 -2 -5 -4 -4
449 S  0 -3 -2 -3  0 -2 -2 -3 -3 -4 -4 -2 -4 -5  2  6  2 -5 -4 -4
450 K  0  3  0  1 -5  0  0 -4 -1 -4 -3  4 -3 -2  2  1 -1 -5 -4 -4
451 N -4 -3  8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5
452 I -3 -5 -5 -6  0 -5 -5 -6 -5  6  2 -5  2 -2 -5 -4 -3 -5 -3  3
453 M -4 -4 -6 -6 -3 -4 -5 -6 -5  0  6 -5  1  0 -5 -4 -3 -4 -3  0
454 V -3 -3 -5 -6 -3 -4 -5 -6 -5  3  3 -4  2 -2 -5 -4 -3 -5 -3  5
455 K -2  1  1  4 -5  0 -1 -2  1 -4 -2  4 -3 -2 -3  0 -1 -5 -2 -3
456 N  1  1  3  0 -4 -1  1  0 -3 -4 -4  3 -2 -5 -2  2 -2 -5 -4 -4
457 D -3 -2  5  5 -1 -1  1 -1  0 -5 -4  0 -2 -5 -1  0 -2 -6 -4 -5
458 L -3 -1  0 -3  0 -3 -2  3 -4 -2  3  0  1  1 -2 -2 -3  5 -1 -3

Local Alignment Statistics
High scores of local alignments between two random sequences follow the Extreme Value Distribution.
[Figure: number of alignments vs. score (S)]

Expect Value
E = number of database hits you expect to find by chance with score ≥ S
[Figure callouts: "your score"; "expected number of random hits"]
More info: www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

Gapped Alignments
• Gapping provides more biologically realistic alignments
• Gapped BLAST parameters are simulated for each scoring matrix
• Affine gap costs = -(a + bk), where a = gap open penalty, b = gap extend penalty, k = gap length
• A gap of length 1 receives the score -(a + b)

An Alignment BLAST Cannot Make
1    GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
1    GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

61   GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
61   GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121  GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
121  GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC
Reason: no contiguous exact match of 7 bp.

An Alignment BLAST Can Make
Solution: compare protein sequences; BLASTX
BLAST 2 Sequences (blastx) output:
Score = 290 bits (741), Expect = 7e-77
Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)
Frame = +3

Other BLAST Algorithms
• Megablast
• Discontiguous Megablast
• PSI-BLAST
• PHI-BLAST

Megablast: NCBI's Genome Annotator
• Long alignments of similar DNA sequences
• Greedy algorithm
• Concatenation of query sequences
• Faster than blastn; less sensitive

MegaBLAST & Word Size
• Trade-off: sensitivity vs speed
• Too fast for you?

Discontiguous Megablast
• Uses discontiguous word matches
• Better for cross-species comparisons

Templates for Discontiguous Words
W   t   coding                 non-coding
11  16  1101101101101101       1110010110110111
12  16  1111101101101101       1110110110110111
11  18  101101100101101101     111010010110010111
12  18  101101101101101101     111010110010110111
11  21  100101100101100101101  111010010100010010111
12  21  100101101101100101101  111010010110010010111
W = word size (# of matches in template); t = template length
Reference: Ma B, Tromp J, Li M. PatternHunter: faster and more sensitive homology search. Bioinformatics. 2002 Mar;18(3):440-5.

Discontiguous (Cross-species) MegaBLAST
Discontiguous Word Options

MegaBLAST vs Discontiguous MegaBLAST
Query: NM_017460 Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters), vs Drosophila
• MegaBLAST = "No significant similarity found."
• Discontiguous megaBLAST = [results screenshot]

Another Example . . .
Query: NM_078651 Drosophila melanogaster CG18582-PA (mbt) mRNA (3244 bp) /note="mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2"
Database: nr (nt), Mammalia[orgn]
• MegaBLAST = "No significant similarity found."
• Discontiguous megaBLAST = numerous hits . . .
Ex: Discontiguous MegaBLAST
Ex: BLASTN

PSI-BLAST: Position-specific Iterated BLAST
Example: Confirming relationships of purine nucleotide metabolism proteins
>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF
VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD
EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY
RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA
VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK
• E value cutoff for PSSM: 0.005
• RESULTS: Initial BLASTP: same results as protein-protein BLAST; different format
• Results of First PSSM Search: other purine nucleotide metabolizing enzymes not found by ordinary BLAST
• Tenth PSSM Search: Convergence. Just below threshold, another nucleotide metabolism enzyme

Reverse PSI-BLAST (RPS-BLAST)
Adenosine/AMP Deaminase Domain . . .

PHI-BLAST
>gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4
MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE
LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV
IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI
LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI
ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK
Pattern: [GA]xxxxGK[ST]

Genome BLAST
Genome BLAST via Map Viewer

Example Search Pathways: Hemochromatosis
Gene: "hemochromatosis" → HFE → nucleotide sequence

Example: Human Genome BLAST
Human Genome BLAST: Results
Human Genome BLAST: MapViewer

What's New?

BLAST Databases
Nucleotide
• refseq_rna = NM_*, XM_*
• refseq_genomic = NC_*, NG_*
• env_nt = environmental sample[filter], e.g., 16S rRNA
Protein
• refseq = NP_*, XP_*
• env_nr

New Formatter
• Select lower case
• Select red
• Gray line = same database hit; HSPs color-coded independently

BLAST Output: Alignments & Filter
• Low-complexity sequence filtered

Advanced Options
• Limit to Organism: all[filter] NOT ma

Example Entrez Queries
• all[Filter] NOT mammalia[Organism]
• ray finned fishes[Organism]
• srcdb refseq[Properties]
• Nucleotide only: biomol mrna[Properties], biomol genomic[Properties]
Other Advanced
• -e 10000   expect value
• -v 2000    descriptions
• -b 2000    alignments
Example: -e 10000 -v 2000

Searching by Structure
Why search for similar structures?
• Find homologs with low sequence similarity
• Explore protein evolution: similar protein folds can support different functions
• Identify conserved core elements to model related proteins of unknown structure

Indexing into MMDB
Structure: MMDB (Molecular Modeling DataBase)

Structure Summary
• Conserved Domains
• 3D Domain Neighbors
• Structure Neighbors
3D Domains: 1, 3, 2, 4
Conserved Domains: SH3, SH2

VAST: Alignment
For each protein chain:
1. Locate SSEs (secondary structure elements)
2. Represent SSEs as individual vectors
3. Align the vectors
[Figure: SSE vectors 1 to 6; Human IL-4; IL-4 & Leptin]

VAST: Structure neighbors (Taq DNA polymerase)
VAST Results for the Chain: Table view
VAST (Vector Alignment Search Tool): 3D Domain structure neighbors
VAST Results for Domain 1: Not found with Chain query!

Best way to convert PDB files to MMDB format for viewing with Cn3D!
• Submit file to PDB

Example: Mapping Oligos Onto a Genome
forward  CCATGGCGACCCTGGAAAAGC
reverse  CAGCAGCGGCTGTGCCTGCGG

Map Oligos Onto Genome
Query: CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG
Parameters: -W 7 -e 1000

Genome BLAST Results
Primer Alignments: forward primer, reverse primer
MapViewer
Sequence View (sv): forward, reverse

Service Addresses
• BLAST: blast-help@ncbi.nlm.nih.gov
• General Help: info@ncbi.nlm.nih.gov
• Wayne Matten: matten@ncbi.nlm.nih.gov
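The Entrez query slides (Smarter Query, Database Queries, Molecule Queries) build field-qualified searches such as hfe[title] AND human[orgn]. You can submit the same query strings programmatically through NCBI's E-utilities ESearch service. The minimal Python sketch below builds the request URL but does not fetch it. It makes three assumptions:

- It uses nuccore, the E-utilities database name for Entrez Nucleotide.
- The helpers `field` and `esearch_url` are hypothetical names.
- History references such as #2 belong to a web session, so the sketch substitutes the parenthesized query text for them.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (programmatic Entrez text search).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(term: str, tag: str) -> str:
    """Qualify a term with an Entrez field tag, e.g. field("hfe", "title") -> "hfe[title]"."""
    return f"{term}[{tag}]"


def esearch_url(db: str, query: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL; urlencode escapes brackets and spaces."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# "Smarter Query" (#2) and the RefSeq restriction (#3) from the slides.
# The session history reference #2 is replaced by the parenthesized query.
q2 = f"{field('hfe', 'title')} AND {field('human', 'orgn')}"
q3 = f"({q2}) AND {field('srcdb refseq', 'prop')}"

if __name__ == "__main__":
    print(esearch_url("nuccore", q3))
```

The record counts on the slides (137, 42, 11, ...) reflect the 2005 databases, so a live search today would return different numbers.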

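The Global vs Local Alignment slide contrasts two ways of aligning Seq1 and Seq2, and dynamic programming shows the difference compactly:

- Needleman-Wunsch (global) must align both sequences end to end.
- Smith-Waterman (local), which the slides name as the basis of BLAST's heuristic, floors every cell at zero and keeps the best cell anywhere in the table.

The sketch makes several simplifications:

- It uses a hypothetical scoring scheme (match +1, mismatch -1, linear gap -1) instead of BLOSUM62 and affine gaps.
- It returns only the score, not the alignment itself.
- `align_score` is a hypothetical name.

```python
def align_score(a: str, b: str, local: bool,
                match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best alignment score of a and b with a linear gap penalty.

    local=False: Needleman-Wunsch (global, end to end).
    local=True:  Smith-Waterman (local, scores floored at zero).
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for leading gaps in either sequence.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(dp[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       dp[i - 1][j] + gap,    # gap in b
                       dp[i][j - 1] + gap)    # gap in a
            if local:
                cell = max(cell, 0)           # a local alignment may start anywhere
                best = max(best, cell)        # and may end anywhere
            dp[i][j] = cell
    return best if local else dp[-1][-1]
```

For the slide's pair, compare `align_score("WHEREISWALTERNOW", "HEWASHEREBUTNOWISHERE", local=False)` with the `local=True` result. The local score is never lower than the global one, because the full global alignment is also one of the candidates the local recursion considers.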

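Steps 1 and 2 of "How BLAST Works" can be sketched in a few lines: build a lookup table of query words, then scan a database sequence for hits. This minimal Python version makes the following assumptions and simplifications:

- It uses exact-match words, as on the Nucleotide Words slide.
- It leaves out the neighborhood words used for proteins, the two-hit rule, and all extension steps. It illustrates seeding, not BLAST's implementation.
- The query GTACTGGACATGGACCCT is reassembled from the eight overlapping words on the slide.
- The function names are hypothetical.

```python
from collections import defaultdict


def build_lookup(query: str, w: int) -> dict[str, list[int]]:
    """Step 1: map every length-w word of the query to its start positions."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return dict(table)


def scan(subject: str, table: dict[str, list[int]], w: int) -> list[tuple[int, int]]:
    """Step 2: slide a length-w window along the subject and report
    (query_pos, subject_pos) for every exact word hit."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in table.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


# Query reassembled from the overlapping 11-letter words on the slide.
QUERY = "GTACTGGACATGGACCCT"
TABLE = build_lookup(QUERY, 11)
```

With w = 11 the table holds exactly the eight words listed on the slide. Running the same scan with w = 7 is the check behind "no contiguous exact match of 7 bp": if `scan` finds nothing, a word-seeded search has nothing to extend.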

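The nucleotide scoring slide reports raw score = 19 - 9 = 10 for CAGGTAGCAAGCTTGCATGTCA aligned with CACGTAGCAAGCTTG-GTGTCA under +1/-3 scoring (-r 1 -q -3). The slide does not state the gap costs, but they can be inferred:

- Counting columns gives 19 matches (+19) and 2 mismatches (-6).
- That leaves -3 for the single one-base gap.

The sketch below scores an aligned pair with the affine gap formula from the Gapped Alignments slide, -(a + bk). The values a = 2 and b = 1 are hypothetical, chosen only so that a gap of length 1 costs 3 and the slide's total is reproduced. The function names are also hypothetical.

```python
def affine_gap_score(k: int, a: int, b: int) -> int:
    """Score of one gap of length k with open penalty a and extend penalty b: -(a + b*k)."""
    return -(a + b * k)


def score_alignment(top: str, bottom: str, match: int, mismatch: int, a: int, b: int) -> int:
    """Raw score of an already-aligned pair of rows ('-' marks a gap).

    Simplification: consecutive gap columns count as one gap even if they
    switch from one row to the other.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score, gap_len = 0, 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gap_len += 1              # extend the current gap run
            continue
        if gap_len:                   # a gap run just ended: charge it once
            score += affine_gap_score(gap_len, a, b)
            gap_len = 0
        score += match if x == y else mismatch
    if gap_len:                       # alignment ended inside a gap
        score += affine_gap_score(gap_len, a, b)
    return score


top = "CAGGTAGCAAGCTTGCATGTCA"
bottom = "CACGTAGCAAGCTTG-GTGTCA"
# match/mismatch from -r 1 -q -3; a=2, b=1 are hypothetical gap costs.
raw = score_alignment(top, bottom, match=1, mismatch=-3, a=2, b=1)
```

Under an affine scheme one long gap is cheaper than several short gaps of the same total length, because the open penalty a is paid once per gap rather than once per residue.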

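The Protein Words and Minimum Requirements slides rely on neighborhood words. A hit seed is not limited to exact matches: any three-letter word whose BLOSUM62 score against a query word reaches the threshold (the slide's -f 11) also counts. The Python sketch does three things:

- It parses the BLOSUM62 values shown on the slide, leaving out the X row.
- It scores a word pair as the sum of per-position entries.
- It enumerates the neighborhood of one query word by brute force.

Treating the threshold as "score ≥ T" follows the usual BLAST convention. The function names are hypothetical, and real BLAST builds these word lists far more efficiently.

```python
from itertools import product

# BLOSUM62 lower triangle, copied from the slide (the X row is left out).
BLOSUM62_LOWER = """
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""


def parse_lower_triangle(text: str) -> tuple[list[str], dict[tuple[str, str], int]]:
    """Turn 'letter v1 v2 ...' rows into a symmetric lookup {(x, y): score}."""
    rows = [line.split() for line in text.strip().splitlines()]
    letters = [row[0] for row in rows]
    matrix: dict[tuple[str, str], int] = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row[1:]):  # row i holds scores against letters[0..i]
            matrix[letters[i], letters[j]] = int(value)
            matrix[letters[j], letters[i]] = int(value)
    return letters, matrix


LETTERS, B62 = parse_lower_triangle(BLOSUM62_LOWER)


def word_score(u: str, v: str) -> int:
    """Ungapped score of two equal-length words: sum of pairwise BLOSUM62 entries."""
    return sum(B62[x, y] for x, y in zip(u, v))


def neighborhood(word: str, threshold: int = 11) -> set[str]:
    """All words of the same length scoring at least `threshold` against `word`."""
    candidates = ("".join(c) for c in product(LETTERS, repeat=len(word)))
    return {w for w in candidates if word_score(word, w) >= threshold}
```

For the first word of the slide's protein query, GTQ scores 16 against itself. GSQ (12) is in the neighborhood at T = 11, while ATQ (10) is not.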

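The Local Alignment Statistics slide defines E as the number of hits expected by chance with score ≥ S. Under the Karlin-Altschul statistics behind BLAST, two formulas apply:

- For a query of length m and a database of length n, E = K m n e^(-λS).
- The bit score S' = (λS - ln K) / ln 2 rewrites the same quantity as E = m n 2^(-S').

This is why BLAST reports both a bit score and a raw score, for example 290 bits (741) in the blastx output. The sketch only evaluates these formulas, under these assumptions:

- λ = 0.3 and K = 0.1 are hypothetical placeholders; real values depend on the scoring matrix and gap costs.
- BLAST also substitutes effective lengths for m and n, which the sketch ignores.
- The function names are hypothetical.

```python
import math


def expect_from_raw(S: float, m: int, n: int, lam: float, K: float) -> float:
    """Expected number of chance hits with raw score >= S: E = K*m*n*exp(-lam*S)."""
    return K * m * n * math.exp(-lam * S)


def bit_score(S: float, lam: float, K: float) -> float:
    """Normalized (bit) score: S' = (lam*S - ln K) / ln 2."""
    return (lam * S - math.log(K)) / math.log(2)


def expect_from_bits(bits: float, m: int, n: int) -> float:
    """Same expectation written with the bit score: E = m*n*2**(-S')."""
    return m * n * 2.0 ** (-bits)


LAM, K = 0.3, 0.1          # hypothetical statistical parameters
m, n = 300, 1_000_000      # hypothetical query and database lengths
e_raw = expect_from_raw(50, m, n, LAM, K)
e_bits = expect_from_bits(bit_score(50, LAM, K), m, n)
```

Two consequences follow directly from the formulas. Doubling the database size doubles E, and each additional bit of score halves it, so the same alignment gets a different E-value when it is found in a larger database.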

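The Templates for Discontiguous Words table lists spaced-seed templates in the style of PatternHunter (Ma, Tromp and Li, 2002). In each template:

- A 1 marks a position that must match, and a 0 marks a position that is ignored.
- W counts the 1s, and t is the template length.

In the coding templates the 0s fall on every third position, presumably so that third-codon (wobble) differences do not break a seed. The Python sketch extracts spaced words with such a template and intersects them between two sequences. The test pair S1/S2 is made up for illustration, the function names are hypothetical, and this is not Megablast's implementation.

```python
# Template from the slide: W = 11 matching positions in a window of t = 16 (coding).
CODING_11_16 = "1101101101101101"


def spaced_words(seq: str, template: str) -> dict[str, list[int]]:
    """Map each spaced word (residues at the template's '1' positions) to its start positions."""
    care = [i for i, c in enumerate(template) if c == "1"]
    words: dict[str, list[int]] = {}
    for start in range(len(seq) - len(template) + 1):
        word = "".join(seq[start + i] for i in care)
        words.setdefault(word, []).append(start)
    return words


def shared_words(a: str, b: str, template: str) -> set[str]:
    """Spaced words present in both sequences, i.e. candidate seeds."""
    return set(spaced_words(a, template)) & set(spaced_words(b, template))


# Hypothetical pair: identical except at every third position.
S1 = "ACGACGACGACGACGA"
S2 = "ACTACTACTACTACTA"
```

A contiguous word is simply the all-ones template ("1" * 11). With it, this pair shares no seed at all, while the coding template still finds one.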

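The PHI-BLAST slide pairs the CED-4 sequence with the pattern [GA]xxxxGK[ST]: a G or A, any four residues, then G, K, and S or T. PHI-BLAST looks for database sequences that contain the pattern and are similar to the query around it. Its first step, locating the pattern in the query, can be sketched with a Python regular expression. Note the limits of this sketch:

- The conversion handles only the slide's compact notation (x = any residue, brackets = allowed set).
- PHI-BLAST itself reads PROSITE-style pattern syntax, which this code does not parse.
- The helper names are hypothetical.

```python
import re

# CED-4 sequence exactly as shown on the PHI-BLAST slide.
CED4 = (
    "MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE"
    "LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV"
    "IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI"
    "LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI"
    "ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK"
)


def pattern_to_regex(pattern: str) -> str:
    """Translate the slide's compact notation: 'x' = any residue, [..] = allowed set."""
    return pattern.replace("x", ".")


def find_pattern(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence, overlaps included."""
    body = pattern_to_regex(pattern)
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({body}))", seq)]


hits = find_pattern(CED4, "[GA]xxxxGK[ST]")
```

On the CED-4 sequence the pattern occurs once, as GRAGSGKS in the third line of the slide's sequence.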

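The oligo-mapping example joins the forward and reverse primers into one query with a run of Ns between them, so a single genome BLAST search places both primers. It also relaxes the search with -W 7 -e 1000, because such short queries can only produce short, low-scoring alignments.

A hypothetical helper that assembles the query is sketched below. The spacer length of 10 matches the slide, and the primers are used as given, as on the slide. The validation step is an addition not present in the original.

```python
def primer_pair_query(forward: str, reverse: str, spacer: int = 10) -> str:
    """Join two primers with a run of N's so one BLAST query covers both.

    Hypothetical helper; the primers are used as given (no reverse
    complement), as on the slide.
    """
    for primer in (forward, reverse):
        bad = set(primer.upper()) - set("ACGT")
        if bad:
            raise ValueError(f"non-ACGT characters in primer: {sorted(bad)}")
    return forward.upper() + "N" * spacer + reverse.upper()


FORWARD = "CCATGGCGACCCTGGAAAAGC"
REVERSE = "CAGCAGCGGCTGTGCCTGCGG"
query = primer_pair_query(FORWARD, REVERSE)
```

In the current BLAST+ command-line tools, the corresponding blastn options are -word_size 7 and -evalue 1000.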
