A Field Guide part 2.ppt
A Field Guide, Part 2
August 30, 2005
University of Colorado Health Sciences Center

Part 2
  Entrez: text searching
    - a GenBank record
    - preview/index
  BLAST: sequence searching
    - pre-computed searches
    - algorithms
    - what's new?
  VAST: structure searching
  Example: mapping oligos to a genome

GenBank Records

The Flatfile Format

A Typical GenBank Record
  LOCUS       NM_019570   4279 bp   mRNA   linear   INV 28-OCT-2004
  DEFINITION  Mus musculus REV1-like (S. cerevisiae) (Rev1l), mRNA
  ACCESSION   NM_019570
  VERSION     NM_019570.3  GI:50811869
  KEYWORDS    .

GenBank Record: Feature Table
  GenPept identifier

GenBank Record: Feature Table, cont.

GenBank Record: sequence
  (skip)

Indexing for Nucleotide UID 59958365
  Field               Indexed Terms
  primary accession   NM_001012399
  title               Bos taurus hemochromatosis (hfe), mRNA.
  organism            Bos taurus
  sequence length     1168
  modification date   2005/02/19
  properties          biomol mrna
                      gbdiv mam
                      srcdb refseq

Global Entrez Search: HFE
  HFE

Entrez Nucleotide: HFE
  137 records
  Not HFE

Smarter Query
  hfe[title] AND human[orgn]

hfe[title] AND human[orgn] (cont)
  Primary data

Preview/Index

Preview/Index: Properties, srcdb
  Properties
  AND srcdb refseq[Properties]
  AND srcdb ddbj/embl/genbank[Properties]

Database Queries
  #1  hfe                                     137
  #2  hfe[title] AND human[orgn]               42
  #3  #2 AND srcdb refseq[prop]                11
  #4  #2 AND srcdb ddbj/embl/genbank[prop]     31
  #5  #4 AND gbdiv pri[prop]                   29
  #6  #4 AND gbdiv est[prop]                    2

Molecule Queries
  #1  hfe                                     116
  #2  hfe[title] AND human[orgn]               42
  #3  #2 AND biomol mrna[prop]                 29
  #4  #2 AND biomol genomic[prop]              13

More Queries
  Fields are database-specific

Other Entrez Databases
  UniSTS: markers on the Genethon map of human chromosome 12
    Genethon[Map Name] AND human[organism] AND 12[chromosome]
  UniGene: rat clusters that have at least one mRNA
    rat[organism] NOT 0[mrna count]
  Structure: structures of bacterial kinases with resolutions below 2 Å
    bacteria[organism] AND kinase AND 000.00:002.00[resolution]
  SNP: uniquely mapped microsatellites on human chr2
    (microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome]) AND human[orgn]

Basic Local Alignment Search Tool

BLAST Web Searches, 2005
  200,000

Precomputed BLAST Services
  Nucleotide or protein: Related Sequences
  BLAST link: BLink
  Transcript clusters: UniGene
  Protein homologs: HomoloGene

Link to Related Sequences

Related Sequences
  Most similar
  Least similar

BLink (BLAST Link)

BLink Output

Global vs Local Alignment
  Seq1: WHEREISWALTERNOW (16 aa)
  Seq2: HEWASHEREBUTNOWISHERE (21 aa)

The Flavors of BLAST
  Standard BLAST
    - nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
    - traditional “contiguous” word hit
  Megablast
    - optimized for large batch searches
    - can use discontiguous words
  PSI-BLAST
    - constructs PSSMs automatically and uses them as the query
    - very sensitive protein search
  RPS BLAST
    - searches a database of PSSMs
    - tool for conserved domain searches
  [Figure labels: “contiguous”, discontiguous]

Why Is BLAST So Popular?
  Fast: heuristic approach based on Smith-Waterman
  Local alignments
  Statistical significance: Expect value
  Versatile
    - blastn, blastp, blastx, tblastn, tblastx, rps-blast, psi-blast
    - www, standalone, and network clients

How BLAST Works
  1. Make lookup table of “words” for query
  2. Scan database for hits
  3. Ungapped extensions of hits (initial HSPs)
  4. Gapped extensions (no traceback)
  5. Gapped extensions (traceback; alignment details)

Nucleotide Words
  Make a lookup table of words
    GTACTGGACAT
    TACTGGACATG
    ACTGGACATGG
    CTGGACATGGA
    TGGACATGGAC
    GGACATGGACC
    GACATGGACCC
    ACATGGACCCT
    . . .

Protein Words
  Make a lookup table of words (-f 11 = blastp default)
    GTQ  TQI  QIT  ITV  TVE  VED  EDL  DLF  . . .

Minimum Requirements for a Hit
  Nucleotide BLAST requires one exact match
    ATCGCCATGCTTAATTGGGCTT
    CATGCTTAATT                 (one exact match)
  Protein BLAST requires two neighboring matches within 40 aa
    GTQITVEDLFYNISEI
    YYN                         (neighborhood words; two matches; -A 40 = blastp default)

BLASTP Summary
  High-scoring pair (HSP)

Scoring Systems - Nucleotides
  Identity matrix (-r 1 -q -3)
        A   G   C   T
    A  +1  -3  -3  -3
    G  -3  +1  -3  -3
    C  -3  -3  +1  -3
    T  -3  -3  -3  +1

  CAGGTAGCAAGCTTGCATGTCA
  CACGTAGCAAGCTTG-GTGTCA
  raw score = 19 - 9 = 10
  (19 matching columns at +1 each; the two mismatched columns and the one gap column account for the 9 points of penalty)

Scoring Systems - Proteins
  Position Independent Matrices
    PAM Matrices (Percent Accepted Mutation)
      - Derived from observation; small dataset of alignments
      - Implicit model of evolution
      - All calculated from PAM1
      - PAM250 widely used
    BLOSUM Matrices (BLOck SUbstitution Matrices)
      - Derived from observation; large dataset of highly conserved blocks
      - Each matrix derived separately from blocks with a defined percent identity cutoff
      - BLOSUM62: default matrix for BLAST
  Position Specific Score Matrices (PSSMs)
    - PSI- and RPS-BLAST

BLOSUM62
  A  4
  R -1  5
  N -2  0  6
  D -2 -2  1  6
  C  0 -3 -3 -3  9
  Q -1  1  0  0 -3  5
  E -1  0  0  2 -4  2  5
  G  0 -2  0 -1 -3 -2 -2  6
  H -2  0  1 -1 -3  0  0 -2  8
  I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
  L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
  K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
  M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
  F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
  P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
  S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
  T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
  W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
  Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
  V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
  X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
     A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X

Position-Specific Score Matrix
  DAF-1: Serine/Threonine protein kinases catalytic loop
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
  435 K  -1  0  0 -1 -2  3  0  3  0 -2 -2  1 -1 -1 -1 -1 -1 -1 -1 -2
  436 E   0  1  0  2 -1  0  2 -1  0 -1 -1  0  0  0 -1  0  0 -1 -1 -1
  437 S   0  0 -1  0  1  1  0  1  1  0 -1  0  0  0  2  0 -1 -1  0 -1
  438 N  -1  0 -1 -1  1  0 -1  3  3 -1 -1  1 -1  0  0 -1 -1  1  1 -1
  439 K  -2  1  1 -1 -2  0 -1 -2 -2 -1 -2  5  1 -2 -2 -1 -1 -2 -2 -1
  440 P  -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1  0 -3  7 -1 -2 -3 -1 -1
  441 A   3 -2  1 -2  0 -1  0  1 -2 -2 -2  0 -1 -2  3  1  0 -3 -3  0
  442 M  -3 -4 -4 -4 -3 -4 -4 -5 -4  7  0 -4  1  0 -4 -4 -2 -4 -1  2
  443 A   4 -4 -4 -4  0 -4 -4 -3 -4  4 -1 -4 -2 -3 -4 -1 -2 -4 -3  4
  444 H  -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5  0 -5
  445 R  -4  8 -3 -4  0 -1 -2 -3 -2 -5 -4  0 -3 -2 -4 -3 -3  0 -4 -5
  446 D  -4 -4 -1  8 -6 -2  0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5
  447 I  -4 -5 -6 -6 -3 -4 -5 -6 -5  3  5 -5  1  1 -5 -5 -3 -4 -3  1
  448 K   0  0  1 -3 -5 -1 -1 -3 -3 -5 -5  7 -4 -5 -3 -1 -2 -5 -4 -4
  449 S   0 -3 -2 -3  0 -2 -2 -3 -3 -4 -4 -2 -4 -5  2  6  2 -5 -4 -4
  450 K   0  3  0  1 -5  0  0 -4 -1 -4 -3  4 -3 -2  2  1 -1 -5 -4 -4
  451 N  -4 -3  8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5
  452 I  -3 -5 -5 -6  0 -5 -5 -6 -5  6  2 -5  2 -2 -5 -4 -3 -5 -3  3
  453 M  -4 -4 -6 -6 -3 -4 -5 -6 -5  0  6 -5  1  0 -5 -4 -3 -4 -3  0
  454 V  -3 -3 -5 -6 -3 -4 -5 -6 -5  3  3 -4  2 -2 -5 -4 -3 -5 -3  5
  455 K  -2  1  1  4 -5  0 -1 -2  1
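
The Global vs Local Alignment slide gives two toy sequences but the worked alignments are not recoverable from the text. As a minimal sketch of the distinction, the Python below scores a pair of sequences with a textbook dynamic-programming recurrence, once globally (end to end, in the style of Needleman-Wunsch) and once locally (best-scoring subsegments, in the style of Smith-Waterman). The function name `align_score` and the +1/-1/-1 scoring are illustrative assumptions, not values from the slides, and this is not the heuristic BLAST itself uses.

```python
def align_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Return the best alignment score of a and b (score only, no traceback).

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subsegments.
    Scoring values are illustrative assumptions.
    """
    # Row 0: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
            cur.append(score)
        if local:
            best = max(best, max(cur))
        prev = cur
    return best if local else prev[-1]


seq1 = "WHEREISWALTERNOW"        # Seq1 from the slide
seq2 = "HEWASHEREBUTNOWISHERE"   # Seq2 from the slide
print("global:", align_score(seq1, seq2, local=False))
print("local: ", align_score(seq1, seq2, local=True))
```

Both sequences contain the substring HERE, so the local score is at least 4 under this scoring, while the global score also has to pay for every unmatched residue and gap along the full length.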
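
The first two steps of How BLAST Works (make a lookup table of words, then scan for hits) can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it indexes exact words only and ignores the neighborhood words that protein searches add. The names `word_table` and `exact_hits` are hypothetical, and the 18-base query is an assumption reconstructed by overlapping the eight 11-mers listed on the Nucleotide Words slide.

```python
from collections import defaultdict


def word_table(query, word_size):
    """Map every word of length word_size to the query offsets where it starts."""
    table = defaultdict(list)
    for start in range(len(query) - word_size + 1):
        table[query[start:start + word_size]].append(start)
    return dict(table)


def exact_hits(subject, table, word_size):
    """Scan a subject sequence and yield (query_offset, subject_offset)
    for every exact word match found in the lookup table."""
    for s in range(len(subject) - word_size + 1):
        for q in table.get(subject[s:s + word_size], ()):
            yield q, s


# Assumed query, rebuilt from the overlapping words on the slide.
nt_query = "GTACTGGACATGGACCCT"
nt_words = list(word_table(nt_query, 11))
print(nt_words)  # 8 words, GTACTGGACAT through ACATGGACCCT

# Protein words of length 3 from the first ten residues shown on the slide.
aa_words = list(word_table("GTQITVEDLF", 3))
print(aa_words)  # GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF

# The "one exact match" example: the 11-mer occurs at offset 5 of the longer sequence.
hit_table = word_table("ATCGCCATGCTTAATTGGGCTT", 11)
print(list(exact_hits("CATGCTTAATT", hit_table, 11)))  # [(5, 0)]
```

Which of the two sequences in the "one exact match" figure is the query and which is the database sequence is not stated in the recovered text; the sketch treats the longer one as the query.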
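
The raw score on the Scoring Systems - Nucleotides slide can be reproduced with a few lines of Python. The sketch assumes what the slide's arithmetic (19 - 9 = 10) implies: each matching column earns the reward of +1 and each of the three remaining columns, including the single gap column, costs 3. Charging a gap the same as a mismatch is an assumption made to match the slide, not a statement about BLAST's actual gap costs. The function name `raw_score` is hypothetical.

```python
def raw_score(top, bottom, reward=1, penalty=-3, gap_char="-"):
    """Score a pre-aligned pair of sequences column by column.

    Matching columns earn `reward`; every other column (mismatch or gap)
    costs `penalty`. Treating a gap like a mismatch is an assumption that
    reproduces the slide's arithmetic.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != gap_char)
    others = len(top) - matches
    return matches * reward + others * penalty


top = "CAGGTAGCAAGCTTGCATGTCA"
bottom = "CACGTAGCAAGCTTG-GTGTCA"
print(raw_score(top, bottom))  # 19 matches, 3 penalized columns: 19 - 9 = 10
```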
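
To illustrate how a position-specific score matrix differs from a position-independent one, the sketch below scores short peptides against rows 444 to 451 of the DAF-1 matrix shown above. The values are copied from the slide; the function name `pssm_score` and the choice of these eight rows are illustrative assumptions, and gaps are not handled.

```python
# Column order as printed on the slide.
COLUMNS = "ARNDCQEGHILKMFPSTWYV"

# Rows 444-451 of the DAF-1 PSSM, copied from the slide.
PSSM_ROWS = {
    444: [-4, -2, -1, -3, -5, -2, -2, -4, 10, -6, -5, -3, -4, -3, -2, -3, -4, -5, 0, -5],
    445: [-4, 8, -3, -4, 0, -1, -2, -3, -2, -5, -4, 0, -3, -2, -4, -3, -3, 0, -4, -5],
    446: [-4, -4, -1, 8, -6, -2, 0, -3, -3, -5, -6, -3, -5, -6, -4, -2, -3, -7, -5, -5],
    447: [-4, -5, -6, -6, -3, -4, -5, -6, -5, 3, 5, -5, 1, 1, -5, -5, -3, -4, -3, 1],
    448: [0, 0, 1, -3, -5, -1, -1, -3, -3, -5, -5, 7, -4, -5, -3, -1, -2, -5, -4, -4],
    449: [0, -3, -2, -3, 0, -2, -2, -3, -3, -4, -4, -2, -4, -5, 2, 6, 2, -5, -4, -4],
    450: [0, 3, 0, 1, -5, 0, 0, -4, -1, -4, -3, 4, -3, -2, 2, 1, -1, -5, -4, -4],
    451: [-4, -3, 8, -1, -5, -2, -2, -3, -1, -6, -6, -2, -4, -5, -4, -1, -2, -6, -4, -5],
}


def pssm_score(peptide, first_position, rows=PSSM_ROWS):
    """Sum the PSSM scores of an ungapped peptide placed at first_position."""
    return sum(
        rows[first_position + offset][COLUMNS.index(residue)]
        for offset, residue in enumerate(peptide)
    )


print(pssm_score("HRDIKSKN", 444))  # the row residues themselves: 54
print(pssm_score("HRDLKSKN", 444))  # I -> L at position 447: 56
```

At position 447 the matrix scores leucine (5) above the isoleucine that labels the row (3), whereas BLOSUM62 scores I against I at 4 and I against L at 2 regardless of position.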
