Can New Oracle10g Search Features Help Bridge the Biological Discovery Gap?

Jake Y. Chen, Ph.D., Head of Computational Proteomics & Principal Bioinformatics Scientist
Marcel Davidson, Head of Data Management

Messages
- New informatics challenges in protein interactomics R&D: scale, integration, and discovery issues
- A data-driven, discovery-oriented framework
- "Enabling" features in 10g?
  - Biological data integration?
  - Biological data analysis integration?

Outline
- Data-driven Discovery-oriented Computational Framework
- 10g Regular Expression Case Studies
- 10g BLAST Case Studies

Why Myriad Maps Protein-Protein Interactions
[Diagram: Conventional Drug Discovery vs. Post-Genomic Drug Discovery. Labels: nucleus, GPCR, enzyme, hormone receptor; target validation; lead discovery, optimization; non-specific targets; novel, more specific targets; novel, druggable targets; enhanced pre-validation target pool.]

Principle of the Yeast Two-Hybrid (Y2H) System
[Diagram labels: DNA, Reporter Gene, DNA Binding Domain, Bait, Activation Domain, Prey, Reporter mRNA, Human Protein X, Human Protein Y, Human Protein Z.]
- Scenario A: Human proteins X and Y do interact. Readout: yeast colonies grow.
- Scenario B: Human proteins X and Z do not interact (no reporter gene activity). Readout: no growth of yeast colonies.

Data Collected from Y2H System
- Perform BLAST against human REFSEQ DB

Protein Interaction Network (snapshot of 8,000 interactions)

Knowledge Discovery (KD) Challenges
- 80,000 unique interactions
- 100 biological data sources
[Diagram: protein interaction data feeds data-driven, discovery-oriented KD in interaction-based proteomics, leading to $, drugs, ...]

Bioinformatics DB Framework
- Annotation DB: RefSeq, LocusLink, GO, OMIM, CGAP, Protein Kinase DB, GPCR DB, Ensembl, Curation, ...
- Y2H Data Processing and Analysis DB: Lab_Seq, Seq_Match, Y2H_Mart
- Y2H Interaction Data Mart: Y2h_Mart

A Schema Fragment to Manage Sequence Similarity Results
Jake Yue Chen and John Carlis (2003) Genomic Data Modeling. Information Systems Journal, 28(4), p287-310.

Interaction Matrix using Randomly Ordered Locus IDs
- 12,958 unique interactions
- 1955 bait loci
- 2766 prey loci
Jake Yue Chen, et al (2003) Proceedings of the IEEE Computer Science Society Bioinformatics Conference 2003. Stanford University, Stanford, CA.

Outline
- Data-driven Discovery-oriented Computational Framework
- 10g Regular Expression Case Studies
- 10g BLAST Case Studies

Oracle10g Regular Expressions: Powerful String Processing
- Regular expressions (RE) are new tools in Oracle10g
  - Search and manipulate data strings of arbitrary complexity
- Prior database solutions
  - SQL LIKE operator
  - Java stored procedures, C external libraries
- Prior non-database solutions: AWK, SED, GREP, PERL, etc.
- Done now inside the database
  - Facilitates rapid data-centric analysis

Case 1: Retrieving Protein Data from SGD (Saccharomyces Genome Database)
[Slide callouts: ORF Identifier, Associated Amino Acid Sequence, HTTP Raw Data]

    Quick Search: Site Map | Help | Full Search | Home
    Community Info  Submit Data  BLAST  Primers  PatMatch  Gene/Seq Resources  Virtual Library  Contact SGD
    Sequence for a region of YDR099W/BMH2
    Send questions or suggestions to SGD
    BLAST search | FASTA search
    Protein translation of the coding sequence.
    Other Formats Available: GCG
    YDR099W Chr 4
    MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS
    WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK
    VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS
    VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
    GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK*
    Return to SGD
    Send a Message to the SGD Curators

- Need to parse out the embedded AA sequence

Function to Return AA Sequence Given ORF

    create or replace function orf2seq (p_orf in varchar2) return varchar2 is
      v_stream clob;
      strt number;
    begin
      -- Retrieve the HTTP stream:
      v_stream := httpuritype.getclob(httpuritype.createuri('http://db.yeastgenome.org/cgi-bin/SGD/getSeq?seq='||p_orf||
      [...]

[Slide callouts on the function: Web site URL; Parameterized ORF Id; RegExp to remove control chars from HTTP stream; RegExp to extract AA sequence]

Amino Acid Sequence for ORF YDR099W

    SQL> select orf2seq('YDR099W') from dual;

    ORF2SEQ('YDR099W')
    --------------------------------------------------------------------------------
    MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRASWRIVSSIEQKEESKEKSEHQ
    VELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESKVFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEI
    ATTELPPTHPIRLGLALNFSVFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
    GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK

    Elapsed: 00:00:01.24

- Elapsed time 2 sec. (network latency)

    SQL> insert into pseq (orf_id, sequence)
      2  values ('YDR099W', orf2seq('YDR099W'));

Case 2: Motif Searching in Proteins
PROSITE database of protein sequence motifs

    ID   TYR_PHOSPHO_SITE; PATTERN.
    AC   PS00007;
    DT   APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE).
    DE   Tyrosine kinase phosphorylation site.
    PA   [RK]-x(2,3)-[DE]-x(2,3)-Y.
    CC   /TAXO-RANGE=?E?V;
    CC   /SITE=5,phosphorylation;
    CC   /SKIP-FLAG=TRUE;
    DO   PDOC00007;

Source: http://www.expasy.org/prosite/ps_frequent_patterns.txt

- TKP pattern: [RK]-x(2,3)-[DE]-x(2,3)-Y.
- R=Arginine, K=Lysine, D=Aspartate, E=Glutamate, Y=Tyrosine, x=any AA
- Oracle10g regular expression equivalent: [RK].{2,3}[DE].{2,3}Y

TKP motif pattern
1 Argin
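The two regular-expression steps called out for Case 1 (remove control characters from the HTTP stream, then extract the AA sequence) can be sketched outside the database as follows. This is a minimal Python illustration, not the slide's PL/SQL: the deck does not show the actual expressions used in orf2seq, so the patterns, the function name extract_aa_sequence, and the line-break layout of PAGE_TEXT are assumptions. Only the page wording and the sequence itself come from the slide.

```python
import re
from typing import Optional

# Sequence lines as they appear in the raw page text on the Case 1 slide.
SEQ_LINES = [
    "MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS",
    "WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK",
    "VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS",
    "VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES",
    "GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK*",
]

# Assumed layout: the CR/LF placement is a guess at how the raw stream looked.
PAGE_TEXT = (
    "Sequence for a region of YDR099W/BMH2\r\n"
    "Protein translation of the coding sequence.\r\n"
    "Other Formats Available: GCG\r\n"
    "YDR099W Chr 4\r\n"
    + "\r\n".join(SEQ_LINES)
    + "\r\nReturn to SGD\r\n"
)


def extract_aa_sequence(page_text: str, orf: str) -> Optional[str]:
    """Return the amino acid sequence that follows '<orf> Chr <n>', or None."""
    # Step 1: remove control characters and spaces so the sequence is contiguous.
    flat = re.sub(r"[\x00-\x20]+", "", page_text)
    # Step 2: capture the run of residue letters between the ORF/chromosome
    # label and the terminating '*'.
    match = re.search(re.escape(orf) + r"Chr\d+([A-Z]+)\*", flat)
    return match.group(1) if match else None


sequence = extract_aa_sequence(PAGE_TEXT, "YDR099W")
print(sequence)
```

Anchoring on the ORF label followed by the chromosome number skips the earlier mention of YDR099W in the page heading, and stopping at the asterisk keeps trailing page text out of the result.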
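The translation from the PROSITE pattern to its regular expression equivalent in Case 2 is mechanical, and a short sketch makes the rule explicit. The Python below is illustrative only: the function names prosite_to_regex and find_motif are hypothetical, only a subset of PROSITE syntax is handled (single residues, [..] classes, {..} exclusions, x, and repeat counts), and the test string in the usage line is a toy sequence, not real protein data.

```python
import re
from typing import List, Tuple

# One PROSITE element: an atom (x, a residue, [class], {exclusion})
# optionally followed by a repeat count such as (2) or (2,3).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern to an equivalent regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported PROSITE element: " + element)
        atom, low, high = m.groups()
        if atom == "x":                    # x = any amino acid
            atom = "."
        elif atom.startswith("{"):         # {..} = any residue except these
            atom = "[^" + atom[1:-1] + "]"
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(atom + repeat)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(0)) for m in regex.finditer(sequence)]


TKP = "[RK]-x(2,3)-[DE]-x(2,3)-Y."
print(prosite_to_regex(TKP))
print(find_motif("AAKAADAAYAA", TKP))  # toy sequence for illustration
```

The resulting pattern string has the same form as the Oracle10g equivalent shown on the slide, so it is the kind of expression an in-database condition such as REGEXP_LIKE could apply to the stored sequences.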