A Probabilistic Approach to High Throughput Drug Discovery.ppt
Slide 1: A Probabilistic Approach to High Throughput Drug Discovery
- Introduction and Motivation
- Probability Modeling in Drug Discovery
- Representation of Chemical Structures (Descriptors)
- Focused Combinatorial Library Design
- Summary and Outlook

Slide 2: High Throughput Screening
- Large-scale automation of biological assays (HTS)
  - Use robotics to perform 10,000 to 100,000 screens per day
  - Brute-force approach to drug discovery: "rapidly screen all compounds"
- Noteworthy drawbacks to HTS:
  - Economics: $1-$5 per assay (provided large collections are assayed)
  - Logistics: compound formatting, inventory systems and other overhead
  - Precision loss: effectively a "binary" measurement, active/inactive (pass/fail)
  - High error rate: assay, synthesis failure, sample degradation, registration
- Resulting effects:
  - Quality-for-quantity tradeoff: lots of low-quality data
  - High level of noise (error) in data makes interpretation very difficult
- HTS has gained acceptance and is routinely used to generate lead compounds for drug discovery projects

Slide 3: Sources of Compounds for HTS
- Initial screening libraries (first libraries used in project)
  - Historical "in-house" collection of compounds augmented with compounds purchased from external suppliers
  - With 1 million+ compounds available, the initial screening library must be designed (diversity retained using fewer compounds)
  - Receptor-biased initial screening libraries are a possibility
- Follow-up libraries
  - Parallel synthesis / combinatorial chemistry is an excellent source of large numbers of (new) compounds
  - Synthesis of "all" analogs around a lead structure exhibits poor diversity but is very good for "local" exploration and lead follow-up
- External screening compound purchasing and in-house combinatorial chemistry efforts have gained acceptance and are routinely used in lead generation and follow-up

Slide 4: High Throughput Discovery Cycle
- Brute-force HTS is not practical
  - At least 10 trillion stable drug candidates
  - At 1 billion screens per day, screening all 10 trillion takes 10,000 days, i.e. about 27 years
- A discovery cycle can be used to reduce total screens
  - Use HTS data to guide the selection of compounds to screen next
  - Scale-up of the traditional experimental discovery cycle

Slide 5: Required Technology for HTD Cycle
- High Throughput Screening facility
- Parallel synthesis and combinatorial chemistry capabilities
- Methodology for automatically analyzing HTS data
  - Humans find it difficult to interpret large amounts of noisy data
  - Automatic HTS QSAR technology is necessary for the HTD cycle
- Methodology for designing focused combinatorial libraries
  - HTS QSAR results are used to bias a combinatorial library towards activity
  - ADME properties and other design criteria should be taken into account
- Meaningful representation of compounds
  - Collection of molecular descriptors meaningful across projects (avoid time-consuming variable selection procedures)
  - Definition of a "chemistry space" for diversity studies (design of initial screening libraries)

Section: Probability Modeling in Drug Discovery

Slide 7: Probabilistic Formalism (Bayesian Inference)
- Step 1: Write all observables as a joint probability density; e.g., Pr(A, B, C)
- Step 2: Decompose the density using probability theory and Bayes' theorem until the components are measurable; e.g., Pr(A, B, C) = Pr(B | A, C) Pr(C | A) Pr(A)
- Step 3: Model each component in the product from a database or experimental data set
- Step 4: Make predictions or estimates using the computed model of Pr(A, B, C)

Slide 8: Probabilities in Speech Recognition
- Successful speech recognizers select (predict) an output word sequence from an input waveform by maximizing the joint likelihood Pr(WAVE, WORDS)
- This is used (in part) to solve the isophonetic word sequence problem; e.g., "imadam" can be "I'm Adam" or "I'm a dam" or "eye mad am"
- Pr(WAVE, WORDS) = Pr(WAVE | WORDS) Pr(WORDS)
  - Pr(WORDS) is the prior probability of a word sequence (utterance)
  - Pr(WAVE | WORDS) is used to score the waveform under the hypothesis that the word sequence is WORDS
- Build a model of Pr(WORDS) by training on, say, 500,000,000 words of newspaper text (the prior knowledge)
- Pr(WORDS) effectively depresses the importance of unlikely utterances in favor of more plausible statements (real phrases)

Slide 9: Probabilities in Drug Discovery
- Notation: Y = active (0/1), D = drugable (0/1), S = structure
- Decompose: Pr(D, Y | S) = Pr(D | Y, S) Pr(Y | S)
  - Pr(D | Y, S): drugable given active structure (approximated by "is drug-like" efforts)
  - Pr(Y | S): activity assuming structure (probabilistic QSAR efforts)
- Product of probabilities balances competing goals
- Classification alone (e.g., RP) is not enough: weighted outcomes are needed
- Methodology similar to "soft" classification problems or fuzzy logic
- Any method of probability modeling is valid (e.g., histogram, analytic)
- Approximations introduced can be clearly identified; e.g., Pr(D | Y, S) ≈ Pr(D | S): drugability is independent of activity (!?)
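
As a rough illustration of how the product of probabilities balances competing goals, the sketch below ranks a few compounds by Pr(D | S) Pr(Y | S), using the independence approximation Pr(D | Y, S) ≈ Pr(D | S) noted on the slide. The compound names and probability values are invented for illustration only; in practice both probabilities would come from fitted models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical compound label
    p_active: float    # assumed estimate of Pr(Y=1 | S), e.g. from a QSAR model
    p_drugable: float  # assumed estimate of Pr(D=1 | S), e.g. from a drug-likeness model


def joint_score(c: Candidate) -> float:
    """Approximate Pr(D=1, Y=1 | S) as Pr(D=1 | S) * Pr(Y=1 | S)."""
    return c.p_drugable * c.p_active


# Invented numbers: A is potent but not drug-like, C is drug-like but weak.
candidates = [
    Candidate("cpd-A", p_active=0.90, p_drugable=0.10),
    Candidate("cpd-B", p_active=0.60, p_drugable=0.70),
    Candidate("cpd-C", p_active=0.30, p_drugable=0.95),
]

# Rank by the joint probability rather than by activity alone.
ranked = sorted(candidates, key=joint_score, reverse=True)

for c in ranked:
    print(f"{c.name}: {joint_score(c):.3f}")
```

Ranking on activity alone would put cpd-A first; the product moves it to last place because its low drugability estimate outweighs its high activity estimate, which is the kind of weighted outcome a hard classifier does not provide.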
Slide 10: Pr(Y|X) via Binary QSAR
- If Y is "binary activity" and X is a descriptor vector, then by Bayes' theorem:
  Pr(Y=1 | X=x) = Pr(X=x | Y=1) Pr(Y=1) / [Pr(X=x | Y=1) Pr(Y=1) + Pr(X=x | Y=0) Pr(Y=0)]
- Pathology of Binary QSAR is reasonable
  - If a new structure is outside the training set, then Pr(Y=1), the hit rate, is used to make predictions (no other information available)
[Figure: diagram with labels Active/Inactive, X1 ... Xk, Xk+1 ... Xn, Pr(Y), Pr(X|Y), Pr(X), Pr(Y|X), Bayes Theorem]

Slide 11: Distribution Estimates
- The four distributions in the formula are of two types
  - Pr(Y=0), Pr(Y=1): prior probability of inactive/active
  - Pr(X=x | Y=0), Pr(X=x | Y=1): probability of ligand assuming inactive/active
- Modeling assumption: uncorrelated descriptors are treated as independent (!)
  - Decompose the multi-dimensional distribution into a product
  - Estimate 2n+2 distributions instead of the original four
- Binary QSAR Algorithm
  - Compute descriptor vectors d_i
  - De-correlate descriptors: x_i = Q(d_i - u)
  - Estimate distributions Pr(X = x | Y = y) from x_i, y_i
  - Assemble p(x) ≈ Pr(Y = 1 | X = x)
  - Predict for new descriptors d: p(Q(d - u))

Slide 12: Experience with Binary QSAR
- Fundamental methodology publication (robustness study)
  - Biocomputing: Proceedings of the 1999 Pacific Symposium, World Scientific Publishing, Singapore, 1999
- Example literature data sets (non-HTS data)
  - Estrogen receptor (Gao et al.; J. Chem. Info. Comput. Sci., 1999, 36)
  - O-acyltransferase (ACAT) (Labute et al.; in press)
- Example industrial data sets (HTS assays)
  - ArQule: 24,000 cpds., 200 active; 93% on inactives, 60% on actives
  - Pharmacopeia: 24,000 cpds.; 90% on inactives, 90% on actives
  - SmithKline Beecham: 80,000 cpds., 100 active; 90% on actives
- Best success story: Pharmacia & Upjohn
  - Binary QSAR model used to select building blocks in a combi-chem library
  - Improved activity from μM to nM (factor of 1000)

Slide 13: Combined Design Model for HTD Cycle
- Use the Binary QSAR method twice, once for the activity model and once for the drugability model
- Train the drugability model Pr(D | X) on WDI/ACD for drug-like/non-drug-like, or on specific data sets (e.g., blood-brain barrier permeability)
- The complete model of activity and drugability is the product Pr(D | X) Pr(Y | X), which approximates Pr(D, Y | S)
[Figure: flow diagram with labels ADME Model, Activity Model, Library Design, Binary QSAR, BioAssay, Design Model, Combinatorial Library, HTS Data, Drugability Data (e.g., BBB or drug-like), Binary QSAR]

Section: Representation of Chemical Structures (Descriptors)

Slide 15: A Brief History of QSAR
- Original philosophy (Hansch & Leo): use a fixed set of meaningful molecular properties to describe a wide variety of biological phenomena
  - Linear regression used to determine SAR
  - The determination of linear relationships is basic science
  - Statistical regression framework used to assess significance of SAR
- Proliferation of descriptors
  - Early successes led to the introduction of a vast array of descriptors
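
Returning to the Binary QSAR algorithm listed under Distribution Estimates, the following is a minimal sketch of those five steps, not the published implementation. Several choices here are assumptions made only for illustration: u is taken as the descriptor mean, Q as a PCA whitening matrix, each one-dimensional distribution is estimated with a smoothed histogram, and the class name `BinaryQSARSketch` and its parameters are hypothetical. The training set is assumed to contain both actives and inactives.

```python
import numpy as np


class BinaryQSARSketch:
    """Illustrative Bayes classifier over de-correlated descriptors."""

    def __init__(self, n_bins=10, pseudo=1.0):
        self.n_bins = n_bins    # histogram bins per de-correlated axis
        self.pseudo = pseudo    # pseudo-count so that no bin has zero mass

    def fit(self, D, y):
        D = np.asarray(D, dtype=float)
        y = np.asarray(y, dtype=int)
        # De-correlate: x = Q (d - u), with u the mean and Q from PCA whitening.
        self.u = D.mean(axis=0)
        cov = np.atleast_2d(np.cov(D - self.u, rowvar=False))
        evals, evecs = np.linalg.eigh(cov)
        keep = evals > 1e-12    # drop directions with no variance
        self.Q = (evecs[:, keep] / np.sqrt(evals[keep])).T
        X = (D - self.u) @ self.Q.T
        # Prior Pr(Y=1): the hit rate of the training set.
        self.prior1 = y.mean()
        # One histogram per axis and per class: 2n one-dimensional estimates.
        self.edges, self.mass = [], []
        for j in range(X.shape[1]):
            edges = np.linspace(X[:, j].min(), X[:, j].max(), self.n_bins + 1)
            per_class = []
            for cls in (0, 1):
                counts, _ = np.histogram(X[y == cls, j], bins=edges)
                smoothed = counts + self.pseudo
                per_class.append(smoothed / smoothed.sum())
            self.edges.append(edges)
            self.mass.append(per_class)
        return self

    def predict_proba(self, d):
        """Return the estimate p(Q(d - u)) of Pr(Y=1 | X=x)."""
        x = self.Q @ (np.asarray(d, dtype=float) - self.u)
        log_odds = np.log(self.prior1) - np.log(1.0 - self.prior1)
        for j, xj in enumerate(x):
            edges = self.edges[j]
            if xj < edges[0] or xj > edges[-1]:
                continue    # outside the training range: axis adds no evidence
            b = min(np.searchsorted(edges, xj, side="right") - 1, self.n_bins - 1)
            log_odds += np.log(self.mass[j][1][b]) - np.log(self.mass[j][0][b])
        return 1.0 / (1.0 + np.exp(-log_odds))
```

A model would be built with `BinaryQSARSketch().fit(D, y)`, where `D` is a compounds-by-descriptors array and `y` holds 0/1 activity labels, and then queried with `predict_proba(d)`. Skipping axes that fall outside the training range means a structure far from all training data is scored with the hit rate Pr(Y=1) alone, matching the behavior described on the Binary QSAR slide. Fitting a second instance on drugability labels and multiplying the two outputs gives the product Pr(D | X) Pr(Y | X) of the combined design model.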