Analyzing Metabolomic Datasets.ppt
Analyzing Metabolomic Datasets
Jack Liu
Statistical Science, RTP, GSK
7-14-2005

Overview
- Features of metabolomic datasets
- Pre-learning procedures
  - Experimental design
  - Data preprocessing and sample validation
  - Metabolite selection
- Unsupervised learning
  - Profile clustering
  - SVD/RSVD
- Supervised learning
- Software

Why metabolomics?
- Discover new disease biomarkers for screening and therapy progression
  - A small subset of metabolites can indicate an early disease stage or predict therapy efficacy
- Associate metabolites (functions) with transcripts (genes)
  - Metabolites are downstream results of gene expression

Metabolomics datasets
Advantages
- Metabolomics is not organism specific => makes cross-platform analysis possible
- Changes are usually large
- Closer to phenotype
- Metabolites are well known (900-1000)
Disadvantages
- Lots of missing data and mismatches (as in proteomics)
- Expensive (about 2-10 times more expensive than Affymetrix)

Experimental design
- Traditional experimental design still applies
  - Blocking
  - Randomization
  - Enough replicates
- Design the experiment based on the expectation
  - A two-group design will not lead to a complete profiling (if samples within groups are homogeneous)
  - A multiple-group design may have difficulty with supervised learning (if the number of groups is large and the data are noisy)

Data preprocessing
- Perform transformation
  - Log-2 transformation is a common choice
- Normalization: use simple ones
- Summarization is needed for technical replicates
- Filter variables by missing patterns
- What to do with the missing data?

"Curse of missing data"
- Missing can be due to multiple causes
  - Informative missing
  - Inconsistency / mismatch
  - Unknown missing (we recently identified a suppression effect in proteomics)
- What to do?
  - Replace with the detection limit (naive)
  - Leave as is and let the algorithm deal with it (we may ignore important missing patterns)
  - Single imputation (KNN, SVD; not easy for data with 20% missing)
  - Multiple imputation (How to impute? Not easy to apply)
- What's needed?
  - Theory support for univariate modeling incorporating missing values/censored values

NCI dataset
- 58 cells and 300 metabolites, no replicates
- These cells are the majority of the famous NCI-60 cancer cell lines
- 27% missing data. Cannot replace missing values with a low value. Why?

Missing value replacement: does it always work?
- Before replacement: correlation = 0.88
- After replacement: correlation = 0.68
- Cell 1 and 2 are both breast cancer cell types
- Note: use pair-wise deletion to compute correlation; replace with value 13.

Sample validation
- Objective
  - After we do the experiment, how do we decide whether a sample has passed QC and is not an outlier?
- Solutions
  - Technical QC measures
  - PCA: visual approach. Accepting or not is arbitrary
  - Correlation-based method: formal and quantitative approach; based on all the data; has been adopted by GSK as the formal procedure
- Sample validation is a cost-saving procedure

Metabolite selection
- Objective
  - Filter metabolites and assign significance
- Outcome
  - Least square means
  - Fold chan
