The Problem of Detecting Differentially Expressed Genes.ppt
[Figure labels: Class 1, Class 2]

Fold Change is the Simplest Method

Calculate the log ratio between the two classes and consider all genes that differ by more than an arbitrary cutoff value to be differentially expressed. A two-fold difference is often chosen.

Fold change is not a statistical test.

Test of a Single Hypothesis

(1) For gene j, consider the null hypothesis of no association between its expression level and its class membership.
(2) Decide on a level of significance (commonly 5%).
(3) Perform a test (e.g. Student's t-test) for each gene.
(4) Obtain the P-value corresponding to that test statistic.
(5) Compare the P-value with the significance level. Then either reject or retain the null hypothesis.

Two-Sample t-Statistic

In standard notation, let $\bar{y}_{ij}$ and $s_{ij}^2$ be the sample mean and variance of gene j in class i, and let $n_i$ be the number of samples in class i (i = 1, 2).

Student's t-statistic:

$$ t_j = \frac{\bar{y}_{1j} - \bar{y}_{2j}}{\sqrt{s_{1j}^2/n_1 + s_{2j}^2/n_2}} $$

Pooled form of the Student's t-statistic, assuming a common variance in the two classes:

$$ t_j = \frac{\bar{y}_{1j} - \bar{y}_{2j}}{s_j \sqrt{1/n_1 + 1/n_2}}, \qquad s_j^2 = \frac{(n_1 - 1) s_{1j}^2 + (n_2 - 1) s_{2j}^2}{n_1 + n_2 - 2} $$

Modified t-statistic of Tusher et al. (2001), where $a_0 > 0$ is a constant added to the denominator:

$$ t_j = \frac{\bar{y}_{1j} - \bar{y}_{2j}}{s_j \sqrt{1/n_1 + 1/n_2} + a_0} $$

Types of Errors in Hypothesis Testing

[Table: TRUE status against PREDICTED status]

Multiplicity Problem

When many hypotheses are tested, the probability of a false positive increases sharply with the number of hypotheses.

Further: genes are co-regulated, so there is correlation between the test statistics.

Example

Suppose we measure the expression of 10,000 genes in a microarray experiment. If none of the 10,000 genes were differentially expressed, then we would expect:

P = 0.05 for each test: 500 false positives.
P = 0.05/10,000 for each test: 0.05 false positives.

Controlling the Error Rate

Methods for controlling false positives, e.g. Bonferroni, are too strict for microarray analyses.
Use the False Discovery Rate (FDR) instead (Benjamini and Hochberg 1995).

Methods for dealing with the Multiplicity Problem

The Bonferroni Method controls the familywise error rate (FWER), i.e. the probability that at least one false positive error will be made.
Too strict for gene expression data: it tries to make it unlikely that even one false rejection of the null is made, which may lead to missed findings.

The False Discovery Rate (FDR) emphasizes the proportion of false positives among the identified differentially expressed genes.
Good for gene expression data: it says something about the chosen genes.

False Discovery Rate, Benjamini and Hochberg (1995)

The FDR is essentially the expectation of the proportion of false positives among the identified differentially expressed genes.

Possible Outcomes for N Hypothesis Tests

[Table of outcomes and the defining equations were not recovered from the slides.]

Positive FDR

Lindsay, Kettenring, and Siegmund (2004). A Report on the Future of Statistics. Statist. Sci. 19.

Key papers on controlling the FDR

Genovese and Wasserman (2002)
Storey (2002, 2003)
Storey and Tibshirani (2003a, 2003b)
Storey, Taylor and Siegmund (2004)
Black (2004)
Cox and Wong (2004)

Controlling FDR: Benjamini and Hochberg (1995)

Benjamini-Hochberg (BH) Procedure

Controls the FDR at level $\alpha$ when the P-values following the null distribution are independent and uniformly distributed.

(1) Let $P_{(1)} \le P_{(2)} \le \dots \le P_{(N)}$ be the ordered observed P-values.
(2) Calculate $\hat{k} = \max\{k : P_{(k)} \le k\alpha/N\}$.
(3) If $\hat{k}$ exists, then reject the null hypotheses corresponding to $P_{(1)}, \dots, P_{(\hat{k})}$. Otherwise, reject nothing.

Example: Bonferroni and BH Tests

Suppose that 10 independent hypothesis tests are carried out, leading to the following ordered P-values:

0.00017 0.00448 0.00671 0.00907 0.01220 0.33626 0.39341 0.53882 0.58125 0.98617

(a) With $\alpha$ = 0.05, the Bonferroni test rejects any hypothesis whose P-value is less than $\alpha$/10 = 0.005. Thus only the first two hypotheses are rejected.
(b) For the BH test, we find the largest k such that $P_{(k)} \le k\alpha/N$. Here k = 5 (0.01220 is at most 5 x 0.05/10 = 0.025, while 0.33626 exceeds 6 x 0.05/10 = 0.03), thus we reject the first five hypotheses.

q-VALUE

The q-value of gene j is the expected proportion of false positives when calling that gene significant.

The P-value is the probability under the null hypothesis of obtaining a value of the test statistic as or more extreme than its observed value. The q-value for an observed test statistic can be viewed as the expected proportion of false positives among all genes with their test statistics as or more extreme than the observed value.

LIST OF SIGNIFICANT GENES

Call all genes significant if $p_j$ < 0.05
or
Call all genes significant if $q_j$ < 0.05, to produce a set of significant genes so that a proportion of them (< 0.05) is expected to be false (at least for a large number of genes, not necessarily independent).

BRCA1 versus BRCA2-mutation positive tumours (Hedenfalk et al., 2001)

BRCA1 (7) versus BRCA2-mutation (8) positive tumours, p = 3226 genes.
P = 0.001 gave 51 genes differentially expressed.
P = 0.0001 gave 9-11 genes.

Using q < 0.05 gives 160 genes taken to be significant. This means that approximately 8 of these 160 genes are expected to be false positives. Also, it is estimated that 33% of the genes are differentially expressed.

Permutation Method

The null distribution has a resolution on the order of the number of permutations. If we perform B permutations, then the P-value will be estimated with a resolution of 1/B. If we assume that each gene has the same null distribution and combine the permutations, then the resolution will be 1/(NB) for the pooled null distribution.

Null Distribution of the Test Statistic

Using just the B permutations of the class labels for the gene-specific statistic $W_j$, the P-value for $W_j = w_j$ is assessed as:

$$ p_j = \frac{\#\{b : |w_{0j}^{(b)}| \ge |w_j|\}}{B} $$

where $w_{0j}^{(b)}$ is the null version of $w_j$ after the bth permutation of the class labels.

If we pool over all N genes, then:

$$ p_j = \sum_{i=1}^{N} \frac{\#\{b : |w_{0i}^{(b)}| \ge |w_j|\}}{NB} $$
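The gene-specific and pooled permutation P-values described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method used to produce the slides: the test statistic is taken to be the pooled t-statistic, the comparison is two-sided (absolute values), every relabelling of the samples is enumerated instead of drawing B random permutations, and the two-gene data set and the function names `pooled_t`, `null_stats` and `perm_p_values` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Pooled two-sample t-statistic (common variance assumed)."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(s2 * (1 / n1 + 1 / n2))


def null_stats(values, n1):
    """Statistic for every relabelling of the samples into n1 versus the rest."""
    idx = range(len(values))
    stats = []
    for chosen in combinations(idx, n1):
        x = [values[i] for i in chosen]
        y = [values[i] for i in idx if i not in chosen]
        stats.append(pooled_t(x, y))
    return stats


def perm_p_values(data, n1, tol=1e-12):
    """Return (gene-specific, pooled) permutation P-values, one per gene."""
    observed = [pooled_t(row[:n1], row[n1:]) for row in data]
    nulls = [null_stats(row, n1) for row in data]
    # Gene-specific: compare w_j only with the null values of gene j.
    gene_specific = [
        sum(abs(w0) >= abs(w) - tol for w0 in null_j) / len(null_j)
        for w, null_j in zip(observed, nulls)
    ]
    # Pooled: compare w_j with the null values of all N genes together.
    pooled_null = [w0 for null_j in nulls for w0 in null_j]
    pooled = [
        sum(abs(w0) >= abs(w) - tol for w0 in pooled_null) / len(pooled_null)
        for w in observed
    ]
    return gene_specific, pooled


# Hypothetical data: three samples per class; gene 2 differs between classes.
data = [
    [1.0, 1.2, 0.9, 1.1, 1.3, 0.8],
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
]
gene_specific, pooled = perm_p_values(data, n1=3)
print("gene-specific:", gene_specific)
print("pooled:       ", pooled)
```

With six samples there are 20 relabellings per gene, so the gene-specific P-values are multiples of 1/20, while the pooled P-values over the two genes are multiples of 1/40. This is the 1/B versus 1/(NB) resolution in miniature.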
Null Distribution of the Test Statistic: Example

Suppose we have two classes of tissue samples, with three samples from each class. Consider the expressions of two genes, Gene 1 and Gene 2.

          Class 1                Class 2
Gene 1    A1(1) A2(1) A3(1)      B4(1) B5(1) B6(1)
Gene 2    A1(2) A2(2) A3(2)      B4(2) B5(2) B6(2)

To find the null distribution of the test statistic for Gene 1, we proceed under the assumption that there is no difference between the classes (for Gene 1) so that:

Gene 1    A1(1) A2(1) A3(1)      A4(1) A5(1) A6(1)

And permute the class labels:

Perm. 1   A1(1) A2(1) A4(1)      A3(1) A5(1) A6(1)
...

There are 10 distinct permutations: the 20 ways of choosing three of the six samples for the first class give 10 distinct splits once a split and its mirror image are counted as one.

Ten Permutations of Gene 1

A1(1) A2(1) A3(1)      A4(1) A5(1) A6(1)
A1(1) A2(1) A4(1)      A3(1) A5(1) A6(1)
A1(1) A2(1) A5(1)      A3(1) A4(1) A6(1)
A1(1) A2(1) A6(1)      A3(1) A4(1) A5(1)
A1(1) A3(1) A4(1)      A2(1) A5(1) A6(1)
A1(1) A3(1) A5(1)      A2(1) A4(1) A6(1)
A1(1) A3(1) A6(1)      A2(1) A4(1) A5(1)
A1(1) A4(1) A5(1)      A2(1) A3(1) A6(1)
A1(1) A4(1) A6(1)      A2(1) A3(1) A5(1)
A1(1) A5(1) A6(1)      A2(1) A3(1) A4(1)

As there are only 10 distinct permutations here, the null distribution based on these permutations is too granular. Hence consideration is given to permuting the labels of each of the other genes and estimating the null distribution of a gene based on the pooled permutations so obtained. But there is a problem with this method: the null values of the test statistic for each gene do not necessarily have the theoretical null distribution that we are trying to estimate.

Suppose we were to use Gene 2 also to estimate the null distribution of Gene 1. If Gene 2 is differentially expressed, then the null values of the test statistic for Gene 2 will have a mixed distribution.
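The count of ten distinct permutations for Gene 1 can be checked with a short Python sketch. It assumes that a split and its mirror image (the two groups swapped) count as the same permutation, which holds when only the absolute value of the test statistic matters. The function name `distinct_splits` and the short sample labels are hypothetical.

```python
from itertools import combinations


def distinct_splits(labels, n1):
    """Split labels into a group of size n1 and the rest, counting mirrors once."""
    seen = set()
    splits = []
    for group1 in combinations(labels, n1):
        group2 = tuple(s for s in labels if s not in group1)
        key = frozenset([group1, group2])  # same key for a split and its mirror
        if key not in seen:
            seen.add(key)
            splits.append((group1, group2))
    return splits


samples = ["A1", "A2", "A3", "A4", "A5", "A6"]
splits = distinct_splits(samples, 3)
for group1, group2 in splits:
    print(" ".join(group1), "|", " ".join(group2))
print(len(splits), "distinct splits")
```

Because `combinations` yields groups in lexicographic order, the splits that are kept are those whose first group contains A1, in the same order as the ten rows listed for Gene 1.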