Introduction to SNP and Haplotype Analysis.ppt
Yao-Ting Huang, Kun-Mao Chao
Algorithms and Computational Biology Lab, Department of Computer Science & Information Engineering, National Taiwan University, Taiwan

Genetic Variations

- Genetic variations in DNA sequences (e.g., insertions, deletions, and mutations) have a major impact on genetic diseases and phenotypic differences.
- All humans share more than 99% of the same DNA sequence.
- A genetic variation in a coding region may change a codon and thereby alter the amino acid sequence.

Single Nucleotide Polymorphism

- A single nucleotide polymorphism (SNP), pronounced "snip," is a genetic variation in which a single nucleotide (i.e., A, T, C, or G) is altered and the change is kept through heredity.
- SNP: a single DNA base variation found in more than 1% of the population.
- Mutation: a single DNA base variation found in less than 1% of the population.
- Figure: the sequences C T T A G C T T and C T T A G T T T. As a SNP, the two variants occur in 94% and 6% of the population; as a mutation, in 99.9% and 0.1%.

Mutations and SNPs

- Figure labels: Common Ancestor; Observed genetic variations.

Single Nucleotide Polymorphism

- SNPs are the most frequent form of genetic variation: about 90% of human genetic variation comes from SNPs.
- SNPs occur about once every 300 to 600 base pairs.
- Millions of SNPs have been identified (e.g., by HapMap and Perlegen).
- SNPs have become the preferred markers for association studies because of their abundance and the availability of high-throughput SNP genotyping technologies.

Single Nucleotide Polymorphism

- A SNP is usually assumed to be a binary variable.
- The probability of a repeat mutation at the same SNP locus is quite small.
- Tri-allelic cases are usually considered to be the result of genotyping errors.
- The nucleotide at a SNP locus is called the major allele if its allele frequency is greater than 50%, or the minor allele if its allele frequency is less than 50%.
- Figure: the sequences A C T T A G C T T (94%) and A C T T A G C T C (6%); at the last position, T is the major allele and C is the minor allele.

Haplotypes

- A haplotype is a set of linked SNPs on the same chromosome.
- Since each SNP is binary, a haplotype can be treated simply as a binary string.

Genotypes

- The use of haplotype information has been limited because the human genome is diploid.
- In large sequencing projects, genotypes are collected instead of haplotypes because of cost.

Problems of Genotypes

- A genotype only tells us the alleles at each SNP locus, not how the alleles at different SNP loci are connected.
- Several haplotype pairs may be consistent with the same genotype, and we don't know which pair is the real one.

Research Directions of SNPs and Haplotypes in Recent Years

- Topics: Haplotype Inference, Tag SNP Selection, Maximum Parsimony, Perfect Phylogeny, Statistical Methods, Haplotype block, LD bin, Prediction Accuracy, SNP Database.

Haplotype Inference

- The problem of inferring haplotypes from a set of genotypes is called haplotype inference.
- The maximum parsimony formulation of this problem is known to be not only NP-hard but also APX-hard.
- Most combinatorial methods solve it under the maximum parsimony model, which assumes that the number of distinct haplotypes in a natural population is small.
- The solution is a minimum set of haplotypes that can explain the given genotypes.

Maximum Parsimony

- Find a minimum set of haplotypes that explains the given genotypes.

Related Works

- Statistical methods:
  - Niu et al. (2002) developed a PL-EM algorithm called HAPLOTYPER.
  - Stephens and Donnelly (2003) designed an MCMC algorithm based on Gibbs sampling called PHASE.
- Combinatorial methods:
  - Gusfield (2003) proposed an integer linear programming algorithm.
  - Wang and Xu (2003) developed a branch-and-bound algorithm called HAPAR to find the optimal solution.
  - Brown and Harrower (2004) proposed a new integer linear programming formulation of this problem.

Our Results

- We formulated this problem as an integer quadratic programming (IQP) problem.
- We proposed an iterative semidefinite programming (SDP) relaxation algorithm to solve the IQP problem.
- This algorithm finds a solution within an O(log n) approximation factor.
- We implemented this algorithm in MATLAB and compared it with existing methods.
- Reference: Huang, Y.-T., Chao, K.-M., and Chen, T., 2005, "An Approximation Algorithm for Haplotype Inference by Maximum Parsimony," Journal of Computational Biology, 12: 1261-1274.

Problem Formulation
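As a minimal sketch of the maximum parsimony problem, the Python below enumerates the haplotype pairs compatible with each genotype and then searches by brute force for a smallest set of haplotypes that explains all genotypes. The genotype encoding ('0' and '1' for homozygous sites, '2' for heterozygous sites) and the function names compatible_pairs, explains, and min_parsimony are assumptions made for illustration; they do not come from the slides. The exhaustive search is exponential and is not the IQP/SDP method; it only illustrates the input and output of the problem.

```python
from itertools import combinations, product


def compatible_pairs(genotype):
    """Enumerate the unordered haplotype pairs that explain one genotype.

    Assumed encoding: '0' and '1' are homozygous sites, '2' is heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == "2"]
    if not het:
        # A fully homozygous genotype has exactly one resolution.
        return [(genotype, genotype)]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1 = list(genotype)
        h2 = list(genotype)
        for i, b in zip(het, bits):
            # The two haplotypes carry opposite alleles at a heterozygous site.
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def explains(selected, genotype):
    """True if some compatible pair lies entirely inside the selected set."""
    return any(a in selected and b in selected
               for a, b in compatible_pairs(genotype))


def min_parsimony(genotypes):
    """Brute-force search for a smallest haplotype set explaining all genotypes."""
    candidates = sorted({h for g in genotypes
                         for pair in compatible_pairs(g) for h in pair})
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            if all(explains(chosen, g) for g in genotypes):
                return subset
    return ()


if __name__ == "__main__":
    print(compatible_pairs("22"))
    print(min_parsimony(["22", "00", "11"]))
```

For the genotypes 22, 00, and 11, min_parsimony returns ('00', '11'): the pair 00/11 resolves the heterozygous genotype 22 and also covers the two homozygous genotypes, whereas the alternative resolution 01/10 would require four haplotypes in total.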
- Input: a set of n genotypes and m possible haplotypes.
- Output: a minimum set of haplotypes that can explain the given genotypes.

Integer Quadratic Programming (IQP)

- Define x_i as an integer variable with value 1 or -1: x_i = 1 if the i-th haplotype is selected, and x_i = -1 if it is not selected.
- Minimizing the number of selected haplotypes then amounts to minimizing an integer quadratic function of the x_i.
- Each genotype must be resolved by at least one pair of haplotypes, so each genotype (for example, G1) contributes an integer quadratic constraint that must be satisfied.
- Figure labels: "or"; "Suppose h1 and h2 are selected".
- Maximum parsimony: find a minimum set of haplotypes to resolve all genotypes.
- We use the SDP-relaxation technique to solve this IQP problem.

The Flow of the Iterative SDP Relaxation Algorithm

- Figure labels: Integer Quadratic Programming, Integral Solution, Semidefinite Programming, Vector Solution, Vector Formulation, SDP Solution.

Problems of Using SNPs for Association Studies

- The number of SNPs is still too large for association studies: there are millions of SNPs in the human genome.
- To reduce the cost of SNP genotyping, we wish to use as few SNPs as possible.
- Tag SNPs are a small subset of SNPs that is sufficient for performing association studies without losing the power of using all SNPs.
- There are many definitions of tag SNPs. We first study one based on the haplotype block model.

Haplotype Blocks and Tag SNPs

- Recent studies have shown that the chromosome can be partitioned into haplotype blocks interspersed with recombination hotspots (Daly et al.; Patil et al.).
- Within a haplotype block, little or no recombination has occurred, so the SNPs in the block tend to be inherited together.
- Within a haplotype block, a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block.
- We only need to genotype the tag SNPs instead of all SNPs within a haplotype block.

Recombination Hotspots and Haplotype Blocks

A Haplotype Block Example

- Chromosome 21 is partitioned into 4,135 haplotype blocks over 24,047 SNPs by Patil et al. (Science, 2001).
- Figure legend: blue box, major allele; yellow box, minor allele.

Examples of Tag SNPs

- Figure labels: haplotype patterns P1 to P4; SNP loci S1 to S12.
- Suppose we wish to distinguish an unknown haplotype sample. We can genotype all SNPs to identify th
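The tag SNP definition used here (a small subset of SNPs that distinguishes every pair of haplotype patterns in a block) can be sketched in a few lines of Python. The binary pattern strings, the 0-based locus indices, and the function names distinguishes and min_tag_snps are assumptions for illustration, and the example patterns are made up rather than taken from the slides. The search is exhaustive, so it is only suitable for small blocks.

```python
from itertools import combinations


def distinguishes(patterns, loci):
    """True if the chosen loci give every haplotype pattern a distinct signature."""
    signatures = {tuple(p[i] for i in loci) for p in patterns}
    return len(signatures) == len(patterns)


def min_tag_snps(patterns):
    """Return a smallest tuple of SNP indices that separates all patterns.

    Returns None when two patterns are identical, since no subset of SNPs
    can tell them apart.
    """
    n_loci = len(patterns[0])
    for k in range(n_loci + 1):
        for loci in combinations(range(n_loci), k):
            if distinguishes(patterns, loci):
                return loci
    return None


if __name__ == "__main__":
    # Hypothetical block: four haplotype patterns over four SNP loci.
    block = ["0000", "0011", "1100", "1111"]
    print(min_tag_snps(block))
```

For this hypothetical block the function returns (0, 2): loci 0 and 1 always agree, as do loci 2 and 3, so one SNP from each group is enough. Four binary patterns can never be separated by a single SNP, so two tag SNPs is the minimum here.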
