Asynchronous Pattern Matching -Metrics.ppt
Asynchronous Pattern Matching - Metrics
Amihood Amir
CPM 2006

Motivation
In the "old" days: pattern and text are given in correct sequential order. It is possible that the content is erroneous.
New paradigm: content is exact, but the order of the pattern symbols may be scrambled.
Why? Transmitted asynchronously? The nature of the application?

Example: Swaps
Tehse knids of typing mistakes are very common.
So when searching for the pattern "These", we are seeking the symbols of the pattern, but with an order changed by swaps.
Surprisingly, pattern matching with swaps is easier than pattern matching with mismatches (ACHLP:01).

Example: Reversals
AAAGGCCCTTTGAGCCC
AAAGAGTTTCCCGGCCC
Given a DNA substring, a piece of it can detach and reverse.
This process is still computationally tough.
Question: What is the minimum number of reversals necessary to sort a permutation of 1,...,n?

Global Rearrangements?
Berman & Hannenhalli (1996) called this Global Rearrangement, as opposed to Local Rearrangement (edit distance). They showed it is NP-hard.
Our Thesis: This is a special case of errors in the address rather than content.

Example: Transpositions
AAAGGCCCTTTGAGCCC
AATTTGAGGCCCAGCCC
Given a DNA substring, a piece of it can be transposed to another area.
Question: What is the minimum number of transpositions necessary to sort a permutation of 1,...,n?

Complexity?
Bafna & Pevzner (1998), Christie (1998), Hartman (2001): polynomial-time 1.5-approximation.
Not known whether efficiently computable.
This is another special case of errors in the address rather than content.

Example: Block Interchanges
AAAGGCCCTTTGAGCCC
AAGTTTAGGCCCAGCCC
Given a DNA substring, two non-empty subsequences can be interchanged.
Question: What is the minimum number of block interchanges necessary to sort a permutation of 1,...,n?
Christie (1996): O(n^2)

A General-Purpose Metric
Options:
1. Count interchanges.
   Example: S1 = bbaca interchange matches S2 = bbaac.
2. L1, L2, or any other metric on the address.
   Example:
   AGGTTCCAATC
   1 22 1 12 215 11
   GTAGCAACTCT

In This Talk:
We concentrate on counting the interchanges as a metric. (We also have results on the L2 metric, partial results on L1, and address register errors.)
We have a pedagogical reason for this.

Summary
Biology: sorting permutations
  Reversals (Berman & Hannenhalli, 1996): NP-hard
  Transpositions (Bafna & Pevzner, 1998): ?
  Block interchanges (Christie, 1996): O(n^2)
Pattern Matching:
  Swaps (Amir, Lewenstein & Porat, 2002): O(n log m)
Note: A swap is a block interchange simplification:
1. Block size
2. Only once
3. Adjacent

Edit operations map
Reversal, Transposition, Block interchange: 1. arbitrary block size 2. not once 3. non adjacent 4. permutation 5. optimization
Interchange: 1. block of size 1 2. not once 3. non adjacent 4. permutation 5. optimization
Generalized-swap: 1. block of size 1 2. once 3. non adjacent 4. repetitions 5. optimization/decision
Swap: 1. block of size 1 2. once 3. adjacent 4. repetitions 5. optimization/decision

Definitions
interchange: S1 = bbaca interchange matches S2 = bbaac.
generalized-swap: S1 = bbaca generalized-swap matches S2 = bcaba.

Generalized Swap Matching
INPUT: text T[0..n], pattern P[0..m]
OUTPUT: all i s.t. P generalized-swap matches T[i..i+m]

Reminder: Convolution
The convolution of the strings t[1..n] and p[1..m] is the string t*p such that: [equation not recovered from the slide]
Fact: The convolution of an n-length text and an m-length pattern can be done in O(n log m) time using FFT.

In Pattern Matching
Convolutions: O(n log m) using FFT
[Figure: the pattern b0 b1 b2 aligned against successive text positions]
Problem: O(n log m) only in algebraically closed fields, e.g. C.
Solution: Reduce the problem to (Boolean/integer/real) multiplication.
This reduction costs!

Example: Hamming distance.
Counting mismatches is equivalent to counting matches.
A B A B C
A B B B A
Example: Count all "hits" of 1 in pattern and 1 in text.
[Figure: the bit pattern 1 0 1 aligned against successive text positions]
For each alphabet symbol a, define an indicator that is 1 if a = b and 0 otherwise.
Example: For [alphabet not recovered], do: [three per-symbol terms joined by +; formulas not recovered]
Result: The number of times a in pattern matches a in text + the
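
The Hamming-distance idea above (one 0/1 indicator per symbol, one convolution per symbol, sum the results) can be sketched in Python as follows. This is a minimal illustration, not the talk's own code: the function names `fft`, `multiply`, `count_matches`, and `hamming_distances` are hypothetical, and for simplicity each convolution runs over the whole text in O(n log n) rather than the O(n log m) obtained by splitting the text into blocks of length about 2m.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply(x, y):
    """Linear convolution of two integer sequences via FFT."""
    size = 1
    while size < len(x) + len(y) - 1:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    prod = fft([p * q for p, q in zip(fx, fy)], invert=True)
    # The inverse transform is unnormalized, so divide by size; round to
    # remove floating-point noise (the true values are integers).
    return [round(v.real / size) for v in prod[: len(x) + len(y) - 1]]

def count_matches(text, pattern):
    """For every alignment i, count positions j with text[i+j] == pattern[j]."""
    n, m = len(text), len(pattern)
    totals = [0] * (n - m + 1)
    for a in set(pattern):
        # Indicator vectors: 1 where the symbol equals a, 0 otherwise.
        t_bits = [1 if c == a else 0 for c in text]
        # Reversing the pattern turns the convolution into a sliding dot product.
        p_bits = [1 if c == a else 0 for c in reversed(pattern)]
        conv = multiply(t_bits, p_bits)
        for i in range(n - m + 1):
            totals[i] += conv[i + m - 1]
    return totals

def hamming_distances(text, pattern):
    """Mismatches per alignment = pattern length minus matches."""
    m = len(pattern)
    return [m - c for c in count_matches(text, pattern)]
```

For the two strings on the slide, `hamming_distances("ABABC", "ABBBA")` gives a single alignment with 2 mismatches (positions 3 and 5). The cost of the reduction is visible in the loop over `set(pattern)`: one convolution is needed per distinct symbol.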
