A Revealing Introduction to Hidden Markov Models.ppt
A Revealing Introduction to Hidden Markov Models
Mark Stamp

Hidden Markov Models
- What is a hidden Markov model (HMM)?
  - A machine learning technique
  - A discrete hill climb technique
- Where are HMMs used?
  - Speech recognition
  - Malware detection, IDS, etc., etc.
- Why is it useful?
  - Efficient algorithms

Markov Chain
- A Markov chain is a "memoryless random process"
- Transitions depend only on the current state and the transition probability matrix
- Example on next slide

Markov Chain
- We are interested in average annual temperature
- Only consider Hot (H) and Cold (C)
- From recorded history, we obtain transition probabilities
- (State diagram: H and C, with P(H->H) = 0.7, P(H->C) = 0.3, P(C->H) = 0.4, P(C->C) = 0.6)

Markov Chain
- The transition probability matrix is
    A = | 0.7  0.3 |
        | 0.4  0.6 |
- The matrix is denoted A
- Note that A is "row stochastic": each row sums to 1

Markov Chain
- Can also include begin and end states
- The begin state matrix is π; in this example, π = (0.6, 0.4)
- Note that π is also row stochastic

Hidden Markov Model
- An HMM includes a Markov chain, but this Markov process is "hidden"
- We cannot observe the Markov process; instead, we observe something related to the hidden states
- It's as if there is a "curtain" between the Markov chain and the observations
- Example on next slide
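The Hot/Cold chain above can be sketched in a few lines of code. This is a minimal illustration (not part of the slides) of the "memoryless" property: the distribution over {H, C} at the next step depends only on the current distribution and A. The A and π values come from the slides; everything else is illustrative.

```python
# A minimal sketch of the Hot/Cold Markov chain, with A and pi from the slides.
A  = [[0.7, 0.3],   # from H: P(H->H), P(H->C)
      [0.4, 0.6]]   # from C: P(C->H), P(C->C)
pi = [0.6, 0.4]     # begin-state probabilities for (H, C)

# "Row stochastic" means every row sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in A + [pi])

def step(dist):
    """One transition: the next distribution depends only on the current one."""
    return [sum(dist[i] * A[i][j] for i in range(len(dist)))
            for j in range(len(dist))]

dist = pi
for _ in range(50):          # run the memoryless process for 50 years
    dist = step(dist)
print(dist)                  # long-run fractions of Hot and Cold years
```

For this A, the distribution settles at (4/7, 3/7) regardless of the starting distribution, which is what "the past doesn't matter beyond the current state" looks like numerically.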
HMM Example
- Consider the H/C temperature example
- Suppose we want to know whether the temperature was H or C in the distant past
  - Before humans (or thermometers) were invented
- It is OK if we can just decide Hot versus Cold
- We assume transitions between Hot and Cold years are the same as today
  - That is, the A matrix is the same as today

HMM Example
- Temperature in the past was determined by a Markov process
- But we cannot observe the temperature in the past
- Instead, we note that tree ring size is related to temperature
  - Look at historical data to see the connection
- We consider 3 tree ring sizes
  - Small, Medium, Large (S, M, L, respectively)
- Measure tree ring sizes and recorded temperatures to determine the relationship

HMM Example
- We find that tree ring sizes and temperatures are related by a table of conditional probabilities (given on the slide)
- This is known as the B matrix
- Note that B is also row stochastic

HMM Example
- Can we now find temperatures in the distant past?
- We cannot measure (observe) the temperature
- But we can measure tree ring sizes, and tree ring sizes are related to temperature
  - By the B matrix
- So, we ought to be able to say something about temperature

HMM Notation
- A lot of notation is required
- Notation may be the most difficult part

HMM Notation
- To simplify notation, observations are taken from the set {0, 1, …, M-1}
- That is, V = {0, 1, …, M-1}
- The matrix A = {aij} is N x N, where aij = P(state qj at t+1 | state qi at t)
- The matrix B = {bj(k)} is N x M, where bj(k) = P(observation k at t | state qj at t)

HMM Example
- Consider our temperature example
- What are the observations? V = {0, 1, 2}, which corresponds to S, M, L
- What are the states of the Markov process? Q = {H, C}
- What are A, B, π, and T?
  - A, B, π are on the previous slides; T is the number of tree rings measured
- What are N and M? N = 2 and M = 3

Generic HMM
- Generic view of an HMM (figure on slide): a hidden Markov process, with observations related to the states by B
- An HMM is defined by A, B, and π
- We denote the HMM "model" as λ = (A, B, π)

HMM Example
- Suppose that we observe tree ring sizes S, M, S, L over a 4-year period of interest
- Then O = (0, 1, 0, 2)
- What is the most likely (hidden) state sequence?
- We want the most likely X = (x0, x1, x2, x3)
- Let πx0 be the probability of starting in state x0
- Then bx0(O0) is the probability of the initial observation
- And ax0,x1 is the probability of the transition from x0 to x1
- And so on

HMM Example
- Bottom line? We can compute P(X) for any X
- For X = (x0, x1, x2, x3) we have
    P(X) = πx0 bx0(O0) ax0,x1 bx1(O1) ax1,x2 bx2(O2) ax2,x3 bx3(O3)
- Suppose we observe O = (0, 1, 0, 2).
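The product formula for P(X) can be checked numerically. A and π below are taken from the slides; the B values are an assumption for illustration only, since the B matrix itself appears on a slide and not in this text.

```python
# Checking the sequence-probability formula numerically.
# A and pi are from the slides; the B values below are assumed for
# illustration (the emission matrix is on the slide, not in the text).
A  = [[0.7, 0.3], [0.4, 0.6]]            # states: H = 0, C = 1
B  = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]  # observations: S = 0, M = 1, L = 2
pi = [0.6, 0.4]

def seq_prob(X, O):
    """P(X, O | lambda) = pi_x0 b_x0(O0) a_x0,x1 b_x1(O1) ..."""
    p = pi[X[0]] * B[X[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[X[t - 1]][X[t]] * B[X[t]][O[t]]
    return p

H, C = 0, 1
O = (0, 1, 0, 2)                  # the observed tree rings S, M, S, L
print(seq_prob((H, H, C, C), O))  # probability of the state sequence HHCC
```

With these (assumed) B values, the HHCC probability is the product 0.6 * 0.1 * 0.7 * 0.4 * 0.3 * 0.7 * 0.6 * 0.1, one π or a factor and one b factor per year.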
- Then what is the probability of, say, HHCC? Plug into the formula above to find it.

HMM Example
- Do the same for all 4-state sequences
- We find the probability of each of the 16 sequences (table on slide)
- The winner is? CCCH
- Not so fast, my friend…

HMM Example
- The path CCCH scores the highest
- In dynamic programming (DP), we find the highest scoring path
- But the HMM maximizes the expected number of correct states
  - Sometimes called the "EM algorithm", for "Expectation Maximization"
- How does the HMM work in this example?

HMM Example
- For the first position, sum the probabilities of all paths that have H in the 1st position, and compare to the sum of the probabilities of paths with C in the 1st position; the biggest wins
- Repeat for each position and we find the answer (sums on slide)

HMM Example
- So, the HMM solution gives us CHCH
- While the dynamic programming solution is CCCH
- Which solution is better? Neither!
- Why is that? Different definitions of "best"

HMM Paradox?
- The HMM maximizes the expected number of correct states
- Whereas DP chooses the "best" overall path
- It is possible for the HMM to choose a "path" that is impossible
  - There could be a transition probability of 0
- You cannot get an impossible path with DP
- Is this a flaw with the HMM? No, it's a feature

The Three Problems
- HMMs are used to solve 3 problems
- Problem 1: Given a model λ = (A, B, π) and an observation sequence O, find P(O | λ)
  - That is, we score an observation sequence to see how well it fits the given model
- Problem 2: Given λ = (A, B, π) and O, find an optimal state sequence
  - Uncover the hidden part (as in the previous example)
- Problem 3: Given O, N, and M, find the model λ that maximizes the probability of O
  - That is, train a model to fit the observations

HMMs in Practice
- Typically, HMMs are used as follows
  - Given an observation sequence, assume a hidden Markov process exists
  - Train a model based on the observations: Problem 3 (determine N by trial and error)
  - Then, given a sequence of observations, score it against the model from the previous step.
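The two notions of "best" discussed earlier, the DP answer (CCCH) and the per-position HMM answer (CHCH), can be reproduced by brute force for this tiny example. A and π are from the slides; the B values are again assumed for illustration.

```python
# Brute-force comparison of the two notions of "best" for O = (S, M, S, L).
# A and pi are from the slides; the B values are assumed for illustration.
from itertools import product

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
pi = [0.6, 0.4]
O  = (0, 1, 0, 2)

def seq_prob(X):
    """Joint probability of state sequence X and the fixed observations O."""
    p = pi[X[0]] * B[X[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[X[t - 1]][X[t]] * B[X[t]][O[t]]
    return p

paths = list(product((0, 1), repeat=len(O)))   # all 2^4 = 16 state sequences

# DP-style answer: the single highest-scoring path.
best = max(paths, key=seq_prob)

# HMM-style answer: at each position, keep whichever state has the larger
# summed probability over all paths passing through it.
posterior = [0 if sum(seq_prob(X) for X in paths if X[t] == 0) >
                  sum(seq_prob(X) for X in paths if X[t] == 1) else 1
             for t in range(len(O))]

names = "HC"
print("highest-scoring path:", "".join(names[s] for s in best))       # CCCH
print("position-by-position:", "".join(names[s] for s in posterior))  # CHCH
```

Note that the per-position answer is assembled from four independent comparisons, which is exactly why it can disagree with, or even be impossible under, the single best path.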
  - This is Problem 1 (a high score implies the sequence is similar to the training data)

HMMs in Practice
- The previous slide gives the sense in which an HMM is a "machine learning" technique
- We do not need to specify anything except the parameter N
  - And the "best" N is found by trial and error
- That is, we don't have to think too much
  - Just train an HMM and then use it
- Best of all, there are efficient algorithms for HMMs

The Three Solutions
- We give detailed solutions to the three problems
- Note: we must have efficient solutions
- Recall the three problems:
  - Problem 1: Score an observation sequence versus a given model
  - Problem 2: Given a model, "uncover" the hidden part
  - Problem 3: Given an observation sequence, train a model

Solution 1
- Score observations versus a given model
- Given the model λ = (A, B, π) and an observation sequence O = (O0, O1, …, OT-1), find P(O | λ)
- Denote the hidden states as X = (x0, x1, …, xT-1)
- Then from the definition of B,
    P(O | X, λ) = bx0(O0) bx1(O1) … bxT-1(OT-1)
- And from the definition of A and π,
    P(X | λ) = πx0 ax0,x1 ax1,x2 … axT-2,xT-1

Solution 1
- Elementary conditional probability fact:
    P(O, X | λ) = P(O | X, λ) P(X | λ)
- Sum over all possible state sequences X:
    P(O | λ) = Σ P(O, X | λ) = Σ P(O | X, λ) P(X | λ)
             = Σ πx0 bx0(O0) ax0,x1 bx1(O1) … axT-2,xT-1 bxT-1(OT-1)
- This "works", but it is way too costly: it requires about 2TN^T multiplications
  - Why? There are N^T state sequences, and each term costs about 2T multiplications
- There had better be a better way

Forward Algorithm
- Instead of brute force: the forward algorithm, or "alpha pass"
- For t = 0, 1, …, T-1 and i = 0, 1, …, N-1, let
    αt(i) = P(O0, O1, …, Ot, xt = qi | λ)
- That is, the probability of the partial observation sequence up to time t, with the Markov process in state qi at step t
- What the…? It can be computed recursively, and efficiently

Forward Algorithm
- Let α0(i) = πi bi(O0), for i = 0, 1, …, N-1
- For t = 1, 2, …, T-1 and i = 0, 1, …, N-1, let
    αt(i) = [Σ αt-1(j) aji] bi(Ot), where the sum is from j = 0 to N-1
- From the definition of αt(i) we see that
    P(O | λ) = Σ αT-1(i), where the sum is from i = 0 to N-1
- Note that this requires only about N^2 T multiplications

Solution 2
- Given a model, find the "most likely" hidden states
- Given λ = (A, B, π) and O, find an optimal state sequence
- Recall that optimal means "maximize the expected number of correct states"
- In contrast, DP finds the best scoring path
- For the temp/tree ring example
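The alpha pass above can be sketched directly from the two recurrences. A and π are from the slides; the B values are the same illustrative assumptions used earlier.

```python
# The alpha pass for the running example. A and pi are from the slides;
# the B values are assumed for illustration.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
pi = [0.6, 0.4]

def forward(O):
    """Return P(O | lambda) using only about N^2 * T multiplications."""
    N = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]            # alpha_0(i)
    for t in range(1, len(O)):
        alpha = [sum(alpha[j] * A[j][i] for j in range(N)) * B[i][O[t]]
                 for i in range(N)]                           # alpha_t(i)
    return sum(alpha)                                         # sum of alpha_{T-1}(i)

p = forward((0, 1, 0, 2))   # P(O | lambda) for O = (S, M, S, L)
print(p)
```

Each step folds the N previous alphas into N new ones, which is how the N^T-term sum of the brute-force approach collapses to N^2 multiplications per time step.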