An introduction to Bayesian Networks and the Bayes Net Toolbox for Matlab.ppt
An introduction to Bayesian Networks and the Bayes Net Toolbox for Matlab
Kevin Murphy, MIT AI Lab, 19 May 2003

Outline
- An introduction to Bayesian networks
- An overview of BNT

What is a Bayes (belief) net?
- Qualitative part: a directed acyclic graph (DAG); nodes = random variables, edges = direct influence
- Quantitative part: a set of conditional probability distributions
- Compact representation of joint probability distributions via conditional independence
- Together they define a unique distribution in factored form
(Figure from N. Friedman)

What is a Bayes net?
- A node is conditionally independent of its ancestors given its parents, e.g. C ⊥ R, B, E | A
- Hence P(C, A, R, E, B) = P(B) P(E) P(A | B, E) P(R | E) P(C | A)
- From 2^5 - 1 = 31 parameters down to 1 + 1 + 2 + 4 + 2 = 10
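A minimal sketch of this factored representation in Python/NumPy (not from the talk; the CPT values below are made-up placeholders): it builds the five-node Burglary/Earthquake/Alarm/Radio/Call example as explicit tables, notes the parameter count, and evaluates the joint by multiplying the local factors.

```python
import numpy as np

# Hypothetical CPT values for the B/E/A/R/C network (binary nodes, index 0 = false, 1 = true).
P_B = np.array([0.99, 0.01])                    # P(B): 1 free parameter
P_E = np.array([0.98, 0.02])                    # P(E): 1 free parameter
P_R_given_E = np.array([[1.0, 0.0],             # P(R | E): 2 free parameters
                        [0.1, 0.9]])            # rows indexed by E, columns by R
P_A_given_BE = np.zeros((2, 2, 2))              # P(A | B, E): 4 free parameters
P_A_given_BE[0, 0] = [0.999, 0.001]
P_A_given_BE[0, 1] = [0.71, 0.29]
P_A_given_BE[1, 0] = [0.06, 0.94]
P_A_given_BE[1, 1] = [0.05, 0.95]
P_C_given_A = np.array([[0.95, 0.05],           # P(C | A): 2 free parameters
                        [0.10, 0.90]])

def joint(b, e, a, r, c):
    """P(C,A,R,E,B) as the product of the local factors in the DAG."""
    return (P_B[b] * P_E[e] * P_A_given_BE[b, e, a] *
            P_R_given_E[e, r] * P_C_given_A[a, c])

# Factored form: 1 + 1 + 2 + 4 + 2 = 10 parameters, versus 2**5 - 1 = 31 for the full joint.
print(joint(b=1, e=0, a=1, r=0, c=1))
```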
Why are Bayes nets useful?
- The graph structure supports:
  - modular representation of knowledge
  - local, distributed algorithms for inference and learning
  - an intuitive (possibly causal) interpretation
- The factored representation may have exponentially fewer parameters than the full joint P(X1, …, Xn), giving:
  - lower sample complexity (less data for learning)
  - lower time complexity (less time for inference)

What can Bayes nets be used for?
- Posterior probabilities: probability of any event given any evidence
- Most likely explanation: the scenario that explains the evidence
- Rational decision making: maximize expected utility; value of information
- Effect of intervention: causal analysis
- Explaining-away effect (Radio, Call; figure from N. Friedman)
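As a concrete check of the explaining-away effect, the sketch below (not from the talk; it reuses the hypothetical `joint` and CPTs defined above) computes posteriors by brute-force enumeration: observing the alarm raises the probability of a burglary, and additionally observing an earthquake lowers it again.

```python
from itertools import product

def posterior_burglary(evidence):
    """P(B=1 | evidence) by summing the factored joint over all complete assignments."""
    num = den = 0.0
    for b, e, a, r, c in product([0, 1], repeat=5):
        x = {'B': b, 'E': e, 'A': a, 'R': r, 'C': c}
        if any(x[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a, r, c)
        den += p
        if b == 1:
            num += p
    return num / den

print(posterior_burglary({'A': 1}))            # alarm observed: burglary becomes likely
print(posterior_burglary({'A': 1, 'E': 1}))    # earthquake also observed: burglary is "explained away"
```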
A real Bayes net: Alarm
- Domain: monitoring intensive-care patients
- 37 variables, 509 parameters instead of 2^37
(Figure from N. Friedman)

More real-world BN applications
- "Microsoft's competitive advantage lies in its expertise in Bayesian networks" - Bill Gates, quoted in the LA Times, 1996
- MS Answer Wizards, (printer) troubleshooters
- Medical diagnosis
- Genetic pedigree analysis
- Speech recognition (HMMs)
- Gene sequence/expression analysis
- Turbocodes (channel coding)

Dealing with time
- In many systems, data arrives sequentially
- Dynamic Bayes nets (DBNs) can be used to model such time-series (sequence) data
- Special cases of DBNs include state-space models and hidden Markov models (HMMs)
State-space model (SSM) / linear dynamical system (LDS)
- A "true" state evolves over time; we only see noisy observations

Example: LDS for 2D tracking
- Sparse linear Gaussian systems ⇒ sparse graphs

Hidden Markov model (HMM)
- Hidden states: phones/words; observations: the acoustic signal
- Transition matrix over hidden states, Gaussian observation densities
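A minimal simulation sketch of an LDS for 2D tracking (not from the talk; the noise levels and time step are made-up): the hidden state is position plus velocity, the transition matrix applies constant-velocity dynamics, and only a noisy 2D position is observed.

```python
import numpy as np

dt = 1.0
A = np.array([[1, 0, dt, 0],      # state x = (px, py, vx, vy): constant-velocity dynamics
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
C = np.array([[1, 0, 0, 0],       # we observe only the noisy 2D position
              [0, 1, 0, 0]])
Q = 0.01 * np.eye(4)              # process noise covariance (assumed)
R = 0.5 * np.eye(2)               # observation noise covariance (assumed)

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 1.0, 0.5])
xs, ys_obs = [], []
for t in range(20):
    x = A @ x + rng.multivariate_normal(np.zeros(4), Q)   # x_t ~ N(A x_{t-1}, Q)
    y = C @ x + rng.multivariate_normal(np.zeros(2), R)   # y_t ~ N(C x_t, R)
    xs.append(x); ys_obs.append(y)
```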
Probabilistic graphical models
- Probabilistic models → graphical models, which split into directed and undirected families
- Directed (Bayesian belief nets): Alarm network, state-space models, HMMs, Naïve Bayes classifier, PCA/ICA
- Undirected (Markov nets): Markov random fields, Boltzmann machines, Ising models, max-ent models, log-linear models

Toy example of a Markov net
- Nodes X1, …, X5; e.g. X1 ⊥ X4, X5 | X2, X3, and in general Xi ⊥ X_rest | X_neighbours
- The joint is a normalized product of potential functions over cliques, P(x) = (1/Z) ∏_c ψ_c(x_c), where Z is the partition function

A real Markov net
- Estimate P(x1, …, xn | y1, …, yn), where the x_i are latent causes and the y_i are observed pixels
- ψ(xi, yi) = P(observe yi | xi): local evidence
- ψ(xi, xj) ∝ exp(-J(xi, xj)): compatibility matrix
- cf. the Ising/Potts model
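A minimal sketch of such pairwise potentials (not from the talk; the 2x2 grid, the coupling J, and the noise level are made-up): it evaluates the unnormalized product of local-evidence and compatibility potentials for a labelling and computes the partition function Z by brute-force enumeration.

```python
import numpy as np
from itertools import product

nodes = [(0, 0), (0, 1), (1, 0), (1, 1)]                  # a 2x2 grid of binary latent pixels
edges = [((0, 0), (0, 1)), ((1, 0), (1, 1)),
         ((0, 0), (1, 0)), ((0, 1), (1, 1))]
y = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}          # observed (noisy) pixel values

def psi_local(xi, yi, eps=0.2):
    """Local evidence psi(x_i, y_i) = P(observe y_i | x_i): flip probability eps."""
    return 1 - eps if xi == yi else eps

def psi_pair(xi, xj, J=1.0):
    """Compatibility psi(x_i, x_j) ∝ exp(-J * [x_i != x_j]), an Ising-like smoothness term."""
    return np.exp(-J * (xi != xj))

def unnormalized(x):
    p = 1.0
    for v in nodes:
        p *= psi_local(x[v], y[v])
    for (u, v) in edges:
        p *= psi_pair(x[u], x[v])
    return p

# Partition function Z by summing the unnormalized joint over all 2^4 labellings.
Z = sum(unnormalized(dict(zip(nodes, assign))) for assign in product([0, 1], repeat=4))
print(Z)
```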
Inference
- Recap of the tasks: posterior probabilities, most likely explanation, rational decision making (expected utility, value of information), effect of intervention (causal analysis)
- Explaining-away example (Radio, Call; figure from N. Friedman)

Kalman filtering (recursive state estimation in an LDS)
- Estimate P(Xt | y1:t) from P(Xt-1 | y1:t-1) and yt
- Predict: P(Xt | y1:t-1) = Σ_{Xt-1} P(Xt | Xt-1) P(Xt-1 | y1:t-1)
- Update: P(Xt | y1:t) ∝ P(yt | Xt) P(Xt | y1:t-1)

Forwards algorithm for HMMs
- The same predict/update recursion, with discrete hidden states

Message passing view of the forwards algorithm

Forwards-backwards algorithm
- Discrete analog of the RTS smoother
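A minimal forwards-backwards sketch for a discrete HMM (not from the talk; the transition matrix, observation likelihoods, and observation sequence are made-up): the forward pass implements exactly the predict/update recursion above, and the backward pass combines with it to give smoothed marginals.

```python
import numpy as np

T_mat = np.array([[0.9, 0.1],          # P(X_t | X_{t-1}): 2-state transition matrix (assumed)
                  [0.2, 0.8]])
obs_lik = np.array([[0.7, 0.3],        # P(y | X): row = state, column = observation symbol (assumed)
                    [0.1, 0.9]])
prior = np.array([0.5, 0.5])
ys = [0, 0, 1, 1, 0]                   # a made-up observation sequence

def forwards_backwards(ys):
    n = len(ys)
    alpha = np.zeros((n, 2))
    for t, y in enumerate(ys):
        pred = prior if t == 0 else T_mat.T @ alpha[t - 1]    # predict: sum over X_{t-1}
        alpha[t] = obs_lik[:, y] * pred                       # update: multiply by local evidence P(y_t | X_t)
        alpha[t] /= alpha[t].sum()                            # normalize -> filtered marginal P(X_t | y_{1:t})
    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):                            # backward messages
        beta[t] = T_mat @ (obs_lik[:, ys[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)           # smoothed marginals P(X_t | y_{1:n})

print(forwards_backwards(ys))
```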
Belief propagation (BP)
- Generalization of the forwards-backwards algorithm / RTS smoother from chains to trees: a linear-time, two-pass algorithm
- aka Pearl's algorithm, the sum-product algorithm
(Figure from P. Green)

BP: parallel, distributed version
- Proceeds in two stages (Stage 1 and Stage 2 in the figure)

Representing potentials
- For discrete variables, potentials can be represented as multi-dimensional arrays (vectors for single-node potentials)
- For jointly Gaussian variables, we can use ψ(X) = (μ, Σ) or ψ(X) = (Σ⁻¹μ, Σ⁻¹)
- In general, we can use mixtures of Gaussians or non-parametric forms

Manipulating discrete potentials
- Marginalization and multiplication
- 80% of the running time is spent manipulating such multi-dimensional arrays!

Manipulating Gaussian potentials
- Closed-form formulae for marginalization and multiplication, O(1)/O(n^3) complexity per operation
- Mixtures of Gaussian potentials are not closed under marginalization, so approximations are needed (moment matching)
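A minimal sketch of those two discrete-potential operations in NumPy (not from the talk; the table values are made-up): a potential is just an array with one axis per variable, multiplication is broadcasting over aligned axes, and marginalization is summing out an axis.

```python
import numpy as np

# psi_AB over (A, B) and psi_BC over (B, C); the numbers are placeholders.
psi_AB = np.array([[0.3, 0.7],
                   [0.9, 0.1]])
psi_BC = np.array([[0.5, 0.5],
                   [0.2, 0.8]])

# Multiplication: align the shared axis B by broadcasting -> potential over (A, B, C).
psi_ABC = psi_AB[:, :, None] * psi_BC[None, :, :]

# Marginalization: sum out B -> potential over (A, C).
psi_AC = psi_ABC.sum(axis=1)

# Equivalently, both operations in one step with einsum.
assert np.allclose(psi_AC, np.einsum('ab,bc->ac', psi_AB, psi_BC))
```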
Semi-rings
- By redefining * and +, the same code implements the Kalman filter and the forwards algorithm
- By replacing + with max, the forwards (sum-product) algorithm becomes the Viterbi (max-product) algorithm
- BP works on any commutative semi-ring!
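A minimal sketch of the semi-ring idea (not from the talk; it reuses the made-up HMM parameters `T_mat`, `obs_lik`, `prior`, `ys` above): the forward recursion is written once with the accumulation operator passed in, so summing gives the filtering (sum-product) pass and taking the max gives the Viterbi (max-product) pass.

```python
import numpy as np

def forward_pass(ys, T_mat, obs_lik, prior, accumulate):
    """Generic forward recursion; `accumulate` collapses the X_{t-1} axis (np.sum or np.max)."""
    msg = prior * obs_lik[:, ys[0]]
    for y in ys[1:]:
        # T_mat[i, j] * msg[i] has shape (states, states); collapse axis 0 = previous state.
        msg = obs_lik[:, y] * accumulate(T_mat * msg[:, None], axis=0)
        msg = msg / msg.sum()          # rescale for stability; preserves within-step ratios and argmax
    return msg

filtered = forward_pass(ys, T_mat, obs_lik, prior, np.sum)   # sum-product: filtered distribution
viterbi = forward_pass(ys, T_mat, obs_lik, prior, np.max)    # max-product: ∝ Viterbi max-marginals at t = n
```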
Inference in general graphs
- BP is only guaranteed to be correct for trees
- A general graph should be converted to a junction tree by clustering nodes
- Computational complexity is exponential in the size of the resulting clusters (NP-hard)

Approximate inference
- Why? To avoid the exponential complexity of exact inference in discrete loopy graphs, and because messages cannot be computed in closed form (even on trees) in the non-linear/non-Gaussian case
- How? Deterministic approximations: loopy BP, mean field, structured variational, etc.; stochastic approximations: MCMC (Gibbs sampling), likelihood weighting, particle filtering, etc.
- Algorithms make different speed/accuracy tradeoffs
- Should provide the user with a choice of algorithms
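A minimal likelihood-weighting sketch (not from the talk; it reuses the made-up CPTs from the first example): ancestral sampling with evidence nodes clamped, each sample weighted by the probability of the clamped evidence, here estimating P(B=1 | C=1). R is neither observed nor queried, so this sketch skips it.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood_weighting(evidence, n_samples=20000):
    """Estimate P(B=1 | evidence) by likelihood weighting on the B/E/A/C part of the network."""
    num = den = 0.0
    for _ in range(n_samples):
        w = 1.0
        b = rng.random() < P_B[1]
        e = rng.random() < P_E[1]
        if 'A' in evidence:                                   # evidence node: clamp and weight
            a = evidence['A']; w *= P_A_given_BE[int(b), int(e), a]
        else:                                                 # non-evidence node: sample
            a = rng.random() < P_A_given_BE[int(b), int(e), 1]
        if 'C' in evidence:
            c = evidence['C']; w *= P_C_given_A[int(a), c]
        else:
            c = rng.random() < P_C_given_A[int(a), 1]
        den += w
        if b:
            num += w
    return num / den

print(likelihood_weighting({'C': 1}))
```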
Learning
- Parameter estimation
- Model selection (structure learning)

Parameter learning
- Conditional probability tables (CPTs), learned from iid data
- If some values are missing (latent variables), we must use gradient descent or EM to compute the (locally) maximum likelihood estimates
(Figure from M. Jordan)
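A minimal sketch of maximum-likelihood CPT estimation from fully observed data (not from the talk; the data array and the single parent-child relationship are made-up): each CPT row is just normalized counts.

```python
import numpy as np

# Made-up complete data for two binary variables; columns = (parent U, child V).
data = np.array([[0, 0], [0, 1], [0, 0], [1, 1], [1, 1], [1, 0], [0, 0], [1, 1]])

counts = np.zeros((2, 2))
for u, v in data:
    counts[u, v] += 1

# ML estimate of P(V | U): normalize each row of the count table.
cpt_V_given_U = counts / counts.sum(axis=1, keepdims=True)
print(cpt_V_given_U)
```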
Structure learning (data mining)
- Example: learning a genetic pathway from gene expression data (figure from N. Friedman)

Structure learning
- Learning the optimal structure is NP-hard (except for trees)
- Hence use heuristic search through the space of DAGs, PDAGs, or node orderings
- Search algorithms: hill climbing, simulated annealing, genetic algorithms
- The scoring function is often the marginal likelihood, or an approximation like BIC/MDL or AIC (which add a structural complexity penalty)
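A minimal sketch of such a score (not from the talk; it reuses the made-up `data` array from the CPT example above): the BIC/MDL score of a candidate structure is the maximized log-likelihood minus a complexity penalty of (d/2) log N, where d is the number of free parameters, here comparing "U and V independent" against "U → V".

```python
import numpy as np

def loglik_indep(data):
    """Max log-likelihood when U and V are modelled as independent (1 + 1 free parameters)."""
    ll = 0.0
    for col in (0, 1):
        p = np.bincount(data[:, col], minlength=2) / len(data)
        ll += np.sum(np.log(p[data[:, col]]))
    return ll

def loglik_edge(data):
    """Max log-likelihood for the structure U -> V (1 + 2 free parameters)."""
    pU = np.bincount(data[:, 0], minlength=2) / len(data)
    counts = np.zeros((2, 2))
    for u, v in data:
        counts[u, v] += 1
    pV_given_U = counts / counts.sum(axis=1, keepdims=True)
    return (np.sum(np.log(pU[data[:, 0]])) +
            np.sum(np.log(pV_given_U[data[:, 0], data[:, 1]])))

def bic(loglik, n_params, n_samples):
    """BIC/MDL score: maximized log-likelihood minus a structural complexity penalty."""
    return loglik - 0.5 * n_params * np.log(n_samples)

N = len(data)
print(bic(loglik_indep(data), 2, N), bic(loglik_edge(data), 3, N))
```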
Summary: why are graphical models useful?
- The factored representation may have exponentially fewer parameters than the full joint P(X1, …, Xn)
Link: http://www.mydoc123.com/p-378289.html