A Multi-span Language Modeling Framework for Speech Recognition
Jimmy Wang
Speech Lab, NTU

Outline
1. Introduction.
2. N-gram Language Modeling.
3. Smoothing and Clustering of N-gram Language Model.
4. LSA Modeling.
5. Hybrid LSA+N-gram Language Model.
6. Conclusion.

INTRODUCTION
Acoustically confusable sentence pairs:
劉邦友血案抓到一對象 ("A suspect was caught in the Liu Pang-yu murder case") vs. 劉邦友血案抓到一隊象 ("A team of elephants was caught in the Liu Pang-yu murder case")
水餃一碗多少錢 ("How much is a bowl of dumplings?") vs. 睡覺一晚多少錢 ("How much is a night's sleep?")

Stochastic Modeling of Speech Recognition:

N-gram language modeling has been the formalism of choice for ASR because of its reliability, but it captures only local constraints. For global constraints, parsing and rule-based grammars have been successful only in small-vocabulary applications.

N-gram+LSA (Latent Semantic Analysis) language models integrate local constraints via the N-gram and global constraints through LSA models.

N-gram Language Model
Assume each word depends only on the previous N-1 words (N words total). An N-gram model is an (N-1)th-order Markov model.
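
The Markov assumption above can be written out as follows. This is a sketch in notation introduced here rather than taken from the slides: $w_i$ is the $i$-th word of a sentence of length $n$, and $N$ is the N-gram order. The first equality is the exact chain rule; the approximation is where the history is truncated to the previous N-1 words.

```latex
P(w_1, \dots, w_n)
  = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
  \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}, \dots, w_{i-1})
```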
P(象 | 抓到一隊) ≈ P(象 | 抓到, 一隊).
Perplexity:

N-gram Training From Text Corpus
Corpus size ranges from hundreds of Mbytes to several Gbytes.
Maximum likelihood approach: P("the" | "nothing but") ≈ C("nothing but the") / C("nothing but").

Smoothing and Clustering
The maximum likelihood estimate is terrible on test data: if the sequence xyz never occurs in training, so that C(xyz) = 0, its probability is 0.
Find the smoothing weights, each between 0 and 1, by optimizing on "held-out" data.

CLUSTERING = classes of (same things).
P(Tuesday | party on) or P(Tuesday | celebration on) ≈ P(WEEKDAY | EVENT).
Put words in clusters:
WEEKDAY = Sunday, Monday, Tuesday, ...
EVENT = party, celebration, birthday.
Clustering may lead to good results with very little training data.

Word Clustering Methods:
1. Build them by hand.
2. Part of Speech (POS) tags.
3. Automatic clustering: swap words between clusters to minimize perplexity.

Automatic Clustering:
1.top-down splittin
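
The maximum-likelihood estimate and the zero-probability problem described above can be illustrated with a small sketch. The slides do not preserve their smoothing formula, so linear interpolation of trigram, bigram, and unigram estimates is assumed here as one common choice. The names `InterpolatedTrigram`, `lam`, `mu`, and `tune_weights` are hypothetical, and the coarse grid search merely stands in for "optimizing on held-out data".

```python
from collections import Counter
import math


def count_ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class InterpolatedTrigram:
    """Trigram model smoothed by linear interpolation (an assumed method)."""

    def __init__(self, tokens):
        self.c1 = count_ngrams(tokens, 1)
        self.c2 = count_ngrams(tokens, 2)
        self.c3 = count_ngrams(tokens, 3)
        # History counts: how often a context is followed by some word.
        self.h2 = Counter(g[:2] for g in self.c3.elements())
        self.h1 = Counter(g[0] for g in self.c2.elements())
        self.total = len(tokens)
        self.vocab_size = len(self.c1)

    def mle(self, x, y, z):
        """Maximum-likelihood estimate C(xyz) / C(xy); 0 for unseen events."""
        cxy = self.h2[(x, y)]
        return self.c3[(x, y, z)] / cxy if cxy else 0.0

    def prob(self, x, y, z, lam, mu):
        """lam * trigram + mu * bigram + (1 - lam - mu) * add-one unigram."""
        tri = self.mle(x, y, z)
        cy = self.h1[y]
        bi = self.c2[(y, z)] / cy if cy else 0.0
        # Add-one unigram keeps the result nonzero even for unseen words.
        uni = (self.c1[(z,)] + 1) / (self.total + self.vocab_size + 1)
        return lam * tri + mu * bi + (1 - lam - mu) * uni


def perplexity(model, tokens, lam, mu):
    """2 ** (average negative log2 probability per predicted word)."""
    log_sum, n = 0.0, 0
    for x, y, z in zip(tokens, tokens[1:], tokens[2:]):
        log_sum += math.log2(model.prob(x, y, z, lam, mu))
        n += 1
    return 2 ** (-log_sum / n)


def tune_weights(model, held_out, step=0.1):
    """Grid-search weights with 0 < lam, mu and lam + mu < 1 on held-out data."""
    best = None
    steps = int(round(1 / step))
    for i in range(1, steps):
        for j in range(1, steps - i):
            lam, mu = i * step, j * step
            pp = perplexity(model, held_out, lam, mu)
            if best is None or pp < best[0]:
                best = (pp, lam, mu)
    return best


train = "nothing but the truth and nothing but the best".split()
held_out = "nothing but the truth".split()
model = InterpolatedTrigram(train)
print(model.mle("nothing", "but", "the"))    # seen trigram
print(model.mle("nothing", "but", "truth"))  # unseen trigram: MLE gives 0
print(model.prob("nothing", "but", "truth", 0.5, 0.3))  # smoothed, nonzero
print(tune_weights(model, held_out))         # (perplexity, lam, mu)
```

The unsmoothed estimate assigns zero probability to any trigram missing from the training text, which makes perplexity undefined on test data; mixing in lower-order estimates keeps every probability positive. In practice the weights would be tuned on a much larger held-out set than this toy example.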