A Machine Learning Approach to Coreference Resolution of Noun Phrases

A Machine Learning Approach to Coreference Resolution of Noun Phrases
By W. M. Soon, H. T. Ng, D. C. Y. Lim
Presented by Iman Sen

Outline
• Introduction
• Process Overview
• Pipeline Process to Find Markables
• Feature Selection
• The Decision Tree
• Results for MUC-6, MUC-7 & Error Analysis
• Conclusions

Introduction
• Coreference for general noun phrases from unrestricted text.
• Learns using the decision tree method from a small annotated corpus.
• First learning-based system that performed comparably with the best non-learning systems.

Process Overview
• Markables are the union of all the noun phrases, named entities and nested noun phrases found.
• Find markables using a pipeline of NLP modules.
• Form feature vectors for appropriate pairs of markables. These are the training examples.
• Train the decision tree classifier on these examples.
• For testing, determine pairs of markables in the test document and present them to the classifier. Stop after the first successful coreference.

Pipelined NLP Modules
Free Text → Tokenization & Sentence Segmentation → Morphological Processing → POS Tagger → NP Identification → Named Entity Recognition → Nested Noun Phrase Extraction → Semantic Class Determination → Markables
• POS tagger: standard HMM-based tagger.
• NP identification: HMM-based, uses POS tags from the previous module.
• Named entity recognition: HMM-based; recognizes organization, person, location, date, time, money, percent.
• Nested noun phrase extraction: 2 kinds: prenominals such as ((wage) reduction) and possessive NPs such as ((his) dog).
• Semantic class determination: more on this in a bit!

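The module chain is a straight pipeline in which each stage consumes the previous stage's output. A minimal sketch of that control flow, with placeholder stages standing in for the real HMM-based modules (the stage names here are illustrative, not the paper's module APIs):

```python
def make_markables(free_text, stages):
    """Run a document through a list of (name, stage) pairs in order; each
    stage takes the result of the previous one. The final result stands in
    for the markable set."""
    result = free_text
    for _, stage in stages:
        result = stage(result)
    return result

# Placeholder stages: each just records that it ran, in pipeline order.
STAGE_NAMES = ["tokenize", "morphology", "pos_tag", "np_identify",
               "ner", "nested_np", "semantic_class"]
STAGES = [(n, lambda doc, n=n: doc + " ->" + n) for n in STAGE_NAMES]
print(make_markables("Free Text", STAGES))
# prints "Free Text ->tokenize ->morphology ->pos_tag ->np_identify ->ner ->nested_np ->semantic_class"
```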
Determining the Markables for Training

Sentence 1
1. (Eastern Airlines)a2 executives notified (union)e1 leaders that the carrier wishes to discuss selective ((wage)c2 reductions)d2 on (Feb. 3)b2.
2. ((Eastern Airlines)5 executives)6 notified ((union)7 leaders)8 that (the carrier)9 wishes to discuss (selective (wage)10 reductions)11 on (Feb. 3)12.

Sentence 2
1. ((Union)e2 representatives who could be reached)f1 said (they)f2 hadn't decided whether (they)f3 would respond.
2. ((Union)13 representatives)14 who could be reached said (they)15 hadn't decided whether (they)16 would respond.

• The first version of each sentence is the manual coreference annotation; the second is the result of the pipeline modules.
• The letters in the first version denote coreference chains.
• We make up pairs (i, j) as training examples.
• We take only those NPs in a coreference chain whose NP boundaries match (shown in blue on the original slide).

Determining the Markables for Training (continued)
• In general, if a1, a2, a3 is a correctly identified coreference chain, then make up (a1, a2) and (a2, a3) as +ve examples, and for all NPs found in between, say between a2 & a3, called e, make up -ve examples (e, a3).
• A feature vector is then generated for each pair.

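The pair-generation rule can be sketched as follows; `chain` and `all_markables` are illustrative names (not from the paper), holding one coreference chain and all of the document's markables, both in document order:

```python
def make_training_pairs(chain, all_markables):
    """Adjacent-pair training scheme: each adjacent pair in a coreference
    chain is a +ve example; every markable strictly between the two is
    paired with the anaphor as a -ve example."""
    position = {m: all_markables.index(m) for m in chain}
    pairs = []
    for a, b in zip(chain, chain[1:]):
        pairs.append((a, b, "+"))                          # adjacent chain pair
        for e in all_markables[position[a] + 1:position[b]]:
            pairs.append((e, b, "-"))                      # intervening markable
    return pairs

chain = ["a1", "a2", "a3"]
markables = ["a1", "x", "a2", "y", "z", "a3"]
print(make_training_pairs(chain, markables))
# prints [('a1', 'a2', '+'), ('x', 'a2', '-'), ('a2', 'a3', '+'),
#         ('y', 'a3', '-'), ('z', 'a3', '-')]
```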
Markables for Testing
• For testing, every antecedent i before j is tried. Start with the immediately preceding i, and go backwards. Stop when you find the first +ve coreference.
• For nested NPs, we avoid the current markable. For example, in ((his) daughter), we do not try to see whether "his" corefers with "his daughter".

Feature Selection
The authors selected the following 12 features:

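The closest-first search at test time can be sketched as a single backwards loop; `classify` stands in for the trained decision tree, and the toy string-match classifier below is only for illustration:

```python
def resolve(j, markables, classify):
    """Find an antecedent for markable j: test every preceding markable,
    closest first, and stop at the first +ve classification. (The full
    system additionally skips a nested markable's own enclosing NP.)"""
    idx = markables.index(j)
    for i in reversed(markables[:idx]):      # closest candidates first
        if classify(i, j):
            return i                         # first +ve coreference wins
    return None                              # no antecedent found

# Toy stand-in classifier: case-insensitive string match.
match = lambda i, j: i.lower() == j.lower()
doc = ["Eastern Airlines", "The carrier", "union leaders", "the carrier"]
print(resolve("the carrier", doc, match))
# prints "The carrier"
```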
• Distance Feature (DIST): if (i, j) are in the same sentence, equal to 0; if one sentence apart, equal to 1; and so on.
• i-Pronoun Feature (I_PRONOUN): values are true or false; returns true if i in (i, j) is a pronoun.
• j-Pronoun Feature (J_PRONOUN): tests whether j in (i, j) is a pronoun.
• String Match Feature (STR_MATCH): returns true or false; removes articles and demonstrative pronouns (such as "that", "those", etc.) and tests for a match.
• Definite NP Feature (DEF_NP): if j starts with "the", return true, else false.
• Demonstrative Noun Phrase Feature (DEM_NP): if j starts with "this", "that", "these" or "those", return true, else false.
• Number Agreement Feature (NUMBER): the morphological root is used to determine whether a noun is singular or plural (if it is not a pronoun); returns true or false.

Feature Selection (continued)
• Semantic Class Agreement Feature (SEMCLASS): returns true, false or unknown. The classes are male, female, person, organization, location, date, time, money, percent and object. The class is decided by the semantic module (pick the 1st sense from WordNet); the feature is true if one class is the same as, or a child of, the other. For example, male and female are persons; the others are objects. If either class is unknown, compare head nouns, and if they are the same, return true.
• Gender Agreement Feature (GENDER): derived from "Mr., Mrs." or "he, she". If a name is not referred to with one of the above, look it up in a database of common names. The gender of objects is neutral, and unknown classes have unknown gender. Return true if the genders match.
• Both Proper Names Feature (PROPER_NAME): look at capitalization and return true or false.
• Alias Feature (ALIAS): return true for aliases. For persons, last names are compared. For dates, day, month and year are extracted. For organizations, acronyms are checked.
• Appositive Feature (APPOSITIVE): if j is in apposition to i, return true. Check for (absence of) verbs and proper punctuation (like ",").

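Several of the surface features can be computed directly from the markable strings. The sketch below covers six of them; the article and pronoun lists here are simplifications for illustration, not the paper's exact resources:

```python
ARTICLES = {"a", "an", "the", "this", "that", "these", "those"}
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "its", "their"}

def surface_features(i, j, sent_i, sent_j):
    """Compute six of the 12 features for a markable pair (i, j).
    sent_i, sent_j are the sentence indices of the two markables."""
    strip = lambda np: [w for w in np.lower().split() if w not in ARTICLES]
    return {
        "DIST": sent_j - sent_i,                 # sentences apart
        "I_PRONOUN": i.lower() in PRONOUNS,
        "J_PRONOUN": j.lower() in PRONOUNS,
        "STR_MATCH": strip(i) == strip(j),       # match after stripping articles
        "DEF_NP": j.lower().startswith("the "),
        "DEM_NP": j.lower().split()[0] in {"this", "that", "these", "those"},
    }

print(surface_features("the carrier", "The carrier", 1, 1))
# prints {'DIST': 0, 'I_PRONOUN': False, 'J_PRONOUN': False,
#         'STR_MATCH': True, 'DEF_NP': True, 'DEM_NP': False}
```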
A Training Example
For each markable pair, a feature vector is derived, and this constitutes a training example.

Sentence: "Separately, Clinton transition officials said that Frank Newman, 50, vice chairman and chief financial officer of BankAmerica Corp., is expected to be nominated as assistant Treasury secretary for domestic finance."

Feature vector of the markable pair (i = Frank Newman, j = vice chairman):
• DIST: 0 (i and j are in the same sentence)
• I_PRONOUN: - (i is not a pronoun)
• J_PRONOUN: - (j is not a pronoun)
• STR_MATCH: - (i and j do not match)
• DEF_NP: - (j is not a definite noun phrase)
• DEM_NP: - (j is not a demonstrative noun phrase)
• NUMBER: + (i and j are both singular)
• SEMCLASS: 1 (i and j are both persons; unknown is 2)
• GENDER: 1 (i and j are both males)
• PROPER_NAME: - (only i is a proper name)
• ALIAS: - (j is not an alias of i)
• APPOSITIVE: + (j is in apposition to i)

The Decision Tree
• The decision tree learning algorithm used is C5, an updated version of C4.5 (Quinlan 1993).
• Basic idea: pick a feature and split the training set into subsets based on the different values of that feature. If a subset consists of instances from the same class (after pruning), stop; otherwise split on a different feature.
• The feature with the greatest information gain is picked as the next feature to split on. Information gain is measured in terms of entropy, and in this case the feature that will yield the lowest possible entropy is selected.

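The entropy-based selection criterion can be made concrete with a short sketch. This is the plain information-gain computation, not C5's actual implementation (which adds pruning and other refinements):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    """Entropy reduction from splitting (feature_dict, label) examples
    on one feature: prior entropy minus the weighted split entropy."""
    labels = [y for _, y in examples]
    split = {}
    for x, y in examples:
        split.setdefault(x[feature], []).append(y)
    remainder = sum(len(ys) / len(examples) * entropy(ys)
                    for ys in split.values())
    return entropy(labels) - remainder

# Four toy pairs: STR_MATCH separates the classes perfectly, DEF_NP not at all.
data = [({"STR_MATCH": True,  "DEF_NP": True},  "+"),
        ({"STR_MATCH": True,  "DEF_NP": False}, "+"),
        ({"STR_MATCH": False, "DEF_NP": True},  "-"),
        ({"STR_MATCH": False, "DEF_NP": False}, "-")]
print(information_gain(data, "STR_MATCH"))  # prints 1.0 (perfect split)
print(information_gain(data, "DEF_NP"))     # prints 0.0 (no information)
```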
Example: "(Ms. Washington)'s candidacy is being championed by (several powerful lawmakers) including ((her) boss)."

Feature set for the pair (Ms. Washington, her):
DIST SEMCLASS NUMBER GENDER PROPER_NAME ALIAS J_PRON DEF_NP DEM_NP STR_MATCH APPOSITIVE I_PRON
 0      1       +      1        -         -      +      -      -       -         -        -

Does (Ms. Washington, her) corefer?

The Decision Tree
[The learned tree appears on the original slide as a diagram; only its node labels survive in this text. Its internal nodes test STR_MATCH, J_PRONOUN, APPOSITIVE, ALIAS, GENDER, I_PRONOUN, DIST and NUMBER.]
Note: only 8 out of the 12 features are used in the final tree.

Results
• MUC-6: Recall 58.6%, Precision 67.3%, F-measure 62.6%. Pruning set at 20%, min. no. of instances set at 5.
• MUC-7: Recall 56.1%, Precision 65.5%, F-measure 60.4%. Pruning set at 60%, min. no. of instances set at 2.
• About 3rd or 4th amongst the best MUC-6 and MUC-7 systems.
• Errors inherited from the pipeline NLP modules: POS tagger (96%), Named Entity Recognizer (only 88.9%), and NP identification (about 90%).
• Overall, in one test of 100 MUC-annotated documents, achieved about 85% accuracy.

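The reported F-measures are the balanced harmonic mean of precision and recall, which can be checked against the numbers above:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(67.3, 58.6), 1))  # MUC-6: prints 62.6
print(round(f_measure(65.5, 56.1), 1))  # MUC-7: prints 60.4
```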
Error Analysis (on 5 random documents from MUC-6)

Types of errors causing spurious links (affecting precision):

Type of Error                                                Frequency      %
Prenominal modifier string match                                 16       42.1%
Strings match but noun phrases refer to different entities       11       28.9%
Errors in noun phrase identification                              4       10.5%
Errors in apposition determination                                5       13.2%
Errors in alias determination                                     2        5.3%

Types of errors causing missing links (affecting recall):

Type of Error                                  Frequency      %
Inadequacy of current surface features             38       63.3%
Errors in noun phrase identification                7       11.7%
Errors in semantic class determination              7       11.7%
Errors in part-of-speech assignment                 5        8.3%
Errors in apposition determination                  2        3.3%
Errors in tokenization                              1        1.7%

Conclusions
• Very good results (comparatively) for a relatively simple set of features.
• The 3 most important features were STR_MATCH, APPOSITIVE & ALIAS (discovered by training & testing with just these features). These 3 features alone account for 60.3% and 59.4% of the F-measure for MUC-6 and MUC-7 respectively, which means the other 9 features contribute only 2.3% (for MUC-6) and 1% (for MUC-7).
• Some reasons why it performed better than the only comparable system in MUC (RESOLVE from UMass):
  - Higher recall using the larger no. of semantic classes.
  - The 3 crucial features (RESOLVE did not have the APPOSITIVE feature).
  - Stopping at the first +ve coreference.

