Automatic Document Indexing in Large Medical Collections.ppt
HIKM2006, AMTEx

Automatic Document Indexing in Large Medical Collections
Angelos Hliaoutakis, Kalliopi Zervanou, Euripides G.M. Petrakis
Technical University of Crete, Chania, Greece
Evangelos E. Milios
Dalhousie University, Halifax, Canada

Overview
The need for automatic assignment of index terms in large medical collections
MMTx (by the US NLM)
The AMTEx approach to medical document indexing
AMTEx resources: MeSH & C/NC value
Experiments & evaluation
Discussion and future research

Motivation and Objectives
MeSH is a taxonomy of medical terms
  Subset of the UMLS Metathesaurus
MEDLINE is indexed by MeSH terms (assigned by experts)
Other medical texts need to be associated with MEDLINE, e.g. consumer medical literature
Need for automatic assignment of MeSH terms to any medical text

MMTx (MetaMap Transfer)
Maps arbitrary text to UMLS Metathesaurus concepts:
  Parsing to extract noun phrases (syntactic analysis, linguistic filter)
  Variant Generation (uses the SPECIALIST Lexicon)
  Candidate Retrieval (mapping process to Metathesaurus concepts)
  Candidate Evaluation (criteria: centrality, variation, coverage, cohesiveness)

MMTx Example
Parsing
  Shallow syntactic analysis of the input text
  Linguistic filtering: isolates noun phrases
Variant Generation
  e.g. "obstructive sleep apnea" has variants: obstructive sleep apnea, sleep apnea, sleep, apnea, osa
Candidate Retrieval
  Candidate Metathesaurus concepts for the variant "osa":
    osa  osa antigen
    osa  osa gene product
    osa  osa protein
    osa  obstructive sleep apnea
Candidate Evaluation
  Obstructive Sleep apnea  1000
  Sleep Apnea  901
  Apnea  827
  Sleeping  793
  Sleepy  755

MMTx limitations
MMTx focuses on UMLS rather than MeSH
  But MEDLINE indexing is based on MeSH
Exhaustive variant generation: the initial phrase is iteratively expanded into all possible UMLS variants
  term overgeneration
  term concept diffusion
  unrelated terms added to the final candidate list

The AMTEx method
New method for automatic indexing of medical documents
Main idea:
  Initial term extraction based on a hybrid linguistic/statistical approach, the C/NC value
  Extracts general single and multi-word terms
  Extracted terms are validated against MeSH

AMTEx Outline
INPUT: Document Collection
C/NC value Multi-word Term Extraction & Term Ranking
MeSH Term Validation
Single-word Term Extraction: non-MeSH multi-word terms are broken down & validated against MeSH
Variant Generation
Term Expansion (MeSH)
MeSH Thesaurus Resource
OUTPUT: MeSH Term Lists

MeSH: Medical Subject Headings
The NLM medical & biological terms thesaurus:
  Organized in IS-A hierarchies
    more than 15 taxonomies & more than 22,000 terms
    a term may appear in multiple taxonomies
  No PART-OF relationships
  Terms organized into synonym sets called entry terms, including stemmed term forms

Fragment of the MeSH IS-A Hierarchy
[Figure. Nodes shown: Root, Nervous system diseases, Neurologic manifestations, pain, headache, neuralgia, Cranial nerve diseases, Facial neuralgia]

The C/NC value method
Hybrid (linguistic / statistical) term extraction method
Domain independent
Specifically designed for the identification of multi-word and nested terms:
  compound & multi-word terms are very common in the biomedical domain
  multi-word terms are often used in indexing

C-value
C-value: a phrase may be a term if it often appears alone or within other candidate terms
  C-value(a) = log2|a| · f(a), if a is not nested
  C-value(a) = log2|a| · (f(a) − (1/P(T_a)) · Σ_{b ∈ T_a} f(b)), otherwise
a: candidate term
|a|: length of a in words
f(): frequency
T_a: set of candidate terms containing a
P(T_a): number of such terms

NC-value
NC-value: a phrase is more likely a term if it often appears in a specific word context
  NC-value(a) = 0.8 · C-value(a) + 0.2 · Σ_{w ∈ C_a} f_a(w) · t(w)/n
C_a: set of context words of a
w: context word
t(w): number of terms w appears with
n: number of all terms
f_a(w): frequency of w as context word of a

AMTEx step 1: C/NC value Multi-word Term Extraction & Ranking
Part-of-Speech Tagging
Linguistic filtering:
  N+ N
  (A|N)+ N
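
The C-value ranking used in step 1 can be illustrated with a minimal Python sketch. It assumes the commonly cited definition of C-value: log2 of the term length times the term frequency, reduced by the mean frequency of the longer candidates that contain the term. It also assumes that candidate terms have already been extracted by the linguistic filter and counted. The function names (c_value, contains) and the example frequencies are hypothetical and are not taken from the AMTEx implementation.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidate terms with the C-value measure.

    freq maps each candidate term (a tuple of words) to its total corpus
    frequency, counting nested occurrences as well. Hypothetical input
    format, chosen only for this sketch.
    """
    scores = {}
    for a, f_a in freq.items():
        # T_a: frequencies of longer candidate terms that contain a
        nesting = [f_b for b, f_b in freq.items()
                   if len(b) > len(a) and contains(b, a)]
        if nesting:
            # a is nested: subtract the mean frequency of the containing terms
            adjusted = f_a - sum(nesting) / len(nesting)
        else:
            adjusted = f_a
        # log2|a| is 0 for single words, so only multi-word terms score above 0
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Made-up frequencies for illustration only
example = {
    ("obstructive", "sleep", "apnea"): 6,
    ("sleep", "apnea"): 10,
}
for term, score in sorted(c_value(example).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

With these made-up counts, "sleep apnea" is penalised because 6 of its 10 occurrences fall inside the longer candidate, so the longer term ranks first. An NC-value step would then re-rank the list using context-word weights.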