Digital Libraries and the Semantic Web
A conceptual framework and an agenda for research and practice

Keynote presentation at ICSD 2009
Dagobert Soergel
Department of Library and Information Studies
Graduate School of Education
University at Buffalo

Acknowledgments

Many of the ideas in this presentation originated from a review of the papers submitted to the International Conference on the Semantic Web and Digital Libraries 2009 (ICSD 2009). So acknowledgments are due to all the paper authors.

Soergel, ICSD 2009 Keynote

DLs versus SW

Digital Libraries
- Manage often large collections of documents and data sets and provide access to these resources and, ideally, tools to process them.
- Retrieval is often based on words in text.

Semantic Web
- Uses inference over a large distributed storehouse of propositional data, including ontologies, to
  - answer a question,
  - derive a problem solution,
  - devise a plan of action.

DL ↔ SW

DL → SW: How can digital libraries support Semantic Web functionality?
- Generate propositional knowledge, including ontologies, from document corpora through information extraction or statistical methods

SW → DL: How can Semantic Web technology improve digital libraries?
- Use semantics to improve retrieval and presentation

Towards unified systems
- Harmonize standards from DLs (and libraries generally) and SW, profiting from the thinking of both communities

Overview

- Information extraction (and its use for ontology creation)
- Semantically enriched documents
- Integrated store of documents, propositions, data sets
- Navigation in concept structures and document spaces
- Support for learning, sense making, tasks
- Schema and ontology creation and mapping

Information extraction

Text
  High blood pressure is a serious disease often caused by being overweight. In kids 4-12 it can be treated highly effectively with Nystatin.

Formal representation
  Causation (HighBloodPressure, Obesity)
  Treatment (HighBloodPressure, Human, Age, 4-12y, Nystatin, Effectiveness, 4)

Answering questions

Question
  How can high blood pressure be prevented?
Answer
  Lose weight?

Information extraction

Text
  Kids begin grazing independently from their mothers at three months.

Formal representation
  Separation (Mother, Child, Goat, Age, 3m)
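
The formal representations above can be held as simple relation/argument records, and a question such as "How can high blood pressure be prevented?" can then be answered by a lookup over them. The following is only a minimal sketch under stated assumptions: the `Proposition` class, the function names, the reading of `Causation` as (effect, cause), and the rule "to prevent a condition, avoid its causes" are all illustrative choices and are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """One extracted proposition: a relation name plus its arguments (hypothetical structure)."""
    relation: str
    args: tuple


# The three propositions shown on the slides, copied argument by argument.
store = [
    Proposition("Causation", ("HighBloodPressure", "Obesity")),
    Proposition("Treatment", ("HighBloodPressure", "Human", "Age", "4-12y",
                              "Nystatin", "Effectiveness", "4")),
    Proposition("Separation", ("Mother", "Child", "Goat", "Age", "3m")),
]


def causes_of(condition, propositions):
    # Assumption: Causation is read as (effect, cause), matching the slide text
    # "high blood pressure ... caused by being overweight".
    return [p.args[1] for p in propositions
            if p.relation == "Causation" and p.args[0] == condition]


def prevention_hints(condition, propositions):
    # Assumed inference rule for illustration: avoiding a cause helps prevent the effect.
    return [f"avoid {cause}" for cause in causes_of(condition, propositions)]


print(prevention_hints("HighBloodPressure", store))
```

A real Semantic Web system would use shared identifiers and an ontology-aware reasoner rather than string matching; the sketch only shows why a formal representation makes the question answerable at all.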
Automatic information extraction

- Find suitable documents or images
  - Highly structured documents (such as dictionaries) and documents containing structured lists (such as a classification of life events) work well
- Recognize entities (concepts, named entities)
  - Find the unique identifier for each (from some standard scheme)
- Noun phrase and verb phrase identification
- Word sense disambiguation, co-reference resolution
- Determine relationships, express propositions in formal representation
  - Much of this requires syntactic and semantic parsing
  - Also recognition of relationships from typographical arrangement
- Recognition of propositions not expressed in a single sentence
- Deal with negation and other qualifications
- Certainty (as expressed in one source)

Automatic information extraction

- Add to proposition store
  - If proposition already known, just add reference to source
  - If proposition new, add proposition with its source
- Identify relationships between propositions (such as contradictions)
- Certainty (from information across sources, considering evidential strength of each source)
- Can label proposition as to general origin (language of source document, cultural origin of source document, scholarly / scientific school of source document)
- Knowledge in proposition store assists in IE from new documents

Computer-supported IE

- Automatic information extraction is hard, need to supplement with human IE
  - IE as part of document authoring or during publishing
  - Collaborative IE (crowdsourcing)
- Build systems that support the human task
  - Make human IE and semantic enrichment by authors feasible
  - Person edits results of automatic IE
  - Person enters free-form proposition, system converts to formal representation, person checks
  - Reconciliation of differences in results
- Computer-supported IE system should learn from changes made by human editor

Corpus-based information extraction

- Find associations in a corpus
- Data mining over text corpora or numeric databases
- Finding connections between non-overlapping literatures,
  pioneered by Don Swanson

Multilingual information extraction

- Requires IE tools in multiple languages
- Creates proposition store from many sources
- Interesting experiment
  - Document exists in two languages
  - Apply IE to both versions and compare results

IE for Ontology creation

- Some extracted propositions can be used as elements of an ontology
- Discussed later

Semantic enrichment

A semantically enriched document

Reis et al. (2008) Impact of Environment and Social Gradient on Leptospira infection in Urban Slums (doi:10.1371/journal.pntd.0000228).

Infectious disease studied: Leptospirosis
Pathogen (causative agent of disease): Leptospira spirochete
Vector of disease pathogen: Rat (Rattus norvegicus)
Pathogen host subjected to study: Human (Homo sapiens)
Number of subject individuals in study: 3,171
. . .
Purpose of study: Quantify risk factors for leptospirosis . . .
Principal finding 1: Prevalence of Leptospira antibodies . . .
Principal finding 2: Disease risk . . . open sewers . . .

(http://dx.doi.org/10.1371/journal.pntd.0000228.x002)

Semantically enriched documents

- Semantic enrichment supports semantic retrieval
- Broad area of its own
- Many different forms
  - Explicit document structure
  - Concept and named entity tagging and identification
  - Assigning additional concepts or named entities
  - Assigning extracted propositions
- Closely linked with information extraction
  - IE produces elements of semantic enrichment

Semantic enrichment through document structure

- On a broad level, a document's semantics can be made explicit simply by the internal document structure
- Requires a document template or frame for the type of document
- Document Structure Ontology with templates / frames for many types of documents, including learning objects
- Standards for digital objects
  - Includes document formats such as MPEG or SCORM

Template for a research report

1 Background (could also be called Problem)
  1.1 General problem area (often including a review of the literature)
  1.2 Specific problem. Purpose of the study, question to be answered
2 Methods
  2.1 Discussion of the methods used in the study
  2.2 Description of the actual conduct of the study
3 Results
4 Conclusions
  4.1 Summary of methods and results
  4.2 Relationship to existing body of knowledge
  4.3 Implications for decision making and/or further research
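
The research report template shown earlier can be treated as a document frame that a system checks a draft against. The sketch below is only an illustration under stated assumptions: the dictionary layout, the function names, the shortened section labels, and the rule that only the lowest-level sections must carry text are hypothetical and do not come from the presentation or from any document standard.

```python
# Template for a research report, keyed by top-level section.
# Section labels are shortened from the slide; the structure is an assumed encoding.
RESEARCH_REPORT_TEMPLATE = {
    "1 Background": ["1.1 General problem area", "1.2 Specific problem"],
    "2 Methods": ["2.1 Discussion of the methods", "2.2 Conduct of the study"],
    "3 Results": [],
    "4 Conclusions": [
        "4.1 Summary of methods and results",
        "4.2 Relationship to existing body of knowledge",
        "4.3 Implications",
    ],
}


def leaf_sections(template):
    """Return the sections expected to hold text: subsections, or the
    top-level section itself when it has no subsections."""
    leaves = []
    for top, subsections in template.items():
        leaves.extend(subsections if subsections else [top])
    return leaves


def missing_sections(document, template):
    """List template sections that are absent from the document or blank."""
    return [name for name in leaf_sections(template)
            if not document.get(name, "").strip()]


# A partially written draft with placeholder text only.
draft = {
    "1.1 General problem area": "placeholder text",
    "1.2 Specific problem": "placeholder text",
    "2.1 Discussion of the methods": "placeholder text",
    "3 Results": "   ",  # whitespace only, so it counts as missing
}

print(missing_sections(draft, RESEARCH_REPORT_TEMPLATE))
```

Making the structure explicit in this way is what lets retrieval or authoring tools address a specific part of a document, for example searching only within Methods sections.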