The New Bill of Rights of Information Society.ppt
Slide 1. The New “Bill of Rights” of Information Society
Raj Reddy and Jaime Carbonell, Carnegie Mellon University
March 23, 2006, Talk at Google

Slide 2. New Bill of Rights
- Get the right information, e.g. search engines
- To the right people, e.g. categorizing, routing
- At the right time, e.g. Just-in-Time (task modeling, planning)
- In the right language, e.g. machine translation
- With the right level of detail, e.g. summarization
- In the right medium, e.g. access to information in non-textual media

Slide 3. Relevant Technologies
“right information”       search engines
“right people”            classification, routing
“right time”              anticipatory analysis
“right language”          machine translation
“right level of detail”   summarization
“right medium”            speech input and output

Slide 4. “right information” Search Engines

Slide 5. The Right Information
- Right Information from future Search Engines
- How to go beyond just “relevance to query” (all) and “popularity”
- Eliminate massive redundancy, e.g. “web-based email”
  - Should not result in multiple links to different Yahoo sites promoting their email, or even non-Yahoo sites discussing just Yahoo email.
  - Should result in a link to Yahoo email, one to MSN email, one to Gmail, one that compares them, etc.
- First show trusted info sources and user-community-vetted sources
  - At least for important info (medical, financial, educational, ...), I want to trust what I read, e.g., for new medical treatments: first info from hospitals, medical schools, the AMA, medical publications, etc., and NOT from Joe Shmo’s quack practice page or from the National Enquirer.
- Maximum Marginal Relevance
- Novelty Detection
- Named Entity Extraction

Slide 6. Beyond Pure Relevance in IR
- Current Information Retrieval Technology Only Maximizes Relevance to Query
- What about information novelty, timeliness, appropriateness, validity, comprehensibility, density, medium, ...?
- Novelty is approximated by non-redundancy!
- We really want to maximize relevance to the query, given the user profile and interaction history:
    P(U(f_i, ..., f_n) | Q & C & U & H)
  where Q = query, C = collection set, U = user profile, H = interaction history
- ... but we don’t yet know how.
  Darn.

Slide 7. Maximal Marginal Relevance vs. Standard Information Retrieval
[Figure labels: query, documents, MMR, IR, Standard IR]

Slide 8. Novelty Detection
- Find the first report of a new event
- (Unconditional) Dissimilarity with Past
- Decision threshold on most-similar story
- (Linear) temporal decay
- Length-filter (for teasers)
- Cosine similarity with standard weights:

Slide 9. New First Story Detection Directions
- Topic-conditional models
  - e.g. “airplane,” “investigation,” “FAA,” “FBI,” “casualties” (topic, not event)
  - “TWA 800,” “March 12, 1997” (event)
  - First categorize into topic, then use maximally-discriminative terms within topic
- Rely on situated named entities
  - e.g. “Arcan as victim,” “Sharon as peacemaker”

Slide 10. Link Detection in Texts
- Find text (e.g. news stories) that mention the same underlying events.
- Could be combined with novelty (e.g. something new about an interesting event).
- Techniques: text similarity, NEs, situated NEs, relations, topic-conditioned models, ...

Slide 11. Named-Entity Identification
Purpose: to answer questions such as:
- Who is mentioned in these 100 Society articles?
- What locations are listed in these 2000 web pages?
- What companies are mentioned in these patent applications?
- What products were evaluated by Consumer Reports this year?

Slide 12. Named Entity Identification
President Clinton decided to send special trade envoy Mickey Kantor to the special Asian economic meeting in Singapore this week. Ms. Xuemei Peng, trade minister from China, and Mr. Hideto Suzuki from Japan’s Ministry of Trade and Industry will also attend. Singapore, who is hosting the meeting, will probably be represented by its foreign and economic ministers. The Australian representative, Mr. Langford, will not attend, though no reason has been given. The parties hope to reach a framework for currency stabilization.

Slide 13. Methods for NE Extraction
- Finite-State Transducers w/variables
  - Example output: FNAME: “Bill” LNAME: “Clinton” TITLE: “President”
  - FSTs learned from labeled data
- Statistical learning (also from labeled data)
  - Hidden Markov Models (HMMs)
  - Exponential (maximum-entropy) models
  - Conditional Random Fields (Lafferty et al.)

Slide 14. Named Entity Identification
Extracted Named Entities (NEs)
People               Places
President Clinton    Singapore
Mickey Kantor        Japan
Ms. Xuemei Peng      China
Mr. Hideto Suzuki    Australia
Mr. Langford

Slide 15. Role Situated NEs
Motivation: It is useful to know roles of NEs:
- Who participated in the economic meeting?
- Who hosted the economic meeting?
- Who was discussed in the economic meeting?
- Who was absent from the economic meeting?

Slide 16. Emerging Methods for Extracting Relations
- Link Parsers at Clause Level
  - Based on dependency grammars
  - Probabilistic enhancements (Lafferty, Venable)
- Island-Driven Parsers
  - GLR* (Lavie), Chart (Nyberg, Placeway), LC-Flex (Rose)
- Tree-bank-trained probabilistic CF parsers (IBM, Collins)
- Herald the return of deep(er) NLP techniques.
- Relevant to new Q/A from free-text initiative.
- Too complex for inductive learning (today).

Slide 17. Relational NE Extraction
Example: (Who does What to Whom)
“John Snell reporting for Wall Street. Today Flexicon Inc. announced a tender offer for Supplyhouse Ltd. for $30 per share, representing a 30% premium over Friday’s closing price. Flexicon expects to acquire Supplyhouse by Q4 2001 without problems from federal regulators”

Slide 18. Fact Extraction Application
Useful for relational DB filling, to prepare data for “standard” DM/machine-learning methods
Acquirer   Acquiree      Sh.price   Year
Flexicon   Logi-truck    18         1999
Flexicon   Supplyhouse   30 10 2000
...

Slide 19. “right people” Text Categorization

Slide 20. The Right People
- User-focused search is key
  - If a 7-year-old is working on a school project on taking good care of one’s heart and types in “heart care”, she will want links to pages like “You and your friendly heart”, “Tips for taking good care of your heart”, “Intro to how the heart works”, etc., NOT the latest New England Journal of Medicine article on “Cardiological implications of immuno-active proteases”.
  - If a cardiologist issues the query, exactly the opposite is desired
- Search engines must know their users better, and the user tasks
- Social affiliation groups for search and for automatically categorizing, prioritizing and routing incoming info or search results.
- New machine le...
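
Sketch for the Maximal Marginal Relevance slides (“Eliminate massive redundancy” and “MMR vs. Standard IR”). The slides name MMR but do not give its formula, so this is a minimal illustration of the commonly cited greedy form: each step picks the document that maximizes lam * sim(doc, query) - (1 - lam) * max sim(doc, already selected). The bag-of-words cosine similarity, the parameter name `lam`, and the function names are assumptions made for the example, not the authors' implementation.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a toy stand-in for real term weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(query, docs, k, lam=0.5):
    """Greedily pick k docs, trading query relevance against redundancy.

    lam=1.0 reduces to plain relevance ranking; lower values penalize
    documents that resemble ones already selected.
    """
    q = bow(query)
    vecs = [bow(d) for d in docs]
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [docs[i] for i in selected]
```

With two near-identical Yahoo email pages in the candidate set, a mid-range `lam` tends to return only one of them and fill the freed slot with a different provider, which is the behavior the “web-based email” example asks for.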
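
Sketch for the Novelty Detection slide. It strings together the ingredients the slide lists: cosine similarity to past stories, a decision threshold on the most similar one, linear temporal decay, and a length filter for teasers. The threshold value, the decay window, the minimum length, and the use of raw term counts (the slide's “standard weights” are not shown in this extract) are all assumptions chosen only to make the example run.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_first_stories(stories, threshold=0.3, window=10, min_len=3):
    """Return one boolean per story: True if it looks like a first report.

    stories are assumed to be in arrival order. A story is "first" when its
    decayed similarity to every earlier story stays below the threshold.
    """
    seen = []
    flags = []
    for text in stories:
        tokens = text.lower().split()
        if len(tokens) < min_len:
            # Length filter: very short teasers are never flagged (assumption:
            # they are also not stored as past stories).
            flags.append(False)
            continue
        vec = Counter(tokens)
        best = 0.0
        for age, past in enumerate(reversed(seen), start=1):
            # Linear temporal decay: the most recent story has weight 1.0,
            # stories `window` or more steps back have weight 0.
            decay = max(0.0, 1.0 - (age - 1) / window)
            best = max(best, decay * cosine(vec, past))
        flags.append(best < threshold)
        seen.append(vec)
    return flags
```

The decay means an old event that resurfaces long after its first report can be flagged as new again; whether that is desirable depends on the application.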
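
Sketch for the Methods for NE Extraction slide. The slide's first method is a finite-state transducer with variables that emits fields such as FNAME, LNAME and TITLE. A regular expression with named groups is a simple hand-written finite-state pattern of that kind; the title list and the pattern below are assumptions for illustration, not the learned FSTs the slide refers to.

```python
import re

# TITLE followed by one or two capitalized words. The title list is a small
# hand-picked assumption; a real system would learn patterns from labeled data.
PERSON = re.compile(
    r"\b(?P<TITLE>President|Mrs\.|Mr\.|Ms\.|Dr\.)\s+"
    r"(?P<NAMES>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"
)


def extract_people(text):
    """Return a list of {TITLE, FNAME, LNAME} records for titled names."""
    records = []
    for match in PERSON.finditer(text):
        names = match.group("NAMES").split()
        records.append(
            {
                "TITLE": match.group("TITLE"),
                # A single name after the title is treated as a last name.
                "FNAME": names[0] if len(names) == 2 else None,
                "LNAME": names[-1],
            }
        )
    return records
```

A pattern like this only finds names introduced by a title, so in the sample paragraph it would miss “Mickey Kantor”; that gap is one reason the slide goes on to statistical learners such as HMMs and CRFs.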
