The Energy Data Collection Project.ppt
"The Energy Data Collection Project.ppt" (152 pages) was shared by a site member and can be read online.
Slide 1: The Energy Data Collection Project

Slide 2: The Vision: Ask the Government.
- How have property values in the area changed over the past decade?
- How many people had breast cancer in the area over the past 30 years?
- Is there an orchestra? An art gallery? How far are the nightclubs?
- We're thinking of moving to Denver. What are the schools like there?

Slide 3: The Vision: Ask the Government.
- Are alternative energy sources any cheaper to use?
- Which state has the highest oil production?
- How long has the nuclear plant been in service?
- We're thinking of moving to Cambridge. How much does gas cost there?

Slide 4: The problem and the solution
Problem: FedStats has thousands of databases in over seventy Government agencies:
- data is duplicated and near-duplicated
- even Government officials and specialists cannot find it
Solution: Create a system to provide easy standardized access:
- need multi-database access engine
- need powerful user interface
- need terminology standardization mechanism

Slide 5: The purpose of DGRC
To Make Digital Government Happen
- Advance information systems research
- Bring the benefits of cutting edge IS research to government systems
- Help educate government and the community
- Learn needs from government partners to drive next stage system development
- Build pilot systems as part of new infrastructure

Slide 6: Research challenges
- Scale to incorporate many databases: build data models automatically
- Process large and disparate data efficiently: develop fast processing techniques; create aggregation and substitution operators
- Integrate data models across sources and agencies: take a large ontology and link the models into it automatically
- Incorporate additional information that is available from text: use language processing tools to extract it
- Display complex information from distributed sources: develop and evaluate new presentation techniques

Slide 7: System Architecture
[Diagram. Labels: Sources; Construction phase: Deploy DBs, Extend ontology; Text; Tables; Data]

Slide 8: Columbia's Team Approach
User Interface
- Year One: Hatzivassiloglou, Sandhaus
- Year Two: Feiner, Temiyabutr
Database Aggregation
- Year One: Gravano, Singla
- Year Two: Ross, Zaman
Automatic Inter-Agency Ontologies
- Years One and Two: Klavans, Whitman

Slide 9: System interface Year One Progress
Vasileios Hatzivassiloglou, Jay Sandhaus
Components:
1. Query formation
2. Ontology/glossary browsing for concept navigation
3. Answer display, interaction history
GUI incorporates key technologies for facilitating user access to diverse databases:
- Context-sensitive menu-based input mechanism
- Visualization and navigation of results and the ontology
- Lightweight client runs on multiple platforms without downloads
- Java/Swing implementation allows client-side processing

Slide 10: Information Aggregation Yr. 1 Progress
Luis Gravano, Anurag Singla
Problem: Data is not in exactly the form the user needs (monthly, not annually; actual values, not averaged)
Solution: Attempt to provide unified view of data of various granularities:
- time period
- geographical region
- product
Example over BLS data:
- View: monthly data available for all geographical regions
- Query: monthly prices for LA in 1979
- Answer: yearly price for LA in 1979

Slide 11: Aggregation challenges
- Different coverage along these dimensions across data sets
- Users see a simple, unified view of the data; if a query cannot be answered, we answer the closest query that we have data for
- Answers are always exact
Key challenges:
- defining query proximity (default vs. user-specific)
- communicating query relaxation to users
- defining and navigating the space of answerable queries efficiently

Slide 12: Extracting and Structuring Information from Definitions Yr 1
Judith Klavans, Brian Whitman
Problems:
- Proliferation of terms in domain
- Agencies define terms differently
- Many refer to the same or related entity
- Lengthy and dense term definitions often contain important information which is buried

Slide 13: Glossary analysis framework
- Extract ontological information applying language sensitive analysis tools
- Structure and deliver to ISI for access and display
- Based on past projects: analysis of definitions in machine-readable dictionaries
- Original domain specific glossaries
- Gather glossaries, thesauri, definitions from govt agencies
- Create framework into which text will be analyzed

Slide 14: DGRC-EDC Plans for Year Two
User Interface
- Incorporate new presentation approaches
- Link ontology access mechanisms to query input
- Incorporate other DG research (Marchionini)
Database
- Integrate existing aggregation prototype
- Main memory for fast performance
Lexical Knowledge Bases
- Incorporate into SENSUS
- Add web crawler to extend coverage
- Develop mechanisms to merge definitions

Slide 15: End of Part I: DGRC EDC
- Reviewed goals of DGRC Energy Data Collection Project
- Showed first year progress
- Gave early second year results
- Presented Columbia's team approach
- Set out future goals
But what is next?

Slide 16: Next Steps for DGRC Growth
Ambitious two-pronged plan
- Additional Funding For DGRC TRADE (NSF)
- Independent Foundation Funding (leverage NSF Investment)

Slide 17: One Facet: From DGRC-EDC to DGRC-TRADE
- Builds on past successes
- Brings in a new domain: trade data
- Adds three new enhancements
User Needs and Evaluation
- Electronic Data Service at Columbia
- Users and Experts to test usefulness and usability
Database
- incorporate cross data set aggregation
Ontology
- add multilingual capability

Slide 18: Data Integration
[Diagram. Labels: Labor, EPA, EIA, Census; Heterogeneous Data Sources; User Interface; Information Access; Definition Ontology; query]

Slide 19: Data Integration
[Diagram. Labels: Labor, EPA, EIA, Census; Heterogeneous Data Sources; User Interface; Information Access; Definition Ontology; Trade; Main Memory Query Processing; Multilingual Access; User Evaluation; Task-based Evaluation; query]

Slide 20: Columbia's Electronic Data Service
- Established to serve social science researchers
- Operational unit of the Libraries
- Excellent relationship with faculty, staff and students
- Capable of supporting many levels of development and testing
- Evaluation effort led by Walter Bourne

Slide 21: Partners DGRC Trade
Evaluation experts from the US and Canada
- Cognitive evaluation
- User needs evaluation
- User interface evaluation
Social scientists
- ISERP and CIESIN at Columbia
- Public Health Policy research

Slide 22: Facet Two: Building the DGRC
- Seek substantial Foundation support
- Pursue a large vision
- Involvement of high level Columbia and ISI administration
- Gather an advisory board to develop a sustainable plan

Slide 23: What do we need from the NSF?
1. Information: ways to interact with portals, e.g. private companies delivering (free) government data
2. Contacts: leverage peer-review process of NSF to establish key contacts

Slide 24: To Sum
DGRC Energy Data Collection (EDC)
- Progress from Year One
- Plans and early results from Year Two
Larger Plans for Growing DGRC
- Trade Proposal NSF
- Plans for other funding

Slide 25: Today's Plan: Focus on DGRC-EDC
Major research challenges:
- Building and structuring the ontology
- Automated data aggregation
- Presentation of complex information
Major practical challenges:
- Getting more data into the system
- Understanding users' needs

Slide 26: Thank you! Any questions?

Slide 27: Information Integration: Heterogeneity in Aggregation
Luis Gravano, Assistant Professor, Columbia U. (joint work with Anurag Singla and Vasilis Vassalos)

Slide 28: Information Integration
Goal: To Provide Si
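
The aggregation slides (10 and 11) describe answering "the closest query that we have data for" when the requested granularity is unavailable, and telling the user about the relaxation. The slides do not say how proximity is defined, so the following is only a minimal Python sketch under stated assumptions. The granularity ladders, the `Query` type, the step-count cost, the product names, and all function names are hypothetical and are not taken from the DGRC system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical generalization ladders, finest level first (an assumption,
# not something the slides specify).
TIME_LEVELS = ["monthly", "quarterly", "yearly"]
REGION_LEVELS = ["city", "state", "national"]


@dataclass(frozen=True)
class Query:
    time: str     # time-period granularity
    region: str   # geographic granularity
    product: str  # product the prices refer to


def relaxation_cost(asked: Query, candidate: Query) -> Optional[int]:
    """Count generalization steps from asked to candidate.

    Returns None when the candidate is for another product or is finer
    than what was asked (this sketch only relaxes toward coarser data).
    """
    if candidate.product != asked.product:
        return None
    dt = TIME_LEVELS.index(candidate.time) - TIME_LEVELS.index(asked.time)
    dr = REGION_LEVELS.index(candidate.region) - REGION_LEVELS.index(asked.region)
    if dt < 0 or dr < 0:
        return None
    return dt + dr


def closest_answerable(
    asked: Query, available: Iterable[Query]
) -> Optional[Tuple[int, Query]]:
    """Pick the answerable query with the fewest relaxation steps."""
    best = None
    for cand in available:
        cost = relaxation_cost(asked, cand)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, cand)
    return best


def describe_relaxation(asked: Query, answered: Query) -> str:
    """Tell the user which dimensions were relaxed."""
    changes = [
        f"{name}: {getattr(asked, name)} -> {getattr(answered, name)}"
        for name in ("time", "region", "product")
        if getattr(asked, name) != getattr(answered, name)
    ]
    return "; ".join(changes) if changes else "answered as asked"


asked = Query("monthly", "city", "gasoline")
available = [
    Query("quarterly", "national", "gasoline"),  # 1 + 2 = 3 steps away
    Query("yearly", "city", "gasoline"),         # 2 + 0 = 2 steps away
    Query("monthly", "city", "electricity"),     # different product, skipped
]
cost, answered = closest_answerable(asked, available)
print(describe_relaxation(asked, answered))  # time: monthly -> yearly
```

This mirrors the BLS example on slide 10, where a monthly query is answered with yearly data. A user-specific notion of proximity, which slide 11 lists as a challenge, could be approximated here by weighting `dt` and `dr` differently per user.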
