The World Wide Telescope a Digital Library Prototype.ppt
The World Wide Telescope: a Digital Library Prototype
Jim Gray, Microsoft Research
Alex Szalay, Johns Hopkins University
Talk at OCLC, Dublin, OH, 17 May 2004
http:/

Model of Library Science
- Alexandria
- Gutenberg
- (Melvil) Dewey Decimal
- MARC (Henriette Avram)
- Dublin Core
Yes, I know there have been other things.

Dublin Core
- Elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Coverage, Rights
- Elements+: Audience, Alternative, TableOfContents, Abstract, Created, Valid, Available, Issued, Modified, Extent, Medium, IsVersionOf, HasVersion, IsReplacedBy, Replaces, IsRequiredBy, Requires, IsPartOf, HasPart, IsReferencedBy, References, IsFormatOf, HasFormat, ConformsTo, Spatial, Temporal, Mediator, DateAccepted, DateCopyrighted, DateSubmitted, EducationalLevel, AccessRights, BibliographicCitation
- Encoding:
  - LCSH (Library of Congress Subject Headings)
  - MESH (Medical Subject Headings)
  - DDC (Dewey Decimal Classification)
  - LCC (Library of Congress Classification)
  - UDC (Universal Decimal Classification)
  - DCMItype (Dublin Core Meta Type)
  - IMT (Internet Media Type)
  - ISO639-2 (ISO language names)
  - RFC1766 (Internet language tags)
  - URI (Uniform Resource Identifier)
  - Point (DCMI spatial point)
  - ISO3166 (ISO country codes)
  - Box (DCMI rectangular area)
  - TGN (Getty Thesaurus of Geographic Names)
  - Period (DCMI time interval)
  - W3CDTF (W3C date/time)
  - RFC3066 (Language dialects)
- Types: Collection, Dataset, Event, Image, InteractiveResource, Service, Software, Sound, Text, PhysicalObject, StillImage, MovingImage
Thanks!

What's Happening?
- We are drowning in information
- A single fixed hierarchy is hopeless
  - Can't organize/find things in a simple tree
- HOPE: "schematized storage"
  - Objects have "Dublin-like" facets
  - Most facets acquired automatically (email, photo, doc, ...)
  - Users add annotations and relationships
  - Librarians call this accession
  - Automate accession as much as possible
- Folders/directories are standing queries
- Organization is "search based"
- demo sis.
- Interesting (public) research projects
  - Stuff I've Seen: http:/
  - MyLifebits: http:/
- Longhorn product embraces & extends these ideas.

What about the talk I promised you?

The Talk
- Libraries are morphing to integrated text + data (you know that)
- Dublin Core is bedrock, but many issues remain (you know that)
- WWT: all astronomy data and literature online and integrated
- Problems librarians have grappled with for centuries: curation, preservation, indexing, access, summarization
- Overview of the World-Wide Telescope as a digital library
- Focus on metadata, schema, curation, and preservation
- Candidly, we have more problems than solutions, but the data is arriving and we are doing the best we can.

New Science Paradigms
- Thousand years ago: science was empirical, describing natural phenomena
- Last few hundred years: a theoretical branch, using models, generalizations
- Last few decades: a computational branch, simulating complex phenomena
- Today: data exploration (eScience), synthesizing theory, experiment and computation with advanced data management and statistics

The Big Picture
[Diagram labels: Experiments & Instruments, Simulations, Literature, Other Archives (each marked "facts"); questions; answers; ?]

The Big Problems
- Data ingest
- Managing a petabyte
- Common schema
- How to organize it?
- How to reorganize it
- How to coexist with others
- Data query and visualization tools
- Support/training
- Performance
  - Execute queries in a minute
  - Batch (big) query scheduling

The Virtual Observatory
- Premise: most data is (or could be) online
- The Internet is the world's best telescope:
  - It has data on every part of the sky
  - In every measured spectral band: optical, x-ray, radio
  - As deep as the best instruments (2 years ago)
  - It is up when you are up
  - The "seeing" is always great
  - It's a smart telescope: links objects and data to literature
- Software is the capital expense
  - Share, standardize, reuse

Why Is Astronomy Special?
- Almost all literature online and public
  - ADS: http://adswww.harvard.edu/
  - CDS: http://cdsweb.u-strasbg.fr/
- Data has no commercial value
  - No privacy concerns, freely share results with others
  - Great for experimenting with algorithms
- It is real and well documented
  - High-dimensional (with confidence intervals)
  - Spatial, temporal
- Diverse and distributed
  - Many different instruments from many different places and many different times
- The community wants to share the data
- There is a lot of it (soon petabytes)

Image panels:
- IRAS 100 µm
- ROSAT keV
- DSS Optical
- 2MASS 2 µm
- IRAS 25 µm
- NVSS 20 cm
- WENSS 92 cm
- GB 6 cm

Like all sciences, Astronomy Faces an Information Avalanche
- Astronomers have a few hundred TB now
  - 1 pixel (byte) / sq arc second: 4TB
  - Multi-spectral, temporal: 1PB
- They mine it looking for
  - new (kinds of) objects or more of interesting ones (quasars)
  - density variations in 400-D space
  - correlations in 400-D space
- Data doubles every year
- Data is public after 1 year
- So, 50% of the data is public
- Same access for everyone

Publishing Data
- Exponential growth:
  - Projects last at least 3-5 years
  - Data sent upwards only at the end of the project
  - Data will never be centralized
- More responsibility on projects
  - Becoming publishers and curators
- Data will reside with projects
  - Analyses must be close to the data

How to Publish Data: Web Services
- Web SERVER:
  - Given a URL + parameters
  - Returns a web page (often dynamic)
- Web SERVICE:
  - Given an XML document (SOAP message)
  - Returns an XML document (with schema)
  - Tools make this look like an RPC: F(x,y,z) returns (u, v, w)
  - Distributed objects for the web
  - + naming, discovery, security, ...
- Internet-scale distributed computing
[Diagram labels: Your program; Data in your address space; Web Service; soap; object in xml; Your program; Web Server; http; Web page]

The Core Problem: No Economic Model
- The archive user has not yet been born. How can he pay you to curate the data?
- Q: The scientist gathered data for his own purpose. Why should he pay (invest time) for your needs?
  A: That's the scientific method.
- Curating data (documenting the design, the acquisition, and the processing) is very hard and there is no reward for doing it. Results are rewarded, not the process of getting them.
- Storage/archive is NOT the problem (it's almost free)
- Curating/publishing is expensive
- Better standards & tools lower costs

Data Inflation: Data Pyramid
- Level 1A Grows 5TB pixels/year growing to 25TB
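
The "Information Avalanche" slide reasons that if data doubles every year and becomes public after one year, then 50% of the data is public. A minimal sketch of that arithmetic follows, assuming idealized exact doubling. The function name, its parameters, and the starting volume are hypothetical and chosen only for illustration; they do not come from the talk.

```python
def public_fraction(years: int, initial_tb: float = 1.0,
                    growth: float = 2.0, embargo_years: int = 1) -> float:
    """Fraction of the archive that is public after `years` years.

    Assumption (not from the slides): the total volume is multiplied by
    `growth` every year, and data becomes public `embargo_years` after
    it was collected.
    """
    total = initial_tb * growth ** years
    # Everything that already existed `embargo_years` ago is public today.
    public = initial_tb * growth ** (years - embargo_years)
    return public / total


for year in (1, 5, 10):
    print(year, public_fraction(year))
```

Under these assumptions the fraction depends only on the growth factor and the embargo length, not on the starting volume or the year; a growth rate slower than doubling would make a larger share of the archive public.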