Introduction to Stanford DB Group Research.ppt
1. Introduction to Stanford DB Group Research
Li Ruixuan
http:/

2. Contents
- Introduction
- Past projects
- Current projects
- Events
- References
- Links

3. The Stanford Database Group
- "Mainstream" faculty
  - Hector Garcia-Molina
  - Jennifer Widom
  - Jeff Ullman
  - Gio Wiederhold
- "Adjunct" faculty
  - Chris Manning (natural language processing)
  - Rajeev Motwani (theory)
  - Terry Winograd (human-computer interaction)
- A.k.a. Stanford InfoLab

4. Database Group (contd)
- Approximately 25 Ph.D. students
- Varying numbers of M.S. and undergraduate students
- Handful of visitors
- One senior research associate
- One systems administrator, one programmer
- Excellent administrative staff
- Resident photographer

5. Research Areas (very coarse)
- Digital libraries
- Peer-to-peer systems
- Data streams
- Replication, caching, archiving, broadcast
- The Web
- Ontologies, semantic Web
- Data mining
- Miscellaneous

6. Past Projects
- LIC: Large-Scale Interoperation and Composition (1999) - mediator (SKC, OntoWeb, CHAIMS, SimQL, image DB)
- SKC: Scalable Knowledge Composition (2000) - semantic heterogeneity
- TID: Trusted Image Distribution (2001) - Image Filtering for Secure Distribution of Medical Information
- Image Database: Content-based Image Retrieval (2003)
- SimQL: Simulation Access Language (2001) - Software modules in manufacturing, acquisition, and planning systems

7. Past Projects (contd)
- TSIMMIS: Wrapping and mediation for heterogeneous information sources (1998)
- Lore: A Database Management System for XML (2000)
- WHIPS: WareHouse Information Prototype at Stanford (1998) - Data warehouse creation and maintenance
- MIDAS: Mining Data at Stanford (1999)
- WSQ: Web-Supported Queries (2000) - Integrating database queries and Web searches

8. Current Projects
- WebBase: Crawling, storage, indexing, and querying of large collections of Web pages (Garcia-Molina)
- STREAM: A Database Management System for Data Streams (Widom)
- Peers: Building primitives for peer-to-peer systems (Garcia-Molina)
- Digital Libraries: Interoperating on-line services for end-user support (TID, WebBase, OntoAgents) (Garcia-Molina)
- TRAPP: Approximate data caching: trading precision for performance (Widom)
- CHAIMS: Compiling High-level Access Interfaces for Multi-site Software (1999) (Wiederhold)
- OntoAgents: Ontology based Infrastructure for Agents (2002) (Wiederhold)

9. WebBase: Objectives
- Provide a storage infrastructure for Web-like content
- Store a sizeable portion of the Web
- Enable researchers to easily build indexes of page features across large sets of pages
- Distribute WebBase content via multicast channels
- Support structure and content-based querying over the stored collection

10. WebBase: Architecture
[Architecture diagram]

11. WebBase: Current Status
- Efficient "smart" crawler
  - Parallelism
  - Freshness & Relevance
- Efficient and scalable indexing
  - Distributed Web-scale content indexes
  - Indexes over graph structure
- Unicast dissemination
  - Within Stanford
  - External clients: Columbia, U.Wash, U.C.Berkeley

12. WebBase: In Progress
- WebBase Infrastructure
  - Multicast dissemination
  - Complex queries
- Other work
  - PageRank extensions
  - Clustering and similarity search
  - Structured data extraction
  - Hidden Web crawling

13. Data Streams: Motivation
- Traditional DBMS - data stored in finite, persistent data sets
- New applications - data as multiple, continuous, rapid, time-varying data streams
  - Network monitoring and traffic engineering
  - Security applications
  - Telecom call records
  - Financial applications
  - Web logs and click-streams
  - Sensor networks
  - Manufacturing processes

14. STREAM: Architecture
[Architecture diagram]

15. STREAM: Challenges
- Multiple, continuous, rapid, time-varying streams of data
- Queries may be continuous (not just one-time)
  - Evaluated continuously as stream data arrives
  - Answer updated over time
- Queries may be complex
  - Beyond element-at-a-time processing
  - Beyond stream-at-a-time processing

16. DBMS versus DSMS
- DBMS
  - Persistent relations
  - One-time queries
  - Random access
  - Access plan determined by query processor and physical DB design
  - "Unbounded" disk store
- DSMS
  - Transient streams (and persistent relations)
  - Continuous queries
  - Sequential access
  - Unpredictable data arrival and characteristics
  - Bounded main memory

17. STREAM: Current Status
- Data streams and stored relations
- Declarative language for registering continuous queries
- Flexible query plans
- Designed to cope with high data rates and query workloads
  - Graceful approximation when needed
  - Careful resource allocation and usage
- Relational, centralized (for now)

18. STREAM: Ongoing Work
- Algebra for streams
- Semantics for continuous queries
- Synopses and algorithmic issues
- Memory management issues
- Exploiting constraints on streams
- Approximation in query processing
- Distributed stream processing
- System development

19. STREAM: Related Work
- Amazon/Cougar (Cornell) - sensors
- Aurora (Brown/MIT) - sensor monitoring, dataflow
- Hancock (AT&T) - telecom streams
- Niagara (OGI/Wisconsin) - Internet XML databases
- OpenCQ (Georgia) - triggers, incremental view maintenance
- STREAM (Stanford) - general-purpose DSMS
- Tapestry (Xerox) - pub/sub content-based filtering
- Telegraph (Berkeley) - adaptive engine for sensors
- Tribeca (Bellcore) - network monitoring

20. Peer-To-Peer Systems
- Multiple sites (at edge)
- Distributed resources
- Sites are autonomous (different owners)
- Sites are both clients and servers
- Sites have equal functionality

21. P2P Benefits
- Pooling available (inexpensive) resources
- High availability and fault-tolerance
- Self-organization

22. P2P Challenges
- Search
  - Query Expressiveness
  - Comprehensiveness
  - Topology
  - Data Placement
  - Message Routing
- Resource Management
  - fairness
  - load balancing
- Security & Privacy
  - Anonymity
  - Reputation
  - Accountability
  - Information Preservation
  - Information Quality
  - Trust
  - Denial of service attacks

23. Peers: Stanford Research
- New Architectures
- Performance Modeling and Optimization
- Security and Trust
- Distributed Resource Management
- Applications

24. Digital Library Project: Overview
[Overview diagram]

25. DigLib Projects: DLI1, DLI2
- Resource Discovery
- Retrieving Information
- Interpreting Information
- Managing Information
- Sharing Information

26. DigLib: Resource Discovery
- Geographic Views (Tools to assist you in more systematically locating d
