An Overview of Databases for the Big Data Ecosystem.ppt
An Overview of Databases for the Big Data Ecosystem
Keith W. Hare
JCC Consulting, Inc.
September 20, 2016
09/20/2016, Copyright 2016, JCC Consulting, Inc.

Abstract
- The ultimate goal of big data techniques is to be able to identify useful, usable information in a timely fashion: actionable analytics
- Prerequisites to producing actionable analytics are:
  - Ability to analyze lots of disparate data
  - Ability to discover, access, store and retrieve lots of data
- This presentation provides an overview of data storage and retrieval in a big data ecosystem
  - Focus on the characteristics, not the implementations
  - Useful for understanding how the pieces should fit together
  - Addresses the prerequisites, not the end goal

Who am I?
- Senior Consultant with JCC Consulting, Inc. since 1985
  - High performance database systems
  - Replicating data between database systems
- SQL Standards committees since 1988
  - Convenor, ISO/IEC JTC1 SC32 WG3, since 2005
  - Vice Chair, ANSI INCITS DM32.2, since 2003
  - Vice Chair, INCITS Big Data Technical Committee since 2015
- Education
  - Muskingum College, 1980, BS in Biology and Computer Science
  - Ohio State, 1985, Masters in Computer & Information Science

Topics
- Why is “Big Data” Different?
- Big Data Buzzwords
- High Level View
- Data Distribution
- Integrating Data from Multiple Sources
- Data Query Languages
- Big Data Eco-system Products
- Summary
“Let's do a deep dive in the Big Data and drill down until we hyperlocalize some disruptive technologies.” (See http:/

Why is “Big Data” Different?
- Often defined in terms of 3 4 5 6 7 Vs:
  - Volume: exceeds the capacity of a single “computer”
  - Velocity: speed at which data is generated
  - Variety: new types of data
  - Variability: speed at which data changes
  - Veracity: quality & provenance
  - Visualization: meaningful presentation
  - Value: actionable analytics
- Focus on primary data rather than extract, transform, and load (ETL)
- In many ways, “Big Data” is what we have always been doing, only bigger and more complex.

Big Data: Driving Forces
- Inexpensive storage of large volumes of data
- Inexpensive compute power
- Next Generation Analytics
  - Moving from off-line to in-line embedded analytics
  - Explaining what happened
  - Predicting what will happen
- Operating on
  - Data at rest: stored someplace
  - Data in motion: streaming
  - Multiple disparate data sources
- Look at available data and wonder what answers are hidden there

Big Data: Working Definition
- Requirements cannot be met on a single computer
  - Variety, Volume, Velocity, Variability, Availability
  - Imprecise terms, but useful for understanding the problem space
  - All relative: what was impossible yesterday is Big Data today and will be trivial tomorrow
- Distribute data storage to support volume & velocity
- Replicate data storage to provide availability
- Distribute processing
  - Apply compute power in parallel
  - Avoid moving data across the network; move the answers

Data Volume: How Big is Big?
- Gigabyte: 1000^3 bytes
- Terabyte: 1000^4 bytes
- Petabyte: 1000^5 bytes
- Exabyte: 1000^6 bytes
- Zettabyte: 1000^7 bytes
- Yottabyte: 1000^8 bytes
- Brontobyte*: 1000^9 bytes
- Gegobyte*: 1000^10 bytes
*This terminology is still subject to change.

Big Data Buzzwords
- NoSQL Databases
- Sharding
- Map-Reduce
- Schema-less
- New SQL

Big Data Buzzwords: NoSQL
- Originally did not include SQL
  - Rejected complexity of the SQL language
  - Rejected overhead and limitations of SQL databases
- Now “Not Only SQL”
  - Turns out that SQL is a powerful language for specifying queries
- Potentially useful data storage and retrieval techniques

Big Data Buzzwords: Sharding
- Partitioning data across multiple servers
- Scaling out
- Once the data is sharded, send queries to the data with Map Reduce

Big Data Buzzwords: Map Reduce
- Patented algorithm for:
  - Partitioning queries to run on multiple nodes in parallel
  - Integrating the results
- Map Reduce details were originally created by the developer
- Operations can (and should) be generated by database software

Big Data Buzzwords: Schema-less
- Reduce development time by eliminating up-front schema design
- Schema information still exists
  - Embedded in the data
  - Embedded in the code to support an API
  - Pinned to a developer's wall
- Reinventing databases from the 1960s

Big Data Buzzwords: New SQL
- Combine the powerful SQL query language with the performance benefits of NoSQL databases
- Support ACID transactions

High Level View
- “Big Data” Data Types
- Data Storage Models
- When is data accessed?
- Data Distribution
- Integrating Data From Multiple Sources
  - Variety of Data Sets/Sources
  - Variety of Data Source Ownership
- Data query languages

“Big Data” Data Types
- Traditional Data Types
  - Character
  - Numerical
  - Date/Time/Timestamp
  - Large Objects: LOB/BLOB/CLOB
- “Big Data” Data Types
  - Multi-dimensional arrays
  - Images/video
  - Documents
  - Loosely formatted data
  - Objects
  - Spatial

Data Storage Models
- Row Store: Tabular
- Column Store
- Key Value
- Document
  - XML
  - JSON: JavaScript Object Notation
  - BSON: Binary JSON
- Graph
- Multi-dimensional array
- Object

When is data accessed?
- After being stored
- Before (or instead of) being stored
  - Streaming data

Data Distribution
Sin
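
The Sharding and Map Reduce slides describe partitioning data across servers, running a query on each partition in parallel, and integrating the results, so that answers rather than raw data cross the network. A minimal single-process sketch of that idea follows. The shard count, the `shard_for` routing rule, and the sample order records are illustrative assumptions that do not come from the presentation, and the in-memory lists only stand in for separate servers.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3  # assumed shard count, for illustration only


def shard_for(key: str) -> int:
    """Route a key to a shard with a stable hash (hypothetical routing rule)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def build_shards(records):
    """Partition (customer, amount) records across the shards."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for customer, amount in records:
        shards[shard_for(customer)].append((customer, amount))
    return shards


def map_shard(shard):
    """Map step: runs next to the data and returns only a small summary."""
    totals = Counter()
    for customer, amount in shard:
        totals[customer] += amount
    return totals


def reduce_totals(partials):
    """Reduce step: integrate the per-shard answers into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return dict(combined)


def total_by_customer(records):
    """Run the map step on every shard in parallel, then reduce."""
    shards = build_shards(records)
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = list(pool.map(map_shard, shards))
    return reduce_totals(partials)


if __name__ == "__main__":
    orders = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
    print(total_by_customer(orders))
```

Because every record for a given key lands on the same shard, each shard can summarize its own data independently, and only the small per-shard totals need to be combined.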
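
The Schema-less slide notes that schema information still exists even when there is no up-front design, for example embedded in the data itself. As a rough illustration, the sketch below recovers an implied schema from a handful of JSON documents. The `infer_schema` helper and the sample sensor documents are hypothetical and are not taken from the presentation or from any particular database product.

```python
import json
from collections import defaultdict


def infer_schema(documents):
    """Collect, for each field name, the value types seen across all documents."""
    schema = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            schema[field].add(type(value).__name__)
    # Sort the type names so the result is stable and easy to compare.
    return {field: sorted(types) for field, types in schema.items()}


if __name__ == "__main__":
    raw = [
        '{"id": 1, "name": "sensor-a", "reading": 20.5}',
        '{"id": 2, "name": "sensor-b", "reading": "n/a", "unit": "C"}',
    ]
    docs = [json.loads(line) for line in raw]
    for field, types in infer_schema(docs).items():
        print(field, types)
```

A field that shows up with more than one type, or in only some documents, is the kind of detail an explicit schema would have recorded up front.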
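
The Data Storage Models slide lists row stores and column stores side by side. The sketch below shows one way to picture the difference by holding the same small table in both layouts. The table contents and the helper names `to_columns` and `row_at` are assumptions made for illustration, and the code assumes every row has the same fields.

```python
def to_columns(rows):
    """Convert a row-oriented table (list of dicts) to a column-oriented one."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def row_at(columns, index):
    """Rebuild a single record from the column layout."""
    return {name: values[index] for name, values in columns.items()}


if __name__ == "__main__":
    # Row store: each record is kept together, which suits whole-record access.
    rows = [
        {"id": 1, "region": "east", "amount": 10.0},
        {"id": 2, "region": "west", "amount": 4.5},
        {"id": 3, "region": "east", "amount": 7.25},
    ]
    # Column store: each attribute is kept together, which suits scans and aggregates.
    columns = to_columns(rows)
    print(sum(columns["amount"]))  # touches only the "amount" column
    print(row_at(columns, 1))      # has to visit every column to rebuild one record
```

The aggregate reads a single list in the column layout, while rebuilding one record has to touch every column, which is the usual trade-off between the two models.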