HIVEData Warehousing Analytics on Hadoop.ppt
HIVE: Data Warehousing & Analytics on Hadoop
Joydeep Sen Sarma, Ashish Thusoo
Facebook Data Team

Why Another Data Warehousing System?
- Problem: Data, data and more data
  - 200GB per day in March 2008 back to 1TB compressed per day today
- The Hadoop Experiment
- Problem: Map/Reduce is great but every one is not a Map/Reduce expert
  - I know SQL and I am a python and php expert
- So what do we do: HIVE

What is HIVE?
- A system for querying and managing structured data built on top of Map/Reduce and Hadoop
- We had:
  - Structured logs with rich data types (structs, lists and maps)
  - A user base wanting to access this data in the language of their choice
  - A lot of traditional SQL workloads on this data (filters, joins and aggregations)
  - Other non SQL workloads

Data Warehousing at Facebook Today
Diagram labels: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL

HIVE: Components
Diagram labels: HDFS, Hive CLI, DDL, Queries, Browsing, Map Reduce, MetaStore, Thrift API, SerDe, Thrift, Jute, JSON, Execution, Hive QL, Parser, Planner, Mgmt. Web UI

Data Model
Diagram labels: Logical Partitioning, Hash Partitioning, Schema, Library, clicks, HDFS, MetaStore, /hive/clicks, /hive/clicks/ds=2008-03-25, /hive/clicks/ds=2008-03-25/0, Tables, #Buckets=32, Bucketing Info, Partitioning Cols

Dealing with Structured Data
- Type system
  - Primitive types
  - Recursively build up using Composition/Maps/Lists
- Generic (De)Serialization Interface (SerDe)
  - To recursively list schema
  - To recursively access fields within a row object
- Serialization families implement interface
  - Thrift DDL based SerDe
  - Delimited text based SerDe
  - You can write your own SerDe
- Schema Evolution

MetaStore
- Stores Table/Partition properties:
  - Table schema and SerDe library
  - Table Location on HDFS
  - Logical Partitioning keys and types
  - Other information
- Thrift API
  - Current clients in Php (Web Interface), Python (old CLI), Java (Query Engine and CLI), Perl (Tests)
- Metadata can be stored as text files or even in a SQL backend

Hive CLI
- DDL:
  - create table/drop table/rename table
  - alter table add column
- Browsing:
  - show tables
  - describe table
  - cat table
- Loading Data
- Queries

Hive Query Language
- Philosophy
  - SQL like constructs + Hadoop Streaming
- Query Operators in initial version
  - Projections
  - Equijoins and Cogroups
  - Group by
  - Sampling
- Output of these operators can be:
  - passed to Streaming mappers/reducers
  - can be stored in another Hive Table
  - can be output to HDFS files
  - can be output to local files

Hive Query Language
Pack
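
The Data Model slide shows a table directory (/hive/clicks), a partition directory (/hive/clicks/ds=2008-03-25) and a bucket file (/hive/clicks/ds=2008-03-25/0) with #Buckets=32. A minimal Python sketch of that layout follows. The function names are hypothetical, and the CRC32 hash is only a stand-in: the slides do not say which hash function Hive uses to assign rows to buckets.

```python
import zlib


def partition_dir(table: str, part_col: str, part_value: str, root: str = "/hive") -> str:
    """Directory for one logical partition, e.g. /hive/clicks/ds=2008-03-25."""
    return f"{root}/{table}/{part_col}={part_value}"


def bucket_for(key: str, num_buckets: int = 32) -> int:
    """Map a bucketing key to a bucket index in [0, num_buckets).

    Assumption: CRC32 is used here purely for illustration; it is not
    claimed to be the hash function Hive itself applies.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucket_path(table: str, part_col: str, part_value: str, key: str,
                num_buckets: int = 32, root: str = "/hive") -> str:
    """Full path of the bucket file that a row with this key would land in."""
    directory = partition_dir(table, part_col, part_value, root)
    return f"{directory}/{bucket_for(key, num_buckets)}"


if __name__ == "__main__":
    # The user id "user_42" is a made-up bucketing key.
    print(partition_dir("clicks", "ds", "2008-03-25"))
    print(bucket_path("clicks", "ds", "2008-03-25", "user_42"))
```

Under this layout, a query that filters on ds only needs to read one partition directory, and sampling can read a subset of the 32 bucket files rather than the whole partition.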
