HIVE Data Warehousing & Analytics on Hadoop.ppt

HIVE: Data Warehousing & Analytics on Hadoop
Joydeep Sen Sarma, Ashish Thusoo
Facebook Data Team

Why Another Data Warehousing System?
  • Problem: Data, data and more data
      - 200GB per day in March 2008, growing to 1TB compressed per day today
  • The Hadoop Experiment
  • Problem: Map/Reduce is great, but not everyone is a Map/Reduce expert
      - "I know SQL" and "I am a python and php expert"
  • So what do we do: HIVE

What is HIVE?
  • A system for querying and managing structured data, built on top of Map/Reduce and Hadoop
  • We had:
      - Structured logs with rich data types (structs, lists and maps)
      - A user base wanting to access this data in the language of their choice
      - A lot of traditional SQL workloads on this data (filters, joins and aggregations)
      - Other non-SQL workloads

Data Warehousing at Facebook Today
  [Diagram; labels: Web Servers, Scribe Servers, Filers, Hive on Hadoop Cluster, Oracle RAC, Federated MySQL]

HIVE: Components
  [Diagram; labels: HDFS, Hive CLI, DDL, Queries, Browsing, Map Reduce, MetaStore, Thrift API, SerDe, Thrift, Jute, JSON, Execution, Hive QL, Parser, Planner, Mgmt. Web UI]

Data Model
  [Diagram; labels: Logical Partitioning, Hash Partitioning, Schema, Library, clicks, HDFS, MetaStore, Tables, #Buckets=32, Bucketing Info, Partitioning Cols]
  • Example layout of the clicks table on HDFS:
      /hive/clicks
      /hive/clicks/ds=2008-03-25
      /hive/clicks/ds=2008-03-25/0

Dealing with Structured Data
  • Type system
      - Primitive types
      - Recursively build up using Composition/Maps/Lists
  • Generic (De)Serialization Interface (SerDe)
      - To recursively list schema
      - To recursively access fields within a row object
  • Serialization families implement the interface
      - Thrift DDL based SerDe
      - Delimited text based SerDe
      - You can write your own SerDe
  • Schema Evolution

MetaStore
  • Stores Table/Partition properties:
      - Table schema and SerDe library
      - Table location on HDFS
      - Logical partitioning keys and types
      - Other information
  • Thrift API
      - Current clients in Php (Web Interface), Python (old CLI), Java (Query Engine and CLI), Perl (Tests)
  • Metadata can be stored as text files or even in a SQL backend

Hive CLI
  • DDL:
      - create table / drop table / rename table
      - alter table add column
  • Browsing:
      - show tables
      - describe table
      - cat table
  • Loading Data
  • Queries

Hive Query Language
  • Philosophy
      - SQL like constructs + Hadoop Streaming
  • Query operators in initial version
      - Projections
      - Equijoins and Cogroups
      - Group by
      - Sampling
  • Output of these operators can be:
      - passed to Streaming mappers/reducers
      - stored in another Hive table
      - output to HDFS files
      - output to local files

Hive Query Language (continued)
  • Package these capabilities into a more formal SQL like query language in the next version
  • Introduce other important constructs:
      - Ability to stream data through custom mappers/reducers
      - Multi-table inserts
      - Multiple group bys
      - SQL like column expressions and some XPath like expressions
      - Etc.

Joins
  • Joins

      FROM page_view pv JOIN user u ON (pv.userid = u.id)
      INSERT INTO TABLE pv_users
      SELECT pv.*, u.gender, u.age
      WHERE pv.date = '2008-03-03';

  • Outer Joins

      FROM page_view pv FULL OUTER JOIN user u ON (pv.userid = u.id)
      INSERT INTO TABLE pv_users
      SELECT pv.*, u.gender, u.age
      WHERE pv.date = '2008-03-03';

Aggregations and Multi-Table Inserts

      FROM pv_users
      INSERT INTO TABLE pv_gender_uu
        SELECT pv_users.gender, count(DISTINCT pv_users.userid)
        GROUP BY (pv_users.gender)
      INSERT INTO TABLE pv_ip_uu
        SELECT pv_users.ip, count(DISTINCT pv_users.userid)
        GROUP BY (pv_users.ip);

Running Custom Map/Reduce Scripts

      FROM (
        FROM pv_users
        SELECT TRANSFORM(pv_users.userid, pv_users.date)
        USING 'map_script' AS (dt, uid)
        CLUSTER BY (dt)
      ) map
      INSERT INTO TABLE pv_users_reduced
      SELECT TRANSFORM(map.dt, map.uid)
      USING 'reduce_script' AS (date, count);

Inserts into Files, Tables and Local Files

      FROM pv_users
      INSERT INTO TABLE pv_gender_sum
        SELECT pv_users.gender, count_distinct(pv_users.userid)
        GROUP BY (pv_users.gender)
      INSERT INTO DIRECTORY '/user/facebook/tmp/pv_age_sum.dir'
        SELECT pv_users.age, count_distinct(pv_users.userid)
        GROUP BY (pv_users.age)
      INSERT INTO LOCAL DIRECTORY '/home/me/pv_age_sum.dir'
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\013'
        SELECT pv_users.age, count_distinct(pv_users.userid)
        GROUP BY (pv_users.age);

Hadoop Usage at Facebook
  • Types of applications:
      - Summarization
          Eg: Daily/weekly aggregations of impression/click counts
      - Ad hoc analysis
          Eg: How many group admins, broken down by state/country
      - Data mining (assembling training data)
          Eg: User engagement as a function of user attributes

Hadoop Usage at Facebook (continued)
  • Usage statistics:
      - Total users: 140 (about 50% of engineering!) in the last 1 months
      - Hive data (compressed): 80 TB total, 1TB incoming per day
  • Job statistics:
      - 1000 jobs/day
      - 100 loader jobs/day

Hadoop Improvements at Facebook
  • Some problems:
      - No fair sharing: big tasks can hog the cluster
      - No snapshots: what if a software bug corrupts the NameNode transaction log?
  • Solutions:
      - Simple fair sharing (Matei Zaharia)
      - Investigating snapshots (Dhruba Borthakur)

Conclusion
  • JIRA: http://issues.apache.org/jira/browse/HADOOP-3601
  • Soon to be checked into hadoop trunk
  • Release available in hadoop version 0.19
  • People:
      - Suresh Anthony
      - Zheng Shao
      - Prasad Chakka
      - Pete Wyckoff
      - Namit Jain
      - Raghu Murthy
      - Joydeep Sen Sarma
      - Ashish Thusoo
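
The Data Model slide shows a table directory, a partition directory named after the partitioning column (ds=2008-03-25), and a numbered bucket file beneath it. A minimal sketch of how such a path could be assembled follows. The function names are hypothetical, and the plain modulo hash is an assumption made for illustration; the slides do not say how a row key is mapped to one of the 32 buckets.

```python
NUM_BUCKETS = 32  # "#Buckets=32" on the Data Model slide


def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket with a plain modulo hash (an assumption, not a documented rule)."""
    return key % num_buckets


def bucket_path(table: str, ds: str, bucket: int, root: str = "/hive") -> str:
    """Build <root>/<table>/ds=<partition value>/<bucket>, mirroring the slide's paths."""
    return f"{root}/{table}/ds={ds}/{bucket}"


if __name__ == "__main__":
    # A hypothetical row key of 64 lands in bucket 0 under the modulo assumption.
    print(bucket_path("clicks", "2008-03-25", bucket_for(64)))
```

Because the partition value is part of the directory name, a query restricted to one ds value only needs to read the files under that directory.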

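
The Running Custom Map/Reduce Scripts slide names map_script and reduce_script without showing them. The sketch below is one possible pair, written in the Hadoop Streaming style of reading rows from standard input and writing rows to standard output. Several things here are assumptions, not taken from the slides: columns are tab separated, the mapper only swaps (userid, date) into (dt, uid), and the reducer counts rows per date. The reducer relies on CLUSTER BY (dt) delivering rows with the same dt next to each other.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: read 'userid<TAB>date' rows and emit 'date<TAB>userid', i.e. (dt, uid)."""
    for line in lines:
        userid, date = line.rstrip("\n").split("\t")
        yield f"{date}\t{userid}"


def reduce_lines(lines):
    """Reducer: rows arrive grouped by dt; emit 'date<TAB>row count' for each group."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for dt, group in groupby(rows, key=lambda row: row[0]):
        yield f"{dt}\t{sum(1 for _ in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: run as `script.py map` or `script.py reduce`.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Keeping the logic in generator functions, with the stdin/stdout wiring confined to the last few lines, lets the same code be exercised on small in-memory samples before it is attached to a query.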
