Dynamo- Amazon's Highly Available Key-value .ppt
Dynamo: Amazon's Highly Available Key-value Store
Giuseppe DeCandia et al.
Jagrut Sharma, jagrutsh@usc.edu
CSCI-572 (Prof. Chris Mattmann)
20-Jul-2010

Outline of Talk
- Motivation (1)
- Contribution (1)
- Context (1)
- Background (3)
- Related Work (2)
- System Architecture (7)
- Implementation (1)
- Experiences, Results & Lessons Learnt (4)
- Conclusion (1)
- Pros (1)
- Cons (1)
- Questions (1)

Motivation
- Tens of millions of customers
- Tens of thousands of servers
- Globally distributed data centers
- 24 * 7 * 365 operations
- Performance, Reliability, Efficiency, Scalability
- Financial consequences, Customer Trust
- Data management

Contribution
- Evaluation of how different techniques can be combined to provide a highly available system
- Demonstration of how an eventually consistent storage system (like Dynamo) can be used in a production environment with demanding applications
- Provision of tuning methods to meet the requirements of production systems with very strict performance demands

Context
- Amazon's e-commerce platform
  - Highly decentralized
  - Loosely coupled
  - Service-oriented architecture
  - Hundreds of services
  - Millions of components
  - Failure is a way of life
- Critical requirement: always-available storage
- Storage techniques
  - S3 (Amazon Simple Storage Service)
  - Dynamo
    - Highly available and scalable distributed data store for Amazon's platform
    - Provides a primary-key-only interface for selected applications (e.g. shopping cart)
    - Combines multiple high-performance techniques & algorithms
    - Excellent performance in real-world scenarios

Background (1 of 3)
- E-commerce platform services: stateless & stateful
- Relational databases are overkill for stateful lookups by primary key
- Dynamo
  - Simple key/value interface
  - Highly available
  - Efficient in resource usage
  - Scalable
- Each service that uses Dynamo runs its own Dynamo instances
- Dynamo's target applications
  - Store small-sized objects (less than 1 MB)
  - Operate with weaker consistency if this gives high availability
  - Simple reads and writes to a data item uniquely identified by a key
  - No query operations span multiple data items
- Services use Dynamo to give priority to latency & throughput
- Amazon's SLAs are expressed and
  measured at the 99.9th percentile of the distribution (in contrast to the common industry approach of using average, median and expected variance)

Background (2 of 3)
Assumptions About Dynamo
- Used only by Amazon's internal services
- Operation environment is non-hostile
- There are no security-related requirements (e.g. authentication, authorization)
- Each service uses its distinct instance of Dynamo
- Dynamo's initial design targets a scale of up to hundreds of storage hosts

Background (3 of 3)
[Figure: SOA of Amazon's platform]
Dynamo Design Considerations
- Conflict resolution between replication & consistency? Eventually consistent data store
- When to resolve update conflicts? "Always writeable" data store
- Who performs conflict resolution? Both data store & application allowed
- Incremental scalability at node level
- Symmetry among nodes
- Favors decentralization
- Capable of exploiting infrastructure heterogeneity

Related Work (1 of 2)
Peer to Peer Systems
- Tackle problems of data storage and distribution
- Only support flat namespaces
- Unstructured P2P: Freenet, Gnutella
  - Search query floods network
- Structured P2P systems: Pastry, Chord, OceanStore, PAST
  - Employ globally consistent query routing protocol
  - Bounded number of hops
  - Maintain local routing tables
  - Provide rich storage services with conflict resolution
Distributed File Systems and Databases
- Support both flat & hierarchical namespaces
- Ficus, Coda: high availability at expense of consistency
- Farsite: high availability and scalability using replication
- Google File System: master server, chunkservers
- Bayou: distributed RDBMS, disconnected operations
- Antiquity: wide-area distributed storage system
- BigTable: distributed storage system for structured data

Related Work (2 of 2)
Dynamo Vs Other Systems
- Targeted mainly at apps that need an "always writeable" data store
- Built for an infrastructure within a single administrative domain where all nodes are assumed to be trusted
- Applications using Dynamo do not require support for hierarchical namespaces or complex relational schema
- Built for latency-sensitive applications that require at least 99.9% of read and write operations to be performed within a few hundred milliseconds
- Avoids routing requests through multiple nodes; hence similar to a zero-hop Distributed Hash Table

System Architecture (1 of 7)
[Table: List Of Techniques Used By Dynamo & Their Advantages]

System Architecture (2 of 7)
System Interface
- get(key)
  - Locates the object replicas associated with key in the storage system
  - Returns a single object, or a list of objects with conflicting versions, together with a context
- put(key, context, object)
  - Determines where the replicas of the object should be placed based on the associated key
  - Writes replicas to disk
- context
  - Encodes system metadata about the object
  - Includes additional information (e.g. object version)
- key, object: each considered as an opaque array of bytes
- MD5 hash(key) -> 128-bit identifier, used to determine the storage nodes that are responsible for serving the key

System Architecture (3 of 7)
Partitioning Algorithm
- Provides a mechanism to dynamically partition the data over the set of nodes (i.e. storage hosts)
- Uses a variant of consistent hashing (output range of a hash function is treated as a fixed circular space or ring; the largest hash value wraps around to the smallest hash value)
- Advantage:
  departure or arrival of a node only affects its immediate neighbors
- Limitation 1: leads to non-uniform data and load distribution
- Limitation 2: oblivious to heterogeneity in the performance of nodes
- (single node) -> multiple points in the ring, i.e. virtual nodes
- Advantages of virtual nodes:
  - Graceful handling of failure of a node
  - Easy accommodation of a new node
  - Heterogeneity in physical infrastructure can be exploited

System Architecture (4 of 7)
Replication
- Each data item replicated at N hosts
- N is configured per-instance
- Each node is responsible for the region of the ring between it and
  its Nth predecessor
- Preference list: list of nodes responsible for storing a particular key

Data Versioning
- Eventual consistency: allows updates to be propagated to all replicas asynchronously
- put() may return to the caller before the update has been applied at all replicas
- get() may return an object that does not have the latest updates
- Multiple versions of an object can be present in the system at the same time
- Syntactic reconciliation: performed by the system
- Semantic reconciliation: performed by the client
- Vector clock: list of (node, counter) pairs. Used for capturing causality between different versions of the same object
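
The vector clock idea from the Data Versioning slide can be sketched in a few lines of Python. This is a minimal illustration, not Dynamo's implementation: the function names (increment, descends, compare, merge) and the node names Sx, Sy, Sz are assumptions chosen for the example. A clock is modelled as a dictionary from node name to counter. When one clock descends from the other, the older version can be dropped by the system (syntactic reconciliation); when neither descends from the other, the versions are concurrent and are left for the client to reconcile (semantic reconciliation).

```python
from typing import Dict

# Hypothetical representation: node name -> counter.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if every counter in b is matched or exceeded in a."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: equal, after, before or concurrent."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used once conflicting versions are reconciled."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


# Two writes handled by Sx, then two divergent writes handled by Sy and Sz.
d1 = increment({}, "Sx")
d2 = increment(d1, "Sx")
d3 = increment(d2, "Sy")
d4 = increment(d2, "Sz")

print(compare(d2, d1))  # d2 supersedes d1, so d1 can be discarded
print(compare(d3, d4))  # neither descends from the other: a conflict

# After the client reconciles d3 and d4, the write is coordinated by Sx.
d5 = increment(merge(d3, d4), "Sx")
print(compare(d5, d3), compare(d5, d4))
```

In this sketch a get() that found both d3 and d4 would return both versions, and the context passed to the following put() would carry the merged clock.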
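
The partitioning and replication slides (MD5 positions on a ring, virtual nodes, preference list of N hosts) can be illustrated with the following sketch. It rests on several assumptions that are not in the slides: the class name Ring, the method names, and the way virtual-node positions are derived by hashing "node#i" are all hypothetical, and the preference list is built by walking clockwise from the key's position and skipping virtual nodes that belong to an already chosen physical node.

```python
import bisect
import hashlib
from typing import List, Optional, Tuple


def ring_position(data: bytes) -> int:
    """Map bytes to a 128-bit position on the ring using MD5."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    """Toy consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, tokens_per_node: int = 8) -> None:
        self.tokens_per_node = tokens_per_node
        self._tokens: List[Tuple[int, str]] = []  # sorted (position, node)

    def add_node(self, node: str, tokens: Optional[int] = None) -> None:
        """Place a node at several ring positions; more tokens means more load."""
        count = self.tokens_per_node if tokens is None else tokens
        for i in range(count):
            position = ring_position(f"{node}#{i}".encode())
            bisect.insort(self._tokens, (position, node))

    def remove_node(self, node: str) -> None:
        """Drop every virtual node that belongs to the given physical node."""
        self._tokens = [t for t in self._tokens if t[1] != node]

    def preference_list(self, key: bytes, n: int) -> List[str]:
        """First n distinct physical nodes found clockwise from the key."""
        if not self._tokens:
            return []
        positions = [position for position, _ in self._tokens]
        start = bisect.bisect_right(positions, ring_position(key))
        chosen: List[str] = []
        for step in range(len(self._tokens)):
            # The modulo makes the largest position wrap to the smallest.
            _, node = self._tokens[(start + step) % len(self._tokens)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == n:
                    break
        return chosen


ring = Ring(tokens_per_node=16)
for name in ["A", "B", "C", "D"]:
    ring.add_node(name)
ring.add_node("big", tokens=32)  # a more capable host takes more tokens

before = ring.preference_list(b"cart:42", 3)
ring.remove_node(before[0])
after = ring.preference_list(b"cart:42", 3)
print(before, after)
```

Removing the first node of a preference list leaves the remaining replicas in place and only pulls in the next node clockwise, which is the locality property the Partitioning Algorithm slide lists as the advantage of consistent hashing.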
