    High Performance Cluster Computing Architectures and Systems.ppt

    • Resource ID: 372910       File size: 1.22 MB        Pages: 89

    High Performance Cluster Computing Architectures and Systems
    Hai Jin, Cluster and Grid Computing Lab

    Cluster Computing at a Glance
      • Introduction
      • Scalable Parallel Computer Architecture
      • Towards Low Cost Parallel Computing and Motivations
      • Windows of Opportunity
      • A Cluster Computer and its Architecture
      • Clusters Classifications
      • Commodity Components for Clusters
      • Network Service/Communications SW
      • Cluster Middleware and Single System Image
      • Resource Management and Scheduling (RMS)
      • Programming Environments and Tools
      • Cluster Applications
      • Representative Cluster Systems
      • Cluster of SMPs (CLUMPS)
      • Summary and Conclusions

    Introduction
      • Need more computing power
          - Improve the operating speed of processors & other components: constrained by the speed of light, thermodynamic laws, & the high financial costs for processor fabrication
          - Connect multiple processors together & coordinate their computational efforts: parallel computers allow the sharing of a computational task among multiple processors

    How to Run Applications Faster?
      • There are 3 ways to improve performance
          - Work Harder
          - Work Smarter
          - Get Help
      • Computer Analogy
          - Using faster hardware
          - Optimized algorithms and techniques used to solve computational tasks
          - Multiple computers to solve a particular task

    Two Eras of Computing

    Scalable Parallel Computer Architectures
      • Taxonomy based on how processors, memory & interconnect are laid out
          - Massively Parallel Processors (MPP)
          - Symmetric Multiprocessors (SMP)
          - Cache-Coherent Nonuniform Memory Access (CC-NUMA)
          - Distributed Systems
          - Clusters

    Scalable Parallel Computer Architectures
      • MPP
          - A large parallel processing system with a shared-nothing architecture
          - Consist of several hundred nodes with a high-speed interconnection network/switch
          - Each node consists of a main memory & one or more processors
          - Runs a separate copy of the OS
      • SMP
          - 2-64 processors today
          - Shared-everything architecture
          - All processors share all the global resources available
          - Single copy of the OS runs on these systems

    Scalable Parallel Computer Architectures
      • CC-NUMA
          - a scalable multiprocessor system having a cache-coherent nonuniform memory access architecture
          - every processor has a global view of all of the memory
      • Distributed systems
          - considered conventional networks of independent computers
          - have multiple system images as each node runs its own OS
          - the individual machines could be combinations of MPPs, SMPs, clusters, & individual computers

      • Clusters
          - a collection of workstations or PCs that are interconnected by a high-speed network
          - work as an integrated collection of resources
          - have a single system image spanning all its nodes

    Key Characteristics of Scalable Parallel Computers

    Towards Low Cost Parallel Computing
      • Parallel processing
          - linking together 2 or more computers to jointly solve some computational problem
          - an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards networks of workstations
          - the rapid improvement in the availability of commodity high performance components for workstations and networks
      • Low-cost commodity supercomputing
          - from specialized traditional supercomputing platforms to cheaper, general purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations
          - need for standardization of many of the tools and utilities used by parallel applications (e.g., MPI, HPF)

    Motivations of using NOW over Specialized Parallel Computers
      • Individual workstations are becoming increasingly powerful
      • Communication bandwidth between workstations is increasing and latency is decreasing
      • Workstation clusters are easier to integrate into existing networks
      • Typical low user utilization of personal workstations
      • Development tools for workstations are more mature
      • Workstation clusters are cheap and readily available
      • Clusters can be easily grown

    Trend
      • Workstations with UNIX for science & industry vs PC-based machines for administrative work & word processing
      • A rapid convergence in processor performance and kernel-level functionality of UNIX workstations and PC-based machines

    Windows of Opportunities
      • Parallel Processing
          - Use multiple processors to build MPP/DSM-like systems for parallel computing
      • Network RAM
          - Use memory associated with each workstation as aggregate DRAM cache
      • Software RAID
          - Redundant array of inexpensive disks
          - Possible to provide parallel I/O support to applications
          - Use arrays of workstation disks to provide cheap, highly available, and scalable file storage
      • Multipath Communication
          - Use multiple networks for parallel data transfer between nodes

    Cluster Computer and its Architecture
      • A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource
      • A node
          - a single or multiprocessor system with memory, I/O facilities, & OS
      • A cluster
          - generally 2 or more computers (nodes) connected together
          - in a single cabinet, or physically separated & connected via a LAN
          - appears as a single system to users and applications
          - provides a cost-effective way to gain features and benefits

    Cluster Computer Architecture

    Prominent Components of Cluster Computers (I)
      • Multiple High Performance Computers
          - PCs
          - Workstations
          - SMPs (CLUMPS)
          - Distributed HPC Systems leading to Metacomputing

    Prominent Components of Cluster Computers (II)
      • State of the art Operating Systems
          - Linux (Beowulf)
          - Microsoft NT (Illinois HPVM)
          - SUN Solaris (Berkeley NOW)
          - IBM AIX (IBM SP2)
          - HP UX (Illinois - PANDA)
          - Mach (Microkernel based OS) (CMU)
          - Cluster Operating Systems (Solaris MC, SCO Unixware, MOSIX (academic project))
          - OS gluing layers (Berkeley Glunix)

    Prominent Components of Cluster Computers (III)
      • High Performance Networks/Switches
          - Ethernet (10 Mbps)
          - Fast Ethernet (100 Mbps)
          - Gigabit Ethernet (1 Gbps)
          - SCI (Dolphin; MPI; 12 microsecond latency)
          - Myrinet (2 Gbps)
          - Infiniband (10 Gbps)
          - ATM
          - Digital Memory Channel
          - FDDI

    Prominent Components of Cluster Computers (IV)
      • Network Interface Card
          - Myrinet has NIC
          - User-level access support
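
As a rough illustration of the link rates listed under High Performance Networks/Switches above, the Python sketch below computes how long it takes just to serialize a message onto each link. It ignores latency, protocol overhead, and contention, so the results are lower bounds rather than measured figures. The names `LINK_RATES_BPS`, `serialization_time`, and `transfer_table` are made up for this example, and the 1 MB message size is an arbitrary assumption.

```python
# Nominal link rates in bits per second, copied from the slide above.
# Networks listed without a rate (ATM, FDDI, Memory Channel, SCI) are omitted.
LINK_RATES_BPS = {
    "Ethernet": 10e6,
    "Fast Ethernet": 100e6,
    "Gigabit Ethernet": 1e9,
    "Myrinet": 2e9,
    "Infiniband": 10e9,
}


def serialization_time(num_bytes, rate_bps):
    """Seconds needed to push num_bytes onto a link running at rate_bps."""
    return num_bytes * 8 / rate_bps


def transfer_table(num_bytes):
    """Map each listed network to its ideal serialization time in seconds."""
    return {
        name: serialization_time(num_bytes, rate)
        for name, rate in LINK_RATES_BPS.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: one 1 MB (10**6 bytes) message per link type.
    for name, seconds in transfer_table(1_000_000).items():
        print(f"{name:17s} {seconds * 1e3:9.3f} ms")
```

Each step up the list shortens the ideal transfer time in proportion to the link rate; real clusters see less than this because of the per-message latency and software overhead discussed on later slides.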


    Prominent Components of Cluster Computers (V)
      • Fast Communication Protocols and Services
          - Active Messages (Berkeley)
          - Fast Messages (Illinois)
          - U-net (Cornell)
          - XTP (Virginia)

    Prominent Components of Cluster Computers (VI)
      • Cluster Middleware
          - Single System Image (SSI)
          - System Availability (SA) Infrastructure
      • Hardware
          - DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques
      • Operating System Kernel/Gluing Layers
          - Solaris MC, Unixware, GLUnix
      • Applications and Subsystems
          - Applications (system management and electronic forms)
          - Runtime systems (software DSM, PFS etc.)
          - Resource management and scheduling software (RMS): CODINE, LSF, PBS, NQS, etc.

    Prominent Components of Cluster Computers (VII)
      • Parallel Programming Environments and Tools
          - Threads (PCs, SMPs, NOW): POSIX Threads, Java Threads
          - MPI: Linux, NT, on many Supercomputers
          - PVM
          - Software DSMs (Shmem)
          - Compilers: C/C++/Java
          - Parallel programming with C++ (MIT Press book)
          - RAD (rapid application development tools): GUI based tools for PP modeling
          - Debuggers
          - Performance Analysis Tools
          - Visualization Tools

    Prominent Components of Cluster Computers (VIII)
      • Applications
          - Sequential
          - Parallel / Distributed (Cluster-aware app.)
      • Grand Challenge applications
          - Weather Forecasting
          - Quantum Chemistry
          - Molecular Biology Modeling
          - Engineering Analysis (CAD/CAM)
          - ...
      • PDBs, web servers, data-mining

    Key Operational Benefits of Clustering
      • High Performance
      • Expandability and Scalability
      • High Throughput
      • High Availability

    Clusters Classification (I)
      • Application Target
          - High Performance (HP) Clusters: Grand Challenge applications
          - High Availability (HA) Clusters: Mission critical applications

    Clusters Classification (II)
      • Node Ownership
          - Dedicated Clusters
          - Non-dedicated clusters: adaptive parallel computing, communal multiprocessing

    Clusters Classification (III)
      • Node Hardware
          - Clusters of PCs (CoPs)
          - Piles of PCs (PoPs)
          - Clusters of Workstations (COWs)
          - Clusters of SMPs (CLUMPs)

    Clusters Classification (IV)
      • Node Operating System
          - Linux Clusters (e.g., Beowulf)
          - Solaris Clusters (e.g., Berkeley NOW)
          - NT Clusters (e.g., HPVM)
          - AIX Clusters (e.g., IBM SP2)
          - SCO/Compaq Clusters (Unixware)
          - Digital VMS Clusters
          - HP-UX clusters
          - Microsoft Wolfpack clusters

    Clusters Classification (V)
      • Node Configuration
          - Homogeneous Clusters: all nodes have similar architectures and run the same OS
          - Heterogeneous Clusters: nodes have different architectures and run different OSs

    Clusters Classification (VI)
      • Levels of Clustering
          - Group Clusters (#nodes: 2-99): nodes are connected by a SAN like Myrinet
          - Departmental Clusters (#nodes: 10s to 100s)
          - Organizational Clusters (#nodes: many 100s)
          - National Metacomputers (WAN/Internet-based)
          - International Metacomputers (Internet-based, #nodes: 1000s to many millions)
      • Metacomputing
      • Web-based Computing
      • Agent Based Computing
          - Java plays a major role in web and agent based computing

    Commodity Components for Clusters (I)
      • Processors
          - Intel x86 Processors: Pentium Pro, Pentium Xeon
          - AMD x86, Cyrix x86, etc.
          - Digital Alpha: the Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip
          - IBM PowerPC
          - Sun SPARC
          - SGI MIPS
          - HP PA
          - Berkeley Intelligent RAM (IRAM) integrates processor and DRAM onto a single chip

    Commodity Components for Clusters (II)
      • Memory and Cache
          - Single In-line Memory Module (SIMM)
          - Extended Data Out (EDO): allows the next access to begin while the previous data is still being read
          - Fast page: allows multiple adjacent accesses to be made more efficiently
          - Access to DRAM is extremely slow compared to the speed of the processor
          - The very fast memory used for cache is expensive, & cache control circuitry becomes more complex as the size of the cache grows
          - Within Pentium-based machines, it is common to have a 64-bit wide memory bus as well as a chip set that supports 2 Mbytes of external cache

    Commodity Components for Clusters (III)
      • Disk and I/O
          - Overall improvement in disk access time has been less than 10% per year
          - Amdahl's law: the speed-up obtained from faster processors is limited by the slowest system component
          - Parallel I/O: carry out I/O operations in parallel, supported by a parallel file system based on hardware or software RAID

    Commodity Components for Clusters (IV)
      • System Bus
          - ISA bus (AT bus): clocked at 5 MHz and 8 bits wide; later clocked at 13 MHz and 16 bits wide
          - VESA bus: 32-bit bus matched to the system's clock speed
          - PCI bus: 133 Mbytes/s transfer rate; adopted both in Pentium-based PCs and non-Intel platforms (e.g., Digital Alpha Server)

    Commodity Components for Clusters (V)
      • Cluster Interconnects
          - Communicate over high-speed networks using a standard networking protocol such as TCP/IP or a low-level protocol such as AM
      • Standard Ethernet
          - 10 Mbps
          - cheap, easy way to provide file and printer sharing
          - bandwidth & latency are not balanced with the computational power
      • Ethernet, Fast Ethernet, and Gigabit Ethernet
          - Fast Ethernet: 100 Mbps
          - Gigabit Ethernet: preserves Ethernet's simplicity; delivers a very high bandwidth to aggregate multiple Fast Ethernet segments

    Commodity Components for Clusters (VI)
      • Cluster Interconnects
      • Asynchronous Transfer Mode (ATM)
          - Switched virtual-circuit technology
          - Cell (small fixed-size data packet)
          - use optical fiber: expensive upgrade
          - telephone style cables (CAT-3) & better quality cable (CAT-5)
      • Scalable Coherent Interfaces (SCI)
          - IEEE 1596-1992 standard aimed at providing a low-latency distributed shared memory across a cluster
          - Point-to-point architecture with directory-based cache coherence: reduces the delay of interprocessor communication; eliminates the need for runtime layers of software protocol-paradigm translation; less than 12 usec zero message-length latency on Sun SPARC
          - Designed to support distributed multiprocessing with high bandwidth and low latency
          - SCI cards for SPARC's SBus and PCI-based SCI cards from Dolphin
          - Scalability constrained by the current generation of switches & relatively expensive components

    Commodity Components for Clusters (VII)
      • Cluster Interconnects
      • Myrinet
          - 1.28 Gbps full duplex interconnection network
          - Uses low latency cut-through routing switches, which are able to offer fault tolerance by automatic mapping of the network configuration
          - Supports both Linux & NT
      • Advantages
          - Very low latency (5 µs, one-way point-to-point)
          - Very high throughput
          - Programmable on-board processor for greater flexibility
      • Disadvantages
          - Expensive: $1500 per host
          - Complicated scaling: switches with more than 128 ports are unavailable

    Commodity Components for Clusters (VIII)
      • Operating Systems
      • 2 fundamental services for users
          - make the computer hardware easier to use: create a virtual machine that differs markedly from the real machine
          - share hardware resources among users: Processor - multitasking

      • The new concept in OS services
          - support multiple threads of control in a process itself: parallelism within a process (multithreading)
          - POSIX thread interface is a standard programming environment
      • Trend
          - Modularity: MS Windows, IBM OS/2
          - Microkernel: provide only essential OS services
          - high level abstraction of OS, portability

    Commodity Components for Clusters (IX)
      • Operating Systems
      • Linux
          - UNIX-like OS
          - Runs on cheap x86 platforms, yet offers the power and flexibility of UNIX
          - Readily available on the Internet and can be downloaded without cost
          - Easy to fix bugs and improve system performance
          - Users can develop or fine-tune hardware drivers, which can easily be made available to other users
          - Features such as preemptive multitasking, demand-paged virtual memory, multiuser, multiprocessor support

    Commodity Components for Clusters (X)
      • Operating Systems
      • Solaris
          - UNIX-based multithreading and multiuser OS
          - Supports Intel x86 & SPARC-based platforms
          - Real-time scheduling feature critical for multimedia applications
          - Supports two kinds of threads: Light Weight Processes (LWPs) and user level threads
          - Supports both BSD and several non-BSD file systems: CacheFS, AutoClient, TmpFS (uses main memory to contain a file system), Proc file system, Volume file system
          - Supports distributed computing & is able to store & retrieve distributed information
          - OpenWindows allows applications to be run on remote systems

    Commodity Components for Clusters (XI)
      • Operating Systems
      • Microsoft Windows NT (New Technology)
          - Preemptive, multitasking, multiuser, 32-bit OS
          - Object-based security model and special file system (NTFS) that allows permissions to be set on a file and directory basis
          - Supports multiple CPUs and provides multitasking using symmetrical multiprocessing
          - Supports different CPUs and multiprocessor machines with threads
          - Has the network protocols & services integrated with the base OS: several built-in networking protocols (IPX/SPX, TCP/IP, NetBEUI) & APIs (NetBIOS, DCE RPC, Windows Sockets (Winsock))

    Windows NT 4.0 Architecture

    Network Services/Communication SW
      • Communication infrastructure supports protocols for
          - Bulk-data transport
          - Streaming data
          - Group communications
      • Communication service provides the cluster with important QoS parameters
          - Latency
          - Bandwidth
          - Reliability
          - Fault-tolerance
          - Jitter control
      • Network services are designed as a hierarchical stack of protocols with a relatively low-level communication API, providing the means to implement a wide range of communication methodologies
          - RPC
          - DSM
          - Stream-based and message passing interface (e.g., MPI, PVM)
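
Latency and bandwidth, two of the QoS parameters listed above, interact through a simple first-order cost model: time = latency + size / bandwidth. The Python sketch below illustrates that assumed model; it is not taken from the slides, and the function names are hypothetical. The example numbers assume the Myrinet figures quoted earlier in the deck (1.28 Gbps, with the one-way latency read as about 5 microseconds).

```python
def transfer_time(num_bytes, latency_s, bandwidth_bps):
    """Seconds to deliver a message under the model: latency + size / bandwidth."""
    return latency_s + num_bytes * 8 / bandwidth_bps


def effective_bandwidth(num_bytes, latency_s, bandwidth_bps):
    """Bits per second actually achieved for one message of num_bytes."""
    return num_bytes * 8 / transfer_time(num_bytes, latency_s, bandwidth_bps)


def half_power_bytes(latency_s, bandwidth_bps):
    """Message size (bytes) at which half of the nominal bandwidth is reached.

    At that size the latency term equals the size / bandwidth term.
    """
    return latency_s * bandwidth_bps / 8


if __name__ == "__main__":
    # Assumed example values: Myrinet-like link, 1.28 Gbps and 5 microseconds.
    latency, bandwidth = 5e-6, 1.28e9
    for size in (64, 1024, 65536):
        rate = effective_bandwidth(size, latency, bandwidth)
        print(f"{size:6d} bytes -> {rate / 1e6:8.1f} Mbps effective")
    print("half-power message size (bytes):", half_power_bytes(latency, bandwidth))
```

Under this model a link reaches half of its nominal bandwidth only at a message size of latency × bandwidth (about 800 bytes for the assumed values), which is why low latency matters so much for the small messages typical of fine-grained parallel programs.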

