Architecture of Parallel Computers, CSC / ECE 506
OpenFabrics Alliance
Lecture 18, 7/17/2006
Dr. Steve Hunter

Outline
- InfiniBand and Ethernet Review
- DDP and RDMA
- OpenFabrics Alliance
- IP over InfiniBand (IPoIB)
- Sockets Direct Protocol (SDP)
- Network File System (NFS)
- SCSI RDMA Protocol (SRP)
- iSCSI Extensions for RDMA (iSER)
- Reliable Datagram Sockets (RDS)

InfiniBand Goals - Review
- Interconnect for server I/O and efficient interprocess communications
- Standard across the industry, backed by all the major players: 200+ companies
- With an architecture able to match future systems:
  - Low overhead
  - Scalable bandwidth, up and down
  - Scalable fanout, few to thousands
  - Low cost, excellent price/performance
  - Robust reliability, availability, and serviceability
  - Leverages Internet Protocol suite and paradigms

The Basic Unit: an IB Subnet - Review
- The basic whole IB system is a subnet
- Elements: endnodes, links, switches
- What it does: lets endnodes communicate with endnodes via message queues, which process messages over several transport types. Messages are SARed (segmented and reassembled) into packets, which are placed on links and routed by switches.
- (Figure: end nodes attached to switches, with switches and end nodes connected by links.)

End Node Attachment to IB - Review
- End nodes attach to IB via Channel Adapters:
- Host CAs (HCAs)
  - O/S API/KPIs not specified
  - Queues and memory accessible via verbs
  - QP, CQ, and RDMA engines
  - Must support three IB Transports
  - Can include:
    - Dual ports: load balancing, availability (path migration), attach to same or different subnets
    - Partitioning
    - Atomics
- Target CAs (TCAs)
  - Queue access method is vendor unique
  - QP and CQ engines
  - Need only support Unreliable Datagram
  - ULP can be standard or proprietary
  - In other words: a smaller subset of required functions

InfiniBand Summary
- InfiniBand architecture is a very high performance, low latency interconnect technology based on an industry-standard approach to Remote Direct Memory Access (RDMA)
- An InfiniBand fabric is built from hardware and software that are configured, monitored and operated to deliver a variety of services to users and applications
- Characteristics of the technology that differentiate it from comparable interconnects such as traditional Ethernet include:
  - End-to-end reliable delivery
  - Scalable bandwidths from 10 to 60 Gbps available today, moving to 120 Gbps in the near future
  - Scalability without performance degradation
  - Low latency between devices
  - Greatly reduced server CPU utilization for protocol processing
  - Efficient I/O channel architecture for network and storage virtualization

Advanced Ethernet - Review
- TCP/IP Model (layer: examples), top to bottom:
  - Application: HTTP, SMTP, FTP
  - Transport: TCP, UDP
  - Network: IP
  - Link: Ethernet
  - Physical: Copper, Optical
- iSER / RNIC Model, shown with SCSI application, top to bottom:
  - SCSI app
  - Internet SCSI (iSCSI)
  - iSCSI Extensions for RDMA (iSER)
  - Remote Direct Memory Access Protocol (RDMAP)
  - Direct Data Placement (DDP)
  - Markers with PDU Alignment (MPA)
  - Transmission Control Protocol (TCP)
  - Internet Protocol (IP)
  - Media Access Control (MAC)
  - Physical
- Other labels in the figure: RDMA NIC (RNIC), SCSI, MAC Service, IP Service, TCP Service, RDMA Service, SCSI Service
- It is expected that the OpenFabrics effort (i.e., the OpenIB / OpenRDMA merger) will bring even more advanced functions into NIC technology

Advanced Ethernet Summary
- The iWARP technology, implemented as an RDMA Network Interface Card (RNIC), achieves zero-copy, RDMA, and protocol offload over existing TCP/IP networks
- It was demonstrated that a 10 GbE based RNIC can reduce the CPU processing overhead from 80-90% to less than 10% compared to its host stack equivalent
- Additionally, its achievable end-to-end latency is now 5 microseconds or less
- iWARP together with the emerging low latency (low hundreds of nanoseconds) 10 GbE switches can also provide a powerful infrastructure for clustered computing, server-to-server processing, visualization and file systems
- The advantages of the iWARP technology include its ability to leverage the widely deployed TCP/IP infrastructure, its broad knowledge base, and mature management and monitoring capabilities
- In addition, an iWARP infrastructure is a routable infrastructure, thereby eliminating the need for gateways to connect to the LAN or WAN internet

DDP and RDMA
- IETF RFC http:/
- The central idea of general-purpose DDP is that a data sender will supplement the data it sends with placement information that allows the receiver's network interface to place the data directly at its final destination without any copying
- DDP can be used to steer received data to its final destination, without requiring layer-specific behavior for each different layer
- Data sent with such DDP information is said to be tagged
- The central components of the DDP architecture are the "buffer", which is an object with beginning and ending addresses, and a method (set()), which sets the value of an octet at an address
- In many cases, a buffer corresponds directly to a portion of host user memory. However, DDP does not depend on this; a buffer could be a disk file, or anything else that can be viewed as an addressable collection of octets

DDP and RDMA
- Remote Direct Memory Access (RDMA) extends the capabilities of DDP with two primary functions:
  1. It adds the ability to read from buffers registered to a socket (RDMA Read). This allows a client protocol to perform arbitrary, bidirectional data movement without involving the remote client. When RDMA is implemented in hardware, arbitrary data movement can be performed without involving the remote host CPU at all.
  2. RDMA specifies a transport-independent untagged message service (Send) with characteristics that are both very efficient to implement in hardware, and convenient for client protocols.
- The RDMA architecture is patterned after the traditional model for device programming, where the client requests an operation using Send-like actions (programmed I/O), the server performs the necessary data transfers for the operation (DMA reads and writes), and notifies the client of completion
- The programmed I/O + DMA model efficiently supports a high degree of concurrency and flexibility for both the client and server, even when operations have a wide range of intrinsic latencies

OpenFabrics Alliance
- The OpenFabrics Alliance is an international organization comprised of industry, academic and research groups that have developed a unified core of open source software stacks (OpenSTAC) leveraging RDMA architectures for both the Linux and Windows operating systems over both InfiniBand and Ethernet
- RDMA is a communications technique allowing data to be transmitted from the memory of one computer to the memory of another computer without passing through either device's CPU, without needing extensive buffering, and without calling into an operating system kernel
- The core OpenSTAC software supports all the well known standard upper layer protocols such as MPI, IP, SDP, NFS, SRP, iSER, and RDS on top of Ethernet and InfiniBand (IB) infrastructures
- The OpenFabrics software and supporting services better enable low-latency InfiniBand and 10 GbE to deliver ...
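
Sketch: segmentation and reassembly (SAR)

The IB subnet slide says messages are SARed into packets before being placed on links. The following is a minimal sketch of that idea only, not the InfiniBand wire format. The `Packet` fields (`msg_id`, `seq`, `last`) and the `mtu` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    msg_id: int     # hypothetical message identifier
    seq: int        # position of this packet within its message
    last: bool      # True only on the final packet of the message
    payload: bytes


def segment(msg_id: int, message: bytes, mtu: int) -> List[Packet]:
    """Split a message into packets carrying at most `mtu` payload bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    # An empty message still produces one (empty) packet.
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [
        Packet(msg_id, seq, seq == len(chunks) - 1, chunk)
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message, tolerating packets that arrive out of order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    complete = (
        bool(ordered)
        and [p.seq for p in ordered] == list(range(len(ordered)))
        and ordered[-1].last
    )
    if not complete:
        raise ValueError("missing or duplicated packet")
    return b"".join(p.payload for p in ordered)
```

For example, `reassemble(segment(1, data, 2048))` returns `data` unchanged for any byte string `data`.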
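
Sketch: where 10, 60 and 120 Gbps come from

The InfiniBand Summary slide quotes bandwidths of 10 to 60 Gbps, moving to 120 Gbps. One plausible reading, which is an assumption here and is not stated on the slide, is that these are signaling rates of standard link widths (1x, 4x, 12x) at the SDR, DDR and QDR per-lane speeds. The short calculation below works that out and also applies the 8b/10b encoding overhead those speeds use.

```python
# Per-lane signaling rates in Gbps for the three link speeds.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
WIDTHS = (1, 4, 12)  # lanes per link: 1x, 4x, 12x


def signaling_rate_gbps(width: int, speed: str) -> float:
    """Raw line rate of a link: lanes times per-lane signaling rate."""
    if width not in WIDTHS:
        raise ValueError("width must be 1, 4 or 12")
    return width * LANE_GBPS[speed]


def data_rate_gbps(width: int, speed: str) -> float:
    """Usable data rate after 8b/10b encoding (8 data bits per 10 line bits)."""
    return signaling_rate_gbps(width, speed) * 8 / 10
```

Under this reading, a 4x SDR link signals at 10 Gbps, a 12x DDR link at 60 Gbps, and a 12x QDR link at 120 Gbps.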
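
Sketch: DDP tagged placement

The first DDP and RDMA slide describes a buffer as an object with beginning and ending addresses and a set() method that writes one octet at an address. The sketch below models that in plain Python. The `Receiver` class, the integer `tag`, and the choice to make `offset` relative to the start of the buffer are assumptions for illustration and are not taken from the slides.

```python
class Buffer:
    """An addressable collection of octets with begin/end addresses."""

    def __init__(self, begin: int, end: int):
        if end < begin:
            raise ValueError("end must not precede begin")
        self.begin = begin
        self.end = end                      # exclusive
        self._octets = bytearray(end - begin)

    def set(self, address: int, value: int) -> None:
        """Set the value of the octet at `address`."""
        if not self.begin <= address < self.end:
            raise IndexError("address outside buffer")
        self._octets[address - self.begin] = value

    def contents(self) -> bytes:
        return bytes(self._octets)


class Receiver:
    """Stands in for the receiving network interface (hypothetical)."""

    def __init__(self):
        self._buffers = {}                  # tag -> Buffer

    def register(self, tag: int, buffer: Buffer) -> None:
        self._buffers[tag] = buffer

    def place(self, tag: int, offset: int, payload: bytes) -> None:
        """Place tagged data directly at its final destination."""
        buf = self._buffers[tag]
        start = buf.begin + offset
        # Check the whole range first so a bad segment places nothing.
        if offset < 0 or start + len(payload) > buf.end:
            raise IndexError("placement outside buffer")
        for i, octet in enumerate(payload):
            buf.set(start + i, octet)
```

Because each segment carries its own placement information, segments can be placed in any order: placing `b"WXYZ"` at offset 4 and then `b"abcd"` at offset 0 of an 8-octet buffer leaves `b"abcdWXYZ"` with no intermediate copy or reordering queue.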
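
Sketch: the programmed I/O + DMA model

The second DDP and RDMA slide describes a pattern in which the client requests an operation with a Send, the server moves the data itself, and the server then notifies the client of completion. The following single-process simulation illustrates that sequence. The `Node` class, the dictionary message layout, and the `storage` mapping are all hypothetical; this is not the verbs API or any real RDMA library.

```python
class Node:
    """A host with registered memory and a queue of received Send messages."""

    def __init__(self, name: str):
        self.name = name
        self.registered = {}    # tag -> bytearray exposed for remote access
        self.inbox = []         # untagged Send messages, in arrival order

    def register(self, tag: int, size: int) -> None:
        self.registered[tag] = bytearray(size)

    def send(self, peer: "Node", message: dict) -> None:
        """Untagged message: the receiver decides where it is queued."""
        peer.inbox.append(message)

    def rdma_write(self, peer: "Node", tag: int, offset: int, data: bytes) -> None:
        """Tagged transfer straight into the peer's registered buffer."""
        target = peer.registered[tag]
        if offset < 0 or offset + len(data) > len(target):
            raise IndexError("RDMA Write outside registered buffer")
        target[offset:offset + len(data)] = data


def serve_one(server: Node, client: Node, storage: dict) -> None:
    """Handle one request: transfer the data, then signal completion."""
    request = server.inbox.pop(0)
    data = storage[request["name"]][:request["length"]]
    server.rdma_write(client, request["tag"], 0, data)               # DMA step
    server.send(client, {"op": "complete", "length": len(data)})     # notify


def demo() -> Node:
    client, server = Node("client"), Node("server")
    client.register(tag=1, size=16)
    # Programmed I/O step: a small Send that describes the operation.
    client.send(server, {"op": "read", "name": "greeting", "tag": 1, "length": 5})
    serve_one(server, client, {"greeting": b"hello world"})
    return client
```

After `demo()` runs, the client's registered buffer starts with `b"hello"` and its inbox holds a single completion message. The client code never copies the payload; it only issues the request and observes the completion.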