Multicore Programming A (143-slide deck)
Parallel Programming with MPI

Agenda
- Part I: Seeking Parallelism/Concurrency
- Part II: Parallel Algorithm Design
- Part III: Message-Passing Programming

Part I: Seeking Parallelism/Concurrency

Outline
- 1 Introduction
- 2 Seeking Parallelism

1 Introduction

"Well done is quickly done." - Caesar Augustus. Fast, fast, fast is not fast enough. How do we get higher performance? Parallel computing.

What is parallel computing?
- The use of a parallel computer to reduce the time needed to solve a single computational problem.
- Now considered a standard way for computational scientists and engineers to solve problems in areas as diverse as galactic evolution, climate modeling, aircraft design, molecular dynamics, and economic analysis.

In parallel computing, a computation is broken down into tasks performed by separate workers or processes, and the processes interact by exchanging information. What do we basically need?
- The ability to start the tasks.
- A way for the tasks to communicate.

What is a parallel computer? A multi-processor computer system supporting parallel programming. Two common forms:
- Multicomputer: a parallel computer constructed out of multiple computers and an interconnection network; the processors on different computers interact by passing messages to each other.
- Centralized multiprocessor (SMP, symmetric multiprocessor): a more highly integrated system in which all CPUs share access to a single global memory; the shared memory supports communication and synchronization among processors.

Multi-core platform: two, four, or more cores integrated in one processor. Each core has its own registers and level-1 cache; all cores share the level-2 cache, which supports communication and synchronization among the cores; and all cores share access to a global memory.
What is parallel programming? Programming in a language that lets you explicitly indicate how different portions of the computation may be executed in parallel/concurrently by different processors/cores.

Do I really need parallel programming? Yes, because although a lot of research has been invested and many experimental parallelizing compilers have been developed, there is still no commercial parallelizing compiler thus far. The alternative is to write your own parallel programs.

Why should I program using MPI and OpenMP?
- MPI (Message Passing Interface) is a standard specification for message-passing libraries. It is available on virtually every parallel computer system, and it is free. If you develop programs using MPI, you will be able to reuse them when you get access to a newer, faster parallel computer.
- On a multi-core platform or SMP, the cores/CPUs have a shared memory space. While MPI is a perfectly satisfactory way for cores/processors to communicate, OpenMP is a better way for the cores/CPUs within a single processor/SMP to interact.
- A hybrid MPI/OpenMP program can achieve even higher performance.

2 Seeking Parallelism

To take advantage of multiple cores/processors, programmers must be able to identify operations that may be performed in parallel. Several ways to find them:
- Data dependence graphs
- Data parallelism
- Functional parallelism
- Pipelining

Data dependence graphs
- A directed graph: each vertex represents a task to be completed; an edge from vertex u to vertex v means task u must be completed before task v begins (task v is dependent on task u).
- If there is no path from u to v, the tasks are independent and may be performed in parallel.
(figure: example data dependence graphs)

Data parallelism: independent tasks applying the same operation to different elements of a data set. (figure)

Functional parallelism: independent tasks applying different operations to different data elements of a data set.
Pipelining
- A data dependence graph forming a simple path/chain admits no parallelism if only a single problem instance must be processed.
- If multiple problem instances are to be processed, and the computation can be divided into several stages that each take the same amount of time, then a pipeline can support parallelism. E.g., an assembly line.

(figure: pipelining)

For example: landscape maintenance, preparing dinner, data clustering.

Homework
- Given a task that can be divided into m subtasks, each requiring one unit of time, how much time is needed for an m-stage pipeline to process n tasks?
- Consider the data dependence graph in the figure below: identify all sources of data parallelism; identify all sources of functional parallelism.

(figure: data dependence graph for the homework)

Part II: Parallel Algorithm Design

Outline
- 1 Introduction
- 2 The Task/Channel Model
- 3 Foster's Design Methodology

1 Introduction

Foster, Ian.
Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Reading, MA: Addison-Wesley, 1995.
- Describes the task/channel model and works through a few simple problems.

2 The Task/Channel Model

The model represents a parallel computation as a set of tasks that may interact with each other by sending messages through channels.
- Task: a program, its local memory, and a collection of I/O ports. The local memory holds the program's instructions and its private data.
- Channel: a task can send local data values to other tasks via output ports, and can receive data values from other tasks via input ports. A channel is a message queue that connects one task's output port with another task's input port. Data values appear at the input port in the same order in which they were placed in the output port at the other end of the channel.
- Receiving data can block: receiving is synchronous. Sending data never blocks: sending is asynchronous.
- Access to local memory is faster than access to non-local data.

3 Foster's Design Methodology

A four-step process: partitioning, communication, agglomeration, mapping.

Partitioning is the process of dividing the computation and the data into pieces; more small pieces is better. Two approaches:
- Domain decomposition (data-centric): first divide the data into pieces, then determine how to associate computations with the data. Focus on the largest and/or most frequently accessed data structure in the program. E.g., a 3-D array can be decomposed 1-D, 2-D, or 3-D; the finer decompositions yield more primitive tasks and are better.
- Functional decomposition (computation-centric): yields collections of tasks that achieve parallelism through pipelining. E.g., a system supporting interactive image-guided surgery.
Evaluating the quality of a partition:
- There are at least an order of magnitude more primitive tasks than processors in the target parallel computer; otherwise later design options may be too constrained.
- Redundant computations and redundant data-structure storage are minimized; otherwise the design may not work well when the size of the problem increases.
- Primitive tasks are roughly the same size; otherwise it may be hard to balance the work among the processors/cores.
- The number of tasks is an increasing function of the problem size; otherwise it may be impossible to use more processors/cores to solve larger problems.

Communication

After identifying the primitive tasks, the communication between those primitive tasks must be determined. Two kinds of communication: local and global.