Advanced Charm++ Tutorial.ppt
Slide 1: Advanced Charm++ Tutorial
  Charm Workshop Tutorial
  Sameer Kumar, Orion Sky Lawlor
  charm.cs.uiuc.edu
  2004/10/19

Slide 2: How to Become a Charm++ Hacker
  Advanced Charm++
  - Advanced Messaging
  - Writing system libraries
    - Groups
    - Delegation
    - Communication framework
  - Advanced load-balancing
  - Checkpointing
  - Threads
  - SDAG

Slide 3: Advanced Messaging

Slide 4: Prioritized Execution
  - If several messages are available, Charm will process the message with the highest priority
  - Otherwise, the oldest message (FIFO)
  - Has no effect:
    - If only one message is available (common for network-bound applications!)
    - On outgoing messages
  - Very useful for speculative work, ordering timesteps, etc.

Slide 5: Priority Classes
  - The Charm++ scheduler has three queues: high, default, and low
  - As signed integer priorities:
    - -MAXINT (highest priority) to -1
    - 0: default priority
    - 1 to +MAXINT (lowest priority)
  - As unsigned bitvector priorities:
    - 0x0000 (highest priority) to 0x7FFF
    - 0x8000: default priority
    - 0x8001 to 0xFFFF (lowest priority)

Slide 6: Prioritized Marshalled Messages
  - Pass a CkEntryOptions object as the last parameter
  - For signed integer priorities:
      CkEntryOptions opts;
      opts.setPriority(-1);
      fooProxy.bar(x, y, opts);
  - For bitvector priorities:
      CkEntryOptions opts;
      unsigned int prio[2] = {0x7FFFFFFF, 0xFFFFFFFF};
      opts.setPriority(64, prio);
      fooProxy.bar(x, y, opts);

Slide 7: Prioritized Messages
  - The number of priority bits is passed during message allocation:
      FooMsg *msg = new (size, nbits) FooMsg;
  - Priorities are stored at the end of messages
  - Signed integer priorities:
      *CkPriorityPtr(msg) = -1;
      CkSetQueueing(msg, CK_QUEUEING_IFIFO);
  - Unsigned bitvector priorities:
      CkPriorityPtr(msg)[0] = 0x7fffffff;
      CkSetQueueing(msg, CK_QUEUEING_BFIFO);

Slide 8: Advanced Message Features
  - Read-only messages
    - The entry method agrees not to modify or delete the message
    - Avoids a message copy for broadcasts, saving time
  - Expedited messages
    - Messages do not go through the Charm++ scheduler (faster)
  - Immediate messages
    - Entries are executed in an interrupt or the communication thread
    - Very fast, but tough to get right

Slide 9: Read-Only, Expedited, Immediate
  - All declared in the .ci file:
      entry [nokeep] void foo_readonly(Msg *);
      entry [expedited] void foo_exp(Msg *);
      entry [immediate] void foo_imm(Msg *);
      // Immediate messages currently work only for NodeGroups

Slide 10: Groups

Slide 11: Object Groups
  - A collection of objects (chares)
  - Also called branch office chares
  - Exactly one representative on each processor
  - Ideally suited for system libraries
  - A single proxy for the group as a whole
  - Similar to arrays: broadcasts, reductions, indexing
  - But not completely like arrays: non-migratable; one per processor

Slide 12: Declarations
  - .ci file:
      group mygroup {
        entry mygroup();            // Constructor
        entry void foo(foomsg *);   // Entry method
      };
  - C++ file:
      class mygroup : public Group {
      public:
        mygroup() {}
        void foo(foomsg *m) { CkPrintf("Do Nothing"); }
      };
Slide 13: Creating and Calling Groups
  - Creation:
      p = CProxy_mygroup::ckNew();
  - Remote invocation:
      p.foo(msg);      // broadcast
      p[1].foo(msg);   // asynchronous invocation
  - Direct local access:
      mygroup *g = p.ckLocalBranch();
      g->foo(...);     // local invocation
  - Danger: if you migrate, the group stays behind!

Slide 14: Delegation

Slide 15: Delegation
  - Enables Charm++ proxy messages to be forwarded to a delegation manager group
  - The delegation manager can trap calls to proxy sends and apply optimizations
  - The delegation manager must inherit from CkDelegateMgr
  - The user program must call proxy.ckDelegate(mgrID);

Slide 16: Delegation Interface
  - .ci file:
      group MyDelegateMgr {
        entry MyDelegateMgr();   // Constructor
      };
  - .h file:
      class MyDelegateMgr : public CkDelegateMgr {
        MyDelegateMgr();
        void ArraySend(..., int ep, void *m, const CkArrayIndexMax ...

Slide 17: Communication Optimization

Slide 18: Automatic Communication Optimizations
  - The parallel-objects runtime system can observe, instrument, and measure communication patterns
  - Communication libraries can optimize
    - By substituting the most suitable algorithm for each operation
    - Learning at runtime
  - E.g. all-to-all communication
    - Performance depends on many runtime characteristics
    - The library switches between different algorithms
  - Communication is from/to objects, not processors
    - Streaming messages optimization

Slide 19: Managing Collective Communication
  - A communication operation in which all (or most) of the processors participate
    - For example broadcast, barrier, all-reduce, all-to-all communication, etc.
    - Applications: NAMD multicast, NAMD PME, CPAIMD
  - Issues
    - Performance impediment
    - Naive implementations often do not scale
    - Synchronous implementations do not utilize the co-processor effectively

Slide 20: All to All Communication
  - All processors send data to all other processors
  - All-to-all personalized communication (AAPC): MPI_Alltoall
  - All-to-all multicast/broadcast (AAMC): MPI_Allgather

Slide 21: Strategies For AAPC
  - Short message optimizations
    - High software overhead
    - Message combining
  - Large messages
    - Network contention

Slide 22: Short Message Optimizations
  - Direct all-to-all communication is dominated by software overhead
  - Message combining for small messages
    - Reduces the total number of messages
    - Multistage algorithm that sends messages along a virtual topology
    - A group of messages is combined and sent to an intermediate processor, which then forwards them to their final destinations
    - An AAPC strategy may send the same message multiple times

Slide 23: Virtual Topology: Mesh
  - Organize processors in a 2D (virtual) mesh
  - Phase 1: processors send messages to row neighbors
  - Phase 2: processors send messages to column neighbors
  - A message from (x1,y1) to (x2,y2) goes via (x1,y2)
  - 2*sqrt(P) messages instead of P-1

Slide 24: AAPC Performance

Slide 25: Large Message Issues
  - Network contention
  - Contention-free schedules
  - Topology-specific optimizations

Slide 26: Ring Strategy for Collective Multicast
  - Performs all-to-all multicast by sending messages along a ring formed by the processors
  - Congestion free on most topologies

Slide 27: Streaming Messages
  - Programs often have streams of short messages
  - The streaming library combines a bunch of messages and sends them off
  - Strips the large Charm++ header
  - Short array message packing
  - Effective message performance of about 3 us

Slide 28: Using communication library
  - Communication optimizations are embodied as strategies
    - EachToManyMulticastStrategy
    - RingMulticast
    - PipeBroadcast
    - Streaming
    - MeshStreaming

Slide 29: Bracketed vs. Non-bracketed
  - Bracketed strategies
    - Require the user to give specific end points for each iteration of message sends
    - End points are declared by calling ComlibBegin() and ComlibEnd()
    - Examples: EachToManyMulticast
  - Non-bracketed strategies
    - No such end points necessary
    - Examples: Streaming, PipeBroadcast

Slide 30: Accessing the Communication Library
  - From mainchare::main
  - Creating a strategy:
      Strategy *strat = new EachToManyMulticastStrategy(USE_MESH);
      strat = new StreamingStrategy();
      strat->enableShortMessagePacking();
  - Associating a proxy with a strategy:
      ComlibAssociateProxy(strat, myproxy);
  - myproxy should be passed to all array elements

Slide 31: Sending Messages
      ComlibBegin(myproxy);   // Bracketed strategies
      for (...)
        myproxy.foo(msg);
      ComlibEnd();            // Bracketed strategies

Slide 32: Handling Migration
  - A migrating array element PUPs the comlib-associated proxy
      FooArray::pup(PUP::er ...

Slide 33: Compiling
  - You must include the compile-time option -module commlib

Slide 34: Advanced Load-balancers: Writing a Load-balancing Strategy

Slide 35: Advanced load balancing: Writing a new strategy
  - Inherit from CentralLB and implement the work() function
      class foolb : public CentralLB {
      public:
        void work(CentralLB::LDStats* stats, int count);
      };

Slide 36: LB Database
      struct LDStats {
        ProcStats *procs;
        LDObjData* objData;
        LD
