A Multi-Level Parallel Implementation of a Program for Finding Frequent Patterns in a Large Sparse Graph
A Multi-Level Parallel Implementation of a Program for Finding Frequent Patterns in a Large Sparse Graph
Steve Reinhardt, Interactive Supercomputing
George Karypis, Dept. of Computer Science, University of Minnesota

Outline
- Problem definition
- Prior work
- Problem and approach
- Results
- Issues and conclusions

Graph Datasets
- Flexible and powerful representation
- Evidence extraction and link discovery (EELD)
- Social networks / Web graphs
- Chemical compounds
- Protein structures
- Biological pathways
- Object recognition and retrieval
- Multi-relational datasets

Finding Patterns in Graphs: Many Dimensions
- Structure of the graph dataset
  - many small graphs: graph-transaction setting
  - one large graph: single-graph setting
- Type of patterns
  - connected subgraphs
  - induced subgraphs
- Nature of the algorithm
  - finds all patterns that satisfy the minimum support requirement: complete
  - finds some of the patterns: incomplete
- Nature of the pattern's occurrence
  - the pattern occurs exactly in the input graph: exact algorithms
  - there is a sufficiently similar embedding of the pattern in the graph: inexact algorithms
- MIS calculation for frequency
  - exact
  - approximate
  - upper bound
- Algorithm
  - vertical (depth-first)
  - horizontal (breadth-first)
Single Graph Setting
- Find all frequent subgraphs from a single sparse graph
- Choice of frequency definition

vSIGRAM: Vertical Solution
- Candidate generation by extension: add one more edge to a current embedding
- Solve MIS on embeddings in the same equivalence class
- No downward-closure-based pruning
- Two important components
  - frequency-based pruning of extensions
  - treefication based on canonical labeling

vSIGRAM: Connection Table
- Frequency-based pruning
  - Trying every possible extension is expensive and inefficient
  - A particular extension might have been tested before
  - Categorize extensions into equivalence classes (in terms of isomorphism), and record whether each class is frequent
  - If a class becomes infrequent, never try it in later exploration

Parallelization
- Two clear sources of parallelism in the algorithm
- Amount of parallelism from each source not known in advance
- The code is typical C code: structs, pointers, frequent mallocs/frees of small areas, etc.
  - nothing like the "Fortran"-like (dense linear algebra) examples shown for many parallel programming methods
- Parallel structures need to accommodate dynamic parallelism
  - dynamic specification of parallel work
  - dynamic allocation of processors to work
- Chose OpenMP taskq/task constructs
  - proposed extensions to the OpenMP standard
  - support parallel work being defined in multiple places in a program, but placed on a single conceptual queue and executed accordingly
- 20 lines of code changes in a 15,000-line program
- Electric Fence was very useful in finding coding errors

Algorithmic Parallelism

vSiGraM(G, MIS_type, f)
 1. F ← ∅
 2. F1 ← all frequent size-1 subgraphs in G
 3. for each F1 in F1 do
 4.     M(F1) ← all embeddings of F1
 5. for each F1 in F1 do                 // high-level parallelism
 6.     F ← F ∪ vSiGraM-Extend(F1, G, f)
 return F

vSiGraM-Extend(Fk, G, f)
 1. F ← ∅
 2. for each embedding m in M(Fk) do     // low-level parallelism
 3.     Ck+1 ← Ck+1 ∪ all (k+1)-subgraphs of G containing m
 4. for each Ck+1 in Ck+1 do
 5.     if Fk is not the generating parent of Ck+1 then
 6.         continue
 7.     compute Ck+1.freq from M(Ck+1)
 8.     if Ck+1.freq < f then
 9.         continue
10.     F ← F ∪ vSiGraM-Extend(Ck+1, G, f)
11. return F
Simple Taskq/Task Example

```c
main()
{
    int val;
    #pragma intel omp taskq
    {
        val = fib(12345);
    }
}

fib(int n)
{
    int i, partret[2];
    if (n > 2) {
        for (i = n-2; i < n; i++) {
            #pragma intel omp task
            partret[n-2-i] = fib(i);
        }
        return (partret[0] + partret[1]);
    } else
        return 1;
}
```

High-Level Parallelism with taskq/task

```c
/* At the bottom of expand_subgraph, after all child
 * subgraphs have been identified, start them all. */
#pragma intel omp taskq
for (ii = 0; ii < …; ii++) {
    #pragma intel omp task
    expand_subgraph(…, ct, lg, ls, o);
} // end-task
```

Low-Level Parallelism with taskq/task

```c
#pragma omp parallel shared(nt, priv_es)
{
    #pragma omp master
    {
        nt = omp_get_num_threads();   /* #threads in par */
        priv_es = (ExtensionSet **)kmp_calloc(nt, sizeof(ExtensionSet *));
    }
    #pragma omp barrier
    #pragma intel omp taskq
    for (i = 0; i < sg_vmap_size(sg); i++) {
        #pragma intel omp task captureprivate(i)
        {
            int th = omp_get_thread_num();
            if (priv_es[th] == NULL)
                priv_es[th] = exset_init(128);
            expand_map(sg, ct, ams, i, priv_es[th], lg);
        }
    }
}
/* end parallel section; next loop is serial reduction */
for (i = 0; i < nt; i++) {
    if (priv_es[i] != NULL)
        exset_merge(priv_es[i], es);
}
kmp_free(priv_es);
```

Implementation due to Grant Haab and colleagues from the Intel OpenMP library group.

Experimental Results
- SGI Altix: 32 Itanium2 sockets (64 cores), 1.6 GHz
- 64 GBytes (though not memory limited)
- Linux
- No special dplace/cpuset configuration
- Minimum frequencies chosen to illuminate scaling behavior, not provide maximum performance

Dataset 1 - Chemical

Dataset 2 - Aviation

Performance of High-Level Parallelism
- When sufficient quantity of work (i.e., frequency threshold is low enough)
  - good speed-ups to 16P
  - reasonable speed-ups to 30P
  - little or no benefit above 30P
- No insight into performance plateau

Poor Performance of Low-Level Parallelism
- Several possible effects ruled out
  - granularity of data allocation
  - barrier before master-only reduction
- Source: highly variable times for register_extension
  - 100X slower in parallel than serial, but different instances from execution to execution
- Apparently due to highly variable run-times for malloc
- Not understood

Issues and Conclusions
- OpenMP taskq/task were straightforward to use in this program and implemented the desired …