Autovectorization in GCC.ppt
1. Autovectorization in GCC
   Dorit Naishlos

2. Vectorization in GCC - Talk Layout
   - Background: GCC; HRL and GCC
   - Vectorization Background
   - The GCC Vectorizer
   - Developing a vectorizer in GCC
   - Status & Results
   - Future Work
   - Working with an Open Source Community
   - Concluding Remarks

3. GCC: GNU Compiler Collection
   - Open Source
   - Download from gcc.gnu.org
   - Multi-platform
   - 2.1 million lines of code, 15 years of development
   - How does it work
     - cvs
     - mailing list: gcc-patches@gcc.gnu.org
     - steering committee, maintainers
   - Who's involved
     - Volunteers
     - Linux distributors
     - Apple, IBM HRL (Haifa Research Lab)

4. GCC Passes
   Diagram labels: machine description; C front-end; Java front-end; C++ front-end; parse trees; GIMPLE; SSA

   C source:
       int i, a[16], b[16];
       for (i=0; i < 16; i++)
         a[i] = a[i] + b[i];

   GIMPLE:
       int i;
       int T.1, T.2, T.3;
       i = 0;
       L1: if (i >= 16) break;
       T.1 = a[i];
       T.2 = b[i];
       T.3 = T.1 + T.2;
       a[i] = T.3;
       i = i + 1;
       goto L1;
       L2:

   SSA:
       int i_0, i_1, i_2;
       int T.1_3, T.2_4, T.3_5;
       i_0 = 0;
       L1: i_1 = PHI<i_0, i_2>
       if (i_1 >= 16) break;
       T.1_3 = a[i_1];
       T.2_4 = b[i_1];
       T.3_5 = T.1_3 + T.2_4;
       a[i_1] = T.3_5;
       i_2 = i_1 + 1;
       goto L1;
       L2:

5. GCC Passes
   GCC 4.0

6. GCC Passes
   The Haifa GCC team: Leehod Baruch, Revital Eres, Olga Golovanevsky, Mustafa Hagog, Razya Ladelsky, Victor Leikehman, Dorit Naishlos, Mircea Namolaru, Ira Rosen, Ayal Zaks
   Diagram labels: machine description; Fortran 95 front-end; IPO; CP; Aliasing; Data layout; Vectorization; Loop unrolling; Scheduler; Modulo Scheduling; Power4
7. Vectorization in GCC - Talk Layout (outline repeated from slide 2)

8. Programming for Vector Machines
   - Proliferation of the SIMD (Single Instruction Multiple Data) model
     - MMX/SSE, Altivec
     - Communications, Video, Gaming
   - Fortran90:
       a[0:N] = b[0:N] + c[0:N];
   - Intrinsics:
       vector float vb = vec_load (0, ptr_b);
       vector float vc = vec_load (0, ptr_c);
       vector float va = vec_add (vb, vc);
       vec_store (va, 0, ptr_a);
   - Autovectorization: the compiler automatically transforms serial code into vector code.

9. What is vectorization
   - Data in memory: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p
   - Scalar operations OP(a) OP(b) OP(c) OP(d) become one vector operation VOP( a, b, c, d ) on vector register VR1
   - Vector registers: data elements packed into vectors
   - Vector length: Vectorization Factor (VF); in the figure, VF = 4

10. Vectorization
   - Loop based vectorization
   - No dependences between iterations

   original serial loop:
       for (i=0; i<N; i++)
         a[i] = a[i] + b[i];

   loop in vector notation:
       for (i=0; i<N; i+=VF)
         a[i:i+VF] = a[i:i+VF] + b[i:i+VF];

   loop in vector notation (vectorized loop, then epilog loop):
       for (i=0; i<(N-N%VF); i+=VF)
         a[i:i+VF] = a[i:i+VF] + b[i:i+VF];
       for ( ; i < N; i++)
         a[i] = a[i] + b[i];

11. Loop Dependence Tests

       for (i=0; i<N; i++)
         for (j=0; j<N; j++)
           A[i+1][j] = A[i][j] + X;

       for (i=0; i<N; i++) {
         D[i] = A[i] + Y;
         A[i+1] = B[i] + X;
       }
       for (i=0; i<N; i++) {
         B[i] = A[i] + Y;
         A[i+1] = B[i] + X;
       }

12. Loop Dependence Tests

       for (i=0; i<N; i++)
         for (j=0; j<N; j++)
           A[i+1][j] = A[i][j] + X;

       for (i=0; i<N; i++)
         A[i+1] = B[i] + X;

       for (i=0; i<N; i++)
         D[i] = A[i] + Y;

       for (i=0; i<N; i++) {
         A[i+1] = B[i] + X;
         D[i] = A[i] + Y;
       }

       for (i=0; i<N; i++) {
         D[i] = A[i] + Y;
         A[i+1] = B[i] + X;
       }

       for (i=0; i<N; i++) {
         B[i] = A[i] + Y;
         A[i+1] = B[i] + X;
       }

13. Classic loop vectorizer
   - dependence graph
   - int exist_dep(ref1, ref2, Loop)
     - Separable Subscript tests: ZeroIndexVar, SingleIndexVar, MultipleIndexVar (GCD, Banerjee...)
     - Coupled Subscript tests (Gamma, Delta, Omega)
   - find SCCs
   - reduce graph
   - topological sort
   - for all nodes:
     - Cyclic: keep sequential loop for this nest
     - non Cyclic: replace node with vector code
   - loop transform to break cycles

   Example references:
       for i
         for j
           for k
             A[5][i+1][j] = A[N][i][k]

       for i
         for j
           for k
             A[5][i+1][i] = A[N][i][k]

14. Vectorizer Skeleton
   - get candidate loops: nesting, entry/exit, countable
   - scalar dependences
   - vectorizable operations: data-types, VF, target support
   - vectorize loop

   Feature labels on the diagram: known loop bound; 1D aligned arrays; Basic vectorizer 01.01.2004; idiom recognition; invariants; saturation; conditional code; arrays and pointers; unaligned accesses; force alignment; mainline

   Example loop:
       for (i=0; i<N; i++)
         a[i] = b[i] + c[i];

   Vector assembly shown on the slide:
       li r9,4
       li r2,0
       mtctr r9
       L2:
       lvx v0,r2,r30
       lvx v1,r2,r29
       vaddfp v0,v0,v1
       stvx v0,r2,r0
       addi r2,r2,16
       bdnz L2

15. Vectorization on SSA-ed GIMPLE trees

   Source:
       int i;
       int a[N], b[N];
       for (i=0; i < 16; i++)
         a[i] = a[i] + b[i];

   Scalar loop:
       int T.1, T.2, T.3;
       loop:
         if ( i >= 16 ) break;
         S1: T.1 = a[i];
         S2: T.2 = b[i];
         S3: T.3 = T.1 + T.2;
         S4: a[i] = T.3;
         S5: i = i + 1;
         goto loop;

   "unroll by VF and replace" (VF = 4):
       loop:
         if (i >= 16) break;
         T.11 = a[i];   T.12 = a[i+1]; T.13 = a[i+2]; T.14 = a[i+3];
         T.21 = b[i];   T.22 = b[i+1]; T.23 = b[i+2]; T.24 = b[i+3];
         T.31 = T.11 + T.21;
         T.32 = T.12 + T.22;
         T.33 = T.13 + T.23;
         T.34 = T.14 + T.24;
         a[i] = T.31;   a[i+1] = T.32; a[i+2] = T.33; a[i+3] = T.34;
         i = i + 4;
         goto loop;

   Vector form:
       v4si VT.1, VT.2, VT.3;
       v4si ...
