    A Vector API for Java.ppt


    A Vector API for Java
    Ian Graves

    Legal Disclaimers

    INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

    A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

    Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

    The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

    Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http:/

    Intel, the Intel logo, Intel Xeon, and Xeon logos are trademarks of Intel Corporation in the U.S. and/or other countries.

    Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: Learn About Intel Processor Numbers http:/

    *Other names and brands may be claimed as the property of others.

    Copyright 2015 Intel Corporation. All rights reserved.

    Legal Disclaimers Continued

    Some results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

    Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

    Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

    Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.

    SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjbb, SPECompG, SPEC MPI, and SPECjEnterprise* are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.

    TPC Benchmark, TPC-C, TPC-H, and TPC-E are trademarks of the Transaction Processing Council. See http://www.tpc.org for more information.

    Intel Advanced Vector Extensions (Intel AVX)* are designed to achieve higher throughput to certain integer and floating point operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you should consult your system manufacturer for more information. Intel Advanced Vector Extensions refers to Intel AVX, Intel AVX2 or Intel AVX-512. For more information on Intel Turbo Boost Technology
    2.0, visit http:/

    In this Presentation

      • Is still a rough prototype! Subject to change!
      • Part of the OpenJDK Project Panama
      • Licensed under GPLv2 with Classpath Exception
      • Get the code here! http:/
      • CodeSnippets
      • Vector API Design
      • Wrap Up

    Introduction: Vector API Project Team

      • Oracle: Vladimir Ivanov, John Rose, Paul Sandoz
      • Intel: Michael Berg, Steve Dohrmann, Ian Graves, Shravya Rukmannagari, Sandhya Viswanathan

    Terminology

      • Code Snippets: Encoding instructions as data in Java. Binding to MethodHandle.
      • Vector API: API encompassing operations with vector instruction support. Implemented on top of Code Snippets.
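
As a point of reference for the terminology above, here is a minimal sketch of what "binding to a MethodHandle" looks like with the standard java.lang.invoke API. It binds an ordinary static Java method, not a machine-code snippet, because the prototype's snippet factory is not part of the JDK. The names MethodHandleBinding, addFloats, MT_BINARY, and MH_ADD are hypothetical and chosen only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleBinding {
    // Hypothetical stand-in for an operation; a code snippet would supply machine code instead.
    static float addFloats(float a, float b) {
        return a + b;
    }

    // The MethodType describes the signature (float, float) -> float.
    static final MethodType MT_BINARY =
            MethodType.methodType(float.class, float.class, float.class);

    // Bind the operation to a MethodHandle once, at class initialization.
    static final MethodHandle MH_ADD;
    static {
        try {
            MH_ADD = MethodHandles.lookup()
                    .findStatic(MethodHandleBinding.class, "addFloats", MT_BINARY);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Type-safe wrapper: invokeExact requires the call site to match MT_BINARY exactly.
    public static float add(float a, float b) {
        try {
            return (float) MH_ADD.invokeExact(a, b);
        } catch (Throwable t) {
            throw new Error(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1.5f, 2.25f)); // prints 3.75
    }
}
```

Keeping the handle in a static final field and calling it through invokeExact from a small typed wrapper is the same shape the deck uses for its snippet-backed primitives.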

    Motivation

      • Many popular applications benefit from data-parallel computations
      • Architectural support remains opaque to the JVM developer
      • Looking to expose "pure Java" performant solutions that map to the architecture well
      • No JNI interfacing: single-language solutions
      • Minimized boilerplate
      • Generated code is good quality

    Project Goals

      • Expose data-parallel vector operations for developer use in Java
      • Portability and performance
      • Scalability
      • Idiomatic

    Code Snippets

    CodeSnippets as a Substrate

      • A portable API for expressing primitives
      • More flexible than HotSpot intrinsics
      • Less technical debt with Graal on the horizon
      • ISAs can use the same API
      • In prototype phase, but good perf observed
      • Value objects to registers
      • MethodHandle invocation achieves good code quality

    Implementing a Primitive

      • Primitives bind to MethodHandle
      • Invoked via MethodHandle methods
      • MethodHandles library has additional combinators
      • Types of CodeSnippets represented as MethodType objects
      • Vector represented by Long2/4/8 objects
      • Wrappers for 128-, 256-, and 512-bit values
      • Wrappers are elided in the best case. Values registerized.
      • Escape analysis a work in progress

    Binding to Machine Instruction

        static final MethodType MT_L4_BINARY
            = MethodType.methodType(Long4.class, Long4.class, Long4.class);

        private static final MethodHandle MHm256_vaddps = MachineCodeSnippet.make(
                "mm256_vaddps", MT_L4_BINARY, requires(AVX),
                new Register[][]{xmmRegistersSSE, xmmRegistersSSE, xmmRegistersSSE},
                (Register[] regs) -> {
                    Register out = regs[0];
                    Register in1 = regs[1];
                    Register in2 = regs[2];
                    int[] vex = vex_prefix(rBit(out), X_LOW, bBit(in2), M_0F, W_LOW, in1, L_256, PP_NONE);
                    return vex_emit(vex, 0x58, modRM(out, in2));
                });

    Callouts: Registers via JVMCI. Desired Register Masks. MethodHandle Type. Feature-checking predicate. Macro-ized x86 encoding.

    Checked Invocation

        private static Long4
        vaddps_naive(Long4 a, Long4 b) {
            float[] res = new float[8];
            for (int i = 0; i < 8; i++) {
                res[i] = getFloat(a, i) + getFloat(b, i);
            }
            return long4FromFloatArray(res, 0);
        }

        public static Long4 vaddps(Long4 a, Long4 b) {
            try {
                Long4 res = (Long4) MHm256_vaddps.invokeExact(a, b);
                assert assertEquals(res, vaddps_naive(a, b));
                return res;
            } catch (Throwable e) {
                throw new Error(e);
            }
        }

    Callouts: Pure Java equivalent function. Type-safe invocation point.

    A Small Example

        public static float[] proc(float[] left, float[] right, float[] res) {
            if (left.length != right.length) {
                throw new UnsupportedOperationException("Arrays unequal.");
            } else if (left.length % 8
                    != 0) {
                throw new UnsupportedOperationException("Length must be n*8");
            }
            for (int i = 0; i < left.length; i += 8) {
                addArrays(left, right, res, i);
            }
            return res; // Convenience
        }

    Callout: Loop Kernel

    Small Example (cont'd)

        // Isolated for code quality purposes in prototype
        public static void addArrays(float[] left, float[] right, float[] res, int i) {
            // VMOVDQU ymmX, YMMWORD PTR
            Long4 l = PatchableVecUtils.long4FromFloatArray(left, i);
            Long4 rr = PatchableVecUtils.vaddps(l, right, i);
            // VMOVDQU YMMWORD PTR , ymmX
            PatchableVecUtils.long4ToFloatArray(res, i, rr);
        }

    Callouts: Scaled load. Scaled store. vaddps reg, YMMWORD PTR .

    Generating C2 Code

        java -XaddExports:java.base/jdk.internal.misc=ALL-UNNAMED \
             -XaddExports:java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
             -XX:+UnlockDiagnosticVMOptions \
             -XX:-UseSuperWord \
             -XX:LoopMaxUnroll=1 \
             -XX:PrintAssemblyOptions=intel \
             -XX:CompileCommand=option,*AddArraysLong4PS::addArrays,PrintAssembly \
             -cp build AddArraysLong4PS

    Generated Code

    [Figure: generated assembly listing. Callout: Snippets!]

    Performance of This Example

      • Compared to scalar implementation
      • Disabled SuperWord and loop unrolling
      • We see a 40% reduction in clock cycles spent in the loop kernel with the vectorized version.
      • This workload is a prototype PoC; we need more advanced workloads that better leverage
        vectorization.
      • Bigger, more intensive workloads to come
      • Wall clock time indicates overhead coming from outside of the loop kernel vs. the scalar version: more work to do!

    The Vector API

    Java Needs an Abstraction for Vectors

      • Vector ISA extensions are powerful, expressive, and deep.
      • Most instructions have many different forms and support differing operand sizes
      • NxM problems abound for API writers
      • Needs to capture the essence of vectorization in the spirit of Java
          • Platform independence
          • Snippets too low level
          • Meaningful static checking
          • Familiar patterns to abstract operational complexity

    Vector API

      • Intended API to encompass the CodeSnippets implementation
      • Proposed by John Rose*. Work continues within the Panama Project
      • interface Vector<E, S>
          • S: Shape type, describes the size of the Vector
          • E: The element type of the Vector
      • Broadest support for Float, Integer, Double
      • Draft implementations checked into Project Panama

    * http:/

    of the API

    [Diagram: Vector, FloatVector, FloatVector128, FloatVector256, FloatVectorXYZ. Callouts: Factory-Constructed Classes. Factory methods here.]

    Basic Vector-Vector Functionality

        interface Vector<E, S> {
            Vector<E, S> add(Vector<E, S> v2);
            Vector<E, S> mul(Vector<E, S> v2);
            Vector<E, S> and(Vector<E, S> v2);
        }

    Callout: Immutability!

    More Advanced

        interface Vector<E, S> {
            E getElement(int i);
            Vector<E, S> putElement(int i, E elem);
            E sumAll();
            E[] toArray();
            fromArray(E[] ary, int offset); // return type not given on the slide
        }

    Callouts: Scalar/Vector Interfacing. Horizontal Reductions. Multiple snippets. Loading and storing to arrays.

    Fully Realized Expressiveness

        interface Vector<E, S> {
            Vector<E, S> map(UnaryOperator<E> op);
            Vector<E, S> mapWhere(Mask<S> mask, UnaryOperator<E> op);
            Vector<E, S> map(BinaryOperator<E> op, Vector<E, S> v2);
            Vector<E, S> mapWhere(Mask<S> mask, BinaryOperator<E> op, Vector<E, S> this2);
        }

    Kernel with Vector API

        public static void addArrays(float[] left, float[] right, float[] res, int i) {
            FloatVector l = float256FromArray(left, i),
                        r = float256FromArray(right, i),
                        lr = l.add(r);
            lr.intoArray(res, i);
        }

    Higher Order Components

      • Highly desirable, modern part of this API
      • A programmer specifies a loop body
      • Minimal thought given to vectorization
      • Using regular arithmetic and logical syntactic operators
      • Requires a way to "crack" or inspect lambdas at runtime
      • Ways Forward
          • We need better control of our higher order components
          • Factories for constructing primitive arithmetic operations
          • Need to be composable

    Kernel Construction

      • We can construct our "higher order" operations from existing parts.
      • We can constrain our support to operations that are vectorizable.
          • Arity-one, or arity-two (maybe three) operations
          • Restricting to arithmetic and logical operations that are broadly supported
      • Our existing work on CodeSnippets can form the base!
      • MethodHandles are highly composable, even with snippets

        f = (x,y) -> (x+y) * y;

        MethodType mt = MethodType.methodType(Long4.class, Long4.class, Long4.class);
        MethodHandle MHm256_vaddps = CodeSnippet.make(..., mt, ...),
                     MHm256_vmulps = CodeSnippet.make(..., mt, ...);
        // f_pre(x, y, b) = vmulps(vaddps(x, y), b)
        MethodHandle f_pre = MethodHandles.collectArguments(MHm256_vmulps, 0, MHm256_vaddps);
        // f(x, y) = f_pre(x, y, y)
        MethodHandle f = MethodHandles.permuteArguments(f_pre, mt, 0, 1, 1);

    Statically Typed Wrappers

      • A layer over MethodHandles that encapsulates the lower-level details and makes them type safe will coincide with the existing API spec.
      • One method proposed is VectorOp
          • Proposed on Project Panama*
          • Vector operations explicit and exposed to the user to compose and use as kernels.
      • Another approach is to use a lightweight syntax tree
          • Hand off to a Vector object for interpretation/conversion to an equivalent MethodHandle structure for execution.
          • Vector objects visit the tree to compose the corresponding MethodHandles.
          • Same syntax trees could be handed off to different Vector types.
          • Still very much in the works!

    * http:/

    Thoughts.

      • Most Vector operations are simple expressions
      • Expressions are (basically) trees
      • MethodHandles can be combined together in a tree-like fashion
          • permuteArguments()
          • collectArguments()
          • filterArguments()
          • filterReturnValue()
      • Method Handles have added benefits (high level
        models matter!)
      • We've already observed good code with Method Handles, so let's try it!
      • Coding this way can elide the need to box Long2/4/8

    Expressions Bind to Method Handles.

    [Diagram: expression tree for the lambda (x,y) ->, with nodes *, +, y, y, x. Callout: AST Visitor.]

    There's more!

    [Diagram: the same expression tree, with nodes *, +, y, y, x. Callouts: 256_visitor, 128_visitor, XYZ_visitor.]

    Baby's First EDSL

        interface Expression {
            default Expression add(Expression right) { return new AddExpression(this, right); }
            default Expression mul(Expression right) { return new MulExpression(this, right); }
            default Expression not() { return new NotExpression(this); }
            default Expression trace(Consumer f) { return new TraceExpression(this, f); }
            default Expression fromFloat(Float f) { return new ConstExpression(f); }
            <R> R evaluate(ExpressionEvaluator<R> e);
        }

    Careful!

        BinaryOperator<Expression> expr = (l, r) -> {
            Expression e1 = l.add(r);
            return e1.mul(r);
        };

    Callout: expr.apply(Symbol.LEFT, Symbol.RIGHT); To populate leaf nodes. Symbol non-public.

        MethodHandle binaryReduction(float[] left, float[] right, float[] dst, BinaryOperator<Expression> op);

        MethodHandle br = binaryReduction(left, right, dst, (l, r) -> {
            Expression e1 = l.add(r);
            return e1.mul(r);
        });

        // Execute the entire computation
        br.invokeExact();

        // Making it hot for inspection
        for (int i = 0; i < BIGNUMBER; i++) {
            br.invokeExact();
        }
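
The expression-tree idea above can be sketched using only the standard java.lang.invoke combinators. This is a minimal illustration under stated assumptions, not the prototype's implementation. Scalar float methods stand in for vector snippets, since the CodeSnippet factory and Long4 are not part of the JDK. The names ExpressionToMethodHandle, Binary, compile, and apply are hypothetical, and Symbol.LEFT and Symbol.RIGHT follow the naming on the slide. Each tree node compiles itself to a (float, float) -> float handle: leaves select an argument, and binary nodes combine their children with collectArguments and permuteArguments.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.BinaryOperator;

public class ExpressionToMethodHandle {
    // Scalar stand-ins for vector primitives (assumption: real snippets would take Long4 values).
    static float addF(float a, float b) { return a + b; }
    static float mulF(float a, float b) { return a * b; }

    static final MethodType BINARY =
            MethodType.methodType(float.class, float.class, float.class);
    static final MethodHandle ADD = find("addF");
    static final MethodHandle MUL = find("mulF");

    static MethodHandle find(String name) {
        try {
            return MethodHandles.lookup()
                    .findStatic(ExpressionToMethodHandle.class, name, BINARY);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A tiny expression tree; every node compiles to a (float x, float y) -> float handle. */
    public interface Expression {
        MethodHandle compile();
        default Expression add(Expression right) { return new Binary(ADD, this, right); }
        default Expression mul(Expression right) { return new Binary(MUL, this, right); }
    }

    /** Leaf nodes: select the left (index 0) or right (index 1) argument. */
    enum Symbol implements Expression {
        LEFT, RIGHT;

        @Override
        public MethodHandle compile() {
            // identity(float) is (float)float; permuteArguments widens it to (float,float)float.
            return MethodHandles.permuteArguments(
                    MethodHandles.identity(float.class), BINARY, ordinal());
        }
    }

    static final class Binary implements Expression {
        private final MethodHandle op;
        private final Expression left, right;

        Binary(MethodHandle op, Expression left, Expression right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        @Override
        public MethodHandle compile() {
            // After collecting both children: h(x, y, a, b) = op(left(x, y), right(a, b)).
            MethodHandle h = MethodHandles.collectArguments(op, 0, left.compile());
            h = MethodHandles.collectArguments(h, 2, right.compile());
            // Duplicate the two real arguments: f(x, y) = h(x, y, x, y).
            return MethodHandles.permuteArguments(h, BINARY, 0, 1, 0, 1);
        }
    }

    /** Populates the leaf nodes with symbols, then compiles the resulting tree. */
    public static MethodHandle compile(BinaryOperator<Expression> body) {
        return body.apply(Symbol.LEFT, Symbol.RIGHT).compile();
    }

    public static float apply(MethodHandle f, float x, float y) {
        try {
            return (float) f.invokeExact(x, y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // f = (x, y) -> (x + y) * y
        MethodHandle f = compile((l, r) -> l.add(r).mul(r));
        System.out.println(apply(f, 2.0f, 3.0f)); // prints 15.0
    }
}
```

Because the lambda only ever sees Expression objects, the same tree could in principle be compiled against a different set of primitives, which is the role the slides give to the 128-, 256-, and XYZ-bit visitors.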

