Automatic Tuning for Parallel FFTs.ppt
Automatic Tuning for Parallel FFTs
Daisuke Takahashi
University of Tsukuba, Japan
2008/6/24, Second French-Japanese PAAP Workshop

Outline
- Background
- Objectives
- Approach
- Block Six-Step/Nine-Step FFT Algorithm
- Automatic Tuning for Parallel FFTs
- Performance Results
- Conclusion

Background
- The fast Fourier transform (FFT) is an algorithm widely used today in science and engineering.
- Parallel FFT algorithms on distributed-memory parallel computers have been well studied.
- Many numerical libraries with automatic performance tuning have been developed, e.g., ATLAS, FFTW, and I-LIB.

Background (cont'd)
- One goal for large FFTs is to minimize the number of cache misses.
- Many FFT algorithms work well when data sets fit into a cache.
- When a problem exceeds the cache size, however, the performance of these FFT algorithms decreases dramatically.
- We modified the conventional six-step FFT algorithm to reuse data in the cache memory. We will call it a "block six-step FFT".

Related Works
- FFTW: Frigo and Johnson (MIT)
  - The recursive call is employed to access main memory hierarchically.
  - This technique is very effective when the total amount of data is not much greater than the cache size.
  - For 1-D parallel MPI FFT, the six-step FFT is used.
  - http://www.fftw.org
- SPIRAL: Pueschel et al. (CMU)
  - The goal of SPIRAL is to push the limits of automation in software and hardware development and optimization for digital signal processing (DSP) algorithms.
  - http:/

FFTE: A High-Performance FFT Library
- FFTE is a Fortran subroutine library for computing the Fast Fourier Transform (FFT) in one or more dimensions.
- It includes complex, mixed-radix and parallel transforms.
  - Shared / distributed memory parallel computers (OpenMP, MPI and OpenMP + MPI)
- It also supports Intel's SSE2/SSE3 instructions.
- HPC Challenge Benchmark
  - FFTE's 1-D parallel FFT routine has been incorporated into the HPC Challenge (HPCC) benchmark.
- http://www.ffte.jp

Objectives
- To improve the performance, we need to select the optimal parameters according to the computational environment and the problem size.
- We implement an automatic tu
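
The conventional six-step FFT mentioned in the slides, and the idea of picking a parameter by measurement, can be illustrated with a small Python sketch. It is not FFTE's code and not the block six-step variant, whose details are not part of this excerpt. The function names (`fft_radix2`, `six_step_fft`, `tune_split`) are hypothetical, and the tuned parameter (the split N = n1 * n2) is an assumption chosen only for illustration, since the slides' own parameter list is not shown here. Lengths are assumed to be powers of two.

```python
import cmath
import time


def fft_radix2(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft_radix2(a[0::2])
    odd = fft_radix2(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def six_step_fft(x, n1, n2):
    """Conventional six-step FFT of length n1*n2 (both powers of two)."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: transpose. Input element x[j1 + j2*n1] goes to rows[j1][j2].
    rows = [[x[j1 + j2 * n1] for j2 in range(n2)] for j1 in range(n1)]
    # Step 2: n1 independent n2-point FFTs (one per row).
    rows = [fft_radix2(r) for r in rows]
    # Step 3: multiply by the twiddle factors exp(-2*pi*i*j1*k2/n).
    rows = [[rows[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
             for k2 in range(n2)] for j1 in range(n1)]
    # Step 4: transpose, giving cols[k2][j1].
    cols = [[rows[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 5: n2 independent n1-point FFTs.
    cols = [fft_radix2(c) for c in cols]
    # Step 6: transpose; the output index is k2 + k1*n2.
    return [cols[k2][k1] for k1 in range(n1) for k2 in range(n2)]


def tune_split(n, repeats=3):
    """Time every power-of-two split n = n1*n2 and return the fastest."""
    x = [complex(i % 7, -(i % 3)) for i in range(n)]
    best = None
    n1 = 1
    while n1 <= n:
        n2 = n // n1
        start = time.perf_counter()
        for _ in range(repeats):
            six_step_fft(x, n1, n2)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, n1, n2)
        n1 *= 2
    return best[1], best[2]
```

Calling `tune_split(1024)` returns the `(n1, n2)` pair that ran fastest on the current machine. The answer can differ between machines and between runs, which is the reason for measuring rather than fixing the parameter in advance.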
