Title: Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters.
Authors: Dursun, Hikmet; Kunaseth, Manaschai; Nomura, Ken-ichi; Chame, Jacqueline; Lucas, Robert; Chen, Chun; Hall, Mary; Kalia, Rajiv; Nakano, Aiichiro; Vashishta, Priya
Abstract: We present a scalable parallelization scheme for high-order stencil computations that also optimizes memory behavior on multicore clusters. Our multilevel approach combines: (i) inter-node parallelization via spatial decomposition; (ii) inter-core parallelization via multithreading and explicit non-uniform memory access (NUMA) control; (iii) data locality optimizations through auto-tuned tiling for efficient use of hierarchical memory; and (iv) register blocking and data parallelism via single-instruction multiple-data techniques to utilize registers and exploit data locality. The scheme is applied to a sixth-order stencil-based finite-difference time-domain code. Weak-scaling parallel efficiency is over 98% on 32,768 BlueGene/P processors. Multithreading with explicit NUMA control attains 9.9-fold speedup on a dual 12-core AMD Opteron system. Data locality optimizations achieve 7.7-fold reduction of the last-level cache miss rate of Intel Nehalem, whereas register blocking increases data parallelism and thereby achieves 5.9 Gflops performance on a single core. Register blocking + multithreading optimizations achieve 5.8-fold speedup on a single quad-core Nehalem.
Subjects: HIERARCHICAL storage management (Computers); MULTICORE processors; NON-uniform memory access; THREADS (Computer programs); RANDOM access memory; FINITE difference time domain method
Publication: Journal of Supercomputing, 2012, Vol 62, Issue 2, p946
ISSN: 0920-8542
Publication type: Article
DOI: 10.1007/s11227-012-0764-z
