- Title
An effective 3-D fast Fourier transform framework for multi-GPU accelerated distributed-memory systems.
- Authors
Zhou, Binbin; Lu, Lu
- Abstract
This paper introduces an efficient and flexible 3D FFT framework for state-of-the-art multi-GPU distributed-memory systems. In contrast to the traditional pure MPI implementation, such systems can be exploited with a hybrid multi-GPU programming model that combines MPI with OpenMP to achieve effective communication. To accelerate intra-node communication, we adopt an asynchronous strategy that creates multiple streams and threads to reduce blocking time. Furthermore, we combine our scheme with a GPU-aware MPI implementation to perform GPU-to-GPU data transfers without CPU involvement. We also optimize the local FFT and transpose steps with fast parallel kernels to accelerate the overall transform. Results show that our framework outperforms the state-of-the-art distributed 3D FFT library, achieving speedups of up to 2× on a single node and 1.65× on two nodes.
- Subjects
FAST Fourier transforms; GRAPHICS processing units
- Publication
Journal of Supercomputing, 2022, Vol 78, Issue 15, p17055
- ISSN
0920-8542
- Publication type
Academic Journal
- DOI
10.1007/s11227-022-04491-7
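
The abstract refers to "local FFT and transpose" steps. As background only, the sketch below illustrates the general idea such frameworks build on: a 3D DFT can be computed as three passes of 1D transforms, one per axis. It is not the paper's implementation. It uses a naive O(n²) 1D DFT in pure Python, and every function name (`dft1d`, `fft3d_by_axes`, `dft3d_direct`, `max_abs_diff`) is hypothetical. In a distributed setting, the strided gathers along `y` and `x` are where data would be transposed or exchanged between devices.

```python
import cmath


def dft1d(x):
    """Naive O(n^2) 1-D discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft3d_by_axes(a):
    """3-D DFT of a nested list a[x][y][z], computed one axis at a time."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])

    # Pass 1: transform along z, the innermost axis (purely local work).
    a = [[dft1d(a[x][y]) for y in range(ny)] for x in range(nx)]

    # Pass 2: transform along y. The strided gather over y plays the role
    # that a transpose plays in an optimized implementation.
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for z in range(nz):
            col = dft1d([a[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = col[y]
    a = out

    # Pass 3: transform along x, again after a strided gather.
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for y in range(ny):
        for z in range(nz):
            col = dft1d([a[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = col[x]
    return out


def dft3d_direct(a):
    """Reference 3-D DFT evaluated directly from the definition."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    return [[[
        sum(
            a[x][y][z]
            * cmath.exp(-2j * cmath.pi * (x * kx / nx + y * ky / ny + z * kz / nz))
            for x in range(nx) for y in range(ny) for z in range(nz)
        )
        for kz in range(nz)] for ky in range(ny)] for kx in range(nx)]


def max_abs_diff(p, q):
    """Largest element-wise magnitude difference between two 3-D arrays."""
    return max(
        abs(p[x][y][z] - q[x][y][z])
        for x in range(len(p))
        for y in range(len(p[0]))
        for z in range(len(p[0][0]))
    )


if __name__ == "__main__":
    # Small 2 x 3 x 4 test volume with arbitrary complex entries.
    vol = [[[complex(x + 2 * y - z, x * y + z) for z in range(4)]
            for y in range(3)] for x in range(2)]
    print(max_abs_diff(fft3d_by_axes(vol), dft3d_direct(vol)))
```

The axis-wise result agrees with the direct definition up to floating-point rounding. A production code would replace `dft1d` with batched library FFTs and the gathers with explicit transposes.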