- Title
Memory-accelerated parallel method for multidimensional fast Fourier implementation on GPU.
- Authors
Hu, Yichang; Lu, Lu; Li, Cuixu
- Abstract
The fast Fourier transform (FFT) is a well-known algorithm that calculates the discrete Fourier transform (DFT) of discrete data and is an essential tool in scientific and engineering computation. Because of the large amounts of data involved, executing the FFT in parallel on a graphics processing unit (GPU) can effectively improve performance. Following this approach, FFTW and some other FFT packages were designed, but their fixed computation pattern makes it hard to fully utilize the computing power of the GPU. Additionally, their memory access pattern is not optimized to alleviate the bottleneck of data exchange. Motivated by these challenges, in this paper we propose an efficient GPU-accelerated multidimensional FFT library that achieves better performance. We present a detailed and clear implementation strategy and optimize the FFT by performing as few memory transfers as possible. The data are reshuffled on the CPU, and the access mode is also optimized to match the GPU memory access pattern. Several further optimizations are demonstrated to enhance the performance of our approach for varying FFT sizes, and the evaluation shows that our approach consistently outperforms rocFFT, with a speedup of about 25% to 250% on average on an AMD Instinct MI100 GPU.
- Subjects
DISCRETE Fourier transforms; MULTIDIMENSIONAL databases; FAST Fourier transforms; GRAPHICS processing units; PATTERNMAKING
- Publication
Journal of Supercomputing, 2022, Vol 78, Issue 16, p18189
- ISSN
0920-8542
- Publication type
Academic Journal
- DOI
10.1007/s11227-022-04570-9
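- Illustrative sketch
The abstract refers to multidimensional FFTs and to reshuffling data so that memory accesses line up with the hardware. As a rough illustration only, and not the authors' GPU implementation, the sketch below shows the common row-column decomposition on the CPU in pure Python: 1D FFTs are applied along rows, the matrix is transposed so the next pass again reads contiguous rows, and the result is transposed back. The function names (`fft1d`, `fft2d`, `dft2d_naive`, `transpose`) are hypothetical, and the radix-2 recursion assumes power-of-two lengths.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the butterfly combining the two half-size FFTs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d(m):
    """2D FFT by row-column decomposition with an explicit transpose.

    Transposing between passes means both passes walk contiguous rows,
    which is the kind of access-pattern concern the abstract alludes to.
    """
    row_pass = [fft1d(row) for row in m]
    col_pass = [fft1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


def dft2d_naive(m):
    """Direct O(N^2) 2D DFT, used only as a reference for checking."""
    rows, cols = len(m), len(m[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += m[r][c] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

For a small input such as `[[1, 2, 3, 4], [5, 6, 7, 8]]`, `fft2d` should agree with `dft2d_naive` up to floating-point rounding, and the `[0][0]` entry equals the sum of all inputs. A GPU library would replace the Python loops with batched kernels, but the pass-transpose-pass structure is a reasonable mental model.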