- Title
Performance Optimization of GCR in GRAPES Based on CPU+GPU Heterogeneous Parallel.
- Authors
HUANG Dongqiang; HUANG Jianqiang; JIA Jinfang; WU Li; LIU Lingbin; WANG Xiaoying
- Abstract
To improve the computational efficiency of the GRAPES (Global/Regional Assimilation and Prediction System) numerical weather prediction model and the performance of its dynamic framework, a CPU + GPU heterogeneous parallel preconditioned generalized conjugate residual (GCR) algorithm was implemented to address the time-consuming GCR solver in the GRAPES model. First, incomplete LU decomposition was used to precondition the coefficient matrix and reduce the number of iterations. On this basis, fine-grained OpenMP parallelism and coarse-grained MPI parallelism were implemented. OpenMP parallelism applied compiler directives, with loop unrolling, to loop bodies without data dependences. MPI parallelism distributed the data across processes and improved the scalability of the parallel program through non-blocking communication and a reduced communication volume. In the MPI + CUDA version, MPI was responsible for inter-node process communication and iteration control, while CUDA handled the computation-intensive tasks: the time-consuming matrix computations of GCR were offloaded to the GPU, and memory and data-transfer optimizations were adopted to reduce the data transfer overhead between CPU and GPU. The experimental results showed speedups over the serial program of 2.24 for OpenMP, 3.32 for MPI, and 4.69 for MPI + CUDA. The performance of the GCR algorithm was thus optimized on the heterogeneous platform, and the computational efficiency of the program was improved.
- Publication
Journal of Zhengzhou University: Engineering Science, 2022, Vol 43, Issue 6, p15
- ISSN
1671-6833
- Publication type
Article
- DOI
10.13705/j.issn.1671-6833.2022.03.018
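- Illustrative sketch
The abstract's first step, GCR preconditioned with an incomplete LU factorization, can be sketched serially as follows. This is a minimal illustration and not the authors' implementation: the function names (`ilu0`, `ilu_solve`, `gcr`, `poisson2d`), the dense list-of-lists storage, the zero-fill ILU(0) variant, and the small 5-point test matrix are all assumptions made for the example, and none of the OpenMP, MPI, or CUDA parallelization is shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def ilu0(A):
    """ILU(0): LU factors restricted to the nonzero pattern of A (assumed variant)."""
    n = len(A)
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if A[i][k] == 0.0:
                continue  # outside the sparsity pattern: no fill-in
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                if A[i][j] != 0.0:
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def ilu_solve(LU, r):
    """Apply the preconditioner: solve (L U) z = r, with unit-diagonal L."""
    n = len(r)
    z = r[:]
    for i in range(n):                # forward substitution with L
        for j in range(i):
            z[i] -= LU[i][j] * z[j]
    for i in reversed(range(n)):      # backward substitution with U
        for j in range(i + 1, n):
            z[i] -= LU[i][j] * z[j]
        z[i] /= LU[i][i]
    return z

def gcr(A, b, tol=1e-10, max_iter=50):
    """Right-preconditioned GCR without restarts; returns (x, iterations)."""
    n = len(b)
    LU = ilu0(A)
    x = [0.0] * n
    r = b[:]                          # residual for the zero initial guess
    norm_b = math.sqrt(dot(b, b)) or 1.0
    ps, qs = [], []                   # search directions p_j and q_j = A p_j
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * norm_b:
            return x, it
        p = ilu_solve(LU, r)          # preconditioned residual
        q = matvec(A, p)
        for pj, qj in zip(ps, qs):    # orthogonalize q against earlier q_j
            beta = dot(q, qj)
            p = [a - beta * c for a, c in zip(p, pj)]
            q = [a - beta * c for a, c in zip(q, qj)]
        nq = math.sqrt(dot(q, q))
        p = [a / nq for a in p]
        q = [a / nq for a in q]
        alpha = dot(r, q)             # step length minimizing the residual norm
        x = [a + alpha * c for a, c in zip(x, p)]
        r = [a - alpha * c for a, c in zip(r, q)]
        ps.append(p)
        qs.append(q)
    return x, max_iter

def poisson2d(m):
    """Hypothetical test problem: 5-point stencil matrix on an m x m grid."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            k = i * m + j
            A[k][k] = 4.0
            if i > 0:
                A[k][k - m] = -1.0
            if i < m - 1:
                A[k][k + m] = -1.0
            if j > 0:
                A[k][k - 1] = -1.0
            if j < m - 1:
                A[k][k + 1] = -1.0
    return A
```

A call such as `gcr(poisson2d(3), b)` returns the approximate solution and the number of iterations used. In a parallel version along the lines the abstract describes, the matrix-vector product and the dot products would be the natural candidates for offloading or distributing, since they dominate the cost of each iteration.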