- Title
Many-BSP: an analytical performance model for CUDA kernels.
- Authors
Riahi, Ali; Savadi, Abdorreza; Naghibzadeh, Mahmoud
- Abstract
Unknown aspects of GPU behavior and the differing characteristics of successive GPU generations make analyzing and optimizing programs for these processors a serious challenge. Performance models have therefore been developed to better analyze and describe how these processors behave. Such models help programmers configure applications and help developers improve the performance of the devices themselves. This paper introduces an analytical model, called Many-BSP, that predicts the execution time of a CUDA kernel. The model is highly portable and can easily be applied to various devices. Many GPU features and behaviors that affect performance are discussed, including multithreading, coalesced access to global memory, shared memory bank conflicts, dual-issue instructions, limits on functional units, parallelism at the instruction, thread, and warp levels, the instruction pipeline, branch divergence, and intra-block and inter-block overlap between communication and computation. The model also uses the tree hierarchy and parameters of the Multi-BSP model to estimate the latency of communication with memory. In Many-BSP, the execution time of a kernel is predicted by static analysis of its CUDA and PTX code. The model is evaluated on three devices of different generations using three real-world benchmarks. The results show that the execution time of a CUDA kernel can be predicted with a maximum error of 12.33%.
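One of the effects listed above, shared memory bank conflicts, is simple to quantify from the addresses a warp accesses, which is the kind of quantity a static analyzer can derive from kernel code. The sketch below is not the Many-BSP model. It is a minimal illustration that assumes 32 banks of 4-byte width (common on recent NVIDIA GPUs) and a 32-thread warp, and it treats same-word accesses as broadcasts. The function name `bank_conflict_degree` is hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared-memory banks
BANK_WIDTH = 4   # assumption: each bank is 4 bytes wide


def bank_conflict_degree(byte_addresses):
    """Hypothetical helper: return how many serialized shared-memory
    transactions one warp-wide access needs (the conflict degree).

    Threads reading the same 4-byte word are served by a broadcast,
    so only *distinct* words mapped to the same bank are serialized.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        words_per_bank[word % NUM_BANKS].add(word)
    # The busiest bank determines how many passes the access takes.
    return max((len(words) for words in words_per_bank.values()), default=0)


if __name__ == "__main__":
    # Stride of one 4-byte word: every thread hits a different bank.
    print(bank_conflict_degree([4 * t for t in range(32)]))
    # Stride of two words: threads t and t + 16 share a bank.
    print(bank_conflict_degree([8 * t for t in range(32)]))
    # Stride of 32 words: all threads hit bank 0.
    print(bank_conflict_degree([128 * t for t in range(32)]))
```

Under these assumptions the three example access patterns give conflict degrees of 1, 2, and 32. An analytical model could scale the cost of the corresponding shared memory instruction by such a factor, although how Many-BSP itself accounts for conflicts is described in the full paper.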
- Subjects
COMMUNICATION models; GRAPHICS processing units; MEMORY
- Publication
Computing, 2024, Vol 106, Issue 5, p1519
- ISSN
0010-485X
- Publication type
Article
- DOI
10.1007/s00607-023-01255-w