- Title
Optimizing sparse general matrix–matrix multiplication for DCUs.
- Authors
Guo, Hengliang; Wang, Haolei; Chen, Wanting; Zhang, Congxiang; Han, Yubo; Zhu, Shengguang; Zhang, Dujuan; Guo, Yang; Shang, Jiandong; Wan, Tao; Li, Qingyang; Wu, Gang
- Abstract
Sparse general matrix–matrix multiplication (SpGEMM) is a crucial and complex computational task in many practical applications. Improving the performance of SpGEMM on SIMT processors such as modern GPUs is challenging because the sparsity patterns of the input matrices are unpredictable. Although existing GPU solutions have improved performance through advanced algorithm design, they overlook some optimizations tied to specific processor architectures, which can leave parts of their implementations inefficient. This paper focuses on optimizing four inefficient parts of the NSparse algorithm on the DCU, a GPU-like accelerator. The optimizations include: 1) setting parameters that improve the load balance of the second matrix by extracting maximum row information at runtime; 2) reducing the overhead of binning operations by making effective use of registers and shared memory; 3) improving numerical SpGEMM performance by adjusting its calculation mode; and 4) enhancing global load balance through finer-grained grouping and kernel configurations. Experimental results on 29 sparse matrices with different sparsity structures show that, compared with five state-of-the-art SpGEMM algorithms (bhSparse, KokkosKernels, NSparse, rocSparse, and spECK), our optimized method achieves average speedups of 7.99x (up to 18.2x), 8.01x (up to 20.83x), 2.37x (up to 6.16x), 1.82x (up to 4.20x), and 1.63x (up to 5.01x), respectively.
- Subjects
SPARSE matrices; PARALLEL algorithms; MULTIPLICATION; ALGORITHMS; MATRICES (Mathematics)
- Publication
Journal of Supercomputing, 2024, Vol 80, Issue 14, p20176
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-024-06234-2
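The optimizations in the abstract target NSparse, a row-wise (Gustavson-style) SpGEMM that broadly groups output rows into bins by their number of intermediate products so that each bin can be processed with a suitably configured kernel. The Python sketch below illustrates only that generic row-binning idea on CSR triples `(indptr, indices, data)`; it is not the authors' DCU implementation. The bin thresholds in `BIN_BOUNDS`, the function names `row_products`, `bin_rows`, and `spgemm`, and the use of a Python dict as the per-row accumulator (standing in for a GPU hash table) are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical bin upper bounds on intermediate products per row. NSparse and
# the paper use their own, architecture-tuned thresholds; these are made up.
BIN_BOUNDS = [32, 128, 512, 2048]


def row_products(a_ptr, a_idx, b_ptr):
    """Upper bound on nnz of each output row: sum of nnz(B[k, :]) over A[i, k] != 0."""
    n_rows = len(a_ptr) - 1
    return [
        sum(b_ptr[k + 1] - b_ptr[k] for k in a_idx[a_ptr[i]:a_ptr[i + 1]])
        for i in range(n_rows)
    ]


def bin_rows(products):
    """Group row indices by product count; the last bin holds everything larger."""
    bins = [[] for _ in range(len(BIN_BOUNDS) + 1)]
    for row, p in enumerate(products):
        for b, bound in enumerate(BIN_BOUNDS):
            if p <= bound:
                bins[b].append(row)
                break
        else:
            bins[-1].append(row)
    return bins


def spgemm(a, b):
    """Row-wise C = A * B on CSR triples (indptr, indices, data)."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    rows = {}
    for group in bin_rows(row_products(a_ptr, a_idx, b_ptr)):
        # On a GPU or DCU each bin would get its own kernel launch sized for
        # its rows; here the bins are simply processed one after another.
        for i in group:
            acc = defaultdict(float)  # stands in for a per-row hash table
            for jj in range(a_ptr[i], a_ptr[i + 1]):
                k, av = a_idx[jj], a_val[jj]
                for kk in range(b_ptr[k], b_ptr[k + 1]):
                    acc[b_idx[kk]] += av * b_val[kk]
            rows[i] = sorted(acc.items())  # CSR expects sorted column indices
    # Assemble the output rows back into CSR order.
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        for col, val in rows[i]:
            c_idx.append(col)
            c_val.append(val)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0]], B = [[0, 4], [5, 0], [6, 7]]
    A = ([0, 2, 3], [0, 2, 1], [1, 2, 3])
    B = ([0, 1, 2, 4], [1, 0, 0, 1], [4, 5, 6, 7])
    print(spgemm(A, B))  # ([0, 2, 3], [0, 1, 0], [12.0, 18.0, 15.0])
```

A production implementation would typically run a symbolic phase to size each output row before a numeric phase and would bin rows separately for each; this sketch merges both into a single pass for brevity, so it shows where binning fits rather than how the paper tunes it.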