In the past few decades, general matrix multiplication (GEMM), a core component of the Basic Linear Algebra Subprograms (BLAS) library, has played a vital role in fields such as machine learning, image processing, and fluid dynamics. Because these fields tend to decompose a problem into many smaller sub-problems, today's BLAS libraries provide batched GEMM routines to achieve high performance in this scenario. MAGMA offers a vbatch routine to compute variable-size batched GEMM on GPUs, but unbalanced input leaves some workgroups and threads idle, degrading performance. Unbalanced input also harms load balance across the GPU's Compute Units, and extremely skewed input leads to underutilization of hardware resources. In this paper, we propose a high-performance batched GEMM computing framework on GPUs. For a large batch of small matrices with variable sizes and an unbalanced distribution, the proposed framework takes the hardware architecture and the possible data distributions into account and adopts three methods (flexible tile, sort-up, and split-down) to improve hardware utilization and achieve better load balancing. Experimental results show that our framework achieves a 3.02× speedup over the latest MAGMA implementation on an AMD Radeon Instinct MI50 GPU and a 3.14× speedup on an MI100.