- Title
SWattention: designing fast and memory-efficient attention for a new Sunway Supercomputer.
- Authors
Wu, Ruohan; Zhu, Xianyu; Chen, Junshi; Liu, Sha; Zheng, Tianyu; Liu, Xin; An, Hong
- Abstract
In the past few years, Transformer-based large language models (LLMs) have become the dominant technology in a range of applications. To scale up the sequence length of the Transformer, FlashAttention was proposed to compute exact attention with reduced memory requirements and faster execution. However, implementing the FlashAttention algorithm on the new-generation Sunway Supercomputer faces constraints such as the unique heterogeneous architecture and the limited memory bandwidth. This work proposes SWattention, a highly efficient method for computing exact attention on the SW26010pro processor. To fully utilize the 6 core groups (CGs) and 64 cores per CG on the processor, we design a two-level parallel task-partition strategy. Asynchronous memory access is employed to ensure that memory access overlaps with computation. Additionally, a tiling strategy is introduced to determine optimal SRAM block sizes. Compared with standard attention, SWattention achieves around 2.0x speedup for FP32 training and 2.5x speedup for mixed-precision training. Sequence lengths range from 1k to 8k and scale up to 16k without running out of memory. As for end-to-end performance, SWattention achieves up to 1.26x speedup for training GPT-style models, demonstrating that SWattention enables longer sequence lengths for LLM training.
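The abstract's core idea, tiling exact attention so the full score matrix never materializes, follows the FlashAttention approach. A minimal NumPy sketch of that tiling with an online softmax is shown below; this is an illustrative reconstruction of the general technique, not the paper's SWattention implementation (the block size and function names here are assumptions, and the real method targets SW26010pro SRAM and asynchronous memory access):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full N x N score matrix.
    S = (Q @ K.T) / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=64):
    # FlashAttention-style tiling: process K/V in blocks while keeping a
    # running row max `m` and normalizer `l` (online softmax), so only an
    # N x block score tile is ever held in fast memory at a time.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)            # unnormalized output accumulator
    m = np.full(N, -np.inf)         # running per-row max of scores
    l = np.zeros(N)                 # running per-row softmax normalizer
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = (Q @ Kj.T) * scale                  # N x block score tile
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        alpha = np.exp(m - m_new)               # rescales earlier blocks
        l = alpha * l + P.sum(axis=-1)
        O = alpha[:, None] * O + P @ Vj
        m = m_new
    return O / l[:, None]
```

Because the rescaling factor `alpha` corrects the earlier partial sums whenever a new block raises the row maximum, the tiled result matches the naive computation exactly (up to floating-point error), which is what makes this an *exact* rather than approximate attention.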
- Subjects
LANGUAGE models; SUPERCOMPUTERS; ARTIFICIAL intelligence
- Publication
Journal of Supercomputing, 2024, Vol 80, Issue 10, p13657
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-024-05890-8