- Title
LGANet: Local and global attention are both you need for action recognition.
- Authors
Wang, Hao; Zhao, Bin; Zhang, Wenjia; Liu, Guohua
- Abstract
Due to redundancy in the spatiotemporal neighborhood and the global dependency between video frames, video recognition remains a challenge. Prior works have mainly been driven by 3D convolutional neural networks (CNNs) or 2D CNNs with a well-designed module for temporal information. However, convolution-based networks lack the capability to capture global dependencies because of their limited receptive field. Alternatively, transformers for video recognition have been proposed to build long-range dependencies between frame patches. Nevertheless, most transformer-based networks incur significant computational costs because attention is calculated among all tokens. Based on these observations, we propose an efficient network, which we dub LGANet. Unlike conventional CNNs and transformers for video recognition, LGANet tackles both spatiotemporal redundancy and dependency by learning local and global token affinity in the shallow and deep layers, respectively. Specifically, local attention is implemented in the shallow layers to reduce parameters and eliminate redundancy. In the deep layers, spatial-wise and channel-wise self-attention are embedded to capture global dependencies among high-level features. Moreover, several key designs are made in the multi-head self-attention (MSA) and feed-forward network (FFN). Extensive experiments are conducted on popular video benchmarks such as Kinetics-400 and Something-Something V1 & V2. Without any bells and whistles, LGANet achieves state-of-the-art performance. The code will be released soon.
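The abstract does not specify the layer designs, so as an illustration only, here is a minimal NumPy sketch of the three attention patterns it names: local (windowed) attention over nearby tokens, spatial-wise attention over all tokens, and channel-wise attention over feature channels. Identity Q/K/V projections are assumed for brevity; the function names are hypothetical, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    # x: (tokens, channels). Every token attends to every other token,
    # capturing the global spatial dependency described in the abstract.
    # Q = K = V = x here (identity projections, for illustration only).
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x

def channel_attention(x):
    # Channel-wise attention: transpose so channels become the token axis,
    # so affinities are computed between feature channels instead of positions.
    return spatial_attention(x.T).T

def local_attention(x, window=4):
    # Windowed attention: each block of `window` tokens attends only within
    # itself, eliminating redundant long-range computation in shallow layers.
    out = np.zeros_like(x)
    for s in range(0, x.shape[0], window):
        out[s:s + window] = spatial_attention(x[s:s + window])
    return out
```

All three operators preserve the (tokens, channels) shape, so they can be stacked interchangeably across layers; the local variant only costs attention within each window rather than across all token pairs.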
- Subjects
CONVOLUTIONAL neural networks; TRANSFORMER models; VIDEO compression; RECOGNITION (Psychology)
- Publication
IET Image Processing (Wiley-Blackwell), 2023, Vol. 17, Issue 12, p. 3453
- ISSN
1751-9659
- Publication type
Article
- DOI
10.1049/ipr2.12876