Title: Convolutional non-local spatial-temporal learning for multi-modality action recognition.
Authors: Ziliang Ren; Huaqiang Yuan; Wenhong Wei; Tiezhu Zhao; Qieshi Zhang
Abstract: Traditional deep convolutional networks have shown that RGB and depth are complementary for video action recognition. However, it is difficult to improve recognition accuracy because a single convolutional network is limited in its ability to extract the underlying relationships and complementary features between these two modalities. The authors propose a novel two-stream convolutional network for multi-modality action recognition that uses joint optimisation learning to extract global features from RGB and depth sequences. Specifically, a non-local multi-modality compensation block is introduced to learn semantic fusion features that improve recognition performance. Experimental results on two multi-modality human action datasets, NTU RGB+D 120 and PKU-MMD, verify the effectiveness of the proposed recognition framework and demonstrate that the proposed non-local multi-modality compensation block can learn complementary features and enhance recognition accuracy.
Subjects: HUMAN behavior; GLOBAL method of teaching; HUMAN activity recognition
Publication: Electronics Letters (Wiley-Blackwell), 2022, Vol 58, Issue 20, p765
ISSN: 0013-5194
Publication type: Article
DOI: 10.1049/ell2.12597
