- Title
Real-time continuous detection and recognition of dynamic hand gestures in untrimmed sequences based on end-to-end architecture with 3D DenseNet and LSTM.
- Authors
Lu, Zhi; Qin, Shiyin; Lv, Pin; Sun, Liguo; Tang, Bo
- Abstract
With the continuous development of deep learning theory, novel gesture recognition approaches have constantly emerged and their performance has steadily improved. However, most research focuses on the recognition of isolated gestures, while the detection and recognition of continuous gestures are rarely studied. To this end, aiming at the real-time detection and classification of dynamic gestures in untrimmed sequences, a well-designed end-to-end architecture based on variants of 3D DenseNet and a unidirectional LSTM is proposed as an effective tool for extracting discriminative spatio-temporal features from untrimmed hand gesture sequences. Connectionist temporal classification is then used to train the network on a publicly available dataset, so that effective representations learned from large gesture samples can be transferred to enhance the learning ability of the proposed network. In this way, the class-conditional probability that an incoming sequence belongs to a given gesture class is predicted and compared with a predefined threshold to automatically determine the start and end of gestures. In addition, to enhance the classification accuracy on segmented gestures, a bidirectional LSTM network is utilized to model the temporal information, taking both past and future frames into account. Finally, a continuous gesture dataset collected indoors for a specific application is introduced to validate the proposed method. On this challenging dataset, the 3D DenseNet-LSTM model achieves real-time early detection and classification on unsegmented gesture sequences, and the 3D DenseNet-BiLSTM not only achieves an accuracy of 92.06% on segmented gestures, but also classification accuracies of 89.8% and 99.7% on the nvGesture and SKIG public datasets, respectively. The experimental results demonstrate the performance advantages in detection and classification as well as the real-time response speed.
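The abstract's threshold-based gesture boundary detection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `detect_gestures`, the threshold value, and the shape of the probability array are all assumptions; the paper only states that per-frame class-conditional probabilities are compared against a predefined threshold to mark gesture start and end.

```python
import numpy as np

def detect_gestures(probs, threshold=0.5):
    """Detect gesture segments from per-frame class-conditional probabilities.

    probs: array of shape (T, C) -- softmax probability of each gesture class
           per frame (a stand-in for the network's streaming output).
    Returns a list of (start_frame, end_frame, class_id) tuples.
    """
    segments = []
    active = False
    start = cls = None
    for t, frame in enumerate(probs):
        best = int(np.argmax(frame))
        if not active and frame[best] >= threshold:
            active, start, cls = True, t, best      # gesture start detected
        elif active and frame[best] < threshold:
            segments.append((start, t - 1, cls))    # gesture end detected
            active = False
    if active:                                      # gesture runs to sequence end
        segments.append((start, len(probs) - 1, cls))
    return segments
```

Because the decision uses only frames seen so far, a scheme like this can run online over an untrimmed stream, which is consistent with the real-time early-detection claim in the abstract.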
- Publication
Multimedia Tools & Applications, 2024, Vol 83, Issue 6, p16275
- ISSN
1380-7501
- Publication type
Article
- DOI
10.1007/s11042-023-16130-1