Title

Computer vision-based hybrid efficient convolution for isolated dynamic sign language recognition.

Authors

Chowdhury, Prothoma Khan; Oyshe, Kabiratun Ummi; Rahaman, Muhammad Aminur; Debnath, Tanoy; Rahman, Anichur; Kumar, Neeraj

Abstract

Isolated dynamic sign language recognition (IDSLR) has the potential to transform accessibility and inclusion by enabling speech- and/or hearing-impaired people to engage more fully in many spheres of life, including social interactions and work. IDSLR is challenging because recognizing a single gesture requires analyzing a sequence of image frames carrying multiple linguistic features, often against cluttered backgrounds and under varying illumination. We propose a Hybrid Efficient Convolution (HEC) model that combines EfficientNet-B3 with several modified layers as an alternative to traditional machine learning techniques, improving performance in cluttered backgrounds and illumination-varying environments. The HEC architecture integrates pre-trained EfficientNet-B3 layers loaded with customized weights and a new custom dense layer of 256 units, followed by batch normalization, dropout, and the final output layer. To enhance the robustness of the system, we apply data augmentation during pre-processing. The system then performs channel-wise feature transformation through point-wise convolution, which reduces computational complexity and increases accuracy. The custom 256-unit dense layer processes the output of the standard EfficientNet-B3, giving the model its hybrid form and improved performance. To train and evaluate the proposed model, we created our own gesture dataset, "BdSL_OPA_23_GESTURES," consisting of 6000 video clips of 100 isolated dynamic Bangla Sign Language words: 60 videos per word, recorded by 20 different people against cluttered backgrounds and under varying illumination. We use 80% of the dataset for training and the remaining 20% for testing and validation. Within a small number of epochs, the proposed HEC model achieves an accuracy of 93.17% on the BdSL_OPA_23_GESTURES dataset. The proposed model and dataset have been shared publicly with the scientific community at: https://github.com/Prothoma2001/Bangla-Continuous-Sign-Language-Recognition.git.
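
For readers who want a concrete picture of the head described in the abstract, below is a minimal Keras sketch of that architecture: pre-trained EfficientNet-B3 layers, a point-wise (1x1) convolution for channel-wise feature transformation, and a 256-unit dense layer followed by batch normalization, dropout, and the output layer. It is a frame-level sketch under stated assumptions, not the authors' implementation: ImageNet weights stand in for their customized weights, the dropout rate and pooling choice are not given in the abstract, and the temporal handling of video frame sequences is not shown.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 100  # one class per isolated dynamic Bangla Sign Language word

    # Pre-trained EfficientNet-B3 backbone; the paper loads customized weights,
    # so "imagenet" here is a stand-in assumption.
    backbone = tf.keras.applications.EfficientNetB3(
        include_top=False, weights="imagenet", input_shape=(300, 300, 3))

    inputs = tf.keras.Input(shape=(300, 300, 3))
    x = backbone(inputs)
    # Point-wise (1x1) convolution: channel-wise feature transformation that
    # mixes channels without spatial kernels, keeping computational cost low.
    x = layers.Conv2D(256, kernel_size=1, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)  # pooling choice is an assumption
    # Custom dense layer with 256 units, then batch normalization and dropout,
    # as described in the abstract.
    x = layers.Dense(256, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)  # rate assumed; not given in the abstract
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

Placing the 1x1 convolution ahead of pooling and the dense layer is one plausible reading of the abstract's "channel-wise feature transformation through point-wise convolution": it remixes the backbone's channels while keeping the head's parameter count far below what flattening the full feature map would require.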

Subjects

SIGN language; BENGALI language; HYBRID computers (Computer architecture); VIDEO excerpts; SPEECH

Publication

Neural Computing & Applications, 2024, Vol 36, Issue 32, p19951

ISSN

0941-0643

Publication type

Academic Journal

DOI

10.1007/s00521-024-10258-3
