- Title
A deep learning approach to dysarthric utterance classification with BiLSTM-GRU, speech cue filtering, and log mel spectrograms.
- Authors
Mehra, Sunakshi; Ranga, Virender; Agarwal, Ritu
- Abstract
Assessing the intelligibility of dysarthric speech, characterized by intricate speaking rhythms, presents formidable challenges. Current techniques for objectively testing speech intelligibility are burdensome and subjective, and they particularly struggle with dysarthric spoken utterances. To tackle these hurdles, our method conducts an ablation analysis across speakers afflicted with speech impairment. We utilize a unified approach that incorporates both auditory and visual elements to improve the classification of dysarthric spoken utterances. To enhance spoken utterance recognition, we propose two distinctive extractive transformer-based approaches. First, we employ SepFormer to refine the speech signal, prioritizing the enhancement of signal clarity. Next, we convert the improved audio samples into log mel spectrograms and feed them into a Swin transformer. We harness the Swin transformer for visual classification, pre-trained on a dataset of 14 million annotated images from ImageNet. The pre-trained scores from the Swin transformer serve as input to a deep bidirectional long short-term memory with gated recurrent unit (deep BiLSTM-GRU) model, which classifies the spoken utterances. Our proposed deep BiLSTM-GRU model yields impressive results on the EasyCall speech corpus, covering sets of 10 to 20 spoken utterances delivered by both healthy individuals and those with dysarthria. Notably, our results show an accuracy of 98.56% for 20 utterances in male speakers, 95.11% in female speakers, and 97.64% in combined male and female speakers. Across diverse scenarios, our approach consistently achieves high accuracy, surpassing other contemporary methods, all without requiring data augmentation.
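The log mel spectrogram step in the pipeline above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the parameters (16 kHz sampling rate, 512-point FFT, 160-sample hop, 40 mel bands) are common defaults chosen here for illustration, and the sine-wave input merely stands in for an enhanced utterance.

```python
import numpy as np

# Mel scale conversions (standard 2595 * log10 formula)
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Slice the signal into overlapping Hann-windowed frames
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1)
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Apply filterbank and take the log (small epsilon avoids log(0))
    return np.log(spec @ fbank.T + 1e-10)

# One second of synthetic audio standing in for an enhanced utterance
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)
feats = log_mel_spectrogram(audio, sr=sr)
print(feats.shape)  # (frames, mel bands) -> (97, 40)
```

In the described pipeline, the resulting (frames x mel bands) matrix would be rendered as a spectrogram image for the Swin transformer, whose output scores then feed the deep BiLSTM-GRU classifier.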
- Subjects
DEEP learning; INTELLIGIBILITY of speech; TRANSFORMER models; SPEECH; DATA augmentation; SPECTROGRAMS; POWER transformers
- Publication
Journal of Supercomputing, 2024, Vol 80, Issue 10, p14520
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-024-06015-x