Rajan, Rajeev; Hridya Raj, T. V.

doi:10.1007/s10772-023-10071-8

Title: SENet-based speech emotion recognition using synthesis-style transfer data augmentation.
Authors: Rajan, Rajeev; Hridya Raj, T. V.
Abstract: This paper addresses speech emotion recognition using a channel-attention mechanism combined with a synthesis-based data augmentation approach. A convolutional neural network (CNN) produces a channel attention map by exploiting the inter-channel relationships of its features. The main issue in the speech emotion recognition domain is insufficient data for building an efficient model. The proposed work uses a style transfer scheme to achieve data augmentation through multi-voice synthesis from text. The scheme consists of text-to-speech (TTS) and style transfer modules. In the front end, a TTS converter generates synthesized speech from the text in a target speaker's voice. The style-transfer module then imparts emotion to the synthesized speech based on the emotional content fed to it. The text-to-speech module is trained using the LibriSpeech and NUS-48E corpora. The quality of the synthesized speech samples is rated by subjective evaluation using the mean opinion score (MOS). The speech emotion recognition approach is systematically evaluated using the Berlin EMO-DB corpus. The channel-attention-based Squeeze and Excitation Network (SENet) shows promise in the speech emotion recognition experiment.
Publication: International Journal of Speech Technology, 2023, Vol 26, Issue 4, p1017
ISSN: 1381-2416
Publication type: Article
DOI: 10.1007/s10772-023-10071-8
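
Illustrative note: the abstract describes channel attention that exploits inter-channel relationships of CNN features. The record gives no architectural details, so the following is only a minimal sketch of a generic squeeze-and-excitation step, not the authors' implementation. The function name `se_block`, the reduction ratio, the feature-map size, and the random weights are all assumptions made for illustration.

```python
import math
import random


def se_block(x, w1, b1, w2, b2):
    """Generic squeeze-and-excitation step (illustrative, hypothetical helper).

    x  : feature map as a list of C channels, each a list of H rows of W floats
    w1 : (C // r) x C weight matrix of the bottleneck layer, b1 its bias
    w2 : C x (C // r) weight matrix of the expansion layer, b2 its bias
    Returns the reweighted feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # Excitation, part 1: bottleneck fully connected layer with ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
         for row, b in zip(w1, b1)]

    # Excitation, part 2: expansion layer with sigmoid, one weight per channel.
    s = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, h)) + b)))
         for row, b in zip(w2, b2)]

    # Scale: multiply every value in a channel by that channel's weight.
    y = [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]
    return y, s


# Toy example: sizes and weights are assumed, not taken from the paper.
random.seed(0)
C, H, W, r = 8, 4, 4, 4
x = [[[random.gauss(0.0, 1.0) for _ in range(W)] for _ in range(H)]
     for _ in range(C)]
w1 = [[random.gauss(0.0, 0.1) for _ in range(C)] for _ in range(C // r)]
b1 = [0.0] * (C // r)
w2 = [[random.gauss(0.0, 0.1) for _ in range(C // r)] for _ in range(C)]
b2 = [0.0] * C

y, s = se_block(x, w1, b1, w2, b2)
print([round(v, 3) for v in s])  # per-channel attention weights in (0, 1)
```

In a trained network the two weight matrices are learned, so the channel weights emphasize feature maps that are informative for the task and suppress the rest.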
