- Title
Modeling Speech Emotion Recognition via Attention-Oriented Parallel CNN Encoders.
- Authors
Makhmudov, Fazliddin; Kutlimuratov, Alpamis; Akhmedov, Farkhod; Abdallah, Mohamed S.; Cho, Young-Im
- Abstract
Accurately learning human emotions from speech is an essential function of modern speech emotion recognition (SER) models. Deriving and interpreting the crucial speech features from raw speech data is therefore a difficult modeling task when the goal is to improve performance. In this study, we developed a novel SER model based on attention-oriented parallel convolutional neural network (CNN) encoders that acquire, in parallel, the important features used for emotion classification. Specifically, MFCC, paralinguistic, and speech spectrogram features were derived and encoded by a separate CNN architecture designed for each feature type; the encoded features were then fed to attention mechanisms for further representation and finally classified. Experiments were conducted on the open EMO-DB and IEMOCAP datasets, and the results showed that the proposed model is more efficient than the baseline models. In particular, the weighted accuracy (WA) and unweighted accuracy (UA) of the proposed model were 71.8% and 70.9%, respectively, on the EMO-DB dataset, and 72.4% and 71.1%, respectively, on the IEMOCAP dataset.
- Subjects
EMOTION recognition; CONVOLUTIONAL neural networks; AUTOMATIC speech recognition; ARTIFICIAL neural networks; EMOTIONS; SPEECH
- Publication
Electronics (2079-9292), 2022, Vol 11, Issue 23, p4047
- ISSN
2079-9292
- Publication type
Article
- DOI
10.3390/electronics11234047