- Title
Automatic phoneme recognition by deep neural networks.
- Authors
Pereira, Bianca Valéria L.; de Carvalho, Mateus B. F.; Alves, Pedro Augusto A. da S. de A. Nava; Ribeiro, Paulo Rogerio de A.; de Oliveira, Alexandre Cesar M.; de Almeida Neto, Areolino
- Abstract
This work presents a lightweight phoneme recognition model based on object detection techniques. The model is designed primarily to run on devices with low processing power, such as tablets and mobile phones. The combination of hardware-aware network architecture search complemented by the NetAdapt algorithm led to the adoption of a simpler, lighter network architecture, MobileNet. The MobileNetV3 convolutional network architecture was combined with the Single-Shot Detector (SSD). The model was trained on the TIMIT and LibriSpeech databases, both of which contain spoken English audio. To generate a graphical representation of each audio file, its spectrogram was computed on the Mel scale. To train the phoneme localization algorithm, the temporal position of each phoneme's occurrence in its respective spectrogram is used. Additionally, the training dataset had to be enlarged to improve the generalization of the model; therefore, the two databases were combined and data augmentation techniques were applied to the audio. With this lightweight setup, the MobileNetV3-Large architecture achieved 0.72 mAP@0.5 IoU; for comparison, the MobileNetV3-Small architecture achieved 0.63 mAP@0.5 IoU.
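The Mel-scale spectrogram step described in the abstract can be sketched in plain NumPy. This is not the authors' code; the FFT size, hop length, number of Mel bands, and sample rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard Mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters evenly spaced on the Mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project the power spectrum onto the Mel filterbank, then take the log
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# Example: 1 second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)  # (time frames, Mel bands)
```

In a detection-style pipeline like the one described, each spectrogram then serves as the input image, with phoneme start/end times converted to bounding boxes along the time axis.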
- Subjects
ARTIFICIAL neural networks; PHONEME (Linguistics); DATA augmentation; COMPUTER performance; CELL phones
- Publication
The Journal of Supercomputing, 2024, Vol 80, Issue 11, p16654
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-024-06098-6