- Title
Automatic phoneme recognition by deep neural networks.
- Authors
Pereira, Bianca Valéria L.; de Carvalho, Mateus B. F.; Alves, Pedro Augusto A. da S. de A. Nava; Ribeiro, Paulo Rogerio de A.; de Oliveira, Alexandre Cesar M.; de Almeida Neto, Areolino
- Abstract
This work presents a lightweight phoneme recognition model based on object detection techniques. The model is designed primarily to run on devices with low processing power, such as tablets and mobile phones. The combination of hardware-aware network architecture search complemented by the NetAdapt algorithm led to the adoption of a simpler, lighter network architecture, MobileNet. The MobileNetV3 convolutional network architecture was combined with the Single-Shot Detector (SSD). The model was trained on the TIMIT and LibriSpeech databases, both of which contain spoken English audio. To generate a graphical representation of each audio file, its spectrogram was computed on the Mel scale. To train the phoneme localization algorithm, the temporal position of each phoneme's occurrence in its respective spectrogram is used. Additionally, the training dataset had to be enlarged to improve the generalization of the model; therefore, the two databases were combined and data augmentation techniques were applied to the audio. With this lightweight setup, the MobileNetV3-Large architecture achieved 0.72 mAP@0.5 IoU; for comparison, the MobileNetV3-Small architecture achieved 0.63 mAP@0.5 IoU.
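The Mel-scale spectrogram step described in the abstract can be sketched in plain NumPy. This is not the authors' code; the FFT size, hop length, number of Mel bands, and sample rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard Mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters evenly spaced on the Mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project the power spectrum onto the Mel filterbank, then take the log
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# Example: 1 second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)  # (time frames, Mel bands)
```

In a detection-style pipeline like the one described, each spectrogram then serves as the input image, with phoneme start/end times converted to bounding boxes along the time axis.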
- Subjects
ARTIFICIAL neural networks; PHONEME (Linguistics); DATA augmentation; COMPUTER performance; CELL phones
- Publication
The Journal of Supercomputing, 2024, Vol 80, Issue 11, p16654
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-024-06098-6