- Title
Current advances and algorithmic solutions in speech generation.
- Authors
Oralbekova, Dina; Mamyrbayev, Orken; Kassymova, Dinara; Othman, Mohamed
- Abstract
Text-to-Speech (TTS) technology, which aims to reproduce a natural human voice from text, is in growing demand in natural language processing. The key criteria for evaluating synthesized speech are clarity and naturalness, which depend largely on how accurately the acoustic model of the speech generation system models intonation. This paper presents fundamental methods for building the acoustic model: concatenative and parametric speech synthesis, speech synthesis based on hidden Markov models, and deep learning approaches such as end-to-end models. It also discusses metrics for evaluating the quality of synthesized voice and gives brief overviews of modern deep learning text-to-speech architectures, such as WaveNet, Tacotron, and Deep Voice, which demonstrate quality ratings close to those of professionally recorded speech.
- Subjects
Deep learning; Natural language processing; Speech; Speech synthesis; Hidden Markov models; Acoustic models
- Publication
Vibroengineering Procedia, 2024, Vol 54, Issue 1, p160
- ISSN
2345-0533
- Publication type
Article
- DOI
10.21595/vp.2024.23940