- Title
Mouth2Audio: intelligible audio synthesis from videos with distinctive vowel articulation.
- Authors
Garg, Saurabh; Ruan, Haoyao; Hamarneh, Ghassan; Behne, Dawn M.; Jongman, Allard; Sereno, Joan; Wang, Yue
- Abstract
Humans use both auditory and facial cues to perceive speech, especially when auditory input is degraded, indicating a direct association between visual articulatory and acoustic speech information. This study investigates how well an audio signal of a word can be synthesized from visual speech cues. Specifically, we synthesized audio waveforms of the vowels in monosyllabic English words from motion trajectories extracted from image sequences in video recordings of the same words. The articulatory movements were recorded in two speech styles: plain and clear. We designed a deep network trained on mouth landmark motion trajectories with a spectrogram- and formant-based custom loss, trained separately for each speech style. Human and automatic evaluations show that our framework can generate identifiable audio of the target vowels from distinct mouth landmark movements. Our results further demonstrate that intelligible audio can be synthesized for unseen talkers independent of the training data.
- Subjects
VOWELS; SPEECH; VIDEO recording; INTELLIGIBILITY of speech; ENGLISH language; EYE tracking
- Publication
International Journal of Speech Technology, 2023, Vol. 26, Issue 2, p. 459
- ISSN
1381-2416
- Publication type
Article
- DOI
10.1007/s10772-023-10030-3