Wu, Zhizheng; Chng, Eng; Li, Haizhou

doi:10.1007/s11042-014-2180-2

Title: Exemplar-based voice conversion using joint nonnegative matrix factorization.
Authors: Wu, Zhizheng; Chng, Eng; Li, Haizhou
Abstract: Exemplar-based sparse representation is a nonparametric framework for voice conversion. In this framework, a target spectrum is generated as a weighted linear combination of a set of basis spectra, namely exemplars, extracted from the training data. This framework adopts coupled source-target dictionaries consisting of acoustically aligned source-target exemplars, and assumes they can share the same activation matrix. At runtime, a source spectrogram is factorized as a product of the source dictionary and the common activation matrix, which is applied to the target dictionary to generate the target spectrogram. In practice, either low-resolution mel-scale filter bank energies or high-resolution spectra are adopted in the source dictionary. Low-resolution features are flexible in capturing the temporal information without increasing the computational cost and the memory occupation significantly, while high-resolution spectra contain significant spectral details. In this paper, we propose a joint nonnegative matrix factorization technique to find the common activation matrix using low- and high-resolution features at the same time. In this way, the common activation matrix is able to benefit from low- and high-resolution features directly. We conducted experiments on the VOICES database to evaluate the performance of the proposed method. Both objective and subjective evaluations confirmed the effectiveness of the proposed methods.
Subjects: SPEECH synthesis; SPECTROGRAMS; VOICEPRINTS; FACTORIZATION; VOICE analysis; MATHEMATICAL models
Publication: Multimedia Tools & Applications, 2015, Vol 74, Issue 22, p9943
ISSN: 1380-7501
Publication type: Article
DOI: 10.1007/s11042-014-2180-2
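
The framework summarized in the abstract can be illustrated with a minimal sketch. It assumes a Euclidean cost with an L1 sparsity penalty and multiplicative updates, and it combines the low- and high-resolution terms by weighted stacking. None of these choices is stated in the abstract, so the paper's actual cost function and update rule may differ. All names (`estimate_activations`, `joint_activations`, `weight`, `sparsity`) and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_activations(dictionary, spectrogram, sparsity=0.0, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations H with the dictionary held fixed.

    Minimizes 0.5 * ||X - A H||^2 + sparsity * sum(H) by multiplicative updates
    (an assumed cost; the abstract does not specify one).
    """
    rng = np.random.default_rng(0)
    n_exemplars, n_frames = dictionary.shape[1], spectrogram.shape[1]
    H = rng.random((n_exemplars, n_frames)) + eps
    AtX = dictionary.T @ spectrogram
    AtA = dictionary.T @ dictionary
    for _ in range(n_iter):
        # The update multiplies by a nonnegative ratio, so H stays nonnegative.
        H *= AtX / (AtA @ H + sparsity + eps)
    return H


def joint_activations(A_low, X_low, A_high, X_high, weight=1.0, **kwargs):
    """Find one activation matrix shared by low- and high-resolution features.

    Stacking the scaled low-resolution rows under the high-resolution rows makes
    a single factorization minimize the weighted sum of both reconstruction errors.
    """
    scale = np.sqrt(weight)
    stacked_dictionary = np.vstack([A_high, scale * A_low])
    stacked_features = np.vstack([X_high, scale * X_low])
    return estimate_activations(stacked_dictionary, stacked_features, **kwargs)


# Synthetic stand-ins for aligned exemplar dictionaries (columns are exemplars).
rng = np.random.default_rng(1)
n_exemplars, n_frames = 40, 10
A_src_low = rng.random((23, n_exemplars))    # e.g. mel-scale filter bank energies
A_src_high = rng.random((65, n_exemplars))   # e.g. high-resolution spectra
A_tgt = rng.random((65, n_exemplars))        # target exemplars paired with the source

# A synthetic source utterance built from a sparse set of exemplars.
H_true = rng.random((n_exemplars, n_frames)) * (rng.random((n_exemplars, n_frames)) < 0.1)
X_low, X_high = A_src_low @ H_true, A_src_high @ H_true

# Runtime: factorize the source features, then apply H to the target dictionary.
H = joint_activations(A_src_low, X_low, A_src_high, X_high, weight=1.0, sparsity=0.01)
Y_hat = A_tgt @ H
```

Because the source and target dictionaries are aligned column by column, the same `H` that reconstructs the source features selects the corresponding target exemplars. The `weight` parameter is an assumed knob for trading off the low-resolution term against the high-resolution one.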
