- Title
Exemplar-based voice conversion using joint nonnegative matrix factorization.
- Authors
Wu, Zhizheng; Chng, Eng; Li, Haizhou
- Abstract
Exemplar-based sparse representation is a nonparametric framework for voice conversion. In this framework, a target spectrum is generated as a weighted linear combination of a set of basis spectra, namely exemplars, extracted from the training data. This framework adopts coupled source-target dictionaries consisting of acoustically aligned source-target exemplars, and assumes they can share the same activation matrix. At runtime, a source spectrogram is factorized as a product of the source dictionary and the common activation matrix, which is applied to the target dictionary to generate the target spectrogram. In practice, either low-resolution mel-scale filter bank energies or high-resolution spectra are adopted in the source dictionary. Low-resolution features are flexible in capturing the temporal information without increasing the computational cost and the memory occupation significantly, while high-resolution spectra contain significant spectral details. In this paper, we propose a joint nonnegative matrix factorization technique to find the common activation matrix using low- and high-resolution features at the same time. In this way, the common activation matrix is able to benefit from low- and high-resolution features directly. We conducted experiments on the VOICES database to evaluate the performance of the proposed method. Both objective and subjective evaluations confirmed the effectiveness of the proposed method.
- Subjects
SPEECH synthesis; SPECTROGRAMS; VOICEPRINTS; FACTORIZATION; VOICE analysis; MATHEMATICAL models
- Publication
Multimedia Tools & Applications, 2015, Vol 74, Issue 22, p9943
- ISSN
1380-7501
- Publication type
Article
- DOI
10.1007/s11042-014-2180-2
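
The runtime procedure summarized in the abstract (factorize the source spectrogram over fixed source dictionaries, then apply the shared activation matrix to the target dictionary) might be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the abstract does not state the cost function or update rule, so the KL-divergence multiplicative updates, the `weight` and `sparsity` parameters, and all function and variable names here are assumptions. Stacking the high- and low-resolution features row-wise is one simple way to make both contribute to a single activation matrix.

```python
import numpy as np

def estimate_activations(X, A, n_iter=200, sparsity=0.0, eps=1e-12):
    """Estimate nonnegative H such that X ~= A @ H, with the dictionary A fixed.

    Uses multiplicative updates for the KL divergence (an assumed choice).
    X: (n_features, n_frames), A: (n_features, n_exemplars).
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    # Column sums of A plus an optional L1 sparsity penalty (hypothetical parameter).
    denom = A.sum(axis=0)[:, None] + sparsity + eps
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)
        H *= (A.T @ ratio) / denom
    return H

def joint_convert(X_high, X_low, A_high, A_low, B_target, weight=1.0, **kwargs):
    """Find one activation matrix from both feature resolutions, then convert.

    X_high / A_high: high-resolution source spectra and source dictionary.
    X_low / A_low:   low-resolution (e.g. mel filter bank) counterparts.
    B_target:        target dictionary aligned exemplar-by-exemplar with A_high.
    weight:          assumed relative weight of the low-resolution term.
    """
    X = np.vstack([X_high, weight * X_low])
    A = np.vstack([A_high, weight * A_low])
    H = estimate_activations(X, A, **kwargs)
    # The shared activations are applied to the target dictionary.
    return B_target @ H, H

if __name__ == "__main__":
    # Toy data with made-up sizes; real spectra would replace these arrays.
    rng = np.random.default_rng(42)
    A_high, A_low = rng.random((64, 50)), rng.random((12, 50))
    B_target = rng.random((64, 50))
    H_true = rng.random((50, 20))
    Y, H = joint_convert(A_high @ H_true, A_low @ H_true, A_high, A_low, B_target)
    print(Y.shape, H.shape)
```

Because the dictionaries are held fixed, only the activation matrix is updated at runtime, and because the KL divergence scales linearly with its arguments, scaling the low-resolution rows by `weight` corresponds to weighting that term in a summed objective.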