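Title: Multi-level disentangled personalized speech synthesis for out-of-domain speaker adaptation scenarios (面向域外说话人适应场景的多层级解耦个性化语音合成)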
Authors: 高盛祥; 杨元樟; 王琳钦; 莫尚斌; 余正涛; 董凌
Abstract: Personalized speech synthesis aims to generate speech with a specific speaker's characteristics. Traditional approaches often show noticeable timbre disparities when synthesizing speech for unseen speakers, and speaker-specific timbre features remain difficult to disentangle. This paper proposes a multi-level disentangled personalized speech synthesis approach designed for out-of-domain speakers. By fusing features at different granularities, the method improves synthesis for unseen speakers under zero-resource conditions. At the sentence level, fast Fourier convolution extracts global speaker features, which improves the model's generalization to unseen speakers and enables sentence-level speaker disentanglement. At the phoneme level, the method uses a speech recognition model to decouple speaker features and an attention mechanism to capture phoneme-level timbre features, achieving phoneme-level speaker disentanglement. Experimental results on the publicly available AISHELL3 dataset show that the proposed approach achieves a cosine similarity of 0.697 between speaker feature vectors in cross-speaker adaptation, a 6.25% improvement over the baseline model. This result demonstrates the method's ability to model timbre features of unseen speakers in cross-speaker adaptation scenarios.
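
The abstract reports its result as a cosine similarity between speaker feature vectors. As a minimal sketch of how such a score could be computed, assuming each utterance has already been mapped to a fixed-length speaker embedding by some speaker encoder (the record names no encoder, and the function names `cosine_similarity` and `mean_similarity` are hypothetical, not taken from the paper):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_similarity(pairs):
    """Average similarity over (reference, synthesized) embedding pairs.

    Assumption: averaging over pairs is one plausible way to summarize
    a test set; the record does not state how the paper aggregates scores.
    """
    scores = [cosine_similarity(ref, syn) for ref, syn in pairs]
    return sum(scores) / len(scores)
```

A score near 1 means the synthesized speech's embedding points in nearly the same direction as the reference speaker's embedding; the toy vectors used to check this sketch are illustrative only and are not data from the paper.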
Subjects: SPEECH synthesis; PHONEME (Linguistics); GENERALIZATION; SPEECH perception
Publication: Journal of Guangxi Normal University - Natural Science Edition, 2024, Vol 42, Issue 4, p11
ISSN: 1001-6600
Publication type: Article
DOI: 10.16088/j.issn.1001-6600.2023111303
