AdaVocoder: Adaptive Vocoder for Custom Voice

Xin Yuan; Yongbing Feng; Mingming Ye; Cheng Tuo; Minghang Zhang

arXiv:2203.09825 · cs.SD · January 6, 2023

TL;DR

This paper introduces AdaVocoder, an adaptive neural vocoder for custom voice synthesis. It fine-tunes a pre-trained vocoder on limited target-speaker data and uses a cross-domain consistency loss to counter overfitting, especially in few-shot scenarios.

Contribution

The paper proposes a novel adaptive vocoder framework that effectively adapts to new speakers with limited data, addressing dataset mismatch and overfitting issues in neural vocoders.

Findings

01

AdaVocoder achieves high-quality voice synthesis with few-shot data.

02

Models pre-trained on large datasets can be effectively fine-tuned for new speakers.

03

Cross-domain consistency loss improves transfer learning stability.

Abstract

Custom voice aims to build a personal speech synthesis system by adapting a source speech synthesis model to a target speaker using only a few recordings of that speaker. The usual solution combines an adaptive acoustic model with a robust vocoder. However, training a robust vocoder usually requires a multi-speaker dataset that covers a wide range of age groups and timbres, so that the trained vocoder generalizes to unseen speakers. Collecting such a dataset is difficult, and its distribution always has some mismatch with the distribution of the target speaker's data. This paper approaches these problems from a different perspective and proposes an adaptive vocoder for custom voice. The adaptive vocoder mainly uses a cross-domain consistency loss to address the overfitting problem encountered by the GAN-based neural vocoder in the…
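The abstract names a cross-domain consistency loss but, as truncated here, does not give its form. The sketch below shows one formulation commonly used in few-shot GAN adaptation, offered as an assumption rather than the paper's definition: for the same batch of inputs, the pairwise similarity structure among the adapted generator's features is pulled toward that of the frozen source generator via a KL divergence. The function names, the choice of cosine similarity, and the use of plain feature vectors are all hypothetical.

```python
import math

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_distribution(feats, i):
    """Softmax over similarities between sample i and every other sample."""
    sims = [cosine(feats[i], feats[j]) for j in range(len(feats)) if j != i]
    return softmax(sims)

def cross_domain_consistency_loss(source_feats, adapted_feats, eps=1e-12):
    """Mean KL(adapted || source) over per-sample similarity distributions.

    Assumption: both feature lists come from the SAME batch of inputs,
    passed through the frozen source generator and the generator being
    fine-tuned, respectively.
    """
    n = len(source_feats)
    if n != len(adapted_feats) or n < 3:
        raise ValueError("need two equally sized batches of at least 3 samples")
    total = 0.0
    for i in range(n):
        p_src = similarity_distribution(source_feats, i)
        p_ada = similarity_distribution(adapted_feats, i)
        total += sum(
            p * (math.log(p + eps) - math.log(q + eps))
            for p, q in zip(p_ada, p_src)
        )
    return total / n

# Hypothetical 2-D features for a batch of three inputs.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adapted = [[1.0, 0.0], [1.0, 0.1], [-1.0, 1.0]]
print(cross_domain_consistency_loss(source, adapted))
```

In a real fine-tuning loop this term would presumably be added to the usual GAN vocoder losses with some weight, and the features would be intermediate generator activations rather than hand-written vectors. The loss is zero when the adapted model preserves the source model's relative similarities between samples, which is the property that discourages collapse onto the few target recordings.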

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing