Generating Novel and Realistic Speakers for Voice Conversion
Meiying Melissa Chen, Zhenyu Wang, Zhiyao Duan

TL;DR
This paper introduces SpeakerVAE, a lightweight hierarchical variational autoencoder that generates novel, realistic speaker representations for voice conversion (VC), enabling the creation of unseen voices without retraining existing VC models.
Contribution
We propose SpeakerVAE, a flexible plug-in module that models the speaker timbre space and generates new speakers for voice conversion without additional training of the base models.
Findings
Successfully generates novel speakers whose quality is comparable to that of the training speakers.
Compatible with multiple VC models, such as FACodec and CosyVoice2.
No need for co-training or fine-tuning of base VC systems.
Abstract
Voice conversion (VC) models modify timbre while preserving paralinguistic features, enabling applications such as dubbing and identity protection. However, most VC systems require access to target utterances, which limits their use when target data is unavailable or when users want to convert to an entirely novel, unseen voice. To address this, we introduce SpeakerVAE, a lightweight method for generating novel speakers for VC. Our approach uses a deep hierarchical variational autoencoder to model the speaker timbre space. By sampling from the trained model, we generate novel speaker representations for voice synthesis in a VC pipeline. The proposed method is a flexible plug-in module compatible with various VC models, requiring no co-training or fine-tuning of the base VC system. We evaluated our approach with two state-of-the-art VC models: FACodec and CosyVoice2. The results demonstrate that our method…
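The abstract describes sampling from a trained deep hierarchical VAE to obtain new speaker representations that an unchanged VC model then consumes. The sketch below illustrates only that sampling-and-plug-in flow, under loudly stated assumptions: a toy two-level top-down hierarchy with Gaussian priors, random weights standing in for a trained model, L2-normalized output vectors, and hypothetical names (`ToyHierarchicalDecoder`, `frozen_vc_stub`, `convert_to_novel_speaker`). It is not SpeakerVAE's actual architecture, and it does not use the FACodec or CosyVoice2 APIs.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def random_layer(out_dim: int, in_dim: int, rng: random.Random) -> Layer:
    # Random placeholder weights standing in for trained parameters (assumption).
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(layer: Layer, x: Vector) -> Vector:
    # Affine map y = W x + b.
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyHierarchicalDecoder:
    """Hypothetical two-level top-down generator: z_top -> z_bottom -> speaker embedding."""

    def __init__(self, top_dim: int, bottom_dim: int, emb_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.top_dim = top_dim
        self.prior_mu = random_layer(bottom_dim, top_dim, rng)
        self.prior_logvar = random_layer(bottom_dim, top_dim, rng)
        self.out = random_layer(emb_dim, top_dim + bottom_dim, rng)

    def sample(self, rng: random.Random) -> Vector:
        # 1) Top latent from the standard normal prior p(z_top) = N(0, I).
        z_top = [rng.gauss(0.0, 1.0) for _ in range(self.top_dim)]
        # 2) Bottom latent from the conditional prior p(z_bottom | z_top),
        #    a diagonal Gaussian whose mean and (clamped) log-variance depend on z_top.
        mu = linear(self.prior_mu, z_top)
        logvar = [max(-6.0, min(6.0, lv)) for lv in linear(self.prior_logvar, z_top)]
        z_bottom = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        # 3) Decode both latents into a speaker embedding.
        emb = linear(self.out, z_top + z_bottom)
        # 4) L2-normalize (assumption: the VC model expects unit-norm speaker vectors).
        norm = math.sqrt(sum(v * v for v in emb)) or 1.0
        return [v / norm for v in emb]


def frozen_vc_stub(source_audio: Vector, speaker_emb: Vector) -> Dict[str, Vector]:
    # Stand-in for a pretrained VC model that is never trained or modified here;
    # a real model would synthesize audio in the target voice.
    return {"content": source_audio, "speaker": speaker_emb}


def convert_to_novel_speaker(
    vc_model: Callable[[Vector, Vector], Dict[str, Vector]],
    source_audio: Vector,
    decoder: ToyHierarchicalDecoder,
    rng: random.Random,
) -> Dict[str, Vector]:
    # Sample a new speaker vector and pass it to the unchanged VC model.
    return vc_model(source_audio, decoder.sample(rng))


if __name__ == "__main__":
    rng = random.Random(42)
    decoder = ToyHierarchicalDecoder(top_dim=4, bottom_dim=8, emb_dim=16)
    result = convert_to_novel_speaker(frozen_vc_stub, [0.0, 0.1, -0.2], decoder, rng)
    print(len(result["speaker"]))  # 16
```

Because the VC side only ever receives a speaker vector, the generator can be swapped or retrained without touching the VC model, which mirrors the plug-in property the abstract describes.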
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Face recognition and analysis
