Boosting Star-GANs for Voice Conversion with Contrastive Discriminator
Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, and Jing Xiao

TL;DR
This paper introduces SimSiam-StarGAN-VC, a voice conversion framework that adds a contrastive-learning Siamese structure to the StarGAN-VC discriminator, which stabilizes training and counters discriminator overfitting. On the VCC 2018 dataset it outperforms existing StarGAN-VC methods on both objective and subjective metrics.
Contribution
It proposes a new contrastive learning-based discriminator for StarGAN-VC, improving training stability and conversion quality in nonparallel multi-domain voice conversion.
Findings
Outperforms existing StarGAN-VC methods on the VCC 2018 dataset
Improves training stability and prevents discriminator overfitting
Achieves higher objective and subjective quality metrics
Abstract
Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios. However, the training of these models usually poses a challenge due to their complicated adversarial network architectures. To address this, in this work we leverage the state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method is called SimSiam-StarGAN-VC and it boosts the training stability and effectively prevents the discriminator overfitting issue in the training process. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, plus a user study to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods in terms of both the objective and subjective metrics.
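The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of the standard SimSiam loss that the method's name points to: a symmetric negative cosine similarity between a predictor output from one view and an encoder output from the other. The function names and the vectors `p1`, `z1`, `p2`, `z2` are illustrative assumptions, not the authors' code, and how this term is combined with the adversarial discriminator loss is not stated here.

```python
import math


def neg_cosine(p, z):
    """Negative cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam-style loss for two views of the same input.

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names).
    z1, z2: encoder outputs for view 1 and view 2 (hypothetical names).
    In an autodiff framework z1 and z2 would be treated as constants
    (stop-gradient); plain Python has no gradients, so that step is
    only noted here.
    """
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)


# Illustrative vectors only; real inputs would be discriminator features.
loss = simsiam_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

The loss is bounded between -1 (the two views agree perfectly) and 1 (they point in opposite directions), so lower values mean the discriminator's representations of the two views are better aligned.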
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
Methods: Contrastive Learning · Siamese Network
