RAF: Relativistic Adversarial Feedback For Universal Speech Synthesis
Yongjoon Lee, Jung-Woo Choi

TL;DR
This paper introduces Relativistic Adversarial Feedback (RAF), a new training method for GAN vocoders that enhances speech synthesis quality and generalization by leveraging self-supervised models and relativistic pairing.
Contribution
RAF is a novel training objective that improves GAN vocoders' generalization and quality by integrating self-supervised learning and relativistic pairing techniques.
Findings
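RAF improves in-domain fidelity and generalization to unseen scenarios, with consistent gains in objective and subjective metrics across multiple datasets.
The RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality.
The RAF-trained BigVGAN-base achieves this using only 12% of the parameters of the LSGAN-trained BigVGAN.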
Abstract
We propose Relativistic Adversarial Feedback (RAF), a novel training objective for GAN vocoders that improves in-domain fidelity and generalization to unseen scenarios. Although modern GAN vocoders employ advanced architectures, their training objectives often fail to promote generalizable representations. RAF addresses this problem by leveraging speech self-supervised learning models to assist discriminators in evaluating sample quality, encouraging the generator to learn richer representations. Furthermore, we utilize relativistic pairing for real and fake waveforms to improve the modeling of the training data distribution. Experiments across multiple datasets show consistent gains in both objective and subjective metrics on GAN-based vocoders. Importantly, the RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality using only 12% of the parameters.…
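Illustration
The abstract does not state the exact form of the RAF objective, so the following is only a minimal sketch of what relativistic pairing can look like, assuming the standard relativistic GAN formulation in which the discriminator scores a real waveform relative to its paired fake one. The function names are hypothetical, the scores are plain arrays standing in for discriminator outputs, and the self-supervised model that assists the discriminator is omitted entirely.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def relativistic_d_loss(real_scores, fake_scores):
    # Assumed standard relativistic loss, not necessarily the RAF loss:
    # the discriminator is rewarded when each real score exceeds
    # the score of its paired fake sample.
    real = np.asarray(real_scores, dtype=float)
    fake = np.asarray(fake_scores, dtype=float)
    return float(np.mean(softplus(-(real - fake))))


def relativistic_g_loss(real_scores, fake_scores):
    # Mirror image: the generator is rewarded when each fake score
    # exceeds the score of its paired real sample.
    real = np.asarray(real_scores, dtype=float)
    fake = np.asarray(fake_scores, dtype=float)
    return float(np.mean(softplus(-(fake - real))))


# Hypothetical discriminator scores for two (real, fake) waveform pairs.
real_scores = [2.0, 1.0]
fake_scores = [0.0, -1.0]
d_loss = relativistic_d_loss(real_scores, fake_scores)
g_loss = relativistic_g_loss(real_scores, fake_scores)
```

In this formulation each loss depends only on the difference between paired scores, so the generator is pushed to close the gap to the specific real sample it is paired with rather than to hit a fixed target value, as it would under a least-squares (LSGAN) objective.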
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
