RAF: Relativistic Adversarial Feedback For Universal Speech Synthesis

Yongjoon Lee; Jung-Woo Choi

arXiv:2603.11678 · eess.AS · March 13, 2026

TL;DR

This paper introduces Relativistic Adversarial Feedback (RAF), a training objective for GAN vocoders that improves in-domain fidelity and generalization to unseen scenarios by using speech self-supervised learning models to assist the discriminators and by pairing real and fake waveforms relativistically.

Contribution

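RAF changes the training objective rather than the vocoder architecture: speech self-supervised learning models help the discriminators evaluate sample quality, which encourages the generator to learn richer representations, and real and fake waveforms are paired relativistically to better model the training data distribution.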

Findings

01

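RAF improves in-domain fidelity and generalization to unseen scenarios, with consistent gains in objective and subjective metrics across multiple datasets.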

02

The RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality.

03

The RAF-trained BigVGAN-base uses only 12% of the parameters of the LSGAN-trained BigVGAN it outperforms.

Abstract

We propose Relativistic Adversarial Feedback (RAF), a novel training objective for GAN vocoders that improves in-domain fidelity and generalization to unseen scenarios. Although modern GAN vocoders employ advanced architectures, their training objectives often fail to promote generalizable representations. RAF addresses this problem by leveraging speech self-supervised learning models to assist discriminators in evaluating sample quality, encouraging the generator to learn richer representations. Furthermore, we utilize relativistic pairing for real and fake waveforms to improve the modeling of the training data distribution. Experiments across multiple datasets show consistent gains in both objective and subjective metrics on GAN-based vocoders. Importantly, the RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality using only 12% of the parameters.…
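
The abstract does not give the exact form of the RAF loss, so the following is only a generic illustration of what relativistic pairing can look like, not the authors' formulation. It assumes the standard relativistic paired GAN loss, in which a discriminator score for a real waveform is compared against the score for its paired fake waveform instead of being pushed toward a fixed target as in LSGAN. The function names `softplus` and `relativistic_pair_losses` and the toy scores are hypothetical.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def relativistic_pair_losses(real_scores, fake_scores):
    """Generic relativistic paired GAN losses (assumed form, not RAF itself).

    real_scores[i] and fake_scores[i] are discriminator outputs for a
    real waveform and the generated waveform paired with it.
    """
    real_scores = np.asarray(real_scores, dtype=float)
    fake_scores = np.asarray(fake_scores, dtype=float)
    diff = real_scores - fake_scores
    # Discriminator wants each real sample to score higher than its fake pair.
    d_loss = softplus(-diff).mean()
    # Generator wants each fake sample to score higher than its real pair.
    g_loss = softplus(diff).mean()
    return d_loss, g_loss


# Toy scores: the discriminator clearly separates the pairs here,
# so its loss is small and the generator's loss is large.
d_loss, g_loss = relativistic_pair_losses([5.0, 4.0], [-5.0, -4.0])
print(f"d_loss={d_loss:.4f}, g_loss={g_loss:.4f}")
```

Because only the score difference within a pair matters, both losses equal log 2 when the discriminator cannot tell the paired samples apart, whatever the absolute score values are.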

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis