RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
Shengyuan Xu, Wenxiao Zhao, Jing Guo

TL;DR
RefineGAN is a neural vocoder that achieves high-fidelity waveform generation with accurate pitch and intensity, outperforming ground truth in subjective tests and generalizing well across languages and speakers.
Contribution
The paper introduces RefineGAN, a novel GAN-based neural vocoder with a pitch-guided refine architecture and multi-scale spectrogram loss for improved robustness and accuracy.
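To make the idea of a multi-scale spectrogram loss concrete, here is a minimal NumPy sketch. It is not the loss defined in the paper: the L1 distance on magnitude spectrograms, the Hann window, and the three (FFT size, hop) pairs are illustrative assumptions, and the names `stft_magnitude` and `multi_scale_spectrogram_loss` are hypothetical. It only shows the general technique of comparing generated and reference audio at several time-frequency resolutions.

```python
import numpy as np


def stft_magnitude(signal, n_fft, hop):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Result has shape (n_frames, n_fft // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_scale_spectrogram_loss(
    generated,
    reference,
    scales=((512, 128), (1024, 256), (2048, 512)),  # assumed (n_fft, hop) pairs
):
    """Mean L1 distance between magnitude spectrograms, averaged over scales."""
    losses = []
    for n_fft, hop in scales:
        gen_mag = stft_magnitude(generated, n_fft, hop)
        ref_mag = stft_magnitude(reference, n_fft, hop)
        losses.append(np.mean(np.abs(gen_mag - ref_mag)))
    return float(np.mean(losses))


if __name__ == "__main__":
    sample_rate = 16000  # assumed rate for the demo only
    t = np.arange(sample_rate) / sample_rate
    reference = np.sin(2 * np.pi * 220.0 * t)  # 220 Hz tone
    detuned = np.sin(2 * np.pi * 233.0 * t)    # pitch error
    quieter = 0.5 * reference                  # intensity error

    print("identical:", multi_scale_spectrogram_loss(reference, reference))
    print("detuned:  ", multi_scale_spectrogram_loss(detuned, reference))
    print("quieter:  ", multi_scale_spectrogram_loss(quieter, reference))
```

In this sketch both a pitch mismatch and an intensity mismatch raise the loss, which is the kind of sensitivity the summary attributes to a spectrogram-based objective. Short windows resolve timing more finely and long windows resolve frequency more finely, which is the usual motivation for averaging over several scales.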
Findings
Generated audio surpasses the ground truth in subjective quality.
The model maintains performance on unseen languages and speakers.
High-speed full-band audio generation is achieved.
Abstract
Most GAN (Generative Adversarial Network)-based approaches to high-fidelity waveform generation rely heavily on discriminators to improve their performance. However, GAN methods introduce much uncertainty into the generation process and often result in mismatches of pitch and intensity, which is fatal in sensitive use cases such as singing voice synthesis (SVS). To address this problem, we propose RefineGAN, a high-fidelity neural vocoder focused on robustness, pitch and intensity accuracy, and high-speed full-band audio generation. We applied a pitch-guided refine architecture with a multi-scale spectrogram-based loss function to help stabilize the training process and maintain the robustness of the neural vocoder while using the GAN-based training method. Audio generated using this method shows better performance in subjective tests when compared with the…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
