Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation
Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

TL;DR
This paper presents an improved unified source-filter GAN (uSFGAN) neural vocoder that models the periodic and aperiodic source excitation components separately and adopts the adversarial training procedure of HiFiGAN, improving sound quality while preserving voice controllability.
Contribution
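It introduces a new source excitation generation network that generates the harmonic (periodic) and noise (aperiodic) components separately, and it replaces the Parallel WaveGAN training procedure of the original uSFGAN with that of HiFiGAN, significantly improving sound quality over the original model.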
Findings
Enhanced sound quality demonstrated by objective and subjective evaluations.
Maintains voice controllability despite improved sound quality.
Significant improvements over previous uSFGAN models.
Abstract
This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism. In our previous work, we proposed unified Source-Filter GAN (uSFGAN) for developing a high-fidelity neural vocoder with flexible voice controllability using a unified source-filter neural network architecture. However, the capability of uSFGAN to model the aperiodic source excitation signal is insufficient, and there is still a gap in sound quality between the natural and generated speech. To improve the source excitation modeling and generated sound quality, a new source excitation generation network separately generating periodic and aperiodic components is proposed. The advanced adversarial training procedure of HiFiGAN is also adopted to replace that of Parallel WaveGAN used in the original uSFGAN. Both objective and subjective evaluation results show that the…
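To make the harmonic-plus-noise idea concrete, the following is a minimal signal-level sketch, not the paper's network: it builds a periodic component as a sine wave driven by a frame-level F0 contour and an aperiodic component as Gaussian noise, then sums them. In the paper both components are generated by a learned source excitation network; the function name, the sample rate, the hop size, and the noise level here are illustrative assumptions only.

```python
import math
import random


def harmonic_plus_noise_excitation(f0_frames, sample_rate=24000, hop_size=120,
                                   noise_std=0.1, seed=0):
    """Hypothetical helper: build a toy harmonic-plus-noise excitation.

    f0_frames: per-frame F0 values in Hz, with 0 marking unvoiced frames.
    Returns (harmonic, noise, excitation) as lists of float samples.
    All default values are assumptions, not settings from the paper.
    """
    rng = random.Random(seed)
    harmonic, noise = [], []
    phase = 0.0
    for f0 in f0_frames:
        # Hold each frame-level F0 value for hop_size samples.
        for _ in range(hop_size):
            if f0 > 0:
                # Periodic part: advance the phase by the instantaneous frequency.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                harmonic.append(math.sin(phase))
            else:
                # Unvoiced frames carry no periodic component.
                harmonic.append(0.0)
            # Aperiodic part: zero-mean Gaussian noise at every sample.
            noise.append(rng.gauss(0.0, noise_std))
    # The source excitation is the sum of the two components.
    excitation = [h + n for h, n in zip(harmonic, noise)]
    return harmonic, noise, excitation


# One unvoiced frame followed by one voiced frame at 200 Hz.
harmonic, noise, excitation = harmonic_plus_noise_excitation([0.0, 200.0])
```

Keeping the two components separate until the final sum mirrors the motivation stated in the abstract: the periodic part follows F0, which is what makes pitch controllable, while the aperiodic part is modeled on its own rather than being absorbed into a single excitation signal.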
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: WGAN-GP Loss · Dense Connections · Convolution · Dropout · Tanh Activation · Phase Shuffle · WaveGAN
