Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

Reo Yoneyama; Yi-Chiao Wu; Tomoki Toda

arXiv:2205.06053·cs.SD·July 4, 2022

TL;DR

This paper presents an improved unified source-filter GAN (uSFGAN) neural vocoder that models the periodic and aperiodic source excitation components separately and adopts the adversarial training procedure of HiFiGAN, improving sound quality while preserving voice controllability.

Contribution

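It introduces a new source excitation generation network that generates the periodic (harmonic) and aperiodic (noise) components separately, and it replaces the Parallel WaveGAN adversarial training of the original uSFGAN with that of HiFiGAN, significantly improving sound quality over the original uSFGAN.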

Findings

01

Enhanced sound quality demonstrated by objective and subjective evaluations.

02

Maintains voice controllability despite improved sound quality.

03

Significant improvements over previous uSFGAN models.

Abstract

This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism. In our previous work, we proposed unified Source-Filter GAN (uSFGAN) for developing a high-fidelity neural vocoder with flexible voice controllability using a unified source-filter neural network architecture. However, the capability of uSFGAN to model the aperiodic source excitation signal is insufficient, and there is still a gap in sound quality between the natural and generated speech. To improve the source excitation modeling and generated sound quality, a new source excitation generation network separately generating periodic and aperiodic components is proposed. The advanced adversarial training procedure of HiFiGAN is also adopted to replace that of Parallel WaveGAN used in the original uSFGAN. Both objective and subjective evaluation results show that the…
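To make the harmonic-plus-noise idea in the abstract concrete, here is a minimal signal-level sketch. It is not the paper's method: the paper uses a learned network to generate the periodic and aperiodic components, whereas this toy example simply sums a sine wave driven by an F0 contour and Gaussian noise. The function name `harmonic_plus_noise_excitation`, the 24 kHz sample rate, and the noise level are all assumptions chosen for illustration.

```python
import math
import random


def harmonic_plus_noise_excitation(f0, sample_rate=24000, noise_std=0.03, seed=0):
    """Toy excitation from a per-sample F0 contour in Hz (0 means unvoiced).

    Hypothetical helper for illustration only; the paper generates both
    components with a neural network rather than with fixed formulas.
    """
    rng = random.Random(seed)
    phase = 0.0
    periodic, aperiodic = [], []
    for f in f0:
        if f > 0:
            # Accumulate phase so the sine stays continuous when F0 changes.
            phase = (phase + 2.0 * math.pi * f / sample_rate) % (2.0 * math.pi)
            periodic.append(math.sin(phase))
        else:
            # Unvoiced samples carry no periodic component.
            periodic.append(0.0)
        # The aperiodic component is plain Gaussian noise in this sketch.
        aperiodic.append(rng.gauss(0.0, noise_std))
    # The excitation is the sample-wise sum of the two components.
    excitation = [p + n for p, n in zip(periodic, aperiodic)]
    return periodic, aperiodic, excitation


# 10 ms voiced at 200 Hz followed by 10 ms unvoiced (assumed 24 kHz sampling).
f0 = [200.0] * 240 + [0.0] * 240
periodic, aperiodic, excitation = harmonic_plus_noise_excitation(f0)
```

Keeping the two components in separate lists mirrors the separation the abstract describes: each one can be inspected or modified on its own before they are summed.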

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: WGAN-GP Loss · Dense Connections · Convolution · Dropout · Tanh Activation · Phase Shuffle · WaveGAN