Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks
Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku

TL;DR
This paper explores using GANs for waveform generation in text-to-speech synthesis, focusing on efficiency and quality, and finds that GAN-based glottal excitation models can match WaveNet vocoders in quality.
Contribution
The paper introduces a GAN-based approach to waveform generation for TTS, applied in both the speech signal and glottal excitation domains, and shows that the glottal excitation model reaches quality on par with a WaveNet vocoder.
Findings
A GAN-based glottal excitation model achieves quality and voice similarity comparable to a WaveNet vocoder.
Direct waveform generation with GANs is still far behind WaveNet in quality.
Parallel inference with GANs offers computational advantages.
Abstract
The state-of-the-art in text-to-speech synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. However, these methods suffer from their slow sequential inference process, while their parallel versions are difficult to train and even more expensive computationally. Meanwhile, generative adversarial networks (GANs) have achieved impressive results in image generation and are making their way into audio applications; parallel inference is among their lucrative properties. By adopting recent advances in GAN training techniques, this investigation studies waveform generation for TTS in two domains (speech signal and glottal excitation). Listening test results show that while direct waveform generation with GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet…
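The contrast the abstract draws between sequential and parallel inference can be illustrated with a toy sketch. This is not the paper's model: the function names, the linear-plus-tanh "networks", and all parameter values below are assumptions chosen only to show why an autoregressive generator needs one step per sample, while a feedforward generator can compute every sample from its input in a single pass.

```python
import numpy as np


def autoregressive_generate(weights, n_samples):
    """Toy sequential generator: each sample depends on previous outputs.

    The loop cannot be parallelised over time, because out[t] needs
    out[t-1], out[t-2], ... to exist first.
    """
    order = len(weights)
    out = np.zeros(n_samples + order)
    out[:order] = 1.0  # arbitrary seed history (assumption)
    for t in range(order, n_samples + order):
        history = out[t - order:t][::-1]  # most recent sample first
        out[t] = np.tanh(np.dot(weights, history))
    return out[order:]


def parallel_generate(kernel, noise):
    """Toy feedforward generator: output depends only on the input noise.

    No output sample feeds back into another, so the whole waveform
    is produced by one vectorised convolution.
    """
    return np.tanh(np.convolve(noise, kernel, mode="same"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.array([0.5, -0.3, 0.1])        # hypothetical AR weights
    kernel = np.array([0.2, 0.5, 1.0, 0.5, 0.2])  # hypothetical filter
    noise = rng.standard_normal(16000)

    sequential = autoregressive_generate(weights, 16000)  # 16000 loop steps
    parallel = parallel_generate(kernel, noise)           # one pass
    print(sequential.shape, parallel.shape)
```

In the sequential function the number of dependent steps grows with the waveform length, which is the cost the abstract attributes to WaveNet-style inference; in the parallel function the work can be spread across samples.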
Taxonomy
Methods: Mixture of Logistic Distributions · Convolution · Dilated Causal Convolution · WaveNet
