Voice command generation using Progressive WaveGANs
Thomas Wiest, Nicholas Cummins, Alice Baird, Simone Hantke, Judith Dineley, Björn Schuller

TL;DR
This paper introduces extensions to WaveGAN, a GAN-based sound generation method, that aim to improve the human likeness of synthetic speech; listening tests show a moderate improvement.
Contribution
The paper proposes preprocessing, audio-to-audio generation, skip connections, and progressive structures to improve WaveGAN's speech synthesis quality.
Findings
Moderate improvement in human likeness (Cohen's d = 0.65)
Listening tests with 30 volunteers support the effectiveness of the extensions
Extensions outperform the original WaveGAN in the human likeness of generated speech
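For readers unfamiliar with the effect size quoted above, the following is a minimal sketch of how Cohen's d can be computed from two sets of listening-test ratings. It assumes the common pooled-standard-deviation form of the statistic; this summary does not say which variant the authors used. The function name `cohens_d` and the rating lists are hypothetical and purely illustrative; they are not data from the paper.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant).

    d = (mean_a - mean_b) / s_pooled, where s_pooled combines the
    sample variances of both groups, weighted by degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical human-likeness ratings, NOT taken from the paper.
    extended_ratings = [4, 3, 5, 4, 3, 4]
    baseline_ratings = [3, 3, 4, 2, 3, 3]
    print(f"Cohen's d: {cohens_d(extended_ratings, baseline_ratings):.2f}")
```

By Cohen's conventional rules of thumb (0.2 small, 0.5 medium, 0.8 large), a value of 0.65 falls between medium and large, which is consistent with the "moderate" wording used here.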
Abstract
Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions - preprocessing, Audio-to-Audio generation, skip connections and progressive structures - is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Music and Audio Processing
