Voice command generation using Progressive Wavegans

Thomas Wiest; Nicholas Cummins; Alice Baird; Simone Hantke; Judith Dineley; Björn Schuller

arXiv:1903.07395·cs.CL·March 19, 2019·1 cites

Open Access

TL;DR

This paper introduces extensions to WaveGAN, a GAN-based sound generation method, to enhance the human likeness of synthetic speech, demonstrating moderate improvements through listening tests.

Contribution

The paper proposes preprocessing, audio-to-audio generation, skip connections, and progressive structures to improve WaveGAN's speech synthesis quality.

Findings

01. Moderate improvement in human likeness (Cohen's d = 0.65)

02. Listening tests with 30 volunteers support effectiveness

03. Extensions outperform original WaveGAN in speech quality
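
The effect size quoted above can be illustrated with a minimal sketch of Cohen's d, computed here with a pooled sample standard deviation. The source does not say which variant of the statistic the authors used, so the pooled form is an assumption, and the ratings below are hypothetical numbers invented for illustration, not data from the paper. By Cohen's widely cited convention, d around 0.5 is read as a medium effect and d around 0.8 as a large one, which is consistent with 0.65 being described as moderate.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample variance.

    Assumption: this is the common pooled-variance form; the paper's exact
    variant is not stated in the source.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical human-likeness ratings (illustrative only, not from the paper).
extended_ratings = [4, 3, 5, 4, 3, 4]
baseline_ratings = [3, 3, 4, 2, 3, 4]

d = cohens_d(extended_ratings, baseline_ratings)
print(f"Cohen's d = {d:.2f}")
```

A positive value means the first group scored higher on average; swapping the arguments flips the sign but not the magnitude.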

Abstract

Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions - preprocessing, Audio-to-Audio generation, skip connections and progressive structures - is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Music and Audio Processing