Adversarial Audio Synthesis
Chris Donahue, Julian McAuley, Miller Puckette

TL;DR
WaveGAN is a novel application of GANs for unsupervised raw-waveform audio synthesis, capable of generating coherent audio across various domains including speech and musical sounds.
Contribution
This paper introduces WaveGAN, a first attempt at applying GANs to unsupervised raw-waveform audio synthesis, and shows that it can generate diverse, intelligible audio without labels.
Findings
WaveGAN can synthesize one-second audio clips with global coherence.
It produces intelligible words from small-vocabulary speech datasets.
It successfully generates audio in domains like drums, bird calls, and piano.
Abstract
Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation. In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. WaveGAN is capable of synthesizing one second slices of audio waveforms with global coherence, suitable for sound effect generation. Our experiments demonstrate that, without labels, WaveGAN learns to produce intelligible words when trained on a small-vocabulary speech dataset, and can also synthesize audio from other domains such as drums, bird vocalizations, and piano. We compare WaveGAN to a method which applies GANs designed for image…
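To make the "range of timescales" point concrete, here is a minimal sketch of how many samples different audio structures span. The 16 kHz sample rate and the example durations are assumptions chosen for illustration; the text above does not state them.

```python
# Illustrative only: SAMPLE_RATE_HZ and the example durations below are
# assumptions, not values taken from the text above.
SAMPLE_RATE_HZ = 16_000


def samples_for(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of samples that span duration_s seconds."""
    return round(duration_s * sample_rate_hz)


# Hypothetical structures at three different timescales.
timescales_s = {
    "one period of a 440 Hz tone": 1 / 440,
    "a short 100 ms sound event": 0.1,
    "a one-second clip": 1.0,
}

for name, duration_s in timescales_s.items():
    print(f"{name}: {samples_for(duration_s)} samples")
```

Under these assumptions, a single pitch period covers a few dozen samples while a one-second clip covers 16000, so a model generating raw waveforms has to capture structure across several orders of magnitude in sample count.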
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Batch Normalization · Tanh Activation · Deep Convolutional GAN · Dense Connections · Convolution · Dropout · Griffin-Lim Algorithm · WGAN-GP Loss
