GANSynth: Adversarial Neural Audio Synthesis
Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts

TL;DR
This paper introduces GANSynth, a novel GAN-based approach for high-fidelity, locally-coherent audio synthesis that outperforms autoregressive models like WaveNet in quality and speed.
Contribution
It demonstrates that GANs can generate high-quality, locally-coherent audio by modeling spectral features, achieving faster synthesis than traditional autoregressive models.
Findings
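GANs outperform strong WaveNet baselines on both automated and human evaluation metrics
GANSynth generates audio orders of magnitude faster than autoregressive models such as WaveNet
Modeling log magnitudes and instantaneous frequencies in the spectral domain, with sufficient frequency resolution, yields locally coherent audio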
Abstract
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. Autoregressive models, such as WaveNet, model local structure at the expense of global latent structure and slow iterative sampling, while Generative Adversarial Networks (GANs) have global latent conditioning and efficient parallel sampling, but struggle to generate locally-coherent audio waveforms. Herein, we demonstrate that GANs can in fact generate high-fidelity and locally-coherent audio by modeling log magnitudes and instantaneous frequencies with sufficient frequency resolution in the spectral domain. Through extensive empirical investigations on the NSynth dataset, we demonstrate that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio…
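As a rough illustration of the representation the abstract describes, the sketch below computes a log-magnitude spectrogram and an instantaneous-frequency map from a waveform using only NumPy. The function names (`stft`, `logmag_and_if`), the FFT size, the hop length, the test tone, and the scaling by π are assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def stft(signal, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def logmag_and_if(spec, eps=1e-6):
    """Return (log magnitude, instantaneous frequency) for a complex STFT.

    Instantaneous frequency is taken here as the frame-to-frame difference
    of the time-unwrapped phase, divided by pi so values lie in [-1, 1]
    (the scaling is an assumption of this sketch).
    """
    log_mag = np.log(np.abs(spec) + eps)
    phase = np.unwrap(np.angle(spec), axis=0)  # unwrap along the time axis
    inst_freq = np.diff(phase, axis=0, prepend=phase[:1]) / np.pi
    return log_mag, inst_freq

# Hypothetical test signal: one second of a steady 1010 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1010.0 * t)

log_mag, inst_freq = logmag_and_if(stft(tone))
peak_bin = int(np.argmax(log_mag.mean(axis=0)))
print(log_mag.shape, inst_freq.shape, peak_bin)
```

For a steady tone, the raw phase at the peak bin wraps around from frame to frame, while the instantaneous frequency at that bin stays essentially constant over time. That smoothness is one plausible reason a generator might find this representation easier to model than raw phase.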
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Digital Media Forensic Detection
Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet
