DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

TL;DR
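DiffWave is a versatile, non-autoregressive diffusion model that generates high-quality audio waveforms for conditional and unconditional tasks. It matches a strong WaveNet vocoder in quality while synthesizing orders of magnitude faster, and it outperforms autoregressive and GAN-based models in unconditional generation.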
Contribution
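This paper introduces DiffWave, a diffusion probabilistic model for audio synthesis that achieves high fidelity and fast inference across multiple waveform generation tasks: neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation.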
Findings
DiffWave matches a strong WaveNet vocoder in speech quality (MOS: 4.44 vs. 4.43).
DiffWave synthesizes orders of magnitude faster than the autoregressive WaveNet vocoder.
DiffWave significantly outperforms autoregressive and GAN-based waveform models in unconditional audio generation.
Abstract
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave produces high-fidelity audios in different waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of…
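The abstract names two procedures: training by optimizing a variant of the variational bound, and synthesis by a fixed-length Markov chain that turns white noise into a waveform. The Python sketch below illustrates both, assuming the unweighted ε-prediction objective and ancestral sampler of denoising diffusion probabilistic models (Ho et al., 2020). Treat it as a minimal illustration, not DiffWave's implementation: the function names, the linear β schedule (T = 50, β from 1e-4 to 0.05), and the callable `eps_model(x_t, t)` standing in for the trained denoising network are all hypothetical, and conditioning inputs such as a mel spectrogram are omitted.

```python
import numpy as np

# Hypothetical helpers illustrating a DDPM-style diffusion model for 1-D audio.
# The schedule values are assumptions chosen for illustration only.

def linear_beta_schedule(T=50, beta_start=1e-4, beta_end=0.05):
    """Noise variances beta_1..beta_T, linearly spaced (assumed schedule)."""
    return np.linspace(beta_start, beta_end, T)

def diffusion_constants(betas):
    """alpha_t = 1 - beta_t and alpha_bar_t = prod_{s<=t} alpha_s."""
    alphas = 1.0 - betas
    return alphas, np.cumprod(alphas)

def training_loss(x0, eps_model, betas, rng):
    """One Monte Carlo sample of the unweighted epsilon-prediction loss."""
    _, alpha_bars = diffusion_constants(betas)
    t = rng.integers(0, len(betas))      # 0-based index of diffusion step t+1
    eps = rng.standard_normal(x0.shape)  # noise the network must predict
    # Closed-form forward process: sample x_t ~ q(x_t | x_0).
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

def sample(eps_model, betas, length, rng):
    """Reverse Markov chain: white noise -> waveform in exactly T steps."""
    alphas, alpha_bars = diffusion_constants(betas)
    x = rng.standard_normal(length)  # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = eps_model(x, t)
        # Mean of p_theta(x_{t-1} | x_t) under the epsilon parameterization.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # sigma_t^2 = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            x = mean + np.sqrt(var) * rng.standard_normal(length)
        else:
            x = mean  # the final step adds no noise
    return x
```

As a sanity check, if every training waveform were silence (x_0 = 0), the ideal noise predictor would be `x_t / sqrt(1 - alpha_bar_t)`. Passing it as `eps_model` makes `training_loss` return essentially zero, and `sample` returns a near-zero waveform after exactly T steps regardless of the starting noise, which is the fixed-step, non-autoregressive behavior the abstract describes.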
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Diffusion · Dilated Causal Convolution · Mixture of Logistic Distributions · WaveNet
