WaveFlow: A Compact Flow-based Model for Raw Audio
Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

TL;DR
WaveFlow is a compact, flow-based generative model for raw audio. It matches the fidelity of WaveNet while synthesizing several orders of magnitude faster, and it uses 15 times fewer parameters than WaveGlow.
Contribution
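The paper introduces WaveFlow, a small-footprint flow-based model trained directly with maximum likelihood. It gives a unified view of likelihood-based models for 1-D data, with WaveNet and WaveGlow as special cases, and generates high-quality audio quickly with few parameters.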
Findings
Generates speech with fidelity comparable to WaveNet.
Synthesizes audio 42.6 times faster than real-time on a GPU.
Uses only 5.91 million parameters, 15 times fewer than WaveGlow.
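To make the two headline numbers concrete, the short calculation below converts the real-time factor into a sample throughput and the parameter ratio into an implied model size. The 22.05 kHz sample rate is an assumption not stated in this summary, and the function names are hypothetical helpers written only for this illustration.

```python
# Back-of-the-envelope arithmetic for the findings above.
# ASSUMPTION: audio is sampled at 22,050 Hz (not stated in this summary).

def samples_per_second(realtime_factor: float, sample_rate_hz: int) -> float:
    """Samples generated per wall-clock second at a given real-time factor."""
    return realtime_factor * sample_rate_hz

def synthesis_time(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / realtime_factor

def implied_baseline_params(params_millions: float, ratio: float) -> float:
    """Approximate size (in millions) of a model `ratio` times larger."""
    return params_millions * ratio

throughput = samples_per_second(42.6, 22_050)
ten_second_clip = synthesis_time(10.0, 42.6)
baseline = implied_baseline_params(5.91, 15)

print(f"throughput: about {throughput:,.0f} samples/s")
print(f"10 s of audio: about {ten_second_clip:.2f} s of compute")
print(f"implied baseline size: about {baseline:.1f}M parameters")
```

Because the "15 times" figure is rounded, the implied baseline size is only an estimate.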
Abstract
In this work, we propose WaveFlow, a small-footprint generative flow for raw audio, which is directly trained with maximum likelihood. It handles the long-range structure of 1-D waveform with a dilated 2-D convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases. It generates high-fidelity speech as WaveNet, while synthesizing several orders of magnitude faster as it only requires a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, it can significantly reduce the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.91M parameters, which is 15× smaller…
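The "few sequential steps" claim can be illustrated with a minimal NumPy sketch. It reshapes a 1-D waveform into a 2-D matrix with a small height h, then applies an affine transform that is autoregressive only along the height dimension, so inversion takes h sequential steps regardless of waveform length. This is a toy under stated assumptions, not the authors' implementation: `toy_conditioner` is a hypothetical stand-in for the dilated 2-D convolutional network, and all function names are invented for this example.

```python
import numpy as np

def squeeze(x: np.ndarray, h: int) -> np.ndarray:
    """Reshape a length-n waveform into an (h, n // h) matrix.
    Adjacent samples end up in the same column."""
    n = x.shape[0]
    assert n % h == 0, "length must be divisible by the height h"
    return x.reshape(n // h, h).T

def unsqueeze(X: np.ndarray) -> np.ndarray:
    """Inverse of squeeze: back to a 1-D waveform."""
    return X.T.reshape(-1)

def toy_conditioner(prev_rows: np.ndarray):
    """HYPOTHETICAL stand-in for the paper's 2-D conv net.
    Returns a shift and a log-scale computed only from the rows above."""
    width = prev_rows.shape[1]
    if prev_rows.shape[0] == 0:
        return np.zeros(width), np.zeros(width)
    context = prev_rows.mean(axis=0)
    return 0.5 * context, 0.1 * np.tanh(context)

def forward(X: np.ndarray) -> np.ndarray:
    """Data -> latent. Each row depends only on already-known data rows,
    so a real model can compute all rows in parallel during training."""
    Z = np.empty_like(X)
    for i in range(X.shape[0]):
        shift, log_scale = toy_conditioner(X[:i])
        Z[i] = X[i] * np.exp(log_scale) + shift
    return Z

def inverse(Z: np.ndarray):
    """Latent -> data. Row i needs rows < i to be generated first,
    so synthesis takes h sequential steps, independent of the width."""
    X = np.empty_like(Z)
    steps = 0
    for i in range(Z.shape[0]):
        shift, log_scale = toy_conditioner(X[:i])
        X[i] = (Z[i] - shift) * np.exp(-log_scale)
        steps += 1
    return X, steps

rng = np.random.default_rng(0)
x = rng.standard_normal(16_000)      # toy "waveform"
X = squeeze(x, h=16)                 # shape (16, 1000)
Z = forward(X)
X_rec, steps = inverse(Z)
print(X.shape, steps, np.allclose(unsqueeze(X_rec), x))
```

In this framing, setting the height to the full waveform length gives a fully autoregressive model over samples, which is the sense in which the abstract treats WaveNet as a special case. A small height keeps the number of sequential synthesis steps small.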
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Mixture of Logistic Distributions · Affine Coupling · Normalizing Flows · Invertible 1x1 Convolution · WaveGlow · Dilated Causal Convolution · WaveNet
