BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

TL;DR
This paper introduces BDDM, a bilateral denoising diffusion model for speech synthesis that achieves high-quality audio generation with significantly fewer sampling steps, enabling faster and more efficient speech synthesis.
Contribution
The paper presents a novel bilateral diffusion model trained with a new objective. The model samples faster and can reuse pre-trained score networks from existing DPMs for high-quality speech synthesis.
Findings
Generates high-fidelity speech with as few as three sampling steps.
Achieves comparable or better quality than state-of-the-art vocoders.
Samples more than 140 times faster than previous methods.
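The few-step results depend on running the reverse (denoising) process over a short noise schedule. Below is a minimal NumPy sketch of standard DDPM ancestral sampling driven by an arbitrary schedule β_1..β_N. We assume that a learned BDDM schedule is used at inference like an ordinary DDPM schedule. The score network is a hypothetical callable `eps_model(x, noise_level)` conditioned on the continuous noise level √ᾱ_n, in the style of WaveGrad. The three β values in the usage example are made-up placeholders, not the schedule reported in the paper, and the sketch does not implement BDDM's schedule network.

```python
import numpy as np


def ddpm_sample(eps_model, betas, shape, rng):
    """Ancestral DDPM sampling over an arbitrary (possibly very short) noise schedule.

    eps_model(x, noise_level) -> predicted noise, same shape as x (hypothetical interface).
    betas: beta_1..beta_N in forward (increasing-noise) order.
    """
    betas = np.asarray(betas, dtype=np.float64)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)           # alpha_bar_n = prod_{s<=n} alpha_s
    x = rng.standard_normal(shape)            # start from x_N ~ N(0, I)
    for n in reversed(range(len(betas))):
        noise_level = np.sqrt(alpha_bars[n])  # continuous noise-level conditioning (assumption)
        eps = eps_model(x, noise_level)       # one network evaluation per step
        # Posterior mean of x_{n-1} given x_n and the predicted noise.
        x = (x - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        if n > 0:
            # Posterior variance: (1 - alpha_bar_{n-1}) / (1 - alpha_bar_n) * beta_n.
            var = (1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n]
            x = x + np.sqrt(var) * rng.standard_normal(shape)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = [1e-4, 0.05, 0.6]  # hypothetical 3-step schedule, for illustration only
    zero_model = lambda x, level: np.zeros_like(x)  # stand-in for a trained score network
    audio = ddpm_sample(zero_model, betas, shape=(1, 22050), rng=rng)
    print(audio.shape)  # → (1, 22050)
```

The sampling cost equals the schedule length: each step calls the score network exactly once. A three-entry schedule therefore needs only three network evaluations per utterance.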
Abstract
Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models, yet efficient sampling remains a challenge. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel bilateral modeling objective. We show that the new surrogate objective yields a tighter lower bound on the log marginal likelihood than a conventional surrogate. We also find that BDDM can inherit pre-trained score network parameters from any DPM, which enables fast and stable learning of the schedule network and optimization of a noise schedule for sampling. Our experiments demonstrate that BDDMs can generate high-fidelity audio samples with as few as three sampling steps. Moreover, compared to other state-of-the-art…
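For context on the "conventional surrogate" and the pre-trained score networks BDDM inherits, here is a minimal NumPy sketch of the simplified noise-prediction objective from the original DDPM work (Ho et al., 2020). We assume this is the kind of conventional objective the abstract refers to. The sketch is not BDDM's bilateral objective and does not involve the schedule network. The function names are hypothetical. The linear schedule (1000 steps, β from 1e-4 to 0.02) follows the original DDPM setup and is an assumption here. Training samples x_n in closed form as x_n = √ᾱ_n·x_0 + √(1−ᾱ_n)·ε and regresses the network output onto ε.

```python
import numpy as np


def make_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed DDPM defaults) and its cumulative products alpha_bar."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return betas, np.cumprod(1.0 - betas)


def diffuse(x0, alpha_bar, eps):
    """Closed-form forward process: x_n = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def noise_prediction_loss(eps_model, x0, alpha_bars, rng):
    """Monte Carlo estimate of E ||eps - eps_model(x_n, sqrt(alpha_bar_n))||^2."""
    batch = x0.shape[0]
    steps = rng.integers(0, len(alpha_bars), size=batch)  # one random step per example
    # Reshape so alpha_bar broadcasts over all non-batch dimensions.
    ab = alpha_bars[steps].reshape(batch, *([1] * (x0.ndim - 1)))
    eps = rng.standard_normal(x0.shape)
    x_n = diffuse(x0, ab, eps)
    eps_hat = eps_model(x_n, np.sqrt(ab))  # hypothetical score-network interface
    return np.mean((eps - eps_hat) ** 2)
```

A network trained this way conditions on the noise level rather than on a fixed step count. This is one reason a score network from an existing DPM could, in principle, be reused with a different, shorter schedule.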
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Diffusion · 1x1 Convolution · WaveGrad UBlock · Residual Connection · WaveGrad DBlock · FiLM Module · Convolution · WaveGrad
