BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Max W. Y. Lam; Jun Wang; Dan Su; Dong Yu

arXiv:2203.13508 · eess.AS · March 28, 2022 · 26 cites

TL;DR

This paper introduces BDDM, a bilateral denoising diffusion model for speech synthesis that generates high-quality audio with far fewer sampling steps than conventional diffusion models, making synthesis substantially faster.

Contribution

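The paper presents a bilateral diffusion model in which a schedule network and a score network parameterize the forward and reverse processes, trained with a new bilateral objective. Because pre-trained score networks can be reused, the noise schedule for sampling can be learned quickly and stably, enabling fast, high-quality speech synthesis.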
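For reference, the forward and reverse processes that BDDM parameterizes are those of a standard diffusion probabilistic model (DPM). They are written below in the common DDPM notation, which is an assumption on our part; the paper's own symbols may differ. A noise schedule beta_1, ..., beta_T fixes the forward (noising) process, and a score network epsilon_theta, which predicts the added noise, defines each reverse (denoising) step:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s)

q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\, I\big)

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\Big(x_{t-1};\
  \tfrac{1}{\sqrt{1-\beta_t}}\Big(x_t - \tfrac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\Big),\
  \sigma_t^2 I\Big)
```

Sampling costs one network evaluation per reverse step. Fast sampling therefore means running the reverse process over a short schedule beta_1, ..., beta_N with N much smaller than T. Per the abstract, BDDM's schedule network learns such a schedule while the pre-trained epsilon_theta is reused.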

Findings

1. Generates high-fidelity speech with as few as three sampling steps.
2. Achieves quality comparable to or better than that of state-of-the-art vocoders.
3. Speeds up sampling by more than 140 times relative to previous methods.

Abstract

Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models, yet they face challenges in efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel bilateral modeling objective. We show that the new surrogate objective yields a tighter lower bound on the log marginal likelihood than a conventional surrogate. We also find that BDDM allows inheriting pre-trained score network parameters from any DPMs and consequently enables speedy and stable learning of the schedule network and optimization of a noise schedule for sampling. Our experiments demonstrate that BDDMs can generate high-fidelity audio samples with as few as three sampling steps. Moreover, compared to other state-of-the-art…
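To make "as few as three sampling steps" concrete, here is a minimal sketch of generic DDPM ancestral sampling run over an arbitrary, very short noise schedule. It is not BDDM's code and does not implement its schedule network. It assumes the schedule is already given (in BDDM it would come from the learned schedule network) and uses a hypothetical placeholder, `dummy_eps_model`, in place of a pre-trained score network. The three beta values are illustrative only and are not taken from the paper.

```python
import numpy as np

def ddpm_sample(eps_model, betas, shape, rng=None):
    """Generic DDPM ancestral sampling over an arbitrary (possibly very short) schedule.

    eps_model(x, n) -> predicted noise with the same shape as x
                       (stand-in for a pre-trained score network).
    betas: noise levels beta_1..beta_N, smallest first; N = number of sampling steps.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas = np.asarray(betas, dtype=np.float64)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # abar_n = prod_{s<=n} (1 - beta_s)

    x = rng.standard_normal(shape)  # start from pure Gaussian noise x_N
    for n in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, n)  # one network evaluation per step
        # Reverse-step mean under the epsilon parameterization
        mean = (x - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        if n > 0:
            # Posterior variance: (1 - abar_{n-1}) / (1 - abar_n) * beta_n
            var = (1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n]
            x = mean + np.sqrt(var) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


def dummy_eps_model(x, n):
    # Hypothetical placeholder: a real vocoder would condition a trained network
    # on x, the step index n, and acoustic features such as a mel-spectrogram.
    return np.zeros_like(x)


short_schedule = [1e-4, 1e-2, 0.5]  # hypothetical 3-step schedule, NOT from the paper
audio = ddpm_sample(dummy_eps_model, short_schedule, shape=(16000,),
                    rng=np.random.default_rng(0))
```

The sampler's cost is linear in `len(betas)`. This is why choosing or learning a good short schedule, rather than changing the score network, is the lever for speed.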

Code & Models

Repositories

tencent-ailab/bddm
PyTorch · Official

Videos

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis · SlidesLive

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Diffusion · 1x1 Convolution · WaveGrad UBlock · Residual Connection · WaveGrad DBlock · FiLM Module · Convolution · WaveGrad