Conditional Diffusion Probabilistic Model for Speech Enhancement

Yen-Ju Lu; Zhong-Qiu Wang; Shinji Watanabe; Alexander Richard; Cheng Yu; Yu Tsao

arXiv:2202.05256·eess.AS·February 11, 2022

TL;DR

This paper introduces a conditional diffusion probabilistic model for speech enhancement that adapts to real-world noise, producing more natural speech outputs and demonstrating strong performance and generalization capabilities.

Contribution

It proposes a novel conditional diffusion model that incorporates observed noisy speech characteristics, improving speech enhancement over existing generative approaches.

Findings

01 Outperforms existing generative models in speech enhancement tasks.

02 Shows strong generalization to unseen noise datasets.

03 Produces more natural and less distorted speech outputs.

Abstract

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate…
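
The idea of folding the observed noisy signal into the diffusion process can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes the mean of each diffusion step interpolates between the clean signal and the noisy observation with a weight `m[t]` that grows from 0 to 1, and it keeps the usual variance `1 - alpha_bar[t]` of an unconditional diffusion model. The function names, the linear schedules, and the toy signals are all hypothetical.

```python
import math
import random


def make_schedules(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Return (alpha_bar, m): cumulative signal scale and interpolation weight.

    Both schedules are linear and are assumptions made for illustration only.
    """
    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    # m[0] = 0 (mean follows clean speech), m[-1] = 1 (mean follows noisy speech)
    m = [i / (num_steps - 1) for i in range(num_steps)]
    return alpha_bar, m


def conditional_mean(x0, y, t, alpha_bar, m):
    """Mean of x_t given clean x0 and noisy y: a scaled blend of the two."""
    scale = math.sqrt(alpha_bar[t])
    return [scale * ((1.0 - m[t]) * c + m[t] * n) for c, n in zip(x0, y)]


def sample_xt(x0, y, t, alpha_bar, m, rng):
    """Draw x_t around the conditional mean with Gaussian noise.

    The variance 1 - alpha_bar[t] is borrowed from unconditional diffusion
    (an assumption, not the conditional variance derived in the paper).
    """
    std = math.sqrt(1.0 - alpha_bar[t])
    mean = conditional_mean(x0, y, t, alpha_bar, m)
    return [mu + std * rng.gauss(0.0, 1.0) for mu in mean]


# Toy data: a sine wave as "clean speech" and a noisy copy as the observation.
rng = random.Random(0)
x0 = [math.sin(2.0 * math.pi * 5.0 * k / 200.0) for k in range(200)]
y = [c + 0.3 * rng.gauss(0.0, 1.0) for c in x0]

alpha_bar, m = make_schedules()
x_mid = sample_xt(x0, y, 25, alpha_bar, m, rng)
```

Under these assumptions, the final diffusion step is centered on the noisy observation rather than on zero, so a reverse process would start from something close to the recorded signal instead of from pure Gaussian noise.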

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Diffusion