Cold Diffusion for Speech Enhancement

Hao Yen; François G. Germain; Gordon Wichern; Jonathan Le Roux

arXiv:2211.02527·eess.AS·May 24, 2023

TL;DR

This paper applies cold diffusion, a recently proposed iterative diffusion model, to speech enhancement, reporting better generalization during sampling and strong performance on the VoiceBank-DEMAND benchmark.

Contribution

The paper proposes an improved training algorithm and objective for cold diffusion models that help the model generalize better during the sampling process.

Findings

01 Outperforms existing diffusion-based speech enhancement models

02 Achieves superior results on the VoiceBank-DEMAND dataset

03 Demonstrates better generalization during the sampling process

Abstract

Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals. In this work, we explore the possibility of leveraging a recently proposed advanced iterative diffusion model, namely cold diffusion, to recover clean speech signals from noisy signals. The unique mathematical properties of the sampling process from cold diffusion could be utilized to restore high-quality samples from arbitrary degradations. Based on these properties, we propose an improved training algorithm and objective to help the model generalize better during the sampling process. We verify our proposed framework by investigating two model architectures. Experimental results on benchmark speech enhancement dataset VoiceBank-DEMAND demonstrate the strong performance of the proposed approach compared to…
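
To make the sampling process mentioned in the abstract more concrete, the following is a minimal sketch of a cold-diffusion-style sampler on a toy 1-D signal. It rests on several assumptions that are not taken from this page: the degradation is a simple linear interpolation between the clean and the noisy signal, the update rule is the general cold diffusion form (remove the degradation at step t, re-apply it at step t-1), and the restoration network is replaced by a hypothetical oracle that returns the clean signal. The names degrade, cold_sample, and oracle are illustrative only, and the paper's improved training algorithm and objective are not reproduced here.

```python
import numpy as np


def degrade(x0, noisy, t, num_steps):
    """Assumed degradation: linear interpolation from x0 (t=0) to noisy (t=num_steps)."""
    w = t / num_steps
    return (1.0 - w) * x0 + w * noisy


def cold_sample(noisy, restore, num_steps):
    """Cold-diffusion-style sampling loop (a sketch, not the paper's exact procedure).

    restore(x, t) is any callable that estimates the clean signal from the
    current state x at step t; in practice this would be a trained network.
    """
    x = noisy.copy()  # sampling starts from the fully degraded (noisy) signal
    for t in range(num_steps, 0, -1):
        x0_hat = restore(x, t)
        # Undo the step-t degradation of the estimate and re-apply the
        # step-(t-1) degradation, moving x one step towards the clean signal.
        x = (
            x
            - degrade(x0_hat, noisy, t, num_steps)
            + degrade(x0_hat, noisy, t - 1, num_steps)
        )
    return x


# Toy data: a sine wave corrupted by additive Gaussian noise.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))
noisy = clean + 0.3 * rng.standard_normal(256)


def oracle(x, t):
    """Hypothetical perfect restorer standing in for a trained model."""
    return clean


enhanced = cold_sample(noisy, oracle, num_steps=10)
print("max abs error:", np.max(np.abs(enhanced - clean)))
```

With a perfect restorer the loop walks back along the interpolation path and ends at the clean signal. With an imperfect restorer, the difference of two degradations of the same estimate is what lets part of the estimation error cancel from step to step, which is the property of the sampler that the abstract alludes to.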

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Diffusion