On the Application of Diffusion Models for Simultaneous Denoising and Dereverberation
Adrian Meise, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

TL;DR
This paper explores the use of diffusion models for simultaneous denoising and dereverberation of speech, comparing cascaded and single-model approaches on artificial and real data.
Contribution
It compares strategies for enhancing speech with diffusion models and highlights the effectiveness of a single model trained on subsets of purely noisy, purely reverberant, and noisy reverberant speech.
Findings
A cascade of single-distortion models works best when the models are applied in the order of the dominant distortion.
A single model trained on multiple distortion types provides a good compromise.
Diffusion models can effectively enhance speech with combined noise and reverberation.
Abstract
Diffusion models have been shown to achieve natural-sounding enhancement of speech degraded by noise or reverberation. However, their simultaneous denoising and dereverberation capability has so far not been studied much, although this is arguably the most common scenario in a practical application. In this work, we investigate different approaches to enhance noisy and/or reverberant speech. We examine the cascaded application of models, each trained on only one of the distortions, and compare it with a single model, trained either solely on data that is both noisy and reverberated, or trained on data comprising subsets of purely noisy, of purely reverberated, and of noisy reverberant speech. Tests are performed both on artificially generated and real recordings of noisy and/or reverberant data. The results show that, when using the cascade of models, satisfactory results are only…
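The three training conditions named in the abstract (purely noisy, purely reverberant, and noisy reverberant speech) can be illustrated with a small data-generation sketch. This is not the authors' pipeline. It assumes white Gaussian noise, a toy room impulse response made of exponentially decaying noise, and that noise is added after reverberation. All function names, the 10 dB SNR, and the 0.5 s reverberation time are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical labels for the three distortion subsets described in the abstract.
CONDITIONS = ("noisy", "reverberant", "noisy_reverberant")


def synthetic_rir(rng, sr=16000, rt60=0.5, length_s=0.4):
    """Toy room impulse response: exponentially decaying white noise.

    Assumption for illustration only; real experiments would use measured
    or simulated room impulse responses.
    """
    n = int(sr * length_s)
    t = np.arange(n) / sr
    decay = np.exp(-np.log(1000.0) * t / rt60)  # amplitude drops 60 dB at t = rt60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def add_noise(speech, rng, snr_db):
    """Add white Gaussian noise scaled to the requested SNR in dB."""
    noise = rng.standard_normal(len(speech))
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def reverberate(speech, rir):
    """Convolve with the impulse response and trim to the input length."""
    return np.convolve(speech, rir)[: len(speech)]


def degrade(clean, condition, rng, snr_db=10.0):
    """Apply one of the three distortion conditions to a clean signal."""
    if condition == "noisy":
        return add_noise(clean, rng, snr_db)
    if condition == "reverberant":
        return reverberate(clean, synthetic_rir(rng))
    if condition == "noisy_reverberant":
        # Assumed order: reverberate first, then add noise.
        return add_noise(reverberate(clean, synthetic_rir(rng)), rng, snr_db)
    raise ValueError(f"unknown condition: {condition}")


def make_training_pairs(clean_utterances, rng, conditions=CONDITIONS):
    """Build (degraded, clean, condition) triples with a random condition each."""
    pairs = []
    for clean in clean_utterances:
        condition = conditions[rng.integers(len(conditions))]
        pairs.append((degrade(clean, condition, rng), clean, condition))
    return pairs
```

Passing `conditions=("noisy_reverberant",)` to `make_training_pairs` would correspond to the single model trained solely on noisy reverberant data, while the default draws from all three subsets. Sampling the subsets uniformly is a further assumption; the abstract does not state the mixing proportions.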
