Unsupervised Multi-channel Speech Dereverberation via Diffusion

Yulun Wu; Zhongweiyang Xu; Jianchong Chen; Zhong-Qiu Wang; Romit Roy Choudhury

arXiv:2508.02071 · cs.SD · December 2, 2025

TL;DR

This paper introduces USD-DPS, an unsupervised diffusion-based method for multi-channel speech dereverberation that estimates each microphone's room impulse response (RIR) during sampling and enforces multi-channel mixture consistency, outperforming existing unsupervised dereverberation methods without requiring supervised training.

Contribution

The paper proposes a novel unsupervised diffusion-based approach to multi-channel speech dereverberation that jointly estimates RIRs and enforces mixture consistency; unlike prior supervised methods, it needs no paired reverberant and clean training data.
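To make "mixture consistency" concrete: under a noiseless convolutive model, each microphone signal is the clean speech convolved with that channel's RIR, so a clean-speech estimate together with the estimated RIRs should re-synthesize every observed channel. The sketch below scores that agreement as a summed squared error. It is a minimal NumPy illustration that assumes a noiseless model and time-domain linear convolution; the function name `mixture_consistency_loss` and the toy data are hypothetical, and the paper's actual guidance term (its domain, weighting, and step size) is not given here.

```python
import numpy as np


def mixture_consistency_loss(s_hat, rirs, mixtures):
    """Summed squared error between observed and re-synthesized mixtures.

    s_hat:    (T,) clean-speech estimate
    rirs:     (M, L) estimated RIR for each of M microphones
    mixtures: (M, T) observed reverberant microphone signals
    Assumes the noiseless model y_m = h_m * s, truncated to T samples.
    """
    T = mixtures.shape[1]
    loss = 0.0
    for h_m, y_m in zip(rirs, mixtures):
        y_hat = np.convolve(s_hat, h_m)[:T]  # re-reverberate the estimate
        loss += float(np.sum((y_m - y_hat) ** 2))  # this channel's mismatch
    return loss


# Toy 4-microphone example with synthetic RIRs (direct path + decaying tail).
rng = np.random.default_rng(0)
T, L, M = 1000, 64, 4
s = rng.standard_normal(T)
rirs = rng.standard_normal((M, L)) * np.exp(-np.arange(L) / 10.0)
rirs[:, 0] = 1.0
mixtures = np.stack([np.convolve(s, h)[:T] for h in rirs])

print(mixture_consistency_loss(s, rirs, mixtures))        # 0.0 for the true speech
print(mixture_consistency_loss(0.5 * s, rirs, mixtures))  # positive for a wrong estimate
```

In a diffusion-posterior-sampling setup, the gradient of a term like this with respect to the current diffusion state would act as the guidance signal; this sketch only evaluates the loss itself.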

Findings

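01 · Outperforms existing unsupervised dereverberation methods

02 · Estimates multi-channel RIRs effectively by combining Adam-based optimization of a sub-band RIR model (reference channel) with analytical forward convolutive prediction (non-reference channels)

03 · Achieves superior dereverberation quality in experiments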
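Finding 02 refers to splitting RIR estimation between an optimized reference channel and analytically estimated non-reference channels. Forward convolutive prediction (FCP) solves, independently in each STFT frequency bin, a weighted least-squares problem for a short multi-frame filter that maps a source estimate to a microphone's signal. The sketch below is a generic FCP-style estimator under stated assumptions: the current clean-speech estimate is the filter input, frames are weighted by inverse mixture power (roughly following the original FCP formulation), and the tap count, weighting floor `eps`, and helper names (`delay_matrix`, `fcp_filters`, `apply_filters`) are illustrative rather than the paper's settings.

```python
import numpy as np


def delay_matrix(x, taps):
    """(T, taps) matrix whose k-th column is x delayed by k frames (zero-padded)."""
    T = x.shape[0]
    X = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        X[k:, k] = x[: T - k]
    return X


def fcp_filters(S_hat, Y, taps=8, eps=1e-8):
    """Per-bin weighted least squares: Y[f, t] ~ sum_k G[f, k] * S_hat[f, t - k].

    S_hat: (F, T) STFT of the clean-speech estimate
    Y:     (F, T) STFT of one microphone's mixture
    Frames are weighted by inverse mixture power (assumed FCP-style weighting).
    """
    F, _ = S_hat.shape
    G = np.zeros((F, taps), dtype=complex)
    for f in range(F):
        X = delay_matrix(S_hat[f], taps)
        sw = 1.0 / np.sqrt(np.abs(Y[f]) ** 2 + eps)  # sqrt of the per-frame weight
        G[f] = np.linalg.lstsq(X * sw[:, None], Y[f] * sw, rcond=None)[0]
    return G


def apply_filters(S_hat, G):
    """Filter each frequency bin of S_hat with its multi-frame filter G[f]."""
    return np.stack([delay_matrix(S_hat[f], G.shape[1]) @ G[f]
                     for f in range(S_hat.shape[0])])


# Toy check: data generated exactly by the model, so the filters are recoverable.
rng = np.random.default_rng(1)
F, T, K = 5, 200, 4
S_hat = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
G_true = rng.standard_normal((F, K)) + 1j * rng.standard_normal((F, K))
Y = apply_filters(S_hat, G_true)
G_est = fcp_filters(S_hat, Y, taps=K)
print(np.allclose(G_est, G_true))  # True
```

Because each bin is a small closed-form least-squares problem, this step needs no iterative optimization, which is presumably why an analytical estimator is attractive for the non-reference channels.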

Abstract

We consider the problem of multi-channel single-speaker blind dereverberation, where multi-channel mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS, Unsupervised Speech Dereverberation via Diffusion Posterior Sampling. USD-DPS uses an unconditional clean-speech diffusion model as a strong prior and solves the problem by posterior sampling. At each diffusion sampling step, we estimate the room impulse responses (RIRs) of all microphone channels, which are then used to enforce a multi-channel mixture consistency constraint for diffusion guidance. For multi-channel RIR estimation, we estimate the reference-channel RIR by optimizing the parameters of a sub-band RIR signal model with the Adam optimizer. We estimate the non-reference channels' RIRs analytically using forward convolutive prediction (FCP). We found that this combination provides…
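The reference-channel step fits the parameters of a sub-band RIR signal model with Adam. The abstract does not spell out that model, so the sketch below assumes one common form: a sum of fixed band-limited noise sequences, each shaped by its own exponentially decaying envelope with a learnable log-amplitude `a_b` and log-decay `rho_b`. The parameters are fitted by minimizing the squared error between the reference mixture and the clean-speech estimate convolved with the modeled RIR, using hand-derived gradients and a hand-written Adam update. All names, the band split, and the hyperparameters are hypothetical, and keeping the band noises fixed and known makes this toy easier than the real problem.

```python
import numpy as np


def band_noises(L, n_bands, rng):
    """Fixed white noise split into n_bands contiguous frequency bands via FFT masks."""
    spec = np.fft.rfft(rng.standard_normal(L))
    edges = np.linspace(0, spec.size, n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(masked, n=L))
    return np.stack(bands)  # (n_bands, L)


def synth_rir(a, r, noises):
    """Assumed sub-band model: h[t] = sum_b exp(a_b - r_b * t) * noise_b[t]."""
    t = np.arange(noises.shape[1])
    parts = np.exp(a[:, None] - r[:, None] * t) * noises  # per-band RIR components
    return parts.sum(axis=0), parts


def loss_and_grads(a, rho, noises, s, y):
    """||y - h * s||^2 and its gradients w.r.t. log-amplitudes a and log-decays rho."""
    T, L = y.size, noises.shape[1]
    r = np.exp(rho)  # decay rate per sample, kept positive
    h, parts = synth_rir(a, r, noises)
    e = y - np.convolve(s, h)[:T]
    # dLoss/dh[k] = -2 * sum_n e[n] * s[n - k]
    g_h = np.array([-2.0 * np.dot(e[k:], s[: T - k]) for k in range(L)])
    t = np.arange(L)
    g_a = parts @ g_h                 # dh/da_b = parts_b
    g_rho = ((parts * -t) @ g_h) * r  # dh/drho_b = -t * r_b * parts_b
    return float(np.sum(e ** 2)), g_a, g_rho


def fit_rir_adam(noises, s, y, steps=300, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Plain Adam on theta = [a, rho], starting from a hypothetical initial guess."""
    B = noises.shape[0]
    theta = np.concatenate([np.zeros(B), np.full(B, np.log(0.05))])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    history = []
    for step in range(1, steps + 1):
        loss, g_a, g_rho = loss_and_grads(theta[:B], theta[B:], noises, s, y)
        g = np.concatenate([g_a, g_rho])
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** step)  # bias-corrected first moment
        v_hat = v / (1 - b2 ** step)  # bias-corrected second moment
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        history.append(loss)
    return theta[:B], theta[B:], history


# Toy reference channel: s stands in for the diffusion model's clean-speech estimate.
rng = np.random.default_rng(0)
L, B, T = 256, 4, 2000
noises = band_noises(L, B, rng)
a_true = np.array([0.0, -0.5, -1.0, -1.5])
rho_true = np.log(np.full(B, 0.02))
s = rng.standard_normal(T)
h_true, _ = synth_rir(a_true, np.exp(rho_true), noises)
y = np.convolve(s, h_true)[:T]

a_est, rho_est, history = fit_rir_adam(noises, s, y)
print(history[0], history[-1])
```

Parameterizing the decay rate through its logarithm keeps it positive and gives Adam steps of comparable scale for both parameter groups; this is a design choice of the sketch, not something the abstract states.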

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation