Diffusion-based Frameworks for Unsupervised Speech Enhancement

Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, and Xavier Alameda-Pineda

arXiv:2601.09931 · cs.SD · February 2, 2026

Open Access

TL;DR

This paper introduces an unsupervised speech enhancement framework in which diffusion models jointly model speech and noise, improving quality and intelligibility over NMF-based noise models and holding up better under mismatched conditions.

Contribution

It proposes explicitly modeling both speech and noise as latent variables and replacing NMF noise priors with diffusion-based noise models, advancing unsupervised speech enhancement techniques.

Findings

1. Explicit noise modeling improves speech enhancement performance.
2. Diffusion-based noise models outperform NMF-based models in quality and intelligibility.
3. The proposed framework is more robust under mismatched conditions.

Abstract

This paper addresses unsupervised diffusion-based single-channel speech enhancement (SE). Prior work in this direction combines a score-based diffusion model trained on clean speech with a Gaussian noise model whose covariance is structured by non-negative matrix factorization (NMF). This combination is used within an iterative expectation-maximization (EM) scheme, in which a diffusion-based posterior-sampling E-step estimates the clean speech. We first revisit this framework and propose to explicitly model both speech and acoustic noise as latent variables, jointly sampling them in the E-step instead of sampling speech alone as in previous approaches. We then introduce a new unsupervised SE framework that replaces the NMF noise prior with a diffusion-based noise model, learned jointly with the speech prior in a single conditional score model. Within this framework, we derive two…
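
The NMF-based baseline that the abstract revisits can be summarized in a few lines of notation. This is only a sketch under stated assumptions: the symbols (x, s, n for the noisy, clean, and noise signals, W and H for the NMF factors, indices f and t for frequency and time frame), the additive time-frequency mixture, and the circular complex Gaussian form are choices made here for illustration and are not taken from the abstract, which states only that the noise is Gaussian with an NMF-structured covariance and that an EM scheme with a diffusion-based posterior-sampling E-step is used.

```latex
\begin{align}
  % Assumed additive mixture in a time-frequency domain
  x_{ft} &= s_{ft} + n_{ft}, \\
  % Gaussian noise whose variance is structured by NMF (W, H nonnegative)
  n_{ft} &\sim \mathcal{N}_c\!\big(0,\ [\mathbf{W}\mathbf{H}]_{ft}\big),
  \qquad \mathbf{W} \in \mathbb{R}_{+}^{F \times K},\
         \mathbf{H} \in \mathbb{R}_{+}^{K \times T}, \\
  % E-step: posterior sampling of clean speech, with p(s) the diffusion prior
  \text{E-step:}\quad
  \hat{s} &\sim p(s \mid x;\ \mathbf{W}, \mathbf{H})
  \ \propto\ p(x \mid s;\ \mathbf{W}, \mathbf{H})\, p(s), \\
  % M-step: refit the noise parameters given the sampled speech
  \text{M-step:}\quad
  (\mathbf{W}, \mathbf{H}) &\leftarrow
  \arg\max_{\mathbf{W}, \mathbf{H} \ge 0}\
  \mathbb{E}_{\hat{s}}\big[\log p(x \mid \hat{s};\ \mathbf{W}, \mathbf{H})\big].
\end{align}
```

In this notation, the first change described in the abstract amounts to sampling the pair (s, n) jointly in the E-step rather than s alone, and the second replaces the NMF-structured Gaussian prior on n with a diffusion-based noise model.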

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques