Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

Mostafa Sadeghi (MULTISPEECH); Jean-Eudes Ayilo (MULTISPEECH); Romain Serizel (MULTISPEECH); Xavier Alameda-Pineda (ROBOTLEARN)

arXiv:2507.02391 · cs.SD · July 4, 2025

TL;DR

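This paper introduces two algorithms for unsupervised diffusion-based speech enhancement that directly model the conditional reverse transition distribution of the diffusion states. The first removes the need to tune a trade-off hyperparameter between the likelihood and prior scores, the second yields an exact likelihood score, and both improve robustness to domain shifts.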

Contribution

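It proposes two methods that directly model the conditional reverse transition distribution of the diffusion states, instead of combining an approximate likelihood score with the unconditional score through a trade-off hyperparameter. This removes the need to tune that hyperparameter and improves robustness.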

Findings

01

Improved speech enhancement metrics on the WSJ0-QUT and VoiceBank-DEMAND datasets.

02

Greater robustness to domain shifts compared to baselines.

03

No trade-off hyperparameter to tune between the likelihood and prior scores in the reverse diffusion process.

Abstract

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed likelihood score, combined with the unconditional score via a trade-off hyperparameter. In this work, we propose two alternative algorithms that directly model the conditional reverse transition distribution of diffusion states. The first method integrates the diffusion prior with the observation model in a principled way, removing the need for hyperparameter tuning. The second defines a diffusion process over the noisy speech itself, yielding a fully tractable and exact likelihood score. Experiments on the WSJ0-QUT and VoiceBank-DEMAND datasets demonstrate improved enhancement metrics and greater robustness to domain shifts compared to both supervised and…
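The abstract contrasts two ways of conditioning the reverse diffusion on noisy speech. The sketch below illustrates that contrast on a hypothetical scalar Gaussian model, not the paper's speech model. `guided_score` mimics the existing approach: the unconditional score plus an approximate, noise-perturbed likelihood score weighted by a tuned trade-off `lam`. `coupled_posterior_score` assumes one possible diffusion over the noisy speech (diffusing y with the same noise as the clean signal under a variance-exploding process), for which the likelihood score is exact; whether this matches the authors' second method is an assumption. All names, variances, and the forward process are illustrative choices.

```python
import math

# Toy scalar model (all values hypothetical, for illustration only):
#   clean coefficient   x0 ~ N(0, S2)
#   noisy observation   y  = x0 + n,  n ~ N(0, SN2)
#   VE forward process  x_t = x0 + sigma_t * z,  z ~ N(0, 1)
S2 = 1.0    # assumed prior variance of the clean coefficient
SN2 = 0.25  # assumed observation-noise variance


def prior_score(x_t: float, sigma_t: float) -> float:
    """Unconditional score d/dx_t log p(x_t); exact for this Gaussian prior."""
    return -x_t / (S2 + sigma_t ** 2)


def guided_score(x_t: float, y: float, sigma_t: float, lam: float) -> float:
    """Baseline-style guidance: prior score + lam * approximate likelihood score.

    The likelihood is approximated by the noise-perturbed Gaussian
    N(y; x_t, SN2 + sigma_t**2) (a heuristic assumed here); lam is the
    trade-off hyperparameter that has to be tuned.
    """
    approx_likelihood_score = (y - x_t) / (SN2 + sigma_t ** 2)
    return prior_score(x_t, sigma_t) + lam * approx_likelihood_score


def coupled_posterior_score(x_t: float, y_t: float, sigma_t: float) -> float:
    """Posterior score when the noisy speech is diffused with the same noise,
    y_t = y + sigma_t * z (assumed construction). Then y_t - x_t = n, so
    p(y_t | x_t) = N(y_t; x_t, SN2) exactly and no weighting is needed.
    """
    exact_likelihood_score = (y_t - x_t) / SN2
    return prior_score(x_t, sigma_t) + exact_likelihood_score


def gaussian_conditional_score(x_t: float, y_t: float, sigma_t: float) -> float:
    """Reference: d/dx_t log p(x_t | y_t) from the joint Gaussian of (x_t, y_t)."""
    var_x = S2 + sigma_t ** 2   # Var(x_t), which also equals Cov(x_t, y_t)
    var_y = var_x + SN2         # Var(y_t)
    mean = var_x / var_y * y_t
    var = var_x - var_x ** 2 / var_y
    return -(x_t - mean) / var


if __name__ == "__main__":
    x_t, y_t, sigma_t = 0.3, 1.1, 0.5
    print(coupled_posterior_score(x_t, y_t, sigma_t))
    print(gaussian_conditional_score(x_t, y_t, sigma_t))
    for lam in (0.5, 1.0, 2.0):
        print(lam, guided_score(x_t, y_t, sigma_t, lam))
```

Running the script prints the same posterior score (about 2.96, up to floating-point rounding) from the exact-likelihood construction and from the closed-form Gaussian conditional, while the guided score changes with `lam`; that dependence is the tuning burden the first method is described as removing.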
