Speech Enhancement Based on Drifting Models

Liang Xu, Diego Caviedes-Nozal, W. Bastiaan Kleijn, Longfei Felix Yan, and Rasmus Kongsgaard Olsson

arXiv:2604.24199 · cs.SD · May 21, 2026

TL;DR

This paper introduces DriftSE, a novel speech enhancement framework that uses a drifting model to achieve high-quality denoising in a single step by matching the distribution of its outputs to the clean speech distribution, outperforming multi-step diffusion baselines on VoiceBank-DEMAND.

Contribution

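DriftSE is a new generative framework for speech enhancement that enables one-step inference by evolving the pushforward distribution of a mapping function toward the clean speech distribution, removing the need for iterative sampling and improving performance over multi-step diffusion baselines.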

Findings

1. DriftSE achieves high-fidelity speech enhancement in a single step.
2. It outperforms multi-step diffusion baselines on VoiceBank-DEMAND.
3. The framework facilitates training on unpaired data by distribution matching.
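A compact way to state the distribution-matching view behind the third finding is sketched below. It is an illustration under assumptions, not the paper's formulation: the abstract describes the Drifting Field only as a learned correction vector toward high-density regions of the clean distribution, so the Gaussian-kernel form of V (with a hypothetical width h), the stop-gradient operator sg, and the squared-error loss are all assumed.

```latex
% Assumed notation: p = clean-speech distribution; p_in = input distribution
% (noisy y for the direct mapping, or (y, \epsilon) with \epsilon ~ N(0, I) for the stochastic model).
% Hypothetical choices: Gaussian kernel k with width h, stop-gradient sg, squared-error loss.
\begin{align*}
q_\theta &= (f_\theta)_{\#}\, p_{\mathrm{in}}
  && \text{pushforward of the one-step map } f_\theta \\
V_{p,q}(x) &= \frac{\mathbb{E}_{y \sim p}[k(x,y)\,(y - x)]}{\mathbb{E}_{y \sim p}[k(x,y)]}
  - \frac{\mathbb{E}_{y \sim q}[k(x,y)\,(y - x)]}{\mathbb{E}_{y \sim q}[k(x,y)]},
  \quad k(x,y) = \exp\!\Big(-\tfrac{\lVert x - y \rVert^2}{2h^2}\Big) \\
q_\theta = p &\;\Longrightarrow\; V_{p,q_\theta}(x) = 0 \ \ \forall x
  && \text{equilibrium} \\
\mathcal{L}(\theta) &= \mathbb{E}_{z \sim p_{\mathrm{in}}}
  \big\lVert f_\theta(z) - \operatorname{sg}\big[f_\theta(z) + V_{p,q_\theta}(f_\theta(z))\big] \big\rVert^2 \\
\nabla_\theta \mathcal{L} &= -2\, \mathbb{E}_{z \sim p_{\mathrm{in}}}
  \big[(\partial_\theta f_\theta(z))^{\top} V_{p,q_\theta}(f_\theta(z))\big]
\end{align*}
```

Because V is estimated only from samples of p and q_θ, the loss needs no (noisy, clean) pairs; each gradient step moves generated samples along V, and the clean distribution is a fixed point since V vanishes there.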

Abstract

We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step,…
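To make the mechanics in the abstract concrete, here is a toy sketch in Python (NumPy) under explicit assumptions; it is not DriftSE's implementation. The abstract does not say how the Drifting Field is parameterized (it calls it learned), so `drifting_field` here is a hypothetical Gaussian-kernel choice: mean shift toward a batch of clean samples minus mean shift toward the current generated batch. The mapping is a hypothetical linear map f(z) = W z + b trained toward the fixed target x + V(x), 2-D Gaussian data stands in for speech features, and all function names and parameters (`h`, `lr`, `steps`, `mode`) are illustrative. `mode="direct"` maps the noisy input alone, while `mode="stochastic"` also feeds Gaussian noise, mirroring the two formulations; clean and noisy batches are drawn independently, so training uses no pairs.

```python
import numpy as np

# Toy sketch, NOT DriftSE's implementation: the kernel field, linear map,
# and 2-D Gaussian "speech" data are all hypothetical stand-ins.

def mean_shift(x, samples, h=1.0):
    """Gaussian-kernel mean shift: kernel-weighted mean of `samples` around each row of x, minus x."""
    d2 = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)   # (N, M) squared distances
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * h**2))   # row-min shift avoids underflow
    w /= w.sum(axis=1, keepdims=True)                                # normalise per query point
    return w @ samples - x

def drifting_field(x, clean, generated, h=1.0):
    """Hypothetical field: attraction toward clean samples minus attraction toward generated ones.
    It is exactly zero when `clean` and `generated` are the same set."""
    return mean_shift(x, clean, h) - mean_shift(x, generated, h)

def make_input(mode, n, mu, rng):
    """Toy noisy observations; the stochastic formulation also appends Gaussian noise."""
    noisy = mu + 0.5 * rng.standard_normal((n, 2)) + rng.standard_normal((n, 2))
    if mode == "direct":
        return noisy
    return np.hstack([noisy, rng.standard_normal((n, 2))])

def train(mode="direct", steps=300, lr=0.05, n=256, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.array([2.0, -1.0])                  # mean of the toy clean distribution (std 0.5)
    in_dim = 2 if mode == "direct" else 4
    W = np.zeros((2, in_dim))
    W[:, :2] = np.eye(2)                        # start as the identity on the noisy input
    b = np.zeros(2)
    for _ in range(steps):
        clean = mu + 0.5 * rng.standard_normal((n, 2))  # unpaired: drawn independently of the inputs
        z = make_input(mode, n, mu, rng)
        x = z @ W.T + b                                 # one-step mapping f(z) = W z + b
        v = drifting_field(x, clean, x)                 # generated batch acts as the negative set
        # Loss mean ||x - stopgrad(x + v)||^2; its gradient w.r.t. W is -2 * mean(outer(v, z)).
        W += lr * 2 * v.T @ z / n
        b += lr * 2 * v.mean(axis=0)
    return W, b, mu

def evaluate(W, b, mu, mode="direct", n=2000, seed=1):
    """Single forward pass on fresh inputs; returns per-dimension mean and std of the outputs."""
    rng = np.random.default_rng(seed)
    x = make_input(mode, n, mu, rng) @ W.T + b
    return x.mean(axis=0), x.std(axis=0)

if __name__ == "__main__":
    for mode in ("direct", "stochastic"):
        W, b, mu = train(mode)
        m, s = evaluate(W, b, mu, mode)
        print(mode, "mean", m.round(2), "std", s.round(2))  # toy clean target: mean [2, -1], std 0.5
```

In this sketch V is identically zero when the generated and clean batches coincide, so the clean distribution is a fixed point of training; enhancement itself is a single evaluation of the map, with no iterative sampling.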
