Speech Enhancement Based on Drifting Models
Liang Xu, Diego Caviedes-Nozal, W. Bastiaan Kleijn, Longfei Felix Yan, and Rasmus Kongsgaard Olsson

TL;DR
This paper introduces DriftSE, a speech enhancement framework that uses a drifting model to denoise in a single step by matching the distribution of its outputs to the clean speech distribution. On VoiceBank-DEMAND it outperforms multi-step diffusion baselines.
Contribution
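DriftSE is a generative framework for speech enhancement that formulates denoising as an equilibrium problem: it evolves the pushforward distribution of a mapping function until it matches the clean speech distribution, which enables one-step inference without iterative sampling.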
Findings
DriftSE achieves high-fidelity speech enhancement in a single step.
It outperforms multi-step diffusion baselines on VoiceBank-DEMAND.
Because it matches distributions rather than paired samples, the framework facilitates training on unpaired data.
Abstract
We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step,…
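The abstract describes the Drifting Field only qualitatively, so the following is a toy sketch of the general idea rather than the authors' method. It assumes a one-dimensional setting, a Gaussian kernel, and a mean-shift style drift in which each generated sample is attracted toward clean samples and repelled from other generated samples. The function names (`kernel_mean_shift`, `drift_field`, `evolve`), the bandwidth, and the step size are all hypothetical choices made for illustration. The sketch also moves particles directly, whereas DriftSE trains a mapping function.

```python
import math
import random


def kernel_mean_shift(x, targets, bandwidth):
    """Gaussian-kernel-weighted average of (t - x) over the target samples."""
    weights = [math.exp(-((t - x) ** 2) / (2.0 * bandwidth ** 2)) for t in targets]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no target within numerical reach of x
    return sum(w * (t - x) for w, t in zip(weights, targets)) / total


def drift_field(x, clean, generated, bandwidth=1.0):
    """Assumed drift: pull toward clean samples, push away from generated ones.

    The drift is near zero when both sample sets look alike around x,
    which is the equilibrium the toy dynamics settle into.
    """
    attract = kernel_mean_shift(x, clean, bandwidth)
    repel = kernel_mean_shift(x, generated, bandwidth)
    return attract - repel


def evolve(generated, clean, steps=50, step_size=0.5, bandwidth=1.0):
    """Move every generated sample along the drift for a fixed number of steps."""
    points = list(generated)
    for _ in range(steps):
        points = [
            p + step_size * drift_field(p, clean, points, bandwidth)
            for p in points
        ]
    return points


def mean(values):
    return sum(values) / len(values)


random.seed(0)
clean = [random.gauss(2.0, 0.5) for _ in range(100)]   # stand-in for clean speech
start = [random.gauss(-1.0, 0.5) for _ in range(100)]  # stand-in for initial outputs
moved = evolve(start, clean)

print("start mean:", round(mean(start), 3))
print("moved mean:", round(mean(moved), 3))
print("clean mean:", round(mean(clean), 3))
```

In this toy, the drift shrinks as the generated samples come to resemble the clean ones, which mirrors the equilibrium framing in the abstract. No paired (noisy, clean) examples are needed, since the drift depends only on the two sample sets.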
