Rethinking Flow and Diffusion Bridge Models for Speech Enhancement
Dahan Wang, Jun Gao, Tong Lei, Yuxiang Hu, Changbao Zhu, Kai Chen, and Jing Lu

TL;DR
This paper unifies flow and diffusion bridge models for speech enhancement under a single framework, shows that each of their sampling steps behaves like predictive speech enhancement, and introduces an improved model that outperforms existing baselines with fewer parameters and less computation.
Contribution
It provides a unified framework for flow and diffusion bridge models, links their training and inference procedures to predictive speech enhancement, and proposes an enhanced model with better performance and efficiency.
Findings
Outperforms existing flow and diffusion baselines
Uses fewer parameters and less computation
Highlights that the predictive nature of these models limits their performance
Abstract
Flow matching and diffusion bridge models have emerged as leading paradigms in generative speech enhancement, modeling stochastic processes between paired noisy and clean speech signals based on principles such as flow matching, score matching, and Schrödinger bridge. In this paper, we present a framework that unifies existing flow and diffusion bridge models by interpreting them as constructions of Gaussian probability paths with varying means and variances between paired data. Furthermore, we investigate the underlying consistency between the training/inference procedures of these generative models and conventional predictive models. Our analysis reveals that each sampling step of a well-trained flow or diffusion bridge model optimized with a data prediction loss is theoretically analogous to executing predictive speech enhancement. Motivated by this insight, we introduce an…
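The idea of a Gaussian probability path between paired data can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the linear mean schedule, the Brownian-bridge-like standard deviation, the `sigma_max` parameter, and the convention that t = 0 is the clean signal and t = 1 the noisy one.

```python
import math
import random


def gaussian_path_sample(clean, noisy, t, sigma_max=0.5, rng=None):
    """Draw x_t ~ N(mu_t, sigma_t^2 I) on a path between paired signals.

    Assumed (hypothetical) schedule, not the paper's:
      mu_t    = (1 - t) * clean + t * noisy     (linear mean interpolation)
      sigma_t = sigma_max * sqrt(t * (1 - t))   (zero at both endpoints)
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy must have the same length")
    rng = rng or random.Random()
    sigma_t = sigma_max * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * c + t * n + sigma_t * rng.gauss(0.0, 1.0)
        for c, n in zip(clean, noisy)
    ]


def data_prediction_loss(predicted_clean, clean):
    """Mean squared error between a clean-speech estimate and the target."""
    return sum((p - c) ** 2 for p, c in zip(predicted_clean, clean)) / len(clean)
```

Under these assumptions, a training step would draw t, sample x_t from the path, ask a network for a clean-speech estimate given x_t and t, and score that estimate with `data_prediction_loss`. This has the same form as the objective of a predictive enhancement model, which is the kind of correspondence the abstract describes.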
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis
