FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching

Ziqian Wang; Zikai Liu; Xinfa Zhu; Yike Zhu; Mingshuai Liu; Jun Chen; Longshuai Xiao; Chao Weng; Lei Xie

arXiv:2505.19476 · eess.AS · May 28, 2025 · Interspeech

TL;DR

FlowSE introduces a flow-matching approach for speech enhancement that reduces inference latency and improves quality, effectively handling noisy speech with or without textual information.

Contribution

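This paper presents FlowSE, a flow-matching model for speech enhancement that the authors report outperforms existing generative methods, while addressing the quantization loss of language-model-based approaches and the training complexity and inference latency of diffusion models.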

Findings

01. FlowSE achieves higher speech quality than state-of-the-art methods.

02. FlowSE operates efficiently with reduced inference latency.

03. FlowSE performs well with or without textual transcripts.

Abstract

Generative models have excelled in audio tasks using approaches such as language models, diffusion, and flow matching. However, existing generative approaches for speech enhancement (SE) face notable challenges: language model-based methods suffer from quantization loss, leading to compromised speaker similarity and intelligibility, while diffusion models require complex training and high inference latency. To address these challenges, we propose FlowSE, a flow-matching-based model for SE. Flow matching learns a continuous transformation between noisy and clean speech distributions in a single pass, significantly reducing inference latency while maintaining high-quality reconstruction. Specifically, FlowSE trains on noisy mel spectrograms and optional character sequences, optimizing a conditional flow matching loss with ground-truth mel spectrograms as supervision. It implicitly learns…
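
The conditional flow matching objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the commonly used straight-line path from a Gaussian sample x0 to the clean target x1, treats a mel frame as a plain list of floats, and uses hypothetical helper names (`cfm_training_pair`, `cfm_loss`, `euler_sample`). In a real system a neural network conditioned on the noisy mel spectrogram (and optional text) would predict the velocity; here `velocity_fn` is only a stand-in for that network.

```python
import random


def cfm_training_pair(clean, t, rng):
    """Build one training example for a flattened mel frame.

    Assumption: straight-line path x_t = (1 - t) * x0 + t * x1 with
    x0 drawn from a standard Gaussian and x1 the clean target.
    Returns the interpolated point x_t and the target velocity x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in clean]
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, clean)]
    target = [b - a for a, b in zip(x0, clean)]
    return x_t, target


def cfm_loss(predicted, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(velocity_fn, x0, cond, steps):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps.

    velocity_fn is a hypothetical stand-in for a trained model; cond would be
    the noisy mel frame (and optional text) in a speech enhancement setting.
    """
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    clean_frame = [1.0, 2.0, 3.0]   # toy "clean mel" values
    noisy_frame = [1.3, 1.8, 3.4]   # toy "noisy mel" condition (unused by the zero model)

    x_t, target = cfm_training_pair(clean_frame, t=0.5, rng=rng)
    # An untrained model that always predicts zero velocity:
    zero_prediction = [0.0 for _ in target]
    print("loss of zero model:", cfm_loss(zero_prediction, target))
```

Training would minimise `cfm_loss` over many random draws of `t` and `x0`; at inference, `euler_sample` would start from noise and follow the learned velocity field, with the number of steps trading quality against latency.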

Code & Models

Repositories

honee-w/flowse (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Diffusion