Speech Enhancement based on cascaded two flows

Seonggyu Lee; Sein Cheong; Sangwook Han; Kihyuk Kim; Jong Won Shin

arXiv:2508.06842 · eess.AS · August 20, 2025 · Interspeech

TL;DR

This paper introduces a cascaded flow matching approach to speech enhancement in which one identical model both generates an initial enhanced estimate and performs the final enhancement, matching or exceeding previous methods with the same or fewer function evaluations (NFEs).

Contribution

It proposes a cascaded flow matching method in which the same model both generates the initial enhanced speech and performs the final enhancement, removing the need for a separate predictive model alongside the generative one.

Findings

01

Achieves performance comparable to or better than previous methods.

02

Requires the same or fewer NFEs than previous methods, even with two cascaded generative passes.

03

One identical flow matching model serves effectively for both generating the initial estimate and the final enhancement.

Abstract

Speech enhancement (SE) based on diffusion probabilistic models has exhibited impressive performance, while requiring a relatively high number of function evaluations (NFE). Recently, SE based on flow matching has been proposed, which showed competitive performance with a small NFE. Early approaches adopted the noisy speech as the only conditioning variable. There have been other approaches which utilize speech enhanced with a predictive model as another conditioning variable and to sample an initial value, but they require a separate predictive model on top of the generative SE model. In this work, we propose to employ an identical model based on flow matching for both SE and generating enhanced speech used as an initial starting point and a conditioning variable. Experimental results showed that the proposed method required the same or fewer NFEs even with two cascaded generative…
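The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_velocity` is a hypothetical stand-in for the trained flow matching network, and the Euler solver, the noise scale `sigma`, the way the first pass fills the second conditioning slot with the noisy speech, and the names `euler_flow` and `cascaded_enhance` are all assumptions made for illustration. The only points taken from the abstract are that the same model runs twice, that the first output serves as both the starting point and an extra conditioning variable for the second pass, and that the cost is counted in total NFEs.

```python
import numpy as np


def euler_flow(velocity, x_start, cond, n_steps):
    """Integrate dx/dt = velocity(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = x_start.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        # One function evaluation (NFE) per step.
        x = x + dt * velocity(x, i * dt, cond)
    return x


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network (assumption, not the paper's model).

    It simply drifts x toward the mean of the conditioning signals.
    """
    return cond.mean(axis=0) - x


def cascaded_enhance(velocity, noisy, nfe_first, nfe_second, sigma=0.1, seed=0):
    """Run the same velocity model twice, as a two-stage cascade."""
    rng = np.random.default_rng(seed)

    # Pass 1: only the noisy speech is available.
    # Assumption: both conditioning slots are filled with the noisy signal.
    cond = np.stack([noisy, noisy])
    start = noisy + sigma * rng.standard_normal(noisy.shape)
    first = euler_flow(velocity, start, cond, nfe_first)

    # Pass 2: the first estimate is the starting point and a second condition.
    # Assumption: a small Gaussian perturbation is added to the starting point.
    cond = np.stack([noisy, first])
    start = first + sigma * rng.standard_normal(noisy.shape)
    second = euler_flow(velocity, start, cond, nfe_second)

    return first, second, nfe_first + nfe_second


noisy = np.linspace(-1.0, 1.0, 16)  # placeholder signal, not real speech
first, second, total_nfe = cascaded_enhance(toy_velocity, noisy, nfe_first=2, nfe_second=3)
print(first.shape, second.shape, total_nfe)
```

Because both passes call the same `velocity` function, no separate predictive model is involved, and the total cost is the sum of the NFEs of the two passes.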

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development