Speech Enhancement based on cascaded two flows
Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, Jong Won Shin

TL;DR
This paper introduces a cascaded flow matching approach to speech enhancement in which the same model first generates an initial enhanced signal and then refines it, matching or exceeding prior methods with the same or fewer function evaluations (NFEs).
Contribution
It proposes a cascaded flow matching method that uses the same model for both initial speech generation and enhancement, removing the need for a separate predictive model.
Findings
Performance is comparable to or better than that of previous methods.
The same or fewer NFEs are required, even though two generative passes are cascaded.
A single flow matching model serves both to generate the initial enhanced speech and to perform the final enhancement.
Abstract
Speech enhancement (SE) based on diffusion probabilistic models has exhibited impressive performance, while requiring a relatively high number of function evaluations (NFE). Recently, SE based on flow matching has been proposed, which showed competitive performance with a small NFE. Early approaches adopted the noisy speech as the only conditioning variable. There have been other approaches which utilize speech enhanced with a predictive model as another conditioning variable and to sample an initial value, but they require a separate predictive model on top of the generative SE model. In this work, we propose to employ an identical model based on flow matching for both SE and generating enhanced speech used as an initial starting point and a conditioning variable. Experimental results showed that the proposed method required the same or fewer NFEs even with two cascaded generative…
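The cascade described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Euler solver, the time convention (t runs from 0 to 1), the way noise is added to the starting point, the choice to pass the noisy speech as a placeholder condition in the first pass, and all names (`toy_velocity`, `euler_flow`, `cascaded_enhance`, `sigma`). The `toy_velocity` function is a hypothetical stand-in for a trained network and does not enhance anything; it only makes the control flow runnable.

```python
import numpy as np


def toy_velocity(x, t, noisy, cond):
    """Hypothetical stand-in for a trained velocity network (assumption).

    It simply pulls x toward the mean of the two conditioning signals,
    so it performs no real enhancement.
    """
    target = 0.5 * (noisy + cond)
    return (target - x) / max(1.0 - t, 1e-3)


def euler_flow(velocity, x0, noisy, cond, num_steps):
    """Integrate dx/dt = velocity(x, t, noisy, cond) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    nfe = 0
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t, noisy, cond)
        nfe += 1  # one function evaluation per Euler step
    return x, nfe


def cascaded_enhance(velocity, noisy, steps_first, steps_second, sigma, rng):
    """Run the same velocity model twice (assumed structure, not the paper's exact recipe)."""
    # Pass 1: start near the noisy speech; the noisy speech doubles as the
    # second conditioning input because no enhanced estimate exists yet.
    x0 = noisy + sigma * rng.standard_normal(noisy.shape)
    first, nfe1 = euler_flow(velocity, x0, noisy, noisy, steps_first)

    # Pass 2: start near the first estimate and condition on it as well.
    x0 = first + sigma * rng.standard_normal(noisy.shape)
    second, nfe2 = euler_flow(velocity, x0, noisy, first, steps_second)
    return second, nfe1 + nfe2


rng = np.random.default_rng(0)
noisy = rng.standard_normal(16)  # placeholder for a noisy speech representation
enhanced, total_nfe = cascaded_enhance(toy_velocity, noisy, 2, 3, 0.1, rng)
print(total_nfe)
```

The point of the sketch is the accounting: the total NFE is the sum of the steps taken in both passes, so the cascade only pays off if the better starting point lets each pass use few steps.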
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
