Source Separation by Flow Matching

Robin Scheibler; John R. Hershey; Arnaud Doucet; Henry Li

arXiv:2505.16119 · cs.SD · July 21, 2025

TL;DR

This paper introduces FLOSS, a flow matching-based method for single-channel audio source separation that ensures strict mixture consistency and leverages an equivariant neural network architecture, demonstrating effectiveness on speech separation tasks.

Contribution

The paper proposes FLOSS, a novel flow matching approach with an equivariant neural network for source separation, addressing permutation invariance and mixture consistency.

Findings

01 Effective separation of overlapping speech demonstrated.

02 Strict mixture consistency achieved through flow matching.

03 Neural network architecture is equivariant by design.

Abstract

We consider the problem of single-channel audio source separation with the goal of reconstructing $K$ sources from their mixture. We address this ill-posed problem with FLOSS (FLOw matching for Source Separation), a constrained generation method based on flow matching, ensuring strict mixture consistency. Flow matching is a general methodology that, when given samples from two probability distributions defined on the same space, learns an ordinary differential equation to output a sample from one of the distributions when provided with a sample from the other. In our context, we have access to samples from the joint distribution of $K$ sources and therefore to the corresponding samples from the lower-dimensional distribution of their mixture. To apply flow matching, we augment these mixture samples with artificial noise components to match the dimensionality of the $K$ source distribution.…
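The idea in the abstract of augmenting a mixture with noise and keeping the mixture fixed can be illustrated with a small NumPy sketch. This is not the paper's implementation: the zero-sum noise construction, the straight-line interpolation path, and every function name below are assumptions chosen for illustration. It only shows that if the starting point and the true sources both sum to the same mixture, then every point on the straight path between them does too, and the target velocity sums to zero across sources.

```python
import numpy as np


def project_zero_sum(z):
    """Remove the across-source mean so the K rows sum to zero (assumed construction)."""
    return z - z.mean(axis=0, keepdims=True)


def make_initial_sample(mixture, num_sources, rng):
    """Lift a mixture of shape [T] to shape [K, T] by adding zero-sum noise.

    Each row starts at mixture / K, so the rows still sum to the mixture.
    """
    noise = rng.standard_normal((num_sources, mixture.shape[0]))
    return mixture / num_sources + project_zero_sum(noise)


def interpolate(x0, x1, t):
    """Straight-line path between the initial sample x0 and the sources x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Regression target for a velocity model under the straight-line path."""
    return x1 - x0


def euler_integrate(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps.

    The velocity is projected to zero sum at every step, so the mixture
    is preserved regardless of what velocity_fn returns.
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * project_zero_sum(velocity_fn(x, i * dt))
    return x


rng = np.random.default_rng(0)
K, T = 2, 8                        # toy sizes, hypothetical
sources = rng.standard_normal((K, T))
mixture = sources.sum(axis=0)

x0 = make_initial_sample(mixture, K, rng)
xt = interpolate(x0, sources, 0.3)
v = target_velocity(x0, sources)

# Stand-in for a trained network: the oracle constant velocity x1 - x0.
x1_hat = euler_integrate(x0, lambda x, t: v, num_steps=10)
```

In a trained system the oracle velocity would be replaced by a neural network fit to `target_velocity`; the projection inside `euler_integrate` is one simple way to keep the estimate consistent with the mixture at every step.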

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques