FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement
Yoshiki Masuyama, Kohei Saijo, Francesco Paissan, Jiangyu Han, Marc Delcroix, Ryo Aihara, François G. Germain, Gordon Wichern, Jonathan Le Roux

TL;DR
FlexIO is a speech separation and enhancement system that handles varying numbers of speakers and varying microphone configurations with a single model, and it remains robust on real recordings from CHiME-4.
Contribution
The paper introduces FlexIO, a flexible speech separation and enhancement (SSE) system that unifies single- and multi-channel processing and uses conditional prompt vectors, one per speaker, to separate an arbitrary number of speakers.
Findings
Handles mixtures recorded with 1 to 5 microphones and containing 1 to 3 speakers.
Demonstrates robustness on CHiME-4 real data.
Unifies single- and multi-channel SSE approaches.
Abstract
Speech separation and enhancement (SSE) has advanced remarkably and achieved promising results in controlled settings, such as a fixed number of speakers and a fixed array configuration. Towards a universal SSE system, single-channel systems have been extended to deal with a variable number of speakers (i.e., outputs). Meanwhile, multi-channel systems accommodating various array configurations (i.e., inputs) have been developed. However, these attempts have been pursued separately. In this paper, we propose a flexible input and output SSE system, named FlexIO. It performs conditional separation using prompt vectors, one per speaker as a condition, allowing separation of an arbitrary number of speakers. Multi-channel mixtures are processed together with the prompt vectors via an array-agnostic channel communication mechanism. Our experiments demonstrate that FlexIO successfully covers…
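The two ideas in the abstract, one prompt vector per requested speaker and channel communication that does not depend on the array layout, can be illustrated with a toy NumPy sketch. This is not the FlexIO architecture: the names `PROMPTS`, `W_IN`, `W_OUT`, `channel_communication`, and `separate` are hypothetical, the weights are random rather than trained, and averaging over channels is only an assumed stand-in for the paper's channel communication mechanism. The sketch shows only how the input (microphone count) and output (speaker count) can both vary without changing the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT = 16          # hypothetical feature size
MAX_SPEAKERS = 3   # hypothetical upper bound on speaker slots

# Hypothetical parameters, randomly initialised (a real system would learn them).
PROMPTS = rng.standard_normal((MAX_SPEAKERS, FEAT))  # one prompt vector per speaker slot
W_IN = rng.standard_normal((1, FEAT))                # lifts each sample to FEAT dims
W_OUT = rng.standard_normal((FEAT, 1))               # maps features back to one sample


def channel_communication(tokens):
    """Assumed stand-in: add the cross-channel mean to every channel.

    tokens: (channels, sequence, FEAT). The mean is defined for any number
    of channels, so no parameter depends on the array configuration.
    """
    return tokens + tokens.mean(axis=0, keepdims=True)


def separate(mixture, num_speakers):
    """mixture: (channels, time) -> estimates of shape (num_speakers, time)."""
    if not 1 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError("num_speakers out of range")
    channels, _ = mixture.shape

    feats = mixture[..., None] @ W_IN                        # (C, T, FEAT)
    prompts = np.broadcast_to(PROMPTS[:num_speakers],
                              (channels, num_speakers, FEAT))
    # Prompt vectors are processed together with the mixture features.
    tokens = np.concatenate([prompts, feats], axis=1)        # (C, S + T, FEAT)
    tokens = channel_communication(tokens)

    ref = tokens[0]                                          # reference channel
    p, f = ref[:num_speakers], ref[num_speakers:]            # (S, FEAT), (T, FEAT)
    # Each prompt conditions the shared features to yield one output stream.
    return np.tanh((f[None] * p[:, None, :]) @ W_OUT)[..., 0]  # (S, T)
```

Calling `separate(rng.standard_normal((4, 100)), num_speakers=2)` returns an array of shape `(2, 100)`; changing the first dimension of the mixture or the requested speaker count changes only the shapes, not the parameters.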
