FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement

Yoshiki Masuyama; Kohei Saijo; Francesco Paissan; Jiangyu Han; Marc Delcroix; Ryo Aihara; François G. Germain; Gordon Wichern; Jonathan Le Roux

arXiv:2510.21485·cs.SD·October 27, 2025

TL;DR

FlexIO is a speech separation and enhancement system that handles varying numbers of speakers and varying microphone configurations, and that remains robust on real recordings such as CHiME-4.

Contribution

The paper introduces FlexIO, a flexible speech separation and enhancement (SSE) system that unifies single- and multi-channel processing and uses one conditional prompt vector per speaker to separate an arbitrary number of speakers.

Findings

01

Successfully handles mixtures recorded with 1-5 microphones and containing 1-3 speakers.

02

Demonstrates robustness on CHiME-4 real data.

03

Unifies single- and multi-channel SSE approaches.

Abstract

Speech separation and enhancement (SSE) has advanced remarkably and achieved promising results in controlled settings, such as a fixed number of speakers and a fixed array configuration. Towards a universal SSE system, single-channel systems have been extended to deal with a variable number of speakers (i.e., outputs). Meanwhile, multi-channel systems accommodating various array configurations (i.e., inputs) have been developed. However, these attempts have been pursued separately. In this paper, we propose a flexible input and output SSE system, named FlexIO. It performs conditional separation using prompt vectors, one per speaker as a condition, allowing separation of an arbitrary number of speakers. Multi-channel mixtures are processed together with the prompt vectors via an array-agnostic channel communication mechanism. Our experiments demonstrate that FlexIO successfully covers…
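The two ideas in the abstract, one prompt vector per speaker and channel communication that does not depend on the array configuration, can be illustrated with a toy NumPy sketch. Everything in it is an assumption made for illustration: the function names `channel_communication` and `conditional_masks`, the mean-pooling across channels, the dot-product scoring, and the random features and prompts are stand-ins, not the architecture or API described in the paper. The sketch only shows why such a design accepts any number of input channels and produces one output per prompt.

```python
import numpy as np

def channel_communication(feats):
    """Share information across microphones without assuming a channel count.

    feats: array of shape (n_channels, n_frames, dim).
    Each channel is averaged with the cross-channel mean, so the output has
    the same shape as the input for any number of channels.
    (Illustrative stand-in, not the mechanism used in FlexIO.)
    """
    channel_mean = feats.mean(axis=0, keepdims=True)
    return 0.5 * (feats + channel_mean)

def conditional_masks(feats, prompts):
    """Return one mask per prompt vector, i.e. one per requested speaker.

    feats:   (n_channels, n_frames, dim) mixture features.
    prompts: (n_speakers, dim) prompt vectors (hypothetical, random here).
    Output:  (n_speakers, n_frames) values in (0, 1).
    """
    fused = channel_communication(feats).mean(axis=0)  # (n_frames, dim)
    logits = fused @ prompts.T                         # (n_frames, n_speakers)
    return (1.0 / (1.0 + np.exp(-logits))).T

# The same code runs for 1-5 microphones and 1-3 speakers.
rng = np.random.default_rng(0)
n_frames, dim = 50, 16
for n_channels in range(1, 6):
    for n_speakers in range(1, 4):
        feats = rng.standard_normal((n_channels, n_frames, dim))
        prompts = rng.standard_normal((n_speakers, dim))
        masks = conditional_masks(feats, prompts)
        assert masks.shape == (n_speakers, n_frames)
```

Because the channels are combined only through an average, reordering the microphones leaves the masks unchanged, and the number of outputs is set entirely by how many prompt vectors are supplied.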
