Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers

Yuzhu Wang; Archontis Politis; Konstantinos Drossos; Tuomas Virtanen

arXiv:2505.16607·eess.AS·May 23, 2025

TL;DR

This paper introduces an attractor-based speech separation model that handles an unknown number of speakers, each of whom may produce multiple utterances, and outperforms existing methods, especially in noisy and reverberant conditions.

Contribution

It presents a novel attractor-based architecture that jointly estimates speaker count, detects speaker activity, and separates utterances in single-channel recordings.

Findings

1. Accurately estimates the number of speakers in various conditions.
2. Effectively detects speaker activity and separates utterances.
3. Outperforms existing methods in noisy and reverberant environments.

Abstract

This paper addresses the problem of single-channel speech separation where the number of speakers is unknown and each speaker may produce multiple utterances. We propose a speech separation model that simultaneously performs separation, dynamically estimates the number of speakers, and detects individual speaker activities by integrating an attractor module. The proposed system outperforms existing methods by introducing an attractor-based architecture that effectively combines local and global temporal modeling for multi-utterance scenarios. To evaluate the method in reverberant and noisy conditions, a multi-speaker multi-utterance dataset was synthesized by combining LibriSpeech speech signals with WHAM! noise signals. The results demonstrate that the proposed system accurately estimates the number of sources. The system effectively detects source activities and separates the…
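
The abstract says the evaluation data combine LibriSpeech speech with WHAM! noise. One common recipe builds each speaker's track by placing several utterances at chosen onsets, sums the tracks, and then scales the noise to reach a target signal-to-noise ratio. The sketch below shows that arithmetic on plain Python lists. The helper names, the SNR-based scaling, and the example values are assumptions for illustration. The paper's actual onset sampling, SNR range, and reverberation simulation are not described here.

```python
import math
from typing import List

def place_utterances(utterances: List[List[float]], onsets: List[int],
                     length: int) -> List[float]:
    """Hypothetical helper: build one speaker's track by adding each
    utterance at its onset sample, truncating anything past `length`."""
    track = [0.0] * length
    for utt, onset in zip(utterances, onsets):
        for i, x in enumerate(utt):
            if onset + i < length:
                track[onset + i] += x
    return track

def power(signal: List[float]) -> float:
    """Mean power of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech: List[float], noise: List[float],
               snr_db: float) -> List[float]:
    """Hypothetical helper: add noise scaled so that
    10 * log10(P_speech / P_scaled_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_s, p_n = power(speech), power(noise)
    if p_n == 0.0:
        raise ValueError("noise has zero power")
    # Solve p_s / (gain**2 * p_n) = 10**(snr_db / 10) for gain.
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Usage: two speakers, one of them with two utterances (toy values).
length = 6
spk1 = place_utterances([[0.5, 0.5], [0.8]], [0, 4], length)
spk2 = place_utterances([[-0.4, 0.6, -0.2]], [1], length)
speech_sum = [a + b for a, b in zip(spk1, spk2)]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.3]
mixture = mix_at_snr(speech_sum, noise, snr_db=5.0)
```

Keeping the per-speaker tracks alongside the mixture gives the separation targets and, by thresholding their energy, reference activity labels.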

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques