Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers
Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

TL;DR
This paper introduces an attractor-based speech separation model that handles an unknown number of speakers and multiple utterances per speaker, outperforming existing methods, especially in noisy and reverberant conditions.
Contribution
It presents a novel attractor-based architecture that jointly estimates speaker count, detects speaker activity, and separates utterances in single-channel recordings.
Findings
Accurately estimates the number of speakers in various conditions.
Effectively detects speaker activity and separates utterances.
Outperforms existing methods in noisy and reverberant environments.
Abstract
This paper addresses the problem of single-channel speech separation, where the number of speakers is unknown, and each speaker may speak multiple utterances. We propose a speech separation model that simultaneously performs separation, dynamically estimates the number of speakers, and detects individual speaker activities by integrating an attractor module. The proposed system outperforms existing methods by introducing an attractor-based architecture that effectively combines local and global temporal modeling for multi-utterance scenarios. To evaluate the method in reverberant and noisy conditions, a multi-speaker multi-utterance dataset was synthesized by combining Librispeech speech signals with WHAM! noise signals. The results demonstrate that the proposed system accurately estimates the number of sources. The system effectively detects source activities and separates the…
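The abstract does not spell out how the attractor module turns its outputs into a speaker count and per-speaker activity. The following is a minimal sketch of one common attractor-style scheme (similar in spirit to encoder-decoder attractor approaches used in speaker diarization), not the authors' implementation. It assumes that each attractor comes with an existence probability, that counting stops at the first attractor below a threshold, and that activity is the sigmoid of the dot product between a frame embedding and an attractor. The function names, the 0.5 thresholds, and the toy numbers are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def count_speakers(existence_probs, threshold=0.5):
    """Count attractors up to the first one whose existence
    probability falls below the threshold (assumed stopping rule)."""
    count = 0
    for p in existence_probs:
        if p < threshold:
            break
        count += 1
    return count


def speaker_activity(frame_embeddings, attractors, threshold=0.5):
    """Return activity[t][s], True when frame t is attributed to speaker s.
    Activity is sigmoid(<frame, attractor>) compared with the threshold."""
    return [
        [
            sigmoid(sum(e * a for e, a in zip(frame, attractor))) >= threshold
            for attractor in attractors
        ]
        for frame in frame_embeddings
    ]


# Toy values, invented for illustration only.
existence_probs = [0.97, 0.91, 0.12, 0.05]
attractors = [[4.0, -1.0], [-1.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
frames = [
    [1.0, 0.0],    # speaker 1 only
    [0.0, 1.0],    # speaker 2 only
    [1.0, 1.0],    # overlapping speech
    [-1.0, -1.0],  # silence
]

num_speakers = count_speakers(existence_probs)
activity = speaker_activity(frames, attractors[:num_speakers])
```

In this toy setup two attractors pass the existence threshold, so only the first two are used for activity detection. A real system would learn the embeddings and attractors jointly with the separation network.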
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques
