Recursive speech separation for unknown number of speakers

Naoya Takahashi; Sudarsanam Parthasaarathy; Nabarun Goswami; Yuki Mitsufuji

arXiv:1904.03065·cs.SD·September 4, 2019·6 cites

Open Access

TL;DR

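This paper introduces a recursive speech separation method that handles an unknown number of speakers with a single model. The model is trained with one-and-rest permutation invariant training (OR-PIT) and detects the number of speakers during recursion; it reports state-of-the-art results on the WSJ0-2mix and WSJ0-3mix datasets.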

Contribution

The paper presents a recursive separation approach that combines OR-PIT with speaker-count detection, so that a single model can separate mixtures with varying numbers of speakers, including four-speaker mixtures never seen during training.
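To make the one-and-rest idea concrete, here is a minimal NumPy sketch of what an OR-PIT-style loss could look like. It is an illustration under stated assumptions, not the authors' implementation: the use of negative SI-SNR as the signal-level loss, the 1/(N-1) weight on the residual term, and the names `si_snr` and `or_pit_loss` are all assumptions of this sketch and are not given in the text above.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (assumed signal-level metric)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "signal" part.
    proj = (estimate @ target) / (target @ target + eps) * target
    noise = estimate - proj
    return 10 * np.log10((proj @ proj + eps) / (noise @ noise + eps))

def or_pit_loss(sources, est_one, est_rest):
    """One-and-rest loss sketch.

    sources:  (N, T) array of reference speakers, N >= 2
    est_one:  (T,) estimate of a single speaker
    est_rest: (T,) estimate of the sum of all remaining speakers
    Returns the minimum loss over the choice of target speaker and its index.
    """
    n = len(sources)
    total = sources.sum(axis=0)
    losses = []
    for i in range(n):
        rest = total - sources[i]
        # Assumption: residual term is down-weighted by the number of speakers in it.
        loss = -si_snr(est_one, sources[i]) - si_snr(est_rest, rest) / (n - 1)
        losses.append(loss)
    best = int(np.argmin(losses))
    return losses[best], best

# Toy check with random signals standing in for speech.
rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 1000))
est_one = sources[1] + 0.01 * rng.standard_normal(1000)
est_rest = sources[0] + sources[2]
loss, best = or_pit_loss(sources, est_one, est_rest)
```

Only N assignments are compared here (which speaker is the "one"), rather than the N! permutations of conventional permutation invariant training.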

Findings

01. Achieves state-of-the-art results on two- and three-speaker mixtures.

02. Successfully separates four-speaker mixtures unseen during training.

03. Accurately detects the number of speakers during recursive separation.

Abstract

In this paper we propose a method of single-channel, speaker-independent, multi-speaker speech separation for an unknown number of speakers. In contrast to previous work, in which the number of speakers is assumed to be known in advance and separation models are specific to that number, our proposed method can be applied to cases with different numbers of speakers using a single model, by recursively separating one speaker at a time. To make the separation model recursively applicable, we propose one-and-rest permutation invariant training (OR-PIT). Evaluation on the WSJ0-2mix and WSJ0-3mix datasets shows that our proposed method achieves state-of-the-art results for two- and three-speaker mixtures with a single model. Moreover, the same model can separate four-speaker mixtures, which were never seen during training. We further propose the detection of the number of speakers in a…
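The recursion described in the abstract can be sketched as a short loop: split the current signal into one speaker and a residual, then repeat on the residual until a stopping rule fires. The code below is a hedged illustration only. `separate_one` and `should_stop` are hypothetical callables standing in for the trained separation model and the speaker-count detector, the sketch assumes the input contains at least two speakers, and the FFT-based toy functions are stand-ins used purely so the loop can be run end to end.

```python
import numpy as np

def recursive_separate(mixture, separate_one, should_stop, max_speakers=10):
    """Peel off one speaker per step until should_stop(residual) is true.

    separate_one(x) -> (speaker, residual)   # hypothetical one-and-rest model
    should_stop(x)  -> bool                  # hypothetical: residual holds one speaker
    """
    estimates = []
    residual = mixture
    for _ in range(max_speakers - 1):
        speaker, residual = separate_one(residual)
        estimates.append(speaker)
        if should_stop(residual):
            break
    # The final residual is kept as the last speaker.
    estimates.append(residual)
    return estimates

# --- Toy stand-ins (not a speech model): each "speaker" is a pure sinusoid. ---
def toy_separate_one(x):
    spec = np.fft.rfft(x)
    k = np.argmax(np.abs(spec))          # strongest frequency bin
    one = np.zeros_like(spec)
    one[k] = spec[k]
    speaker = np.fft.irfft(one, n=len(x))
    return speaker, x - speaker

def toy_should_stop(x, tol=1e-6):
    mags = np.abs(np.fft.rfft(x))
    return np.sum(mags > tol * len(x)) <= 1   # at most one component left

n = 256
t = np.arange(n) / n
freqs = (5, 11, 23, 40)
amps = (1.0, 0.8, 0.6, 0.4)
sources = [a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs)]
estimates = recursive_separate(sum(sources), toy_separate_one, toy_should_stop)
```

The same loop handles two, three, or four components without any change, which is the property the abstract attributes to the single recursive model; the number of returned estimates is the detected speaker count.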

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing