Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

Shaked Dovrat; Eliya Nachmani; Lior Wolf

arXiv:2104.08955·cs.SD·November 9, 2021

TL;DR

This paper introduces a novel permutation invariant training method using the Hungarian algorithm for single-channel speech separation, enabling separation of up to 20 speakers efficiently and outperforming previous methods for large speaker counts.

Contribution

The paper proposes a new training approach employing the Hungarian algorithm and a modified architecture to handle large numbers of speakers in single-channel speech separation.

Findings

01

Successfully separates up to 20 speakers.

02

Significantly outperforms previous methods when the number of speakers $C$ is large.

03

Reduces training time complexity from factorial, $O(C!)$, to cubic, $O(C^{3})$, in the number of speakers $C$.

Abstract

Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for the current methods, which rely on the Permutation Invariant Training (PIT) loss. In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^{3})$ time complexity, where $C$ is the number of speakers, in comparison to $O(C!)$ of PIT based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to 20 speakers and improves the previous results for large $C$ by a wide margin.
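To make the complexity comparison concrete, here is a minimal sketch (not the authors' implementation) of the two ways to pick the best estimate-to-target permutation. It assumes the training loss decomposes into a pairwise cost matrix, where `cost[i][j]` is a hypothetical loss between estimated source `i` and target speaker `j`. The function names `hungarian`, `brute_force_pit`, and `total_cost` are illustrative only.

```python
from itertools import permutations


def total_cost(cost, assignment):
    """Sum of cost[i][assignment[i]] over all estimated sources i."""
    return sum(cost[i][j] for i, j in enumerate(assignment))


def brute_force_pit(cost):
    """Exhaustive PIT-style search: tries all C! permutations."""
    n = len(cost)
    return list(min(permutations(range(n)), key=lambda p: total_cost(cost, p)))


def hungarian(cost):
    """O(C^3) Hungarian algorithm (potentials form) for a square cost matrix.

    Returns a list a where a[i] is the target assigned to estimate i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row currently matched to column j
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path found above.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


# Toy 3-speaker cost matrix (made-up numbers, for illustration only).
# The unique minimum is the assignment [1, 0, 2] with total cost 5.
example_cost = [[4, 1, 3],
                [2, 0, 5],
                [3, 2, 2]]
best = hungarian(example_cost)
```

Both functions return a minimum-cost assignment, but the exhaustive search enumerates $C!$ permutations while the Hungarian algorithm runs in $O(C^{3})$. For $C = 20$ that is roughly $2.4 \times 10^{18}$ permutations versus a cubic term of $20^{3} = 8000$. In practice a library routine such as `scipy.optimize.linear_sum_assignment` could replace the hand-written solver.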

Code & Models

Repositories

shakeddovrat/librimix
none · Official
