TL;DR
This paper introduces a novel permutation invariant training method using the Hungarian algorithm for single-channel speech separation, enabling separation of up to 20 speakers efficiently and outperforming previous methods for large speaker counts.
Contribution
The paper proposes a new training approach employing the Hungarian algorithm and a modified architecture to handle large numbers of speakers in single-channel speech separation.
Findings
Successfully separates up to 20 speakers.
Significantly outperforms previous methods when the number of speakers C is large.
Reduces the training-time complexity of the permutation search from factorial to cubic in C.
Abstract
Single-channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for the current methods, which rely on the Permutation Invariant Training (PIT) loss. In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity, where $C$ is the number of speakers, in comparison to the $O(C!)$ of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin.
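The difference between the two permutation searches can be illustrated with a small sketch. This is not the authors' code: it assumes a plain mean-squared-error pairwise cost (the paper's actual loss may differ), uses SciPy's `linear_sum_assignment` as the Hungarian-style solver, and the helper names `pairwise_mse`, `hungarian_pit_loss`, and `brute_force_pit_loss` are hypothetical. Both functions return the same optimal assignment, but the brute-force version enumerates all C! permutations, while the assignment solver runs in polynomial time on the C x C cost matrix.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_mse(estimates, targets):
    """Return a (C, C) matrix with cost[i, j] = MSE(estimates[i], targets[j]).

    Assumption: MSE stands in for whatever pairwise loss is used in practice.
    """
    diff = estimates[:, None, :] - targets[None, :, :]
    return np.mean(diff ** 2, axis=-1)


def hungarian_pit_loss(estimates, targets):
    """Permutation-invariant loss via a linear assignment solver."""
    cost = pairwise_mse(estimates, targets)
    rows, cols = linear_sum_assignment(cost)  # minimizes the total assigned cost
    return cost[rows, cols].mean(), cols


def brute_force_pit_loss(estimates, targets):
    """Classic PIT: try every permutation (only feasible for small C)."""
    cost = pairwise_mse(estimates, targets)
    num_speakers = cost.shape[0]
    idx = np.arange(num_speakers)
    best = min(
        itertools.permutations(range(num_speakers)),
        key=lambda p: cost[idx, list(p)].sum(),
    )
    return cost[idx, list(best)].mean(), np.array(best)


# Toy data: 5 "speakers", estimates are a shuffled, slightly noisy copy of the targets.
rng = np.random.default_rng(0)
targets = rng.standard_normal((5, 1000))
perm = rng.permutation(5)
estimates = targets[perm] + 0.01 * rng.standard_normal((5, 1000))

loss_h, assign_h = hungarian_pit_loss(estimates, targets)
loss_b, assign_b = brute_force_pit_loss(estimates, targets)
```

On this toy input both searches recover the shuffling permutation. In a training loop, one would typically compute the assignment on detached cost values and backpropagate only through the selected entries; that detail is an assumption here, not something the abstract states.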
