Speeding Up Permutation Invariant Training for Source Separation

Thilo von Neumann; Christoph Boeddeker; Keisuke Kinoshita; Marc Delcroix; Reinhold Haeb-Umbach

arXiv:2107.14445 · eess.AS · August 2, 2021 · 6 citations

TL;DR

This paper introduces an efficient decomposition of permutation invariant training (PIT) for source separation, reducing computational complexity from exponential to polynomial, enabling practical use for large speaker counts and long recordings.

Contribution

The paper decomposes the PIT criterion into the computation of a matrix and a strictly monotonically increasing function. This allows efficient assignment algorithms to be used: the Hungarian algorithm for uPIT, and newly introduced algorithms for Graph-PIT.
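To illustrate why such a decomposition helps, here is a minimal sketch, not the paper's implementation. It assumes a summed squared-error matrix and uses `10 * log10` as a stand-in for the strictly increasing function; the names `pairwise_sq_error`, `best_permutation`, and `log_loss` are hypothetical. Because the outer function is strictly increasing, minimizing the loss over permutations selects the same permutation as minimizing the plain sum of matrix entries, so the search can operate on the matrix alone.

```python
import itertools
import math


def pairwise_sq_error(estimates, targets):
    """M[i][j] = summed squared error between estimate i and target j."""
    return [
        [sum((e - t) ** 2 for e, t in zip(est, tgt)) for tgt in targets]
        for est in estimates
    ]


def best_permutation(matrix, f=lambda x: x):
    """Exhaustive search for the permutation minimizing f(sum_i M[i][perm[i]])."""
    n = len(matrix)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: f(sum(matrix[i][perm[i]] for i in range(n))),
    )


# Toy signals: estimate 0 resembles target 1, estimate 1 resembles target 2,
# and estimate 2 resembles target 0.
targets = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
estimates = [[0.0, 1.9, 0.1, 0.0], [0.1, 0.0, 2.8, 0.0], [0.9, 0.0, 0.0, 0.1]]

M = pairwise_sq_error(estimates, targets)


def log_loss(s):
    # Stand-in for a strictly increasing outer function (an assumption).
    return 10 * math.log10(s)


perm_plain = best_permutation(M)            # minimizes the plain sum
perm_log = best_permutation(M, f=log_loss)  # minimizes f(sum)
```

Both searches here are still exhaustive; the point is only that the outer function does not change which permutation wins, which is what lets a polynomial-time assignment solver replace the exhaustive search.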

Findings

01 Complexity reduced from exponential to polynomial

02 Efficient algorithms enable large-scale source separation

03 Improved feasibility for long recordings and many speakers

Abstract

Permutation invariant training (PIT) is a widely used training criterion for neural network-based source separation, used for both utterance-level separation with utterance-level PIT (uPIT) and separation of long recordings with the recently proposed Graph-PIT. When implemented naively, both suffer from an exponential complexity in the number of utterances to separate, rendering them unusable for large numbers of speakers or long realistic recordings. We present a decomposition of the PIT criterion into the computation of a matrix and a strictly monotonically increasing function so that the permutation or assignment problem can be solved efficiently with several search algorithms. The Hungarian algorithm can be used for uPIT and we introduce various algorithms for the Graph-PIT assignment problem to reduce the complexity to be polynomial in the number of utterances.
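The uPIT case can be sketched as follows, assuming a mean-squared-error loss, NumPy arrays of shape (N, T), and SciPy's `linear_sum_assignment` as the assignment solver. The function names are hypothetical and this is not the code from the paper's repository. The brute-force variant is included only to check that both searches reach the same minimum.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_mse(estimates, targets):
    """cost[i, j] = MSE between estimate i and target j; inputs are (N, T)."""
    diff = estimates[:, None, :] - targets[None, :, :]
    return np.mean(diff ** 2, axis=-1)


def upit_mse_hungarian(estimates, targets):
    """Solve the assignment on the cost matrix instead of trying all N! permutations."""
    cost = pairwise_mse(estimates, targets)
    rows, cols = linear_sum_assignment(cost)  # minimizes the sum of cost[rows, cols]
    return cost[rows, cols].mean(), cols


def upit_mse_brute_force(estimates, targets):
    """Reference implementation that enumerates every permutation."""
    cost = pairwise_mse(estimates, targets)
    n = cost.shape[0]
    rows = np.arange(n)
    return min(
        cost[rows, list(perm)].mean()
        for perm in itertools.permutations(range(n))
    )


# Synthetic check: the estimates are a shuffled, slightly noisy copy of the targets.
rng = np.random.default_rng(0)
targets = rng.standard_normal((4, 1000))
true_perm = np.array([2, 0, 3, 1])
estimates = targets[true_perm] + 0.01 * rng.standard_normal((4, 1000))

loss, perm = upit_mse_hungarian(estimates, targets)
```

Here `perm[i]` is the index of the target assigned to estimate `i`. The sketch covers only uPIT; the Graph-PIT assignment problem has additional constraints and is not handled by a plain linear assignment solver.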

Code & Models

Repositories

fgnt/graph_pit (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: utterance-level permutation invariant training