Single-channel speech separation using Soft-minimum Permutation Invariant Training
Midia Yousefi, John H.L. Hansen

TL;DR
This paper introduces trainable Soft-minimum Permutation Invariant Training (PIT), a probabilistic optimization framework for speech separation that outperforms conventional PIT with statistically significant gains in SDR and SIR.
Contribution
It proposes a novel Soft-minimum PIT method that better handles label permutation ambiguity in speech separation, improving upon existing PIT techniques.
Findings
Outperforms conventional PIT with a +1 dB SDR improvement
Achieves a +1.5 dB SIR improvement over conventional PIT
Results are statistically significant (p-value < 0.01)
Abstract
The goal of speech separation is to extract multiple speech sources from a single microphone recording. Recently, with the advancement of deep learning and availability of large datasets, speech separation has been formulated as a supervised learning problem. These approaches aim to learn discriminative patterns of speech, speakers, and background noise using a supervised learning algorithm, typically a deep neural network. A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity. Permutation ambiguity refers to the problem of determining the output-label assignment between the separated sources and the available single-speaker speech labels. Finding the best output-label assignment is required for calculation of separation error, which is later used for updating parameters of the…
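The output-label assignment problem described in the abstract can be illustrated with a small sketch. The excerpt does not give the paper's exact formulation, so everything here is an assumption made for illustration: the mean-squared-error criterion, the function names, and the negative log-sum-exp form of the soft-minimum with a smoothing parameter `gamma` (held fixed here rather than trained). Conventional PIT takes the hard minimum of the loss over all permutations; a soft-minimum replaces that hard choice with a smooth aggregate over every permutation.

```python
import itertools
import numpy as np

def permutation_losses(estimates, targets):
    """Return every output-label assignment and its MSE.

    estimates, targets: arrays of shape (num_sources, num_samples).
    MSE is an illustrative choice, not necessarily the paper's criterion.
    """
    n = estimates.shape[0]
    perms = list(itertools.permutations(range(n)))
    losses = np.array(
        [np.mean((estimates - targets[list(p)]) ** 2) for p in perms]
    )
    return perms, losses

def pit_loss(losses):
    """Conventional PIT: keep only the best (minimum-loss) assignment."""
    return losses.min()

def soft_min_loss(losses, gamma=1.0):
    """Assumed soft-minimum: -(1/gamma) * log(sum(exp(-gamma * loss))).

    Computed with the max-shift trick for numerical stability.
    It approaches the hard minimum as gamma grows.
    """
    z = -gamma * losses
    m = z.max()
    return -(m + np.log(np.sum(np.exp(z - m)))) / gamma

# Two sources whose estimates come out in swapped order.
targets = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
estimates = targets[::-1].copy()

perms, losses = permutation_losses(estimates, targets)
best_perm = perms[int(np.argmin(losses))]
print("best assignment:", best_perm)
print("PIT loss:", pit_loss(losses))
print("soft-min loss:", soft_min_loss(losses, gamma=1.0))
```

With this log-sum-exp form, the soft-minimum never exceeds the hard minimum and differs from it by at most log(K)/gamma for K permutations, so `gamma` controls how strongly the loss commits to a single assignment.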
