Single-channel speech separation using Soft-minimum Permutation Invariant Training

Midia Yousefi; John H.L. Hansen

arXiv:2111.08635·eess.AS·November 17, 2021

TL;DR

This paper introduces trainable Soft-minimum Permutation Invariant Training, a probabilistic optimization framework for single-channel speech separation that improves on conventional PIT by about 1 dB in SDR and 1.5 dB in SIR.

Contribution

The paper proposes a Soft-minimum PIT method that handles label permutation ambiguity in speech separation better than existing PIT techniques.

Findings

01

Outperforms conventional PIT with a +1 dB SDR improvement

02

Achieves a +1.5 dB SIR enhancement over traditional PIT

03

Statistically significant results (p-value < 0.01)

Abstract

The goal of speech separation is to extract multiple speech sources from a single microphone recording. Recently, with the advancement of deep learning and availability of large datasets, speech separation has been formulated as a supervised learning problem. These approaches aim to learn discriminative patterns of speech, speakers, and background noise using a supervised learning algorithm, typically a deep neural network. A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity. Permutation ambiguity refers to the problem of determining the output-label assignment between the separated sources and the available single-speaker speech labels. Finding the best output-label assignment is required for calculation of separation error, which is later used for updating parameters of the…
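The output-label assignment problem described in the abstract can be illustrated with a small sketch. The abstract is truncated here, so the code below does not reproduce the authors' trainable formulation. It only contrasts a conventional PIT loss (hard minimum over all assignments) with a generic log-sum-exp soft minimum. The mean-squared-error criterion, the function names, and the sharpness parameter `gamma` are all assumptions made for illustration.

```python
import itertools
import math


def mse(estimate, target):
    """Mean squared error between two equal-length signals."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def permutation_losses(estimates, targets):
    """Separation error for every output-label assignment.

    losses[k] is the average MSE when output i is scored
    against label perms[k][i].
    """
    perms = list(itertools.permutations(range(len(targets))))
    losses = [
        sum(mse(estimates[i], targets[p[i]]) for i in range(len(p))) / len(p)
        for p in perms
    ]
    return perms, losses


def pit_loss(estimates, targets):
    """Conventional PIT: keep only the assignment with the lowest error."""
    perms, losses = permutation_losses(estimates, targets)
    best = min(range(len(losses)), key=losses.__getitem__)
    return losses[best], perms[best]


def soft_min_loss(estimates, targets, gamma=1.0):
    """Generic soft minimum: -(1/gamma) * log(sum_k exp(-gamma * loss_k)).

    gamma is a hypothetical sharpness parameter; as it grows, the value
    approaches the hard minimum used by conventional PIT.
    """
    _, losses = permutation_losses(estimates, targets)
    z = [-gamma * loss for loss in losses]
    m = max(z)  # subtract the maximum for numerical stability
    return -(m + math.log(sum(math.exp(v - m) for v in z))) / gamma


# Two toy "speakers"; the network outputs them in swapped order.
targets = [[1.0, 0.0, -1.0, 0.0], [0.0, 2.0, 0.0, -2.0]]
estimates = [targets[1][:], targets[0][:]]

hard_loss, best_perm = pit_loss(estimates, targets)
soft_loss = soft_min_loss(estimates, targets, gamma=50.0)
```

In this toy case the hard minimum selects the swapped assignment with zero error, while the soft minimum lets every assignment contribute to the loss, weighted by how small its error is.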
