SepIt: Approaching a Single Channel Speech Separation Bound

Shahar Lutati; Eliya Nachmani; Lior Wolf

arXiv:2205.11801 · eess.AS · May 23, 2023 · 1 citation

TL;DR

This paper derives an upper bound for single-channel speech separation, introduces SepIt, a neural network that iteratively refines speaker estimates, and reports that it outperforms state-of-the-art networks for 2, 3, 5, and 10 speakers.

Contribution

The paper proposes a new upper bound for speech separation and introduces SepIt, a neural network with iterative refinement and adaptive iteration count based on mutual information.

Findings

01

SepIt outperforms state-of-the-art methods for 2, 3, 5, and 10 speakers.

02

The upper bound shows that recent methods leave room for improvement for five and ten speakers.

03

Iterative refinement improves speaker estimation accuracy.

Abstract

We present an upper bound for the Single Channel Speech Separation task, which is based on an assumption regarding the nature of short segments of speech. Using the bound, we are able to show that while recent methods have made significant progress for a few speakers, there is room for improvement for five and ten speakers. We then introduce a deep neural network, SepIt, that iteratively improves the different speakers' estimation. At test time, SepIt has a varying number of iterations per test sample, based on a mutual information criterion that arises from our analysis. In an extensive set of experiments, SepIt outperforms the state-of-the-art neural networks for 2, 3, 5, and 10 speakers.
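The abstract describes refinement with a per-sample iteration count. The sketch below illustrates only that general control flow: refine the estimates, score them, and stop once the score no longer improves. It is not the paper's implementation. The names `iterative_refine`, `toy_refine`, and `toy_criterion` are hypothetical, and the toy criterion (negative reconstruction error) is a stand-in assumption, not the mutual information criterion the authors derive.

```python
from typing import Callable, Tuple

import numpy as np

RefineFn = Callable[[np.ndarray, np.ndarray], np.ndarray]
ScoreFn = Callable[[np.ndarray, np.ndarray], float]


def iterative_refine(
    mixture: np.ndarray,
    initial_estimates: np.ndarray,
    refine_step: RefineFn,
    criterion: ScoreFn,
    max_iters: int = 10,
    min_gain: float = 1e-3,
) -> Tuple[np.ndarray, int]:
    """Refine speaker estimates until the criterion stops improving.

    refine_step and criterion are hypothetical placeholders; they do not
    reproduce SepIt's network or its mutual information criterion.
    """
    estimates = initial_estimates
    best_score = criterion(mixture, estimates)
    iters_used = 0
    for _ in range(max_iters):
        candidate = refine_step(mixture, estimates)
        score = criterion(mixture, candidate)
        # Stop early when another iteration no longer helps enough.
        if score - best_score < min_gain:
            break
        estimates, best_score = candidate, score
        iters_used += 1
    return estimates, iters_used


def toy_refine(mixture: np.ndarray, estimates: np.ndarray) -> np.ndarray:
    """Toy step: give each estimate an equal share of half the residual."""
    residual = mixture - estimates.sum(axis=0)
    return estimates + 0.5 * residual / estimates.shape[0]


def toy_criterion(mixture: np.ndarray, estimates: np.ndarray) -> float:
    """Toy score (assumption): negative mean squared reconstruction error."""
    return -float(np.mean((mixture - estimates.sum(axis=0)) ** 2))


# Usage: two "speakers", an 8-sample mixture, estimates initialised to zero.
mixture = np.ones(8)
initial = np.zeros((2, 8))
refined, n_iters = iterative_refine(mixture, initial, toy_refine, toy_criterion)
```

Because the loop exits as soon as the gain falls below `min_gain`, easy inputs use fewer iterations than hard ones, which is the per-sample behaviour the abstract describes.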

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing