SepIt: Approaching a Single Channel Speech Separation Bound
Shahar Lutati, Eliya Nachmani, Lior Wolf

TL;DR
This paper derives an upper bound for single-channel speech separation, introduces SepIt, a deep neural network that iteratively refines the speaker estimates, and shows that SepIt outperforms state-of-the-art networks for 2, 3, 5, and 10 speakers.
Contribution
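The paper proposes a new upper bound for single-channel speech separation, based on an assumption about the nature of short speech segments, and introduces SepIt, a neural network that refines its estimates iteratively and sets the number of test-time iterations per sample with a mutual information criterion.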
Findings
SepIt outperforms state-of-the-art methods for 2, 3, 5, and 10 speakers.
The upper bound shows that recent methods have made significant progress for a few speakers but leave room for improvement for five and ten speakers.
Iterative refinement improves speaker estimation accuracy.
Abstract
We present an upper bound for the Single Channel Speech Separation task, which is based on an assumption regarding the nature of short segments of speech. Using the bound, we are able to show that while the recent methods have made significant progress for a few speakers, there is room for improvement for five and ten speakers. We then introduce a deep neural network, SepIt, that iteratively improves the different speakers' estimation. At test time, SepIt has a varying number of iterations per test sample, based on a mutual information criterion that arises from our analysis. In an extensive set of experiments, SepIt outperforms the state-of-the-art neural networks for 2, 3, 5, and 10 speakers.
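The idea of a per-sample iteration count can be illustrated with a minimal sketch. It is not the SepIt architecture and it does not implement the paper's mutual information criterion, which the abstract does not spell out. The names `refine_step`, `residual_energy`, and `iterative_separation` are hypothetical, the refinement step is a simple mixture-consistency update standing in for a learned network, and the stopping score is plain residual energy standing in for the mutual information criterion.

```python
import numpy as np


def refine_step(mixture, estimates, step=0.5):
    # Hypothetical stand-in for a learned refinement network:
    # move every speaker estimate part of the way toward mixture
    # consistency (the estimates should sum to the mixture).
    residual = mixture - estimates.sum(axis=0)
    return estimates + step * residual / estimates.shape[0]


def residual_energy(mixture, estimates):
    # Stand-in stopping score (NOT the paper's mutual information
    # criterion): energy of what the estimates fail to explain.
    return float(np.sum((mixture - estimates.sum(axis=0)) ** 2))


def iterative_separation(mixture, initial_estimates, max_iters=20, tol=1e-6):
    # Refine until the score stops improving by at least `tol`,
    # so different samples can use different numbers of iterations.
    estimates = np.asarray(initial_estimates, dtype=float)
    prev_score = residual_energy(mixture, estimates)
    n_iters = 0
    for _ in range(max_iters):
        candidate = refine_step(mixture, estimates)
        score = residual_energy(mixture, candidate)
        if prev_score - score < tol:
            break  # no meaningful gain: stop early for this sample
        estimates, prev_score = candidate, score
        n_iters += 1
    return estimates, n_iters
```

A sample whose initial estimates already explain the mixture stops after zero iterations, while a poorly initialized sample keeps refining until the gain drops below `tol` or `max_iters` is reached.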
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
