Voice Separation with an Unknown Number of Multiple Speakers
Eliya Nachmani, Yossi Adi, Lior Wolf

TL;DR
This paper introduces a neural network-based approach for separating multiple simultaneous speakers in audio recordings, capable of handling an unknown number of speakers and outperforming existing methods especially with more than two speakers.
Contribution
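The paper proposes a gated neural network model that separates the voices over multiple processing steps while keeping each speaker in a fixed output channel. A separate model is trained for each possible number of speakers, and the approach improves separation performance over previous techniques.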
Findings
Outperforms the current state of the art, which the authors show is not competitive for more than two speakers
Trains a different model for every possible speaker count and uses the model for the largest count to select the actual number of speakers in a sample
Keeps each speaker in a fixed output channel across multiple processing steps
Abstract
We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
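The selection step described above could be sketched as follows. The abstract does not say how the output of the largest model is turned into a speaker count, so the energy threshold used here is an assumption, and the names `count_active_channels`, `separate`, and the `models` dictionary are hypothetical helpers for illustration, not the authors' procedure.

```python
import math


def count_active_channels(channels, rel_threshold_db=-30.0):
    """Count output channels whose mean power is within
    rel_threshold_db of the loudest channel.

    ASSUMPTION: an energy threshold stands in for whatever
    activity test the authors actually use.
    """
    powers = [sum(x * x for x in ch) / len(ch) for ch in channels]
    loudest = max(powers)
    if loudest == 0.0:
        return 0  # every channel is silent
    active = 0
    for p in powers:
        if p > 0.0 and 10.0 * math.log10(p / loudest) >= rel_threshold_db:
            active += 1
    return active


def separate(mixture, models):
    """models maps a speaker count to a callable that takes the
    mixture and returns that many output channels (hypothetical)."""
    largest = max(models)
    # Run the model trained for the largest speaker count first.
    estimate = count_active_channels(models[largest](mixture))
    # Pick the trained model whose speaker count is closest to the estimate.
    count = min(models, key=lambda c: abs(c - estimate))
    return count, models[count](mixture)
```

Calling `separate(mixture, models)` returns the selected speaker count together with the channels produced by the model trained for that count.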
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
Methods: Utterance-level permutation invariant training · Convolution · Parameterized ReLU · Long Short-Term Memory
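As a rough illustration of the utterance-level permutation invariant training listed under Methods, the sketch below scores every assignment of output channels to reference speakers over the whole utterance and keeps the best one. The use of scale-invariant SNR as the per-pair score is an assumption, and the function names `si_snr` and `upit_loss` are hypothetical. This page does not state the paper's exact loss.

```python
import math
from itertools import permutations


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (assumed objective)."""
    n = len(target)
    e_mean = sum(estimate) / n
    t_mean = sum(target) / n
    e = [x - e_mean for x in estimate]
    t = [x - t_mean for x in target]
    # Project the estimate onto the target to remove scale dependence.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    proj = [scale * b for b in t]
    noise = [a - p for a, p in zip(e, proj)]
    signal_power = sum(p * p for p in proj)
    noise_power = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))


def upit_loss(estimates, targets):
    """Return (loss, perm), where perm[i] is the index of the estimate
    assigned to target i and loss is the negative mean SI-SNR."""
    best_loss, best_perm = math.inf, None
    for perm in permutations(range(len(targets))):
        loss = -sum(
            si_snr(estimates[p], targets[i]) for i, p in enumerate(perm)
        ) / len(targets)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The exhaustive search costs a factorial number of evaluations in the speaker count, which is manageable for the handful of speakers considered here.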
