Voice Separation with an Unknown Number of Multiple Speakers

Eliya Nachmani; Yossi Adi; Lior Wolf

arXiv:2003.01531 · eess.AS · September 2, 2020 · 19 cites

TL;DR

This paper introduces a neural network-based approach for separating multiple simultaneous speakers in audio recordings, capable of handling an unknown number of speakers and outperforming existing methods especially with more than two speakers.

Contribution

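The paper proposes gated neural networks that separate the voices at multiple processing steps while keeping each speaker fixed to one output channel. A separate model is trained for every possible number of speakers, and the model trained for the largest number is used to determine how many speakers are actually present in a given sample.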

Findings

01

Outperforms current state-of-the-art methods for more than two speakers

02

Trains a separate model for each possible speaker count and uses the model for the largest count to select the actual number of speakers in a sample

03

Separates the voices at multiple processing steps while keeping the speaker in each output channel fixed

Abstract

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
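The abstract says only that the model trained for the largest number of speakers is used to select the actual number of speakers; it does not say how. The following is a minimal sketch of one plausible reading, assuming that surplus output channels of the largest model come out near-silent and can be detected by an energy threshold. The names `channel_energy_db`, `count_active_speakers`, `select_model`, the `models_by_count` mapping, and the `-40.0` dB threshold are all hypothetical and not taken from the paper.

```python
import math

def channel_energy_db(channel):
    """Mean power of one output channel in dB (floored to avoid log(0))."""
    power = sum(x * x for x in channel) / max(len(channel), 1)
    return 10.0 * math.log10(max(power, 1e-12))

def count_active_speakers(channels, silence_db=-40.0):
    """Count channels whose energy exceeds an illustrative silence threshold.

    `channels` stands in for the outputs of the model trained for the
    largest speaker count (assumption: unused channels are near-silent).
    """
    return sum(1 for ch in channels if channel_energy_db(ch) > silence_db)

def select_model(models_by_count, mixture, silence_db=-40.0):
    """Run the largest model, estimate the count, return (count, model).

    `models_by_count` maps a speaker count to a callable that takes a
    mixture and returns one list of samples per output channel.
    """
    largest = max(models_by_count)
    outputs = models_by_count[largest](mixture)
    estimated = count_active_speakers(outputs, silence_db)
    # Assumption: fall back to the nearest count that has a trained model.
    chosen = min(sorted(models_by_count), key=lambda c: (abs(c - estimated), c))
    return chosen, models_by_count[chosen]
```

With stand-in models for two and three speakers, a mixture whose three-speaker output contains one silent channel would be routed to the two-speaker model. A real system would replace the lambdas or stubs with trained separation networks and would need a threshold tuned on held-out data.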

Code & Models

Videos

Voice Separation with an Unknown Number of Multiple Speakers · SlidesLive

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development

Methods: Utterance-Level Permutation Invariant Training · Convolution · Parameterized ReLU · Long Short-Term Memory
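Utterance-level permutation invariant training, listed under Methods, scores every assignment of output channels to reference speakers over the whole utterance and trains on the lowest-scoring assignment. The sketch below illustrates the idea in plain Python. It uses mean squared error as a stand-in pairwise loss, which is an assumption made here for brevity and not necessarily the objective used in the paper; the names `mse` and `upit_loss` are hypothetical.

```python
from itertools import permutations

def mse(estimate, target):
    """Mean squared error between two equal-length signals (stand-in loss)."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)

def upit_loss(estimates, targets, pair_loss=mse):
    """Return (best_loss, best_perm) over all channel-to-speaker assignments.

    best_perm[i] is the index of the target assigned to estimates[i].
    The loss is averaged over channels and computed on whole utterances,
    so a single permutation is chosen for the entire signal.
    """
    n = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(pair_loss(estimates[i], targets[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Exhaustive search over permutations grows factorially with the number of speakers, which is acceptable for the handful of speakers considered here but would need a cheaper assignment method for large counts.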