The Cone of Silence: Speech Separation by Localization

Teerapat Jenrungrot; Vivek Jayaram; Steve Seitz; Ira Kemelmacher-Shlizerman

arXiv:2010.06007 · cs.SD · October 14, 2020 · 25 citations

TL;DR

This paper introduces a deep learning method that localizes and separates multiple speakers in multi-microphone recordings. It handles moving speakers and an unknown number of speakers, and performs well even under high levels of background noise.

Contribution

It presents a waveform-domain deep network that isolates the sources within a given angular region. Shrinking that region step by step enables a binary search that localizes and separates all speakers, including moving speakers and more speakers than were seen during training.

Findings

01

Achieves state-of-the-art separation and localization performance.

02

Handles an arbitrary number of moving speakers at test time.

03

Performs well in high background noise conditions.

Abstract

Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
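
The coarse-to-fine search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the network is replaced by a hypothetical toy oracle (`make_toy_oracle`) that simply counts how many known source angles fall inside $\theta \pm w/2$, and the function names, the 90° starting window, the 3° stopping width, and the energy threshold are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b in radians, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def make_toy_oracle(source_angles: Sequence[float]) -> Callable[[float, float], float]:
    """Hypothetical stand-in for the separation network.

    Returns a function giving the "energy" inside the cone theta +/- w/2,
    modeled here as the number of sources whose angle lies in that cone.
    """
    def energy_in_cone(theta: float, w: float) -> float:
        return float(sum(abs(angular_diff(s, theta)) <= w / 2 for s in source_angles))
    return energy_in_cone


def localize(
    energy_in_cone: Callable[[float, float], float],
    start_width: float = math.pi / 2,      # assumed initial window (90 degrees)
    min_width: float = math.radians(3),    # assumed stopping resolution
    threshold: float = 0.5,                # assumed energy threshold
) -> Tuple[List[float], float]:
    """Halve the angular window until it is at most min_width.

    Returns the centers of the cones that still contain energy, together
    with the final window width.
    """
    width = start_width
    n_cones = round(2 * math.pi / width)
    # Tile the full circle with non-overlapping cones of the starting width.
    candidates = [-math.pi + (i + 0.5) * width for i in range(n_cones)]
    while True:
        # Keep only the cones in which the oracle reports energy.
        active = [t for t in candidates if energy_in_cone(t, width) > threshold]
        if not active or width <= min_width:
            return active, width
        # Split every surviving cone into two children of half the width.
        candidates = [t + sign * width / 4 for t in active for sign in (-1, 1)]
        width /= 2


if __name__ == "__main__":
    sources = [math.radians(30), math.radians(-100)]
    centers, width = localize(make_toy_oracle(sources))
    print([round(math.degrees(c), 2) for c in centers], round(math.degrees(width), 2))
```

Because empty cones are discarded at every level, the number of oracle queries in this sketch grows with the number of sources times the number of halvings, rather than with the number of fine-grained angular bins. With a real network, each query would also return the separated audio for that cone.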

Code & Models

Repositories

vivjay30/Cone-of-Silence (PyTorch, official)

Videos

The Cone of Silence: Speech Separation by Localization (SlidesLive)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques