Deep Neural Networks for Multiple Speaker Detection and Localization

Weipeng He; Petr Motlicek; Jean-Marc Odobez

arXiv:1711.11565·cs.SD·September 18, 2018

TL;DR

This paper introduces neural network methods for detecting and localizing multiple sound sources simultaneously in human-robot interaction, outperforming traditional spatial spectrum techniques.

Contribution

The paper proposes a likelihood-based encoding of the network output that allows detection of an arbitrary number of sound sources. It also investigates sub-band cross-correlation features and three neural network architectures.
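As a rough illustration of what a likelihood-based output encoding can look like, the sketch below builds one target value per candidate direction, with a Gaussian-shaped peak around each true source, and recovers sources as local maxima above a threshold. The function names, the 360-direction grid, the peak width `sigma`, the threshold, and the neighborhood size are all assumptions made for this example; this summary does not specify the paper's exact formulation.

```python
import math


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def encode_sources(source_azimuths, n_directions=360, sigma=8.0):
    """Build a per-direction target vector with one peak per source.

    Each entry is the maximum over sources of a Gaussian-shaped curve
    centered on the source azimuth. With no sources, every entry is 0.
    n_directions and sigma are illustrative assumptions.
    """
    step = 360.0 / n_directions
    target = []
    for i in range(n_directions):
        theta = i * step
        value = 0.0
        for src in source_azimuths:
            d = angular_distance(theta, src)
            value = max(value, math.exp(-(d * d) / (sigma * sigma)))
        target.append(value)
    return target


def decode_sources(output, threshold=0.5, neighborhood=8):
    """Return azimuths (degrees) of local maxima that exceed the threshold.

    The window wraps around so that 359 degrees and 0 degrees are neighbors.
    threshold and neighborhood are illustrative assumptions.
    """
    n = len(output)
    step = 360.0 / n
    peaks = []
    for i, v in enumerate(output):
        if v <= threshold:
            continue
        window = [output[(i + k) % n]
                  for k in range(-neighborhood, neighborhood + 1)]
        if v >= max(window):
            peaks.append(i * step)
    return peaks


target = encode_sources([30, 200])
print(decode_sources(target))
```

Because the number of peaks is not fixed in advance, the same output vector can represent zero, one, or several sources, which is the property the encoding is meant to provide.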

Findings

01 The proposed methods significantly outperform traditional spatial spectrum methods.

02 They detect multiple sound sources effectively in data recorded from a real robot.

03 Sub-band cross-correlation features improve localization accuracy.
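To make the idea of sub-band cross-correlation features concrete, here is a minimal sketch that computes a phase-transform (PHAT) weighted cross-correlation between two microphone frames separately for each frequency band. The equal-width bands, the naive DFT, the function names, and the parameter defaults are assumptions chosen to keep the example dependency-free; the paper's actual filter bank and feature layout are not described in this summary.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def subband_gcc_phat(x1, x2, n_bands=4, max_delay=5):
    """Return a list of rows, one per band, of correlation vs. delay.

    Row entries cover delays -max_delay..max_delay (in samples). A positive
    delay means x2 lags x1. Bands are equal-width groups of positive
    frequency bins (an assumption made for this sketch).
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross = []
    for k in range(1, n // 2):  # skip the DC bin
        c = X1[k].conjugate() * X2[k]
        mag = abs(c)
        cross.append((k, c / mag if mag > 1e-12 else 0j))
    band_size = math.ceil(len(cross) / n_bands)
    features = []
    for b in range(n_bands):
        bins = cross[b * band_size:(b + 1) * band_size]
        row = []
        for tau in range(-max_delay, max_delay + 1):
            acc = sum((c * cmath.exp(2j * math.pi * k * tau / n)).real
                      for k, c in bins)
            row.append(acc / max(len(bins), 1))
        features.append(row)
    return features


# Usage: an impulse and a copy delayed by 3 samples.
n = 32
x1 = [0.0] * n
x1[0] = 1.0
x2 = [0.0] * n
x2[3] = 1.0
feats = subband_gcc_phat(x1, x2)
print([row.index(max(row)) - 5 for row in feats])  # estimated delay per band
```

Keeping the correlation per band, rather than summing over all frequencies, lets a downstream model weigh bands differently, which is plausibly useful when overlapping sources dominate different parts of the spectrum.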

Abstract

We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have focused on localizing a single sound source and do not extend to the detection and localization of multiple sources. In this paper, we thus propose a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources. In addition, we investigate the use of sub-band cross-correlation information as features for better localization in sound mixtures, as well as three different network architectures based on different motivations. Experiments on real data recorded from a robot show that…

Code & Models

Repositories

deepspike/Binary-Neural-Network-for-Sound-Localization (PyTorch)
