TL;DR
This paper introduces neural network methods for detecting and localizing multiple sound sources simultaneously in human-robot interaction, outperforming traditional spatial spectrum techniques.
Contribution
It presents a likelihood-based encoding of the network output that lets a neural network detect an arbitrary number of sound sources, and it explores sub-band cross-correlation features and three neural network architectures.
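To make the idea of a likelihood-based output encoding concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the 360-direction azimuth grid, the Gaussian-shaped bump, the width `SIGMA_DEG`, the peak-picking neighborhood, and the function names `encode_sources` and `decode_sources` are all hypothetical choices made for this example. The point it shows is that one output value per candidate direction can represent zero, one, or several sources without fixing the source count in advance.

```python
import numpy as np

# All constants below are assumptions made for this sketch.
N_DIRECTIONS = 360        # one output unit per degree of azimuth
SIGMA_DEG = 8.0           # width of the likelihood bump around each source
NEIGHBORHOOD_DEG = 8      # half-width of the window used for peak picking


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = np.abs(a - b) % 360.0
    return np.minimum(d, 360.0 - d)


def encode_sources(source_azimuths_deg):
    """Build a target vector with a Gaussian-shaped bump at each source.

    Overlapping bumps are combined with a maximum, so every value stays
    in [0, 1] regardless of how many sources are active.
    """
    grid = np.arange(N_DIRECTIONS, dtype=float)
    target = np.zeros(N_DIRECTIONS)
    for azimuth in source_azimuths_deg:
        bump = np.exp(-angular_distance(grid, azimuth) ** 2 / SIGMA_DEG ** 2)
        target = np.maximum(target, bump)
    return target


def decode_sources(output, threshold=0.5):
    """Return directions that exceed the threshold and are local maxima."""
    offsets = np.arange(-NEIGHBORHOOD_DEG, NEIGHBORHOOD_DEG + 1)
    peaks = []
    for i in range(N_DIRECTIONS):
        window = output[(i + offsets) % N_DIRECTIONS]  # wraps around 0/360
        if output[i] > threshold and output[i] == window.max():
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    target = encode_sources([30, 200])
    print(decode_sources(target))
```

In this sketch the number of detected sources is simply the number of peaks above the threshold, so an all-zero output decodes to an empty list.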
Findings
Significantly outperforms traditional spatial spectrum methods.
Effective detection of multiple sound sources in real robot data.
Improved localization accuracy with sub-band cross-correlation features.
Abstract
We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have focused on localizing a single sound source and do not extend to the detection and localization of multiple sources. In this paper, we thus propose a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources. In addition, we investigate the use of sub-band cross-correlation information as features for better localization in sound mixtures, as well as three different network architectures based on different motivations. Experiments on real data recorded from a robot show that…
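The sub-band cross-correlation features mentioned in the abstract can be illustrated with a short Python sketch. This is a hedged example, not the paper's feature extractor: it assumes a single microphone pair, PHAT (phase transform) weighting, uniformly spaced frequency bands, and integer sample delays, and the function name `subband_gcc_phat` and its parameters are hypothetical. It shows why keeping one correlation curve per band, instead of summing over all frequencies, preserves information that can help when different sources dominate different bands.

```python
import numpy as np


def subband_gcc_phat(x1, x2, n_bands=4, max_delay=8):
    """PHAT-weighted cross-correlation of two frames, computed per band.

    Returns (feats, delays), where feats has shape
    (n_bands, 2 * max_delay + 1) and delays lists the candidate lags in
    samples. A peak at delay d means x2 lags x1 by d samples.
    Uniform bands and integer delays are assumptions of this sketch.
    """
    n = len(x1)
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)

    # Cross-power spectrum, normalized to unit magnitude (PHAT weighting).
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)

    bins = np.arange(len(cross))
    delays = np.arange(-max_delay, max_delay + 1)
    # Phase ramp that each candidate delay would produce in every bin.
    steering = np.exp(-2j * np.pi * np.outer(delays, bins) / n)

    # Split the frequency bins into n_bands contiguous groups.
    edges = np.linspace(0, len(cross), n_bands + 1).astype(int)
    feats = np.empty((n_bands, len(delays)))
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        feats[b] = np.real(steering[:, lo:hi] @ cross[lo:hi]) / (hi - lo)
    return feats, delays


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.standard_normal(256)
    x2 = np.roll(x1, 3)  # circularly delayed copy of x1
    feats, delays = subband_gcc_phat(x1, x2)
    print(delays[np.argmax(feats, axis=1)])
```

With a single source, every band peaks at the same delay. In a mixture, bands dominated by different sources can peak at different delays, which is the kind of structure a network could exploit.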
