Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

Antoine Deleforge; Radu Horaud; Yoav Schechner; Laurent Girin

arXiv:1408.2700 · cs.SD · April 18, 2016

TL;DR

This paper introduces a supervised binaural audio-localization method that localizes multiple sources simultaneously without source separation, generalizes from white-noise training to variable-length sounds such as speech, and supports audio-visual fusion. It is validated on a new real-room dataset.

Contribution

The paper presents a locally-linear Gaussian regression model for binaural source localization that handles variable-length sounds and enables a mapping between auditory features and image coordinates, improving both localization accuracy and speed.

Findings

1. Enhanced localization accuracy over state-of-the-art methods
2. Effective for both speech and white-noise sources
3. Supports real-time audio-visual applications

Abstract

This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation, nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used…
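The abstract describes the method only at a high level. As an illustration, the sketch below shows how inference in a generic locally-linear (piecewise-affine) Gaussian regression model can recover source coordinates from binaural features when some features are missing, as happens with sparse-spectrum sounds such as speech. It is not the authors' implementation: the model form (a mixture of K affine maps from source coordinates y to features x, with a Gaussian prior on y and Gaussian noise on x), the helper names `gaussian_logpdf` and `localize`, and every parameter value are hypothetical. In practice the parameters would be estimated in a training stage on white-noise recordings, which is omitted here.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of the multivariate normal N(mean, cov) at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)


def localize(x, observed, pi, c, Gamma, A, b, Sigma):
    """Hypothetical helper: posterior mean E[y | observed part of x].

    Assumed piecewise-affine Gaussian model, for each component k:
        y | k    ~ N(c[k], Gamma[k])              (source coordinates)
        x | y, k ~ N(A[k] @ y + b[k], Sigma[k])   (binaural features)
    `observed` is a boolean mask over the D feature dimensions. Missing
    features are marginalized out, which for a Gaussian simply means
    dropping the corresponding rows and columns.
    """
    xo = x[observed]
    log_w, means = [], []
    for k in range(len(pi)):
        Ao = A[k][observed]                        # (D_obs, L)
        bo = b[k][observed]                        # (D_obs,)
        So = Sigma[k][np.ix_(observed, observed)]  # (D_obs, D_obs)
        # Marginal distribution of the observed features under component k.
        mu_x = Ao @ c[k] + bo
        cov_x = So + Ao @ Gamma[k] @ Ao.T
        log_w.append(np.log(pi[k]) + gaussian_logpdf(xo, mu_x, cov_x))
        # Gaussian conditioning: E[y | x, k] = c + Gamma A^T cov_x^{-1} (x - mu_x).
        gain = np.linalg.solve(cov_x, Ao @ Gamma[k]).T  # (L, D_obs)
        means.append(c[k] + gain @ (xo - mu_x))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # numerically stable normalization
    w /= w.sum()                     # posterior component weights
    return sum(wk * mk for wk, mk in zip(w, means)), w


# Toy example with made-up parameters (K=2 components, L=1, D=4).
pi = np.array([0.5, 0.5])
c = [np.array([-5.0]), np.array([5.0])]
Gamma = [np.eye(1), np.eye(1)]
A = [np.array([[1.0], [2.0], [-1.0], [0.5]]),
     np.array([[-1.0], [0.5], [2.0], [1.5]])]
b = [np.zeros(4), np.ones(4)]
Sigma = [1e-4 * np.eye(4), 1e-4 * np.eye(4)]

y_true = np.array([4.5])
x = A[1] @ y_true + b[1]                        # noiseless features from component 2
observed = np.array([True, False, True, True])  # feature 1 treated as missing
y_hat, w = localize(x, observed, pi, c, Gamma, A, b, Sigma)
print(y_hat, w)
```

Because each component's marginal over the features is Gaussian, ignoring unreliable feature dimensions in this sketch requires no retraining. That is one plausible reading of how white-noise training can carry over to sparse-spectrum test sounds. Under the same assumptions, localizing several sources at once would amount to stacking all source coordinates into y.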
