Towards Domain Independence in CNN-based Acoustic Localization using Deep Cross Correlations
Juan Manuel Vera-Diaz, Daniel Pizarro, Javier Macias-Guarasa

TL;DR
This paper introduces a CNN-based method for acoustic source localization that maintains high accuracy across different environments without needing re-training, outperforming traditional and deep learning methods in mismatched conditions.
Contribution
The authors propose a novel encoder-decoder CNN architecture that estimates smoothed correlation signals, enabling robust acoustic localization across varying environments without re-training.
Findings
Outperforms SRP-PHAT and other deep learning methods in mismatched conditions.
Evaluated on three publicly available, realistic datasets.
Does not require re-training for different environments.
Abstract
Time delay estimation is essential in Acoustic Source Localization (ASL) systems. One of the most widely used techniques for this purpose is the Generalized Cross Correlation (GCC) between a pair of signals and its use in Steered Response Power (SRP) techniques, which estimate the acoustic power at a specific location. Nowadays, Deep Learning strategies may outperform these methods. However, they generally depend on the geometric and sensor configuration conditions available during the training phase, and thus have limited generalization capabilities when facing new environments if no re-training or adaptation is applied. In this work, we propose a method based on an encoder-decoder CNN architecture capable of outperforming the well-known SRP-PHAT algorithm, and also other Deep Learning strategies when working in mismatched training-testing conditions without requiring a model…
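For readers unfamiliar with the GCC step the abstract refers to, the following is a minimal sketch of the classical GCC with PHAT weighting for a single microphone pair, written with NumPy. It illustrates the standard baseline technique only, not the authors' CNN; the function name `gcc_phat`, its parameters, and the small regularization constant are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)       # generalized cross correlation

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible delays
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs, cc

# Usage: a white-noise signal delayed by 5 samples
rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x[:-5]))
tau, cc = gcc_phat(x, y, fs)
```

The peak position of the returned correlation gives the time delay estimate for the pair. SRP-style methods combine such correlations over many microphone pairs and candidate source positions.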
