Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
Abdelrahman Younes, Daniel Honerkamp, Tim Welschehold, Abhinav Valada

TL;DR
This paper introduces a new benchmark and a reinforcement learning method for audio-visual navigation that effectively locates moving and unheard sound sources in noisy, complex environments, outperforming existing approaches.
Contribution
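It presents a novel dynamic audio-visual navigation benchmark, in which the agent must catch a moving sound source amid noisy and distracting sounds, and a robust RL-based approach whose architecture fuses audio and visual information in a shared spatial feature space to localize the source.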
Findings
Outperforms state-of-the-art methods across all tasks
Effective in noisy and dynamic environments
Validated on Matterport3D and Replica datasets
Abstract
Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment. While recent approaches have demonstrated the benefits of audio input to detect and find the goal, they focus on clean and static sound sources and struggle to generalize to unheard sounds. In this work, we propose the novel dynamic audio-visual navigation benchmark which requires catching a moving sound source in an environment with noisy and distracting sounds, posing a range of new challenges. We introduce a reinforcement learning approach that learns a robust navigation policy for these complex settings. To achieve this, we propose an architecture that fuses audio-visual information in the spatial feature space to learn correlations of geometric information inherent in both local maps and audio signals. We demonstrate that our approach consistently outperforms the…
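The abstract's central architectural idea, fusing audio and visual information in a spatial feature space, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture. The function names (`resize_nearest`, `spatial_fusion`), the tensor shapes, and the choice of nearest-neighbour resampling followed by channel concatenation and a 1x1 convolution are all illustrative assumptions. The sketch also assumes the audio encoder already produces a 2-D feature map that can be aligned with the local map grid.

```python
import numpy as np


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def spatial_fusion(map_feat, audio_feat, weight, bias):
    """Hypothetical fusion of map and audio features on a shared spatial grid.

    map_feat:   (C_m, H, W) features from the egocentric local map (assumed encoder output)
    audio_feat: (C_a, H_a, W_a) features from the audio input (assumed 2-D feature map)
    weight:     (C_out, C_m + C_a) weights of a 1x1 convolution
    bias:       (C_out,) bias of the 1x1 convolution
    """
    _, h, w = map_feat.shape
    # Resample the audio features onto the map's spatial grid (an assumption).
    audio_on_grid = resize_nearest(audio_feat, h, w)
    # Stack both modalities channel-wise so every grid cell sees map and audio features.
    stacked = np.concatenate([map_feat, audio_on_grid], axis=0)  # (C_m + C_a, H, W)
    # A 1x1 convolution is a per-cell linear map over channels.
    fused = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    return np.maximum(fused, 0.0)  # ReLU


rng = np.random.default_rng(0)
map_feat = rng.standard_normal((8, 16, 16))
audio_feat = rng.standard_normal((4, 8, 8))
weight = rng.standard_normal((12, 12)) * 0.1
bias = np.zeros(12)
fused = spatial_fusion(map_feat, audio_feat, weight, bias)
print(fused.shape)  # (12, 16, 16)
```

Placing both modalities on one grid lets the 1x1 layer mix map and audio channels at every cell, which is one simple way for a network to learn correlations between the geometry in the local map and cues in the audio. A trained policy would replace the random weights with learned convolutional encoders and feed the fused features to an RL policy head.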
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
