Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments

Abdelrahman Younes

arXiv:2201.04279 · cs.CV · January 13, 2022 · 1 citation

TL;DR

This paper introduces a dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in unmapped 3D environments containing distractors and noisy sounds, and it proposes a multi-modal reinforcement learning approach for the task.

Contribution

The paper presents a novel benchmark and a multi-modal reinforcement learning method that generalizes better to unheard sounds and is more robust to noisy scenarios than the prior state of the art.

Findings

1. Outperforms state-of-the-art methods on the new benchmark

2. Generalizes better to unheard sounds

3. Is more robust in noisy scenarios

Abstract

Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds. We introduce the novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds. We propose an end-to-end reinforcement learning approach that relies on a multi-modal architecture that fuses the spatial audio-visual information from a binaural audio signal and spatial occupancy maps to encode the features needed to learn a robust navigation policy for our new complex task settings. We demonstrate that our approach outperforms the current state-of-the-art with better generalization to unheard sounds and better robustness to noisy scenarios on the two challenging 3D scanned real-world datasets Replica and Matterport3D,…

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies