Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation
Teng Liu, Yinfeng Yu

TL;DR
The paper introduces RAVN, a reliability-aware framework for audio-visual navigation that dynamically calibrates sensor fusion based on learned audio reliability cues, improving robustness in complex acoustic environments.
Contribution
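It proposes a reliability-aware fusion method built from two components, the Acoustic Geometry Reasoner (AGR) and Reliability-Aware Geometric Modulation (RAGM), which improves robustness and generalization to unheard sound categories without requiring geometric labels at inference time.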
Findings
RAVN outperforms baseline methods on audio-visual navigation tasks.
AGR effectively learns observation-dependent reliability cues.
RAGM mitigates cross-modal conflicts, enhancing robustness.
Abstract
Audio-Visual Navigation (AVN) requires an embodied agent to navigate toward a sound source by utilizing both vision and binaural audio. A core challenge arises in complex acoustic environments, where binaural cues become intermittently unreliable, particularly when generalizing to previously unheard sound categories. To address this, we propose RAVN (Reliability-Aware Audio-Visual Navigation), a framework that conditions cross-modal fusion on audio-derived reliability cues, dynamically calibrating the integration of audio and visual inputs. RAVN introduces an Acoustic Geometry Reasoner (AGR) that is trained with geometric proxy supervision. Using a heteroscedastic Gaussian NLL objective, AGR learns observation-dependent dispersion as a practical reliability cue, eliminating the need for geometric labels during inference. Additionally, we introduce Reliability-Aware Geometric Modulation…
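The abstract names a heteroscedastic Gaussian NLL objective but does not spell out its form. The sketch below assumes the standard per-dimension form NLL = 0.5 * (log σ² + (y − μ)² / σ² + log 2π), with the network predicting log σ² so the variance stays positive. It also shows how a predicted dispersion could be turned into a reliability weight that gates fusion. The 2-D geometric proxy target, the `reliability_from_log_var` mapping, and the convex audio/visual blend are hypothetical illustrations, not AGR or RAGM as the authors implement them.

```python
import math
from typing import List, Sequence


def gaussian_nll(target: Sequence[float], mean: Sequence[float],
                 log_var: Sequence[float]) -> float:
    """Heteroscedastic Gaussian NLL for one observation, summed over dimensions.

    Each dimension has its own predicted log-variance, so the model can report
    high dispersion (low confidence) on hard inputs instead of fitting them exactly.
    """
    total = 0.0
    for y, mu, s in zip(target, mean, log_var):
        total += 0.5 * (s + (y - mu) ** 2 / math.exp(s) + math.log(2.0 * math.pi))
    return total


def batch_nll(targets: Sequence[Sequence[float]], means: Sequence[Sequence[float]],
              log_vars: Sequence[Sequence[float]]) -> float:
    """Average the per-observation NLL over a batch (the training loss)."""
    losses = [gaussian_nll(t, m, s) for t, m, s in zip(targets, means, log_vars)]
    return sum(losses) / len(losses)


def reliability_from_log_var(log_var: Sequence[float]) -> float:
    """Hypothetical mapping from mean predicted variance to a weight in (0, 1].

    Low variance -> weight near 1 (trust audio); high variance -> weight near 0.
    """
    mean_var = sum(math.exp(s) for s in log_var) / len(log_var)
    return 1.0 / (1.0 + mean_var)


def reliability_gated_fusion(audio_feat: Sequence[float], visual_feat: Sequence[float],
                             reliability: float) -> List[float]:
    """Hypothetical convex blend: weight audio features by reliability, vision by the rest."""
    return [reliability * a + (1.0 - reliability) * v
            for a, v in zip(audio_feat, visual_feat)]


if __name__ == "__main__":
    # Hypothetical 2-D proxy target, e.g. the relative offset to the sound source.
    target = [2.0, 0.0]
    mean = [0.0, 0.0]
    # A larger predicted variance lowers the loss when the residual is large.
    print(gaussian_nll(target, mean, [0.0, 0.0]))
    print(gaussian_nll(target, mean, [math.log(4.0), 0.0]))
    r = reliability_from_log_var([math.log(4.0), 0.0])
    print(r, reliability_gated_fusion([1.0, 0.0], [0.0, 1.0], r))
```

Setting the derivative of the loss with respect to log σ² to zero gives σ² = (y − μ)², so the predicted variance grows on observations the model cannot fit. This is what would make it usable as a reliability cue without geometric labels at inference time. The 1 / (1 + σ²) transform is one arbitrary monotone choice among many.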
