Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation

Teng Liu; Yinfeng Yu

arXiv:2604.02391 · cs.SD · April 6, 2026

TL;DR

The paper introduces RAVN, a reliability-aware framework for audio-visual navigation that dynamically calibrates sensor fusion based on learned audio reliability cues, improving robustness in complex acoustic environments.

Contribution

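The paper proposes a reliability-aware fusion method built on two components, the Acoustic Geometry Reasoner (AGR) and Reliability-Aware Geometric Modulation (RAGM), which improves generalization and robustness without requiring geometric labels at inference time.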

Findings

01. RAVN outperforms baselines in navigation tasks.

02. AGR effectively learns observation-dependent reliability cues.

03. RAGM mitigates cross-modal conflicts, enhancing robustness.

Abstract

Audio-Visual Navigation (AVN) requires an embodied agent to navigate toward a sound source by utilizing both vision and binaural audio. A core challenge arises in complex acoustic environments, where binaural cues become intermittently unreliable, particularly when generalizing to previously unheard sound categories. To address this, we propose RAVN (Reliability-Aware Audio-Visual Navigation), a framework that conditions cross-modal fusion on audio-derived reliability cues, dynamically calibrating the integration of audio and visual inputs. RAVN introduces an Acoustic Geometry Reasoner (AGR) that is trained with geometric proxy supervision. Using a heteroscedastic Gaussian NLL objective, AGR learns observation-dependent dispersion as a practical reliability cue, eliminating the need for geometric labels during inference. Additionally, we introduce Reliability-Aware Geometric Modulation…
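
The abstract names a heteroscedastic Gaussian NLL objective but does not give its parameterization. The following is a minimal sketch of that loss family, not the authors' implementation: it assumes a scalar regression target and a predicted log-variance, and the names `gaussian_nll` and `best_log_var` and all numeric values are hypothetical. It illustrates why the predicted dispersion can serve as a reliability cue: when the mean prediction is far from the target, the loss is lower if the model also reports a large variance.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, exp(log_var)).

    Predicting log-variance (an assumption of this sketch) keeps the
    variance positive without an explicit constraint.
    """
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target - mean) ** 2 / var)


def best_log_var(target, mean, eps=1e-12):
    """Log-variance that minimizes the NLL for one sample.

    Setting the derivative with respect to log_var to zero gives
    var = (target - mean)^2; `eps` guards against log(0).
    """
    return math.log(max((target - mean) ** 2, eps))


# Hypothetical values: a predicted bearing (radians) against a proxy target.
target = 0.50
small_var = math.log(0.02 ** 2)
large_var = math.log(0.70 ** 2)

# Accurate prediction with a small variance: low loss.
clean = gaussian_nll(target, mean=0.48, log_var=small_var)
# Inaccurate prediction that still claims a small variance: heavily penalized.
noisy_confident = gaussian_nll(target, mean=1.20, log_var=small_var)
# Same inaccurate prediction with a large variance: much lower loss.
noisy_uncertain = gaussian_nll(target, mean=1.20, log_var=large_var)

print(clean, noisy_confident, noisy_uncertain)
```

Because the loss rewards large variance exactly where the mean prediction tends to be wrong, the variance output can be read at inference time without any geometric label. How RAVN converts that dispersion into a fusion signal is not specified in the text above.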
