Navig-AI-tion: Navigation by Contextual AI and Spatial Audio
Mathias N. Lystbæk, Haley Adams, Ranjith Kagathi Ananda, Eric J Gonzalez, Luca Ballan, Qiuxuan Wu, Andrea Colaço, Peter Tan, Mar Gonzalez-Franco

TL;DR
This paper introduces a navigation system combining a Vision Language Model with spatial audio cues to improve orientation and reduce errors in audio-only walking navigation.
Contribution
It presents a novel integration of environmental landmark extraction with directional spatial audio cues, enhancing audio-only navigation accuracy and user experience.
Findings
Spatial audio cues reduced route deviations in a user study (n=12).
Users reported better orientation with landmark-anchored instructions.
The VLM with spatial audio produced fewer route deviations than both the VLM-only and audio-only Google Maps baselines.
Abstract
Audio-only walking navigation can leave users disoriented, relying on vague cardinal directions and lacking real-time environmental context, leading to frequent errors. To address this, we present a novel system that integrates a Vision Language Model (VLM) with a spatial audio cue. Our system extracts environmental landmarks to anchor navigation instructions and, crucially, provides a directional spatial audio signal when the user faces the wrong direction, indicating the precise turn direction. In a user study (n=12), the spatial audio cue with VLM reduced route deviations compared to both VLM-only and Google Maps (audio-only) baseline systems. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience over audio-only Google Maps. This work serves as an initial look at the utility of…
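The directional cue described in the abstract (a spatial audio signal played when the user faces the wrong direction, indicating which way to turn) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 30-degree tolerance, and the returned dictionary layout are all hypothetical, chosen only to show how a signed heading error can decide whether a cue plays and on which side.

```python
def signed_heading_error(user_heading_deg, target_bearing_deg):
    """Smallest signed angle from the user's heading to the target bearing.

    Both inputs are compass angles in degrees. The result lies in
    [-180, 180): positive means the target is clockwise (to the right),
    negative means counter-clockwise (to the left).
    """
    return (target_bearing_deg - user_heading_deg + 180.0) % 360.0 - 180.0


def spatial_cue(user_heading_deg, target_bearing_deg, tolerance_deg=30.0):
    """Decide whether to play a directional cue, and from which side.

    tolerance_deg is an assumed threshold (not taken from the paper):
    inside it, the user is treated as facing the right way and no cue plays.
    """
    error = signed_heading_error(user_heading_deg, target_bearing_deg)
    if abs(error) <= tolerance_deg:
        return None  # facing roughly the right way, stay silent
    side = "right" if error > 0 else "left"
    # azimuth_deg is where a spatial audio renderer could place the sound
    # relative to the listener's forward direction.
    return {"side": side, "azimuth_deg": error}
```

For example, a user heading 350 degrees with a target bearing of 80 degrees has a signed error of 90 degrees, so the cue would be placed to the right. The modulo step handles the wrap-around at north, which a plain subtraction would get wrong.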
