Navig-AI-tion: Navigation by Contextual AI and Spatial Audio

Mathias N. Lystbæk; Haley Adams; Ranjith Kagathi Ananda; Eric J Gonzalez; Luca Ballan; Qiuxuan Wu; Andrea Colaço; Peter Tan; Mar Gonzalez-Franco

arXiv:2603.13200·cs.HC·April 9, 2026

TL;DR

This paper introduces a navigation system combining a Vision Language Model with spatial audio cues to improve orientation and reduce errors in audio-only walking navigation.

Contribution

It presents a novel integration of environmental landmark extraction with directional spatial audio cues, enhancing audio-only navigation accuracy and user experience.

Findings

01 Spatial audio cues reduced route deviations in a user study (n=12).

02 Users reported better orientation with landmark-anchored instructions.

03 The VLM with spatial audio cue produced fewer route deviations than both the VLM-only and Google Maps (audio-only) baselines.

Abstract

Audio-only walking navigation can leave users disoriented, relying on vague cardinal directions and lacking real-time environmental context, leading to frequent errors. To address this, we present a novel system that integrates a Vision Language Model (VLM) with a spatial audio cue. Our system extracts environmental landmarks to anchor navigation instructions and, crucially, provides a directional spatial audio signal when the user faces the wrong direction, indicating the precise turn direction. In a user study (n=12), the spatial audio cue with VLM reduced route deviations compared to both VLM-only and Google Maps (audio-only) baseline systems. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience over audio-only Google Maps. This work serves as an initial look at the utility of…
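
The abstract does not say how the directional cue is computed. As a minimal sketch of the general idea only, and not the authors' implementation, the code below compares a user's compass heading with the bearing of the next route segment and, when the two differ by more than a tolerance, returns a turn direction with stereo gains for a panned cue. The function names, the 30-degree tolerance, and the constant-power panning are all assumptions made for illustration.

```python
import math


def signed_heading_error(user_heading_deg: float, target_bearing_deg: float) -> float:
    """Smallest signed angle in degrees from the user's heading to the target.

    Positive means the target is clockwise (turn right), negative means
    counter-clockwise (turn left). The result lies in [-180, 180).
    """
    return (target_bearing_deg - user_heading_deg + 180.0) % 360.0 - 180.0


def spatial_cue(user_heading_deg: float,
                target_bearing_deg: float,
                tolerance_deg: float = 30.0):
    """Return a hypothetical cue description, or None if the user is on course.

    tolerance_deg is an assumed threshold, not a value from the paper.
    """
    error = signed_heading_error(user_heading_deg, target_bearing_deg)
    if abs(error) <= tolerance_deg:
        return None  # facing roughly the right way: stay silent

    # Map the error to a pan position in [-1, 1]; targets behind the user
    # are clamped to hard left or hard right.
    clamped = max(-90.0, min(90.0, error))
    pan = math.sin(math.radians(clamped))

    # Constant-power panning (an assumed choice): gains trace a quarter circle.
    theta = (pan + 1.0) * math.pi / 4.0
    return {
        "turn": "right" if error > 0 else "left",
        "error_deg": error,
        "left_gain": math.cos(theta),
        "right_gain": math.sin(theta),
    }


if __name__ == "__main__":
    print(spatial_cue(350.0, 10.0))   # small error across north: no cue
    print(spatial_cue(0.0, 90.0))     # target to the right
    print(spatial_cue(0.0, 270.0))    # target to the left
```

A real system would render the cue through a head-tracked spatial audio engine rather than plain stereo gains; the sketch only shows the wrap-around angle arithmetic and the left/right decision.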
