LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
Yuwei Ning, Ganlong Zhao, Yipeng Qin, Si Liu, Yang Liu, Liang Lin, Guanbin Li

TL;DR
LookasideVLN introduces a novel aerial navigation approach that leverages directional cues in natural language to improve spatial reasoning and computational efficiency in UAV navigation tasks.
Contribution
The paper proposes a new paradigm, built on three core components, that uses directional cues to improve navigation accuracy and efficiency in complex urban environments.
Findings
Outperforms the state-of-the-art CityNavAgent on navigation tasks.
Uses directional cues to enhance spatial reasoning.
Achieves significant improvements with a single-level lookahead.
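As a rough illustration of what a "directional cue" in an instruction can look like, the sketch below pulls direction words out of a sentence with a plain keyword lookup. It is a toy under stated assumptions, not the paper's method: the `DIRECTION_BEARINGS` table, the bearing convention, and the `extract_directional_cues` function are all hypothetical names introduced here for illustration.

```python
import re

# Hypothetical mapping from direction words to egocentric bearings in degrees
# (0 = straight ahead, positive = clockwise). Not taken from the paper.
DIRECTION_BEARINGS = {
    "ahead": 0,
    "forward": 0,
    "right": 90,
    "behind": 180,
    "left": -90,
}


def extract_directional_cues(instruction):
    """Return (word, bearing) pairs in the order they appear in the instruction."""
    tokens = re.findall(r"[a-z]+", instruction.lower())
    return [(t, DIRECTION_BEARINGS[t]) for t in tokens if t in DIRECTION_BEARINGS]


cues = extract_directional_cues("Fly forward, then turn left at the red tower.")
print(cues)
```

A real system would likely need to tie each cue to the landmark it modifies; this lookup only shows that such cues are present in the text and can be given a spatial meaning.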
Abstract
Aerial Vision-and-Language Navigation (Aerial VLN) enables unmanned aerial vehicles (UAVs) to follow natural language instructions and navigate complex urban environments. While recent advances have achieved progress through large-scale memory graphs and lookahead path planning, they remain limited by shallow instruction understanding and high computational cost. In particular, existing methods rely primarily on landmark descriptions, overlooking directional cues, a key source of spatial context in human navigation. In this work, we propose LookasideVLN, a new paradigm that exploits directional cues in natural language to achieve both more accurate spatial reasoning and greater computational efficiency. LookasideVLN comprises three core components: (1) an Egocentric Lookaside Graph (ELG) that dynamically encodes instruction-relevant landmarks and their directional relationships, (2) a…
