AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

Wenxuan Guo, Xiuwei Xu, Yichen Liu, Xiangyu Li, Hang Yin, Huangxing Chen, Wenzhao Zheng, Jianjiang Feng, Jie Zhou, and Jiwen Lu

arXiv:2605.22816 · cs.RO · May 22, 2026

TL;DR

AwareVLN introduces a self-aware reasoning framework for vision-language navigation, enhancing spatial understanding and task progress comprehension without relying on explicit scene maps or additional sensors.

Contribution

The paper proposes a self-aware reasoning mechanism with a structural reasoning module and an automatic data engine, advancing end-to-end vision-language navigation.

Findings

1. Outperforms previous state-of-the-art methods on Habitat datasets.
2. Demonstrates significant improvements in navigation accuracy.
3. Validates effectiveness of self-awareness in complex environments.

Abstract

Vision-and-Language Navigation (VLN) requires an agent to ground language instructions to its own movement within a visual environment. While state-of-the-art methods leverage the reasoning capabilities of Vision-Language Models (VLMs) for end-to-end action prediction, they often lack an explicit and explainable understanding of the relationships between the agent, the instruction, and the scene. Conversely, explicitly building a scene map for heuristic planning is intuitively appealing but relies on additional 3D sensors and hinders large-scale vision-language pre-training. To bridge this gap, we propose AwareVLN, a novel framework that equips the navigation model with a self-aware reasoning mechanism, enabling it to understand the agent's state and task progress in a fully end-to-end and data-driven manner. Our approach features two key innovations: (1) a structural reasoning module…

Code & Models

Repositories

https://gwxuan.github.io/AwareVLN