TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation

Jiaxing Liu; Zexi Zhang; Xiaoyan Li; Boyue Wang; Yongli Hu; Baocai Yin

arXiv:2603.02972·cs.CV·April 21, 2026

TL;DR

TagaVLM introduces a topology-aware framework for vision-language navigation, explicitly integrating spatial structures into large models to improve global action reasoning and navigation performance.

Contribution

The paper proposes two topology-oriented components, STAR-Att (Spatial Topology Aware Residual Attention) and navigation prompts, to strengthen spatial reasoning in VLMs for embodied navigation tasks.

Findings

01

Achieves state-of-the-art results on the R2R benchmark with 51.09% SR.

02

Outperforms prior methods by 3.39% in SR and 9.08 in SPL in unseen environments.

03

Demonstrates that targeted enhancements on smaller models can surpass brute-force scaling.
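
For readers unfamiliar with the metrics in the findings above, the sketch below shows how SR and SPL are commonly computed in VLN evaluation, assuming they carry their usual meanings here (Success Rate and Success weighted by Path Length). The function name and the episode tuple layout are hypothetical, chosen only for illustration; the page does not describe the paper's own evaluation code.

```python
def success_rate_and_spl(episodes):
    """Compute SR and SPL over a list of episodes.

    Each episode is a hypothetical tuple:
      (success, shortest_len, path_len)
    where success is a bool, shortest_len is the shortest-path length
    from start to goal, and path_len is the length the agent travelled.
    """
    n = len(episodes)
    # SR: fraction of episodes that ended in success.
    sr = sum(1 for success, _, _ in episodes if success) / n
    # SPL: each success is weighted by shortest_len / max(path_len, shortest_len),
    # so detours reduce the score; failed episodes contribute zero.
    spl = sum(
        shortest / max(travelled, shortest)
        for success, shortest, travelled in episodes
        if success
    ) / n
    return sr, spl


episodes = [
    (True, 10.0, 10.0),   # success along the shortest path
    (True, 10.0, 20.0),   # success with a path twice as long
    (False, 10.0, 5.0),   # failure
]
sr, spl = success_rate_and_spl(episodes)
```

Because SPL penalizes detours, a method can raise SR while leaving SPL flat, which is why the two numbers are reported together.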

Abstract

Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch: VLMs are primarily pretrained on static, disembodied vision-language tasks, which fundamentally clash with the dynamic, embodied, and spatially-structured nature of navigation. Existing large-model-based methods often resort to converting rich visual and spatial information into text, forcing models to implicitly infer complex visual-topological relationships or limiting their global action capabilities. To bridge this gap, we propose TagaVLM (Topology-Aware Global Action reasoning), an end-to-end framework that explicitly injects topological structures into the VLM backbone. To introduce topological edge information, Spatial Topology Aware Residual Attention (STAR-Att) directly integrates it into the VLM's self-attention mechanism, enabling…
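
To make the idea of injecting topological edge information into self-attention more concrete, here is a minimal sketch of one plausible reading: an additive bias, derived from pairwise distances between topological nodes, is added to the attention logits before the softmax. This is an illustration under stated assumptions, not the STAR-Att formulation from the paper, which is not given on this page. The function names, the `dist` matrix, and the `scale` parameter are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def topology_biased_attention(q, k, v, dist, scale=1.0):
    """Single-head attention with an additive distance bias (hypothetical).

    q, k, v: lists of n vectors, one per topological node.
    dist:    n x n matrix of pairwise node distances (assumed input).
    scale:   assumed weight on the bias; larger values favor nearby nodes.
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        # Standard scaled dot-product logit, minus a distance penalty.
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) - scale * dist[i][j]
            for j, kj in enumerate(k)
        ]
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of value vectors.
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


# Three nodes on a chain: 0 - 1 - 2.
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
q = k = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = topology_biased_attention(q, k, v, dist)
```

With identical queries and keys, the attention weights in this toy example depend only on the distance bias, so each node attends most to itself and least to the farthest node. Setting `scale=0.0` recovers plain uniform attention.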

Code & Models

Repositories

https://apex-bjut.github.io/Taga-VLM
