VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
Shaoan Wang, Yuanfei Luo, Xingyu Chen, Aocheng Luo, Dongyue Li, Chang Liu, Sheng Chen, Yangang Zhang, Junzhi Yu

TL;DR
VLingNav introduces an adaptive reasoning and persistent memory framework for embodied navigation, significantly improving performance and generalization in complex, long-horizon tasks and enabling zero-shot transfer to real robots.
Contribution
The paper presents VLingNav, an embodied navigation model that combines an adaptive chain-of-thought reasoning mechanism with a visual-assisted linguistic memory. It is accompanied by a large reasoning-annotated dataset and a reinforcement learning training stage.
Findings
Achieves state-of-the-art results on multiple navigation benchmarks.
Transfers zero-shot to real-world robotic platforms.
Demonstrates improved reasoning and memory capabilities in navigation tasks.
Abstract
Vision-language-action (VLA) models have shown promising potential in embodied navigation by unifying perception and planning while inheriting the strong generalization abilities of large vision-language models (VLMs). However, most existing VLA models rely on reactive mappings directly from observations to actions, lacking the explicit reasoning capabilities and persistent memory required for complex, long-horizon navigation tasks. To address these challenges, we propose VLingNav, a VLA model for embodied navigation grounded in linguistic-driven cognition. First, inspired by the dual-process theory of human cognition, we introduce an adaptive chain-of-thought mechanism, which dynamically triggers explicit reasoning only when necessary, enabling the agent to switch fluidly between fast, intuitive execution and slow, deliberate planning. Second, to handle long-horizon spatial dependencies, we develop a visual-assisted linguistic memory…
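The fast/slow switching described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the reasoning trigger is computed, so the scalar `uncertainty` input, the `threshold`, the `LinguisticMemory` class, and the action names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinguisticMemory:
    """Hypothetical text memory holding short summaries of past observations."""
    max_entries: int = 8
    entries: List[str] = field(default_factory=list)

    def write(self, summary: str) -> None:
        self.entries.append(summary)
        # Keep only the most recent entries so the memory stays bounded.
        self.entries = self.entries[-self.max_entries:]

    def read(self) -> str:
        return " | ".join(self.entries)


def should_reason(uncertainty: float, threshold: float = 0.5) -> bool:
    """Assumed gate: reason explicitly only when the fast path is unsure."""
    return uncertainty > threshold


def step(observation: str, uncertainty: float, memory: LinguisticMemory) -> dict:
    """One decision step that switches between a fast and a slow mode."""
    thought: Optional[str] = None
    if should_reason(uncertainty):
        # Slow path: build an explicit reasoning trace from memory (placeholder).
        thought = f"Seen so far: {memory.read() or 'nothing'}. Now: {observation}."
        action = "turn_left"      # placeholder action, not from the paper
        mode = "slow"
    else:
        # Fast path: act directly, with no reasoning trace.
        action = "move_forward"   # placeholder action, not from the paper
        mode = "fast"
    memory.write(observation)
    return {"mode": mode, "thought": thought, "action": action}
```

Calling `step("junction", 0.9, memory)` takes the slow path and returns a reasoning trace that cites earlier memory entries, while a low-uncertainty call returns an action with no trace. In a real VLA model both the trigger and the memory contents would presumably be produced by the model itself rather than by a hand-set threshold.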
Taxonomy
Topics: Multimodal Machine Learning Applications · Action Observation and Synchronization · Reinforcement Learning in Robotics
