Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

Meng Wei; Chenyang Wan; Jiaqi Peng; Xiqian Yu; Yuqiang Yang; Delin Feng; Wenzhe Cai; Chenming Zhu; Tai Wang; Jiangmiao Pang; Xihui Liu

arXiv:2512.08186·cs.RO·December 10, 2025

TL;DR

This paper introduces DualVLN, a dual-system foundation model for vision-and-language navigation that combines high-level reasoning with low-level control, improving robustness, real-time performance, and adaptability in complex environments.

Contribution

The paper presents the first dual-system VLN foundation model that integrates a global planner with a local policy, enhancing generalization and real-time navigation in dynamic settings.

Findings

01. Outperforms prior VLN methods on all benchmarks.

02. Demonstrates robust long-horizon planning in real-world tests.

03. Achieves real-time, adaptive navigation in dynamic environments.

Abstract

While recent large vision-language models (VLMs) have improved generalization in vision-language navigation (VLN), existing methods typically rely on end-to-end pipelines that map vision-language inputs directly to short-horizon discrete actions. Such designs often produce fragmented motions, incur high latency, and struggle with real-world challenges like dynamic obstacle avoidance. We propose DualVLN, the first dual-system VLN foundation model that synergistically integrates high-level reasoning with low-level action execution. System 2, a VLM-based global planner, "grounds slowly" by predicting mid-term waypoint goals via image-grounded reasoning. System 1, a lightweight, multi-modal conditioning Diffusion Transformer policy, "moves fast" by leveraging both explicit pixel goals and latent features from System 2 to generate smooth and accurate trajectories. The dual-system design…
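The division of labor described in the abstract can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `slow_planner`, `fast_policy`, `PlannerOutput`, `run`, and the `replan_every` parameter are all hypothetical names, the VLM and the Diffusion Transformer are replaced by deterministic stubs, and the two systems are emulated in a single synchronous loop rather than run asynchronously.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlannerOutput:
    """Hypothetical container for what System 2 hands to System 1."""
    pixel_goal: Tuple[int, int]  # explicit (u, v) goal in the image
    latent: List[float]          # stand-in for System 2 latent features


def slow_planner(step: int) -> PlannerOutput:
    """Stub for System 2. A real system would query a VLM here."""
    return PlannerOutput(pixel_goal=(100 + step, 200), latent=[float(step)])


def fast_policy(goal: PlannerOutput, horizon: int = 4) -> List[Tuple[float, float]]:
    """Stub for System 1. A real system would sample a trajectory from a
    diffusion policy; here we just interpolate linearly toward the goal."""
    u, v = goal.pixel_goal
    return [(u * (k + 1) / horizon, v * (k + 1) / horizon) for k in range(horizon)]


def run(total_steps: int, replan_every: int):
    """Call the slow planner every `replan_every` ticks and the fast policy
    on every tick, always conditioning on the most recent planner output."""
    goal = None
    log = []
    for step in range(total_steps):
        if step % replan_every == 0:
            goal = slow_planner(step)   # infrequent, expensive grounding
        trajectory = fast_policy(goal)  # frequent, cheap trajectory generation
        log.append((step, goal.pixel_goal, trajectory[-1]))
    return log


if __name__ == "__main__":
    for entry in run(total_steps=6, replan_every=3):
        print(entry)
```

The point of the sketch is only the rate mismatch: the goal stays fixed between replans while a fresh trajectory is produced at every tick, which is one plausible reading of "ground slow, move fast".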

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Reinforcement Learning in Robotics