CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation
Liuyi Wang, Zongtao He, Jinlong Li, Ruihao Xia, Mengxian Hu, Chenpeng Yao, Chengju Liu, Yang Tang, Qijun Chen

TL;DR
CLASH introduces a hierarchical framework combining large and small models for improved continuous vision-and-language navigation, achieving state-of-the-art results in simulation and real-world environments.
Contribution
The paper proposes a novel collaborative hierarchical framework integrating large and small models with adaptive decision fusion for VLN tasks.
Findings
Achieves state-of-the-art performance on the VLN-CE leaderboard
Demonstrates robustness in real-world navigation scenarios
Improves success rate and path efficiency over previous methods
Abstract
Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. While recent vision-language large models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models in VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a VLN-CE framework that integrates a reactive small-model planner (RSMP) with a reflective large-model reasoner (RLMR). RSMP adopts a causal-learning-based dual-branch architecture to enhance generalization, while RLMR leverages panoramic visual prompting with chain-of-thought reasoning to support interpretable spatial understanding and navigation. We further introduce an uncertainty-aware collaboration mechanism (UCM) that adaptively fuses decisions from both models. For obstacle avoidance, in simulation, we replace…
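The abstract does not say how the UCM measures uncertainty or how it combines the two decisions. As a minimal sketch of one way such a mechanism could work, assuming the small model outputs a probability distribution over candidate waypoints, the example below keeps the small model's choice when its normalized entropy is low and defers to the large model otherwise. The entropy measure, the 0.6 threshold, and the names `normalized_entropy`, `fuse_decision`, and `ask_large_model` are illustrative assumptions, not the paper's method.

```python
import math
from typing import Callable, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a distribution, scaled to [0, 1] by log(n)."""
    n = len(probs)
    if n < 2:
        return 0.0  # a single candidate carries no uncertainty
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def fuse_decision(
    small_probs: Sequence[float],
    ask_large_model: Callable[[], int],
    threshold: float = 0.6,  # hypothetical value, not taken from the paper
) -> int:
    """Return the index of the chosen candidate waypoint.

    Assumption: the small planner is trusted when it is confident, and the
    (slower) large reasoner is queried only when the planner is uncertain.
    """
    if normalized_entropy(small_probs) <= threshold:
        # Confident planner: take its highest-probability candidate.
        return max(range(len(small_probs)), key=small_probs.__getitem__)
    # Uncertain planner: defer to the large model's choice.
    return ask_large_model()


if __name__ == "__main__":
    # Stand-in for a large-model query that always picks candidate 2.
    large_model_stub = lambda: 2
    print(fuse_decision([0.90, 0.05, 0.05], large_model_stub))
    print(fuse_decision([0.34, 0.33, 0.33], large_model_stub))
```

Passing the large model as a callable means it is only invoked when the small model is uncertain, which matches the intuition that the expensive reasoner should be consulted selectively. A real system might blend the two outputs instead of switching between them.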
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Robotic Path Planning Algorithms
