CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

Liuyi Wang; Zongtao He; Jinlong Li; Ruihao Xia; Mengxian Hu; Chenpeng Yao; Chengju Liu; Yang Tang; Qijun Chen

arXiv:2512.10360 · cs.RO · January 26, 2026

TL;DR

CLASH introduces a hierarchical framework combining large and small models for improved continuous vision-and-language navigation, achieving state-of-the-art results in simulation and real-world environments.

Contribution

The paper proposes a novel collaborative hierarchical framework integrating large and small models with adaptive decision fusion for VLN tasks.

Findings

01 Achieves state-of-the-art performance on the VLN-CE leaderboard

02 Demonstrates robustness in real-world navigation scenarios

03 Improves success rate and path efficiency over previous methods

Abstract

Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. While recent vision-language large models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models in VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a VLN-CE framework that integrates a reactive small-model planner (RSMP) with a reflective large-model reasoner (RLMR). RSMP adopts a causal-learning-based dual-branch architecture to enhance generalization, while RLMR leverages panoramic visual prompting with chain-of-thought reasoning to support interpretable spatial understanding and navigation. We further introduce an uncertainty-aware collaboration mechanism (UCM) that adaptively fuses decisions from both models. For obstacle avoidance, in simulation, we replace…
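The abstract does not spell out how the UCM measures uncertainty or fuses the two decisions, so the following is only a minimal sketch of one plausible gating scheme, not the authors' method. It assumes the small planner outputs a probability distribution over candidate waypoints, uses normalized entropy as the uncertainty signal, and defers to the large reasoner only when that entropy exceeds a threshold. The names `normalized_entropy`, `choose_waypoint`, `ask_large_model`, and the `threshold` value are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of `probs`, scaled to [0, 1] by log(number of options)."""
    n = len(probs)
    if n <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


def choose_waypoint(
    small_probs: Sequence[float],
    ask_large_model: Callable[[], int],
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> Tuple[int, str]:
    """Pick a waypoint index and report which model made the decision.

    Assumption: a confident small planner (low entropy) is trusted directly;
    otherwise the slower large reasoner is queried.
    """
    if normalized_entropy(small_probs) <= threshold:
        best = max(range(len(small_probs)), key=lambda i: small_probs[i])
        return best, "small"
    return ask_large_model(), "large"


if __name__ == "__main__":
    # Confident planner: the large model is never called.
    print(choose_waypoint([0.9, 0.05, 0.05], ask_large_model=lambda: 2))
    # Undecided planner: the decision is deferred to the large model stub.
    print(choose_waypoint([1 / 3, 1 / 3, 1 / 3], ask_large_model=lambda: 2))
```

Passing the large model as a callable means it is only invoked when needed, which mirrors the motivation for a large-small hierarchy: the expensive reasoner runs only on steps where the cheap planner is unsure.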

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Robotic Path Planning Algorithms