EvolveNav: Empowering LLM-Based Vision-Language Navigation via Self-Improving Embodied Reasoning

Bingqian Lin; Yunshuang Nie; Khun Loun Zai; Ziming Wei; Mingfei Han; Rongtao Xu; Minzhe Niu; Jianhua Han; Hanwang Zhang; Liang Lin; Bokui Chen; Cewu Lu; Xiaodan Liang

arXiv:2506.01551 · cs.CV · October 15, 2025

Open Access

TL;DR

EvolveNav introduces a self-improving embodied reasoning framework that enhances LLM-based vision-language navigation by combining formalized CoT fine-tuning with iterative self-refinement, leading to improved accuracy and interpretability.

Contribution

The paper proposes a novel two-stage training paradigm for LLM-based VLN that integrates formalized CoT supervision and self-reflective post-training for better reasoning and generalization.

Findings

01. EvolveNav outperforms previous methods on R2R, REVERIE, CVDN, and SOON benchmarks.

02. The approach improves reasoning speed and decision accuracy.

03. Self-reflective training enhances reasoning diversity and robustness.

Abstract

Recent studies have shown that training open-source Large Language Models (LLMs) can unlock their reasoning ability for vision-language navigation (VLN) while also mitigating the domain gap between the LLMs' training corpus and the VLN task. However, these approaches predominantly adopt direct input-output mapping paradigms, which makes the mapping difficult to learn and the navigational decisions unexplainable. Chain-of-Thought (CoT) training is a promising way to improve both navigational decision accuracy and interpretability, but the complexity of the navigation task makes perfect CoT labels unavailable, and pure CoT supervised fine-tuning may lead to overfitting. To address these issues, we propose EvolveNav, a novel sElf-improving embodied reasoning paradigm that realizes adaptable and generalizable navigational reasoning…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Speech and dialogue systems