ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments
Shuhao Ye, Sitong Mao, Yuxiang Cui, Xuan Yu, Shichao Zhai, Wen Chen, Shunbo Zhou, Rong Xiong, Yue Wang

TL;DR
ETP-R1 introduces a novel framework that combines large-scale data pretraining and reinforcement fine-tuning to significantly improve graph-based vision-language navigation in continuous environments, achieving state-of-the-art results.
Contribution
The paper presents a new three-stage training paradigm with reinforcement fine-tuning for graph-based VLN-CE models, leveraging large-scale pretraining data and a unified dataset from R2R and RxR tasks.
Findings
Achieved new state-of-the-art performance on R2R-CE and RxR-CE benchmarks.
Demonstrated the effectiveness of reinforcement fine-tuning with the GRPO algorithm.
Built a large-scale, high-quality pretraining dataset for topological navigation.
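As a rough illustration of the GRPO idea named in the findings: GRPO scores each sampled rollout relative to the other rollouts drawn for the same prompt, rather than against a learned value baseline. The sketch below shows only that group-relative normalization step. The function name, the reward values, and the choice of population standard deviation are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation (population std is an assumption here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of one navigation episode
# (for example, 1.0 for reaching the goal and 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group average receive a positive advantage and the rest a negative one; when all rollouts in a group earn the same reward, every advantage is zero and that group contributes no learning signal.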
Abstract
Vision-Language Navigation in Continuous Environments (VLN-CE) requires an embodied agent to navigate towards a target in continuous environments, following natural language instructions. While current graph-based methods offer an efficient, structured approach by abstracting the environment into a topological map and simplifying the action space to waypoint selection, they lag behind methods based on Large Vision-Language Models (LVLMs) in leveraging large-scale data and advanced training paradigms. In this paper, we try to bridge this gap by introducing ETP-R1, a framework that applies the paradigm of scaling up data and Reinforcement Fine-Tuning (RFT) to a graph-based VLN-CE model. To build a strong foundation, we first construct a high-quality, large-scale pretraining dataset using the Gemini API. This dataset consists of diverse, low-hallucination instructions for topological…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization
