ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments

Shuhao Ye; Sitong Mao; Yuxiang Cui; Xuan Yu; Shichao Zhai; Wen Chen; Shunbo Zhou; Rong Xiong; Yue Wang

arXiv:2512.20940·cs.RO·December 25, 2025

TL;DR

ETP-R1 combines large-scale data pretraining with reinforcement fine-tuning to improve graph-based vision-language navigation in continuous environments, reporting state-of-the-art results on the R2R-CE and RxR-CE benchmarks.

Contribution

The paper presents a new three-stage training paradigm with reinforcement fine-tuning for graph-based VLN-CE models, leveraging large-scale pretraining data and a unified dataset from R2R and RxR tasks.

Findings

01. Achieved new state-of-the-art performance on the R2R-CE and RxR-CE benchmarks.

02. Demonstrated the effectiveness of reinforcement fine-tuning with the GRPO algorithm.

03. Built a large-scale, high-quality pretraining dataset for topological navigation.
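
The GRPO algorithm named in the findings scores each sampled rollout relative to the other rollouts in its group rather than against a learned value function. The following is a minimal sketch of that group-relative advantage step only, assuming one scalar reward per rollout; the function name, the example rewards, and the `eps` constant are illustrative assumptions and are not taken from the paper, whose reward design and hyperparameters are not described on this page.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts.

    Hypothetical helper: each reward is assumed to come from a separate
    rollout sampled for the same instruction and starting state.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    # Population variance over the group.
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative rewards for four rollouts (e.g. 1.0 = reached the goal).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat the group average receive positive advantages and the rest receive negative ones; when all rollouts in a group earn the same reward, every advantage is zero and that group contributes no policy gradient signal.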

Abstract

Vision-Language Navigation in Continuous Environments (VLN-CE) requires an embodied agent to navigate toward a target in a continuous environment by following natural language instructions. While current graph-based methods offer an efficient, structured approach by abstracting the environment into a topological map and simplifying the action space to waypoint selection, they lag behind methods based on Large Vision-Language Models (LVLMs) in leveraging large-scale data and advanced training paradigms. In this paper, we aim to bridge this gap by introducing ETP-R1, a framework that applies the paradigm of scaling up data and Reinforcement Fine-Tuning (RFT) to a graph-based VLN-CE model. To build a strong foundation, we first construct a high-quality, large-scale pretraining dataset using the Gemini API. This dataset consists of diverse, low-hallucination instructions for topological…

Code & Models

Datasets

cepillarskeira/ETP-R1-extra-files
dataset · 60 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization