Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Jiangyang Li; Cong Wan; SongLin Dong; Chenhao Ding; Qiang Wang; Zhiheng Ma; Yihong Gong

arXiv:2603.15370 · cs.CV · March 17, 2026

TL;DR

This paper introduces NavGRPO, a reinforcement learning framework for vision-and-language navigation that improves robustness and generalization by exploring diverse trajectories and optimizing the policy through within-group performance comparisons. Built on ScaleVLN, it improves SPL on the R2R and REVERIE benchmarks in unseen environments.

Contribution

NavGRPO is a novel RL approach that improves robustness in VLN by exploring diverse trajectories and applying Group Relative Policy Optimization, which requires no additional value network.

Findings

01  Achieves +3.0% and +1.71% SPL improvements on the R2R and REVERIE benchmarks in unseen environments.

02  Demonstrates a +14.89% SPL gain over the baseline under extreme early-stage perturbations.

03  Builds more robust navigation policies through goal-directed RL training.
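
For readers unfamiliar with the metric behind these numbers: SPL (Success weighted by Path Length) is commonly defined as the mean over episodes of success times the ratio of the shortest-path length to the longer of the agent's path and the shortest path. The sketch below illustrates that standard definition only. The function name, the tuple layout, and the example episodes are assumptions made for illustration and are not taken from the paper.

```python
def spl(episodes):
    """Success weighted by Path Length over a list of episodes.

    Each episode is a hypothetical tuple:
    (success: bool, shortest_path_length: float, agent_path_length: float).
    Assumes shortest_path_length > 0 for every episode.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A successful episode scores 1.0 only if the agent's path
            # is no longer than the shortest path; detours are penalized.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Made-up episodes: an optimal success, a success with a 2x detour, a failure.
episodes = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)]
score = spl(episodes)
```

Because failed episodes contribute zero and detours shrink the ratio, SPL rewards agents that both reach the goal and do so efficiently.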

Abstract

Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and poor robustness to execution perturbations. We present NavGRPO, a reinforcement learning framework that learns goal-directed navigation policies through Group Relative Policy Optimization. By exploring diverse trajectories and optimizing via within-group performance comparisons, our method enables agents to distinguish effective strategies beyond expert paths without requiring additional value networks. Built on ScaleVLN, NavGRPO achieves superior robustness on R2R and REVERIE benchmarks with +3.0% and +1.71% SPL improvements in unseen environments. Under extreme early-stage perturbations, we demonstrate +14.89% SPL gain over the baseline, confirming…
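
The "within-group performance comparisons" mentioned in the abstract can be illustrated with the group-relative advantage commonly used in Group Relative Policy Optimization: several trajectories are sampled for the same instruction, and each trajectory's reward is normalized by the group's mean and standard deviation, so no learned value network is needed as a baseline. The following is a minimal sketch of that normalization step only. The reward values, the function name, and the `eps` constant are assumptions; the paper's actual reward design and training loop are not described in this summary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's reward against its own sampling group.

    rewards: scalar rewards for trajectories sampled for one instruction
    (hypothetical values; not the paper's reward function).
    eps: small constant (an assumption) that avoids division by zero
    when every trajectory in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four trajectories sampled for the same instruction.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Trajectories that beat the group average receive positive advantages and are reinforced, while below-average ones are discouraged. When all trajectories in a group score the same, every advantage is zero and that group contributes no learning signal.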

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Advanced Neural Network Applications