Beyond Imitation: Reinforcement Learning Fine-Tuning for Adaptive Diffusion Navigation Policies
Junhe Sheng, Ruofei Bai, Kuan Xu, Ruimeng Liu, Jie Chen, Shenghai Yuan, Wei-Yun Yau, Lihua Xie

TL;DR
This paper introduces a reinforcement learning fine-tuning framework for diffusion-based robot navigation policies, improving their adaptability and safety in unseen environments without extensive retraining.
Contribution
It proposes a novel RL fine-tuning method using Group Relative Policy Optimization that enhances diffusion navigation policies while preserving pretrained features.
Findings
Success Rate improved from 52.0% to 58.7%.
SPL increased from 0.49 to 0.54.
Reduced collision frequency in unseen environments.
Abstract
Diffusion-based robot navigation policies trained on large-scale imitation learning datasets, can generate multi-modal trajectories directly from the robot's visual observations, bypassing the traditional localization-mapping-planning pipeline and achieving strong zero-shot generalization. However, their performance remains constrained by the coverage of offline datasets, and when deployed in unseen settings, distribution shift often leads to accumulated trajectory errors and safety-critical failures. Adapting diffusion policies with reinforcement learning is challenging because their iterative denoising structure hinders effective gradient backpropagation, while also making the training of an additional value network computationally expensive and less stable. To address these issues, we propose a reinforcement learning fine-tuning framework tailored for diffusion-based navigation. The…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
