PlannerRFT: Reinforcing Diffusion Planners through Closed-Loop and Sample-Efficient Fine-Tuning

Hongchen Li; Tianyu Li; Jiazhi Yang; Haochen Tian; Caojun Wang; Lei Shi; Mingyang Shang; Zengrong Lin; Gaoqiang Wu; Zhihui Hao; Xianpeng Lang; Jia Hu; Hongyang Li

arXiv:2601.12901 · cs.RO · January 21, 2026

TL;DR

PlannerRFT introduces a sample-efficient reinforcement fine-tuning framework for diffusion-based autonomous driving planners that improves robustness and adaptability without changing the inference pipeline. Fine-tuning at scale is supported by nuMax, a fast simulator.

Contribution

It proposes PlannerRFT, which uses dual-branch optimization, and introduces nuMax, a simulator 10 times faster than nuPlan, together enabling scalable, adaptive, and robust fine-tuning of diffusion planners.

Findings

01. Achieves state-of-the-art performance in autonomous driving planning.

02. Develops nuMax, a simulator 10 times faster than nuPlan.

03. Demonstrates the emergence of distinct behaviors during learning.

Abstract

Diffusion-based planners have emerged as a promising approach for human-like trajectory generation in autonomous driving. Recent works incorporate reinforcement fine-tuning to enhance the robustness of diffusion planners through reward-oriented optimization in a generation-evaluation loop. However, they struggle to generate multi-modal, scenario-adaptive trajectories, hindering the exploitation efficiency of informative rewards during fine-tuning. To resolve this, we propose PlannerRFT, a sample-efficient reinforcement fine-tuning framework for diffusion-based planners. PlannerRFT adopts a dual-branch optimization that simultaneously refines the trajectory distribution and adaptively guides the denoising process toward more promising exploration, without altering the original inference pipeline. To support parallel learning at scale, we develop nuMax, an optimized simulator that…
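The abstract describes reward-oriented optimization in a generation-evaluation loop: sample several candidate trajectories, score each with a reward, and push the planner toward the better-scoring ones. The sketch below illustrates only that generic loop. It is not PlannerRFT's dual-branch method, and every name in it (`rollout`, `reward`, `group_advantages`, `fine_tune`, `HORIZON`, `TARGET_OFFSET`) is hypothetical. As an assumption for illustration, the "planner" is a one-parameter Gaussian over a final lateral offset rather than a diffusion model. Rewards are normalized within each sampled group (a GRPO-style choice swapped in here), and the parameter is updated with a plain REINFORCE gradient.

```python
import random
from statistics import mean, pstdev

HORIZON = 8          # hypothetical number of planning steps
TARGET_OFFSET = 0.5  # hypothetical lateral offset preferred by the toy reward


def rollout(offset: float, horizon: int = HORIZON) -> list[tuple[float, float]]:
    """Toy stand-in for a generated trajectory: lateral position ramps linearly to `offset`."""
    return [(float(t), offset * t / horizon) for t in range(1, horizon + 1)]


def reward(traj: list[tuple[float, float]], horizon: int = HORIZON) -> float:
    """Toy evaluator: negative mean squared deviation from a reference path."""
    return -mean((y - TARGET_OFFSET * x / horizon) ** 2 for x, y in traj)


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of candidates (zero mean, unit scale)."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def fine_tune(steps: int = 200, group_size: int = 32, lr: float = 0.02,
              sigma: float = 0.3, seed: int = 0) -> float:
    """Generation-evaluation loop on a one-parameter Gaussian 'policy'."""
    rng = random.Random(seed)
    mu = 0.0  # hypothetical policy parameter: mean final lateral offset
    for _ in range(steps):
        offsets = [rng.gauss(mu, sigma) for _ in range(group_size)]  # generate
        rewards = [reward(rollout(a)) for a in offsets]              # evaluate
        adv = group_advantages(rewards)
        # REINFORCE: d/dmu log N(a; mu, sigma^2) = (a - mu) / sigma^2
        grad = mean(A * (a - mu) / sigma**2 for A, a in zip(adv, offsets))
        mu += lr * grad
    return mu


if __name__ == "__main__":
    print(f"learned offset: {fine_tune():.3f}")
```

Running the script should move the learned offset from 0 toward the rewarded value of 0.5, up to sampling noise. Group normalization makes the step size insensitive to the reward's absolute scale. That is one common way to use every sampled candidate's reward, though not necessarily the mechanism PlannerRFT uses.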

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms · Reinforcement Learning in Robotics