ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

Huimin Wang; Yue Wang; Bihao Cui; Pengxiang Li; Ben Lu; Mingqian Wang; Tong Wang; Chuan Tang; Teng Zhang; Kun Zhan

arXiv:2605.04647 · cs.RO · May 13, 2026

TL;DR

ReflectDrive-2 is a masked discrete diffusion planner for autonomous driving that can edit its own trajectory tokens in place. It is trained first to recover expert trajectories from perturbed ones and then fine-tuned with reinforcement learning, reaching 91.0 PDMS on NAVSIM with camera-only input at 31.8 ms latency on NVIDIA Thor.

Contribution

The paper presents a discrete diffusion planner that revises its own trajectories in place (AutoEdit) using the same model, without an auxiliary refinement network, and fine-tunes the full decision-draft-reflect rollout with reinforcement learning.

Findings

1. Achieves 91.0 PDMS with camera-only input on NAVSIM.
2. Improves PDMS by 1.9 points with RL-based fine-tuning.
3. Runs at 31.8 ms latency on NVIDIA Thor.

Abstract

We introduce ReflectDrive-2, a masked discrete diffusion planner with a separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision-draft-reflect rollout with reinforcement learning (RL), assigning a terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL…
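
The first training stage described in the abstract (perturb an expert trajectory, then supervise recovery of the original) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names, the uniform 256-bin tokenizer over a [-50, 50] m range, modelling a longitudinal perturbation as scaling waypoints about the ego origin, and modelling a lateral heading perturbation as a rotation about the ego origin.

```python
import math

def perturb_longitudinal(traj, scale):
    """Hypothetical longitudinal perturbation: scale every waypoint about the
    ego origin, so the plan travels farther (scale > 1) or less far (scale < 1)
    while each waypoint keeps its bearing."""
    return [(x * scale, y * scale) for x, y in traj]

def perturb_lateral(traj, dtheta):
    """Hypothetical lateral perturbation: rotate the plan about the ego origin
    by dtheta radians, i.e. apply a heading offset."""
    c, s = math.cos(dtheta), math.sin(dtheta)
    return [(c * x - s * y, s * x + c * y) for x, y in traj]

def tokenize(traj, lo=-50.0, hi=50.0, bins=256):
    """Assumed tokenizer: clamp each coordinate to [lo, hi] and quantize it
    uniformly into an integer token in [0, bins - 1]. Output is x0, y0, x1, y1, ..."""
    def quantize(v):
        v = min(max(v, lo), hi)
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    return [tok for x, y in traj for tok in (quantize(x), quantize(y))]

def make_edit_example(expert, scale=1.0, dtheta=0.0):
    """Build one recovery example: perturbed tokens as input, expert tokens as
    target, and a mask marking the positions where the two differ."""
    noisy = perturb_lateral(perturb_longitudinal(expert, scale), dtheta)
    src, tgt = tokenize(noisy), tokenize(expert)
    edit_mask = [a != b for a, b in zip(src, tgt)]
    return src, tgt, edit_mask

if __name__ == "__main__":
    # A straight 8-waypoint expert plan, 2 m between waypoints (made-up data).
    expert = [(2.0 * i, 0.0) for i in range(1, 9)]
    src, tgt, edit_mask = make_edit_example(expert, scale=1.2, dtheta=0.05)
    print("input tokens :", src)
    print("target tokens:", tgt)
    print("tokens to edit:", sum(edit_mask), "of", len(edit_mask))
```

In this sketch the edit mask only marks where the perturbed and expert tokens disagree. How ReflectDrive-2 actually selects the tokens that AutoEdit rewrites is not specified in the text above.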
