ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction

Chaojun Ni; Guosheng Zhao; Xiaofeng Wang; Zheng Zhu; Wenkang Qin; Xinze Chen; Guanghong Jia; Guan Huang; Wenjun Mei

arXiv:2508.08170·cs.CV·August 22, 2025

TL;DR

ReconDreamer-RL integrates diffusion-based scene reconstruction with reinforcement learning to create realistic, diverse driving scenarios, cutting the collision ratio of autonomous driving models by 5x relative to imitation learning.

Contribution

The paper introduces ReconDreamer-RL, combining diffusion priors and physical models for improved scene reconstruction and corner-case scenario generation in autonomous driving training.

Findings

1. 5x reduction in collision ratio compared to imitation learning
2. Enhanced simulation-to-reality transfer in reinforcement learning
3. Generation of diverse corner-case traffic scenarios

Abstract

Reinforcement learning for training end-to-end autonomous driving models in closed-loop simulations is gaining growing attention. However, most simulation environments differ significantly from real-world conditions, creating a substantial simulation-to-reality (sim2real) gap. To bridge this gap, some approaches utilize scene reconstruction techniques to create photorealistic environments as a simulator. While this improves realistic sensor simulation, these methods are inherently constrained by the distribution of the training data, making it difficult to render high-quality sensor data for novel trajectories or corner case scenarios. Therefore, we propose ReconDreamer-RL, a framework designed to integrate video diffusion priors into scene reconstruction to aid reinforcement learning, thereby enhancing end-to-end autonomous driving training. Specifically, in ReconDreamer-RL, we…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 5

Strengths

- **Significance and Scope.** The paper addresses an important and timely problem in autonomous driving (closed-loop, end-to-end learning) by coupling photorealistic simulation with reinforcement learning in a way that directly targets deployment-relevant failures.
- **Comprehensive Qualitative Evaluation.** The study presents rich qualitative evidence across multiple benchmarks and driving scenarios, helping to substantiate generality and offering interpretable insights through scenario visualiza

Weaknesses

Despite the strengths mentioned above, I have some significant concerns about the technical contribution claimed by the paper, listed here and in the questions section below:
- **Unclear presentation of the ReconSimulator.** The core technical contribution of this work is the ReconSimulator, but the details of this simulator are unclear. For instance, how is the diffusion prior incorporated? From Section 3.3, it seems that the ReconSimulator uses a training checkpoint rendered by 3DGS a

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper introduces a novel method of integrating video diffusion priors directly into the reinforcement learning loop, creating a dynamic simulator that effectively addresses the sim2real gap.
2. It proposes a well-structured solution with dedicated modules (DAA and CTG) to systematically generate adversarial corner cases and mitigate common training data biases.
3. The framework demonstrates a significant and quantifiable performance improvement, achieving a 5x reduction in collision rat

Weaknesses

1. Lack of originality. Most techniques proposed in the paper have already been well studied in other works. Using video models to boost novel-trajectory reconstruction performance is studied in many works such as DriveDreamer4D[1]; decoupled static and dynamic scene representation is proposed in MTGS[2] and OmniRe[3]; adversary agent interaction is studied in a line of research on safety-critical interaction such as DiffScene[4].
2. Missing details in the dynamic adversary agent section. H

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper shows strong empirical gains on collision metrics across standard and corner-case evaluations, with a large gap over both IL baselines and RAD.
2. RL-friendly simulator: 3DGS-based with diffusion priors while retaining real-time speed (Table 2). This addresses a common bottleneck in closed-loop training.

Weaknesses

1. The policy is evaluated in the same simulator family that was fine-tuned using diffusion priors, and many corner cases are generated by the paper’s own procedures (DAA, CTG). The real sim-to-real implications remain untested; no closed-loop evaluation outside the authors’ reconstructions (e.g., Bench2Drive/nuPlan/CARLA-v2 evaluations or any real-vehicle trials).
2. Metrics are limited for driving quality. Collisions are critical, but comfort (jerk/accel), traffic-rule compliance, and route co

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Traffic control and management