AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models

Tianyi Yan; Tao Tang; Xingtai Gui; Yongkang Li; Jiasen Zhesng; Weiyao Huang; Lingdong Kong; Wencheng Han; Xia Zhou; Xueyang Zhang; Yifei Zhan; Kun Zhan; Cheng-zhong Xu; Jianbing Shen

arXiv:2511.20325 · cs.CV · March 12, 2026

Open Access

TL;DR

This paper introduces AD-R1, a reinforcement learning framework for autonomous driving that uses an impartial world model trained to accurately predict dangerous scenarios, significantly improving safety and failure prediction.

Contribution

It presents a novel Impartial World Model trained with Counterfactual Synthesis to faithfully forecast dangers, integrated into a closed-loop RL system for safer autonomous driving.

Findings

01. Outperforms baselines in failure prediction on a new benchmark

02. Reduces safety violations in challenging simulations

03. Enables the agent to 'dream' of dangerous outcomes for better safety

Abstract

End-to-end models for autonomous driving hold the promise of learning complex behaviors directly from sensor data, but face critical challenges in safety and handling long-tail events. Reinforcement Learning (RL) offers a promising path to overcome these limitations, yet its success in autonomous driving has been elusive. We identify a fundamental flaw hindering this progress: a deep-seated optimistic bias in the world models used for RL. To address this, we introduce a framework for post-training policy refinement built around an Impartial World Model. Our primary contribution is to teach this model to be honest about danger. We achieve this with a novel data synthesis pipeline, Counterfactual Synthesis, which systematically generates a rich curriculum of plausible collisions and off-road events. This transforms the model from a passive scene completer into a veridical forecaster that…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics