PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning

Yinfeng Gao; Qichao Zhang; Deqing Liu; Zhongpu Xia; Guang Li; Kun Ma; Guang Chen; Hangjun Ye; Long Chen; Da-Wei Ding; Dongbin Zhao

arXiv:2603.14908 · cs.RO · March 17, 2026

TL;DR

PerlAD introduces a novel pseudo-simulation-based reinforcement learning approach for end-to-end autonomous driving, enabling efficient, rendering-free training and improved performance in closed-loop scenarios using offline data.

Contribution

The paper proposes PerlAD, a pseudo-simulation framework with a world model and hierarchical planner, to enhance closed-loop autonomous driving training without online interactions.

Findings

01. Achieves a 10.29% higher Driving Score on the Bench2Drive benchmark.

02. Outperforms previous E2E RL methods in efficiency and safety.

03. Demonstrates robustness in occlusion scenarios on the DOS benchmark.

Abstract

End-to-end autonomous driving policies based on Imitation Learning (IL) often struggle in closed-loop execution due to the misalignment between inadequate open-loop training objectives and real driving requirements. While Reinforcement Learning (RL) offers a solution by directly optimizing driving goals via reward signals, the rendering-based training environments introduce the rendering gap and are inefficient due to high computational costs. To overcome these challenges, we present a novel Pseudo-simulation-based RL method for closed-loop end-to-end autonomous driving, PerlAD. Based on offline datasets, PerlAD constructs a pseudo-simulation that operates in vector space, enabling efficient, rendering-free trial-and-error training. To bridge the gap between static datasets and dynamic closed-loop environments, PerlAD introduces a prediction world model that generates reactive agent…
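
As a rough illustration of what "rendering-free" training in vector space can mean, the sketch below rolls an ego vehicle forward as plain numbers (position, heading, speed) and scores it against agent positions, with no images involved. It is a toy under stated assumptions, not PerlAD's implementation: the names `EgoState`, `step_ego`, and `rollout`, the unicycle motion model, the progress reward, and the collision penalty are all hypothetical. It also replays fixed agent tracks, whereas the abstract describes a prediction world model that generates reactive agents.

```python
import math
from dataclasses import dataclass


@dataclass
class EgoState:
    x: float        # metres
    y: float        # metres
    heading: float  # radians
    speed: float    # metres per second


def step_ego(state, accel, yaw_rate, dt=0.1):
    """Advance the ego one step with a simple unicycle model (illustrative choice)."""
    speed = max(0.0, state.speed + accel * dt)
    heading = state.heading + yaw_rate * dt
    x = state.x + speed * math.cos(heading) * dt
    y = state.y + speed * math.sin(heading) * dt
    return EgoState(x, y, heading, speed)


def rollout(ego, actions, agent_tracks, dt=0.1, collision_radius=2.0):
    """Roll out (accel, yaw_rate) actions in vector space.

    agent_tracks is a list of per-agent lists of (x, y) positions, one per step.
    Returns (total_reward, collided). Reward and penalty are toy assumptions.
    """
    total = 0.0
    for t, (accel, yaw_rate) in enumerate(actions):
        prev_x = ego.x
        ego = step_ego(ego, accel, yaw_rate, dt)
        total += ego.x - prev_x  # toy reward: progress along +x
        for track in agent_tracks:
            ax, ay = track[min(t, len(track) - 1)]
            if math.hypot(ego.x - ax, ego.y - ay) < collision_radius:
                return total - 10.0, True  # toy collision penalty, episode ends
    return total, False
```

Because each step is a handful of arithmetic operations on vectors rather than a rendered frame, many such rollouts can be evaluated cheaply, which is the efficiency argument the abstract makes for operating in vector space.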

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms