Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

Zhexi Lian; Haoran Wang; Xuerun Yan; Weimeng Lin; Xianhong Zhang; Yongyu Chen; Jia Hu

arXiv:2603.13842·cs.RO·April 13, 2026

TL;DR

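PaIR-Drive trains imitation learning (IL) and reinforcement learning (RL) in two parallel branches instead of fine-tuning RL sequentially on top of a pretrained IL policy, aiming to avoid the policy drift and performance ceiling associated with sequential fine-tuning in end-to-end autonomous driving.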

Contribution

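The paper proposes a parallel training framework that separates IL and RL into two branches with conflict-free objectives. The branches are optimized collaboratively, the RL branch does not need to be retrained when a new IL policy is applied, and exploration in autonomous driving is improved.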

Findings

01. Achieves 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks.

02. Outperforms existing RL fine-tuning methods.

03. Can correct suboptimal human behaviors.

Abstract

End-to-end autonomous driving is typically built upon imitation learning (IL), yet its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) through sequential fine-tuning. However, such a paradigm remains suboptimal: sequential RL fine-tuning can introduce policy drift and often leads to a performance ceiling due to its dependence on the pretrained IL policy. To address these issues, we propose PaIR-Drive, a general Parallel framework for collaborative Imitation and Reinforcement learning in end-to-end autonomous driving. During training, PaIR-Drive separates IL and RL into two parallel branches with conflict-free training objectives, enabling fully collaborative optimization. This design eliminates the need to retrain RL when applying a new IL policy. During inference, RL leverages the IL…
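
To make the idea of two parallel branches with non-conflicting objectives more concrete, here is a toy sketch in Python. It is not the authors' method: the abstract does not specify the architecture or the losses, so everything here is an assumption for illustration. The names `il_step`, `rl_step`, `reward`, and `train` are hypothetical, the "policy" is a single scalar gain, the demonstrator's gain of 0.8 is made up, and a finite-difference ascent on a black-box reward stands in for a real RL algorithm. The only point it shows is one possible reading of "conflict-free": each branch owns its parameters and its loss, so an update to one never moves the other.

```python
def reward(state, action):
    # Hypothetical black-box reward: highest when action == 1.0 * state.
    return -(action - 1.0 * state) ** 2


def il_step(w_il, state, expert_action, lr=0.1):
    # Behavior-cloning update on the IL branch only:
    # gradient descent on (w_il * state - expert_action)^2.
    grad = 2.0 * (w_il * state - expert_action) * state
    return w_il - lr * grad


def rl_step(w_rl, state, reward_fn, lr=0.1, eps=1e-4):
    # Reward-driven update on the RL branch only. A central finite
    # difference on the reward stands in for an RL algorithm (assumption).
    r_plus = reward_fn(state, (w_rl + eps) * state)
    r_minus = reward_fn(state, (w_rl - eps) * state)
    grad = (r_plus - r_minus) / (2.0 * eps)
    return w_rl + lr * grad  # gradient ascent on reward


def train(steps=200):
    # Two branches, two disjoint parameters, two separate objectives.
    w_il, w_rl = 0.0, 0.0
    states = [0.5, 1.0, 1.5]
    for t in range(steps):
        s = states[t % len(states)]
        # The demonstrator uses a suboptimal gain of 0.8 (made-up value).
        w_il = il_step(w_il, s, expert_action=0.8 * s)
        w_rl = rl_step(w_rl, s, reward)
    return w_il, w_rl


if __name__ == "__main__":
    w_il, w_rl = train()
    print(f"IL gain: {w_il:.4f}  RL gain: {w_rl:.4f}")
```

In this toy setup the IL gain settles near the demonstrator's 0.8 while the RL gain settles near the reward optimum of 1.0, which loosely mirrors the claim that an RL branch can improve on suboptimal demonstrations. How PaIR-Drive actually couples the two branches at inference is not recoverable from the truncated abstract, so it is left out here.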
