From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving
Antonio Guillen-Perez

TL;DR
This paper compares imitation learning and offline reinforcement learning for autonomous driving and shows that, on large-scale, real-world datasets, offline RL substantially improves policy robustness and safety over behavioral cloning.
Contribution
It introduces a comprehensive pipeline for offline learning in autonomous driving and shows that offline RL with Conservative Q-Learning (CQL) outperforms behavioral cloning (BC) in robustness and safety on real-world data.
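The core difference between the two training objectives can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes a discrete action set (for example, a handful of steering/acceleration bins) and Q-values already produced by some network, and the function names and the `gamma`/`alpha` defaults are illustrative assumptions. `bc_loss` only fits the logged actions, while `cql_loss` follows the discrete CQL(H) form: a standard one-step TD loss plus a conservative penalty that pushes down a soft maximum over all actions and pushes up the action actually present in the data.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))): a "soft maximum" over actions.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def bc_loss(logits: List[List[float]], actions: List[int]) -> float:
    """Behavioral cloning: mean negative log-likelihood of the logged actions.

    logits:  per-state action scores, shape (B, A)
    actions: B integer indices of the actions the human driver took
    """
    # -log softmax(row)[a] == logsumexp(row) - row[a]
    nll = [logsumexp(row) - row[a] for row, a in zip(logits, actions)]
    return sum(nll) / len(nll)


def cql_loss(q: List[List[float]], q_next_target: List[List[float]],
             actions: List[int], rewards: List[float], dones: List[float],
             gamma: float = 0.99, alpha: float = 1.0) -> float:
    """Discrete-action CQL(H) objective for one batch of logged transitions.

    q:             Q-values at the current states, shape (B, A)
    q_next_target: target-network Q-values at the next states, shape (B, A)
    actions, rewards, dones: logged per-transition data, length B
    alpha:         weight of the conservative penalty (hypothetical default)
    """
    td, penalty = 0.0, 0.0
    for q_s, q_next, a, r, d in zip(q, q_next_target, actions, rewards, dones):
        # One-step Bellman target from the frozen target network.
        target = r + gamma * (1.0 - d) * max(q_next)
        td += 0.5 * (q_s[a] - target) ** 2
        # Conservative term: soft-max over all actions minus the logged action.
        penalty += logsumexp(q_s) - q_s[a]
    batch = len(actions)
    return td / batch + alpha * penalty / batch
```

Inflating the Q-value of an action that never appears in the batch raises the penalty term, so minimizing this loss discourages the learned values from favoring actions the dataset does not support, which is the usual failure mode of naive off-policy Q-learning on a fixed dataset.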
Findings
CQL achieves a 3.2x higher success rate than BC
CQL achieves a 7.4x lower collision rate than BC
Offline RL improves robustness in long-horizon driving scenarios
Abstract
Learning robust driving policies from large-scale, real-world datasets is a central challenge in autonomous driving, as online data collection is often unsafe and impractical. While Behavioral Cloning (BC) offers a straightforward approach to imitation learning, policies trained with BC are notoriously brittle and suffer from compounding errors in closed-loop execution. This work presents a comprehensive pipeline and a comparative study to address this limitation. We first develop a series of increasingly sophisticated BC baselines, culminating in a Transformer-based model that operates on a structured, entity-centric state representation. While this model achieves low imitation loss, we show that it still fails in long-horizon simulations. We then demonstrate that by applying a state-of-the-art Offline Reinforcement Learning algorithm, Conservative Q-Learning (CQL), to the same data…
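Why a BC policy with low imitation loss can still fail over long horizons is often explained with a stylized model of compounding error. The sketch below is an illustrative assumption, not the paper's evaluation: the policy slips off the expert's state distribution with probability `eps` per step, and in the compounding case it never recovers, because it was only trained on expert states. The function names are hypothetical.

```python
def expected_mistakes_compounding(eps: float, horizon: int) -> float:
    """Expected number of off-track steps when a single slip is never recovered.

    At step t (1-indexed) the car is off-track iff it slipped at some step <= t,
    which happens with probability 1 - (1 - eps) ** t.
    """
    return sum(1.0 - (1.0 - eps) ** t for t in range(1, horizon + 1))


def expected_mistakes_recovering(eps: float, horizon: int) -> float:
    """Expected number of mistakes when each slip is corrected on the next step."""
    return eps * horizon


if __name__ == "__main__":
    for horizon in (10, 100, 200):
        c = expected_mistakes_compounding(0.001, horizon)
        r = expected_mistakes_recovering(0.001, horizon)
        print(f"T={horizon}: compounding={c:.2f}, recovering={r:.2f}")
```

For small `eps` the compounding total grows roughly like `eps * T**2 / 2` rather than `eps * T`: with `eps = 0.001` and a 200-step horizon it is roughly 18.8 expected mistakes versus 0.2 for the recovering policy, even though both make the same per-step error on expert states.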
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
