Rethinking Closed-loop Training for Autonomous Driving
Chris Zhang, Runsheng Guo, Wenyuan Zeng, Yuwen Xiong, Binbin Dai, Rui Hu, Mengye Ren, Raquel Urtasun

TL;DR
This paper investigates the design of training benchmarks for closed-loop autonomous driving, identifies limitations of existing RL algorithms, and introduces TRAVL, a new RL-based agent that improves learning speed and safety through planning and imagined data.
Contribution
The paper presents the first empirical study of how training benchmark design affects closed-loop learning, and proposes TRAVL (trajectory value learning), an RL agent that uses planning and imagined data to train driving policies efficiently.
Findings
Benchmark design significantly impacts learning success.
Many RL algorithms struggle with long-term planning in driving.
TRAVL learns faster and drives more safely than baseline methods.
Abstract
Recent advances in high-fidelity simulators have enabled closed-loop training of autonomous driving agents, potentially solving the distribution shift in training vs. deployment and allowing training to be scaled both safely and cheaply. However, there is a lack of understanding of how to build effective training benchmarks for closed-loop training. In this work, we present the first empirical study that analyzes the effects of different training benchmark designs on the success of learning agents, such as how to design traffic scenarios and scale training environments. Furthermore, we show that many popular RL algorithms cannot achieve satisfactory performance in the context of autonomous driving, as they lack long-term planning and take an extremely long time to train. To address these issues, we propose trajectory value learning (TRAVL), an RL-based driving agent that performs…
