Building reliable sim driving agents by scaling self-play
Daphne Cornelisse, Aarav Pandya, Kevin Joseph, Joseph Suárez, Eugene Vinitsky

TL;DR
This paper presents a scalable self-play approach for training simulation agents used in autonomous vehicle testing. The agents reach high reliability and generalize to unseen scenes after about a day of training on a single GPU, and the resulting agents are open-sourced.
Contribution
We introduce a scalable self-play method that trains reliable simulation agents for autonomous vehicle testing on thousands of scenarios, and we show that the resulting agents perform well and can be fine-tuned quickly.
Findings
Agents achieve a 99.8% goal completion rate with less than 0.8% collisions on 10,000 unseen test scenarios.
Agents trained from scratch solve almost the full training set within a day on a single GPU.
Agents show partial robustness to out-of-distribution scenes and can be fine-tuned rapidly.
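As a rough illustration of how aggregate figures like the ones above could be computed from per-agent rollout outcomes, here is a minimal sketch. The `AgentOutcome` record, the `summarize` helper, and the toy data are all hypothetical; this is not the paper's evaluation code and the numbers are not its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentOutcome:
    """Hypothetical per-agent result of one evaluation rollout."""
    reached_goal: bool
    collided: bool


def summarize(outcomes):
    """Return the fraction of agents that reached their goal and that collided."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to summarize")
    return {
        "goal_rate": sum(o.reached_goal for o in outcomes) / n,
        "collision_rate": sum(o.collided for o in outcomes) / n,
    }


# Toy data (made up for illustration): 998 successful agents, 2 that collided.
outcomes = [AgentOutcome(True, False)] * 998 + [AgentOutcome(False, True)] * 2
metrics = summarize(outcomes)
print(f"goal rate: {metrics['goal_rate']:.1%}, "
      f"collision rate: {metrics['collision_rate']:.1%}")
```

Rates here are averaged over agents; averaging over scenarios instead would weight busy and sparse scenes differently, so the choice of denominator should be stated alongside any reported number.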
Abstract
Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing system limits, but all applications share one key requirement: reliability. To enable sound experimentation, a simulation agent must behave as intended. It should minimize actions that may lead to undesired outcomes, such as collisions, which can distort the signal-to-noise ratio in analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents solve almost the full training set within a day. They generalize to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8%…
Taxonomy
Topics: Evacuation and Crowd Dynamics
Methods: Sparse Evolutionary Training
