Human-compatible driving partners through data-regularized self-play reinforcement learning
Daphne Cornelisse, Eugene Vinitsky

TL;DR
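This paper introduces HR-PPO (Human-Regularized PPO), a multi-agent reinforcement learning method that trains driving agents through self-play with a small penalty for deviating from a human reference policy. The resulting agents are both human-like and effective in closed-loop traffic scenarios, using only 30 minutes of imperfect human demonstrations.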
Contribution
The paper presents HR-PPO, an RL-first multi-agent algorithm that regularizes self-play toward a human reference policy, yielding realistic and effective driving policies from only 30 minutes of imperfect human demonstrations.
Findings
Achieves a 93% success rate in multi-agent traffic scenarios.
Maintains low collision (3%) and off-road (3.5%) rates.
Produces human-like driving behavior aligned with real logs.
Abstract
A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our…
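The abstract describes the core idea only in words: a PPO objective optimized through self-play, plus a small penalty for deviating from a human reference policy. The sketch below shows one way such a loss could be written. It is not the authors' implementation. The discrete action space, the use of a KL divergence as the deviation measure, its direction (agent relative to human reference), the weight `reg_weight`, and all function names are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for each row of two discrete distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def ppo_clip_objective(new_logp, old_logp, advantages, clip=0.2):
    """Standard PPO clipped surrogate (to be maximized)."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return np.minimum(unclipped, clipped).mean()


def human_regularized_loss(agent_logits, human_logits, actions,
                           old_logp, advantages,
                           reg_weight=0.05, clip=0.2):
    """Loss to minimize: negative PPO surrogate plus a weighted penalty.

    Assumption: the penalty is the mean KL divergence between the agent's
    action distribution and a fixed human reference policy (for example,
    one obtained by imitation on the human demonstrations).
    """
    agent_probs = softmax(agent_logits)
    human_probs = softmax(human_logits)
    rows = np.arange(len(actions))
    new_logp = np.log(agent_probs[rows, actions])
    surrogate = ppo_clip_objective(new_logp, old_logp, advantages, clip)
    penalty = kl_divergence(agent_probs, human_probs).mean()
    return -surrogate + reg_weight * penalty, penalty
```

With `reg_weight = 0` this reduces to plain PPO; larger values pull the agent's action distribution toward the human reference at the cost of task reward. In a self-play setting the same loss would be applied to every controlled agent in the scene.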
Taxonomy
Topics: Virtual Reality Applications and Impacts · Digital Mental Health Interventions · Traffic control and management
Methods: Sparse Evolutionary Training · Entropy Regularization · Proximal Policy Optimization
