Human-compatible driving partners through data-regularized self-play reinforcement learning

Daphne Cornelisse, Eugene Vinitsky

arXiv:2403.19648 · cs.RO · June 25, 2024 · 2 citations

TL;DR

This paper introduces HR-PPO (Human-Regularized PPO), a multi-agent reinforcement learning method that trains autonomous driving agents to be both human-like and effective in closed-loop traffic scenarios. It uses only 30 minutes of imperfect human demonstrations and produces agents with low collision and off-road rates.

Contribution

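The paper presents HR-PPO, an RL-first multi-agent algorithm in which agents are trained through self-play with a small penalty for deviating from a human reference policy. This human-data regularization yields realistic and effective driving policies from a limited amount of human demonstrations.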

Findings

01

Achieves a 93% success rate in multi-agent traffic scenarios.

02

Maintains low collision (3%) and off-road (3.5%) rates.

03

Produces human-like driving behavior that aligns with real-world human driving logs.
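The findings above are reported as rates over controlled agents. One way to compute such rates from per-agent evaluation outcomes is sketched below. The `AgentEpisode` record and the `episode_rates` function are hypothetical, and counting a success only when an agent reaches its goal without any collision or off-road event is an assumption; the paper's exact metric definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class AgentEpisode:
    # Hypothetical per-agent outcome record for one evaluation scene.
    reached_goal: bool
    collided: bool
    went_off_road: bool


def episode_rates(episodes: Iterable[AgentEpisode]) -> Dict[str, float]:
    """Aggregate success, collision, and off-road rates over agent episodes."""
    eps = list(episodes)
    n = len(eps)
    if n == 0:
        raise ValueError("need at least one agent episode")
    # Assumption: success means reaching the goal with no collision
    # and without leaving the road.
    successes = sum(
        e.reached_goal and not e.collided and not e.went_off_road for e in eps
    )
    return {
        "success_rate": successes / n,
        "collision_rate": sum(e.collided for e in eps) / n,
        "off_road_rate": sum(e.went_off_road for e in eps) / n,
    }
```

For four agents with outcomes (goal, clean), (goal, collision), (no goal, off-road), and (goal, clean), this sketch returns a success rate of 0.5 and collision and off-road rates of 0.25 each.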

Abstract

A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our…
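The abstract describes training agents with self-play PPO plus a small penalty for deviating from a human reference policy. Below is a minimal per-sample sketch of such a loss, assuming a discrete action space, a frozen human reference policy that outputs action logits, and a KL divergence KL(π_human ‖ π_θ) as the deviation penalty weighted by a coefficient `lam`. The function names, the KL direction, and the additive weighting are assumptions for illustration; this is not the authors' implementation in `emerge-lab/nocturne_lab`.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a discrete action space."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_categorical(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) for two categorical distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def hr_ppo_sample_loss(
    policy_logits: Sequence[float],
    human_logits: Sequence[float],
    action: int,
    old_log_prob: float,
    advantage: float,
    lam: float,
    clip_eps: float = 0.2,
) -> float:
    """Per-sample loss: negative clipped PPO surrogate plus a lam-weighted
    KL penalty toward a frozen human reference policy (assumed form)."""
    log_pi = log_softmax(policy_logits)
    log_human = log_softmax(human_logits)

    # PPO clipped surrogate objective (to be maximized).
    ratio = math.exp(log_pi[action] - old_log_prob)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)

    # Human-regularization term: how far the agent's action distribution
    # in this state drifts from the human reference policy.
    kl_to_human = kl_categorical(log_human, log_pi)

    # Minimize the negative surrogate plus the weighted deviation penalty.
    return -surrogate + lam * kl_to_human
```

When the policy and reference logits are identical and the sample is on-policy, the KL term vanishes and the loss reduces to the plain PPO term (here `-advantage`); a larger `lam` pulls the learned policy more strongly toward the human reference.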

Code & Models

Repositories

emerge-lab/nocturne_lab
PyTorch · Official

Taxonomy

Topics: Virtual Reality Applications and Impacts · Digital Mental Health Interventions · Traffic control and management

Methods: Sparse Evolutionary Training · Entropy Regularization · Proximal Policy Optimization