Diverse Conventions for Human-AI Collaboration
Bidipta Sarkar, Andy Shih, Dorsa Sadigh

TL;DR
This paper introduces a method to generate diverse conventions in multi-agent reinforcement learning, enhancing coordination and generalization in human-AI collaboration, and surpassing human performance in some cases.
Contribution
The work proposes a technique that maximizes reward in self-play while minimizing reward in cross-play with previously discovered conventions, producing diverse, semantically different conventions for better human-AI interaction.
Findings
The method produces diverse conventions that improve coordination.
It adapts to human conventions and surpasses human-level performance.
The approach is effective in various collaborative games like Overcooked.
Abstract
Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is randomly generated by sampling self-play and cross-play transitions…
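The two-part objective in the abstract (high self-play reward, low cross-play reward against earlier conventions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `diversity_objective`, the `cross_play_weight` parameter, and the choice to average over previously discovered conventions are all assumptions made for illustration, and mixed-play is not modeled here.

```python
from typing import Sequence


def diversity_objective(
    self_play_return: float,
    cross_play_returns: Sequence[float],
    cross_play_weight: float = 1.0,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Score a candidate convention.

    A higher self-play return raises the score; a higher return when paired
    with previously discovered conventions (cross-play) lowers it.
    """
    if not cross_play_returns:
        # First convention: nothing to be different from, so plain self-play.
        return self_play_return
    # Assumption: aggregate cross-play by a simple mean over earlier conventions.
    mean_cross_play = sum(cross_play_returns) / len(cross_play_returns)
    return self_play_return - cross_play_weight * mean_cross_play
```

Under these assumptions, a candidate that scores well with itself but poorly with every earlier convention receives the highest objective value, which is the pressure toward semantically different conventions that the abstract describes.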
Taxonomy
Topics: Ethics and Social Impacts of AI
