No-Press Diplomacy from Scratch
Anton Bakhtin, David Wu, Adam Lerer, Noam Brown

TL;DR
This paper introduces an algorithm for action exploration and equilibrium approximation in games with combinatorial action spaces. It uses the algorithm to train a Diplomacy agent from scratch that achieves superhuman performance in a two-player variant of the game.
Contribution
The paper presents a new algorithm for games with large action spaces. It demonstrates the first Diplomacy agent trained from scratch to reach superhuman performance (in a two-player variant) and provides evidence of multiple equilibria.
Findings
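The agent, DORA, achieves superhuman performance in a popular two-player variant of Diplomacy.
The method extends to full-scale no-press Diplomacy, where an agent is trained from scratch with no human data.
The from-scratch agent plays a strategy incompatible with that of agents bootstrapped from human data. This is the first strong evidence of multiple equilibria in Diplomacy and suggests that self-play alone may be insufficient for achieving superhuman performance in Diplomacy.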
Abstract
Prior AI successes in complex games have largely focused on settings with at most hundreds of actions at each decision point. In contrast, Diplomacy is a game with more than 10^20 possible actions per turn. Previous attempts to address games with large branching factors, such as Diplomacy, StarCraft, and Dota, used human data to bootstrap the policy or used handcrafted reward shaping. In this paper, we describe an algorithm for action exploration and equilibrium approximation in games with combinatorial action spaces. This algorithm simultaneously performs value iteration while learning a policy proposal network. A double oracle step is used to explore additional actions to add to the policy proposals. At each state, the target state value and policy for model training are computed via an equilibrium search procedure. Using this algorithm, we train an agent, DORA, completely from…
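The double oracle step can be illustrated on a toy problem. The sketch below works on a small two-player zero-sum matrix game. It solves a restricted game approximately with regret matching, a standard equilibrium-finding method used here only as a stand-in for the paper's equilibrium search. It then adds each player's best response over the full action set whenever that response beats the restricted-game value by more than a threshold `eps`. The names `regret_matching`, `double_oracle`, and `eps` are hypothetical. Diplomacy's action space cannot be enumerated, and this sketch omits value iteration and the policy proposal network, so it is not the paper's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _from_regrets(regrets: List[float]) -> List[float]:
    """Strategy proportional to positive regret; uniform if no regret is positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / len(regrets)] * len(regrets)


def regret_matching(payoff: Matrix, iters: int = 2000) -> Tuple[List[float], List[float]]:
    """Approximate equilibrium of a zero-sum game; payoff[i][j] is the row player's utility."""
    n_r, n_c = len(payoff), len(payoff[0])
    reg_r, reg_c = [0.0] * n_r, [0.0] * n_c
    avg_r, avg_c = [0.0] * n_r, [0.0] * n_c
    for _ in range(iters):
        sr, sc = _from_regrets(reg_r), _from_regrets(reg_c)
        # Utility of each pure action against the opponent's current strategy.
        u_r = [sum(payoff[i][j] * sc[j] for j in range(n_c)) for i in range(n_r)]
        u_c = [-sum(sr[i] * payoff[i][j] for i in range(n_r)) for j in range(n_c)]
        ev_r = sum(p * u for p, u in zip(sr, u_r))
        ev_c = sum(p * u for p, u in zip(sc, u_c))
        for i in range(n_r):
            reg_r[i] += u_r[i] - ev_r
            avg_r[i] += sr[i]
        for j in range(n_c):
            reg_c[j] += u_c[j] - ev_c
            avg_c[j] += sc[j]
    # The time-averaged strategies approach a Nash equilibrium.
    return [a / iters for a in avg_r], [a / iters for a in avg_c]


def double_oracle(payoff: Matrix, eps: float = 1e-3, max_rounds: int = 50):
    """Grow restricted action sets until no outside action improves by more than eps."""
    n_r, n_c = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # restricted action sets, as indices into the full game
    for _ in range(max_rounds):
        sub = [[payoff[i][j] for j in cols] for i in rows]
        sr, sc = regret_matching(sub)
        row_utils = [sum(sub[a][b] * sc[b] for b in range(len(cols))) for a in range(len(rows))]
        value = sum(sr[a] * row_utils[a] for a in range(len(rows)))
        # Lift restricted strategies to the full game (zero mass outside the sets).
        full_sr, full_sc = [0.0] * n_r, [0.0] * n_c
        for k, i in enumerate(rows):
            full_sr[i] = sr[k]
        for k, j in enumerate(cols):
            full_sc[j] = sc[k]
        row_vals = [sum(payoff[i][j] * full_sc[j] for j in range(n_c)) for i in range(n_r)]
        col_vals = [sum(full_sr[i] * payoff[i][j] for i in range(n_r)) for j in range(n_c)]
        added = False
        row_cands = [i for i in range(n_r) if i not in rows]
        if row_cands:
            br = max(row_cands, key=lambda i: row_vals[i])
            if row_vals[br] > value + eps:  # row player maximizes
                rows.append(br)
                added = True
        col_cands = [j for j in range(n_c) if j not in cols]
        if col_cands:
            br = min(col_cands, key=lambda j: col_vals[j])
            if col_vals[br] < value - eps:  # column player minimizes
                cols.append(br)
                added = True
        if not added:
            break
    return rows, cols, sr, sc, value
```

On rock-paper-scissors, starting from {rock} for both players, the loop should add paper and then scissors before stopping at the uniform equilibrium. In a game with a strictly dominated row, the restricted sets may never grow past the first action. The point of the design is that the restricted game can stay much smaller than the full one.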
