Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games
Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

TL;DR
This paper introduces simplex-NeuPL, a neural approach that learns a diverse set of strategies and best-responses in symmetric zero-sum games, achieving near-optimal performance against any mixture of strategies.
Contribution
The paper presents a novel neural framework that simultaneously learns a population of diverse policies and best-responses to any mixture, enabling Bayes-optimality and strategic exploration.
Findings
Policies behave Bayes-optimally under uncertainty.
A single conditional network effectively learns best-responses to any mixture over the basis policies.
Learning best-responses to any mixture serves as an auxiliary task that improves strategic exploration and population performance.
Abstract
Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games. In this paper, we propose simplex-NeuPL, which satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learning best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near-optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights into using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policy is an effective auxiliary task for…
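To make "best-response to a mixture over the simplex of basis policies" concrete, here is a minimal sketch in a one-shot symmetric zero-sum game (rock-paper-scissors). It is not the paper's method: simplex-NeuPL trains a conditional neural network, whereas this toy computes the best-response exactly from a payoff matrix. The basis policies, the mixture weights `sigma`, and all function names are hypothetical, chosen only for illustration. The `posterior` helper shows the Bayes update over which basis policy the opponent is playing after one observed action, which is the kind of belief a Bayes-optimal policy would act on in repeated or sequential play.

```python
# Row player's payoff in rock-paper-scissors (symmetric, zero-sum).
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

# Hypothetical basis policies (assumption): each is a distribution over actions.
BASIS = [
    [0.8, 0.1, 0.1],  # mostly rock
    [0.1, 0.8, 0.1],  # mostly paper
    [0.1, 0.1, 0.8],  # mostly scissors
]


def opponent_marginal(sigma, basis):
    """Action distribution of an opponent who samples basis policy i with prob sigma[i]."""
    n_actions = len(basis[0])
    return [sum(s * pi[a] for s, pi in zip(sigma, basis)) for a in range(n_actions)]


def best_response(sigma, basis, payoff):
    """Return (action, expected payoff) of a pure best-response to the mixture sigma."""
    marginal = opponent_marginal(sigma, basis)
    values = [
        sum(payoff[a][b] * marginal[b] for b in range(len(marginal)))
        for a in range(len(payoff))
    ]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


def posterior(sigma, basis, observed_action):
    """Bayes update of the belief over basis policies after seeing one opponent action."""
    unnormalized = [s * pi[observed_action] for s, pi in zip(sigma, basis)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observed action has zero probability under the prior")
    return [u / total for u in unnormalized]


# Prior: the opponent is equally likely to be "mostly rock" or "mostly paper".
sigma = [0.5, 0.5, 0.0]
action, value = best_response(sigma, BASIS, PAYOFF)
belief = posterior(sigma, BASIS, observed_action=0)  # opponent was seen playing rock
```

With this prior, paper is the best-response; after observing rock, the belief shifts strongly toward the "mostly rock" basis policy. In a one-shot matrix game the mixture collapses to a single action marginal, so the sketch only hints at the sequential setting the abstract targets, where the belief must be updated as play unfolds.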
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Advanced Bandit Algorithms Research
