Policy Space Diversity for Non-Transitive Games
Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang

TL;DR
This paper introduces a new diversity metric for policy populations in PSRO algorithms, ensuring better approximation to Nash Equilibria in non-transitive games, and demonstrates its effectiveness through empirical results.
Contribution
The paper proposes a novel diversity metric that guarantees improved NE approximation and develops PSD-PSRO, a new PSRO variant with convergence guarantees and superior empirical performance.
Findings
PSD-PSRO produces less exploitable policies.
The new diversity metric improves NE approximation.
Empirical results show enhanced performance across various games.
Abstract
Policy-Space Response Oracles (PSRO) is an influential algorithmic framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have tried to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a more diverse population (according to those metrics) does not necessarily yield a better approximation to an NE, as we prove in the paper. To alleviate this problem, we propose a new diversity metric whose improvement guarantees a better approximation to an NE. We also develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best-response solving step of PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO.…
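To make "approximation to an NE" and "exploitability" concrete, here is a minimal sketch on rock-paper-scissors, a standard example of a non-transitive game. It is not taken from the paper: the payoff matrix, function names, and the two example strategies are illustrative assumptions, and the code only measures how much a best-responding opponent can gain against a fixed mixed strategy.

```python
# Illustrative sketch (not the paper's code): exploitability in
# rock-paper-scissors, a symmetric zero-sum, non-transitive game.

# Row player's payoff; rows and columns are ordered rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response_value(payoff, opponent_strategy):
    """Highest expected payoff any pure action earns against a mixed strategy."""
    return max(
        sum(p * q for p, q in zip(row, opponent_strategy))
        for row in payoff
    )


def exploitability(payoff, strategy):
    """Gain of a best responder against `strategy`.

    Assumes a symmetric zero-sum game, whose value is 0, so the
    exploitability equals the best-response payoff and is 0 at an NE.
    """
    return best_response_value(payoff, strategy)


always_rock = [1.0, 0.0, 0.0]    # hypothetical, fully exploitable policy
uniform = [1 / 3, 1 / 3, 1 / 3]  # the NE of rock-paper-scissors

print(exploitability(RPS, always_rock))
print(exploitability(RPS, uniform))
```

A strategy that always plays rock loses to paper every time, so its exploitability is the full payoff of 1, while the uniform mixture cannot be exploited at all. A lower value means a closer approximation to an NE, which is the quantity the abstract refers to when comparing policies.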
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Advanced Bandit Algorithms Research
