Policy Space Diversity for Non-Transitive Games

Jian Yao; Weiming Liu; Haobo Fu; Yaodong Yang; Stephen McAleer; Qiang Fu; Wei Yang

arXiv:2306.16884·cs.GT·November 9, 2023·1 cites

TL;DR

This paper introduces a new diversity metric for policy populations in PSRO algorithms, the improvement of which guarantees a better approximation to a Nash Equilibrium in non-transitive games, and demonstrates its effectiveness empirically.

Contribution

The paper proposes a novel diversity metric that guarantees improved NE approximation and develops PSD-PSRO, a new PSRO variant with convergence guarantees and superior empirical performance.

Findings

01 PSD-PSRO produces less exploitable policies.

02 The new diversity metric improves NE approximation.

03 Empirical results show enhanced performance across various games.

Abstract

Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have been trying to promote policy diversity in PSRO. A major weakness in existing diversity metrics is that a more diverse (according to their diversity metrics) population does not necessarily mean (as we proved in the paper) a better approximation to a NE. To alleviate this problem, we propose a new diversity metric, the improvement of which guarantees a better approximation to a NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best response solving in PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO.…
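To make "a better approximation to a NE" concrete, the sketch below computes exploitability in rock-paper-scissors, a standard example of a non-transitive game. It is an illustration only and is not taken from the paper: the function name `exploitability`, the payoff layout, and the choice of a symmetric zero-sum game with value 0 are assumptions made for this example, and the paper's own definition may be normalized differently.

```python
# Row player's payoff in rock-paper-scissors; rows and columns are
# ordered (rock, paper, scissors). The game is symmetric and zero-sum.
RPS = [
    [0, -1, 1],   # rock:     ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper:    beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def exploitability(payoff, strategy):
    """Payoff a best-responding opponent obtains against `strategy`.

    Assumes a symmetric zero-sum game with value 0 (hypothetical helper,
    not the paper's code). Lower is better; 0 means `strategy` is a NE
    strategy.
    """
    return max(
        sum(row[j] * strategy[j] for j in range(len(strategy)))
        for row in payoff
    )


pure_rock = [1.0, 0.0, 0.0]      # fully exploitable by always playing paper
uniform = [1 / 3, 1 / 3, 1 / 3]  # the NE strategy of rock-paper-scissors

print(exploitability(RPS, pure_rock))
print(exploitability(RPS, uniform))
```

A population that contains only rock can express nothing better than the first strategy, while a population spanning all three actions can mix to the second. This is the informal sense in which the policies available to the population, rather than their raw dissimilarity, determine how closely a NE can be approximated.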

Videos

Policy Space Diversity for Non-Transitive Games · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Advanced Bandit Algorithms Research