Conflux-PSRO: Effectively Leveraging Collective Advantages in Policy Space Response Oracles
Yucong Huang, Jiesong Lian, Mingzhi Wang, Chengdong Ma, Ying Wen

TL;DR
Conflux-PSRO introduces a novel approach that adaptively selects and trains policies at the state level to leverage diversity effectively, improving Nash Equilibrium approximation in complex zero-sum games.
Contribution
It proposes a state-level adaptive policy selection and training method that fully exploits population diversity, enhancing performance and reducing exploitability in PSRO algorithms.
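To make "state-level policy selection" concrete, here is a minimal sketch in Python. It is an illustration only and not the authors' implementation: the names `make_routed_policy`, `toy_router`, `aggressive`, and `cautious` are hypothetical, the states and actions are invented, and the router is a hand-written rule rather than a trained routing policy.

```python
from typing import Callable, Dict, List

State = str
ActionDist = Dict[str, float]          # action name -> probability
Policy = Callable[[State], ActionDist]


def make_routed_policy(population: List[Policy],
                       router: Callable[[State], int]) -> Policy:
    """Build a policy that, in each state, defers to one population member.

    `router` is a stand-in (assumption) for a learned routing policy: it maps
    a state to the index of the population policy to follow in that state.
    """
    def routed(state: State) -> ActionDist:
        index = router(state)
        return population[index](state)
    return routed


# Two toy population members with fixed, state-independent behavior.
def aggressive(state: State) -> ActionDist:
    return {"raise": 0.8, "call": 0.2}


def cautious(state: State) -> ActionDist:
    return {"raise": 0.1, "call": 0.9}


# Hypothetical hand-written router; a real one would be trained.
def toy_router(state: State) -> int:
    return 0 if state == "strong_hand" else 1


routed = make_routed_policy([aggressive, cautious], toy_router)
```

The point of the sketch is only that the choice among existing policies is made per state rather than once per episode, so different population members can be used in the situations where each is strongest.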
Findings
Significantly improves the utility of best responses.
Reduces exploitability compared to existing methods.
Enhances performance across various environments.
Abstract
Policy Space Response Oracle (PSRO) with policy population construction has been demonstrated as an effective method for approximating Nash Equilibrium (NE) in zero-sum games. Existing studies have attempted to improve diversity in policy space, primarily by incorporating diversity regularization into the Best Response (BR). However, these methods cause the BR to deviate from maximizing rewards, easily resulting in a population that favors diversity over performance, even when diversity is not always necessary. Consequently, exploitability is difficult to reduce until policies are fully explored, especially in complex games. In this paper, we propose Conflux-PSRO, which fully exploits the diversity of the population by adaptively selecting and training policies at the state level. Specifically, Conflux-PSRO identifies useful policies from the existing population and employs a routing policy…
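The abstract measures progress by exploitability. As a minimal sketch of that quantity, the Python below computes it for a two-player zero-sum matrix game using the common definition: the sum of what each player could gain by switching to a best response. The function names and the rock-paper-scissors payoff matrix are assumptions chosen for illustration; some works divide this sum by two, and the paper's exact convention is not stated in the text above.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # payoff[i][j] = row player's payoff; column player gets the negative


def row_best_response_value(payoff: Matrix, col_strategy: Sequence[float]) -> float:
    """Best payoff the row player can get against a fixed column strategy."""
    return max(
        sum(a * q for a, q in zip(row, col_strategy))
        for row in payoff
    )


def col_best_response_value(payoff: Matrix, row_strategy: Sequence[float]) -> float:
    """Best payoff the column player can get against a fixed row strategy."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    return max(
        -sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )


def exploitability(payoff: Matrix,
                   row_strategy: Sequence[float],
                   col_strategy: Sequence[float]) -> float:
    """Sum of both players' best-response gains in a zero-sum game.

    In a zero-sum game the two current payoffs cancel, so the sum of gains
    equals the sum of the two best-response values. It is zero exactly at a
    Nash equilibrium.
    """
    return (row_best_response_value(payoff, col_strategy)
            + col_best_response_value(payoff, row_strategy))


# Rock-paper-scissors from the row player's perspective (rows/cols: R, P, S).
RPS: Matrix = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]

uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

nash_gap = exploitability(RPS, uniform, uniform)            # uniform play is the equilibrium
rock_gap = exploitability(RPS, always_rock, always_rock)    # both players are exploitable by paper
```

Here the uniform strategy pair has exploitability zero, while the always-rock pair has exploitability 2 because each player could win outright by switching to paper.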
Taxonomy
Topics: E-Government and Public Services · Crime, Illicit Activities, and Governance
