Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games
Naming Liu, Mingzhi Wang, Xihuai Wang, Weinan Zhang, Yaodong Yang, Youzhi Zhang, Bo An, Ying Wen

TL;DR
This paper introduces H-PSRO, a novel framework for computing ex ante equilibrium in heterogeneous two-team zero-sum games, overcoming limitations of existing methods by improving policy expressiveness and convergence.
Contribution
The paper proposes H-PSRO, the first PSRO framework tailored for heterogeneous team games, with theoretical guarantees and empirical success in complex game settings.
Findings
H-PSRO achieves lower exploitability than Team PSRO.
H-PSRO converges in heterogeneous matrix games where other methods fail.
H-PSRO outperforms non-heterogeneous baselines in diverse settings.
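The findings above are stated in terms of exploitability. As a rough illustration only, not the paper's evaluation code, the sketch below computes exploitability for a zero-sum matrix game, assuming each team is treated as a single player choosing among its joint actions. The function name `exploitability` and the matching-pennies payoff matrix are hypothetical choices for this example, and some papers halve the quantity computed here.

```python
import numpy as np

def exploitability(payoff, x, y):
    """Sum of both sides' best-response gains in a zero-sum matrix game.

    payoff[i, j] is the row side's payoff (the column side receives its negative).
    x and y are mixed strategies over rows and columns. Assumption: each "side"
    stands for a whole team acting over its joint actions.
    """
    payoff = np.asarray(payoff, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    value = x @ payoff @ y            # expected payoff to the row side
    row_best = np.max(payoff @ y)     # best the row side can get against y
    col_best = np.min(x @ payoff)     # lowest payoff the column side can force against x
    return (row_best - value) + (value - col_best)

# Hypothetical example game: matching pennies.
matching_pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
pure_first_row = np.array([1.0, 0.0])

at_equilibrium = exploitability(matching_pennies, uniform, uniform)
off_equilibrium = exploitability(matching_pennies, pure_first_row, uniform)
```

Under this definition a strategy pair is an equilibrium exactly when the value is zero, so a lower number means the pair is closer to equilibrium.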
Abstract
The ex ante equilibrium for two-team zero-sum games, where agents within each team collaborate to compete against the opposing team, is known to be the best a team can do for coordination. Many existing works aim to extend ex ante equilibrium solving to large-scale team games based on Policy Space Response Oracle (PSRO). However, the joint team policy space constructed by the most prominent method, Team PSRO, cannot cover the entire team policy space in heterogeneous team games where teammates play distinct roles. Such insufficient policy expressiveness causes Team PSRO to become trapped in a sub-optimal ex ante equilibrium with significantly higher exploitability, so it never converges to the global ex ante equilibrium. To find the global ex ante equilibrium without introducing additional computational complexity, we first parameterize…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The paper is well-written, as it clearly explains the problem setup, notions and solutions to me.
- The paper also includes a rich set of experiments on realistic team game instances. The results from the proposed method look good and the plots in the experiments are clear and informative.
- The insights and technical contributions are relatively shallow. The proposed methods seem to be well-expected, so I would encourage the authors to highlight the challenges in devising and employing the proposed methods.
The paper is written clearly. I fully agree with the authors that restricting to homogeneous policies would be severely limiting in the setting of team games.
This paper is built on the premise that Team-PSRO, as implemented by McAleer et al. (2023), cannot recover heterogeneous policies because it uses shared parameters across team members. This premise is shaky at best, if not outright false. It is very easy to use shared parameters and still allow heterogeneous policies, by using "agent indicators", i.e., including the index of each agent as part of its observation [1, 2]. Although McAleer et al. (2023) does not (as far as I can tell) explicitly st
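The agent-indicator idea raised in the review above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from McAleer et al. (2023) or from the paper under review: the linear softmax policy, the helper names `with_agent_indicator` and `shared_policy`, and the hand-set weights are all hypothetical.

```python
import numpy as np

def with_agent_indicator(obs, agent_id, n_agents):
    """Append a one-hot encoding of the agent's index to its observation."""
    one_hot = np.zeros(n_agents)
    one_hot[agent_id] = 1.0
    return np.concatenate([obs, one_hot])

def shared_policy(weights, obs, agent_id, n_agents):
    """Softmax policy whose single weight matrix is shared by every teammate."""
    x = with_agent_indicator(obs, agent_id, n_agents)
    logits = weights @ x
    z = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return z / z.sum()

n_agents, obs_dim, n_actions = 2, 3, 4

# Hypothetical hand-set weights: only the indicator columns are non-zero,
# so any difference in behaviour comes from the agent index alone.
weights = np.zeros((n_actions, obs_dim + n_agents))
weights[0, obs_dim + 0] = 2.0  # agent 0 is pushed toward action 0
weights[1, obs_dim + 1] = 2.0  # agent 1 is pushed toward action 1

obs = np.ones(obs_dim)  # both agents receive the same raw observation
p0 = shared_policy(weights, obs, 0, n_agents)
p1 = shared_policy(weights, obs, 1, n_agents)
```

Here both agents use the same parameters and see the same raw observation, yet `p0` and `p1` favor different actions because the indicator columns of the shared weights differ. Whether this is enough to cover the full heterogeneous team policy space in the paper's setting is the point the review disputes, and the sketch does not settle it.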
1. Numerical results are provided where the proposed algorithm is competing with several benchmark algorithms in relatively large-scale two team zero-sum games.
1. This paper makes the claim that previous work [1] relies on the assumption that the two-team zero-sum game has to be homogeneous, and on a policy sharing mechanism that requires every player in the same team to play under the same strategy (i.e., the policies of players in the same team are $(\pi_{k,\mathrm{share}}, \pi_{k,\mathrm{share}}, \ldots, \pi_{k,\mathrm{share}})$). However, to the reviewer's best knowledge, the algorithm provided in [1] does not require such a strong assumption and does not employ a policy sharing mechani
Taxonomy
Topics: Game Theory and Voting Systems · Game Theory and Applications · Auction Theory and Applications
