Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games
Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu

TL;DR
This paper introduces Fictitious Cross-Play (FXP), an algorithm that learns global Nash equilibria in mixed cooperative-competitive games by combining self-play with best-response training. In the reported experiments it outperforms existing methods.
Contribution
FXP combines self-play with best-response training to converge efficiently to global Nash equilibria in complex mixed games, addressing the scalability issues of prior approaches.
Findings
FXP converges to global Nash equilibria in matrix games.
FXP achieves higher Elo ratings and lower exploitability in a gridworld domain.
FXP defeats state-of-the-art models in a challenging football game with a win rate above 94%.
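Exploitability, one of the metrics named in the findings, can be illustrated on a small two-player zero-sum matrix game. The sketch below is not taken from the paper: the function name `exploitability`, the rock-paper-scissors payoff matrix `RPS`, and the convention of summing (rather than averaging) the two players' best-response gains are all assumptions made for illustration.

```python
# Illustrative sketch only; names and the payoff matrix are hypothetical,
# not the paper's evaluation code.

# Row player's payoff in rock-paper-scissors; the column player receives the negative.
RPS = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains in a zero-sum matrix game.

    The value is zero exactly when the strategy pair is a Nash equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Expected payoff of each pure row action against the column strategy.
    row_values = [
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Expected payoff (to the column player) of each pure column action.
    col_values = [
        -sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    # In a zero-sum game the current payoffs cancel, leaving the two maxima.
    return max(row_values) + max(col_values)


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]
print(exploitability(RPS, uniform, uniform))
print(exploitability(RPS, always_rock, always_rock))
```

Some libraries report the average of the two gains instead of the sum, so absolute values may differ by a factor of two between sources; the zero point is the same under both conventions.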
Abstract
Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes its policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counter-example where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged.…
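The idea of learning best responses to previous policies can be sketched with classic fictitious play on a two-player zero-sum matrix game. This is neither FXP nor PSRO as described in the paper; it is the textbook procedure in which each player best-responds to the opponent's empirical action frequencies. The function name `fictitious_play`, the payoff matrix `RPS`, the iteration count, and the tie-breaking rule are assumptions chosen for illustration.

```python
# Illustrative sketch only: classic fictitious play, not the paper's FXP algorithm.

# Row player's payoff in rock-paper-scissors; the column player receives the negative.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_play(payoff, iterations=5000):
    """Return the empirical (average) strategies of both players."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Action counts; each player arbitrarily starts with one play of action 0.
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)

    for _ in range(iterations):
        # Value of each pure row action against the column player's history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value of each pure column action against the row player's history.
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously; ties go to the lowest index.
        row_counts[row_values.index(max(row_values))] += 1
        col_counts[col_values.index(max(col_values))] += 1

    row_total, col_total = sum(row_counts), sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


row_avg, col_avg = fictitious_play(RPS)
print(row_avg, col_avg)
```

In two-player zero-sum games the empirical frequencies of fictitious play are known to converge to a Nash equilibrium, which for rock-paper-scissors is the uniform strategy. The abstract's point is that guarantees of this kind do not automatically carry over when teammates must also coordinate with each other.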
Taxonomy
Topics: Sports Analytics and Performance
