Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem
Fivos Kalogiannis, Jingming Yan, Ioannis Panageas

TL;DR
This paper introduces a learning algorithm with polynomial iteration and sample complexity for Nash equilibria in adversarial team Markov games (ATMGs). It tackles a nonconvex-nonconcave min-max problem by exploiting the problem's hidden concave structure.
Contribution
It develops the first learning algorithm with polynomial iteration and sample complexity for Nash equilibria (NE) in adversarial team Markov games (ATMGs), using MARL policy gradient methods and novel optimization techniques for a nonconvex-concave reformulation of the problem.
Findings
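The algorithm achieves iteration and sample complexity that is polynomial in the approximation error and the natural parameters of the ATMG.
It addresses the intractability of nonconvex-nonconcave saddle-point problems by exploiting hidden structure in the game.
It extends the optimization framework for weakly-smooth nonconvex functions.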
Abstract
We study the problem of learning a Nash equilibrium (NE) in Markov games, which is a cornerstone in multi-agent reinforcement learning (MARL). In particular, we focus on infinite-horizon adversarial team Markov games (ATMGs) in which agents that share a common reward function compete against a single opponent, the adversary. These games unify two-player zero-sum Markov games and Markov potential games, resulting in a setting that encompasses both collaboration and competition. Kalogiannis et al. (2023a) provided an efficient equilibrium computation algorithm for ATMGs which presumes knowledge of the reward and transition functions and has no sample complexity guarantees. We contribute a learning algorithm that utilizes MARL policy gradient methods with iteration and sample complexity that is polynomial in the approximation error and the natural parameters of the ATMG,…
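For orientation, the min-max problem named in the title can be sketched as below. The notation is an assumption introduced here for illustration and is not taken from this page: $x = (x_1, \dots, x_n)$ denotes the team members' policies, $y$ the adversary's policy, $\rho$ the initial state distribution, $\gamma \in [0,1)$ the discount factor, and $r$ the adversary's reward (the team receives $-r$). The second line shows one standard way hidden concavity can arise: once $x$ is fixed, the adversary faces a single-agent MDP, whose value is linear in the discounted state-action occupancy measure $\lambda$. The paper's exact reformulation may differ.

```latex
% Assumed notation: team minimizes, adversary maximizes.
\min_{x \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_n} \;
\max_{y \in \mathcal{Y}} \; V_\rho(x, y),
\qquad
V_\rho(x, y) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t, b_t)
  \;\middle|\; s_0 \sim \rho,\; a_t \sim x(\cdot \mid s_t),\; b_t \sim y(\cdot \mid s_t) \right]

% For fixed x, the value is linear in the adversary's occupancy measure:
V_\rho(x, y) = \sum_{s, b} \lambda^{x, y}(s, b)\, r_x(s, b),
\qquad
\lambda^{x, y}(s, b) = \sum_{t=0}^{\infty} \gamma^{t} \Pr(s_t = s,\, b_t = b),
\qquad
r_x(s, b) = \mathbb{E}_{a \sim x(\cdot \mid s)}\!\left[ r(s, a, b) \right]
```

Under these assumptions, $V_\rho$ is in general nonconvex in the team's product policy $x$ and nonconcave in $y$, yet linear (hence concave) in $\lambda$ for each fixed $x$. This is the sense in which the maximization side is "hidden concave".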
Taxonomy
Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Game Theory and Applications
