Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning
Ian Gemp, Andreas Haupt, Luke Marris, Siqi Liu, Georgios Piliouras

TL;DR
This paper introduces convex Markov games, a broad class of multi-agent decision models that accommodates complex preferences and guarantees the existence of equilibria, enabling new solutions for fairness, safety, and strategic behavior in multi-agent systems.
Contribution
The paper defines convex Markov games with general convex preferences over occupancy measures, proves that pure strategy Nash equilibria exist, and approximates these equilibria empirically by performing gradient descent on an upper bound of exploitability.
Findings
Found novel solutions to classic repeated normal-form games.
Found fair solutions in a repeated asymmetric coordination game.
Prioritized safe long-term behavior in a robot warehouse environment.
Abstract
Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist. Furthermore, equilibria can be approximated empirically by performing gradient descent on an upper bound of exploitability. Our experiments reveal novel solutions to classic repeated normal-form games, find fair solutions in a repeated asymmetric coordination game, and prioritize safe long-term behavior in a robot warehouse environment. In the prisoner's dilemma, our algorithm leverages transient imitation to find a policy profile that deviates from observed human play only slightly,…
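To make "preferences over occupancy measures" concrete, here is a minimal single-agent sketch, not the paper's algorithm or notation. It assumes a normalized discounted state-action occupancy measure and uses a hypothetical two-state, two-action MDP whose numbers are purely illustrative. It contrasts a standard return, which is linear in the occupancy measure, with an entropy objective, which is concave in it (its negative is a convex loss) and so does not decompose additively across time.

```python
import math

def occupancy_measure(P, pi, mu0, gamma, iters=2000):
    """Normalized discounted state-action occupancy of a fixed policy.

    P[s][a][s2] is a transition probability, pi[s][a] a policy,
    mu0[s] an initial state distribution (all hypothetical inputs).
    """
    n_s, n_a = len(P), len(P[0])
    d = list(mu0)
    for _ in range(iters):
        # Fixed point: d(s') = (1-gamma) mu0(s') + gamma sum_{s,a} d(s) pi(a|s) P(s'|s,a)
        d = [
            (1 - gamma) * mu0[s2]
            + gamma * sum(d[s] * pi[s][a] * P[s][a][s2]
                          for s in range(n_s) for a in range(n_a))
            for s2 in range(n_s)
        ]
    return [[d[s] * pi[s][a] for a in range(n_a)] for s in range(n_s)]

def linear_return(mu, r):
    """Standard objective: expected reward, linear in the occupancy measure."""
    return sum(mu[s][a] * r[s][a] for s in range(len(mu)) for a in range(len(mu[0])))

def entropy(mu):
    """Diversity objective: concave in the occupancy measure, not a sum of per-step rewards."""
    return -sum(p * math.log(p) for row in mu for p in row if p > 0)

# Illustrative toy MDP (assumed values, not taken from the paper).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0 under action 0, action 1
    [[0.7, 0.3], [0.1, 0.9]],  # from state 1 under action 0, action 1
]
r = [[1.0, 0.0], [0.0, 2.0]]
mu0 = [1.0, 0.0]
gamma = 0.9

pi_a = [[1.0, 0.0], [1.0, 0.0]]  # always play action 0
pi_b = [[0.5, 0.5], [0.5, 0.5]]  # uniformly random

mu_a = occupancy_measure(P, pi_a, mu0, gamma)
mu_b = occupancy_measure(P, pi_b, mu0, gamma)

print("return:", linear_return(mu_a, r), linear_return(mu_b, r))
print("entropy:", entropy(mu_a), entropy(mu_b))
```

Averaging two occupancy measures averages their returns exactly, while the entropy of the average is at least the average of the entropies. That gap is what a reward function defined per state-action pair cannot express.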
Taxonomy
Topics: Game Theory and Applications · Evolutionary Game Theory and Cooperation
