Differentiable Arbitrating in Zero-sum Markov Games
Jing Wang, Meichen Song, Feng Gao, Boyi Liu, Zhaoran Wang, Yi Wu

TL;DR
This paper introduces a differentiable approach to perturbing the rewards of two-player zero-sum Markov games so as to steer their Nash equilibria. A novel backpropagation scheme through the equilibrium enables end-to-end optimization.
Contribution
A method for differentiating through the Nash equilibrium (NE) of a two-player zero-sum Markov game that needs only a black-box solver for the (regularized) NE. This makes it possible to optimize reward perturbations that induce a desired equilibrium.
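To make the bi-level structure concrete, one way to write it down is sketched below. The simplifying assumptions are not taken from the paper: a single-state game (a matrix game with row-player payoff matrix A_0 rather than a full Markov game), entropy regularization with temperature τ (H is Shannon entropy, Δ_m the probability simplex), an additive reward perturbation θ, and a generic upper-level loss ℓ with an L2 penalty λ. The last line is the implicit-function-theorem identity that lets a gradient pass through the output of a black-box solver without differentiating the solver's iterations.

```latex
% Assumed single-state (matrix-game) simplification with entropy regularization.
\begin{aligned}
\min_{\theta}\quad & L(\theta) = \ell\big(x^*(\theta), y^*(\theta)\big) + \tfrac{\lambda}{2}\lVert \theta \rVert_F^2 \\
\text{where}\quad & (x^*, y^*) \text{ is the saddle point of }
  \max_{x \in \Delta_m} \min_{y \in \Delta_n}\; x^\top (A_0 + \theta)\, y + \tau \mathcal{H}(x) - \tau \mathcal{H}(y), \\
\text{i.e.}\quad & x^* = \operatorname{softmax}\!\big((A_0+\theta)\, y^* / \tau\big), \qquad
                   y^* = \operatorname{softmax}\!\big(-(A_0+\theta)^\top x^* / \tau\big).
\end{aligned}

% Implicit differentiation: with z = (x, y), the fixed point reads z^* = G(z^*, \theta), so
\frac{d z^*}{d\theta} = \Big(I - \frac{\partial G}{\partial z}\Big)^{-1} \frac{\partial G}{\partial \theta},
\qquad
\nabla_\theta L = \Big(\frac{\partial G}{\partial \theta}\Big)^{\!\top} w + \lambda \theta,
\quad \text{where } \Big(I - \frac{\partial G}{\partial z}\Big)^{\!\top} w = \nabla_z \ell .
```

In this simplified setting, ∂G/∂z has zero diagonal blocks and its square has real, non-positive eigenvalues (each off-diagonal product is a PSD softmax Jacobian times a PSD matrix, up to sign). Its eigenvalues are therefore purely imaginary, so I − ∂G/∂z is always invertible and the linear solve for w is well posed.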
Findings
Empirical success in two multi-agent reinforcement learning (MARL) environments
Convergence analysis for the framework, given proper black-box NE solvers
Enables end-to-end optimization of reward perturbations
Abstract
We initiate the study of how to perturb the reward in a two-player zero-sum Markov game to induce a desirable Nash equilibrium, a problem we call arbitrating. Such a problem admits a bi-level optimization formulation. The lower level requires solving for the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize end to end. We propose a backpropagation scheme that differentiates through the Nash equilibrium and thereby provides gradient feedback to the upper level. In particular, our method only requires a black-box solver for the (regularized) Nash equilibrium (NE). We develop a convergence analysis for the proposed framework with proper black-box NE solvers and demonstrate empirical success in two multi-agent reinforcement learning (MARL) environments.
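The abstract describes the backpropagation scheme only at a high level. A minimal NumPy sketch of the general idea follows, on the simplest possible instance and under loud assumptions: a single-state game (a matrix game, not a full Markov game), entropy regularization with temperature `tau`, an additive perturbation `theta` on the row player's payoff matrix, and a squared-distance target loss. All names are hypothetical. `solve_regularized_ne` (damped smoothed best response) stands in for the black-box NE solver, `ne_vjp` applies the implicit function theorem at the solver's output, and `arbitrate` runs gradient descent on `theta`. The paper's actual algorithm for Markov games may differ.

```python
import numpy as np


def softmax(u):
    # Numerically stable softmax of a 1-D array.
    e = np.exp(u - u.max())
    return e / e.sum()


def softmax_jac(s):
    # Jacobian of softmax, expressed via its output s: diag(s) - s s^T (symmetric).
    return np.diag(s) - np.outer(s, s)


def solve_regularized_ne(A, tau=1.0, alpha=0.1, tol=1e-13, max_iter=100_000):
    """Hypothetical stand-in for a black-box solver: damped smoothed best response.

    Approximates the entropy-regularized NE of the matrix game x^T A y
    (row player maximizes, column player minimizes), i.e. the fixed point
        x = softmax(A y / tau),   y = softmax(-A^T x / tau).
    """
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Simultaneous (Jacobi) update: both players respond to the old iterate.
        x_new = (1 - alpha) * x + alpha * softmax(A @ y / tau)
        y_new = (1 - alpha) * y + alpha * softmax(-A.T @ x / tau)
        step = np.abs(x_new - x).max() + np.abs(y_new - y).max()
        x, y = x_new, y_new
        if step < tol:
            break
    return x, y


def ne_vjp(A, x, y, gx, gy, tau=1.0):
    """Map (dl/dx, dl/dy) at the NE to dl/dA using the implicit function theorem.

    With z = (x, y) and z = G(z, A) at the fixed point,
    dz/dA = (I - dG/dz)^{-1} dG/dA, so solve (I - dG/dz)^T w = g, then contract.
    """
    m, n = A.shape
    Jx, Jy = softmax_jac(x), softmax_jac(y)
    M = np.zeros((m + n, m + n))
    M[:m, m:] = Jx @ A / tau       # dG_x/dy
    M[m:, :m] = -Jy @ A.T / tau    # dG_y/dx
    w = np.linalg.solve((np.eye(m + n) - M).T, np.concatenate([gx, gy]))
    wx, wy = w[:m], w[m:]
    # dG_x/dA contributes (Jx wx) y^T / tau; dG_y/dA contributes -x (Jy wy)^T / tau.
    return (np.outer(Jx @ wx, y) - np.outer(x, Jy @ wy)) / tau


def arbitrate(A0, x_target, y_target, lam=0.01, lr=1.0, steps=200, tau=1.0):
    """Gradient descent on an additive reward perturbation theta (assumed objective)."""
    theta = np.zeros_like(A0)
    history = []
    for _ in range(steps):
        A = A0 + theta
        x, y = solve_regularized_ne(A, tau)            # lower level (black box)
        loss = (np.sum((x - x_target) ** 2) + np.sum((y - y_target) ** 2)
                + 0.5 * lam * np.sum(theta ** 2))
        history.append(loss)
        # dA/dtheta = I, so dl/dA equals dl/dtheta; the L2 term is added separately.
        grad = ne_vjp(A, x, y, 2 * (x - x_target), 2 * (y - y_target), tau)
        theta -= lr * (grad + lam * theta)
    return theta, history


if __name__ == "__main__":
    # Rock-paper-scissors: the unperturbed regularized NE is uniform for both players.
    A0 = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
    target = np.array([0.6, 0.2, 0.2])  # hypothetical desired equilibrium
    theta, history = arbitrate(A0, target, target)
    print(f"upper-level loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

The backward pass uses only the solver's output (x, y), never its iterations. This is what makes a black-box solver usable: any method that returns an accurate regularized NE could replace `solve_regularized_ne`. The damping factor `alpha = 0.1` is an assumption chosen for stability on small payoffs. A finite-difference comparison of `ne_vjp` against re-solving the perturbed game is a cheap way to check such a sketch.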
Taxonomy
Topics: Reinforcement Learning in Robotics
