Differentiable Arbitrating in Zero-sum Markov Games

Jing Wang; Meichen Song; Feng Gao; Boyi Liu; Zhaoran Wang; Yi Wu

arXiv:2302.10058·cs.MA·February 21, 2023

Open Access

TL;DR

This paper introduces a differentiable method for perturbing the reward of a two-player zero-sum Markov game so that the resulting Nash equilibrium is a desired one. A backpropagation scheme that differentiates through the equilibrium lets the reward perturbation be optimized end to end.

Contribution

It proposes a method for differentiating through the Nash equilibrium (NE) of a zero-sum Markov game, so that a reward perturbation can be optimized toward a desired equilibrium. The method requires only a black-box solver for the (regularized) NE.
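As a reading aid, the bi-level structure behind this contribution can be written out as below. The notation is an assumption made for illustration and is not taken from the paper: \theta parameterizes the reward perturbation, r_\theta is the perturbed reward, \mu is the maximizing player's policy, \nu is the minimizing player's policy, V is the value of a policy pair, and \ell is an upper-level loss that measures how far the induced equilibrium is from the desired one.

```latex
% Upper level: choose the perturbation; lower level: the NE of the perturbed game.
\min_{\theta}\; \ell\bigl(\mu^*_\theta, \nu^*_\theta\bigr)
\quad \text{s.t.} \quad
V^{\mu,\,\nu^*_\theta}_{r_\theta}
\;\le\;
V^{\mu^*_\theta,\,\nu^*_\theta}_{r_\theta}
\;\le\;
V^{\mu^*_\theta,\,\nu}_{r_\theta}
\quad \text{for all policies } \mu,\ \nu .

% Gradient feedback for the upper level (chain rule through the equilibrium):
\frac{d\ell}{d\theta}
= \frac{\partial \ell}{\partial \mu}\,\frac{d\mu^*_\theta}{d\theta}
+ \frac{\partial \ell}{\partial \nu}\,\frac{d\nu^*_\theta}{d\theta} .
```

The terms d\mu^*_\theta/d\theta and d\nu^*_\theta/d\theta are what a scheme that differentiates through the equilibrium has to supply, since the equilibrium itself is only available through a solver.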

Findings

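01. Empirical success in two multi-agent reinforcement learning (MARL) environments

02. Convergence analysis for the framework with proper black-box NE solvers

03. End-to-end optimization of reward perturbations via gradients through the equilibrium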

Abstract

We initiate the study of how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, a task we call arbitrating. Such a problem admits a bi-level optimization formulation. The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way. We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level. In particular, our method only requires a black-box solver for the (regularized) Nash equilibrium (NE). We develop the convergence analysis for the proposed framework with proper black-box NE solvers and demonstrate empirical success in two multi-agent reinforcement learning (MARL) environments.
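To make the idea of differentiating through an equilibrium concrete, here is a minimal sketch on a much simpler problem than the paper's: a single-state, 2x2 zero-sum matrix game with entropy regularization. It is not the authors' algorithm. The lower level is a damped fixed-point iteration used as a stand-in black-box solver, and the upper-level gradient is obtained by implicit differentiation of the fixed-point condition, which is one standard way to backpropagate through a solver's output. All names (`solve_regularized_ne`, `grad_through_ne`, `arbitrate`), the temperature `TAU`, the squared-distance loss, and the step sizes are assumptions made for illustration.

```python
import math

TAU = 1.0  # entropy-regularization temperature (assumed value)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def br_map(p, q, A, tau=TAU):
    """Smoothed best responses. p = P(row 0) for the maximizer,
    q = P(column 0) for the minimizer, A = 2x2 payoff to the row player."""
    du = (A[0][0] - A[1][0]) * q + (A[0][1] - A[1][1]) * (1.0 - q)
    dc = (A[0][0] - A[0][1]) * p + (A[1][0] - A[1][1]) * (1.0 - p)
    return sigmoid(du / tau), sigmoid(-dc / tau)

def solve_regularized_ne(A, tau=TAU, step=0.2, tol=1e-12, max_iter=10000):
    """Stand-in black-box lower-level solver: damped fixed-point iteration
    on the smoothed best-response map."""
    p, q = 0.5, 0.5
    for _ in range(max_iter):
        fp, fq = br_map(p, q, A, tau)
        dp, dq = step * (fp - p), step * (fq - q)
        p, q = p + dp, q + dq
        if abs(dp) + abs(dq) < tol:
            break
    return p, q

def loss(p, q, target):
    """Upper-level loss: squared distance to the desired equilibrium."""
    return 0.5 * ((p - target[0]) ** 2 + (q - target[1]) ** 2)

def grad_through_ne(A, target, tau=TAU):
    """d loss / d A via implicit differentiation of z = F(z, A), z = (p, q).
    Only the solver's output is used, not its internal iterates."""
    p, q = solve_regularized_ne(A, tau)
    sp, sq = p * (1.0 - p) / tau, q * (1.0 - q) / tau
    d = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    a, b = sp * d, sq * d              # dF_p/dq = a, dF_q/dp = -b
    gp, gq = p - target[0], q - target[1]
    det = 1.0 + a * b                  # det(I - J), always >= 1 here
    lam_p = (gp - b * gq) / det        # adjoint: (I - J)^T lam = dloss/dz
    lam_q = (a * gp + gq) / det
    x, y, sign = (p, 1.0 - p), (q, 1.0 - q), (1.0, -1.0)
    # dF_p/dA[i][j] = sp * sign[i] * y[j];  dF_q/dA[i][j] = -sq * sign[j] * x[i]
    return [[lam_p * sp * sign[i] * y[j] - lam_q * sq * sign[j] * x[i]
             for j in range(2)] for i in range(2)]

def arbitrate(A0, target, lr=2.0, steps=200):
    """Upper level: gradient descent on an additive payoff perturbation."""
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        A = [[A0[i][j] + theta[i][j] for j in range(2)] for i in range(2)]
        g = grad_through_ne(A, target)
        theta = [[theta[i][j] - lr * g[i][j] for j in range(2)] for i in range(2)]
    return theta

if __name__ == "__main__":
    A0 = [[1.0, -1.0], [-1.0, 1.0]]    # matching pennies
    target = (0.7, 0.4)                # desired (p, q), chosen arbitrarily
    theta = arbitrate(A0, target)
    A = [[A0[i][j] + theta[i][j] for j in range(2)] for i in range(2)]
    print("perturbation:", theta)
    print("induced equilibrium:", solve_regularized_ne(A))
```

The gradient routine calls the solver once and then solves a small linear system, so the solver can be swapped for any other method that returns the regularized equilibrium. Extending this to Markov games with many states, which is the setting the abstract addresses, is beyond what the sketch covers.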

Taxonomy

Topics: Reinforcement Learning in Robotics