Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?
Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

TL;DR
This paper introduces the first provably efficient reinforcement learning algorithms for finding Stackelberg-Nash equilibria in general-sum Markov games with myopic followers, applicable in large state spaces with function approximation.
Contribution
Develops sample-efficient RL algorithms, with theoretical guarantees, for computing Stackelberg-Nash equilibria (SNEs) in general-sum Markov games with myopic followers.
Findings
With linear function approximation, the algorithms achieve sublinear regret in the online setting and a suboptimality guarantee in the offline setting.
First provably efficient RL methods for SNE in general-sum Markov games with myopic followers.
Applicable to large state spaces with function approximation.
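The algorithms are described as optimistic and pessimistic variants of least-squares value iteration (LSVI) with linear function approximation. Optimism is the usual choice for online exploration and pessimism for offline data. Below is a minimal sketch of the generic regression-plus-bonus step that such methods share, written in the style of LSVI-UCB with a feature vector `phi`. The names `lsvi_step` and `solve`, the regularizer `lam`, and the bonus scale `beta` are hypothetical. The paper's exact estimator, its bonus, and the Stackelberg-Nash backup that replaces the usual max over actions are not reproduced here, and the usual clipping of Q-values to the horizon is omitted.

```python
import math


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting (M is not modified)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def lsvi_step(features, targets, lam=1.0, beta=1.0, optimistic=True):
    """One ridge-regression step of LSVI with an elliptical confidence bonus.

    Hypothetical inputs: features[k] = phi(s_k, a_k, b_k) and
    targets[k] = r_k + V_next(s'_k). The bonus is added (optimism)
    or subtracted (pessimism).
    """
    d = len(features[0])
    # Gram matrix Lambda = lam * I + sum_k phi_k phi_k^T
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d  # sum_k phi_k * y_k
    for phi, y in zip(features, targets):
        for i in range(d):
            rhs[i] += phi[i] * y
            for j in range(d):
                gram[i][j] += phi[i] * phi[j]
    w = solve(gram, rhs)  # ridge-regression weights Lambda^{-1} * rhs
    sign = 1.0 if optimistic else -1.0

    def q(phi):
        mean = sum(wi * pi for wi, pi in zip(w, phi))
        # Bonus width sqrt(phi^T Lambda^{-1} phi), via a linear solve instead of an explicit inverse.
        width = math.sqrt(sum(pi * zi for pi, zi in zip(phi, solve(gram, phi))))
        return mean + sign * beta * width

    return w, q


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ys = [1.0, 2.0, 1.0]
w, q_opt = lsvi_step(feats, ys, optimistic=True)
_, q_pes = lsvi_step(feats, ys, optimistic=False)
print(w)  # ≈ [0.667, 1.0]
print(q_opt([1.0, 0.0]), q_pes([1.0, 0.0]))
```

The only difference between the two variants in this sketch is the sign of the bonus, so the same regression serves both the online (optimistic) and offline (pessimistic) cases.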
Abstract
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers. In particular, we focus on the class of games where the followers are myopic, i.e., they aim to maximize their instantaneous rewards. For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$ such that (i) $\pi^*$ is the optimal policy for the leader when the followers always play their best response, and (ii) $\nu^*$ is the best response policy of the followers, which is a Nash equilibrium of the followers' game induced by $\pi^*$. We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings. Our algorithms are optimistic and pessimistic variants of least-squares value iteration, and they are readily able to incorporate…
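To make the SNE definition concrete, here is a minimal planning sketch. It assumes a small tabular game with a known model, a finite horizon `H`, and a single myopic follower, so the followers' Nash equilibrium reduces to one best response. This is not the paper's learning algorithm, which must estimate rewards and transitions from data. The function name `stackelberg_nash_backward_induction`, the toy game, and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List, Tuple


def stackelberg_nash_backward_induction(H, states, A, B, r_l, r_f, P):
    """Exact SNE by backward induction: known tabular model, ONE myopic follower (assumption)."""
    V = {s: 0.0 for s in states}  # terminal values V_{H+1}(s) = 0
    policy: List[Dict[int, Tuple[int, int]]] = []
    for _ in range(H):  # steps h = H, H-1, ..., 1
        V_new, pol_h = {}, {}
        for s in states:
            best = None
            for a in A:
                # Myopic follower: its response to (s, a) maximizes the instantaneous
                # reward only; max() breaks ties toward the first action in B.
                b = max(B, key=lambda bb: r_f[s][a][bb])
                # Leader's Q-value when the follower plays b, plus step-(h+1) values.
                q = r_l[s][a][b] + sum(p * V[s2] for s2, p in P[s][a][b].items())
                if best is None or q > best[0]:
                    best = (q, a, b)
            V_new[s], pol_h[s] = best[0], (best[1], best[2])
        V = V_new
        policy.append(pol_h)
    policy.reverse()  # policy[h] is now the decision rule for step h (0-indexed)
    return V, policy


states, A, B, H = [0, 1], [0, 1], [0, 1], 2
# Follower earns 1 for matching the leader's action, 0 otherwise.
r_f = {s: {a: {b: 1.0 if b == a else 0.0 for b in B} for a in A} for s in states}
# Leader would prefer the mismatched outcomes (reward 5), but the follower never plays them.
r_l = {0: {0: {0: 1.0, 1: 5.0}, 1: {0: 5.0, 1: 0.0}},
       1: {0: {0: 0.0, 1: 5.0}, 1: {0: 5.0, 1: 3.0}}}
# Deterministic transitions: the leader's action selects the next state.
P = {s: {a: {b: {a: 1.0} for b in B} for a in A} for s in states}

V, policy = stackelberg_nash_backward_induction(H, states, A, B, r_l, r_f, P)
print(V)       # {0: 3.0, 1: 6.0}
print(policy)  # [{0: (1, 1), 1: (1, 1)}, {0: (0, 0), 1: (1, 1)}]
```

Because the follower reacts only to the current state and the leader's action, the leader's problem in this sketch becomes ordinary backward induction over its own actions with the follower's response plugged in. In the toy game, the leader accepts reward 0 instead of 1 at the first step in state 0 to reach state 1, and the reward-5 outcomes are never reached because the follower always matches the leader. With several followers, the `max` over `B` would be replaced by computing a Nash equilibrium of the followers' stage game at each `(s, a)`.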
Taxonomy
Topics: Game Theory and Applications · Reinforcement Learning in Robotics · Opinion Dynamics and Social Influence
