Independent and Decentralized Learning in Markov Potential Games
Chinmay Maheshwari, Manxi Wu, Druv Pai, Shankar Sastry

TL;DR
This paper analyzes the long-term behavior of independent, decentralized multi-agent reinforcement learning in Markov potential games and establishes convergence using two-timescale stochastic approximation.
Contribution
It introduces a novel learning dynamics inspired by actor-critic algorithms for decentralized multi-agent settings and characterizes its convergence in Markov potential games.
Findings
Convergence of the proposed learning dynamics is established.
Agents can learn optimal policies without communication or game parameter knowledge.
The analysis leverages two-timescale stochastic approximation theory.
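For orientation, the textbook step-size conditions for two-timescale stochastic approximation can be written as below. This is a sketch of the standard setting, not a statement of the paper's exact assumptions; the symbols $\alpha_n$ (fast step size, here for the Q-function estimates) and $\beta_n$ (slow step size, here for the policies) are notation assumed for illustration.

```latex
\[
\sum_{n} \alpha_n = \sum_{n} \beta_n = \infty,
\qquad
\sum_{n} \left( \alpha_n^2 + \beta_n^2 \right) < \infty,
\qquad
\lim_{n \to \infty} \frac{\beta_n}{\alpha_n} = 0 .
\]
```

The last condition is what separates the timescales: from the viewpoint of the fast iterate the slow one looks nearly frozen, while the slow iterate sees the fast one as already equilibrated.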
Abstract
We study a multi-agent reinforcement learning dynamics, and analyze its asymptotic behavior in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized setting, where players do not know the game parameters, and cannot communicate or coordinate. In each stage, players update their estimate of the Q-function that evaluates their total contingent payoff based on the realized one-stage reward in an asynchronous manner. Then, players independently update their policies by incorporating an optimal one-stage deviation strategy based on the estimated Q-function. Inspired by the actor-critic algorithm in single-agent reinforcement learning, a key feature of our learning dynamics is that agents update their Q-function estimates at a faster timescale than the policies. Leveraging tools from two-timescale asynchronous stochastic approximation theory, we…
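The update pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the two-state identical-interest game, the step-size schedules, the exploration weight `eps`, and every name (`run_sketch`, `reward`, `next_state`) are assumptions made for illustration. Each player sees only the state, its own action, and the realized reward; it updates its Q-estimate for the visited state-action pair with a fast step size, then nudges its policy toward a greedy one-stage deviation with a slower step size.

```python
import numpy as np


def run_sketch(num_steps=5000, gamma=0.9, eps=0.1, seed=0):
    """Two-timescale independent learning on a toy game (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_players, n_states, n_actions = 2, 2, 2

    # Hypothetical identical-interest game (a simple Markov potential game):
    # both players receive reward 1 when their actions match, else 0.
    def reward(actions):
        return 1.0 if actions[0] == actions[1] else 0.0

    # Hypothetical dynamics: the next state is drawn uniformly at random.
    def next_state():
        return int(rng.integers(n_states))

    q = np.zeros((n_players, n_states, n_actions))                    # critics
    pi = np.full((n_players, n_states, n_actions), 1.0 / n_actions)   # actors
    sa_visits = np.zeros((n_players, n_states, n_actions))
    s_visits = np.zeros(n_states)
    uniform = np.full(n_actions, 1.0 / n_actions)

    state = 0
    for _ in range(num_steps):
        # Each player samples its action independently from its own policy.
        actions = [int(rng.choice(n_actions, p=pi[i, state]))
                   for i in range(n_players)]
        r = reward(actions)
        s_next = next_state()
        s_visits[state] += 1

        for i in range(n_players):
            a = actions[i]
            sa_visits[i, state, a] += 1
            alpha = sa_visits[i, state, a] ** -0.6   # fast step size (assumed)
            beta = 1.0 / (s_visits[state] + 1.0)     # slow step size (assumed)

            # Critic: asynchronous update of the visited (state, action) only.
            v_next = pi[i, s_next] @ q[i, s_next]
            q[i, state, a] += alpha * (r + gamma * v_next - q[i, state, a])

            # Actor: move toward a greedy one-stage deviation, mixed with
            # uniform exploration so every action keeps positive probability.
            greedy = np.zeros(n_actions)
            greedy[int(np.argmax(q[i, state]))] = 1.0
            target = (1.0 - eps) * greedy + eps * uniform
            pi[i, state] += beta * (target - pi[i, state])

        state = s_next

    return q, pi


if __name__ == "__main__":
    q, pi = run_sketch()
    print("policies:\n", pi)
```

Because `beta` shrinks faster than `alpha`, the policies move slowly relative to the Q-estimates, which mirrors the actor-critic separation the abstract describes. The policy step is a convex combination, so each row of `pi` remains a probability distribution.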
Taxonomy
Topics: Game Theory and Applications · Experimental Behavioral Economics Studies · Economic Policies and Impacts
