Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games
Wonseok Yang, Thinh T. Doan

TL;DR
This paper introduces an internal state-based policy gradient method for multi-agent reinforcement learning in partially observable Markov potential games, providing convergence guarantees and demonstrating improved performance with finite-state controllers.
Contribution
The paper develops a novel internal state-based natural policy gradient approach with non-asymptotic convergence bounds for partially observable Markov potential games.
Findings
The method achieves consistent performance improvements over observation-only approaches.
Theoretical convergence bounds decompose into statistical and approximation errors.
Simulations validate the effectiveness of finite-state controllers in complex environments.
Abstract
This letter studies multi-agent reinforcement learning in partially observable Markov potential games. Solving this problem is challenging due to partial observability, decentralized information, and the curse of dimensionality. First, to address the first two challenges, we leverage the common information framework, which allows agents to act based on both shared and local information. Second, to ensure tractability, we study an internal state that compresses accumulated information, preventing it from growing unboundedly over time. We then implement an internal state-based natural policy gradient method to find Nash equilibria of the Markov potential game. Our main contribution is to establish a non-asymptotic convergence bound for this method. Our theoretical bound decomposes into two interpretable components: a statistical error term that also arises in standard Markov potential…
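As a rough illustration of the internal-state idea described in the abstract, the sketch below shows one agent whose policy depends only on a bounded internal state rather than on its full history. It is not the authors' algorithm: the class name `FiniteStateController`, the fixed transition table, and the externally supplied `advantage` table are all hypothetical. The update uses the standard fact that, for a tabular softmax policy, a natural policy gradient step amounts to adding a scaled advantage to the logits.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FiniteStateController:
    """Hypothetical agent acting on a bounded internal state (illustration only)."""

    def __init__(self, n_internal, n_obs, n_actions, seed=0):
        self.n_internal = n_internal
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        # Assumed, hand-picked update rule: (internal state, observation) -> next state.
        # The table has fixed size, so memory does not grow with the horizon.
        self.transition = [
            [(z + o + 1) % n_internal for o in range(n_obs)]
            for z in range(n_internal)
        ]
        # Tabular softmax logits, one row per internal state.
        self.theta = [[0.0] * n_actions for _ in range(n_internal)]
        self.z = 0  # current internal state

    def policy(self, z):
        """Action distribution conditioned on internal state z."""
        return softmax(self.theta[z])

    def act(self):
        """Sample an action from the policy at the current internal state."""
        probs = self.policy(self.z)
        return self.rng.choices(range(self.n_actions), weights=probs)[0]

    def observe(self, obs):
        """Compress the new observation into the internal state."""
        self.z = self.transition[self.z][obs]

    def npg_step(self, advantage, step_size):
        """Tabular-softmax NPG-style step: add the scaled advantage to the logits.

        `advantage[z][a]` is assumed to be estimated elsewhere.
        """
        for z in range(self.n_internal):
            for a in range(self.n_actions):
                self.theta[z][a] += step_size * advantage[z][a]
```

A short usage example under the same assumptions: the internal state stays within a fixed range no matter how many observations arrive, and a positive advantage shifts probability toward the corresponding action.

```python
agent = FiniteStateController(n_internal=3, n_obs=2, n_actions=2)
for obs in [0, 1, 1, 0, 1]:
    agent.observe(obs)
    action = agent.act()

# Made-up advantage table favouring action 1 in internal state 0.
adv = [[-1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
agent.npg_step(adv, step_size=0.5)
```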
