Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning
Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang

TL;DR
This paper investigates the theoretical foundations of value decomposition algorithms in cooperative multi-agent reinforcement learning, identifying conditions for their convergence and introducing the concept of decomposable games.
Contribution
It introduces the concept of decomposable games, proves convergence of MA-FQI in these games, and analyzes convergence in non-decomposable games with neural network function approximation.
Findings
MA-FQI converges to the optimal Q-function in decomposable games.
In non-decomposable games, MA-FQI can still converge, provided the Q-function is projected onto the decomposable function space at each iteration.
Provides theoretical insights into when and why value decomposition algorithms succeed.
Abstract
Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games. For such a decomposition rule to hold, the assumption of the individual-global max (IGM) principle must be made; that is, the local maxima of the decomposed value function for every agent must amount to the global maximum of the joint value function. This principle, however, does not have to hold in general. As a result, the applicability of value decomposition algorithms is concealed and their corresponding convergence properties remain unknown. In this paper, we make the first effort to answer these questions. Specifically, we introduce the set of cooperative games in which the value decomposition methods find their validity, which are referred to as decomposable games. In decomposable games, we theoretically prove that applying the multi-agent…
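The IGM condition described in the abstract can be checked by hand on a tiny one-step, two-agent game. The sketch below is illustrative only and is not taken from the paper: the helper names (`additive_joint`, `satisfies_igm`), the payoff tables, and the additive (sum-of-local-values) decomposition are all assumptions chosen to keep the example small.

```python
from itertools import product

def greedy_local_actions(local_qs):
    """Each agent independently picks the argmax of its own local Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_qs)

def additive_joint(local_qs):
    """Build a joint Q-table as the sum of local Q-values (an assumed, VDN-style decomposition)."""
    action_ranges = (range(len(q)) for q in local_qs)
    return {joint: sum(q[a] for q, a in zip(local_qs, joint))
            for joint in product(*action_ranges)}

def satisfies_igm(q_joint, local_qs):
    """IGM holds here if the locally greedy joint action attains the joint maximum."""
    local = greedy_local_actions(local_qs)
    return q_joint[local] == max(q_joint.values())

# Case 1: the joint Q is exactly a sum of local Qs, so local greedy choices are jointly optimal.
local_qs = [[1.0, 3.0], [2.0, 0.5]]
q_add = additive_joint(local_qs)
igm_additive = satisfies_igm(q_add, local_qs)

# Case 2: a hypothetical coordination payoff paired with mismatched local estimates.
# Local greedy picks (0, 1), which scores 0.0, while the joint optimum is (1, 1) with 5.0.
q_coord = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 5.0}
mismatched_local_qs = [[1.0, 0.0], [0.0, 1.0]]
igm_coord = satisfies_igm(q_coord, mismatched_local_qs)

print(igm_additive, igm_coord)
```

In the first case the per-agent argmaxes recover the joint argmax; in the second they do not, which is the kind of failure that makes the applicability of value decomposition depend on the structure of the game.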
Taxonomy
Topics: Reinforcement Learning in Robotics
