Multi-agent Markov Entanglement
Shuze Chen, Tianyi Peng

TL;DR
This paper reveals that the effectiveness of value decomposition in multi-agent RL hinges on a property called Markov entanglement, which parallels quantum entanglement, and introduces measures to quantify and bound the decomposition error.
Contribution
It establishes a mathematical link between value decomposition and Markov entanglement, providing a new theoretical framework and practical tools for multi-agent RL.
Findings
Value decomposition is effective when Markov entanglement is low.
Index policies are shown to be weakly entangled with sublinear decomposition error.
A practical method is provided to estimate Markov entanglement in multi-agent systems.
Abstract
Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1, s_2, \ldots, s_N)$ is often approximated as the sum of local functions: $V(s_1, s_2, \ldots, s_N) \approx \sum_{i=1}^{N} V_i(s_i)$. This approach traces back to the index policy in restless multi-armed bandit problems and has found various applications in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a multi-agent Markov decision process (MDP) permits value decomposition if and only if its transition matrix is not "entangled" -- a concept analogous to quantum entanglement in quantum physics. Drawing inspiration from…
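As a small illustration of the decomposition described in the abstract, the sketch below checks the simplest unentangled case: two agents whose joint transition matrix is a plain tensor product of local matrices, with additive rewards. The matrices `P1` and `P2`, the rewards `r1` and `r2`, the discount `GAMMA`, and the helper `evaluate` are all hypothetical choices made for this example and are not taken from the paper. The example does not compute the paper's entanglement measure.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor

def evaluate(P, r, gamma=GAMMA, iters=2000):
    """Iterative policy evaluation: repeat V <- r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

# Hypothetical local Markov chains (under a fixed policy) and local rewards.
P1 = [[0.9, 0.1], [0.2, 0.8]]
P2 = [[0.5, 0.5], [0.3, 0.7]]
r1 = [1.0, 0.0]
r2 = [0.0, 2.0]

# Global states are pairs (s1, s2); the agents evolve independently, so the
# joint transition probability is the product of the local ones.
states = list(product(range(2), range(2)))
P = [[P1[a][c] * P2[b][d] for (c, d) in states] for (a, b) in states]
r = [r1[a] + r2[b] for (a, b) in states]

V = evaluate(P, r)     # global value function
V1 = evaluate(P1, r1)  # local value function of agent 1
V2 = evaluate(P2, r2)  # local value function of agent 2

# Largest deviation between V(s1, s2) and V1(s1) + V2(s2).
max_gap = max(abs(V[i] - (V1[a] + V2[b]))
              for i, (a, b) in enumerate(states))
print(max_gap)
```

In this independent case the gap is zero up to floating-point error, because the additive structure is preserved at every evaluation step. Coupled transitions would generally break this equality, which is the situation the paper's entanglement measure is meant to quantify.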
Taxonomy
Topics: Opinion Dynamics and Social Influence
