Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)
Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T.J. Spaan, Mathijs M. de Weerdt

TL;DR
This paper introduces CoRe, a branch-and-bound policy search algorithm for transition-independent multi-agent MDPs with sparse interactions. CoRe builds on conditional return graphs (CRGs), a compact representation of reward dependencies between agents, and finds solutions to problems that were previously unsolvable.
Contribution
The paper presents a novel optimal solver for transition-independent multi-agent MDPs using conditional return graphs, improving efficiency and scalability over existing methods.
Findings
CoRe typically requires less runtime than the available alternatives.
It finds solutions to problems that were previously unsolvable.
It provides tight bounds for partially specified policies.
Abstract
In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to problems previously unsolvable.
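The branch-and-bound idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' CoRe algorithm and it does not build CRGs; it is a generic branch-and-bound search on a hypothetical toy problem, assuming two agents, deterministic local transitions, and a single negative interaction reward. The names HORIZON, LOCAL, PENALTY, step_reward, and upper_bound are all invented for illustration. The sketch shows how an optimistic bound on a partially specified joint plan lets the search discard branches without expanding them.

```python
from itertools import product

HORIZON = 3
ACTIONS = ("idle", "work")
# Hypothetical local rewards: LOCAL[agent][t] is the reward for working at step t.
LOCAL = [(4, 2, 1), (3, 5, 1)]
# Hypothetical interaction reward, applied when both agents work in the same step.
PENALTY = -6


def step_reward(t, joint_action):
    """Sum of local rewards plus the interaction term for one joint step."""
    r = sum(LOCAL[i][t] for i, a in enumerate(joint_action) if a == "work")
    if all(a == "work" for a in joint_action):
        r += PENALTY
    return r


def upper_bound(t):
    """Optimistic value of steps t..HORIZON-1, ignoring the (negative) penalty."""
    return sum(max(0, LOCAL[i][s])
               for i in range(len(LOCAL))
               for s in range(t, HORIZON))


def branch_and_bound():
    """Depth-first search over joint actions with bound-based pruning."""
    best = {"value": float("-inf"), "plan": None, "expanded": 0}

    def search(t, value, plan):
        if t == HORIZON:
            if value > best["value"]:
                best["value"], best["plan"] = value, tuple(plan)
            return
        # Prune: even the optimistic completion cannot beat the incumbent.
        if value + upper_bound(t) <= best["value"]:
            return
        best["expanded"] += 1
        for joint in product(ACTIONS, repeat=len(LOCAL)):
            search(t + 1, value + step_reward(t, joint), plan + [joint])

    search(0, 0, [])
    return best


def brute_force():
    """Reference answer: enumerate every joint plan without pruning."""
    plans = product(product(ACTIONS, repeat=len(LOCAL)), repeat=HORIZON)
    return max(sum(step_reward(t, j) for t, j in enumerate(p)) for p in plans)


if __name__ == "__main__":
    result = branch_and_bound()
    print(result["value"], result["plan"], result["expanded"])
    print(brute_force())
```

The bound is valid here only because the assumed interaction reward is never positive, so dropping it can only overestimate the remaining value. A real solver would need a bound that also accounts for stochastic transitions and positive interactions.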
Taxonomy
Topics: Reinforcement Learning in Robotics · Multi-Agent Systems and Negotiation · Optimization and Search Problems
