Cluster-Based Control of Transition-Independent MDPs
Carmel Fiscko, Soummya Kar, Bruno Sinopoli

TL;DR
This paper introduces a scalable clustering-based control method for transition-independent MDPs, improving policy computation efficiency in multi-agent systems with exponential action spaces.
Contribution
It proposes a clustered Bellman operator and the CVI algorithm, which accelerate policy computation and guarantee optimality for certain reward structures.
Findings
CVI converges exponentially faster than standard VI.
CVI can find policies close to the true optimum.
The greedy clustering algorithm improves value monotonically.
Abstract
This work studies efficient solution methods for cluster-based control policies of transition-independent Markov decision processes (TI-MDPs). We focus on control of multi-agent systems, whereby a central planner (CP) influences agents to select desirable group behavior. The agents are partitioned into disjoint clusters whereby agents in the same cluster receive the same controls but agents in different clusters may receive different controls. Under mild assumptions, this process can be modeled as a TI-MDP where each factor describes the behavior of one cluster. The action space of the TI-MDP becomes exponential with respect to the number of clusters. To efficiently find a policy in this rapidly scaling space, we propose a clustered Bellman operator that optimizes over the action space for one cluster at any evaluation. We present Clustered Value Iteration (CVI), which uses this…
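The idea of optimizing over one cluster's actions per evaluation can be illustrated with a small sketch. This is not the authors' implementation: the cyclic cluster ordering, the stored per-state policy that supplies the other clusters' actions, the state-only reward, and all names (`random_ti_mdp`, `clustered_value_iteration`, `value_iteration`) are assumptions made for illustration. The sketch compares a joint-action backup, which scans `n_actions ** C` candidates per state, with a clustered backup that scans only `n_actions`.

```python
import itertools
import random


def random_ti_mdp(C=2, n_states=2, n_actions=2, seed=0):
    """Build a toy TI-MDP (hypothetical helper): P[c][a][i][j] is cluster c's
    probability of moving from local state i to j under local action a."""
    rng = random.Random(seed)
    P = []
    for _ in range(C):
        per_action = []
        for _ in range(n_actions):
            rows = []
            for _ in range(n_states):
                w = [rng.random() + 0.1 for _ in range(n_states)]
                total = sum(w)
                rows.append([x / total for x in w])
            per_action.append(rows)
        P.append(per_action)
    states = list(itertools.product(range(n_states), repeat=C))
    # Assumed reward: depends on the joint state only and is nonnegative.
    R = {s: float(sum(s)) for s in states}
    return P, R


def joint_transition(P, s, a, s_next):
    """Transition independence: the joint probability factors over clusters."""
    prob = 1.0
    for c in range(len(P)):
        prob *= P[c][a[c]][s[c]][s_next[c]]
    return prob


def q_value(P, R, V, states, s, a, gamma):
    """One-step lookahead value of joint action a in joint state s."""
    return R[s] + gamma * sum(
        joint_transition(P, s, a, s2) * V[s2] for s2 in states
    )


def value_iteration(P, R, n_states, n_actions, gamma=0.9, sweeps=100):
    """Standard VI: every backup maximizes over all n_actions ** C joint actions."""
    C = len(P)
    states = list(itertools.product(range(n_states), repeat=C))
    actions = list(itertools.product(range(n_actions), repeat=C))
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: max(q_value(P, R, V, states, s, a, gamma) for a in actions)
             for s in states}
    return V


def clustered_value_iteration(P, R, n_states, n_actions, gamma=0.9, sweeps=100):
    """Clustered backup (sketch): each sweep optimizes one cluster's action and
    holds the other clusters' actions at their currently stored values."""
    C = len(P)
    states = list(itertools.product(range(n_states), repeat=C))
    V = {s: 0.0 for s in states}
    policy = {s: [0] * C for s in states}
    for k in range(sweeps):
        c = k % C  # assumed schedule: cycle through clusters in order
        new_V = {}
        for s in states:
            best_val, best_act = None, None
            for a_c in range(n_actions):
                a = list(policy[s])
                a[c] = a_c
                q = q_value(P, R, V, states, s, a, gamma)
                if best_val is None or q > best_val:
                    best_val, best_act = q, a_c
            policy[s][c] = best_act
            new_V[s] = best_val
        V = new_V
    return V, policy
```

Because the clustered backup maximizes over a subset of the joint actions, its iterates started from zero never exceed those of the joint backup after the same number of sweeps. How close the two get depends on the reward structure, which this toy example does not attempt to characterize.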
Taxonomy
Topics: Reinforcement Learning in Robotics
