Cluster-Based Control of Transition-Independent MDPs

Carmel Fiscko; Soummya Kar; Bruno Sinopoli

arXiv:2207.05224 · eess.SY · January 27, 2023 · 1 citation

TL;DR

This paper introduces a scalable clustering-based control method for transition-independent MDPs, improving policy computation efficiency in multi-agent systems with exponential action spaces.

Contribution

It proposes a clustered Bellman operator and the Clustered Value Iteration (CVI) algorithm, which accelerate policy computation and guarantee optimality for certain reward structures.

Findings

01. CVI converges exponentially faster than standard value iteration (VI).

02. CVI can find policies whose value is close to the true optimum.

03. The greedy clustering algorithm improves value monotonically.

Abstract

This work studies efficient solution methods for cluster-based control policies of transition-independent Markov decision processes (TI-MDPs). We focus on control of multi-agent systems, whereby a central planner (CP) influences agents to select desirable group behavior. The agents are partitioned into disjoint clusters whereby agents in the same cluster receive the same controls but agents in different clusters may receive different controls. Under mild assumptions, this process can be modeled as a TI-MDP where each factor describes the behavior of one cluster. The action space of the TI-MDP becomes exponential with respect to the number of clusters. To efficiently find a policy in this rapidly scaling space, we propose a clustered Bellman operator that optimizes over the action space for one cluster at any evaluation. We present Clustered Value Iteration (CVI), which uses this…
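
The idea of optimizing over one cluster's actions per update can be illustrated with a small sketch. This is not the authors' implementation: the function names, the state-only reward, the random toy problem, and the rule of holding the other clusters' actions at the current policy are all assumptions made for illustration. Each cluster is given its own transition matrices, and the joint next-state distribution is their Kronecker product, which is one way to encode transition independence.

```python
import itertools
import numpy as np


def joint_states(P):
    """All joint states, ordered so the last cluster's state varies fastest."""
    return list(itertools.product(*[range(P_c.shape[1]) for P_c in P]))


def next_state_row(P, s, a):
    """Joint next-state distribution; it factorizes across clusters."""
    row = np.ones(1)
    for P_c, s_c, a_c in zip(P, s, a):
        row = np.kron(row, P_c[a_c, s_c])
    return row


def value_iteration(P, r, gamma=0.9, iters=200):
    """Standard VI: every update searches the full joint action space."""
    states = joint_states(P)
    joint_actions = list(itertools.product(*[range(P_c.shape[0]) for P_c in P]))
    V = np.zeros(len(states))
    for _ in range(iters):
        V = np.array([
            max(r[i] + gamma * next_state_row(P, s, a) @ V for a in joint_actions)
            for i, s in enumerate(states)
        ])
    return V


def clustered_value_iteration(P, r, gamma=0.9, sweeps=200):
    """Round-robin sketch: each update searches one cluster's actions only.

    Assumption: the other clusters' actions are held at the current policy.
    """
    states = joint_states(P)
    n_clusters = len(P)
    V = np.zeros(len(states))
    policy = np.zeros((len(states), n_clusters), dtype=int)
    for k in range(sweeps * n_clusters):
        c = k % n_clusters  # cluster optimized in this update
        V_new = np.empty_like(V)
        for i, s in enumerate(states):
            q_values = []
            for a_c in range(P[c].shape[0]):
                a = policy[i].copy()
                a[c] = a_c  # vary only cluster c's action
                q_values.append(r[i] + gamma * next_state_row(P, s, a) @ V)
            policy[i, c] = int(np.argmax(q_values))
            V_new[i] = max(q_values)
        V = V_new
    return V, policy


# Hypothetical toy problem: 2 clusters, each with 2 states and 2 actions.
rng = np.random.default_rng(0)
n_clusters, n_states, n_actions = 2, 2, 2
P = []
for _ in range(n_clusters):
    M = rng.random((n_actions, n_states, n_states))
    P.append(M / M.sum(axis=2, keepdims=True))  # rows sum to 1

# Assumed additive (separable) reward: one nonnegative term per cluster.
r_parts = [rng.random(n_states) for _ in range(n_clusters)]
r = np.array([sum(r_c[s_c] for r_c, s_c in zip(r_parts, s))
              for s in joint_states(P)])

V_vi = value_iteration(P, r)
V_cvi, policy = clustered_value_iteration(P, r)
print("max |V_cvi - V_vi| =", np.max(np.abs(V_cvi - V_vi)))
```

In this sketch, standard VI evaluates every joint action at each state (the product of the per-cluster action counts), while the clustered update evaluates only the actions of the cluster whose turn it is. With the additive reward assumed here, the two value functions should agree to numerical precision on this toy instance; for a non-additive reward the clustered value would only be expected to lower-bound the VI value.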

Taxonomy

Topics: Reinforcement Learning in Robotics