Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing

Jalal Arabneydi; Aditya Mahajan

arXiv:2012.02051·math.OC·December 4, 2020·ACC

Open Access

TL;DR

This paper develops a decentralized reinforcement learning algorithm for multi-agent systems with partial history sharing. It converts the problem into a POMDP and introduces a finite-state RL approach whose approximation error decays exponentially, demonstrated on a multi-user broadcast channel example.

Contribution

It introduces an incremental representation method enabling finite-state RL in decentralized systems with partial history sharing, extending existing approaches to more complex settings.

Findings

01

The algorithm learns epsilon-team-optimal strategies in decentralized systems.

02

Convergence of the approximation error is exponential.

03

Validated through a decentralized Q-learning implementation on a multi-user broadcast channel.
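For readers unfamiliar with the Q-learning update that finding 03 refers to, a generic tabular version can be sketched as follows. This is not the paper's decentralized algorithm: the environment `toy_step`, the function `q_learning`, and every hyperparameter are hypothetical and chosen only to illustrate learning over a finite state space.

```python
import random


def toy_step(state, action, rng):
    """Hypothetical 2-state, 2-action environment (not from the paper).

    The reward is 1 when the action index matches the state index,
    and the next state is drawn uniformly at random.
    """
    reward = 1.0 if action == state else 0.0
    next_state = rng.randrange(2)
    return next_state, reward


def q_learning(step, n_states, n_actions, episodes=2000, horizon=20,
               alpha=0.1, gamma=0.9, explore=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            # Explore with probability `explore`, otherwise act greedily.
            if rng.random() < explore:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s_next, r = step(s, a, rng)
            # Standard Q-learning temporal-difference update.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

Calling `q_learning(toy_step, n_states=2, n_actions=2)` returns a 2-by-2 table of action values; taking the argmax of each row gives the learned greedy policy for this toy environment.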

Abstract

In this paper, we are interested in systems with multiple agents that wish to collaborate in order to accomplish a common task while a) agents have different information (decentralized information) and b) agents do not know the model of the system completely, i.e., they may know the model partially or may not know it at all. The agents must learn the optimal strategies by interacting with their environment, i.e., by decentralized Reinforcement Learning (RL). The presence of multiple agents with different information makes decentralized reinforcement learning conceptually more difficult than centralized reinforcement learning. In this paper, we develop a decentralized reinforcement learning algorithm that learns an $ϵ$-team-optimal solution for the partial history sharing information structure, which encompasses a large class of decentralized control systems including delayed sharing,…
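As a reading aid, one common way to formalize "$ϵ$-team-optimal" is sketched below. The notation is an assumption, not taken from the excerpt: $g$ denotes a joint strategy of all agents, $J(g)$ the team's expected total cost under $g$, and $J^*$ the best achievable value. The excerpt does not say whether costs or rewards are used; under a reward convention the inequality reverses to $J(g) \ge J^* - \epsilon$.

```latex
% Assumed notation (not from the source): cost-minimization convention.
J^* \;=\; \inf_{g'} J(g'), \qquad
g \text{ is } \epsilon\text{-team-optimal} \iff J(g) \;\le\; J^* + \epsilon .
```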

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research