Reinforcement Learning in Decentralized Stochastic Control Systems with Partial History Sharing
Jalal Arabneydi, Aditya Mahajan

TL;DR
This paper develops a decentralized reinforcement learning algorithm for multi-agent systems with partial history sharing. It converts the problem into a POMDP and introduces a finite-state RL approach whose approximation error decays exponentially, demonstrated on a multi-user broadcast channel example.
Contribution
It introduces an incremental representation method enabling finite-state RL in decentralized systems with partial history sharing, extending existing approaches to more complex settings.
Findings
The algorithm learns ε-team-optimal strategies in decentralized systems.
The approximation error converges exponentially.
The approach is validated through a decentralized Q-learning implementation on a multi-user broadcast channel.
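For readers unfamiliar with the Q-learning update that the validation builds on, here is a minimal sketch of standard single-agent tabular Q-learning. It is not the paper's decentralized algorithm: the function names, the two-state toy environment, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import random


def q_learning(step, n_states, n_actions, episodes=2000, horizon=20,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)  # random start state for each episode
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda u: q[s][u])  # exploit
            s_next, r = step(s, a, rng)
            # Temporal-difference update toward the one-step bootstrapped target.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def toy_step(s, a, rng):
    """Hypothetical toy model (an assumption, not the broadcast channel):
    the chosen action becomes the next state, and reward 1 is paid only
    for taking action 1 while in state 1."""
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return a, reward


if __name__ == "__main__":
    q = q_learning(toy_step, n_states=2, n_actions=2)
    greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
    print("greedy policy:", greedy)
```

In this toy model the best policy is to always take action 1, and with a discount factor of 0.9 the optimal value of that state-action pair in state 1 is 1 / (1 - 0.9) = 10, so the learned table can be checked against a known answer.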
Abstract
In this paper, we are interested in systems with multiple agents that wish to collaborate in order to accomplish a common task while a) agents have different information (decentralized information) and b) agents do not know the model of the system completely, i.e., they may know the model partially or may not know it at all. The agents must learn the optimal strategies by interacting with their environment, i.e., by decentralized Reinforcement Learning (RL). The presence of multiple agents with different information makes decentralized reinforcement learning conceptually more difficult than centralized reinforcement learning. In this paper, we develop a decentralized reinforcement learning algorithm that learns an ε-team-optimal solution for the partial history sharing information structure, which encompasses a large class of decentralized control systems including delayed sharing,…
Taxonomy
Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
