Approximations and Learning for Decentralized Stochastic Control and Near Optimal Finite Window Policies
Omar Mrani-Zentar, Serdar Yuksel

TL;DR
This paper develops approximation and learning methods for decentralized stochastic control problems with one-step delayed or K-step periodic information sharing, establishing near-optimal finite window policies and convergence of Q-learning algorithms.
Contribution
It provides the first explicit conditions and rigorous results for approximation and learning in decentralized control with general spaces under specific information structures.
Findings
Finite window policies are near optimal under certain stability conditions.
Q-learning algorithms converge asymptotically to near-optimal solutions.
Performance bounds are established for policies based on finite information windows.
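To make the finite window idea concrete, here is a minimal sketch of tabular Q-learning in which the learner's state is only the last few observation-action pairs. Everything in it is an assumption chosen for illustration: a single-agent toy model with a hidden binary state, the noise levels 0.8 and 0.9, the discount factor `beta`, the exploration rate `eps`, and the function name `finite_window_q_learning`. It is not the paper's decentralized setting or its algorithm.

```python
import random
from collections import defaultdict


def finite_window_q_learning(window=2, steps=20000, beta=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a finite window of past observations and actions.

    Hypothetical toy model: hidden state x in {0, 1}, noisy observation y,
    action u in {0, 1}, and cost 1 if u != x else 0 (costs are minimized).
    """
    rng = random.Random(seed)
    actions = (0, 1)
    Q = defaultdict(float)     # Q[(window_state, action)] -> estimated discounted cost
    visits = defaultdict(int)  # per-pair visit counts, used for the step size

    def observe(x):
        # Observation equals the hidden state with probability 0.8 (assumed).
        return x if rng.random() < 0.8 else 1 - x

    x = rng.randint(0, 1)
    hist = tuple()             # up to window - 1 past (observation, action) pairs
    y = observe(x)
    w = (hist, y)              # finite window state seen by the learner

    for _ in range(steps):
        # Epsilon-greedy action selection over the window state.
        if rng.random() < eps:
            u = rng.choice(actions)
        else:
            u = min(actions, key=lambda a: Q[(w, a)])

        cost = 0.0 if u == x else 1.0

        # Hidden state persists with probability 0.9 (assumed).
        x = x if rng.random() < 0.9 else 1 - x
        y_next = observe(x)

        # Slide the window: keep only the most recent window - 1 pairs.
        hist_next = (hist + ((y, u),))[-(window - 1):] if window > 1 else tuple()
        w_next = (hist_next, y_next)

        # Standard Q-learning update with a decaying step size.
        visits[(w, u)] += 1
        alpha = 1.0 / (1 + visits[(w, u)])
        target = cost + beta * min(Q[(w_next, a)] for a in actions)
        Q[(w, u)] += alpha * (target - Q[(w, u)])

        hist, y, w = hist_next, y_next, w_next

    return dict(Q)
```

Calling `finite_window_q_learning(window=3)` returns a table keyed by `((history, latest_observation), action)`. Increasing `window` enlarges the table but gives the learner more of the past to condition on, which is the trade-off the performance bounds above are concerned with.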
Abstract
Decentralized stochastic control problems are difficult to study due to information-structure-dependent subtleties, which prevent many classical methods in stochastic control from being applicable. In this paper we consider such problems with general standard Borel spaces under two related information structures: (a) the one-step delayed information sharing pattern (OSDISP), where agents share their information with a one-step delay, and (b) the K-step periodic information sharing pattern (KSPISP), where information is shared periodically. It is known that OSDISP and KSPISP problems admit a centralized reduction in which the agents view the problem from the perspective of a centralized controller that uses the common information to prescribe function-valued actions (local policies), which map each agent's private information to an optimal action in the original problem. We provide rigorous…
