$QD$-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations
Soummya Kar, José M. F. Moura, H. Vincent Poor

TL;DR
This paper introduces a distributed $Q$-learning algorithm for multi-agent reinforcement learning that enables agents to collaboratively learn optimal policies through local processing and sparse communication, without prior knowledge of the environment.
Contribution
It proposes a novel distributed $Q$-learning method, $\text{QD}$-learning, for multi-agent MDPs that guarantees convergence to optimal policies under weak connectivity assumptions.
Findings
Almost sure convergence to the optimal value function.
Effective collaboration through sparse communication networks.
Addresses mixed time-scale stochastic dynamics.
Abstract
The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents' objective consisting of minimizing a network-averaged infinite horizon discounted cost, the paper proposes a distributed version of $Q$-learning, $QD$-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and the…
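The abstract does not spell out the update rule, so the following is only a minimal sketch of what a generic consensus + innovations $Q$-learning step could look like, not the authors' exact algorithm. The function name `qd_step`, the argument names, the path-graph topology, and the constant weights `alpha` and `beta` are all assumptions made for illustration; the reference to mixed time-scale dynamics suggests that in the paper the two weights decay at different rates.

```python
def qd_step(Q, neighbors, state, action, next_state, costs, alpha, beta, gamma):
    """One synchronous consensus + innovations update (illustrative sketch).

    Q[n][x][u]   : agent n's current estimate for state x and action u
    neighbors[n] : agents whose estimates agent n can currently read
    costs[n]     : agent n's locally observed one-stage cost
    alpha, beta  : innovation and consensus weights (assumed constants here)
    gamma        : discount factor
    """
    new_vals = []
    for n in range(len(Q)):
        q = Q[n][state][action]
        # Consensus term: disagreement between agent n and its neighbors.
        consensus = sum(q - Q[l][state][action] for l in neighbors[n])
        # Innovation term: local temporal-difference error; costs are
        # minimized, hence the min over next-state actions.
        innovation = costs[n] + gamma * min(Q[n][next_state]) - q
        new_vals.append(q - beta * consensus + alpha * innovation)
    # Write back only after every agent has used the old estimates.
    for n, v in enumerate(new_vals):
        Q[n][state][action] = v
    return Q


# Toy usage: 3 agents on a path graph, 2 states, 2 actions (all hypothetical).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
Q = [[[0.0, 0.0] for _ in range(2)] for _ in range(3)]
qd_step(Q, neighbors, state=0, action=0, next_state=1,
        costs=[1.0, 2.0, 3.0], alpha=0.5, beta=0.1, gamma=0.9)
print([Q[n][0][0] for n in range(3)])  # [0.5, 1.0, 1.5]
```

In this sketch each agent uses only its own cost and its neighbors' estimates, which mirrors the local-processing and sparse-communication setting the abstract describes. The consensus term pulls the agents' estimates together, while the innovation term injects each agent's local cost observation.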
