Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning
Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic

TL;DR
This paper introduces novel distributed gradient-based algorithms for multi-agent reinforcement learning that enable off-policy value function approximation with limited inter-agent communication, providing convergence guarantees and variance reduction insights.
Contribution
The paper presents new distributed gradient algorithms with convergence analysis for multi-agent off-policy learning under strict communication constraints.
Findings
Algorithms converge to the solutions of associated ODEs.
Variance reduction effects are demonstrated through stochastic differential equation analysis.
Simulation results illustrate the algorithms' superior properties.
Abstract
In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of a linear approximation of the value function in Markov decision processes with strict information structure constraints, limiting inter-agent communications to small neighborhoods. The algorithms are composed of: 1) local parameter updates based on single-agent off-policy gradient temporal difference learning algorithms, including eligibility traces with state-dependent parameters, and 2) linear stochastic time-varying consensus schemes, represented by directed graphs. The proposed algorithms differ in their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations. The main contribution of the paper is a convergence analysis based on the general properties of the underlying Feller-Markov processes…
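The two components named in the abstract (a local off-policy gradient temporal difference update, followed by a consensus iteration over a directed graph) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a TDC-style local update with no eligibility traces (lambda = 0), constant step sizes, and consensus applied to both parameter vectors. All function names, arguments, and the weight matrix are hypothetical.

```python
# Illustrative sketch only: names, step sizes, and the choice of a TDC-style
# local update without eligibility traces are assumptions, not the paper's method.

def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def local_tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One single-agent off-policy TDC update with linear features.

    theta: primary weights, w: auxiliary weights,
    rho: importance-sampling ratio (target policy / behavior policy).
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    w_phi = dot(w, phi)
    new_theta = [
        t + alpha * rho * (delta * p - gamma * w_phi * pn)
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * rho * (delta - w_phi) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


def consensus_step(params, weights):
    """Linear consensus: agent i takes a weighted average of its neighbors.

    weights[i][j] is the weight agent i gives to agent j; each row sums to 1,
    and weights[i][j] == 0 means j does not send to i (directed graph).
    A different matrix may be passed at every step (time-varying scheme).
    """
    n, d = len(params), len(params[0])
    return [
        [sum(weights[i][j] * params[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def distributed_step(thetas, ws, transitions, weights,
                     gamma=0.9, alpha=0.05, beta=0.1):
    """Local update at every agent, then one consensus iteration.

    transitions[i] = (phi, phi_next, reward, rho) observed by agent i.
    """
    local = [
        local_tdc_step(th, w, *tr, gamma, alpha, beta)
        for th, w, tr in zip(thetas, ws, transitions)
    ]
    new_thetas = consensus_step([th for th, _ in local], weights)
    new_ws = consensus_step([w for _, w in local], weights)
    return new_thetas, new_ws
```

With a row-stochastic weight matrix, agents that already agree stay in agreement after the consensus iteration, so any disagreement comes only from the local updates. The variants described in the abstract differ in exactly the places this sketch fixes arbitrarily: the form of the local update, the eligibility traces, the time scales, and where the consensus iteration is applied.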
