Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates
Lucas Cassano, Kun Yuan, Ali H. Sayed

TL;DR
This paper introduces a fully decentralized multi-agent algorithm for policy evaluation that guarantees linear convergence rates, applicable to diverse scenarios with different data collection policies and local rewards.
Contribution
It presents a novel variance-reduced algorithm with linear convergence for multi-agent policy evaluation with decentralized data and local rewards, combining off-policy learning and eligibility traces.
Findings
Achieves linear convergence with O(1) memory.
Effective in scenarios with diverse data collection policies.
Validated through simulations demonstrating efficiency.
Abstract
This work develops a fully decentralized multi-agent algorithm for policy evaluation. The proposed scheme can be applied to two distinct scenarios. In the first scenario, a collection of agents have distinct datasets, gathered by following different behavior policies (none of which is required to explore the full state space) in different instances of the same environment, and they all collaborate to evaluate a common target policy. The network approach enables efficient exploration of the state space and lets all agents converge to the optimal solution even in situations where no agent can converge on its own. The second scenario is that of multi-agent games, in which the state is global and rewards are local. In this scenario, agents collaborate to estimate the value function of a target team policy. The proposed algorithm combines off-policy learning,…
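The first scenario in the abstract (agents that each see only part of the state space but jointly evaluate one target policy) can be illustrated with a toy sketch. This is not the paper's algorithm: it has no variance reduction, eligibility traces, or importance sampling, and it uses exact expected TD(0) updates on a tabular value function followed by plain averaging between agents. The 4-state chain, the rewards, the visit frequencies, and all function names are hypothetical, and the transitions are assumed to already follow the target policy.

```python
GAMMA = 0.9

# Hypothetical toy problem: transition matrix of the target policy on 4 states.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
]
R = [1.0, 0.0, 0.0, 2.0]  # expected reward in each state

# How often each agent's data visits each state (assumption: no agent covers everything).
VISITS = [
    [0.5, 0.5, 0.0, 0.0],  # agent 0 only sees states 0 and 1
    [0.0, 0.0, 0.5, 0.5],  # agent 1 only sees states 2 and 3
]


def td_target(theta, s):
    """Expected one-step TD target for state s under the target policy."""
    return R[s] + GAMMA * sum(P[s][t] * theta[t] for t in range(len(theta)))


def local_step(theta, visits, alpha):
    """Expected TD(0) step; states the agent never visits are left unchanged."""
    return [
        theta[s] + alpha * visits[s] * (td_target(theta, s) - theta[s])
        for s in range(len(theta))
    ]


def evaluate(visit_list, cooperate, alpha=1.0, iters=2000):
    """Run every agent's local update, optionally averaging estimates each round."""
    n = len(R)
    thetas = [[0.0] * n for _ in visit_list]
    for _ in range(iters):
        thetas = [local_step(th, v, alpha) for th, v in zip(thetas, visit_list)]
        if cooperate:
            # Combine step: both agents are neighbors, so they share one average.
            avg = [sum(th[s] for th in thetas) / len(thetas) for s in range(n)]
            thetas = [list(avg) for _ in thetas]
    return thetas


def bellman_residual(theta):
    """Largest violation of the Bellman equation; zero at the true value function."""
    return max(abs(td_target(theta, s) - theta[s]) for s in range(len(theta)))


alone = evaluate(VISITS, cooperate=False)
together = evaluate(VISITS, cooperate=True)
```

Comparing `bellman_residual` on the entries of `alone` and `together` shows the effect described in the abstract: without the combine step each agent leaves its unvisited states at their initial value and never satisfies the Bellman equation, while with averaging both agents reach the same estimate and the residual shrinks geometrically.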
