Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning
Ahmed Rashwan, Keith Briggs, Chris Budd, Lisa Kreusser

TL;DR
This paper introduces the Diffusion Value Function (DVF), a factored value function for graph-structured multi-agent reinforcement learning that enables scalable, decentralized algorithms with improved performance in large-scale systems.
Contribution
The paper proposes DVF, a new factored value function for graph-based Markov decision processes (GMDPs), and develops scalable RL algorithms such as DA2C and LD-GNN that outperform existing methods in complex multi-agent tasks.
Findings
DVF is well-defined and decomposes global value via an averaging property.
DA2C outperforms local and global critics by up to 11% in benchmark tasks.
Graph neural networks enable scalable estimation of the proposed value functions.
Abstract
Credit assignment is a core challenge in multi-agent reinforcement learning (MARL), especially in large-scale systems with structured, local interactions. Graph-based Markov decision processes (GMDPs) capture such settings via an influence graph, but standard critics are poorly aligned with this structure: global value functions provide weak per-agent learning signals, while existing local constructions can be difficult to estimate and ill-behaved in infinite-horizon settings. We introduce the Diffusion Value Function (DVF), a factored value function for GMDPs that assigns to each agent a value component by diffusing rewards over the influence graph with temporal discounting and spatial attenuation. We show that DVF is well-defined, admits a Bellman fixed point, and decomposes the global discounted value via an averaging property. DVF can be used as a drop-in critic in standard RL…
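To make the idea of diffusing rewards over a graph concrete, here is a minimal sketch in plain Python. It is not the paper's definition of DVF, which this summary does not give. It assumes one simple construction: rewards are spread along a column-stochastic matrix built from the influence graph (with self-loops), and each step back in time applies one more round of spreading and one more factor of the discount. The names `column_normalize`, `matvec`, and `diffused_returns` are hypothetical. Under this assumption the per-agent components sum to the global discounted return, which is one way an averaging-style decomposition can arise.

```python
def column_normalize(adj):
    """Scale each column of an adjacency matrix so it sums to 1.

    Assumes every column has at least one nonzero entry (e.g. self-loops).
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def diffused_returns(adj, rewards, gamma):
    """Illustrative per-agent value components (an assumed construction).

    rewards[t][j] is the reward of agent j at step t. The backward
    recursion V <- r_t + gamma * W V yields sum_t gamma^t W^t r_t,
    so later rewards are both discounted and spread further over the graph.
    """
    w = column_normalize(adj)
    values = [0.0] * len(adj)
    for r_t in reversed(rewards):
        spread = matvec(w, values)
        values = [r + gamma * s for r, s in zip(r_t, spread)]
    return values


# Hypothetical example: three agents on a path graph, with self-loops.
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
rewards = [[1.0, 0.0, 0.0],
           [0.0, 0.0, 3.0],
           [0.0, 2.0, 0.0]]
gamma = 0.9

components = diffused_returns(adj, rewards, gamma)
global_return = sum(gamma ** t * sum(r_t) for t, r_t in enumerate(rewards))

# Because the columns of W sum to 1, diffusion redistributes reward
# without creating or destroying it, so the components add up to the
# global discounted return.
assert abs(sum(components) - global_return) < 1e-9
```

The column-stochastic normalization is the design choice that matters here: it makes spreading conserve total reward, so the decomposition is exact for any graph and any reward sequence. The paper's actual attenuation scheme and normalization may differ.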
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Graph Neural Networks
