Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning
Diyi Hu, Bhaskar Krishnamachari

TL;DR
This paper introduces CLOVER, a cooperative multi-agent reinforcement learning framework whose GNN-based value mixer is conditioned on the communication graph realized under a realistic wireless channel, so that credit assignment reflects which agents actually exchanged information.
Contribution
It proposes a novel GNN-based value mixer conditioned on a realistic wireless communication graph, enhancing expressiveness and adaptability in MARL.
Findings
CLOVER outperforms existing methods on Predator-Prey and Lumberjacks benchmarks.
Agents learn adaptive signaling and listening strategies.
The communication graph inductive bias is key to performance improvements.
Abstract
Cooperation in multi-agent reinforcement learning (MARL) benefits from inter-agent communication, yet most approaches assume idealized channels and existing value decomposition methods ignore who successfully shared information with whom. We propose CLOVER, a cooperative MARL framework whose centralized value mixer is conditioned on the communication graph realized under a realistic wireless channel. This graph introduces a relational inductive bias into value decomposition, constraining how individual utilities are mixed based on the realized communication structure. The mixer is a GNN with node-specific weights generated by a Permutation-Equivariant Hypernetwork: multi-hop propagation along communication edges reshapes credit assignment so that different topologies induce different mixing. We prove this mixer is permutation invariant, monotonic (preserving the IGM condition), and…
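The abstract describes a mixer that propagates individual utilities along communication edges using node-specific weights from a hypernetwork, and that stays monotonic and permutation invariant. The NumPy sketch below illustrates one way those two properties can arise; it is not the authors' architecture. The function name `graph_mixer`, the two-hop depth, the row-normalized adjacency, the leaky-ReLU activation, and the linear "hypernetwork" vectors `w_h1` and `w_h2` are all assumptions made for illustration.

```python
import numpy as np


def graph_mixer(q, adj, node_feats, w_h1, w_h2):
    """Toy monotonic mixer over a realized communication graph (illustrative only).

    q          : (n,)   individual agent utilities
    adj        : (n, n) adj[i, j] = 1 if agent i received agent j's message (assumed convention)
    node_feats : (n, d) per-agent features fed to the hypothetical hypernetwork
    w_h1, w_h2 : (d,)   shared hypernetwork parameters, one vector per hop
    """
    n = q.shape[0]

    # Add self-loops and row-normalize; every entry stays nonnegative.
    a_hat = adj + np.eye(n)
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)

    # Node-specific weights from a shared per-node map (permutation equivariant).
    # The absolute value keeps them nonnegative, which is what gives monotonicity.
    w1 = np.abs(node_feats @ w_h1)
    w2 = np.abs(node_feats @ w_h2)

    # Hop 1: weight each utility, then aggregate over communication edges.
    h = a_hat @ (w1 * q)
    # Leaky ReLU is monotonically increasing, so it preserves monotonicity.
    h = np.where(h > 0, h, 0.01 * h)
    # Hop 2: a second round of propagation reaches two-hop neighbors.
    h = a_hat @ (w2 * h)

    # Sum pooling over agents makes the output permutation invariant.
    return float(h.sum())
```

Because every coefficient applied to `q` is nonnegative and the activation is increasing, raising any single agent's utility cannot lower the mixed value, which is the monotonicity property the abstract ties to the IGM condition. Calling the function with the same utilities but a different `adj` generally yields a different mixed value, reflecting the topology-dependent mixing the abstract describes.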
