Decomposing Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning
Zihong Gao, Hongjian Liang, Lei Hao, Liangjun Ke

TL;DR
This paper introduces a formal framework and metric for understanding the impact of cross-timestep communication delays in cooperative multi-agent reinforcement learning, proposing a new method that improves performance under such delays.
Contribution
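The paper formalizes delayed communication in cooperative multi-agent RL as a delayed-communication partially observable Markov game (DeComm-POMG), introduces the Communication Gain and Delay Cost (CGDC) metric, and develops CDCMA, an actor-critic framework guided by CGDC that mitigates delay effects.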
Findings
CDCMA outperforms baselines across multiple delay levels.
The CGDC metric effectively guides communication decisions.
Experiments show improved robustness and generalization.
Abstract
Communication is essential for coordination in cooperative multi-agent reinforcement learning under partial observability, yet cross-timestep delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed. We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into communication gain and delay cost, yielding the Communication Gain and Delay Cost (CGDC) metric. We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely versus delayed messages. Guided by CGDC, we propose CDCMA, an actor-critic framework that requests messages only when…
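To make the cross-timestep delay setting described in the abstract concrete, here is a minimal sketch of a fixed-delay message channel. It is not the paper's implementation: the names `Message`, `DelayedChannel`, and `staleness`, and the assumption of a single constant delay with messages sent in nondecreasing timestep order, are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass
class Message:
    payload: Any
    sent_at: int  # timestep at which the message was generated


class DelayedChannel:
    """Hypothetical fixed-delay channel (an assumption, not the paper's model).

    A message sent at timestep t becomes deliverable at timestep t + delay.
    Messages are assumed to be sent in nondecreasing timestep order.
    """

    def __init__(self, delay: int) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        self._in_flight: Deque[Message] = deque()

    def send(self, payload: Any, t: int) -> None:
        # Queue the message together with its generation timestep.
        self._in_flight.append(Message(payload, t))

    def receive(self, t: int) -> List[Message]:
        # Pop every message whose delivery time has been reached by timestep t.
        delivered: List[Message] = []
        while self._in_flight and self._in_flight[0].sent_at + self.delay <= t:
            delivered.append(self._in_flight.popleft())
        return delivered


def staleness(msg: Message, t: int) -> int:
    """Number of timesteps between generation and consumption of a message."""
    return t - msg.sent_at
```

With `delay=2`, a message sent at timestep 0 is first returned by `receive(2)` and has a staleness of 2 at that point, which is the temporal misalignment the abstract refers to: the receiver acts at timestep 2 on information describing timestep 0.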
