Structural Equivalence and Learning Dynamics in Delayed MARL
Jules Sintes, Ana Bušić, Jiamin Zhu

TL;DR
This paper establishes a formal equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative multi-agent systems, analyzes where the two agree structurally and where their learning dynamics differ, and demonstrates practical policy transfer between them.
Contribution
It generalizes the equivalence between OD and AD to multi-agent settings, analyzes their learning dynamics, and enables zero-shot policy transfer between delay configurations.
Findings
OD and AD generate identical admissible joint-policy sets.
Structural equivalence does not imply identical learning dynamics.
Zero-shot policy transfer from OD to AD is feasible.
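
The first finding can be illustrated with a minimal single-agent sketch. This is not the paper's construction: the dynamics `step`, the policy `policy`, the delay `D`, and the pre-filled buffer of default actions are all hypothetical choices made for illustration. The sketch runs one deterministic policy under an action-delayed loop and an observation-delayed loop and records the augmented state (a state plus the `D` actions applied after it) that the agent conditions on at each step.

```python
from collections import deque

D = 2          # delay in steps (assumed for illustration)
N_STATES = 5   # size of the toy state space (assumed)

def step(state, action):
    """Hypothetical deterministic toy dynamics."""
    return (state + action + 1) % N_STATES

def policy(obs_state, pending):
    """Any deterministic map from augmented state to an action in {0, 1, 2}."""
    return (obs_state + sum(pending)) % 3

def run_action_delay(s0, horizon):
    """Agent sees the current state; each chosen action executes D steps later."""
    state, pending = s0, deque([0] * D)  # assumed default actions fill the queue
    trace = []
    for _ in range(horizon):
        aug = (state, tuple(pending))
        trace.append(aug)
        pending.append(policy(*aug))            # newly chosen action joins the queue
        state = step(state, pending.popleft())  # oldest queued action executes now
    return trace

def run_observation_delay(s0, horizon):
    """Agent sees the state from D steps ago; each chosen action executes immediately."""
    pending = deque([0] * D)
    states = [s0]
    for a in pending:  # the true state is already D steps ahead of what is observed
        states.append(step(states[-1], a))
    trace = []
    for t in range(horizon):
        aug = (states[t], tuple(pending))  # delayed observation plus actions since then
        trace.append(aug)
        a = policy(*aug)
        states.append(step(states[-1], a))  # action applied to the current true state
        pending.append(a)
        pending.popleft()
    return trace

if __name__ == "__main__":
    same = all(
        run_action_delay(s, 50) == run_observation_delay(s, 50)
        for s in range(N_STATES)
    )
    print("augmented-state traces match:", same)
```

Under these assumptions the two loops visit the same sequence of augmented states, with the observation-delayed true state running `D` steps ahead of the state the agent sees. The sketch only addresses the structural side; it says nothing about how learning behaves in either setting.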
Abstract
We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets, and their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). This formally generalizes existing infinite-horizon single-agent results to any-horizon partially observable cooperative multi-agent problems with decentralized policy execution, and allows any mixed-delay configuration to be reduced to a pure OD system. We further prove that in Transition-Independent MDPs (TI-MDPs), the observation-action history reduces to a tractable minimal local augmented state. However, we show through…
