ToMacVF: Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning
Wenjing Zhang, Wei Zhang

TL;DR
ToMacVF introduces a fine-grained temporal credit assignment method for asynchronous multi-agent reinforcement learning, improving macro-action representation and credit distribution for better performance and robustness.
Contribution
The paper proposes ToMacVF, together with a new centralized experience replay buffer (Mac-SJERT) and a formalized IGM (Individual-Global-Max) condition, enabling more accurate macro-action value factorization in asynchronous settings.
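For readers unfamiliar with IGM, the standard condition from synchronous value factorization is written out here as background. It is not the paper's formalized macro-action variant, which this summary does not state. The symbols are assumptions chosen for illustration: Q_tot is the joint action-value, Q_i is agent i's individual value, τ_i is agent i's action-observation history, and u_i is its action.

```latex
\arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
=
\begin{pmatrix}
\arg\max_{u_1} Q_1(\tau_1, u_1) \\
\vdots \\
\arg\max_{u_n} Q_n(\tau_n, u_n)
\end{pmatrix}
```

In words, the greedy joint action under Q_tot must coincide with each agent acting greedily on its own Q_i, which is what allows decentralized execution after centralized training.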
Findings
Outperforms asynchronous baselines in various scenarios.
Achieves more accurate macro-action credit assignment.
Demonstrates robustness and adaptability across tasks.
Abstract
Existing asynchronous MARL methods based on MacDec-POMDP typically construct training trajectory buffers by simply sampling limited and biased data at the endpoints of macro-actions, and then apply conventional MARL methods directly to those buffers. As a result, these methods represent the macro-action execution process incompletely and inaccurately, and assign credit unsuitably. To solve these problems, Temporal Macro-action Value Factorization (ToMacVF) is proposed to achieve fine-grained temporal credit assignment for macro-action contributions. A centralized training buffer, called Macro-action Segmented Joint Experience Replay Trajectory (Mac-SJERT), is designed to work with ToMacVF and collect accurate and complete macro-action execution information, supporting a more comprehensive and precise representation of the macro-action process. To ensure…
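The contrast the abstract draws between endpoint sampling and a segmented buffer can be illustrated with a minimal sketch. This is not the paper's Mac-SJERT implementation. The `Step` record, both function names, and the rule that a macro-action ends when an agent's macro-action id changes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    t: int                          # primitive time step
    macro_actions: Tuple[int, ...]  # macro-action id each agent is executing (hypothetical encoding)
    reward: float                   # shared team reward at this step


def endpoint_samples(steps: List[Step]) -> List[Step]:
    """Keep only steps where some agent's macro-action ends.

    Assumption: a macro-action ends when the agent's id differs at the next
    step, or when the episode ends.
    """
    kept = []
    for i, step in enumerate(steps):
        is_last = i == len(steps) - 1
        if is_last or step.macro_actions != steps[i + 1].macro_actions:
            kept.append(step)
    return kept


def segment_by_macro_action(steps: List[Step]) -> List[List[Step]]:
    """Keep every step, grouped into segments with a constant joint macro-action."""
    segments: List[List[Step]] = []
    for step in steps:
        if segments and segments[-1][-1].macro_actions == step.macro_actions:
            segments[-1].append(step)
        else:
            segments.append([step])
    return segments


# Toy two-agent trajectory: agent 2 switches at t=2, agent 1 switches at t=3.
trajectory = [
    Step(0, (0, 1), 0.0),
    Step(1, (0, 1), 1.0),
    Step(2, (0, 2), 0.0),
    Step(3, (1, 2), 2.0),
    Step(4, (1, 2), 1.0),
]

endpoint_times = [s.t for s in endpoint_samples(trajectory)]
segment_lengths = [len(seg) for seg in segment_by_macro_action(trajectory)]
print(endpoint_times)
print(segment_lengths)
```

Endpoint sampling discards the intermediate steps at t=0 and t=3 along with their rewards, while the segmented version retains all five steps and records which joint macro-action each one belongs to. That retained per-step information is the kind of data a finer-grained temporal credit assignment scheme could draw on.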
