ToMacVF: Temporal Macro-action Value Factorization for Asynchronous Multi-Agent Reinforcement Learning

Wenjing Zhang; Wei Zhang

arXiv:2507.10251 · cs.MA · July 15, 2025


TL;DR

ToMacVF introduces a fine-grained temporal credit assignment method for asynchronous multi-agent reinforcement learning, improving macro-action representation and credit distribution for better performance and robustness.

Contribution

The paper proposes ToMacVF together with a new experience replay buffer (Mac-SJERT) and a formalized Individual-Global-Max (IGM) condition, enabling more accurate macro-action value factorization in asynchronous settings.
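
For reference, the IGM condition in its standard primitive-action form, as used by value-factorization methods such as QMIX and QTRAN, can be written as follows. The notation is the conventional one and is an assumption here: $Q_{tot}$ is the joint action-value, $Q_i$ the utility of agent $i$, $\tau_i$ its action-observation history, and $u_i$ its action. The macro-action version that the paper formalizes is not reproduced in this summary.

```latex
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u})
=
\begin{pmatrix}
\arg\max_{u_1} Q_1(\tau_1, u_1) \\
\vdots \\
\arg\max_{u_n} Q_n(\tau_n, u_n)
\end{pmatrix}
```

In words, greedy action selection on each agent's own utility should recover the greedy joint action of the centralized value function.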

Findings

01

Outperforms asynchronous baselines in various scenarios.

02

Achieves more accurate macro-action credit assignment.

03

Demonstrates robustness and adaptability across tasks.

Abstract

Existing asynchronous MARL methods based on MacDec-POMDP typically construct training trajectory buffers by sampling limited and biased data only at the endpoints of macro-actions, and then apply conventional MARL methods directly to these buffers. As a result, they represent the macro-action execution process incompletely and inaccurately, and they assign credit unsuitably. To solve these problems, Temporal Macro-action Value Factorization (ToMacVF) is proposed to achieve fine-grained temporal credit assignment for macro-action contributions. A centralized training buffer, called Macro-action Segmented Joint Experience Replay Trajectory (Mac-SJERT), is designed to work with ToMacVF and collect accurate and complete macro-action execution information, supporting a more comprehensive and precise representation of the macro-action process. To ensure…
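
To make the asynchrony problem concrete, the following is a minimal sketch of one plausible way to segment a joint trajectory: cut the shared timeline at every primitive step where any agent's macro-action ends, and record what each agent is executing in each segment. This is an illustration only, not the paper's Mac-SJERT. The names `AGENT_PLANS`, `endpoints`, `macro_at`, and `joint_segments`, as well as the macro-action names and durations, are hypothetical.

```python
# Hypothetical per-agent macro-action plans: (name, duration in primitive steps).
# Names and durations are illustrative assumptions, not taken from the paper.
AGENT_PLANS = [
    [("goto_A", 3), ("pick", 2)],  # agent 0
    [("goto_B", 2), ("wait", 3)],  # agent 1
]


def endpoints(plan):
    """Cumulative primitive-step times at which each macro-action of one agent ends."""
    times, t = [], 0
    for _, duration in plan:
        t += duration
        times.append(t)
    return times


def macro_at(plan, t):
    """Name of the macro-action an agent is executing at primitive step t."""
    elapsed = 0
    for name, duration in plan:
        elapsed += duration
        if t < elapsed:
            return name
    return None  # the agent's plan has already finished


def joint_segments(plans):
    """Split the joint timeline wherever any agent's macro-action ends.

    Returns (start, end, active) tuples, where `active` lists the macro-action
    each agent executes during the half-open interval [start, end).
    """
    cuts = sorted({t for plan in plans for t in endpoints(plan)})
    segments, start = [], 0
    for end in cuts:
        active = [macro_at(plan, start) for plan in plans]
        segments.append((start, end, active))
        start = end
    return segments


for segment in joint_segments(AGENT_PLANS):
    print(segment)
```

In this toy example agent 0's `goto_A` spans two segments, because agent 1 finishes `goto_B` partway through it. A buffer that records only what happens at macro-action endpoints would not capture that overlap explicitly.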
