Value-Decomposition Multi-Agent Actor-Critics
Jianyu Su, Stephen Adams, Peter A. Beling

TL;DR
This paper introduces VDACs, a novel value-decomposition actor-critic framework for multi-agent reinforcement learning that balances training efficiency and performance, demonstrated on StarCraft II benchmarks.
Contribution
The paper proposes VDACs, extending value-decomposition to actor-critics compatible with A2C, improving training efficiency and performance in multi-agent tasks.
Findings
VDACs improve median performance over other actor-critic methods on StarCraft II micromanagement tasks.
Ablation experiments identify key factors influencing VDACs' performance.
Abstract
The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance, by far, on the multi-agent benchmark of StarCraft II micromanagement tasks. However, our experiments show that, in some cases, QMIX is incompatible with A2C, a training paradigm that promotes training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critics that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critics (VDACs). We evaluate VDACs on the testbed of StarCraft II micromanagement tasks and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore,…
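As a rough illustration of the value-decomposition idea mentioned in the abstract, the sketch below combines per-agent local values into a joint value in two ways: a plain sum, and a mixing step whose weights are forced to be non-negative, so the joint value never decreases when any local value increases. The function names, the example numbers, and the use of absolute values to enforce non-negativity are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def additive_joint_value(local_values: Sequence[float]) -> float:
    """Joint value as the plain sum of per-agent local values."""
    return sum(local_values)


def monotonic_joint_value(
    local_values: Sequence[float],
    raw_weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Joint value as a non-negatively weighted sum of local values.

    Taking abs() of the raw weights (an assumption of this sketch) keeps
    every weight >= 0, so raising any local value never lowers the total.
    """
    if len(local_values) != len(raw_weights):
        raise ValueError("need one weight per agent")
    return sum(abs(w) * v for w, v in zip(raw_weights, local_values)) + bias


# Hypothetical local values for three agents.
values = [1.0, 2.0, 0.5]
weights = [0.5, -2.0, 1.0]  # raw weights may be negative before abs()

v_sum = additive_joint_value(values)
v_mix = monotonic_joint_value(values, weights)
```

In a learned setting the weights would typically come from a network conditioned on the global state; here they are fixed constants to keep the monotonicity property easy to check by hand.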
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Metaheuristic Optimization Algorithms Research
Methods: A2C
