Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning
Jianzhun Shao, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji

TL;DR
This paper introduces a meta-learning-based mixing network with policy gradient for improved reward decomposition in multi-agent reinforcement learning, demonstrating superior performance on StarCraft II benchmarks.
Contribution
It proposes a novel meta-learning framework for reward decomposition that leverages global information and can enhance existing CTDE methods.
Findings
Outperforms state-of-the-art MARL algorithms on 4 of 5 StarCraft II scenarios.
The method is effective with a simple utility network and improves further with role-based utility networks.
Demonstrates the general applicability of the approach to monotonic mixing networks.
Abstract
Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning. To take full advantage of global information, which exploits the states from all agents and the related environment for decomposing Q values into individual credits, we propose a general meta-learning-based Mixing Network with Meta Policy Gradient (MNMPG) framework to distill the global hierarchy for delicate reward decomposition. The excitation signal for learning the global hierarchy is deduced from the difference in episode reward before and after "exercise updates" through the utility network. Our method is generally applicable to CTDE methods that use a monotonic mixing network. Experiments on the StarCraft II micromanagement benchmark demonstrate that our method, with just a simple utility network, is able to outperform the current…
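To make two terms from the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the common QMIX-style form of a monotonic mixing network, where the joint value is a weighted sum of per-agent utilities with non-negative weights, and it reads the excitation signal as a plain difference of episode returns. The names `monotonic_mix` and `excitation_signal` are hypothetical, and the weights are passed in directly rather than generated from the global state as a real mixing network would do.

```python
def monotonic_mix(agent_qs, weights, bias):
    """Mix per-agent utilities into a joint value Q_tot.

    Assumption: a single linear mixing layer. Taking abs() of each
    weight keeps it non-negative, so Q_tot never decreases when any
    individual agent utility increases (the monotonicity constraint).
    """
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


def excitation_signal(return_before, return_after):
    """Hypothetical helper: episode-return difference across one
    "exercise update" of the utility network, as described in the abstract."""
    return return_after - return_before


# Two agents with utilities 1.0 and 2.0; the negative raw weight is
# made non-negative by abs() inside the mixer.
q_tot = monotonic_mix([1.0, 2.0], [-0.5, 0.25], 0.1)

# A positive signal means the exercise update improved the episode return.
signal = excitation_signal(return_before=3.0, return_after=5.0)
```

Because every effective weight is non-negative, each agent can greedily maximize its own utility at execution time without contradicting the joint value, which is what makes decentralized execution consistent with centralized training in this family of methods.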
Taxonomy
Topics: Reinforcement Learning in Robotics · Transportation and Mobility Innovations · Auction Theory and Applications
