Maximum Correntropy Value Decomposition for Multi-agent Deep Reinforcement Learning
Kai Liu, Tianxian Zhang, Lingjiang Kong

TL;DR
This paper introduces MCVD, a novel value decomposition method for multi-agent deep reinforcement learning that adaptively handles non-monotonic value functions, overcoming limitations of fixed-weight schemes like Weighted QMIX.
Contribution
The paper proposes MCVD, a new algorithm based on maximum correntropy, to improve value decomposition in multi-agent reinforcement learning, especially for non-monotonic problems.
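This summary does not state the MCVD objective itself. As background only, the sketch below shows the generic maximum correntropy criterion with a Gaussian kernel, written as a loss to minimize, next to an ordinary squared-error loss. The kernel width `sigma` and all function names are illustrative assumptions, not taken from the paper. The derivative of each correntropy term equals the error times a factor that decays as the error grows. This is one standard way an error-dependent (rather than fixed) weight arises. Whether MCVD uses exactly this form is not confirmed here.

```python
import math
from typing import List, Sequence


def squared_error_loss(errors: Sequence[float]) -> float:
    """Mean of 0.5 * e**2; every sample gets the same constant weight."""
    return sum(0.5 * e * e for e in errors) / len(errors)


def correntropy_loss(errors: Sequence[float], sigma: float = 1.0) -> float:
    """Mean of 1 - exp(-e**2 / (2 * sigma**2)).

    Minimizing this maximizes the sample correntropy under a Gaussian kernel
    (the kernel's normalizing constant is dropped). Each term is bounded above by 1.
    """
    return sum(1.0 - math.exp(-e * e / (2.0 * sigma ** 2)) for e in errors) / len(errors)


def correntropy_weights(errors: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Per-sample factor w(e) with d/de [1 - exp(-e**2 / (2 sigma**2))] = w(e) * e.

    w(e) = exp(-e**2 / (2 sigma**2)) / sigma**2, which shrinks as |e| grows,
    whereas the squared-error term 0.5 * e**2 has the constant factor 1.
    """
    return [math.exp(-e * e / (2.0 * sigma ** 2)) / sigma ** 2 for e in errors]


if __name__ == "__main__":
    errs = [0.1, -0.2, 0.05, 8.0]  # the last entry is a single large error
    print(f"squared-error loss: {squared_error_loss(errs):.4f}")
    print(f"correntropy loss:   {correntropy_loss(errs):.4f}")
    print("correntropy weights:", [f"{w:.2e}" for w in correntropy_weights(errs)])
```

In the usage example, the single large error dominates the squared-error loss. It contributes less than 1 to the correntropy sum, and its factor in `correntropy_weights` is close to zero.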
Findings
MCVD effectively handles non-monotonic value decomposition problems.
MCVD demonstrates broad applicability and stability across multiple scenarios.
MCVD simplifies implementation compared to existing methods.
Abstract
We explore value decomposition solutions for multi-agent deep reinforcement learning under the popular paradigm of centralized training with decentralized execution (CTDE). Weighted QMIX, recognized as one of the best CTDE solutions, is state of the art on the StarCraft Multi-agent Challenge (SMAC); it applies a weighting scheme to QMIX that places more emphasis on the optimal joint actions. However, its fixed weight must be tuned manually for each application scenario, which hinders the use of Weighted QMIX in broader engineering applications. In this paper, we first use an ordinary One-Step Matrix Game (OMG) to demonstrate a flaw of Weighted QMIX: no matter how the weight is chosen, it struggles with non-monotonic value decomposition problems whose reward distributions have large variance. Then we characterize the problem of value decomposition as an…
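The fixed-weight tuning issue can be illustrated with a toy calculation. The sketch rests on several simplifying assumptions, none taken from the paper:

- It uses a hypothetical 3x3 non-monotonic payoff matrix. Joint action (0, 0) pays reward `r`, a joint action where exactly one agent picks action 0 pays `-k`, and every other joint action pays 0. These values are not necessarily the paper's OMG.
- It replaces QMIX's monotonic mixing network with a simpler additive (VDN-style) decomposition.
- It idealizes the Weighted QMIX scheme as a fixed weight of 1 on the true optimal joint action and `alpha` on every other joint action.

The sketch fits the decomposition by weighted least squares and checks whether decentralized greedy action selection recovers (0, 0). All function names are hypothetical.

```python
import itertools
from typing import List, Tuple

Matrix = List[List[float]]


def make_payoff(r: float, k: float) -> Matrix:
    """Hypothetical non-monotonic 3x3 one-step matrix game for two agents.

    Joint action (0, 0) pays r, joint actions where exactly one agent picks
    action 0 pay -k, and all remaining joint actions pay 0.
    """
    n = 3
    return [
        [r if i == j == 0 else (-k if (i == 0) != (j == 0) else 0.0) for j in range(n)]
        for i in range(n)
    ]


def fixed_weights(payoff: Matrix, alpha: float) -> Matrix:
    """Idealized fixed weighting: 1 on the true optimal joint action, alpha elsewhere."""
    n = len(payoff)
    best = max(itertools.product(range(n), repeat=2), key=lambda ij: payoff[ij[0]][ij[1]])
    return [[1.0 if (i, j) == best else alpha for j in range(n)] for i in range(n)]


def fit_additive(payoff: Matrix, weights: Matrix, iters: int = 500) -> Tuple[List[float], List[float]]:
    """Fit Q_tot(i, j) = q1[i] + q2[j] by weighted least squares.

    Uses exact alternating minimization: each per-agent utility is set to the
    weighted mean of the residual reward with the other agent's utilities fixed.
    """
    n = len(payoff)
    q1, q2 = [0.0] * n, [0.0] * n
    for _ in range(iters):
        for i in range(n):
            den = sum(weights[i][j] for j in range(n))
            q1[i] = sum(weights[i][j] * (payoff[i][j] - q2[j]) for j in range(n)) / den
        for j in range(n):
            den = sum(weights[i][j] for i in range(n))
            q2[j] = sum(weights[i][j] * (payoff[i][j] - q1[i]) for i in range(n)) / den
    return q1, q2


def greedy_joint_action(q1: List[float], q2: List[float]) -> Tuple[int, int]:
    """Decentralized execution: each agent takes the argmax of its own utility
    (ties go to the lower action index)."""
    return q1.index(max(q1)), q2.index(max(q2))


def alpha_threshold(r: float, k: float) -> float:
    """alpha must be strictly below this value for the additive fit to prefer action 0.

    Derived in closed form for this specific 3x3 payoff family only, assuming r > 0 and k > 0.
    """
    return (k + 3.0 * r) / (4.0 * k)


if __name__ == "__main__":
    for k in (12.0, 100.0):
        payoff = make_payoff(r=8.0, k=k)
        for alpha in (0.1, 0.5, 1.0):
            q1, q2 = fit_additive(payoff, fixed_weights(payoff, alpha))
            print(f"k={k:5.1f} alpha={alpha:.1f} greedy={greedy_joint_action(q1, q2)}")
        print(f"  recovering (0, 0) needs alpha < {alpha_threshold(8.0, k):.2f}")
```

For this toy family, setting the gradient of the weighted loss to zero yields the condition alpha < (k + 3r) / (4k) for recovering (0, 0). With r = 8, alpha = 0.5 works when k = 12 but not when k = 100. A weight tuned for one payoff matrix can therefore fail on another. In this simplified setting a small enough alpha always succeeds, so the sketch only illustrates the dependence on tuning. It does not reproduce the paper's stronger claim about reward distributions with large variance.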
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Experimental Behavioral Economics Studies
