TVDO: Tchebycheff Value-Decomposition Optimization for Multi-Agent Reinforcement Learning
Xiaoliang Hu, Pengcheng Guo, Yadong Li, Guanyu Li, Zhen Cui, Jian Yang

TL;DR
This paper introduces TVDO, a value-decomposition method for cooperative multi-agent reinforcement learning that keeps jointly trained policies consistent with individually executed actions and outperforms state-of-the-art baselines on the SMAC benchmark.
Contribution
The paper proposes a Tchebycheff-based value-decomposition approach that is proven to be both sufficient and necessary for the Individual-Global-Max (IGM) condition, improving policy consistency in cooperative MARL.
Findings
TVDO guarantees policy consistency in cooperative MARL.
TVDO outperforms state-of-the-art baselines on the SMAC benchmark.
The paper gives a theoretical proof that value decomposition with Tchebycheff aggregation satisfies IGM.
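For reference, the Individual-Global-Max (IGM) condition named in the findings is usually stated as below. This is the standard formulation from the value-decomposition literature, not text taken from this page; the symbols (joint action-value Q_tot, per-agent action-values Q_i, joint history τ, joint action u, n agents) are notation assumed here for illustration.

```latex
% IGM: the greedy joint action under Q_tot coincides with the tuple of
% per-agent greedy actions under each Q_i (notation assumed, see lead-in).
\arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
=
\begin{pmatrix}
  \arg\max_{u_1} Q_1(\tau_1, u_1) \\
  \vdots \\
  \arg\max_{u_n} Q_n(\tau_n, u_n)
\end{pmatrix}
```

When this holds, each agent can act greedily on its own Q_i at execution time and still recover the jointly optimal action, which is the policy consistency the findings refer to.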
Abstract
In cooperative multi-agent reinforcement learning (MARL), centralized training with decentralized execution (CTDE) has recently attracted growing attention owing to practical demands. However, its main dilemma is the inconsistency between jointly trained policies and individually executed actions. In this article, we propose a factorized Tchebycheff value-decomposition optimization (TVDO) method to overcome this inconsistency. In particular, inspired by the Tchebycheff method of multi-objective optimization, we formulate a nonlinear Tchebycheff aggregation function that reaches the global optimum by tightly constraining the upper bound of the individual action-value bias. We theoretically prove that, under no extra limitations, the factorized value decomposition with Tchebycheff aggregation satisfies the sufficiency and necessity of Individual-Global-Max (IGM), which…
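The two ideas the abstract combines can be illustrated with a small tabular sketch. It is not the TVDO aggregation function, whose exact form is not given in this excerpt. The first part checks whether a joint action-value table is IGM-consistent with per-agent tables; the second is the classical weighted Tchebycheff scalarization from multi-objective optimization that the abstract cites as inspiration. All function names, the toy tables, and the weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def greedy_joint_action(q_tot):
    """Greedy joint action of a joint action-value table of shape (|A_1|, ..., |A_n|)."""
    flat_index = int(np.argmax(q_tot))
    return tuple(int(i) for i in np.unravel_index(flat_index, q_tot.shape))


def greedy_individual_actions(q_individual):
    """Tuple of each agent's greedy action under its own action-value vector."""
    return tuple(int(np.argmax(q)) for q in q_individual)


def satisfies_igm(q_tot, q_individual):
    """True if the joint greedy action equals the tuple of individual greedy actions.

    Assumes unique maxima; ties are broken by np.argmax (first occurrence).
    """
    return greedy_joint_action(q_tot) == greedy_individual_actions(q_individual)


def weighted_tchebycheff(objectives, weights, ideal):
    """Classical weighted Tchebycheff scalarization: max_i w_i * |f_i - z_i|.

    This is the textbook multi-objective form, NOT the paper's aggregation.
    """
    objectives = np.asarray(objectives, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    return float(np.max(weights * np.abs(objectives - ideal)))


# Hypothetical two-agent example with two actions per agent.
q1 = np.array([1.0, 3.0])
q2 = np.array([2.0, 0.5])

# An additive joint table is IGM-consistent with its own summands.
q_additive = q1[:, None] + q2[None, :]
print(satisfies_igm(q_additive, [q1, q2]))

# A joint table whose maximum sits elsewhere breaks the consistency.
q_mismatch = np.array([[8.0, -12.0], [-12.0, 0.0]])
print(satisfies_igm(q_mismatch, [q1, q2]))

# Largest weighted deviation from a hypothetical ideal point.
print(weighted_tchebycheff([1.0, 4.0], [0.5, 0.5], [0.0, 1.0]))
```

The mismatch case shows the inconsistency the abstract describes: the joint table prefers action pair (0, 0), while the agents acting on their own values pick (1, 0).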
Taxonomy
Topics: Reinforcement Learning in Robotics
