TVDO: Tchebycheff Value-Decomposition Optimization for Multi-Agent Reinforcement Learning

Xiaoliang Hu; Pengcheng Guo; Yadong Li; Guanyu Li; Zhen Cui; Jian Yang

arXiv:2306.13979 · cs.MA · August 6, 2025

TL;DR

This paper introduces TVDO, a value-decomposition method for cooperative multi-agent reinforcement learning that keeps jointly trained policies consistent with individually executed actions and outperforms state-of-the-art baselines on the SMAC benchmark.

Contribution

The paper proposes a Tchebycheff-based value-decomposition approach that is proven to satisfy both the sufficiency and the necessity of the Individual-Global-Max (IGM) condition, improving policy consistency in cooperative MARL.
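
For reference, the IGM condition is commonly stated as follows. The notation here ($Q_{tot}$ for the joint action-value, $Q_i$ for the action-value of agent $i$, $\boldsymbol{\tau}$ for the joint action-observation history, $\mathbf{u}$ for the joint action, $n$ agents) follows common usage in the value-decomposition literature and is an assumption, since this summary does not fix any symbols.

```latex
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u})
  = \Bigl( \arg\max_{u_1} Q_1(\tau_1, u_1),\; \dots,\; \arg\max_{u_n} Q_n(\tau_n, u_n) \Bigr)
```

Read this way, IGM says that each agent acting greedily on its own $Q_i$ reproduces the greedy joint action of $Q_{tot}$, which is the consistency between training and execution that the paper targets.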

Findings

01. TVDO guarantees policy consistency in cooperative MARL.

02. TVDO outperforms state-of-the-art baselines on the SMAC benchmark.

03. The paper proves theoretically that Tchebycheff aggregation satisfies IGM.

Abstract

In cooperative multi-agent reinforcement learning (MARL), centralized training with decentralized execution (CTDE) has recently attracted growing attention due to practical demands. The main dilemma in CTDE, however, is the inconsistency between jointly trained policies and individually executed actions. In this article, we propose a factorized Tchebycheff value-decomposition optimization (TVDO) method to overcome this inconsistency. Specifically, inspired by the Tchebycheff method of multi-objective optimization, we formulate a nonlinear Tchebycheff aggregation function that attains the global optimum by tightly constraining the upper bound of the individual action-value bias. We theoretically prove that, without any extra restrictions, the factorized value decomposition with Tchebycheff aggregation satisfies both the sufficiency and the necessity of Individual-Global-Max (IGM), which…
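
The two ideas named in the abstract can be illustrated with a small sketch. It is not the TVDO aggregation function, which this truncated abstract does not specify. It shows (a) the classical weighted Tchebycheff scalarization from multi-objective optimization, and (b) a brute-force check of the IGM condition on a toy two-agent problem. The function names `tchebycheff` and `igm_holds`, the toy Q-tables, and the assumption of unique maximizers are all hypothetical choices made for illustration.

```python
import numpy as np


def tchebycheff(f, w, z_star):
    """Classical weighted Tchebycheff scalarization: max_i w_i * |f_i - z*_i|.

    f      : objective values of one candidate solution
    w      : non-negative weights, one per objective
    z_star : reference (ideal) point, one entry per objective
    NOTE: this is the textbook scalarization, not the paper's aggregation.
    """
    f, w, z_star = (np.asarray(x, dtype=float) for x in (f, w, z_star))
    return float(np.max(w * np.abs(f - z_star)))


def igm_holds(q_tot, q_agents):
    """Check IGM for a single state: the greedy joint action of q_tot must
    equal the tuple of each agent's own greedy action.

    q_tot    : array of shape (|A_1|, ..., |A_n|) with joint action-values
    q_agents : list of n 1-D arrays with per-agent action-values
    Assumes every argmax is unique (ties are not handled).
    """
    joint = np.unravel_index(np.argmax(q_tot), q_tot.shape)
    joint = tuple(int(a) for a in joint)
    individual = tuple(int(np.argmax(q)) for q in q_agents)
    return joint == individual


# Toy per-agent action-values (hypothetical numbers).
q1 = np.array([1.0, 3.0, 2.0])
q2 = np.array([0.5, 0.2, 4.0])

# An additive joint value: its greedy joint action matches the agents' own.
q_tot_additive = q1[:, None] + q2[None, :]

# A joint value whose optimum (0, 0) disagrees with the agents' greedy picks.
q_tot_mismatch = np.array([
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
])

# Largest weighted deviation from the reference point over two objectives.
worst_deviation = tchebycheff(f=[1.0, 4.0], w=[0.5, 0.5], z_star=[0.0, 0.0])
```

In this sketch `igm_holds(q_tot_additive, [q1, q2])` is true and `igm_holds(q_tot_mismatch, [q1, q2])` is false, which is the kind of mismatch between joint and individual greedy actions that the abstract describes as inconsistency. The scalarization penalizes only the worst weighted deviation, which matches the abstract's description of constraining an upper bound rather than a sum.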

Taxonomy

Topics: Reinforcement Learning in Robotics