Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji

TL;DR
This paper introduces CFCQL, a novel offline multi-agent reinforcement learning algorithm that applies conservative value estimation per agent in a counterfactual manner, improving performance and theoretical guarantees in high-dimensional multi-agent settings.
Contribution
CFCQL is the first method to perform agent-wise conservative regularization in a counterfactual way, with performance guarantees independent of the number of agents.
Findings
CFCQL outperforms existing methods on most datasets.
It maintains theoretical properties similar to single-agent conservative methods.
It demonstrates effectiveness in both discrete and continuous action environments.
Abstract
Offline multi-agent reinforcement learning is challenging due to the coupling of the distribution shift issue common in the offline setting and the high-dimensionality issue common in the multi-agent setting, which makes the out-of-distribution (OOD) action and value overestimation phenomena excessively severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct conservative value estimation. Rather than regarding all the agents as a single high-dimensional agent and directly applying single-agent methods to it, CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee as those single agent…
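The per-agent, counterfactual regularization described in the abstract can be illustrated with a minimal sketch. It assumes a CQL-style penalty (log-sum-exp of Q minus the Q-value of the dataset action), a tabular joint Q-array for a single state, and uniform combination weights. The function names and these modelling choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x))) over a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def joint_penalty(q_joint, data_action):
    """Baseline for comparison: treat all agents as one high-dimensional agent.

    q_joint: array of shape (A_1, ..., A_n) with joint Q-values at one state.
    data_action: tuple of n ints, the joint action stored in the dataset.
    """
    return logsumexp(q_joint.ravel()) - q_joint[tuple(data_action)]


def counterfactual_penalty(q_joint, data_action, weights=None):
    """Hypothetical per-agent penalty, linearly combined across agents.

    For agent i, only agent i's action is varied; the other agents keep
    their dataset actions (the counterfactual part). Uniform weights are
    an assumption made for this sketch.
    """
    n = q_joint.ndim
    if weights is None:
        weights = np.full(n, 1.0 / n)
    q_data = q_joint[tuple(data_action)]
    penalty = 0.0
    for i in range(n):
        idx = list(data_action)
        idx[i] = slice(None)          # sweep agent i's actions only
        q_slice = q_joint[tuple(idx)]  # shape (A_i,)
        penalty += weights[i] * (logsumexp(q_slice) - q_data)
    return penalty


# Two agents with three actions each and an all-zero Q-table.
q = np.zeros((3, 3))
print(counterfactual_penalty(q, (0, 1)))  # equals log(3)
print(joint_penalty(q, (0, 1)))           # equals log(9)
```

In this toy case the joint penalty scales with the size of the joint action space (log 9), while the combined per-agent penalty scales only with a single agent's action space (log 3). This is the intuition behind regularizing each agent separately, under the assumptions stated above.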
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Q-Learning
