Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

Jianzhun Shao; Yun Qu; Chen Chen; Hongchang Zhang; Xiangyang Ji

arXiv:2309.12696·cs.AI·September 25, 2023·2 cites

TL;DR

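This paper introduces CFCQL, an offline multi-agent reinforcement learning algorithm that computes a conservative value regularizer for each agent separately, in a counterfactual manner, and linearly combines the per-agent terms. The approach targets the out-of-distribution action and value overestimation problems that become severe in high-dimensional multi-agent settings, and it comes with theoretical guarantees.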

Contribution

CFCQL is the first method to perform agent-wise conservative regularization in a counterfactual way, with performance guarantees independent of the number of agents.

Findings

01  CFCQL outperforms existing methods on most datasets.

02  It maintains theoretical properties similar to those of single-agent conservative methods.

03  It is effective in both discrete and continuous action environments.

Abstract

Offline multi-agent reinforcement learning is challenging because the distribution shift common in the offline setting is coupled with the high dimensionality common in the multi-agent setting, making the action out-of-distribution (OOD) and value overestimation phenomena excessively severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct conservative value estimation. Rather than regarding all the agents as a single high-dimensional agent and directly applying single-agent methods to it, CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee as those single agent…
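
The per-agent, counterfactual regularization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the CQL-style log-sum-exp form of the penalty, the uniform default weights, and all names (`counterfactual_penalty`, `joint_penalty`, `q`, `n_actions`) are assumptions made for illustration. The sketch contrasts a penalty that varies one agent's action at a time, keeping the other agents at their dataset actions, with a penalty computed over the full joint action space.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def counterfactual_penalty(q, joint_action, n_actions, weights=None):
    """Per-agent conservative penalty for one state (illustrative only).

    q            -- hypothetical callable: joint action tuple -> Q-value
    joint_action -- the joint action stored in the dataset for this state
    n_actions    -- number of discrete actions per agent (assumed equal)
    weights      -- linear combination weights; uniform if omitted (assumption)
    """
    n_agents = len(joint_action)
    if weights is None:
        weights = [1.0 / n_agents] * n_agents
    q_data = q(joint_action)
    total = 0.0
    for i in range(n_agents):
        # Counterfactual step: vary only agent i's action and keep every
        # other agent fixed at its dataset action.
        qs = [
            q(joint_action[:i] + (a,) + joint_action[i + 1:])
            for a in range(n_actions)
        ]
        total += weights[i] * (logsumexp(qs) - q_data)
    return total


def joint_penalty(q, joint_action, n_actions):
    """Baseline: treat all agents as one agent over the joint action space."""
    n_agents = len(joint_action)
    qs = [q(a) for a in product(range(n_actions), repeat=n_agents)]
    return logsumexp(qs) - q(joint_action)
```

Under these assumptions, a constant Q-function gives a per-agent penalty of log(n_actions) for any number of agents, while the joint penalty equals n_agents * log(n_actions). That difference is one informal way to see why a per-agent formulation can avoid scaling with the number of agents; the paper's actual guarantees are stated in the paper itself.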

Code & Models

Repositories

thu-rllab/CFCQL (PyTorch, official)

Videos

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification

Methods: Q-Learning