Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Qi Tian; Kun Kuang; Furui Liu; Baoxiang Wang

arXiv:2211.15612 · cs.LG · March 2, 2023



TL;DR

This paper introduces a novel offline multi-agent reinforcement learning framework that leverages shared good trajectories and attention mechanisms to improve policy learning in diverse data quality scenarios.

Contribution

The paper proposes the Shared Individual Trajectories (SIT) framework, utilizing attention-based reward decomposition and graph attention networks to enhance offline MARL performance.
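To make the idea of attention-based reward decomposition concrete, here is a minimal sketch of how a single global team reward could be split into per-agent shares with dot-product attention. It is not the paper's network: the function names, the fixed `query` vector, the hand-written `agent_features`, and the constraint that the shares sum to the global reward are all assumptions made for illustration. In a learned model the query and features would come from trained networks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decompose_reward(global_reward, agent_features, query):
    """Split a team reward into per-agent shares via dot-product attention.

    Hypothetical helper: each agent's share is its attention weight times
    the global reward, so the shares always sum to the global reward.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, feats)) / scale
        for feats in agent_features
    ]
    weights = softmax(scores)
    return [w * global_reward for w in weights]


# Hypothetical per-agent feature vectors (assumed, not from the paper).
agent_features = [
    [0.1, 0.0],  # agent 0: stands in for a poorly performing agent
    [0.9, 0.8],  # agent 1
    [1.0, 0.7],  # agent 2
]
query = [1.0, 1.0]  # fixed here; a learned quantity in a real model

rewards = decompose_reward(3.0, agent_features, query)
print(rewards)
```

With per-agent rewards of this kind, each agent's trajectory can be scored on its own rather than inheriting the team's global reward, which is the kind of disparity in individual trajectory quality the framework targets.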

Findings

01. Significantly improved results on complex offline multi-agent datasets.

02. Effective handling of data quality disparities among individual trajectories.

03. Demonstrated success in both discrete and continuous control environments.

Abstract

Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward the deployment of multi-agent systems in real-world applications. However, in practice, the individual behavior policies that generate multi-agent joint trajectories usually perform at different levels, e.g., one agent follows a random policy while the other agents follow medium policies. In a cooperative game with a global reward, an agent learned by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network…


Videos

Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning · underline

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Experience Replay · Prioritized Experience Replay