Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning
Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang

TL;DR
This paper introduces a novel offline multi-agent reinforcement learning framework that leverages shared good trajectories and attention mechanisms to improve policy learning in diverse data quality scenarios.
Contribution
The paper proposes the Shared Individual Trajectories (SIT) framework, utilizing attention-based reward decomposition and graph attention networks to enhance offline MARL performance.
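To make the idea of attention-based reward decomposition concrete, here is a minimal sketch, not the paper's implementation. It assumes that each agent has a scalar attention score and that the global reward is split in proportion to the softmax of those scores. The function names `softmax` and `decompose_reward`, and the example scores, are hypothetical; in a learned system the scores would come from a trained network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decompose_reward(global_reward, agent_scores):
    """Split a global reward into per-agent rewards.

    Assumption (illustrative only): each agent's share is its
    softmax attention weight times the global reward, so the
    individual rewards always sum back to the global reward.
    """
    weights = softmax(agent_scores)
    return [w * global_reward for w in weights]


# Hypothetical scores: agent 0 (e.g. a random policy) contributes less
# than agents 1 and 2 (e.g. medium policies).
individual = decompose_reward(10.0, [0.0, 1.0, 1.0])
```

Under this assumption, an agent with a lower score receives a smaller share of the team reward, which is one way to tell good individual trajectories apart from poor ones when only a global reward is observed.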
Findings
Significantly improved results on complex offline multi-agent datasets.
Effective handling of data quality disparities among individual trajectories.
Demonstrated success in both discrete and continuous control environments.
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, which is an important step toward the deployment of multi-agent systems in real-world applications. However, in practice, the individual behavior policies that generate multi-agent joint trajectories usually differ in how well they perform; for example, one agent may follow a random policy while the other agents follow medium policies. In a cooperative game with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network…
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Experience Replay · Prioritized Experience Replay
