Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration
Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

TL;DR
This paper introduces OVMSE, a novel framework for offline-to-online multi-agent reinforcement learning that preserves learned knowledge and enhances exploration, leading to improved efficiency and performance in complex multi-agent environments.
Contribution
The paper proposes Offline Value Function Memory and Sequential Exploration strategies to address challenges in multi-agent offline-to-online RL, enabling smoother transitions and more efficient exploration.
Findings
Outperforms existing methods on the SMAC benchmark
Achieves higher sample efficiency and better overall performance
Effectively reduces exploration complexity in multi-agent settings
Abstract
Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shift during the transition from the offline phase to the online phase, and (ii) the difficulty of exploring efficiently in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory…
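To make challenge (ii) concrete, the short Python sketch below counts how fast the joint action space grows with the number of agents. It is only an illustration of the scaling argument, not the paper's method: the function names are hypothetical, and the "single deviation" count is just one simple way to see how restricting which agents explore can shrink the search space. The truncated abstract does not specify how Sequential Exploration actually works.

```python
def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Number of joint actions when each of n_agents picks one of n_actions."""
    return n_actions ** n_agents


def single_deviation_count(n_agents: int, n_actions: int) -> int:
    """Joint actions reachable when at most one agent deviates from a fixed
    baseline joint action (illustrative assumption, not the paper's procedure)."""
    # Each agent has (n_actions - 1) alternatives; +1 for the baseline itself.
    return n_agents * (n_actions - 1) + 1


if __name__ == "__main__":
    # Compare exponential growth against linear growth for 10 actions per agent.
    for n in (2, 5, 10):
        print(n, joint_action_count(n, 10), single_deviation_count(n, 10))
```

The first count grows exponentially in the number of agents, while the second grows only linearly, which is why unrestricted joint exploration becomes harder as more agents are added.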
Taxonomy
Topics: Reinforcement Learning in Robotics
