Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Hai Zhong; Xun Wang; Zhuoran Li; Longbo Huang

arXiv:2410.19450·cs.AI·March 5, 2026

Open Access

TL;DR

This paper introduces OVMSE, a framework for offline-to-online multi-agent reinforcement learning that preserves the Q-values learned offline and explores the joint state-action space more efficiently, improving sample efficiency and performance on the SMAC benchmark.

Contribution

The paper proposes Offline Value Function Memory and Sequential Exploration strategies to address challenges in multi-agent offline-to-online RL, enabling smoother transitions and more efficient exploration.

Findings

01. Outperforms existing methods on the SMAC benchmark

02. Achieves higher sample efficiency and better overall performance

03. Effectively reduces exploration complexity in multi-agent settings

Abstract

Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shift during the transition from the offline phase to the online phase, and (ii) the difficulty of efficient exploration in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory…
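Challenge (ii) above can be made concrete with a little arithmetic. The sketch below is illustrative only: the agent and action counts are hypothetical, not taken from the paper, and the "one agent at a time" count is just one possible reading of sequential exploration, which the text shown here does not spell out. It compares the number of joint actions, which grows exponentially with the number of agents, against the number of (agent, action) pairs when only a single agent varies its action at a time.

```python
# Illustrative arithmetic only; the counts below are hypothetical,
# not values reported in the paper.

def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Joint actions when every agent independently picks one of n_actions."""
    return n_actions ** n_agents


def one_agent_at_a_time_count(n_agents: int, n_actions: int) -> int:
    """(agent, action) pairs when only one agent varies its action at a time.

    Assumption: this is a simplified reading of 'sequential exploration',
    used here purely for comparison.
    """
    return n_agents * n_actions


if __name__ == "__main__":
    n_actions = 10  # hypothetical per-agent action count
    for n_agents in (2, 5, 10):
        print(
            n_agents,
            joint_action_count(n_agents, n_actions),
            one_agent_at_a_time_count(n_agents, n_actions),
        )
```

Under these assumed numbers the joint count grows as a power of the agent count while the one-at-a-time count grows linearly, which is why exploration becomes harder as more agents are added.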

Taxonomy

Topics: Reinforcement Learning in Robotics