A Recipe for Stable Offline Multi-agent Reinforcement Learning

Dongsu Lee; Daehee Lee; Amy Zhang

arXiv:2603.08399·cs.LG·March 10, 2026

TL;DR

This paper traces instability in offline multi-agent reinforcement learning to non-linear value decomposition and proposes a normalization technique, scale-invariant value normalization (SVN), to stabilize training and improve performance.

Contribution

The paper analyzes the causes of instability in offline MARL and introduces scale-invariant value normalization (SVN) to stabilize non-linear value decomposition.

Findings

01

SVN stabilizes actor-critic training in offline MARL.

02

Non-linear value decomposition causes value-scale amplification and instability.

03

The proposed recipe enhances offline MARL performance.
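To make the second finding concrete, the following toy sketch contrasts a linear decomposition (VDN, which sums per-agent values) with a small monotonic mixing network. The mixer here is an assumption for illustration, loosely in the style of QMIX (absolute-valued weights, ELU hidden layer); the function names, weights, and inputs are hypothetical and are not taken from the paper. It only shows how a non-linear mixer can inflate the scale of the joint value relative to the per-agent values.

```python
import math


def vdn_mix(agent_qs):
    """Linear value decomposition (VDN): the joint value is the sum of agent values."""
    return sum(agent_qs)


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Toy one-hidden-layer monotonic mixer (illustrative assumption, QMIX-like).

    Taking abs() of every weight keeps the joint value non-decreasing
    in each agent's value.
    """
    hidden = []
    for j in range(len(b1)):
        pre = sum(abs(w1[i][j]) * agent_qs[i] for i in range(len(agent_qs))) + b1[j]
        hidden.append(pre if pre > 0 else math.exp(pre) - 1.0)  # ELU activation
    return sum(abs(w2[j]) * hidden[j] for j in range(len(hidden))) + b2


# Hypothetical per-agent values and mixer weights, chosen by hand.
agent_qs = [1.0, 1.0]
w1 = [[2.0, 2.0], [2.0, 2.0]]
b1 = [0.0, 0.0]
w2 = [2.0, 2.0]
b2 = 0.0

q_linear = vdn_mix(agent_qs)
q_mixed = monotonic_mix(agent_qs, w1, b1, w2, b2)
amplification = q_mixed / q_linear
```

With these hand-picked weights the mixed value is several times larger than the VDN sum, and the gap grows with the weight magnitudes. In training, such weights are learned rather than fixed, which is where scale can drift.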

Abstract

Despite remarkable achievements in single-agent offline reinforcement learning (RL), multi-agent RL (MARL) has struggled to adopt this paradigm, largely persisting with on-policy training and self-play from scratch. One reason for this gap is the instability of non-linear value decomposition, which has led prior works to avoid complex mixing networks in favor of linear value decomposition (e.g., VDN) with value regularization used in single-agent setups. In this work, we analyze the source of instability in non-linear value decomposition within the offline MARL setting. Our observations confirm that non-linear mixing networks induce value-scale amplification and unstable optimization. To alleviate this, we propose a simple technique, scale-invariant value normalization (SVN), that stabilizes actor-critic training without altering the Bellman fixed point. Empirically, we examine the interaction among key…
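The abstract does not give the form of SVN, so the sketch below is not the paper's method. It illustrates one generic way a value signal can be made scale-invariant: divide a batch of Q-values by their mean absolute value before they enter the actor objective, a normalization also used in single-agent methods such as TD3+BC. The function name `scale_normalize` and the sample values are hypothetical.

```python
def scale_normalize(q_values, eps=1e-8):
    """Divide a batch of Q-values by their mean absolute value.

    Illustrative assumption only. In an autodiff framework the scale would
    be treated as a constant (stop-gradient) and applied only in the actor
    loss, so the critic's Bellman target would be left unchanged.
    """
    scale = sum(abs(q) for q in q_values) / len(q_values)
    return [q / (scale + eps) for q in q_values]


# The same batch at two different value scales (hypothetical numbers).
small = scale_normalize([1.0, -2.0, 3.0])
large = scale_normalize([100.0, -200.0, 300.0])

# After normalization the two batches are numerically indistinguishable,
# so an actor update driven by them would not depend on the critic's scale.
max_gap = max(abs(a - b) for a, b in zip(small, large))
```

Because the normalization touches only the signal passed to the actor in this sketch, it is compatible with the abstract's requirement that the Bellman fixed point stay unaltered, though the paper's actual construction may differ.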

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Neural Networks and Reservoir Computing