A Recipe for Stable Offline Multi-agent Reinforcement Learning
Dongsu Lee, Daehee Lee, Amy Zhang

TL;DR
This paper traces training instability in offline multi-agent reinforcement learning (MARL) to non-linear value decomposition and proposes a normalization technique, scale-invariant value normalization (SVN), to stabilize training and improve performance.
Contribution
The paper analyzes the causes of instability in offline MARL and introduces scale-invariant value normalization (SVN) to stabilize non-linear value decomposition.
Findings
SVN stabilizes actor-critic training in offline MARL.
Non-linear value decomposition causes value-scale amplification and instability.
The proposed recipe enhances offline MARL performance.
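The second finding can be made concrete with a toy example. The sketch below is not taken from the paper. It contrasts a VDN-style sum with a QMIX-style monotonic mixer whose weights are forced to be non-negative with abs(), using hand-picked, hypothetical weights and omitting biases. Because each mixing layer can multiply per-agent values by weights larger than one, the joint value Q_tot can be far larger than the individual utilities that feed it.

```python
import math
from typing import List


def vdn_mix(agent_qs: List[float]) -> float:
    """Linear (VDN-style) decomposition: Q_tot is the plain sum of per-agent values."""
    return sum(agent_qs)


def elu(x: float) -> float:
    """ELU activation, as used in the hidden layer of the QMIX mixer."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(agent_qs: List[float], w1: List[List[float]], w2: List[float]) -> float:
    """Two-layer QMIX-style mixer with biases omitted.

    abs() keeps every weight non-negative, so Q_tot is monotonic in each agent's
    value. In QMIX these weights come from state-conditioned hypernetworks;
    here they are fixed, hypothetical numbers chosen for illustration.
    """
    hidden = [elu(sum(abs(w) * q for w, q in zip(row, agent_qs))) for row in w1]
    return sum(abs(w) * h for w, h in zip(w2, hidden))


agent_qs = [1.0, 1.0, 1.0]
w1 = [[2.0, -2.0, 2.0], [1.5, 1.5, -1.5]]  # hypothetical first-layer weights
w2 = [3.0, -2.0]                          # hypothetical second-layer weights

print(vdn_mix(agent_qs))                # 3.0
print(monotonic_mix(agent_qs, w1, w2))  # 27.0
```

Every agent reports a value of 1.0, yet the mixer returns 27.0 against VDN's 3.0. The gain is set entirely by the mixing weights, so the monotonicity constraint by itself places no bound on the scale of Q_tot.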
Abstract
Despite remarkable achievements in single-agent offline reinforcement learning (RL), multi-agent RL (MARL) has struggled to adopt this paradigm and has largely persisted with on-policy training and self-play from scratch. One reason for this gap is the instability of non-linear value decomposition, which has led prior works to avoid complex mixing networks in favor of linear value decomposition (e.g., VDN) combined with the value regularization used in single-agent setups. In this work, we analyze the source of instability in non-linear value decomposition within the offline MARL setting. Our observations confirm that non-linear mixing networks induce value-scale amplification and unstable optimization. To alleviate this, we propose a simple technique, scale-invariant value normalization (SVN), that stabilizes actor-critic training without altering the Bellman fixed point. Empirically, we examine the interaction among key…
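The abstract states that SVN stabilizes training without altering the Bellman fixed point, but this summary does not give its formula. As a hedged illustration of how a normalization can have that property, the sketch below divides the TD error by a strictly positive estimate of the value scale, which a real trainer would treat as a constant (stop-gradient). The function names and the mean-absolute-target scale estimator are assumptions for illustration; this is not the authors' SVN. Since the divisor is positive, the loss is zero exactly when Q_tot equals its Bellman target, so the fixed point is unchanged, while the loss magnitude no longer grows with the overall value scale.

```python
import math
from typing import List


def bellman_targets(rewards: List[float], next_q_tot: List[float],
                    dones: List[float], gamma: float = 0.99) -> List[float]:
    """One-step TD targets y = r + gamma * (1 - done) * Q_tot(s')."""
    return [r + gamma * (1.0 - d) * q for r, q, d in zip(rewards, next_q_tot, dones)]


def value_scale(targets: List[float], eps: float = 1e-6) -> float:
    """Hypothetical scale estimate: mean |target|, floored at eps so it stays > 0.
    In a real trainer this would be detached from the gradient."""
    return max(sum(abs(y) for y in targets) / len(targets), eps)


def normalized_td_loss(q_tot: List[float], targets: List[float]) -> float:
    """Mean squared TD error measured in units of the value scale."""
    s = value_scale(targets)
    return sum(((y - q) / s) ** 2 for q, y in zip(q_tot, targets)) / len(q_tot)


targets = bellman_targets([1.0, 0.0, 2.0], [10.0, 5.0, 0.0], [0.0, 0.0, 1.0])
print(normalized_td_loss(targets, targets))  # 0.0 at the Bellman fixed point

# Multiply every reward and value by 100: the normalized loss is unchanged.
base = normalized_td_loss([10.0, 5.0, 1.0], targets)
scaled_targets = bellman_targets([100.0, 0.0, 200.0], [1000.0, 500.0, 0.0], [0.0, 0.0, 1.0])
scaled = normalized_td_loss([1000.0, 500.0, 100.0], scaled_targets)
print(math.isclose(base, scaled))  # True
```

An unnormalized squared TD loss on the same data would grow by a factor of 10,000; dividing by the scale removes that dependence without moving the point where the loss vanishes.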
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Neural Networks and Reservoir Computing
