Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
Claire Chen, Yuheng Zhang

TL;DR
This paper introduces KL regularization as a simple yet effective approach for offline multi-agent reinforcement learning in general-sum games, eliminating the need for manual pessimistic penalties.
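One way to see why KL regularization can replace hand-built pessimism is the single-agent case. Maximizing expected reward minus β·KL(π‖π_ref) has a closed-form Gibbs solution, π(a) ∝ π_ref(a)·exp(r(a)/β). Any action with zero probability under π_ref therefore keeps zero probability, however large its estimated reward. The minimal Python sketch below assumes a finite action set, a known reward vector, and a reference policy equal to the behavior policy that generated the logged data. These are assumptions made for illustration only, and the function name and the temperature β are hypothetical rather than taken from the paper.

```python
import math

def kl_regularized_best_response(rewards, ref_policy, beta):
    """Maximize sum_a pi(a) * r(a) - beta * KL(pi || ref_policy) over the simplex.

    The maximizer is the Gibbs distribution pi(a) ∝ ref(a) * exp(r(a) / beta).
    Actions with ref(a) == 0 keep zero probability (their KL cost would be infinite).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Work in log space and subtract the max logit for numerical stability.
    logits = [
        math.log(p) + r / beta if p > 0 else -math.inf
        for r, p in zip(rewards, ref_policy)
    ]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: the behavior policy never plays the third action,
# even though its (possibly spurious) estimated reward is the largest.
ref = [0.5, 0.5, 0.0]
r = [1.0, 0.0, 10.0]
pi = kl_regularized_best_response(r, ref, beta=1.0)
print([round(p, 3) for p in pi])  # [0.731, 0.269, 0.0]
```

Smaller β lets the policy move further from the reference toward high-reward actions. Larger β keeps it closer to the logged behavior.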
Contribution
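It proposes two algorithms: GANE, which recovers regularized Nash equilibria at an accelerated statistical rate, and GAMD, a computationally tractable iterative method that converges to a Coarse Correlated Equilibrium at the standard rate. Together they demonstrate that KL regularization is effective on its own.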
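A regularized Nash equilibrium can be made concrete with a two-player general-sum matrix game whose payoffs are known. Each player maximizes its expected payoff minus β·KL to its own reference policy. An equilibrium is a pair of policies in which each one is the Gibbs (KL-regularized) best response to the other. The sketch below is not the paper's GANE estimator, which works from offline data. The payoff matrices, β, the damping `step`, and the function names are all hypothetical. The damped fixed-point iteration is only a simple heuristic. It converges when β is large relative to the payoff scale, because the best-response map is then a contraction, but it is not guaranteed to converge in every game.

```python
import math

def gibbs(scores, ref, beta):
    """KL-regularized best response: pi(a) ∝ ref(a) * exp(score(a) / beta)."""
    logits = [math.log(p) + s / beta if p > 0 else -math.inf for s, p in zip(scores, ref)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def regularized_nash(A, B, ref_x, ref_y, beta, iters=2000, step=0.5, tol=1e-10):
    """Damped fixed-point search for a KL-regularized Nash equilibrium of a bimatrix game.

    Row player maximizes    x^T A y - beta * KL(x || ref_x)
    Column player maximizes x^T B y - beta * KL(y || ref_y)
    """
    x, y = list(ref_x), list(ref_y)
    for _ in range(iters):
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        row_scores = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
        col_scores = [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
        bx, by = gibbs(row_scores, ref_x, beta), gibbs(col_scores, ref_y, beta)
        # Damped update: move a fraction `step` toward the simultaneous best responses.
        new_x = [(1 - step) * a + step * b for a, b in zip(x, bx)]
        new_y = [(1 - step) * a + step * b for a, b in zip(y, by)]
        change = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y
        if change < tol:
            break
    return x, y

# Prisoner's Dilemma payoffs (action 0 = cooperate, action 1 = defect).
A = [[3.0, 0.0], [5.0, 1.0]]  # row player
B = [[3.0, 5.0], [0.0, 1.0]]  # column player
x, y = regularized_nash(A, B, [0.5, 0.5], [0.5, 0.5], beta=1.0)
print(x, y)
```

With uniform reference policies, both players lean toward defection but do not play it deterministically. As β grows, the equilibrium moves back toward the reference policies.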
Findings
GANE recovers regularized Nash equilibria at rate ~O(1/n)
GAMD converges to Coarse Correlated Equilibrium at rate ~O(1/√n + 1/T)
KL regularization suffices for stable, pessimism-free offline learning in multi-agent settings.
Abstract
Offline multi-agent reinforcement learning in general-sum settings is challenged by the distribution shift between logged datasets and target equilibrium policies. While standard methods rely on manual pessimistic penalties, we demonstrate that KL regularization suffices to stabilize learning and achieve equilibrium recovery. We propose General-sum Anchored Nash Equilibrium (GANE), which recovers regularized Nash equilibria at an accelerated statistical rate of ~O(1/n). For computational tractability, we develop General-sum Anchored Mirror Descent (GAMD), an iterative algorithm converging to a Coarse Correlated Equilibrium at the standard rate of ~O(1/√n + 1/T). These results establish KL regularization as a standalone mechanism for pessimism-free offline learning that achieves equivalent or accelerated rates in multi-player general-sum games.
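A standard fact helps explain why an iterative mirror-descent method like GAMD targets a Coarse Correlated Equilibrium rather than a Nash equilibrium. When every player runs a no-regret mirror-descent update, the time-averaged joint play is an approximate CCE. Its CCE gap equals the largest average regret among the players. The sketch below uses plain Hedge, which is mirror ascent with the negative-entropy mirror map, on known payoffs in [0, 1]. It is a swapped-in textbook method, not the paper's anchored GAMD update, whose details are not given here. The game, step size `eta`, horizon `T`, and function names are hypothetical.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def hedge_joint_distribution(A, B, T=5000, eta=0.05):
    """Both players run Hedge on full-information payoffs (assumed to lie in [0, 1]).

    Returns sigma[i][j] = (1/T) * sum_t x_t[i] * y_t[j], the time-averaged joint play.
    """
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    sigma = [[0.0] * m for _ in range(n)]
    for _ in range(T):
        for i in range(n):
            for j in range(m):
                sigma[i][j] += x[i] * y[j] / T
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        gx = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        gy = [sum(B[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Multiplicative-weights step = mirror ascent with the negative-entropy mirror map.
        x = normalize([xi * math.exp(eta * g) for xi, g in zip(x, gx)])
        y = normalize([yj * math.exp(eta * g) for yj, g in zip(y, gy)])
    return sigma

def cce_gap(A, B, sigma):
    """Largest gain any player gets by committing to one fixed action instead of following sigma."""
    n, m = len(A), len(A[0])
    row_marg = [sum(sigma[i]) for i in range(n)]
    col_marg = [sum(sigma[i][j] for i in range(n)) for j in range(m)]
    value_row = sum(sigma[i][j] * A[i][j] for i in range(n) for j in range(m))
    value_col = sum(sigma[i][j] * B[i][j] for i in range(n) for j in range(m))
    best_row = max(sum(A[i][j] * col_marg[j] for j in range(m)) for i in range(n))
    best_col = max(sum(B[i][j] * row_marg[i] for i in range(n)) for j in range(m))
    return max(best_row - value_row, best_col - value_col, 0.0)

# Hypothetical 3x3 general-sum game with payoffs in [0, 1].
A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
B = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sigma = hedge_joint_distribution(A, B)
print(cce_gap(A, B, sigma))
```

Hedge's regret bound gives a CCE gap of at most ln(k)/(eta·T) + eta/8 for k actions. With a tuned step size this scales as O(√(ln k / T)), so the sketch does not reproduce the 1/T optimization term reported for GAMD. It also ignores the statistical error from estimating payoffs from offline data, which is the source of the 1/√n term.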
