Offline Two-Player Zero-Sum Markov Games with KL Regularization

Claire Chen; Yuheng Zhang; Xinyu Liu; Zixuan Xie; Shuze Daniel Liu; Nan Jiang

arXiv:2605.13025 · cs.LG · May 14, 2026

TL;DR

This paper demonstrates that KL regularization alone can stabilize offline learning of Nash equilibria in two-player zero-sum Markov games, achieving faster convergence rates than previous methods.

Contribution

The paper introduces the ROSE framework and SOS-MD algorithm, showing improved convergence rates for offline two-player zero-sum Markov games using KL regularization.

Findings

01

ROSE achieves $\tilde{O}(1/n)$ convergence under unilateral concentrability.

02

SOS-MD attains the same $\tilde{O}(1/n)$ rate with a vanishing optimization error.

03

KL regularization suffices for stabilization without explicit pessimism.

Abstract

We study the problem of learning Nash equilibria in offline two-player zero-sum Markov games. While existing approaches often rely on explicit pessimism to address distribution shift, we show that KL regularization alone suffices to stabilize learning and guarantee convergence. We first introduce Regularized Offline Sequential Equilibrium (ROSE), a theoretical framework that achieves a fast $\tilde{O}(1/n)$ convergence rate under unilateral concentrability, improving over the standard $\tilde{O}(1/\sqrt{n})$ rates in unregularized settings. We then propose Sequential Offline Self-play Mirror Descent (SOS-MD), a practical model-free algorithm based on least-squares value estimation and iterative self-play updates. We prove that the last iterate of SOS-MD attains the same $\tilde{O}(1/n)$ statistical rate up to a vanishing optimization…
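To give a feel for what KL-regularized self-play mirror descent looks like, here is a minimal sketch on a single-state matrix game with a known payoff matrix. It is not ROSE or SOS-MD: there is no offline dataset, no least-squares value estimation, and no multi-step Markov structure. The objective form $x^\top A y - \beta\,\mathrm{KL}(x\|\mu) + \beta\,\mathrm{KL}(y\|\nu)$, the reference policies `mu` and `nu`, the step size `eta`, and all function names are assumptions chosen for illustration.

```python
import math


def normalize(weights):
    """Rescale nonnegative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_md_step(policy, reference, payoff_grad, eta, beta):
    """One mirror-descent step on a KL-regularized linear objective.

    Maximizes eta * (<payoff_grad, p> - beta * KL(p || reference))
    minus KL(p || policy) over the simplex, which has a closed form.
    Assumes policy and reference are strictly positive (hypothetical setup).
    """
    power = 1.0 / (1.0 + eta * beta)
    logits = [
        power * (math.log(p) + eta * beta * math.log(r) + eta * g)
        for p, r, g in zip(policy, reference, payoff_grad)
    ]
    top = max(logits)  # subtract the max for numerical stability
    return normalize([math.exp(v - top) for v in logits])


def solve_regularized_game(A, mu, nu, beta=1.0, eta=0.1, steps=500):
    """Simultaneous self-play on a matrix game; returns the last iterate."""
    rows, cols = len(A), len(A[0])
    x, y = list(mu), list(nu)  # start both players at their reference policies
    for _ in range(steps):
        # The row player maximizes x^T A y; the column player minimizes it.
        grad_x = [sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        grad_y = [-sum(A[i][j] * x[i] for i in range(rows)) for j in range(cols)]
        x, y = (
            kl_md_step(x, mu, grad_x, eta, beta),
            kl_md_step(y, nu, grad_y, eta, beta),
        )
    return x, y


# Matching pennies with non-uniform reference policies (illustrative values).
A = [[1.0, -1.0], [-1.0, 1.0]]
x, y = solve_regularized_game(A, mu=[0.7, 0.3], nu=[0.4, 0.6])
print(x, y)
```

At a fixed point of this update, each policy equals its reference policy reweighted by the exponentiated payoff divided by `beta`, which is the regularized best response to the opponent. A larger `beta` keeps both players closer to `mu` and `nu`.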
