Beyond Pessimism: Offline Learning in KL-regularized Games

Yuheng Zhang; Claire Chen; Nan Jiang

arXiv:2604.06738 · cs.GT · May 11, 2026

TL;DR

This paper introduces a pessimism-free offline learning algorithm for KL-regularized two-player zero-sum games that attains a fast $O(1/n)$ statistical rate, and pairs it with an efficient self-play policy optimization algorithm.

Contribution

The paper develops what is, to the authors' knowledge, the first pessimism-free offline learning guarantee for KL-regularized games, with a fast $O(1/n)$ sample complexity bound.

Findings

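01 Achieves an $O(1/n)$ sample complexity bound without pessimistic value estimation.

02 Introduces a self-play policy optimization algorithm, with theoretical guarantees, that replaces exact equilibrium computation with iterative KL-regularized policy updates.

03 Provides, to the authors' knowledge, the first pessimism-free guarantee for offline learning in KL-regularized games.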

Abstract

We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized with respect to a fixed reference policy through KL regularization. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $O(1/\sqrt{n})$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the Nash equilibrium induced by skew symmetry. This yields, to our knowledge, the first pessimism-free offline learning guarantee for KL-regularized games, with a fast $O(1/n)$ sample complexity bound. We further propose an efficient self-play policy optimization algorithm that replaces exact equilibrium computation with iterative KL-regularized policy updates, and prove that…
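To make the two ingredients named in the abstract concrete, here is a minimal sketch of a KL-regularized best response and of iterative KL-regularized self-play updates. It is an illustration under simplifying assumptions, not the paper's algorithm: the game is a small matrix game with a known payoff matrix (so there is no offline data or estimation step), and the function names, the damped log-space update rule, the step size `eta`, and the reference policies `mu` and `nu` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def kl_best_response(payoff, ref, beta):
    """Maximizer of <p, payoff> - beta * KL(p || ref) over the simplex.

    The closed form is p(a) proportional to ref(a) * exp(payoff(a) / beta),
    which is smooth in `payoff` for any beta > 0.
    """
    return softmax(np.log(ref) + payoff / beta)

def self_play(A, mu, nu, beta, eta=0.1, iters=2000):
    """Damped simultaneous self-play updates (an assumed, illustrative rule).

    The max player holds x, the min player holds y, and the payoff is x^T A y.
    Each step moves the log-policy a fraction eta * beta of the way toward
    the log of the unnormalized KL-regularized best response.
    """
    log_x, log_y = np.log(mu), np.log(nu)
    for _ in range(iters):
        # Both players update against the opponent's current policy.
        x, y = softmax(log_x), softmax(log_y)
        log_x = (1 - eta * beta) * log_x + eta * (beta * np.log(mu) + A @ y)
        log_y = (1 - eta * beta) * log_y + eta * (beta * np.log(nu) - A.T @ x)
    return softmax(log_x), softmax(log_y)

# Rock-paper-scissors payoff for the max player (skew-symmetric matrix).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
mu = np.array([0.5, 0.3, 0.2])  # hypothetical reference policy, max player
nu = np.array([0.2, 0.3, 0.5])  # hypothetical reference policy, min player
beta = 1.0

x, y = self_play(A, mu, nu, beta)

# At a regularized equilibrium each policy is a best response to the other.
gap_x = np.abs(x - kl_best_response(A @ y, mu, beta)).max()
gap_y = np.abs(y - kl_best_response(-A.T @ x, nu, beta)).max()
print("policies:", x, y)
print("best-response gaps:", gap_x, gap_y)
```

The best-response gaps measure how far each policy is from the closed-form KL-regularized best response to its opponent. In this small example the iteration is a contraction for the chosen `beta` and `eta`, so both gaps shrink toward zero; smaller `beta` or larger `eta` may need a different update rule.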
