On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

Kaixuan Ji; Qiwei Di; Heyang Zhao; Qingyue Zhao; Quanquan Gu

arXiv:2605.02141·cs.LG·May 5, 2026

TL;DR

This paper characterizes the sample complexity of offline multi-armed bandits with KL regularization, providing matching upper and lower bounds across different regularization regimes.

Contribution

The paper gives a nearly complete characterization of the sample complexity of KL-regularized offline MABs: a sharp upper-bound analysis of KL-PCB together with matching lower bounds across regularization strengths.

Findings

01

KL-PCB achieves a sample complexity of $\tilde{O} (η S A C^{π^{*}} / ϵ)$ under large regularization, $η = \tilde{O} (ϵ^{- 1})$.

02

KL-PCB achieves a sample complexity of $\tilde{Ω} (S A C^{π^{*}} / ϵ^{2})$ under small regularization, $η = \tilde{Ω} (ϵ^{- 1})$.

03

Provides matching lower bounds over all regularization strengths.

Abstract

Kullback-Leibler (KL) regularization is widely used in offline decision-making and offers several benefits, motivating recent work on the sample complexity of offline learning with respect to KL-regularized performance metrics. Nevertheless, the exact sample complexity of KL-regularized offline learning remains far from fully characterized. In this paper, we study this question in the setting of multi-armed bandits (MABs). We provide a sharp analysis of KL-PCB (Zhao et al., 2026), showing that it achieves a sample complexity of $\tilde{O} (η S A C^{π^{*}} / ϵ)$ under large regularization $η = \tilde{O} (ϵ^{- 1})$, and a sample complexity of $\tilde{Ω} (S A C^{π^{*}} / ϵ^{2})$ under small regularization $η = \tilde{Ω} (ϵ^{- 1})$, where $η$ is the regularization parameter, $S$ is the number of contexts, $A$ is the number of arms, $C^{π^{*}}$ policy…
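To make the two regimes concrete, the following is a minimal numeric sketch of the rates quoted in the abstract, with constants and logarithmic factors dropped. The function names and the hard switch at exactly $η = 1/ϵ$ are illustrative assumptions; the abstract states the regime boundary only up to order. The variable `C` stands in for $C^{π^{*}}$.

```python
def rate_large_regularization(eta, S, A, C, eps):
    """eta * S * A * C / eps: rate quoted for eta = O~(1/eps).

    Constants and log factors are dropped (assumption for illustration).
    """
    return eta * S * A * C / eps


def rate_small_regularization(S, A, C, eps):
    """S * A * C / eps^2: rate quoted for eta = Omega~(1/eps).

    Constants and log factors are dropped (assumption for illustration).
    """
    return S * A * C / eps ** 2


def quoted_rate(eta, S, A, C, eps):
    """Pick the quoted rate for the regime that eta falls in.

    The threshold eta == 1/eps is a hypothetical hard cutoff; the abstract
    gives it only up to order.
    """
    if eta <= 1.0 / eps:
        return rate_large_regularization(eta, S, A, C, eps)
    return rate_small_regularization(S, A, C, eps)


if __name__ == "__main__":
    S, A, C, eps = 2, 3, 1.5, 0.25
    for eta in (1.0, 4.0, 100.0):
        print(eta, quoted_rate(eta, S, A, C, eps))
```

At $η = 1/ϵ$ the two expressions coincide, so the quoted rate grows linearly in $η$ up to that point and is flat in $η$ beyond it.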
