Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

Qingyue Zhao; Kaixuan Ji; Heyang Zhao; Quanquan Gu

arXiv:2605.09214 · cs.LG · May 12, 2026

TL;DR

This paper establishes the first $\tilde{O}(\epsilon^{-1})$ sample complexity bounds for offline contextual bandits with forward-KL regularization, improving upon previous $\tilde{O}(\epsilon^{-2})$ rates.
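Read as sample sizes for a target suboptimality $\epsilon$, and ignoring the constants, logarithmic factors, and concentrability coefficients that the bounds hide, the two rates scale as follows:

```latex
n_{\text{fast}}(\epsilon) = \tilde{O}\big(\epsilon^{-1}\big), \qquad
n_{\text{slow}}(\epsilon) = \tilde{O}\big(\epsilon^{-2}\big), \qquad
\epsilon = 10^{-2}:\ \ \epsilon^{-1} = 10^{2}\ \ \text{vs.}\ \ \epsilon^{-2} = 10^{4}.
```

Only the scaling in $\epsilon$ is meaningful here; the hidden constants of the two bounds need not match.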

Contribution

It introduces a novel convex-analytical approach to analyzing forward-KL-regularized offline contextual bandits under single-policy concentrability, together with lower bounds showing that the resulting rates are tight.
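For orientation, a standard single-context computation shows the convex structure of a forward-KL-regularized objective. It assumes a reward vector $r$, a reference policy $\pi_{\mathrm{ref}}$ with full support, and a regularization weight $\beta > 0$ (symbols chosen here for illustration). This is a textbook first-order calculation, not the paper's analysis.

```latex
\max_{\pi \in \Delta(\mathcal{A})} \; \sum_{a} \pi(a)\, r(a) \;-\; \beta\, \mathrm{KL}\big(\pi_{\mathrm{ref}} \,\|\, \pi\big),
\qquad \mathrm{KL}\big(\pi_{\mathrm{ref}} \,\|\, \pi\big) = \sum_{a} \pi_{\mathrm{ref}}(a) \log \frac{\pi_{\mathrm{ref}}(a)}{\pi(a)}.

% Stationarity with multiplier \lambda for the constraint \sum_a \pi(a) = 1:
r(a) + \beta\, \frac{\pi_{\mathrm{ref}}(a)}{\pi(a)} - \lambda = 0
\;\Longrightarrow\;
\pi^{\star}(a) = \frac{\beta\, \pi_{\mathrm{ref}}(a)}{\lambda - r(a)},
\qquad \lambda > \max_a r(a), \quad \sum_a \pi^{\star}(a) = 1.

% Contrast: reverse KL, \mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}}), gives the Gibbs form
\pi^{\star}(a) \propto \pi_{\mathrm{ref}}(a)\, \exp\!\big(r(a)/\beta\big).
```

Because $\log \pi(a)$ is concave, the objective is concave in $\pi$ and the stationary point is the maximizer. Unlike the reverse-KL Gibbs form, the multiplier $\lambda$ generally has no closed form and must be found as the root of a one-dimensional monotone equation.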

Findings

01

First $\tilde{O}(\epsilon^{-1})$ upper bounds for forward-KL-regularized offline contextual bandits (CBs).

02

Unified analysis framework that bypasses previous proof routines based on the mean value theorem.

03

Rate-optimal lower bounds demonstrating the tightness of the upper bounds.

Abstract

Kullback-Leibler (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of reverse or forward KL. Recent studies have demonstrated $\epsilon^{-1}$-type fast rates for decision making under reverse-KL regularization, in contrast to the standard $\epsilon^{-2}$-type sample complexity. However, for forward-KL-regularized objectives, existing statistical analyses are either not applicable or result in $\tilde{O}(\epsilon^{-2})$ slow rates. We take the first step towards addressing this problem via a streamlined analysis of forward-KL-regularized offline contextual bandits (CBs). We give the first $\tilde{O}(\epsilon^{-1})$ upper bounds in tabular and general function approximation settings, both under notions of single-policy concentrability. In particular, our convex-analytical pipeline unifies these settings by exploiting the pessimism principle in a…
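A minimal numerical sketch of how pessimism and forward-KL regularization can be combined in the tabular, single-context case. It assumes rewards in [0, 1], a reference policy with full support, and a hypothetical count-based lower-confidence-bound (LCB) bonus of the form `bonus_scale / sqrt(n)`. The names `pessimistic_rewards`, `forward_kl_policy`, `beta`, and `bonus_scale` are illustrative, and the statistics in the usage lines are made up; this is a generic plug-in scheme, not the paper's algorithm, whose details the abstract does not give.

```python
import numpy as np


def pessimistic_rewards(reward_sums, counts, bonus_scale):
    """Hypothetical count-based lower confidence bound (LCB) on per-action rewards.

    Assumes rewards lie in [0, 1]. The estimate is the empirical mean minus
    bonus_scale / sqrt(count); actions absent from the offline data get the
    worst-case value 0.
    """
    s = np.asarray(reward_sums, dtype=float)
    n = np.asarray(counts, dtype=float)
    safe_n = np.maximum(n, 1.0)  # avoid division by zero for unseen actions
    lcb = s / safe_n - bonus_scale / np.sqrt(safe_n)
    lcb = np.where(n > 0, lcb, 0.0)
    return np.clip(lcb, 0.0, 1.0)


def forward_kl_policy(rewards, ref_policy, beta, tol=1e-12, max_iter=200):
    """Maximize sum_a pi(a) r(a) - beta * KL(ref_policy || pi) over the simplex.

    Assumes ref_policy has full support and beta > 0. The first-order condition
    gives pi(a) = beta * ref(a) / (lam - r(a)) for a multiplier lam > max_a r(a)
    chosen so that pi sums to one; lam is located by bisection.
    """
    r = np.asarray(rewards, dtype=float)
    p = np.asarray(ref_policy, dtype=float)

    def total_mass(lam):
        return np.sum(beta * p / (lam - r))

    # total_mass blows up as lam -> max(r) from above and is <= 1 at
    # lam = max(r) + beta, so the root lies in (lo, hi].
    lo, hi = r.max(), r.max() + beta
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total_mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, abs(hi)):
            break
    pi = beta * p / (hi - r)
    return pi / pi.sum()  # absorb the small residual bisection error


# Usage on made-up offline statistics for three actions.
ref = np.array([0.5, 0.3, 0.2])
r_lcb = pessimistic_rewards([40.0, 27.0, 0.0], [50, 30, 0], bonus_scale=0.5)
pi_hat = forward_kl_policy(r_lcb, ref, beta=0.1)
print(np.round(r_lcb, 3), np.round(pi_hat, 3))
```

Bisection is enough because the normalizing mass $\sum_a \beta\,\pi_{\mathrm{ref}}(a)/(\lambda - r(a))$ is strictly decreasing in $\lambda$ on $(\max_a r(a), \infty)$, so the multiplier is unique; the reverse-KL analogue would instead be a one-line softmax.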