Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits

Hao Qin; Chicheng Zhang

arXiv:2602.09456 · cs.LG · February 11, 2026

Open Access

TL;DR

This paper introduces OE2D (Offline Estimation to Decisions), a unified framework for offline-oracle-efficient contextual bandits that achieves near-optimal regret with $O(\log T)$ calls to an offline regression oracle over $T$ rounds, supported by a new complexity measure, the Decision-Offline Estimation Coefficient (DOEC).

Contribution

The paper develops OE2D, a general algorithmic framework that reduces contextual bandit problems to offline regression, and introduces the DOEC complexity measure to analyze regret.

Findings

01

OE2D achieves near-optimal regret with logarithmic oracle calls.

02

DOEC is bounded in settings with bounded Eluder dimension and in smoothed-regret settings.

03

A novel relationship between DOEC and the Decision-Estimation Coefficient (DEC) bridges offline and online bandit algorithms.
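The oracle-call counts in finding 01 follow from running the learner in epochs and refitting the regression oracle only at epoch boundaries. The sketch below compares two epoch schedules. The doubling schedule and the known-horizon schedule with epoch ends near $T^{1-2^{-m}}$ are the ones used by Falcon, which the abstract says OE2D generalizes; that OE2D uses exactly these epoch lengths is an assumption, and the function names are hypothetical.

```python
import math


def doubling_epochs(T):
    """Epoch end rounds 2, 4, 8, ..., capped at T.

    The epoch lengths do not depend on T, so this schedule can be run
    without knowing the horizon. It yields about log2(T) epochs.
    """
    ends = []
    t = 2
    while t < T:
        ends.append(t)
        t *= 2
    ends.append(T)
    return ends


def known_horizon_epochs(T):
    """Epoch end rounds ceil(T^(1 - 2^-m)) for m = 1..M, then T.

    Assumption: M = ceil(log2(log2(T))), as in Falcon's known-T schedule.
    It yields about log2(log2(T)) epochs.
    """
    if T < 2:
        raise ValueError("T must be at least 2")
    M = max(1, math.ceil(math.log2(math.log2(T))))
    ends = {math.ceil(T ** (1 - 2.0 ** (-m))) for m in range(1, M + 1)}
    ends = sorted(e for e in ends if e < T)
    ends.append(T)
    return ends


if __name__ == "__main__":
    T = 10**6
    # Roughly one offline-oracle call per epoch, so the list length is the call count.
    print("unknown T:", len(doubling_epochs(T)), "epochs")
    print("known T:  ", len(known_horizon_epochs(T)), "epochs")
```

Because the oracle is refit about once per epoch, the length of each list approximates the number of oracle calls over $T$ rounds.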

Abstract

We propose an algorithmic framework, Offline Estimation to Decisions (OE2D), that reduces contextual bandit learning with general reward function approximation to offline regression. The framework allows near-optimal regret for contextual bandits with large action spaces with $O(\log T)$ calls to an offline regression oracle over $T$ rounds, and makes $O(\log \log T)$ calls when $T$ is known. The design of the OE2D algorithm generalizes Falcon (Simchi-Levi and Xu, 2022) and its linear reward version (Xu and Zeevi, 2020, Section 4) in that it chooses an action distribution, which we term "exploitative F-design", that simultaneously guarantees low regret and good coverage, trading off exploration and exploitation. Central to our regret analysis is a new complexity measure, the Decision-Offline Estimation Coefficient (DOEC), which we show is bounded in bounded Eluder dimension per-context…
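To make the idea of an action distribution that balances low regret against coverage more concrete, here is a minimal sketch of inverse gap weighting, the rule Falcon uses for finitely many actions. It is not the exploitative F-design itself, whose definition this page does not give; it only illustrates the special case that the abstract says OE2D generalizes. The function name and the parameter `gamma` (an exploration-control scale) are illustrative choices.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Return an action distribution from the oracle's predicted rewards.

    Each non-greedy action a gets probability 1 / (K + gamma * gap(a)),
    where gap(a) is its predicted shortfall relative to the greedy action.
    The greedy action receives the remaining probability mass.
    """
    K = len(predicted_rewards)
    best = max(range(K), key=lambda a: predicted_rewards[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (K + gamma * gap)
    # Each non-greedy probability is at most 1/K, so the remainder is at least 1/K.
    probs[best] = 1.0 - sum(probs)
    return probs


if __name__ == "__main__":
    # Larger gamma concentrates mass on the greedy action (more exploitation).
    for gamma in (0.0, 10.0, 100.0):
        print(gamma, inverse_gap_weighting([0.2, 0.9, 0.5], gamma))
```

Setting `gamma` to zero gives the uniform distribution (pure coverage), while a large `gamma` approaches the greedy choice (pure exploitation).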

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference