Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
Hao Qin, Chicheng Zhang

TL;DR
This paper introduces OE2D, a unified framework for offline-oracle-efficient contextual bandits that achieves near-optimal regret while calling a regression oracle a number of times logarithmic in the number of rounds. Its analysis rests on a new complexity measure, DOEC.
Contribution
The paper develops OE2D, a general algorithmic framework that reduces contextual bandit problems to offline regression. It also introduces the Decision-Offline Estimation Coefficient (DOEC), a complexity measure used to analyze regret.
Findings
OE2D achieves near-optimal regret with a logarithmic number of offline oracle calls.
DOEC is bounded in settings with bounded Eluder dimension and in the smoothed regret setting.
A new relationship between DOEC and the Decision-Estimation Coefficient (DEC) connects offline-oracle and online-oracle contextual bandit algorithms.
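Falcon-style algorithms fit the offline regression oracle roughly once per epoch, so the number of oracle calls tracks the number of epochs. The sketch below assumes a Falcon-style schedule: epochs end at τ_m = 2^m when the horizon T is unknown, and at τ_m = ⌈2·T^(1 − 2^(−m))⌉ when T is known. The helper names are hypothetical, and OE2D's exact schedule is not given here.

```python
import math


def doubling_schedule(T):
    """Epoch end points tau_m = 2^m, capped at T (horizon not needed)."""
    ends, m = [], 1
    while True:
        tau = min(2 ** m, T)
        ends.append(tau)
        if tau >= T:
            return ends
        m += 1


def known_horizon_schedule(T):
    """Epoch end points tau_m = ceil(2 * T^(1 - 2^-m)), capped at T (needs T)."""
    ends, m = [], 1
    while True:
        tau = min(math.ceil(2 * T ** (1 - 2 ** (-m))), T)
        ends.append(tau)
        if tau >= T:
            return ends
        m += 1


# One oracle fit per epoch: about log2(T) fits vs. about log2(log2(T)) fits.
for T in (10 ** 4, 10 ** 6, 10 ** 8):
    print(T, len(doubling_schedule(T)), len(known_horizon_schedule(T)))
```

With T = 2^20, the doubling schedule has 20 epochs and the known-horizon schedule has 5. This matches the O(log T) versus O(log log T) scaling.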
Abstract
We propose an algorithmic framework, Offline Estimation to Decisions (OE2D), that reduces contextual bandit learning with general reward function approximation to offline regression. For contextual bandits with large action spaces, the framework achieves near-optimal regret with O(log T) calls to an offline regression oracle over T rounds, and with O(log log T) calls when T is known. The OE2D algorithm generalizes Falcon (Simchi-Levi and Xu, 2022) and its linear reward version (Xu and Zeevi, 2020, Section 4). Like them, it chooses an action distribution, which we term an "exploitative F-design", that simultaneously guarantees low regret and good coverage, thereby trading off exploration and exploitation. Central to our regret analysis is a new complexity measure, the Decision-Offline Estimation Coefficient (DOEC), which we show is bounded in bounded Eluder dimension per-context…
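In the finite-action case, Falcon, which OE2D generalizes, turns the oracle's reward predictions into an action distribution by inverse gap weighting. The sketch below shows only that finite-action rule. It is not the paper's exploitative F-design, which the abstract describes as a more general construction. The function name is hypothetical, and `gamma` is treated as a free parameter instead of Falcon's epoch-dependent value.

```python
import numpy as np


def inverse_gap_weighting(predicted_rewards, gamma):
    """Inverse-gap-weighted action distribution over K actions.

    predicted_rewards: reward estimates from the regression oracle, one per action.
    gamma: exploration parameter; larger values concentrate mass on the greedy action.
    """
    r = np.asarray(predicted_rewards, dtype=float)
    k = r.size
    best = int(np.argmax(r))          # greedy action under the current model
    gaps = r[best] - r                # nonnegative predicted-reward gaps
    probs = 1.0 / (k + gamma * gaps)  # each non-greedy action gets at most 1/K
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()   # greedy action takes the remaining mass
    return probs


rng = np.random.default_rng(0)
p = inverse_gap_weighting([0.2, 0.5, 0.1], gamma=10.0)
action = rng.choice(len(p), p=p)
```

Because every non-greedy action receives at most 1/K, the greedy action always keeps at least 1/K of the mass, so the output is a valid distribution. Setting `gamma=0` recovers uniform exploration.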
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
