A Simple Reduction Scheme for Constrained Contextual Bandits with Adversarial Contexts via Regression
Dhruv Sarkar, Abhishek Sinha

TL;DR
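This paper introduces a simple, modular reduction scheme for constrained contextual bandits with adversarial contexts. It uses online regression oracles to reduce the constrained problem to a standard unconstrained contextual bandit problem, with the aim of controlling both regret and cumulative constraint violation.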
Contribution
It presents a reduction that recasts constrained contextual bandits with adversarial contexts as unconstrained contextual bandits with adaptively defined surrogate rewards, using online regression oracles. This extends prior methods that focused on stochastic contexts.
Findings
Improved regret and constraint violation guarantees in adversarial context settings.
A modular algorithmic framework based on online regression oracles.
Transparent analysis demonstrating advantages over previous stochastic-focused methods.
Abstract
We study constrained contextual bandits (CCB) with adversarially chosen contexts, where each action yields a random reward and incurs a random cost. We adopt the standard realizability assumption: conditioned on the observed context, rewards and costs are drawn independently from fixed distributions whose expectations belong to known function classes. We consider the continuing setting, in which the algorithm operates over the entire horizon even after the budget is exhausted. In this setting, the objective is to simultaneously control regret and cumulative constraint violation. Building on the seminal SquareCB framework of Foster et al. (2018), we propose a simple and modular algorithmic scheme that leverages online regression oracles to reduce the constrained problem to a standard unconstrained contextual bandit problem with adaptively defined surrogate reward functions. In contrast…
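The reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The inverse gap weighting rule is the standard action-selection step of SquareCB. The surrogate score used here (predicted reward minus a penalty weight times predicted cost) is an assumption made for illustration, since the excerpt does not state how the surrogate rewards are defined. The names `surrogate_scores`, `penalty`, and `choose_action` are hypothetical.

```python
import random


def inverse_gap_weighting(scores, gamma):
    """SquareCB-style sampling distribution over actions.

    scores: predicted (surrogate) reward for each action.
    gamma:  non-negative exploration parameter; larger means greedier.
    """
    k = len(scores)
    best = max(range(k), key=lambda a: scores[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            # Probability shrinks as the gap to the best action grows.
            probs[a] = 1.0 / (k + gamma * (scores[best] - scores[a]))
    # The greedy action receives the remaining probability mass.
    probs[best] = 1.0 - sum(probs)
    return probs


def surrogate_scores(reward_preds, cost_preds, penalty):
    # ASSUMPTION: a penalty-weighted surrogate, chosen only for illustration.
    # The paper's adaptive surrogate is not specified in this excerpt.
    return [r - penalty * c for r, c in zip(reward_preds, cost_preds)]


def choose_action(reward_preds, cost_preds, penalty, gamma, rng=random):
    """Sample an action for one round from oracle predictions (hypothetical helper)."""
    scores = surrogate_scores(reward_preds, cost_preds, penalty)
    probs = inverse_gap_weighting(scores, gamma)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs
```

In a full loop, the reward and cost predictions would come from two online regression oracles that are updated after each round with the observed reward and cost of the chosen action. How `penalty` is adapted over time is left open here.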
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
