Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill

TL;DR
This paper introduces a new family of computationally efficient contextual bandit algorithms in which a tuning parameter balances simple and cumulative regret minimization. The algorithms rely on conformal arm sets to work with any function class, tolerate model misspecification, and handle continuous arms.
Contribution
The paper proposes an algorithmic framework built on conformal arm sets that achieves near-optimal minimax guarantees for cumulative regret and state-of-the-art guarantees for simple regret. The framework works with any function class and is robust to model misspecification.
Findings
The algorithms achieve state-of-the-art simple regret guarantees.
The algorithms achieve near-optimal minimax guarantees for cumulative regret.
A negative result shows that there are limits to how well both guarantees can be achieved simultaneously.
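For reference, the two regret notions named in the findings can be written out as follows. This is a sketch in standard contextual bandit notation, which is an assumption here and may differ from the paper's own symbols: contexts x are drawn from a distribution D, f^*(x, a) is the expected reward of arm a in context x, \pi^* is the optimal policy, a_t is the arm played at round t, and \hat{\pi} is the policy returned after T rounds.

```latex
% Optimal policy (assumed notation): picks the best arm in each context.
\pi^*(x) \in \arg\max_{a} f^*(x, a)

% Cumulative regret: total expected reward lost during the experiment.
\mathrm{CReg}_T = \sum_{t=1}^{T} \Big( f^*\big(x_t, \pi^*(x_t)\big) - f^*\big(x_t, a_t\big) \Big)

% Simple regret: expected reward lost by the policy learned at the end.
\mathrm{SReg}(\hat{\pi}) = \mathbb{E}_{x \sim D} \Big[ f^*\big(x, \pi^*(x)\big) - f^*\big(x, \hat{\pi}(x)\big) \Big]
```

Cumulative regret penalizes every suboptimal arm played during the experiment, while simple regret only measures the quality of the final policy, which is why the two objectives can pull in different directions.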
Abstract
In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter determines the weight placed on cumulative regret minimization (where we establish near-optimal minimax guarantees) versus simple regret minimization (where we establish state-of-the-art guarantees). Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings. This flexibility comes from constructing and relying on "conformal arm sets" (CASs). CASs provide a set of arms for every context, encompassing the context-specific optimal…
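The trade-off between the two objectives can be illustrated with a toy simulation. The sketch below is not the paper's algorithm and does not use conformal arm sets: it swaps in plain epsilon-greedy on a small tabular contextual bandit, with `epsilon` standing in as a hypothetical exploration knob. The function name `run_epsilon_greedy`, the reward table, and the unit-variance Gaussian noise are all assumptions made for illustration.

```python
import numpy as np


def run_epsilon_greedy(mean_rewards, horizon, epsilon, seed=0):
    """Toy tabular contextual bandit (illustrative only, not the paper's method).

    mean_rewards[x, a] is the true expected reward of arm a in context x.
    Returns (cumulative_regret, simple_regret), with contexts drawn uniformly.
    """
    rng = np.random.default_rng(seed)
    n_contexts, n_arms = mean_rewards.shape
    sums = np.zeros((n_contexts, n_arms))    # running reward totals
    counts = np.zeros((n_contexts, n_arms))  # number of pulls per (context, arm)
    best = mean_rewards.max(axis=1)          # optimal expected reward per context
    cumulative_regret = 0.0

    for _ in range(horizon):
        x = rng.integers(n_contexts)
        estimates = sums[x] / np.maximum(counts[x], 1)
        if rng.random() < epsilon:
            a = int(rng.integers(n_arms))    # explore: uniform random arm
        else:
            a = int(np.argmax(estimates))    # exploit: current best estimate
        reward = rng.normal(mean_rewards[x, a], 1.0)  # assumed Gaussian noise
        sums[x, a] += reward
        counts[x, a] += 1
        cumulative_regret += best[x] - mean_rewards[x, a]

    # Learned policy: greedy with respect to the final reward estimates.
    final_estimates = sums / np.maximum(counts, 1)
    policy = final_estimates.argmax(axis=1)
    gaps = best - mean_rewards[np.arange(n_contexts), policy]
    return float(cumulative_regret), float(gaps.mean())


# Hypothetical reward table: 2 contexts, 3 arms.
mean_rewards = np.array([[0.2, 0.5, 0.4],
                         [0.6, 0.1, 0.5]])
for eps in (0.05, 0.5, 1.0):
    cum, simple = run_epsilon_greedy(mean_rewards, horizon=2000, epsilon=eps)
    print(f"epsilon={eps:.2f}  cumulative={cum:.1f}  simple={simple:.3f}")
```

Larger values of `epsilon` spend more rounds on suboptimal arms, which raises cumulative regret in expectation, but they also gather more data on every arm, which tends to help the final policy. A single seeded run is noisy, so any one comparison should be read as illustrative rather than as evidence.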
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Healthcare Operations and Scheduling Optimization
