Conservative Contextual Linear Bandits
Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, and Benjamin Van Roy

TL;DR
This paper introduces CLUCB, a safe (conservative) algorithm for contextual linear bandits. It keeps its performance above a fixed fraction of a baseline's performance while minimizing regret, and it comes with a proven regret bound and empirical validation.
Contribution
The paper proposes a novel safe linear bandit algorithm, CLUCB, with theoretical regret bounds and empirical evidence demonstrating safety and effectiveness.
Findings
CLUCB maintains performance above the safety threshold over time.
The regret of CLUCB decomposes into the regret of standard linear UCB plus a constant term, which is the cost of satisfying the safety constraint.
Empirical results confirm the safety and theoretical analysis of CLUCB.
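The two theoretical findings can be written compactly as follows. The notation is assumed here and is not taken from this page. a_i is the action played at round i, b_i is the baseline action, r^i_a is the expected reward of action a at round i, and alpha in (0,1) is the fraction of baseline performance the learner may give up. Read the second line as the shape of the decomposition only, not as the paper's exact bound or constants.

```latex
% Safety constraint, required uniformly over time:
\sum_{i=1}^{t} r^{i}_{a_i} \;\ge\; (1-\alpha)\sum_{i=1}^{t} r^{i}_{b_i}
\qquad \text{for all } t \ge 1

% Shape of the regret decomposition after T rounds:
R_T(\mathrm{CLUCB}) \;\le\;
\underbrace{\bar{R}_T^{\,\mathrm{UCB}}}_{\text{standard linear UCB regret bound}}
\;+\;
\underbrace{C}_{\text{safety cost, independent of } T}
```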
Abstract
Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits, which have applications in many different fields, including personalized ad recommendation in online marketing. We formulate a notion of safety for this class of algorithms. We develop a safe contextual linear bandit algorithm, called conservative linear UCB (CLUCB), that simultaneously minimizes its regret and satisfies the safety constraint, i.e., maintains its performance above a fixed percentage of the performance of a baseline strategy, uniformly over time. We prove an upper bound on the regret of CLUCB and show that it can be decomposed into two…
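The following is a minimal sketch of the safety check that the abstract describes. It is not the authors' implementation. It assumes noise-free linear rewards, baseline rewards known to the agent, and a fixed confidence radius `beta`. The class `ConservativeLinUCBSketch` and its methods are hypothetical names.

Each round, the agent computes the usual optimistic (LinUCB-style) action. It plays that action only if a pessimistic estimate of its cumulative reward still stays above (1 - alpha) times the baseline's cumulative reward. The pessimistic estimate is taken as the minimum over an ellipsoidal confidence set. If the check fails, the agent falls back to the baseline action.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


class ConservativeLinUCBSketch:
    """Hypothetical conservative linear UCB agent (illustration only, not the authors' code).

    Assumes linear rewards r = <theta*, x>, baseline rewards known to the agent,
    and a fixed confidence radius `beta` (a real analysis would let it grow with t).
    """

    def __init__(self, dim: int, alpha: float, beta: float, lam: float = 1.0):
        self.alpha = alpha          # allowed fractional loss relative to the baseline
        self.beta = beta            # radius of the ellipsoidal confidence set
        self.v_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim        # sum of reward * feature over updated rounds
        self.z = [0.0] * dim        # sum of features of actions played via the optimistic branch
        self.baseline_played = 0.0  # baseline reward collected on rounds the baseline was played
        self.baseline_total = 0.0   # baseline reward summed over all rounds so far

    def width(self, x: Sequence[float]) -> float:
        # beta * ||x||_{V^{-1}}
        return self.beta * math.sqrt(max(dot(x, matvec(self.v_inv, x)), 0.0))

    def choose(self, features: Sequence[Sequence[float]], baseline_index: int,
               baseline_reward: float) -> Tuple[int, bool]:
        theta_hat = matvec(self.v_inv, self.b)
        # Optimistic (LinUCB-style) candidate action.
        optimistic = max(range(len(features)),
                         key=lambda k: dot(theta_hat, features[k]) + self.width(features[k]))
        z_candidate = [zi + xi for zi, xi in zip(self.z, features[optimistic])]
        # Pessimistic value of all optimistic plays: min over the ellipsoid of <theta, z>.
        pessimistic = dot(theta_hat, z_candidate) - self.width(z_candidate)
        self.baseline_total += baseline_reward
        if pessimistic + self.baseline_played >= (1.0 - self.alpha) * self.baseline_total:
            self.z = z_candidate
            return optimistic, True
        self.baseline_played += baseline_reward
        return baseline_index, False

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison rank-one update of V^{-1}, where V = lam*I + sum x x^T.
        vx = matvec(self.v_inv, x)
        denom = 1.0 + dot(x, vx)
        n = len(x)
        self.v_inv = [[self.v_inv[i][j] - vx[i] * vx[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


# Usage: a noise-free toy problem (all numbers are made up for illustration).
theta_star = [1.0, 0.5]
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # arm 2 plays the role of the baseline
baseline = 2
agent = ConservativeLinUCBSketch(dim=2, alpha=0.1, beta=1.2)
earned, baseline_sum, optimistic_plays, safe_every_round = 0.0, 0.0, 0, True
for t in range(200):
    r_base = dot(theta_star, arms[baseline])
    action, was_optimistic = agent.choose(arms, baseline, r_base)
    reward = dot(theta_star, arms[action])
    agent.update(arms[action], reward)  # this sketch updates on every round
    earned += reward
    baseline_sum += r_base
    optimistic_plays += was_optimistic
    safe_every_round &= earned >= (1.0 - agent.alpha) * baseline_sum - 1e-9
print(safe_every_round, optimistic_plays > 0)
```

In the toy run, the agent plays the baseline for the first several rounds while its conservative budget grows, then starts trying other actions. Rewards are noise-free and `beta` is at least sqrt(lam) times the norm of `theta_star`, so the true parameter stays inside the confidence set. This is why the safety check on true rewards holds every round. With noisy rewards, `beta` would instead have to come from a proper high-probability confidence bound.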
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Machine Learning and Algorithms
