Directional Optimism for Safe Linear Bandits
Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh

TL;DR
This paper introduces a new directional optimism approach for safe linear bandits, leading to improved regret guarantees and empirical performance, and extends the setting to convex constraints with a novel analysis method.
Contribution
The paper proposes a directional optimism technique for safe linear bandits, a new algorithm that outperforms existing algorithms empirically while matching their regret guarantees, and an extension of the framework to convex constraints based on convex analysis.
Findings
Improved regret guarantees for well-separated problem instances and for action sets that are finite star convex sets.
Enhanced empirical performance over existing algorithms.
Extension to convex constraints with a new analytical approach.
Abstract
The safe linear bandit problem is a version of the classical stochastic linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its applicability to many real-world settings, this problem has received considerable attention in recent years. By leveraging a novel approach that we call directional optimism, we find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Furthermore, we propose a novel algorithm for this setting that improves on existing algorithms in terms of empirical performance, while enjoying matching regret guarantees. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based…
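To make the phrase "actions must satisfy an uncertain constraint at all rounds" concrete, here is a minimal sketch of a pessimistic safety check of the kind commonly used in the safe linear bandit literature. It is not taken from the paper and is not the authors' algorithm. The linear constraint form a^T x <= b, the names `a_hat`, `V`, `beta`, and `b`, and both helper functions are assumptions made for illustration only.

```python
import numpy as np

def pessimistic_safe(x, a_hat, V, beta, b):
    """Check whether action x is safe for every constraint vector in a
    confidence ellipsoid centered at a_hat (hypothetical helper).

    Assumes a linear constraint a^T x <= b with unknown a, and an
    ellipsoid {a : ||a - a_hat||_V <= beta}. The worst case of a^T x
    over that ellipsoid is a_hat^T x + beta * ||x||_{V^{-1}}.
    """
    x = np.asarray(x, dtype=float)
    width = beta * np.sqrt(x @ np.linalg.solve(V, x))
    return bool(a_hat @ x + width <= b)

def largest_safe_scaling(x, a_hat, V, beta, b):
    """Largest gamma in [0, 1] such that gamma * x passes the check above.

    Assumes b >= 0 (the origin is known to be safe) and that scaling an
    action toward the origin keeps it in the action set, as it does for
    a set that is star convex about the origin.
    """
    x = np.asarray(x, dtype=float)
    worst = a_hat @ x + beta * np.sqrt(x @ np.linalg.solve(V, x))
    if worst <= b:
        return 1.0
    # The worst-case constraint value scales linearly in gamma >= 0.
    return max(0.0, b / worst)

# Illustrative values only.
a_hat = np.array([1.0, 0.0])
V = np.eye(2)
gamma = largest_safe_scaling([1.0, 0.0], a_hat, V, beta=1.0, b=1.0)
```

The scaling helper relies on the worst-case constraint value being positively homogeneous in the action, which is why a single division suffices once the origin is assumed safe.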
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
