Linear Stochastic Bandits Under Safety Constraints
Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

TL;DR
This paper introduces Safe-LUCB, a new algorithm for linear stochastic bandits with safety constraints. It satisfies the safety constraint at every round with high probability while balancing exploration and exploitation. The paper proves regret bounds and proposes a heuristic that improves regret in practice.
Contribution
The paper develops Safe-LUCB, a UCB-based algorithm for linear bandits with safety constraints. The algorithm runs in two phases, and the paper analyzes its regret, contributing to methods for safe exploration.
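Regret in this setting is measured against the best action that satisfies the safety constraint, not against the best action overall. A standard way to write this is sketched below. It assumes, for illustration only, an action set $\mathcal{D}_0$, an expected reward $x^\top \theta_\ast$, and a safety constraint of the form $x^\top B \theta_\ast \le c$ with a known matrix $B$ and known constant $c$. These symbols are not taken from the summary above.

```latex
\mathcal{D}^{s} = \{\, x \in \mathcal{D}_0 : x^\top B \theta_\ast \le c \,\}, \qquad
x_\ast = \arg\max_{x \in \mathcal{D}^{s}} x^\top \theta_\ast, \qquad
R_T = \sum_{t=1}^{T} \left( x_\ast^\top \theta_\ast - x_t^\top \theta_\ast \right).
```

Because $\mathcal{D}^{s}$ depends on the unknown $\theta_\ast$, the learner cannot compute $x_\ast$ directly and can only play actions it has certified as safe.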
Findings
Safe-LUCB guarantees safety with high probability.
The algorithm achieves regret bounds linked to the optimal safe action.
A heuristic, motivated by a problem-dependent analysis, further improves regret.
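One common way to guarantee safety with high probability is to play only actions whose constraint value stays below the threshold for every parameter in a confidence set. The sketch below illustrates that idea. It assumes a linear constraint $x^\top B \theta_\ast \le c$ with known `B` and `c`, a ridge-regression estimate, and an ellipsoidal confidence set of radius `beta`. The function names `ridge_estimate` and `is_conservatively_safe` are hypothetical, and `beta` is treated as a given input rather than derived from the paper's bounds.

```python
import numpy as np


def ridge_estimate(X, y, lam=1.0):
    """Regularized least-squares estimate of the unknown parameter.

    Returns theta_hat and the Gram matrix V = lam * I + X^T X.
    """
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(V, X.T @ y)
    return theta_hat, V


def is_conservatively_safe(x, B, c, theta_hat, V, beta):
    """Check x^T B theta <= c for every theta in the (assumed) confidence
    ellipsoid {theta : ||theta - theta_hat||_V <= beta}.

    The worst case over that ellipsoid equals
        x^T B theta_hat + beta * ||B^T x||_{V^{-1}},
    so the action passes only if this upper bound is at most c.
    """
    z = B.T @ x
    width = np.sqrt(z @ np.linalg.solve(V, z))
    return bool(x @ B @ theta_hat + beta * width <= c)


# Usage: an action whose estimated constraint value (0.9) is below c = 1
# is still rejected, because its confidence width pushes the bound above c.
theta_hat = np.array([0.5, 0.0])
V = 100.0 * np.eye(2)
B = np.eye(2)
print(is_conservatively_safe(np.array([1.0, 0.0]), B, 1.0, theta_hat, V, beta=1.0))
print(is_conservatively_safe(np.array([1.8, 0.0]), B, 1.0, theta_hat, V, beta=1.0))
```

The check is deliberately pessimistic. It can reject actions that are in fact safe, which is the price of never accepting an unsafe one while the estimate stays inside the confidence set.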
Abstract
Bandit algorithms have various applications in safety-critical systems, where it is important to respect, at every round, system constraints that depend on the bandit's unknown parameters. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe actions and must act conservatively in ensuring that her actions satisfy the safety constraint at all rounds (at least with high probability). For these bandits, we propose a new UCB-based algorithm called Safe-LUCB, which includes necessary modifications to respect safety constraints. The algorithm has two phases. During the pure exploration phase the learner chooses her actions at random from a restricted set of safe actions with the goal of learning a good approximation of the entire unknown safe…
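The pure exploration phase described in the abstract can be sketched as follows. The sketch assumes a constraint $x^\top B \theta_\ast \le c$ with known `B`, known `c > 0`, and a known norm bound $\|\theta_\ast\|_2 \le S$. Under those assumptions, any action with $\|B^\top x\|_2 \le c / S$ is safe for every admissible parameter by the Cauchy-Schwarz inequality, so the restricted set used here is a set of such actions. This choice of restricted set, the Gaussian reward noise, and the name `pure_exploration` are hypothetical illustrations, not the paper's exact construction.

```python
import numpy as np


def pure_exploration(theta_star, B, c, S, n_rounds, noise_std=0.1, lam=1.0, seed=0):
    """Hypothetical pure-exploration phase: play random actions that are safe
    for every theta with ||theta||_2 <= S, then fit a ridge estimate.

    theta_star is used only to simulate noisy rewards; the learner never
    reads it when choosing actions.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    X = np.empty((n_rounds, d))
    y = np.empty(n_rounds)
    for t in range(n_rounds):
        g = rng.standard_normal(d)
        # Scale a random direction so that ||B^T x||_2 = c / S. Then
        # x^T B theta <= ||B^T x||_2 * ||theta||_2 <= c for all ||theta||_2 <= S.
        x = (c / S) * g / np.linalg.norm(B.T @ g)
        X[t] = x
        # Simulated noisy linear reward x^T theta_star + noise.
        y[t] = x @ theta_star + noise_std * rng.standard_normal()
    # Ridge-regression estimate from the exploration data.
    V = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(V, X.T @ y)
    return X, theta_hat, V


theta_star = np.array([0.6, -0.3])
X, theta_hat, V = pure_exploration(theta_star, np.eye(2), c=0.5, S=1.0, n_rounds=2000)
print(theta_hat)
```

Random directions spread the exploration data across all coordinates, so the Gram matrix `V` grows in every direction and the estimate improves uniformly. That is what later makes it possible to certify actions outside the small restricted set.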
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
