Safe Linear Stochastic Bandits
Kia Khezeli, Eilyan Bitar

TL;DR
This paper proposes a safe linear stochastic bandit framework where the learner must choose arms with guaranteed minimum expected rewards, introducing a new algorithm that safely expands the set of safe arms while maintaining low regret.
Contribution
It introduces a novel safe linear stochastic bandit model and an algorithm that satisfies the safety constraint at every stage while achieving a near-optimal regret bound.
Findings
The algorithm satisfies the safety constraint at every stage with high probability.
Expected regret is at most O(√T log T) after T stages of play.
The method effectively expands the safe set of arms over time.
Abstract
We introduce the safe linear stochastic bandit framework, a generalization of linear stochastic bandits, in which, at each stage, the learner is required to select an arm whose expected reward is no less than a predetermined (safe) threshold with high probability. We assume that the learner initially knows of an arm that is safe, but not necessarily optimal. Leveraging this assumption, we introduce a learning algorithm that systematically combines known safe arms with exploratory arms to safely expand the set of safe arms over time, while facilitating safe greedy exploitation in subsequent stages. In addition to ensuring the satisfaction of the safety constraint at every stage of play, the proposed algorithm is shown to exhibit an expected regret that is no more than O(√T log T) after T stages of play.
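The idea of combining a known safe arm with an exploratory arm can be illustrated with a minimal sketch. It is not the paper's algorithm; it rests on assumptions that the abstract does not state: the expected reward is linear in the arm, r(x) = <x, theta>; the learner knows the baseline arm's expected reward r0 and a bound S on the norm of theta; and exploratory directions u have unit norm. Under these assumptions, the mixed arm (1 - rho) * x0 + rho * u has expected reward at least (1 - rho) * r0 - rho * S, so it stays above the threshold b whenever rho <= (r0 - b) / (r0 + S). All function and variable names below are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_mixing_weight(baseline_reward, threshold, theta_norm_bound):
    """Largest rho in [0, 1] with (1 - rho) * r0 - rho * S >= b.

    Assumes r0 (baseline_reward) and S (theta_norm_bound) are known;
    this is an illustrative assumption, not taken from the abstract.
    """
    gap = baseline_reward - threshold
    if gap < 0:
        raise ValueError("baseline arm does not meet the safety threshold")
    denom = baseline_reward + theta_norm_bound
    if denom <= 0:
        # Then b <= r0 <= -S, so every unit-norm arm is already safe.
        return 1.0
    return min(1.0, gap / denom)


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def conservative_arm(safe_arm, explore_dir, rho):
    """Convex combination of the safe arm and an exploratory direction."""
    return [(1 - rho) * s + rho * u for s, u in zip(safe_arm, explore_dir)]


# Toy check. theta is unknown to the learner; it is used here only to
# verify that the mixed arm is safe.
rng = random.Random(0)
theta = [0.6, 0.8]        # hypothetical true parameter, norm 1
safe_arm = [0.6, 0.8]     # hypothetical baseline arm, reward about 1.0
threshold = 0.5           # hypothetical safety threshold b
rho = max_mixing_weight(dot(safe_arm, theta), threshold, 1.0)
arm = conservative_arm(safe_arm, random_unit_vector(2, rng), rho)
print(rho, dot(arm, theta) >= threshold - 1e-12)
```

The bound on rho is worst-case: it keeps the mixed arm safe for every unit-norm direction, including the one pointing directly against theta. As estimates of theta improve, a learner could certify more arms as safe directly, which is the sense in which the safe set expands over time.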
