Price of Safety in Linear Best Arm Identification
Xuedong Shang, Igor Colin, Merwan Barlier, Hamza Cherkaoui

TL;DR
This paper introduces a safe linear best-arm identification framework that ensures safety constraints are met during exploration, proposing a gap-based algorithm with theoretical guarantees and experimental validation.
Contribution
It presents the first safe best-arm identification algorithm with linear feedback, incorporating safety constraints into the exploration process.
Findings
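The proposed gap-based algorithm achieves meaningful sample complexity while satisfying the stage-wise safety constraint with high probability.
The safety constraint forces an additional exploration phase, which adds an extra term to the sample complexity.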
Experimental illustrations support the design of the algorithm.
Abstract
We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to a stage-wise safety constraint that depends linearly on an unknown parameter vector. The agent must take actions conservatively so as to ensure that the safety constraint is not violated with high probability at each round. Ways of leveraging the linear structure to ensure safety have been studied for regret minimization, but not for best-arm identification, to the best of our knowledge. We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring stage-wise safety. We show that we pay an extra term in the sample complexity due to the forced exploration phase incurred by the additional safety constraint. Experimental illustrations are provided to justify the design of our algorithm.
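To make the idea of conservative, stage-wise safe action selection concrete, here is a minimal sketch of one common way to certify arms as safe under a linear constraint. It is not the paper's algorithm. It assumes a regularized least-squares estimate of the unknown safety parameter and a confidence-ellipsoid width; the function name `certified_safe_arms`, the confidence radius `beta`, and the safety level `threshold` are all hypothetical names introduced for illustration.

```python
import numpy as np


def certified_safe_arms(arms, V, b, threshold, beta):
    """Return a boolean mask of arms whose pessimistic safety value is <= threshold.

    Assumptions (illustrative, not taken from the paper):
      arms      : (K, d) array of arm feature vectors
      V         : (d, d) regularized design matrix, lambda * I + sum of x x^T
      b         : (d,) sum of x * (observed safety signal)
      threshold : safety level the constraint value must not exceed
      beta      : confidence radius of the ellipsoid around the estimate
    """
    V_inv = np.linalg.inv(V)
    mu_hat = V_inv @ b  # regularized least-squares estimate of the safety parameter
    # Confidence width ||x||_{V^{-1}} for every arm
    widths = np.sqrt(np.einsum("kd,de,ke->k", arms, V_inv, arms))
    # Upper confidence bound on each arm's constraint value
    upper = arms @ mu_hat + beta * widths
    return upper <= threshold


# Toy example: arm 0 has been pulled 100 times, arm 1 never.
arms = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.diag([101.0, 1.0])   # identity regularizer plus 100 pulls of arm 0
b = np.array([20.2, 0.0])   # chosen so that the estimate for arm 0 is 0.2
mask = certified_safe_arms(arms, V, b, threshold=0.5, beta=1.0)
print(mask)
```

In this toy setup only the well-explored arm is certified safe: the unexplored arm has a wide confidence interval, so a conservative agent cannot play it yet. This is one intuition for why a safety constraint can force extra exploration before the best arm can be identified.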
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Formal Methods in Verification
