Safe Linear Bandits over Unknown Polytopes
Aditya Gangrade, Tianrui Chen, Venkatesh Saligrama

TL;DR
This paper introduces a new optimistic strategy for safe linear bandits over unknown polytopes, achieving near-optimal regret and safety violation bounds despite the challenge of unknown constraints.
Contribution
It proposes DOSS, a doubly-optimistic algorithm, with tight regret and safety bounds, and develops a novel dual analysis framework for the problem.
Findings
DOSS achieves $O(\log^2 T)$ regret bounds.
DOSS maintains $\tilde{O}(\sqrt{T})$ safety violations.
The analysis introduces dual notions of gaps based on sensitivity analysis.
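For orientation, one common way to formalize the two quantities bounded above is sketched here. The symbols ($\theta$ for the unknown reward vector, $a_i$ and $\alpha_i$ for the rows and levels of the unknown constraints, $x_t$ for the action played in round $t$, $x^*$ for the optimum of the underlying linear program over a domain $\mathcal{X}$) are assumptions made for illustration and may differ from the paper's exact notation and definitions.

$$x^* \in \arg\max_{x \in \mathcal{X}} \theta^\top x \quad \text{s.t.} \quad a_i^\top x \le \alpha_i \ \text{ for all } i$$

$$R_T = \sum_{t=1}^{T} \left(\theta^\top x^* - \theta^\top x_t\right)_+, \qquad S_T = \sum_{t=1}^{T} \max_i \left(a_i^\top x_t - \alpha_i\right)_+$$

Under this reading, $R_T$ charges only for rounds in which the played action earns less than the optimum, and $S_T$ charges only for the amount by which a constraint is actually violated.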
Abstract
The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown roundwise constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs between efficacy and smooth safety costs of SLBs over polytopes, and the role of aggressive doubly-optimistic play in avoiding the strong assumptions made by extant pessimistic-optimistic approaches. We first elucidate an inherent hardness in SLBs due to the lack of knowledge of constraints: there exist `easy' instances, for which suboptimal extreme points have large `gaps', but on which SLB methods must still incur regret or safety violations, due to an inability to resolve unknown optima to arbitrary precision. We then analyse a natural doubly-optimistic strategy for the safe linear bandit problem, DOSS, which uses optimistic estimates…
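To make the idea of doubly-optimistic play concrete, here is a minimal sketch of one selection step. It is not the paper's DOSS algorithm. It assumes a finite candidate set of actions instead of a polytope, a single shared confidence width `beta`, known constraint levels `b`, and ridge-style estimates `theta_hat` and `A_hat`. The function name and all parameters are hypothetical. Optimism is applied twice: rewards are inflated by the confidence width, and risks are deflated by it.

```python
import numpy as np


def doubly_optimistic_choice(actions, V, theta_hat, A_hat, b, beta):
    """Return the index of the action with the highest optimistic reward
    among actions that are optimistically (plausibly) safe.

    Illustrative assumptions, none taken from the paper:
      actions   : (K, d) finite set of candidate actions
      V         : (d, d) regularized design matrix of past actions
      theta_hat : (d,)   current estimate of the reward vector
      A_hat     : (m, d) current estimates of the constraint rows
      b         : (m,)   constraint levels, assumed known
      beta      : float  confidence width shared by all estimates
    """
    V_inv = np.linalg.inv(V)
    # Per-action confidence width: beta * sqrt(x^T V^{-1} x).
    widths = beta * np.sqrt(np.einsum("kd,de,ke->k", actions, V_inv, actions))

    # Optimism about the reward: upper confidence bound.
    ucb_reward = actions @ theta_hat + widths
    # Optimism about safety: lower confidence bound on each constraint value.
    lcb_risk = actions @ A_hat.T - widths[:, None]

    plausibly_safe = np.all(lcb_risk <= b, axis=1)
    if not plausibly_safe.any():
        # Fallback (an assumption): play the action that looks least unsafe.
        return int(np.argmin(np.max(lcb_risk - b, axis=1)))

    scores = np.where(plausibly_safe, ucb_reward, -np.inf)
    return int(np.argmax(scores))


if __name__ == "__main__":
    actions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    V = np.eye(2)
    theta_hat = np.array([1.0, 1.0])
    A_hat = np.array([[1.0, 1.0]])
    b = np.array([1.5])
    for beta in (0.0, 1.0):
        idx = doubly_optimistic_choice(actions, V, theta_hat, A_hat, b, beta)
        print(f"beta={beta}: chose action index {idx}")
```

In this toy setup, with `beta=0.0` the third action looks infeasible under the point estimates and is excluded, while with `beta=1.0` it becomes plausibly safe and is chosen for its higher optimistic reward. That is the sense in which aggressive play trades some safety violation for efficacy.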
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
