Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints
Udvas Das, Debabrota Basu

TL;DR
This paper introduces new algorithms for efficiently exploring multi-armed bandits with unknown linear constraints, achieving near-optimal sample complexity while respecting safety and resource limitations.
Contribution
It proposes a Lagrangian relaxation approach and two novel algorithms, LATS and LAGEX, for pure exploration under unknown linear constraints, with theoretical guarantees and practical validation.
Findings
LAGEX achieves asymptotically optimal sample complexity.
LATS is asymptotically optimal up to constraint-dependent constants.
Numerical experiments validate the efficiency of LATS and LAGEX.
Abstract
Pure exploration in bandits formalises multiple real-world problems, such as tuning hyper-parameters or conducting user studies to test a set of items, where different safety, resource, and fairness constraints on the decision space naturally appear. We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the aim is to identify an $r$-optimal and feasible policy as fast as possible with a given level of confidence. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. Second, we leverage properties of convex optimisation in the Lagrangian lower bound to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX. Then, we propose a constraint-adaptive stopping rule, and while tracking the lower bound, use optimistic…
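As a rough illustration of what a Lagrangian relaxation does in this setting, the sketch below applies it to a toy problem: choosing a policy (a distribution over arms) that maximises expected reward subject to a single linear constraint. It is not the paper's relaxation of the sample complexity lower bound, and it is not LATS or LAGEX. The names `mu`, `a`, `b`, `dual_value`, and `solve_dual` are hypothetical, and the constraint is assumed known here, whereas the paper treats it as unknown.

```python
# Toy problem (all names and numbers are illustrative assumptions):
#   maximise  sum_k mu[k] * pi[k]
#   subject to sum_k a[k] * pi[k] <= b,  pi in the probability simplex.
# Lagrangian dual: g(lam) = max_k (mu[k] - lam * a[k]) + lam * b, for lam >= 0.
# g is convex and piecewise linear, so a ternary search finds its minimum.

def dual_value(lam, mu, a, b):
    """Dual function: the inner maximum over the simplex is attained at one arm."""
    return max(m - lam * c for m, c in zip(mu, a)) + lam * b


def solve_dual(mu, a, b, lam_max=100.0, iters=200):
    """Minimise the convex dual over [0, lam_max] by ternary search."""
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual_value(m1, mu, a, b) <= dual_value(m2, mu, a, b):
            hi = m2  # a minimiser lies in [lo, m2]
        else:
            lo = m1  # a minimiser lies in [m1, hi]
    lam = (lo + hi) / 2.0
    return lam, dual_value(lam, mu, a, b)


mu = [1.0, 0.5, 0.2]   # assumed mean rewards of three arms
a = [1.0, 0.3, 0.0]    # assumed per-arm cost in the linear constraint
b = 0.5                # assumed budget

lam_star, value = solve_dual(mu, a, b)
print(lam_star, value)
```

For these numbers the best feasible policy mixes the first two arms with weights 2/7 and 5/7, giving reward 4.5/7 (about 0.643), and the dual minimum matches it at a multiplier of 5/7 (about 0.714), as linear programming duality predicts.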
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management
Methods: Sparse Evolutionary Training
