Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints

Udvas Das; Debabrota Basu

arXiv:2410.18844·cs.LG·February 5, 2026

TL;DR

This paper introduces algorithms for pure exploration in multi-armed bandits under unknown linear constraints, such as safety and resource limits, that identify a feasible policy with asymptotically optimal or near-optimal sample complexity.

Contribution

The paper proposes a Lagrangian relaxation of the sample complexity lower bound and two algorithms built on it, LATS and LAGEX, for pure exploration under unknown linear constraints, with theoretical guarantees and numerical validation.

Findings

01. LAGEX achieves asymptotically optimal sample complexity.

02. LATS is asymptotically optimal up to constraint-dependent constants.

03. Numerical experiments validate the efficiency of LATS and LAGEX.

Abstract

Pure exploration in bandits formalises multiple real-world problems, such as tuning hyper-parameters or conducting user studies to test a set of items, where different safety, resource, and fairness constraints on the decision space naturally appear. We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the aim is to identify an $r$-optimal and feasible policy as fast as possible with a given level of confidence. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. Second, we leverage properties of convex optimisation in the Lagrangian lower bound to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX. Then, we propose a constraint-adaptive stopping rule, and while tracking the lower bound, use optimistic…
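To make the target of the exploration concrete, here is a minimal sketch of what "an optimal and feasible policy" can mean in the simplest case. It is not the paper's method: it assumes the arm means `mu` and a single linear constraint `a . pi <= b` are already known, whereas the paper treats the constraints as unknown and learns from samples. The function name `best_feasible_policy` and the numbers in the usage example are illustrative assumptions. With one constraint, the best policy over the probability simplex is either a single feasible arm or a mixture of two arms that meets the constraint with equality, so enumerating those candidates suffices.

```python
from itertools import combinations


def best_feasible_policy(mu, a, b):
    """Maximise mu . pi over the simplex subject to a . pi <= b.

    Illustrative oracle only: assumes known means `mu` and ONE known
    linear constraint (a, b). All names here are hypothetical.
    """
    k = len(mu)
    candidates = []

    # Pure policies: playing a single arm is a candidate if that arm is feasible.
    for i in range(k):
        if a[i] <= b:
            pi = [0.0] * k
            pi[i] = 1.0
            candidates.append(pi)

    # Mixed policies: two arms on opposite sides of the constraint,
    # mixed so that the constraint holds with equality (a . pi == b).
    for i, j in combinations(range(k), 2):
        if (a[i] - b) * (a[j] - b) < 0:
            w = (b - a[j]) / (a[i] - a[j])  # weight placed on arm i
            pi = [0.0] * k
            pi[i], pi[j] = w, 1.0 - w
            candidates.append(pi)

    if not candidates:
        raise ValueError("no feasible policy for this constraint")

    # Return the candidate with the highest expected reward.
    return max(candidates, key=lambda pi: sum(m * p for m, p in zip(mu, pi)))


if __name__ == "__main__":
    # Arm 0 has the highest mean but violates the constraint when played alone.
    policy = best_feasible_policy(mu=[1.0, 0.5, 0.2], a=[1.0, 0.2, 0.0], b=0.5)
    print(policy)
```

In this example the best feasible policy mixes arms 0 and 1 (roughly 0.375 and 0.625), which shows why the object being identified is a policy over arms rather than a single best arm.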


Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management

Methods: Sparse Evolutionary Training