Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem

Avrim Blum; Marten Garicano; Kavya Ravichandran; Dravyansh Sharma

arXiv:2511.10619·cs.LG·May 22, 2026

TL;DR

This paper introduces new algorithms for the improving multi-armed bandits problem, providing stronger theoretical guarantees and empirical performance, especially under certain reward curve conditions.

Contribution

It proposes two parameterized families of bandit algorithms and bounds the sample complexity of learning a near-optimal algorithm from each family, strengthening the guarantees of prior work both theoretically and empirically.

Findings

01

Achieved stronger guarantees with optimal dependence on the number of arms.

02

Bounded sample complexity for learning near-optimal algorithms from offline data.

03

Empirical evaluations demonstrated effectiveness on hyperparameter tuning benchmarks.

Abstract

The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection from learning curves. Each pull of an arm provides reward that increases monotonically with diminishing returns. A growing line of work has designed algorithms for improving bandits, albeit with somewhat pessimistic worst-case guarantees. Indeed, strong lower bounds of $\Omega(k)$ and $\Omega(\sqrt{k})$ multiplicative approximation factors are known for deterministic and randomized algorithms (respectively) relative to the optimal arm, where $k$ is the number of bandit arms. In this work, we propose two new parameterized families of bandit algorithms and bound the sample complexity of learning the near-optimal algorithm from each family using…
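
To make the reward model in the abstract concrete, here is a minimal Python sketch of an improving-bandits instance. It is an illustration only and is not one of the paper's algorithms. The exponential-saturation reward curve, the helper names (`make_arm`, `is_improving`, `single_arm_value`, `round_robin_value`), and the round-robin baseline are all hypothetical choices made for this example. The sketch also assumes that "the optimal arm" means committing to the single best arm for the entire horizon.

```python
import math
from typing import Callable, List

# A reward curve maps the pull count t (1, 2, 3, ...) of one arm to the reward
# received on that arm's t-th pull.
RewardCurve = Callable[[int], float]


def make_arm(scale: float, rate: float) -> RewardCurve:
    """Hypothetical curve scale * (1 - exp(-rate * t)): increasing and concave in t."""
    return lambda t: scale * (1.0 - math.exp(-rate * t))


def is_improving(curve: RewardCurve, horizon: int, tol: float = 1e-12) -> bool:
    """Check monotone increase with diminishing returns over the first `horizon` pulls."""
    rewards = [curve(t) for t in range(1, horizon + 1)]
    gains = [b - a for a, b in zip(rewards, rewards[1:])]
    monotone = all(g >= -tol for g in gains)
    diminishing = all(g2 <= g1 + tol for g1, g2 in zip(gains, gains[1:]))
    return monotone and diminishing


def single_arm_value(curve: RewardCurve, horizon: int) -> float:
    """Total reward from pulling one arm on every step (assumed benchmark)."""
    return sum(curve(t) for t in range(1, horizon + 1))


def round_robin_value(arms: List[RewardCurve], horizon: int) -> float:
    """Total reward of a naive baseline that cycles through the arms in order."""
    pulls = [0] * len(arms)
    total = 0.0
    for step in range(horizon):
        i = step % len(arms)
        pulls[i] += 1
        total += arms[i](pulls[i])
    return total


# Three hypothetical arms: a slow starter with a high ceiling and two fast
# starters with low ceilings.
arms = [make_arm(1.0, 0.05), make_arm(0.6, 0.5), make_arm(0.3, 2.0)]
horizon = 60

best = max(single_arm_value(arm, horizon) for arm in arms)
baseline = round_robin_value(arms, horizon)
ratio = best / baseline  # multiplicative gap between benchmark and baseline

print(f"best single arm: {best:.3f}")
print(f"round robin:     {baseline:.3f}")
print(f"ratio:           {ratio:.3f}")
```

The ratio printed at the end is the kind of multiplicative approximation factor that the lower bounds in the abstract refer to, evaluated here only for one toy instance and one naive baseline.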

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms