On the Power of Adaptivity for $\varepsilon$-Best Arm Identification in Linear Bandits

Arnab Maiti; Yunbei Xu; Kevin Jamieson

arXiv:2605.15663·cs.LG·May 18, 2026

TL;DR

This paper investigates the sample complexity of $\varepsilon$-best arm identification in linear bandits, highlighting the advantages of adaptive sampling over non-adaptive methods for certain structured action sets.

Contribution

It provides matching upper and lower bounds for non-adaptive fixed-design methods, explores when adaptivity offers significant improvements, and constructs an action set where adaptivity yields polynomial gains.

Findings

01

A non-adaptive fixed-design method is presented whose sample complexity matches the lower bound for all non-adaptive fixed-design methods.

02

Adaptive sampling can significantly outperform non-adaptive methods for specific action sets.

03

An adaptive algorithm estimates the $\ell_2$-norm of the reward vector with near-optimal samples.

Abstract

We study the minimax sample complexity of $\varepsilon$-best arm identification in linear bandits. Given a compact action set $X$ that spans $\mathbb{R}^{d}$ and an unknown reward vector $\theta \in \mathbb{R}^{d}$, the goal is to output an arm $x \in X$ such that $\langle x, \theta \rangle \geq \max_{x \in X} \langle x, \theta \rangle - \varepsilon$ with probability at least $1 - \delta$, using as few samples as possible. First, we present a non-adaptive fixed-design method with sample complexity $O\left(\frac{d \log(1/\delta)}{\varepsilon^{2}} + \frac{w(X)^{2}}{\varepsilon^{2}}\right)$, where $w(X)$ is a Gaussian width term dependent on $X$, and we prove a matching lower bound $\Omega\left(\frac{d \log(1/\delta)}{\varepsilon^{2}} + \frac{w(X)^{2}}{\varepsilon^{2}}\right)$ for all non-adaptive fixed-design methods.…
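To make the problem setup concrete, here is a minimal Python sketch of a non-adaptive, fixed-design procedure. It is not the method from the paper: it assumes a finite action set that contains the standard basis vectors, Gaussian reward noise, and a naive allocation that pulls each basis vector the same number of times. All function names (`make_pull`, `fixed_design_best_arm`, `is_eps_best`) are hypothetical and chosen only for illustration.

```python
import random


def dot(x, y):
    """Inner product <x, y> of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))


def make_pull(theta, noise_std, rng):
    """Return a reward oracle: pulling arm x yields <x, theta> + Gaussian noise.

    Assumption: noise is i.i.d. Gaussian with standard deviation noise_std.
    """
    def pull(x):
        return dot(x, theta) + rng.gauss(0.0, noise_std)
    return pull


def fixed_design_best_arm(arms, pull, pulls_per_coordinate):
    """Non-adaptive sketch: the allocation is fixed before any reward is seen.

    Assumption: the standard basis vectors e_1, ..., e_d are valid arms, so
    averaging the rewards of e_i gives an unbiased estimate of theta_i.
    """
    d = len(arms[0])
    theta_hat = []
    for i in range(d):
        basis = tuple(1.0 if j == i else 0.0 for j in range(d))
        rewards = [pull(basis) for _ in range(pulls_per_coordinate)]
        theta_hat.append(sum(rewards) / len(rewards))
    # Recommend the arm that looks best under the estimated reward vector.
    best = max(arms, key=lambda x: dot(x, theta_hat))
    return best, theta_hat


def is_eps_best(arm, arms, theta, eps):
    """Check the epsilon-best condition <x, theta> >= max_x' <x', theta> - eps."""
    return dot(arm, theta) >= max(dot(x, theta) for x in arms) - eps
```

A caller would build `pull` from a hidden `theta`, run `fixed_design_best_arm` once, and check the recommendation with `is_eps_best`. The total number of pulls here is `d * pulls_per_coordinate`, decided in advance, which is what makes the procedure non-adaptive; an adaptive method would instead choose later pulls based on earlier rewards.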
