On the Power of Adaptivity for $\varepsilon$-Best Arm Identification in Linear Bandits
Arnab Maiti, Yunbei Xu, Kevin Jamieson

TL;DR
This paper investigates the sample complexity of $\varepsilon$-best arm identification in linear bandits, highlighting the advantages of adaptive sampling over non-adaptive methods for certain structured action sets.
Contribution
It provides matching bounds for non-adaptive methods, explores when adaptivity offers significant improvements, and constructs an action set where adaptivity yields polynomial gains.
Findings
A non-adaptive fixed-design method is derived, together with a matching lower bound for all non-adaptive fixed-design methods.
Adaptive sampling can significantly outperform non-adaptive methods for specific action sets.
An adaptive algorithm estimates the $\ell_2$-norm of the reward vector with a near-optimal number of samples.
Abstract
We study the minimax sample complexity of $\varepsilon$-best arm identification in linear bandits. Given a compact action set that spans the ambient space and an unknown reward vector, the goal is to output an arm whose expected reward is within $\varepsilon$ of the best achievable, with probability at least a prescribed confidence level, using as few samples as possible. First, we present a non-adaptive fixed-design method whose sample complexity involves a Gaussian width term dependent on the action set, and we prove a matching lower bound for all non-adaptive fixed-design methods.…
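To make the non-adaptive setting in the abstract concrete, here is a minimal sketch of a fixed-design procedure. It is an illustration only, not the paper's method: the design, the pull budget, the toy arms, and the names `fixed_design_best_arm` and `run_demo` are all assumptions. The design is taken to be the standard basis (assumed to lie in the action set), so the least-squares estimate reduces to coordinate-wise sample means.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fixed_design_best_arm(arms, design, pulls_per_arm, pull):
    """Non-adaptive sketch: sample a fixed design, then pick the empirical best arm.

    Assumption: `design` is the standard basis e_1..e_d, so the least-squares
    estimate of the reward vector is the per-coordinate mean of observed rewards.
    The sampling plan is fixed in advance and never depends on observed rewards.
    """
    theta_hat = []
    for x in design:
        rewards = [pull(x) for _ in range(pulls_per_arm)]
        theta_hat.append(sum(rewards) / pulls_per_arm)
    best = max(arms, key=lambda x: dot(x, theta_hat))
    return best, theta_hat


def run_demo(seed=0):
    """Toy instance (hypothetical values) with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    theta = [1.0, 0.2]  # hypothetical reward vector, hidden from the learner
    arms = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
    design = [(1.0, 0.0), (0.0, 1.0)]

    def pull(x):
        return dot(x, theta) + rng.gauss(0.0, 1.0)

    best, _ = fixed_design_best_arm(arms, design, 2000, pull)
    # Suboptimality gap of the returned arm under the true reward vector.
    gap = max(dot(x, theta) for x in arms) - dot(best, theta)
    return best, gap
```

The returned arm is $\varepsilon$-best whenever `gap` is at most $\varepsilon$. An adaptive method would instead choose later pulls based on earlier rewards, which is where the gains discussed in the paper can arise.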
