On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
Leo Maynard-Zhang, Zhihan Xiong, Kevin Jamieson, Maryam Fazel

TL;DR
This paper investigates the difficulty of identifying the best arm in non-stationary linear bandits within a fixed budget, establishing arm-set-dependent lower bounds and proposing an algorithm that matches these bounds.
Contribution
The paper introduces arm-set-dependent lower bounds for fixed-budget best-arm identification in non-stationary linear bandits and proposes the Adjacent-BAI algorithm, which is proven to be minimax-optimal in this setting.
Findings
Lower bounds depend on the arm set structure.
The Adjacent-BAI algorithm matches the lower bounds.
Arm-set-dependent complexity is characterized.
Abstract
We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget, a finite arm set, and a potentially adversarial sequence of unknown parameters (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward with high probability. In this setting, it is well known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability whose governing complexity term scales proportionally with the dimension. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus…
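The baseline mentioned in the abstract (sampling every round from the G-optimal design) can be illustrated with a minimal sketch. This is not the paper's Adjacent-BAI algorithm. Several pieces are assumptions made for illustration: the design is approximated with standard Frank-Wolfe (Kiefer-Wolfowitz) iterations, the cumulative parameter is estimated with an inverse-design-weighted least-squares estimator, rewards carry Gaussian noise, and the names `g_optimal_design`, `static_g_bai`, `thetas`, and `noise_sd` are hypothetical.

```python
import numpy as np


def g_optimal_design(X, iters=2000, tol=1e-3):
    """Approximate the G-optimal design over the rows of X with Frank-Wolfe steps.

    Assumes the arms span R^d, so the design matrix is invertible.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)  # start from the uniform distribution over arms
    for _ in range(iters):
        A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
        # Leverage x^T A(lam)^{-1} x for every arm.
        leverage = np.einsum("ij,jk,ik->i", X, A_inv, X)
        j = int(np.argmax(leverage))
        g = leverage[j]
        if g <= d * (1.0 + tol):  # Kiefer-Wolfowitz: optimal max leverage equals d
            break
        step = (g / d - 1.0) / (g - 1.0)  # classical step size for this update
        lam = (1.0 - step) * lam
        lam[j] += step
    return lam / lam.sum()


def static_g_bai(X, thetas, rng, noise_sd=1.0):
    """Sample each round from one fixed design, then recommend an arm.

    thetas has shape (T, d): one (possibly adversarial) parameter per round.
    Assumption: reward = <x_t, theta_t> + Gaussian noise.
    """
    T = len(thetas)
    lam = g_optimal_design(X)
    A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
    pulls = rng.choice(len(X), size=T, p=lam)
    rewards = np.einsum("td,td->t", X[pulls], thetas)
    rewards = rewards + noise_sd * rng.standard_normal(T)
    # Each term A^{-1} x_t r_t is an unbiased estimate of theta_t,
    # so the sum estimates the cumulative parameter.
    theta_sum_hat = A_inv @ (X[pulls] * rewards[:, None]).sum(axis=0)
    return int(np.argmax(X @ theta_sum_hat))


if __name__ == "__main__":
    arms = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.6, 0.6, 0]], dtype=float)
    T = 4000
    even = (np.arange(T) % 2 == 0)[:, None]
    thetas = np.where(even, [1.0, -0.5, 0.0], [0.2, 0.6, 0.0])  # alternating parameters
    print(static_g_bai(arms, thetas, np.random.default_rng(0)))
```

Because the sampling distribution never changes, the estimator stays unbiased for the cumulative parameter no matter how the per-round parameters move, which is why a static design is a natural fit for the adversarial setting described above.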
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Auction Theory and Applications
