Gamification of Pure Exploration for Linear Bandits
Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko

TL;DR
This paper introduces the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits, a setting that includes best-arm identification.
Contribution
It compares several notions of optimality for the linear case (G-optimality, transductive optimality, and asymptotic optimality) and develops the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
Findings
The algorithm achieves asymptotic optimality in linear bandit pure exploration.
It bypasses previous computational difficulties in optimal design.
The approach is efficiently implementable in practice.
Abstract
We investigate an active pure-exploration setting that includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison of, and new insight into, different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need…
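To make the notion of G-optimality mentioned in the abstract concrete, here is a minimal sketch. It assumes the standard definition from optimal experimental design: a G-optimal design is a weight vector w over the arms that minimizes the largest value of ||a||² measured in the A(w)⁻¹ norm, where A(w) = Σ_k w_k a_k a_kᵀ. The solver is a generic Frank-Wolfe (Fedorov-Wynn) iteration, not the algorithm proposed in the paper, and the function names, the tolerance, and the three-arm instance are all hypothetical choices made for illustration.

```python
import numpy as np


def arm_norms(arms, w):
    """Return ||a_k||^2 in the A(w)^{-1} norm for every arm a_k."""
    # A(w) = sum_k w_k a_k a_k^T (assumes the arms span the whole space).
    design = arms.T @ (w[:, None] * arms)
    return np.einsum("kd,de,ke->k", arms, np.linalg.inv(design), arms)


def g_optimal_design(arms, tol=1e-2, max_iters=5000):
    """Approximate a G-optimal design with Frank-Wolfe steps (illustrative only)."""
    n_arms, dim = arms.shape
    w = np.full(n_arms, 1.0 / n_arms)  # start from the uniform design
    for _ in range(max_iters):
        norms = arm_norms(arms, w)
        k = int(np.argmax(norms))      # arm with the worst (largest) norm
        g = norms[k]
        if g <= (1.0 + tol) * dim:     # within a factor (1 + tol) of the optimum
            break
        # Exact line-search step for the equivalent log-det (D-optimal) objective.
        step = (g / dim - 1.0) / (g - 1.0)
        w = (1.0 - step) * w
        w[k] += step
    return w


# Hypothetical instance: two canonical arms plus one arm close to the first.
arms = np.array([[1.0, 0.0], [0.0, 1.0], [np.cos(0.1), np.sin(0.1)]])
w = g_optimal_design(arms)
g_value = arm_norms(arms, w).max()
```

By the Kiefer-Wolfowitz equivalence theorem, the optimal value of this objective equals the dimension d, so `g_value` should end up within the chosen tolerance of 2 on this two-dimensional instance. The design depends only on the arm set, not on the unknown parameter, which is one way it differs from the instance-dependent asymptotic optimality the abstract refers to.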
Taxonomy
Topics: Advanced Bandit Algorithms Research
