Choosing Answers in ε-Best-Answer Identification for Linear Bandits
Marc Jourdan, Rémy Degenne

TL;DR
This paper introduces a new approach for identifying, in linear bandits, an arm whose mean is within ε of the best arm's mean. It shows that selecting the furthest answer, rather than the one with the highest mean, is key to optimal sample complexity.
Contribution
It develops a novel method for epsilon-best-answer identification in linear bandits, highlighting the need to choose the furthest answer rather than the highest mean for asymptotic optimality.
Findings
The proposed algorithm is asymptotically optimal.
It outperforms existing modified best-arm identification algorithms.
The method is empirically competitive.
Abstract
In pure-exploration problems, information is gathered sequentially to answer a question on the stochastic environment. While best-arm identification for linear bandits has been extensively studied in recent years, few works have been dedicated to identifying one arm that is ε-close to the best one (and not exactly the best one). In this problem with several correct answers, an identification algorithm should focus on one candidate among those answers and verify that it is correct. We demonstrate that picking the answer with highest mean does not allow an algorithm to reach asymptotic optimality in terms of expected sample complexity. Instead, a furthest answer should be identified. Using that insight to choose the candidate answer carefully, we develop a simple procedure to adapt best-arm identification algorithms to tackle ε-best-answer identification…
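The Python sketch below illustrates the candidate-selection idea from the abstract. It assumes unit-variance Gaussian rewards, an additive ε, and a fixed allocation w over arms. The function names (epsilon_good_answers, distance_to_alternative, furthest_answer) are hypothetical and do not come from the paper. For each ε-good answer i, the sketch computes the weighted distance from θ to the set of parameters under which i is no longer ε-good, then keeps the answer whose alternative set is furthest away.

```python
import numpy as np


def epsilon_good_answers(means, eps):
    """Indices i with means[i] >= max(means) - eps (additive epsilon)."""
    means = np.asarray(means, dtype=float)
    return np.flatnonzero(means >= means.max() - eps)


def distance_to_alternative(arms, theta, w, i, eps):
    """Distance (unit-variance Gaussian) from theta to the set where i is NOT eps-good.

    That set is the union over j != i of the half-spaces
    {lam : (a_j - a_i)^T lam > eps}. The squared V_w-norm distance to a
    half-space has a closed form; we take the smallest one over j.
    """
    V = arms.T @ (w[:, None] * arms)      # design matrix V_w = sum_k w_k a_k a_k^T
    V_inv = np.linalg.inv(V)              # assumes the weighted arms span R^d
    best = np.inf
    for j in range(arms.shape[0]):
        if j == i:
            continue
        diff = arms[i] - arms[j]
        norm2 = diff @ V_inv @ diff
        if norm2 <= 0:                    # identical arms: the half-space is empty
            continue
        gap = diff @ theta + eps          # margin by which i beats j, up to eps
        if gap <= 0:                      # theta already lies in the alternative
            return 0.0
        best = min(best, gap ** 2 / (2 * norm2))
    return best


def furthest_answer(arms, theta, w, eps):
    """Among eps-good answers, return the one whose alternative is furthest from theta."""
    means = arms @ theta
    dists = {int(i): distance_to_alternative(arms, theta, w, int(i), eps)
             for i in epsilon_good_answers(means, eps)}
    return max(dists, key=dists.get), dists


# Unstructured 3-armed instance written as a linear bandit (arms = basis vectors).
arms = np.eye(3)
theta = np.array([1.0, 0.95, 0.0])
w = np.full(3, 1 / 3)                     # uniform allocation (an assumption)
answer, dists = furthest_answer(arms, theta, w, eps=0.1)
print("eps-good answers:", epsilon_good_answers(arms @ theta, 0.1))
print("highest mean:", int(np.argmax(arms @ theta)), "furthest:", answer)
print(dists)
```

In this toy unstructured instance, the furthest answer coincides with the highest-mean arm. The abstract's point is that in linear bandits the two choices can differ, and only tracking the furthest answer preserves asymptotic optimality. To our understanding, the paper's definition also optimizes over the allocation w, which this sketch fixes for simplicity. Replacing w with empirical pull counts gives the kind of GLR statistic that pure-exploration stopping rules commonly compare to a threshold.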
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
