Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
Marc Jourdan, Rémy Degenne

TL;DR
This paper provides the first non-asymptotic analysis of a UCB-based Top Two algorithm for best arm identification, offering guarantees for any error level and demonstrating strong empirical performance.
Contribution
It introduces a non-asymptotic upper bound on sample complexity for a Top Two algorithm using UCB, extending theoretical guarantees beyond the asymptotic regime.
Findings
First non-asymptotic bound for Top Two algorithms
UCB-based Top Two achieves competitive empirical results
Guarantees hold for any fixed error level
Abstract
A Top Two sampling rule for bandit identification is a method that selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, Top Two methods have received increased attention in recent years. However, for fixed-confidence best arm identification, theoretical guarantees for Top Two methods have only been obtained in the asymptotic regime, when the error level vanishes. In this paper, we derive the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm, which holds for any error level. Our analysis highlights sufficient properties for a regret minimization algorithm to be used as leader. These properties are satisfied by the UCB algorithm, and our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
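The leader/challenger mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact algorithm: the bonus `sqrt(2 log t / N)`, the unit-variance Gaussian cost used to pick the challenger, the randomized choice with probability `beta`, and all function names are assumptions made for illustration only.

```python
import math
import random


def ucb_leader(means, counts, t):
    """Leader: arm with the largest upper confidence bound.

    The bonus sqrt(2 log t / N) is an illustrative choice (assumption).
    Every arm must have been sampled at least once.
    """
    def index(a):
        return means[a] + math.sqrt(2.0 * math.log(t) / counts[a])
    return max(range(len(means)), key=index)


def closest_challenger(means, counts, leader):
    """Challenger: the arm that is hardest to separate from the leader.

    Uses a unit-variance Gaussian cost (assumption): the empirical gap
    divided by the standard deviation of the gap estimate.
    """
    def cost(a):
        gap = max(means[leader] - means[a], 0.0)
        return gap / math.sqrt(1.0 / counts[leader] + 1.0 / counts[a])
    return min((a for a in range(len(means)) if a != leader), key=cost)


def top_two_choice(means, counts, t, beta=0.5, rng=random):
    """Return (arm_to_sample, leader, challenger).

    The leader is sampled with probability beta, the challenger otherwise
    (a randomized variant, assumed here for simplicity).
    """
    leader = ucb_leader(means, counts, t)
    challenger = closest_challenger(means, counts, leader)
    arm = leader if rng.random() < beta else challenger
    return arm, leader, challenger


# Example: three arms, each sampled ten times so far.
means = [0.9, 0.5, 0.1]
counts = [10, 10, 10]
arm, leader, challenger = top_two_choice(means, counts, t=30, rng=random.Random(0))
print(leader, challenger, arm)
```

With equal counts, the confidence bonuses coincide, so the leader is simply the empirically best arm and the challenger is the arm with the smallest gap to it. A complete procedure would also need a stopping rule and a final recommendation, which this sketch omits.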
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
