Non-Asymptotic Analysis of a UCB-based Top Two Algorithm

Marc Jourdan; Rémy Degenne

arXiv:2210.05431·stat.ML·November 8, 2023

TL;DR

This paper derives the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm for fixed-confidence best arm identification. The bound holds for any error level, and the proposed UCB-based Top Two algorithm is also competitive empirically.

Contribution

The paper derives a non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm that uses UCB as its leader, extending theoretical guarantees for Top Two methods beyond the asymptotic regime in which the error level vanishes.

Findings

01 First non-asymptotic bound for Top Two algorithms

02 UCB-based Top Two achieves competitive empirical results

03 Guarantees hold for any fixed error level

Abstract

A Top Two sampling rule for bandit identification is a method that selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, Top Two methods have received increased attention in recent years. However, for fixed-confidence best arm identification, theoretical guarantees for Top Two methods have only been obtained in the asymptotic regime, when the error level vanishes. In this paper, we derive the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm, which holds for any error level. Our analysis highlights sufficient properties for a regret minimization algorithm to be used as leader. These properties are satisfied by the UCB algorithm, and our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
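The leader/challenger structure described in the abstract can be illustrated with a minimal sketch of one sampling step. This is not the authors' exact algorithm: the abstract only states that UCB serves as the leader. The UCB1-style exploration bonus, the gap-based challenger rule for unit-variance Gaussian arms, the randomized choice with probability `beta`, and all function names below are assumptions made for illustration.

```python
import math
import random


def ucb_leader(means, counts, t):
    """Return the arm with the largest UCB index.

    Assumption: a generic UCB1-style bonus sqrt(2 log t / N_a);
    the paper's bonus may differ.
    """
    return max(
        range(len(means)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def gap_challenger(means, counts, leader):
    """Return the arm (other than the leader) hardest to separate from it.

    Assumption: a normalized empirical gap for unit-variance Gaussian arms.
    """
    def cost(a):
        gap = max(means[leader] - means[a], 0.0)
        return gap / math.sqrt(1.0 / counts[leader] + 1.0 / counts[a])

    return min((a for a in range(len(means)) if a != leader), key=cost)


def top_two_step(means, counts, t, beta=0.5, rng=random):
    """Pick the next arm: the leader with probability beta, else the challenger.

    Requires every arm to have been sampled at least once and t >= 1.
    """
    leader = ucb_leader(means, counts, t)
    challenger = gap_challenger(means, counts, leader)
    return leader if rng.random() < beta else challenger
```

For example, with empirical means `[0.9, 0.5, 0.1]` and ten pulls per arm, the leader is arm 0 and the challenger is arm 1, so `top_two_step` returns one of those two indices. A full procedure would also need a stopping rule tied to the error level, which this sketch omits.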

Videos

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems