On Lai's Upper Confidence Bound in Multi-Armed Bandits

Huachen Ren; Cun-Hui Zhang

arXiv:2410.02279·stat.ML·October 7, 2024

TL;DR

This paper establishes sharp non-asymptotic regret bounds for Lai's upper confidence bound algorithms in multi-armed bandits with Gaussian rewards. The leading constants match the Lai-Robbins lower bound, and the authors argue that this part of Lai's work deserves more attention in the machine learning literature.

Contribution

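The paper establishes new non-asymptotic regret bounds, for Gaussian rewards, for two upper confidence bound indices: one with a constant level of exploration, and the index of Lai (1987), whose exploration function decreases with the sample size of the corresponding arm. Both bounds have leading constants that match the Lai-Robbins lower bound.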

Findings

01

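The regret bounds have leading constants that match the Lai-Robbins lower bound.

02

A non-asymptotic regret bound is established for the upper confidence bound index of Lai (1987), whose exploration function decreases with the sample size of the corresponding arm.

03

The results highlight an aspect of Lai's work on upper confidence bounds that deserves more attention in the machine learning literature.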
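
For context, the Lai-Robbins lower bound referred to in the findings can be written out for the Gaussian case. The notation below is an assumption introduced here for illustration and is not taken from the paper: arm a has mean \mu_a and known variance \sigma^2, \mu^* is the largest mean, \Delta_a = \mu^* - \mu_a is the gap of arm a, and R_T is the expected cumulative regret over T rounds. For any uniformly good (consistent) policy:

```latex
\liminf_{T \to \infty} \frac{R_T}{\log T}
  \;\ge\; \sum_{a:\,\Delta_a > 0} \frac{\Delta_a}{\mathrm{KL}(\mu_a, \mu^*)}
  \;=\; \sum_{a:\,\Delta_a > 0} \frac{2\sigma^2}{\Delta_a},
\qquad
\mathrm{KL}(\mu_a, \mu^*) = \frac{\Delta_a^2}{2\sigma^2}.
```

A regret bound "matches" this lower bound when its leading term is the same sum multiplied by \log T.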

Abstract

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987) which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.
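
To make the two kinds of index in the abstract concrete, here is a minimal simulation sketch of an upper confidence bound policy for Gaussian rewards. Everything named in it is an assumption for illustration: the index form `mean + sigma * sqrt(2 * level / n)`, the constant level `log(T)`, the decreasing level `log(T / n)`, and the helper names `constant_exploration`, `decreasing_exploration`, `ucb_index`, and `run_ucb` are hypothetical stand-ins, not the exact definitions analysed in the paper or in Lai (1987).

```python
import math
import random


def constant_exploration(n_pulls, horizon):
    """Hypothetical constant exploration level: depends only on the horizon."""
    return math.log(horizon)


def decreasing_exploration(n_pulls, horizon):
    """Hypothetical decreasing level: shrinks as the arm's sample size grows."""
    return max(math.log(horizon / n_pulls), 0.0)


def ucb_index(mean, n_pulls, horizon, explore, sigma=1.0):
    """Sample mean plus an exploration bonus for Gaussian rewards (assumed form)."""
    return mean + sigma * math.sqrt(2.0 * explore(n_pulls, horizon) / n_pulls)


def run_ucb(true_means, horizon, explore, sigma=1.0, seed=0):
    """Simulate the index policy; return (pseudo-regret, pull counts per arm).

    Assumes horizon >= number of arms so that every arm is pulled once first.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull each arm once
        else:
            # pull the arm with the largest index
            arm = max(
                range(k),
                key=lambda a: ucb_index(
                    sums[a] / counts[a], counts[a], horizon, explore, sigma
                ),
            )
        reward = rng.gauss(true_means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]  # pseudo-regret uses the true gaps
    return regret, counts


if __name__ == "__main__":
    for explore in (constant_exploration, decreasing_exploration):
        regret, counts = run_ucb([0.0, 1.0], 2000, explore)
        print(explore.__name__, regret, counts)
```

The only difference between the two runs is the exploration function passed in: the constant level gives every arm the same bonus scale throughout, while the decreasing level reduces the bonus of an arm faster as that arm accumulates samples.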

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications

Methods: Softmax · Attention Is All You Need · Focus