Regret Analysis of the Anytime Optimally Confident UCB Algorithm
Tor Lattimore

TL;DR
This paper introduces an anytime version of the Optimally Confident UCB (OCUCB) algorithm for finite-armed stochastic bandits with subgaussian noise. It proves the strongest finite-time regret guarantees so far for a horizon-free algorithm, together with a finite-time lower bound that nearly matches the upper bound.
Contribution
The paper presents an anytime (horizon-free) version of OCUCB with the strongest finite-time regret guarantees so far for a horizon-free algorithm, and a finite-time lower bound that nearly matches the upper bound.
Findings
Strong finite-time regret guarantees for the new algorithm
Nearly matching finite-time lower bound established
Algorithm is simple, intuitive, and horizon-free
Abstract
I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound.
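To make the setting concrete, here is a minimal sketch of cumulative regret in a finite-armed stochastic bandit with unit-variance Gaussian (hence subgaussian) noise. It assumes the classical UCB index, mean estimate plus sqrt(2 log t / T_i), which is not the OCUCB index analysed in the paper; the abstract does not state that index, so none is reproduced here. The function name `run_ucb` and its parameters are hypothetical. The policy is anytime in the sense that the index depends only on the current round t, never on a horizon fixed in advance.

```python
import math
import random


def run_ucb(means, n_rounds, seed=0):
    """Run a classical UCB index policy (NOT OCUCB) on Gaussian arms.

    Returns the cumulative pseudo-regret and the per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # T_i: number of times arm i has been pulled
    sums = [0.0] * k      # running sum of rewards observed from arm i
    best = max(means)     # mean of the optimal arm
    regret = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            # Initialisation: pull each arm once so every index is defined.
            arm = t - 1
        else:
            # The index uses only the current round t, so no horizon is needed.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)  # unit-variance Gaussian noise
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]          # gap of the arm just played

    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = run_ucb([0.5, 0.0], n_rounds=2000)
    print("cumulative pseudo-regret:", total_regret)
    print("pull counts:", pulls)
```

The regret accumulated here is the pseudo-regret, the sum over rounds of the gap between the best mean and the mean of the arm played, which is the quantity that finite-time upper and lower bounds of this kind control in expectation.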
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
