Regret Analysis of the Anytime Optimally Confident UCB Algorithm

Tor Lattimore

arXiv:1603.08661 · cs.LG · May 9, 2016 · 23 citations

TL;DR

This paper introduces an anytime version of the Optimally Confident UCB (OCUCB) algorithm for finite-armed stochastic bandits with subgaussian noise. It comes with the strongest finite-time regret guarantees so far for a horizon-free algorithm, together with a finite-time lower bound that nearly matches the upper bound.

Contribution

The paper presents a horizon-free (anytime) version of OCUCB for minimising cumulative regret, proves finite-time regret bounds stronger than those previously available for horizon-free algorithms, and gives a finite-time lower bound that nearly matches the upper bound.

Findings

01. Strong finite-time regret guarantees for the new algorithm

02. Nearly matching finite-time lower bound established

03. Algorithm is simple, intuitive, and horizon-free

Abstract

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound.
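To make the setting in the abstract concrete, here is a minimal sketch of a generic anytime UCB index policy on a finite-armed bandit with unit-variance Gaussian (hence subgaussian) noise. It is an illustration under stated assumptions, not the OCUCB algorithm itself: the exploration bonus is the classical sqrt(2 log t / T_i) form, the function name `ucb_anytime` and the arm means are hypothetical, and the reported quantity is the pseudo-regret, i.e. the sum over rounds of the gap between the best mean and the mean of the arm played. The policy is "anytime" in the sense that the index depends only on the current round t, never on the horizon.

```python
import math
import random


def ucb_anytime(means, n_rounds, seed=0):
    """Run a generic anytime UCB index policy on unit-variance Gaussian arms.

    Illustrative baseline only: the bonus below is the classical
    sqrt(2 log t / T_i) form, NOT the OCUCB index analysed in the paper.
    Returns the pull counts per arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # T_i: number of times arm i has been played
    sums = [0.0] * k      # running sum of rewards observed from arm i
    best = max(means)
    regret = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            # Play every arm once so that all empirical means are defined.
            arm = t - 1
        else:
            # Index = empirical mean + confidence width; no horizon appears.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses true means

    return counts, regret


# Two arms with a gap of 0.5 (assumed values, for illustration only).
counts, regret = ucb_anytime([0.5, 0.0], n_rounds=2000)
print(counts, regret)
```

Because the index never references `n_rounds`, the same run can be stopped at any round and the policy up to that point is unchanged; horizon-free algorithms such as the one analysed in the paper share this property while using a more refined confidence width.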

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms