Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

Tor Lattimore

arXiv:1507.07880 · cs.LG · February 25, 2016 · 28 cites

TL;DR

This paper introduces a UCB-based algorithm for stochastic finite-armed bandits that achieves order-optimal regret in both the problem-dependent and the worst-case sense, while remaining simple and computationally efficient.

Contribution

The paper proposes the first algorithm that simultaneously attains order-optimal problem-dependent and worst-case regret for stochastic finite-armed bandits.
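
For orientation, the two notions of regret named above can be written out as follows. The notation (K arms with means \mu_i, horizon n, arm I_t played in round t, gaps \Delta_i, pull counts T_i(n)) is standard in the bandit literature but is an assumption here, since this page does not define it. The orders shown are the classical targets for bounded-variance noise; the paper's own bounds may be stated in a sharper form.

```latex
% Expected regret over n rounds, with \mu^* = \max_i \mu_i and \Delta_i = \mu^* - \mu_i
R_n \;=\; n\mu^* - \mathbb{E}\Big[\sum_{t=1}^{n} \mu_{I_t}\Big]
    \;=\; \sum_{i=1}^{K} \Delta_i \, \mathbb{E}[T_i(n)]

% Problem-dependent target: logarithmic in n, scaled by the inverse gaps
R_n \;=\; O\Big(\sum_{i : \Delta_i > 0} \frac{\log n}{\Delta_i}\Big)

% Worst-case (minimax) target: uniform over all gap configurations
\sup_{\mu} R_n \;=\; O\big(\sqrt{Kn}\big)
```

An algorithm tuned for one of these targets does not automatically meet the other, which is why attaining both at once is the stated contribution.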

Findings

01 The algorithm is simple and computationally efficient.

02 It performs very well empirically.

03 The theoretical analysis establishes order-optimal regret in both the problem-dependent and the worst-case sense.

Abstract

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and empirically superb. The approach is based on UCB, but with a carefully chosen confidence parameter that optimally balances the risk of failing confidence intervals against the cost of excessive optimism.
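
The idea of a UCB index with a tunable confidence parameter can be sketched roughly as below. This is a minimal illustration, not the paper's algorithm as stated on this page: the function name `ucb_bandit`, the parameters `alpha` and `psi`, the bonus `sqrt(alpha / pulls * log(psi * horizon / t))`, and the unit-variance Gaussian rewards are all assumptions chosen for the example. The bonus shrinks as the round `t` approaches the horizon, which is one way to trade the risk of a failed confidence interval against the cost of over-exploring.

```python
import math
import random


def ucb_bandit(means, horizon, alpha=3.0, psi=2.0, seed=0):
    """Simulate a UCB-style index policy on unit-variance Gaussian arms.

    The exploration bonus used here is an illustrative assumption:
        sqrt(alpha / pulls[i] * log(psi * horizon / t))
    Returns the per-arm pull counts and the accumulated pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k      # number of times each arm was played
    sums = [0.0] * k     # running sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            # Positive because psi > 1 and t <= horizon.
            log_term = math.log(psi * horizon / t)

            def index(i):
                mean_estimate = sums[i] / pulls[i]
                bonus = math.sqrt(alpha / pulls[i] * log_term)
                return mean_estimate + bonus

            arm = max(range(k), key=index)

        reward = rng.gauss(means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses the true means

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = ucb_bandit([0.0, 1.0], horizon=2000)
    print("pulls per arm:", pulls)
    print("pseudo-regret:", regret)
```

Larger values of `alpha` or `psi` widen the confidence intervals and make the policy more cautious about abandoning an arm; smaller values make it commit sooner at a higher risk of locking onto a suboptimal arm.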

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics