lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Kevin Jamieson; Matthew Malloy; Robert Nowak; Sébastien Bubeck

arXiv:1312.7308 · stat.ML · December 30, 2013 · 59 cites



TL;DR

This paper introduces lil' UCB, an exploration algorithm for multi-armed bandits that identifies the best arm using a number of samples within a constant factor of optimal. Its confidence bounds are built on the law of the iterated logarithm (LIL).

Contribution

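The paper presents a new UCB algorithm whose sample complexity is optimal up to constant factors and which outperforms state-of-the-art methods in simulations. Its confidence bounds explicitly account for the algorithm's infinite time horizon, and its stopping rule avoids the union bound over arms used by other UCB-type algorithms.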

Findings

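1. Identifies the best arm with a number of samples within a constant factor of a lower bound based on the LIL.
2. Outperforms existing UCB algorithms in simulations.
3. Bases its confidence bounds on the law of the iterated logarithm (LIL), explicitly accounting for the algorithm's infinite time horizon.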
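The following shows where the iterated logarithm comes from. It is the standard law of the iterated logarithm, not a result specific to this paper. Let μ̂_t be the empirical mean of t i.i.d. samples with mean μ and variance σ². Then:

```latex
\limsup_{t \to \infty} \frac{\hat{\mu}_t - \mu}{\sqrt{2\sigma^2 \log\log(t) / t}} = 1 \quad \text{almost surely.}
```

A confidence width meant to hold at every t simultaneously therefore cannot shrink faster than order sqrt(log log(t) / t). That is the scaling LIL-based bounds aim for. By contrast, a union bound over time gives widths of order sqrt(log(t) / t).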

Abstract

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
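The abstract describes the procedure only at a high level. Below is a minimal Python sketch of a lil' UCB-style loop, written under the following assumptions:

* Each arm i has an LIL-shaped confidence width that depends only on its own pull count T_i, following the form as we read it from the paper: (1+β)(1+√ε) · sqrt(2σ²(1+ε) · log(log((1+ε)T_i)/δ) / T_i).
* Sampling stops once some arm's count satisfies T_i ≥ 1 + λ Σ_{j≠i} T_j.
* The most-pulled arm is returned.

The function name `lil_ucb`, the `pull` callback, and every default parameter value are hypothetical illustrations. They are not the authors' code or recommended settings, so consult the paper for the exact constants and parameter conditions.

```python
import math
import random


def lil_ucb(pull, n_arms, eps=0.01, beta=1.0, lam=9.0, delta=0.001, sigma=0.5):
    """Hypothetical sketch of lil' UCB-style best-arm identification.

    pull(i) returns one stochastic reward from arm i. The defaults are
    illustrative assumptions, not values taken from the paper.
    Returns (index of the recommended arm, list of pull counts).
    """
    # Keeps the iterated-log term positive even when an arm has one pull.
    assert 0 < delta < math.log(1 + eps) / math.e
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def sample(i):
        sums[i] += pull(i)
        counts[i] += 1

    def width(t):
        # LIL-shaped width; depends only on this arm's own pull count t.
        inner = math.log(math.log((1 + eps) * t) / delta)
        return (1 + beta) * (1 + math.sqrt(eps)) * math.sqrt(
            2 * sigma**2 * (1 + eps) * inner / t
        )

    for i in range(n_arms):  # pull every arm once to initialize
        sample(i)
    total = n_arms

    # Keep sampling while no arm has been pulled far more than all others combined.
    while all(counts[i] < 1 + lam * (total - counts[i]) for i in range(n_arms)):
        ucb = [sums[i] / counts[i] + width(counts[i]) for i in range(n_arms)]
        sample(max(range(n_arms), key=ucb.__getitem__))  # pull the highest-UCB arm
        total += 1

    # Recommend the arm with the most pulls.
    return max(range(n_arms), key=counts.__getitem__), counts


if __name__ == "__main__":
    rng = random.Random(0)
    means = [1.0, 0.4, 0.2, 0.0]  # hypothetical Gaussian arms, best arm is index 0
    best, counts = lil_ucb(lambda i: rng.gauss(means[i], 0.25), len(means), sigma=0.25)
    print("recommended arm:", best, "pull counts:", counts)
```

In this sketch, the count-based stopping rule stands in for the "novel stopping time" mentioned in the abstract. It replaces the more common check that one arm's lower confidence bound exceeds every other arm's upper confidence bound. Because each width depends only on T_i and not on the global time, unpulled arms keep their bounds fixed while the leading arm accumulates samples.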


Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Reinforcement Learning in Robotics