A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

Anand Kalvit; Assaf Zeevi

arXiv:2106.02126 · cs.LG · October 27, 2021 · 6 cites

TL;DR

This paper analyzes the worst-case behavior of UCB algorithms in multi-armed bandit problems. It shows that arm-sampling rates under UCB are asymptotically deterministic, derives new asymptotic results and a process-level characterization, and contrasts this behavior with Thompson Sampling.

Contribution

It offers new insights into UCB's arm-sampling behavior, including asymptotic determinism and a complete process-level characterization, and highlights differences from Thompson Sampling.

Findings

01. Arm-sampling rates under UCB are asymptotically deterministic, regardless of the problem complexity.

02. This determinism yields new sharp asymptotics and a novel alternative proof of UCB's O(\sqrt{n \log n}) minimax regret.

03. UCB and Thompson Sampling behave differently, with Thompson Sampling exhibiting incomplete learning.

Abstract

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB) policy is among the simplest optimism-based MAB algorithms that naturally adapts to this gap: for a horizon of play n, it achieves optimal O(\log n) regret in instances with "large" gaps, and a near-optimal O(\sqrt{n \log n}) minimax regret when the gap can be arbitrarily "small." This paper provides new results on the arm-sampling behavior of UCB, leading to several important insights. Among these, it is shown that arm-sampling rates under UCB are asymptotically deterministic, regardless of the problem complexity. This discovery facilitates new sharp asymptotics and a novel alternative proof for the O(\sqrt{n \log n}) minimax regret of UCB. Furthermore, the…
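
To make the notion of an arm-sampling rate concrete, here is a minimal simulation sketch. It assumes Bernoulli rewards and a standard UCB1-style index (empirical mean plus sqrt(rho * log t / pulls)), which is not necessarily the exact variant the paper analyzes. The function name `ucb_arm_fractions` and the parameters `rho` and `seed` are hypothetical choices made for illustration only.

```python
import math
import random


def ucb_arm_fractions(means, horizon, seed=0, rho=2.0):
    """Simulate a UCB1-style policy on Bernoulli arms.

    Returns the fraction of the horizon spent on each arm.
    Assumptions (not from the paper): Bernoulli rewards, index
    mean + sqrt(rho * log(t) / pulls), each arm pulled once first.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once so each index is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(rho * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return [c / horizon for c in counts]


if __name__ == "__main__":
    # A "large" gap instance and a zero-gap instance, for comparison.
    print(ucb_arm_fractions([0.9, 0.1], horizon=5000))
    print(ucb_arm_fractions([0.5, 0.5], horizon=5000))
```

Running the function over many seeds and increasing horizons, and looking at how much the returned fractions vary from run to run, is one informal way to probe the asymptotic determinism described in the abstract. A simulation like this is only suggestive and does not substitute for the paper's analysis.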

Videos

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms · slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms

Methods: Diffusion