Tuning Confidence Bound for Stochastic Bandits with Bandit Distance

Xinyu Zhang; Srinjoy Das; Ken Kreutz-Delgado

arXiv:2110.02690 · stat.ML · October 7, 2021 · 1 citation

TL;DR

This paper introduces UCB-DT, a modified UCB algorithm for stochastic bandits that tunes each bandit's confidence bound based on its distance to the other bandits (its "bandit distance"), leading to lower expected regret and a better-controlled balance between exploration and exploitation.
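For reference, the standard UCB rule that UCB-DT modifies plays, at each round t, the arm with the largest index: its empirical mean plus an exploration bonus that shrinks as the arm is sampled more often. The minimal sketch below uses the classic UCB1 bonus sqrt(2 ln t / n) on simulated Bernoulli arms. It shows the baseline only, not UCB-DT; the arm means, horizon, seed, and function names are illustrative assumptions.

```python
import math
import random


def ucb1_index(mean: float, pulls: int, t: int) -> float:
    """Classic UCB1 index: empirical mean plus the bonus sqrt(2 ln t / n)."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Play UCB1 on Bernoulli arms with the given true means.

    Returns (per-arm pull counts, cumulative pseudo-regret), where
    pseudo-regret adds (best mean - mean of the arm played) every round.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every index is defined
        else:
            arm = max(range(k), key=lambda i: ucb1_index(sums[i] / pulls[i], pulls[i], t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return pulls, regret


if __name__ == "__main__":
    counts, regret = run_ucb1([0.2, 0.5, 0.7], horizon=5000)
    print("pulls per arm:", counts)
    print(f"pseudo-regret: {regret:.1f}")
```

Because the bonus of a rarely pulled arm keeps growing with ln t, every arm is revisited from time to time; how many of those pulls go to non-optimal arms is the cost that the abstract attributes to standard UCB.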

Contribution

It proposes a novel distance-based confidence-tuning method for UCB, along with the concept of an Exploration Bargain Point for analyzing exploration-exploitation trade-offs.

Findings

1. UCB-DT outperforms existing UCB-based methods in empirical tests.
2. The bandit distance measure effectively guides the balance between exploration and exploitation.
3. The Exploration Bargain Point offers new insight into performance trade-offs between algorithms.

Abstract

We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem, which tunes the confidence bound of a given bandit based on its distance to the others. Our UCB distance tuning (UCB-DT) formulation improves performance, as measured by expected regret, by preventing the MAB algorithm from focusing on non-optimal bandits, a well-known deficiency of standard UCB. "Distance tuning" of the standard UCB is done using a proposed distance measure, which we call bandit distance, that is parameterizable and can therefore be optimized to control the rate of transition from exploration to exploitation based on problem requirements. We empirically demonstrate improved performance of UCB-DT over many existing state-of-the-art methods that use the UCB formulation for the MAB problem. Our contribution also includes…
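The abstract does not give the bandit-distance formula, so the following is only a hypothetical illustration of the mechanism it describes. In this sketch, each arm's UCB1 bonus is multiplied by a factor that depends on the arm's distance to the other arms, and a parameter controls how quickly play shifts from exploration to exploitation. The distance used here (how far an arm's empirical mean trails the best other arm), the exp(-alpha * d) scaling, the parameter alpha, and all function names are assumptions chosen for illustration. This is not the authors' UCB-DT.

```python
import math
import random


def hypothetical_bandit_distance(i, means):
    """HYPOTHETICAL stand-in for the paper's bandit distance (not the authors'
    definition): how far arm i's empirical mean trails the best other arm."""
    best_other = max(m for j, m in enumerate(means) if j != i)
    return max(0.0, best_other - means[i])


def distance_tuned_index(i, means, pulls, t, alpha):
    """UCB1-style index whose exploration bonus shrinks with the hypothetical
    distance. alpha >= 0 is a hypothetical knob: alpha = 0 gives plain UCB1,
    and larger alpha shifts from exploration to exploitation sooner."""
    bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
    scale = math.exp(-alpha * hypothetical_bandit_distance(i, means))
    return means[i] + scale * bonus


def run_distance_tuned(arm_means, horizon, alpha, seed=0):
    """Play the tuned index on Bernoulli arms (needs at least two arms).
    Returns per-arm pull counts and cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(arm_means)
    pulls, sums = [0] * k, [0.0] * k
    best, regret = max(arm_means), 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initial round-robin so every arm has one sample
        else:
            means = [sums[i] / pulls[i] for i in range(k)]
            arm = max(range(k), key=lambda i: distance_tuned_index(i, means, pulls, t, alpha))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return pulls, regret


if __name__ == "__main__":
    for alpha in (0.0, 5.0, 20.0):
        counts, regret = run_distance_tuned([0.2, 0.5, 0.7], horizon=5000, alpha=alpha)
        print(f"alpha={alpha:>4}: pulls={counts}, pseudo-regret={regret:.1f}")
```

Setting alpha = 0 reproduces UCB1, so in this sketch the parameter moves between standard UCB and a greedier rule. Pushing it too high can starve an arm that was unlucky early on, which is the kind of exploration-exploitation trade-off that a tunable distance is meant to control.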

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems