Tuning Confidence Bound for Stochastic Bandits with Bandit Distance
Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado

TL;DR
This paper introduces UCB-DT, a modified UCB algorithm for stochastic bandits that tunes confidence bounds based on bandit distances, leading to improved regret performance and better exploration-exploitation balance.
Contribution
It proposes a novel distance-based confidence tuning method for UCB, along with the concept of Exploration Bargain Point for analyzing exploration-exploitation tradeoffs.
Findings
UCB-DT outperforms existing UCB-based methods in empirical tests.
The bandit distance measure effectively guides exploration and exploitation.
Exploration Bargain Point offers new insights into algorithm performance tradeoffs.
Abstract
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem that tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance tuning (UCB-DT) formulation improves performance, as measured by expected regret, by preventing the MAB algorithm from focusing on non-optimal bandits, a well-known deficiency of standard UCB. "Distance tuning" of the standard UCB is done using a proposed distance measure, which we call bandit distance. This measure is parameterizable and can therefore be optimized to control the transition rate from exploration to exploitation based on problem requirements. We empirically demonstrate increased performance of UCB-DT versus many existing state-of-the-art methods which use the UCB formulation for the MAB problem. Our contribution also includes…
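For orientation, here is a minimal sketch of the standard UCB1 baseline that the abstract refers to, together with the expected (pseudo-)regret it is measured by. This is not the authors' UCB-DT: the bandit distance and the tuning rule are not defined in this summary, so they are not reproduced here. The function name `ucb1`, the parameters `means`, `horizon`, and `seed`, and the choice of Bernoulli rewards are all assumptions made for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms (illustrative baseline, not UCB-DT).

    means   -- true success probability of each arm (assumed, for simulation only)
    horizon -- total number of pulls
    Returns (pull counts per arm, accumulated pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # total reward collected from each arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean + confidence bonus sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best - means[arm]

    return counts, regret
```

Calling `ucb1([0.2, 0.5, 0.8], horizon=2000)` returns the per-arm pull counts and the accumulated regret. The confidence bonus here depends only on the arm's own pull count and the round number; per the abstract, UCB-DT instead tunes this bound using the arm's distance to the other arms.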
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
