CORe: Capitalizing On Rewards in Bandit Exploration

Nan Wang; Branislav Kveton; Maryam Karimzadehgan

arXiv:2103.04387 · cs.LG · March 9, 2021 · 1 citation

Open Access

TL;DR

CORe is a data-dependent bandit algorithm that explores by randomizing past observations, achieving adaptive exploration without external noise, and demonstrating strong theoretical and empirical performance.

Contribution

The paper introduces CORe, a novel bandit algorithm that exploits reward variance for exploration, providing a general, parameter-free, and adaptive approach with theoretical guarantees.

Findings

01

Achieves a $\tilde{O}(d\sqrt{n \log K})$ gap-free regret bound in stochastic linear bandits

02

Demonstrates superior empirical performance on synthetic and real-world data

03

Explores purely through past reward variance without external noise

Abstract

We propose a bandit algorithm that explores purely by randomizing its past observations. In particular, the sufficient optimism in the mean reward estimates is achieved by exploiting the variance in the past observed rewards. We name the algorithm Capitalizing On Rewards (CORe). The algorithm is general and can be easily applied to different bandit settings. The main benefit of CORe is that its exploration is fully data-dependent. It does not rely on any external noise and adapts to different problems without parameter tuning. We derive a $\tilde{O}(d\sqrt{n \log K})$ gap-free bound on the $n$-round regret of CORe in a stochastic linear bandit, where $d$ is the number of features and $K$ is the number of arms. Extensive empirical evaluation on multiple synthetic and real-world problems demonstrates the effectiveness of CORe.
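
The abstract states the idea but not CORe's actual estimator. As a hedged illustration of data-dependent exploration in general, the Python sketch below uses a different, simpler technique: bootstrap resampling of each arm's past rewards in a Bernoulli K-armed bandit. It is not CORe. The functions `bootstrap_index` and `run_bandit`, the warm-start parameter `init_pulls`, and the Bernoulli reward model are hypothetical choices made only for this example.

```python
import random
import statistics


def bootstrap_index(rewards, rng):
    """Mean of a with-replacement resample of one arm's observed rewards.

    The spread of this index shrinks as the arm's empirical reward
    variance shrinks or as more rewards are collected, so the amount of
    exploration is driven by the observed data, not by a tuned noise scale.
    """
    resample = rng.choices(rewards, k=len(rewards))
    return statistics.fmean(resample)


def run_bandit(arm_means, n_rounds, init_pulls=2, seed=0):
    """Play a Bernoulli K-armed bandit with bootstrap-resampling exploration.

    Hypothetical illustration only: this is NOT CORe's update rule.
    Returns the number of times each arm was pulled.
    """
    k = len(arm_means)
    if n_rounds < k * init_pulls:
        raise ValueError("n_rounds must cover the warm-start pulls")
    rng = random.Random(seed)
    history = [[] for _ in range(k)]  # observed rewards per arm
    pulls = [0] * k

    def pull(arm):
        # Draw a Bernoulli reward with the arm's (unknown to the learner) mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        history[arm].append(reward)
        pulls[arm] += 1

    # Warm start: every arm needs observations before it can be resampled.
    for arm in range(k):
        for _ in range(init_pulls):
            pull(arm)

    # Each round, pull the arm with the largest resampled mean.
    for _ in range(n_rounds - k * init_pulls):
        indices = [bootstrap_index(h, rng) for h in history]
        pull(max(range(k), key=indices.__getitem__))
    return pulls
```

For example, `run_bandit([0.2, 0.5, 0.8], n_rounds=1000)` returns per-arm pull counts that sum to 1000. One caveat of this sketch: an arm whose observed rewards are all identical has a resampled mean with zero spread, so it can stop being explored early. The sketch does not address this; the abstract states that CORe obtains sufficient optimism from the variance of past rewards without any external noise.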

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems