Residual Bootstrap Exploration for Bandit Algorithms

Chi-Hua Wang; Yang Yu; Botao Hao; Guang Cheng

arXiv:2002.08436 · stat.ML · February 21, 2020 · 5 cites

Open Access

TL;DR

This paper introduces ReBoot, a novel residual bootstrap exploration method for bandit algorithms that enhances exploration through data-driven variance inflation, achieving instance-dependent logarithmic regret in Gaussian multi-armed bandits and improved empirical performance.

Contribution

The paper presents ReBoot, a new perturbation-based exploration technique that captures the distributional properties of fitting errors (residuals) and boosts exploration, with theoretical regret guarantees for Gaussian multi-armed bandits and improved empirical results on synthetic bandit problems.
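
The contribution above can be illustrated with a minimal Python sketch of a residual-bootstrap index for a single arm. It assumes a specific construction that is our reading rather than a verified transcription of the paper: the index is the arm's sample mean plus a randomly weighted average of its residuals, with i.i.d. standard Gaussian weights and two pseudo-residuals ±σ_a√s appended to inflate the variance. The names `reboot_index`, `choose_arm`, and `sigma_a` are hypothetical.

```python
import numpy as np


def reboot_index(rewards, sigma_a, rng):
    """Hypothetical residual-bootstrap index for one arm (assumed construction).

    rewards: rewards observed so far for this arm (at least one).
    sigma_a: assumed variance-inflation level (tuning parameter).
    rng:     numpy.random.Generator that supplies the bootstrap weights.
    """
    y = np.asarray(rewards, dtype=float)
    s = y.size
    mean = y.mean()
    # Residuals: fitting errors of the sample-mean model for this arm.
    residuals = y - mean
    # Assumed variance inflation: two pseudo-residuals of opposite sign.
    pseudo = sigma_a * np.sqrt(s) * np.array([1.0, -1.0])
    all_res = np.concatenate([residuals, pseudo])
    # Multiplier bootstrap: i.i.d. N(0, 1) weight on every residual.
    weights = rng.standard_normal(all_res.size)
    return mean + weights @ all_res / all_res.size


def choose_arm(histories, sigma_a, rng):
    """Pull each unplayed arm once, then the arm with the largest index."""
    for k, h in enumerate(histories):
        if len(h) == 0:
            return k
    indices = [reboot_index(h, sigma_a, rng) for h in histories]
    return int(np.argmax(indices))
```

For example, `choose_arm([[1.2, 0.8], [0.3]], sigma_a=1.0, rng=np.random.default_rng(0))` returns 0 or 1 depending on the random draw. Because the weights have mean zero, the index is centred on the sample mean; the pseudo-residuals only widen its spread.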

Findings

01

ReBoot achieves logarithmic regret in Gaussian bandits.
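
Whether a policy's regret grows logarithmically can be probed, though not proved, by simulation. The sketch below measures cumulative pseudo-regret, the sum over rounds of μ* − μ_{A_t}, on a Gaussian bandit, using a compact ReBoot-style policy built on the same assumed construction (sample mean plus Gaussian-weighted residuals with pseudo-residuals ±σ_a√s). The function names, the default `sigma_a=1.0`, and the unit noise level are illustrative assumptions, not settings from the paper.

```python
import numpy as np


def reboot_policy(histories, sigma_a, rng):
    """Hypothetical ReBoot-style arm choice (assumed construction)."""
    for k, h in enumerate(histories):
        if not h:
            return k  # pull every arm once before perturbing
    scores = []
    for h in histories:
        y = np.asarray(h, dtype=float)
        s = y.size
        # Residuals plus assumed pseudo-residuals +/- sigma_a * sqrt(s).
        res = np.concatenate([y - y.mean(), [sigma_a * np.sqrt(s), -sigma_a * np.sqrt(s)]])
        w = rng.standard_normal(res.size)
        scores.append(y.mean() + w @ res / res.size)
    return int(np.argmax(scores))


def gaussian_bandit_regret(means, horizon, sigma_a=1.0, noise_sd=1.0, seed=0):
    """Cumulative pseudo-regret sum_t (mu* - mu_{A_t}) on a Gaussian bandit."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    histories = [[] for _ in means]
    regret = np.empty(horizon)
    total = 0.0
    for t in range(horizon):
        k = reboot_policy(histories, sigma_a, rng)
        histories[k].append(rng.normal(means[k], noise_sd))
        total += means.max() - means[k]
        regret[t] = total
    return regret
```

For example, `gaussian_bandit_regret([0.0, 0.5, 1.0], horizon=2000)` returns the regret curve; plotting it against log t is one informal check of the growth rate, not a substitute for the paper's proof.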

02

ReBoot outperforms Giro and PHE in unbounded reward scenarios.
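
One informal way to see why reward scale matters is to compare how the perturbations react to it. PHE, as we understand it, adds ⌈a·s⌉ Bernoulli(1/2) pseudo-rewards to an arm's history of s rewards, a design aimed at rewards in [0, 1]; its random perturbation therefore does not depend on the observed reward values at all. A residual-based perturbation, by contrast, scales with the spread of the data. The sketch below is our own illustration, not the paper's experiment or explanation; `phe_index`, `residual_index`, and `perturbation_sd` are hypothetical names, and the residual construction is the same assumed one.

```python
import numpy as np


def phe_index(rewards, a, rng):
    """PHE-style index as we understand it: history mean with ceil(a*s)
    Bernoulli(1/2) pseudo-rewards mixed in. Intended for rewards in [0, 1]."""
    y = np.asarray(rewards, dtype=float)
    s = y.size
    m = int(np.ceil(a * s))
    pseudo_sum = rng.binomial(m, 0.5)  # sum of m Bernoulli(1/2) pseudo-rewards
    return (y.sum() + pseudo_sum) / (s + m)


def residual_index(rewards, sigma_a, rng):
    """Residual-bootstrap index (hypothetical, assumed construction)."""
    y = np.asarray(rewards, dtype=float)
    s = y.size
    res = np.concatenate([y - y.mean(), [sigma_a * np.sqrt(s), -sigma_a * np.sqrt(s)]])
    return y.mean() + rng.standard_normal(res.size) @ res / res.size


def perturbation_sd(index_fn, rewards, param, n_draws=5000, seed=0):
    """Monte Carlo standard deviation of an index over repeated draws."""
    rng = np.random.default_rng(seed)
    return np.std([index_fn(rewards, param, rng) for _ in range(n_draws)])


data = np.random.default_rng(1).normal(0.0, 1.0, 20)  # unbounded (Gaussian) rewards
for scale in (1.0, 100.0):
    print(scale,
          perturbation_sd(phe_index, scale * data, 1.0),
          perturbation_sd(residual_index, scale * data, 0.5))
```

With these settings the PHE-style spread is essentially identical at both scales, while the residual-based spread grows roughly with the data scale (not exactly, because the pseudo-residual term σ_a√s is held fixed).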

03

ReBoot maintains computational efficiency comparable to Thompson sampling.
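
A possible reason the cost can match Thompson sampling: if the bootstrap weights are i.i.d. standard Gaussian (an assumption of this sketch), then, given the data, the weighted residual sum Σ w_i e_i is exactly Gaussian with variance Σ e_i². One draw therefore needs only a running count, mean, and sum of squared residuals, the same O(1) per-arm work as a Gaussian Thompson sampling draw. `RunningArm`, `reboot_index_o1`, and `gaussian_ts_index` are hypothetical names, the pseudo-residual construction is the same assumed one, and with non-Gaussian weights (for example Rademacher) this shortcut would not be exact.

```python
import math
import random


class RunningArm:
    """Running statistics for one arm (hypothetical helper)."""

    def __init__(self):
        self.n = 0        # number of pulls s
        self.mean = 0.0   # sample mean of the rewards
        self.m2 = 0.0     # sum of squared residuals, updated with Welford's method

    def update(self, reward):
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def reboot_index_o1(self, sigma_a, rng):
        """O(1) draw equal in distribution to the Gaussian-weight residual index."""
        s = self.n
        # Residuals plus the assumed pseudo-residuals +/- sigma_a * sqrt(s).
        ss_all = self.m2 + 2.0 * sigma_a ** 2 * s
        # sum_i w_i e_i ~ N(0, ss_all) when the w_i are i.i.d. N(0, 1).
        return self.mean + math.sqrt(ss_all) / (s + 2) * rng.gauss(0.0, 1.0)


def gaussian_ts_index(arm, noise_sd, rng):
    """Gaussian Thompson sampling draw (flat prior, known noise) for comparison."""
    return rng.gauss(arm.mean, noise_sd / math.sqrt(arm.n))
```

Welford's update is used for the running sum of squared residuals because it avoids the cancellation that the naive Σy² − s·ȳ² formula can suffer when rewards have a large mean.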

Abstract

In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (ReBoot). ReBoot enforces exploration by injecting data-driven randomness through a residual-based perturbation mechanism. This novel mechanism captures the underlying distributional properties of fitting errors and, more importantly, boosts exploration to escape from suboptimal solutions (for small sample sizes) by inflating the variance level in an unconventional way. In theory, with an appropriate variance inflation level, ReBoot provably secures instance-dependent logarithmic regret in Gaussian multi-armed bandits. We evaluate ReBoot in different synthetic multi-armed bandit problems and observe that ReBoot performs better for unbounded rewards and more robustly…
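
The variance-inflation idea in the abstract can be made concrete with a short calculation under assumed notation (our reading, not a transcription of the paper): arm k has s rewards Y_{k,1}, …, Y_{k,s} with sample mean Ȳ_{k,s}, the weights w_i are i.i.d. with mean 0 and variance 1, and two pseudo-residuals ±σ_a√s are appended to the s residuals. The paper's exact construction and constants may differ.

```latex
\begin{aligned}
\tilde{\mu}_{k,s} &= \bar{Y}_{k,s} + \frac{1}{s+2}\sum_{i=1}^{s+2} w_i e_i,
\qquad e_i = Y_{k,i} - \bar{Y}_{k,s}\ (i \le s),\quad e_{s+1} = -e_{s+2} = \sigma_a\sqrt{s},\\
\mathbb{E}\big[\tilde{\mu}_{k,s} \mid Y\big] &= \bar{Y}_{k,s},\\
\operatorname{Var}\big(\tilde{\mu}_{k,s} \mid Y\big) &= \frac{1}{(s+2)^2}\sum_{i=1}^{s+2} e_i^2
= \frac{s\,\hat{\sigma}_{k,s}^2 + 2\sigma_a^2 s}{(s+2)^2}
= \frac{s\,\big(\hat{\sigma}_{k,s}^2 + 2\sigma_a^2\big)}{(s+2)^2},
\qquad \hat{\sigma}_{k,s}^2 = \frac{1}{s}\sum_{i=1}^{s}\big(Y_{k,i} - \bar{Y}_{k,s}\big)^2 .
\end{aligned}
```

With σ_a = 0 the spread comes only from the observed residuals and can be very small when an arm's first few rewards happen to lie close together; any σ_a > 0 adds the term 2σ_a²s/(s+2)², which stays positive for every s.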

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms