Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling

Iñigo Urteaga; Chris H. Wiggins

arXiv:1709.03162 · stat.ML · August 10, 2018 · 6 cites

TL;DR

This paper introduces a double sampling algorithm for Bayesian bandits that leverages posterior uncertainty to balance exploration and exploitation, improving decision-making in costly or invasive interaction scenarios.

Contribution

It proposes a novel double sampling method that uses Bayesian posterior estimates to adaptively manage exploration and exploitation without distributional assumptions.

Findings

01. Reduced cumulative regret compared to existing methods

02. Applicable to complex reward distributions

03. Effective in domains with costly or invasive sampling

Abstract

Reinforcement learning studies how to balance exploration and exploitation in real-world systems, optimizing interactions with the world while simultaneously learning how the world operates. One general class of algorithms for such learning is the multi-armed bandit setting. Randomized probability matching, based upon the Thompson sampling approach introduced in the 1930s, has recently been shown to perform well and to enjoy provable optimality properties. It permits generative, interpretable modeling in a Bayesian setting, where prior knowledge is incorporated, and the computed posteriors naturally capture the full state of knowledge. In this work, we harness the information contained in the Bayesian posterior and estimate its sufficient statistics via sampling. In several application domains, for example in health and medicine, each interaction with the world can be expensive and…
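For readers unfamiliar with the Thompson sampling approach the abstract refers to, the following is a minimal sketch of standard Thompson sampling for a Bernoulli bandit with Beta posteriors. It is not the paper's double sampling algorithm. The function name `thompson_sampling_bernoulli`, the uniform Beta(1, 1) priors, and the example arm means are assumptions chosen only for illustration.

```python
import numpy as np


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Standard Thompson sampling on a simulated Bernoulli bandit.

    Illustrative assumptions: Beta(1, 1) priors on each arm's success
    probability, and rewards simulated from the given `true_means`.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    alpha = np.ones(n_arms)  # posterior Beta parameter: 1 + observed successes
    beta = np.ones(n_arms)   # posterior Beta parameter: 1 + observed failures
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(horizon):
        # Draw one sample per arm from its current posterior.
        theta = rng.beta(alpha, beta)
        # Play the arm whose sampled mean is largest (probability matching).
        arm = int(np.argmax(theta))
        # Simulate a Bernoulli reward from the (unknown to the agent) true mean.
        reward = int(rng.random() < true_means[arm])
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling_bernoulli([0.2, 0.8], horizon=500)
    print("pulls per arm:", pulls)
    print("posterior means:", alpha / (alpha + beta))
```

Each round plays an arm with probability equal to its posterior probability of being the best, so exploration shrinks as the posteriors concentrate.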

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

iurteaga/bandits (Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms