BelMan: Bayesian Bandits on the Belief–Reward Manifold

Debabrota Basu, Pierre Senellart, Stéphane Bressan

arXiv:1805.01627 · cs.LG · June 25, 2019

TL;DR

BelMan introduces a Bayesian, information-geometric method for multi-armed bandit problems that uses belief–reward manifolds and information projections to balance exploration and exploitation.

Contribution

It presents a geometric framework built on belief–reward manifolds and information projections that supports pure exploration, exploration–exploitation, and two-phase bandit problems with competitive performance.

Findings

01. Outperforms state-of-the-art algorithms in multi-armed bandit tasks.
02. Supports pure exploration, exploration–exploitation, and two-phase bandit problems.
03. Effective in scenarios with many arms and continuous rewards.

Abstract

We propose a generic, Bayesian, information geometric approach to the exploration–exploitation trade-off in multi-armed bandit problems. Our approach, BelMan, uniformly supports pure exploration, exploration–exploitation, and two-phase bandit problems. The knowledge on bandit arms and their reward distributions is summarised by the barycentre of the joint distributions of beliefs and rewards of the arms, the pseudobelief-reward, within the beliefs-rewards manifold. BelMan alternates information projection and reverse information projection, i.e., projection of the pseudobelief-reward onto beliefs-rewards to choose the arm to play, and projection of the resulting beliefs-rewards onto the pseudobelief-reward. It introduces a mechanism that infuses an exploitative bias by means of a focal distribution, i.e., a reward distribution that gradually concentrates on…
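The alternation of the two projections described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: Bernoulli arms with uniform Beta(1, 1) priors, the arm parameter θ discretised on a grid, equal weights on the arms, a pseudobelief-reward computed as the barycentre over all distributions on the grid (rather than within the beliefs-rewards manifold), and no focal distribution, so there is no exploitative bias. It uses the textbook definitions of the two projections: the reverse I-projection argmin_Q Σ_a KL(P_a ‖ Q), whose minimiser is the arithmetic mean of the P_a, and the I-projection argmin_a KL(P_a ‖ Q̄) to pick the arm to play. All names (GRID, belief_reward, kl, reverse_i_projection, i_projection, run) are hypothetical. This is not the authors' implementation.

```python
import numpy as np

# Assumption: theta (the Bernoulli success probability) is discretised on a grid.
GRID = np.linspace(0.005, 0.995, 199)


def belief_reward(successes: int, failures: int) -> np.ndarray:
    """Joint P(theta, x) on GRID x {0, 1}: a Beta(1 + s, 1 + f) belief over theta
    (uniform prior assumed) times the Bernoulli likelihood of reward x."""
    log_b = successes * np.log(GRID) + failures * np.log1p(-GRID)
    belief = np.exp(log_b - log_b.max())  # subtract max for numerical stability
    belief /= belief.sum()
    # Column 0: reward x = 0, column 1: reward x = 1.
    return np.stack([belief * (1.0 - GRID), belief * GRID], axis=1)


def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def reverse_i_projection(joints: np.ndarray) -> np.ndarray:
    """argmin_Q sum_a KL(P_a || Q) over all distributions on the grid.
    The minimiser is the arithmetic mean of the P_a (equal weights assumed)."""
    return joints.mean(axis=0)


def i_projection(joints: np.ndarray, q_bar: np.ndarray) -> int:
    """Index of the arm whose belief-reward P_a minimises KL(P_a || q_bar)."""
    return int(np.argmin([kl(p, q_bar) for p in joints]))


def run(true_means, horizon: int, seed: int = 0) -> np.ndarray:
    """Alternate the two projections for `horizon` rounds on simulated
    Bernoulli arms and return the number of pulls per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.zeros(k, dtype=int)
    failures = np.zeros(k, dtype=int)
    for _ in range(horizon):
        joints = np.array([belief_reward(s, f) for s, f in zip(successes, failures)])
        q_bar = reverse_i_projection(joints)  # pseudobelief-reward (sketch only)
        arm = i_projection(joints, q_bar)     # arm to play
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes + failures
```

For example, run([0.2, 0.5, 0.8], horizon=200) returns how many times each arm was pulled. Because the focal distribution is omitted, the sketch only illustrates the two projections and makes no claim about BelMan's regret or exploration behaviour.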

Code & Models

Repositories

Debabrota-Basu/QBelMan · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms