Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit

Iñigo Urteaga; Chris H. Wiggins

arXiv:1808.02932·stat.ML·August 26, 2022·1 cites

TL;DR

This paper introduces a Bayesian nonparametric approach using Gaussian mixture models for Thompson sampling in multi-armed bandits, effectively handling reward model uncertainty and outperforming existing methods in diverse environments.

Contribution

It extends Thompson sampling with nonparametric mixture models to adaptively learn complex reward distributions, providing theoretical regret bounds and improved empirical performance.

Findings

01 Outperforms state-of-the-art methods in cumulative regret

02 Handles rewards outside exponential family distributions

03 Effective in environments with outliers and diverse reward types

Abstract

We here adopt Bayesian nonparametric mixture models to extend multi-armed bandits in general, and Thompson sampling in particular, to scenarios where there is reward model uncertainty. In the stochastic multi-armed bandit, the reward for the played arm is generated from an unknown distribution. Reward uncertainty, i.e., the lack of knowledge about the reward-generating distribution, induces the exploration-exploitation trade-off: a bandit agent needs to simultaneously learn the properties of the reward distribution and sequentially decide which action to take next. In this work, we extend Thompson sampling to scenarios where there is reward model uncertainty by adopting Bayesian nonparametric Gaussian mixture models for flexible reward density estimation. The proposed Bayesian nonparametric mixture model Thompson sampling sequentially learns the reward model that best approximates the…
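
The exploration-exploitation loop the abstract describes can be illustrated with a minimal Thompson sampling sketch. This is not the paper's nonparametric mixture method: as a deliberately simplified stand-in, it assumes each arm's reward is a single Gaussian with known noise variance and a conjugate Normal prior on the mean. The function name `thompson_gaussian` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random


def thompson_gaussian(true_means, noise_sd=1.0, rounds=500,
                      prior_mean=0.0, prior_sd=10.0, seed=0):
    """Thompson sampling with a Normal prior on each arm's mean reward.

    Simplifying assumption (not from the paper): rewards are Gaussian
    with known standard deviation `noise_sd`, so the posterior over
    each arm's mean stays Normal and has a closed form.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # plays per arm
    sums = [0.0] * n_arms    # sum of observed rewards per arm
    prior_prec = 1.0 / prior_sd ** 2
    noise_prec = 1.0 / noise_sd ** 2

    for _ in range(rounds):
        # Draw one plausible mean per arm from its current posterior.
        samples = []
        for a in range(n_arms):
            post_prec = prior_prec + counts[a] * noise_prec
            post_mean = (prior_prec * prior_mean
                         + noise_prec * sums[a]) / post_prec
            samples.append(rng.gauss(post_mean, post_prec ** -0.5))

        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Observe a reward from the (unknown to the agent) distribution.
        reward = rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(thompson_gaussian([0.0, 1.0, 3.0]))
```

Sampling from the posterior, rather than playing the arm with the highest posterior mean, is what produces exploration: arms with few observations have wide posteriors and are occasionally drawn high. The approach in the abstract replaces the single-Gaussian reward model assumed here with a Bayesian nonparametric Gaussian mixture per arm.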

Code & Models

Repositories

iurteaga/bandits (Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Influenza Virus Research Studies