Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit
Iñigo Urteaga, Chris H. Wiggins

TL;DR
This paper introduces a Bayesian nonparametric approach using Gaussian mixture models for Thompson sampling in multi-armed bandits, effectively handling reward model uncertainty and outperforming existing methods in diverse environments.
Contribution
It extends Thompson sampling with nonparametric mixture models to adaptively learn complex reward distributions, providing theoretical regret bounds and improved empirical performance.
Findings
Outperforms state-of-the-art methods in cumulative regret
Handles rewards outside exponential family distributions
Effective in environments with outliers and diverse reward types
Abstract
We here adopt Bayesian nonparametric mixture models to extend multi-armed bandits in general, and Thompson sampling in particular, to scenarios where there is reward model uncertainty. In the stochastic multi-armed bandit, the reward for the played arm is generated from an unknown distribution. Reward uncertainty, i.e., the lack of knowledge about the reward-generating distribution, induces the exploration-exploitation trade-off: a bandit agent needs to simultaneously learn the properties of the reward distribution and sequentially decide which action to take next. In this work, we extend Thompson sampling to scenarios where there is reward model uncertainty by adopting Bayesian nonparametric Gaussian mixture models for flexible reward density estimation. The proposed Bayesian nonparametric mixture model Thompson sampling sequentially learns the reward model that best approximates the…
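To make the sampling loop described in the abstract concrete, here is a minimal sketch of Thompson sampling with a single Gaussian reward model per arm. It is a deliberately simplified parametric stand-in, not the paper's nonparametric mixture method: each arm's mean gets a conjugate Normal prior and the reward noise variance is assumed known. The function name `gaussian_thompson_sampling` and the parameters `prior_var` and `noise_var` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def gaussian_thompson_sampling(arm_means, arm_stds, horizon,
                               prior_var=10.0, noise_var=1.0, seed=0):
    """Simplified Thompson sampling (illustrative assumption, not the paper's model).

    Each arm's mean has a N(0, prior_var) prior and rewards are modeled as
    Gaussian with known variance noise_var. The simulated environment draws
    rewards from N(arm_means[a], arm_stds[a]**2).
    """
    rng = np.random.default_rng(seed)
    n_arms = len(arm_means)
    counts = np.zeros(n_arms)          # number of pulls per arm
    sums = np.zeros(n_arms)            # sum of observed rewards per arm
    pulls = np.zeros(horizon, dtype=int)

    for t in range(horizon):
        # Conjugate Normal posterior over each arm's mean reward.
        post_var = 1.0 / (1.0 / prior_var + counts / noise_var)
        post_mean = post_var * (sums / noise_var)

        # Draw one plausible mean per arm and play the arm with the largest draw.
        sampled_means = rng.normal(post_mean, np.sqrt(post_var))
        arm = int(np.argmax(sampled_means))

        # Observe a reward from the simulated environment and update statistics.
        reward = rng.normal(arm_means[arm], arm_stds[arm])
        counts[arm] += 1
        sums[arm] += reward
        pulls[t] = arm

    return pulls, counts
```

Calling `gaussian_thompson_sampling([0.0, 0.5, 2.0], [1.0, 1.0, 1.0], horizon=2000)` returns the sequence of played arms and the per-arm pull counts. The approach summarized in the abstract replaces the single-Gaussian posterior used here with a Bayesian nonparametric Gaussian mixture per arm, so that the reward density itself is learned rather than assumed.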
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Influenza Virus Research Studies
