Non-Stationary Bandit Learning via Predictive Sampling

Yueyang Liu; Xu Kuang; Benjamin Van Roy

arXiv:2205.01970 · cs.LG · May 6, 2025 · 6 cites

Open Access

TL;DR

This paper introduces predictive sampling, a bandit algorithm for non-stationary environments that deprioritizes information that quickly loses its usefulness. It is supported by a Bayesian regret bound and by simulations in which it outperforms Thompson sampling.

Contribution

The paper proposes predictive sampling, a new approach that enhances bandit algorithms for non-stationary settings by accounting for the decay of information relevance, with scalable implementations and theoretical analysis.

Findings

01

Predictive sampling outperforms Thompson sampling in non-stationary environments.

02

A Bayesian regret bound is established for predictive sampling.

03

Numerical simulations confirm improved performance across various non-stationary scenarios.

Abstract

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to non-stationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in…
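To make the baseline in the abstract concrete, here is a minimal sketch of Beta-Bernoulli Thompson sampling. It is not predictive sampling and not the paper's experimental setup. The function name, the `means_fn` callback, and the Bernoulli reward model are assumptions chosen for illustration. The posterior here weights every past observation equally, so it has no notion of how quickly information about an arm goes stale.

```python
import random


def thompson_sampling_bernoulli(means_fn, n_arms, horizon, seed=0):
    """Illustrative Beta-Bernoulli Thompson sampling (baseline only).

    means_fn(t) is a hypothetical callback returning the list of true
    success probabilities at step t; it may change over time to mimic
    a non-stationary environment.
    """
    rng = random.Random(seed)
    alpha = [1.0] * n_arms  # Beta(1, 1) prior: pseudo-count of successes
    beta = [1.0] * n_arms   # Beta(1, 1) prior: pseudo-count of failures
    pulls = [0] * n_arms
    total_reward = 0

    for t in range(horizon):
        # Draw one plausible mean per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled means.
        action = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward from the (possibly drifting) true mean.
        reward = 1 if rng.random() < means_fn(t)[action] else 0
        # Conjugate update; old and new observations count equally.
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1
        total_reward += reward

    return total_reward, pulls
```

Passing a time-varying callback, for example `lambda t: [0.9, 0.1] if t < 500 else [0.1, 0.9]`, gives a simple way to observe how this stationary-model baseline reacts when the best arm changes.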

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms