Discounted Thompson Sampling for Non-Stationary Bandit Problems
Han Qi, Yue Wang, Li Zhu

TL;DR
This paper introduces Discounted Thompson Sampling (DS-TS), an algorithm designed for non-stationary multi-armed bandit problems, capable of adapting to abrupt and smooth changes in reward distributions with near-optimal regret bounds.
Contribution
The paper proposes DS-TS with Gaussian priors that passively adapts to non-stationarity, providing theoretical regret bounds and demonstrating competitive empirical performance.
Findings
DS-TS achieves near-optimal regret bounds in non-stationary environments.
Empirical results show DS-TS outperforms existing algorithms when prior knowledge is available.
Theoretical analysis confirms DS-TS's effectiveness in both abrupt and smooth change scenarios.
Abstract
Non-stationary multi-armed bandit (NS-MAB) problems have recently received significant attention. NS-MAB problems are typically modelled in two scenarios: abruptly changing, where reward distributions remain constant for a certain period and change at unknown time steps, and smoothly changing, where reward distributions evolve smoothly based on unknown dynamics. In this paper, we propose Discounted Thompson Sampling (DS-TS) with Gaussian priors to address both non-stationary settings. Our algorithm passively adapts to changes by incorporating a discount factor into Thompson Sampling. The DS-TS method has been experimentally validated, but an analysis of its regret upper bound is currently lacking. Under mild assumptions, we show that DS-TS with Gaussian priors can achieve a nearly optimal regret bound on the order of for abruptly changing and for smoothly…
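The discounting idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' exact algorithm: the class name DiscountedGaussianTS, the parameters gamma (discount factor) and sigma (sampling scale), and the specific rule of sampling each arm from a Gaussian centred on its discounted mean with variance sigma^2 divided by its discounted count are all assumptions made for illustration. The helper run_abrupt_change is likewise a hypothetical two-armed test environment, not an experiment from the paper.

```python
import numpy as np


class DiscountedGaussianTS:
    """Illustrative discounted Thompson Sampling with Gaussian sampling.

    Assumption: every past observation is down-weighted by `gamma` at each
    step, so old rewards fade and the sampler can track changing means.
    """

    def __init__(self, n_arms, gamma=0.95, sigma=1.0, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.gamma = gamma
        self.sigma = sigma
        self.counts = np.zeros(n_arms)  # discounted number of pulls per arm
        self.sums = np.zeros(n_arms)    # discounted sum of rewards per arm
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        # Pull any arm that has never been played before sampling.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        means = self.sums / self.counts
        # Fewer (discounted) pulls means a wider sampling distribution,
        # so arms that have not been played recently get re-explored.
        stds = self.sigma / np.sqrt(self.counts)
        samples = self.rng.normal(means, stds)
        return int(np.argmax(samples))

    def update(self, arm, reward):
        # Discount all statistics, then add the new observation.
        self.counts *= self.gamma
        self.sums *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def run_abrupt_change(horizon=2000, change_at=1000, seed=0):
    """Hypothetical two-armed environment whose means swap at `change_at`."""
    rng = np.random.default_rng(seed)
    agent = DiscountedGaussianTS(n_arms=2, gamma=0.95, sigma=0.5, seed=seed)
    pulls = []
    for t in range(horizon):
        means = (0.9, 0.1) if t < change_at else (0.1, 0.9)
        arm = agent.select_arm()
        reward = rng.normal(means[arm], 0.1)
        agent.update(arm, reward)
        pulls.append(arm)
    return np.array(pulls)
```

Because both the discounted counts and the discounted sums shrink by the same factor, an unplayed arm keeps its old mean estimate while its sampling variance grows, which is one simple way to obtain the passive adaptation the abstract describes. With gamma = 1 the sketch reduces to undiscounted Gaussian Thompson Sampling.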
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms
