Discounted Thompson Sampling for Non-Stationary Bandit Problems

Han Qi; Yue Wang; Li Zhu

arXiv:2305.10718·cs.LG·May 23, 2023·2 cites

Open Access

TL;DR

This paper introduces Discounted Thompson Sampling (DS-TS), an algorithm designed for non-stationary multi-armed bandit problems, capable of adapting to abrupt and smooth changes in reward distributions with near-optimal regret bounds.

Contribution

The paper proposes DS-TS with Gaussian priors that passively adapts to non-stationarity, providing theoretical regret bounds and demonstrating competitive empirical performance.

Findings

1. DS-TS achieves near-optimal regret bounds in non-stationary environments.

2. Empirical results show DS-TS outperforms existing algorithms when prior knowledge is available.

3. Theoretical analysis confirms DS-TS's effectiveness in both abrupt and smooth change scenarios.

Abstract

Non-stationary multi-armed bandit (NS-MAB) problems have recently received significant attention. NS-MAB problems are typically modelled in two scenarios: abruptly changing, where reward distributions remain constant for a certain period and change at unknown time steps, and smoothly changing, where reward distributions evolve smoothly based on unknown dynamics. In this paper, we propose Discounted Thompson Sampling (DS-TS) with Gaussian priors to address both non-stationary settings. Our algorithm passively adapts to changes by incorporating a discount factor into Thompson Sampling. The DS-TS method has been experimentally validated, but an analysis of its regret upper bound is currently lacking. Under mild assumptions, we show that DS-TS with Gaussian priors can achieve a nearly optimal regret bound on the order of $\tilde{O}(\sqrt{T B_{T}})$ for abruptly changing and $\tilde{O}(T^{\beta})$ for smoothly…
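The passive adaptation described in the abstract can be illustrated with a minimal sketch. It assumes each arm keeps a discounted pull count and a discounted reward sum, and that the sample for an arm is drawn from a Gaussian centred on the discounted mean with variance sigma^2 divided by the discounted count. The class name `DiscountedGaussianTS`, the parameters `gamma` and `sigma`, the rule of playing arms with zero evidence first, and the two-armed demo environment are all illustrative assumptions, not the paper's exact specification.

```python
import random


class DiscountedGaussianTS:
    """Illustrative discounted Thompson Sampling with Gaussian sampling.

    Assumption: posterior for arm k is N(sum_k / count_k, sigma^2 / count_k),
    where sum_k and count_k are discounted by gamma at every step.
    """

    def __init__(self, n_arms, gamma=0.99, sigma=1.0, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.gamma = gamma
        self.sigma = sigma
        self.counts = [0.0] * n_arms  # discounted number of pulls per arm
        self.sums = [0.0] * n_arms    # discounted sum of rewards per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Play any arm that currently has no (discounted) evidence.
        for arm, count in enumerate(self.counts):
            if count == 0.0:
                return arm
        # Otherwise draw one sample per arm and play the largest.
        samples = [
            self.rng.gauss(total / count, self.sigma / count ** 0.5)
            for total, count in zip(self.sums, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Discount every arm's statistics, then add the new observation.
        # Old rewards fade geometrically, so unplayed arms regain variance.
        self.counts = [self.gamma * c for c in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def run_abrupt_demo(horizon=2000, gamma=0.98, seed=0):
    """Hypothetical two-armed environment whose means swap at the midpoint.

    Returns the fraction of pulls in the final quarter that chose arm 0,
    which is the better arm after the breakpoint.
    """
    env_rng = random.Random(seed)
    agent = DiscountedGaussianTS(2, gamma=gamma, seed=seed + 1)
    late_best_pulls = 0
    for t in range(horizon):
        means = (0.2, 0.8) if t < horizon // 2 else (0.8, 0.2)
        arm = agent.select_arm()
        agent.update(arm, env_rng.gauss(means[arm], 1.0))
        if t >= 3 * horizon // 4 and arm == 0:
            late_best_pulls += 1
    return late_best_pulls / (horizon - 3 * horizon // 4)


if __name__ == "__main__":
    print("late best-arm fraction:", run_abrupt_demo())
```

Setting `gamma=1.0` in this sketch removes the discounting and recovers ordinary Gaussian Thompson Sampling under the same assumptions, which makes it easy to compare the two on the demo environment.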

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms