Taming Non-stationary Bandits: A Bayesian Approach

Vishnu Raj; Sheetal Kalyani

arXiv:1707.09727·stat.ML·August 1, 2017·48 cites

TL;DR

This paper introduces a variant of Thompson Sampling for non-stationary multi-armed bandit problems. The algorithm discounts past observations, comes with an optimistic version, and is evaluated empirically under a range of scenarios.

Contribution

It presents a Bayesian approach to non-stationary bandits that discounts the parameters of the prior distribution and adds an optimistic variant, together with an exact expression for the probability of picking sub-optimal arms and an empirical comparison against other algorithms.

Findings

01 Effective in non-stationary environments

02 Outperforms several state-of-the-art algorithms

03 Provides exact probability of sub-optimal arm selection

Abstract

We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling which can be used in both rested and restless bandit scenarios. Applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the probability of picking sub-optimal arms. By increasing the exploitative value of Bayes' samples, we also provide an optimistic version of the algorithm. Extensive empirical analysis is conducted under various scenarios to validate the utility of the proposed algorithms. A comparison study with various state-of-the-art algorithms is also included.
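
The discounting idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes Bernoulli rewards with a Beta prior, and the class name `DiscountedThompsonSampling`, the parameters `gamma`, `alpha0`, `beta0`, and the `run_demo` helper are all hypothetical names chosen for illustration. The `optimistic` flag shows one possible reading of "increasing the exploitative value of Bayes' samples" (never letting a sample fall below the posterior mean); the paper's exact rule may differ.

```python
import random


class DiscountedThompsonSampling:
    """Beta-Bernoulli Thompson Sampling with exponentially discounted counts.

    Illustrative sketch only; names and defaults are assumptions.
    """

    def __init__(self, n_arms, gamma=0.95, alpha0=1.0, beta0=1.0,
                 optimistic=False, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.gamma = gamma            # discount factor; 1.0 means no forgetting
        self.alpha0 = alpha0          # prior pseudo-count of successes
        self.beta0 = beta0            # prior pseudo-count of failures
        self.optimistic = optimistic
        self.successes = [0.0] * n_arms  # discounted success counts
        self.failures = [0.0] * n_arms   # discounted failure counts
        self.rng = random.Random(seed)

    def select_arm(self):
        samples = []
        for s, f in zip(self.successes, self.failures):
            a, b = s + self.alpha0, f + self.beta0
            theta = self.rng.betavariate(a, b)
            if self.optimistic:
                # Assumed optimistic rule: clip the sample at the posterior mean.
                theta = max(theta, a / (a + b))
            samples.append(theta)
        # Play the arm with the largest posterior sample.
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Discount every arm so that old observations fade over time.
        for k in range(len(self.successes)):
            self.successes[k] *= self.gamma
            self.failures[k] *= self.gamma
        # Add the new observation to the arm that was played.
        self.successes[arm] += reward
        self.failures[arm] += 1.0 - reward


def run_demo(horizon=2000, change_point=1000, seed=0):
    """Two Bernoulli arms whose means swap at change_point (toy environment)."""
    env_rng = random.Random(seed + 1)
    agent = DiscountedThompsonSampling(n_arms=2, gamma=0.95, seed=seed)
    pulls_after_change = [0, 0]
    for t in range(horizon):
        means = (0.9, 0.1) if t < change_point else (0.1, 0.9)
        arm = agent.select_arm()
        reward = 1.0 if env_rng.random() < means[arm] else 0.0
        agent.update(arm, reward)
        if t >= change_point:
            pulls_after_change[arm] += 1
    return pulls_after_change


if __name__ == "__main__":
    print(run_demo())
```

With `gamma` below 1, the counts of an arm that is not played decay toward zero, so its posterior drifts back toward the prior and the arm is eventually re-explored. Setting `gamma=1.0` recovers ordinary Beta-Bernoulli Thompson Sampling.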

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques