Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

Daniel Russo; David Tse; Benjamin Van Roy

arXiv:1704.09028 · cs.LG · May 1, 2017 · 6 citations

Open Access

TL;DR

This paper introduces satisficing Thompson sampling, a variation of Thompson sampling for time-sensitive bandit problems. It targets discounted rather than cumulative regret, which matters when the optimal action is costly to learn relative to near-optimal actions.

Contribution

It proposes satisficing Thompson sampling and establishes a discounted regret bound for it, where the discount factor encodes time preference. This extends regret analysis to settings in which how quickly reward is earned matters.

Findings

01 Satisficing Thompson sampling admits a strong discounted regret bound.

02 Popular algorithms such as UCB and standard Thompson sampling can fare poorly when the optimal action is costly to learn relative to near-optimal actions.

03 The approach better balances exploration and exploitation in time-sensitive environments.

Abstract

The literature on bandit learning and regret analysis has focused on contexts where the goal is to converge on an optimal action in a manner that limits exploration costs. One shortcoming imposed by this orientation is that it does not treat time preference in a coherent manner. Time preference plays an important role when the optimal action is costly to learn relative to near-optimal actions. This limitation has not only restricted the relevance of theoretical results but has also influenced the design of algorithms. Indeed, popular approaches such as Thompson sampling and UCB can fare poorly in such situations. In this paper, we consider discounted rather than cumulative regret, where a discount factor encodes time preference. We propose satisficing Thompson sampling -- a variation of Thompson sampling -- and establish a strong discounted regret bound for this new algorithm.
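To make the contrast between cumulative and discounted regret concrete, here is a minimal Python sketch. It is not taken from the paper: the function names, the convention of weighting period t by discount**t starting at t = 0, and the two regret sequences are all illustrative assumptions. It compares a hypothetical "explorer" that pays high regret early and then plays optimally with a hypothetical "satisficer" that settles immediately on a near-optimal action.

```python
from typing import Sequence


def cumulative_regret(per_period_regret: Sequence[float]) -> float:
    """Undiscounted sum of per-period regret."""
    return float(sum(per_period_regret))


def discounted_regret(per_period_regret: Sequence[float], discount: float) -> float:
    """Sum of per-period regret, weighting period t by discount**t.

    Assumption: periods are indexed from t = 0; the paper's exact
    indexing convention is not stated in the abstract.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must lie in [0, 1]")
    return float(sum(discount ** t * r for t, r in enumerate(per_period_regret)))


# Hypothetical regret sequences (made up for illustration only).
# Explorer: regret 1.0 for 10 periods while learning, then optimal play.
explorer = [1.0] * 10 + [0.0] * 190
# Satisficer: a near-optimal action from the start, regret 0.1 every period.
satisficer = [0.1] * 200

if __name__ == "__main__":
    for name, seq in (("explorer", explorer), ("satisficer", satisficer)):
        print(name, cumulative_regret(seq), discounted_regret(seq, 0.9))
```

With these made-up numbers, the explorer has lower cumulative regret but higher discounted regret at a discount factor of 0.9. This is the kind of situation the abstract describes, where near-optimal actions are attractive because the optimal one is costly to learn.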

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques