Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning

Tong Zhang

arXiv:2110.00871·cs.LG·October 5, 2021

TL;DR

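This paper introduces Feel-Good Thompson Sampling, a modification of Thompson Sampling for contextual bandits that favors high-reward models more aggressively in order to improve exploration. Its frequentist regret bounds match minimax lower bounds, and the analysis extends to linear and some MDP problems.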

Contribution

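The paper proposes Feel-Good Thompson Sampling together with a theoretical framework that yields Bayesian regret bounds for standard Thompson Sampling and frequentist regret bounds for the Feel-Good variant. The resulting bounds match minimax lower bounds, and the framework extends to linear and some MDP settings.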

Findings

01

Feel-Good Thompson Sampling improves exploration in contextual bandits.

02

Theoretical regret bounds match minimax lower bounds.

03

Framework extends to linear and certain MDP problems.

Abstract

Thompson Sampling has been widely used for contextual bandit problems due to the flexibility of its modeling power. However, a general theory for this class of methods in the frequentist setting is still lacking. In this paper, we present a theoretical analysis of Thompson Sampling, with a focus on frequentist regret bounds. In this setting, we show that the standard Thompson Sampling is not aggressive enough in exploring new actions, leading to suboptimality in some pessimistic situations. A simple modification called Feel-Good Thompson Sampling, which favors high reward models more aggressively than the standard Thompson Sampling, is proposed to remedy this problem. We show that the theoretical framework can be used to derive Bayesian regret bounds for standard Thompson Sampling, and frequentist regret bounds for Feel-Good Thompson Sampling. It is shown that in both cases, we can…
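
The idea of favoring high-reward models more aggressively can be illustrated with a small sketch. The abstract does not give the exact form of the modification, so everything below is an assumption made for illustration: a finite model class, exponential weighting of a squared-error loss, and an optimism bonus equal to each model's capped best predicted reward. The names feel_good_log_weights and sample_model and the parameters eta, lam, and b are hypothetical. Setting lam to 0 recovers a plain least-squares posterior, which stands in here for standard Thompson Sampling.

```python
import numpy as np


def feel_good_log_weights(preds, actions, rewards, eta=1.0, lam=0.1, b=1.0):
    """Unnormalized log posterior weights over a finite model class.

    preds:   (n_models, n_rounds, n_actions) predicted reward of each model
    actions: (n_rounds,) integer actions played so far
    rewards: (n_rounds,) rewards observed for those actions

    Assumed form (illustrative, not taken from the abstract):
    log w(f) = -eta * sum_s (f(x_s, a_s) - r_s)^2
               + lam * sum_s min(b, max_a f(x_s, a))
    """
    rounds = np.arange(preds.shape[1])
    # Prediction of every model for the action actually played in each round.
    chosen = preds[:, rounds, actions]                     # (n_models, n_rounds)
    sq_loss = ((chosen - rewards) ** 2).sum(axis=1)
    # Optimism bonus: reward models that predict a high best-case reward.
    bonus = np.minimum(b, preds.max(axis=2)).sum(axis=1)
    return -eta * sq_loss + lam * bonus


def sample_model(log_w, rng):
    """Draw one model index from the normalized weights; also return them."""
    p = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p)), p


# Two models that fit the observed data equally well (action 0 pays 0.5),
# but model 1 believes the untried action 1 pays 0.9 instead of 0.2.
preds = np.array([
    [[0.5, 0.2], [0.5, 0.2]],
    [[0.5, 0.9], [0.5, 0.9]],
])
actions = np.array([0, 0])
rewards = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
log_w_plain = feel_good_log_weights(preds, actions, rewards, lam=0.0)
log_w_fg = feel_good_log_weights(preds, actions, rewards, lam=1.0)
_, p_plain = sample_model(log_w_plain, rng)
_, p_fg = sample_model(log_w_fg, rng)
```

In this toy setup the plain posterior cannot distinguish the two models because both explain the observed rewards exactly, while the bonus term shifts probability toward the model that is optimistic about the untried action, so that action is sampled more often.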

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Smart Grid Energy Management