An Information-Theoretic Analysis of Thompson Sampling

Daniel Russo; Benjamin Van Roy

arXiv:1403.5341 · cs.LG · June 9, 2015 · 57 cites



Open Access

TL;DR

This paper develops an information-theoretic framework for analyzing Thompson sampling. The framework yields regret bounds that scale with the entropy of the optimal-action distribution, and it clarifies how the information gathered from feedback drives learning efficiency.
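Thompson sampling itself is simple to state: at each step, sample a model from the posterior, then act optimally for that sample. The following is a minimal sketch for a K-armed Bernoulli bandit with independent Beta(1, 1) priors. The arm means, the horizon, and the function name `thompson_sampling_bernoulli` are illustrative assumptions. They are not the paper's experimental setup.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated K-armed bandit.

    Illustrative sketch: true_means and horizon are hypothetical inputs.
    Returns per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior parameter: 1 + number of successes
    beta = [1] * k   # Beta posterior parameter: 1 + number of failures
    counts = [0] * k
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and act greedily with respect to that single draw.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate update of the chosen arm's Beta posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        counts[arm] += 1
        regret += best_mean - true_means[arm]
    return counts, regret


if __name__ == "__main__":
    counts, regret = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts, "pseudo-regret:", round(regret, 1))
```

Because the action is chosen by sampling rather than by maximizing a point estimate, the algorithm picks each arm with its posterior probability of being optimal. The paper's analysis is built on this property.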

Contribution

The paper introduces a general information-theoretic analysis of Thompson sampling. This analysis strengthens existing regret bounds and offers new insight into the role information plays in learning.
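The central result is restated below as we recall it from the paper; the constants should be checked against the PDF. Regret is controlled by the information ratio Γ_t and the entropy H(α*) of the optimal action A*. Here α* is the prior distribution of A*, R is the reward, and Y_{t,A_t} is the observation after playing A_t. The three per-setting values of Γ̄ listed after the display are the ones we believe the paper derives.

```latex
\Gamma_t = \frac{\big(\mathbb{E}_t[R(A^*) - R(A_t)]\big)^2}{I_t\big(A^*; (A_t, Y_{t,A_t})\big)},
\qquad
\mathbb{E}\big[\mathrm{Regret}(T, \pi^{\mathrm{TS}})\big] \le \sqrt{\bar{\Gamma}\, H(\alpha^*)\, T}
\quad \text{whenever } \Gamma_t \le \bar{\Gamma} \text{ for all } t.

% Full information:           \bar{\Gamma} = 1/2  \Rightarrow  \sqrt{\tfrac{1}{2} H(\alpha^*) T}
% K-armed bandit:             \bar{\Gamma} = K/2  \Rightarrow  \sqrt{\tfrac{1}{2} K H(\alpha^*) T} \le \sqrt{\tfrac{1}{2} K T \log K}
% Linear bandit, dimension d: \bar{\Gamma} = d/2  \Rightarrow  \sqrt{\tfrac{1}{2} d H(\alpha^*) T}
```

Because H(α*) ≤ log |A|, the entropy form is never weaker than the familiar log |A| form. It can be much tighter when the prior already concentrates on a few candidate optimal actions.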

Findings

01. Regret bounds scale with the entropy of the optimal-action distribution.

02. The analysis applies across a broad range of online optimization problems with partial feedback.

03. The analysis provides new insight into how information improves performance.
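The first finding can be made concrete. Under the K-armed bound Γ ≤ K/2 (as we recall it from the paper), expected regret is at most sqrt(½ K H(α*) T). Since H(α*) ≤ log K, this bound is never worse than sqrt(½ K T log K). The sketch below estimates α* by Monte Carlo from hypothetical independent Beta posteriors and compares the two bounds. The posterior parameters, the sample count, and the helper names are assumptions chosen for illustration.

```python
import math
import random


def optimal_action_distribution(posterior_params, num_samples=20000, seed=0):
    """Monte Carlo estimate of alpha*(a) = P(A* = a) under independent Beta posteriors.

    posterior_params is a hypothetical list of (alpha, beta) pairs, one per arm.
    """
    rng = random.Random(seed)
    k = len(posterior_params)
    wins = [0] * k
    for _ in range(num_samples):
        draws = [rng.betavariate(a, b) for a, b in posterior_params]
        wins[max(range(k), key=lambda i: draws[i])] += 1
    return [w / num_samples for w in wins]


def entropy_nats(probs):
    """Shannon entropy in nats, treating 0 * log 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regret_bound(k, horizon, entropy):
    """sqrt(Gamma * H * T) using the K-armed information-ratio bound Gamma <= K / 2."""
    return math.sqrt(0.5 * k * entropy * horizon)


if __name__ == "__main__":
    # Hypothetical posterior in which arm 2 already looks much better than the rest.
    posterior = [(2, 8), (3, 7), (9, 1)]
    alpha_star = optimal_action_distribution(posterior)
    h = entropy_nats(alpha_star)
    k, horizon = len(posterior), 10_000
    print("alpha* =", [round(p, 3) for p in alpha_star])
    print("H(A*) =", round(h, 3), "nats; log K =", round(math.log(k), 3))
    print("entropy bound:", round(regret_bound(k, horizon, h), 1))
    print("log K bound:  ", round(regret_bound(k, horizon, math.log(k)), 1))
```

When one arm dominates the posterior, H(α*) is far below log K, and the entropy-based bound shrinks accordingly. This is the sense in which prior information improves the guarantee.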

Abstract

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
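The insight into how information improves performance rests on the information ratio: the squared expected one-step regret divided by the mutual information between A* and the observed action-outcome pair. The sketch below computes this ratio exactly for a single Thompson sampling step over a hypothetical finite set of Bernoulli models. It relies on the identity I(A*; (A_t, Y)) = Σ_a α(a) I(A*; Y_a). The identity holds because Thompson sampling draws A_t independently of the true model, with P(A_t = a) = P(A* = a). The model set, the prior, and the function names are illustrative assumptions.

```python
import math


def bernoulli_kl(p, q):
    """KL(Bern(p) || Bern(q)) in nats, treating 0 * log(0 / q) as 0."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def information_ratio(prior, means):
    """Information ratio of one Thompson sampling step over a finite model set.

    prior[m]    : probability of hypothetical model m
    means[m][a] : Bernoulli success probability of arm a under model m
    """
    k = len(means[0])
    best = [max(range(k), key=lambda a: row[a]) for row in means]  # A*(m)
    # Under Thompson sampling, P(A_t = a) = P(A* = a) = alpha[a].
    alpha = [sum(p for p, b in zip(prior, best) if b == a) for a in range(k)]
    # Expected one-step regret E[R(A*)] - E[R(A_t)], with A_t independent of the model.
    optimal_value = sum(p * row[b] for p, row, b in zip(prior, means, best))
    mean_reward = [sum(p * row[a] for p, row in zip(prior, means)) for a in range(k)]
    delta = optimal_value - sum(alpha[a] * mean_reward[a] for a in range(k))
    # I(A*; (A_t, Y)) = sum_a alpha[a] * I(A*; Y_a), where
    # I(A*; Y_a) = sum_{a*} P(A* = a*) * KL(P(Y_a | A* = a*) || P(Y_a)).
    info = 0.0
    for a in range(k):
        for a_star in range(k):
            if alpha[a_star] == 0:
                continue
            cond = sum(p * row[a] for p, row, b in zip(prior, means, best)
                       if b == a_star) / alpha[a_star]
            info += alpha[a] * alpha[a_star] * bernoulli_kl(cond, mean_reward[a])
    return delta ** 2 / info


if __name__ == "__main__":
    ratio = information_ratio([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
    print("information ratio:", round(ratio, 4), "vs K/2 =", 1.0)
```

For this symmetric two-model, two-arm example the ratio comes out at roughly 0.43. That is below the K/2 = 1 ceiling, which the paper shows holds for any prior. A small ratio means that each unit of regret Thompson sampling incurs buys substantial information about A*.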


Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms