An Information-Theoretic Analysis of Thompson Sampling
Daniel Russo, Benjamin Van Roy

TL;DR
This paper develops an information-theoretic framework for analyzing Thompson sampling. It yields regret bounds that scale with the entropy of the optimal-action distribution, clarifying how information affects learning efficiency.
Contribution
It introduces a general information-theoretic analysis of Thompson sampling that strengthens existing regret bounds and offers new insight into the role of information in learning.
Findings
Regret bounds scale with the entropy of the optimal-action distribution
The analysis applies across a broad range of online optimization problems in which the decision-maker learns from partial feedback
The analysis yields new insight into how information improves performance
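The first finding can be written as a single inequality. The form below is a sketch of the general bound as commonly stated for this paper; the symbol names are assumptions of this summary, not taken from the text above: T is the number of rounds, A^* is the optimal action (random under the prior), H(A^*) is its Shannon entropy, and \overline{\Gamma} is a uniform upper bound on the problem's "information ratio".

```latex
\mathbb{E}\big[\mathrm{Regret}(T)\big] \;\le\; \sqrt{\overline{\Gamma}\, H(A^*)\, T},
\qquad
H(A^*) \;=\; -\sum_{a \in \mathcal{A}} \mathbb{P}(A^* = a)\,\log \mathbb{P}(A^* = a) \;\le\; \log |\mathcal{A}| .
```

Because H(A^*) never exceeds \log|\mathcal{A}|, a bound of this form is at least as tight as one stated in terms of the number of actions alone, and it tightens further when the prior concentrates on a few candidate optimal actions.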
Abstract
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
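To make the abstract concrete, here is a minimal sketch of Thompson sampling on a Bernoulli bandit, next to an entropy-based bound of the kind the paper describes. Several things are assumptions of this sketch rather than details from the text above: the Beta(1, 1) priors, the function names, measuring entropy in nats, and the constant |A|/2 used as the information-ratio bound for a finite-armed bandit. The paper's bounds concern Bayesian regret, averaged over the prior, whereas a single call here simulates one fixed problem instance.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regret_bound(optimal_action_probs, horizon):
    """sqrt(Gamma * H(A*) * T), taking Gamma = |A| / 2 (assumed bandit case)."""
    num_arms = len(optimal_action_probs)
    gamma = num_arms / 2.0
    return math.sqrt(gamma * entropy(optimal_action_probs) * horizon)


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return cumulative (pseudo-)regret."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [1] * k  # Beta(1, 1) uniform prior on each arm's mean
    failures = [1] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one mean per arm from its posterior and play the largest draw.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the played arm only (partial feedback).
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret


if __name__ == "__main__":
    horizon = 1000
    regret = thompson_sampling([0.2, 0.5, 0.8], horizon)
    # Uniform prior over which of the 3 arms is optimal gives H(A*) = log 3.
    bound = entropy_regret_bound([1 / 3, 1 / 3, 1 / 3], horizon)
    print(f"simulated regret: {regret:.1f}, entropy-based bound: {bound:.1f}")
```

Passing a more concentrated distribution to `entropy_regret_bound`, for example `[0.9, 0.05, 0.05]`, lowers the entropy term and therefore the bound, which is the sense in which prior information about the optimal action improves the guarantee.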
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms
