A Note on Information-Directed Sampling and Thompson Sampling

Li Zhou

arXiv:1503.06902·cs.LG·March 25, 2015

TL;DR

This note gives intuitive explanations of three Bayesian multi-armed bandit algorithms (Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling) and of their regret bounds, and supplies derivations that the original papers omit.

Contribution

The note explains the intuition behind each algorithm and its regret bound, and fills in derivation steps that the original papers leave out.

Findings

01. Provides regret bounds for the algorithms

02. Clarifies the intuition behind each method

03. Includes derivations omitted in original works

Abstract

This note introduces three Bayesian-style multi-armed bandit algorithms: Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted in the original papers.
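
As a rough illustration of the simplest of the three algorithms, the sketch below runs Thompson Sampling on a Bernoulli bandit with independent Beta(1, 1) priors on each arm. It is not taken from the note: the function name `thompson_sampling`, the arm means, the horizon, and the seed are all hypothetical choices made for this example, and the Beta-Bernoulli setting is assumed only because it keeps the posterior update to one line.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the learner).
    Returns the number of pulls per arm and the cumulative regret,
    measured as the sum of gaps between the best mean and the chosen arm's mean.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior assumed).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

        regret += best_mean - true_means[arm]

    pulls = [successes[i] + failures[i] for i in range(k)]
    return pulls, regret
```

Calling `thompson_sampling([0.2, 0.5, 0.8], 2000)` returns the per-arm pull counts and the accumulated regret for one simulated run. Because each arm is chosen with the posterior probability that it is optimal, pulls concentrate on the best arm as evidence accumulates, which is the behavior the regret bounds discussed in the note quantify.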

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics