Batched Thompson Sampling

Cem Kalkanli; Ayfer Ozgur

arXiv:2110.00202 · cs.LG · October 4, 2021

TL;DR

This paper introduces a batched version of Thompson sampling for multi-armed bandits. It achieves the same order of regret as Thompson sampling with instantaneous feedback, while updating its policy only at the end of a small number of batches.

Contribution

The paper proposes an adaptive, anytime batched Thompson sampling policy. The policy retains the regret guarantees of Thompson sampling with instantaneous feedback and uses only a logarithmic number of batches. It needs no prior knowledge of the time horizon $T$.
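The batching idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' algorithm.

Assumptions made for illustration only:
- Rewards are Bernoulli and the priors are Beta(1, 1).
- A batch closes when some arm's pull count reaches twice its value at the start of the batch. This doubling rule is a hypothetical stand-in, and the paper's exact criterion may differ.
- The names `batched_thompson_sampling` and `batch_bound` are hypothetical.

Within a batch, the agent samples from the posterior frozen at the batch start. The rewards collected during the batch are revealed only when the batch closes. The closing rule never consults the horizon, so the policy is anytime.

```python
import math
import random


def batch_bound(k, horizon):
    """Worst-case batch count under the hypothetical doubling rule:
    each of the k arms can close at most floor(log2(horizon)) + 1 batches,
    plus one for the initial batch."""
    return k * (math.floor(math.log2(horizon)) + 1) + 1


def batched_thompson_sampling(means, horizon, seed=0):
    """Illustrative batched Thompson sampling on Bernoulli arms.

    Assumptions (not taken from the paper): Beta(1, 1) priors, Bernoulli
    rewards with the given means, and a doubling rule that closes a batch
    when some arm's pull count reaches twice its count at the batch start.
    Returns (number of batches, pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    alpha = [1] * k          # 1 + revealed successes per arm
    beta = [1] * k           # 1 + revealed failures per arm
    pulls = [0] * k          # total pulls per arm (the agent knows its own actions)
    start_pulls = pulls[:]   # pull counts when the current batch began
    pending = []             # (arm, reward) pairs not yet revealed to the agent
    num_batches, regret = 1, 0.0
    for t in range(horizon):
        # Thompson step on the posterior frozen at the start of the batch.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < means[arm] else 0
        pending.append((arm, reward))
        pulls[arm] += 1
        regret += best - means[arm]
        # Hypothetical doubling rule; it never uses the horizon (anytime).
        if pulls[arm] >= max(1, 2 * start_pulls[arm]):
            for a, r in pending:       # reveal the batch's rewards at once
                alpha[a] += r
                beta[a] += 1 - r
            pending.clear()
            start_pulls = pulls[:]
            if t < horizon - 1:        # a new batch starts only if time remains
                num_batches += 1
    return num_batches, regret
```

For example, `batched_thompson_sampling([0.3, 0.5, 0.7], horizon=10_000)` returns the number of batches used and the pseudo-regret. Whatever the random draws, the batch count stays within `batch_bound(3, 10_000)`, which equals 43. This illustrates the logarithmic-in-$T$ batch budget under the assumed rule.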

Findings

1. Achieves $O(\log T)$ problem-dependent regret.

2. Achieves $O(\sqrt{T \log T})$ minimax regret.

3. Uses $O(\log \log T)$ batches in expectation.

Abstract

We introduce a novel anytime Batched Thompson sampling policy for multi-armed bandits, in which the agent observes the rewards of her actions and adjusts her policy only at the end of a small number of batches. We show that this policy simultaneously achieves a problem-dependent regret of order $O(\log T)$ and a minimax regret of order $O(\sqrt{T \log T})$ over a time horizon $T$. Meanwhile, the number of batches can be bounded by $O(\log T)$, independent of the problem instance. We also show that, in expectation, the number of batches used by our policy admits an instance-dependent bound of order $O(\log \log T)$. These results indicate that Thompson sampling performs as well in this batched setting as when instantaneous feedback is available after each action, while requiring minimal feedback. These results also indicate that Thompson sampling performs…
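A short counting argument shows why an instance-independent $O(\log T)$ batch bound is plausible. The derivation assumes a hypothetical doubling rule: a batch closes when some arm's pull count reaches twice its value at the start of that batch. The paper's exact batching criterion may differ. This argument does not address the $O(\log \log T)$ expected bound.

```latex
\begin{aligned}
&\text{Let } K \text{ be the number of arms and } c_{a,1} < c_{a,2} < \dots \text{ the pull counts of arm } a
  \text{ at the moments it closes a batch.} \\
&\text{Arm } a\text{'s count at the start of the batch it closes next is at least } c_{a,j}, \text{ so }
  c_{a,j+1} \ge 2\,c_{a,j} \text{ and } c_{a,1} \ge 1. \\
&\text{Hence } c_{a,j} \ge 2^{\,j-1} \text{ and } c_{a,j} \le T \;\Longrightarrow\; j \le \lfloor \log_2 T \rfloor + 1. \\
&\text{Summing over arms: } \#\text{batches} \le K\bigl(\lfloor \log_2 T \rfloor + 1\bigr) + 1 = O(\log T) \text{ for fixed } K.
\end{aligned}
```

Because the bound depends only on $K$ and $T$, it holds for every problem instance. This matches the instance-independent character of the $O(\log T)$ batch count stated above.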

Videos

Batched Thompson Sampling (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems