Batched Thompson Sampling for Multi-Armed Bandits

Nikolai Karpov; Qin Zhang

arXiv:2108.06812 · cs.LG · August 17, 2021

Open Access

TL;DR

This paper introduces two batched Thompson Sampling algorithms for stochastic multi-armed bandits, balancing regret minimization with limited policy updates, supported by theoretical analysis and empirical validation.

Contribution

The paper proposes two batched Thompson Sampling algorithms, derives almost tight regret-batch tradeoffs for the two-arm case, and demonstrates the algorithms' effectiveness through experiments on synthetic and real datasets.

Findings

01 Almost tight regret-batch tradeoffs for the two-arm case

02 Both algorithms perform well on synthetic and real datasets

03 Theoretical analysis of the proposed algorithms supports the regret-batch tradeoffs

Abstract

We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or batches). We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets. We also analyze the proposed algorithms theoretically and obtain almost tight regret-batch tradeoffs for the two-arm case.
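To make the batched setting concrete, the following is a minimal sketch of Thompson Sampling in which the posterior is updated only at batch boundaries. It is not either of the paper's two algorithms, which this page does not describe. The function name `batched_thompson_sampling`, the Bernoulli rewards, the Beta(1, 1) priors, and the equal-size batch schedule are all assumptions chosen for illustration.

```python
import random


def batched_thompson_sampling(means, horizon, num_batches, seed=0):
    """Illustrative batched Thompson Sampling on Bernoulli arms.

    Posteriors are refreshed only at batch boundaries, so the policy
    changes at most `num_batches` times. Hypothetical helper, not the
    paper's algorithm.
    """
    rng = random.Random(seed)
    k = len(means)
    # Beta(1, 1) priors: one (alpha, beta) pair per arm (assumed prior)
    alpha = [1] * k
    beta = [1] * k
    # Assumed schedule: equal-size batches; the last absorbs the remainder
    base = horizon // num_batches
    sizes = [base] * num_batches
    sizes[-1] += horizon - base * num_batches
    best = max(means)
    regret = 0.0
    pulls = [0] * k
    for size in sizes:
        # Rewards seen in this batch are buffered until the batch ends
        wins = [0] * k
        losses = [0] * k
        for _ in range(size):
            # Sample from the posterior frozen at the start of the batch
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: samples[i])
            reward = 1 if rng.random() < means[arm] else 0
            wins[arm] += reward
            losses[arm] += 1 - reward
            pulls[arm] += 1
            # Expected (pseudo-)regret of pulling this arm
            regret += best - means[arm]
        # Policy update: fold the buffered rewards into the posteriors
        for i in range(k):
            alpha[i] += wins[i]
            beta[i] += losses[i]
    return regret, pulls


if __name__ == "__main__":
    # Two-arm instance with assumed means; compare few vs. many batches
    for m in (1, 4, 16):
        r, p = batched_thompson_sampling([0.3, 0.7], horizon=2000, num_batches=m)
        print(f"batches={m:2d}  regret={r:.1f}  pulls={p}")
```

With `num_batches=1` the policy never updates and samples from the prior throughout, while `num_batches=horizon` recovers fully sequential Thompson Sampling. Values in between illustrate the regret-batch tradeoff the abstract refers to, though under a naive schedule rather than the ones the paper analyzes.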

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management