Batched Thompson Sampling for Multi-Armed Bandits
Nikolai Karpov, Qin Zhang

TL;DR
This paper introduces two batched Thompson Sampling algorithms for stochastic multi-armed bandits, balancing regret minimization with limited policy updates, supported by theoretical analysis and empirical validation.
Contribution
The paper proposes two batched Thompson Sampling algorithms, derives almost tight regret-batch tradeoffs for the two-arm case, and demonstrates the algorithms' effectiveness in experiments on synthetic and real datasets.
Findings
Regret-batch tradeoffs are derived for the two-arm case
Both algorithms are effective in experiments on synthetic and real datasets
Theoretical analysis shows the two-arm tradeoffs are almost tight
Abstract
We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or batches). We propose two algorithms and demonstrate their effectiveness by experiments on both synthetic and real datasets. We also analyze the proposed algorithms theoretically and obtain almost tight regret-batch tradeoffs for the two-arm case.
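To make the batched setting concrete, here is a minimal sketch of Thompson Sampling in which the policy changes only at batch boundaries. It is not either of the paper's two algorithms, which this summary does not spell out. It assumes Bernoulli rewards, a Beta(1, 1) prior on each arm, and equal-size batches. The function name `batched_thompson_sampling` and all of its parameters are hypothetical.

```python
import random


def batched_thompson_sampling(true_means, horizon, num_batches, seed=0):
    """Beta-Bernoulli Thompson Sampling with posterior updates only between batches.

    Illustrative sketch, not the paper's method. Assumes 1 <= num_batches <= horizon.
    Returns (cumulative_pseudo_regret, number_of_policy_updates).
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) prior per arm: alpha tracks successes + 1, beta tracks failures + 1.
    alpha = [1] * k
    beta = [1] * k
    best_mean = max(true_means)
    # Assumption: equal-size batches; the last batch absorbs any remainder.
    batch_size = horizon // num_batches
    regret = 0.0
    policy_updates = 0
    pulls_done = 0

    for b in range(num_batches):
        size = batch_size if b < num_batches - 1 else horizon - pulls_done
        # Rewards observed in this batch are buffered, so the sampling
        # distribution (the policy) stays fixed until the batch ends.
        successes = [0] * k
        failures = [0] * k
        for _ in range(size):
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: samples[i])
            reward = 1 if rng.random() < true_means[arm] else 0
            successes[arm] += reward
            failures[arm] += 1 - reward
            regret += best_mean - true_means[arm]
        # Single policy change: fold the buffered rewards into the posterior.
        for i in range(k):
            alpha[i] += successes[i]
            beta[i] += failures[i]
        policy_updates += 1
        pulls_done += size

    return regret, policy_updates


if __name__ == "__main__":
    total_regret, updates = batched_thompson_sampling([0.3, 0.7], horizon=2000, num_batches=8)
    print(f"pseudo-regret: {total_regret:.1f} after {updates} policy updates")
```

Setting `num_batches` equal to `horizon` recovers the fully sequential version, so varying that one parameter gives a rough feel for the regret-batch tradeoff the abstract describes.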
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management
