Thompson Sampling for Combinatorial Semi-Bandits

Siwei Wang; Wei Chen

arXiv:1803.04623·cs.LG·June 22, 2022·28 cites

TL;DR

This paper applies Thompson sampling to combinatorial multi-armed bandits, providing improved regret bounds, analyzing the matroid setting, and demonstrating through experiments that TS outperforms existing algorithms.

Contribution

The paper introduces a refined analysis of Thompson sampling for CMAB, achieving tighter regret bounds, extends results to matroid bandits without independence assumptions, and highlights limitations of using approximation oracles.

Findings

1. Thompson sampling achieves a regret bound of $O(m \log K_{\max} \log T / \Delta_{\min})$, improving on the $O(m (\log K_{\max})^{2} \log T / \Delta_{\min})$ bound in prior works.

2. In the matroid bandit setting, the regret bound matches the theoretical lower bound.

3. Experiments show Thompson sampling outperforms existing algorithms in practice.

Abstract

In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome distributions of all the base arms are independent, and obtain a distribution-dependent regret bound of $O(m \log K_{\max} \log T / \Delta_{\min})$, where $m$ is the number of base arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution. This regret upper bound is better than the $O(m (\log K_{\max})^{2} \log T / \Delta_{\min})$ bound in prior works. Moreover, our novel analysis techniques can help to tighten the regret bounds of other existing UCB-based policies (e.g., ESCB), as we improve the method of counting the…
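
To make the setting in the abstract concrete, the following is a minimal sketch of Thompson sampling with semi-bandit feedback. It is an illustration under stated assumptions, not the authors' implementation: base-arm outcomes are assumed to be independent Bernoulli variables, each arm gets a Beta(1, 1) prior, the reward is assumed to be the sum of the chosen arms' outcomes, and the feasible super arms are assumed to be all subsets of size `k`, so an exact oracle is simply "take the `k` largest samples". The names `top_k_oracle` and `combinatorial_ts` are hypothetical.

```python
import random


def top_k_oracle(theta, k):
    """Exact oracle for a sum reward over size-k subsets (an assumed
    constraint): return the indices of the k largest sampled means."""
    return sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]


def combinatorial_ts(means, k, horizon, seed=0):
    """Run Thompson sampling with semi-bandit feedback.

    means   -- true Bernoulli means of the m base arms (unknown to the learner)
    k       -- super-arm size (assumed constraint for this sketch)
    horizon -- number of rounds T
    Returns (cumulative expected regret, alpha counts, beta counts).
    """
    rng = random.Random(seed)
    m = len(means)
    alpha = [1] * m  # Beta posterior parameter: 1 + observed successes
    beta = [1] * m   # Beta posterior parameter: 1 + observed failures
    best = sum(sorted(means, reverse=True)[:k])  # optimal expected reward
    regret = 0.0

    for _ in range(horizon):
        # Draw one posterior sample per base arm.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
        # Ask the oracle for the best super arm under the sampled means.
        super_arm = top_k_oracle(theta, k)
        # Semi-bandit feedback: observe the outcome of every played base arm.
        for i in super_arm:
            if rng.random() < means[i]:
                alpha[i] += 1
            else:
                beta[i] += 1
        regret += best - sum(means[i] for i in super_arm)

    return regret, alpha, beta


if __name__ == "__main__":
    total_regret, _, _ = combinatorial_ts([0.9, 0.8, 0.2, 0.1, 0.1], k=2, horizon=2000)
    print(f"cumulative expected regret: {total_regret:.2f}")
```

In this sketch $m$ is `len(means)` and $K_{\max}$ equals `k`, since every super arm has the same size. Swapping `top_k_oracle` for another exact combinatorial solver changes the feasible set without touching the sampling or update steps.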

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems

Methods: Spatio-temporal stability analysis