When Combinatorial Thompson Sampling meets Approximation Regret

Pierre Perrault

arXiv:2302.11182 · stat.ML · February 23, 2023

TL;DR

This paper advances the understanding of Combinatorial Thompson Sampling (CTS) in combinatorial multi-armed bandit problems by establishing a new logarithmic approximation regret bound under an oracle condition called REDUCE2EXACT, extending the analysis beyond the greedy oracle.

Contribution

It introduces the REDUCE2EXACT condition, enabling a new $O(\log(T)/\Delta)$ regret bound for CTS beyond the greedy oracle case.

Findings

01

First $O(\log(T)/\Delta)$ approximation regret bound for CTS

02

The REDUCE2EXACT condition applies to many concrete examples

03

Extension of the results to the probabilistically triggered arms setting

Abstract

We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit problems (CMAB), within an approximation regret setting. Although CTS has attracted a lot of interest, it has a drawback that other usual CMAB policies do not have when considering non-exact oracles: for some oracles, CTS has a poor approximation regret (scaling linearly with the time horizon $T$) [Wang and Chen, 2018]. A study is then necessary to discriminate the oracles on which CTS could learn. This study was started by Kong et al. [2021]: they gave the first approximation regret analysis of CTS for the greedy oracle, obtaining an upper bound of order $O(\log(T)/\Delta^{2})$, where $\Delta$ is some minimal reward gap. In this paper, our objective is to push this study further than the simple case of the greedy oracle. We provide the first $O(\log(T)/\Delta)$ approximation…
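To make the CTS loop described in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes Bernoulli arms with semi-bandit feedback, independent Beta(1, 1) priors, a linear reward (the sum of the played arms' outcomes), and a hypothetical `top_k_oracle` that picks the `k` largest sampled means. For this toy problem that oracle happens to be exact; the names `cts`, `top_k_oracle`, `mu`, `k`, and `horizon` are illustrative assumptions. The oracle is passed as a parameter, since the choice of oracle is what the paper studies.

```python
import numpy as np


def top_k_oracle(theta, k):
    """Hypothetical oracle: indices of the k largest sampled means."""
    return np.argsort(theta)[-k:]


def cts(mu, k, horizon, oracle=top_k_oracle, seed=0):
    """Sketch of Combinatorial Thompson Sampling with Beta posteriors.

    mu      : true Bernoulli means (unknown to the learner, used to simulate)
    k       : number of arms played per round (assumed action structure)
    horizon : number of rounds T
    oracle  : maps a sampled mean vector to an action (a set of arm indices)
    """
    mu = np.asarray(mu, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(mu)
    a = np.ones(n)  # Beta parameter: 1 + observed successes
    b = np.ones(n)  # Beta parameter: 1 + observed failures
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(a, b)       # one posterior sample per arm
        action = oracle(theta, k)    # oracle is run on the sample
        # Semi-bandit feedback: outcome of every arm in the action.
        outcomes = (rng.random(len(action)) < mu[action]).astype(float)
        a[action] += outcomes
        b[action] += 1.0 - outcomes
        total_reward += outcomes.sum()
    return a, b, total_reward


if __name__ == "__main__":
    a, b, reward = cts(mu=[0.9, 0.8, 0.2, 0.1], k=2, horizon=2000)
    print("posterior means:", a / (a + b))
    print("total reward:", reward)
```

Swapping `top_k_oracle` for an approximate oracle (for example a greedy routine on a non-linear reward) is the setting where the approximation regret question in the abstract arises.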

Videos

When Combinatorial Thompson Sampling meets Approximation Regret · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems