Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret
A. Ömer Sarıtaç, Cem Tekin

TL;DR
This paper investigates combinatorial multi-armed bandit problems with probabilistically triggered arms, proposing policies that achieve bounded regret and demonstrating their effectiveness through theoretical analysis and real-world movie recommendation experiments.
Contribution
It shows that UCB-based and Thompson sampling policies for CMAB with PTAs achieve bounded regret when the triggering probabilities of all arms are positive. This improves on regret bounds from previous works, which were derived without assumptions on the arm triggering probabilities.
Findings
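CUCB-$\kappa$ and CTS achieve bounded regret when all arm triggering probabilities are positive.
CUCB-$0$ and CTS have $O(\sqrt{T})$ gap-independent regret bounds.
Numerical experiments on a real-world movie recommendation problem agree with the theoretical results.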
Abstract
In this paper, we study the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate $\kappa$ (CUCB-$\kappa$), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded regret. In addition, we prove that CUCB-$0$ and CTS incur $O(\sqrt{T})$ gap-independent regret. These results improve the results in previous works, which show $O(\log T)$ gap-dependent and $O(\sqrt{T \log T})$ gap-independent regrets, respectively, under no assumptions on the ATPs. Then, we numerically evaluate the performance of CUCB-$\kappa$ and CTS in a real-world movie recommendation problem, where the actions…
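To make the Thompson sampling idea in the abstract concrete, the following is a minimal toy sketch, not the authors' CTS implementation. Everything specific in it is an assumption made for illustration: Bernoulli arm states with Beta(1, 1) priors, a top-`k` selection rule standing in for the optimization oracle, and a simple triggering model in which selected arms always trigger while every other arm triggers with a fixed positive probability `trigger_prob`. The names `cts_round` and `run_cts` are hypothetical.

```python
import random


def cts_round(alpha, beta, k, true_means, trigger_prob, rng):
    """One round of a toy Thompson sampling loop with probabilistic triggering."""
    m = len(alpha)
    # Sample an estimate of each arm's expected state from its Beta posterior.
    theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
    # Toy oracle (assumption): pick the k arms with the largest samples.
    action = sorted(range(m), key=lambda i: theta[i], reverse=True)[:k]
    chosen = set(action)
    triggered = []
    for i in range(m):
        # Assumed triggering model: selected arms always trigger, and every
        # other arm triggers with a positive probability, so each arm has a
        # positive triggering probability under every action.
        if i in chosen or rng.random() < trigger_prob:
            state = 1 if rng.random() < true_means[i] else 0
            # Only triggered arms reveal their state and update the posterior.
            alpha[i] += state
            beta[i] += 1 - state
            triggered.append(i)
    return action, triggered


def run_cts(true_means, k, trigger_prob, rounds, seed=0):
    """Run the toy loop and return the posteriors and simple bookkeeping."""
    rng = random.Random(seed)
    m = len(true_means)
    alpha = [1.0] * m  # Beta(1, 1) prior for every arm
    beta = [1.0] * m
    total_triggered = 0
    action = []
    for _ in range(rounds):
        action, triggered = cts_round(alpha, beta, k, true_means, trigger_prob, rng)
        total_triggered += len(triggered)
    return alpha, beta, action, total_triggered


if __name__ == "__main__":
    alpha, beta, action, n = run_cts([0.9, 0.8, 0.2, 0.1], k=2,
                                     trigger_prob=0.3, rounds=500)
    print("last action:", action)
    print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])
```

In this toy model, arms outside the selected set still produce observations because they trigger with positive probability, so every arm's posterior keeps receiving data regardless of which action is played. That loosely mirrors the role of the positive-ATP assumption, but the sketch makes no claim about the regret of this simplified loop.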
Taxonomy
Topics: Advanced Bandit Algorithms Research · Spam and Phishing Detection
