Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret

A. Ömer Sarıtaç; Cem Tekin

arXiv:1707.07443 · cs.LG · July 25, 2017 · 2 cites

Open Access

TL;DR

This paper investigates combinatorial multi-armed bandit problems with probabilistically triggered arms, proposing policies that achieve bounded regret and demonstrating their effectiveness through theoretical analysis and real-world movie recommendation experiments.

Contribution

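It shows that a class of UCB policies (CUCB-$κ$) and Combinatorial Thompson Sampling (CTS) for CMAB with PTAs achieve bounded regret when all arm triggering probabilities are positive. This improves on the regret bounds of previous works, which were derived under no assumptions on the arm triggering probabilities.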

Findings

01

CUCB-$κ$ and CTS achieve bounded regret when all arm triggering probabilities are positive.

02

CUCB-$0$ and CTS have $O(\sqrt{T})$ gap-independent regret bounds.

03

Numerical experiments confirm theoretical results.

Abstract

In this paper, we study the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate $κ$ (CUCB-$κ$), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded regret. In addition, we prove that CUCB-$0$ and CTS incur $O(\sqrt{T})$ gap-independent regret. These results improve the results in previous works, which show $O(\log T)$ gap-dependent and $O(\sqrt{T \log T})$ gap-independent regrets, respectively, under no assumptions on the ATPs. Then, we numerically evaluate the performance of CUCB-$κ$ and CTS in a real-world movie recommendation problem, where the actions…
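To make the setting in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Everything in it is an assumption made for illustration: the index form in `cucb_index` (empirical mean plus a bonus scaled by the exploration rate `kappa`), the uniform Beta prior in `cts_sample`, and the toy environment in `run`, where each action selects one Bernoulli arm, the selected arm is always triggered, and every other arm is triggered with the same positive probability `trigger_prob`.

```python
import math
import random


def cucb_index(mean, pulls, t, kappa):
    """Hypothetical UCB index: empirical mean plus a kappa-scaled bonus.

    With kappa = 0 the bonus vanishes and the policy is greedy.
    """
    if pulls == 0:
        return 1.0  # optimistic value for an arm that was never observed
    return min(1.0, mean + math.sqrt(kappa * math.log(t) / pulls))


def cts_sample(successes, failures, rng):
    """Thompson sample of an arm's expected state (assumed Beta(1, 1) prior)."""
    return rng.betavariate(successes + 1, failures + 1)


def run(policy, true_means, trigger_prob, horizon, kappa=0.0, seed=0):
    """Simulate the toy problem and return the cumulative pseudo-regret."""
    rng = random.Random(seed)
    m = len(true_means)
    succ = [0] * m
    fail = [0] * m
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if policy == "cucb":
            est = []
            for i in range(m):
                n = succ[i] + fail[i]
                mean = succ[i] / n if n else 0.0
                est.append(cucb_index(mean, n, t, kappa))
        else:  # "cts"
            est = [cts_sample(succ[i], fail[i], rng) for i in range(m)]
        action = max(range(m), key=est.__getitem__)
        regret += best - true_means[action]
        # Probabilistic triggering: the selected arm is always observed,
        # every other arm is observed with positive probability.
        for i in range(m):
            p = 1.0 if i == action else trigger_prob
            if rng.random() < p:
                if rng.random() < true_means[i]:
                    succ[i] += 1
                else:
                    fail[i] += 1
    return regret


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]
    for horizon in (1000, 4000):
        r_ucb = run("cucb", means, trigger_prob=0.2, horizon=horizon)
        r_cts = run("cts", means, trigger_prob=0.2, horizon=horizon)
        print(horizon, round(r_ucb, 2), round(r_cts, 2))
```

Because every arm keeps being observed with positive probability regardless of the chosen action, the estimates of all arms keep improving even without an exploration bonus. That is the intuition behind the toy design; the sketch does not reproduce the paper's analysis or experiments.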

Taxonomy

Topics: Advanced Bandit Algorithms Research · Spam and Phishing Detection