Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms
Wei Chen, Yajun Wang, Yang Yuan, Qinshi Wang

TL;DR
This paper introduces a general framework for combinatorial multi-armed bandits (CMAB) and extends it to probabilistically triggered arms. It gives an algorithm, CUCB, with tight regret bounds, and applies the framework to nonlinear reward problems such as social influence maximization.
Contribution
The paper extends CMAB models to include probabilistically triggered arms, develops the CUCB algorithm with a tight regret analysis, and demonstrates applications to nonlinear reward problems.
Findings
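CUCB achieves O(log n) distribution-dependent regret, where n is the number of rounds played.
The framework covers nonlinear reward functions; the reward only needs to satisfy two mild assumptions.
The analysis improves the regret bound of earlier work on combinatorial bandits with linear rewards.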
Abstract
We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where subsets of base arms with unknown distributions form super arms. In each round, a super arm is played and the base arms contained in the super arm are played and their outcomes are observed. We further consider the extension in which more base arms could be probabilistically triggered based on the outcomes of already triggered arms. The reward of the super arm depends on the outcomes of all played arms, and it only needs to satisfy two mild assumptions, which allow a large class of nonlinear reward instances. We assume the availability of an offline (\alpha,\beta)-approximation oracle that takes the means of the outcome distributions of arms and outputs a super arm that with probability \beta generates an \alpha fraction of the optimal expected reward. The objective of an…
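The round structure the abstract describes (ask the offline oracle for a super arm, play it, observe every triggered base arm, update the estimates) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `cucb`, `top_k_oracle`, and `bernoulli_environment` are hypothetical. The confidence radius sqrt(3 ln t / (2 T_i)) is the form commonly quoted for CUCB and should be checked against the paper. Treating unplayed arms as having an optimistic mean of 1 is a simplification that stands in for an explicit initialization phase. Outcomes are assumed to lie in [0, 1].

```python
import math
import random


def cucb(num_arms, oracle, play, num_rounds):
    """Sketch of a CUCB-style loop (hypothetical interface).

    oracle(mu_bar) -> iterable of base-arm indices forming a super arm.
    play(super_arm) -> dict {arm index: outcome in [0, 1]} for every
        base arm that was triggered in this round; with probabilistic
        triggering this may include arms outside the super arm.
    """
    counts = [0] * num_arms   # T_i: number of observations of arm i
    means = [0.0] * num_arms  # empirical mean outcome of arm i

    for t in range(1, num_rounds + 1):
        # Optimistic (upper-confidence) estimate of each arm's mean.
        mu_bar = []
        for i in range(num_arms):
            if counts[i] == 0:
                mu_bar.append(1.0)  # assumption: unplayed arms are fully optimistic
            else:
                radius = math.sqrt(3.0 * math.log(t) / (2.0 * counts[i]))
                mu_bar.append(min(1.0, means[i] + radius))

        super_arm = oracle(mu_bar)
        outcomes = play(super_arm)

        # Update the running mean of every arm whose outcome was observed.
        for i, x in outcomes.items():
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]

    return means, counts


def top_k_oracle(k):
    """Toy exact oracle: pick the k arms with the largest estimates."""
    def oracle(mu_bar):
        order = sorted(range(len(mu_bar)), key=lambda i: mu_bar[i], reverse=True)
        return order[:k]
    return oracle


def bernoulli_environment(true_means, rng):
    """Toy environment: each played arm yields an independent Bernoulli outcome."""
    def play(super_arm):
        return {i: 1.0 if rng.random() < true_means[i] else 0.0 for i in super_arm}
    return play


if __name__ == "__main__":
    rng = random.Random(0)
    env = bernoulli_environment([0.9, 0.8, 0.3, 0.2, 0.1], rng)
    means, counts = cucb(5, top_k_oracle(2), env, num_rounds=2000)
    print("observation counts per arm:", counts)
```

The toy oracle here is exact, which corresponds to \alpha = \beta = 1. A nonlinear problem such as influence maximization would swap in an approximation oracle and an environment whose `play` returns outcomes for all triggered arms, leaving the loop itself unchanged.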
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
