Optimal Arm Elimination Algorithms for Combinatorial Bandits
Yuxiao Wen, Yanjun Han, Zhengyuan Zhou

TL;DR
This paper introduces an arm elimination algorithm for combinatorial bandits that relies on explicit exploration and achieves near-optimal regret under general graph feedback and in the linear contextual setting.
Contribution
A novel arm elimination scheme with explicit exploration for combinatorial bandits, outperforming UCB-based methods and matching regret lower bounds.
Findings
Achieves near-optimal regret in combinatorial multi-armed bandits with graph feedback.
Demonstrates effectiveness in combinatorial linear contextual bandits.
UCB-based methods can provably fail in these settings because they lack sufficient explicit exploration.
Abstract
Combinatorial bandits extend the classical bandit framework to settings where the learner selects multiple arms in each round, motivated by applications such as online recommendation and assortment optimization. While extensions of upper confidence bound (UCB) algorithms arise naturally in this context, adapting arm elimination methods has proved more challenging. We introduce a novel elimination scheme that partitions arms into three categories (confirmed, active, and eliminated), and incorporates explicit exploration to update these sets. We demonstrate the efficacy of our algorithm in two settings: the combinatorial multi-armed bandit with general graph feedback, and the combinatorial linear contextual bandit. In both cases, our approach achieves near-optimal regret, whereas UCB-based methods can provably fail due to insufficient explicit exploration. Matching lower bounds are also…
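The three-way partition described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the confidence-interval rules, the least-sampled exploration rule, and the names `update_partition` and `choose_action` are assumptions made here for a simple top-m semi-bandit without graph feedback or contexts.

```python
from typing import List, Sequence, Set, Tuple


def update_partition(
    means: Sequence[float],
    radius: Sequence[float],
    confirmed: Set[int],
    active: Set[int],
    eliminated: Set[int],
    m: int,
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Move active arms to confirmed or eliminated using confidence intervals.

    Assumed rule (illustrative only): with `slots` positions still open,
    an active arm is confirmed when fewer than `slots` other active arms
    could plausibly beat it, and eliminated when at least `slots` other
    active arms surely beat it.
    """
    slots = m - len(confirmed)
    lcb = {i: means[i] - radius[i] for i in active}
    ucb = {i: means[i] + radius[i] for i in active}
    newly_confirmed, newly_eliminated = set(), set()
    for i in active:
        others = active - {i}
        could_beat = sum(ucb[j] >= lcb[i] for j in others)
        surely_beat = sum(lcb[j] > ucb[i] for j in others)
        if could_beat < slots:
            newly_confirmed.add(i)
        elif surely_beat >= slots:
            newly_eliminated.add(i)
    return (
        confirmed | newly_confirmed,
        active - newly_confirmed - newly_eliminated,
        eliminated | newly_eliminated,
    )


def choose_action(
    confirmed: Set[int], active: Set[int], counts: Sequence[int], m: int
) -> List[int]:
    """Play every confirmed arm, then explicitly explore the least-sampled
    active arms in the remaining slots (an assumed exploration rule)."""
    slots = m - len(confirmed)
    explore = sorted(active, key=lambda i: (counts[i], i))[:slots]
    return sorted(confirmed) + explore
```

With tight intervals, for example means `[0.9, 0.8, 0.5, 0.45, 0.1]`, radius `0.1` for every arm, and `m = 2`, the sketch confirms arms 0 and 1 and eliminates the rest. With radius `0.3`, only arm 4 is eliminated and the other four stay active, so the remaining slots keep being used for exploration.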
