Combinatorial Bandits Revisited
Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge

TL;DR
This paper revisits combinatorial multi-armed bandit problems. It derives a problem-specific regret lower bound for the stochastic setting, proposes algorithms for the stochastic and adversarial settings with improved or matching guarantees, and shows that its stochastic algorithm outperforms existing algorithms in practice.
Contribution
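It introduces two algorithms. ESCB, for stochastic combinatorial bandits under semi-bandit feedback, has better performance guarantees than existing algorithms. CombEXP, for adversarial combinatorial bandits under bandit feedback, matches the regret scaling of state-of-the-art algorithms with lower computational complexity for some combinatorial problems.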
Findings
ESCB outperforms existing algorithms in practice.
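A problem-specific regret lower bound is derived for the stochastic setting under semi-bandit feedback.
CombEXP has lower computational complexity than state-of-the-art algorithms for some combinatorial problems, with the same regret scaling.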
Abstract
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem, and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms and significantly outperforms these algorithms in practice. In the adversarial setting under bandit feedback, we propose CombEXP, an algorithm with the same regret scaling as state-of-the-art algorithms but with lower computational complexity for some combinatorial problems.
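The abstract does not spell out how ESCB chooses an action. Below is a minimal Python sketch, under stated assumptions, of an ESCB-style loop under semi-bandit feedback: each round it plays the action with the largest optimistic index, then updates the empirical mean of every basic arm it observed. The index form (sum of empirical means over the action plus sqrt(f(n)/2 * sum over i in M of 1/t_i(n)), with f(n) = log n + 4m log log n) is how we recall the paper's ESCB-2 variant and should be checked against the paper. The clipping of log n at small n, the Bernoulli rewards, the action set of all m-subsets, and the names escb2_index and run_escb_sketch are hypothetical choices made for illustration.

```python
import itertools
import math
import random


def escb2_index(action, means, counts, n, m):
    """Optimistic index of one action (a tuple of basic-arm ids).

    Assumed form: sum of empirical means over the action plus
    sqrt(f(n)/2 * sum_i 1/t_i(n)), with f(n) = log n + 4m log log n.
    """
    if any(counts[i] == 0 for i in action):
        return math.inf  # force at least one sample of every basic arm
    log_n = max(math.log(n), 1.0)  # hypothetical clip so log(log n) >= 0
    f_n = log_n + 4 * m * math.log(log_n)
    mean_part = sum(means[i] for i in action)
    width = math.sqrt(f_n / 2 * sum(1.0 / counts[i] for i in action))
    return mean_part + width


def run_escb_sketch(theta, m, horizon, seed=0):
    """Play the argmax-index action each round; return (regret, per-arm counts)."""
    rng = random.Random(seed)
    d = len(theta)
    # Hypothetical action set: every subset of m basic arms, enumerated by brute force.
    actions = list(itertools.combinations(range(d), m))
    counts = [0] * d
    means = [0.0] * d
    best = max(sum(theta[i] for i in a) for a in actions)
    regret = 0.0
    for n in range(1, horizon + 1):
        action = max(actions, key=lambda a: escb2_index(a, means, counts, n, m))
        regret += best - sum(theta[i] for i in action)
        # Semi-bandit feedback: the reward of each chosen basic arm is observed.
        for i in action:
            x = 1.0 if rng.random() < theta[i] else 0.0  # Bernoulli(theta[i]) reward
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]  # running average
    return regret, counts
```

Enumerating every action is only workable for tiny action sets like this one; for large combinatorial problems the argmax of the index has to be computed by exploiting the structure of the action set, which this sketch does not attempt. As an illustration, run_escb_sketch([0.9, 0.8, 0.2, 0.1], m=2, horizon=2000) should concentrate most plays on arms 0 and 1, the optimal pair.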
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
