Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari

TL;DR
This paper proves tight regret bounds for a computationally efficient UCB-like algorithm in stochastic combinatorial semi-bandits, showing that learning in this setting can be both sample efficient and computationally efficient.
Contribution
It provides the first tight regret bounds for a computationally efficient algorithm in stochastic combinatorial semi-bandits, addressing both gap-dependent and gap-free scenarios.
Findings
Gap-dependent regret bound: O(K L (1 / Δ) log n)
Gap-free regret bound: O(√(K L n log n))
The gap-dependent bound is tight up to a constant factor; the gap-free bound is tight up to a polylogarithmic factor.
Abstract
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / Δ) log n) and O(√(K L n log n)) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and Δ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic…
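The setting in the abstract can be illustrated with a minimal sketch of a UCB-style index policy. This is not the paper's algorithm as published: it assumes the simplest possible constraint (choose at most K of the L items), Bernoulli item weights, and a confidence radius of sqrt(1.5 log t / s), all of which are illustrative assumptions. The function name `ucb_semi_bandit` and its parameters are hypothetical.

```python
import math
import random


def ucb_semi_bandit(means, k, n, seed=0):
    """Run a UCB-style policy on a toy 'choose at most k items' semi-bandit.

    means: hypothetical expected weight of each ground item, each in [0, 1].
    k:     maximum number of items chosen per step.
    n:     number of steps.
    Returns the cumulative expected regret over the run.
    """
    rng = random.Random(seed)
    num_items = len(means)
    counts = [0] * num_items        # how often each item has been observed
    estimates = [0.0] * num_items   # running mean of each item's weight
    best = sum(sorted(means, reverse=True)[:k])  # optimal expected payoff
    regret = 0.0

    def observe(chosen):
        # Semi-bandit feedback: the weight of every chosen item is observed.
        nonlocal regret
        for e in chosen:
            w = 1.0 if rng.random() < means[e] else 0.0  # assumed Bernoulli
            counts[e] += 1
            estimates[e] += (w - estimates[e]) / counts[e]
        regret += best - sum(means[e] for e in chosen)

    # Initialization: observe every item at least once, k items at a time.
    t = 0
    for start in range(0, num_items, k):
        observe(list(range(start, min(start + k, num_items))))
        t += 1

    while t < n:
        t += 1
        # Optimistic index per item; the radius constant is an assumption.
        ucb = [
            estimates[e] + math.sqrt(1.5 * math.log(t) / counts[e])
            for e in range(num_items)
        ]
        # With a cardinality constraint, the best subset is the top-k by index.
        chosen = sorted(range(num_items), key=lambda e: ucb[e], reverse=True)[:k]
        observe(chosen)

    return regret


if __name__ == "__main__":
    print(ucb_semi_bandit([0.9, 0.8, 0.3, 0.2, 0.1], k=2, n=2000))
```

For richer constraints (paths, matchings, matroid bases), the top-k selection would be replaced by an offline optimization oracle run on the optimistic indices; the cardinality constraint is used here only because its oracle is a sort.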
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
