Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari

arXiv:1410.0949 · cs.LG · June 8, 2017 · 58 cites

Open Access

TL;DR

This paper proves tight regret bounds for a computationally efficient UCB-like algorithm in stochastic combinatorial semi-bandits, showing that the problem can be learned efficiently in both samples and computation.

Contribution

It provides the first tight regret bounds for a computationally efficient algorithm in stochastic combinatorial semi-bandits, addressing both gap-dependent and gap-free scenarios.

Findings

01 Gap-dependent regret bound: O(K L (1/Δ) log n)

02 Gap-free regret bound: O(√(K L n log n))

03 The gap-dependent bound is tight up to a constant factor; the gap-free bound is tight up to a polylogarithmic factor.
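
To make the two rates concrete, the sketch below evaluates both expressions with the constants hidden by the O-notation dropped, so the numbers indicate growth rates only, not actual regret. Here K is the maximum number of chosen items, L the number of ground items, Δ (`gap`) the gap, and n the number of steps. The function names are hypothetical and not from the paper.

```python
import math


def gap_dependent_rate(K, L, gap, n):
    """K * L * (1 / gap) * log n, with the hidden constant set to 1 (assumption)."""
    return K * L * (1.0 / gap) * math.log(n)


def gap_free_rate(K, L, n):
    """sqrt(K * L * n * log n), with the hidden constant set to 1 (assumption)."""
    return math.sqrt(K * L * n * math.log(n))


def crossover_gap(K, L, n):
    """Gap at which the two expressions above are equal.

    Solving K*L*log(n)/gap = sqrt(K*L*n*log(n)) gives gap = sqrt(K*L*log(n)/n).
    """
    return math.sqrt(K * L * math.log(n) / n)
```

With constants ignored, the gap-dependent expression is the smaller of the two when the gap exceeds `crossover_gap(K, L, n)`, and the gap-free expression is smaller for gaps below it.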

Abstract

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1/\Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic…
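
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a UCB-like semi-bandit loop can look like, under several assumptions that are not taken from this page: the constraint is "choose any k items" (so the offline oracle is a simple top-k), item weights are Bernoulli, the confidence radius is sqrt(1.5 log t / s), and unobserved items get an infinite index instead of a dedicated initialization phase. The names `top_k` and `comb_ucb` are hypothetical.

```python
import math
import random


def top_k(scores, k):
    """Offline oracle for the illustrative constraint 'pick any k items'."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def comb_ucb(means, k, n, seed=0):
    """Run a UCB-like semi-bandit loop for n steps; return cumulative expected regret."""
    rng = random.Random(seed)
    num_items = len(means)
    pulls = [0] * num_items    # number of observations of each item
    avg = [0.0] * num_items    # empirical mean weight of each item
    best = sum(sorted(means, reverse=True)[:k])  # expected payoff of the optimal set
    regret = 0.0
    for t in range(1, n + 1):
        # Optimistic index per item; unobserved items get +inf so they are tried first.
        ucb = [
            avg[e] + math.sqrt(1.5 * math.log(t) / pulls[e]) if pulls[e] else math.inf
            for e in range(num_items)
        ]
        chosen = top_k(ucb, k)  # feasible set maximizing the sum of indices
        regret += best - sum(means[e] for e in chosen)
        for e in chosen:
            # Semi-bandit feedback: the weight of every chosen item is observed.
            w = 1.0 if rng.random() < means[e] else 0.0  # Bernoulli weight (assumption)
            pulls[e] += 1
            avg[e] += (w - avg[e]) / pulls[e]  # incremental mean update
    return regret
```

A call such as `comb_ucb([0.9, 0.8, 0.3, 0.2, 0.1], k=2, n=2000)` returns the cumulative expected regret of one simulated run. For other constraint families, `top_k` would be replaced by whatever offline optimization oracle solves the corresponding problem.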

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems