Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

arXiv:2101.07957 · stat.ML · March 2, 2021

TL;DR

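This paper improves the regret bounds for the contextual combinatorial semi-bandit problem with linear payoffs, achieving near-optimal bounds in terms of the feature dimension, the number of arms chosen per round, and the number of rounds. The dependence on the first two matters in practice, where the number of arms chosen per round can exceed the number of rounds.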

Contribution

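The authors tighten both the upper and lower regret bounds for the problem. They show that the existing C$^2$UCB algorithm is already optimal under partition matroid constraints, and they introduce a modified algorithm that attains the optimal bound under general constraints and applies more broadly.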

Findings

01

The C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for partition matroid constraints.

02

Modified reward estimates lead to optimal regret bounds for general constraints.

03

Numerical experiments confirm theoretical regret bounds and practical effectiveness.
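The bound in the first finding has two terms, and comparing them shows when the additive $dk$ term matters. Since $d\sqrt{kT} \ge dk$ exactly when $T \ge k$, the $dk$ term dominates only when more arms are chosen per round than there are rounds, as can happen in recommender systems. The Python sketch below drops constants and logarithmic factors, as the $\tilde{O}(\cdot)$ notation does; the helper names `regret_bound_terms` and `dominant_term` are illustrative and not from the paper.

```python
import math


def regret_bound_terms(d: int, k: int, T: int) -> tuple[float, int]:
    """Return the two terms of d*sqrt(k*T) + d*k, ignoring constants and
    logarithmic factors (as the O-tilde notation does)."""
    sqrt_term = d * math.sqrt(k * T)
    linear_term = d * k
    return sqrt_term, linear_term


def dominant_term(d: int, k: int, T: int) -> str:
    """Name the larger term; d*sqrt(kT) >= d*k holds exactly when T >= k."""
    sqrt_term, linear_term = regret_bound_terms(d, k, T)
    return "d*sqrt(kT)" if sqrt_term >= linear_term else "d*k"


print(regret_bound_terms(10, 4, 100))  # (200.0, 40)
print(dominant_term(10, 4, 100))       # d*sqrt(kT)
print(dominant_term(5, 100, 25))       # d*k
```

In the last call, $k = 100$ arms are chosen in each of only $T = 25$ rounds, so the $dk$ term (500) exceeds $d\sqrt{kT}$ (250).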

Abstract

The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which a learner chooses a set of arms with feature vectors in each round under given constraints so as to maximize the sum of rewards of arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds $T$. However, there is a gap of $\tilde{O}(\max(d, k))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of arms chosen in a round, and $\tilde{O}(\cdot)$ ignores logarithmic factors. The dependence on $k$ and $d$ is of practical importance because $k$ may be larger than $T$ in real-world applications such as recommender systems. In this paper, we fill the gap by improving the upper and lower bounds. More precisely, we show that the C$^{2}$UCB…
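The problem statement describes the learner's loop only in words: observe feature vectors, choose a feasible set of $k$ arms, and observe the reward of each chosen arm (semi-bandit feedback). The following Python sketch illustrates a round structure in the spirit of C$^2$UCB under a partition matroid constraint: a ridge-regression estimate of the linear payoff parameter, an optimistic per-arm score, and a per-group top-$k$ oracle. It is not the authors' implementation. The class name `C2UCBSketch`, the parameters `lam` and `alpha`, the constant exploration weight, the requirement that exactly `caps[g]` arms be taken from group `g`, and the toy simulation are all assumptions made for illustration.

```python
import math
import random


class C2UCBSketch:
    """Hypothetical C^2UCB-style learner for a linear semi-bandit.

    Assumed parameters (not from the paper): ridge weight `lam` and a
    constant exploration weight `alpha`. V^{-1} is kept up to date with
    Sherman-Morrison rank-one updates, so no matrix inversion is needed.
    """

    def __init__(self, d, lam=1.0, alpha=1.0):
        self.d = d
        self.alpha = alpha
        # Start from V = lam * I, so V^{-1} = I / lam.
        self.V_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)]
                      for i in range(d)]
        self.b = [0.0] * d  # running sum of reward * feature

    def _v_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(self.d)) for row in self.V_inv]

    def estimate(self):
        # Ridge-regression estimate theta_hat = V^{-1} b.
        return self._v_inv_times(self.b)

    def ucb(self, theta, x):
        # Optimistic score: theta_hat^T x + alpha * sqrt(x^T V^{-1} x).
        mean = sum(t * xi for t, xi in zip(theta, x))
        quad = sum(xi * v for xi, v in zip(x, self._v_inv_times(x)))
        return mean + self.alpha * math.sqrt(max(quad, 0.0))

    def select(self, features, partition, caps):
        # Partition-matroid oracle (assumption: exactly caps[g] arms are
        # chosen from group g): keep the highest-scoring arms per group.
        theta = self.estimate()
        scores = [self.ucb(theta, x) for x in features]
        chosen = []
        for g, cap in caps.items():
            members = [i for i, p in enumerate(partition) if p == g]
            members.sort(key=lambda i: scores[i], reverse=True)
            chosen.extend(members[:cap])
        return chosen

    def update(self, x, reward):
        # Semi-bandit feedback: one rank-one update per observed arm,
        # (V + x x^T)^{-1} = V^{-1} - (V^{-1} x)(V^{-1} x)^T / (1 + x^T V^{-1} x).
        vx = self._v_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, vx))
        for i in range(self.d):
            for j in range(self.d):
                self.V_inv[i][j] -= vx[i] * vx[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


if __name__ == "__main__":
    # Toy simulation with a made-up true parameter and Gaussian noise.
    random.seed(0)
    d, n_arms, T = 3, 6, 200
    theta_star = [0.5, -0.2, 0.3]
    partition = [0, 0, 0, 1, 1, 1]  # group id of each arm
    caps = {0: 1, 1: 2}             # k = 3 arms chosen per round
    learner = C2UCBSketch(d, lam=1.0, alpha=0.5)
    for _ in range(T):
        feats = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
        for i in learner.select(feats, partition, caps):
            r = sum(a * b for a, b in zip(theta_star, feats[i])) + random.gauss(0, 0.1)
            learner.update(feats[i], r)
    print([round(v, 2) for v in learner.estimate()])
```

Updating $V^{-1}$ once per chosen arm gives the same matrix as adding all $k$ outer products at the end of the round, since $V$ is just their sum; the rank-one form keeps the sketch dependency-free.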

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems