Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions
Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

TL;DR
This paper tightens the regret bounds for the contextual combinatorial semi-bandit problem with linear payoffs, giving near-optimal bounds in terms of the feature dimension, the number of arms chosen per round, and the number of rounds.
Contribution
The authors tighten both the upper and lower regret bounds for the problem and introduce an algorithm whose regret bound is optimal under general constraints, which makes the result more broadly applicable.
Findings
The C${}^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ under partition matroid constraints.
Modified reward estimates lead to optimal regret bounds for general constraints.
Numerical experiments confirm theoretical regret bounds and practical effectiveness.
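To make the first finding concrete, here is a minimal sketch of a UCB-style learner for linear semi-bandit feedback under a partition matroid constraint, written in the spirit of C${}^2$UCB. It is not the authors' code: the class name `LinearSemiBandit`, the helper `partition_matroid_oracle`, the exploration weight `alpha`, the regularizer `lam`, and the simulated rewards are all illustrative assumptions, and the modified reward estimates used for general constraints are not shown.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


class LinearSemiBandit:
    """Ridge-regression UCB scores for arms with linear payoffs (sketch)."""

    def __init__(self, d, alpha=1.0, lam=1.0):
        self.d = d
        self.alpha = alpha  # exploration weight (assumed, not tuned)
        # V_inv stores (lam * I + sum of x x^T)^{-1}; b stores sum of r * x.
        self.V_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)]
                      for i in range(d)]
        self.b = [0.0] * d

    def scores(self, features):
        """Optimistic reward estimate for every arm's feature vector."""
        theta = matvec(self.V_inv, self.b)
        out = []
        for x in features:
            width = math.sqrt(max(dot(x, matvec(self.V_inv, x)), 0.0))
            out.append(dot(x, theta) + self.alpha * width)
        return out

    def update(self, x, reward):
        """Sherman-Morrison rank-one update after observing one arm."""
        Vx = matvec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        for i in range(self.d):
            for j in range(self.d):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
            self.b[i] += reward * x[i]


def partition_matroid_oracle(scores, groups, capacity=1):
    """Pick the `capacity` highest-scoring arms from each group."""
    chosen = []
    for group in groups:
        ranked = sorted(group, key=lambda a: scores[a], reverse=True)
        chosen.extend(ranked[:capacity])
    return chosen


def run(T=200, d=3, n_groups=4, arms_per_group=5, noise=0.1, seed=0):
    """Simulate T rounds on synthetic data and return (regret, learner)."""
    rng = random.Random(seed)
    theta_star = [rng.uniform(-1, 1) for _ in range(d)]  # hidden parameter
    n_arms = n_groups * arms_per_group
    groups = [list(range(g * arms_per_group, (g + 1) * arms_per_group))
              for g in range(n_groups)]
    learner = LinearSemiBandit(d)
    regret = 0.0
    for _ in range(T):
        features = [[rng.uniform(-1, 1) for _ in range(d)]
                    for _ in range(n_arms)]
        chosen = partition_matroid_oracle(learner.scores(features), groups)
        means = [dot(x, theta_star) for x in features]
        best = partition_matroid_oracle(means, groups)
        regret += sum(means[a] for a in best) - sum(means[a] for a in chosen)
        # Semi-bandit feedback: one noisy reward per chosen arm.
        for a in chosen:
            learner.update(features[a], means[a] + rng.gauss(0, noise))
    return regret, learner
```

Calling `run()` returns the cumulative regret against the best feasible set in each round, together with the fitted learner. Here $k$ corresponds to `n_groups * capacity`, the number of arms chosen per round.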
Abstract
The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which a learner chooses a set of arms with the feature vectors in each round under given constraints so as to maximize the sum of rewards of arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds $T$. However, there is a gap of $\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of the chosen arms in a round, and $\tilde{O}(\cdot)$ ignores the logarithmic factors. The dependence of $k$ and $d$ is of practical importance because $k$ may be larger than $T$ in real-world applications such as recommender systems. In this paper, we fill the gap by improving the upper and lower bounds. More precisely, we show that the C${}^2$UCB…
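A short worked comparison, using only the bound $\tilde{O}(d\sqrt{kT} + dk)$ quoted in the Findings and ignoring logarithmic factors, shows when the additive $dk$ term matters. The numbers in the example are made up for illustration.

```latex
\[
  dk \;\ge\; d\sqrt{kT}
  \;\iff\; \sqrt{k} \;\ge\; \sqrt{T}
  \;\iff\; k \;\ge\; T .
\]
% Example with d = 10, k = 400, T = 100:
\[
  d\sqrt{kT} = 10\sqrt{400 \cdot 100} = 2000,
  \qquad
  dk = 10 \cdot 400 = 4000 .
\]
```

So the $dk$ term dominates exactly when the number of arms chosen per round is at least the number of rounds, and $d\sqrt{kT}$ dominates otherwise.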
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
