TL;DR
This paper presents a novel algorithm for contextual combinatorial semi-bandits that achieves optimal regret in both adversarial and stochastic regimes, while also significantly improving computational efficiency for large-scale applications.
Contribution
It introduces a best-of-both-worlds algorithm with an efficient implementation that reduces each high-dimensional projection to a single-variable root-finding problem.
Findings
Achieves $\tilde{O}(\sqrt{T})$ regret in adversarial regime.
Achieves $\tilde{O}(\ln T)$ regret in stochastic regime.
Provides substantial per-round speed-ups in empirical tests.
Abstract
We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\tilde{O}(\sqrt{T})$ regret in the adversarial regime and $\tilde{O}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that…
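To make the projection-to-root-finding idea concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the m-set action space mentioned in the reviews, so the feasible region is $\{x \in [0,1]^K : \sum_i x_i = m\}$, and an entropy-regularized objective $\sum_i \eta L_i x_i + \sum_i (x_i \ln x_i - x_i)$. Under those assumptions the KKT conditions give $x_i = \min(1, e^{\nu - \eta L_i})$ for a single scalar $\nu$, so the $K$-dimensional problem reduces to solving $\sum_i x_i(\nu) = m$. The function name `ftrl_entropy_m_set`, the inputs `cum_loss` and `eta`, and the use of plain bisection are illustrative choices; the paper may use a different root-finder and a different objective.

```python
import math

def ftrl_entropy_m_set(cum_loss, eta, m, iters=100):
    """Sketch: entropy-regularized FTRL step over {x in [0,1]^K : sum(x) = m}.

    Hypothetical helper, not the paper's code. Stationarity plus the cap
    x_i <= 1 gives x_i = min(1, exp(nu - eta * cum_loss[i])), so only the
    scalar nu has to be found.
    """
    K = len(cum_loss)
    if not 0 < m <= K:
        raise ValueError("need 0 < m <= K")
    z = [eta * loss for loss in cum_loss]

    def x_of(nu):
        # exp(min(0, t)) equals min(1, exp(t)) and cannot overflow.
        return [math.exp(min(0.0, nu - zi)) for zi in z]

    # sum(x_of(nu)) is continuous and nondecreasing in nu.
    lo = min(z) + math.log(m / K)  # every x_i <= m/K here, so the sum is <= m
    hi = max(z)                    # every x_i == 1 here, so the sum is K >= m
    for _ in range(iters):         # plain bisection on the scalar nu
        mid = 0.5 * (lo + hi)
        if sum(x_of(mid)) < m:
            lo = mid
        else:
            hi = mid
    return x_of(hi)
```

Calling `ftrl_entropy_m_set([0.0, 1.0, 2.0, 3.0], eta=1.0, m=2)` returns four marginal probabilities in $[0,1]$ that sum to 2 up to numerical tolerance, with lower-loss actions receiving larger weight. Each bisection step costs $O(K)$, which is where a per-round saving over a generic $K$-dimensional solver would come from in this simplified setting.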
Peer Reviews
Decision·ICLR 2026 Poster
1) This paper presents the first best-of-both-worlds (BOBW) regret guarantee for contextual combinatorial semi-bandits.
2) It introduces an efficient numerical scheme for the FTRL update that significantly reduces computational cost while preserving the theoretical guarantees.
3) The theoretical analysis is well-structured, with a new auxiliary “ghost context” game that simplifies the regret analysis.
4) Empirical evaluations demonstrate substantial runtime improvements compared with Newton a
1) This paper assumes a linear reward (or loss) function. The framework does not yet handle nonlinear or generalized linear models.
2) The proposed efficient projection scheme is tailored to the m-set (and potentially partition matroids). For more general combinatorial structures, these computational savings may no longer hold.
3) The Shannon-entropy regularizer adds an additional $O(\log T)$ term in the adversarial regret, slightly weakening the asymptotic bound.
- The authors complemented their theoretical results with some experiments. Although this is not strictly mandatory for theoretical online learning submissions, it is a nice addition.
- To the best of my understanding, the paper is correct.
- Best-of-both-worlds results are important and of practical value.
- The authors also propose a numerical speed-up that enhances applicability.
- The topic of the paper is extremely narrow, as the problem studied is quite involved and specialized. I am unsure whether it will garner general interest among the vast ICLR audience.
- It is very challenging to gain a comprehensive understanding of the state-of-the-art and to compare it with the results presented in the paper. There are many parameters (namely, the context dimension $d$, the number of base actions $K$, and the cardinality constraint $m$), as well as numerous results implied
1. This work seems to be the first work in the literature to study the BOBW linear contextual combinatorial bandit.
2. Most parts of this work are generally well-written.
1. **Motivation**: I understand that the linear contextual combinatorial problem might be of some practical interest. However, given (a large number of) previous advances for BOBW combinatorial bandits, linear bandits, and linear contextual bandits, at this point, studying the BOBW linear contextual combinatorial problem seems somewhat not so technically appealing.
2. **Novelty**: My main concern is about the novelty of this work, which is also partially related to the concern above. In recent y
