Combinatorial Rising Bandits
Seockbean Song, Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok

TL;DR
This paper introduces the Combinatorial Rising Bandit framework to model and optimize sequential decision-making where rewards improve over time and propagate across shared components, with a new algorithm demonstrating strong theoretical and empirical performance.
Contribution
It proposes the CRB framework and the CRUCB algorithm, addressing the challenge of rising and propagating rewards in combinatorial bandit problems, with proven regret bounds and empirical validation.
Findings
CRUCB achieves tight regret bounds.
CRUCB performs effectively in deep reinforcement learning environments.
Empirical results validate the theoretical guarantees.
Abstract
Combinatorial online learning is a fundamental task for selecting the optimal action (or super arm) as a combination of base arms in sequential interactions with systems providing stochastic rewards. It is applicable to diverse domains such as robotics, social advertising, network routing, and recommendation systems. In many real-world scenarios, we often encounter rising rewards, where playing a base arm not only provides an instantaneous reward but also contributes to the enhancement of future rewards, e.g., robots improving through practice and social influence strengthening in the history of successful recommendations. Crucially, these enhancements may propagate to multiple super arms that share the same base arms, introducing dependencies beyond the scope of existing bandit models. To address this gap, we introduce the Combinatorial Rising Bandit (CRB) framework and propose a…
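The rising-and-propagating structure described in the abstract can be made concrete with a small simulation. This is a minimal sketch under hypothetical assumptions that are not taken from the paper. Each base arm's expected reward follows a saturating curve $\mu_i(n) = c_i(1 - e^{-\rho_i n})$ in its own pull count $n$. A super arm's reward is the sum of its base arms' rewards. The class and function names (`RisingBaseArm`, `play_super_arm`, `super_arm_mean`) are illustrative only.

```python
import math
import random
from typing import List, Sequence


class RisingBaseArm:
    """Hypothetical base arm whose expected reward rises with its own pull count."""

    def __init__(self, ceiling: float, rate: float, noise: float = 0.05) -> None:
        self.ceiling = ceiling  # asymptotic expected reward (assumed curve parameter)
        self.rate = rate        # improvement speed (assumed curve parameter)
        self.noise = noise      # std. dev. of Gaussian observation noise
        self.pulls = 0

    def expected_reward(self) -> float:
        # Non-decreasing, saturating curve: mu(n) = ceiling * (1 - exp(-rate * n))
        return self.ceiling * (1.0 - math.exp(-self.rate * self.pulls))

    def pull(self, rng: random.Random) -> float:
        reward = self.expected_reward() + rng.gauss(0.0, self.noise)
        self.pulls += 1  # practice raises this arm's future expected reward
        return reward


def play_super_arm(arms: List[RisingBaseArm], super_arm: Sequence[int],
                   rng: random.Random) -> float:
    """Play each base arm in the super arm; the total reward is assumed to be a sum."""
    return sum(arms[i].pull(rng) for i in super_arm)


def super_arm_mean(arms: List[RisingBaseArm], super_arm: Sequence[int]) -> float:
    """Current expected reward of a super arm (sum of base-arm expectations)."""
    return sum(arms[i].expected_reward() for i in super_arm)


rng = random.Random(0)
arms = [RisingBaseArm(ceiling=1.0, rate=0.3) for _ in range(4)]

# Play super arm {0, 1} ten times.
for _ in range(10):
    play_super_arm(arms, (0, 1), rng)

# Super arm {1, 2} was never played, but it shares base arm 1, so it has improved too.
# Super arm {2, 3} shares nothing with {0, 1} and is still at its starting value.
print(super_arm_mean(arms, (1, 2)), super_arm_mean(arms, (2, 3)))  # roughly 0.950 and 0.0
```

The last line shows the dependency the abstract points to. Improvement earned by one super arm carries over to every other super arm that shares a base arm. A model that tracks each super arm independently would miss this.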
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper is well structured, with a clear flow and proper introduction of concepts, making it easy to follow even for readers who may not work directly in this area. 2. The theoretical results are sound. I think this problem is genuinely challenging given both its combinatorial nature and the non-stationary rewards that change as arms are pulled over time. The authors introduce the theoretical results at a good pace. 3. The simulation part is clear and effectively supports the theoretica
See Questions below.
The work introduces a new bandit setting that combines combinatorial and rising bandits. The setting is interesting and seems to have a wide range of applications. The authors propose a new algorithm, CRUCB, for this setting and support its performance with a regret upper bound in terms of $K$ and $L$. The work also includes a lower bound that highlights the difficulty of this setting and complements CRUCB's regret upper bound. The authors provide a good experimental setup to valid
The Solver is treated as an oracle that solves the combinatorial optimization problem, and its associated computational cost is not discussed. Many combinatorial tasks in real-world practice exhibit non-monotonic rewards, but the work assumes $r$ is a monotonic function. Assuming the reward is monotonic in $S$ and $\mu$ limits the framework's applicability and generality. (See also the Questions section.)
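The reviewer's two points, the oracle solver and the monotonicity of $r$, can be illustrated with a toy case. This is a sketch under hypothetical assumptions. The reward $r(S, \mu)$ is taken to be the sum of base-arm means over a super arm of fixed size `k`, which is one simple monotone instance and not the paper's general reward class. For this reward the exact solver is a top-`k` selection costing $O(n \log k)$. Because this $r$ is non-decreasing in $\mu$, an oracle fed optimistic estimates (each at least the true mean) returns a value that upper-bounds the true optimum. UCB-style combinatorial analyses rely on this kind of property. The names `sum_reward` and `top_k_oracle` are illustrative.

```python
import heapq
from typing import Sequence, Tuple


def sum_reward(super_arm: Sequence[int], mu: Sequence[float]) -> float:
    """Illustrative reward r(S, mu): sum of base-arm means, monotone in S and mu
    when means are non-negative."""
    return sum(mu[i] for i in super_arm)


def top_k_oracle(scores: Sequence[float], k: int) -> Tuple[int, ...]:
    """Exact solver for max over |S| = k of sum_reward(S, scores); O(n log k) time."""
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return tuple(sorted(best))


mu = [0.2, 0.5, 0.4, 0.1]     # true (unknown) base-arm means
ucb = [0.6, 0.55, 0.45, 0.3]  # optimistic estimates with ucb[i] >= mu[i]

chosen = top_k_oracle(ucb, k=2)   # (0, 1)
optimal = top_k_oracle(mu, k=2)   # (1, 2)

# Monotonicity in mu gives optimism: the oracle's value under ucb is at least
# the optimal super arm's value under ucb, which is at least its true value.
print(sum_reward(chosen, ucb) >= sum_reward(optimal, mu))  # True
```

For a non-monotone $r$ the last comparison can fail, which is the limitation the reviewer raises. The oracle's cost also grows with the hardness of the combinatorial constraint. A top-`k` selection is cheap, but many real constraints are not.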
1. This seems to be the first work in the literature to model and resolve the combinatorial rising bandit problem. 2. The paper is generally well-written.
1. **Novelty**: Though the problem formulation is relatively new, the technical novelty seems somewhat limited. To me, the core part of the proposed algorithm is to use a UCB constructed using exploration bonus, the empirical mean, and the predicted improvement in the past sliding window, which is exactly the way [1] deals with rising multi-armed bandits (MABs). Of course, their method cannot deal with the combinatorial rising bandit problem. However, it is not clear to me whether there are addi
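The reviewer summarizes the core of CRUCB as a UCB built from three parts: an empirical mean, a predicted improvement estimated over a past sliding window, and an exploration bonus. A simplified per-base-arm sketch of such an index is shown below. Several pieces are hypothetical choices made for illustration: the window rule (`window_frac`), the extrapolation horizon (`lookahead`), the form and constant `c` of the bonus, and every name used. They are not the exact estimators of CRUCB or of [1]. The per-arm indices would then be passed to the combinatorial solver, and a top-2 selection stands in for that solver here.

```python
import math
from typing import Dict, List, Sequence


def rising_ucb_index(
    rewards: Sequence[float],
    t: int,
    lookahead: float,
    window_frac: float = 0.25,
    c: float = 1.0,
) -> float:
    """Optimistic index for one base arm from its reward history (oldest first).

    Hypothetical simplification: recent-window mean + extrapolated improvement
    + exploration bonus. Not the exact CRUCB estimator.
    """
    n = len(rewards)
    if n < 2:
        return math.inf  # too little data to estimate a trend: force exploration
    # Sliding window of the most recent h pulls; two disjoint windows are needed.
    h = min(max(1, int(window_frac * n)), n // 2)
    recent = rewards[n - h:]
    previous = rewards[n - 2 * h : n - h]
    recent_mean = sum(recent) / h
    # Per-pull improvement between consecutive windows, clipped at 0 because
    # expected rewards are assumed to be non-decreasing.
    slope = max(0.0, (recent_mean - sum(previous) / h) / h)
    bonus = c * math.sqrt(math.log(max(t, 2)) / h)
    return recent_mean + slope * lookahead + bonus


history: Dict[int, List[float]] = {
    0: [0.10, 0.20, 0.30, 0.40],  # improving arm with a lower current mean
    1: [0.45, 0.45, 0.45, 0.45],  # flat arm with a higher current mean
    2: [],                        # never played
}
t = 12
indices = {arm: rising_ucb_index(r, t, lookahead=5) for arm, r in history.items()}

# Hand the indices to a combinatorial solver; a top-2 selection stands in for it.
chosen = tuple(sorted(sorted(indices, key=indices.get, reverse=True)[:2]))
print(indices, chosen)
```

Arms 0 and 1 have the same number of samples and therefore the same bonus. Arm 0 still outranks the flat arm 1 because its extrapolated improvement outweighs its lower current mean. The slope is clipped at zero because, under the rising assumption, a negative trend estimate can only come from noise.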
Taxonomy
Topics: Advanced Bandit Algorithms Research
Methods: Balanced Selection
