Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards
Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi

TL;DR
This paper introduces an adaptive UCB-based algorithm for nonstationary combinatorial semi-bandit problems with causally related rewards, incorporating change detection and graph structure tracking to improve decision-making.
Contribution
It proposes a novel algorithm combining change-point detection, group restart, and causal graph tracking for nonstationary semi-bandit environments, with theoretical regret bounds.
Findings
The algorithm outperforms state-of-the-art benchmarks in real-world experiments.
Theoretical regret bounds account for structural and distributional changes.
Numerical results demonstrate the method's effectiveness in dynamic environments.
Abstract
We study the piecewise stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process. In such an environment, an optimal decision-maker must follow both sources of change and adapt accordingly. The problem becomes aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome the challenge. More specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. Besides, we introduce the notion of group restart as a new alternative restarting strategy in the decision…
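The change-point detection step mentioned in the abstract can be illustrated with a minimal sketch of a GLR test on a single stream of observations. This is not the authors' implementation: it assumes Bernoulli (0/1) rewards, and the function names (`kl_bernoulli`, `glr_statistic`, `glr_detects_change`) and the fixed `threshold` argument are hypothetical choices made for illustration. The threshold the paper actually uses is not given in this summary.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(xs):
    """Largest GLR statistic over all split points of the sequence xs.

    For each split s, the samples before and after s are compared with
    the pooled mean; a large value suggests the mean changed at s.
    """
    n = len(xs)
    total = sum(xs)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += xs[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def glr_detects_change(xs, threshold):
    """Flag a change when the statistic exceeds a threshold (assumed, user-chosen)."""
    return len(xs) >= 2 and glr_statistic(xs) > threshold
```

In a bandit loop, a detector of this kind would typically be run on the observations of each base arm, and a positive detection would trigger a restart of the affected estimates. How restarts are grouped is specific to the paper and is not reproduced here.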
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques
Methods: Balanced Selection
