Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards

Behzad Nourani-Koliji; Steven Bilaj; Amir Rezaei Balef; Setareh Maghsudi

arXiv:2307.14138 · cs.LG · July 27, 2023

TL;DR

This paper introduces an adaptive UCB-based algorithm for nonstationary combinatorial semi-bandit problems with causally related rewards, incorporating change detection and graph structure tracking to improve decision-making.
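As a rough illustration of the UCB component only, the sketch below computes per-arm UCB indices and picks a bundle of arms from semi-bandit feedback. It rests on simplifying assumptions that are not taken from the paper: the feasible bundles are "any m base arms", the causal graph is ignored, the exploration constant 1.5 is arbitrary, and all function names are hypothetical. The `restart` helper simply forgets the statistics of the given arms, which is one plausible reaction to a detected change.

```python
import math

def ucb_indices(counts, means, t):
    """Return one UCB index per base arm; unplayed arms get +inf."""
    indices = []
    for n_i, mu_i in zip(counts, means):
        if n_i == 0:
            indices.append(float("inf"))
        else:
            # Empirical mean plus an exploration bonus (constant 1.5 is an assumption).
            indices.append(mu_i + math.sqrt(1.5 * math.log(t) / n_i))
    return indices

def select_super_arm(counts, means, t, m):
    """Pick the m arms with the largest indices (assumed top-m action set)."""
    idx = ucb_indices(counts, means, t)
    ranked = sorted(range(len(idx)), key=lambda i: idx[i], reverse=True)
    return sorted(ranked[:m])

def update(counts, means, chosen, rewards):
    """Semi-bandit feedback: only the chosen arms reveal their rewards."""
    for i, r in zip(chosen, rewards):
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean

def restart(counts, means, arms):
    """Forget the history of the given arms, e.g. after a detected change."""
    for i in arms:
        counts[i] = 0
        means[i] = 0.0
```

In a loop, one would call `select_super_arm`, observe the rewards of the chosen arms, call `update`, and call `restart` whenever a change detector fires.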

Contribution

It proposes a novel algorithm combining change-point detection, group restart, and causal graph tracking for nonstationary semi-bandit environments, with theoretical regret bounds.

Findings

01. The algorithm outperforms state-of-the-art benchmarks in real-world experiments.

02. Theoretical regret bounds account for structural and distributional changes.

03. Numerical results demonstrate the method's effectiveness in dynamic environments.

Abstract

We study the piecewise-stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process. In such an environment, an optimal decision-maker must follow both sources of change and adapt accordingly. The problem becomes aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome the challenge. More specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. In addition, we introduce the notion of group restart as a new alternative restarting strategy in the decision…
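To make the GLR-based change-point detection mentioned in the abstract more concrete, here is a minimal sketch of a GLR test for a single stream of Bernoulli rewards. It is an illustration under stated assumptions, not the paper's detector: rewards are assumed to lie in {0, 1}, the threshold `log(3 * n * sqrt(n) / delta)` is a simplified choice made here, and the function names are hypothetical. For every candidate split point, the statistic compares the fit of "two different means" against "one common mean" using the Bernoulli KL divergence.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def glr_statistic(samples):
    """Maximum over split points s of the log-likelihood-ratio statistic."""
    n = len(samples)
    if n < 2:
        return 0.0
    total = sum(samples)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += samples[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * bernoulli_kl(mean_left, mean_all)
                + (n - s) * bernoulli_kl(mean_right, mean_all))
        best = max(best, stat)
    return best

def change_detected(samples, delta=0.05):
    """Flag a change when the statistic crosses an assumed threshold."""
    n = len(samples)
    if n < 2:
        return False
    threshold = math.log(3 * n * math.sqrt(n) / delta)  # simplified, assumed form
    return glr_statistic(samples) >= threshold
```

For example, a stream of fifty zeros followed by fifty ones triggers the detector, while a constant stream does not. This scan costs O(n) per call, so running it every round is quadratic overall; a practical detector would typically subsample the rounds or the split points.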

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques

Methods: Balanced Selection