Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards

Saeed Ghoorchian; Setareh Maghsudi

arXiv:2307.09093 · cs.LG · July 19, 2023 · 1 citation


TL;DR

This paper addresses decision-making in non-stationary, delayed, combinatorial semi-bandit environments with causally related rewards, proposing a new algorithm that learns the causal structure among arms to maximize the long-term reward.

Contribution

The paper introduces a novel policy that learns causal dependencies from delayed feedback in non-stationary settings, and supports it with proven regret bounds and empirical validation.

Findings

1. The proposed method learns causal relations despite feedback delays.
2. The algorithm adapts to environmental drifts while optimizing rewards.
3. Numerical analysis demonstrates practical utility on real-world data.

Abstract

Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly more challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, the agent benefits from learning the causal relations, which alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms'…
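To make the setting in the abstract more concrete, here is a minimal Python simulation sketch under stated assumptions. It is not the authors' environment or algorithm. The four-arm graph, the edge weights in the hypothetical `PARENTS` table, the "boost" model of what choosing an arm does, the constant delay, and the sum-of-rewards payoff are all illustrative choices. The abstract's exact payoff definition is truncated above. The sketch shows three pieces: a linear structural equation model over a directed acyclic graph, semi-bandit feedback that reveals only the chosen arms' rewards, and feedback that arrives a fixed number of rounds late. Environmental drift is deliberately not modeled.

```python
import random
from collections import deque

# HYPOTHETICAL example graph: arms are listed in topological order, and
# PARENTS[j] maps each parent of arm j to an (assumed) edge weight.
PARENTS = {0: {}, 1: {0: 0.8}, 2: {0: 0.5, 1: 0.3}, 3: {2: 0.9}}


def sample_rewards(super_arm, rng, boost=1.0, noise_sd=0.1):
    """Draw one reward vector from a linear SEM over the arm graph.

    Assumption: choosing an arm adds `boost` to its exogenous term, and the
    effect propagates to its descendants through the weighted edges.
    """
    x = []
    for j in range(len(PARENTS)):  # topological order: parents come first
        exogenous = (boost if j in super_arm else 0.0) + rng.gauss(0.0, noise_sd)
        x.append(exogenous + sum(w * x[p] for p, w in PARENTS[j].items()))
    return x


def simulate(policy, horizon, delay, rng):
    """Play `horizon` rounds; each round's feedback arrives `delay` rounds later.

    Returns (round_received, round_played, observation, payoff) tuples.
    Semi-bandit feedback: only the chosen arms' rewards are revealed.
    Payoff: assumed here to be the plain sum of all arm rewards.
    """
    pending = deque()
    delivered = []
    for t in range(horizon):
        super_arm = policy(t)
        x = sample_rewards(super_arm, rng)
        observation = {j: x[j] for j in sorted(super_arm)}
        pending.append((t + delay, t, observation, sum(x)))
        # With a constant delay, arrival rounds are non-decreasing,
        # so a FIFO queue is enough to release feedback in order.
        while pending and pending[0][0] <= t:
            _, played, obs, payoff = pending.popleft()
            delivered.append((t, played, obs, payoff))
    return delivered


if __name__ == "__main__":
    rng = random.Random(0)
    events = simulate(lambda t: {0, 2}, horizon=10, delay=3, rng=rng)
    for received, played, obs, payoff in events[:3]:
        print(f"round {received}: feedback from round {played}: {obs}, payoff {payoff:.2f}")
```

A learning policy would replace the fixed `lambda t: {0, 2}` and update its estimates only from the delivered events. This is where delays hurt: during the first `delay` rounds the agent receives no feedback at all. Knowing the graph lets the agent infer how a chosen arm's reward propagates to arms it did not observe.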

Taxonomy

Topics: Advanced Bandit Algorithms Research · COVID-19 epidemiological studies

Methods: Balanced Selection