Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards
Saeed Ghoorchian, Setareh Maghsudi

TL;DR
This paper addresses the challenge of decision-making in non-stationary, delayed, combinatorial semi-bandit environments with causally related rewards, proposing a new algorithm that learns causal structures to improve long-term reward maximization.
Contribution
It introduces a novel policy that learns causal dependencies from delayed feedback in non-stationary settings, with proven regret bounds and empirical validation.
Findings
The proposed method effectively learns causal relations despite delays.
The algorithm adapts to environmental drifts while optimizing rewards.
Numerical analysis on real-world data demonstrates the method's practical utility.
Abstract
Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the learning agent's ability to identify the subset of arms with the optimal collective reward in the long run. The problem becomes substantially harder in a non-stationary environment with structural dependencies among the reward distributions of the arms. Therefore, besides adapting to delays and environmental changes, the agent should learn the causal relations, since doing so alleviates the adverse effects of feedback delay on the decision-making process. We formalize this setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms'…
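The setting described in the abstract can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' policy or environment. It assumes a linear structural equation model x = W^T x + ε over the base arms' rewards, where W[i, j] is the causal weight of arm i on arm j (a hypothetical convention). A single switch in the mean of the exogenous noise stands in for non-stationarity. The payoff is taken to be the sum of the selected arms' rewards, because the abstract's exact linear payoff is truncated. Delays are drawn uniformly up to a hypothetical `max_delay`, and a uniformly random placeholder policy chooses the super arm. The helper names `sem_rewards` and `simulate` are illustrative.

```python
import heapq

import numpy as np


def sem_rewards(W, noise):
    """Solve the linear SEM x = W^T x + noise for the reward vector x.

    W[i, j] is the (assumed) causal weight of arm i on arm j; when the
    graph is a DAG, I - W^T is invertible.
    """
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - W.T, noise)


def simulate(W, T, k, max_delay, seed=0):
    """Run T rounds of a delayed combinatorial semi-bandit with SEM rewards."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    pending = []   # min-heap of (arrival_round, sent_round, arms, rewards)
    received = []  # feedback delivered to the agent, in arrival order
    total_payoff = 0.0
    for t in range(T):
        # Hypothetical non-stationarity: the exogenous mean switches at T // 2.
        mean = np.ones(n) if t < T // 2 else np.linspace(1.0, 0.0, n)
        x = sem_rewards(W, mean + 0.1 * rng.standard_normal(n))
        # Placeholder policy: a uniformly random super arm of size k.
        arms = rng.choice(n, size=k, replace=False)
        # Assumed linear payoff: sum of the selected base arms' rewards.
        total_payoff += float(x[arms].sum())
        # Semi-bandit feedback (per-arm rewards) arrives after a random delay.
        delay = int(rng.integers(0, max_delay + 1))
        heapq.heappush(pending, (t + delay, t, arms.tolist(), x[arms].tolist()))
        # Deliver every observation whose arrival round has been reached.
        while pending and pending[0][0] <= t:
            received.append(heapq.heappop(pending))
    return total_payoff / T, received, pending


# Example: a 4-arm DAG with edges 0 -> 1 -> 2 and 0 -> 3.
W = np.zeros((4, 4))
W[0, 1], W[1, 2], W[0, 3] = 0.5, 0.5, 0.3
avg_payoff, received, pending = simulate(W, T=100, k=2, max_delay=5)
print(f"average payoff {avg_payoff:.3f}, delivered {len(received)}, in flight {len(pending)}")
```

With `max_delay=0`, every observation arrives in the round it is generated, so `pending` ends empty. Larger delays can leave up to `max_delay` observations in flight at the horizon. A learning policy would update its estimates of W and of the drifting means using only the `received` list, which is the information a delayed agent actually has.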
Taxonomy
Topics: Advanced Bandit Algorithms Research · COVID-19 epidemiological studies
Methods: Balanced Selection
