
TL;DR
This paper introduces the Chronological Causal Bandit (CCB), a model for sequential decision-making in dynamic causal systems where multiple causal bandits operate over time, sharing information to adapt to changing effects.
Contribution
The paper proposes the CCB framework, enabling transfer of information across causal bandits in a temporal setting with evolving causal effects.
Findings
The CCB is demonstrated on a toy problem, where early results suggest potential for dynamic causal decision-making.
The results highlight the importance of transferring information across time in causal bandit settings.
Abstract
This paper studies an instance of the multi-armed bandit (MAB) problem in which several causal MABs operate chronologically in the same dynamical system. In practice, the reward distribution of each bandit is governed by the same non-trivial dependence structure, which is a dynamic causal model. The model is dynamic because each causal MAB is allowed to depend on the preceding MAB, which makes it possible to transfer information between agents. Our contribution, the Chronological Causal Bandit (CCB), is useful in discrete decision-making settings where the causal effects change across time and can be informed by earlier interventions in the same system. In this paper, we present some early findings on the CCB, demonstrated on a toy problem.
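The abstract does not spell out the CCB algorithm, so the following is only a rough illustration of the general idea of passing information from one bandit to the next in a chain. It is not the authors' method: it ignores the causal graph entirely and swaps in plain Thompson sampling on Bernoulli arms, with each bandit's posterior handed to its successor as a discounted prior. The names `run_bandit`, `discount`, `run_chain`, and the discount factor `gamma` are all hypothetical and chosen for this sketch.

```python
import random


def run_bandit(true_probs, prior, n_rounds, rng):
    """Thompson sampling on Bernoulli arms (a stand-in, not the CCB itself).

    prior is a list of (alpha, beta) Beta parameters, one pair per arm.
    Returns the posterior in the same format.
    """
    post = [list(p) for p in prior]
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm and play the best draw.
        samples = [rng.betavariate(a, b) for a, b in post]
        arm = max(range(len(post)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        post[arm][0] += reward
        post[arm][1] += 1 - reward
    return [tuple(p) for p in post]


def discount(posterior, gamma):
    """Shrink pseudo-counts toward the uniform Beta(1, 1) prior.

    gamma = 0 discards everything learned; gamma = 1 keeps it all.
    This hand-off rule is an assumption made for illustration.
    """
    return [(1 + gamma * (a - 1), 1 + gamma * (b - 1)) for a, b in posterior]


def run_chain(probs_over_time, n_rounds, gamma, seed=0):
    """Run one bandit per time step, seeding each with its predecessor's
    discounted posterior."""
    rng = random.Random(seed)
    prior = [(1.0, 1.0)] * len(probs_over_time[0])
    history = []
    for true_probs in probs_over_time:
        posterior = run_bandit(true_probs, prior, n_rounds, rng)
        history.append(posterior)
        prior = discount(posterior, gamma)
    return history


if __name__ == "__main__":
    # Hypothetical toy system: arm effects drift between time steps.
    drift = [[0.2, 0.7], [0.3, 0.6], [0.6, 0.3]]
    for t, post in enumerate(run_chain(drift, n_rounds=200, gamma=0.5)):
        print(t, post)
```

Discounting is one simple way to hedge against drift: a later bandit starts from what its predecessor learned but can still overturn it if the effects have changed. A faithful CCB implementation would instead route the transfer through the dynamic causal model described in the abstract.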
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications
