Causal Markov Decision Processes: Learning Good Interventions Efficiently
Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

TL;DR
This paper introduces causal Markov Decision Processes (C-MDPs), a new framework combining causal structures with standard MDPs, and proposes algorithms that leverage causal knowledge to improve learning efficiency and regret bounds.
Contribution
The paper develops the C-UCBVI algorithm exploiting causal structures in C-MDPs, providing improved regret bounds that scale with causal graph properties rather than action space size.
Findings
C-UCBVI achieves regret bounds independent of action space size.
CF-UCBVI extends C-UCBVI to factored MDPs, where the regret is reduced exponentially with respect to the state space size.
Empirical results validate the theoretical advantages of causal algorithms.
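To make the first finding concrete, here is a minimal toy sketch, not the paper's C-UCBVI algorithm. It assumes a single-step (bandit-like) problem in which the reward depends only on a few "parent" variables of the intervention, and that the learner knows which ones. The names `N_VARS`, `PARENTS`, `parent_config`, and `run` are hypothetical and chosen only for illustration. The point is that a UCB-style learner keeps one statistic per parent configuration rather than one per action.

```python
import itertools
import math
import random

N_VARS = 8           # hypothetical: binary variables the learner can set
PARENTS = (0, 3, 7)  # hypothetical: the variables that actually affect the reward


def parent_config(action):
    """Project a full intervention onto the reward's parent variables."""
    return tuple(action[i] for i in PARENTS)


def run(num_rounds=500, seed=0):
    rng = random.Random(seed)
    # Every joint assignment is an action: 2 ** N_VARS of them.
    actions = list(itertools.product((0, 1), repeat=N_VARS))
    # Unknown Bernoulli mean per parent configuration: 2 ** len(PARENTS) of them.
    true_mean = {
        z: rng.random()
        for z in itertools.product((0, 1), repeat=len(PARENTS))
    }
    counts = {}  # pulls per parent configuration
    sums = {}    # total reward per parent configuration

    for t in range(1, num_rounds + 1):
        def index(action):
            z = parent_config(action)
            n = counts.get(z, 0)
            if n == 0:
                return float("inf")  # try every parent configuration once
            # Empirical mean plus a simple exploration bonus.
            return sums[z] / n + math.sqrt(2.0 * math.log(t) / n)

        action = max(actions, key=index)
        z = parent_config(action)
        reward = 1.0 if rng.random() < true_mean[z] else 0.0
        # Actions sharing the same parent configuration share this update.
        counts[z] = counts.get(z, 0) + 1
        sums[z] = sums.get(z, 0.0) + reward

    return len(actions), len(counts)


if __name__ == "__main__":
    num_actions, num_stats = run()
    print(f"actions: {num_actions}, statistics maintained: {num_stats}")
```

With these toy settings there are 256 actions but only 8 parent configurations, so the learner estimates 8 means instead of 256. This mirrors, under the stated assumptions, why a bound can depend on a causal-graph quantity instead of the action count.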
Abstract
We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and emerging application areas such as digital healthcare and digital marketing can benefit from modeling with C-MDPs due to the causal mechanisms underlying the relationship between interventions and states/rewards. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm that exploits the causal structure in C-MDPs and improves the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. We prove that C-UCBVI satisfies an $\tilde{O}(HS\sqrt{ZT})$ regret bound, where $T$ is the total number of time steps, $H$ is the episodic horizon, and $S$ is the cardinality of the state space. Notably, our…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Advanced Causal Inference Techniques
