Causal Reinforcement Learning using Observational and Interventional Data
Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer

TL;DR
This paper introduces a causal inference framework for reinforcement learning that combines observational and interventional data, improving model learning and performance in POMDPs with hidden information.
Contribution
It proposes a novel latent-based causal transition model that integrates offline observational data with online interventional data, leveraging do-calculus principles.
Findings
The method is proven correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline data (in the asymptotic case).
Empirical results on synthetic problems demonstrate effectiveness.
Abstract
Learning efficiently a causal model of the environment is a key challenge of model-based RL agents operating in POMDPs. We consider here a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment (interventional data), but also has access to a large collection of offline experiences, obtained by observing another agent interacting with the environment (observational data). A key ingredient that makes this situation non-trivial is that we allow the observed agent to interact with the environment based on hidden information, which is not observed by the learning agent. We then ask the following questions: can the online and offline experiences be safely combined for learning a causal model? And can we expect the offline experiences to improve the agent's performance? To answer these questions, we import ideas…
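To make the role of the hidden information concrete, here is a minimal sketch of a one-step toy problem in which the observed agent chooses its action from a hidden context u that the learning agent never sees. It is not the paper's model or method: the probability tables and the function names (observational_reward, interventional_reward) are hypothetical and chosen only for illustration. The sketch computes, in closed form, the reward estimate that naive counting over offline (action, reward) pairs would converge to, and the reward the learner actually gets when it picks the action itself.

```python
# Toy confounded one-step problem. All numbers and names are hypothetical
# illustrations, not taken from the paper.
P_U = [0.5, 0.5]              # P(u): hidden context, never seen by the learner
P_A_GIVEN_U = [[0.9, 0.1],    # P(a | u): the observed agent's policy depends on u
               [0.1, 0.9]]
P_R_GIVEN_UA = [[0.8, 0.2],   # P(r=1 | u, a): reward depends on context and action
                [0.3, 0.9]]


def observational_reward(a):
    """P(r=1 | a): the limit of naive counting on offline (a, r) pairs."""
    joint = sum(P_U[u] * P_A_GIVEN_U[u][a] * P_R_GIVEN_UA[u][a] for u in range(2))
    marginal = sum(P_U[u] * P_A_GIVEN_U[u][a] for u in range(2))
    return joint / marginal


def interventional_reward(a):
    """P(r=1 | do(a)): the learner sets a itself, so a carries no information about u."""
    return sum(P_U[u] * P_R_GIVEN_UA[u][a] for u in range(2))


if __name__ == "__main__":
    for a in range(2):
        print(a, round(observational_reward(a), 3), round(interventional_reward(a), 3))
```

In this toy setup both actions are equally good under intervention, yet the offline data makes each of them look better than it is, and one look better than the other. This gap between P(r | a) and P(r | do(a)) is why observational and interventional experiences cannot simply be pooled as if they came from the same distribution.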
Taxonomy
Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Machine Learning and Algorithms
