Deconfounding Reinforcement Learning in Observational Settings
Chaochao Lu, Bernhard Schölkopf, José Miguel Hernández-Lobato

TL;DR
This paper formulates reinforcement learning from observational data in which unobserved confounders affect both actions and rewards, extends the Actor-Critic method to a deconfounding variant, and provides a new benchmark for evaluating such algorithms in confounded environments.
Contribution
To the authors' knowledge, it presents the first framework for deconfounding in RL with observational data, and it develops a new benchmark for evaluating such algorithms.
Findings
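On the proposed benchmark, deconfounding RL algorithms outperform traditional RL methods in confounded environments with observational data.
The Actor-Critic method is extended to a deconfounding variant, and the same methodology can be applied to other RL algorithms.
A new benchmark is built by modifying the OpenAI Gym environments and the MNIST dataset.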
Abstract
We propose a general formulation for addressing reinforcement learning (RL) problems in settings with observational data. That is, we consider the problem of learning good policies solely from historical data in which unobserved factors (confounders) affect both observed actions and rewards. Our formulation allows us to extend a representative RL algorithm, the Actor-Critic method, to its deconfounding variant, with the methodology for this extension being easily applied to other RL algorithms. In addition to this, we develop a new benchmark for evaluating deconfounding RL algorithms by modifying the OpenAI Gym environments and the MNIST dataset. Using this benchmark, we demonstrate that the proposed algorithms are superior to traditional RL methods in confounded environments with observational data. To the best of our knowledge, this is the first time that confounders are taken into…
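The confounding problem described in the abstract can be illustrated with a small one-step simulation. This is a minimal sketch and not the authors' method: the variable names, probabilities, and reward function are all hypothetical, and the adjustment step assumes the confounder u is recorded, whereas in the paper's setting it is unobserved. By construction, action 1 lowers the reward by 0.5, but the hidden factor u makes action 1 look better in the logged data (the naive difference is about 1.1 in expectation).

```python
import random


def simulate(n=20000, seed=0):
    """Generate logged (u, a, r) tuples from a confounded behavior policy.

    All numbers here are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Hidden confounder, e.g. an unrecorded property of the context.
        u = 1 if rng.random() < 0.5 else 0
        # The behavior policy depends on u: action 1 is chosen mostly when u == 1.
        p_a1 = 0.9 if u == 1 else 0.1
        a = 1 if rng.random() < p_a1 else 0
        # The reward depends on u as well; the true effect of action 1 is -0.5.
        r = 2.0 * u - 0.5 * a + rng.gauss(0.0, 0.1)
        data.append((u, a, r))
    return data


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(data):
    """Difference in mean reward between actions, ignoring the confounder."""
    r1 = [r for _, a, r in data if a == 1]
    r0 = [r for _, a, r in data if a == 0]
    return mean(r1) - mean(r0)


def adjusted_effect(data):
    """Stratify on u and average the per-stratum differences, weighted by P(u).

    This requires u to be observed, which is an assumption of this sketch only.
    """
    n = len(data)
    effect = 0.0
    for u_val in (0, 1):
        stratum = [(a, r) for u, a, r in data if u == u_val]
        r1 = [r for a, r in stratum if a == 1]
        r0 = [r for a, r in stratum if a == 0]
        effect += (len(stratum) / n) * (mean(r1) - mean(r0))
    return effect


data = simulate()
print("naive estimate:   ", naive_effect(data))
print("adjusted estimate:", adjusted_effect(data))
```

A policy learned from the naive estimate would prefer action 1 even though it is the worse action, which is the kind of failure a deconfounding RL algorithm is meant to avoid when the confounder cannot be read directly from the data.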
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques
