Deconfounding Reinforcement Learning in Observational Settings

Chaochao Lu; Bernhard Schölkopf; José Miguel Hernández-Lobato

arXiv:1812.10576 · cs.LG · December 31, 2018 · 34 cites

TL;DR

This paper introduces a deconfounding approach to reinforcement learning from observational data, where unobserved confounders affect both the observed actions and the rewards. It extends the Actor-Critic method to a deconfounding variant and provides a new benchmark for evaluating such algorithms in confounded environments.

Contribution

To the authors' knowledge, the paper presents the first RL framework that accounts for confounders in observational data, and it develops a new benchmark for evaluating deconfounding RL algorithms.

Findings

01

The proposed deconfounding RL algorithms outperform traditional RL methods in confounded environments with observational data.

02

The Actor-Critic method is extended to a deconfounding variant, and the same methodology can be applied to other RL algorithms.

03

A new benchmark is built by modifying the OpenAI Gym environments and the MNIST dataset.

Abstract

We propose a general formulation for addressing reinforcement learning (RL) problems in settings with observational data. That is, we consider the problem of learning good policies solely from historical data in which unobserved factors (confounders) affect both observed actions and rewards. Our formulation allows us to extend a representative RL algorithm, the Actor-Critic method, to its deconfounding variant, with the methodology for this extension being easily applied to other RL algorithms. In addition to this, we develop a new benchmark for evaluating deconfounding RL algorithms by modifying the OpenAI Gym environments and the MNIST dataset. Using this benchmark, we demonstrate that the proposed algorithms are superior to traditional RL methods in confounded environments with observational data. To the best of our knowledge, this is the first time that confounders are taken into…
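To make the setting in the abstract concrete, the following is a minimal sketch of how an unobserved confounder can bias reward estimates drawn from historical data. It is not the paper's algorithm: the one-step environment, the numbers, and the function names (`simulate`, `naive_effect`, `adjusted_effect`) are all assumptions chosen for illustration. By construction, action 1 lowers the reward by 0.3, but the logging policy picks action 1 mostly when the confounder `u` is favourable, so the naive comparison makes action 1 look beneficial (about +0.5 in expectation). The adjusted estimate stratifies on the true `u`, which is an oracle shortcut; in the setting the abstract describes, `u` is not observed and would have to be inferred.

```python
import random


def simulate(n, seed=0):
    """Generate (u, a, r) tuples from a hypothetical confounded one-step environment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hidden confounder: present half of the time.
        u = 1 if rng.random() < 0.5 else 0
        # The logging policy prefers action 1 when u == 1 (assumed probabilities).
        p_a1 = 0.9 if u == 1 else 0.1
        a = 1 if rng.random() < p_a1 else 0
        # Reward: u helps a lot, action 1 hurts a little, plus small Gaussian noise.
        r = 1.0 * u - 0.3 * a + rng.gauss(0.0, 0.1)
        rows.append((u, a, r))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(rows):
    """Difference in mean reward between actions, ignoring the confounder."""
    r1 = mean([r for _, a, r in rows if a == 1])
    r0 = mean([r for _, a, r in rows if a == 0])
    return r1 - r0


def adjusted_effect(rows):
    """Backdoor adjustment: average the within-stratum effect, weighted by P(u).

    Uses the true u, which a learner in the observational setting would not have.
    """
    effect = 0.0
    for u_val in (0, 1):
        stratum = [(a, r) for u, a, r in rows if u == u_val]
        weight = len(stratum) / len(rows)
        r1 = mean([r for a, r in stratum if a == 1])
        r0 = mean([r for a, r in stratum if a == 0])
        effect += weight * (r1 - r0)
    return effect


rows = simulate(20000)
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive estimate of the effect of action 1:    {naive:+.3f}")
print(f"adjusted estimate of the effect of action 1: {adjusted:+.3f}")
```

A policy learned from the naive estimate would prefer action 1 even though it is the worse choice in every stratum, which is the failure mode that motivates deconfounding.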

Code & Models

Repositories

CausalRL/DRL
TensorFlow · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques