Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

Andrew Bennett; Nathan Kallus

arXiv:2110.15332 · cs.LG · March 24, 2023

TL;DR

This paper introduces proximal reinforcement learning (PRL), a method for off-policy evaluation in partially observed Markov decision processes. It addresses the confounding bias that arises in observational data from domains such as healthcare and education.

Contribution

The paper extends the proximal causal inference framework to POMDPs, gives conditions under which the target policy value is identified from the observed data, and constructs efficient estimators of that value.

Findings

01 PRL outperforms existing methods in simulations.

02 PRL provides accurate policy value estimates in healthcare data.

03 The framework is applicable to real-world observational datasets.

Abstract

In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we tackle this by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, we consider estimating the value of a given target policy in a POMDP given trajectories with only partial state observations generated by a different and unknown policy that may depend on the unobserved state. We tackle two questions: what conditions allow us to identify the target policy value from the observed data and, given identification, how to best estimate it. To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a…
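
The confounding problem described in the abstract can be illustrated with a small worked example. The sketch below is not the paper's estimator. It is a hypothetical one-step setting, with made-up probabilities and a made-up reward function, in which the behavior policy depends on an unobserved binary state `U`. It computes exactly (no sampling) the true value of a target policy that always takes action 1, and compares it with the conditional mean reward given that action, which is the quantity an analysis assuming no unobserved confounding would converge to.

```python
# Hypothetical one-step example; every number below is an illustrative assumption.
P_U = {0: 0.5, 1: 0.5}           # distribution of the unobserved state U
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}  # behavior policy: P(A = 1 | U), depends on U


def reward(u, a):
    """Assumed mean reward: the unobserved state and the action both contribute."""
    return u + 0.5 * a


def behavior_prob(a, u):
    """P(A = a | U = u) under the assumed behavior policy."""
    p1 = P_A1_GIVEN_U[u]
    return p1 if a == 1 else 1.0 - p1


def true_value(a):
    """Value of the policy 'always take action a': averages over U, ignoring the behavior policy."""
    return sum(P_U[u] * reward(u, a) for u in P_U)


def naive_value(a):
    """E[R | A = a] in the logged data: the limit of an estimate that ignores U."""
    num = sum(P_U[u] * behavior_prob(a, u) * reward(u, a) for u in P_U)
    den = sum(P_U[u] * behavior_prob(a, u) for u in P_U)
    return num / den


if __name__ == "__main__":
    print("true value of always-1 policy:", true_value(1))
    print("naive (confounded) estimate:  ", naive_value(1))
```

In this toy setting the naive quantity overstates the target policy's value, because action 1 is logged mostly when `U = 1`, which is also when the reward is high. Identifying the true value without observing `U` is the question the paper addresses.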

Code & Models

Repositories

causalml/proximalrl · Official

Taxonomy

Topics: Advanced Causal Inference Techniques · Hemodynamic Monitoring and Therapy · Health Systems, Economic Evaluations, Quality of Life