Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Nathan Kallus; Masatoshi Uehara

arXiv:1908.08526 · cs.LG · June 8, 2020 · 49 cites

TL;DR

This paper introduces a double reinforcement learning (DRL) estimator for off-policy evaluation in Markov decision processes that achieves semiparametric efficiency and double robustness by exploiting the memoryless (Markov) structure of the problem.

Contribution

The paper develops the first semiparametrically efficient estimator for OPE in MDPs. It combines cross-fold estimation of q-functions and marginalized density ratios, and the authors prove both its efficiency and its double robustness.
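As a reading aid, the finite-horizon form of the DRL estimator can be written as below. This is a sketch in assumed notation (horizon $T$, behavior policy $\pi^b$, target policy $\pi^e$, per-step marginal state-action distributions $p^{\pi}_t$, $q_t$ the q-function of $\pi^e$ at step $t$, folds $\mathcal{I}_1, \dots, \mathcal{I}_K$); consult the paper for its exact definitions and any discounting.

```latex
\mu_t(s,a) = \frac{p^{\pi^e}_t(s,a)}{p^{\pi^b}_t(s,a)}, \qquad
v_t(s) = \sum_{a} \pi^e_t(a \mid s)\, q_t(s,a), \qquad \mu_{-1} \equiv 1,

\hat\rho_{\mathrm{DRL}}
  = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in \mathcal{I}_k} \sum_{t=0}^{T}
    \Big[ \hat\mu^{(k)}_t(s_{i,t}, a_{i,t}) \big( r_{i,t} - \hat q^{(k)}_t(s_{i,t}, a_{i,t}) \big)
        + \hat\mu^{(k)}_{t-1}(s_{i,t-1}, a_{i,t-1})\, \hat v^{(k)}_t(s_{i,t}) \Big]
```

Here $\hat q^{(k)}$ and $\hat\mu^{(k)}$ are fitted only on trajectories outside fold $\mathcal{I}_k$. Regrouping the sum gives the plug-in value $\hat v_0(s_0)$ plus density-ratio-weighted Bellman residuals $\hat\mu_t\,(r_t + \hat v_{t+1}(s_{t+1}) - \hat q_t)$, which is why an error in either nuisance alone cancels in expectation.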

Findings

01

The DRL estimator is semiparametrically efficient when both the q-function and the marginalized density ratio are estimated at fourth-root rates.

02

DRL is doubly robust: it remains consistent when only one of the two components is consistently estimated.

03

Experiments demonstrate performance gains from harnessing memorylessness.
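The double robustness finding can be checked exactly on a small example. The sketch below uses a hypothetical tabular MDP (3 states, 2 actions, horizon 4, action 0 always pays 1 and action 1 pays 0, uniform behavior policy, target policy choosing action 0 with probability 0.8) and computes the behavior-policy expectation of the finite-horizon DRL estimating term in closed form rather than from samples. All names are illustrative and are not taken from the authors' repository.

```python
import numpy as np

S, A, H = 3, 2, 4
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))      # P[s, a, s'] (hypothetical transitions)
R = np.tile([1.0, 0.0], (S, 1))                 # mean reward: action 0 pays 1, action 1 pays 0
d0 = np.full(S, 1.0 / S)                        # initial state distribution
pi_b = np.full((H, S, A), 0.5)                  # behavior policy (uniform)
pi_e = np.tile([0.8, 0.2], (H, S, 1))           # target policy


def marginals(pi):
    """p[t, s, a] = Pr(s_t = s, a_t = a) when actions follow pi."""
    p, ds = np.zeros((H, S, A)), d0.copy()
    for t in range(H):
        p[t] = ds[:, None] * pi[t]
        ds = np.einsum("sa,sap->p", p[t], P)    # next-state marginal
    return p


def true_q():
    """Exact q_t(s, a) of pi_e by backward induction."""
    q, v_next = np.zeros((H, S, A)), np.zeros(S)
    for t in reversed(range(H)):
        q[t] = R + P @ v_next                   # (S, A, S) @ (S,) -> (S, A)
        v_next = (pi_e[t] * q[t]).sum(axis=1)
    return q


def expected_drl(q, mu):
    """Closed-form E_b[sum_t mu_t (r_t - q_t) + mu_{t-1} v_t(s_t)] with mu_{-1} = 1."""
    pb = marginals(pi_b)
    v = (pi_e * q).sum(axis=2)                  # v[t, s] = sum_a pi_e(a|s) q_t(s, a)
    total = d0 @ v[0]                           # t = 0 term: mu_{-1} v_0(s_0)
    for t in range(H):
        total += (pb[t] * mu[t] * (R - q[t])).sum()
        if t + 1 < H:                           # mu_t v_{t+1}(s_{t+1}), averaged over P
            total += (pb[t] * mu[t] * (P @ v[t + 1])).sum()
    return total


q_star = true_q()
mu_star = marginals(pi_e) / marginals(pi_b)     # true marginal density ratio
q_bad = rng.normal(size=(H, S, A))              # arbitrary wrong q-function
mu_bad = rng.uniform(0.1, 3.0, size=(H, S, A))  # arbitrary wrong density ratio

print(round(expected_drl(q_star, mu_bad), 6))   # q correct, ratio wrong: 3.2
print(round(expected_drl(q_bad, mu_star), 6))   # ratio correct, q wrong: 3.2
print(round(expected_drl(np.zeros((H, S, A)), np.ones((H, S, A))), 6))  # both wrong: 2.0
```

With the correct q-function the density ratio can be arbitrary (and vice versa) without moving the expectation off the target value 0.8 × 4 = 3.2. When both are wrong (here q = 0 and ratio = 1) the expectation drifts, in this case to the behavior policy's value 0.5 × 4 = 2.0.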

Abstract

Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of $q$-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness.
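The cross-fold construction described in the abstract can be sketched on a toy problem. The block below is a minimal, hypothetical illustration and not the authors' code: it assumes a tabular finite-horizon MDP with a known behavior policy, fits q-functions by tabular fitted-Q evaluation and marginal density ratios by recursively reweighting empirical next-state frequencies (both are swapped-in choices), and averages the out-of-fold estimating terms sum_t [mu_t (r_t - q_t) + mu_{t-1} v_t(s_t)].

```python
import numpy as np

# Hypothetical tabular finite-horizon MDP, for illustration only.
S, A, H = 3, 2, 4
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a, s'] transition probabilities
R = rng.uniform(size=(S, A))                # mean reward r(s, a)
d0 = np.full(S, 1.0 / S)                    # initial state distribution
pi_b = np.full((H, S, A), 0.5)              # behavior policy, assumed known
pi_e = np.tile([0.8, 0.2], (H, S, 1))       # target policy to evaluate


def draw(probs, rng):
    """One categorical draw per row of the (n, K) probability array `probs`."""
    u = rng.random((probs.shape[0], 1))
    idx = (probs.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(idx, probs.shape[1] - 1)


def simulate(n, rng):
    """n behavior-policy trajectories: states (n, H+1), actions and rewards (n, H)."""
    s, a, r = np.zeros((n, H + 1), int), np.zeros((n, H), int), np.zeros((n, H))
    s[:, 0] = draw(np.tile(d0, (n, 1)), rng)
    for t in range(H):
        a[:, t] = draw(pi_b[t, s[:, t]], rng)
        r[:, t] = R[s[:, t], a[:, t]] + rng.normal(scale=0.1, size=n)
        s[:, t + 1] = draw(P[s[:, t], a[:, t]], rng)
    return s, a, r


def fit_q(s, a, r):
    """Tabular fitted-Q evaluation: average r_t + v_{t+1}(s_{t+1}) per (s, a), backward in t."""
    q, v_next = np.zeros((H, S, A)), np.zeros(S)
    for t in reversed(range(H)):
        y = r[:, t] + v_next[s[:, t + 1]]
        total, count = np.zeros((S, A)), np.zeros((S, A))
        np.add.at(total, (s[:, t], a[:, t]), y)
        np.add.at(count, (s[:, t], a[:, t]), 1.0)
        q[t] = np.divide(total, count, out=np.zeros((S, A)), where=count > 0)
        v_next = (pi_e[t] * q[t]).sum(axis=1)
    return q


def fit_mu(s, a):
    """mu_t(s, a) = w_t(s) * pi_e / pi_b, with state ratios w_t estimated recursively."""
    mu, w = np.zeros((H, S, A)), np.ones(S)  # w_0 = 1: both policies share d0
    for t in range(H):
        mu[t] = w[:, None] * pi_e[t] / pi_b[t]
        weights = mu[t][s[:, t], a[:, t]]
        num = np.bincount(s[:, t + 1], weights=weights, minlength=S)
        den = np.bincount(s[:, t + 1], minlength=S)
        w = np.divide(num, den, out=np.zeros(S), where=den > 0)
    return mu


def drl_terms(s, a, r, q, mu):
    """Per-trajectory sum_t [mu_t (r_t - q_t) + mu_{t-1} v_t(s_t)], with mu_{-1} = 1."""
    v = (pi_e * q).sum(axis=2)               # v[t, s] = sum_a pi_e(a|s) q_t(s, a)
    psi, mu_prev = np.zeros(len(r)), np.ones(len(r))
    for t in range(H):
        mu_t = mu[t][s[:, t], a[:, t]]
        psi += mu_t * (r[:, t] - q[t][s[:, t], a[:, t]]) + mu_prev * v[t][s[:, t]]
        mu_prev = mu_t
    return psi


def drl_estimate(s, a, r, k=2, seed=0):
    """Cross-fitting: nuisances for each fold are fitted on the remaining folds."""
    folds = np.random.default_rng(seed).permutation(len(r)) % k
    psi = np.zeros(len(r))
    for j in range(k):
        train, test = folds != j, folds == j
        q, mu = fit_q(s[train], a[train], r[train]), fit_mu(s[train], a[train])
        psi[test] = drl_terms(s[test], a[test], r[test], q, mu)
    return psi.mean()


def true_value():
    """Exact value of pi_e by backward induction on the known model."""
    v = np.zeros(S)
    for t in reversed(range(H)):
        v = (pi_e[t] * (R + P @ v)).sum(axis=1)
    return d0 @ v


s, a, r = simulate(10_000, np.random.default_rng(2))
print(drl_estimate(s, a, r), true_value())  # the two numbers should be close
```

The tabular fits are only placeholders; any regressors could fill the two nuisance roles, and the abstract's efficiency guarantee asks that both converge at fourth-root rates.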

Code & Models

Repositories

CausalML/DoubleReinforcementLearningMDP (PyTorch, official)

Taxonomy

Topics: Advanced Causal Inference Techniques · Reinforcement Learning in Robotics · Gene Regulatory Network Analysis