Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

TL;DR
This paper presents a comprehensive empirical benchmark for off-policy policy evaluation in reinforcement learning, emphasizing diverse experimental designs and providing practical guidelines for real-world applications.
Contribution
It introduces the Caltech OPE Benchmarking Suite (COBS), a standardized platform for stress testing OPE methods and analyzing their performance across various scenarios.
Findings
Diverse experimental setups reveal strengths and weaknesses of different OPE methods.
The results are distilled into guidelines that help practitioners select appropriate OPE techniques.
The open-source COBS package supports reproducibility and further research on OPE.
Abstract
We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work takes a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced and we invite interested researchers to further contribute to the benchmark.
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Reliability and Analysis Research · Formal Methods in Verification
