Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

TL;DR
This paper introduces a comprehensive benchmark for off-policy evaluation in reinforcement learning, providing challenging tasks, datasets, and policies to standardize progress measurement and facilitate comparison of OPE methods.
Contribution
The paper presents a unified benchmark of diverse, high-dimensional control tasks and policies, addressing the lack of standardized evaluation tools in off-policy evaluation research.
Findings
State-of-the-art OPE algorithms are evaluated on the new benchmark
The benchmark includes challenging high-dimensional control tasks
Open-source datasets and code are provided for future research
Abstract
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many real-world domains, such as in healthcare, recommender systems, or robotics, where online data collection is an expensive and potentially dangerous process. Being able to accurately evaluate and select high-performing policies without requiring online interaction could yield significant benefits in safety, time, and cost for these applications. While many OPE methods have been proposed in recent years, comparing results between papers is difficult because currently there is a lack of a comprehensive and unified benchmark, and measuring algorithmic progress has been challenging due to the lack of difficult evaluation tasks. In order to address this gap, we…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
