Benchmarks for Deep Off-Policy Evaluation

Justin Fu; Mohammad Norouzi; Ofir Nachum; George Tucker; Ziyu Wang; Alexander Novikov; Mengjiao Yang; Michael R. Zhang; Yutian Chen; Aviral Kumar; Cosmin Paduraru; Sergey Levine; Tom Le Paine

arXiv:2103.16596 · cs.LG · April 1, 2021 · 25 cites

TL;DR

This paper introduces a comprehensive benchmark for off-policy evaluation in reinforcement learning, providing challenging tasks, datasets, and policies to standardize progress measurement and facilitate comparison of OPE methods.

Contribution

It presents a unified benchmark with diverse high-dimensional control tasks and policies, addressing the lack of standardized evaluation tools in off-policy evaluation research.
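One way to compare OPE methods on a benchmark like this is to score each method's value estimates against the ground-truth values of the policies being evaluated. The sketch below computes three common comparison metrics from a list of estimated and true policy values: mean absolute error, Spearman rank correlation, and regret@k. The function names and the no-ties simplification in the ranking helper are assumptions for illustration. This is not the benchmark's released code.

```python
from typing import List, Sequence


def absolute_error(estimates: Sequence[float], true_values: Sequence[float]) -> float:
    """Mean absolute difference between estimated and true policy values."""
    return sum(abs(e - t) for e, t in zip(estimates, true_values)) / len(true_values)


def _ranks(values: Sequence[float]) -> List[int]:
    """Rank of each entry (0 = smallest). Simplifying assumption: no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def spearman_rank_correlation(estimates: Sequence[float], true_values: Sequence[float]) -> float:
    """Spearman's rho via the no-ties formula 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(true_values)
    d_squared = sum((a - b) ** 2 for a, b in zip(_ranks(estimates), _ranks(true_values)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def regret_at_k(estimates: Sequence[float], true_values: Sequence[float], k: int) -> float:
    """Best true value overall minus the best true value among the top-k policies by estimate."""
    top_k = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:k]
    return max(true_values) - max(true_values[i] for i in top_k)
```

Take four policies whose true values are `[1.0, 2.0, 3.0, 4.0]` and whose estimates are `[1.5, 1.0, 3.5, 3.0]`. For this input, the sketch gives an absolute error of 0.75, a rank correlation of 0.6, and a regret@1 of 1.0. Rank correlation and regret@k matter most for policy selection, where only the ordering of the estimates decides which policy gets picked, not their scale.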

Findings

01 · State-of-the-art OPE algorithms are evaluated on the new benchmark

02 · The benchmark includes challenging high-dimensional control tasks

03 · Open-source datasets and code are provided for future research

Abstract

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many real-world domains, such as in healthcare, recommender systems, or robotics, where online data collection is an expensive and potentially dangerous process. Being able to accurately evaluate and select high-performing policies without requiring online interaction could yield significant benefits in safety, time, and cost for these applications. While many OPE methods have been proposed in recent years, comparing results between papers is difficult because currently there is a lack of a comprehensive and unified benchmark, and measuring algorithmic progress has been challenging due to the lack of difficult evaluation tasks. In order to address this gap, we…
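The sketch below makes "evaluating a policy from an offline dataset" concrete. It implements ordinary (trajectory-wise) importance sampling, one of the simplest classic OPE estimators. The estimator reweights each logged return by how much more or less likely the target policy was to take the logged actions than the behavior policy that collected the data. The data layout (tuples of state, action, reward, and behavior-policy probability), the `target_prob` callback, and the default discount are assumptions for illustration. This is not code from the paper or its repositories.

```python
from typing import Callable, Sequence, Tuple

# One logged step (assumed layout): (state, action, reward, behavior policy's probability of that action).
Step = Tuple[object, object, float, float]


def importance_sampling_estimate(
    trajectories: Sequence[Sequence[Step]],
    target_prob: Callable[[object, object], float],
    gamma: float = 0.99,
) -> float:
    """Trajectory-wise importance sampling estimate of the target policy's discounted return."""
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0  # product of target/behavior probability ratios along the trajectory
        discounted_return = 0.0
        for t, (state, action, reward, behavior_prob) in enumerate(trajectory):
            weight *= target_prob(state, action) / behavior_prob
            discounted_return += (gamma ** t) * reward
        total += weight * discounted_return
    return total / len(trajectories)
```

As an example, consider a one-step problem. The behavior policy picks actions 0 and 1 uniformly, and only action 1 earns a reward of 1. A target policy that always picks action 1 then gets an estimate of 1.0 from the two logged trajectories `[(None, 0, 0.0, 0.5)]` and `[(None, 1, 1.0, 0.5)]`.

The weight is a product over every step, so its variance tends to grow quickly with horizon length. Per-decision weighting and doubly robust estimators are common lower-variance alternatives.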

Videos

Benchmarks for Deep Off-Policy Evaluation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms