Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators

Ori Linial; Guy Tennenholtz; Uri Shalit

arXiv:2407.00806·cs.LG·July 2, 2024

Open Access

TL;DR

This paper introduces benchmarks for offline reinforcement learning that pair biased offline data with imperfect simulators, targeting challenges such as simulator modeling error and partial observability in order to guide future research.

Contribution

The paper presents B4MRL, a set of dataset-simulator benchmarks designed to evaluate RL methods under realistic conditions involving imperfect simulators and biased offline data.

Findings

1. Benchmarks highlight the importance of addressing simulator errors.

2. Results show current methods struggle with partial observability.

3. Benchmarks facilitate future research in realistic offline RL scenarios.

Abstract

In many reinforcement learning (RL) applications one cannot easily let the agent act in the world; this is true for autonomous vehicles, healthcare applications, and even some recommender systems, to name a few examples. Offline RL provides a way to train agents without real-world exploration, but is often faced with biases due to data distribution shifts, limited coverage, and incomplete representation of the environment. To address these issues, practical applications have tried to combine simulators with grounded offline data, using so-called hybrid methods. However, constructing a reliable simulator is in itself often challenging due to intricate system complexities as well as missing or incomplete information. In this work, we outline four principal challenges for combining offline data with imperfect simulators in RL: simulator modeling error, partial observability, state and…

Taxonomy

Topics: Reinforcement Learning in Robotics