Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators
Ori Linial, Guy Tennenholtz, Uri Shalit

TL;DR
This paper introduces benchmarks for offline reinforcement learning that combine biased offline data with imperfect simulators. The benchmarks target key challenges such as simulator modeling error and partial observability, with the aim of guiding future research.
Contribution
It presents B4MRL, a set of dataset-simulator benchmarks designed to evaluate RL methods under realistic challenges involving imperfect simulators and biased offline data.
Findings
Benchmarks highlight the importance of addressing simulator errors.
Results show current methods struggle with partial observability.
Benchmarks facilitate future research in realistic offline RL scenarios.
Abstract
In many reinforcement learning (RL) applications one cannot easily let the agent act in the world; this is true for autonomous vehicles, healthcare applications, and even some recommender systems, to name a few examples. Offline RL provides a way to train agents without real-world exploration, but is often faced with biases due to data distribution shifts, limited coverage, and incomplete representation of the environment. To address these issues, practical applications have tried to combine simulators with grounded offline data, using so-called hybrid methods. However, constructing a reliable simulator is in itself often challenging due to intricate system complexities as well as missing or incomplete information. In this work, we outline four principal challenges for combining offline data with imperfect simulators in RL: simulator modeling error, partial observability, state and…
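The abstract mentions hybrid methods only briefly. One simple form of hybrid training draws each training batch partly from logged offline transitions and partly from simulator rollouts. This form is assumed here for illustration and is not taken from the paper or from B4MRL. The mixing ratio trades the offline data's bias and limited coverage against the simulator's modeling error. The names `Transition`, `sample_hybrid_batch`, and `offline_fraction` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


# Hypothetical transition record; "source" marks where the sample came from.
@dataclass(frozen=True)
class Transition:
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]
    source: str  # "offline" (logged data) or "sim" (simulator rollout)


def sample_hybrid_batch(
    offline: List[Transition],
    sim: List[Transition],
    batch_size: int,
    offline_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Hypothetical helper: mix logged and simulated transitions in one batch."""
    if not 0.0 <= offline_fraction <= 1.0:
        raise ValueError("offline_fraction must be in [0, 1]")
    n_offline = round(batch_size * offline_fraction)
    n_sim = batch_size - n_offline
    # Sample with replacement so a small offline dataset can still fill its share.
    batch = rng.choices(offline, k=n_offline) + rng.choices(sim, k=n_sim)
    # Shuffle so the learner does not see the two sources in fixed blocks.
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    offline = [Transition((0.0,), 0, 1.0, (1.0,), "offline") for _ in range(5)]
    sim = [Transition((0.0,), 1, 0.0, (0.5,), "sim") for _ in range(50)]
    batch = sample_hybrid_batch(offline, sim, batch_size=32, offline_fraction=0.25, rng=rng)
    print(len(batch), sum(t.source == "offline" for t in batch))  # 32 8
```

Raising `offline_fraction` leans more on grounded but biased data. Lowering it leans more on the simulator, whose errors then have more influence on what the agent learns.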
Taxonomy
Topics: Reinforcement Learning in Robotics
