MOReL: Model-Based Offline Reinforcement Learning
Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims

TL;DR
MOReL introduces a model-based offline RL framework that learns a pessimistic MDP to ensure safe policy learning, achieving near-optimal performance with theoretical guarantees and strong empirical results.
Contribution
This work presents MOReL, a novel model-based offline RL algorithm that constructs a pessimistic MDP whose returns approximately lower-bound returns in the real environment, making it a safe surrogate for policy learning, and provides theoretical optimality guarantees.
Findings
MOReL matches or exceeds state-of-the-art results on widely studied offline RL benchmarks.
The framework is minimax optimal up to log factors.
Its modular design facilitates future improvements.
Abstract
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline can greatly expand the applicability of RL, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; and (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome…
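Step (a) can be made concrete. The full paper builds the P-MDP by flagging state-action pairs where the learned dynamics model is unreliable (in practice, via disagreement among an ensemble of learned models) and redirecting them to an absorbing HALT state that carries a large negative reward, -κ. The sketch below illustrates that idea under our own assumptions: the function names `ensemble_disagreement` and `pmdp_step`, the `HALT` sentinel, the `threshold` and `kappa` values, and the choice to sample a random ensemble member for the next state are all hypothetical, not the authors' code.

```python
import numpy as np

HALT = "HALT"  # hypothetical sentinel for the absorbing HALT state


def ensemble_disagreement(models, s, a):
    """Max pairwise L2 distance between the ensemble's next-state predictions."""
    preds = np.stack([np.asarray(m(s, a), dtype=float) for m in models])
    diffs = preds[:, None, :] - preds[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())


def pmdp_step(models, reward_fn, s, a, threshold, kappa, rng):
    """One transition of a pessimistic MDP; returns (next_state, reward).

    Assumptions: each model maps (state, action) -> predicted next state,
    reward_fn is given, and threshold/kappa are user-chosen hyperparameters.
    """
    # HALT is absorbing: once there, stay there and keep paying the penalty.
    if isinstance(s, str) and s == HALT:
        return HALT, -kappa
    # "Unknown" state-action pair: the ensemble disagrees too much, so halt.
    if ensemble_disagreement(models, s, a) > threshold:
        return HALT, -kappa
    # "Known" region: follow one randomly chosen ensemble member (an assumption;
    # using the ensemble mean would be another reasonable choice).
    model = models[rng.integers(len(models))]
    return np.asarray(model(s, a), dtype=float), reward_fn(s, a)


# Toy ensemble: the two models agree near s = 0 and diverge as |s| grows.
models = [lambda s, a: s + a, lambda s, a: s + a + 0.1 * s**2]
reward_fn = lambda s, a: -float(np.sum(s**2))
rng = np.random.default_rng(0)

print(pmdp_step(models, reward_fn, np.array([0.1]), np.array([1.0]), 0.5, 100.0, rng))
print(pmdp_step(models, reward_fn, np.array([5.0]), np.array([1.0]), 0.5, 100.0, rng))
```

Step (b) can then treat `pmdp_step` as a simulator for any standard RL algorithm. A policy that steers into regions the data does not cover is cut off and penalized, which is the mechanism behind the approximate lower bound on real-environment performance.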
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
