MOReL: Model-Based Offline Reinforcement Learning

Rahul Kidambi; Aravind Rajeswaran; Praneeth Netrapalli; Thorsten Joachims

arXiv:2005.05951 · cs.LG · March 3, 2021 · 158 cites

TL;DR

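MOReL is a model-based offline RL framework that learns a pessimistic MDP (P-MDP) from a fixed dataset and then learns a near-optimal policy inside it. Because a policy's performance in the P-MDP approximately lower-bounds its performance in the real environment, the approach comes with theoretical guarantees and performs strongly in experiments.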

Contribution

This work presents MOReL, a model-based offline RL framework that constructs a pessimistic MDP to serve as a surrogate for policy evaluation and learning, and proves that the framework is minimax optimal up to log factors.

Findings

01

MOReL matches or exceeds state-of-the-art results on widely studied offline RL benchmarks.

02

The framework is minimax optimal up to log factors.

03

Its modular design facilitates future improvements.

Abstract

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline can greatly expand the applicability of RL, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; and (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome…
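Step (a) can be made concrete with a small sketch. It assumes a construction that the abstract does not spell out: an ensemble of learned dynamics models marks a state-action pair as unknown when their predictions disagree by more than a threshold, an unknown pair leads to an absorbing HALT state, and every step spent in HALT earns a penalty of -kappa. `PessimisticMDP`, `threshold`, `kappa`, the max-pairwise-distance disagreement measure, and the ensemble-mean next state are hypothetical illustration choices, not the authors' code. The `evaluate` helper rolls a policy out in the P-MDP, which is the quantity step (b) would optimize; the policy optimizer itself is omitted.

```python
import math
from typing import Callable, List, Optional, Tuple

State = List[float]
Action = List[float]
# A learned dynamics model maps (state, action) to a predicted next state.
Model = Callable[[State, Action], State]

# Sentinel for the absorbing HALT state (hypothetical representation).
HALT: Optional[State] = None


class PessimisticMDP:
    """Illustrative P-MDP built from an ensemble of learned dynamics models.

    Assumptions (not stated in the abstract): a pair (s, a) is "unknown" when
    the largest pairwise distance between ensemble predictions exceeds
    `threshold`; unknown pairs lead to HALT, and each step in HALT pays -kappa.
    """

    def __init__(self, models: List[Model],
                 reward_fn: Callable[[State, Action], float],
                 threshold: float, kappa: float) -> None:
        self.models = models
        self.reward_fn = reward_fn
        self.threshold = threshold
        self.kappa = kappa

    def is_unknown(self, s: State, a: Action) -> bool:
        """True if the ensemble disagrees on (s, a) by more than the threshold."""
        preds = [m(s, a) for m in self.models]
        gap = max((math.dist(p, q) for i, p in enumerate(preds)
                   for q in preds[i + 1:]), default=0.0)
        return gap > self.threshold

    def step(self, s: Optional[State], a: Action) -> Tuple[Optional[State], float]:
        """Return (next_state, reward) under the pessimistic dynamics."""
        if s is HALT:
            return HALT, -self.kappa  # HALT is absorbing and keeps paying the penalty
        reward = self.reward_fn(s, a)
        if self.is_unknown(s, a):
            return HALT, reward  # outside the data support: move into HALT
        preds = [m(s, a) for m in self.models]
        # Hypothetical choice: use the ensemble mean as the next state.
        next_s = [sum(xs) / len(xs) for xs in zip(*preds)]
        return next_s, reward


def evaluate(pmdp: PessimisticMDP, policy: Callable[[State], Action],
             s0: State, horizon: int, gamma: float = 1.0) -> float:
    """Discounted return of a deterministic policy rolled out in the P-MDP."""
    s: Optional[State] = s0
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        # The action is irrelevant once halted, so pass a placeholder there.
        a = [0.0] if s is HALT else policy(s)
        s, r = pmdp.step(s, a)
        total += discount * r
        discount *= gamma
    return total


if __name__ == "__main__":
    # Toy 1-D ensemble: the two models agree only while |s| <= 1.
    models: List[Model] = [
        lambda s, a: [s[0] + a[0]],
        lambda s, a: [s[0] + a[0] + (0.0 if abs(s[0]) <= 1.0 else 1.0)],
    ]
    pmdp = PessimisticMDP(models, lambda s, a: 1.0, threshold=0.5, kappa=10.0)
    print(evaluate(pmdp, lambda s: [0.0], [0.0], horizon=5))  # 5.0: stays in-support
    print(evaluate(pmdp, lambda s: [0.5], [0.0], horizon=5))  # -6.0: drifts out, then halts
```

In this sketch, routing unknown pairs to a penalized absorbing state is what makes the surrogate pessimistic: a policy cannot collect return by exploiting model error where the ensemble disagrees, which matches the intuition behind the abstract's lower-bound property. A larger `kappa` makes leaving the data support more costly.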

Videos

MOReL: Model-Based Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques