Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold

TL;DR
This paper introduces Model-Based Offline Planning (MBOP), a method that learns models from offline data and uses them for planning-based control. It reports near-optimal policies and zero-shot goal conditioning on robotics tasks, without interacting with the system during learning.
Contribution
The paper presents an offline learner that produces a model used to control the system directly through planning. The resulting policies are easier to command externally, to constrain, and to integrate into larger systems than opaque model-free policies.
Findings
Achieves near-optimal policies from a limited amount of logged system interaction, without querying the system during learning.
Demonstrates zero-shot goal-conditioned policy generation.
Effectively leverages planning to respect environmental constraints.
Abstract
Offline learning is a key part of making reinforcement learning (RL) usable in real systems. Offline RL looks at scenarios where there is data from a system's operation, but no direct access to the system when learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and less easy to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This allows us to have easily controllable policies directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of…
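To make the idea of "control through planning on a model learned from logged data" concrete, here is a minimal sketch. It is not the MBOP algorithm: it swaps in a linear least-squares dynamics model and a plain random-shooting planner, and every name in it (`fit_linear_model`, `plan`, `goal_reward`, `toy_step`, `control`) is hypothetical, as is the one-dimensional toy system. It only illustrates why a planner can be re-targeted to a new goal without retraining: the goal enters through the reward function passed at planning time, not through the learned model.

```python
import numpy as np

def fit_linear_model(states, actions, next_states):
    # Least-squares fit of next_state ~= [state, action, 1] @ W on logged data.
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict(W, state, action):
    # One-step prediction from the learned model.
    return np.concatenate([state, action, [1.0]]) @ W

def plan(W, state, reward_fn, rng, horizon=10, n_candidates=256):
    # Random shooting: sample action sequences, roll each out in the learned
    # model, and return the first action of the highest-return sequence.
    action_dim = W.shape[0] - W.shape[1] - 1
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s = predict(W, s, a)
            returns[i] += reward_fn(s, a)
    return candidates[np.argmax(returns), 0]

def goal_reward(goal):
    # Reward is the negative distance to the goal; changing `goal`
    # re-targets the planner without touching the learned model.
    return lambda s, a: -float(np.abs(s - goal).sum())

def toy_step(state, action):
    # Assumed toy system (illustration only): a 1-D point moved by the action.
    return state + 0.1 * action

def control(W, start, reward_fn, steps=30, seed=0):
    # Replan at every step and apply only the first planned action.
    rng = np.random.default_rng(seed)
    s = np.asarray(start, dtype=float)
    for _ in range(steps):
        s = toy_step(s, plan(W, s, reward_fn, rng))
    return s

# "Offline" dataset: transitions logged under random actions.
data_rng = np.random.default_rng(0)
S = data_rng.uniform(-2.0, 2.0, size=(500, 1))
A = data_rng.uniform(-1.0, 1.0, size=(500, 1))
W = fit_linear_model(S, A, toy_step(S, A))

# Same learned model, two different goals chosen at planning time.
final_right = control(W, [0.0], goal_reward(1.0))
final_left = control(W, [0.0], goal_reward(-1.0))
print(final_right, final_left)
```

In this sketch the model is fitted once from the logged transitions and never updated. Constraints could be handled in the same spirit, for example by penalising or discarding candidate sequences whose predicted states violate them, though how the paper itself does this is not covered by the summary above.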
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Robot Manipulation and Learning
