MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn

TL;DR
This paper introduces MOTO, a novel on-policy model-based reinforcement learning method that effectively leverages offline data for high-dimensional robot tasks, enabling successful pixel-based manipulation and outperforming existing approaches.
Contribution
MOTO is the first method to enable offline pre-training and online fine-tuning for pixel-based robot manipulation tasks using model-based RL.
Findings
Successfully solves MetaWorld benchmark tasks.
Achieves complete pixel-based manipulation in the Franka Kitchen environment.
Outperforms existing offline and online RL methods.
Abstract
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value…
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
