MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning

Rafael Rafailov; Kyle Hatch; Victor Kolev; John D. Martin; Mariano Phielipp; Chelsea Finn

arXiv:2401.03306·cs.LG·January 9, 2024·1 cites

Open Access

TL;DR

This paper introduces MOTO, a novel on-policy model-based reinforcement learning method that effectively leverages offline data for high-dimensional robot tasks, enabling successful pixel-based manipulation and outperforming existing approaches.

Contribution

MOTO is the first method to enable offline pre-training and online fine-tuning for pixel-based robot manipulation tasks using model-based RL.

Findings

01 Successfully solves MetaWorld benchmark tasks.

02 Achieves complete pixel-based manipulation in the Franka Kitchen environment.

03 Outperforms existing offline and online RL methods.

Abstract

We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value…

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research