How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation

Alex X. Lee; Coline Devin; Jost Tobias Springenberg; Yuxiang Zhou; Thomas Lampe; Abbas Abdolmaleki; Konstantinos Bousmalis

arXiv:2205.03353 · cs.RO · May 9, 2022 · 1 citation

Open Access

TL;DR

This paper introduces two reinforcement learning algorithms that leverage suboptimal teacher policies and their collected data to reduce online interaction in vision-based robotic manipulation tasks, improving training efficiency.

Contribution

The work develops two RL algorithms that use both the action distributions of teacher policies and the data those teachers collect, enabling more efficient learning in robotic manipulation when online interaction is limited.

Findings

01. Training on combined teacher and student data yields the best performance when data is limited.

02. Using suboptimal teachers can significantly accelerate learning in robotic stacking tasks.

03. Offline RL from teacher rollouts is effective when sufficient data is available.

Abstract

Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task, by reusing a suboptimal policy we might have access to, for example from training on related prior tasks, or in simulation. To this end, we develop two RL algorithms that can speed up training by using not only the action distributions of teacher policies, but also data collected by such policies on the task at hand. We conduct a thorough experimental study of how to use suboptimal teachers on a challenging robotic manipulation benchmark on vision-based stacking with diverse objects. We compare our methods to…
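As a rough illustration of the two ingredients the abstract names (teacher action distributions and teacher-collected data), the following minimal Python sketch shows a generic kickstarting-style penalty and a mixed teacher/student batch sampler. It is not the paper's algorithms: the diagonal-Gaussian policy form, the function names, `distill_weight`, and `teacher_fraction` are all assumptions introduced here for illustration.

```python
import math
import random


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def kickstarted_loss(rl_loss, teacher, student, distill_weight):
    """Add a teacher-matching penalty to an RL loss computed elsewhere.

    `teacher` and `student` are (mean, std) pairs for one state
    (hypothetical representation; assumes Gaussian action distributions).
    """
    kl = gaussian_kl(teacher[0], teacher[1], student[0], student[1])
    return rl_loss + distill_weight * kl


def sample_mixed_batch(teacher_data, student_data, batch_size, teacher_fraction, rng):
    """Draw a training batch that mixes teacher and student transitions."""
    n_teacher = round(batch_size * teacher_fraction)
    batch = rng.choices(teacher_data, k=n_teacher)
    batch += rng.choices(student_data, k=batch_size - n_teacher)
    return batch
```

Under these assumptions, the penalty vanishes when the student already matches the teacher, and `teacher_fraction` controls how much of each batch comes from teacher rollouts versus the student's own experience.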

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Auction Theory and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings