How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation
Alex X. Lee, Coline Devin, Jost Tobias Springenberg, Yuxiang Zhou, Thomas Lampe, Abbas Abdolmaleki, Konstantinos Bousmalis

TL;DR
This paper introduces two reinforcement learning algorithms that leverage suboptimal teacher policies and their collected data to reduce online interaction in vision-based robotic manipulation tasks, improving training efficiency.
Contribution
The work develops two RL algorithms that use both the action distributions of suboptimal teacher policies and the data those teachers collect on the target task, enabling more efficient learning with limited online interaction in robotic manipulation.
Findings
Training on combined teacher and student data yields the best performance when data is limited.
Using suboptimal teachers can significantly accelerate learning in robotic stacking tasks.
Offline RL from teacher rollouts is effective with sufficient data.
Abstract
Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task, by reusing a suboptimal policy we might have access to, for example from training on related prior tasks, or in simulation. To this end, we develop two RL algorithms that can speed up training by using not only the action distributions of teacher policies, but also data collected by such policies on the task at hand. We conduct a thorough experimental study of how to use suboptimal teachers on a challenging robotic manipulation benchmark on vision-based stacking with diverse objects. We compare our methods to…
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Auction Theory and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
