Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations
Bryan Chan, Anson Leung, James Bergstra

TL;DR
This paper introduces a novel offline-to-online reinforcement learning algorithm for image-based robotic grasping, achieving high success rates with scarce demonstrations and outperforming behavioral cloning and existing RL methods.
Contribution
The paper proposes a new offline-to-online (O2O) RL algorithm that replaces the target network with a regularization based on the neural tangent kernel (NTK), enabling effective learning from limited demonstrations in real-world image-based tasks.
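To make the idea of trading a target network for an NTK-style penalty concrete, here is a minimal sketch. It is not the authors' implementation: the tiny network, the function names `q_and_features` and `regularized_td_loss`, the squared form of the penalty, and the weight `beta` are all assumptions made for illustration. The one fact it relies on is that for a linear output layer the gradient of Q with respect to the last-layer weights is the feature vector, so the last-layer part of the NTK between two inputs is the dot product of their features.

```python
import numpy as np

def q_and_features(params, s, a):
    """Hypothetical tiny Q-network: returns Q(s, a) and its last-layer features."""
    W1, w2 = params
    x = np.concatenate([s, a])
    phi = np.tanh(W1 @ x)          # last-layer features phi(s, a)
    return float(w2 @ phi), phi    # Q is linear in phi

def regularized_td_loss(params, s, a, r, s_next, a_next, gamma=0.99, beta=0.1):
    """TD loss that bootstraps from the same network (no target network),
    plus an assumed penalty on the last-layer NTK between consecutive pairs."""
    q, phi = q_and_features(params, s, a)
    q_next, phi_next = q_and_features(params, s_next, a_next)
    # In an autodiff framework the bootstrapped target would have its gradient stopped.
    td_error = q - (r + gamma * q_next)
    # Last-layer NTK term: grad_{w2} Q(s, a) . grad_{w2} Q(s', a') = phi . phi_next
    ntk_last_layer = float(phi @ phi_next)
    loss = 0.5 * td_error ** 2 + beta * ntk_last_layer ** 2
    return loss, ntk_last_layer

# Usage with made-up shapes: 2-D state, 1-D action, 4 hidden features.
rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 3)), rng.normal(size=4))
s, a = np.array([0.1, 0.2]), np.array([0.3])
s_next, a_next = np.array([0.0, -0.1]), np.array([0.5])
loss, k = regularized_td_loss(params, s, a, 1.0, s_next, a_next)
```

The intuition behind this kind of penalty is that when the features of consecutive state-action pairs are less correlated, an update to Q(s, a) moves the bootstrapped value Q(s', a') less, which is the stabilising role a target network usually plays.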
Findings
Achieves over 90% success rate within two hours of interaction.
Outperforms behavioral cloning and existing RL algorithms with only 50 demonstrations.
Effective in real-life robotic grasping with scarce data.
Abstract
Offline-to-online reinforcement learning (O2O RL) aims to obtain a continually improving policy as it interacts with the environment, while ensuring the initial policy behaviour is satisficing. This satisficing behaviour is necessary for robotic manipulation, where random exploration can be costly in both catastrophic failures and time. O2O RL is especially compelling when we can only obtain a scarce amount of (potentially suboptimal) demonstrations, a scenario where behavioural cloning (BC) is known to suffer from distribution shift. Previous works have outlined the challenges in applying O2O RL algorithms in image-based environments. In this work, we propose a novel O2O RL algorithm that can learn in a real-life image-based robotic vacuum grasping task with a small number of demonstrations where BC fails the majority of the time. The proposed algorithm replaces the…
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms
