Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations

Bryan Chan; Anson Leung; James Bergstra

arXiv:2410.14957·cs.RO·January 24, 2025

TL;DR

This paper introduces an offline-to-online reinforcement learning algorithm for image-based robotic grasping that reaches a success rate above 90% within two hours of interaction using only 50 demonstrations, outperforming behavioral cloning and existing RL methods.

Contribution

The paper proposes a new O2O RL algorithm that replaces the target network with a neural tangent kernel-based regularization, enabling effective learning from limited demonstrations in real-world image-based tasks.
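
To make the idea concrete, here is a minimal sketch of what a temporal-difference loss without a target network, combined with a neural tangent kernel (NTK) style penalty, could look like. It is not the paper's implementation: this page does not give the exact form of the regularizer, so everything below is an assumption made for illustration. The sketch assumes a Q-function that is linear in some feature vector `phi(s, a)`, in which case the gradient of Q with respect to the last-layer weights is `phi` itself and the last-layer NTK between two inputs is the inner product of their features. The names `td_loss_with_ntk_penalty`, `beta`, and `gamma` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def td_loss_with_ntk_penalty(phi, phi_next, rewards, dones, w,
                             gamma=0.99, beta=0.1):
    """Illustrative loss: TD error with no target network + NTK-style penalty.

    phi, phi_next: lists of feature vectors for (s, a) and (s', a').
    w: last-layer weights, so Q(s, a) = phi(s, a) . w (assumed linear head).
    beta: hypothetical weight on the penalty term.
    Returns (total_loss, td_loss, penalty).
    """
    n = len(phi)
    td_sum = 0.0
    penalty_sum = 0.0
    for x, x_next, r, d in zip(phi, phi_next, rewards, dones):
        q = dot(x, w)
        # The bootstrap target reuses the same weights w (no separate target
        # network). In a gradient-based setting it would be held constant.
        target = r + gamma * (1.0 - d) * dot(x_next, w)
        td_sum += (q - target) ** 2
        # Last-layer NTK between (s, a) and (s', a') is phi . phi_next.
        # Penalising its square discourages an update to Q(s, a) from also
        # shifting Q(s', a'), the coupling a target network usually dampens.
        penalty_sum += dot(x, x_next) ** 2
    td_loss = td_sum / n
    penalty = penalty_sum / n
    return td_loss + beta * penalty, td_loss, penalty


# Usage on a toy batch of two transitions (values chosen arbitrarily).
total, td, pen = td_loss_with_ntk_penalty(
    phi=[[1.0, 0.0], [0.0, 1.0]],
    phi_next=[[0.0, 1.0], [0.0, 1.0]],
    rewards=[1.0, 0.0],
    dones=[0.0, 1.0],
    w=[0.5, 0.5],
    gamma=0.9,
    beta=0.1,
)
```

In the toy batch, the first transition has orthogonal features for the current and next state-action pair and contributes nothing to the penalty, while the second has identical features and is penalised. A deep image-based Q-network would replace the hand-written features with learned ones and compute gradients with an autodiff library.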

Findings

01 Achieves a success rate above 90% within two hours of interaction.

02 Outperforms behavioral cloning and existing RL algorithms with only 50 demonstrations.

03 Works on a real-life image-based robotic vacuum grasping task where demonstrations are scarce.

Abstract

Offline-to-online reinforcement learning (O2O RL) aims to obtain a policy that continually improves as it interacts with the environment, while ensuring the initial policy behaviour is satisficing. This satisficing behaviour is necessary for robotic manipulation, where random exploration can be costly due to catastrophic failures and time. O2O RL is especially compelling when we can only obtain a scarce amount of (potentially suboptimal) demonstrations, a scenario where behavioural cloning (BC) is known to suffer from distribution shift. Previous works have outlined the challenges in applying O2O RL algorithms in image-based environments. In this work, we propose a novel O2O RL algorithm that can learn in a real-life image-based robotic vacuum grasping task with a small number of demonstrations where BC fails the majority of the time. The proposed algorithm replaces the…

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms