TL;DR
Q2RL is an offline-to-online reinforcement learning method that extracts Q-functions from behavior cloning policies, enabling efficient robot learning and outperforming state-of-the-art methods in manipulation tasks.
Contribution
The paper introduces Q2RL, a novel algorithm combining Q-estimation and Q-gating to improve offline-to-online learning for robotic manipulation.
Findings
Q2RL outperforms state-of-the-art offline-to-online baselines in both success rate and time to convergence.
Q2RL enables on-robot RL for contact-rich tasks with 1 to 2 hours of training.
Q2RL achieves success rates of up to 100% and up to a 3.75x improvement over BC.
Abstract
Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for improving online once demonstrations have been collected. Existing offline-to-online learning methods often cause policies to overwrite previously learned good actions because of a distribution mismatch between the offline data and online learning. In this work, we propose Q2RL (Q-Estimation and Q-Gating from BC for Reinforcement Learning), an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation, which extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for training the RL policy. Across manipulation tasks from the D4RL and robomimic benchmarks, Q2RL…
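The two stages can be illustrated with a small tabular sketch. Everything in it is an assumption for illustration, not the paper's implementation: the toy chain environment, `bc_policy`, `estimate_q_bc`, and `q_gate` are hypothetical names, Q-Estimation is stood in for by plain TD(0) evaluation of the BC policy, and the gate assumes each candidate action is scored by its own policy's critic. The actual method targets continuous manipulation tasks with learned function approximators.

```python
import random
from collections import defaultdict

# Hypothetical toy chain MDP: states 0..4, actions 0 = left, 1 = right.
N_STATES, GOAL, GAMMA = 5, 4, 0.9


def env_step(s, a):
    """Return (next_state, reward, done); reward is 1.0 only on reaching GOAL."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL), s2 == GOAL


def bc_policy(s):
    """Hypothetical imperfect BC policy: moves right except in state 2."""
    return 0 if s == 2 else 1


def estimate_q_bc(policy, steps=5000, alpha=0.1, eps=0.5, horizon=20, seed=0):
    """Q-Estimation stand-in (assumption): TD(0) evaluation of the BC policy.

    Actions are explored epsilon-randomly so both actions are visited, but the
    bootstrap target always uses the BC action, so q approximates Q^{pi_BC}.
    """
    rng = random.Random(seed)
    q = defaultdict(float)
    s, t = 0, 0
    for _ in range(steps):
        a = rng.randrange(2) if rng.random() < eps else policy(s)
        s2, r, done = env_step(s, a)
        target = r if done else r + GAMMA * q[(s2, policy(s2))]
        q[(s, a)] += alpha * (target - q[(s, a)])
        t += 1
        if done or t >= horizon:
            s, t = rng.randrange(GOAL), 0  # reset to a random non-goal state
        else:
            s = s2
    return q


def q_gate(s, bc_pol, rl_pol, q_bc, q_rl):
    """Q-Gating sketch: execute whichever proposed action has the higher value.

    Assumption: the BC action is scored with q_bc and the RL action with q_rl.
    Ties go to the BC action, so the agent never abandons BC without evidence.
    """
    a_bc, a_rl = bc_pol(s), rl_pol(s)
    return a_rl if q_rl[(s, a_rl)] > q_bc[(s, a_bc)] else a_bc


if __name__ == "__main__":
    q_bc = estimate_q_bc(bc_policy)
    # Hypothetical RL critic that has learned that moving right from state 2 pays off.
    q_rl = defaultdict(float, {(2, 1): 0.9})
    for state in range(GOAL):
        print(state, q_gate(state, bc_policy, lambda _s: 1, q_bc, q_rl))
```

In this toy, the BC policy's mistake in state 2 shows up as a near-zero Q^{pi_BC}(2, left), so the gate hands control to the RL action there while keeping the BC action wherever the RL critic offers no higher estimate.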
