When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Lakshita Dodeja; Ondrej Biza; Shivam Vats; Stephen Hart; Stefanie Tellex; Robin Walters; Karl Schmeckpeper; Thomas Weng

arXiv:2605.05172 · cs.RO · May 7, 2026

TL;DR

Q2RL is an offline-to-online reinforcement learning method that extracts a Q-function from a behavior cloning (BC) policy. This enables efficient on-robot learning, and Q2RL outperforms state-of-the-art offline-to-online methods on manipulation tasks.

Contribution

The paper introduces Q2RL, an algorithm that combines Q-Estimation (extracting a Q-function from a BC policy) with Q-Gating (choosing between BC and RL actions by their Q-values) to improve offline-to-online learning for robotic manipulation.
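
To make Q-Estimation concrete, here is a minimal sketch of one generic way to extract a Q-function from a BC policy: run the BC policy for a few episodes and average the discounted Monte Carlo returns observed after each state-action pair. This is an assumption for illustration, not the paper's procedure. It uses a toy tabular setting with integer states and actions, and the names `discounted_returns` and `estimate_q_from_rollouts` are hypothetical. A robot-scale version would more likely fit a neural Q-network to such returns, or use TD learning, rather than a table.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def estimate_q_from_rollouts(episodes, n_states, n_actions, gamma=0.99):
    """Monte Carlo estimate of Q^BC(s, a) from rollouts of the BC policy.

    episodes: list of (states, actions, rewards) collected by running BC.
    Returns a table of averaged returns; unvisited (s, a) pairs stay at 0.
    """
    q_sum = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    for states, actions, rewards in episodes:
        for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
            q_sum[s, a] += g
            counts[s, a] += 1
    # Average only where a pair was visited; avoid division by zero elsewhere.
    return np.divide(q_sum, counts, out=np.zeros_like(q_sum), where=counts > 0)


# Toy usage: two hypothetical BC rollouts on a 3-state, 2-action problem.
episodes = [
    ([0, 1, 2], [1, 1, 1], [0.0, 0.0, 1.0]),  # rollout that reaches the goal
    ([0, 1], [1, 0], [0.0, 0.0]),             # rollout that fails
]
q_bc = estimate_q_from_rollouts(episodes, n_states=3, n_actions=2, gamma=0.5)
```

Only a few rollouts are needed here because the estimate is purely on-policy: it scores the actions the BC policy actually takes, which is what a gating rule needs when comparing BC actions against RL proposals.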

Findings

1. Q2RL outperforms state-of-the-art offline-to-online baselines in success rate and convergence time.

2. Q2RL enables on-robot RL for contact-rich tasks within 1-2 hours.

3. Q2RL achieves up to a 100% success rate and a 3.75x improvement over BC.

Abstract

Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to overwrite previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL (Q-Estimation and Q-Gating from BC for Reinforcement Learning), an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation, which extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL…
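
The Q-Gating step described above can be sketched as follows. At each step, both the BC policy and the RL policy propose an action, each action is scored by a Q-function, and the higher-scoring action is executed and its transition stored for RL training. The callables, the function name `q_gated_action`, the toy quadratic Q-function, and the rule that ties go to the BC action are all assumptions for illustration. The paper may, for example, score each action with a different critic, which the separate `q_bc` and `q_rl` arguments allow.

```python
from typing import Callable, Tuple

import numpy as np

Policy = Callable[[np.ndarray], np.ndarray]
QFn = Callable[[np.ndarray, np.ndarray], float]


def q_gated_action(state: np.ndarray, bc_policy: Policy, rl_policy: Policy,
                   q_bc: QFn, q_rl: QFn) -> Tuple[np.ndarray, str]:
    """Return the BC or RL action, whichever has the higher estimated Q-value.

    Assumption: ties go to the BC action, so demonstrated behavior is kept
    unless the RL action is estimated to be strictly better.
    """
    a_bc = bc_policy(state)
    a_rl = rl_policy(state)
    if q_rl(state, a_rl) > q_bc(state, a_bc):
        return a_rl, "rl"
    return a_bc, "bc"


# Toy usage with a hypothetical 1-D action space and a Q peaking at 0.5.
def toy_q(state: np.ndarray, action: np.ndarray) -> float:
    return -float((action[0] - 0.5) ** 2)


def toy_bc(state: np.ndarray) -> np.ndarray:
    return np.array([0.0])


def toy_rl(state: np.ndarray) -> np.ndarray:
    return np.array([0.4])


action, source = q_gated_action(np.zeros(3), toy_bc, toy_rl, toy_q, toy_q)
```

Returning the action's source ("bc" or "rl") makes it easy to log how often the RL policy takes over as online training progresses.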

Code & Models

Repositories

https://pages.rai-inst.com/q2rl_website
