Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine

TL;DR
Cal-QL is a calibrated offline RL pre-training method that learns a conservative, well-scaled value function initialization. This enables efficient online fine-tuning and outperforms existing methods on benchmark tasks.
Contribution
The paper proposes Cal-QL, a novel offline RL pre-training approach that learns a calibrated, conservative value function initialization for improved online fine-tuning.
Findings
Cal-QL outperforms state-of-the-art methods on 9 out of 11 fine-tuning benchmarks.
Cal-QL can be implemented with a simple modification to conservative Q-learning (CQL).
Calibrated value functions lower-bound the true value of the learned policy while keeping Q-values at a reasonable scale, which facilitates online adaptation.
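The findings say Cal-QL is a simple modification of CQL but do not spell the modification out on this page. The sketch below assumes one plausible form. The CQL term that pushes down Q-values on policy actions is clipped from below by a reference value, so Q-values are no longer pushed below that value. Several choices here are assumptions made for illustration: the function names, the simplified difference-of-means penalty (used in place of a log-sum-exp over actions), and the use of discounted Monte Carlo returns-to-go from the dataset as the reference value.

```python
import numpy as np


def discounted_return_to_go(rewards, gamma):
    """Discounted return-to-go for one trajectory (assumed reference-value estimate)."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Accumulate rewards backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def conservative_penalty(q_data, q_policy, v_ref=None, alpha=1.0):
    """Simplified CQL-style penalty on a batch (illustrative sketch, not the paper's code).

    q_data:   Q(s, a) at dataset actions, shape (B,)
    q_policy: Q(s, a') at actions a' sampled from the learned policy, shape (B,)
    v_ref:    hypothetical per-state reference values, shape (B,); if given,
              the push-down term is clipped from below (assumed Cal-QL-style change)
    """
    pushed_down = np.asarray(q_policy, dtype=float)
    if v_ref is not None:
        # Clip from below: in an autodiff framework, max() passes no gradient to
        # q_policy where it is strictly below v_ref, so those values stop being pushed down.
        pushed_down = np.maximum(pushed_down, np.asarray(v_ref, dtype=float))
    # Push Q down on policy actions and up on dataset actions.
    return alpha * (pushed_down.mean() - np.asarray(q_data, dtype=float).mean())
```

For example, take q_data = [1, 2], q_policy = [0.5, 3] and v_ref = [1.5, 1]. The unclipped penalty is 0.25 and the clipped one is 0.75, because the first policy action's Q-value is raised to its reference value before averaging.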
Abstract
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other…
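The abstract defines calibration as a pair of bounds. The learned Q-values should lie below the true value of the learned policy and above the value of another policy. That policy is left unnamed here because the sentence is truncated. The hypothetical check below only restates those two inequalities on arrays of values evaluated at the same points. The names `is_calibrated`, `q_learned`, `v_learned_true`, and `v_other` are our own. In practice the true values would themselves have to be estimated.

```python
import numpy as np


def is_calibrated(q_learned, v_learned_true, v_other, tol=0.0):
    """Check the two bounds in the abstract's definition of calibration (sketch).

    q_learned:      learned value estimates for the learned policy
    v_learned_true: true value of the learned policy at the same points
    v_other:        value of the other policy named in the (truncated) definition
    tol:            optional slack for noisy value estimates
    """
    q = np.asarray(q_learned, dtype=float)
    # Conservative: never overestimate the learned policy's true value.
    lower_bound_ok = np.all(q <= np.asarray(v_learned_true, dtype=float) + tol)
    # Calibrated scale: never fall below the other policy's value.
    upper_bound_ok = np.all(q >= np.asarray(v_other, dtype=float) - tol)
    return bool(lower_bound_ok and upper_bound_ok)
```

A value function that satisfies only the first inequality is conservative but can sit at an arbitrarily low scale. The second inequality is what rules that out.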
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Software Engineering Research
Methods: Q-Learning
