Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Mitsuhiko Nakamoto; Yuexiang Zhai; Anikait Singh; Max Sobol Mark; Yi Ma; Chelsea Finn; Aviral Kumar; Sergey Levine

arXiv:2303.05479·cs.LG·January 23, 2024·20 cites



Open Access 3 Repos 1 Video

TL;DR

Cal-QL is a calibrated offline RL pre-training method that produces a conservative, well-scaled value function initialization. This initialization enables efficient online fine-tuning and outperforms existing methods on benchmark tasks.

Contribution

The paper proposes Cal-QL, a novel offline RL pre-training approach that learns a calibrated, conservative value function initialization for improved online fine-tuning.

Findings

01

Cal-QL outperforms state-of-the-art methods on 9 out of 11 fine-tuning benchmarks.

02

Cal-QL can be implemented with a simple modification to conservative Q-learning (CQL).

03

Calibrated value functions provide reliable bounds, facilitating better online adaptation.

Abstract

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, in the sense that the learned Q-values are at a reasonable scale. We refer to this property as calibration, and define it formally as providing a lower bound on the true value function of the learned policy and an upper bound on the value of some other…
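The idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified CQL-style penalty (mean Q at policy actions minus mean Q at dataset actions) and assumes that calibration is enforced by clipping the policy-action Q-values from below with a reference value per state. The function names, the `ref_values` input, and the toy numbers are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def cql_penalty(q_policy: Sequence[float], q_data: Sequence[float]) -> float:
    """Simplified conservative penalty (assumed form).

    Minimizing it pushes Q down at actions sampled from the learned policy
    and pushes Q up at actions that appear in the offline dataset.
    """
    return mean(q_policy) - mean(q_data)


def cal_ql_penalty(
    q_policy: Sequence[float],
    q_data: Sequence[float],
    ref_values: Sequence[float],
) -> float:
    """Calibrated variant of the penalty (assumed form).

    Q at policy actions is clipped from below by a reference value for the
    same state, so the penalty stops pushing Q down once it falls under
    that reference value.
    """
    clipped = [max(q, v) for q, v in zip(q_policy, ref_values)]
    return mean(clipped) - mean(q_data)


# Toy batch of three states (hypothetical numbers).
q_policy = [1.0, -5.0, 3.0]    # Q(s, a) with a sampled from the learned policy
q_data = [2.0, 0.0, 4.0]       # Q(s, a) with a taken from the dataset
ref_values = [0.0, 0.0, 0.0]   # assumed reference-policy value per state

cql = cql_penalty(q_policy, q_data)
cal = cal_ql_penalty(q_policy, q_data, ref_values)
print("CQL penalty:", cql)
print("Cal-QL penalty:", cal)
```

In this toy batch the second state has a policy-action Q-value far below its reference value, so the calibrated penalty replaces it with the reference value and no longer rewards driving it lower. The calibrated penalty is therefore never smaller than the plain one for the same inputs.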


Code & Models

Repositories

Videos

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Software Engineering Research

Methods: Q-Learning