Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Qin-Wen Luo; Ming-Kun Xie; Ye-Wen Wang; Sheng-Jun Huang

arXiv:2412.18855 · cs.LG · December 30, 2024

TL;DR

This paper introduces a general offline-to-online reinforcement learning method that re-evaluates and calibrates critics to handle dataset-environment mismatches, enabling stable and efficient online fine-tuning from any offline policy.

Contribution

The paper proposes an approach that handles the evaluation and improvement mismatches between the offline dataset and the online environment in offline-to-online (O2O) RL, so that it can be applied across different offline and online methods.

Findings

01. Achieves stable performance improvement on multiple tasks

02. Outperforms state-of-the-art O2O RL methods

03. Effectively handles dataset-environment mismatches

Abstract

Offline-to-online (O2O) reinforcement learning (RL) provides an effective means of leveraging an offline pre-trained policy as initialization to improve performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and cannot perform general O2O learning from any offline method. To deal with this problem, we disclose that there are evaluation and improvement mismatches between the offline dataset and the online environment, which hinders the direct application of pre-trained policies to online fine-tuning. In this paper, we propose to handle these two mismatches simultaneously, which aims to achieve general O2O learning from any offline method to any online method. Before online fine-tuning, we re-evaluate the pessimistic critic trained on the offline dataset in an optimistic way and then calibrate the…
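
To make the re-evaluation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes that "re-evaluating in an optimistic way" can be illustrated by plain temporal-difference policy evaluation of a fixed policy on logged transitions, with no conservatism penalty. The tabular setting, the function name `evaluate_policy`, and the toy transitions are all hypothetical choices made for illustration.

```python
import numpy as np

def evaluate_policy(transitions, policy, n_states, n_actions,
                    gamma=0.9, sweeps=200, lr=0.5):
    """Tabular TD evaluation of a fixed policy on logged transitions.

    Assumption: no pessimism term is applied, so the result is the
    ordinary Q-function of `policy` on the logged data.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            # Bootstrap from the action the fixed policy takes in s_next.
            bootstrap = 0.0 if done else gamma * q[s_next, policy[s_next]]
            q[s, a] += lr * (r + bootstrap - q[s, a])
    return q

# Hypothetical logged data: (state, action, reward, next_state, done).
transitions = [
    (0, 1, 0.0, 1, False),  # state 0, action 1 -> state 1
    (1, 1, 1.0, 1, True),   # state 1, action 1 -> reward 1, episode ends
    (0, 0, 0.0, 0, False),  # state 0, action 0 -> stays in state 0
]
policy = np.array([1, 1])   # the fixed (pre-trained) policy picks action 1

q = evaluate_policy(transitions, policy, n_states=2, n_actions=2)
print(np.round(q, 3))
```

In this toy case the evaluated values follow directly from the Bellman equation for the fixed policy: the terminal transition is worth its reward, and each earlier step is discounted by `gamma`. How the paper calibrates the policy against the re-evaluated critic is not covered by this sketch.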

Code & Models

Repositories

QinwenLuo/OCR-CFT (PyTorch, official)

Videos

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL · SlidesLive

Taxonomy

Topics: Embedded Systems Design Techniques · Iterative Learning Control Systems · VLSI and Analog Circuit Testing