SAMG: Offline-to-Online Reinforcement Learning via State-Action-Conditional Offline Model Guidance

Liyu Zhang; Haochi Wu; Xu Wan; Quan Kong; Ruilong Deng; Mingyang Sun

arXiv:2410.18626·cs.LG·February 24, 2025

TL;DR

SAMG is an offline-to-online reinforcement learning paradigm that combines a frozen offline critic with the online target critic through a state-action-adaptive weight, improving efficiency and performance without retraining on the offline dataset during online fine-tuning.

Contribution

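The paper proposes SAMG, an O2O RL method that keeps the pre-trained offline critic frozen and blends it with the online target critic using a state-action-adaptive coefficient. The method is designed to be integrated into existing Q-function-based algorithms and to improve on their performance.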

Findings

01. SAMG outperforms state-of-the-art O2O RL algorithms on the D4RL benchmark.

02. Theoretical analysis supports the optimality of SAMG and shows a lower estimation error.

03. SAMG integrates with Q-function-based algorithms.

Abstract

Offline-to-online (O2O) reinforcement learning (RL) pre-trains models on offline data and refines policies through online fine-tuning. However, existing O2O RL algorithms typically need to keep the cumbersome offline dataset available to mitigate the effects of out-of-distribution (OOD) data, which significantly limits their efficiency in exploiting online samples. To address this deficiency, we introduce a new paradigm for O2O RL called State-Action-Conditional Offline Model Guidance (SAMG). It freezes the pre-trained offline critic to provide a compact offline understanding of each state-action sample, thus eliminating the need for retraining on offline data. The frozen offline critic is combined with the online target critic, weighted by a state-action-adaptive coefficient. This coefficient aims to capture the offline degree of samples at the state-action level, and is updated…
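
The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes the blend is a convex combination of the two critics' next-state value estimates inside a standard TD target; the abstract does not state the exact update rule, so this form, the function name `blended_td_targets`, and every parameter name are hypothetical. The coefficient is taken as a given input, since the abstract is cut off before explaining how it is estimated and updated.

```python
from typing import List, Sequence


def blended_td_targets(
    rewards: Sequence[float],
    dones: Sequence[float],
    q_online_target_next: Sequence[float],
    q_offline_frozen_next: Sequence[float],
    offline_weight: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Per-sample TD targets that mix two critics (illustrative assumption).

    offline_weight[i] in [0, 1] plays the role of the state-action-adaptive
    coefficient: values near 1 lean on the frozen offline critic, values
    near 0 lean on the online target critic.
    """
    targets = []
    for r, d, q_on, q_off, p in zip(
        rewards, dones, q_online_target_next, q_offline_frozen_next, offline_weight
    ):
        if not 0.0 <= p <= 1.0:
            raise ValueError("offline_weight entries must lie in [0, 1]")
        # Convex combination of the two next-state value estimates.
        mixed_next_value = (1.0 - p) * q_on + p * q_off
        # Standard bootstrapped target; no bootstrap on terminal transitions.
        targets.append(r + gamma * (1.0 - d) * mixed_next_value)
    return targets


if __name__ == "__main__":
    # Two transitions: one judged "offline-like" (weight 0.5), one purely online.
    print(
        blended_td_targets(
            rewards=[1.0, 1.0],
            dones=[0.0, 0.0],
            q_online_target_next=[2.0, 2.0],
            q_offline_frozen_next=[4.0, 4.0],
            offline_weight=[0.5, 0.0],
            gamma=0.5,
        )
    )
```

In this sketch the frozen critic is only evaluated, never updated, which is what removes the need to replay the offline dataset. Setting every weight to zero recovers an ordinary online TD target.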

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Elevator Systems and Control