SAMG: Offline-to-Online Reinforcement Learning via State-Action-Conditional Offline Model Guidance
Liyu Zhang, Haochi Wu, Xu Wan, Quan Kong, Ruilong Deng, Mingyang Sun

TL;DR
SAMG is an offline-to-online (O2O) reinforcement learning paradigm that combines a frozen offline critic with a state-action-adaptive weighting coefficient, improving sample efficiency and performance without retraining on the offline dataset.
Contribution
The paper proposes SAMG, an O2O RL method that pairs a frozen offline critic with a state-action-adaptive weighting coefficient. The method is simple to integrate with existing algorithms and improves on their performance.
Findings
SAMG outperforms state-of-the-art O2O RL algorithms on the D4RL benchmark.
Theoretical analysis supports the optimality of SAMG and indicates a lower estimation error.
SAMG effectively integrates with Q-function-based algorithms.
Abstract
Offline-to-online (O2O) reinforcement learning (RL) pre-trains models on offline data and refines policies through online fine-tuning. However, existing O2O RL algorithms typically require maintaining the tedious offline datasets to mitigate the effects of out-of-distribution (OOD) data, which significantly limits their efficiency in exploiting online samples. To address this deficiency, we introduce a new paradigm for O2O RL called State-Action-Conditional Offline Model Guidance (SAMG). It freezes the pre-trained offline critic to provide compact offline understanding for each state-action sample, thus eliminating the need for retraining on offline data. The frozen offline critic is incorporated with the online target critic weighted by a state-action-adaptive coefficient. This coefficient aims to capture the offline degree of samples at the state-action level, and is updated…
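The weighting described in the abstract can be illustrated with a minimal sketch. It assumes the frozen offline critic and the online target critic are mixed as a convex combination inside a standard one-step TD target. The abstract does not give the exact update, so this form is an assumption, and the function name `blended_td_target`, its arguments, and the per-sample coefficient `p` are all hypothetical. How the coefficient is learned or updated is not covered here.

```python
import numpy as np


def blended_td_target(reward, done, q_target_next, q_offline_next, p, gamma=0.99):
    """Hypothetical TD target mixing a frozen offline critic with an online target critic.

    All arguments are arrays with one entry per sampled transition.
    `p` is an assumed per-sample coefficient in [0, 1]: values near 1 lean on
    the frozen offline critic, values near 0 lean on the online target critic.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    q_offline_next = np.asarray(q_offline_next, dtype=float)
    # Keep the coefficient a valid mixing weight.
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0)

    # Assumed convex combination of the two critics' next-step value estimates.
    mixed_next = (1.0 - p) * q_target_next + p * q_offline_next

    # Standard one-step bootstrap; terminal transitions do not bootstrap.
    return reward + gamma * (1.0 - done) * mixed_next


if __name__ == "__main__":
    targets = blended_td_target(
        reward=[1.0, 0.0],
        done=[0.0, 1.0],
        q_target_next=[2.0, 5.0],
        q_offline_next=[4.0, 7.0],
        p=[0.5, 1.0],
        gamma=0.9,
    )
    print(targets)
```

In this sketch the offline critic is only evaluated, never updated, which mirrors the abstract's point that the frozen critic stands in for the offline dataset during online fine-tuning.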
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Elevator Systems and Control
