An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines

Jianhai Su; Jinzhu Luo; Qi Zhang

arXiv:2512.00383·cs.LG·December 2, 2025

Open Access

TL;DR

This paper explores integrating offline reinforcement learning algorithms as subroutines within online RL to improve learning efficiency, revealing task-dependent effectiveness and the need for better fine-tuning methods.

Contribution

It formalizes a framework for offline RL integration into online RL and introduces techniques to enhance its effectiveness, supported by extensive empirical analysis.

Findings

1. Effectiveness varies with task nature
2. Proposed techniques significantly improve performance
3. Existing online fine-tuning methods are generally ineffective

Abstract

We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation, such as final policy recommendation and online fine-tuning. We further introduce convenient techniques to improve its effectiveness in enhancing online learning efficiency. Our extensive and systematic empirical analyses show that 1) the effectiveness of the proposed framework depends strongly on the nature of the task, 2) our proposed techniques greatly enhance its effectiveness, and 3) existing online fine-tuning methods are overall ineffective, calling for more research therein.
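The core idea in the abstract, an online agent reusing its own interaction history as an offline dataset, can be illustrated with a toy loop. The following is only a minimal sketch under stated assumptions and is not the authors' framework: the chain environment, the use of tabular batch Q-iteration as a stand-in offline learner, the fixed schedule, and all names (`ChainEnv`, `offline_subroutine`, `run_online`, `offline_every`) are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.
    Reward is 1 whenever the agent lands on the rightmost state."""

    def __init__(self, n_states=5, horizon=20):
        self.n_states, self.horizon = n_states, horizon

    def reset(self):
        self.s, self.t = 0, 0
        return self.s

    def step(self, a):
        if a == 1:
            self.s = min(self.s + 1, self.n_states - 1)
        else:
            self.s = max(self.s - 1, 0)
        self.t += 1
        reward = 1.0 if self.s == self.n_states - 1 else 0.0
        return self.s, reward, self.t >= self.horizon


def offline_subroutine(dataset, n_states, n_actions, gamma=0.9, sweeps=50):
    """Stand-in offline learner (an assumption): synchronous batch
    Q-iteration that only ever looks at logged (s, a, r, s') tuples."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        targets = defaultdict(list)
        for s, a, r, s2 in dataset:
            targets[(s, a)].append(r + gamma * max(q[s2]))
        for (s, a), vals in targets.items():
            q[s][a] = sum(vals) / len(vals)
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


def act(q_row, epsilon, rng):
    """Epsilon-greedy action with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def run_online(env, episodes=30, offline_every=10, epsilon=0.3, seed=0):
    """Online loop that periodically hands its history to the offline learner."""
    rng = random.Random(seed)
    n_actions = 2
    dataset = []  # the agent's own history, reused as the offline dataset
    q = [[0.0] * n_actions for _ in range(env.n_states)]
    for ep in range(1, episodes + 1):
        s, done = env.reset(), False
        while not done:
            a = act(q[s], epsilon, rng)
            s2, r, done = env.step(a)
            dataset.append((s, a, r, s2))
            s = s2
        if ep % offline_every == 0:
            # Offline RL invoked as a subroutine on everything seen so far.
            q = offline_subroutine(dataset, env.n_states, n_actions)
    # The final greedy policy loosely mirrors "final policy recommendation".
    return greedy_policy(q), dataset
```

In this sketch the offline learner's output also steers later data collection, which is one of several possible design choices; a variant could keep a separate online learner for exploration and call the offline subroutine only once at the end to recommend a policy.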

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research